I will keep updating the post to make it easier to understand, ideally with some animated graphs. Explaining it is much more challenging than studying it myself. Let me know on twitter if you find anything incorrect or have any suggestion.

It’s been two days since a hacker stole 2 million BNB from the Binance Bridge. There are already some vulnerability analyses around. The earliest PoC was posted by @samczsun.

As someone unfamiliar with blockchain data structures, I stared at samczsun’s PoC for an hour and failed to understand how it works. After studying the basics and testing the IAVL, I finally have better understanding of the vulnerability and the exploit. With this post, I share what I have learned so far and hope to help newbies like me understand this interesting attack.

### What’s IAVL?

To understand this vulnerability, you don't really need to understand IAVL, or the AVL tree. You only need to know it's a balanced binary tree that is designed to be immutable. It is used in many blockchains, including the Bitcoin, to guarantee that no one can forge the data inside a block very efficiently. To that end, it is more importantly for you to understand the concept of the Merkle tree.

The above shows a much-simplified version of the Merkle tree. Each node contains a

`leftHash`

and a `rightHash`

, which are set to the hash of the corresponding child node. In this way, any bit of data change in a node propagates upward to the root node, resulting in a totally different root hash. When a new transaction is added onto the blockchain, the transaction data is appended to the tree as a new node. The node hash values along the tree, including the root hash, are re-calculated. When the block is chained (with Proof-of-Whatever), the root hash is included, thus making the tree immutable.

To update a particular node, the path from the node to the root is included for everyone else to verify the step-by-step hash calculations. This subtree is the

`proof`

. For instance, to update node 7, we need to provide updated hash values for (4, 6, 7). Anyone attempts to update the tree without changing the root hash must provide a `proof`

that can convince others. By the design of Merkle tree, this is impossible. However, a bug in IAVL’s proof code makes this possible.### IAVL’s Range Proof Algorithm

In BSC’s implementation, the

*range proof*of IAVL is used, which supports proofing multiple leaves at once. The detail of the concept is well explained in IAVL’s doc. I drew some diagrams to illustrate the*range proof*algorithm implemented in IAVL.The

`RangeProof`

is a subtree containing every path from target leaf nodes to the root. The nodes only contains information useful for hash validation. Let’s say we want to verify node 7 in our first tree diagram, then we first have a leaf proof node `p7`

which contains the hash of data node 7. Each node along the path to the root (4, 6) has a corresponding `InnerNode`

that contains a left hash and a right hash (in IAVL, only leaf nodes contain data hash). In the proof tree, if the node has a right child, then the `Right`

field is set to `nil`

, since the hash should be computed from the child, which means the value is not useful for validation. The root hash is calculated by `p4.Left`

and hash of `p6`

. Yes, the proof tree is different from the original Merkle tree where the `Right`

is the hash of the `p7`

. In the second example, we want to prove two leaves (5, 7) at once. The

`RangeProof`

tree contains two paths. The algorithm starts from the leftmost path (p4, p6, p5). It then drops a node in the path from the bottom each time and checks if the new tail node is linked has right descendent. If so, it recursively verifies the subtree with the right child node as a tree root. Basically, it verifies paths from left to right. In this case, if `p7`

is altered, `p6.Right`

changes, which affects the hash of `p6`

, and thus the root hash. Everything is perfectly secure, right?### Understand the Bug

Hold on. Remember in the first tree where the proof is only for

`p7`

? `p6.Right`

is `nil`

since it’s not needed, the subtree is on the left. What if the attacker changes the value of `Right`

? Does the hash of `p6`

changes because of that? The node hash calculation logic of proof trees can be summarized as follows:

- If
`Left`

is`nil`

, it means there’s a subtree on the right, left hash is calculated from the child. Right hash use the`Right`

value directly, which is verified to be the root hash of the subtree on the right.

- Otherwise, the subtree is on the left. The left hash is just
`Left`

(had been verified as the root hash of the subtree), the right hash is calculated from the child node.

Can you spot the problem?

When the

`Left`

is not `nil`

(the “Otherwise” branch), the `Right`

is assumed to be `nil`

and calculated from `p7`

, so the value of `Right`

doesn’t matter. That means we can insert a leave node or a subtree and set its hash to `Right`

, which yields a valid subtree. This is shown in the third tree. The subtree in red is forged but valid by itself. Data in green is untouched. Looking at it, the hash of

`p6`

should be different to reflect the change of `Right`

, but it doesn’t. You may notice something strange in the forged proof tree:

`p6`

already has a left child `p5`

and a right child `p7`

. How can it have another right child `p8`

? Isn’t it a binary tree?Remember that the proof tree is only for validation? There’s no enforcement that the actual tree that stores the data has the same structure as the proof tree. So, after implanting the forged proof node

`p8`

, the attacker can insert the forged node 8 as the right child of node 7 in the original tree. While the structure doesn’t match, the proof verification algorithm doesn’t care. It only checks if the proof tree is valid and the corresponding leave node is in the range (`p8`

corresponding to node 8 is one of the leaves in the `RangeProof`

tree). Even though it implicitly means that `p6`

has three child nodes, the proof is valid.### Q&A

- What is the line
`forgedLeafNode.Key[13] = 255`

in samczsun’s PoC for?

The place of

`forgedLeafNode`

depends on its `Key`

. Set the last byte of the `Key`

to `FF`

makes sure that forged node correctly ordered on the right of the original one. Otherwise, the program will fail to retrieve the node using binary search. Try to change `255`

to `1`

and see what happens.- Is the choice of the legit-proof-to-forge arbitrary?

No. For example, you cannot use the PoC to forge node 5 in the first tree diagram. Since the proof path of node 5 results in

`p6`

with `Left = nil`

, the code would not get into the vulnerable branch. The node to forge need to be on the right branch.- Why inserting a
`InnerNode`

with the value`nil`

in the PoC? According to the design of`RangeProof`

data structure, the tree is represented by a`LeftPath`

,`InnerNodes`

, and`Leaves`

. The tree with a new forged right leaf results in a new path, thus a new`InnerNode`

. The value is`nil`

since nodes`p4`

and`p6`

in the path have been traversed (as`InnerNode`

for`p7`

). It is actually not adding any node to the tree.

### References

bsc-hack-analysis-2022-10-06

emilianobonassi • Updated Jun 9, 2023