RLP (Recursive-Length Prefix)
How the Blockchain Development Kit (BDK) uses RLP to encode and decode transactions.
Transactions coming from the network are (de)serialized through a standard called Recursive-Length Prefix, also known as RLP and used extensively by Ethereum. As per their definition:
"RLP standardizes the transfer of data between nodes in a space-efficient format. The only purpose of RLP is to encode structure and arbitrarily nested arrays of binary data."
Rules for decoding
The first byte of the serialized data defines what kind of item follows and how long it is. This can be broken down as follows:
A:
byte < 0x80 means the value is the byte itself, e.g. 0x55 means the value is exactly 0x55
B:
0x80 <= byte <= 0xb7 means the value is a string of 0 to 55 bytes, with the length calculated by subtracting 0x80 from the byte, e.g. 0x90 means the next 16 bytes after 0x90 contain the payload (0x90 - 0x80 = 0x10 -> 16 bytes in decimal)
0x900102030405060708090a0b0c0d0e0f10 -> value is the next 16 bytes after 0x90, which would be 0x0102030405060708090a0b0c0d0e0f10
C:
0xb7 < byte < 0xc0 means the value is a string longer than 55 bytes; subtracting 0xb7 from the byte gives the number of bytes that hold the payload size, e.g.
0xb9012c4e6bffdadac55ac51ec4718aeb6ea75db805d673c3b98128ff02558106170b7e66f38c573131bc0a1a0826fa90b7063a66a6a5f9a793d2295f874769349ffda6fc30e9154a787dbd604524497214cf97f1b56997107429718f6cc92804698972664af8a8c782da4f32a83ce20db38bf09f01a662fa598435a80780912daaa7e4a37be2feabcb90de0b717128c199cf8d18e31fa7bffd
0xb9 - 0xb7 = 0x02 -> 2 bytes in decimal
-> size is the next 2 bytes after 0xb9, which would be 0x012c = 300 bytes
-> value is the next 300 bytes after 0x012c
D:
0xc0 <= byte <= 0xf7 means the value is a list whose payload is 0 to 55 bytes long, with the length calculated by subtracting 0xc0 from the byte, e.g.
0xc50000000000
-> 0xc5 - 0xc0 = 0x05 -> 5 bytes in decimal
-> value is the next 5 bytes after 0xc5, which would be 0x0000000000
This example is not a valid transaction: we don't use lists of 55 bytes or less when deserializing a transaction, since transactions are always larger than 55 bytes
E:
byte > 0xf7 means the value is a list whose payload is longer than 55 bytes; subtracting 0xf7 from the byte gives the number of bytes that hold the payload size, e.g.
0xf8a90c8504a817c80082c160944fabb145d64652a948d72533023f6e7a623c7c5380b844a9059cbb0000000000000000000000006b71dcaa3fb9a4901491b748074a314dad9e980b000000000000000000000000000000000000000000000029e7ab336ae0b5000025a0ef2f3450e6860289dce618af68ebc7d518c3cb3ea4d1641cb2fe7c7251ff31d4a0540dcf1500630a1b0d0d0670eee012e2cf2c64cf3288d122e0efb0d3deb0340f
0xf8 - 0xf7 = 0x01 -> 1 byte in decimal
-> size is the next 1 byte after 0xf8, which would be 0xa9 = 169 bytes
-> value is the next 169 bytes after 0xa9
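Rules A to E can be sketched as a single function that inspects the first byte of an item. This is a minimal illustration in Python, not BDK's actual code: the name read_prefix and its return shape are assumptions made for this example, and it does no bounds checking on truncated input.

```python
def read_prefix(data: bytes, pos: int = 0):
    """Classify the RLP item starting at data[pos] using rules A-E.

    Hypothetical helper. Returns (kind, payload_start, payload_len),
    where kind is "string" or "list".
    """
    first = data[pos]
    if first < 0x80:
        # Rule A: the byte is its own value
        return "string", pos, 1
    if first <= 0xb7:
        # Rule B: string of 0 to 55 bytes, length = first - 0x80
        return "string", pos + 1, first - 0x80
    if first < 0xc0:
        # Rule C: long string, first - 0xb7 bytes hold the payload size
        size_len = first - 0xb7
        size = int.from_bytes(data[pos + 1 : pos + 1 + size_len], "big")
        return "string", pos + 1 + size_len, size
    if first <= 0xf7:
        # Rule D: list with a payload of 0 to 55 bytes
        return "list", pos + 1, first - 0xc0
    # Rule E: long list, first - 0xf7 bytes hold the payload size
    size_len = first - 0xf7
    size = int.from_bytes(data[pos + 1 : pos + 1 + size_len], "big")
    return "list", pos + 1 + size_len, size


# The examples from the rules above (only the header bytes are needed)
print(read_prefix(bytes.fromhex("55")))            # ('string', 0, 1)
print(read_prefix(bytes.fromhex("900102030405060708090a0b0c0d0e0f10")))  # ('string', 1, 16)
print(read_prefix(bytes.fromhex("b9012c")))        # ('string', 3, 300)
print(read_prefix(bytes.fromhex("c50000000000")))  # ('list', 1, 5)
print(read_prefix(bytes.fromhex("f8a9")))          # ('list', 2, 169)
```

Returning the payload offset and length, instead of the payload itself, lets a caller decide whether to slice out a string or to keep walking the items inside a list.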
Decoding a transaction
Let's take an example of a serialized and signed transaction and decode it:
f8a90c8504a817c80082c160944fabb145d64652a948d72533023f6e7a623c7c5380b844a9059cbb0000000000000000000000006b71dcaa3fb9a4901491b748074a314dad9e980b000000000000000000000000000000000000000000000029e7ab336ae0b5000025a0ef2f3450e6860289dce618af68ebc7d518c3cb3ea4d1641cb2fe7c7251ff31d4a0540dcf1500630a1b0d0d0670eee012e2cf2c64cf3288d122e0efb0d3deb0340f
We process the first byte:
0xf8 - this tells us the RLP payload is a list
0xf8 - 0xf7 = 0x01 -> 1 byte - the size of the payload is the next 1 byte (0xa9)
0xa9 = 169 bytes -> the payload is the next 169 bytes
When serializing a transaction to the database, we also append the from address to the end of the string, outside of the list, so it doesn't interfere with the transaction data. Keep that in mind when deserializing from the database.
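Because the list header states exactly how long the list is, anything stored after it can be separated without guessing. The sketch below is a hedged illustration in Python: split_rlp_list is a hypothetical helper, and how BDK lays out the appended from address (raw bytes or otherwise) is not specified here, so the function only returns the trailing bytes untouched. The record in the usage lines is made up for the example.

```python
def split_rlp_list(blob: bytes):
    """Split blob into (rlp_list, trailing_bytes).

    Hypothetical helper: rlp_list is the complete list item (header plus
    payload); trailing_bytes is whatever follows it, left uninterpreted.
    """
    first = blob[0]
    if first < 0xc0:
        raise ValueError("blob does not start with an RLP list")
    if first <= 0xf7:
        # Short list: the payload size is encoded in the first byte
        header_len, payload_len = 1, first - 0xc0
    else:
        # Long list: the next (first - 0xf7) bytes hold the payload size
        size_len = first - 0xf7
        header_len = 1 + size_len
        payload_len = int.from_bytes(blob[1:header_len], "big")
    end = header_len + payload_len
    if end > len(blob):
        raise ValueError("blob is shorter than the list header claims")
    return blob[:end], blob[end:]


# Made-up record: a tiny list followed by 20 placeholder bytes
record = bytes.fromhex("c50000000000") + bytes.fromhex("11" * 20)
list_part, trailing = split_rlp_list(record)
print(list_part.hex())  # c50000000000
print(len(trailing))    # 20
```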
From here, we'll decode the rest of the data, in this order (following after 0xf8a9):
1 - Nonce -> can be either a single byte (rule A) or a payload of up to 55 bytes (rule B)
0x0c -> less than 0x80, thus rule A -> nonce is 0x0c = 12 in decimal
2 - Gas Price -> always a payload of up to 55 bytes (rule B)
0x85 -> 0x85 - 0x80 = 0x05 -> 5 bytes -> gas price is 0x04a817c800 = 20000000000 in decimal (uint256)
3 - Gas Limit -> always a payload of up to 55 bytes (rule B), because the minimum gas limit for an Ethereum transaction is 21000
0x82 -> 0x82 - 0x80 = 0x02 -> 2 bytes -> gas limit is 0xc160 = 49504 in decimal (uint256)
4 - Receiver address ("to") -> always a payload of up to 55 bytes (rule B), because we know an address is always 20 bytes
0x94 -> 0x94 - 0x80 = 0x14 -> 20 bytes -> receiver address is 0x4fabb145d64652a948d72533023f6e7a623c7c53
5 - Value -> always a payload of up to 55 bytes (rule B)
0x80 -> 0x80 - 0x80 = 0 bytes -> the payload is empty, so value is 0
6 - Arbitrary data -> can be either a payload of 0 to 55 bytes (rule B) or longer (rule C)
0xb8 -> more than 0xb7, thus rule C -> 0xb8 - 0xb7 = 1 byte -> size is the next byte, 0x44 = 68 bytes
-> data is 0xa9059cbb0000000000000000000000006b71dcaa3fb9a4901491b748074a314dad9e980b000000000000000000000000000000000000000000000029e7ab336ae0b50000
7 - ECDSA signature recovery ID ("v") -> can be either a single byte (rule A) or a payload of up to 55 bytes (rule B)
0x25 -> less than 0x80, thus rule A -> v is 0x25 = 37 in decimal
8 - ECDSA signature first half ("r") -> always a payload of up to 55 bytes (rule B)
0xa0 -> 0xa0 - 0x80 = 0x20 -> 32 bytes -> r is 0xef2f3450e6860289dce618af68ebc7d518c3cb3ea4d1641cb2fe7c7251ff31d4
9 - ECDSA signature second half ("s") -> always a payload of up to 55 bytes (rule B)
0xa0 -> 0xa0 - 0x80 = 0x20 -> 32 bytes -> s is 0x540dcf1500630a1b0d0d0670eee012e2cf2c64cf3288d122e0efb0d3deb0340f
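The nine steps above can be reproduced with a small recursive decoder. This is a sketch in Python under stated assumptions, not BDK's implementation: rlp_decode and _decode_at are hypothetical names, and the decoder accepts any well-formed prefix without enforcing canonical encodings. The transaction hex is assembled field by field, with the two zero-padded 32-byte words of the data field built using rjust, purely to keep the lines short.

```python
def _decode_at(data: bytes, pos: int):
    """Decode one item at data[pos]; return (item, position after it)."""
    first = data[pos]
    if first < 0x80:                        # rule A
        return data[pos : pos + 1], pos + 1
    if first < 0xc0:                        # strings: rules B and C
        if first <= 0xb7:
            start, size = pos + 1, first - 0x80
        else:
            start = pos + 1 + (first - 0xb7)
            size = int.from_bytes(data[pos + 1 : start], "big")
        return data[start : start + size], start + size
    if first <= 0xf7:                       # lists: rule D
        start, size = pos + 1, first - 0xc0
    else:                                   # lists: rule E
        start = pos + 1 + (first - 0xf7)
        size = int.from_bytes(data[pos + 1 : start], "big")
    end, items, cur = start + size, [], start
    while cur < end:                        # decode each item in the payload
        item, cur = _decode_at(data, cur)
        items.append(item)
    return items, end


def rlp_decode(data: bytes):
    item, end = _decode_at(data, 0)
    if end != len(data):
        raise ValueError("unexpected bytes after the RLP item")
    return item


TX_HEX = "".join([
    "f8a9",                                          # list header (rule E)
    "0c",                                            # 1 - nonce
    "8504a817c800",                                  # 2 - gas price
    "82c160",                                        # 3 - gas limit
    "944fabb145d64652a948d72533023f6e7a623c7c53",    # 4 - to
    "80",                                            # 5 - value
    "b844", "a9059cbb",                              # 6 - data header, first 4 bytes
    "6b71dcaa3fb9a4901491b748074a314dad9e980b".rjust(64, "0"),
    "29e7ab336ae0b50000".rjust(64, "0"),
    "25",                                            # 7 - v
    "a0ef2f3450e6860289dce618af68ebc7d518c3cb3ea4d1641cb2fe7c7251ff31d4",  # 8 - r
    "a0540dcf1500630a1b0d0d0670eee012e2cf2c64cf3288d122e0efb0d3deb0340f",  # 9 - s
])

fields = rlp_decode(bytes.fromhex(TX_HEX))
nonce, gas_price, gas_limit, to, value, data, v, r, s = fields
print(int.from_bytes(nonce, "big"))      # 12
print(int.from_bytes(gas_price, "big"))  # 20000000000
print(int.from_bytes(gas_limit, "big"))  # 49504
print("0x" + to.hex())                   # 0x4fabb145d64652a948d72533023f6e7a623c7c53
print(int.from_bytes(value, "big"))      # 0
print(len(data), int.from_bytes(v, "big"), len(r), len(s))  # 68 37 32 32
```

Every field comes back as raw bytes; turning them into integers or addresses is left to the caller, which matches the idea that RLP only encodes structure.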