Database

How the Blockchain Development Kit's (BDK) internal database is structured and how data is stored in it.

BDK validators use an on-disk database to store data about themselves and other nodes in the network, such as block/transaction data, contract data and node metadata. Some components (e.g. Storage) might have a database folder reserved just for them.

The database itself is an abstraction of a Speedb database, a simple key/value store, but handled in a different way: keys use prefixes, which makes batched reads and writes possible. This gets around the "simple key/value" limitation and lets us divide data into sectors.

The database requires a filesystem path to open it (if it already exists) or create it on the spot (if it doesn't exist) during construction. It closes itself automatically on destruction. Optionally, it also accepts a bool for enabling compression (disabled by default), if needed.

Content in the database is stored as raw bytes to save space: one raw byte takes two hexadecimal characters when written as a string. For example, an address like 0x1234567890123456789012345678901234567890, ignoring the "0x" prefix, occupies 20 raw bytes ("12 34 56 ..."), but 40 bytes as a hex string, since each byte becomes two separate characters ("1 2 3 4 5 6 ...").
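The size difference can be illustrated with a minimal sketch that packs a hex string into raw bytes. The `Bytes` alias and the `hexToBytes()` helper are hypothetical names used only for this example; BDK's own types and utilities may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical alias for illustration; BDK's own Bytes type may differ.
using Bytes = std::vector<uint8_t>;

// Converts one hex character to its 4-bit value (assumes valid hex input).
inline uint8_t nibble(char c) {
  if (c >= '0' && c <= '9') return static_cast<uint8_t>(c - '0');
  if (c >= 'a' && c <= 'f') return static_cast<uint8_t>(c - 'a' + 10);
  assert(c >= 'A' && c <= 'F');
  return static_cast<uint8_t>(c - 'A' + 10);
}

// Packs a hex string (optionally "0x"-prefixed) into raw bytes:
// every two hex characters become one byte.
inline Bytes hexToBytes(std::string hex) {
  if (hex.rfind("0x", 0) == 0) hex.erase(0, 2);
  assert(hex.size() % 2 == 0);
  Bytes out;
  out.reserve(hex.size() / 2);
  for (std::size_t i = 0; i < hex.size(); i += 2) {
    out.push_back(static_cast<uint8_t>((nibble(hex[i]) << 4) | nibble(hex[i + 1])));
  }
  return out;
}
```

Passing the 40-character address from the example above yields a 20-byte vector, half the size of its string form.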

For the main CRUD operations, refer to the has(), get(), put() and del() functions. Due to how the database works internally, updating an entry is the same as inserting a different value in a key that already exists, effectively replacing the value that existed before (e.g. put(oldKey, newValue)). There are also a few other helper functions, such as:

  • getBatch() and putBatch() for batched operations

  • getKeys() for fetching only the database's keys

  • keyFromStr() for encapsulating a key into a Bytes object

  • getLastByPrefix() for getting the last value stored in a given prefix

  • makeNewPrefix() for concatenating prefixes when necessary
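To make the semantics above concrete, here is a toy in-memory stand-in for a prefixed key/value store. It is not BDK's actual class: the `ToyDB` name and every signature below are assumptions for illustration, and only the function names mirror the ones listed above.

```cpp
#include <map>
#include <optional>
#include <string>
#include <vector>

// Toy in-memory stand-in for a prefixed key/value store.
// NOT BDK's DB class: all signatures here are assumptions for illustration.
class ToyDB {
  std::map<std::string, std::string> data_;  // ordered by key, like an on-disk KV store

 public:
  bool has(const std::string& key, const std::string& pfx) const {
    return data_.count(pfx + key) > 0;
  }

  std::optional<std::string> get(const std::string& key, const std::string& pfx) const {
    auto it = data_.find(pfx + key);
    if (it == data_.end()) return std::nullopt;
    return it->second;
  }

  // Writing to an existing key overwrites its value: this is the "update".
  void put(const std::string& key, const std::string& value, const std::string& pfx) {
    data_[pfx + key] = value;
  }

  void del(const std::string& key, const std::string& pfx) { data_.erase(pfx + key); }

  // Returns every value whose full key starts with the prefix, in key order.
  std::vector<std::string> getBatch(const std::string& pfx) const {
    std::vector<std::string> out;
    for (auto it = data_.lower_bound(pfx);
         it != data_.end() && it->first.compare(0, pfx.size(), pfx) == 0; ++it) {
      out.push_back(it->second);
    }
    return out;
  }
};
```

Because keys are kept in sorted order, all entries sharing a prefix sit next to each other, which is what makes a batched read per prefix cheap in this sketch.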

Structs and Prefixes

We have three helper structs to ease database manipulation:

  • DBServer - struct that contains the host and version of the database that will be connected to

  • DBEntry - struct that contains an entry to be inserted or read by the database, and has only two members: key and value, both strings

  • DBBatch - struct that contains multiple DBEntrys to be inserted and/or deleted all at once

We also have a DBPrefix namespace to reference the database's prefixes in a simpler way:

| Descriptor | Prefix |
| --- | --- |
| blocks | 0x0001 |
| heightToBlock | 0x0002 |
| nativeAccounts | 0x0003 |
| txToBlock | 0x0004 |
| rdPoS | 0x0005 |
| contracts | 0x0006 |
| contractManager | 0x0007 |
| events | 0x0008 |
| vmStorage | 0x0009 |
| txToAdditionalData | 0x000A |
| txToCallTrace | 0x000B |

These prefixes are prepended to the key. For example, an entry with key "abc" and value "123", inserted under the 0x0003 prefix, would look like this inside the database (stored as raw bytes; strings are used here only for the sake of explanation): {"0003abc": "123"}
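At the byte level, the same example could be sketched as follows. The `prefixedKey()` helper is a hypothetical name, and writing the two prefix bytes most significant byte first is an assumption of this sketch.

```cpp
#include <cstdint>
#include <string>
#include <vector>

using Bytes = std::vector<uint8_t>;  // hypothetical alias for this sketch

// Prepends a 2-byte prefix (e.g. 0x0003) to a key.
// Big-endian order for the prefix bytes is an assumption.
inline Bytes prefixedKey(uint16_t prefix, const std::string& key) {
  Bytes out;
  out.reserve(2 + key.size());
  out.push_back(static_cast<uint8_t>(prefix >> 8));    // high byte: 0x00
  out.push_back(static_cast<uint8_t>(prefix & 0xFF));  // low byte:  0x03
  out.insert(out.end(), key.begin(), key.end());       // then the key itself
  return out;
}
```

Under these assumptions, the key "abc" in the nativeAccounts prefix becomes the five bytes 00 03 61 62 63.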

Prefixes Overview

blocks

Used to store serialized blocks based on their hashes.

| Key | Value |
| --- | --- |
| Prefix + BlockHash | Serialized Block |

heightToBlock

Used to store block hashes based on their heights.

| Key | Value |
| --- | --- |
| Prefix + Padded uint64_t BlockHeight | BlockHash |

Padded means that the uint64_t is padded with 0's to the left to ensure a fixed length of 8 bytes.
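A minimal sketch of such padding is shown below. The `padUint64()` name is hypothetical, and the most-significant-byte-first order is an assumption consistent with "padded to the left".

```cpp
#include <array>
#include <cstdint>

// Encodes a uint64_t as exactly 8 bytes, most significant byte first,
// so small values end up left-padded with zeros.
inline std::array<uint8_t, 8> padUint64(uint64_t v) {
  std::array<uint8_t, 8> out{};
  for (int i = 7; i >= 0; --i) {
    out[i] = static_cast<uint8_t>(v & 0xFF);  // fill from the right
    v >>= 8;
  }
  return out;
}
```

A useful side effect of a fixed-width, big-endian encoding is that byte-wise key order matches numeric order, so height 2 sorts before height 256.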

nativeAccounts

Used to store serialized native accounts ("balance + nonce") based on their addresses.

| Key | Value |
| --- | --- |
| Prefix + Address | Serialized NativeAccount |

Serialization for a native account goes like this: requiredBytes(balance) + bytes(balance) + requiredBytes(nonce) + bytes(nonce).

For example, an account with balance 1000000 and nonce 2 would be serialized as 03 + 0f4240 + 01 + 02.

An account with balance 0 and nonce 0 would be serialized as 0000.
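The scheme above can be sketched as follows. For simplicity this example uses uint64_t for both fields, which is an assumption (the real balance type may be wider), and `appendSized()` and `serializeAccount()` are hypothetical names.

```cpp
#include <cstdint>
#include <vector>

using Bytes = std::vector<uint8_t>;  // hypothetical alias for this sketch

// Appends requiredBytes(v) followed by the minimal big-endian bytes of v.
// A value of 0 needs zero bytes, so only the length byte 00 is written.
inline void appendSized(Bytes& out, uint64_t v) {
  uint8_t len = 0;
  for (uint64_t t = v; t != 0; t >>= 8) ++len;  // count significant bytes
  out.push_back(len);
  for (int i = len - 1; i >= 0; --i) {
    out.push_back(static_cast<uint8_t>((v >> (8 * i)) & 0xFF));
  }
}

// requiredBytes(balance) + bytes(balance) + requiredBytes(nonce) + bytes(nonce)
inline Bytes serializeAccount(uint64_t balance, uint64_t nonce) {
  Bytes out;
  appendSized(out, balance);
  appendSized(out, nonce);
  return out;
}
```

With balance 1000000 and nonce 2 this produces 03 0f 42 40 01 02, and with both set to 0 it produces 00 00, matching the two examples above.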

txToBlock

Used to store block hashes, the tx indexes within that block and the block heights, based on their transaction hashes.

| Key | Value |
| --- | --- |
| Prefix + TransactionHash | BlockHash + Padded uint32_t BlockIndex + Padded uint64_t BlockHeight |

Padded means that the uint32_t and uint64_t are padded with 0's to the left to ensure a fixed length of 4 and 8 bytes respectively.
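Because every field has a fixed width, a stored value can be split back into its parts by position alone. The sketch below assumes a 32-byte block hash and big-endian integers; the `TxLocation` struct and both helper names are hypothetical.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

using Bytes = std::vector<uint8_t>;  // hypothetical alias for this sketch

struct TxLocation {  // hypothetical struct, not part of BDK
  std::array<uint8_t, 32> blockHash;  // 32-byte hash size is an assumption
  uint32_t txIndex;                   // the "BlockIndex" field: tx position in the block
  uint64_t blockHeight;
};

// Reads `n` bytes starting at `pos` as a big-endian unsigned integer.
inline uint64_t readBE(const Bytes& b, std::size_t pos, std::size_t n) {
  uint64_t v = 0;
  for (std::size_t i = 0; i < n; ++i) v = (v << 8) | b[pos + i];
  return v;
}

// Splits a 44-byte value into hash (32) + tx index (4) + height (8).
// Returns false if the value does not have the expected length.
inline bool decodeTxLocation(const Bytes& value, TxLocation& out) {
  if (value.size() != 44) return false;
  std::copy(value.begin(), value.begin() + 32, out.blockHash.begin());
  out.txIndex = static_cast<uint32_t>(readBE(value, 32, 4));
  out.blockHeight = readBE(value, 36, 8);
  return true;
}
```

No delimiters or length markers are needed between fields in this layout, which is the practical benefit of the padding described above.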

rdPoS

Used to store Validator addresses based on their index within the rdPoS list.

| Key | Value |
| --- | --- |
| Prefix + Padded uint64_t ValidatorIndex | Address |

contracts

Used to store per-contract data in several ways. Each contract gets an "additional" prefix (the contract address), appended after the main prefix.

| Key | Value |
| --- | --- |
| Prefix + Contract Address + "contractName" | The Contract Class Name |
| Prefix + Contract Address + "contractAddress" | The Contract Address |
| Prefix + Contract Address + "contractCreator" | The Contract Creator Address |
| Prefix + Contract Address + "contractChainId" | The Contract ChainId |

Contracts are free to use the contract prefix as they wish, but the following is the default structure for the contract prefix:

| Key | Value |
| --- | --- |
| Prefix + Contract Address + "VARIABLE_NAME_AS_STRING" | The Contract Variable to store in DB |

contractManager

Used to store contract class names based on their addresses.

| Key | Value |
| --- | --- |
| Prefix + Contract Address | Contract Class Name |

events

Used to store events emitted from contracts.

| Key | Value |
| --- | --- |
| Prefix + Padded uint64_t Block Index + Padded uint64_t Tx Index + Padded uint64_t Log Index + ContractAddress | Event data serialized as a JSON string |
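As an illustration, the composite event key could be assembled like this. The `eventKey()` and `appendUint64()` names are hypothetical, and the sketch assumes a 20-byte address, big-endian integers and the 0x0008 prefix written as the bytes 00 08.

```cpp
#include <array>
#include <cstdint>
#include <vector>

using Bytes = std::vector<uint8_t>;       // hypothetical alias for this sketch
using Address = std::array<uint8_t, 20>;  // 20-byte address, as in the address example earlier

// Appends v as 8 big-endian bytes (left-padded with zeros).
inline void appendUint64(Bytes& out, uint64_t v) {
  for (int shift = 56; shift >= 0; shift -= 8) {
    out.push_back(static_cast<uint8_t>((v >> shift) & 0xFF));
  }
}

// Builds: prefix (0x0008) + block index + tx index + log index + address.
inline Bytes eventKey(uint64_t blockIndex, uint64_t txIndex, uint64_t logIndex,
                      const Address& contract) {
  Bytes key{0x00, 0x08};
  appendUint64(key, blockIndex);
  appendUint64(key, txIndex);
  appendUint64(key, logIndex);
  key.insert(key.end(), contract.begin(), contract.end());
  return key;
}
```

Under these assumptions each key is 46 bytes long, and byte-wise key order follows block, then transaction, then log index, so events would be read back in the order they were emitted.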

vmStorage

Used to store EVM-related data such as storage keys and values (essentially the EVM's "permanent memory").

| Key | Value |
| --- | --- |
| Address (20 bytes) + Storage Key (32 bytes) | Storage Value (32 bytes) |

txToAdditionalData

Used to store EVM transactions that created contracts, storing the transaction hash and additional data about the contract that was deployed (see the TxAdditionalData struct in utils/tx.h for more details).

| Key | Value |
| --- | --- |
| Prefix + TransactionHash | Serialized TxAdditionalData struct |

txToCallTrace

Used to store debugging information about contract calls - see the Call struct in contract/calltracer.h for more details.

| Key | Value |
| --- | --- |
| Prefix + TransactionHash | Serialized Call struct |
