Deep Dive into Ethereum Logs
Logs in Ethereum are very useful: they are the underlying mechanism for events, which are the recommended way for smart contracts to communicate with the outside world. They also provide a significantly cheaper way to store data on-chain than contract storage (with the caveat that log data cannot be read back from within contracts).
As an example, the UniswapV2ERC20.sol contract emits a Transfer event whenever a transfer of tokens takes place.
For a more detailed description of Ethereum logs, including the concept of indexes/topics, data, cost, etc., I highly recommend Luit Hollander’s excellent blog post Understanding event logs on the Ethereum blockchain. This blog post will instead focus on how logs are actually implemented in Ethereum, specifically in Geth.
When an event is emitted from a smart contract, one of the LOG opcodes is generated by the smart contract compiler. Events can have at most 4 topics/indexes, which is why there are 5 LOG opcodes (LOG0, LOG1, …, LOG4) defined. The Transfer event in the above example has three indexes/topics in total: two explicit indexes, from and to, and one implicit index, which is the Keccak-256 hash of the canonical event signature (see Luit’s post), in this case Transfer(address,address,uint256). As a result, the generated opcode is LOG3.
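To make the implicit topic concrete, here is how it can be computed with go-ethereum’s crypto package (a small sketch; the event is the one from the example above):

```go
package main

import (
	"fmt"

	"github.com/ethereum/go-ethereum/crypto"
)

func main() {
	// topics[0] of a non-anonymous event is the Keccak-256 hash of its
	// canonical signature: no spaces, fully written-out types (uint256, not uint).
	sig := []byte("Transfer(address,address,uint256)")
	fmt.Println(crypto.Keccak256Hash(sig).Hex())
	// 0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef
}
```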
The LOG opcode is then interpreted by the makeLog(size int) function, which pops the data offset/size and the topics from the stack, reads the log data from memory, uses them to construct a log object and adds it to the StateDB. StateDB is an EVM data structure that stores anything within the Merkle trie and is the general query interface to retrieve contracts and accounts. It is also used to accumulate data during transaction execution, including logs.
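Here is a deliberately simplified, self-contained sketch of that flow (the types and the opLog helper below are stand-ins invented for illustration, not Geth’s actual makeLog or StateDB):

```go
package main

import "fmt"

type word [32]byte

type logEntry struct {
	address [20]byte
	topics  []word
	data    []byte
}

// frame stands in for the EVM call scope plus the StateDB's log journal.
type frame struct {
	contract [20]byte
	stack    []word // last element is the top of the stack
	memory   []byte
	logs     []*logEntry
}

func (f *frame) pop() word {
	top := f.stack[len(f.stack)-1]
	f.stack = f.stack[:len(f.stack)-1]
	return top
}

// asInt reads the low two bytes of a word; enough for this toy example.
func asInt(w word) int { return int(w[30])<<8 | int(w[31]) }

// opLog mirrors the shape of makeLog(size int): it builds the handler for
// the LOG opcode carrying `size` topics, which pops the data offset/size and
// the topics off the stack, copies the data out of memory, and records a log.
func opLog(size int) func(*frame) {
	return func(f *frame) {
		offset, length := asInt(f.pop()), asInt(f.pop())
		topics := make([]word, size)
		for i := range topics {
			topics[i] = f.pop()
		}
		data := append([]byte(nil), f.memory[offset:offset+length]...)
		f.logs = append(f.logs, &logEntry{address: f.contract, topics: topics, data: data})
	}
}

func main() {
	f := &frame{memory: []byte("hello log data")}
	// Stack for LOG1, bottom to top: topic0, size=14, offset=0.
	f.stack = []word{{31: 0xaa}, {31: 14}, {}}
	opLog(1)(f)
	fmt.Printf("%d topic(s), data=%q\n", len(f.logs[0].topics), f.logs[0].data)
}
```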
When a transaction is executed using the applyTransaction function, the generated transaction receipt contains the logs retrieved from the StateDB, together with a bloom filter that probabilistically keeps track of all the topics of the logs as well as the addresses of the contracts that emitted them. The reason to use a bloom filter here is to strike a balance between query efficiency and storage space. If a user wants to search for logs with a set of topics, those topics can be run against the bloom filter to probabilistically determine whether any matching logs exist in this transaction receipt. False positives are then removed by checking the actual logs against the search criteria.
When a block is created in the NewBlock function, a bloom filter is placed in the block header that essentially combines the bloom filters of all the transaction receipts included in the block. This allows a quick check of whether any logs matching given criteria exist in the block.
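To make the bloom-filter mechanics concrete, here is a small sketch against go-ethereum’s core/types package (the CreateBloom(Receipts) signature used here is the long-standing one and may differ in newer releases): the receipt bloom is derived from the log addresses and topics, and the header bloom is the same computation folded over all receipts in the block.

```go
package main

import (
	"fmt"

	"github.com/ethereum/go-ethereum/common"
	"github.com/ethereum/go-ethereum/core/types"
	"github.com/ethereum/go-ethereum/crypto"
)

func main() {
	transferTopic := crypto.Keccak256Hash([]byte("Transfer(address,address,uint256)"))
	token := common.HexToAddress("0x0000000000000000000000000000000000000042") // hypothetical contract

	// A receipt carrying a single Transfer log.
	receipt := &types.Receipt{Logs: []*types.Log{{Address: token, Topics: []common.Hash{transferTopic}}}}

	// CreateBloom folds every log address and topic into one 2048-bit bloom
	// filter; applied to a whole block's receipts, this is the header bloom.
	bloom := types.CreateBloom(types.Receipts{receipt})

	fmt.Println(types.BloomLookup(bloom, transferTopic)) // true
	fmt.Println(types.BloomLookup(bloom, token))         // true
	// Almost certainly false, though blooms can return false positives.
	fmt.Println(types.BloomLookup(bloom, crypto.Keccak256Hash([]byte("Approval(address,address,uint256)"))))
}
```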
When a block is persisted using the writeBlockWithState function, all transaction receipts, logs included, are persisted per block. The format of the key-value pair is blockReceiptsPrefix + num (uint64 big endian) + hash -> block receipts.
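That key layout translates to roughly the following helper (a sketch, not Geth’s actual rawdb code; the one-byte "r" prefix is my recollection of blockReceiptsPrefix):

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// blockReceiptsKey assembles blockReceiptsPrefix + num (uint64 big endian) + hash.
func blockReceiptsKey(number uint64, hash [32]byte) []byte {
	prefix := []byte("r") // assumed value of blockReceiptsPrefix
	enc := make([]byte, 8)
	binary.BigEndian.PutUint64(enc, number)
	return append(append(prefix, enc...), hash[:]...)
}

func main() {
	var hash [32]byte
	hash[0] = 0xab
	key := blockReceiptsKey(12_345_678, hash)
	fmt.Printf("% x\n", key) // 1 prefix byte + 8 number bytes + 32 hash bytes
}
```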
There are a few ways to consume the emitted events. After a transaction is mined into the blockchain, eth_getTransactionReceipt can be called with the transaction hash to get the transaction receipt, which contains a logs field carrying an array of the log objects that this transaction generated. DApp frontends can also use the combination of the eth_newFilter and eth_getFilterChanges methods to poll for new events of interest. Web3.js abstracts this away by providing the subscribe("logs") method, which delivers incoming logs in real time using the subscription model.
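The same subscription model is also available from Go through go-ethereum’s ethclient; here is a sketch (the websocket endpoint and the contract address are placeholders):

```go
package main

import (
	"context"
	"fmt"
	"log"

	"github.com/ethereum/go-ethereum"
	"github.com/ethereum/go-ethereum/common"
	"github.com/ethereum/go-ethereum/core/types"
	"github.com/ethereum/go-ethereum/ethclient"
)

func main() {
	// Subscriptions need a websocket (or IPC) endpoint; this URL is a placeholder.
	client, err := ethclient.Dial("ws://localhost:8546")
	if err != nil {
		log.Fatal(err)
	}

	// Only logs emitted by this (hypothetical) contract address are delivered.
	query := ethereum.FilterQuery{
		Addresses: []common.Address{common.HexToAddress("0x0000000000000000000000000000000000000042")},
	}

	logs := make(chan types.Log)
	sub, err := client.SubscribeFilterLogs(context.Background(), query, logs)
	if err != nil {
		log.Fatal(err)
	}
	defer sub.Unsubscribe()

	for {
		select {
		case err := <-sub.Err():
			log.Fatal(err)
		case l := <-logs:
			fmt.Println("new log:", l.TxHash.Hex(), l.Topics)
		}
	}
}
```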
Let’s look at these JSON-RPC methods in more detail:
- eth_getTransactionReceipt. Returns the transaction receipt for a given transaction hash. Since transaction receipts are stored per block, this basically fetches all the receipts for the block that the transaction belongs to and returns only the requested one. As mentioned before, logs are part of the transaction receipt.
- eth_newFilter & eth_getFilterChanges. Return logs that match search criteria such as a block range, contract addresses, and topics. This leverages the bloom filters both in the block headers and in the transaction receipts to decide which blocks and receipts could contain matching logs; false positives are filtered out by re-checking against the actual search criteria. The filter logic lives in Geth’s eth/filters package. There are also bloom-bits index optimizations so that the node doesn’t have to iterate through every block in the range while searching for matching logs (see the sketch after this list).
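Here is a sketch of both lookups through go-ethereum’s ethclient (endpoint, transaction hash and contract address are placeholders; note that ethclient’s FilterLogs issues a one-shot eth_getLogs call, which goes through the same filter machinery as the eth_newFilter/eth_getFilterChanges pair):

```go
package main

import (
	"context"
	"fmt"
	"log"
	"math/big"

	"github.com/ethereum/go-ethereum"
	"github.com/ethereum/go-ethereum/common"
	"github.com/ethereum/go-ethereum/ethclient"
)

func main() {
	client, err := ethclient.Dial("http://localhost:8545") // placeholder endpoint
	if err != nil {
		log.Fatal(err)
	}
	ctx := context.Background()

	// eth_getTransactionReceipt: fetch the receipt (and therefore the logs)
	// of a single mined transaction. The hash below is a placeholder.
	txHash := common.HexToHash("0x1111111111111111111111111111111111111111111111111111111111111111")
	receipt, err := client.TransactionReceipt(ctx, txHash)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("logs in receipt:", len(receipt.Logs))

	// Log filtering: block range + contract address (+ optional topics).
	// The node uses the bloom filters first, then exact matching, as described above.
	query := ethereum.FilterQuery{
		FromBlock: big.NewInt(10_000_000),
		ToBlock:   big.NewInt(10_000_100),
		Addresses: []common.Address{common.HexToAddress("0x0000000000000000000000000000000000000042")},
	}
	logs, err := client.FilterLogs(ctx, query)
	if err != nil {
		log.Fatal(err)
	}
	for _, l := range logs {
		fmt.Println("matched log:", l.BlockNumber, l.Topics)
	}
}
```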
Hope this blog post has shed some light on how logging is implemented in Ethereum. The overall design feels pretty elegant and efficient. One tradeoff here is how much data should be indexed on-chain vs. off-chain. Ethereum decided on a maximum of 4 topics per log, perhaps a good balance between searchability and storage, but one can make a legitimate argument that indexing should be pushed further into the application layer, where the data is better understood and changes are easier to make afterwards.