Deep Dive into Ethereum Logs
Logs in Ethereum are very useful: they are the underlying mechanism for events, which are the recommended way for smart contracts to communicate with the outside world. They also provide a significantly cheaper way to store data on-chain than contract storage (with the caveat that log data cannot be read back from within contracts).
As an example, the UniswapV2ERC20.sol contract emits a Transfer event whenever a transfer of tokens takes place.
For a more detailed description of Ethereum logs, including the concept of indexes/topics, data, cost, etc., I highly recommend Luit Hollander’s excellent blog post Understanding event logs on the Ethereum blockchain. This blog post will instead focus on how logs are actually implemented in Ethereum, specifically in Geth.
When an event is emitted from a smart contract, one of the LOG opcodes is generated by the smart contract compiler. Events can have at most 4 topics/indexes, which is why there are 5 LOG opcodes (LOG0, LOG1, …, LOG4) defined. The Transfer event in the above example has three indexes/topics in total: two explicit indexes, from and to, and one implicit index, which is the Keccak-256 hash of the canonical event signature (see Luit’s post), in this case Transfer(address,address,uint256). As a result, the generated opcode is LOG3.
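To make the implicit topic concrete, here is how it can be computed with go-ethereum’s crypto package (a small sketch; the event is the one from the example above):

```go
package main

import (
	"fmt"

	"github.com/ethereum/go-ethereum/crypto"
)

func main() {
	// topics[0] of a non-anonymous event is the Keccak-256 hash of its
	// canonical signature: no spaces, fully written-out types (uint256, not uint).
	sig := []byte("Transfer(address,address,uint256)")
	fmt.Println(crypto.Keccak256Hash(sig).Hex())
	// 0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef
}
```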
The LOG opcode is then interpreted by the makeLog(size int) function, which pops the data offset/size and the topics from the stack, reads the log data from memory, uses them to construct a log object and adds it to the StateDB. StateDB is an EVM data structure that stores anything within the Merkle trie and is the general query interface to retrieve contracts and accounts. It is also used to accumulate data during transaction execution, including logs.
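Here is a deliberately simplified, self-contained sketch of that flow (the types and the opLog helper below are stand-ins invented for illustration, not Geth’s actual makeLog or StateDB):

```go
package main

import "fmt"

type word [32]byte

type logEntry struct {
	address [20]byte
	topics  []word
	data    []byte
}

// frame stands in for the EVM call scope plus the StateDB's log journal.
type frame struct {
	contract [20]byte
	stack    []word // last element is the top of the stack
	memory   []byte
	logs     []*logEntry
}

func (f *frame) pop() word {
	top := f.stack[len(f.stack)-1]
	f.stack = f.stack[:len(f.stack)-1]
	return top
}

// asInt reads the low two bytes of a word; enough for this toy example.
func asInt(w word) int { return int(w[30])<<8 | int(w[31]) }

// opLog mirrors the shape of makeLog(size int): it builds the handler for
// the LOG opcode carrying `size` topics, which pops the data offset/size and
// the topics off the stack, copies the data out of memory, and records a log.
func opLog(size int) func(*frame) {
	return func(f *frame) {
		offset, length := asInt(f.pop()), asInt(f.pop())
		topics := make([]word, size)
		for i := range topics {
			topics[i] = f.pop()
		}
		data := append([]byte(nil), f.memory[offset:offset+length]...)
		f.logs = append(f.logs, &logEntry{address: f.contract, topics: topics, data: data})
	}
}

func main() {
	f := &frame{memory: []byte("hello log data")}
	// Stack for LOG1, bottom to top: topic0, size=14, offset=0.
	f.stack = []word{{31: 0xaa}, {31: 14}, {}}
	opLog(1)(f)
	fmt.Printf("%d topic(s), data=%q\n", len(f.logs[0].topics), f.logs[0].data)
}
```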
When a transaction is executed using the applyTransaction function, the generated transaction receipt contains the logs retrieved from the StateDB, together with a bloom filter that probabilistically keeps track of all the topics of the logs as well as the addresses of the contracts that emitted them. The reason to use a bloom filter here is to strike a balance between query efficiency and storage space. If a user wants to search for logs with a set of topics, those topics can be run against the bloom filter to probabilistically determine whether any matching logs exist in this transaction receipt. False positives are then removed by checking the actual logs against the search criteria.
When a block is created in the NewBlock function, a bloom filter is placed in the block header that essentially combines the bloom filters of all the transaction receipts included in the block. This allows a quick check of whether any logs matching given criteria exist in the block.
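To make the bloom-filter mechanics concrete, here is a small sketch against go-ethereum’s core/types package (the CreateBloom(Receipts) signature used here is the long-standing one and may differ in newer releases): the receipt bloom is derived from the log addresses and topics, and the header bloom is the same computation folded over all receipts in the block.

```go
package main

import (
	"fmt"

	"github.com/ethereum/go-ethereum/common"
	"github.com/ethereum/go-ethereum/core/types"
	"github.com/ethereum/go-ethereum/crypto"
)

func main() {
	transferTopic := crypto.Keccak256Hash([]byte("Transfer(address,address,uint256)"))
	token := common.HexToAddress("0x0000000000000000000000000000000000000042") // hypothetical contract

	// A receipt carrying a single Transfer log.
	receipt := &types.Receipt{Logs: []*types.Log{{Address: token, Topics: []common.Hash{transferTopic}}}}

	// CreateBloom folds every log address and topic into one 2048-bit bloom
	// filter; applied to a whole block's receipts, this is the header bloom.
	bloom := types.CreateBloom(types.Receipts{receipt})

	fmt.Println(types.BloomLookup(bloom, transferTopic)) // true
	fmt.Println(types.BloomLookup(bloom, token))         // true
	// Almost certainly false, though blooms can return false positives.
	fmt.Println(types.BloomLookup(bloom, crypto.Keccak256Hash([]byte("Approval(address,address,uint256)"))))
}
```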
When a block is persisted using the writeBlockWithState function, all transaction receipts, logs included, are persisted per block. The format of the key-value pair is blockReceiptsPrefix + num (uint64 big endian) + hash -> block receipts.
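That key layout translates to roughly the following helper (a sketch, not Geth’s actual rawdb code; the one-byte "r" prefix is my recollection of blockReceiptsPrefix):

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// blockReceiptsKey assembles blockReceiptsPrefix + num (uint64 big endian) + hash.
func blockReceiptsKey(number uint64, hash [32]byte) []byte {
	prefix := []byte("r") // assumed value of blockReceiptsPrefix
	enc := make([]byte, 8)
	binary.BigEndian.PutUint64(enc, number)
	return append(append(prefix, enc...), hash[:]...)
}

func main() {
	var hash [32]byte
	hash[0] = 0xab
	key := blockReceiptsKey(12_345_678, hash)
	fmt.Printf("% x\n", key) // 1 prefix byte + 8 number bytes + 32 hash bytes
}
```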
There are a few ways to consume the emitted events. After a transaction is mined into the blockchain, eth_getTransactionReceipt can be called with the transaction hash to get the transaction receipt, which contains a logs field carrying an array of the log objects that this transaction generated. DApp frontends can also use the combination of the eth_newFilter and eth_getFilterChanges methods to poll for new events of interest. Web3.js abstracts this away by providing the subscribe("logs") method, which delivers incoming logs in real time using the subscription model.
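The same subscription model is also available from Go through go-ethereum’s ethclient; here is a sketch (the websocket endpoint and the contract address are placeholders):

```go
package main

import (
	"context"
	"fmt"
	"log"

	"github.com/ethereum/go-ethereum"
	"github.com/ethereum/go-ethereum/common"
	"github.com/ethereum/go-ethereum/core/types"
	"github.com/ethereum/go-ethereum/ethclient"
)

func main() {
	// Subscriptions need a websocket (or IPC) endpoint; this URL is a placeholder.
	client, err := ethclient.Dial("ws://localhost:8546")
	if err != nil {
		log.Fatal(err)
	}

	// Only logs emitted by this (hypothetical) contract address are delivered.
	query := ethereum.FilterQuery{
		Addresses: []common.Address{common.HexToAddress("0x0000000000000000000000000000000000000042")},
	}

	logs := make(chan types.Log)
	sub, err := client.SubscribeFilterLogs(context.Background(), query, logs)
	if err != nil {
		log.Fatal(err)
	}
	defer sub.Unsubscribe()

	for {
		select {
		case err := <-sub.Err():
			log.Fatal(err)
		case l := <-logs:
			fmt.Println("new log:", l.TxHash.Hex(), l.Topics)
		}
	}
}
```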
Let’s look at these JSON-RPC methods in more detail:
- eth_getTransactionReceipt. Returns the transaction receipt for a given transaction hash. Since transaction receipts are stored per block, this basically fetches all the receipts for the block that the transaction belongs to and returns only the requested one. As mentioned before, logs are part of the transaction receipt.
- eth_newFilter & eth_getFilterChanges. Return logs that match search criteria such as a block range, contract addresses, and topics. This leverages the bloom filters both in the block headers and in the transaction receipts to decide which blocks and receipts could contain matching logs; false positives are filtered out by re-checking against the actual search criteria. The filter logic lives in Geth’s eth/filters package. There are also bloom-bits index optimizations so that the node doesn’t have to iterate through every block in the range while searching for matching logs (see the sketch after this list).
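Here is a sketch of both lookups through go-ethereum’s ethclient (endpoint, transaction hash and contract address are placeholders; note that ethclient’s FilterLogs issues a one-shot eth_getLogs call, which goes through the same filter machinery as the eth_newFilter/eth_getFilterChanges pair):

```go
package main

import (
	"context"
	"fmt"
	"log"
	"math/big"

	"github.com/ethereum/go-ethereum"
	"github.com/ethereum/go-ethereum/common"
	"github.com/ethereum/go-ethereum/ethclient"
)

func main() {
	client, err := ethclient.Dial("http://localhost:8545") // placeholder endpoint
	if err != nil {
		log.Fatal(err)
	}
	ctx := context.Background()

	// eth_getTransactionReceipt: fetch the receipt (and therefore the logs)
	// of a single mined transaction. The hash below is a placeholder.
	txHash := common.HexToHash("0x1111111111111111111111111111111111111111111111111111111111111111")
	receipt, err := client.TransactionReceipt(ctx, txHash)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("logs in receipt:", len(receipt.Logs))

	// Log filtering: block range + contract address (+ optional topics).
	// The node uses the bloom filters first, then exact matching, as described above.
	query := ethereum.FilterQuery{
		FromBlock: big.NewInt(10_000_000),
		ToBlock:   big.NewInt(10_000_100),
		Addresses: []common.Address{common.HexToAddress("0x0000000000000000000000000000000000000042")},
	}
	logs, err := client.FilterLogs(ctx, query)
	if err != nil {
		log.Fatal(err)
	}
	for _, l := range logs {
		fmt.Println("matched log:", l.BlockNumber, l.Topics)
	}
}
```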
Hope this blog post has shed some light on how logging is implemented in Ethereum. The overall design feels pretty elegant and efficient. One tradeoff here is how much data should be indexed on-chain vs. off-chain. Ethereum decided on a maximum of 4 topics per log, perhaps a good balance between searchability and storage, but one can make a legitimate argument that indexing should be pushed further into the application layer, where the data is better understood and changes are easier to make afterwards.