post

October 2023

Why event proofs? Why Starky?

by Anton Yezhov, Denis Kanonik, Alex Rusnak and Maru Team

We would like to thank Daniel Lubarov and Brendan Farmer from PolygonZero, Sebastien La Duca from Nocturne Labs, David Wong from zksecurity.xyz for reviewing and providing their valuable feedback

TL;DR

We present “event proofs” - a type of data proof zk proof construction using event logs.

The importance of data proofs and their use cases, particularly in cross-chain interactions and historical data for on-chain applications is tremendous. We explore the difference between storage and event logs as a valuable source of historical data in Ethereum and their role in cross-chain transfers, incentives mechanisms, decentralized exchanges, and voting systems.

The main challenge is that provable data requires extensive resources to bring it on-chain. Zero knowledge proofs are beneficial for such purpose compressing cryptographic commitments of large data sets into a one tiny commitment which is easy to verify on-chain. Maru Network leverages zk-STARKs, Starky in particular, to provide efficient on-chain access to data and compute over it, releasing the testnet with volumes proofs for Curve 3pool.

Introduction

There are over 18 millions of blocks (in Ethereum), but smart contracts only have access to the latest one without options to query events that happened earlier. Moreover smart contracts are not aware about events in other chains. Such awareness is crucial for dApps to have trustless pricing, risk analysis, or to settle contractual terms that depend on past events, volumes as an example.

Ethereum blockchain consists of 3 types of data:

  • State, which holds accounts and smart contracts balances, in addition to contract storage root hashes;
  • Transaction, which encapsulates State change, either by directly transferring ETH between accounts or interacting with a smart contract (including smart contract deployment);
  • Receipt, which provides insight into contract execution by exposing event logs.

All three types mentioned above are cryptographically committed into the blockchain database. State is updated by the chain of transactions, while transactions and receipts represent historical state transitions. However, the EVM only operates over the current state - contracts cannot access data from historical state transitions, i.e. receipts and transactions.

Virtually every dApp uses off-chain indexers / oracles to feed their dashboards with historical data, but their contracts are completely oblivious of this information.

Figure 2. AAVE lending “Supply APR” - data and graphics are queried by off-chain indexers.

While users could trust the data mentioned on the figures above (using web apps), on-chain applications (other smart contracts) cannot. They require some form of “proof”.

Storage Proofs

One of the greatest features of Ethereum (and most of all other L1s) is verifiable historical data. Technically smart contracts are able to get trustless access to any already accepted Ethereum block using cryptographic commitments called Merkle Patricia Trie (MPT). Each block header contains MPT root for the state, transactions and receipts and every data piece is a leaf on one of these tries. Leaves are hashed into branches up to the tree root, just like a pyramid. Merkle inclusion proof, also widely called a “storage proof”, is used to verify that some data piece is included in the corresponding MPT.

Storage proofs started to gain their popularity due to high demand for blockchains interoperability and necessity to use for cross-chain interactions (bridging assets is the most obvious example). If the chain X runs the light client of the chain Y, verifying merkle proof against already computed blockhash opens trustless data exchange. It is a largely described and used concept by Cosmos ecosystem with their IBC (Inter-Blockchain Communication Protocol) to bridge messages across chains using on-chain light clients.

Storage Limitations in Blockchains

While storage proofs solve our trustless access problem when it comes to state data, this is only a fraction of the data on-chain. This is especially true because on-chain storage is extremely expensive, so smart contract developers do everything in their power to avoid putting data into storage. Which leads to a small subset of the data we’d like to know things about.

For instances, Uniswap v2 pools stores in their slots next variables:

// uses single storage slot, accessible via getReserves
uint112 private reserve0;

// uses single storage slot, accessible via getReserves
uint112 private reserve1;

// uses single storage slot, accessible via getReserves
uint32  private blockTimestampLast;

uint public price0CumulativeLast;

uint public price1CumulativeLast;

// reserve0 * reserve1, as of immediately after 
// the most recent liquidity event
uint public kLast; 

The individual trades, volume, parties involved, etc are all missing. The variables above represent only the reserves quantity at the end of the block (a state snapshot, not a sequence of events), so with corresponding storage proofs won’t be enough to determine the value of an individual swap transaction or a number of swaps.

The same applies for Curve pools:

balances: public(uint256[N_COINS])
fee: public(uint256)  # fee * 1e10
admin_fee: public(uint256)  # admin_fee * 1e10
…
initial_A: public(uint256)
future_A: public(uint256)
initial_A_time: public(uint256)
future_A_time: public(uint256)

This data does not allow, for example, filtering trades by users (or smart contracts) that made a specific volume. And does not provide us with information about a specific trade if it happened at all in this block, as this storage represents only a snapshot for a particular timestamp (block time) without taking into account whether there were swaps in it or not.

Storage Proofs aren’t “atomic”

A storage read is a “snapshot” of a particular state. Oftentimes we don’t care about the state, we care about what happened. Each snapshot is taken after all transactions in a block have been executed. Therefore, in one block, there can be many trades (including those opposing each other) and other transactions (such as adding and withdrawing liquidity), and it is impossible to separate what happened inside each individual trade by only examining the storage slots.

There are other storage proof use cases widely discussed in the community like ownership: NFTs or fungible assets. Voting based on the ownership of the assets is discussed in Starkware article “What are Storage Proofs and how can they improve Oracles?” touches the point about cross-chain voting and bridges. But in fact, most of the bridges (see Figure 5) or voting systems require atomic transactions (like vote or asset bridging), which means they rely on the fact that is final, rather than valid upon some block height.

Figure 5. Cross-chain bridge.

In other words storage use cases mentioned in this article are useful only for a certain state root and may be changed after (assets sent to the bridge smart contract may be returned, or the voter may send his assets to another address making his vote invalid). Furthermore, in case of bridges storage slots usually represent vaults (where all assets are stored together) which makes it impossible to identify the particular user without considering his transaction or receipt to this slot.

Figure 6. TWAP formula..

In case of TWAP (time-weighted average price) calculations storage represents only the residual pool assets balances at the end of the block and not the swap/trade prices during this or any previous blocks. In other words, pool storage proofs represent AMM quotes (the price automated market maker indicates for the current ratio of two assets in the pool) at the end of the block.

Event logs (receipts)

As Ethereum is a highly resource-constrained environment and storage is very expensive, this brought forth using event logs as the only convenient way to explain what happened and what actually the storage represents. Transaction receipt contains transaction outcomes (status and logs) to indicate the user that the event occurred during the transaction execution. A very simple example is ERC-20 transfer, where the result of the transaction is a new balances state in storage slots, while the trigger that some amount has been sent from A to B is defined in logs.

Events, just like storage slots and transactions, are stored in an MPT (in the receipt trie). Therefore we can prove events commitments the same way as storage slots described in the previous section.

Events are the de-facto way of getting data out of a blockchain.

The Ethereum yellow paper baked-in a notion of “topics” and “logger address” that allow developers to easily tag logs with information that is relevant to their applications and filter them in their frontends and APIs.

The simple Swap event from the Uniswap v2 contract looks like (using Solidity): looks like (using Solidity):

event Swap(
    address indexed sender,
    uint amount0In,
    uint amount1In,
    uint amount0Out,
    uint amount1Out,
    address indexed to
);

The contract event descriptions are part of the ABI of the contract and this allows them to be used transparently to filter swap operations by tokens in the pair, the address of the user (or smart contract) who sent and/or received tokens, as well as by the smart contract address of the pair. Each unitary exchange or trade operation is stored as an event in the block logs and is associated with the transaction in which it was created. This information is stored forever on the blockchain and can be verified by cryptographic linkage to block hashes.

That’s why events have been the primary tool for developers to “send” historical data to code running off-chain. Furthermore the primary, best-supported APIs for getting data out of a node in pretty much every client library - web3.js, ethers, viem, etc - are APIs for fetching and/or subscribing to events and the same applies for every indexing stack - including subgraphs, hosted services like Goldsky, Nxyz, Proxima streams etc.

Event proofs use cases

A smart contract’s ABI - the primary way that off-chain code makes sense of the pile of bytes that exists in a full node’s database - do not define the storage layout. Instead, they provide event signatures, including complete types and names for each component of the event.

For example this code from the smart contract source:

event TokenExchange:
    buyer: indexed(address)
    sold_id: int128
    tokens_sold: uint256
    bought_id: int128
    tokens_bought: uint256

Has a clear, understandable and unambiguous structure. Event data, according to the standards of the Ethereum network, is encoded in transaction receipts in the form of an RLP sequence according to the structure described above and is easily verifiable.

Taking as an example the event given above we are able to state first of all that such swap has occurred and is final (“atomic”), so its details are non-changeable any more, in comparison to the storage slot that represents the statement upon the exact state root or the block height and is subject to change in the next blocks. So using only one such event we are able to state the trade volume, price and the buyer address.

The most obvious use cases for event proofs are:

  • cross-chain - different types of message transfers: like asset bridging, proposals and votes etc;
  • TWAP - price oracle case where we are often willing to consider the weighted (by volumes) average of a particular asset pair. In comparison with storage slot proofs (described in the Storage Proofs section), event proofs represent each swap/trade separately with corresponding volume and price;
  • DEX pool volumes or DEX overall volumes for a given time range (block range);
  • margin, liquidations and funding rates compute for lending, derivatives and perps;
  • user activity tracking;
  • voting systems and DAO management (age-weighted / behavior weighted);
  • incentives and reward protocols;
  • computing Black-Scholes for options pricing.

Volumes, for example, is the core parameter used in traditional finance to evaluate markets and their activity. Similarly, in the realm of blockchain, trading pools are evaluated using the same volume metric. The distinction lies in the process of calculating this volume, which involves filtering trades (swaps in only certain pools).

Event proof challenges

The main challenge in general for any type of on-chain proofs is the size and verification cost. If the dApp wants to verify even the small event proof from the previous block - it requires to verify that the receipt is included in the Merkle proof and then compute a block header in order to check the inclusion into a blockhash. All this means a lot of computation and a huge amount of call data.

One tree usually contains up to 200 transaction receipts. And in the case on Figure 7, we have 3 rounds of Keccak (hash function) to compute the merkle proof. However, there are blocks containing more transactions (for example, block 17763419 contains 480 transactions with 5 Keccak rounds). Thus, the average number of Keccak hashes required to prove a trade in a block is 3 or 5. Furthermore an additional round of Keccak (in which the Receipt Root is added to the block header, then serialized with other data and hashed by Keccak) is required to compute the blockhash.

Figure 7. Ethereum Receipt Merkle Patricia Trie.

Several Keccak rounds don’t seem to be complex or expensive operations if they are used once, but what to do when you have a historical dataset?

Let’s look at Uniswap USDC/ETH v3 pool volumes - in order to compute only a daily volume user needs to provide around 15k trades with corresponding merkle proofs and then compute the verification of them on-chain - which is impossible due to limitations of the EVM.

Figure 8. Uniswap USDC/ETH v3 all time token transfers.

This is where zero knowledge proofs are used, moving this huge compute to the off-chain side and only verification of a tiny result with corresponding proof on-chain (for example groth16 proof verification cost starts from 230k gas). Existence of data in the specified block is proved via merkle inclusion, compute (the sum) via arithmetic and data type (correspondence to a swap for example) via search/indexing proofs. Aggregating these proofs opens a way for trustless access to historical activity that exists inside the chain. This proof is verified on-chain and uses the data for dApp needs.

The verification of such proofs requires to have a public inputs. In the volumes case we have to provide the sources (blocks) where trades happened and the output volume number. Thus we still have to send several thousands hashes into call data - which makes verification mostly impossible.

In order to solve public input size usually MMR (Merkle Mountain Range) and cryptographic interconnectedness of blocks. MMR performs hashing of blockhashes on-chain, so the new derived hash is stored in a smart contract storage and considered as trusted. When a user provides a proof for the corresponding hash of blocks range the verification uses only one hash and proved data.

Figure 9. Merkle Mountain Range..

Another approach is to prove the range between two blocks via their cryptographic connection which means to prove not only data inclusion but also inclusion of a block hash into the header of the next one.

Figure 10. Blocks cryptographic interconnections..

How does Maru use events to prove Curve 3pool volumes?

On Curve Finance, $CRV the inflation is going to users who provide liquidity. This usage is measured with gauges. The liquidity gauge measures how many dollars you have provided in a Curve pool. Each Curve pool has its own liquidity gauge where you can stake your liquidity provider tokens. Each gauge also has a weight and a type. Those weights represent how much of the daily CRV inflation will be received by the liquidity gauge.

The weight systems allow the Curve DAO to dictate where the $CRV inflation should go. While it is fully dependent on voting, it lacks considering the activity inside the pools (the volumes). Because volumes create initial fees flow to LPs and DAO, it is important to consider them while deciding the incentives for gauges to better compete on liquidity markets.

Figure 11. Events indexing, computing and proving.

Maru facilitates such trustless data access for Curve 3pool volumes by indexing trades, computing their sum and proving such operations, while verification is done on-chain in the smart contract in Goerli testnet. Using event proofs is essential for such purposes to create a provable sequence of historical transactions and provide compute over them.

Figure 12. Maru Testnet

Maru Proofs

While there are different approaches and proof systems today, Maru has decided to use Starky (Plonky2 with STARKs) to build our computing platform.

We generate proofs of requested block range to prove volume of Curve 3pool. Our circuits prove the integrity of Keccak, volume summation that includes arithmetic addition operating on 256 bit values, data integrity from receipts, transactions and integrated application logic. We have the following components for Curve 3pool Volume: Sum proof, Search proof, Data proof, Keccak proof, Keccak sponge proof, Logic proof and Arithmetic proof.

Each STARK, such as DataStark, SumStark, SearchStark. . . etc. are implemented as a sub STARK proof of the AllStark structure, that performs each proof verification as a specific part of volume proving of trade pool by using Merkle Patricia Tree from Ethereum Node. The structure allow to three things:

  • Check the correct integration between circuits via the shared cross table lookups, to verify that certain data is shared between tables.
  • Allow having a one structure of proofs for which proofs can be generated and verified if we build regular proofs.
  • Allow having a single structure for which a proof can be generated and verified recursively using aggregation. The main idea is to aggregate each child STARK proof to one proof that we will name ”root proof” in terms of SNARK and verify it.

In public inputs, we include the counted volume as total_sum, the parent block hash of the first block header for the desired period as starting_blockhash, and the last block hash as ending_blockhash for volume proving. Other data required for proof generation is placed in private inputs. Additionally, we have the option to recursively aggregate the two primary STARK proofs to optimize both proof generation speed and verification cost.

Detailed specification is described in Maru gitbook

Figure 13. Maru high-level architecture.

Maru tech STARKs

There are two main compelling zero-knowledge technologies in the market today: zk-STARKs and zk-SNARKs. Both are acronyms for the method by which the two parties prove their knowledge: zk-STARK stands for zero-knowledge scalable transparent argument of knowledge, and zk-SNARK stands for zero-knowledge succinct non-interactive argument of knowledge. Furthermore, both of these zero-knowledge technologies are non-interactive by nature, meaning the code can be deployed and act autonomously.

Starky is a highly performant STARK implementation built along the Plonky2 SNARK proving system. It is as easy to understand as it gets in the ZK world and already adopted rather novel features such as Cross-Table Lookups. Tight integration with Plonky2 allows a lot of freedom in aggregating proofs and proof compression and CTLs provide a standardized way to link data inside different proofs together, greatly reducing proof complexity. From the development point of view Starky reduces time and complexity, allowing even small development teams tackle really hard problems. From an execution speed point of view, Starky is not only insanely fast by ZK standards, it has internal synergies with Plonky2 and combined performance even better than for each of the systems alone.

let local_values: &XORColumnsView<P> = vars.local_values.borrow();
let bits_diff: P = local_values.A + local_values.B
 - local_values.A.doubles() * local_values.B;
yield_constr.constraint(bits_diff - local_values.AxorB);

Table 1. CTLs example.

Maru adopted Starky as proof structure is a table, often interpreted as one row - one step of a process and a set of columns is the state at the particular moment. Constraints are pretty straightforward and put them all together in an intuitive way. Only non-intuitive thing is that constraints should be stated twice if you want Plonky2 recursion to work. In general CTLs drastically reduce complexity of writing proofs as parts of proofs in different tables can be connected together, the number of columns in each table plummeted as the amount of work and focus needed to maintain state. And with clever division of tables each can be more or less independently maintained and swapped to find a better approach to each particular problem. Basically Starky made a number of huge steps from purely research oriented development towards general industrial programming methods, allowing to get faster, more business-focused results and enabling a full spectrum of development techniques.

Figure 14. Keccak implementation on Halo2 (by Axiom), Plonky2 (by JumpCrypto) and Starky (by Maru) comparison from Keccak circuit benchmark.

With the sheer amount of advancements in available methodology mentioned above, one may expect significant performance drawbacks. Two different proving systems atop of each other definitely introduce a bit of latency, as the proof generation process always involves two steps. But on the other hand by clever manipulation of internal parameters (namely blowup factor) and use of Plonky advantages like SNARK-friendly hashing, in most cases proving time is much faster than with conventional one-stage prover. Figure 14 clearly illustrates that.

Conclusion

Events as the main source of useful publicly committed information in Ethereum ledger will be used to optimize existing dApps use cases and create new ones to better serve the customers in Web3 world. Maru leverages Starky, an efficient STARK implementation, to streamline data availability and simplicity to use it on-chain. For a long time ZK toolset was quite barren and anything beyond simplest snippets was an enormous endeavor, but with novel features like CTLs we finally dare to start working on complex and fruitful applications.

Maru Testnet

Maru Docs

Testnet Sign up

References

  1. Ethereum (ETH) Blockchain Explorer, https://etherscan.io/. Accessed 13 October 2023.

  2. AAVE App v3 https://app.aave.com/reserve-overview/?underlyingAsset=0x7f39c581f595b53c5cb19bd0b3f8da6c935e2ca0&marketName=proto_mainnet_v3. Accessed 13 October 2023.

  3. Ethereum.org, Merkle Patricia Trie https://ethereum.org/en/developers/docs/data-structures-and-encoding/patricia-merkle-trie/ Accessed 13 October 2023

  4. Inter-Blockchain Communication Protocol https://tutorials.cosmos.network/academy/3-ibc/1-what-is-ibc.html Accessed 13 October 2023

  5. Uniswap v2 pools https://github.com/Uniswap/v2-core/blob/master/contracts/UniswapV2Pair.sol#L22 Accessed 13 October 2023

  6. Curve pools https://github.com/curvefi/curve-contract/blob/master/contracts/pools/3pool/StableSwap3Pool.vy#L93C1-L104C31Accessed 13 October 2023

  7. “What are Storage Proofs and how can they improve Oracles?” https://www.starknet.io/en/posts/developers/what-are-storage-proofs-and-how-can-they-improve-oracles Accessed 13 October 2023

  8. TWAP formula https://en.wikipedia.org/wiki/Time-weighted_average_price Accessed 13 October 2023

  9. Uniswap v2 contract https://github.com/Uniswap/v2-core/blob/master/contracts/interfaces/IUniswapV2Pair.sol#L26 Accessed 13 October 2023

  10. Curve pool contract https://github.com/curvefi/curve-contract/blob/master/contracts/pools/3pool/StableSwap3Pool.vy#L14C1-L19C27 Accessed 13 October 2023

  11. Proxima streams https://docs.proxima.one/proxima/streams/overview Accessed 13 October 2023

  12. Etherscan block 17763419 https://etherscan.io/block/17763419 Accessed 13 October 2023

  13. Ethereum Receipt Merkle Patricia Trie https://flow.com/engineering-blogs/ethereum-merkle-patricia-trie-explained Accessed 13 October 2023

  14. Uniswap USDC/ETH v3 https://etherscan.io/address/0x88e6a0c2ddd26feeb64f039a2c41296fcb3f5640#analytics Accessed 13 October 2023

  15. Merkle Mountain Range. https://ethglobal.com/showcase/kinetex-light-clients-mmr-uvzr1 Accessed 13 October 2023

  16. Blocks cryptographic interconnections https://www.researchgate.net/figure/Blocks-are-cryptographically-linked-together-Blockchains-are-collection-of-blocks-A_fig1_341788606 Accessed 13 October 2023

  17. Maru gitbook https://maru-2.gitbook.io/maru/zk-implementation/schema Accessed 13 October 2023

  18. Keccak circuit benchmark. https://github.com/proxima-one/keccak-circuit-benchmarks/blob/master/short_description.pdf Accessed 13 October 2023