Verifiable Compute: Scaling Trust with Cryptography

Written by Dmitriy Berenzon

Verifiable compute is poised to become as important a technological revolution as blockchain itself. It encompasses the commercialization of various cryptographic techniques that, collectively, can be as impactful for blockchains as HTTPS has been for the internet. While many technological hurdles and barriers to adoption exist, the density of sheer brain power pushing its limits points to an optimistic (or rather verifiable) future.

Verifiable compute and blockchains aren’t only groundbreaking technologies in their own right; they’re also uniquely complementary. Where blockchains provide economically guaranteed integrity by having multiple untrusted machines re-execute the same programs and agree on the end state, verifiable compute provides mathematically guaranteed integrity by proving that some end state is the result of a set of inputs into some program.

Together, these two emerging technologies are poised to expand the design space of what is possible with blockchains, solve the issues present in today’s blockchain designs, and address broader societal concerns that arise from the proliferation of generative AI.

In this piece I will:

  • Provide an overview and history of verifiable compute.

  • Outline the use cases for verifiable compute.

  • Introduce a framework for evaluating verifiable compute projects.

Note to the reader: there will be several abbreviations used throughout—if you are unfamiliar with these terms then you can reference the definitions at the end of the piece.

A Technology 90 Years in the Making

While verifiable compute is becoming an increasingly important tool for scaling blockchains, it is far from a new concept. We can trace its roots all the way back to the commercialization of the internet in the early 90s and, in fact, its mathematical underpinnings even further back, to the 1930s.

Kurt Gödel can be considered the grandfather of arithmetization, a key component of verifiable compute protocols. He figured out in 1931, before computers really existed, that you can efficiently encode complex mathematical expressions, logical proofs, or even entire programs into numerical form. In other words: yes, you can turn code into math. While Gödel numbering played a foundational role in the development of mathematical concepts related to verifiable compute, the term as we know it today emerged because of two technological developments following the commercialization of the internet.

The first was distributed computation. Several projects emerged in the late 1990s and early 2000s that experimented with this idea, namely the Mersenne prime search in 1996, SETI@Home in 1999, and Folding@Home in 2000, all of which distributed computations to millions of internet clients to take advantage of their idle cycles. While the idea was novel, an issue that all of those projects faced was dishonest clients—end users could modify their client software to return plausible results without performing any actual work.

The second development was cloud computing. As cloud computing became more mainstream in the late 2000s, applications began outsourcing computation to commercial cloud computing services and increasingly wanted guarantees that the cloud performed the computation correctly to protect against faulty or malicious providers.

Before we continue, here is a quick refresher (or introduction) on computational complexity. Ideally we want verification to have constant or logarithmic complexity, because that is when it becomes commercially viable at scale.

Time Complexities Graph from Stackademic
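
To make the intuition concrete, here is a tiny, purely illustrative comparison of how much work a verifier would do for a computation of n steps under different verification complexities (constants ignored):

```python
import math

# Units of verifier work for a computation of n steps, ignoring constants.
for n in (10**3, 10**6, 10**9):
    print(f"n={n:>13,}  O(1): 1  O(log n): {math.ceil(math.log2(n)):>2}  O(n): {n:,}")
```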

The initial cryptographic techniques for addressing these issues were introduced around the late 80s and early 90s, namely Interactive Proofs (IPs) in 1985 and (non-interactive) Probabilistically Checkable Proofs (PCPs) in 1992. In particular, Shamir’s IP=PSPACE paper in 1992 paved the way for people’s understanding around the potential of proof systems by demonstrating that any computation (which requires a generally practical amount of memory to solve) can be efficiently checked for correctness through the use of interactive proofs. These breakthroughs, however, were simply not commercially viable. IPs were exponential-time for the prover and PCPs were so long (superlinear-size) that it would have taken thousands of years to generate and check them.

Innovation began accelerating over a decade later in 2007 with two key advancements. One was the introduction of more performant argument systems (IKO07), which were converted from PCPs and offered a succinct prover and more efficient verifier while maintaining computational soundness (i.e., assuming that a malicious actor does not have unlimited computing power). Another, GKR08, introduced an interactive proof protocol with a polynomial-time prover (prior protocols had superpolynomial-time provers) and a quasilinear-time verifier. That said, both were still completely impractical if implemented as-is.

It took five more years for researchers to propose notable improvements to these protocols in 2012, with Pepper refining IKO07 to achieve better costs (though it still had a quadratic-time prover and so did not scale to large computations) and CMT12 reducing the prover time in the GKR protocol from quartic-time to quasilinear-time. 2012 also saw the introduction of new “underlying cryptographic machinery,” which served as the theoretical foundation of the first SNARK that remains popular today: Groth16 (introduced in 2016).

Since 2016, advancements in proof systems have only continued to accelerate, with recursive SNARKs, STARKs in 2018, and folding schemes in 2021 making generating and verifying proofs orders of magnitude cheaper and less computationally intensive. Today, proofs for many common computational tasks can be generated in a matter of seconds using commodity hardware. While the performance still isn’t ideal for many blockchain applications, it is generally believed to be commercially viable at scale.

Verifiable Compute 101

The easiest way to verify a computation is to simply repeat the computation, but that’s clearly inefficient and results in wasted time and energy. Techniques for verifiable compute aim to run computations on your behalf while providing assurances around the validity of the computation’s inputs, outputs, and methods used.

While people often think of zero-knowledge proofs (ZKPs) when they think of verifiable compute, it is important to note that these concepts are related but different. Verifiable compute is a broader concept that encompasses various techniques for ensuring computational integrity. It doesn’t necessarily involve zero knowledge, and the goal is to prove the correctness of computation rather than to preserve privacy. Zero knowledge is, rather, an additional property that verifiable compute protocols can provide.

That said, there are similarities between the concepts. Both ZKPs and verifiable compute utilize cryptographic techniques designed to establish trust and verify information. In addition, the primary constraint for both concepts is that the verification of the computation should require substantially less computational effort than performing the computation from scratch.

The traditional framing of zero-knowledge proofs is that they provide a way to cryptographically prove knowledge of a particular set of information without revealing what that information is. While this framing focused on proving simple statements, recent advances have unlocked breakthroughs in "general-purpose" ZKPs that allow you to prove arithmetic circuits, which can be encodings of real programs. These circuits are fundamentally a series of mathematical statements that are used as the basis for the proof. The circuit verifies details about the program’s execution, such as:

  • Who called the program’s function?

  • Did the caller pass correct inputs to the program?

  • Did the execution use the correct program?

  • Was the final output the result of correctly executing the program with the provided inputs?

In other words, you are effectively modeling the physical logic gates and circuits that make up a program as arithmetic gates and equations that give you an equivalent output for that program. The broader technique involves a programmer specifying a computation in a high-level language and a compiler transforming the computation to the formalism that the verification machinery uses, ultimately outputting executables that implement the verifier and prover. In short, ZKPs turn code into math.
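
To ground this, here is a minimal sketch (in plain Python, not a real proving system) of how a compiler might “flatten” the toy program y = x**3 + x + 5 into arithmetic gates over a prime field; the field choice and variable names are assumptions for illustration:

```python
# A toy "flattening" of y = x**3 + x + 5 into arithmetic gates over a prime
# field, roughly what a circuit compiler produces before handing the circuit
# to a prover. The BN254 scalar field below is a common (assumed) choice.
P = 21888242871839275222246405745257275088548364400416034343698204186575808495617

def satisfies_circuit(x: int, y: int) -> bool:
    """Compute the intermediate 'wires' and check every gate equation."""
    v1 = (x * x) % P           # gate 1: v1 = x * x
    v2 = (v1 * x) % P          # gate 2: v2 = v1 * x   (i.e. x**3)
    v3 = (v2 + x) % P          # gate 3: v3 = v2 + x
    out = (v3 + 5) % P         # gate 4: out = v3 + 5
    return out == y % P        # the claimed public output must match

# A prover convinces a verifier that it knows an assignment (x and the wire
# values) satisfying these equations, without the verifier redoing the work.
assert satisfies_circuit(3, 3**3 + 3 + 5)
```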

A physical logic gate, from Tindie, and its arithmetic circuit form, from Hadas Zeilberger

There are other tangential technologies that are often associated with verifiable compute and it is important to clarify the differences. Fully homomorphic encryption (FHE) is a technique that allows computations to be performed on encrypted data without revealing the plaintext. While FHE is also used for privacy, and in fact provides stronger privacy properties because it ensures full-state privacy, the technology does not ensure the inherent correctness of the computation itself. Like FHE, trusted execution environments (TEEs) are also primarily used for confidential compute, but do not provide verifiable compute. TEEs ensure computational integrity with an attestation-based trust model that provides verification for three things: the application’s identity, its intactness (that it has not been tampered with), and that it is running securely within an enclave.

Issues

While researchers and developers have made tremendous strides in commercializing the technology over the last decade, we still have a long way to go.

Costs

Verifiability does not come for free. There is an inherent overhead for proving computation, and it is high for cryptography-based systems. From one source, it is currently approximately 1,000x to 100,000x more expensive to generate a ZKP for a computation than to perform the computation natively. From another source, the Jolt prover, which is currently 6x faster than other deployed zkVMs, is still roughly 500,000x slower than native execution of a RISC-V program. In other words, proving one step of the RISC-V CPU requires ~500,000 RISC-V CPU cycles.

The good news is that the efficiency of chips (power usage per unit of compute) improves exponentially over time, so we can expect costs to gradually converge toward the cost of native compute. Even still, without additional advances in hardware (e.g. ASICs for ZKPs) and algorithms (i.e. even more performant proof systems), it is hard to imagine the overhead being less than 100x over the next decade. And for mainstream adoption, the power required to produce a proof likely needs to be a small fraction of the power required to do the computation, rather than orders of magnitude more.

Societal Value

That said, I believe that there will be increased societal value placed on verifiability, which will translate into a willingness to pay for commercial use cases for verifiable compute technology, especially in low-trust environments like blockchains.

We already see this today with ZK rollups (ZKRUs). If it costs a ZKRU ~$200K/month to run a prover but they earn ~$1M/month in sequencer revenue, the economics clearly make sense. Naively, one could argue that the economics already make sense from a cryptonetwork perspective. If Ethereum has 7,500 nodes (at the time of this writing), each of which re-executes every transaction, one could argue that a ZKRU with 5,000x overhead is still cheaper for the system as a whole.
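
As a rough illustration of that system-wide argument (the figures are the ones quoted above, not measurements):

```python
nodes = 7_500             # Ethereum nodes, each re-executing every transaction
proving_overhead = 5_000  # assumed cost of proving relative to one native run

cost_reexecution = nodes * 1         # total work under full re-execution
cost_proving = 1 + proving_overhead  # one native run plus one (expensive) proof;
                                     # verification is cheap enough to ignore here
print(cost_reexecution, cost_proving)  # 7500 vs 5001: proving is already cheaper system-wide
```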

We will see more data points here over time. For example, if consumers are willing to pay more for a device (e.g. cameras, microphones) with an attested sensor (a TEE that stores a secret key for digital signatures), then we can say that GenAI is likely driving real concern over the authenticity of content.

Tooling

Despite the power of verifiable compute, it remains largely inaccessible to the vast majority of developers. Domain-specific languages (DSLs) like Circom, Noir, Leo, Lurk, and Cairo are effective but require large communities and comprehensive libraries to be adopted at scale. Transpilers are slow, and virtual machines that provide abstraction come at the cost of an additional 100x to 1,000x overhead.

Ideally, developers can avoid thinking about circuits, constraints, and arithmetization altogether, and the developer experience is as close as possible to that of the native blockchain on which they are building their application. The performance trade-off is glaring here, but I would argue that the experience as-is caters to too niche a developer base to generate enough experimentation with new use cases (if a use case is compelling enough, users will likely be willing to pay). Furthermore, developers need simpler deployment workflows that incorporate verifiable compute for their applications, including the underlying infrastructure and integration with onchain components.

An Overview of Use Cases

The blockchain industry often frames use cases through an introspective lens: L1s, EVMs, DeFi, bridges, and so on. But because the use cases for verifiable compute extend beyond blockchains themselves, they deserve a broader framing than the ones we see in production today.

Privacy

The first production use case for verifiable compute has been privacy, starting with Zcash in 2014 and applied to projects like Aztec in 2017, TornadoCash in 2019, and Nocturne in 2022. These use cases were particularly focused on payments and aimed to make private the sender, receiver, and/or amount of a given transaction.

To provide a more tangible intuition of what exactly is being “proven” for privacy-related use cases, the zk SNARK for Zcash proves the following:

  • Knowledge of Spending Key: The transaction creator knows the spending key corresponding to the notes they intend to spend without revealing the keys themselves.

  • Unspent Note and No Double-Spending: Each note being spent has not been spent before, managed through nullifiers, which are cryptographic values derived uniquely and deterministically from the spending key and the note. The nullifiers for spent notes are revealed and checked against a list of all nullifiers from previously spent notes.

  • Conservation of Value: The total value of inputs equals the total value of outputs.

  • Merkle Tree Validity: The spent notes exist in the blockchain's current Merkle tree of all notes (spent and unspent).
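
To make that statement more concrete, here is a heavily simplified sketch of the checks such a circuit encodes. It is illustrative only: Zcash uses specific PRFs, note commitments, and parameters, and these checks run inside the SNARK circuit rather than in the clear, so the hash and helper names below are assumptions.

```python
import hashlib

def h(*parts: bytes) -> bytes:
    """Stand-in hash; the real protocol uses dedicated PRFs and commitments."""
    digest = hashlib.sha256()
    for part in parts:
        digest.update(part)
    return digest.digest()

def merkle_root(leaf: bytes, path: list) -> bytes:
    """Recompute a Merkle root from a leaf and its authentication path."""
    node = leaf
    for sibling, side in path:
        node = h(sibling, node) if side == "left" else h(node, sibling)
    return node

def check_spend(spending_key: bytes, note: bytes, note_path: list, anchor: bytes,
                claimed_nullifier: bytes, seen_nullifiers: set,
                value_in: int, value_out: int) -> bool:
    commitment = h(b"note", spending_key, note)             # knowledge of the spending key
    in_tree = merkle_root(commitment, note_path) == anchor  # note is in the commitment tree
    nullifier = h(b"nf", spending_key, note)                # derived deterministically
    unspent = (claimed_nullifier == nullifier
               and nullifier not in seen_nullifiers)        # no double-spending
    return in_tree and unspent and value_in == value_out    # conservation of value
```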

The general issue with mainstream adoption of these solutions has been difficulty with regulatory compliance. In response, new projects have been leveraging verifiable compute to provide certain guarantees around this. For example, a user can generate a proof saying that “I didn’t interact with anyone on the OFAC list.” It is also possible to make the privacy properties adjustable and “pull back the privacy curtain” if necessary. For example, one can reveal the connections between addresses but keep the values and amounts hidden. This paper has a good overview of what can be done and what fundamentally can’t. This use case actually does demand the zero-knowledge property that ZKPs provide, but most other use cases rely on the succinctness properties of ZKPs.

Compression

The second production use case has been around succinctness, which can be reframed as state and compute compression. ZKPs can be used to post a proof onchain that says “here is some batch of valid transactions whose execution leads to some valid new global state.” There are several blockchain-native use cases here, notably ZKRUs and coprocessors. ZKRUs scale the execution throughput of blockchains by maintaining a partitioned persistent state that is periodically posted and finalized at the settlement layer. Coprocessors allow blockchain applications to use offchain compute while accessing the full state of the underlying chain without adding any trust assumptions to the application itself. The primary difference between ZKRUs and coprocessors is that the former is stateful while the latter is not.

Compute compression can also improve security for blockchain interoperability. Instead of relying on economic or game theoretic security, projects can efficiently verify proofs that the data posted on the destination chain has been signed by the validator set on the source chain, thus implementing “proofs of consensus.” In other words, instead of verifying a bunch of signatures onchain, which is generally cost-prohibitive, you verify a proof of a circuit that verifies a bunch of signatures. This is the basis for the “ZK bridges” we see coming to market today.
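
As a rough sketch of the statement such a circuit encodes (the helper names, equal validator weights, and 2/3 threshold are assumptions for illustration):

```python
def consensus_reached(block_header: bytes,
                      validator_keys: dict,   # validator id -> public key
                      signatures: dict,       # validator id -> signature over the header
                      verify_sig) -> bool:
    """The check a 'proof of consensus' circuit performs: a supermajority of the
    source chain's validator set signed this header. Onchain, you verify one
    succinct proof of this statement instead of every individual signature."""
    valid = sum(
        1 for vid, sig in signatures.items()
        if vid in validator_keys and verify_sig(validator_keys[vid], block_header, sig)
    )
    return 3 * valid >= 2 * len(validator_keys)  # at least 2/3, assuming equal weights
```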

To provide a more tangible intuition of what exactly is being “proven” for compression-related use cases, a proof for a ZKRU proves the following:

  • Validity of Transactions: Each transaction in the batch has been authorized by the owner of the funds.

  • Correct State Transitions: The application of the state transition function is correct, including whether balances are updated correctly and ensuring that inputs, outputs, and fees are accounted for correctly.

  • Conservation of Value: The sum of inputs equals the sum of outputs plus any transaction fees.

  • Accurate State Root: Confirms that the transition from the old state root to the new state root is valid.

  • No Double-Spending: Each account transaction must increment the nonce correctly.
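
To make the rollup statement concrete, here is a minimal sketch of a state transition function over account balances and nonces. The account model, transaction fields, and toy state root are assumptions; in a real ZKRU these checks (plus signature verification, omitted here) are encoded in a circuit and proven, not re-executed by the verifier.

```python
import hashlib
from dataclasses import dataclass

@dataclass
class Tx:
    sender: str
    recipient: str
    amount: int
    fee: int
    nonce: int

def state_root(balances: dict, nonces: dict) -> str:
    """Toy state commitment; a real rollup commits to state with a Merkle tree."""
    encoded = repr(sorted(balances.items())) + repr(sorted(nonces.items()))
    return hashlib.sha256(encoded.encode()).hexdigest()

def apply_batch(balances: dict, nonces: dict, txs: list) -> str:
    """Apply a batch of transfers and return the new state root."""
    for tx in txs:                                        # signature checks omitted
        assert nonces[tx.sender] + 1 == tx.nonce          # no double-spending / replays
        assert balances[tx.sender] >= tx.amount + tx.fee  # sender can cover value + fee
        balances[tx.sender] -= tx.amount + tx.fee
        balances[tx.recipient] = balances.get(tx.recipient, 0) + tx.amount
        nonces[tx.sender] = tx.nonce
    return state_root(balances, nonces)                   # the "accurate state root"
```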

Another example of an application using this property is onchain games. Instead of verifying individual actions of a game onchain, users can simulate the game locally and provide a proof of their game’s end state which can then be trustlessly verified. This also computationally enforces the game logic because the proof will be invalid if someone submits a forbidden action.

Data Integrity

Verifiable compute can be used to prove that some piece of data from either onchain or offchain sources, as well as any additional computation on that data, is accurate and has not been tampered with. This enables a class of oracles that can be framed as verifiable oracles, which can be used in a wide range of applications, such as price feeds for decentralized exchanges and ingesting arbitrary web2 data and identity onchain. Verifiable compute can also be used to scale oracles in general by performing price updates and signature verification offchain in a circuit and posting proofs containing the signed updates onchain. It can also be used for building verifiable databases, which ensure data provenance by proving correctness of both data injection and retrieval.

To provide a more tangible intuition of what exactly is being “proven” for data integrity-related use cases, the proof for a zkTLS project like DECO proves the following:

  • Data Possession: The user (prover) possesses a value that corresponds to specific data obtained via a secure TLS session.

  • Data Origin: The data originated from a specific server that is verified through the server's TLS certificate.

  • Data Integrity and Freshness: The data is unchanged from what the server sent and was retrieved in a timely manner.

A note on oracles: one can consider something an oracle if it breaks the chain of custody of the data that is being provided. Any data ingested onchain from offchain sources, like ML model outputs or exchange feeds, is certainly an oracle, but what about offchain computation over L1 state or execution of L2 state? In an attempt to further categorize this, I suggest that all coprocessors are also oracles, specifically verifiable oracles, but not all oracles are coprocessors. In addition, verifiable oracles can be considered coprocessors if they perform compute on the data being provided along with a proof for the output of that computation.

A related use case is storage proofs, which prove the value in a storage slot for an address at a specific block. The “proof” here is not in the context of a ZKP but rather a light client proof, which consists of the block header, state root, and Merkle-Patricia inclusion proofs for the key-value pairs of account data in the state trie and storage slot data in the storage trie. Verifying this light client proof requires checking that the block header, state trie proof, and storage trie proof are correct, which involves many Keccak hashes and is very expensive to do onchain. Instead, one can do each of these checks directly in a circuit and only needs to verify a ZK proof onchain, which is much cheaper.
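
To give a feel for what gets moved into the circuit, here is a simplified inclusion check. It uses a binary Merkle path and SHA-256 as a stand-in: Ethereum actually uses Keccak-256 over a hexary Merkle-Patricia trie, and in the storage-proof setting this logic runs inside a circuit so that only the resulting ZK proof is verified onchain.

```python
import hashlib

def hash_stand_in(data: bytes) -> bytes:
    # Stand-in only: Ethereum uses Keccak-256, which differs from SHA-256/SHA3-256.
    return hashlib.sha256(data).digest()

def verify_inclusion(leaf: bytes, path: list, root: bytes) -> bool:
    """Recompute the root from a leaf and its sibling path, then compare."""
    node = hash_stand_in(leaf)
    for sibling, side in path:
        pair = sibling + node if side == "left" else node + sibling
        node = hash_stand_in(pair)
    return node == root

# In a storage proof, the leaf would be the encoded storage-slot value, the root
# the account's storage root, and the account itself would be proven against the
# block header's state root in the same way.
```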

Machine Learning

Verifiable compute also has applications for machine learning (often referred to as “zkML”) by proving one or more steps along the “ML pipeline”. zkML can be used to delegate the execution of machine learning models to service providers and obtain proofs that a particular trained model produces the correct output given some inputs (i.e. verifiable inference), which ensures transparency and prevents model substitution. There are different flavors of verifiable inference: public model/private (but signed) input like Worldcoin, private model/public input like credit scores, and private model/private input like KYC.

It can also be used for verifiable training by generating proofs that the correct dataset and learning algorithm were used in the creation of a model (e.g. proving that no data poisoning occurred in a dataset of non-copyrighted works). zkML can also be used for proving specific properties of a model, such as fairness (e.g. the model is not biased), which can be important for certain applications, such as granting university admissions, assessing loan applications, and distributing medical resources. In a blockchain context, zkML effectively acts as an oracle that ensures the integrity of external models and their outputs that can be used to update mission-critical blockchain state.

In terms of the proofs themselves, the components being “proved” are generally:

  • Correct Application: A specific model and set of parameters (e.g. weights of a neural network) were used to compute the output from given inputs.

  • Integrity of Parameters: The parameters used are the ones claimed (i.e. they haven't been tampered with or altered).

  • Correct Execution: Each step in the computation (e.g. for each layer in a model) was executed correctly.

To dive deeper into the intuition behind zkML, it is helpful to revisit ZKPs. In general, ZKPs enforce consistency between the circuit, committed inputs/outputs, public inputs/outputs, and intermediate values within the circuit (these are generally either committed to and/or not revealed at all to the verifier). Note that the only difference between a committed vs. public input/output is that the former is private (i.e. the prover says "I know some x such that...") while the latter is public (i.e. the prover says "here is x - you can check it for yourself!"). In other words, a ZKP shows that a circuit which takes in some committed/public inputs indeed produces the claimed committed/public outputs, with consistency between those and some set of intermediates which the verifier doesn't care about. Applying this framework to zkML, we have:

  • Circuit ≈ model architecture.

  • Committed inputs ≈ model weights, which are hidden but consistent between runs (enforcing the same parameters to be used each time), and potentially also private/hidden inputs.

  • Committed outputs: not commonly used in zkML unless you want the outputs to be private and perhaps further used in another ZK computation (e.g. proving some property about the private output of a model).

  • Public inputs ≈ model inputs, although this can also be model weights for public/open-source models too.

  • Public outputs ≈ model outputs.

  • Intermediate values within the circuit: typically not used in zkML.
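
Here is a minimal sketch of that mapping for verifiable inference, with a hash commitment standing in for a real binding commitment and the proof replaced by naive re-execution (which, unlike a ZKP, reveals the weights to the verifier). The model and helper names are assumptions.

```python
import hashlib, json

def commit(weights: list) -> str:
    """Hash commitment to the (hidden) model weights; a real system uses a
    binding commitment inside the proof system rather than a bare hash."""
    return hashlib.sha256(json.dumps(weights).encode()).hexdigest()

def model(weights: list, x: list) -> list:
    """Circuit ~= model architecture: here, one linear layer with ReLU."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in weights]

def naive_verify(weight_commitment: str, public_input: list,
                 claimed_output: list, revealed_weights: list) -> bool:
    """What the zkML proof attests to, written as plain re-execution: the
    committed weights were used, and they produce the claimed output. A ZKP
    establishes the same thing without revealing the weights or re-running."""
    return (commit(revealed_weights) == weight_commitment
            and model(revealed_weights, public_input) == claimed_output)
```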

Proof Generation and Aggregation

One of the magical capabilities of certain verifiable compute systems, namely SNARKs and Incrementally Verifiable Computation (IVC) schemes, is recursion: the ability to generate “proofs-of-proofs” by using the output of one proof as the input for another proof. This can decrease the cost of verification by aggregating proofs from different sources and recursively proving them into a single proof, amortizing the cost for all participants involved.

Proof aggregation can generally be applied to scalability (e.g. a ZKRU ecosystem aggregating proofs from its L3s and L2s to settle onto the L1) and interoperability (e.g. composing proofs from different ZKRU ecosystems to enable more secure and seamless value transfer across them).

It can also be looked at from the perspective of vertical scaling (i.e. rolling up lots of disconnected proofs for blockchain-level scaling) and horizontal scaling (i.e. providing more expressive and interconnected programs).
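
As a structural sketch of proof aggregation (not a real recursive SNARK), here is the tree-shaped pattern: pairs of proofs are fed into a new proof until a single one remains. The `prove_pair` stand-in just hashes its children; in practice it is a circuit that verifies both child proofs.

```python
import hashlib

def prove_pair(left: bytes, right: bytes) -> bytes:
    """Stand-in for a recursive step: a circuit that verifies both child proofs
    and outputs a new proof attesting to both."""
    return hashlib.sha256(b"agg" + left + right).digest()

def aggregate(proofs: list) -> bytes:
    """Fold a list of proofs pairwise into a single proof-of-proofs."""
    assert proofs, "nothing to aggregate"
    layer = list(proofs)
    while len(layer) > 1:
        if len(layer) % 2:
            layer.append(layer[-1])   # duplicate the last proof to pad the layer
        layer = [prove_pair(layer[i], layer[i + 1]) for i in range(0, len(layer), 2)]
    return layer[0]
```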

Conclusion

Verifiable compute expands the design space of blockchains by enabling them to interact with external systems and data in a secure and mathematically verifiable way. In addition, it scales blockchains by removing the need to re-execute the same computations while guaranteeing that the execution was done correctly, effectively enabling arbitrarily complex smart contracts that can bypass a blockchain’s gas limits.

On top of all of this, recent advancements in generative AI like OpenAI’s Sora have placed a significant societal pressure on ensuring verifiability and further commercializing the underlying cryptographic techniques. The drive is a fundamentally human one; in a world where we do not know what is real and what is AI-generated, verifiability impacts the identity of both the species and the self.

Despite the issues with performance and cost, we are close to achieving the dream envisioned 33 years ago, where: “...a single reliable PC can monitor the operation of a herd of supercomputers working with possibly extremely powerful but unreliable software and untested hardware.” -Babai, Fortnow, Levin, Szegedy, 1991

If you are building infrastructure or applications around verifiable compute, please reach out!

You can find me on Twitter/X and Farcaster.


Many thanks to Srinath Setty, Thor Kampefner, David Wong, Shumo Chu, JD Kanani, Ben Livshits, Tom Walton-Pocock, Kobi Gurkan, and Ryan Cao for conversations and feedback on this piece.


Definitions

  • Verifiable Computation: Enables, through a variety of cryptographic techniques, a computationally weak client to “outsource” computation to a worker with more computational power, who then returns the result along with a proof that the computation was carried out correctly. Verifying the proof should require less effort than performing the computation from scratch.

  • Arithmetization: The process of turning a generic statement or question into a set of equations to be verified or solved.

  • Proof: a logical demonstration or evidence that establishes the validity of a statement.

  • Interactive Proof (IP): An iterative exchange that permits the verifier to pose random queries and the prover to demonstrate the validity of a statement.

  • Probabilistically Checkable Proof (PCP): A proof system where the proof's validity can be efficiently verified probabilistically and with high confidence by examining only a small, randomly chosen portion of the proof.

  • Interactive Oracle Proof (IOP): A generalization of PCPs and IPs, where a prover interacts with a verifier by making queries to an oracle, aiming to convince the verifier of the correctness of a statement.

  • Argument of Knowledge (the ARK in SNARK): A protocol where a prover can convince a verifier of the truth of a statement, while additionally demonstrating knowledge of some secret information related to that statement. An “argument” of knowledge, as opposed to a “proof” of knowledge, relaxes the inflexibility of proofs by stipulating that certain computational tasks are infeasible for the prover.

  • Succinct Non-Interactive Argument of Knowledge (SNARK): A protocol where a prover can convince a verifier of the validity of a statement without revealing any underlying information, while maintaining a small and efficiently verifiable proof size. IPs, PCPs, and IOPs can all be compiled to a SNARK by utilizing polynomial commitments (for succinctness) and applying Fiat-Shamir (for non-interactivity).

  • Incrementally Verifiable Computation (IVC): An ARK for incremental computations; enables producing proofs of correct execution of “long running” computations in a way that can be efficiently verified in an incremental manner.

  • Folding scheme: A protocol used for IVC that is weaker, simpler, and more efficient compared to direct proof composition approaches where a verifier is implemented in a circuit. IVC proofs are usually constructed using a folding scheme and a SNARK at the end to finalize the proof.


Disclaimer:

This post is for general information purposes only. It does not constitute investment advice or a recommendation or solicitation to buy or sell any investment and should not be used in the evaluation of the merits of making any investment decision. It should not be relied upon for accounting, legal or tax advice or investment recommendations. You should consult your own advisers as to legal, business, tax, and other related matters concerning any investment or legal matters. Certain information contained in here has been obtained from third-party sources, including from portfolio companies of funds managed by Archetype. This post reflects the current opinions of the authors and is not made on behalf of Archetype or its affiliates and does not necessarily reflect the opinions of Archetype, its affiliates or individuals associated with Archetype. The opinions reflected herein are subject to change without being updated.
