The Future Of StarkNet: Performance Boost, New Features and Time Frames
Summary of key points:
– Cairo 1.0 is about to be released in the coming weeks. The repo and the compiler will be available for anyone to start playing with, and the first version that allows writing StarkNet contracts will be released at the end of the month.
– Re-genesis is needed for two main reasons: to clean up StarkNet’s architecture, and to add the ability to include reverted transactions in a block. This will enable more security and performance, and remove obstacles for teams building on StarkNet.
– The StarkNet team is working on implementing transaction parallelization, which enables multiple transactions to be processed at the same time and improves performance by about 3X. This will be deployed on testnet in the next few days.
– They will then build on top of transaction parallelization and separate the execution and the validation of the block. This will give StarkNet another performance boost of about another 3X.
– The next step is rewriting everything from Python to Rust, starting with the Cairo VM. Benchmarks show that this will improve performance by 20X to 30X. The timeline for this is the upcoming weeks.
This is only a small portion of what Tom and Motty discussed. You can find more details on the future of StarkNet, re-genesis and Cairo 1.0 in the following transcript.
So first of all welcome Tom, it’s a pleasure to have you with us. For people who don’t know you, you are Product Manager at StarkWare, and people can think of you as the father of StarkNet. I heard you say once that you have four kids: three biological and StarkNet. Which I think fits well the definition and the role.
Before we dive into the network, re-genesis and so on, I had an exciting time at the StarkNetCC in Lisbon. How was it for you?
Thanks for having me here. First of all, while some might consider me the father of StarkNet, I no longer see myself that way. Now, I see myself more as a child of StarkNet, alongside its many other innovative initiatives like StarkNetCC. Observing the project from an outsider’s perspective has been incredibly exciting. The overall atmosphere and interactions between participants have been truly remarkable. So for me, it was an absolute pleasure to be there.
Yeah, definitely. People are extremely curious about the network status, and maybe a bit frustrated about the testnet status so what’s the plan ahead in terms of performance?
What can be done, what is already in the oven and almost ready to be served out?
Yeah. So first of all. I can understand the frustration here since the timeline of improvements was not exactly what we expected internally. However improving performance is the top priority and has been for some time now.
It’s actually a positive challenge to encounter on StarkNet. Unlike other blockchains, validity roll-ups offer a roadmap for enhancing the sequencer. On those blockchains, you would need to continually upgrade to larger machines as the transactions increase, which becomes a resource-intensive task of improving software performance to enhance network performance. However, on StarkNet, the validity roll-up approach allows for a more systematic and planned improvement of the sequencer without the same hardware scalability concerns.
Tom, I’m sorry to interrupt but can you explain the difference between a sequencer and a node?
Yes, of course. In general, the role of the full node in any blockchain network is for users to be able to participate in the network and to hold its full state without the need to trust any other external entity.
For example, if I want to run my own Geth client on Ethereum, I basically download all the blocks of the network and re-execute them. That means, starting from the genesis block, I go over each block and re-execute all of its transactions; then I can generate my state and know exactly what the state of the network is, for example, what assets I have in my wallet.
The beauty and the main point of running a full node is that you don’t need to trust any external source, you have the full state locally and you execute it, so you know it’s true.
This is how full nodes work on Ethereum.
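The replay loop just described can be sketched in a few lines of Python. This is a toy model, not any real client code; the balance-transfer transactions are invented for illustration. It only shows how re-executing every block from genesis reconstructs the state locally, with no trusted third party.

```python
# Toy full-node replay: starting from the genesis state, re-execute
# every transaction in every block to rebuild the current state.
# Transactions here are invented (sender, receiver, amount) transfers.
def replay_chain(genesis_state, blocks):
    state = dict(genesis_state)
    for block in blocks:
        for sender, receiver, amount in block:
            state[sender] = state.get(sender, 0) - amount
            state[receiver] = state.get(receiver, 0) + amount
    return state

blocks = [
    [("alice", "bob", 5)],   # block 1
    [("bob", "carol", 2)],   # block 2
]
print(replay_chain({"alice": 10}, blocks))
# {'alice': 5, 'bob': 3, 'carol': 2}
```

Because the node computed the state itself, it can answer queries such as wallet balances without trusting anyone.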
And a sequencer is the entity which progresses the network. On Layer 1, in proof of work they’re called miners, and in proof of stake they’re called validators.
The role of the sequencer is to create a block from the transactions that were propagated in the network, and then to send this block to other nodes on the chain. Once you have consensus on the block, full nodes can include it as part of the chain and update their state.
And so this is the difference between sequencers, which are responsible for advancing the chain and full nodes, which are the way for users to take part in the network.
If I were to simplify that, I would say that nodes are mainly there for reading data and sequencers are the entity that drives data into the chain.
Would you say that’s a fair high level summary?
That’s correct but the principal property you want in your full node is to read data in a trustless way. So nodes also have to somehow possess the state to be able to read from it without any external dependencies.
The key point on StarkNet is that holding the state trustlessly doesn’t require re-executing every transaction in the blocks. Instead, you can rely on proofs that validate the blocks’ integrity. This means you only need to update the state, which is a simpler operation that any computer can perform efficiently, and unlike re-execution it doesn’t scale linearly with throughput.
That’s great, maybe we will touch on how much the sequencer can scale later on.
So we talked about nodes and sequencers but, how does it help us improve network performance from the current state?
As I said, with StarkNet’s validity roll ups you can improve the sequencer and this is exactly what we’ve been working on for more than six months. And for the last four months, we’ve worked with full force on adding parallelization to the execution of transactions.
To explain transaction parallelization briefly, at this time, the sequencer executes the transactions inside the block sequentially. Meaning it starts by executing the first transaction, then the second and so on.
This operation can be parallelized, meaning you can execute several transactions at the same time. There are many ways to do it, but the way we chose to approach it is to just try and execute multiple transactions in parallel.
For example, if we have four processes, we start executing the first four transactions, and then we check if they are somehow dependent on one another, but what does “dependent” mean?
Let’s say that the first transaction writes to some storage cell and then the second transaction tries to read from this same storage cell. Running at the same time (in parallel) creates an unresolved dependency, leading to the failure of the dependent transaction. But in the optimistic scenario, this allows us to run as many transactions as we want in parallel, which will improve StarkNet’s performance. This is one of the things we’re currently implementing internally.
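The optimistic scheme described here can be sketched as follows. This is a simplified single-threaded model, not StarkNet’s sequencer code: transactions run speculatively against a snapshot, record which storage cells they read, and are re-executed at commit time if an earlier transaction wrote a cell they read.

```python
# Toy optimistic parallel execution with read/write conflict detection.
# All names are illustrative; a real sequencer would run the speculative
# phase on multiple workers.

def execute(tx, storage):
    """Run one transaction, recording its read and write sets."""
    reads, writes = {}, {}
    def read(key):
        reads[key] = storage.get(key, 0)
        return reads[key]
    def write(key, value):
        writes[key] = value
    tx(read, write)
    return reads, writes

def run_batch(txs, storage):
    # Speculative phase: run every tx against the same snapshot.
    results = [execute(tx, storage) for tx in txs]
    # Commit phase, in arrival order.
    for i, (reads, writes) in enumerate(results):
        # Conflict: a cell this tx read was since updated by an earlier tx.
        if any(storage.get(k, 0) != v for k, v in reads.items()):
            reads, writes = execute(txs[i], storage)  # re-execute
        storage.update(writes)
    return storage

# Two dependent transactions touching the same storage cell "a".
def tx1(read, write): write("a", read("a") + 10)
def tx2(read, write): write("b", read("a"))   # depends on tx1's write

print(run_batch([tx1, tx2], {"a": 1}))  # {'a': 11, 'b': 11}
```

Note that tx2 is speculatively executed against the stale value of "a", detected as conflicting, and re-run after tx1 commits, exactly the retry behavior described above.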
I know that you don’t like to make harsh commitments in terms of numbers, so we won’t hold you to that. But can you give a general feeling of how much the sequencer is anticipated to be improved due to parallel transaction execution and what’s the approximate timeline for it?
Ok, let’s start with the timeline.
We’re currently testing it, and it follows the usual upgrade process: we’ll first launch it in an integration environment to see that nothing breaks, then we upgrade Goerli and then we’ll upgrade the mainnet.
The timeline is quite tight, and we are aiming to have it on the testnet this week. However, there is a possibility of a slight delay, and it might be pushed to early next week.
The part of the execution that we’ve parallelized so far is 2/3 of it. Initial performance improvements are therefore capped at 3X: even if the parallelized 2/3 becomes negligible, the remaining 1/3 still runs sequentially. So this is the immediate optimization that we can expect in the next few days.
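The 3X cap follows from Amdahl’s law: if only 2/3 of the work is parallelized, overall speedup can never exceed 1 / (1 - 2/3) = 3, no matter how fast the parallel part becomes. A quick sanity check in Python (the function is the standard textbook formula, not StarkWare’s benchmark code):

```python
def amdahl_speedup(p, s):
    """Overall speedup when a fraction p of the work is accelerated
    by a factor s (Amdahl's law)."""
    return 1.0 / ((1.0 - p) + p / s)

p = 2.0 / 3.0                    # share of execution parallelized
print(amdahl_speedup(p, 4))      # ~2x when the parallel part runs 4x faster
print(amdahl_speedup(p, 1e9))    # approaches the 3x ceiling
```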
Another caveat of parallelization is that when there are bursts of activity from a specific application, such as minting, where interactions have dependencies on each other, the benefits of sequencer improvement won’t be realized. This is because these transactions must be executed sequentially, one after the other.
The second step of sequencer improvement is to take the remaining third of the execution out of the critical path. Then we hope we can remove this bound and see a much bigger improvement. So this is the current plan and timeline for the parallelization effort.
OK, let me try to summarize to make sure that I understand.
We’re talking about an increase of the transaction speed that will be up to 3X on testnet in the next few days to a few weeks.
Following that, we will see another improvement that builds on top of the transaction parallelization by separating the execution and the validation of the block itself, which would give us about another 3X. Is that right?
That’s what we hope, yes.
That’s very cool, so hopefully we’ll see on testnet, in a matter of a few short weeks, an order of magnitude improvement, because 3X and another 3X is almost an order of magnitude improvement which is definitely significant.
I’m curious because for example, today in StarkNet’s sequential sequencer, it’s promised that the transaction that comes first will be executed first.
Is that order going to change once we move to parallel transactions?
Yes and no, since this is something that won’t hold in the future.
First, yes because there is still the promise, even for dependent transactions, that we always try to execute transactions in the order that they arrive.
But let’s say that the second transaction is dependent on the first transaction. So the first transaction is guaranteed to be executed before the second. But if I fail the second, then I retry after the first one has finished. So maybe in the meantime, we could have already executed transaction three and four before retrying transaction two. At least, this is what we hope to achieve.
But note that this parallelization is only the case if transactions are not dependent. So for example, if two transactions are trying to execute the same trade, or two transactions try to mint the same asset and so on, then those transactions still have the guarantee to be executed in the order that they arrived. Does that make sense?
Yeah that makes sense.
I wonder if we have transaction #1, which is a very long transaction and transaction #2 is executed in parallel, but it’s a very short transaction.
Could it be that transaction #2 will be accepted before transaction #1?
No, because the “accepted on L2” status is given to a block, not to specific transactions. It’s only after I finish executing a block that all the transactions in this block will receive this status. In the future, at least after re-genesis, the assumption that you have some guaranteed order won’t exist for two reasons.
First, we’ll have a fee market, which means you won’t just send the maximum fee you are willing to pay. Instead, like on Ethereum, you will choose the price of execution that you’re willing to pay, and then the sequencer picks whatever is most profitable for it.
And the second reason is that once we move to a decentralized protocol with many sequencers, there is no such thing as a global order. If you judge by time of arrival, each sequencer has its own mempool with a different order of transactions, so there is no single notion of transaction order.
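In such a fee market, block building reduces to a profitability choice rather than an arrival-order queue. A minimal sketch; the transaction shape and capacity limit are invented for illustration:

```python
def build_block(mempool, capacity):
    """Pick up to `capacity` transactions, highest fee first;
    arrival order plays no role."""
    return sorted(mempool, key=lambda tx: tx["fee"], reverse=True)[:capacity]

mempool = [
    {"id": 1, "fee": 3},   # arrived first, but pays the least
    {"id": 2, "fee": 9},
    {"id": 3, "fee": 5},
]
print([tx["id"] for tx in build_block(mempool, 2)])  # [2, 3]
```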
Yeah, so should we all prepare for MEV coming to StarkNet?
Yes. This situation is common, but it’s important to prepare for the absence of assumptions about transaction order and for the presence of Miner Extractable Value (MEV), with its merits and downsides. However, we will also provide suggestions on how to balance these factors.
OK, cool. So let’s go back to the performance.
We just announced in this Twitter space, and to the entire world, that we’ll soon see an order-of-magnitude improvement of the sequencer on testnet, and later on mainnet, in the next few weeks, which is great. But is that the end of the line, or do we have more potential to optimize the sequencer?
As a disclaimer to set expectations: I won’t promise an order of magnitude, though I hope for it. After parallelizing the execution, and later the storage access, the next step on the roadmap is to make the software more performant overall.
When we started StarkNet Alpha, we used our own existing infrastructure to ship the functionality quickly. It allowed developers to start developing StarkNet contracts and interacting with StarkNet. Unfortunately, this meant using non-performant infrastructure written in Python.
And now we’re rewriting everything in a performant implementation in Rust. The first step is to implement the Cairo VM, the execution environment for Cairo, in Rust. This rewrite started about four months ago and is now complete.
The performance benchmarks for the difference between Python and Rust that we see on this implementation are between 20 and 30X improvement.
Wow, that’s impressive. And that’s on top of the improvement that will come from parallel transactions and the separation of execution and validation of the block?
Nice. You won’t commit to this number, but just for the audience: if we have an improvement of, let’s say, 5X and we add another 20X, this results in 100X, which is two orders of magnitude.
And now I have a feeling that Tom will put some disclaimers.
Yes, a disclaimer is needed, because the first improvement that I mentioned, the implementation of the Cairo VM in Rust, only accelerates the places where we run Cairo code. For example, if I want to access my database, that is unrelated to running Cairo code. So the Cairo VM, which was implemented in Rust by Lambdaclass in a very impressive project, will only improve the places where we execute Cairo.
Now as for the status of this project, the implementation itself is done. They are now working on wrapping it in Python to allow the existing sequencer and the existing tools in the ecosystem, such as devnet and the testing frameworks, to use it and to accelerate everything.
They estimate that they will finish the Python wrapper somewhere around next week. Then we’ll begin its integration into our sequencer. I don’t have an exact timeline on that, but I think it can be done in the upcoming weeks.
Wow so do all these improvements stack on top of each other?
Even though both the parallelization and the Cairo VM are improving in roughly the same area, I don’t think that they combine exactly.
But the next step, to improve other areas, is implementing the sequencer in Rust, which will start tackling storage handling and similar components to make them more efficient. This is something that we started at the beginning of November. We’re aiming for a first small milestone in, let’s say, two to three months, maybe a bit more.
Will we have this project executed in iterations or does it have to be finished from start to end before it can be released?
This will be done in iterations. You can think of a sequencer as a big piece of software with many modules. So the first module that we define is what we call the block creator.
It receives a list of transactions, it executes them, and it generates a block. So this module can be used as is, both in external tooling and in the internal sequencer. This will also be the basis for the sequencer that will be available for everyone to use once StarkNet is decentralized.
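As a rough mental model of that block-creator interface, here is a hedged sketch. The types, the toy transfer transaction, and the hash-based state commitment are all invented for illustration and are not the real module’s API:

```python
from dataclasses import dataclass

@dataclass
class Block:
    number: int
    transactions: list
    state_root: int

def apply_transfer(state, tx):
    """Toy transaction: a (sender, receiver, amount) transfer."""
    sender, receiver, amount = tx
    state[sender] = state.get(sender, 0) - amount
    state[receiver] = state.get(receiver, 0) + amount

def create_block(number, transactions, state):
    """Execute transactions in order and package them into a block."""
    for tx in transactions:
        apply_transfer(state, tx)
    # Stand-in for a real cryptographic state commitment (Merkle root).
    state_root = hash(tuple(sorted(state.items())))
    return Block(number, transactions, state_root)

block = create_block(7, [("alice", "bob", 3)], {"alice": 10})
print(block.number, len(block.transactions))  # 7 1
```

The point of isolating this module is exactly what the transcript says: the same "transactions in, block out" component can back both internal tooling and future third-party sequencers.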
People are familiar with Ethereum and understand its value. They also recognize that it has a scalability issue, and that there are various strategies available to address this problem. It would be interesting to hear what these different solutions are, and why one is better than the others.
So can you explain why validity rollups are better than optimistic rollups and side chains? Why do they have the potential to reach the highest number of transactions per second (TPS) in the future?
The way I see it, it all goes back to what we mentioned about the separation between a full node and sequencer and what it really means when we’re saying “The network is secure and decentralized and anyone can participate in it”.
The way I define it is for a network to be decentralized and to achieve security, it must allow anyone that wants to participate to be able to hold the state of the network and verify it.
If you are in a situation where your ability to verify the state is just by executing it then these two properties create conflicting tradeoffs. So if you want to increase the throughput of the network, you have no option but to remove part of the network from being able to verify it.
Again, we see it, Ethereum chose to be decentralized, same is true for Bitcoin. And for example, Solana chose to be very performant. But if you want to somehow verify the state of Solana in a trustless way or to participate in it, you just can’t do it. You have no ability to spin up a node that can actually verify the state of Solana. So it’s very permissioned, in the sense that as a user, I have to trust some external entities to understand what is the state of the network and what happens there.
This is also true for optimistic rollups. It’s a bit harder to see there, because you can always spin up a validator, or trust that one validator will monitor the chain. But to detect invalid state transitions and submit fraud proofs, the validator needs to operate at a speed equal to, or even faster than, the hardware necessary to handle the network’s throughput.
Because the validator needs to run all these transactions by itself to make sure that there are no frauds, right?
Exactly and in that sense, I don’t want to say it but I’m not optimistic about optimistic roll-ups.
Now the use of computational integrity proofs (validity proofs) basically allows you to solve this trade-off. This is because you are still able to verify the correctness of the computation by verifying the proof. But the work that you’re required to do is logarithmic, or in other words, very, very small compared to the actual throughput of the network. So even though we are scaling the network and taking it to the limit of what software and machines can do, we do not need that from the full nodes of the network. We only require them to keep up by verifying the proofs, which is something any computer can do.
To be honest, this is why I originally joined StarkWare four years ago. There is something magical about how the biggest problem in blockchains, which is scalability, fits perfectly with the properties of zero knowledge proofs, since the scalability of the proofs is logarithmic in the size of the computation.
This was a bit unreal but it was clear to me that this is something that will be part of allowing blockchains to become mainstream and to have real use cases in the interaction between people.
I completely agree, this is mind-boggling. Certainly, let me clarify and summarize. In other solutions, the traditional approach involves rerunning transactions to ensure their correctness and detect any potential fraud. However, in the case of ZK rollups like StarkNet, there is no need to rerun each transaction. Instead, the focus is on verifying the proof that demonstrates the correct execution of the transactions. This approach saves computational resources and allows for efficient verification without the need for full transaction re-execution. And the key thing here is that verifying the proof is a much cheaper, much lighter task than doing the actual computation.
To provide a tangible sense of the order of magnitude and the concept of logarithmic complexity, let’s consider some numbers. If a computation requires a billion steps, verifying it would only take about 30 steps, disregarding certain constants. This dramatic reduction from a billion to 30 showcases the optimization achieved. The remarkable aspect is that multiplying the computation by another factor of a billion only adds another thirty steps. So for 1,000,000,000,000,000,000 steps (1 with 18 zeros after it), we are talking about only about 60 steps to verify.
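The arithmetic above is just base-2 logarithms, easy to check directly (constants ignored, as in the discussion):

```python
import math

# Verification cost grows like log2 of the computation size.
for steps in (10**9, 10**18):
    print(f"{steps} steps -> ~{round(math.log2(steps))} verification steps")
# 1000000000 steps -> ~30 verification steps
# 1000000000000000000 steps -> ~60 verification steps
```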
Take the time to think about it and try to visualize it. That’s really mind boggling, which is why we are all here very bullish on StarkNet’s potential to scale Ethereum. So even though today we see 1 TPS, this is only because the network is still in development, this is by no means its potential.
And I’m curious Tom, what do you think is the true potential of StarkNet and ZK rollups that are based on STARKs? What should we expect in the next 2–3 years?
Saying this is like famous last words, right? It’s an engineering problem, and I’m confident that we can scale as well as other solutions, hopefully even better. For example, we see that Solana can handle on the order of hundreds of transactions per second. This number is debatable, because it’s not clear how the transactions are defined and what percentage of them are part of the consensus. However, let’s assume that this is something they’re able to do.
You might raise the question, “Could the Cairo VM, being optimized for proving rather than executing, impose some lower bounds?” It is worth mentioning that as part of our roadmap, we aim to eliminate this limitation by developing separate compilers for the prover and the sequencer. This strategic approach will allow us to surpass the current boundaries and unlock enhanced capabilities within the system.
This is a bit more complicated, but let’s dive into it.
So there are two parts of the protocol. One, which we already talked about, is the sequencer, which decides what the block is and propagates it to the network. The second part, the prover, takes many blocks, generates a proof for each one, and posts it to L1.
Now, even though both can be written in Cairo, what is needed from the sequencer is just to compute the new state, while what is needed from the prover is to generate the execution trace, from which you can prove its validity.
For both the prover and the sequencer, we can have protocols optimized specifically to remove this barrier, so the sequencer is not required to run the Cairo VM but rather something compiled to x86 assembly.
Just to give another example. One of the nice properties of Cairo 1.0 is that it has an intermediate language which is called Sierra. And now we can say many things about Sierra, but it allows us to compile from Sierra to different assemblies. One of which is the Cairo assembly that will be used for proving and the other one is the X86 assembly which will be used for executing in a very efficient way.
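Conceptually, the pipeline fans out like this: one Sierra artifact is lowered to Cairo assembly for proving and to native x86 for fast execution by the sequencer. The sketch below is purely illustrative Python; the function names and string tags are invented, and the real compilers are separate toolchains:

```python
def compile_cairo_to_sierra(source):
    """Stand-in for the Cairo 1.0 -> Sierra compilation step."""
    return f"sierra({source})"

def lower_sierra(sierra, target):
    """Lower one Sierra artifact to a concrete backend."""
    if target == "casm":   # Cairo assembly, consumed by the prover
        return f"casm[{sierra}]"
    if target == "x86":    # native code, run by the sequencer
        return f"x86[{sierra}]"
    raise ValueError(f"unknown target: {target}")

sierra = compile_cairo_to_sierra("my_contract")
print(lower_sierra(sierra, "casm"))  # casm[sierra(my_contract)]
print(lower_sierra(sierra, "x86"))   # x86[sierra(my_contract)]
```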
And by the way, Cairo 1.0 is about to be released in two weeks, the repo and the compiler will be available for anyone to start playing with. And by the end of the month we will release the first version that will allow writing StarkNet contracts.
Wow that’s interesting, I didn’t know that. That’s quite an interesting architecture choice and the implementation choice is very innovative.
That’s a good segue to Cairo 1.0 and re-genesis, etcetera, etcetera. Allow me to be blunt, but why do we need re-genesis? What’s wrong with the current StarkNet network?
There are two major reasons why we want re-genesis. I will start with the simplest one. As I said, we started the system in an alpha state and made changes to many aspects of the network over the last year and a half.
For example, some data that is now part of the block hash did not exist for the first blocks, so we lack its pre-image, and there are system calls that were originally supported but no longer are, which we want to remove. A lot of other changes were made to the architecture and to the structure of blocks and transactions during this year and a half, and they are somewhat of a tax on any infrastructure team that wants to build something for StarkNet.
So the first reason is due to a lot of stuff that you prefer not to have when you’re going to production. If this was the only reason, we might have debated doing re-genesis but the second reason is related to Sierra.
One property that transactions in StarkNet currently lack is the ability to include reverted transactions in a block. On Ethereum, if you have a failed transaction, it will still be part of the block. This allows the sequencer to say: “See, I spent some computation on this transaction, so I’m eligible to take a fee even though it failed.”
This mechanism carries critical significance, particularly within a decentralized system, where safeguarding the sequencer against Denial of Service (DoS) attacks is paramount. For those who are unfamiliar with DoS, it involves individuals sending transactions with the goal of overwhelming the sequencer, thereby hindering the execution of transactions by others.
If the sequencer can’t prove the execution of a failed transaction, it can’t include it in a block, and then it can’t charge for it. This is the opening for these denial-of-service attacks.
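The incentive gap just described can be modeled in a few lines. This is a toy, not StarkNet’s actual fee logic; it only shows why including reverted transactions makes spam cost the attacker money:

```python
def process(tx, include_reverted):
    """Run a transaction and decide what the sequencer earns."""
    try:
        tx["run"]()
        return {"status": "succeeded", "fee_charged": tx["fee"]}
    except Exception:
        if include_reverted:
            # Reverted but included on-chain: the sender still pays.
            return {"status": "reverted", "fee_charged": tx["fee"]}
        # Without inclusion the sequencer eats the execution cost, so
        # an attacker can spam failing transactions for free.
        return {"status": "dropped", "fee_charged": 0}

def failing(): raise ValueError("revert")

spam = {"run": failing, "fee": 5}
print(process(spam, include_reverted=False))  # attacker pays nothing
print(process(spam, include_reverted=True))   # attacker pays the fee
```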
And this is merely one of many examples of the added complexity that arises when failed transactions can’t be included in a block. In StarkNet, we wanted this capability, but to achieve it we must stop executing contracts that were not written in Cairo 1.0.
These are the two reasons why we need re-genesis.
To include failed transactions in a block, all contracts need to be written in this Sierra intermediate representation language.
Does this mean that all existing contracts will have to migrate to the new VM and to the new language?
Not exactly, let me rephrase it a bit. The contracts will have to be written in Cairo 1.0 and then it will be compiled into Sierra. As a result, nobody needs to write anything in Sierra, this is just some intermediate representation, the compiler takes care of everything. So there will be no migration in the sense that migration means moving to a new contract or to a new address or to a new network.
Which is why re-genesis, as it is defined today, will not require any migration. It will require replacing existing contract implementations with new implementations, and we have a scheme to do that in a way that does not require any migration.
If we take it down to Earth, what does it mean about the contract addresses and wallet addresses? Will they change after re-genesis?
No. The way it will work is that once we release the updated compiler, we’ll release the new version of StarkNet which will begin what we call the transition period. In that transition period, all contracts will need to change their implementation to an implementation which was written in Cairo 1.0.
Now, there is no way today to change implementation without modifying the address. But we are adding a new way for contracts to do it just for this transition period. So all contracts, whether it’s account contracts or application contracts, will have a scheme to update their implementation without changing their address, changing the storage or having to move their assets, and so on, with no downtime.
So how will re-genesis impact end users?
So the only thing they will need to do is log into their wallet once during this transition period and click the “upgrade” button in their wallet. And that’s it. Everything will stay the same: the address will stay the same, the assets in their wallet will stay the same, and the ability to connect to dApps will stay the same. From the user’s perspective, it’s like a regular account update.
The only thing they need to be aware of is that if they don’t upgrade their wallets, then they won’t be able to use their account after the 3 or 4 months of the transition period.
On one hand it’s much better than any other migration, but on the other hand it’s not something you usually do. Still, this is a one-time event in the network, and we think it’s the lesser evil, given the important properties it achieves for the network.
One question from the audience is: can you give some sort of a gut feeling of what would be the throughput within two years’ time?
I have no exact figure, but I can say that the throughput is our top priority until it reaches at least 100 TPS. I want to hope, and this is also how I’m planning the roadmap, that it won’t take us two years to achieve that goal.
After we reach that point, improving the throughput will be closely related to the demand, so if we see that improving the throughput along with the transaction costs allows many applications to thrive, then it will again be our top priority.
By then, I expect that there will be other implementations of the sequencer not only done by StarkWare. We are anticipating that external teams will start projects the same way we have three teams working on full nodes for StarkNet.
We want to have more than one sequencer implementation because we think that’s healthy for the network because even the optimization will be decentralized. And I expect these teams to focus entirely on that so hopefully they will give us competition.
Very cool, competition is always great.
There is a community question that I feel is worth addressing: After re-genesis, will projects need to be audited again?
That’s a great question. Yes, when you port your code to Cairo 1.0, even though the logic remains the same, code auditing is still required. We are currently preparing auditing companies for a period of high demand because the duration of the transition period will be influenced by the speed at which applications complete their audits.
If, after 3 months, we observe that 20 projects have not completed their audits, we will extend the transition period. The good news is that just 4 months ago, we had 5 auditing companies actively conducting audits on StarkNet, and as of now, I’ve lost count. At StarkNetCC, I recently learned about 3 new auditing companies that are auditing projects, and they were previously unknown to me. Therefore, I have no concerns about the transition period.
It’s been a real pleasure and I think that we could talk for at least another hour. However every good thing has to come to an end.
To summarize what we talked about: the current performance of the network shouldn’t discourage people, since Tom said that in the next few weeks we will see a first boost in performance, followed by two more boosts in the weeks after.
On top of that, the rewriting of the sequencer, moving that from Python to Rust will give us another significant increase in performance.
To be clear, increased TPS helps the liveness of the network, but for StarkNet it also helps reduce fees. With ZK rollups, the more transactions we have in a block, the lower the fee the network can charge per transaction, which is amazing, because as the network scales, fees should go down.
So that’s in terms of performance and of the re-genesis process that will begin at the end of the year, and will take approximately 3–4 months. And the only thing end users will have to do is go to the wallet to update the contract. Everything will stay like it was from a user perspective, there won’t be any hassle and everyone can keep their assets and their addresses both on testnet and mainnet.