Solana and the 8 Key Innovations
Content:
1) Solana and Blockchains
2) Proof of History (PoH), Tower-BFT, Turbine, Gulf Stream, Archivers, Sealevel, Pipelining, Cloud Break
1) Solana and Blockchains
Solana is my favorite blockchain, even though it has some bad marketing strategies, design flaws and stalling problems. And why is that?
Because it approaches the scaling problem differently than existing blockchains. No sharding, no L2, no sidechain, no subnet.
Anatoly Yakovenko, the CEO of Solana Labs, has one argument for Solana that he constantly emphasizes: COMPOSABILITY. All transactions happen on a single chain, without splitting the database. Solana achieves this by raising hardware requirements, that is, by scaling vertically.
This is a trade-off, a choice. Blockchains are all in beta. They are all very new, they all have different design preferences.
So is composability the only reason for vertical scaling? No, it is not.
Let’s take a look.
2) 8 Key Innovations
2.a) Proof of History (PoH) and Tower-BFT
PoH is not a consensus algorithm. It is a clock that runs before consensus.
In a decentralized system, nodes must reach a common decision about the state of the network as a result of the operations performed.
During this decision, various p2p communication paths are used (such as all to all messages or gossip protocol).
In pBFT consensus, the all-to-all method is used: every node communicates with every other node. As the number of nodes increases, the number of messages grows rapidly and the network slows down.
For this reason, the number of nodes is limited in networks with pBFT consensus.
Let’s talk about Tower-BFT.
tBFT is a consensus algorithm derived from pBFT, with one key difference: Proof of History. Proof of History is a verifiable delay function. A unique output is generated from a given input, and since SHA-256 is used, the hashing is irreversible.
The basic concepts behind it are as follows:
→ This function takes time to run,
→ Output cannot be produced without executing this function and
→ The only way to check the correctness of the output is to rerun this function with the input.
These concepts guarantee the following.
If the correct output is produced for an input, a certain amount of time has passed to produce that output.
This is very important. This is the magic of Proof of History. In Solana, the output of each transaction generates the input for the next transaction.
Blocks also contain data such as transaction details. For this reason, you can’t just put the output of the previous transaction into the block and move the chain forward.
Due to the Fork Choice Rule, your block will be rejected and you will not receive a reward.
In Solana, the people who create blocks are called “Leaders” and the people who approve the blocks are called “Validators”.
Validators run the Verifiable Delay Function (VDF) to check the number of transactions in the block and the correctness of the output of the transactions.
Since each transaction contains the output of the previous transaction, there is no need for nodes to communicate with each other during this check. Thanks to the VDF, it is certain that some time elapses during the creation of the hash functions of the transactions.
Thanks to PoH and the VDF it provides, communication between nodes is minimized, and the network does not have to be restricted to a small number of nodes as in pBFT. Tower-BFT is an improved version of pBFT that eliminates that latency and messaging overhead with PoH.
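To make the idea concrete, here is a minimal, hypothetical sketch of a PoH-style hash chain in Python. It is not Solana's implementation; the tick count, the entry format and the function names are made up for illustration. The point is that producing the sequence takes sequential work, while checking it only requires re-running the same hashes.

```python
import hashlib

def poh_chain(seed: bytes, events: list, ticks_per_event: int = 3):
    """Build a toy PoH sequence: keep hashing the previous output, and
    occasionally mix an event (e.g. a transaction) into the stream."""
    state = hashlib.sha256(seed).digest()
    entries = []                                  # (counter, state, event)
    counter = 0
    for event in events:
        for _ in range(ticks_per_event):          # "empty" hashes: proof that time passed
            state = hashlib.sha256(state).digest()
            counter += 1
        state = hashlib.sha256(state + event).digest()   # record the event in the stream
        counter += 1
        entries.append((counter, state, event))
    return entries

def verify(seed: bytes, entries, ticks_per_event: int = 3) -> bool:
    """Checking the sequence means re-running the same hashes."""
    state = hashlib.sha256(seed).digest()
    counter = 0
    for target_counter, target_state, event in entries:
        while counter < target_counter - 1:
            state = hashlib.sha256(state).digest()
            counter += 1
        state = hashlib.sha256(state + event).digest()
        counter += 1
        if counter != target_counter or state != target_state:
            return False
    return True

seq = poh_chain(b"genesis", [b"tx: alice->bob 5", b"tx: bob->carol 2"])
print(verify(b"genesis", seq))   # True
```

Because each entry records the hash state at which to resume, verification can even be split across cores, which is what lets validators check the sequence without talking to each other.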
Below we see a fork. Both Hash #4 candidates are valid continuations of Hash #3, but only one will be accepted. The hashes here represent blocks. After the blocks are published to the network, validators vote on them.
Each block has a vote stack and validators record their votes in this vote stack. At the end of the voting, the block with the larger vote stack is accepted and the validators who voted for that block are rewarded.
Assuming that the network will continue from the green block, validators who vote for blocks that continue from the orange block cannot vote for the original blocks and receive no reward. This incentivizes following the main chain.
tBFT favors liveness over consistency. However, it still provides deterministic finality through the voting system.
- — — — — — — — — — — — — — — — — -
tBFT Voting Algorithm
A new block is created and put to vote. I mentioned that a vote stack is created for this new block.
In the image below, the vote stack is an orange pyramid. The reason for the pyramid is that each vote has a different weight (validity) on the block.
The first vote has started and a validator has cast a vote for this block at time index 1.
The visual is updated as follows;
Expiry Time is the time at which a vote is removed from the stack, together with all votes cast after it (all votes with a higher vote index).
Expiry Time is calculated with the formula “vote time + vote weight”.
For the first vote, the vote weight is 2. The weight doubles (2^n) as you go down the pyramid, so older votes carry more weight (remain valid longer).
1 (vote time) + 2 (vote weight) = 3 (expiry time)
One vote can be cast in each slot (block).
After 400ms, when a new block is added, another vote is added for our block.
The new vote is placed at the top of the pyramid because it is newer. The old vote shifts to the bottom and its weight increases.
We are now familiar with the mathematical flow.
The 3rd vote is cast when “Vote Time = 5” (i.e. 5 blocks after the block was published to the network).
Since the vote with index 2 expired (Expiry Time was 4, we voted at time 5), the vote is reset and deleted from the stack.
The vote with index 1 survives because its expiry time (5) has not yet passed.
Since the 3rd vote was cast after 5 blocks and the vote weight is 2 because it is at the top of the pyramid, the expiry time is 5+2=7.
When the next vote for the block is cast, the vote with index 1 is also deleted. Deleting the vote with index 1 also deletes the lighter votes above it in the pyramid.
At Vote Time = 6 (i.e. 6 new blocks after the block was submitted for approval), the vote with index 1 is deleted. Votes above the vote with index 1 (in our scheme, the vote with index 3) are also deleted.
There is now only one vote in the stack.
Vote Index =4
Vote Time = 6
Vote weight = 2
Expiry Time = 6+2 = 8.
Voting continues like this.
Now let’s go to Vote Time = 18. The vote at Time=18 has not been cast yet. The scheme will look like this:
When Time = 19;
As validators cast more votes, the stack grows and the validity period of old votes increases. When the stack reaches 32, the oldest votes have a lock time of 54 years. This means that this vote will most likely never be reversed because new blocks are constantly being generated and it becomes difficult to change the old vote.
That’s the voting system. This was a contrived example: we deliberately skipped the votes between Time=2 and Time=5 to make the expiry mechanism easier to follow.
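Below is a toy Python model of the vote-stack bookkeeping described in the walkthrough above: a new vote gets weight 2, weights double with depth in the pyramid, and an expired vote takes every newer vote with it. Solana's real Tower implementation differs in its details (it tracks a confirmation count per vote); this sketch only reproduces the example's numbers.

```python
def weight(stack, i):
    """A vote's weight doubles with depth: the newest vote (top of the
    pyramid) has weight 2, the one below it 4, then 8, and so on."""
    depth_from_top = len(stack) - 1 - i
    return 2 ** (depth_from_top + 1)

def expiry(stack, i):
    vote_index, vote_time = stack[i]
    return vote_time + weight(stack, i)          # "vote time + vote weight"

def cast_vote(stack, vote_index, vote_time):
    # The deepest expired vote is removed together with every vote above it
    # (every newer vote), exactly as in the walkthrough.
    for i in range(len(stack)):
        if expiry(stack, i) < vote_time:
            del stack[i:]
            break
    stack.append((vote_index, vote_time))
    return stack

stack = []
cast_vote(stack, 1, 1)   # expiry 1 + 2 = 3
cast_vote(stack, 2, 2)   # vote 1 shifts down, its weight becomes 4 (expiry 5)
cast_vote(stack, 3, 5)   # vote 2 expired (expiry 4 < 5); vote 1 survives
cast_vote(stack, 4, 6)   # vote 1 expired (expiry 5 < 6) and takes vote 3 with it
print(stack)             # [(4, 6)]  ->  one remaining vote, expiry 6 + 2 = 8
```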
- — — — — — — — — — — — — — — — — -
Solana creates a block every 400 milliseconds. Once a block is created, it can still be undone within the first 400ms, before the first vote, because no vote has been cast on it yet.
tBFT does not wait for the current vote to finish before the next vote begins. As I mentioned above, voting power grows exponentially, which keeps the network alive. And once 2/3 of validators have cast valid votes, the block is fully confirmed and cannot be reversed.
What makes tBFT asynchronous is again PoH. Nodes can calculate the delay of other nodes without any p2p communication.
2.b) Turbine, Gulf Stream and Archivers
As new nodes are added to the network, distributing new blocks to all of them takes more time. Consider a 64MB block of 250,000 transactions, each 256 bytes, that the current leader has to distribute to 1,000 validators.
The simplest approach is for the leader to send the full 64MB of data to each of the 1,000 validators one by one. But how efficient is that for a network like Solana, which produces a block every 400ms?
Zero efficiency.
Solana solves the block distribution with Turbine.
For a 64MB block, the current leader divides it into 1,000 equal pieces of 64KB each and starts distributing the pieces to validators along randomized paths generated by Turbine.
Validators send these small chunks to “neighborhood” validators and receive the chunks sent by them.
A “neighborhood” is a group of validators that end up together on the random path generated by Turbine.
You can think of the network as a tree of such neighborhoods covering the 1,000 validators. Each validator in a neighborhood is responsible for forwarding the small piece of data it received from the leader to the other validators in the same neighborhood.
Neighborhoods also pass data on to one another. In the images, each neighborhood consists of 2 validators, but in practice the group size is set by Turbine’s DATA_PLANE_FANOUT parameter according to the total number of validator nodes in the network.
In a network with DATA_PLANE_FANOUT=200, the leader can reach 40,000 (200*200) validators within 2 hops of the resulting tree. Assuming each connection resolves in 100ms on average, this distribution takes about 200ms, half the block time.
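The arithmetic from the example above, written out as a small script. The fanout, piece count and latency are the example's assumptions, not measured values:

```python
# Back-of-the-envelope numbers from the example above.
block_bytes = 250_000 * 256               # 64,000,000 bytes = 64 MB
num_pieces = 1000
piece_bytes = block_bytes // num_pieces   # 64,000 bytes = 64 KB per piece
print(piece_bytes, "bytes per piece")

fanout = 200                              # assumed fanout per hop (DATA_PLANE_FANOUT-style)
hop1 = fanout                             # validators the leader sends to directly
hop2 = fanout * fanout                    # validators reached on the second hop
hop_latency_ms = 100                      # assumed average per-hop latency
print(hop2, "validators on the second hop, reached in about",
      2 * hop_latency_ms, "ms")
```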
Such a rapid deployment raises security issues. Malicious nodes may choose not to share their data at all or share it incorrectly.
For such cases, the leader generates Reed-Solomon erasure codes.
With erasure coding, a validator that receives a certain fraction of the pieces (determined by the erasure-code rate) can reconstruct the entire block without needing all of the data.
Suppose 3 out of every 10 pieces are published as erasure (coding) pieces.
Even if 3 of the remaining 7 data pieces are lost or corrupted, the validator can still reconstruct the original data. The erasure-code rate here is 30%, and the current leader adjusts the rate based on what it has observed in previous blocks.
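Reed-Solomon coding itself is too involved for a short example, but the simplest possible erasure code, a single XOR parity piece, already shows the idea of rebuilding data without receiving every piece. This is a simplified stand-in for what the leader's coding pieces do, not Solana's actual scheme:

```python
from functools import reduce

def xor_parity(pieces):
    """XOR all data pieces together into a single parity piece."""
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), pieces)

def recover(pieces, parity):
    """Rebuild one missing piece by XOR-ing the parity with the survivors."""
    missing = [i for i, p in enumerate(pieces) if p is None]
    assert len(missing) <= 1, "one XOR parity piece can only repair one loss"
    if missing:
        survivors = [p for p in pieces if p is not None]
        pieces[missing[0]] = xor_parity(survivors + [parity])
    return pieces

data = [b"aaaa", b"bbbb", b"cccc", b"dddd"]     # the pieces of a block
parity = xor_parity(data)                       # the extra "coding" piece
damaged = [b"aaaa", None, b"cccc", b"dddd"]     # one piece lost in transit
print(recover(damaged, parity))                 # the original four pieces are back
```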
Well, with PoH and tBFT we produce a block every 400ms without waiting for timing confirmations, and with Turbine we distribute it like an engine. But at that speed, do I as a user have to wait for my turn in a mempool?
No, I don’t.
Gulf Stream is a technology developed by Solana to avoid using mempool.
Let’s talk about the mempool briefly. A mempool is a pool where transactions that have not yet been processed wait. Its size depends on the supply of block space and the demand for it: when more people want to use the network, more transactions are sent and the mempool grows. As the mempool grows, the time it takes for a transaction to be written to the chain increases, which slows the network down.
Solana overcomes this with Gulf Stream.
On Solana, the leader schedule is known in advance, so it is clear who will be the leader up to a certain point in the future. The current leader forwards any incoming transaction that it cannot fit into its block to the next leader, and the next leader processes it ahead of its turn.
This reduces the load on the current leader and results in faster transaction confirmation. To validate these early-forwarded transactions, validators reference the PoH hash of a fully confirmed block.
Suppose a transaction is forwarded by a leader to the next leader and, right after that, the block it referenced is rejected. The next leader then simply ignores the transaction and does not process it.
Gulf Stream provides two advantages that keep Solana fast under load;
1-> Validators know in advance and reject invalid transactions,
2 -> Transactions are “quickly” passed to the next validator according to the PoH hash.
This is important for a network that prints a block every 400ms.
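A toy sketch of the idea, assuming a known leader schedule: the current leader keeps what fits in its block and forwards the overflow straight to the upcoming leaders. All names and numbers here are illustrative, not Solana's real data structures:

```python
from collections import deque

# The leader schedule is known ahead of time (illustrative names and slots).
leader_schedule = {10: "validator_A", 11: "validator_B", 12: "validator_C"}

def forward_overflow(pending, current_slot, block_capacity):
    """The current leader keeps what fits into its block and forwards
    the rest straight to the upcoming leaders in the schedule."""
    block = [pending.popleft() for _ in range(min(block_capacity, len(pending)))]
    forwarded = {}
    next_slot = current_slot + 1
    while pending and next_slot in leader_schedule:
        forwarded.setdefault(leader_schedule[next_slot], []).append(pending.popleft())
        next_slot += 1
    return block, forwarded

pending = deque(f"tx{i}" for i in range(5))
block, forwarded = forward_overflow(pending, current_slot=10, block_capacity=3)
print(block)      # ['tx0', 'tx1', 'tx2']  -> processed by the slot-10 leader
print(forwarded)  # {'validator_B': ['tx3'], 'validator_C': ['tx4']}
```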
So we produce and distribute blocks like an engine. But what about the rapidly growing data?
Archivers are coming to the rescue.
The network generates 4PB (4,000TB) of data per year. If every node had to store all of this data, the network would become increasingly centralized.
To prevent this, Solana uses a customized version of Proof of Replication (PoRep) announced by Filecoin.
PoRep is simply a system where a prover defends a verifiable claim that it has allocated unique resources to store one or more copies of data files.
Let’s give an example.
Suppose you want to store a large number of documents: you go to a storage facility and make an agreement with the storage providers (the Archivers). If they have space, they accept your documents, up to their full capacity.
You ask the providers (Archivers) for proof that they are storing your documents (the network’s data), and they send you a proof of storage (a completed PoRep, in Solana’s case). Archivers declare that they have X bytes of free space, and the network’s data is split into pieces and distributed among them.
This way validators work with a lighter ledger and do not have to keep all of the network’s data. Archivers do not participate in consensus and do not need hardware as powerful as validators’. They receive a 3% inflation reward for each proof of storage they submit.
2.c) Sealevel, Pipelining and Cloud Break
Smart contracts are programs on the blockchain, developers write the code and deploy it on the chain (make it accessible) and users use (call) it. The EVM is a virtual machine where these programs run.
The EVM is single-threaded: transactions are processed one at a time. Sealevel (Solana’s smart-contract runtime), on the other hand, leverages multiple cores to process transactions in parallel.
Solana transactions are composed of instructions.
Each instruction contains the ID of the program it calls (the program ID), the program’s instruction data, and the list of accounts the transaction needs to read or write.
This allows Sealevel to sort and classify transactions and find the ones that do not conflict: transactions that only read the same accounts, or that write to entirely different accounts, can be executed at the same time.
Because each instruction declares in advance which accounts it wants to read and write, Sealevel can fetch and prepare that data ahead of time. This lets Sealevel organize transactions at the software level as an initial optimization, but the real work is done at the hardware level.
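Here is a small, hypothetical sketch of that software-level step: each transaction declares its read and write sets, and non-conflicting transactions are packed into batches that could run in parallel. Sealevel's real scheduler is more sophisticated; this only illustrates the account-locking idea.

```python
def conflicts(tx_a, tx_b):
    """Two transactions conflict if either one writes an account the other touches."""
    return bool(tx_a["writes"] & (tx_b["writes"] | tx_b["reads"]) or
                tx_b["writes"] & (tx_a["writes"] | tx_a["reads"]))

def schedule(txs):
    """Greedily pack transactions into batches that can run in parallel."""
    batches = []
    for tx in txs:
        for batch in batches:
            if not any(conflicts(tx, other) for other in batch):
                batch.append(tx)
                break
        else:
            batches.append([tx])
    return batches

txs = [
    {"id": "t1", "reads": {"oracle"}, "writes": {"alice"}},
    {"id": "t2", "reads": {"oracle"}, "writes": {"bob"}},    # shares only a read with t1
    {"id": "t3", "reads": set(),      "writes": {"alice"}},  # writes the same account as t1
]
for i, batch in enumerate(schedule(txs)):
    print("batch", i, [tx["id"] for tx in batch])
# batch 0 ['t1', 't2']   -> can run in parallel
# batch 1 ['t3']         -> must wait for t1
```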
Suppose thousands of transactions have been organized and divided into these groups.
Suppose the same instruction from a smart contract is called by 1000 different transactions. A SIMD CPU (Single instruction, multiple data) can execute one instruction on multiple data sources.
So the next optimization is to sort all instructions in a group by program IDs and run the program on all data streams with all idle cores on the CPU.
SIMD implementation is also found on GPUs, and in Solana the nodes use GPUs.
A modern Nvidia GPU has around 4,000 cores spread over about 50 multiprocessors. A multiprocessor can execute only one program instruction at a time, but across its 4,000/50 = 80 cores it can execute that instruction on 80 different inputs in parallel.
Let’s return to the image classified by program ID.
The GPU can run the smart contract simultaneously on all data streams specified by the instructions. There are some trade-offs for SIMD to work.
Instructions that are batched together should use only a small number of branches (the X, Y, Z branches) and should take the same branch, so that they can be aggregated into a single group.
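As a rough illustration of why sorting by program ID helps, the sketch below groups instructions by the program they call, producing exactly the "one program, many inputs" shape that SIMD hardware wants. All the data here is made up:

```python
from collections import defaultdict

# Instructions from many transactions, already scheduled so they don't conflict.
instructions = [
    {"program_id": "token_program", "accounts": ["alice", "bob"]},
    {"program_id": "vote_program",  "accounts": ["validator_1"]},
    {"program_id": "token_program", "accounts": ["carol", "dan"]},
    {"program_id": "token_program", "accounts": ["erin", "frank"]},
]

# Group by program ID: one program, many independent inputs, which is
# the shape that SIMD hardware can execute in parallel.
batches = defaultdict(list)
for instruction in instructions:
    batches[instruction["program_id"]].append(instruction["accounts"])

for program_id, inputs in batches.items():
    print(program_id, inputs)
# token_program [['alice', 'bob'], ['carol', 'dan'], ['erin', 'frank']]
# vote_program [['validator_1']]
```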
With Gulf Stream, I mentioned that transactions are passed to the next leaders without being put on hold. This process is optimized with Pipelining.
Let’s imagine a restaurant kitchen. There is a constant flow and busyness. The food preparation process consists of the following steps;
- Wash the dirty dishes
- Cook food
- Prepare a serving plate
- Offer to the customer
These operations are done simultaneously, not one by one in sequence, so as not to disrupt the process flow.
This is how Solana works.
For the leader node, there are 4 stages of a transaction to be processed;
1 -> Fetching: this is like the washing part of the kitchen example. The node receives transactions sent by clients (CLI, wallets, etc.) and forwards them to the next stage.
2 -> Signature Verification: the cooking part of the kitchen example. The node verifies that the transaction was actually signed by the sending account.
3 -> Banking: like preparing the serving plate. The node transfers balances between the accounts involved in the transaction and stamps it with the hash produced by PoH.
4 -> Broadcasting: equivalent to serving the dish. The leader broadcasts the block containing the processed transactions.
To process transactions quickly, each stage is done by a different compute unit.
Kernel Space -> Fetching and Broadcasting
CPU -> Banking
GPU -> Signature Verification
In the kitchen example, cooking is the slowest part. For Solana, signature verification, which corresponds to cooking, is the slowest stage because it requires the most computation and time. That is why this stage is assigned to the GPU.
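A minimal sketch of the pipelining idea using Python threads and queues: each stage pulls from the previous stage's queue and feeds the next, so signature verification, banking and broadcasting all work on different transactions at the same time. In the real TPU these stages run in kernel space, on the GPU and on the CPU as described above; this is only a conceptual model.

```python
import queue
import threading

def stage(name, inbox, outbox, work):
    """One pipeline stage: pull an item, process it, hand it to the next stage."""
    while True:
        item = inbox.get()
        if item is None:              # shutdown signal; pass it downstream
            if outbox is not None:
                outbox.put(None)
            break
        result = work(item)
        if outbox is not None:
            outbox.put(result)
        else:
            print(f"{name}: broadcast {result}")

# A queue between each pair of stages: fetch -> sigverify -> banking -> broadcast.
q_fetched, q_verified, q_banked = queue.Queue(), queue.Queue(), queue.Queue()

threads = [
    threading.Thread(target=stage,
                     args=("sigverify", q_fetched, q_verified, lambda tx: tx + " [sig ok]")),
    threading.Thread(target=stage,
                     args=("banking", q_verified, q_banked, lambda tx: tx + " [balances updated]")),
    threading.Thread(target=stage,
                     args=("broadcast", q_banked, None, lambda tx: tx)),
]
for t in threads:
    t.start()

for i in range(3):                    # the fetching stage feeding the pipeline
    q_fetched.put(f"tx{i}")
q_fetched.put(None)                   # flush and shut the pipeline down
for t in threads:
    t.join()
```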
Each node runs two pipelined processing units;
1 -> The Transaction Processing Unit (TPU), used when the node takes the leader role,
2 -> The Transaction Validation Unit (TVU), used when the node takes the validator role.
The hardware used by the TPU and TVU is the same, but the two pipelines perform different work.
Sharding is a method used to increase network scalability: the database is split, and the transaction load is spread across smaller pieces (shards, or separate chains). NEAR uses sharding today, and it is also part of the Ethereum roadmap. Solana takes the opposite approach and keeps a single database, which it scales with Cloud Break.
Cloud Break is a data structure that makes it possible to read data from and write data to the database at the same time.
It uses memory-mapped files and modern SSDs, allowing concurrent reads and writes across 32 threads.
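To illustrate the memory-mapped-file idea, here is a toy accounts file in Python, mapped into memory and read and written by a pool of 32 threads. Cloud Break's real layout (append-only account storage spread across SSDs) is far more involved; the file name and record format here are invented for the example.

```python
import mmap
import struct
from concurrent.futures import ThreadPoolExecutor

RECORD = struct.Struct("<Q")          # one u64 "balance" per account slot
NUM_ACCOUNTS = 1024

# Create a flat accounts file and map it into memory.
with open("accounts.bin", "wb") as f:
    f.write(b"\x00" * RECORD.size * NUM_ACCOUNTS)

f = open("accounts.bin", "r+b")
mem = mmap.mmap(f.fileno(), 0)

def write_balance(index, value):
    RECORD.pack_into(mem, index * RECORD.size, value)

def read_balance(index):
    return RECORD.unpack_from(mem, index * RECORD.size)[0]

# 32 threads reading and writing different accounts through the same mapping.
with ThreadPoolExecutor(max_workers=32) as pool:
    list(pool.map(lambda i: write_balance(i, i * 10), range(NUM_ACCOUNTS)))
    balances = list(pool.map(read_balance, range(NUM_ACCOUNTS)))

print(balances[5])   # 50
mem.close()
f.close()
```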
This article collects the threads I shared on Twitter. There is not a huge difference between the two, but it is better to keep everything in one place.
Twitter: @blockofchain
Enjoy your reading. See you in the next articles.
References: