Architecture of the Solana Validator

Clusters

Overview of a Solana Cluster

A Solana cluster is a set of validators working together to serve client transactions and maintain the integrity of the ledger. Many clusters may coexist. When two clusters share a common genesis block, they attempt to converge. Otherwise, they simply ignore the existence of the other. Transactions sent to the wrong one are quietly rejected. In this section, we’ll discuss how a cluster is created, how nodes join the cluster, how they share the ledger, how they ensure the ledger is replicated, and how they cope with buggy and malicious nodes.

Creating a Cluster

Before starting any validators, one first needs to create a genesis config. The config references two public keys, a mint and a bootstrap validator. The validator holding the bootstrap validator’s private key is responsible for appending the first entries to the ledger. It initializes its internal state with the mint’s account. That account will hold the number of native tokens defined by the genesis config. The second validator then contacts the bootstrap validator to register as a validator. Additional validators then register with any registered member of the cluster.
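
The relationship between the genesis config and the bootstrap validator's initial state can be sketched as follows. This is a minimal illustration; the field names are hypothetical and do not match Solana's actual types.

```python
from dataclasses import dataclass

@dataclass
class GenesisConfig:
    # the two public keys the config references (hypothetical field names)
    mint_pubkey: str
    bootstrap_validator_pubkey: str
    native_tokens: int  # token supply held by the mint's account

def init_bank(config: GenesisConfig) -> dict:
    """Initialize internal state with the mint's account, as the
    bootstrap validator does before appending the first entries."""
    return {config.mint_pubkey: config.native_tokens}

bank = init_bank(GenesisConfig("MintPubkey111", "Bootstrap111", 500_000_000))
```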

A validator receives all entries from the leader and submits votes confirming those entries are valid. After voting, the validator is expected to store those entries. Once the validator observes a sufficient number of copies exist, it deletes its copy.

Joining a Cluster

Validators enter the cluster via registration messages sent to its control plane. The control plane is implemented using a gossip protocol, meaning that a node may register with any existing node, and expect its registration to propagate to all nodes in the cluster. The time it takes for all nodes to synchronize is proportional to the square of the number of nodes participating in the cluster. Algorithmically, that’s considered very slow, but in exchange for that time, a node is assured that it eventually has all the same information as every other node, and that information cannot be censored by any one node.

Sending Transactions to a Cluster

Clients send transactions to any validator’s Transaction Processing Unit (TPU) port. If the node is in the validator role, it forwards the transaction to the designated leader. If in the leader role, the node bundles incoming transactions, timestamps them creating an entry, and pushes them onto the cluster’s data plane. Once on the data plane, the transactions are validated by validator nodes, effectively appending them to the ledger.

Confirming Transactions

A Solana cluster is capable of subsecond confirmation for thousands of nodes with plans to scale up to hundreds of thousands of nodes. Confirmation times are expected to increase only with the logarithm of the number of validators, where the logarithm’s base is very high. If the base is one thousand, for example, it means that for the first thousand nodes, confirmation will be the duration of three network hops plus the time it takes the slowest validator of a supermajority to vote. For the next million nodes, confirmation increases by only one network hop.
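
The claimed scaling can be sanity-checked with a small calculation. This is illustrative only; the base of 1000 and the three-hop figure are taken from the paragraph above.

```python
def confirmation_hops(num_nodes: int, base: int = 1000) -> int:
    """Network hops to confirm, per the scaling described above:
    three hops for the first `base` nodes, plus one more hop for
    each additional factor of `base` nodes."""
    hops, reach = 3, base
    while reach < num_nodes:
        hops += 1
        reach *= base
    return hops

print(confirmation_hops(1_000))      # 3 hops for the first thousand nodes
print(confirmation_hops(1_000_000))  # only one more hop for the next million
```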

Solana defines confirmation as the duration of time from when the leader timestamps a new entry to the moment when it recognizes a supermajority of ledger votes.

Scalable confirmation can be achieved using the following combination of techniques:

  1. Timestamp transactions with a VDF sample and sign the timestamp.
  2. Split the transactions into batches, send each to separate nodes and have each node share its batch with its peers.
  3. Repeat the previous step recursively until all nodes have all batches.

Solana rotates leaders at fixed intervals, called slots. Each leader may only produce entries during its allotted slot. The leader therefore timestamps transactions so that validators may lookup the public key of the designated leader. The leader then signs the timestamp so that a validator may verify the signature, proving the signer is owner of the designated leader’s public key.

Next, transactions are broken into batches so that a node can send transactions to multiple parties without making multiple copies. If, for example, the leader needed to send 60 transactions to 6 nodes, it would break that collection of 60 into batches of 10 transactions and send one to each node. This allows the leader to put 60 transactions on the wire, not 60 transactions for each node. Each node then shares its batch with its peers. Once the node has collected all 6 batches, it reconstructs the original set of 60 transactions.
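
The 60-transaction example can be sketched directly. This is a toy illustration of the batching idea, not the wire format.

```python
def split_into_batches(txs, num_nodes):
    """Split a collection of transactions into one contiguous batch
    per node, so the leader transmits each transaction only once."""
    size = len(txs) // num_nodes
    return [txs[i * size:(i + 1) * size] for i in range(num_nodes)]

txs = [f"tx{i}" for i in range(60)]
batches = split_into_batches(txs, 6)   # 6 batches of 10 transactions each

# each node shares its batch with its peers; once a node has collected
# all 6 batches it reconstructs the original set of 60 transactions
reconstructed = [tx for batch in batches for tx in batch]
assert reconstructed == txs
```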

A batch of transactions can only be split so many times before it is so small that header information becomes the primary consumer of network bandwidth. At the time of this writing (December, 2021), the approach is scaling well up to about 1,250 validators. To scale up to hundreds of thousands of validators, each node can apply the same technique as the leader node to another set of nodes of equal size. We call the technique Turbine Block Propagation.

Solana Clusters

Available Solana Clusters

Solana maintains several different clusters with different purposes.

Before you begin, make sure you have first installed the Solana command-line tools.

Explorers:

https://explorer.solana.com/
https://solanabeach.io/

Devnet

Devnet serves as a playground for anyone who wants to take Solana for a test drive, as a user, token holder, app developer, or validator.
Application developers should target Devnet.
Potential validators should first target Devnet.
Key differences between Devnet and Mainnet Beta:
Devnet tokens are not real
Devnet includes a token faucet for airdrops for application testing
Devnet may be subject to ledger resets
Devnet typically runs the same software release branch version as Mainnet Beta, but may run a newer minor release version than Mainnet Beta.
Gossip entrypoint for Devnet: entrypoint.devnet.solana.com:8001
Metrics environment variable for Devnet:

export SOLANA_METRICS_CONFIG="host=https://metrics.solana.com:8086,db=devnet,u=scratch_writer,p=topsecret"

RPC URL for Devnet: https://api.devnet.solana.com

Example solana command-line configuration

solana config set --url https://api.devnet.solana.com

Example solana-validator command-line

$ solana-validator \
    --identity validator-keypair.json \
    --vote-account vote-account-keypair.json \
    --known-validator dv1ZAGvdsz5hHLwWXsVnM94hWf1pjbKVau1QVkaMJ92 \
    --known-validator dv2eQHeP4RFrJZ6UeiZWoc3XTtmtZCUKxxCApCDcRNV \
    --known-validator dv4ACNkpYPcE3aKmYDqZm9G5EB3J4MRoeE7WNDRBVJB \
    --known-validator dv3qDFk1DTF36Z62bNvrCXe9sKATA6xvVy6A798xxAS \
    --only-known-rpc \
    --ledger ledger \
    --rpc-port 8899 \
    --dynamic-port-range 8000-8020 \
    --entrypoint entrypoint.devnet.solana.com:8001 \
    --entrypoint entrypoint2.devnet.solana.com:8001 \
    --entrypoint entrypoint3.devnet.solana.com:8001 \
    --entrypoint entrypoint4.devnet.solana.com:8001 \
    --entrypoint entrypoint5.devnet.solana.com:8001 \
    --expected-genesis-hash EtWTRABZaYq6iMfeYKouRu166VU2xqa1wcaWoxPkrZBG \
    --wal-recovery-mode skip_any_corrupted_record \
    --limit-ledger-size

The --known-validators are operated by Solana Labs

Testnet

Testnet is where the Solana core contributors stress test recent release features on a live cluster, particularly focused on network performance, stability and validator behavior.
Testnet tokens are not real
Testnet may be subject to ledger resets.
Testnet includes a token faucet for airdrops for application testing
Testnet typically runs a newer software release branch than both Devnet and Mainnet Beta
Gossip entrypoint for Testnet: entrypoint.testnet.solana.com:8001
Metrics environment variable for Testnet:

export SOLANA_METRICS_CONFIG="host=https://metrics.solana.com:8086,db=tds,u=testnet_write,p=c4fa841aa918bf8274e3e2a44d77568d9861b3ea"

RPC URL for Testnet: https://api.testnet.solana.com
Example solana command-line configuration

solana config set --url https://api.testnet.solana.com

Example solana-validator command-line

$ solana-validator \
    --identity validator-keypair.json \
    --vote-account vote-account-keypair.json \
    --known-validator 5D1fNXzvv5NjV1ysLjirC4WY92RNsVH18vjmcszZd8on \
    --known-validator dDzy5SR3AXdYWVqbDEkVFdvSPCtS9ihF5kJkHCtXoFs \
    --known-validator Ft5fbkqNa76vnsjYNwjDZUXoTWpP7VYm3mtsaQckQADN \
    --known-validator eoKpUABi59aT4rR9HGS3LcMecfut9x7zJyodWWP43YQ \
    --known-validator 9QxCLckBiJc783jnMvXZubK4wH86Eqqvashtrwvcsgkv \
    --only-known-rpc \
    --ledger ledger \
    --rpc-port 8899 \
    --dynamic-port-range 8000-8020 \
    --entrypoint entrypoint.testnet.solana.com:8001 \
    --entrypoint entrypoint2.testnet.solana.com:8001 \
    --entrypoint entrypoint3.testnet.solana.com:8001 \
    --expected-genesis-hash 4uhcVJyU9pJkvQyS88uRDiswHXSCkY3zQawwpjk2NsNY \
    --wal-recovery-mode skip_any_corrupted_record \
    --limit-ledger-size

The identities of the --known-validators are:

5D1fNXzvv5NjV1ysLjirC4WY92RNsVH18vjmcszZd8on - Solana Labs
dDzy5SR3AXdYWVqbDEkVFdvSPCtS9ihF5kJkHCtXoFs - MonkeDAO
Ft5fbkqNa76vnsjYNwjDZUXoTWpP7VYm3mtsaQckQADN - Certus One
eoKpUABi59aT4rR9HGS3LcMecfut9x7zJyodWWP43YQ - SerGo
9QxCLckBiJc783jnMvXZubK4wH86Eqqvashtrwvcsgkv - Algo|Stake

Mainnet Beta

A permissionless, persistent cluster for Solana users, builders, validators and token holders.
Tokens that are issued on Mainnet Beta are real SOL
Gossip entrypoint for Mainnet Beta: entrypoint.mainnet-beta.solana.com:8001
Metrics environment variable for Mainnet Beta:

export SOLANA_METRICS_CONFIG="host=https://metrics.solana.com:8086,db=mainnet-beta,u=mainnet-beta_write,p=password"

RPC URL for Mainnet Beta: https://api.mainnet-beta.solana.com
Example solana command-line configuration

solana config set --url https://api.mainnet-beta.solana.com

Example solana-validator command-line

$ solana-validator \
    --identity ~/validator-keypair.json \
    --vote-account ~/vote-account-keypair.json \
    --known-validator 7Np41oeYqPefeNQEHSv1UDhYrehxin3NStELsSKCT4K2 \
    --known-validator GdnSyH3YtwcxFvQrVVJMm1JhTS4QVX7MFsX56uJLUfiZ \
    --known-validator DE1bawNcRJB9rVm3buyMVfr8mBEoyyu73NBovf2oXJsJ \
    --known-validator CakcnaRDHka2gXyfbEd2d3xsvkJkqsLw2akB3zsN1D2S \
    --only-known-rpc \
    --ledger ledger \
    --rpc-port 8899 \
    --private-rpc \
    --dynamic-port-range 8000-8020 \
    --entrypoint entrypoint.mainnet-beta.solana.com:8001 \
    --entrypoint entrypoint2.mainnet-beta.solana.com:8001 \
    --entrypoint entrypoint3.mainnet-beta.solana.com:8001 \
    --entrypoint entrypoint4.mainnet-beta.solana.com:8001 \
    --entrypoint entrypoint5.mainnet-beta.solana.com:8001 \
    --expected-genesis-hash 5eykt4UsFv8P8NJdTREpY1vzqKqZKvdpKuc147dw2N9d \
    --wal-recovery-mode skip_any_corrupted_record \
    --limit-ledger-size

Benchmark a Cluster

The Solana git repository contains all the scripts you might need to spin up your own local testnet. Depending on what you’re looking to achieve, you may want to run a different variation, as the full-fledged, performance-enhanced multinode testnet is considerably more complex to set up than a Rust-only, singlenode testnode. If you are looking to develop high-level features, such as experimenting with smart contracts, save yourself some setup headaches and stick to the Rust-only singlenode demo. If you’re doing performance optimization of the transaction pipeline, consider the enhanced singlenode demo. If you’re doing consensus work, you’ll need at least a Rust-only multinode demo. If you want to reproduce our TPS metrics, run the enhanced multinode demo.

For all four variations, you’d need the latest Rust toolchain and the Solana source code:

First, set up Rust, Cargo, and system packages as described in the Solana README.

Now check out the code from GitHub:

git clone https://github.com/solana-labs/solana.git
cd solana

The demo code is sometimes broken between releases as we add new low-level features, so if this is your first time running the demo, you’ll improve your odds of success if you check out the latest release before proceeding:

TAG=$(git describe --tags $(git rev-list --tags --max-count=1))
git checkout $TAG

Configuration Setup

Ensure important programs such as the vote program are built before any nodes are started. Note that we are using the release build here for good performance. If you want the debug build, use just cargo build and omit the NDEBUG=1 part of the command.

Faucet

In order for the validators and clients to work, we’ll need to spin up a faucet to give out some test tokens. The faucet delivers Milton Friedman-style “air drops” (free tokens to requesting clients) to be used in test transactions.

Start the faucet with:

NDEBUG=1 ./multinode-demo/faucet.sh

Singlenode Testnet

Before you start a validator, make sure you know the IP address of the machine you want to be the bootstrap validator for the demo, and make sure that udp ports 8000-10000 are open on all the machines you want to test with.

Now start the bootstrap validator in a separate shell:

NDEBUG=1 ./multinode-demo/bootstrap-validator.sh

Wait a few seconds for the server to initialize. It will print “leader ready…” when it’s ready to receive transactions. The leader will request some tokens from the faucet if it doesn’t have any. The faucet does not need to be running for subsequent leader starts.

Multinode Testnet

To run a multinode testnet, after starting a leader node, spin up some additional validators in separate shells:

NDEBUG=1 ./multinode-demo/validator-x.sh

To run a performance-enhanced validator on Linux, CUDA 10.0 must be installed on your system:

./fetch-perf-libs.sh
NDEBUG=1 SOLANA_CUDA=1 ./multinode-demo/bootstrap-validator.sh
NDEBUG=1 SOLANA_CUDA=1 ./multinode-demo/validator.sh

Testnet Client Demo

Now that your singlenode or multinode testnet is up and running let’s send it some transactions!

In a separate shell start the client:

NDEBUG=1 ./multinode-demo/bench-tps.sh # runs against localhost by default

What just happened? The client demo spins up several threads to send 500,000 transactions to the testnet as quickly as it can. The client then pings the testnet periodically to see how many transactions it processed in that time. Take note that the demo intentionally floods the network with UDP packets, such that the network will almost certainly drop a bunch of them. This ensures the testnet has an opportunity to reach 710k TPS. The client demo completes after it has convinced itself the testnet won’t process any additional transactions. You should see several TPS measurements printed to the screen. In the multinode variation, you’ll see TPS measurements for each validator node as well.

Testnet Debugging

There are some useful debug messages in the code; you can enable them on a per-module and per-level basis. Before running a leader or validator, set the normal RUST_LOG environment variable.

For example, to enable info everywhere and debug only in the solana::banking_stage module:

export RUST_LOG=solana=info,solana::banking_stage=debug

To enable SBF program logging:

export RUST_LOG=solana_bpf_loader=trace

Generally we are using debug for infrequent debug messages, trace for potentially frequent messages and info for performance-related logging.

You can also attach to a running process with GDB. The leader’s process is named solana-validator:

sudo gdb
attach <PID>
set logging on
thread apply all bt

This will dump all the threads' stack traces into gdb.txt.

Developer Testnet

In this example the client connects to our public testnet. To run validators on the testnet you would need to open udp ports 8000-10000.

NDEBUG=1 ./multinode-demo/bench-tps.sh --entrypoint entrypoint.devnet.solana.com:8001 --faucet api.devnet.solana.com:9900 --duration 60 --tx_count 50

You can observe the effects of your client's transactions on our metrics dashboard.

Performance Metrics

Solana Cluster Performance Metrics

Solana cluster performance is measured as the average number of transactions per second that the network can sustain (TPS), and how long it takes for a transaction to be confirmed by a supermajority of the cluster (confirmation time).

Each cluster node maintains various counters that are incremented on certain events. These counters are periodically uploaded to a cloud-based database. Solana's metrics dashboard fetches these counters, computes the performance metrics, and displays them on the dashboard.

TPS

Each node's bank runtime maintains a count of transactions that it has processed. The dashboard first calculates the median count of transactions across all metrics-enabled nodes in the cluster. The median cluster transaction count is then averaged over a 2-second period and displayed in the TPS time series graph. The dashboard also shows the Mean TPS, Max TPS, and Total Transaction Count stats, which are all calculated from the median transaction count.
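
The dashboard arithmetic described above can be sketched roughly as follows. The sample numbers are hypothetical; the real dashboard reads counters from the metrics database.

```python
from statistics import mean, median

# per-node transaction counts for two samples within a 2-second
# period (hypothetical numbers)
samples_per_node = {
    "node-a": [1200, 1210],
    "node-b": [1150, 1180],
    "node-c": [1300, 1310],
}

# median count across all metrics-enabled nodes, per sample
medians = [median(counts) for counts in zip(*samples_per_node.values())]

# the median cluster count averaged over the 2-second period
tps = mean(medians)
print(tps)
```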
Confirmation Time

Each validator node maintains a list of active ledger forks that are visible to the node. A fork is considered frozen when the node has received and processed all entries corresponding to the fork. A fork is considered confirmed when it receives a cumulative supermajority vote and when one of its child forks is frozen.

The node assigns a timestamp to every new fork and computes the time it took to confirm the fork. This time is reflected as validator confirmation time in the performance metrics. The performance dashboard displays the average of each validator node's confirmation time as a time series graph.

Hardware setup

The validator software is deployed to GCP n1-standard-16 instances with a 1TB pd-ssd disk and 2x Nvidia V100 GPUs, in the us-west-1 region.

solana-bench-tps is started after the network converges, from a client machine with an n1-standard-16 CPU-only instance, with the following arguments: --tx_count=50000 --thread-batch-sleep 1000

TPS and confirmation metrics are captured from the dashboard numbers over a 5-minute average from when the bench-tps transfer stage begins.

Consensus

Solana Commitment Status

The commitment metric gives clients a standard measure of the network confirmation for the block. Clients can then use this information to derive their own measures of commitment.

There are three specific commitment statuses:

  1. Processed
  2. Confirmed
  3. Finalized

Fork Generation

The Solana protocol doesn’t wait for all validators to agree on a newly produced block before the next block is produced. Because of that, it’s quite common for two different blocks to be chained to the same parent block. In those situations, we call each conflicting chain a “fork.”

Solana validators need to vote on one of these forks and reach agreement on which one to use through a consensus algorithm (that is beyond the scope of this article). The main point you need to remember is that when there are competing forks, only one fork will be finalized by the cluster and the abandoned blocks in competing forks are all discarded.

This section describes how forks naturally occur as a consequence of leader rotation.

Overview

Nodes take turns being leader and generating the PoH that encodes state changes. The cluster can tolerate loss of connection to any leader by synthesizing what the leader would have generated had it been connected but not ingesting any state changes.

The possible number of forks is thereby limited to a “there/not-there” skip list of forks that may arise on leader rotation slot boundaries. At any given slot, only a single leader’s transactions will be accepted.

Forking example

The table below illustrates what competing forks could look like. Time progresses from left to right and each slot is assigned to a validator that temporarily becomes the cluster “leader” and may produce a block for that slot.

In this example, the leader for slot 3 chose to chain its “Block 3” directly to “Block 1” and in doing so skipped “Block 2”. Similarly, the leader for slot 5 chose to chain “Block 5” directly to “Block 3” and skipped “Block 4”.

Note that across different forks, the block produced for a given slot is always the same because producing two different blocks for the same slot is a slashable offense. So the conflicting forks above can be distinguished from each other by which slots they have skipped.
[figure omitted: table of competing forks across leader slots]

Message Flow

  1. Transactions are ingested by the current leader.
  2. Leader filters valid transactions.
  3. Leader executes valid transactions updating its state.
  4. Leader packages transactions into entries based off its current PoH slot.
  5. Leader transmits the entries to validator nodes (in signed shreds).
    5.1 The PoH stream includes ticks; empty entries that indicate liveness of the leader and the passage of time on the cluster.
    5.2 A leader’s stream begins with the tick entries necessary to complete PoH back to the leader’s most recently observed prior leader slot.
  6. Validators retransmit entries to peers in their set and to further downstream nodes.
  7. Validators validate the transactions and execute them on their state.
  8. Validators compute the hash of the state.
  9. At specific times, i.e. specific PoH tick counts, validators transmit votes to the leader.
    9.1 Votes are signatures of the hash of the computed state at that PoH tick count.
    9.2 Votes are also propagated via gossip.
  10. Leader executes the votes, the same as any other transaction, and broadcasts them to the cluster.
  11. Validators observe their votes and all the votes from the cluster.

Partitions, Forks

Forks can arise at PoH tick counts that correspond to a vote. The next leader may not have observed the last vote slot and may start their slot with generated virtual PoH entries. These empty ticks are generated by all nodes in the cluster at a cluster-configured rate for hashes/per/tick Z.

There are only two possible versions of the PoH during a voting slot: PoH with T ticks and entries generated by the current leader, or PoH with just ticks. The “just ticks” version of the PoH can be thought of as a virtual ledger, one that all nodes in the cluster can derive from the last tick in the previous slot.
Validators can ignore forks at other points (e.g. from the wrong leader), or slash the leader responsible for the fork.
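
A "just ticks" stream is cheap for every node to derive, because PoH is a chained hash. A rough sketch of deriving the virtual ledger, using SHA-256 chaining as PoH does; the constants are illustrative:

```python
import hashlib

def ticks_only(prev_tick_hash: bytes, num_ticks: int, hashes_per_tick: int):
    """Derive the virtual, ticks-only ledger for a slot: every node
    can compute it from the last tick of the previous slot."""
    h = prev_tick_hash
    ticks = []
    for _ in range(num_ticks):
        for _ in range(hashes_per_tick):
            h = hashlib.sha256(h).digest()
        ticks.append(h)
    return ticks

a = ticks_only(b"last-tick-of-prev-slot", 64, 12500)
b = ticks_only(b"last-tick-of-prev-slot", 64, 12500)
assert a == b  # deterministic: all nodes derive the same virtual ledger
```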

Validators vote based on a greedy choice to maximize their reward described in Tower BFT.

Validator's View

Time Progression

The diagram below represents a validator's view of the PoH stream with possible forks over time. L1, L2, etc. are leader slots, and Es represent entries from that leader during that leader's slot. The xs represent ticks only, and time flows downwards in the diagram.

[figure omitted: validator's view of the PoH stream with forks over time]

Note that an E appearing on 2 forks at the same slot is a slashable condition, so a validator observing E3 and E3' can slash L3 and safely choose x for that slot. Once a validator commits to a fork, other forks can be discarded below that tick count. For any slot, validators need only consider a single "has entries" chain or a "ticks only" chain to be proposed by a leader. But multiple virtual entries may overlap as they link back to a previous slot.

Time Division

It’s useful to consider leader rotation over PoH tick count as time division of the job of encoding state for the cluster. The following table presents the above tree of forks as a time-divided ledger.

[figure omitted: the fork tree presented as a time-divided ledger]

Note that only data from leader L3 will be accepted during leader slot L3. Data from L3 may include "catchup" ticks back to a slot other than L2 if L3 did not observe L2's data. L4 and L5's transmissions include the "ticks to prev" PoH entries.

This arrangement of the network data streams permits nodes to save exactly this to the ledger for replay, restart, and checkpoints.

Leader's View

When a new leader begins a slot, it must first transmit any PoH (ticks) required to link the new slot with the most recently observed and voted slot. The fork the leader proposes would link the current slot to a previous fork that the leader has voted on with virtual ticks.

Solana Leader Rotation

At any given moment, a cluster expects only one validator to produce ledger entries. By having only one leader at a time, all validators are able to replay identical copies of the ledger. The drawback of only one leader at a time, however, is that a malicious leader is capable of censoring votes and transactions. Since censoring cannot be distinguished from the network dropping packets, the cluster cannot simply elect a single node to hold the leader role indefinitely. Instead, the cluster minimizes the influence of a malicious leader by rotating which node takes the lead.
Each validator selects the expected leader using the same algorithm, described below. When the validator receives a new signed ledger entry, it can be certain that the entry was produced by the expected leader. The order in which validators are assigned leader slots is called a leader schedule.

Leader Schedule Rotation

A validator rejects blocks that are not signed by the slot leader. The list of identities of all slot leaders is called a leader schedule. The leader schedule is recomputed locally and periodically. It assigns slot leaders for a duration of time called an epoch. The schedule must be computed far in advance of the slots it assigns, such that the ledger state it uses to compute the schedule is finalized. That duration is called the leader schedule offset. Solana sets the offset to the duration of slots until the next epoch. That is, the leader schedule for an epoch is calculated from the ledger state at the start of the previous epoch. The offset of one epoch is fairly arbitrary and assumed to be sufficiently long such that all validators will have finalized their ledger state before the next schedule is generated. A cluster may choose to shorten the offset to reduce the time between stake changes and leader schedule updates.
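
A deterministic, stake-weighted schedule can be sketched as follows. This is illustrative only; the real implementation seeds from finalized ledger state one epoch back, not from the epoch number.

```python
import random

def leader_schedule(stakes: dict, epoch: int, slots_per_epoch: int) -> list:
    """Assign a leader to every slot of an epoch, weighted by stake.
    Seeding deterministically lets every validator derive the same
    schedule locally, without coordination."""
    rng = random.Random(epoch)  # stand-in seed; real clusters derive
                                # it from finalized ledger state
    validators = sorted(stakes)
    weights = [stakes[v] for v in validators]
    return rng.choices(validators, weights=weights, k=slots_per_epoch)

stakes = {"validator-a": 50, "validator-b": 30, "validator-c": 20}
schedule = leader_schedule(stakes, epoch=7, slots_per_epoch=8)
# every node computing the same inputs gets the same schedule
assert schedule == leader_schedule(stakes, epoch=7, slots_per_epoch=8)
```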

While operating without partitions lasting longer than an epoch, the schedule only needs to be generated when the root fork crosses the epoch boundary. Since the schedule is for the next epoch, any new stakes committed to the root fork will not be active until the next epoch. The block used for generating the leader schedule is the first block to cross the epoch boundary.

Without a partition lasting longer than an epoch, the cluster will work as follows:

  1. A validator continuously updates its own root fork as it votes.
  2. The validator updates its leader schedule each time the slot height crosses an epoch boundary.


For example:
Let’s assume an epoch duration of 100 slots, which in reality is orders of magnitude higher. The root fork is updated from a fork computed at slot height 99 to a fork computed at slot height 102. Forks with slots at heights 100 and 101 were skipped because of failures. The new leader schedule is computed using the fork at slot height 102. It is active from slot 200 until it is updated again.
No inconsistency can exist because every validator that is voting with the cluster has skipped 100 and 101 when its root passes 102. All validators, regardless of voting pattern, would be committing to a root that is either 102, or a descendant of 102.
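The boundary check in this example can be sketched as follows. This is an illustrative model, not the actual implementation; the 100-slot epoch length is taken from the example above.

```python
# Sketch: regenerate the leader schedule only when the root fork crosses an
# epoch boundary; the new schedule applies to the *next* epoch.
EPOCH_LENGTH = 100  # from the example; real epochs are far longer

def crossed_epoch_boundary(old_root: int, new_root: int) -> bool:
    return new_root // EPOCH_LENGTH > old_root // EPOCH_LENGTH

def schedule_activation_slot(new_root: int) -> int:
    """First slot of the epoch the regenerated schedule applies to."""
    return (new_root // EPOCH_LENGTH + 1) * EPOCH_LENGTH

assert crossed_epoch_boundary(99, 102)        # slot 99 is epoch 0, slot 102 is epoch 1
assert schedule_activation_slot(102) == 200   # active from slot 200, as above
```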


Leader Schedule Rotation with Epoch-Sized Partitions

The duration of the leader schedule offset has a direct relationship to the likelihood of a cluster having an inconsistent view of the correct leader schedule.
Consider the following scenario:
Two partitions that are generating half of the blocks each. Neither is coming to a definitive supermajority fork. Both will cross epoch 100 and 200 without actually committing to a root and therefore a cluster-wide commitment to a new leader schedule.
In this unstable scenario, multiple valid leader schedules exist.
A leader schedule is generated for every fork whose direct parent is in the previous epoch.
The leader schedule is valid after the start of the next epoch for descendant forks until it is updated.
Each partition’s schedule will diverge after the partition lasts more than an epoch. For this reason, the epoch duration should be selected to be much larger than the slot time and the expected length of time for a fork to be committed to root.
After observing the cluster for a sufficient amount of time, the leader schedule offset can be selected based on the median partition duration and its standard deviation. For example, an offset longer than the median partition duration plus six standard deviations would reduce the likelihood of an inconsistent leader schedule in the cluster to 1 in 1 million.


Leader Schedule Generation at Genesis

The genesis config declares the first leader for the first epoch. This leader ends up scheduled for the first two epochs because the leader schedule is also generated at slot 0 for the next epoch. The length of the first two epochs can be specified in the genesis config as well. The minimum length of the first epochs must be greater than or equal to the maximum rollback depth as defined in Tower BFT.


Leader Schedule Generation Algorithm

Leader schedule is generated using a predefined seed. The process is as follows:
Periodically use the PoH tick height (a monotonically increasing counter) to seed a stable pseudo-random algorithm.
At that height, sample the bank for all the staked accounts with leader identities that have voted within a cluster-configured number of ticks. The sample is called the active set.
Sort the active set by stake weight.
Use the random seed to select nodes weighted by stake to create a stake-weighted ordering.
This ordering becomes valid after a cluster-configured number of ticks.
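A minimal sketch of the stake-weighted selection step, assuming invented validator names and stakes. The real implementation samples with a ChaCha-based RNG over the bank’s stake accounts; `random.Random` stands in here for the stable pseudo-random algorithm seeded by the tick height.

```python
import random

def make_schedule(active_set, seed, slots):
    """active_set: list of (identity, stake). Returns one leader per slot."""
    # Sort by stake (then key) so every validator iterates in the same order.
    nodes = sorted(active_set, key=lambda kv: (-kv[1], kv[0]))
    ids = [i for i, _ in nodes]
    weights = [s for _, s in nodes]
    rng = random.Random(seed)  # stable pseudo-random source, seeded by tick height
    return [rng.choices(ids, weights=weights)[0] for _ in range(slots)]

stakes = [("alice", 700), ("bob", 200), ("carol", 100)]  # invented active set
a = make_schedule(stakes, seed=42, slots=8)
b = make_schedule(stakes, seed=42, slots=8)
assert a == b                              # deterministic for a given seed
assert set(a) <= {"alice", "bob", "carol"}
```

Because every node derives the same seed and sorts the active set identically, every node computes the same schedule without any coordination.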


Schedule Attack Vectors

Seed

The seed that is selected is predictable but unbiasable. There is no grinding attack to influence its outcome.

Active Set

A leader can bias the active set by censoring validator votes. Two possible ways exist for leaders to censor the active set:
Ignore votes from validators
Refuse to vote for blocks with votes from validators
To reduce the likelihood of censorship, the active set is calculated at the leader schedule offset boundary over an active set sampling duration. The active set sampling duration is long enough such that votes will have been collected by multiple leaders.


Staking

Leaders can censor new staking transactions or refuse to validate blocks with new stakes. This attack is similar to censorship of validator votes.

Validator operational key loss

Leaders and validators are expected to use ephemeral keys for operation, and stake owners authorize the validators to do work with their stake via delegation.
The cluster should be able to recover from the loss of all the ephemeral keys used by leaders and validators, which could occur through a common software vulnerability shared by all the nodes. Stake owners should be able to vote directly by co-signing a validator vote even though the stake is currently delegated to a validator.


Appending Entries

The lifetime of a leader schedule is called an epoch. The epoch is split into slots, where each slot has a duration of T PoH ticks.
A leader transmits entries during its slot. After T ticks, all the validators switch to the next scheduled leader. Validators must ignore entries sent outside a leader’s assigned slot.
All T ticks must be observed by the next leader for it to build its own entries on. If entries are not observed (leader is down) or entries are invalid (leader is buggy or malicious), the next leader must produce ticks to fill the previous leader’s slot. Note that the next leader should do repair requests in parallel, and postpone sending ticks until it is confident other validators also failed to observe the previous leader’s entries. If a leader incorrectly builds on its own ticks, the leader following it must replace all its ticks.
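The slot bookkeeping above can be sketched with integers (illustrative only; `TICKS_PER_SLOT` is an assumed value, and the real repair logic is considerably more involved):

```python
TICKS_PER_SLOT = 64  # assumed value of T for illustration

def slot_of(tick_height: int) -> int:
    """The slot a given PoH tick height falls into."""
    return tick_height // TICKS_PER_SLOT

def filler_ticks(last_observed_tick: int, slot: int) -> int:
    """Ticks the next leader must produce to fill `slot` when the previous
    leader's entries were never observed (or were invalid)."""
    slot_end = (slot + 1) * TICKS_PER_SLOT
    return max(0, slot_end - last_observed_tick)

assert slot_of(0) == 0
assert slot_of(64) == 1
assert filler_ticks(64, 1) == 64  # nothing observed in slot 1: fill all T ticks
```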

Managing Forks

The ledger is permitted to fork at slot boundaries. The resulting data structure forms a tree called a blockstore. When the validator interprets the blockstore, it must maintain state for each fork in the chain. It is the responsibility of a validator to weigh those forks, such that it may eventually select a fork. Details for selection and voting on these forks can be found in Tower BFT.


Forks

A fork is a sequence of slots originating from some root. For example:

      2 - 4 - 6 - 8
     /
0 - 1       12 - 13
     \     /
      3 - 5
           \
            7 - 9 - 10 - 11

The following sequences are forks:

- {0, 1, 2, 4, 6, 8}
- {0, 1, 3, 5, 12, 13}
- {0, 1, 3, 5, 7, 9, 10, 11}
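The tree above can be represented as a child-to-parent map; each leaf then defines one fork by walking back to the root (a sketch, not the blockstore’s actual representation):

```python
# child -> parent, transcribed from the diagram above
parent = {1: 0, 2: 1, 4: 2, 6: 4, 8: 6,
          3: 1, 5: 3, 12: 5, 13: 12,
          7: 5, 9: 7, 10: 9, 11: 10}

def fork_of(slot):
    """Walk from a slot back to the root and return the fork, root first."""
    path = [slot]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return list(reversed(path))

# The three forks listed above fall out of the three leaves:
assert fork_of(8) == [0, 1, 2, 4, 6, 8]
assert fork_of(13) == [0, 1, 3, 5, 12, 13]
assert fork_of(11) == [0, 1, 3, 5, 7, 9, 10, 11]
```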

Pruning and Squashing


As the chain grows, storing the local forks view becomes detrimental to performance. Fortunately, we can take advantage of the properties of Tower BFT roots to prune this data structure. Recall that a root is a slot that has reached the max lockout depth. The assumption is that this slot has accrued enough lockout that it would be impossible to roll it back.


Thus, the validator prunes forks that do not originate from its local root, and then takes the opportunity to minimize its memory usage by squashing any nodes it can into the root. Although not necessary for consensus, to enable some RPC use cases the validator chooses to keep ancestors of its local root up until the last slot rooted by the super majority of the cluster. We call this the super majority root (SMR).
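Pruning can be sketched as keeping three groups of slots: the local root, its ancestors back to the SMR, and its descendants (illustrative only; the real blockstore prunes incrementally):

```python
def prune(parent, root, smr):
    """Return the set of slots kept after pruning, given a child->parent map,
    the local root, and the super majority root (an ancestor of root)."""
    def ancestors(s):
        out = []
        while s in parent:
            s = parent[s]
            out.append(s)
        return out

    keep = {root}
    for a in ancestors(root):      # ancestors of the local root...
        keep.add(a)
        if a == smr:               # ...but only back to the SMR
            break
    for s in parent:               # descendants of the local root
        if root in ancestors(s):
            keep.add(s)
    return keep

parent = {1: 0, 2: 1, 4: 2, 6: 4, 8: 6,
          3: 1, 5: 3, 12: 5, 13: 12,
          7: 5, 9: 7, 10: 9, 11: 10}
# After the vote at 9: local root is 3, SMR is 0, and the 2-4-6-8 fork is gone.
assert prune(parent, root=3, smr=0) == {0, 1, 3, 5, 7, 9, 10, 11, 12, 13}
```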


Starting from the above example imagine a max lockout depth of 3. Our validator votes on slots 0, 1, 3, 5, 7, 9. Upon the final vote at 9, our local root is 3. Assume the latest super majority root is 0. After pruning this is our local fork view.


SMR
 0 - 1       12 - 13
      \     /
       3 - 5
     ROOT   \
             7 - 9 - 10 - 11

Now imagine we vote on 10, which roots 5. At the same time the cluster catches up and the latest super majority root is now 3. After pruning this is our local fork view.


             12 - 13
            /
       3 - 5 ROOT
      SMR   \
             7 - 9 - 10 - 11

Finally, a vote on 11 will root 7, pruning the final fork:


       3 - 5 - 7 - 9 - 10 - 11
      SMR     ROOT

Stake Delegation and Rewards


Stakers are rewarded for helping to validate the ledger. They do this by delegating their stake to validator nodes. Those validators do the legwork of replaying the ledger and sending votes to a per-node vote account to which stakers can delegate their stakes. The rest of the cluster uses those stake-weighted votes to select a block when forks arise. Both the validator and staker need some economic incentive to play their part. The validator needs to be compensated for its hardware and the staker needs to be compensated for the risk of getting its stake slashed. The economics are covered in staking rewards. This section, on the other hand, describes the underlying mechanics of its implementation.


Basic Design

The general idea is that the validator owns a Vote account. The Vote account tracks validator votes, counts validator generated credits, and provides any additional validator specific state. The Vote account is not aware of any stakes delegated to it and has no staking weight.


A separate Stake account (created by a staker) names a Vote account to which the stake is delegated. Rewards generated are proportional to the amount of lamports staked. The Stake account is owned by the staker only. Some portion of the lamports stored in this account are the stake.


Passive Delegation

Any number of Stake accounts can delegate to a single Vote account without an interactive action from the identity controlling the Vote account or submitting votes to the account.

The total stake allocated to a Vote account can be calculated by the sum of all the Stake accounts that have the Vote account pubkey as the StakeStateV2::Stake::voter_pubkey.
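That sum can be sketched in a few lines. The field name follows the doc’s `StakeStateV2::Stake::voter_pubkey`; the accounts and amounts are invented:

```python
# Invented stake accounts; only voter_pubkey and stake matter here.
stake_accounts = [
    {"voter_pubkey": "voteA", "stake": 500},
    {"voter_pubkey": "voteA", "stake": 250},
    {"voter_pubkey": "voteB", "stake": 100},
]

def total_stake(vote_pubkey, accounts):
    """Sum of all stakes delegated to a given Vote account."""
    return sum(a["stake"] for a in accounts if a["voter_pubkey"] == vote_pubkey)

assert total_stake("voteA", stake_accounts) == 750
assert total_stake("voteB", stake_accounts) == 100
```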


Vote and Stake accounts

The rewards process is split into two on-chain programs. The Vote program solves the problem of making stakes slashable. The Stake program acts as custodian of the rewards pool and provides for passive delegation. The Stake program is responsible for paying rewards to staker and voter when shown that a staker’s delegate has participated in validating the ledger.


VoteState

VoteState is the current state of all the votes the validator has submitted to the network. VoteState contains the following state information:

  1. votes - The submitted votes data structure.
  2. credits - The total number of rewards this Vote program has generated over its lifetime.
  3. root_slot - The last slot to reach the full lockout commitment necessary for rewards.
  4. commission - The commission taken by this VoteState for any rewards claimed by staker’s Stake accounts. This is the percentage ceiling of the reward.
  5. Account::lamports - The accumulated lamports from the commission. These do not count as stakes.
  6. authorized_voter - Only this identity is authorized to submit votes. This field can only be modified by this identity.
  7. node_pubkey - The Solana node that votes in this account.
  8. authorized_withdrawer - The identity of the entity in charge of the lamports of this account, separate from the account’s address and the authorized vote signer.

VoteInstruction::Initialize(VoteInit)

account[0] - RW - The VoteState.
VoteInit carries the new vote account’s node_pubkey, authorized_voter, authorized_withdrawer, and commission.

other VoteState members defaulted.

VoteInstruction::Authorize(Pubkey, VoteAuthorize)

Updates the account with a new authorized voter or withdrawer, according to the VoteAuthorize parameter (Voter or Withdrawer). The transaction must be signed by the Vote account’s current authorized_voter or authorized_withdrawer.

account[0] - RW - The VoteState. VoteState::authorized_voter or authorized_withdrawer is set to Pubkey.

VoteInstruction::Vote(Vote)

account[0] - RW - The VoteState. VoteState::lockouts and VoteState::credits are updated according to voting lockout rules see Tower BFT.
account[1] - RO - sysvar::slot_hashes A list of some N most recent slots and their hashes for the vote to be verified against.
account[2] - RO - sysvar::clock The current network time, expressed in slots, epochs.

StakeStateV2

A StakeStateV2 takes one of four forms, StakeStateV2::Uninitialized, StakeStateV2::Initialized, StakeStateV2::Stake, and StakeStateV2::RewardsPool. Only the first three forms are used in staking, but only StakeStateV2::Stake is interesting. All RewardsPools are created at genesis.

StakeStateV2::Stake

StakeStateV2::Stake is the current delegation preference of the staker and contains the following state information:
Account::lamports - The lamports available for staking.
stake - the staked amount (subject to warmup and cooldown) for generating rewards, always less than or equal to Account::lamports.
voter_pubkey - The pubkey of the VoteState instance the lamports are delegated to.
credits_observed - The total credits claimed over the lifetime of the program.
activated - the epoch at which this stake was activated/delegated. The full stake will be counted after warmup.
deactivated - the epoch at which this stake was de-activated, some cooldown epochs are required before the account is fully deactivated, and the stake available for withdrawal.
authorized_staker - the pubkey of the entity that must sign delegation, activation, and deactivation transactions.
authorized_withdrawer - the identity of the entity in charge of the lamports of this account, separate from the account’s address, and the authorized staker.

StakeStateV2::RewardsPool

To avoid a single network-wide lock or contention in redemption, 256 RewardsPools are part of genesis under pre-determined keys, each with std::u64::MAX credits to be able to satisfy redemptions according to point value.
The Stakes and the RewardsPool are accounts that are owned by the same Stake program.

StakeInstruction::DelegateStake

The Stake account is moved from Initialized to StakeStateV2::Stake form, or from a deactivated (i.e. fully cooled-down) StakeStateV2::Stake to activated StakeStateV2::Stake. This is how stakers choose the vote account and validator node to which their stake account lamports are delegated. The transaction must be signed by the stake’s authorized_staker.

account[0] - RW - The StakeStateV2::Stake instance. StakeStateV2::Stake::credits_observed is initialized to VoteState::credits, StakeStateV2::Stake::voter_pubkey is initialized to account[1]. If this is the initial delegation of stake, StakeStateV2::Stake::stake is initialized to the account’s balance in lamports, StakeStateV2::Stake::activated is initialized to the current Bank epoch, and StakeStateV2::Stake::deactivated is initialized to std::u64::MAX
account[1] - R - The VoteState instance.
account[2] - R - sysvar::clock account, carries information about current Bank epoch.
account[3] - R - sysvar::stakehistory account, carries information about stake history.
account[4] - R - stake::Config account, carries warmup, cooldown, and slashing configuration.

StakeInstruction::Authorize(Pubkey, StakeAuthorize)

Updates the account with a new authorized staker or withdrawer, according to the StakeAuthorize parameter (Staker or Withdrawer). The transaction must be signed by the Stake account’s current authorized_staker or authorized_withdrawer. Any stake lock-up must have expired, or the lock-up custodian must also sign the transaction.

account[0] - RW - The StakeStateV2.
StakeStateV2::authorized_staker or authorized_withdrawer is set to Pubkey.

StakeInstruction::Deactivate

A staker may wish to withdraw from the network. To do so, the staker must first deactivate the stake and wait for cooldown. The transaction must be signed by the stake’s authorized_staker.

account[0] - RW - The StakeStateV2::Stake instance that is deactivating.
account[1] - R - sysvar::clock account from the Bank that carries current epoch.
StakeStateV2::Stake::deactivated is set to the current epoch + cooldown. The account’s stake will ramp down to zero by that epoch, and account::lamports will be available for withdrawal.

StakeInstruction::Withdraw(u64)

Lamports build up over time in a Stake account and any excess over activated stake can be withdrawn. The transaction must be signed by the stake’s authorized_withdrawer.

account[0] - RW - The StakeStateV2::Stake from which to withdraw.
account[1] - RW - Account that should be credited with the withdrawn lamports.
account[2] - R - sysvar::clock account from the Bank that carries current epoch, to calculate stake.
account[3] - R - sysvar::stake_history account from the Bank that carries stake warmup/cooldown history.

Benefits of the design

Single vote for all the stakers.
Clearing of the credit variable is not necessary for claiming rewards.
Each delegated stake can claim its rewards independently.
Commission for the work is deposited when a reward is claimed by the delegated stake.

Example Callflow

(diagram of the delegation and rewards callflow omitted)

Staking Rewards

The specific mechanics and rules of the validator rewards regime are outlined here. Rewards are earned by delegating stake to a validator that is voting correctly. Voting incorrectly exposes that validator’s stakes to slashing.

Basics

The network pays rewards from a portion of network inflation. The number of lamports available to pay rewards for an epoch is fixed and must be evenly divided among all staked nodes according to their relative stake weight and participation. The weighting unit is called a point.

Rewards for an epoch are not available until the end of that epoch.

At the end of each epoch, the total number of points earned during the epoch is summed and used to divide the rewards portion of epoch inflation to arrive at a point value. This value is recorded in the bank in a sysvar that maps epochs to point values.

During redemption, the stake program counts the points earned by the stake for each epoch, multiplies that by the epoch’s point value, and transfers lamports in that amount from a rewards account into the stake and vote accounts according to the vote account’s commission setting.
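A worked sketch of redemption with invented numbers (the real program works in integer lamports with careful rounding; this just shows the arithmetic):

```python
def point_value(epoch_rewards, total_points):
    """Reward lamports per point for an epoch."""
    return epoch_rewards / total_points

def redeem(points, value, commission_pct):
    """Split a stake's payout between the stake and vote accounts."""
    payout = points * value
    to_vote = payout * commission_pct / 100   # validator's commission
    return payout - to_vote, to_vote          # (stake account, vote account)

# Invented epoch: 1M lamports of rewards, 2M total points earned cluster-wide.
value = point_value(epoch_rewards=1_000_000, total_points=2_000_000)
stake_part, vote_part = redeem(points=500_000, value=value, commission_pct=10)
assert stake_part == 225_000.0   # 90% of the 250,000-lamport payout
assert vote_part == 25_000.0     # 10% commission
```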

Economics

Point value for an epoch depends on aggregate network participation. If participation in an epoch drops off, point values are higher for those that do participate.

Earning credits

Validators earn one vote credit for every correct vote that exceeds maximum lockout, i.e. every time the validator’s vote account retires a slot from its lockout list, making that vote a root for the node.
Stakers who have delegated to that validator earn points in proportion to their stake. Points earned is the product of vote credits and stake.

Stake warmup, cooldown, withdrawal

Stakes, once delegated, do not become effective immediately. They must first pass through a warmup period. During this period some portion of the stake is considered “effective”, the rest is considered “activating”. Changes occur on epoch boundaries.

The stake program limits the rate of change to total network stake, reflected in the stake program’s config::warmup_rate (set to 25% per epoch in the current implementation).

The amount of stake that can be warmed up each epoch is a function of the previous epoch’s total effective stake, total activating stake, and the stake program’s configured warmup rate.

Cooldown works the same way. Once a stake is deactivated, some part of it is considered “effective”, and also “deactivating”. As the stake cools down, it continues to earn rewards and be exposed to slashing, but it also becomes available for withdrawal.

Bootstrap stakes are not subject to warmup.

Rewards are paid against the “effective” portion of the stake for that epoch.

Warmup example
Consider the situation of a single stake of 1,000 activated at epoch N, with network warmup rate of 20%, and a quiescent total network stake at epoch N of 2,000.

At epoch N+1, the amount available to be activated for the network is 400 (20% of 2000), and at epoch N, this example stake is the only stake activating, and so is entitled to all of the warmup room available.
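The single-stake warmup above can be computed epoch by epoch. This is a sketch under the example’s assumptions (a 20% rate, one activating stake entitled to all the warmup room); the real calculation also consults the stake history sysvar:

```python
WARMUP_RATE_PCT = 20  # the example's 20% network warmup rate

def warmup(activating, total_effective):
    """Yield (effective, activating) for each epoch until fully warmed up."""
    effective = 0
    while activating > 0:
        room = total_effective * WARMUP_RATE_PCT // 100  # network-wide warmup room
        delta = min(activating, room)
        effective += delta
        activating -= delta
        total_effective += delta
        yield effective, activating

steps = list(warmup(activating=1_000, total_effective=2_000))
assert steps[0] == (400, 600)   # epoch N+1: 20% of 2,000 activates
assert steps[1] == (880, 120)   # epoch N+2: 20% of 2,400 = 480 more
assert steps[-1] == (1_000, 0)  # fully effective
```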

(table of per-epoch effective and activating stake omitted)

Were 2 stakes (X and Y) to activate at epoch N, they would be awarded a portion of the 20% in proportion to their stakes. At each epoch, the effective and activating amounts for each stake are a function of the previous epoch’s state.

(table of warmup for two concurrent stakes omitted)

Withdrawal

Only lamports in excess of effective+activating stake may be withdrawn at any time. This means that during warmup, effectively no stake can be withdrawn. During cooldown, any tokens in excess of effective stake may be withdrawn (activating == 0). Because earned rewards are automatically added to stake, withdrawal is generally only possible after deactivation.
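The withdrawal rule reduces to a single expression (sketch):

```python
def withdrawable(lamports, effective, activating):
    """Only lamports beyond effective + activating stake may leave the account."""
    return max(0, lamports - effective - activating)

assert withdrawable(1_200, effective=1_000, activating=0) == 200  # cooldown surplus
assert withdrawable(1_000, effective=400, activating=600) == 0    # warming up: nothing
```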


Lock-up

Stake accounts support the notion of lock-up, wherein the stake account balance is unavailable for withdrawal until a specified time. Lock-up is specified as an epoch height, i.e. the minimum epoch height that must be reached by the network before the stake account balance is available for withdrawal, unless the transaction is also signed by a specified custodian. This information is gathered when the stake account is created, and stored in the Lockup field of the stake account’s state. Changing the authorized staker or withdrawer is also subject to lock-up, as such an operation is effectively a transfer.


Synchronization

Fast, reliable synchronization is the biggest reason Solana is able to achieve such high throughput. Traditional blockchains synchronize on large chunks of transactions called blocks. By synchronizing on blocks, a transaction cannot be processed until a duration, called “block time”, has passed. In Proof of Work consensus, these block times need to be very large (~10 minutes) to minimize the odds of multiple validators producing a new valid block at the same time. There’s no such constraint in Proof of Stake consensus, but without reliable timestamps, a validator cannot determine the order of incoming blocks. The popular workaround is to tag each block with a wallclock timestamp. Because of clock drift and variance in network latencies, the timestamp is only accurate within an hour or two. To work around the workaround, these systems lengthen block times to provide reasonable certainty that the median timestamp on each block is always increasing.


Solana takes a very different approach, which it calls Proof of History or PoH. Leader nodes “timestamp” blocks with cryptographic proofs that some duration of time has passed since the last proof. All data hashed into the proof most certainly have occurred before the proof was generated. The node then shares the new block with validator nodes, which are able to verify those proofs. The blocks can arrive at validators in any order or even could be replayed years later. With such reliable synchronization guarantees, Solana is able to break blocks into smaller batches of transactions called entries. Entries are streamed to validators in realtime, before any notion of block consensus.


Solana technically never sends a block, but uses the term to describe the sequence of entries that validators vote on to achieve confirmation. In that way, Solana’s confirmation times can be compared apples-to-apples with block-based systems. The current implementation sets block time to 800ms.


What’s happening under the hood is that entries are streamed to validators as quickly as a leader node can batch a set of valid transactions into an entry. Validators process those entries long before it is time to vote on their validity. By processing the transactions optimistically, there is effectively no delay between the time the last entry is received and the time when the node can vote. In the event consensus is not achieved, a node simply rolls back its state. This optimistic processing technique was introduced in 1981 and called Optimistic Concurrency Control. It can be applied to blockchain architecture where a cluster votes on a hash that represents the full ledger up to some block height. In Solana, it is implemented trivially using the last entry’s PoH hash.


Relationship to VDFs

The Proof of History technique was first described for use in blockchain by Solana in November of 2017. In June of the following year, a similar technique was described at Stanford and called a verifiable delay function or VDF.


A desirable property of a VDF is that verification time is very fast. Solana’s approach to verifying its delay function is proportional to the time it took to create it. Split over a 4000-core GPU, it is sufficiently fast for Solana’s needs, but if you asked the authors of the paper cited above, they might tell you (and have) that Solana’s approach is algorithmically slow and it shouldn’t be called a VDF. We argue the term VDF should represent the category of verifiable delay functions and not just the subset with certain performance characteristics. Until that’s resolved, Solana will likely continue using the term PoH for its application-specific VDF.


Another difference between PoH and VDFs is that a VDF is used only for tracking duration. PoH’s hash chain, on the other hand, includes hashes of any data the application observed. That data is a double-edged sword. On one side, the data “proves history” - that the data most certainly existed before the hashes generated after it. On the other side, it means the application can manipulate the hash chain by changing when the data is hashed. The PoH chain therefore does not serve as a good source of randomness whereas a VDF without that data could. Solana’s leader rotation algorithm, for example, is derived only from the VDF height and not its hash at that height.


Relationship to Consensus Mechanisms

Proof of History is not a consensus mechanism, but it is used to improve the performance of Solana’s Proof of Stake consensus. It is also used to improve the performance of the data plane protocols.


Turbine Block Propagation

A Solana cluster uses a multi-layer block propagation mechanism called Turbine to broadcast ledger entries to all nodes. The cluster divides itself into layers of nodes, and each node in a given layer is responsible for propagating any data it receives on to a small set of nodes in the next downstream layer. This way each node only has to communicate with a small number of nodes.


Layer Structure​

The leader communicates with a special root node. The root can be thought of as layer 0 and communicates with layer 1, which is made up of at most DATA_PLANE_FANOUT nodes. If the number of nodes in the cluster is greater than layer 1, then the data plane fanout mechanism adds layers below. The number of nodes in each additional layer grows by a factor of DATA_PLANE_FANOUT.
A good way to think about this is, layer 0 starts with a single node, layer 1 starts with fanout nodes, and layer 2 will have fanout * number of nodes in layer 1 and so on.
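That growth rule can be sketched as follows. `DATA_PLANE_FANOUT = 200` is an assumed value for illustration, not the cluster’s actual configuration:

```python
DATA_PLANE_FANOUT = 200  # assumed value for illustration

def layers(num_nodes, fanout=DATA_PLANE_FANOUT):
    """Split num_nodes into turbine layers; the last layer may be partial."""
    out, size = [], 1  # layer 0 is the single root node
    while num_nodes > 0:
        take = min(size, num_nodes)
        out.append(take)
        num_nodes -= take
        size = fanout if size == 1 else size * fanout  # next layer's capacity
    return out

assert layers(1) == [1]
assert layers(201) == [1, 200]
assert layers(500) == [1, 200, 299]  # layer 2 only partially filled
```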


Layer Assignment - Weighted Selection​

In order for data plane fanout to work, the entire cluster must agree on how the cluster is divided into layers. To achieve this, all the recognized validator nodes (the TVU peers) are shuffled with a stake weighting and stored in a list. This list is then indexed in different ways to figure out layer boundaries and retransmit peers - referred to as the turbine tree. For example, the list is shuffled and the leader selects the first node to be the root node, and the root node selects the next DATA_PLANE_FANOUT nodes to make up layer 1. The shuffle is biased towards higher staked nodes, allowing heavier votes to come back to the leader first. Layer 2 and lower-layer nodes use the same logic to find their next layer peers.
To reduce the possibility of attack vectors, the list is shuffled and indexed on every shred. The turbine tree is generated from the set of validator nodes for each shred using a seed derived from the slot leader id, slot, shred index, and shred type.
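A simplified sketch of such a per-shred deterministic, stake-weighted shuffle (the real validator uses a ChaCha-based weighted shuffle over on-chain stake; the hashing and sampling below are illustrative only):

```python
import hashlib
import random

def turbine_order(nodes, stakes, leader_id, slot, shred_index, shred_type):
    """Deterministically shuffle `nodes`, biased toward higher stake.
    The seed mixes the slot leader id, slot, shred index, and shred type,
    so every shred gets its own tree while every validator that runs this
    with the same inputs derives the same order."""
    seed = hashlib.sha256(
        f"{leader_id}:{slot}:{shred_index}:{shred_type}".encode()
    ).digest()
    rng = random.Random(seed)
    pool = list(zip(nodes, stakes))
    order = []
    while pool:
        # sample the next node with probability proportional to its stake
        total = sum(stake for _, stake in pool)
        pick = rng.uniform(0, total)
        acc = 0.0
        for i, (node, stake) in enumerate(pool):
            acc += stake
            if pick <= acc:
                order.append(node)
                pool.pop(i)
                break
    return order
```

Because the seed is derived only from shared data, no extra coordination is needed for the cluster to agree on each shred's tree.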

Configuration Values

DATA_PLANE_FANOUT - Determines the size of layer 1. Subsequent layers grow by a factor of DATA_PLANE_FANOUT. Layers fill to capacity before new ones are added, i.e., if a layer isn't full, it must be the last one.

Currently, configuration is set when the cluster is launched. In the future, these parameters may be hosted on-chain, allowing modification on the fly as the cluster sizes change.

Shred Propagation Flow​

During its slot, the leader node makes its initial broadcasts to a special root node (layer 0) sitting atop the turbine tree. This root node is rotated on every shred based on the weighted shuffle previously mentioned. The root shares data with layer 1. Nodes in this layer then retransmit shreds to a subset of nodes in the next layer (layer 2). In general, every node in layer 1 retransmits to a unique subset of nodes in the next layer, and so on, until all nodes in the cluster have received all the shreds.
To prevent redundant transmission, each node uses the deterministically generated turbine tree, its own index in the tree, and DATA_PLANE_FANOUT to iterate through the tree and identify downstream nodes. Each node in a layer only has to broadcast its shreds to a maximum of DATA_PLANE_FANOUT nodes in the next layer instead of to every TVU peer in the cluster.
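As an illustration, one common way to lay a fanout tree over a flat shuffled list is to give the node at position i the children at positions i*fanout+1 through i*fanout+fanout (this indexing is a sketch of the idea, not necessarily the validator's exact scheme):

```python
def retransmit_peers(index: int, fanout: int, num_nodes: int) -> list[int]:
    """Indices in the shuffled list that the node at `index` forwards
    shreds to, treating the list as a complete `fanout`-ary tree with
    the root at position 0. Leaf nodes get an empty list."""
    first_child = index * fanout + 1
    return [i for i in range(first_child, first_child + fanout) if i < num_nodes]
```

Each node can compute its own downstream set locally from the shared tree, so no node ever retransmits to more than `fanout` peers.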
The following diagram shows how shreds propagate through a cluster with 15 nodes and a fanout of 3.

[Diagram: shred propagation through a 15-node cluster with a fanout of 3]

Calculating the required FEC rate​
Turbine relies on retransmission of packets between validators. Due to retransmission, any network wide packet loss is compounded, and the probability of the packet failing to reach its destination increases on each hop. The FEC rate needs to take into account the network wide packet loss, and the propagation depth.
A shred group is the set of data and coding packets that can be used to reconstruct each other. Each shred group has a chance of failure, based on the likelihood that the number of failed packets exceeds what the FEC rate can tolerate. If a validator fails to reconstruct the shred group, then the block cannot be reconstructed, and the validator has to rely on repair to fix up the blocks.
The probability of a shred group failing can be computed using the binomial distribution. If the FEC rate is 16:4, then the group size is 20, and more than 4 of the shreds must fail for the group to fail, which equals the sum of the probabilities of 5 or more failures out of 20 trials.
Probability of a block succeeding in turbine:

Probability of packet failure: P = 1 - (1 - network_packet_loss_rate)^2
FEC rate: K:M
Number of trials: N = K + M
Shred group failure rate: S = 1 - (SUM of i=0 -> M for binomial(prob_failure = P, trials = N, failures = i))
Shreds per block: G
Block success rate: B = (1 - S) ^ (G / N)
Binomial distribution for exactly i results with probability of P in N trials is defined as (N choose i) * P^i * (1 - P)^(N-i)
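These formulas can be checked numerically. A small sketch in Python (standard library only; not part of the validator codebase):

```python
from math import comb

def fec_stats(loss_rate: float, k: int, m: int, shreds_per_block: int):
    """Compute the per-packet failure probability P (over two hops),
    the shred group failure rate S, and the block success rate B for
    a K:M FEC rate, following the formulas in the text."""
    p = 1 - (1 - loss_rate) ** 2           # packet fails on either of 2 hops
    n = k + m                              # trials: shreds per group
    # group succeeds when at most m of the n shreds fail
    ok = sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(m + 1))
    s = 1 - ok                             # shred group failure rate
    b = (1 - s) ** (shreds_per_block / n)  # every group in the block must succeed
    return p, s, b
```

With a 15% network loss rate and a 16:4 FEC rate, this reproduces P ≈ 0.2775 and S ≈ 0.689 from the example below, and shows the block success rate collapsing to effectively zero.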

For example:

Network packet loss rate is 15%.
50k tps network generates 6400 shreds per second.
FEC rate increases the total shreds per block by the FEC ratio.

With a FEC rate: 16:4

G = 8000
P = 1 - 0.85 * 0.85 = 1 - 0.7225 = 0.2775
S = 1 - (SUM of i=0 -> 4 for binomial(prob_failure = 0.2775, trials = 20, failures = i)) = 0.689414
B = (1 - 0.689) ^ (8000 / 20) = 10^-203

With FEC rate of 16:16

G = 12800
S = 1 - (SUM of i=0 -> 16 for binomial(prob_failure = 0.2775, trials = 32, failures = i)) = 0.002132
B = (1 - 0.002132) ^ (12800 / 32) = 0.42583

With FEC rate of 32:32

G = 12800
S = 1 - (SUM of i=0 -> 32 for binomial(prob_failure = 0.2775, trials = 64, failures = i)) = 0.000048
B = (1 - 0.000048) ^ (12800 / 64) = 0.99045

Secure Vote Signing

A validator receives entries from the current leader and submits votes confirming those entries are valid. This vote submission presents a security challenge, because forged votes that violate consensus rules could be used to slash the validator’s stake.
The validator votes on its chosen fork by submitting a transaction that uses an asymmetric key to sign the result of its validation work. Other entities can verify this signature using the validator’s public key. If the validator’s key is used to sign incorrect data (e.g. votes on multiple forks of the ledger), the node’s stake or its resources could be compromised.

Validators, Vote Signers, and Stakeholders​

When a validator receives multiple blocks for the same slot, it tracks all possible forks until it can determine a “best” one. A validator selects the best fork by submitting a vote to it.
A stakeholder is an identity that has control of the staked capital. The stakeholder can delegate its stake to the vote signer. Once a stake is delegated, the vote signer’s votes represent the voting weight of all the delegated stakes, and produce rewards for all the delegated stakes.

Validator voting​

A validator node, at startup, creates a new vote account and registers it with the cluster via gossip. The other nodes on the cluster include the new validator in the active set. Subsequently, the validator submits a “new vote” transaction signed with the validator’s voting private key on each voting event.

Runtime

Native Programs in the Solana Runtime

Solana contains a small handful of native programs, which are required to run validator nodes. Unlike third-party programs, the native programs are part of the validator implementation and can be upgraded as part of cluster upgrades. Upgrades may occur to add features, fix bugs, or improve performance. Interface changes to individual instructions should rarely, if ever, occur. Instead, when change is needed, new instructions are added and previous ones are marked deprecated. Apps can upgrade on their own timeline without concern of breakages across upgrades.
For each native program, the program id and a description of each supported instruction are provided. A transaction can mix and match instructions from different programs, as well as include instructions from on-chain programs.

System Program

Create new accounts, allocate account data, assign accounts to owning programs, transfer lamports from System Program owned accounts and pay transaction fees.
Program id: 11111111111111111111111111111111
Instructions: SystemInstruction

Config Program

Add configuration data to the chain and the list of public keys that are permitted to modify it

Program id: Config1111111111111111111111111111111111111
Instructions: config_instruction
Unlike the other programs, the Config program does not define any individual instructions. It has just one implicit instruction, a “store” instruction. Its instruction data is a set of keys that gate access to the account, and the data to store in it.

Stake Program

Create and manage accounts representing stake and rewards for delegations to validators.
Program id: Stake11111111111111111111111111111111111111
Instructions: StakeInstruction

Vote Program

Create and manage accounts that track validator voting state and rewards.
Program id: Vote111111111111111111111111111111111111111
Instructions: VoteInstruction

Address Lookup Table Program

Program id: AddressLookupTab1e1111111111111111111111111
Instructions: AddressLookupTableInstruction

BPF Loader

Deploys, upgrades, and executes programs on the chain.

Program id: BPFLoaderUpgradeab1e11111111111111111111111
Instructions: LoaderInstruction

The BPF Upgradeable Loader marks itself as "owner" of the executable and program-data accounts it creates to store your program. When a user invokes an instruction via a program id, the Solana runtime will load both your program and its owner, the BPF Upgradeable Loader. The runtime then passes your program to the BPF Upgradeable Loader to process the instruction.

Ed25519 Program

Verify ed25519 signature program. This program takes an ed25519 signature, public key, and message. Multiple signatures can be verified. If any of the signatures fail to verify, an error is returned.

Program id: Ed25519SigVerify111111111111111111111111111
Instructions: new_ed25519_instruction
The ed25519 program processes an instruction. The first u8 is a count of the number of signatures to check, followed by a single byte of padding. After that, the following struct is serialized, one for each signature to check.

struct Ed25519SignatureOffsets {
    signature_offset: u16,             // offset to ed25519 signature of 64 bytes
    signature_instruction_index: u16,  // instruction index to find signature
    public_key_offset: u16,            // offset to public key of 32 bytes
    public_key_instruction_index: u16, // instruction index to find public key
    message_data_offset: u16,          // offset to start of message data
    message_data_size: u16,            // size of message data
    message_instruction_index: u16,    // index of instruction data to get message data
}
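The struct is seven little-endian u16 fields, 14 bytes per record. A hypothetical client-side sketch of serializing the instruction data (not the official SDK helper):

```python
import struct

def ed25519_instruction_data(offsets_list):
    """Build ed25519 program instruction data: a u8 count, one byte of
    padding, then one 14-byte Ed25519SignatureOffsets record per
    signature. Each entry in offsets_list is a 7-tuple of u16 values in
    the struct's field order."""
    data = struct.pack("<BB", len(offsets_list), 0)  # count + padding byte
    for offsets in offsets_list:
        data += struct.pack("<7H", *offsets)  # seven little-endian u16 fields
    return data
```

An instruction index of u16::MAX (0xFFFF) conventionally refers to the current instruction, per the pseudo code below.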

Pseudo code of the operation:

process_instruction() {
    for i in 0..count {
        // i'th index values referenced:
        instructions = &transaction.message().instructions
        instruction_index = ed25519_signature_instruction_index != u16::MAX ? ed25519_signature_instruction_index : current_instruction;
        signature = instructions[instruction_index].data[ed25519_signature_offset..ed25519_signature_offset + 64]
        instruction_index = ed25519_pubkey_instruction_index != u16::MAX ? ed25519_pubkey_instruction_index : current_instruction;
        pubkey = instructions[instruction_index].data[ed25519_pubkey_offset..ed25519_pubkey_offset + 32]
        instruction_index = ed25519_message_instruction_index != u16::MAX ? ed25519_message_instruction_index : current_instruction;
        message = instructions[instruction_index].data[ed25519_message_data_offset..ed25519_message_data_offset + ed25519_message_data_size]
        if pubkey.verify(signature, message) != Success {
            return Error
        }
    }
    return Success
}

Secp256k1 Program

Verify secp256k1 public key recovery operations (ecrecover).

Program id: KeccakSecp256k11111111111111111111111111111
Instructions: new_secp256k1_instruction
The secp256k1 program processes an instruction whose first byte is a count of the following struct, serialized repeatedly in the instruction data:

struct Secp256k1SignatureOffsets {
    secp_signature_offset: u16,            // offset to [signature,recovery_id] of 64+1 bytes
    secp_signature_instruction_index: u8,  // instruction index to find signature
    secp_pubkey_offset: u16,               // offset to ethereum_address pubkey of 20 bytes
    secp_pubkey_instruction_index: u8,     // instruction index to find pubkey
    secp_message_data_offset: u16,         // offset to start of message data
    secp_message_data_size: u16,           // size of message data
    secp_message_instruction_index: u8,    // instruction index to find message data
}
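Note the u8 instruction indices, unlike the all-u16 ed25519 layout; each record packs to 11 bytes. A hypothetical serialization sketch (field names follow the struct above; this is not the official SDK helper):

```python
import struct

def secp256k1_offsets_bytes(sig_off, sig_ix, pk_off, pk_ix,
                            msg_off, msg_size, msg_ix):
    """Serialize one Secp256k1SignatureOffsets record: little-endian,
    with u16 offsets/sizes and u8 instruction indices, 11 bytes total."""
    return struct.pack("<HBHBHHB", sig_off, sig_ix, pk_off, pk_ix,
                       msg_off, msg_size, msg_ix)
```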

Pseudo code of the operation:

process_instruction() {
  for i in 0..count {
      // i'th index values referenced:
      instructions = &transaction.message().instructions
      signature = instructions[secp_signature_instruction_index].data[secp_signature_offset..secp_signature_offset + 64]
      recovery_id = instructions[secp_signature_instruction_index].data[secp_signature_offset + 64]
      ref_eth_pubkey = instructions[secp_pubkey_instruction_index].data[secp_pubkey_offset..secp_pubkey_offset + 20]
      message_hash = keccak256(instructions[secp_message_instruction_index].data[secp_message_data_offset..secp_message_data_offset + secp_message_data_size])
      pubkey = ecrecover(signature, recovery_id, message_hash)
      eth_pubkey = keccak256(pubkey[1..])[12..]
      if eth_pubkey != ref_eth_pubkey {
          return Error
      }
  }
  return Success
}

This allows the user to specify any instruction data in the transaction for signature and message data. By specifying a special instructions sysvar, one can also receive data from the transaction itself.

Cost of the transaction will count the number of signatures to verify multiplied by the signature cost verify multiplier.

Optimization notes

The operation will have to take place after (at least partial) deserialization, but since all inputs come from the transaction data itself, it is relatively easy to execute in parallel with transaction processing and PoH verification.

Solana Sysvar Cluster Data

Solana exposes a variety of cluster state data to programs via sysvar accounts. These accounts are populated at known addresses published along with the account layouts in the solana-program crate, and outlined below.

There are two ways for a program to access a sysvar.

The first is to query the sysvar at runtime via the sysvar’s get() function:

let clock = Clock::get()

The following sysvars support get:

Clock
EpochSchedule
Fees
Rent
EpochRewards

The second is to pass the sysvar to the program as an account by including its address as one of the accounts in the Instruction and then deserializing the data during execution. Access to sysvars accounts is always readonly.

let clock_sysvar_info = next_account_info(account_info_iter)?;
let clock = Clock::from_account_info(&clock_sysvar_info)?;

The first method is more efficient and does not require that the sysvar account be passed to the program, or specified in the Instruction the program is processing.

Clock

The Clock sysvar contains data on cluster time, including the current slot, epoch, and estimated wall-clock Unix timestamp. It is updated every slot.
Address: SysvarC1ock11111111111111111111111111111111

Layout: Clock

Fields:

slot: the current slot
epoch_start_timestamp: the Unix timestamp of the first slot in this epoch. In the first slot of an epoch, this timestamp is identical to the unix_timestamp (below).
epoch: the current epoch
leader_schedule_epoch: the most recent epoch for which the leader schedule has already been generated
unix_timestamp: the Unix timestamp of this slot.
Each slot has an estimated duration based on Proof of History. But in reality, slots may elapse faster and slower than this estimate. As a result, the Unix timestamp of a slot is generated based on oracle input from voting validators. This timestamp is calculated as the stake-weighted median of timestamp estimates provided by votes, bounded by the expected time elapsed since the start of the epoch.
More explicitly: for each slot, the most recent vote timestamp provided by each validator is used to generate a timestamp estimate for the current slot (each slot elapsed since the vote timestamp is assumed to take Bank::ns_per_slot). Each timestamp estimate is associated with the stake delegated to that vote account to create a distribution of timestamps by stake. The median timestamp is used as the unix_timestamp, unless the elapsed time since the epoch_start_timestamp has deviated from the expected elapsed time by more than 25%.
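A simplified sketch of that calculation (illustrative only; the function name, inputs, and exact clamping are assumptions, not the validator's code):

```python
def estimate_unix_timestamp(estimates, epoch_start, expected_elapsed):
    """Stake-weighted median of per-validator timestamp estimates,
    bounded so the elapsed time since epoch_start stays within 25% of
    the expected elapsed time. `estimates` is a list of
    (timestamp_estimate, stake) pairs."""
    total_stake = sum(stake for _, stake in estimates)
    median = None
    acc = 0
    for ts, stake in sorted(estimates):
        acc += stake
        if acc * 2 >= total_stake:  # first estimate covering half the stake
            median = ts
            break
    lo = epoch_start + int(expected_elapsed * 0.75)
    hi = epoch_start + int(expected_elapsed * 1.25)
    return max(lo, min(hi, median))
```

A single high-stake validator with a skewed clock can pull the median, but the 25% bound caps how far the reported timestamp can drift from PoH-expected time.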

EpochSchedule

The EpochSchedule sysvar contains epoch scheduling constants that are set in genesis, and enables calculating the number of slots in a given epoch, the epoch for a given slot, etc. (Note: the epoch schedule is distinct from the leader schedule)

Address: SysvarEpochSchedu1e111111111111111111111111
Layout: EpochSchedule

Fees

The Fees sysvar contains the fee calculator for the current slot. It is updated every slot, based on the fee-rate governor.
Address: SysvarFees111111111111111111111111111111111
Layout: Fees

Instructions

The Instructions sysvar contains the serialized instructions in a Message while that Message is being processed. This allows program instructions to reference other instructions in the same transaction. Read more information on instruction introspection.

Address: Sysvar1nstructions1111111111111111111111111
Layout: Instructions

RecentBlockhashes

The RecentBlockhashes sysvar contains the active recent blockhashes as well as their associated fee calculators. It is updated every slot. Entries are ordered by descending block height, so the first entry holds the most recent block hash, and the last entry holds an old block hash.
Address: SysvarRecentB1ockHashes11111111111111111111
Layout: RecentBlockhashes

Rent

The Rent sysvar contains the rental rate. Currently, the rate is static and set in genesis. The Rent burn percentage is modified by manual feature activation.
Address: SysvarRent111111111111111111111111111111111
Layout: Rent

SlotHashes

The SlotHashes sysvar contains the most recent hashes of the slot’s parent banks. It is updated every slot.
Address: SysvarS1otHashes111111111111111111111111111
Layout: SlotHashes

SlotHistory

The SlotHistory sysvar contains a bitvector of slots present over the last epoch. It is updated every slot.
Address: SysvarS1otHistory11111111111111111111111111
Layout: SlotHistory

StakeHistory

The StakeHistory sysvar contains the history of cluster-wide stake activations and de-activations per epoch. It is updated at the start of every epoch.
Address: SysvarStakeHistory1111111111111111111111111
Layout: StakeHistory

EpochRewards

The EpochRewards sysvar tracks the progress of epoch rewards distribution. The sysvar is created in the first block of the epoch, and lasts for several blocks while paying out the rewards. When all rewards have been distributed, the sysvar is deleted. Unlike other sysvars, which almost always exist on-chain, EpochRewards sysvar only exists during the reward period. Therefore, calling EpochRewards::get() on blocks that are outside of the reward period will return an error, i.e. UnsupportedSysvar. This can serve as a method for determining whether epoch rewards distribution has finished.

Address: SysvarEpochRewards1111111111111111111111111
Layout: EpochRewards

LastRestartSlot

The LastRestartSlot sysvar contains the slot number of the last restart or 0 (zero) if none ever happened.

Address: SysvarLastRestartS1ot1111111111111111111111
Layout: LastRestartSlot

Solana ZK Token Proof Program

The native Solana ZK Token proof program verifies a number of zero-knowledge proofs that are tailored to work with Pedersen commitments and ElGamal encryption over the elliptic curve curve25519. The program was originally designed to verify the zero-knowledge proofs that are required for the SPL Token 2022 program. However, the zero-knowledge proofs in the proof program can be used in more general contexts outside of SPL Token 2022 as well.

Program id: ZkTokenProof1111111111111111111111111111111
Instructions: ProofInstruction

Pedersen commitments and ElGamal encryption

The ZK Token proof program verifies zero-knowledge proofs for Pedersen commitments and ElGamal encryption, which are common cryptographic primitives that are incorporated in many existing cryptographic protocols.

ElGamal encryption is a popular instantiation of a public-key encryption scheme. An ElGamal keypair consists of an ElGamal public key and an ElGamal secret key. Messages can be encrypted under a public key to produce a ciphertext. A ciphertext can then be decrypted using a corresponding ElGamal secret key. The variant that is used in the proof program is the twisted ElGamal encryption over the elliptic curve curve25519.

The Pedersen commitment scheme is a popular instantiation of a cryptographic commitment scheme. A commitment scheme allows a user to wrap a message into a commitment with a purpose of revealing the committed message later on. Like a ciphertext, the resulting commitment does not reveal any information about the containing message. At the same time, the commitment is binding in that the user cannot change the original value that is contained in a commitment.
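In symbols (a standard presentation of these schemes; the exact generator points used by the program are defined in its source):

```latex
% Pedersen commitment to a message m with randomness r,
% over fixed curve points G and H:
C = m \cdot G + r \cdot H
% Hiding: r masks m. Binding: opening C as (m', r') with m' \neq m
% would require knowing the discrete log of H with respect to G.

% Twisted ElGamal: secret key s, public key P = s^{-1} \cdot H.
% A ciphertext pairs a Pedersen commitment with a decryption handle:
\mathsf{ct} = (C, D) = (m \cdot G + r \cdot H,\; r \cdot P)
% Decryption recovers m \cdot G, since s \cdot D = r \cdot H:
C - s \cdot D = m \cdot G
```

The first component of a twisted ElGamal ciphertext is itself a Pedersen commitment, which is what lets the same proofs work with both primitives.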

Interested readers can refer to the following resources for a more in-depth treatment of the Pedersen commitment and the (twisted) ElGamal encryption schemes.

Notes on the twisted ElGamal encryption
A technical overview of the SPL Token 2022 confidential extension
Pretty Good Confidentiality research paper

The ZK Token proof program contains proof verification instructions on various zero-knowledge proofs for working with the Pedersen commitment and ElGamal encryption schemes. For example, the VerifyRangeProofU64 instruction verifies a zero-knowledge proof certifying that a Pedersen commitment contains an unsigned 64-bit number as the message. The VerifyPubkeyValidity instruction verifies a zero-knowledge proof certifying that an ElGamal public key is a properly formed public key.

Context Data

The proof data associated with each of the ZK Token proof instructions are logically divided into two parts:

  1. The context component contains the data that a zero-knowledge proof is certifying. For example, the context component for a VerifyRangeProofU64 instruction is the Pedersen commitment that holds an unsigned 64-bit number. The context component for a VerifyPubkeyValidity instruction is the ElGamal public key being certified as properly formed.
  2. The proof component contains the actual mathematical pieces that certify different properties of the context data.

The ZK Token proof program processes a proof instruction in two steps:

  1. Verify the zero-knowledge proof data associated with the proof instruction.
  2. If specified in the instruction, the program stores the context data in a dedicated context state account.

The simplest way to use a proof instruction is to execute it without producing a context state account. In this case, the proof instruction can be included as part of a larger Solana transaction that contains instructions of other Solana programs. Programs should directly access the context data from the proof instruction data and use it in their program logic.
Alternatively, a proof instruction can be executed to produce a context state account. In this case, the context data associated with a proof instruction persists even after the transaction containing the proof instruction is finished with its execution. The creation of context state accounts can be useful in settings where ZK proofs are required from PDAs or when proof data is too large to fit inside a single transaction.

Proof Instructions

The ZK Token proof program supports the following list of zero-knowledge proofs.

Proofs on ElGamal encryption​

VerifyPubkeyValidity:
The ElGamal public-key validity proof instruction certifies that an ElGamal public-key is a properly formed public key.
Mathematical description and proof of security: [Notes]

VerifyZeroBalance:
The zero-balance proof certifies that an ElGamal ciphertext encrypts the number zero.
Mathematical description and proof of security: [Notes]

Equality proofs​
VerifyCiphertextCommitmentEquality:
The ciphertext-commitment equality proof certifies that an ElGamal ciphertext and a Pedersen commitment encode the same message.
Mathematical description and proof of security: [Notes]

VerifyCiphertextCiphertextEquality:
The ciphertext-ciphertext equality proof certifies that two ElGamal ciphertexts encrypt the same message.
Mathematical description and proof of security: [Notes]
