proof of storage, Filecoin, Proof of Replication

Verifying Storage on Filecoin



Related Concepts

a) Location addressing and content addressing

a.1) Location addressing

Location addressing points us to the location where data is stored by a specific entity.

a.2) Location addressing with URLs

URLs are based on the location where data is stored, not on the contents of the resource stored there. We call this location addressing, and it presents us with some problems.

Ultimately, the contents of a file hosted on the centralized web have no direct relationship with their location-based addresses. If we see a picture of an adorable puppy and are told it’s stored on the web, there’s no way for us to guess the URL that would lead us to the image. We can determine neither the domain, which tells us who’s hosting it, nor the filename.

a.3) Content addressing

Content addressing instead provides a unique, content-derived identifier for the data, which we can use to retrieve the data from a variety of sources.

Cryptographic hashing is the most important tool in the toolbox of decentralized data structures. It opens the door to a new form of linking, known as content addressing.

Cryptographic hashes can be derived from the content of the data itself, meaning that anyone using the same algorithm on the same data will arrive at the same hash.

Cryptographic hashes are effectively unique: finding two different inputs that produce the same hash is computationally infeasible. If Grace uses Photoshop to remove a single whisker from a picture of a kitten, the updated image will have a new hash. Simply by comparing the hashes, even without access to the files themselves, it is easy to tell that the file now contains different data.
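
As a minimal sketch of this property, the snippet below hashes three byte strings with SHA-256 from Python's standard library. The byte strings are made-up stand-ins for image files, and `content_hash` is a hypothetical helper name, not part of any IPFS or Filecoin API.

```python
import hashlib

def content_hash(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()

original = b"kitten image bytes"   # stand-in for the original image file
copy = b"kitten image bytes"       # the same bytes obtained from another source
edited = b"kitten image bytes, one whisker removed"

# Same content gives the same hash; any change gives a different hash.
same = content_hash(original) == content_hash(copy)
changed = content_hash(original) != content_hash(edited)
print(same, changed)
```

Anyone who runs the same algorithm on the same bytes gets the same digest, which is what lets a hash act as an address for the content.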

b) IPLD

IPLD is the data model of the content-addressable web. It allows us to treat all hash-linked data structures as subsets of a unified information space, unifying all data models that link data with hashes as instances of IPLD.

To make this a reality, a project called InterPlanetary Linked Data (IPLD) is developing an ecosystem of Merkle-DAG-based data formats and their formal descriptions, supporting wide-ranging data interchange.

c) Merkle DAG

Since CIDs can uniquely identify a node, we can use them to express an edge from one node to another.

Q: How do we build a Merkle DAG?
A:
1) The first step is to encode the leaf nodes of our graph and give each of them a CID.
Note: we’ll simplify the representation of these nodes to two attributes: the name of the file, and the data corresponding to the file’s contents. These attributes, bundled together, make up the data of our node, represented below in the orange box.

[Figure: a leaf node whose data consists of a file name and the file's contents]

2) The label above the node is a simplified representation of the unique CID that’s derived by passing the data of the node itself through our cryptographic hashing algorithm.
Note: this label is not a part of the node itself.

3) We can begin building our Merkle DAG by creating its leaf nodes first (one node for every file in our hierarchy), labelling each with its unique CID:

[Figure: the leaf nodes of the hierarchy, each labelled with its CID]

4) The node structure for our intermediate nodes (the subdirectories of our hierarchy) has to be a little bit different. Each of these nodes will also contain a name, corresponding to the name of the directory; however, the “content” of a directory node is the list of files and directories it contains, rather than the content of any specific file. We can represent this as a list of CIDs, each of which links to another node in the graph. This list, together with the name of the directory, constitutes the data for these nodes, and from this data we can again derive a CID, as shown below:
[Figure: a directory node whose data consists of its name and a list of child CIDs]
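
The four steps above can be sketched in a few lines of Python. This is a toy model, not the real IPLD or UnixFS encoding: `cid_of`, `file_node` and `dir_node` are hypothetical helpers, and the "CID" here is simply the hex SHA-256 of a node's JSON encoding.

```python
import hashlib
import json

def cid_of(node: dict) -> str:
    """Toy CID: hex SHA-256 of the node's canonical JSON encoding."""
    encoded = json.dumps(node, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()

def file_node(name: str, data: str) -> dict:
    """Leaf node: a file name plus the file's contents."""
    return {"name": name, "data": data}

def dir_node(name: str, children: list) -> dict:
    """Intermediate node: a directory name plus the CIDs of its children."""
    return {"name": name, "links": [cid_of(child) for child in children]}

# Build the leaves first, then the directory that links to them.
cat = file_node("cat.txt", "meow")
dog = file_node("dog.txt", "woof")
pets = dir_node("pets", [cat, dog])
root_cid = cid_of(pets)

# Editing a leaf changes its CID, and therefore the CID of every ancestor.
dog_edited = file_node("dog.txt", "WOOF")
edited_root_cid = cid_of(dir_node("pets", [cat, dog_edited]))
print(root_cid != edited_root_cid)
```

Because a directory's CID is derived from the CIDs it links to, the graph has to be built from the leaves upward, and a single root CID commits to the entire hierarchy.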

d) CID

This section looks at the anatomy of the CID itself, which each of these distributed information systems uses as the core identifier to reference content.

A content identifier, or CID, is a self-describing content-addressed identifier. It doesn’t indicate where content is stored, but it forms a kind of address based on the content itself. The number of characters in a CID depends on the cryptographic hash of the underlying content, rather than the size of the content itself.

Q: How is a CID created?
A:
The first step in creating a CID is to transform the input data using a cryptographic algorithm that maps input of arbitrary size (data or a file) to output of a fixed size. The result of this transformation is known as a cryptographic hash digest, or simply a hash.

Multihash

In order to support multiple cryptographic algorithms, we need to be able to know which algorithm was used to generate the hash of specific content.

Multihash format

Multihashes follow the TLV pattern (type-length-value). Essentially, the “original hash” is prefixed with the type of hashing algorithm applied and the length of the hash.

type: identifier of the cryptographic algorithm used to generate the hash (e.g. the identifier of sha2-256 would be 18 - 0x12 in hexadecimal) - see the multicodec table for all the identifiers
length: the actual length of the hash (using sha2-256 it would be 256 bits, which equates to 32 bytes)
value: the actual hash value
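
A minimal sketch of the type-length-value layout, assuming sha2-256 and single-byte prefixes. In the actual multihash format the type and length are variable-length integers; one byte each is enough here only because 0x12 and 32 are both below 128. The function names are hypothetical.

```python
import hashlib

SHA2_256_CODE = 0x12  # identifier of sha2-256 (18 in decimal)

def multihash_sha2_256(data: bytes) -> bytes:
    """Prefix the raw sha2-256 digest with its type and length bytes."""
    digest = hashlib.sha256(data).digest()
    return bytes([SHA2_256_CODE, len(digest)]) + digest

def parse_multihash(mh: bytes):
    """Split a single-byte-prefixed multihash into (type, length, value)."""
    code, length = mh[0], mh[1]
    value = mh[2:2 + length]
    if len(value) != length:
        raise ValueError("digest is shorter than the declared length")
    return code, length, value

mh = multihash_sha2_256(b"hello")
code, length, value = parse_multihash(mh)
print(hex(code), length, value.hex())
```

The two prefix bytes are what make the hash self-describing: a reader can tell which algorithm produced the digest without any outside context.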

In order to represent a CID as a compact string instead of plain binary (a series of 1s and 0s), we can use base encoding. When IPFS was first created, it used base58btc encoding to create CIDs that looked like this:
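
A rough sketch of the base-encoding step, assuming a sha2-256 multihash computed over raw bytes. Real IPFS CIDs hash an encoded file node rather than the raw bytes, so the string printed here is not the IPFS CID of any file. `base58btc_encode` is a hand-written helper using the Bitcoin base58 alphabet, not a library call.

```python
import hashlib

# Bitcoin base58 alphabet: no 0, O, I or l, to avoid look-alike characters.
BASE58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

def multihash_sha2_256(data: bytes) -> bytes:
    """Type byte 0x12, length byte 32, then the sha2-256 digest."""
    digest = hashlib.sha256(data).digest()
    return bytes([0x12, len(digest)]) + digest

def base58btc_encode(raw: bytes) -> str:
    """Encode bytes as base58btc; each leading zero byte becomes a '1'."""
    number = int.from_bytes(raw, "big")
    encoded = ""
    while number > 0:
        number, remainder = divmod(number, 58)
        encoded = BASE58_ALPHABET[remainder] + encoded
    leading_zeros = len(raw) - len(raw.lstrip(b"\x00"))
    return "1" * leading_zeros + encoded

text_form = base58btc_encode(multihash_sha2_256(b"hello"))
print(text_form)
```

Since every sha2-256 multihash begins with the same two bytes (0x12, 0x20), its base58btc form always begins with the characters `Qm`.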


1. Preparing data for storage

Before a system file (e.g. puppy.gif) can be stored on the Filecoin network, it must first be transformed into a Filecoin Piece.

In the first stage of this transformation, the system file is chunked up with UnixFS to create an IPLD DAG (Directed Acyclic Graph), a form of Merkle tree. This IPLD DAG has a payload CID, identical to an IPFS CID, which represents the root of the DAG.

The IPLD DAG is then serialized to a CAR file and bit padded to make a Filecoin Piece. (Bit padding adds extra bits to make the piece conform to a standard size.) This piece has a unique piece CID, also known as a CommP (Piece Commitment).
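
As a rough illustration of "conforming to a standard size", the sketch below assumes the commonly described Filecoin scheme in which every 127 bytes of data expand to 128 bytes after bit padding and padded piece sizes are powers of two. Neither detail is stated in the text above, so treat both as assumptions; the function names are hypothetical.

```python
def unpadded_capacity(padded_size: int) -> int:
    """Bytes of payload that fit in a padded piece (assumed 127:128 ratio)."""
    return padded_size // 128 * 127

def padded_piece_size(payload_bytes: int) -> int:
    """Smallest power-of-two padded size whose capacity holds the payload."""
    if payload_bytes <= 0:
        raise ValueError("payload must contain at least one byte")
    padded = 128
    while unpadded_capacity(padded) < payload_bytes:
        padded *= 2
    return padded

car_size = 1000  # hypothetical size of a serialized CAR file, in bytes
padded = padded_piece_size(car_size)
print(padded, unpadded_capacity(padded))
```

Under these assumptions a 1000-byte CAR file lands in a 1024-byte padded piece with room for 1016 bytes of data.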

Since payload CIDs and piece CIDs are cryptographic hashes of the data itself, they’re unique, with identical CIDs guaranteeing identical content. Identical IPLD DAGs will produce identical payload CIDs, and identical pieces will produce identical piece CIDs.

2. Negotiating a storage deal and transferring data

When a client negotiates a storage deal with a miner, they’re hiring them to store a piece of data, which might be a whole or partial file. Miners store these pieces from one or more clients in sectors, the fundamental storage unit used by Filecoin. Sectors come in a variety of sizes, and a client can store data up to the largest sector size per deal.

A piece CID is wrapped with other deal parameters to create a Deal Proposal. The Deal Proposal has its own unique CID, the deal CID, which contains information about the data itself (in the form of the piece CID), the identities of the miner and client, and other important transaction details.

The client sends this deal proposal to a miner, who agrees to store their data. Once the miner has confirmed, the client transfers their data to the miner. Once the miner has the data and verifies that it matches the piece CID noted in the deal proposal, they publish the deal proposal on Filecoin’s blockchain, committing both parties to the deal.
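
The negotiation flow can be sketched as follows. This is a toy model under loud assumptions: `toy_cid` uses plain SHA-256 in place of the real piece commitment and CID encoding, the proposal is a plain dictionary with made-up field names, and nothing is published to any chain.

```python
import hashlib
import json

def toy_cid(data: bytes) -> str:
    """Stand-in for a CID: hex SHA-256 of the bytes."""
    return hashlib.sha256(data).hexdigest()

def make_deal_proposal(piece: bytes, client: str, provider: str,
                       price_per_epoch: int, duration: int) -> dict:
    """Wrap the piece CID with the other deal parameters."""
    return {
        "piece_cid": toy_cid(piece),
        "client": client,
        "provider": provider,
        "price_per_epoch": price_per_epoch,
        "duration": duration,
    }

def deal_cid(proposal: dict) -> str:
    """The proposal gets its own identifier, derived from its contents."""
    return toy_cid(json.dumps(proposal, sort_keys=True).encode("utf-8"))

def provider_accepts(proposal: dict, received: bytes) -> bool:
    """The miner checks the transferred data against the proposed piece CID."""
    return toy_cid(received) == proposal["piece_cid"]

piece = b"padded piece bytes"
proposal = make_deal_proposal(piece, "client-1", "t01000", 1000000, 2744)
print(deal_cid(proposal))
print(provider_accepts(proposal, piece), provider_accepts(proposal, b"other"))
```

The check in `provider_accepts` mirrors the step where the miner only publishes the deal once the received data matches the piece CID in the proposal.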

3. Proof of Replication (PoRep)

(a) Filling sectors and generating the CommD

As the storage miner receives each piece of client data, they place it into a sector. Sectors are the fundamental units of storage in Filecoin, and can contain pieces from multiple deals and clients.

Once a sector is full, a CommD (Commitment of Data, aka UnsealedSectorCID) is produced, representing the root node of all the piece CIDs contained in the sector. It can be thought of as a unique identifier for the sector's unsealed contents.
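
A simplified sketch of deriving one root from several piece commitments. It uses a plain binary Merkle tree over SHA-256 and duplicates the last node on odd-sized levels; both are choices made for this example, and Filecoin's actual CommD construction (hash function, padding and piece alignment) differs.

```python
import hashlib

def hash_pair(left: bytes, right: bytes) -> bytes:
    """Parent node: hash of the two children concatenated."""
    return hashlib.sha256(left + right).digest()

def merkle_root(leaves: list) -> bytes:
    """Fold a list of leaf hashes into a single root hash."""
    if not leaves:
        raise ValueError("at least one leaf is required")
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # sketch choice: duplicate the odd node
        level = [hash_pair(level[i], level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]

# Hypothetical piece commitments for three pieces placed in one sector.
piece_cids = [hashlib.sha256(p).digest()
              for p in (b"piece-a", b"piece-b", b"piece-c")]
comm_d = merkle_root(piece_cids)
print(comm_d.hex())
```

Swapping out any single piece changes the root, which is why one commitment can stand for the whole sector.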

(b) Sealing sectors and producing the CommR

Next, a process called sealing takes place.

During sealing, the sector data (identified by the CommD) is encoded through a sequence of graph and hashing processes to create a unique replica. The root hash of the Merkle tree of the resulting replica is the CommRLast.

The CommRLast is then hashed together with the CommC (another merkle root output from Proof of Replication). This generates the CommR (Commitment of Replication, aka SealedSectorCID), which is recorded to the public blockchain. The CommRLast is saved privately by the miner for future use in Proof of Spacetime, but is not saved to the chain.
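
A minimal sketch of this last step, with SHA-256 standing in for the hash function Filecoin actually uses and placeholder byte strings standing in for the two Merkle roots. The concatenation order is also an assumption of this sketch.

```python
import hashlib

def commitment(*parts: bytes) -> bytes:
    """Toy commitment: SHA-256 over the concatenated inputs."""
    return hashlib.sha256(b"".join(parts)).digest()

# Placeholder roots; in practice both are outputs of the sealing process.
comm_c = commitment(b"toy column commitment root")
comm_r_last = commitment(b"toy replica merkle root")

# Public value recorded on chain; comm_r_last itself stays with the miner.
comm_r = commitment(comm_c, comm_r_last)
print(comm_r.hex())
```

Anyone holding `comm_c` and `comm_r_last` can recompute `comm_r`, but the public `comm_r` alone does not reveal either input.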

The encoding process is designed to be slow and computationally heavy, making it difficult to spoof. (Note that encoding is not the same as encryption. If you want to store private data, you must encrypt it before adding it to the Filecoin network.)

The CommR offers the proof we need that the miner is storing a physically unique copy of the client’s data. If you store the same data with multiple storage miners, or make multiple storage deals for the same data with a single miner, each deal will have a different CommR.
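
To see why two replicas of the same data can differ, here is a deliberately simplified toy. It XORs the data with a key stream derived from a miner ID and sector number. This is not Filecoin's sealing algorithm: real sealing is slow by design and uses graph-based encoding, while this toy is fast and only illustrates that the encoding is reversible and unique per replica. All names are hypothetical.

```python
import hashlib

def key_stream(seed: bytes, length: int) -> bytes:
    """Expand a seed into `length` pseudo-random bytes with SHA-256."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(seed + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])

def toy_seal(data: bytes, miner_id: str, sector_number: int) -> bytes:
    """Encode the data with a key unique to this miner and sector."""
    seed = f"{miner_id}:{sector_number}".encode("utf-8")
    key = key_stream(seed, len(data))
    return bytes(d ^ k for d, k in zip(data, key))

def toy_unseal(replica: bytes, miner_id: str, sector_number: int) -> bytes:
    """XOR is its own inverse, so unsealing reuses the same key stream."""
    return toy_seal(replica, miner_id, sector_number)

data = b"the same client data stored in two different deals" * 2
replica_a = toy_seal(data, "t01000", 7)
replica_b = toy_seal(data, "t02000", 3)
print(replica_a != replica_b, toy_unseal(replica_a, "t01000", 7) == data)
```

Because the replicas differ byte for byte, any commitment computed over them differs too, even though both decode to the same client data.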

The sealing process also compresses the Proof of Replication using zk-SNARKs to keep the chain smaller so that it can be stored by all members of the Filecoin network for verification purposes. zk-SNARKs are covered in more detail in a later section.

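4. Proof of Spacetime (PoSt)

While Proof of Replication runs once to prove that a miner stored a physically unique copy of the data at the time the sector was sealed, Proof of Spacetime (PoSt) runs repeatedly to prove that the miner continues to dedicate storage space to that same data over time.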

PoSt builds upon several elements created during Proof of Replication: the replica, the privately saved CommRLast, and the publicly known CommR.

First, PoSt randomly selects some leaf nodes of the encoded replica and runs merkle inclusion proofs on them to show that the miner has the specific bytes that should be there. Then, the miner uses the privately stored CommRLast to prove (without revealing its value) that they know of a root for the replica which both agrees with the inclusion proofs and can be used to derive the publicly-known CommR.
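
The Merkle inclusion proofs mentioned above can be sketched like this. The example uses a plain SHA-256 binary tree over a toy eight-leaf "replica" and omits the zero-knowledge part entirely: here the verifier sees the root, whereas in PoSt the CommRLast stays private. The function names are hypothetical.

```python
import hashlib
import random

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_tree(leaves: list) -> list:
    """Return every level of the tree, hashed leaves first, root last."""
    if not leaves or len(leaves) & (len(leaves) - 1):
        raise ValueError("leaf count must be a power of two")
    level = [h(leaf) for leaf in leaves]
    levels = [level]
    while len(level) > 1:
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def prove(levels: list, index: int) -> list:
    """Collect the sibling hash at each level on the way up to the root."""
    path = []
    for level in levels[:-1]:
        path.append(level[index ^ 1])
        index //= 2
    return path

def verify(root: bytes, leaf: bytes, index: int, path: list) -> bool:
    """Recompute the root from one leaf and its sibling path."""
    node = h(leaf)
    for sibling in path:
        node = h(node + sibling) if index % 2 == 0 else h(sibling + node)
        index //= 2
    return node == root

replica = [bytes([i]) * 32 for i in range(8)]  # toy replica of 8 leaves
levels = build_tree(replica)
root = levels[-1][0]

# Challenge a random selection of leaves, as PoSt does.
challenges = random.sample(range(len(replica)), 3)
ok = all(verify(root, replica[i], i, prove(levels, i)) for i in challenges)
print(ok)
```

A miner who has discarded the challenged leaves cannot produce a path that hashes up to the committed root, which is what makes random challenges effective.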

The final stage of PoSt compresses these proofs into a single zk-SNARK.

Q: How does the Proof of Spacetime confirm that given data is stored over time?
A: It regularly checks to ensure that a random selection of encoded data is present in the right location.

5. zk-SNARKs

Both the Proof of Replication and Proof of Spacetime processes in Filecoin use zk-SNARKs for compression.

zk-SNARKs stands for “Zero-Knowledge Succinct Non-Interactive Arguments of Knowledge.” You can think of them as hashes of computations. They let us prove that a proof has been done correctly without having to reveal the details of the proof itself or the underlying data on which it’s based.

The process of creating Filecoin’s zk-SNARKs is computationally expensive (slow), but the resulting end product is small and the verification process is very fast. Compared to the original proofs, zk-SNARKs are tiny, making them efficient to store in a blockchain. For example, a proof that would have taken up hundreds of kilobytes on the Filecoin chain can be compressed to just 192 bytes using a zk-SNARK.

As mentioned previously, everyone running a Filecoin node maintains an up-to-date version of the chain for verification purposes. Keeping each proof small with the assistance of zk-SNARKs minimizes the storage demands placed on each node in the Filecoin network, as well as the length of time it takes to verify a transaction.

Q: Why does Filecoin use zk-SNARKs?
A: To compress proofs and keep the chain smaller.

6. Verifying your deal

Once compressed, the key data needed to verify storage is stored on the Filecoin chain, a copy of which is maintained by each user running a node. This allows for Proof of Spacetime to run regularly over time.

$ lotus client list-deals

DealCid:    bafyreiefvrrv5j7omqzfersogg4nqzctyzj66rcmkwkbxxx5prvd5sklci
DealId:     2
Provider:   t01000
State:      StorageDealActive
On Chain?:  Y (epoch 59)
Slashed?:   N
PieceCID:   bafk4chzazx6u4luj34azuit37rlylgrcbgkaakqsjt5avsbolxale2igii3q
Size:       1016
Price:      1000000
Duration:   2744

DealCid: Content identifier (CID) for the deal proposal.
DealId: A unique ID for the deal.
Provider: A unique identifier for the storage provider with whom the deal was made, also known as a storage miner.
State: The state of the deal. This will most often be StorageDealActive once the data is stored and sealed. (Note that currently this will stay as StorageDealActive even after the duration of the deal expires or the miner fails a Proof of Spacetime, so it’s important to refer to the slashed field for the latter case.)
On Chain?: A boolean indicating whether the deal has been stored on the chain. If positive, this field will also indicate the epoch in which the data was stored. An epoch is a specific point on the chain. Lower numbers are further back in the history of the chain, while higher numbers are more recent.
Slashed?: A boolean indicating whether the storage provider has failed a Proof of Spacetime. (If the miner stops storing your data, this value will change to Y and the miner will be penalized.)
PieceCID: A CID (Content Identifier) representing the stored data, also known as CommP (Piece Commitment)
Size: The bytes of data being stored.
Price: The price per epoch in Filecoin Token (FIL) for the storage deal.
Duration: The total duration of the agreed deal in epochs (one iteration of the blockchain, currently equivalent to 30 seconds).
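
As a small worked example of reading these fields, the sketch below parses a few lines copied from the sample output above and derives the deal's length and total price. The parser is a hypothetical helper, not part of lotus, and the total is expressed in whatever unit the Price field uses.

```python
EPOCH_SECONDS = 30  # one epoch, per the Duration description above

def parse_deal(text: str) -> dict:
    """Turn 'Key:   value' lines into a dictionary of strings."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return fields

sample = """
DealId:     2
State:      StorageDealActive
Slashed?:   N
Price:      1000000
Duration:   2744
"""

deal = parse_deal(sample)
duration_hours = int(deal["Duration"]) * EPOCH_SECONDS / 3600
total_price = int(deal["Price"]) * int(deal["Duration"])

# State alone is not enough; also check that the provider was not slashed.
healthy = deal["State"] == "StorageDealActive" and deal["Slashed?"] == "N"
print(round(duration_hours, 2), total_price, healthy)
```

For the sample deal, 2744 epochs at 30 seconds each come to just under 23 hours.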

