The document provides an overview of the InterPlanetary File System (IPFS) and its key components. IPFS aims to create a distributed file system that addresses issues with the existing internet such as bandwidth, latency, offline support, and data security. It utilizes various technologies including distributed hash tables (DHTs), BitTorrent exchanges, and a Merkle directed acyclic graph (DAG) to store and retrieve versioned files in a decentralized manner. The document discusses IPFS concepts like content identifiers (CIDs), IPNS for mutable links, pinning for long-term data retention, and UnixFS for file representation. It also outlines several potential use cases for IPFS and challenges around automatic data replication.
11. The big things before IPFS
• DHT: Distributed Hash Table
• Kademlia DHT: query is O(log2 N). Used widely by Gnutella and BitTorrent
• Coral DSHT: makes storage and bandwidth usage more efficient than Kademlia
• S/Kademlia DHT: adds PoW to prevent attacks (PoW on node ID generation)
• BitTorrent: incentivized by tit-for-tat and prioritizes the rarest blocks
• Git: Merkle DAG
13. What is DHT?
A distributed hash table (DHT) is a class of a decentralized distributed system
that provides a lookup service similar to a hash table: (key, value) pairs are
stored in a DHT, and any participating node can efficiently retrieve the value
associated with a given key. Keys are unique identifiers which map to particular
values, which in turn can be anything from addresses, to documents, to
arbitrary data.[1] Responsibility for maintaining the mapping from keys to
values is distributed among the nodes, in such a way that a change in the set of
participants causes a minimal amount of disruption.
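To make the lookup idea concrete, here is a minimal Go sketch of how a Kademlia-style DHT decides which node is responsible for a key: keys and node IDs live in the same identifier space, and the node whose ID has the smallest XOR distance to the key stores the value. The names `newID`, `distance`, and `closest`, the node names, and the use of SHA-256 for IDs are assumptions made for illustration, not part of any IPFS API.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"math/big"
	"sort"
)

// id is a 256-bit identifier shared by keys and nodes (SHA-256 is an assumption).
type id [sha256.Size]byte

// newID hashes an arbitrary string into the identifier space.
func newID(s string) id { return sha256.Sum256([]byte(s)) }

// distance returns the Kademlia XOR distance between a and b as an integer.
func distance(a, b id) *big.Int {
	var x [sha256.Size]byte
	for i := range a {
		x[i] = a[i] ^ b[i]
	}
	return new(big.Int).SetBytes(x[:])
}

// closest returns the node names ordered by XOR distance to key, nearest first.
func closest(key id, nodes []string) []string {
	sorted := append([]string(nil), nodes...)
	sort.Slice(sorted, func(i, j int) bool {
		di := distance(newID(sorted[i]), key)
		dj := distance(newID(sorted[j]), key)
		return di.Cmp(dj) < 0
	})
	return sorted
}

func main() {
	nodes := []string{"node-a", "node-b", "node-c", "node-d"} // hypothetical peers
	key := newID("hello.txt")
	fmt.Println("node responsible for key:", closest(key, nodes)[0])
}
```

Because responsibility depends only on distance to the key, a node joining or leaving changes ownership only for the keys nearest to it, which is the "minimal amount of disruption" property described above.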
17. IPFS & BitTorrent
• Similarity:
• exchange of data (blocks) in IPFS is inspired by BitTorrent
• tit-for-tat strategy (if you don’t share, you won’t get)
• get rare pieces first
• Difference:
• separate swarm for each file (BitTorrent), one swarm for all (BitSwap in IPFS)
18. IPFS & Git (copied from white paper)
1. Immutable objects represent Files (blob), Directories (tree), and Changes (commit).
2. Objects are content-addressed, by the cryptographic hash of their contents.
3. Links to other objects are embedded, forming a Merkle DAG. This provides many
useful integrity and workflow properties.
4. Most versioning metadata (branches, tags, etc.) are simply pointer references, and
thus inexpensive to create and update.
5. Version changes only update references or add objects.
6. Distributing version changes to other users is simply transferring objects and
updating remote references.
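The Git-style properties listed above can be sketched in a few lines of Go. This is only an illustration under stated assumptions: the `store` type, the `putBlob` and `putTree` helpers, and the text serialization ("blob\n...", "tree\n...") are hypothetical and are not the format used by Git or IPFS. It shows that objects are addressed by the hash of their contents, that a tree embeds links to other objects, and that a version change only adds objects.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// store maps a content hash to the serialized object (a toy object database).
type store map[string][]byte

// put stores data under the hex SHA-256 of its bytes and returns that hash.
func (s store) put(data []byte) string {
	sum := sha256.Sum256(data)
	h := hex.EncodeToString(sum[:])
	s[h] = data
	return h
}

// putBlob stores file contents as an immutable object.
func (s store) putBlob(content string) string {
	return s.put([]byte("blob\n" + content))
}

// putTree stores a directory: a sorted list of "name hash" links to other objects.
func (s store) putTree(entries map[string]string) string {
	names := make([]string, 0, len(entries))
	for n := range entries {
		names = append(names, n)
	}
	sort.Strings(names) // fixed order so equal directories hash equally
	body := "tree\n"
	for _, n := range names {
		body += n + " " + entries[n] + "\n"
	}
	return s.put([]byte(body))
}

func main() {
	s := store{}
	v1 := s.putTree(map[string]string{"README": s.putBlob("hello")})
	// Editing the file adds new objects; the old ones stay untouched.
	v2 := s.putTree(map[string]string{"README": s.putBlob("hello, world")})
	fmt.Println("v1 root:", v1[:12])
	fmt.Println("v2 root:", v2[:12])
	fmt.Println("objects stored:", len(s))
}
```

A branch or tag in this model would just be a name pointing at `v1` or `v2`, which is why such references are cheap to create and update.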
22. IPFS Core Parts
• Identities: node identity generation & verification
• Network: p2p
• Routing: DHT
• Exchange: BitSwap
• Objects: Merkle DAG
• Files: versioned file system like Git
• Naming: self-certifying mutable name system
23. Exchange: BitSwap
• peers exchange which blocks they have (have_list) and which blocks they are looking
for (want_list) upon connecting
• to decide if a node will actually share data, it will apply its BitSwap Strategy
• based on previous data exchanges between these two peers
• peers keep track of the amount of data they share (builds credit) and the amount of
data they receive (builds debt)
• kept track of in the BitSwap Ledger
• if a peer has credit (shared more than received)
• our node will send the requested block
• if a peer has debt, our node will share or not share
• depending on a deterministic function of the debt ratio, where the chance of sharing becomes
smaller as the debt grows
• a data exchange always starts with an exchange of ledgers; if they are not identical, our
node disconnects
24. BitSwap Ledger
type Ledger struct {
    owner      NodeId
    partner    NodeId
    bytes_sent int
    bytes_recv int
    timestamp  Timestamp
}
25. BitSwap Spec
// Additional state kept
type BitSwap struct {
    // ledgers known to this node, including inactive ones
    ledgers map[NodeId]Ledger
    // currently open connections to other nodes
    active map[NodeId]Peer
    // checksums of blocks this node needs
    need_list []Multihash
    // checksums of blocks this node has
    have_list []Multihash
}

type Peer struct {
    nodeid NodeId
    // ledger between the node and this peer
    ledger Ledger
    // timestamp of last received message
    last_seen Timestamp
    // checksums of all blocks wanted by peer,
    // including blocks wanted by peer's peers
    want_list []Multihash
}
// Protocol interface:
interface Peer {
    open (nodeid :NodeId, ledger :Ledger);
    send_want_list (want_list :WantList);
    send_block (block :Block) -> (complete :Bool);
}
27. Naming: add mutability
• The root address of a node is /ipns/
• The content it points to can be changed by publishing an IPFS object to this address
• By publishing, the owner of the node (the person who knows the secret key that was
generated with ipfs init) cryptographically signs this “pointer”.
• This enables other users to verify the authenticity of the object published by the
owner.
• Just like IPFS paths, IPNS paths also start with a hash, followed by a Unix-like path.
• IPNS records are announced and resolved via the DHT.
29. IPFS stack
• Moving the data easily and efficiently: libp2p
• Defining the data: IPLD, IPNS
• Using the data: IPFS app
30. Concepts
• CID: content identifier. Based on the content’s cryptographic hash.
• DNSLink: uses DNS TXT records to map a domain name (e.g. ipfs.io) to an IPFS address.
• IPNS: the InterPlanetary Name System is a system for creating and updating mutable
links to IPFS content. An IPFS address changes every time the content changes. A name in
IPNS is the hash of a public key.
• MFS: the Mutable File System lets you treat files as in a normal file system. It takes care of
all the work of updating links and hashes when a file changes.
• Pinning: IPFS nodes treat data like a cache, so if you want something to be retained
long-term you can pin it.
• UnixFS: a data format to represent files and all their links and metadata,
loosely based on how files work in Unix.
39. Wait a moment, why does everything start with Qm?
• sha2-256
• base58
• multihash
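The three ingredients above combine as follows: a multihash prefixes the digest with a hash-function code (0x12 for sha2-256) and the digest length (0x20 = 32 bytes), and the resulting 34 bytes are encoded in base58. Because those first two bytes are always the same, the encoded string always begins with "Qm". The Go sketch below reproduces that effect using only the standard library; `base58Encode` and `qmHash` are hypothetical helper names. It hashes the raw input bytes, whereas IPFS hashes the encoded DAG node that wraps a file, so the strings printed here will not match what `ipfs add` reports for the same data.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"math/big"
)

// alphabet is the Bitcoin-style base58 alphabet (no 0, O, I, or l).
const alphabet = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

// base58Encode converts data to base58, mapping each leading zero byte to '1'.
func base58Encode(data []byte) string {
	n := new(big.Int).SetBytes(data)
	radix := big.NewInt(58)
	mod := new(big.Int)
	var out []byte
	for n.Sign() > 0 {
		n.DivMod(n, radix, mod)
		out = append(out, alphabet[mod.Int64()])
	}
	for _, b := range data {
		if b != 0 {
			break
		}
		out = append(out, '1')
	}
	// Digits were produced least significant first, so reverse them.
	for i, j := 0, len(out)-1; i < j; i, j = i+1, j-1 {
		out[i], out[j] = out[j], out[i]
	}
	return string(out)
}

// qmHash builds a sha2-256 multihash of data and encodes it in base58.
func qmHash(data []byte) string {
	digest := sha256.Sum256(data)
	multihash := append([]byte{0x12, 0x20}, digest[:]...) // code, length, digest
	return base58Encode(multihash)
}

func main() {
	for _, s := range []string{"hello", "IPFS", ""} {
		fmt.Printf("%q -> %s\n", s, qmHash([]byte(s)))
	}
}
```

Choosing a different hash function or a different base changes the prefix bytes and therefore the leading characters, which is the point of making the hash self-describing.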
40. IPFS use cases
1. As a mounted global filesystem, under /ipfs and /ipns.
2. As a mounted personal sync folder that automatically versions, publishes, and backs
up any writes.
3. As an encrypted file or data sharing system.
4. As a versioned package manager for all software.
5. As the root filesystem of a Virtual Machine.
6. As the boot filesystem of a VM (under a hypervisor).
7. As a database: applications can write directly to the Merkle DAG data model and get
all the versioning, caching, and distribution IPFS provides.
8. As a linked (and encrypted) communications platform.
9. As an integrity checked CDN for large files (without SSL).
10. As an encrypted CDN.
11. On webpages, as a web CDN.
41. Problems in IPFS
• Data is not automatically replicated by default
• you may lose your data if nobody is using or pinning it, see this discussion
• at the moment it serves as a filesystem cache
• IPFS Cluster allows files to be pinned across a cluster
• IPFS Cluster is not efficient at replication
• at the moment, either accept it
• or build your own with erasure coding such as Reed-Solomon