SlideShare a Scribd company logo
1 of 19
Kademlia
A distributed hash table for decentralized
peer-to-peer network
By
Priyabrata Dash (AKA Priyab Satoshi)
Twitter: twitmyreview
Medium: medium/Crypt-Bytes-Tech
Agenda
• What is Peer to Peer Network
• What is DHT
• Introduction to Kademlia
• System Design
• Academic Significance
• Uses
• Implementations
What is Peer to Peer
Network
• P2P computing is the sharing of computer resources and services
by direct exchange between systems.
• Peers are equally privileged
• Resources
• processing power
• disk storage or network bandwidth
• A distributed system architecture
• No centralized control
• Typically many nodes, but unreliable and heterogeneous
• Nodes are symmetric in function
• Take advantage of distributed, shared resources
(bandwidth, CPU, storage) on peer-nodes
• Fault-tolerant, self-organizing
• Operate in dynamic environment, frequent join and leave
is the norm
PEER-TO-
PEER VS
CLIENT-
SERVER
Types & Categories of
P2P Systems
Types
• Pure P2P system: a P2P system that has no central service of any
kind
• Hybrid P2P system: a P2P system which depends partially on
central servers or allocates selected functions to a subset of
dedicated peers
Categories
• Unstructured
• No restriction on overlay structures and data placement
• Napster, Gnutella, Kazza, Freenet, Bit torrent
• Structured
• Distributed hash tables (DHTs)
• Place restrictions on overlay structures and data placement
• Chord, Pastry, Tapestry, CAN
What is DHT (Distributed Hash
table)
• A distributed hash table (DHT) is a class of a decentralized distributed system that provides a
lookup service similar to a hash table: (key, value) pairs are stored in a DHT, and any
participating node can efficiently retrieve the value associated with a given key.
• Responsibility for maintaining the mapping from keys to values is distributed among the
nodes, in such a way that a change in the set of participants causes a minimal amount of
disruption.
• This allows a DHT to scale to extremely large numbers of nodes and to handle continual node
arrivals, departures, and failures.
• Distribute data over a large P2P network
• Quickly find any given item
• Can also distribute responsibility for data storage
• What’s stored is key/value pairs
• The key value controls which node(s) stores the value
• Each node is responsible for some section of the space
• Basic operations
• Store(key, val)
• val = Retrieve(key)
Introduction to Kademlia
• Distributed Hash Table for decentralized peer to peer
computer network designed by Petar Maymounkov and
David Mazières in 2002
• Specifies the structure of the network and the exchange of
information through node lookups.
• Kademlia nodes communicate among themselves using
UDP.
• Each node is identified by a number or node ID
• The node ID serves not only as identification, but the
Kademlia algorithm uses the node ID to locate values
(usually file hashes or keywords).
• The Kademlia network is made up of a wide range of nodes,
which interact with each other through User Datagram
Protocol (UDP). Each node on the network is identified by a
unique binary number called node ID. The node ID is used
to locate values (block of data) in the Kademlia algorithm.
The values are also interlinked within a Kademlia network
with a specific value’s key, a binary number of fixed length.
• One major goal of P2P systems is object lookup: Given a
data item X stored at some set of nodes in the system, find
it. Unlike Chord, CAN, or Pastry Kademlia uses Tree-based
routing.
Kademlia
Binary Tree
• Treat nodes as leaves of a binary
tree.
• Start from root, for any given node,
dividing the binary tree into a
series of successively lower
subtrees that don’t contain the
node.
• Every node keeps touch with at
least one node from each of its
subtrees. (if there is a node in that
subtree.) Corresponding to each
subtree, there is a k-bucket.
• Every node keeps a list of (IP-
address, Port, Node id) triples, and
(key, value) tuples for further
exchanging information with
others
Kad System Design- Distance Calculation
• Kademlia uses a "distance" calculation between two
nodes
• The closeness between two objects measured as their
bitwise XOR interpreted as an integer.
distance(a, b) = a XOR b
• Distance is computed as the (XOR) of the two node IDs
• Keys and Node IDs have the same format and length
• Exclusive or was chosen because it acts as a distance
function between all the node IDs.
• Specifically:
• The distance between a node and itself is zero
• It is symmetric: the "distances" calculated from
A to B and from B to A are the same
• It follows the triangle inequality: given A, B and
C are vertices (points) of a triangle, then the
distance from A to B is shorter than (or equal to)
the sum of the distance from A to C and the
distance from C to B.
• A basic Kademlia network with 2n nodes will only take
n steps (in the worst case) to find that node.
Kad System Design- Routing tables
• Nodes, files and key words, deploy SHA-1 hash
into a 160 bits space.
• Every entry in a list holds the necessary data
to locate another node. Every node maintains
information about files, key words close to
itself.
• Data in list contains
IP address, port, and node ID of another node
• The nth list must have a differing nth bit from
the node's ID
• The first n-1 bits of the candidate ID must
match those of the node's ID
• First list as 1/2 of the nodes in the network are
far away candidates
• The next list can use only 1/4 of the nodes in
the network (one bit closer than the first), etc.
Kad System Design- Protocol messages
• Kademlia has four messages
• PING — used to verify that a node is still alive.
• STORE — Stores a (key, value) pair in one node.
• FIND_NODE — The recipient of the request will return the k nodes in his own buckets that
are the closest ones to the requested key.
• FIND_VALUE — Same as FIND_NODE, but if the recipient of the request has the requested
key in its store, it will return the corresponding value.
• Each RPC message includes a random value from the initiator. This ensures that
when the response is received it corresponds to the request previously sent
• Every node keeps touch with at least one node from each of its subtrees. (if
there is a node in that subtree.) Corresponding to each subtree, there is a k-
bucket.
• Every node keeps a list of (IP-address, Port, Node id) triples, and (key, value)
tuples for further exchanging information with others
Kad System Design-
Locating nodes
• The most important is to locate the k closest nodes to some given
node ID.
• Kademlia employs a recursive algorithm for node lookups. The
lookup initiator starts by picking a nodes from its closest non-
empty k-bucket.
• The initiator then sends parallel, asynchronous FIND_NODE to the
α nodes it has chosen.
• α is a system-wide concurrency parameter, such as 3.
• the initiator resends the FIND_NODE to nodes it has learned about
from previous RPCs.
• If a round of FIND_NODES fails to return a node any closer than
the closest already seen, the initiator resends the FIND_NODE to
all of the k closest nodes it has not already queried.
• The lookup could terminate when the initiator has queried and
gotten responses from the k closest nodes it has seen.
Kad System Design- Locating resources
• Information is located by mapping it to a key. A hash is typically used for the map. The storer
nodes will have information due to a previous STORE message. Locating a value follows the
same procedure as locating the closest nodes to a key, except the search terminates when a
node has the requested value in his store and returns this value.
• The values are stored at several nodes (k of them) to allow for nodes to come and go and still
have the value available in some node. Periodically, a node that stores a value will explore the
network to find the k nodes that are close to the key value and replicate the value onto them.
This compensates for disappeared nodes.
• Also, for popular values that might have many requests, the load in the storer nodes is
diminished by having a retriever store this value in some node near, but outside of, the k
closest ones. This new storing is called a cache. In this way the value is stored farther and
farther away from the key, depending on the quantity of requests. This allows popular
searches to find a storer more quickly. Because the value is returned from nodes farther away
from the key, this alleviates possible "hot spots". Caching nodes will drop the value after a
certain time depending on their distance from the key.
• Some implementations (e.g. Kad) do not have replication nor caching. The purpose of this is
to remove old information quickly from the system. The node that is providing the file will
periodically refresh the information onto the network (perform FIND_NODE and STORE
messages). When all of the nodes having the file go offline, nobody will be refreshing its
values (sources and keywords) and the information will eventually disappear from the
network.
Kad System Design- Joining the network
• A node that would like to join the net must first go through a bootstrap process. In this phase,
the joining node needs to know the IP address and port of another node—a bootstrap node
(obtained from the user, or from a stored list)—that is already participating in the Kademlia
network. If the joining node has not yet participated in the network, it computes a random ID
number that is supposed not to be already assigned to any other node. It uses this ID until
leaving the network.
• The joining node inserts the bootstrap node into one of its k-buckets. The joining node then
does a FIND_NODE of its own ID against the bootstrap node (the only other node it knows).
The "self-lookup" will populate other nodes' k-buckets with the new node ID, and will
populate the joining node's k-buckets with the nodes in the path between it and the
bootstrap node. After this, the joining node refreshes all k-buckets further away than the k-
bucket the bootstrap node falls in. This refresh is just a lookup of a random key that is within
that k-bucket range.
• Initially, nodes have one k-bucket. When the k-bucket becomes full, it can be split. The split
occurs if the range of nodes in the k-bucket spans the node's own id (values to the left and
right in a binary tree). Kademlia relaxes even this rule for the one "closest nodes" k-bucket,
because typically one single bucket will correspond to the distance where all the nodes that
are the closest to this node are, they may be more than k, and we want it to know them all. It
may turn out that a highly unbalanced binary sub-tree exists near the node. If k is 20, and
there are 21+ nodes with a prefix "xxx0011....." and the new node is "xxx000011001", the
new node can contain multiple k-buckets for the other 21+ nodes. This is to guarantee that
the network knows about all nodes in the closest region.
Academic Significance & benefits
• While the XOR metric is not needed to understand Kademlia, it is critical in the analysis of the protocol.
• The XOR arithmetic forms an abelian group allowing closed analysis. Other DHT protocols and algorithms
require simulation or complicated formal analysis in order to predict network behavior and correctness.
• Using groups of bits as routing information also simplifies the algorithms.
• Since the network is distributed, by definition, there's no one master table of ID->address mappings.
• Nodes don't have to (and usually don't) know about all the other nodes.
• The process of "finding" a node is basically to ask known nodes "closest" to the target not so much about
the target node directly, but about what nodes are closer to the target.
• The result of that query gives you the next group of nodes to query, and the process repeats -- and because
a node would return results that are closer than it is, each iteration tends to find nodes closer and closer to
the target till you finally reach a node that can say "Oh, node X? He's right over there.“
• The nodes in the k-buckets are the stepping stones of routing.
• By relying on the oldest nodes, k-buckets promise the probability that they will remain online.
• DoS attack is prevented since the new nodes find it difficult to get into the k-bucket
Uses
• Some of the Use cases of Kademlia are given below:
• File Sharing
• Botnet
• File System
• IP Overlay Routing
• P2P Network Discovery
• Peer to Peer Messaging
• Kademlia is used in file sharing networks.
• If a node wants to share a file, it processes the contents of the file, calculating from it a
number (hash) that will identify this file within the file-sharing network
• Hashes and the node IDs must be of the same length
• It then searches for several nodes whose ID is close to the hash
• A searching client will use Kademlia to search the network for the node whose ID has the
smallest distance to the file hash, then will retrieve the sources list that is stored in that node
Implementation
• I2P[7]
• Kad Network — developed originally by the eMule community to replace the server-based architecture of the eDonkey2000
network.
• Ethereum — The node discovery protocol in Ethereum's blockchain network stack is based a slightly modified implementation of
Kademlia.[8]
• Overnet network: With KadC a C library for handling its Kademlia is available. (development of Overnet is discontinued)
• BitTorrent Uses a DHT based on an implementation of the Kademlia algorithm, for trackerless torrents.
• Osiris sps (all version): used to manage distributed and anonymous web portal.
• Retroshare — F2F decentralised communication platform with secure VOIP, instant messaging, file transfer etc.
• Tox - A fully distributed messaging, VoIP and video chat platform
• Gnutella DHT — Originally by LimeWire[9][10] to augment the Gnutella protocol for finding alternate file locations, now in use by
other gnutella clients.[11]
• IPFS − A peer-to-peer distributed filesystem.[12]
• TeleHash is a mesh networking protocol that uses Kademlia to resolve direct connections between parties.[13]
• IPFS
• OpenDHT
Reference
• Pease refer below link to access some of the references I used for this
presentation:
https://www.reddit.com/r/indiacryptogrp/comments/8ej9zv/reading
_list_kademlia/
Thank You & QA

More Related Content

What's hot

Chapter 8 distributed file systems
Chapter 8 distributed file systemsChapter 8 distributed file systems
Chapter 8 distributed file systems
AbDul ThaYyal
 
Introduction to distributed file systems
Introduction to distributed file systemsIntroduction to distributed file systems
Introduction to distributed file systems
Viet-Trung TRAN
 
Group Communication (Distributed computing)
Group Communication (Distributed computing)Group Communication (Distributed computing)
Group Communication (Distributed computing)
Sri Prasanna
 

What's hot (20)

Peer to Peer services and File systems
Peer to Peer services and File systemsPeer to Peer services and File systems
Peer to Peer services and File systems
 
Distance vector routing
Distance vector routingDistance vector routing
Distance vector routing
 
Secure socket layer
Secure socket layerSecure socket layer
Secure socket layer
 
Transmission Control Protocol (TCP)
Transmission Control Protocol (TCP)Transmission Control Protocol (TCP)
Transmission Control Protocol (TCP)
 
IPC
IPCIPC
IPC
 
SMTP Simple Mail Transfer Protocol
SMTP Simple Mail Transfer ProtocolSMTP Simple Mail Transfer Protocol
SMTP Simple Mail Transfer Protocol
 
Chapter 8 distributed file systems
Chapter 8 distributed file systemsChapter 8 distributed file systems
Chapter 8 distributed file systems
 
Secure Socket Layer
Secure Socket LayerSecure Socket Layer
Secure Socket Layer
 
Digital Signature Standard
Digital Signature StandardDigital Signature Standard
Digital Signature Standard
 
Kerberos
KerberosKerberos
Kerberos
 
signal encoding techniques
signal encoding techniquessignal encoding techniques
signal encoding techniques
 
Ports & sockets
Ports  & sockets Ports  & sockets
Ports & sockets
 
Message digest 5
Message digest 5Message digest 5
Message digest 5
 
Google File System
Google File SystemGoogle File System
Google File System
 
Introduction to distributed file systems
Introduction to distributed file systemsIntroduction to distributed file systems
Introduction to distributed file systems
 
Transport Layer Security (TLS)
Transport Layer Security (TLS)Transport Layer Security (TLS)
Transport Layer Security (TLS)
 
Group Communication (Distributed computing)
Group Communication (Distributed computing)Group Communication (Distributed computing)
Group Communication (Distributed computing)
 
CS6551 COMPUTER NETWORKS
CS6551 COMPUTER NETWORKSCS6551 COMPUTER NETWORKS
CS6551 COMPUTER NETWORKS
 
Network layer tanenbaum
Network layer tanenbaumNetwork layer tanenbaum
Network layer tanenbaum
 
Unit 3 Network Layer PPT
Unit 3 Network Layer PPTUnit 3 Network Layer PPT
Unit 3 Network Layer PPT
 

Similar to Kademlia introduction

Dynamo and BigTable in light of the CAP theorem
Dynamo and BigTable in light of the CAP theoremDynamo and BigTable in light of the CAP theorem
Dynamo and BigTable in light of the CAP theorem
Grisha Weintraub
 
Apache Cassandra Multi-Datacenter Essentials (Julien Anguenot, iLand Internet...
Apache Cassandra Multi-Datacenter Essentials (Julien Anguenot, iLand Internet...Apache Cassandra Multi-Datacenter Essentials (Julien Anguenot, iLand Internet...
Apache Cassandra Multi-Datacenter Essentials (Julien Anguenot, iLand Internet...
DataStax
 

Similar to Kademlia introduction (20)

Peer to peer Paradigms
Peer to peer ParadigmsPeer to peer Paradigms
Peer to peer Paradigms
 
Tapestry
TapestryTapestry
Tapestry
 
Implementing a Distributed Hash Table with Scala and Akka
Implementing a Distributed Hash Table with Scala and AkkaImplementing a Distributed Hash Table with Scala and Akka
Implementing a Distributed Hash Table with Scala and Akka
 
PWL Seattle #16 - Chord: A Scalable Peer-to-peer Lookup Protocol for Internet...
PWL Seattle #16 - Chord: A Scalable Peer-to-peer Lookup Protocol for Internet...PWL Seattle #16 - Chord: A Scalable Peer-to-peer Lookup Protocol for Internet...
PWL Seattle #16 - Chord: A Scalable Peer-to-peer Lookup Protocol for Internet...
 
UNIT III DIS.pptx
UNIT III DIS.pptxUNIT III DIS.pptx
UNIT III DIS.pptx
 
IRJET- Estimating Various DHT Protocols
IRJET- Estimating Various DHT ProtocolsIRJET- Estimating Various DHT Protocols
IRJET- Estimating Various DHT Protocols
 
Dynamo and BigTable in light of the CAP theorem
Dynamo and BigTable in light of the CAP theoremDynamo and BigTable in light of the CAP theorem
Dynamo and BigTable in light of the CAP theorem
 
6.1-Cassandra.ppt
6.1-Cassandra.ppt6.1-Cassandra.ppt
6.1-Cassandra.ppt
 
Cassandra
CassandraCassandra
Cassandra
 
6.1-Cassandra.ppt
6.1-Cassandra.ppt6.1-Cassandra.ppt
6.1-Cassandra.ppt
 
Agents and P2P Networks
Agents and P2P NetworksAgents and P2P Networks
Agents and P2P Networks
 
Apache Cassandra multi-datacenter essentials
Apache Cassandra multi-datacenter essentialsApache Cassandra multi-datacenter essentials
Apache Cassandra multi-datacenter essentials
 
Apache Cassandra Multi-Datacenter Essentials (Julien Anguenot, iLand Internet...
Apache Cassandra Multi-Datacenter Essentials (Julien Anguenot, iLand Internet...Apache Cassandra Multi-Datacenter Essentials (Julien Anguenot, iLand Internet...
Apache Cassandra Multi-Datacenter Essentials (Julien Anguenot, iLand Internet...
 
Routing Protocols of Distributed Hash Table Based Peer to Peer Networks
Routing Protocols of Distributed Hash Table Based Peer to Peer NetworksRouting Protocols of Distributed Hash Table Based Peer to Peer Networks
Routing Protocols of Distributed Hash Table Based Peer to Peer Networks
 
CS8603 DS UNIT 5.pptx
CS8603 DS UNIT 5.pptxCS8603 DS UNIT 5.pptx
CS8603 DS UNIT 5.pptx
 
cybersecurity notes for mca students for learning
cybersecurity notes for mca students for learningcybersecurity notes for mca students for learning
cybersecurity notes for mca students for learning
 
An overview of Peer-to-Peer technology new
An overview of Peer-to-Peer technology newAn overview of Peer-to-Peer technology new
An overview of Peer-to-Peer technology new
 
lecture-2-tcp-ip.ppt
lecture-2-tcp-ip.pptlecture-2-tcp-ip.ppt
lecture-2-tcp-ip.ppt
 
P2P Lecture.ppt
P2P Lecture.pptP2P Lecture.ppt
P2P Lecture.ppt
 
L6.sp17.pptx
L6.sp17.pptxL6.sp17.pptx
L6.sp17.pptx
 

More from Priyab Satoshi

More from Priyab Satoshi (16)

Introduction to Chatbots
Introduction to ChatbotsIntroduction to Chatbots
Introduction to Chatbots
 
Introduction to IOT security
Introduction to IOT securityIntroduction to IOT security
Introduction to IOT security
 
Introduction to State Channels & Payment Channels
Introduction to State Channels & Payment ChannelsIntroduction to State Channels & Payment Channels
Introduction to State Channels & Payment Channels
 
Introduction to GDPR
Introduction to GDPRIntroduction to GDPR
Introduction to GDPR
 
Cryptocurrency & ICO Regulations in US
Cryptocurrency & ICO Regulations in USCryptocurrency & ICO Regulations in US
Cryptocurrency & ICO Regulations in US
 
Online privacy & security
Online privacy & securityOnline privacy & security
Online privacy & security
 
Introduction to Cognitive Automation
Introduction to Cognitive AutomationIntroduction to Cognitive Automation
Introduction to Cognitive Automation
 
Robotic process automation Introduction
Robotic process automation IntroductionRobotic process automation Introduction
Robotic process automation Introduction
 
Decentralised Exchanges - An Introduction
Decentralised Exchanges - An IntroductionDecentralised Exchanges - An Introduction
Decentralised Exchanges - An Introduction
 
Introduction to Segwit
Introduction to SegwitIntroduction to Segwit
Introduction to Segwit
 
On-chain Crowdfunding & Asset Token
On-chain Crowdfunding & Asset Token On-chain Crowdfunding & Asset Token
On-chain Crowdfunding & Asset Token
 
Introduction to blockchain
Introduction to blockchainIntroduction to blockchain
Introduction to blockchain
 
Blockchain and Decentralization
Blockchain and DecentralizationBlockchain and Decentralization
Blockchain and Decentralization
 
Erc 721 tokens
Erc 721 tokensErc 721 tokens
Erc 721 tokens
 
Cryptocurrency & Regulatory Environment
Cryptocurrency & Regulatory EnvironmentCryptocurrency & Regulatory Environment
Cryptocurrency & Regulatory Environment
 
Understanding blockchain
Understanding blockchainUnderstanding blockchain
Understanding blockchain
 

Recently uploaded

Russian Escort Abu Dhabi 0503464457 Abu DHabi Escorts
Russian Escort Abu Dhabi 0503464457 Abu DHabi EscortsRussian Escort Abu Dhabi 0503464457 Abu DHabi Escorts
Russian Escort Abu Dhabi 0503464457 Abu DHabi Escorts
Monica Sydney
 
哪里办理美国迈阿密大学毕业证(本硕)umiami在读证明存档可查
哪里办理美国迈阿密大学毕业证(本硕)umiami在读证明存档可查哪里办理美国迈阿密大学毕业证(本硕)umiami在读证明存档可查
哪里办理美国迈阿密大学毕业证(本硕)umiami在读证明存档可查
ydyuyu
 
一比一原版帝国理工学院毕业证如何办理
一比一原版帝国理工学院毕业证如何办理一比一原版帝国理工学院毕业证如何办理
一比一原版帝国理工学院毕业证如何办理
F
 
一比一原版田纳西大学毕业证如何办理
一比一原版田纳西大学毕业证如何办理一比一原版田纳西大学毕业证如何办理
一比一原版田纳西大学毕业证如何办理
F
 
pdfcoffee.com_business-ethics-q3m7-pdf-free.pdf
pdfcoffee.com_business-ethics-q3m7-pdf-free.pdfpdfcoffee.com_business-ethics-q3m7-pdf-free.pdf
pdfcoffee.com_business-ethics-q3m7-pdf-free.pdf
JOHNBEBONYAP1
 
一比一原版犹他大学毕业证如何办理
一比一原版犹他大学毕业证如何办理一比一原版犹他大学毕业证如何办理
一比一原版犹他大学毕业证如何办理
F
 

Recently uploaded (20)

Russian Escort Abu Dhabi 0503464457 Abu DHabi Escorts
Russian Escort Abu Dhabi 0503464457 Abu DHabi EscortsRussian Escort Abu Dhabi 0503464457 Abu DHabi Escorts
Russian Escort Abu Dhabi 0503464457 Abu DHabi Escorts
 
Nagercoil Escorts Service Girl ^ 9332606886, WhatsApp Anytime Nagercoil
Nagercoil Escorts Service Girl ^ 9332606886, WhatsApp Anytime NagercoilNagercoil Escorts Service Girl ^ 9332606886, WhatsApp Anytime Nagercoil
Nagercoil Escorts Service Girl ^ 9332606886, WhatsApp Anytime Nagercoil
 
Local Call Girls in Seoni 9332606886 HOT & SEXY Models beautiful and charmin...
Local Call Girls in Seoni  9332606886 HOT & SEXY Models beautiful and charmin...Local Call Girls in Seoni  9332606886 HOT & SEXY Models beautiful and charmin...
Local Call Girls in Seoni 9332606886 HOT & SEXY Models beautiful and charmin...
 
20240510 QFM016 Irresponsible AI Reading List April 2024.pdf
20240510 QFM016 Irresponsible AI Reading List April 2024.pdf20240510 QFM016 Irresponsible AI Reading List April 2024.pdf
20240510 QFM016 Irresponsible AI Reading List April 2024.pdf
 
Ballia Escorts Service Girl ^ 9332606886, WhatsApp Anytime Ballia
Ballia Escorts Service Girl ^ 9332606886, WhatsApp Anytime BalliaBallia Escorts Service Girl ^ 9332606886, WhatsApp Anytime Ballia
Ballia Escorts Service Girl ^ 9332606886, WhatsApp Anytime Ballia
 
哪里办理美国迈阿密大学毕业证(本硕)umiami在读证明存档可查
哪里办理美国迈阿密大学毕业证(本硕)umiami在读证明存档可查哪里办理美国迈阿密大学毕业证(本硕)umiami在读证明存档可查
哪里办理美国迈阿密大学毕业证(本硕)umiami在读证明存档可查
 
💚 Call Girls Bahraich 9332606886 High Profile Call Girls You Can Get The S...
💚 Call Girls Bahraich   9332606886  High Profile Call Girls You Can Get The S...💚 Call Girls Bahraich   9332606886  High Profile Call Girls You Can Get The S...
💚 Call Girls Bahraich 9332606886 High Profile Call Girls You Can Get The S...
 
Research Assignment - NIST SP800 [172 A] - Presentation.pptx
Research Assignment - NIST SP800 [172 A] - Presentation.pptxResearch Assignment - NIST SP800 [172 A] - Presentation.pptx
Research Assignment - NIST SP800 [172 A] - Presentation.pptx
 
一比一原版帝国理工学院毕业证如何办理
一比一原版帝国理工学院毕业证如何办理一比一原版帝国理工学院毕业证如何办理
一比一原版帝国理工学院毕业证如何办理
 
一比一原版田纳西大学毕业证如何办理
一比一原版田纳西大学毕业证如何办理一比一原版田纳西大学毕业证如何办理
一比一原版田纳西大学毕业证如何办理
 
PIC Microcontroller Structure & Assembly Language.ppsx
PIC Microcontroller Structure & Assembly Language.ppsxPIC Microcontroller Structure & Assembly Language.ppsx
PIC Microcontroller Structure & Assembly Language.ppsx
 
Leading-edge AI Image Generators of 2024
Leading-edge AI Image Generators of 2024Leading-edge AI Image Generators of 2024
Leading-edge AI Image Generators of 2024
 
Sensual Call Girls in Tarn Taran Sahib { 9332606886 } VVIP NISHA Call Girls N...
Sensual Call Girls in Tarn Taran Sahib { 9332606886 } VVIP NISHA Call Girls N...Sensual Call Girls in Tarn Taran Sahib { 9332606886 } VVIP NISHA Call Girls N...
Sensual Call Girls in Tarn Taran Sahib { 9332606886 } VVIP NISHA Call Girls N...
 
Best SEO Services Company in Dallas | Best SEO Agency Dallas
Best SEO Services Company in Dallas | Best SEO Agency DallasBest SEO Services Company in Dallas | Best SEO Agency Dallas
Best SEO Services Company in Dallas | Best SEO Agency Dallas
 
Tadepalligudem Escorts Service Girl ^ 9332606886, WhatsApp Anytime Tadepallig...
Tadepalligudem Escorts Service Girl ^ 9332606886, WhatsApp Anytime Tadepallig...Tadepalligudem Escorts Service Girl ^ 9332606886, WhatsApp Anytime Tadepallig...
Tadepalligudem Escorts Service Girl ^ 9332606886, WhatsApp Anytime Tadepallig...
 
Call girls Service in Ajman 0505086370 Ajman call girls
Call girls Service in Ajman 0505086370 Ajman call girlsCall girls Service in Ajman 0505086370 Ajman call girls
Call girls Service in Ajman 0505086370 Ajman call girls
 
pdfcoffee.com_business-ethics-q3m7-pdf-free.pdf
pdfcoffee.com_business-ethics-q3m7-pdf-free.pdfpdfcoffee.com_business-ethics-q3m7-pdf-free.pdf
pdfcoffee.com_business-ethics-q3m7-pdf-free.pdf
 
Local Call Girls in Gomati 9332606886 HOT & SEXY Models beautiful and charmi...
Local Call Girls in Gomati  9332606886 HOT & SEXY Models beautiful and charmi...Local Call Girls in Gomati  9332606886 HOT & SEXY Models beautiful and charmi...
Local Call Girls in Gomati 9332606886 HOT & SEXY Models beautiful and charmi...
 
一比一原版犹他大学毕业证如何办理
一比一原版犹他大学毕业证如何办理一比一原版犹他大学毕业证如何办理
一比一原版犹他大学毕业证如何办理
 
Call girls Service Canacona - 8250092165 Our call girls are sure to provide y...
Call girls Service Canacona - 8250092165 Our call girls are sure to provide y...Call girls Service Canacona - 8250092165 Our call girls are sure to provide y...
Call girls Service Canacona - 8250092165 Our call girls are sure to provide y...
 

Kademlia introduction

  • 1. Kademlia A distributed hash table for decentralized peer-to-peer network By Priyabrata Dash (AKA Priyab Satoshi) Twitter: twitmyreview Medium: medium/Crypt-Bytes-Tech
  • 2. Agenda • What is Peer to Peer Network • What is DHT • Introduction to Kademlia • System Design • Academic Significance • Uses • Implementations
  • 3. What is Peer to Peer Network • P2P computing is the sharing of computer resources and services by direct exchange between systems. • Peers are equally privileged • Resources • processing power • disk storage or network bandwidth • A distributed system architecture • No centralized control • Typically many nodes, but unreliable and heterogeneous • Nodes are symmetric in function • Take advantage of distributed, shared resources (bandwidth, CPU, storage) on peer-nodes • Fault-tolerant, self-organizing • Operate in dynamic environment, frequent join and leave is the norm
  • 5. Types & Categories of P2P Systems Types • Pure P2P system: a P2P system that has no central service of any kind • Hybrid P2P system: a P2P system which depends partially on central servers or allocates selected functions to a subset of dedicated peers Categories • Unstructured • No restriction on overlay structures and data placement • Napster, Gnutella, Kazza, Freenet, Bit torrent • Structured • Distributed hash tables (DHTs) • Place restrictions on overlay structures and data placement • Chord, Pastry, Tapestry, CAN
  • 6. What is DHT (Distributed Hash table) • A distributed hash table (DHT) is a class of a decentralized distributed system that provides a lookup service similar to a hash table: (key, value) pairs are stored in a DHT, and any participating node can efficiently retrieve the value associated with a given key. • Responsibility for maintaining the mapping from keys to values is distributed among the nodes, in such a way that a change in the set of participants causes a minimal amount of disruption. • This allows a DHT to scale to extremely large numbers of nodes and to handle continual node arrivals, departures, and failures. • Distribute data over a large P2P network • Quickly find any given item • Can also distribute responsibility for data storage • What’s stored is key/value pairs • The key value controls which node(s) stores the value • Each node is responsible for some section of the space • Basic operations • Store(key, val) • val = Retrieve(key)
  • 7. Introduction to Kademlia • Distributed Hash Table for decentralized peer to peer computer network designed by Petar Maymounkov and David Mazières in 2002 • Specifies the structure of the network and the exchange of information through node lookups. • Kademlia nodes communicate among themselves using UDP. • Each node is identified by a number or node ID • The node ID serves not only as identification, but the Kademlia algorithm uses the node ID to locate values (usually file hashes or keywords). • The Kademlia network is made up of a wide range of nodes, which interact with each other through User Datagram Protocol (UDP). Each node on the network is identified by a unique binary number called node ID. The node ID is used to locate values (block of data) in the Kademlia algorithm. The values are also interlinked within a Kademlia network with a specific value’s key, a binary number of fixed length. • One major goal of P2P systems is object lookup: Given a data item X stored at some set of nodes in the system, find it. Unlike Chord, CAN, or Pastry Kademlia uses Tree-based routing.
  • 8. Kademlia Binary Tree • Treat nodes as leaves of a binary tree. • Start from root, for any given node, dividing the binary tree into a series of successively lower subtrees that don’t contain the node. • Every node keeps touch with at least one node from each of its subtrees. (if there is a node in that subtree.) Corresponding to each subtree, there is a k-bucket. • Every node keeps a list of (IP- address, Port, Node id) triples, and (key, value) tuples for further exchanging information with others
  • 9. Kad System Design- Distance Calculation • Kademlia uses a "distance" calculation between two nodes • The closeness between two objects measured as their bitwise XOR interpreted as an integer. distance(a, b) = a XOR b • Distance is computed as the (XOR) of the two node IDs • Keys and Node IDs have the same format and length • Exclusive or was chosen because it acts as a distance function between all the node IDs. • Specifically: • The distance between a node and itself is zero • It is symmetric: the "distances" calculated from A to B and from B to A are the same • It follows the triangle inequality: given A, B and C are vertices (points) of a triangle, then the distance from A to B is shorter than (or equal to) the sum of the distance from A to C and the distance from C to B. • A basic Kademlia network with 2n nodes will only take n steps (in the worst case) to find that node.
  • 10. Kad System Design- Routing tables • Nodes, files and key words, deploy SHA-1 hash into a 160 bits space. • Every entry in a list holds the necessary data to locate another node. Every node maintains information about files, key words close to itself. • Data in list contains IP address, port, and node ID of another node • The nth list must have a differing nth bit from the node's ID • The first n-1 bits of the candidate ID must match those of the node's ID • First list as 1/2 of the nodes in the network are far away candidates • The next list can use only 1/4 of the nodes in the network (one bit closer than the first), etc.
  • 11. Kad System Design- Protocol messages • Kademlia has four messages • PING — used to verify that a node is still alive. • STORE — Stores a (key, value) pair in one node. • FIND_NODE — The recipient of the request will return the k nodes in his own buckets that are the closest ones to the requested key. • FIND_VALUE — Same as FIND_NODE, but if the recipient of the request has the requested key in its store, it will return the corresponding value. • Each RPC message includes a random value from the initiator. This ensures that when the response is received it corresponds to the request previously sent • Every node keeps touch with at least one node from each of its subtrees. (if there is a node in that subtree.) Corresponding to each subtree, there is a k- bucket. • Every node keeps a list of (IP-address, Port, Node id) triples, and (key, value) tuples for further exchanging information with others
  • 12. Kad System Design- Locating nodes • The most important is to locate the k closest nodes to some given node ID. • Kademlia employs a recursive algorithm for node lookups. The lookup initiator starts by picking a nodes from its closest non- empty k-bucket. • The initiator then sends parallel, asynchronous FIND_NODE to the α nodes it has chosen. • α is a system-wide concurrency parameter, such as 3. • the initiator resends the FIND_NODE to nodes it has learned about from previous RPCs. • If a round of FIND_NODES fails to return a node any closer than the closest already seen, the initiator resends the FIND_NODE to all of the k closest nodes it has not already queried. • The lookup could terminate when the initiator has queried and gotten responses from the k closest nodes it has seen.
  • 13. Kad System Design- Locating resources • Information is located by mapping it to a key. A hash is typically used for the map. The storer nodes will have information due to a previous STORE message. Locating a value follows the same procedure as locating the closest nodes to a key, except the search terminates when a node has the requested value in his store and returns this value. • The values are stored at several nodes (k of them) to allow for nodes to come and go and still have the value available in some node. Periodically, a node that stores a value will explore the network to find the k nodes that are close to the key value and replicate the value onto them. This compensates for disappeared nodes. • Also, for popular values that might have many requests, the load in the storer nodes is diminished by having a retriever store this value in some node near, but outside of, the k closest ones. This new storing is called a cache. In this way the value is stored farther and farther away from the key, depending on the quantity of requests. This allows popular searches to find a storer more quickly. Because the value is returned from nodes farther away from the key, this alleviates possible "hot spots". Caching nodes will drop the value after a certain time depending on their distance from the key. • Some implementations (e.g. Kad) do not have replication nor caching. The purpose of this is to remove old information quickly from the system. The node that is providing the file will periodically refresh the information onto the network (perform FIND_NODE and STORE messages). When all of the nodes having the file go offline, nobody will be refreshing its values (sources and keywords) and the information will eventually disappear from the network.
  • 14. Kad System Design- Joining the network • A node that would like to join the net must first go through a bootstrap process. In this phase, the joining node needs to know the IP address and port of another node—a bootstrap node (obtained from the user, or from a stored list)—that is already participating in the Kademlia network. If the joining node has not yet participated in the network, it computes a random ID number that is supposed not to be already assigned to any other node. It uses this ID until leaving the network. • The joining node inserts the bootstrap node into one of its k-buckets. The joining node then does a FIND_NODE of its own ID against the bootstrap node (the only other node it knows). The "self-lookup" will populate other nodes' k-buckets with the new node ID, and will populate the joining node's k-buckets with the nodes in the path between it and the bootstrap node. After this, the joining node refreshes all k-buckets further away than the k- bucket the bootstrap node falls in. This refresh is just a lookup of a random key that is within that k-bucket range. • Initially, nodes have one k-bucket. When the k-bucket becomes full, it can be split. The split occurs if the range of nodes in the k-bucket spans the node's own id (values to the left and right in a binary tree). Kademlia relaxes even this rule for the one "closest nodes" k-bucket, because typically one single bucket will correspond to the distance where all the nodes that are the closest to this node are, they may be more than k, and we want it to know them all. It may turn out that a highly unbalanced binary sub-tree exists near the node. If k is 20, and there are 21+ nodes with a prefix "xxx0011....." and the new node is "xxx000011001", the new node can contain multiple k-buckets for the other 21+ nodes. This is to guarantee that the network knows about all nodes in the closest region.
  • 15. Academic Significance & benefits • While the XOR metric is not needed to understand Kademlia, it is critical in the analysis of the protocol. • The XOR arithmetic forms an abelian group allowing closed analysis. Other DHT protocols and algorithms require simulation or complicated formal analysis in order to predict network behavior and correctness. • Using groups of bits as routing information also simplifies the algorithms. • Since the network is distributed, by definition, there's no one master table of ID->address mappings. • Nodes don't have to (and usually don't) know about all the other nodes. • The process of "finding" a node is basically to ask known nodes "closest" to the target not so much about the target node directly, but about what nodes are closer to the target. • The result of that query gives you the next group of nodes to query, and the process repeats -- and because a node would return results that are closer than it is, each iteration tends to find nodes closer and closer to the target till you finally reach a node that can say "Oh, node X? He's right over there.“ • The nodes in the k-buckets are the stepping stones of routing. • By relying on the oldest nodes, k-buckets promise the probability that they will remain online. • DoS attack is prevented since the new nodes find it difficult to get into the k-bucket
  • 16. Uses • Some of the Use cases of Kademlia are given below: • File Sharing • Botnet • File System • IP Overlay Routing • P2P Network Discovery • Peer to Peer Messaging • Kademlia is used in file sharing networks. • If a node wants to share a file, it processes the contents of the file, calculating from it a number (hash) that will identify this file within the file-sharing network • Hashes and the node IDs must be of the same length • It then searches for several nodes whose ID is close to the hash • A searching client will use Kademlia to search the network for the node whose ID has the smallest distance to the file hash, then will retrieve the sources list that is stored in that node
  • 17. Implementation • I2P[7] • Kad Network — developed originally by the eMule community to replace the server-based architecture of the eDonkey2000 network. • Ethereum — The node discovery protocol in Ethereum's blockchain network stack is based a slightly modified implementation of Kademlia.[8] • Overnet network: With KadC a C library for handling its Kademlia is available. (development of Overnet is discontinued) • BitTorrent Uses a DHT based on an implementation of the Kademlia algorithm, for trackerless torrents. • Osiris sps (all version): used to manage distributed and anonymous web portal. • Retroshare — F2F decentralised communication platform with secure VOIP, instant messaging, file transfer etc. • Tox - A fully distributed messaging, VoIP and video chat platform • Gnutella DHT — Originally by LimeWire[9][10] to augment the Gnutella protocol for finding alternate file locations, now in use by other gnutella clients.[11] • IPFS − A peer-to-peer distributed filesystem.[12] • TeleHash is a mesh networking protocol that uses Kademlia to resolve direct connections between parties.[13] • IPFS • OpenDHT
  • 18. Reference • Pease refer below link to access some of the references I used for this presentation: https://www.reddit.com/r/indiacryptogrp/comments/8ej9zv/reading _list_kademlia/