Route Puts/Gets Through the Overlay - Introduction to P2P Computing

DTU Informatics

Department of Informatics and Mathematical Modelling

Step 4: Route Puts/Gets Through the Overlay

• Recursive routing: the initiator starts the process, contacted nodes forward the message

• Iterative routing: the initiator personally contact the nodes at each routing step

DTU Informatics

Department of Informatics and Mathematical Modelling

Key-Based Routing

• DHT also known as Key-Based Routing (KBR) [Chord, Pastry, Overnet, Kad, eMule]

• KBR works as follows:

‣ nodes are (randomly) assigned unique node identifiers (nodeId)

‣ given a key k, the node whose nodeId is numerically closest to k among all nodes in the network is known as the root of key k

‣ given a key k, a KBR algorithm can route a message to the root of k in a small number of hops, usually O(log n)

‣ the location of an object with id objectId is tracked by the root of k = objectId

‣ thus, one can find the location of an object by routing a message to the root of k = objectId and querying the root for the location of the object

DTU Informatics

Department of Informatics and Mathematical Modelling

Routing Around Failures (1)

• Under churn, neighbors may have failed

• To detect failures, acknowledge each hop (recursive routing)

DTU Informatics

Department of Informatics and Mathematical Modelling

Routing Around Failures (2)

• If we don't receive ack or response, resend through a diﬀerent neighbor

DTU Informatics

Department of Informatics and Mathematical Modelling

Routing Around Failures (3)

• Must compute timeouts carefully

‣ If too long, increase put/get latency

‣ If too short, get message explosion

• Parallel sending could be a design decision (see Kademlia)

DTU Informatics

Department of Informatics and Mathematical Modelling

Computing Good Timeouts

• Use TCP-style timers

‣ Keep past history of latencies

‣ Use this to compute timeouts for new requests

• Works fine for recursive lookups

‣ Only talk to neighbors, so history small, current

• In iterative lookups, source directs entire lookup

‣ Must potentially have good timeout for any node

DTU Informatics

Department of Informatics and Mathematical Modelling

Recovering from Failures

• Can't route around failures forever

‣ Will eventually run out of neighbors

• Must also find new nodes as they join

‣ Especially important if they're our immediate predecessors or successors

DTU Informatics

Department of Informatics and Mathematical Modelling

Recovering from Failures

• Reactive recovery

‣ When a node stops sending acknowledgments, notify other neighbors of potential replacements

• Proactive recovery

‣ Periodically, each node sends its neighbor list to each of its neighbors

DTU Informatics

Department of Informatics and Mathematical Modelling

DHT: Pros and Cons

• Advantages:

‣ completely decentralized (no need for superpeers)

‣ routing algorithm achieves low hop count (O(log n))

‣ storage cost per node: O(log n)

‣ if a data item is stored in the system, the DHT guarantees that the data is found

• Disadvantages:

‣ objects are tracked by unreliable nodes (which may disconnect)

‣ keyword-based searches are more diﬃcult to implement than with superpeers (because objects are located by their objectid)

‣ the overlay must be structured according to a given topology in order to achieve a low hop count

‣ routing tables must be updated every time a node joins or leaves the overlay

DTU Informatics

Department of Informatics and Mathematical Modelling

Comparison of Basic Lookup Concepts

DTU Informatics

Department of Informatics and Mathematical Modelling

Data Location: Superpeers

• Two-level overlay: use superpeers to track the locations of an object [Gnutella 2, BitTorrent]

‣ Each node connects to a superpeer and advertises the list of objects it stores

‣ Search requests are sent to the supernode, which forwards them to other super nodes

‣ Advantages: highly scalable

‣ Disadvantages:

✓superpeers must be reliable, powerful and well connected to the Internet (expensive)

✓superpeers must maintain large state

✓the system relies on a small number of superpeers

DTU Informatics

Department of Informatics and Mathematical Modelling

Data location: Superpeers Example

• A two-level overlay is a partially centralized system

• In some systems, superpeers may be disconnected (e.g., BitTorrent)

DTU Informatics

Department of Informatics and Mathematical Modelling

Data Location - Loosely Structured Overlays

• Loosely structured networks: use hints for the location of objects [Freenet]

‣ Nodes locate objects by sending search requests containing the objectId

‣ Requests are propagated using a technique similar to flooding

‣ Objects with similar identifiers are grouped on the same nodes

DTU Informatics

Department of Informatics and Mathematical Modelling

Loosely Structured Overlays (cont.)

• A search response leaves routing hints on the path back to the source

• Hints are used when propagating future requests for similar object ids

DTU Informatics

Department of Informatics and Mathematical Modelling

Loosely Structured Overlays: Pros and Cons

• Advantages:

‣ no topology constraints, flat architecture

‣ searches are more eﬃcient than with plain flooding

• Disadvantages:

‣ does not support keyword-based searches

‣ search requests have a TTL

‣ do not guarantee a low number of hops, nor that the object will be found

DTU Informatics

Department of Informatics and Mathematical Modelling

Data Location - Classification

• Classification of some (well known) P2P middleware according to structure and decentralisation

DTU Informatics

Department of Informatics and Mathematical Modelling

Gnutella Protocol

DTU Informatics

Department of Informatics and Mathematical Modelling

Gnutella: Brief History

• Nullsoft (a subsidiary of AOL) released Gnutella on March 14th, 2000, announcing it on Slashdot

• AOL removed Gnutella from Nullsoft servers on March 15th, 2000

• After a few days, the Gnutella protocol was reverse-engineered

• Napster was shutdown in early 2001, spurring the popularity of Gnutella

• On October 2010, LimeWire (a popular client) was shutdown by court's order

DTU Informatics

Department of Informatics and Mathematical Modelling

Gnutella

• Gnutella is a protocol for peer-to-peer search, consisting of:

‣ A set of message formats

✓5 basic message types

‣ A set of rules governing the exchange of messages

✓Broadcast

✓Back-propagate

✓Handshaking

‣ An hostcache for node bootstrap

DTU Informatics

Department of Informatics and Mathematical Modelling

Gnutella Topology: Unstructured

DTU Informatics

Department of Informatics and Mathematical Modelling

Gnutella Routing

DTU Informatics

Department of Informatics and Mathematical Modelling

Gnutella Routing

DTU Informatics

Department of Informatics and Mathematical Modelling

Gnutella Routing

DTU Informatics

Department of Informatics and Mathematical Modelling

Gnutella Routing

DTU Informatics

Department of Informatics and Mathematical Modelling

Gnutella Messages

• Each message is composed of:

‣ A 16-byte ID field uniquely identifying the message

✓randomly generated

✓not related to the address of the requester (anonymity)

✓used to detect duplicates and route back-propagate messages

‣ A message type field

✓PING, PONG

✓QUERY, QUERYHIT

✓PUSH(for rewalls)

‣ A Time-To-Live (TTL) Field

‣ Payload length

DTU Informatics

Department of Informatics and Mathematical Modelling

Gnutella Messages

• PING (broadcast)

‣ Used to maintain information about the nodes currently in the network

‣ Originally, a “who's there" flooding message

‣ A peer receiving a ping is expected to respond with a pong message

• PONG (back-propagate)

‣ A ping message has the same ID of the corresponding ping message

‣ Contains:

✓address of connected Gnutella peer

✓total size and total number of files shared by this peer

DTU Informatics

Department of Informatics and Mathematical Modelling

Gnutella Messages

• QUERY (broadcast)

‣ The primary mechanism for searching the distributed network

‣ Contains the query string

‣ A servent is expected to respond with a QUERYHIT message if a match is found against its local data set

• QUERYHIT (back-propagate)

‣ The response to a query

‣ Has the same ID of the corresponding query message

‣ Contains enough info to acquire the data matching the corresponding query

✓IP Address + port number

✓List of file names

DTU Informatics

Department of Informatics and Mathematical Modelling

Beyond the Original Gnutella

• Several problems in Gnutella 0.4 (the original one):

‣ PING-PONG traﬃc

✓More than 50% of the traﬃc generated by Gnutella 0.4 is PING-PONG related

‣ Scalability

✓Each query generates a huge amount of traﬃc - e.g. TTL = 6; d = 10 ==> 10⁶ messages

✓Potentially, each query is received multiple times from all neighbors

In document Introduction to P2P Computing (Sider 49-78)