Why Gnutella scales quite well

You might have read in some (almost ancient) papers, that a network like Gnutella can't scale. So I want to show you, why the current Version of Gnutella does scale, and does it well.

In earlier versions, up to v0.4, Gnutella was a a pure broadcast network. That means, that every search request did reach every participant, so the number of search requests hitting each node was for an optimal network exactly equal to the number of requests, made by nodes who were in the network. And you can see easily why that can't scale.
But that was only true for Gnutella 0.4.

In the current incarnation of Gnutella (Gnutella 0.6), Gnutella is no longer a pure Broadcast network. Instead, only the smallest percentage of the traffic is done via broadcast.

If you want to read about the methods used to realize this, please have a look at the GnuFU guide (english, german).

Here I want to limit it to the statement, that the first two hops of a search request are governed via Dynamic Querying, which stops the request as soon as it has enough sources (this stops a search as soon as it gets about 250 results), and that the last two hops are governed via the Query Routing Protocol, which ensures, that a search request reaches only those hosts, which can actually have the file (which is only about 5% of the nodes).

So in todays reality, Gnutella is a quite structured and very flexible network.

To scale it, Ultrapeers can increase their number of connections from their current 32 upwards, which makes Dynamic Querying (DQ) and the Query Routing Protocol (QRP) even more effective.

In the case of DQ most queries for popular files will still provide enough results after the same number of clients have been contacted, so increasing the number of connections won't change the network traffic at all which is caused by the first two steps.

In the case of QRP, queries wil still only reach the hosts, which can have the file, and if Ultrapeers are connected to more nodes at the same time (by increasing the number of connections), it will provide more results for each connection, so DQ will stop even earlier than with fewer connections per Ultrapeer.

So Gnutella is now far from a broadcast model, and the act of increasing the size of the Gnutella Network can even increase its efficiency for popular files.

For rare files, QRP kicks in with full force, and even though DQ will likely check all other nodes for content, QRP will make sure that only those nodes are reached, which can have the content, which might be only 0.1% of the net or even far less.

Here, increasing the number of nodes per Ultrapeer means that nodes with rare files are in effect closer to you than before, so Gnutella also gets more efficient when you increase the network size, when rare file searches are your major concern.

So you can see, that Gnutella has become a network, which scales extremly well for keyword searches, and due to that it can also very efficiently be used to search for metadata and similar concepts.

The only thing which Gnutella can't do well are searches for strings which aren't seperate words (for example file-hashes), because that kills QRP, so they will likely not reach (m)any hosts. For these types of searches, the Gnutella developers work on a DHT (Distributed Hash Table), which will only be used, if the string can't be split into seperate words, and that DHT will most likely be Kademlia, which is also proven to work quite well.

And with that, the only problem which remains in need of fixing is spam, because that inhibits DQ when you do a rare search, but I am sure that the devs will also find a way to stop spamming, and even with spam, Gnutella is quite effective and consumes very little bandwidth, when you are acting as a leaf, and only moderate bandwidth when you are acting as ultrapeer.

Some figures as finishing touch:

  • Leaf network traffic: About 1kB/s if you add outgoing and incoming traffic, which is about the seventh part of the speed of a 56k modem.
  • Ultrapeer traffic: About 7kB/s, outgoing and incoming added together, which is about one full ISDN line of less than 1/8th of a DSLs outgoing speed.

Have fun with Gnutella!
- ArneBab 08:14, 15. Nov 2006 (CET)

PS: This guide ignores, that requests must travel through intermediate nodes. But since those nodes make up only about 3% of the network and only 3% of those nodes will be reached by a (QRP-routed) rare file request, it seems safe to ignore these 0.1% of the network in the calculations for the sake of making it easier to follow them mentally (QRP takes care of that).

Inhalt abgleichen
Willkommen im Weltenwald!

Beliebte Inhalte

sn.1w6.org news