peer-to-peer networks that worked
core challenges, routing, and privacy.
This 90 minutes talk describes how peer-to-peer networks managed to interactively search 4 million computers with the bandwidth of 2004, which goals they failed to reach, and what information the networks expose as part of their operation.
It starts with the core challenges all fully distributed systems face and then describes how four peer to peer systems solved them: Gnutella (for efficient keyword search), Kademlia (for efficient hash search), BitTorrent (for its simplification), and Freenet/Hyphanet (for privacy and small world routing).
I gave it as guest talk for a resilient networking lecture at KIT. The slides are created from the same source as this article.
Intro
intro
peer-to-peer: where decentralization works/worked
DO NOT OBEY IN ADVANCE
- You are the ones who build the rules of the future.
- You are the last line of defense against horrors.
- If you do not build it, it will not be built (well enough to work).
- Whatever you are told: you are not replaceable.
Do not obey in advance.
https://media.ccc.de/v/38c3-opening-ceremony#t=1343
You knew whom you invited.
Why?
Understand p2p-networks that worked in real life
after a few days (and especially nights) of nervous full-site tinkering, it turned a 40 minute deploy process into one that lasted just 12 seconds!1
- Bittorrent-Deployment: https://vimeo.com/11280885
Understand why deployment improved but did not match expectations.
Spoiler: Cut-through routing.
Me
- Since 2004 in p2p-Development
- Since 2013 with competence
- Since 2017 Release-Manager of the Freenet/Hyphanet project
- Until 2017 PhD and PostDoc in physics
- Since 2017 Software Developer as job
Topics
Topics today
- Core challenges
- Gnutella (the first widespread, fully distributed p2p network)
- Kademlia (the most widespread DHT)
- BitTorrent
- Freenet/Hyphanet
p2p?
- peer-to-peer (p2p)
- peers (equal partners) cooperate to grant a service to each other.
Core challenges
challenges
Topics
The core challenges p2p systems have to solve or circumvent.
Core requirements
Entry: How to find the right place?
- First addresses: How to find addresses of other nodes?
- Choose connections: With whom should I connect?
Routing-Information: Which data do nodes need?
Structured vs. unstructured
- Unstructured
- First addresses: A Simple list
- Choose connections: Choose at will
- Routing-Information: Exchange explicitly
- Structured
- First addresses: Needs topology2
- Choose connections: Only some are useful
- Routing-Information: By choosing peers
- Can I reach all directly?
Search: What to look for?
- Keyword: Gnutella, Skype (before MS)
- Content-Hash: Kademlia, BitTorrent VHT, Freenet/Hyphanet
- Public Key: Freenet/Hyphanet

- BitTorrent VHT
- Distributed Hash Table, a DHT
- DHT
- Distributed Hash Table
- Public Key
- The mirror of the private key in asymmetric cryptography
Search: Where do I find what I need?
Two concepts:
- Search Paths to existing data: Gnutella
- Put the data in the right place: Kademlia, BitTorrent VHT, Freenet/Hyphanet
Distribution: How to avoid bottlenecks?
- Centralized: Streaming by the ISP via multicast
- Swarming: users take part in the distribution
- Coordinated by a central place: BitTorrent (Tracker)
- Coordinated by other users: Gnutella (Download-Mesh)
- Independently distributed chunks: Freenet/Hyphanet3

- Download-Mesh:
- name of the protocol
- Tracker:
- A server coordinating a BitTorrent-Swarm
Communication: How can information flow?
- One-to-one (PM/DM/msg/Anruf/…)
- Group discussion (Chat, Forum, Video call, …)
- Public discourse
- Learn about new content
- Metadata about content (comments, rating, …)

Resistance against disruption
- Disruption
Anything that reduces the quality of the service for users
- Required on every level
- Entry: connecting to attackers
- Search: spam, misinformation
- Distribution: poison files
- Communication: spam, harassment and censorship4
Gnutella
Gnutella
Gnutella
On March 14th, 2000, … an early version … with a note: “Justin and Tom work for Nullsoft, makers of Winamp and Shoutcast. See? AOL can bring you good things!” …
AOL ordered him to take the program down immediately … calling Gnutella an “unauthorized freelance project.” … hackers had gone … to reverse-engineer it … into the hands of the open-source community …
— The World's Most Dangerous Geek; Interviewed by David Kushner; RollingStone.com; January 13, 2004.
The simple TCP based protocol improved. By 2008 about 50 million people used it every day. It mostly disappeared after the main development companies lost in court. The tech is almost forgotten.
Topics
- Users perspective: this was Gnutella
- Entry: GWebCaches
- Search: Slow-Start + Keyword-Multicast
- Distribution: Download-Mesh
- Communication: See new files and brows the collection
- Resistance against disruption: Heuristics or rating matrices
Users perspective
- 50 million nodes
- Search by filename and ID3 tags
- Filter to see only creative commons licensed contentnp
- Search by newest files
- Download from many sources without central coordination
- Audio streaming around 2004 (“preview”)
- LimeWire, Bearshare, Shareaza, Phex, gtk-gnutella, …
How it looked

von Wikipedia: LimeWire_screen.png
Entry
Entry: addresses
- List of the recently connected good nodes
- UDP Host-Caches: Tiny servers that collected IP-lists and provided the most recent ones
Example: GhostWhiteCrab5
Entry: Connecting
HTTP-Handshake with feature negotiation, then binary over TCP socket + out of band answers via UDP.
Search and distribution
Search process
Distribution in Gnutella: Out-of-Band
Originally sent back via the search path, but:
- 5 Steps
- Average life time of a node:6 2h
- => Half the connections broke after 24 minutes
Solution: Download-Mesh, independent of the search.
Disadvantage: All IPs always visible.

Distribution: Download-Mesh
- Standard HTTP Range-Requests
- Content-Addressed: HOST/uri-res/raw/urn:sha1:HASH7
- 5 additional Headers:8
- X-Alt
- Checked source for the file, IP/Port
- X-NAlt
- Unreachable source or node with corrupted data. IP/Port
- X-Gnutella-Content-URN
- Merkle-Tree Root-Hash
- X-Thex-URI
- /uri-res/N2X?urn:sha1:HASH;MERKLE_TREE_ROOT
- X-Available-Ranges
- bytes 0-10,20-30 (example)
Limitations and summary
Communication: Weakness
- Chat never worked well
- No lasting contact to others
- Working:
- “What’s new?” (via LimeWire: See new files)
- Search collection (see all shared files)

Resistance against disruption: Heuristics as Spam-Filter
Similar to E-Mail spam filters, partially user-configurable.
Reduced Spam to 10-20% of the results.9
Resistance against disruption: object trust via Credence
- Every correctly named/good file: 1.0
- Every falsely named/bad file: -1.0
- Ratings of others multiplied with the correlation of files both rated.
With proof that spammers would have to rate enough files correctly to spread false ratings that multiple spamming parties would cancel each other out.
Never added to a mainstream program; LimeWire already in court.
Privacy and Summary
Privacy
- IP visible in searches and downloads (originally not; direct downloads as optimization)
- Ultrapeers know their leafs
- Downloader of a file knows all other downloaders (but not how much all have)
Remaining weaknesses 2010
- 10-20% Spam-Results despite 50 million users
- Credence was never widespread
- One step flooding: Windows (home)limited the connection count
- Required parameter-adaption during growth
- No comments, Peer-Chat never worked well
- Bad support for Asian fonts
Summary Gnutella
- Efficient Search for keywords
- TCP-based binary protocol, 50 mio users, 1kiB/s Leaf, 14kiB/s Ultrapeer
- Entry: WebCache-Server + Exchange QRT (like Bloom-Filters)10
- Search: Slow-Start + QRT Routing
- Distribution: Download-Mesh (mini-network per download)
- Resistance against disruption: Heuristics or object trust
Kademlia
Kademlia
Kademlia
Lookup in a distributed Hash-Table (DHT) with xor-metric.
- Users’ perspective
- Search
- Entry (uses search)
Search
Search in Kademlia
- Every node has a random ID
- Search by Hash → Distributed Hash Table
- Distance between Hash und ID via xor-Metric11
- Step by step in O(log(N)) to the nearest node
Simlar: Chord, Pastry.

Search for b91

Store
- Search for node nearest to the hash.
- STORE: Hash + Value.
Think put(key, value)
⇒ “Distributed Hash Table”.
Entry
Entry in Kademlia
- Needs contact to at least one existing node.
- Search your own ID:
FIND_NODE
(near = responsible for ID) - Response contains addresses and IDs
of nodes touched Contacted nodes also keep your address and ID.
Privacy and Sumary
Privacy
- IPs of downloaders and uploaders known to responsible nodes
- Hashes of searches & searcher IP known to nodes on the path
- All addresses must be reachable globally
Summary
- Distance: key-hash XOR node-ID
- Search: ask closest known node for better nodes
- knows more close nodes than remote nodes
- Store works like searching: store where a search would land
- Entry:
- Search for own ID
- Touched nodes use address and ID
BitTorrent Downloads
BitTorrent
BitTorrent
- Most widespread solution for swarming
- BitTorrent, IPFS, Blizzard-Updater
- Upload to get faster Downloads
- Coordinated by centralized Tracker: avoids complexity
- No decentralized search
Users’ perspective
- Information from Tracker-sites
- Download with torrent-file or Magnet-Link
- Supports folders
- Today: ipfs: Websites via BitTorrent
- Supports NAT-Traversal
- Can hide the IP via Tor (as SOCKS5 proxy)
Distribution
Concept of BitTorrent
Torrent-File
- Tracker URL(-s)
- Hashes for Chunks
- Names of the File(-s)
- Can contain folders12

Incentive to Upload
- Fraction Upload/Download is checked
- Freeloaders13 are throttled by other clients (choked: lower download rate)
- Published research gave strong focus on the incentive, in practical use the forums make more of a difference
- Tracker with login: private groups
Extended usage
Related
- VHT in addition/instead of Tracker possible (Kademlia)
- Free and Open Protocol with many implementations
- Development within the Community
- IPFS uses Torrents as decentralized cache for Websites
Privacy and summary
Privacy
- Tracker knows
- who downloads
- how much they have
- how much they upload
- Users know what the tracker tells them
- No search, except by Hash: private trackers stay private
- IP hiding via Tor is possible but blocked by most trackers
Summary
- Tracker and Clients
- Tracker: statistics and coordination
- Torrent-File with Piece-Info
Freenet/Hyphanet
Hyphanet
Freenet/Hyphanet
Censorship-resistant, privacy respecting communication on friend-to-friend network
Decentralized database with pubkey-access.
Users’ perspective
- Web-interface: decentralized websites
- Plugins with E-Mail and Microblogging
- External programs like Chat and forums with Freenet/Hyphanet as database using HTTP-like API (FCP)
Entry
Entry in Freenet/Hyphanet
- Opennet:
- Similar to Kademlia: choose known Seednode14, Seednode searches ID → node references
- Difference to Kademlia: Not just IP: referenz with keys
- Friend-to-Friend:
- Fixed connections
Knoten swap their IDs, to reconstruct
the social Small-World-Network
⇒ minimize Overlay-costs.
Small-World-Network (scale free network)
- Many short and few long connections.
- 6 deegrees of separation via snail mail: Our acquaintances formm a small-world network
- Kleinberg-Network: Probability to be connected: \(\frac{1}{d^x}\), d = distance, x = dimension.
- Freenet/Hyphanet: \(x=1\)
Search
Freenet/Hyphanet Search
- Like Kademlia, but forwarded hop by hop → no global reachability or visibility
- Can search by public key
- Keyspace: [0.0 : 0.1)


Types of Keys
- CHK: Content Hash
- KSK: Keyword Subspace: Password
- SSK: Signed Subspace: Public Key
- USK: Updatable Subspace: SSK with version
Format: XXK@routing,encryption/tarball-name/path/to/file.ext
Can omit path and name (smaller → optimization).
Distribution
Distribution in Freenet/Hyphanet
- Network saves content → distributed Cache
- Files saved encrypted, as 32 kiB fragments with 100% redundancy
- Manifest contains keys of the fragments as CHKs
- Limited lifetime: Effectively LRU-Cache:15
- Saving overwrites randomly chosen fragments
- Access restores overwritten fragments
- Upload to exsting key+path: collision
→ Effectively immutable

Usage
Freenet/Hyphanet as Database
- Search by Public Key + Path
- → personal keyspace
- → tarballs for structured data (i.e. website)
- → pub-sub-protocols on decentralized database
- → Websites, Forums, Chat, …
Optimization: Subscribe to keys to be able to watch 10k keys and see updates quickly.
Getting low latency
- Up to 1kiB, raw, realtime mode: <20s; 30s – 90s RTT
- Larger files, in manifest (folder): ~5 min
Must do it exactly right or it will be slow. Like 14kB websites.
Spam-protection
WoT (Web of Trust): One of two possibilities in actual use. The other is FMS (Freenet/Hyphanet Message System).
- ID = USK
- Trust -100 to 100
- Rank: Distance → capacity
- Rank 1 40 % to 1 %
- rank 1: 100 trust = +40 points score.
- Score: Sum of all ratings: trust * rank
- Can scale to arbitrary size at 22 messages per day and person16
Communication via Freenet/Hyphanet
Privacy and Summary
Privacy
- Seednodes can enumerate all opennet nodes.
- With pure Friend-to-Friend mode, ISPs may recognize a high volume of encrypted UDP packets.
- Directly connected peers can see that a search was started by or forwarded by a node (HTL with randomized decrement).
- With friend-to-friend mode, only trusted nodes can be directly connected. Needs key exchange with friends.
- Nodes can see whether chunks for which they are the best location are accessed but not by whom or how often.
- IDs in communication are stable pseudonyms: rotated session keys — ratchet — only in custom implementations.
Summary
- Entry: Search at a seednode for my ID
- Search: Greedy Hash search on Small World Network
- Distribution: Chunk-Tree with Redundancy
- Communication:
- Entry: Seed-keys + CAPTCHA-Queue
- Search: Index-Pages, Subscribe for Updates
- Distribution: Upload files, websites
- Propagating Trust with slowly rising visibility
Closing
Aside
Aside
- Guarantees
- Magnet-Links
- WebRTC
- Lessons learned
Guarantees
They scale by giving few guarantees. From strong to weak:
- Tampering: All networks here prevent tampering with a file being downloaded.
- Access control: You need the keys to files to gain access: hashes or public keys.
- Availability: None of the p2p networks here guarantee it:
- Data may not exist
- Connections may break
- Names may be wrong
To give availability guarantees, take CRDTs or similar as starting point for proofs.
Magnet-Links
magnet:?xt=urn:bitprint:TIGER_TREE.SHA1 &xt=urn:btih:BITTORRENT_INFO_HASH &xt=urn:sha1:HASH &xl=LENGTH &dn=NAME &as=LINK_WITHOUT_HASH &xs=LINK_WITH_HASH &kt=SEARCH_STRING
Netzwerk-unabhängig, Link zu HTTP und p2p-Quellen, weitverbreitet
WebRTC
- Runs in the Browser (Javascript)
- Provides Audio, Video, …, and Peer-Socket
- First connection moderated by server – avoids many problems
- p2p-Systems, that don’t need installation
- Example: WebTorrent https://webtorrent.io/
Lessons Learned
Thoughts
- All Upload Queues are always full. Like all disks are always full. Ask the Large Scale Data Facility at KIT/SCC.
- Optimization for ISPs often thought about: Prefer clients in same (sub-)net. Gnutella: „p4p“. Pastry (Windows) uses it according to Ghosh.
- Example for non-greedy routing19: Random Walk in ants (programm). Did not gain traction.
- Throwing money on problems: MaidSafe had 2000$ hardware cost per month. Shut that down 2019.20 Freenet/Hyphanet has <20$ cost per month.
Fallacies of distributed Systems, extended version
- The network is reliable
- The network is secure
- The network is homogeneous
- Topology does not change
- Latency is zero
- Bandwidth is infinite
- Transport cost is zero
- There is one administrator
- Hard disks don’t fail
Files stay intact - Power is stable
- IPs are reachable
- Constant factors are negligible
- APIs stay compatible
- Textfiles are simple
- Hard disks don’t fail
Legacy
Current developments
What’s happening today:
- Spritely Golem: p2p distributable content for the fediverse21
- Decentralized Internet and Privacy at FOSDEM22
- DAT, GNUnet, Fediverse, Tor, …
- In Karlsuhe: 23. Gulaschprogrammiernacht: https://entropia.de/GPN23/en 19. bis 22. Juni 2025
Summary
Summary: Core challenges
Summary: Implementations
Einstieg | Suche | |
---|---|---|
Gnutella | WebCache | Slow-Start + Keyword-Multicast |
Kademlia | Search own ID | xor-Hash-hierarchy |
BitTorrent | Tracker-URL | Kademlia / Tracker / Web |
Freenet/Hyphanet | Seed-Nodes search ID | Greedy on Small World |
WebRTC | WebRTC Server | - |
Verteilung | Störung | |
---|---|---|
Gnutella | Alt+NAlt, Range, Merkle-Tree | Heuristik/Credence |
Kademlia | various | - |
BitTorrent | Torrent | Rating on Tracker |
Freenet/Hyphanet | Chunk-Tree with Redundancy | Propagating Trust |
WebRTC | - | - |
Good luck!
My wish is that 5 years from now
some of you look back and say:
“What I learned in the p2p lecture
was one of the pillars of my success.”
Appendix
\appendix
German Backup Slides
Weiteres
Weitere Knoten finden: X-Try
Beim Handshake (wie HTTP):
When rejecting a connection, a servent MUST, if possible, provide the remote host with a list of other Gnutella hosts, so it can try connecting to them. This SHOULD be done using the X-Try header. An X-Try header can look like: X-Try:1.2.3.4:1234,3.4.5.6:3456
Weitere Knoten finden: Pong
Pong messages contains information about a Gnutella host. The message has the following fields Bytes: Description: 0-1 Port number. The port number on which the responding host can accept incoming connections. 2-5 IP Address. The IP address of the responding host. Note: This field is in big-endian format. … * When a Ping message is received (TTL>1 and it was at least one second since another Ping was received on that connection), a servent MUST, if possible, respond with a number of Pong Messages. These pongs MUST have the same message ID as the incoming ping, and a TTL no lower than the hops value of the ping.
→ http://rfc-gnutella.sourceforge.net/src/rfc-0_6-draft.html
Größe der Query Routing Tabellen in Gnutella
- Hashes: Normalisierte Suchwörter in der Suchanfrage oder im Dateinamen
- Größe: Variabel, Default in LimeWire 128kiB, interpolation auf größere und kleinere Tabellen möglich.
- Aktuell verfügbare Quelle: BitSetQRTTableStorage.java
- Hash-Funktion pro Suchwort: HashFunction.java
Suche 4: Dateien nach Hash finden
Gnutella Routing Experiment
- Peers: Tisch + davor + dahinter
- Letzte 2 Hops
- Suche nach Namen
- Hash = 1. Buchstabe
- QRT25: Hash der Namen der Peers
- Intra-UP QRT: QRTs der Peers, zusammengefasst
Was müsst ihr vorher austauschen?
Warum p2p?
- Skalierbarkeit
- Ein einzelner Server bricht bei etwa 100k Anfragen pro Sekunde ein. dwd bei Sturm Sabine 2020?
- Mit Nutzung wachsen
- Ähnliche Infrastruktur für 1000 Leute oder 10 Millionen Leute
- Infrastrukturkosten
- 100k€ pro Jahr = Entwickler oder Entwicklerin
Warum nicht p2p?
Schlüssel zum Licht

Störquellen
- Sammeln am Flipchart
- Quellen
- Parasiten: Bessere Leistung auf Kosten Anderer (leecher).
- Trolle: Kein Finanzinteresse, minimale Resourcen, nutzen jegliche Lücke.
- Spammer: Erfolg durch Verbreitung eigener Inhalte.28
- Konkurrenten: Erfolg durch verringerte Qualität des Systems.
- Angreifer: Erfolg durch Schädigung von Nutzern.
Weitere Eigenschaft: Grad der Verteilung
Serverkoordinierte Teilgruppen bis vollständig dezentrale Interaktion.
Suche 1: Slow-Start
„Dynamic Querying“ (DQ)
- Leaf fragt einen UP nach dem anderen. Stoppt nach „genug“ Ergebnissen (um die 100).
- UP fragt Leafs und andere UPs. Stoppt nach „genug“ Ergebnissen.
Suche 2: Keyword-Multicast
Query Routing Protocol (QRP)
- Suchwörter normalisiert:29 lowercase, keine Akzente, …
- Query Routing Table (QRT): Set mit schwachen Hashes von normalisierten Suchwörtern
- Automatisch hochskaliert für gewünschten Füllgrad
Intra-Ultrapeer-QRP:
- Vereinigung der Tabellen
Ähnlich: Bloom-Filter
Suche abschicken
<15 bytes GUID>0x00 0x80 ; message type: Query 0x07 ; TTL: 7 0x00 ; Hops 0 0x00,0x00,0x09 ; payload length, max: 4kiB 0x00,0x00 ; min speed test foo ; payload: search criteria 0x00 ; null-terminator, begins extensions
- GUID
- Globally Unique ID. Zufällig erstellt, um Schleifen zu vermeiden.
Mutability: O(1) Zugriff auf neuste Version
- Nutzende:
SSK@…/meine-seite-1/...
→SSK@…/meine-seite-2/activelink.png
- Optimiert:
USK@…/meine/seite/1
SSK@[key]/[sitename]-DATEHINT-[year]
HINT 46 2013-7-5
DATEHINT-[year], DATEHINT-[year]-WEEK-[week], DATEHINT-[year]-[month], DATEHINT-[year]-[month]-[day]
Capacity

- Rank 1 40 %. rank 1: 100 trust, 40 Punkte als Score.
- Rank 2 16 %
- Rank 3 6 %
- Rank 4 2 %
- Rank 5 und niedriger: 1 %
Integer-Mathematik: 2 * 6 / 100 = 0
.
Theoretische und gemessene Link-Längen

Swapping: Friend-to-Friend wird Small World
Mein Ziel
Ich will, dass Sie die Fähigkeiten erwerben, unter denen zu sein, die die Deployment Zeit um Größenordnungen verringern, ohne dabei die Kosten dafür zu zahlen, Torrents als Blackbox zu sehen.
- Torrent
- Bezeichnung für eine BitTorrent-Datei oder eine von BitTorrent verwaltete Datei.
- BitTorrent
- Ein p2p-System zum Verteilen großer Datenmengen; Verwaltung läuft auf zentralisierten Trackern
Projektideen
- Download-Mesh implementieren
- Nur Range-Requests + magnet für Quellen
- Quellen-Gossip via XAlt30
- Mit Merkle-Tree oder hashliste für chunks und mit XNalt
- Suche über WebRTC in Javascript
- flooding über vereinfachtes Binärprotokoll
- QRP / QRT
- Sharing als Upload in local storage
- GGEP: Generic Gnutella Extension Protocol; Binarprotokoll für beliebige Daten.
Footnotes:
Topology: structure of the network.
Reduces Swarming to downloading many single files but requires caching: saving transported files temporarily.
“The Internet treats censorship as a malfunction and routes around it.” – John Perry Barlow
gwc resource: https://github.com/gtk-gnutella/gwc
2h lifetime are surprisingly persistent.
Could this be used in a webshop? Who needs which guarantees?
Set of weak hashes of the search words, number of keys scaled and interpolated dynamically
xor-Metric: 4 xor 2 ⇒ 100 xor 010 ⇒ 110 ⇒ 6.
Freeloader: People who don’t upload. Also: „Leech“. Inverse: „Seed“.
Seednode: known node that arranges connections to others.
LRU: Least Recently Used. Delete oldest first.
CAPTCHA: Usually images with letters that are part of a watched key.
Gossip: attach information to normal communication to distribute it.
Greedy-Routing: forward requests with local information to the best node.
\tiny https://fosdem.org/ — viele Vorträge zu decentralization, privacy, …
Magnet-Links liefern Infos für Downloads in leicht kopierbarem Link.
kt=…: Suchanfrage, wurde kaum genutzt. Weiteres: https://en.wikipedia.org/wiki/Magnet_URI_scheme#Design
QRT: Query Routing Table.
dwd: Deutscher Wetterdienst.
⇒ gibt es eine einfachere Lösung?
Werbung ist Spam durch die genutzte Plattform.
ungelöst: Japanische oder Chinesische Zeichen.
XAlt/XNalt: Header, der gute / kaputte Quellen beschreibt.