BitTorrent Protocol - 180D-FW-2024/Knowledge-Base-Wiki GitHub Wiki
Introduction
BitTorrent (often just called "torrenting") is a popular, openly specified P2P (peer-to-peer) file sharing protocol used in many applications by both consumers and companies.
At a high level, downloaders (leechers) use a torrent client and .torrent files to fetch data from uploaders (seeders) across a decentralized network. The decentralization of torrents allows for high peer scalability and efficiency, which adds to the allure of BitTorrent over other protocols.
Rather than transferring one big file at once, the peers of a torrent share small pieces of the file over time. This can reduce the load on big servers and make file sharing more efficient across larger networks. The protocol is especially popular for distributing large files like software packages, movies, or datasets.
Protocol Design
To begin the torrenting process, a user needs to create a .torrent file. The contents of the original file are split into fixed-size pieces, each piece is hashed using SHA-1, and the hashes are embedded as metadata within the new file. Along with these hashes, the .torrent file includes additional info such as the file name, file size, piece length, and the tracker URL.
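As a rough illustration of the hashing step, the sketch below splits a byte string into fixed-size pieces and computes one SHA-1 digest per piece. The function name `piece_hashes` and the 256 KiB default piece length are assumptions made for this example; real clients choose the piece length based on the file size and also wrap the result in a bencoded metadata file, which is not shown here.

```python
import hashlib


def piece_hashes(data: bytes, piece_length: int = 256 * 1024) -> list:
    """Split data into fixed-size pieces and return the SHA-1 digest of each.

    piece_length is an illustrative default, not a value mandated by the text.
    """
    if piece_length <= 0:
        raise ValueError("piece_length must be positive")
    return [
        hashlib.sha1(data[offset:offset + piece_length]).digest()
        for offset in range(0, len(data), piece_length)
    ]


# Toy example: 10 bytes with a 4-byte piece length gives 3 pieces (4 + 4 + 2 bytes).
hashes = piece_hashes(b"0123456789", piece_length=4)
print(len(hashes), "pieces, each digest is", len(hashes[0]), "bytes long")
```

The last piece is usually shorter than the others, which is why the slice is allowed to run past the end of the data.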
Peer discovery
Trackers are servers that help peers locate one another, that is, find others who have the file or parts of it. Once a peer connects to the tracker, it receives a list of other peers that it can request pieces from. Trackers fall into two categories, public and private. Public trackers are open to anyone with internet access, while private ones are hosted on invite-only websites or within a business's network.
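To make the "list of other peers" more concrete, here is a minimal sketch of decoding such a list. It assumes the tracker answers in the commonly used compact format, where each IPv4 peer takes 6 bytes (4 for the address, 2 for the port, both in network byte order); trackers can also reply in other formats. The name `parse_compact_peers` is made up for this example.

```python
import ipaddress
import struct


def parse_compact_peers(blob: bytes) -> list:
    """Decode a compact peer list into (ip, port) tuples.

    Assumes 6 bytes per peer: 4-byte IPv4 address + 2-byte big-endian port.
    """
    if len(blob) % 6 != 0:
        raise ValueError("compact peer list length must be a multiple of 6")
    peers = []
    for offset in range(0, len(blob), 6):
        ip_bytes, port = struct.unpack_from("!4sH", blob, offset)
        peers.append((str(ipaddress.IPv4Address(ip_bytes)), port))
    return peers


# One peer at 192.168.1.10 listening on port 6881 (0x1AE1).
print(parse_compact_peers(bytes([192, 168, 1, 10, 0x1A, 0xE1])))
```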
Swarming
Swarming is the simultaneous downloading of different pieces of a file from multiple peers in the network. Instead of downloading a file sequentially from a single source, swarming lets users request and retrieve parts of the file from various peers who already possess those parts. This parallel downloading can significantly increase speed and efficiency, since it uses the collective bandwidth of all participating peers.
Swarming Example
When a new movie or game file is released on a torrent platform, many users download it simultaneously. Each user acts as a downloader and uploader, contributing small pieces of the file to others.
Piece selection
To keep pieces evenly distributed and reduce bottlenecks, BitTorrent uses "rarest-first" selection, where peers prioritize downloading the pieces that are least available within the network. This means that pieces are usually downloaded out of order and are put back in order by the client. It also helps maintain a well-distributed swarm, where even less common pieces are likely to remain available.
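The rarest-first idea can be sketched in a few lines. The function below counts how many known peers hold each piece and picks one of the least available pieces that we still need. The names `pick_rarest_piece` and `peer_pieces` are illustrative, and breaking ties at random is an assumption of this sketch rather than something the text specifies.

```python
import random
from collections import Counter
from typing import Dict, Optional, Set


def pick_rarest_piece(have: Set[int],
                      peer_pieces: Dict[str, Set[int]],
                      rng=random) -> Optional[int]:
    """Return the index of a least-available piece we still need, or None."""
    availability = Counter()
    for pieces in peer_pieces.values():
        availability.update(pieces)          # count holders per piece
    candidates = [p for p in availability if p not in have]
    if not candidates:
        return None                          # nothing useful on offer
    lowest = min(availability[p] for p in candidates)
    rarest = [p for p in candidates if availability[p] == lowest]
    return rng.choice(rarest)                # assumption: random tie-break


# Piece 2 is held by only one peer, so it is requested first.
swarm = {"peer_a": {0, 1, 2}, "peer_b": {0, 1}, "peer_c": {0}}
print(pick_rarest_piece(have=set(), peer_pieces=swarm))
```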
To keep the network strong and reliable, BitTorrent also uses a "tit-for-tat" strategy, where peers are incentivized to upload data by receiving faster download speeds when they contribute more to the network. Users are encouraged to continue seeding files long after downloading, which helps the network grow.
Choking
BitTorrent controls its upload bandwidth by choking, where a seeder only sends data to a limited number of leechers at a given moment. Seeders select specific leechers through an "unchoking" process that favors peers actively sharing parts of the file. Optimistic unchoking is then used every 30 seconds or so to give a currently choked peer a chance, which allows new connections to form. If the optimistically unchoked peer reciprocates by uploading data, it will likely become a preferred peer in the regular unchoking rotation.
Choking Examples
- A peer with a slow or unreliable upload speed may be choked by others to avoid wasting bandwidth on inefficient transfers.
- A seeder limits uploads to the top 4 peers actively sharing data, while temporarily denying uploads to less active or non-contributing peers.
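The selection described above can be sketched as follows: rank peers by how much data they have recently sent us, unchoke the top few, and add one randomly chosen peer from the rest as the optimistic unchoke. The function name `choose_unchoked`, the `upload_rates` mapping, and the choice to run both steps in one call are simplifications for illustration; a real client would re-run the regular selection and the optimistic unchoke on separate timers.

```python
import random
from typing import Dict, Set


def choose_unchoked(upload_rates: Dict[str, float],
                    slots: int = 4,
                    rng=random) -> Set[str]:
    """Pick the peers to unchoke.

    upload_rates maps a peer id to the rate at which that peer has recently
    sent us data (hypothetical units). The top `slots` peers are unchoked,
    plus one random choked peer as the optimistic unchoke.
    """
    ranked = sorted(upload_rates, key=upload_rates.get, reverse=True)
    unchoked = set(ranked[:slots])           # regular unchoke slots
    choked = ranked[slots:]
    if choked:
        unchoked.add(rng.choice(choked))     # optimistic unchoke
    return unchoked


rates = {"p1": 50.0, "p2": 40.0, "p3": 30.0, "p4": 20.0, "p5": 10.0, "p6": 0.0}
print(sorted(choose_unchoked(rates)))
```

Because the optimistic slot ignores past contribution, a newly joined peer with nothing to offer yet still gets an occasional chance to receive data and start reciprocating.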
NAT and Port Management
While BitTorrent can use either TCP or UDP, it is more commonly seen with TCP, where each peer manages its own open TCP ports to accept connections. Torrent clients typically use port numbers from 6881 to 6889, but the port can also be configured by the user. Firewalls and routers often block incoming torrent connections, so many torrent clients support NAT traversal techniques like UPnP (Universal Plug and Play) to set up port forwarding on routers automatically.
Verification
Once a piece is downloaded, it is hashed and verified against the original SHA-1 hash stored in the .torrent file. This check ensures that every piece matches the original content. If a piece fails the hash check, it is discarded and the peer requests the same data again.
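The check-then-retry behaviour can be sketched like this. The helper names `verify_piece` and `store_or_retry`, and the use of a plain dict and list for storage and the retry queue, are assumptions for the example rather than how any particular client is structured.

```python
import hashlib
from typing import Dict, List


def verify_piece(piece: bytes, expected_sha1: bytes) -> bool:
    """Return True if the piece's SHA-1 digest matches the expected one."""
    return hashlib.sha1(piece).digest() == expected_sha1


def store_or_retry(index: int,
                   piece: bytes,
                   expected_hashes: List[bytes],
                   storage: Dict[int, bytes],
                   retry_queue: List[int]) -> bool:
    """Keep a verified piece, or discard it and queue the index for re-request."""
    if verify_piece(piece, expected_hashes[index]):
        storage[index] = piece
        return True
    retry_queue.append(index)                # piece is dropped, ask again later
    return False


expected = [hashlib.sha1(b"good data").digest()]
storage, retry_queue = {}, []
store_or_retry(0, b"corrupted", expected, storage, retry_queue)
store_or_retry(0, b"good data", expected, storage, retry_queue)
print(storage, retry_queue)
```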
BitTorrent Vs. Other Protocols
Vs. HTTP
BitTorrent differs from HTTP in being decentralized, whereas HTTP has a single server handle requests from all clients. This single-server model can strain the server and decrease download speeds when many people are downloading the same file. However, HTTP is simpler to implement, and because all data comes from a single server, it is much easier to monitor and enforce security. This makes it better suited to secure file transfers and smaller networks.
BitTorrent also differs in allowing for higher network efficiency by letting peers download and upload data at the same time. This can be powerful for smaller companies that have less server capacity, or when distributing large amounts of data.
Another big difference is the reliability of each protocol. BitTorrent relies on redundancy: if one peer disconnects, others still have parts of the file and can continue the transfer. This minimizes the risk of losing access to the file. However, it depends heavily on how big the swarm is; if there are not enough people online, a file might become unavailable. HTTP offers reliability through centralized infrastructure. If a server is well maintained, data will be available at a consistent speed. However, if the server fails or becomes overwhelmed, no peer-to-peer redundancy exists, and access to the file can be lost.
Like HTTP, FTP uses a single server for file requests, so BitTorrent has the same advantages and disadvantages relative to FTP.
Vs. IPFS
IPFS offers more privacy features through content addressing and, by default, encrypts more of the traffic sent over the network. Because BitTorrent's network is decentralized, anyone in a swarm can see the IP addresses of its peers, which leaves users exposed to potential privacy risks. Torrents can also spread malicious data such as corrupted files or malware quickly and over a wide network; corrupted pieces are only caught at the verification step of the protocol.
IPFS addresses the availability issue of BitTorrent by letting distributed nodes keep persistent copies of each file. It also allows files to be found and accessed as long as any node on the network has them, rather than relying on voluntary seeders.
IPFS is still a newer technology with less support and development behind it than BitTorrent, so there are still major tradeoffs in adopting this more secure protocol. BitTorrent is easily accessible and has multiple clients (uTorrent, qBittorrent, Deluge, etc.) that any average consumer can pick up and use.
Table Comparing HTTP, IPFS, and BitTorrent
Feature | HTTP | IPFS | BitTorrent |
---|---|---|---|
Type | Client-server | Distributed content network | Peer-to-peer file-sharing |
Data Storage | Centralized | Decentralized, content-addressable | Decentralized |
Reliability | Dependent on server availability | Content available as long as peers store it | Dependent on seeders/peers |
Scalability | Limited to server capacity | Scales with peer contribution | Scales with peer participation |
Speed | Server bandwidth-dependent | Peer-based, fast retrieval of content | Fast for popular files with many seeders |
Privacy | Server logs data requests | Somewhat private; no central server | IPs visible to peers |
Best For | Simple, one-to-one downloads | Hosting and sharing persistent content | Sharing large files efficiently |
A Simulation of Peer-to-Peer Network Efficiency Using Torrenting
To showcase a case where BitTorrent improves network performance, let's simulate a file distribution network where multiple peers exchange data in a swarm. Below I have written a simple Python script that simulates peer-to-peer efficiency compared to centralized HTTP downloads.
Simulation Code
import numpy as np
import matplotlib.pyplot as plt
# Simulation parameters
rng = np.random.default_rng(seed=0)  # Fixed seed so the plot is reproducible
num_peers = 50
file_size = 100  # Assume 100 chunks
http_speed = 5  # Chunks per second (centralized server)
bittorrent_speed = rng.integers(1, 10, num_peers)  # Per-peer upload speeds, 1-9 chunks per second
# Time calculation
http_time = file_size / http_speed
# Best case (perfect parallelism): the downloader pulls from every peer at once,
# so the per-peer speeds add up. Averaging them would model a single peer only.
bittorrent_time = file_size / np.sum(bittorrent_speed)
# Visualization
labels = ['HTTP Download', 'BitTorrent Download']
times = [http_time, bittorrent_time]
plt.bar(labels, times, color=['red', 'blue'])
plt.ylabel('Time to Complete Download (seconds)')
plt.title('BitTorrent vs HTTP Download Times')
plt.show()
The graph generated by this simulation shows that:
- HTTP takes longer due to its dependence on a single source.
- BitTorrent downloads are faster in the best-case scenario, which assumes that every peer is seeding and distributing pieces optimally (close to perfect parallelism). Real swarms fall short of this idealized bound.
A Data Driven Dive in Torrent Efficiency vs. Server Load
To empirically test the performance of BitTorrent versus traditional downloads, a comparison is conducted below:
Constants:
- Dataset: Ubuntu ISO (3 GB)
- Test Environment: 100 Mbps fiber-optic connection
Observations
Metric | HTTP Download | BitTorrent Download |
---|---|---|
Download Speed | 8 MB/s | 12 MB/s |
Server Load | High | Distributed |
Redundancy | None | High |
Download Consistency | Stable | Varies with seeders |
As more clients join, an HTTP server comes under strain, whereas BitTorrent benefits from additional peers contributing bandwidth. Even if a seeder goes offline in BitTorrent, the redundancy provided by multiple download sources keeps the file available, compared to the single point of failure of an HTTP download.
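To put the observed rates in perspective, the short sketch below converts them into total download times. It assumes 1 GB = 1000 MB and that each rate stays constant for the whole transfer, neither of which the table states; `download_seconds` is a helper name chosen for illustration.

```python
def download_seconds(size_mb: float, rate_mb_per_s: float) -> float:
    """Time in seconds to transfer size_mb megabytes at a constant rate."""
    if rate_mb_per_s <= 0:
        raise ValueError("rate must be positive")
    return size_mb / rate_mb_per_s


ISO_SIZE_MB = 3 * 1000          # 3 GB dataset, assuming 1 GB = 1000 MB
LINK_CEILING_MB_S = 100 / 8     # a 100 Mbps link carries at most 12.5 MB/s

http_seconds = download_seconds(ISO_SIZE_MB, 8)       # 375.0
torrent_seconds = download_seconds(ISO_SIZE_MB, 12)   # 250.0

print(f"HTTP: {http_seconds:.0f} s, BitTorrent: {torrent_seconds:.0f} s, "
      f"link ceiling: {LINK_CEILING_MB_S} MB/s")
```

Under these assumptions the BitTorrent download finishes about two minutes sooner and runs close to the 12.5 MB/s ceiling of the test connection.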
BitTorrent, A Hands-on Interactive Tutorial Using qBittorrent
For users who are unfamiliar with torrenting and want to experience the BitTorrent protocol, qBittorrent is a popular open-source client that provides a user-friendly interface while still offering advanced features for diving deeper into torrenting. These features include torrent creation and bandwidth management for when you want to start hosting.
Installation Setup
Downloading and Installing qBittorrent
- Visit qBittorrent's official website (https://www.qbittorrent.org/download) and download the recommended version for your operating system.
- Walk through the installation wizard to install the software.
Configuring qBittorrent
- Open qBittorrent and navigate to Tools → Options to configure settings.
- Under Connection, ensure that the Listening Port is open.
  a. If your machine is behind a router, enable UPnP/NAT-PMP for automatic port forwarding.
- Also remember to enable Encryption under the BitTorrent settings to make throttling by ISPs (Internet Service Providers) less likely.
Downloading a Torrent
- Obtain a .torrent file, ideally from a legal source such as Linux ISO torrents.
  a. You can also obtain a magnet link, a hyperlink that identifies the torrent so the client can fetch it over the peer-to-peer (P2P) network without a .torrent file.
- Click File → Add Torrent.
  a. If you have a magnet link, paste it into the Add Torrent Link option.
- Select a destination folder for the file and start downloading!
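For readers curious about what a magnet link actually carries, the sketch below pulls out its main fields with Python's standard library. It assumes the common layout in which `xt` holds the torrent's info-hash as `urn:btih:...`, `dn` a display name, and `tr` zero or more tracker URLs. The link in the example, including its hash and tracker address, is made up, and `parse_magnet` is a name chosen for this illustration.

```python
from urllib.parse import parse_qs, urlparse


def parse_magnet(link: str) -> dict:
    """Extract the info-hash, display name, and trackers from a magnet link."""
    parsed = urlparse(link)
    if parsed.scheme != "magnet":
        raise ValueError("not a magnet link")
    params = parse_qs(parsed.query)          # also decodes %-escapes
    prefix = "urn:btih:"
    info_hash = next(
        (xt[len(prefix):] for xt in params.get("xt", []) if xt.startswith(prefix)),
        None,
    )
    if info_hash is None:
        raise ValueError("magnet link has no BitTorrent info-hash")
    return {
        "info_hash": info_hash,
        "name": params.get("dn", [None])[0],
        "trackers": params.get("tr", []),
    }


# Hypothetical link: the hash and tracker below are placeholders.
example = ("magnet:?xt=urn:btih:0123456789abcdef0123456789abcdef01234567"
           "&dn=example.iso&tr=udp%3A%2F%2Ftracker.example.org%3A6969")
print(parse_magnet(example))
```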
Seeding and Uploading
- Once you have downloaded the file, leave the torrent to seed. This is the main part of torrenting: contributing back to the community!
The software will automatically share parts of the file with other peers, following the tit-for-tat strategy mentioned above.
Security Concerns Related to Torrenting with BitTorrent
IP Exposure:
Every peer in a torrent "swarm" can see your IP address. This is especially concerning because malicious actors and ISPs can track your torrent activity, on top of all the risks associated with an exposed IP address.
- A VPN or proxy can help here by hiding your IP address from the swarm; a VPN also encrypts your traffic.
- DHT blocklists could be set up to prevent connections from known bad IPs.
- Finally, some form of blanket authentication software that blacklists malicious actors could also be used against repeat offenders.
Malicious Torrents:
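It is difficult to verify the authenticity of pieces received from other peers, and some torrents contain malware or fake files. Verification via SHA-1 hashing mitigates part of this risk by rejecting data that does not match the hashes in the .torrent file. It cannot tell whether the original content was safe in the first place, so torrents should come from trusted sources.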
Conclusion
BitTorrent remains a powerful, decentralized protocol for file distribution that outperforms traditional client-server models in scalability and redundancy. However, users should take the necessary security precautions when torrenting to maintain privacy and ensure authentic downloads.
References
https://www.researchgate.net/publication/2473566_Incentives_build_robustness_in_BitTorrent
https://dl.acm.org/doi/10.1145/1402946.1402987