BitTorrent Protocol - 180D-FW-2024/Knowledge-Base-Wiki GitHub Wiki

Introduction

BitTorrent (often shortened to "torrent") is a popular, openly specified peer-to-peer (P2P) file-sharing protocol used in many applications by both consumers and companies.

At a high level, downloaders (leechers) use a torrent client and a .torrent file to fetch data from other peers across a decentralized network, including uploaders that already have the complete file (seeders). This decentralization gives torrents high peer scalability, efficiency, and resilience, which add to the appeal of BitTorrent over other protocols.

Rather than transferring one big file at once, the peers in a torrent exchange the file in small pieces over time. This reduces the load on central servers and makes file sharing more efficient across large networks. The protocol is especially popular today for distributing large files such as software packages, movies, and datasets.


Protocol Design

To begin the torrenting process, the person sharing the content creates a .torrent (metainfo) file. The original file is split into fixed-size pieces, each piece is hashed with SHA-1, and the hashes are embedded as metadata in the new file. Along with these hashes, the .torrent file includes additional information such as the file name, file size, piece length, and the URL of a tracker.
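As a rough illustration of what goes into a .torrent file, the Python sketch below splits data into fixed-size pieces, hashes each piece with SHA-1, and bencodes a minimal single-file metainfo dictionary (bencoding is the serialization format described in BEP 3). The helper names, the tracker URL, and the 16 KiB piece length are placeholder assumptions; real clients usually pick larger pieces and add optional keys.

```python
import hashlib


def bencode(value) -> bytes:
    """Encode ints, strings, lists, and dicts using BitTorrent's bencoding."""
    if isinstance(value, int):
        return b"i" + str(value).encode() + b"e"
    if isinstance(value, str):
        value = value.encode()
    if isinstance(value, bytes):
        return str(len(value)).encode() + b":" + value
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Dictionary keys are byte strings and must appear in sorted order
        items = sorted(
            ((k.encode() if isinstance(k, str) else k, v) for k, v in value.items()),
            key=lambda kv: kv[0],
        )
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def make_metainfo(name: str, data: bytes, announce: str, piece_length: int = 16384) -> bytes:
    """Build a minimal single-file .torrent (hypothetical helper for illustration)."""
    # Concatenate the 20-byte SHA-1 digest of every fixed-size piece
    pieces = b"".join(
        hashlib.sha1(data[i:i + piece_length]).digest()
        for i in range(0, len(data), piece_length)
    )
    info = {"name": name, "length": len(data), "piece length": piece_length, "pieces": pieces}
    return bencode({"announce": announce, "info": info})


torrent = make_metainfo("example.txt", b"x" * 40000, "http://tracker.example/announce")
```

Here 40,000 bytes with 16 KiB pieces yields three pieces, so the pieces field holds 60 bytes of concatenated digests.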

Peer discovery

Trackers are servers that help peers locate one another, in particular the peers that have the file or parts of it. When a peer contacts the tracker, it receives a list of other peers that it can request pieces from. Trackers fall into two categories: public and private. Public trackers are open to anyone with internet access, while private ones are hosted on invite-only websites or within a business's network.
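To make the tracker exchange concrete, here is a hedged Python sketch of the HTTP announce request from BEP 3, plus a parser for the "compact" peer list from BEP 23, in which each peer takes 6 bytes: a 4-byte IPv4 address followed by a 2-byte big-endian port. The tracker URL, peer ID, and info hash are made-up placeholders, the function names are hypothetical, and the sketch only builds the URL and parses bytes; it never contacts a real tracker.

```python
import struct
from urllib.parse import urlencode


def build_announce_url(announce: str, info_hash: bytes, peer_id: bytes,
                       port: int, left: int) -> str:
    """Build a tracker announce URL with the standard BEP 3 query parameters."""
    query = urlencode({
        "info_hash": info_hash,  # 20-byte SHA-1 of the bencoded info dictionary
        "peer_id": peer_id,      # 20-byte ID this client picks for itself
        "port": port,            # port this client listens on for peer connections
        "uploaded": 0,
        "downloaded": 0,
        "left": left,            # bytes still needed to complete the download
        "compact": 1,            # request the compact peer list format (BEP 23)
        "event": "started",
    })
    return f"{announce}?{query}"


def parse_compact_peers(blob: bytes) -> list[tuple[str, int]]:
    """Decode a compact peer list: 4 bytes of IPv4 address + 2 bytes of port per peer."""
    peers = []
    for i in range(0, len(blob) - len(blob) % 6, 6):
        ip = ".".join(str(b) for b in blob[i:i + 4])
        (port,) = struct.unpack(">H", blob[i + 4:i + 6])
        peers.append((ip, port))
    return peers


url = build_announce_url(
    "http://tracker.example/announce", b"\x12" * 20, b"-XX0001-123456789012", 6881, 40000
)
peers = parse_compact_peers(bytes([192, 168, 1, 5, 0x1A, 0xE1]))  # [("192.168.1.5", 6881)]
```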

Swarming

Swarming is the simultaneous download of different pieces of a file from multiple peers in the network. Instead of downloading a file sequentially from a single source, a peer requests parts of the file from various peers that already have those parts. This parallel downloading can significantly increase speed and efficiency because it draws on the combined upload bandwidth of all participating peers.

Swarming Example

When a new movie or game file is released on a torrent platform, many users download it simultaneously. Each user acts as both a downloader and an uploader, passing the pieces it already has on to others.
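The parallelism behind swarming can be sketched in a few lines of Python. In this toy model (an assumption for illustration, not how a real client is structured), each peer is an in-memory dict of the pieces it holds, requests are spread round-robin across the peers that have each piece, and a thread pool fetches all pieces at once. Each piece is written to its own slot as it completes, so the order in which pieces arrive does not matter.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

# Toy swarm: each hypothetical peer maps piece index -> piece bytes
PIECES = [b"alpha-", b"beta-", b"gamma-", b"delta"]
peers = {
    "peer_a": {0: PIECES[0], 1: PIECES[1]},
    "peer_b": {1: PIECES[1], 2: PIECES[2]},
    "peer_c": {2: PIECES[2], 3: PIECES[3]},
}


def fetch(piece_index: int, peer_name: str) -> tuple[int, bytes]:
    """Stand-in for a network request to one peer for one piece."""
    return piece_index, peers[peer_name][piece_index]


def swarm_download(num_pieces: int) -> bytes:
    jobs = []
    for index in range(num_pieces):
        holders = [name for name, held in peers.items() if index in held]
        # Spread requests across holders instead of always asking the first one
        jobs.append((index, holders[index % len(holders)]))
    buffer = [b""] * num_pieces
    with ThreadPoolExecutor(max_workers=len(peers)) as pool:
        futures = [pool.submit(fetch, index, name) for index, name in jobs]
        for future in as_completed(futures):  # pieces may finish in any order
            index, data = future.result()
            buffer[index] = data  # write each piece to its own slot
    return b"".join(buffer)


print(swarm_download(len(PIECES)))  # b'alpha-beta-gamma-delta'
```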

Piece selection

To ensure a balanced distribution of pieces and reduce bottlenecks, BitTorrent uses "rarest-first" selection, in which peers prioritize downloading the pieces that are least available in the swarm. As a result, pieces usually arrive out of order, and the client writes each one to its correct position in the file. This keeps the swarm well distributed, so even less common pieces are likely to remain available.
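A minimal sketch of rarest-first selection, assuming the client already knows which pieces each connected peer has (in the real protocol this information comes from peers' bitfield and have messages). It counts how many peers hold each piece the client still needs and picks one held by the fewest peers, breaking ties at random. The function and parameter names are illustrative, not taken from any particular client.

```python
import random
from collections import Counter
from typing import Optional


def pick_rarest(needed: set[int], peer_pieces: dict[str, set[int]], rng=random) -> Optional[int]:
    """Return a needed piece held by the fewest peers, or None if no peer has one."""
    availability = Counter()
    for pieces in peer_pieces.values():
        availability.update(pieces & needed)  # count only pieces we still need
    if not availability:
        return None
    rarest = min(availability.values())
    candidates = sorted(p for p, count in availability.items() if count == rarest)
    return rng.choice(candidates)  # random tie-break so peers don't all chase the same piece


peer_pieces = {"peer_a": {0, 1, 2}, "peer_b": {0, 1}, "peer_c": {0, 3}}
print(pick_rarest({0, 1, 2, 3}, peer_pieces))  # 2 or 3: each is held by only one peer
```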

To keep the network strong and reliable, BitTorrent also uses a "tit-for-tat" strategy: peers are incentivized to upload because those who contribute more to the network receive faster download speeds in return. Users are also encouraged to keep seeding files long after their own download finishes, which helps the swarm grow.

Choking

BitTorrent controls upload bandwidth through choking: each peer sends data to only a limited number of other peers at any given moment and "chokes" (refuses to upload to) the rest. Peers choose whom to unchoke by favoring those that are actively sharing with them at the highest rates, which is how the tit-for-tat strategy is enforced. In addition, optimistic unchoking rotates roughly every 30 seconds, unchoking one extra peer regardless of its upload rate so that new peers can get started and better connections can be discovered. If the optimistically unchoked peer reciprocates by uploading data, it will likely become a preferred peer in the regular unchoking rotation.

Choking Examples

  1. A peer with a slow or unreliable upload speed may be choked by others to avoid wasting bandwidth on inefficient transfers.
  2. A downloading peer uploads only to the 4 peers that are sending it data the fastest, while temporarily choking less active or non-contributing peers.
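The choking rules above can be sketched as a function a client might run once per 10-second round, as BEP 3 describes: unchoke the four interested peers that are currently uploading to us fastest (tit-for-tat), and every third round (roughly every 30 seconds) rotate one optimistic unchoke to a random choked peer. The function name and data structures are assumptions for illustration; real clients add further rules, for example a peer that has finished downloading ranks others by upload rate instead.

```python
import random
from typing import Optional

REGULAR_SLOTS = 4     # peers unchoked for reciprocation (four, as in BEP 3)
OPTIMISTIC_EVERY = 3  # with 10-second rounds, rotate the optimistic slot every ~30 s


def choose_unchoked(download_rates: dict[str, float], interested: set[str],
                    round_number: int, current_optimistic: Optional[str],
                    rng=random) -> tuple[set[str], Optional[str]]:
    """Return (unchoked peers, optimistic peer) for one 10-second choking round."""
    # Tit-for-tat: unchoke the interested peers that upload to us the fastest
    ranked = sorted(interested, key=lambda p: download_rates.get(p, 0.0), reverse=True)
    regular = set(ranked[:REGULAR_SLOTS])

    # Optimistic unchoke: periodically give one randomly chosen choked peer a chance
    optimistic = current_optimistic
    if (round_number % OPTIMISTIC_EVERY == 0
            or optimistic not in interested or optimistic in regular):
        choked = sorted(p for p in interested if p not in regular)
        optimistic = rng.choice(choked) if choked else None

    unchoked = regular | ({optimistic} if optimistic is not None else set())
    return unchoked, optimistic


rates = {"a": 50.0, "b": 40.0, "c": 30.0, "d": 20.0, "e": 10.0, "f": 0.0}
unchoked, optimistic = choose_unchoked(rates, set(rates), round_number=0, current_optimistic=None)
```

Between rotations the optimistic peer is kept, so a newcomer has a full 30 seconds to prove it reciprocates.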

NAT and Port Management

BitTorrent's peer connections traditionally run over TCP, although many clients also support uTP, a transport built on UDP. Each peer listens on its own port for incoming connections. Torrent clients historically use ports 6881 to 6889, but the port can be configured by the user. Firewalls and routers often block incoming connections, so many torrent clients support NAT traversal techniques such as UPnP (Universal Plug and Play) to set up port forwarding on routers automatically.
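As a small illustration of port management, the sketch below uses Python's standard socket module to listen on the first free TCP port in the traditional 6881 to 6889 range, roughly what a client might do when the user has not configured a port. The function name is hypothetical, and the sketch only opens a local listening socket; it does not perform any UPnP or NAT-PMP port forwarding, which requires router support and a separate library.

```python
import socket
from typing import Optional


def listen_on_torrent_port(first: int = 6881, last: int = 6889) -> Optional[socket.socket]:
    """Bind a TCP listening socket to the first free port in [first, last]."""
    for port in range(first, last + 1):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.bind(("0.0.0.0", port))  # accept connections on all interfaces
            sock.listen()
            return sock
        except OSError:
            sock.close()  # port already in use (or not permitted); try the next one
    return None


listener = listen_on_torrent_port()
if listener is not None:
    print("Listening for peers on port", listener.getsockname()[1])
    listener.close()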

Verification

As each piece finishes downloading, the client hashes it and compares the result with the SHA-1 hash for that piece stored in the .torrent file. This check ensures that every piece matches the original content: if a piece fails the hash check, it is discarded and the client requests the same data again.
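Piece verification boils down to comparing SHA-1 digests. Assuming the client has the concatenated 20-byte hashes from the pieces field of the .torrent file, a minimal check might look like this (the function name is illustrative):

```python
import hashlib


def verify_piece(index: int, data: bytes, pieces_field: bytes) -> bool:
    """Compare a downloaded piece against its expected SHA-1 from the .torrent file."""
    expected = pieces_field[index * 20:(index + 1) * 20]  # each digest is 20 bytes
    return hashlib.sha1(data).digest() == expected


good = [b"first piece", b"second piece"]
pieces_field = b"".join(hashlib.sha1(p).digest() for p in good)

print(verify_piece(1, b"second piece", pieces_field))   # True
print(verify_piece(1, b"tampered data", pieces_field))  # False -> discard and re-request
```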

BitTorrent Vs. Other Protocols

Vs. HTTP

BitTorrent differs from HTTP in being decentralized: with HTTP, a single server handles requests from all clients. This can strain the server and reduce download speeds when many people download the same file at once. However, HTTP is simpler to implement, and because all data comes from a single server, it is much easier to monitor and secure. This makes it better suited to secure file transfers and smaller networks.

BitTorrent also allows for higher network efficiency by letting peers download and upload data at the same time. This is especially valuable for smaller companies with limited server capacity, or when distributing large amounts of data.

Another big difference is reliability. BitTorrent provides redundancy: if one peer disconnects, others still have parts of the file and the transfer can continue, which minimizes the risk of losing access to the file. However, this depends heavily on the size of the swarm; if not enough peers are online, a file can become unavailable. HTTP instead offers reliability through centralized infrastructure: a well-maintained server keeps data available at a consistent speed. However, if that server fails or becomes overwhelmed, there is no peer-to-peer redundancy to fall back on, and access to the file can be lost.
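One way to see why swarm size matters is a simple probability model. Assuming, as a simplification, that each of n peers holding a full copy is online independently with probability p, the file stays available as long as at least one of them is online, which happens with probability 1 - (1 - p)^n. The sketch below evaluates that formula; real swarms are messier, because peers often hold only some pieces and their uptimes are correlated.

```python
def availability(num_seeders: int, p_online: float) -> float:
    """Probability that at least one of num_seeders independent peers is online."""
    return 1 - (1 - p_online) ** num_seeders


for n in (1, 3, 10, 30):
    print(f"{n:>2} seeders each online 20% of the time -> {availability(n, 0.2):.1%} available")
```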

Like HTTP, FTP serves file requests from a single server, so BitTorrent has the same advantages and disadvantages relative to FTP.

Vs. IPFS

IPFS encrypts connections between nodes by default and uses content addressing, so every block of data is identified and verified by its hash. In BitTorrent, every peer in a swarm can see the IP addresses of the other peers, which exposes users to potential privacy risks. Torrents can also spread malicious content such as malware quickly across a wide network: piece verification only confirms that data matches the hashes in the .torrent file, so it catches corrupted pieces but not malware that was part of the original torrent.

IPFS aims to address BitTorrent's availability problem: because files are addressed by their content rather than by a particular swarm, a file can be found and retrieved as long as any node on the network stores it. Content is not stored permanently by default, however; some node, such as the publisher or a pinning service, must keep a copy.

IPFS is a newer technology with a smaller ecosystem and less mature tooling than BitTorrent, so adopting it still involves major tradeoffs. BitTorrent, on the other hand, is easily accessible and has many clients (uTorrent, qBittorrent, Deluge, etc.) that any average consumer can pick up and use.

Table Comparing HTTP, IPFS, and BitTorrent

| Feature | HTTP | IPFS | BitTorrent |
| --- | --- | --- | --- |
| Type | Client-server | Distributed content network | Peer-to-peer file sharing |
| Data Storage | Centralized | Decentralized, content-addressable | Decentralized |
| Reliability | Dependent on server availability | Content available as long as peers store it | Dependent on seeders/peers |
| Scalability | Limited to server capacity | Scales with peer contribution | Scales with peer participation |
| Speed | Server bandwidth-dependent | Peer-based, fast retrieval of content | Fast for popular files with many seeders |
| Privacy | Server logs data requests | Somewhat private; no central server | IPs visible to peers |
| Best For | Simple, one-to-one downloads | Hosting and sharing persistent content | Sharing large files efficiently |

A Simulation of Peer-to-Peer Network Efficiency Using Torrenting

To show a case where BitTorrent improves network performance, let's simulate a file distribution network where multiple peers exchange data in a swarm. Below I have written a simple Python script that simulates peer-to-peer efficiency compared to centralized HTTP downloads.

Simulation Code

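import numpy as np
import matplotlib.pyplot as plt

# Simulation parameters (illustrative values, not measurements)
num_peers = 50       # peers downloading the same file at the same time
file_size = 100      # file size in chunks
server_upload = 50   # chunks per second the central server can send in total
rng = np.random.default_rng(seed=0)
peer_upload = rng.integers(1, 10, num_peers)  # each peer uploads 1-9 chunks per second

# Time calculation: total chunks that must be delivered / total upload capacity
total_chunks = file_size * num_peers
# HTTP: every peer shares the single server's upload bandwidth
http_time = total_chunks / server_upload
# BitTorrent (idealized): the original seed plus every peer's upload capacity
# serves the swarm, assuming near-perfect parallelism and no protocol overhead
bittorrent_time = total_chunks / (server_upload + peer_upload.sum())

# Visualization
labels = ['HTTP Download', 'BitTorrent Download']
times = [http_time, bittorrent_time]

plt.bar(labels, times, color=['red', 'blue'])
plt.ylabel('Time for All Peers to Finish (seconds)')
plt.title('BitTorrent vs HTTP Download Times')
plt.show()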

The graph generated by this simulation shows that:

  1. HTTP takes longer due to its dependence on a single source.
  2. BitTorrent downloads are faster in a best-case scenario, assuming every peer uploads at full capacity and pieces are distributed in the most optimal way (close to perfect parallelism).

A Data-Driven Dive into Torrent Efficiency vs. Server Load

To empirically test the performance of BitTorrent versus traditional downloads, a comparison is conducted below:

Constants:

  • Dataset: Ubuntu ISO (3 GB)
  • Test Environment: 100 Mbps fiber-optic connection

Observations

| Metric | HTTP Download | BitTorrent Download |
| --- | --- | --- |
| Download Speed | 8 MB/s | 12 MB/s |
| Server Load | High | Distributed |
| Redundancy | None | High |
| Download Consistency | Stable | Varies with seeders |

As more peers request the file, an HTTP server comes under increasing strain, whereas BitTorrent benefits from every additional peer contributing bandwidth. And if one source goes offline in BitTorrent, the redundancy of having many download sources keeps the transfer going, compared to the single point of failure of an HTTP download.

BitTorrent: A Hands-On Interactive Tutorial Using qBittorrent

For users who are unfamiliar with torrenting and want to experience the BitTorrent protocol, qBittorrent is a popular open-source client that provides a user-friendly interface while still offering advanced features for diving deeper into the torrenting hobby, including torrent creation and bandwidth management for when you want to start hosting.

Installation Setup

Downloading and Installing qBittorrent

  1. Visit qBittorrent's official website (https://www.qbittorrent.org/download) and download the recommended version for your operating system.

  2. Walk through the installation wizard to install the software.

Configuring qBittorrent

  1. Open qBittorrent and navigate to Tools → Options to configure settings.

  2. Under Connection, ensure that the Listening Port is open.

    a. If your computer is behind a router, enable UPnP/NAT-PMP for automatic port forwarding.

  3. Also enable Encryption under the BitTorrent settings to make it harder for your ISP (Internet Service Provider) to identify and throttle torrent traffic.

Downloading a Torrent

  1. Obtain a .torrent file from a legal source, for example the official torrents that many Linux distributions provide for their ISO images.

    a. You can also use a magnet link, a hyperlink that identifies a torrent by its info hash so the client can download the file over the peer-to-peer (P2P) network without a .torrent file.

  2. Click File → Add Torrent.

    a. If you have a magnet link, paste it into the Add Torrent Link option instead.

  3. Select a destination folder for the file and start downloading!
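Since a magnet link is just a URI, Python's standard urllib.parse can pull it apart. The sketch below extracts the info hash from the xt=urn:btih: parameter, plus the optional dn (display name) and tr (tracker) parameters; the function name, hash, and tracker in the example are made up. It only handles the common 40-character hex form of the info hash, not the older base32 form.

```python
from urllib.parse import urlparse, parse_qs


def parse_magnet(link: str) -> dict:
    """Extract the info hash, display name, and trackers from a magnet link."""
    parsed = urlparse(link)
    if parsed.scheme != "magnet":
        raise ValueError("not a magnet link")
    params = parse_qs(parsed.query)
    prefix = "urn:btih:"
    xt = next((v for v in params.get("xt", []) if v.startswith(prefix)), None)
    if xt is None:
        raise ValueError("no BitTorrent info hash (xt=urn:btih:...) found")
    info_hash = xt[len(prefix):]
    if len(info_hash) != 40:
        raise ValueError("expected a 40-character hex info hash")
    return {
        "info_hash": bytes.fromhex(info_hash),       # 20 raw bytes
        "name": params.get("dn", [None])[0],         # optional display name
        "trackers": params.get("tr", []),            # optional tracker URLs (decoded)
    }


link = ("magnet:?xt=urn:btih:" + "ab" * 20
        + "&dn=ubuntu.iso&tr=http%3A%2F%2Ftracker.example%2Fannounce")
info = parse_magnet(link)
```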

Seeding and Uploading

  1. Once you have downloaded the file, leave the torrent to seed (this is the main part of torrenting, contributing back to the community!)

While seeding, the client automatically uploads pieces of the file to other peers, deciding whom to serve with the choking and unchoking rules described above.

Security Concerns Related to Torrenting with BitTorrent

IP Exposure:

Every peer in a torrent “swarm” can see your IP address. This is especially concerning because malicious actors and ISPs can track your torrent activity, on top of all the other risks associated with an exposed IP address.

  1. A VPN or proxy can hide your IP address from other peers; a VPN also encrypts the traffic between you and the VPN server.

  2. IP blocklists can be set up to prevent connections from known malicious IP addresses.

  3. Finally, software that automatically blacklists peers identified as malicious can be used to block repeat offenders.
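The blocklist idea can be sketched with Python's standard ipaddress module: keep a list of blocked address ranges and refuse any peer whose address falls inside one of them. The ranges below are reserved documentation networks used purely as placeholders, and the function name is hypothetical; real blocklists come from third-party providers and are loaded by the torrent client itself.

```python
import ipaddress

# Placeholder blocklist using reserved documentation ranges (RFC 5737)
BLOCKLIST = [ipaddress.ip_network(cidr) for cidr in ("192.0.2.0/24", "198.51.100.0/24")]


def is_blocked(peer_ip: str) -> bool:
    """Return True if the peer's address falls inside any blocked range."""
    address = ipaddress.ip_address(peer_ip)
    return any(address in network for network in BLOCKLIST)


for ip in ("192.0.2.44", "203.0.113.7"):
    print(ip, "blocked" if is_blocked(ip) else "allowed")
```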

Malicious Torrents:

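It is difficult to verify the authenticity of the data other peers share, and some torrents contain malware or fake files. Per-piece SHA-1 verification mitigates part of this risk by rejecting any piece that does not match the hashes in the .torrent file, but it cannot detect malware that was already in the original torrent, so .torrent files and magnet links should come from trusted sources.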

Conclusion

BitTorrent remains a powerful, decentralized protocol for file distribution that outperforms traditional client-server models in scalability and redundancy. However, users should take the necessary security precautions when torrenting to maintain their privacy and ensure authentic downloads.

