P2P Concepts - Hive2Hive/Hive2Hive GitHub Wiki


TomP2P

Hive2Hive relies on modern peer-to-peer (P2P) technology implemented by TomP2P, an advanced open-source distributed hash table (DHT) library.

TomP2P provides a distributed multi-key-value infrastructure that uses an iterative routing approach. Its DHT put and get operations can use up to 4 different key dimensions to distinguish the data objects stored in the DHT. All keys are 160 bits long.

Hive2Hive makes use of the location, content and version key dimensions.

| Object | Location Key | Content Key | Version Key |
|---|---|---|---|
| User Profile | hash(User Credentials) | USER_PROFILE | no version control |
| User Locations | hash(User ID) | USER_LOCATIONS | no version control |
| User Encryption Public Key | hash(User ID) | USER_PUBLIC_KEY | no version control |
| User Profile Task | hash(User ID) | timestamp | no version control |
| Meta File | hash(File Encryption Public Key) | META_FILE | no version control |
| File Chunk | hash(random String) | FILE_CHUNK | timestamp + hash(file chunk) |
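
To make the table concrete, the following minimal sketch builds key tuples for two of the rows. It is not Hive2Hive or TomP2P code: the helper `key160`, the `DhtKey` tuple, and the choice of SHA-1 as the 160-bit hash are assumptions made for illustration only.

```python
import hashlib
from typing import NamedTuple, Optional


def key160(text: str) -> int:
    """Hash a string to a 160-bit integer (SHA-1 is an assumption here)."""
    return int.from_bytes(hashlib.sha1(text.encode("utf-8")).digest(), "big")


class DhtKey(NamedTuple):
    """Hypothetical tuple of the three key dimensions used by Hive2Hive."""
    location: int
    content: int
    version: Optional[int] = None  # None stands for "no version control"


user_id = "Alice"

# Both objects share the location key hash(User ID) and differ in the content key.
locations_key = DhtKey(key160(user_id), key160("USER_LOCATIONS"))
public_key_key = DhtKey(key160(user_id), key160("USER_PUBLIC_KEY"))

print(locations_key.location == public_key_key.location)  # True
print(locations_key.content == public_key_key.content)    # False
```

Because both objects share one location key, they end up on the same responsible peer, and the content key alone tells them apart.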

Location Keys

The location key is the first key dimension considered in the TomP2P DHT routing. It refers to a peer ID in the DHT.

In Hive2Hive, data objects are stored by using such location keys because they refer to the peer which is responsible for them. A data object's location key is derived from either a public key or a custom string:

  • Public Key: Because a public key cannot be chosen manually, a (uniform) distribution of the data is achieved in the long run.
  • Custom String: This method allows a peer to be selected explicitly for the storage.

Example

  • If the location key is Alice’s user ID, 'Alice' (hash = 5147), the content is stored on the peer with peer ID 4607 because it is the closest one. In contrast, the user ID 'Bob' (hash = 1375) would lead to peer 1032.
  • If the location key is some public key (e.g., hash = 269), the content is stored on the peer with peer ID 243.
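
The lookups in this example can be sketched in a few lines. The text only says "closest"; the sketch assumes a Kademlia-style XOR distance, and the peer list contains just the three peer IDs named above, so both the metric and the network are illustrative assumptions.

```python
from typing import Sequence


def closest_peer(key: int, peer_ids: Sequence[int]) -> int:
    """Return the peer ID with the smallest XOR distance to the key."""
    return min(peer_ids, key=lambda peer_id: peer_id ^ key)


# Only the peers mentioned in the example; a real network has many more.
peers = [243, 1032, 4607]

print(closest_peer(5147, peers))  # 4607  (hash of 'Alice')
print(closest_peer(1375, peers))  # 1032  (hash of 'Bob')
print(closest_peer(269, peers))   # 243   (hash of some public key)
```

For these particular values, a plain numeric distance would select the same peers.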

Domain Keys

The domain key is the second key dimension considered in the TomP2P DHT routing. It allows an application to store data objects under the same Location Key, but in different domains.

This key dimension is not used by Hive2Hive.

Content Keys

The content key is the third key dimension considered in the TomP2P DHT routing. It allows multiple data objects to be stored under the same Domain Key, each having a different content key.

In Hive2Hive, such content keys are used to distinguish data objects that are stored under the same location key, such as the User Profile, the User Locations and the User Encryption Public Key. Hive2Hive uses hashes of predefined constants, depending on the type of the data object to be stored. This reduces the probability of hash collisions. In the case of User Profile Tasks, timestamps are used as content keys in order to guarantee a chronological ordering.
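
The effect of timestamp content keys can be sketched as follows. The dictionary stands in for the entries stored under one location key, and the task names and millisecond timestamps are made-up values, not Hive2Hive data.

```python
# Hypothetical task queue of one user: content key (timestamp in ms) -> task.
task_queue = {}

# Tasks may arrive in any order.
task_queue[1_700_000_002_000] = "task C"
task_queue[1_700_000_000_000] = "task A"
task_queue[1_700_000_001_000] = "task B"

# Sorting by content key restores the chronological order.
ordered = [task_queue[timestamp] for timestamp in sorted(task_queue)]
print(ordered)  # ['task A', 'task B', 'task C']
```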

Version Keys

The version key is the fourth key dimension considered in the TomP2P DHT routing. It allows multiple versions of a data object to be stored under the same Content Key, each having a different version key.

In Hive2Hive, these version keys consist of a timestamp and a hash over the data to be put. They are used to implement the Versioning & Conflict Management of File Chunks.
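
One way to combine a timestamp and a data hash into a single 160-bit version key is sketched below. The layout (8 timestamp bytes followed by the first 12 bytes of a SHA-1 digest) and the function name `version_key` are assumptions for illustration; the text does not specify the actual encoding.

```python
import hashlib


def version_key(timestamp_ms: int, data: bytes) -> int:
    """Build a 160-bit key: 64-bit timestamp, then 96 bits of the data hash."""
    timestamp_part = timestamp_ms.to_bytes(8, "big")
    hash_part = hashlib.sha1(data).digest()[:12]  # assumed truncation
    return int.from_bytes(timestamp_part + hash_part, "big")


v1 = version_key(1_700_000_000_000, b"chunk, first edit")
v2 = version_key(1_700_000_060_000, b"chunk, second edit")
conflict = version_key(1_700_000_060_000, b"chunk, conflicting edit")

print(v1 < v2)         # True: later versions sort after earlier ones
print(v2 == conflict)  # False: same timestamp, different content
```

With the timestamp in the most significant bits, versions sort chronologically, and the hash part keeps two different edits made at the same instant distinguishable.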

Replication

In a DHT, all data is spread over the network. On each participating peer, some storage space is used to store (foreign) data. Since a peer-to-peer network has to deal with churn (i.e., peers frequently going online and offline), that data needs to be replicated on multiple peers. The number of copies is given by the replication factor, which is typically between 3 and 6. Hive2Hive allows this replication factor to be configured. The replication work itself is managed by the TomP2P library.

As a result, data occupies a multiple of its local disk size in the network. Careful clean-up management is therefore crucial, which is why Hive2Hive uses a so-called Time-To-Live mechanism.
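
As a rough back-of-the-envelope illustration of this overhead, the sketch below multiplies a local size by the replication factor. The 100 MiB file is a made-up value, and the calculation ignores any further overhead such as multiple stored versions.

```python
def network_storage(local_bytes: int, replication_factor: int) -> int:
    """Total bytes held in the network when every object is stored on
    `replication_factor` peers."""
    return local_bytes * replication_factor


MIB = 1024 ** 2
local_size = 100 * MIB  # hypothetical 100 MiB of local data

for factor in (3, 6):
    print(factor, network_storage(local_size, factor) // MIB, "MiB")
# 3 300 MiB
# 6 600 MiB
```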

Time-To-Live (TTL)

The time-to-live concept helps to keep the network clean in the long run. Content that is stored in the DHT and reaches its TTL is automatically removed from the network. By default, the TTL for all data objects is more than 1 year, but it can be configured. Moreover, the TTL of an active data object is refreshed from time to time.
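
The expire-and-refresh behaviour can be sketched with a small in-memory store. The class `TtlStore` and its methods are hypothetical and only illustrate the idea; in Hive2Hive the actual storage is handled by TomP2P. Time is passed in explicitly so the example stays deterministic.

```python
class TtlStore:
    """Toy key-value store whose entries expire after a fixed TTL."""

    def __init__(self, ttl_seconds: int):
        self.ttl = ttl_seconds
        self.entries = {}  # key -> (data, expiry time in seconds)

    def put(self, key, data, now):
        self.entries[key] = (data, now + self.ttl)

    def refresh(self, key, now):
        """Extend the lifetime of an entry that is still in use."""
        data, _ = self.entries[key]
        self.entries[key] = (data, now + self.ttl)

    def purge(self, now):
        """Remove and return the keys of all entries whose TTL has passed."""
        expired = [k for k, (_, expiry) in self.entries.items() if expiry <= now]
        for k in expired:
            del self.entries[k]
        return expired


store = TtlStore(ttl_seconds=100)
store.put("active", "data A", now=0)
store.put("stale", "data B", now=0)

store.refresh("active", now=60)  # "active" now expires at 160
print(store.purge(now=120))      # ['stale']
print(list(store.entries))       # ['active']
```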


Peer Management

User Client

A so-called user client refers to one of a user's machines in the network. This terminology is required because Hive2Hive allows users to use multiple client machines simultaneously. These different machines represent different peers, with different peer IDs, in the network. Each peer is responsible for a different key range in the DHT.

Master Client

The master client refers to the main user client in the network. Normally, this is just the first client that joined. Upon a friendly leave, another user client becomes the master client. In case of an unfriendly leave, the remaining user clients detect the absence of the master client because it can no longer be contacted. The selection of a new master client is implicit: the user client with the smallest peer ID becomes the new master client.
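
The implicit selection rule after an unfriendly leave can be sketched as follows. The function name `elect_master` and the peer IDs are illustrative assumptions; the sketch covers only the "smallest peer ID" rule, not the detection of the missing master.

```python
def elect_master(reachable_peer_ids):
    """Pick the new master: the reachable user client with the smallest peer ID."""
    if not reachable_peer_ids:
        raise ValueError("no user client is online")
    return min(reachable_peer_ids)


# Hypothetical peer IDs of one user's clients.
clients = {4607, 1032, 243}
old_master = 243

# The old master leaves without notice; every remaining client applies the
# same rule locally and arrives at the same result without coordination.
remaining = clients - {old_master}
print(elect_master(remaining))  # 1032
```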

The master client is responsible for the following:

Client Session