overall concept - Hive2Hive/Hive2Hive GitHub Wiki

To understand how Hive2Hive works under the hood, it is necessary to understand the overall concept. This page explains how the components are associated and interact.

Don't be shocked, it's actually pretty easy!

Distributed Hash Table (DHT)

Hive2Hive relies on modern peer-to-peer (P2P) technology that is implemented by TomP2P, the most advanced open-source distributed hash table (DHT).

A DHT is a network that different machines, called peers or nodes, can join. It is a so-called structured overlay network because all peers together form a ring (or a tree) structure. Every peer receives an ID that is used to locate it within the DHT. Data objects can then be stored in a distributed manner by asking peers to store them, so every peer can store data on, and retrieve data from, any other peer. To store a data object, it is assigned an ID from the same range as the peer IDs. Sharing one ID range makes it easy to assign responsibility for data objects: for any given data object ID, the peer with the closest peer ID is responsible for storing and replicating the object.
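The responsibility rule can be sketched in a few lines. This is a minimal illustration, not TomP2P's implementation: the 160-bit SHA-1 ID space and the XOR distance metric are assumptions chosen for the example, and `to_id` and `responsible_peer` are hypothetical helper names.

```python
import hashlib
from typing import Sequence

ID_BITS = 160  # assumed ID width (size of a SHA-1 digest)


def to_id(name: str) -> int:
    """Map an arbitrary name to an integer ID in the shared ID space."""
    return int.from_bytes(hashlib.sha1(name.encode("utf-8")).digest(), "big")


def responsible_peer(object_id: int, peer_ids: Sequence[int]) -> int:
    """Return the peer ID closest to object_id (XOR distance is an assumption)."""
    return min(peer_ids, key=lambda peer_id: peer_id ^ object_id)


# Peers and data objects draw their IDs from the same range.
peers = [to_id(f"peer-{i}") for i in range(5)]
obj = to_id("holiday-photo.jpg")
owner = responsible_peer(obj, peers)
```

Because every peer can evaluate the same rule, any peer can work out which peer to ask for a given object ID without a central directory.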

To see how Hive2Hive uses such DHT technology, please refer to the TomP2P wiki section.

User Management

User Profile

The central element of the user management in Hive2Hive is the user profile. This user profile contains all relevant information about a user in the network. A user's profile therefore has little in common with profiles on social platforms like Facebook. On the contrary, it must be kept private!

A user profile holds the following information:

Although the user profile itself must remain private, the user ID can be made public, e.g., to retrieve invitations.

User Locations

Every Hive2Hive user can possess several client machines, and these clients may even be online at the same time. To know which of a user's clients are online, a lookup mechanism for their respective locations is required. For this reason, a per-user list, called user locations, is published in the DHT such that everyone can find it.

So that everyone can find the list, its Location Key is easy to derive: it is obtained by hashing the User ID.
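As a rough sketch, the derivation is a single deterministic hash. The choice of SHA-1 and the function name `location_key` are assumptions for illustration; the hash actually used by Hive2Hive may differ.

```python
import hashlib


def location_key(user_id: str) -> str:
    """Derive a deterministic lookup key from a public user ID.

    SHA-1 is an assumption made for this sketch.
    """
    return hashlib.sha1(user_id.encode("utf-8")).hexdigest()


# Anyone who knows the user ID can compute the same key.
key = location_key("alice")
```

Since the function has no secret input, knowing the user ID is sufficient to look up the user locations.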

The user location list contains a list of all online clients (IP address and port).

The user locations should always remain up-to-date. When a user's client logs in, a reference to it must be added to the list. When a user's client logs out, the reference must be removed (friendly logout). Since unfriendly logouts need to be considered as well, the user locations are checked and cleaned every time a client detects an inconsistent state.
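The maintenance rules above could look roughly as follows. This is a hedged sketch: the class `UserLocations`, its method names, and the `is_reachable` callback are hypothetical and do not mirror Hive2Hive's actual classes.

```python
from dataclasses import dataclass, field
from typing import Callable, Set, Tuple

Address = Tuple[str, int]  # (IP address, port)


@dataclass
class UserLocations:
    """Per-user list of clients that are currently online (illustrative only)."""

    user_id: str
    clients: Set[Address] = field(default_factory=set)

    def login(self, address: Address) -> None:
        """Add a reference when a client logs in."""
        self.clients.add(address)

    def logout(self, address: Address) -> None:
        """Remove the reference on a friendly logout."""
        self.clients.discard(address)

    def clean(self, is_reachable: Callable[[Address], bool]) -> Set[Address]:
        """Drop entries left behind by unfriendly logouts and return them."""
        stale = {address for address in self.clients if not is_reachable(address)}
        self.clients -= stale
        return stale


locations = UserLocations("alice")
locations.login(("192.0.2.10", 4622))
locations.login(("192.0.2.11", 4622))
locations.logout(("192.0.2.10", 4622))  # friendly logout
```

How reachability is determined is left to the caller here; in this sketch it is simply a predicate passed to `clean`.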

The user locations are used for the following:

Messaging

File Management

File Tree

In order to keep track of a user's files, her User Profile contains a tree of indices. This index tree is equivalent to the file tree on the user's local disk.

For every file or folder that gets stored with Hive2Hive, an index is created:

File Index

A file index comprises the following:

The File Encryption Key Pair is used to derive the Location Key of the file's Meta File.

MD5 Version Hash

Each file index also contains an MD5 hash of the newest version of the associated file. This hash makes the synchronization process much easier: comparing stored hashes to detect differences between the local disk and the network is much faster than re-hashing all files each time a comparison takes place. As a drawback, however, the hash needs to be updated as soon as the content of the associated file changes.
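A minimal sketch of such a comparison, assuming the hash stored in the file index is available as raw bytes. The function names `md5_of_file` and `has_local_changes` are hypothetical.

```python
import hashlib
from pathlib import Path


def md5_of_file(path: Path, block_size: int = 64 * 1024) -> bytes:
    """Hash a file block by block so large files need not fit in memory."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.digest()


def has_local_changes(path: Path, indexed_md5: bytes) -> bool:
    """Compare the local file against the hash stored in the file index."""
    return md5_of_file(path) != indexed_md5
```

Only the local file is hashed; the network side is represented by the hash already held in the index, so no file content has to be fetched for the comparison.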

Folder Index

In contrast to files, a folder can be shared. Thus, a folder index holds the following:

An Authentication key pair:
- User Authentication Key Pair - If the folder is not shared.
- Shared Authentication Key Pair - If the folder is shared.
Sharer List

Sharer List

In order to keep track of all users that share the associated folder, a list of users having access to the folder is kept in the folder index. This list is considered when sending Notifications as soon as a file within the folder has been added, updated, moved or deleted.

Meta File

A meta file is a separate object in the DHT. It contains the following meta information about the associated file:

Access Permissions of different users
List of File Versions

Deriving the Location Key for a meta file can be achieved by hashing the File Encryption Public Key that is stored in the File Index.

File Version

A file version represents a single version in time of a file. It contains the following information:

Version Counter
File Size
Date
List of File Chunks
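The relation between a meta file and its file versions can be sketched as plain data classes. All field and method names below are hypothetical, and starting the version counter at 0 is an assumption; the real Hive2Hive classes may be shaped differently.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Optional


@dataclass
class FileVersion:
    """A single version in time of a file."""

    counter: int
    size: int  # file size in bytes
    date: datetime
    chunk_keys: List[str] = field(default_factory=list)  # location keys of the chunks


@dataclass
class MetaFile:
    """Meta information about one file, stored as its own DHT object."""

    permissions: Dict[str, str] = field(default_factory=dict)  # user ID -> permission
    versions: List[FileVersion] = field(default_factory=list)

    def add_version(self, size: int, chunk_keys: List[str], date: datetime) -> FileVersion:
        """Append a new version with the next counter value (starting at 0 here)."""
        counter = self.versions[-1].counter + 1 if self.versions else 0
        version = FileVersion(counter, size, date, list(chunk_keys))
        self.versions.append(version)
        return version

    def newest(self) -> Optional[FileVersion]:
        """Return the most recent version, or None if there is none yet."""
        return self.versions[-1] if self.versions else None
```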

File Chunk

For a better distribution of all data in the network, files are split into chunks of a user-configurable size. All chunks of a file are encrypted with the same public key that is stored in the Meta File. The Location Key of each chunk is randomly generated, ensuring a uniform distribution among all peers in the DHT.
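A simplified sketch of the chunking step is shown below. Encryption is deliberately left out, and the 160-bit hexadecimal key format and the function name `split_into_chunks` are assumptions made for the example.

```python
import secrets
from typing import Iterator, Tuple


def split_into_chunks(data: bytes, chunk_size: int) -> Iterator[Tuple[str, bytes]]:
    """Yield (random location key, chunk) pairs for the given content.

    Encryption of the chunks is omitted in this sketch.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for offset in range(0, len(data), chunk_size):
        # 20 random bytes give a 160-bit key (the key width is an assumption).
        yield secrets.token_hex(20), data[offset:offset + chunk_size]


chunks = list(split_into_chunks(b"x" * 2500, chunk_size=1024))
```

Because the keys are random rather than derived from the content, they have to be recorded, in order, in the file version's list of chunks so that the file can be reassembled later.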

Chunks are the essential data parts in the DHT. To achieve efficient replication, the chunk size should be rather small and proportional to the assumed bandwidth. The bandwidth may differ for each application: some may need to synchronize chunks over the Internet, while others only act in a LAN with higher throughput, lower error rates and lower latency.
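One possible way to read "proportional to the assumed bandwidth" is to fix a target transfer time per chunk. The helper below is purely illustrative: `chunk_size_for`, the one-second default, and the example bandwidth figures are assumptions, not Hive2Hive settings.

```python
def chunk_size_for(bandwidth_bytes_per_s: float, target_seconds: float = 1.0) -> int:
    """Pick a chunk size that takes roughly target_seconds to transfer."""
    return max(1, int(bandwidth_bytes_per_s * target_seconds))


# Hypothetical figures: a slow Internet uplink versus a fast LAN.
internet_chunk = chunk_size_for(1_000_000, target_seconds=0.5)
lan_chunk = chunk_size_for(100_000_000, target_seconds=0.5)
```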

Hive2Hive Model

This class diagram shows the above-mentioned components and their relations. All elements inherit from NetworkContent, which ensures the proper handling of serialization, versioning, conflict handling and time-to-live.

Hive2Hive Class Diagram