Creating and using hashes - TheRhythmVerse/website GitHub Wiki

Hashes are a way to uniquely identify content across the RhythmVerse. They are primarily used to pull data from the API, where they identify content for gameplay purposes.

First of all, some terminology:

Container
The format in which the files used by the game are stored. It can be a single file, like a .zip file, or an uncompressed directory.

Game files
A generic description for what a game loads and lets players play: the content of the container.

Song
A recording the game file is based on

Release
A unique entity for a game file. Each game file has one release and can have multiple updates

Version
An update to a game file for a release

Gameplay file
This is the data file that contains information needed by the game for gameplay purposes: the notes the players must hit, the lyrics, etc.

Configuration file
This is the data file that contains information needed by the game to describe the content: song title, artist, creator, etc.

Authoring
The way a chart has been authored on all difficulties

Challenges of creating a hash

Creating a unique hash for a file carries a set of challenges.

Container and mutability

The same release can have multiple game files scattered across multiple sources, and on those sources the game files can have different containers. A creator could publish a release on two different Google Drives and use two different containers, for example a .zip file on one drive and the uncompressed folder on the other. The uncompressed folder might have a checksum file added to it while the .zip file doesn't, and so on. If we hashed the containers, we might get different hashes for the exact same release and version.
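To illustrate the problem, here is a minimal sketch that builds two in-memory .zip containers holding the same gameplay file, one of them with an extra checksum file, and compares their MD5 digests. The file names (`notes.mid`, `checksum.md5`) and the byte contents are placeholders invented for this example, not actual RhythmVerse data.

```python
import hashlib
import io
import zipfile


def build_zip(files: dict) -> bytes:
    """Return the bytes of a .zip archive containing the given name -> data pairs."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, data in files.items():
            # Fixed timestamp so the archive bytes depend only on names and content
            info = zipfile.ZipInfo(name, date_time=(2020, 1, 1, 0, 0, 0))
            archive.writestr(info, data)
    return buffer.getvalue()


def gameplay_hash(container: bytes) -> str:
    """Hash only the (hypothetical) gameplay file stored inside the container."""
    with zipfile.ZipFile(io.BytesIO(container)) as archive:
        return hashlib.md5(archive.read("notes.mid")).hexdigest()


gameplay = b"placeholder gameplay data"

# Same release, two containers: the second one carries an extra checksum file
container_a = build_zip({"notes.mid": gameplay})
container_b = build_zip({"notes.mid": gameplay, "checksum.md5": b"placeholder"})

# Hashing the containers themselves gives two different hashes
container_hash_a = hashlib.md5(container_a).hexdigest()
container_hash_b = hashlib.md5(container_b).hexdigest()
print(container_hash_a == container_hash_b)  # False

# Hashing only the gameplay file gives the same hash for both containers
print(gameplay_hash(container_a) == gameplay_hash(container_b))  # True
```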

Availability of full content

RhythmVerse indexes a number of formats and sources, and for some of them we might not have access to all the files within the game files. For example, we might have acquired the .mid file and the .ini file for a commercially available game, which we use for indexing purposes, but not the encrypted audio file, which we don't need. For that reason, a hash RhythmVerse computed over the files it holds would differ from a hash of the complete game files.

Relevance of content to hash

Not all files within a container are game files, and not all game files are relevant to a unique hash. For example, a container might contain, as noted previously, a checksum file that is irrelevant to the game. It might also contain game files that are cosmetic and irrelevant to the identification of the release (for example, updated album art).

Hashes in the RhythmVerse

To address all the issues discussed, we decided to create a flexible system of hashes that only takes into account essential game files. Only the gameplay file and the configuration file are needed to uniquely identify a release, regardless of any minor update. Our API will return information about the creator, the artist, difficulty, etc., and all that information can be reliably served just by knowing which gameplay and configuration files make up that release. We have 3 kinds of hashes and 2 methods of identification. We hash the gameplay file and the configuration file separately, and then we hash them both together.

Game files hash

This is the combined hash of the gameplay file and the configuration file, and the most reliable hash to use when accessing the RhythmVerse API. Always use this hash when you have both the gameplay file and the configuration file.

Gameplay file hash

As a general rule, the gameplay file itself is a very good predictor of which song and authoring the game files are from, so when it comes to acquiring information from the RhythmVerse for scoring purposes, it can be reliable enough. However, it isn't a completely reliable predictor of a release. Two releases might share identical gameplay files, for example when they come from 2 creators and the only difference is in album artwork, etc. For that reason, the configuration file is also used to compose the game files hash: it includes information on the creator.
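As a small illustration of this point, the sketch below hashes two hypothetical releases that share a gameplay file but have different configuration files. It assumes the game files hash is composed the way the examples later in this guide compose it (an MD5 over the hex digests of the individual files); the helper names and byte contents are placeholders.

```python
import hashlib
from typing import Optional


def md5_hex(data: bytes) -> str:
    """Return the MD5 hex digest of some bytes."""
    return hashlib.md5(data).hexdigest()


def game_files_hash(gameplay: bytes, configuration: Optional[bytes] = None) -> str:
    """MD5 over the hex digest of the gameplay file, then of the configuration file."""
    combined = hashlib.md5()
    combined.update(md5_hex(gameplay).encode("utf-8"))
    if configuration is not None:
        combined.update(md5_hex(configuration).encode("utf-8"))
    return combined.hexdigest()


# Two releases by different creators built on the exact same gameplay file
gameplay = b"placeholder gameplay data"
release_a = {"gameplay": gameplay, "configuration": b"placeholder configuration, creator A"}
release_b = {"gameplay": gameplay, "configuration": b"placeholder configuration, creator B"}

# The gameplay file hash cannot tell the two releases apart
print(md5_hex(release_a["gameplay"]) == md5_hex(release_b["gameplay"]))  # True

# The game files hash can, because the configuration files differ
hash_a = game_files_hash(release_a["gameplay"], release_a["configuration"])
hash_b = game_files_hash(release_b["gameplay"], release_b["configuration"])
print(hash_a == hash_b)  # False
```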

Creating hashes to provide to the RhythmVerse API

To call the RhythmVerse API for information about content you have, you need a hash. Here is how you can calculate hashes in various programming languages. The code assumes you have already extracted the gameplay file and the configuration file (if available for the format): because this depends on format, platform and your code logic, that part is outside the scope of this guide. You can of course wrap this code in a method or function and return the hashes.

PHP

// Initialize the gamefile_hash hash
$gamefile_hash = hash_init('md5');

// Provide a gameplay_file_path and create the gameplay file hash, which is mandatory
$gameplay_file_MD5 = md5_file($gameplay_file_path);

// Update the gamefile_hash with the mandatory gameplay file hash
hash_update($gamefile_hash, $gameplay_file_MD5);

// If the file format uses a configuration file, you must hash it
if (isset($configuration_file_path)) {
    $configuration_file_MD5 = md5_file($configuration_file_path);
    hash_update($gamefile_hash, $configuration_file_MD5);
}

// Finalize the gamefile_hash hash
$gamefile_MD5 = hash_final($gamefile_hash);

Python

import hashlib

# Initialize the gamefile_hash hash
gamefile_hash = hashlib.md5()

# Provide a gameplay_file_path and create the gameplay file hash, which is mandatory
with open(gameplay_file_path, 'rb') as file:
    gameplay_file_MD5 = hashlib.md5(file.read()).hexdigest()

# Update the gamefile_hash with the mandatory gameplay file hash
gamefile_hash.update(gameplay_file_MD5.encode('utf-8'))

# If the file format uses a configuration file, you must hash it
try:
    with open(configuration_file_path, 'rb') as file:
        configuration_file_MD5 = hashlib.md5(file.read()).hexdigest()
    # Update the gamefile_hash with the configuration file hash
    gamefile_hash.update(configuration_file_MD5.encode('utf-8'))
except NameError:
    # configuration_file_path is not set
    pass

# Finalize the gamefile_hash hash
gamefile_MD5 = gamefile_hash.hexdigest()
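As one possible way to package the steps above into a reusable function, here is a sketch that returns all three hashes at once. The names `ContentHashes`, `md5_of_file` and `compute_hashes` are hypothetical and not part of any RhythmVerse library, and reading the files in chunks is a choice made for this example so that large files are not loaded into memory in one go; the resulting digests are the same as with the code above.

```python
import hashlib
from pathlib import Path
from typing import NamedTuple, Optional, Union

PathLike = Union[str, Path]


class ContentHashes(NamedTuple):
    gameplay_file_md5: str
    configuration_file_md5: Optional[str]  # None when there is no configuration file
    gamefile_md5: str


def md5_of_file(path: PathLike, chunk_size: int = 64 * 1024) -> str:
    """Return the MD5 hex digest of a file, reading it chunk by chunk."""
    digest = hashlib.md5()
    with open(path, "rb") as file:
        for chunk in iter(lambda: file.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compute_hashes(
    gameplay_file_path: PathLike,
    configuration_file_path: Optional[PathLike] = None,
) -> ContentHashes:
    """Compute the gameplay file, configuration file and game files hashes."""
    gamefile_hash = hashlib.md5()

    # The gameplay file hash is mandatory
    gameplay_file_md5 = md5_of_file(gameplay_file_path)
    gamefile_hash.update(gameplay_file_md5.encode("utf-8"))

    # The configuration file hash is added only when the format has one
    configuration_file_md5 = None
    if configuration_file_path is not None:
        configuration_file_md5 = md5_of_file(configuration_file_path)
        gamefile_hash.update(configuration_file_md5.encode("utf-8"))

    return ContentHashes(
        gameplay_file_md5, configuration_file_md5, gamefile_hash.hexdigest()
    )
```

A call such as `compute_hashes('path/to/gameplay/file', 'path/to/configuration/file')` returns a tuple whose `gamefile_md5` field holds the game files hash; leave out the second argument for formats without a configuration file.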

TypeScript

import { createHash } from 'crypto';
import { readFileSync } from 'fs';

// Provide a gameplay_file_path and create the gameplay file hash, which is mandatory
const gameplayFileMD5 = (filePath: string): string => {
  const fileBuffer = readFileSync(filePath);
  return createHash('md5').update(fileBuffer).digest('hex');
};

// Initialize the gamefile_hash hash
const gamefileHash = createHash('md5');

// Assume gameplay_file_path is defined somewhere
const gameplay_file_path: string = 'path/to/gameplay/file';
const gameplayFileMD5Hex = gameplayFileMD5(gameplay_file_path);
gamefileHash.update(gameplayFileMD5Hex);

// If the file format uses a configuration file, you must hash it
let configurationFileMD5Hex: string | undefined = undefined;
const configuration_file_path: string | undefined = 'path/to/configuration/file'; // or undefined if not used
if (configuration_file_path) {
  configurationFileMD5Hex = gameplayFileMD5(configuration_file_path);
  gamefileHash.update(configurationFileMD5Hex);
}

// Finalize the gamefile_hash hash
const gamefileMD5 = gamefileHash.digest('hex');