Creating and using hashes
Hashes are a way to uniquely identify content across the RhythmVerse. They are primarily used to pull data from the API, where they identify content for gameplay purposes.
First of all, some terminology:
Container
The format in which the files used by the game are stored. It can be a single file, like a .zip file, or an uncompressed directory.
Game files
A generic term for what a game loads and lets players play: the content of the container.
Song
A recording the game file is based on
Release
A unique entity for a game file. Each game file has one release and can have multiple updates
Version
An update to a game file for a release
Gameplay file
This is the data file that contains information needed by the game for gameplay purposes: the notes the players must hit, the lyrics, etc.
Configuration file
This is the data file that contains information needed by the game to describe the content: song title, artist, creator, etc.
Authoring
The way a chart has been authored on all difficulties
Challenges of creating a hash
Creating a unique hash for a file carries a set of challenges.
Container and mutability
The same release can have multiple game files scattered across multiple sources, and on those sources the game files can use different containers. A creator could publish a release on two different Google Drives and use two different containers, for example a .zip file on one drive and an uncompressed folder on the other. The uncompressed folder might have a checksum file added to it while the .zip file doesn't, and so on. Hashing the container as a whole could therefore give different hashes for the exact same release and version.
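The effect can be illustrated with a minimal Python sketch. The file names (`notes.mid`, `checksum.md5`) and the placeholder bytes are assumptions made up for this illustration, not real RhythmVerse data. Two .zip containers that hold the same gameplay file hash differently as containers, while the gameplay file inside them still hashes identically.

```python
import hashlib
import io
import zipfile


def md5_hex(data: bytes) -> str:
    """Return the MD5 hex digest of a byte string."""
    return hashlib.md5(data).hexdigest()


def make_zip(files: dict) -> bytes:
    """Build an in-memory .zip container from a {name: bytes} mapping."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, data in files.items():
            archive.writestr(name, data)
    return buffer.getvalue()


# Placeholder content standing in for a real gameplay file (assumption).
gameplay_bytes = b"placeholder gameplay data"

# Same release, two containers: one has an extra checksum file.
plain_zip = make_zip({"notes.mid": gameplay_bytes})
zip_with_checksum = make_zip(
    {"notes.mid": gameplay_bytes, "checksum.md5": b"placeholder checksum"}
)

# Hashing the containers themselves gives two different values.
container_hashes_differ = md5_hex(plain_zip) != md5_hex(zip_with_checksum)

# Hashing the gameplay file extracted from the container is stable.
with zipfile.ZipFile(io.BytesIO(zip_with_checksum)) as archive:
    extracted = archive.read("notes.mid")
inner_file_matches = md5_hex(extracted) == md5_hex(gameplay_bytes)

print(container_hashes_differ, inner_file_matches)
```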
Availability of full content
RhythmVerse indexes a number of formats and sources, and for some of them we might not have access to all the files within the game files. For example, for a commercially available game we might have acquired the .mid file and the .ini file, which we use for indexing purposes, but not the encrypted audio file, which we don't need. For that reason, a hash RhythmVerse computed over the files it holds would differ from a hash of the complete game files.
Relevance of content to hash
Not all files within a container are game files, and not all game files are relevant to a unique hash. For example, a container might contain, as noted previously, a checksum file that is irrelevant to the game. It might also contain game files that are cosmetic and irrelevant to the identification of the release (for example, updated album art).
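As a rough illustration, the following Python sketch picks only the files that matter for identification out of a container listing. The extension sets, the function name `select_essential_files`, and the example file names are all assumptions for this sketch; which files count as the gameplay file and the configuration file depends on the format.

```python
from pathlib import PurePosixPath
from typing import Dict, Iterable, Optional

# Hypothetical mapping from extension to role; real formats may differ.
GAMEPLAY_EXTENSIONS = {".mid"}
CONFIGURATION_EXTENSIONS = {".ini"}


def select_essential_files(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Return the first gameplay file and first configuration file found.

    Everything else (album art, audio, checksum files) is ignored.
    """
    selected: Dict[str, Optional[str]] = {"gameplay": None, "configuration": None}
    for name in names:
        suffix = PurePosixPath(name).suffix.lower()
        if suffix in GAMEPLAY_EXTENSIONS and selected["gameplay"] is None:
            selected["gameplay"] = name
        elif suffix in CONFIGURATION_EXTENSIONS and selected["configuration"] is None:
            selected["configuration"] = name
    return selected


# Example listing of a container (file names are made up).
container_listing = ["notes.mid", "song.ini", "album.png", "checksum.md5", "song.ogg"]
essential = select_essential_files(container_listing)
print(essential)
```

Only the two selected files would then be passed on to the hashing step; the remaining files do not influence the result.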
Hashes in the RhythmVerse
To address all the issues discussed, we decided to create a flexible system of hashes that only takes into account essential game files. Only the gameplay file and the configuration file are needed to uniquely identify a release, regardless of any minor update. Our API will return information about the creator, the artist, difficulty, etc., and all that information can be reliably served just by knowing which gameplay and configuration files make up that release. We have 3 kinds of hashes and 2 methods of identification. We hash the gameplay file and the configuration file separately, and then we hash the two together.
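The three hashes can be sketched in Python as follows. This mirrors the approach used by the snippets later in this guide (MD5, with the combined hash computed over the hex digests of the two files, gameplay first). The names `ReleaseHashes` and `compute_release_hashes` and the placeholder bytes are assumptions introduced for illustration.

```python
import hashlib
from typing import NamedTuple, Optional


class ReleaseHashes(NamedTuple):
    gameplay_hash: str                  # MD5 of the gameplay file alone
    configuration_hash: Optional[str]   # MD5 of the configuration file, if any
    gamefile_hash: str                  # MD5 over the hex digests above


def compute_release_hashes(
    gameplay_bytes: bytes, configuration_bytes: Optional[bytes] = None
) -> ReleaseHashes:
    """Compute the separate hashes and the combined game files hash."""
    gameplay_hash = hashlib.md5(gameplay_bytes).hexdigest()

    # The combined hash is fed the hex digest strings, not the raw file bytes.
    combined = hashlib.md5()
    combined.update(gameplay_hash.encode("utf-8"))

    configuration_hash = None
    if configuration_bytes is not None:
        configuration_hash = hashlib.md5(configuration_bytes).hexdigest()
        combined.update(configuration_hash.encode("utf-8"))

    return ReleaseHashes(gameplay_hash, configuration_hash, combined.hexdigest())


# Placeholder bytes standing in for real files (assumption).
hashes = compute_release_hashes(
    b"placeholder gameplay data", b"placeholder configuration data"
)
print(hashes)
```

Feeding the two hex digests to the combined hash one after the other is equivalent to hashing their concatenation, so the order (gameplay first, configuration second) matters.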
Game files hash
This is the most reliable hash to use when accessing the RhythmVerse API. Always use this hash when you have access to both the gameplay file and the configuration file.
Gameplay file hash
As a general rule, the gameplay file itself is a very good predictor of which song and authoring the game files are from, so when it comes to acquiring information from the RhythmVerse for scoring purposes, it can be reliable enough. However, it isn't a completely reliable predictor of a release. Two releases might share identical gameplay files because they come from two creators, and the only difference between them could be in the album artwork or similar details. For that reason, the configuration file, which includes information on the creator, is also used to compose the game files hash.
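A small Python sketch of this situation, under stated assumptions: the configuration contents below (section name, keys, and creator names) are invented for the example and do not describe a real format. Two releases with byte-identical gameplay files get the same gameplay file hash, but their game files hashes differ because the configuration files differ.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the MD5 hex digest of a byte string."""
    return hashlib.md5(data).hexdigest()


def gamefile_hash(gameplay_bytes: bytes, configuration_bytes: bytes) -> str:
    """MD5 over the concatenated hex digests, gameplay file first."""
    combined = md5_hex(gameplay_bytes) + md5_hex(configuration_bytes)
    return md5_hex(combined.encode("utf-8"))


# Two releases sharing an identical gameplay file (placeholder bytes).
gameplay_a = b"placeholder gameplay data"
gameplay_b = bytes(gameplay_a)

# Hypothetical configuration files that differ only in the creator.
config_a = b"[song]\nname = Example Song\ncharter = Creator A\n"
config_b = b"[song]\nname = Example Song\ncharter = Creator B\n"

same_gameplay_hash = md5_hex(gameplay_a) == md5_hex(gameplay_b)
different_release_hash = (
    gamefile_hash(gameplay_a, config_a) != gamefile_hash(gameplay_b, config_b)
)

print(same_gameplay_hash, different_release_hash)
```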
Creating hashes to provide to the RhythmVerse API
To call the RhythmVerse API and get information about content you have, you need a hash. Here is how you can calculate hashes in various programming languages. The code assumes you have already extracted the gameplay file and the configuration file (if available for the format). Because extraction depends on the format, the platform and your own code logic, that part is outside the scope of this guide. You can, of course, wrap this code in a method or function that returns the hashes.
PHP
// Initialize the gamefile_hash hash
$gamefile_hash = hash_init('md5');
// Provide a gameplay_file_path and create the gameplay file hash, which is mandatory
$gameplay_file_MD5 = md5_file($gameplay_file_path);
// Update the gamefile_hash with the mandatory gameplay file hash
hash_update($gamefile_hash, $gameplay_file_MD5);
// If the file format uses a configuration file, you must hash it
if (isset($configuration_file_path)) {
    $configuration_file_MD5 = md5_file($configuration_file_path);
    hash_update($gamefile_hash, $configuration_file_MD5);
}
// Finalize the gamefile_hash hash
$gamefile_MD5 = hash_final($gamefile_hash);
Python
import hashlib
# Initialize the gamefile_hash hash
gamefile_hash = hashlib.md5()
# Provide a gameplay_file_path and create the gameplay file hash, which is mandatory
with open(gameplay_file_path, 'rb') as file:
    gameplay_file_MD5 = hashlib.md5(file.read()).hexdigest()
# Update the gamefile_hash with the mandatory gameplay file hash
gamefile_hash.update(gameplay_file_MD5.encode('utf-8'))
# If the file format uses a configuration file, you must hash it
try:
    with open(configuration_file_path, 'rb') as file:
        configuration_file_MD5 = hashlib.md5(file.read()).hexdigest()
    # Update the gamefile_hash with the configuration file hash
    gamefile_hash.update(configuration_file_MD5.encode('utf-8'))
except NameError:
    # configuration_file_path is not set
    pass
# Finalize the gamefile_hash hash
gamefile_MD5 = gamefile_hash.hexdigest()
TypeScript
import { createHash } from 'crypto';
import { readFileSync } from 'fs';
// Helper that returns the MD5 hex digest of a file; it is used for the gameplay file, which is mandatory, and for the configuration file
const gameplayFileMD5 = (filePath: string): string => {
  const fileBuffer = readFileSync(filePath);
  return createHash('md5').update(fileBuffer).digest('hex');
};
// Initialize the gamefile_hash hash
const gamefileHash = createHash('md5');
// Assume gameplay_file_path is defined somewhere
const gameplay_file_path: string = 'path/to/gameplay/file';
const gameplayFileMD5Hex = gameplayFileMD5(gameplay_file_path);
gamefileHash.update(gameplayFileMD5Hex);
// If the file format uses a configuration file, you must hash it
let configurationFileMD5Hex: string | undefined = undefined;
const configuration_file_path: string | undefined = 'path/to/configuration/file'; // or undefined if not used
if (configuration_file_path) {
  configurationFileMD5Hex = gameplayFileMD5(configuration_file_path);
  gamefileHash.update(configurationFileMD5Hex);
}
// Finalize the gamefile_hash hash
const gamefileMD5 = gamefileHash.digest('hex');