Design & Analysis Draft - ntopper/MD5AwSum GitHub Wiki

Definition

System definition:

  • MD5AwSum is a checksum tool that calculates the MD5 hash of a given file and attempts to identify the file by searching a repository of hash tables. The tool also maintains a local repository of hash tables downloaded from the internet. Alternatively, given an MD5 checksum, the tool attempts a reverse lookup against the same set of repositories.
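
A minimal sketch of the hashing step described above, written in Python for brevity (the draft does not fix an implementation language). The function name `file_md5` and the chunk size are assumptions made for illustration, not part of the design.

```python
import hashlib


def file_md5(path, chunk_size=65536):
    """Return the hex MD5 digest of the file at `path` (hypothetical helper)."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in fixed-size chunks so large files never sit fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The returned string is the value the tool would compare against each entry in the hash table repository.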

Why is it important?

  • It saves several steps when verifying files. The traditional way to verify the integrity of a file downloaded from the internet is to find its MD5 checksum at the file's source, calculate the hash on your own computer, and then compare the calculated hash to the published one. Our tool does this in one step: rather than the user going online to find the hashes, the tool maintains a repository of hashes, checks the file against it, and presents the matching data to the user.

This encourages people to verify the integrity of their files instead of trusting bad downloads, which adds security.

Analysis

Inputs:

  • md5awsum [options] : go to the given filepath, calculate the MD5 sum of the file there, and compare that sum to every sum in the hash table repository. For each entry whose sum matches, print that entry's data.
  • md5awsum --add (-a): downloads the file at the given URL to a directory specified in md5awsum's config file. It then attempts to parse the downloaded file as XML. If the file is valid and formatted as a hash table (in the format we specify), it adds each entry to the master hash table, tagged with a unique identifier associated with the URL it was downloaded from. It records that URL and its unique identifier in the config file, then deletes the temporary file.
  • md5awsum --remove (-r): looks in the config file for the unique identifier of the given URL and removes the URL from the config file. It then goes through the master hash table and deletes every hash that carries that identifier.
  • md5awsum --update [] (-u): if given a URL, performs all the actions of the remove command and then all the actions of the add command with that same URL.
  • md5awsum --lookup // --string (-l): given a hash, searches the master hash table for matches of that hash. Treats it as a word file.
  • md5awsum --help: explains all the commands and options
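
The add, remove, update, and lookup commands above can be sketched against an in-memory table. This is only an illustration under stated assumptions: the XML layout (`hashtable`, `entry`, `md5`, `name`), the `MasterTable` class, and the integer `source_id` are hypothetical, since the draft does not yet specify the repository format. Downloading the file and editing the config file are left out.

```python
import xml.etree.ElementTree as ET

# Hypothetical repository format; the draft does not specify the real one.
SAMPLE_XML = """
<hashtable>
  <entry><md5>5eb63bbbe01eeed093cb22bb8f5acdc3</md5><name>hello.txt</name></entry>
  <entry><md5>d41d8cd98f00b204e9800998ecf8427e</md5><name>empty.bin</name></entry>
</hashtable>
"""


class MasterTable:
    """In-memory stand-in for the master hash table (assumed structure)."""

    def __init__(self):
        self.entries = []  # each entry: {"source_id", "md5", "name"}

    def add(self, source_id, xml_text):
        """Parse a downloaded table and tag every entry with source_id."""
        root = ET.fromstring(xml_text)  # raises ParseError on invalid XML
        for node in root.findall("entry"):
            self.entries.append({
                "source_id": source_id,
                "md5": node.findtext("md5", "").strip().lower(),
                "name": node.findtext("name", "").strip(),
            })

    def remove(self, source_id):
        """Drop every entry that came from the given source."""
        self.entries = [e for e in self.entries if e["source_id"] != source_id]

    def update(self, source_id, xml_text):
        """Remove, then add again, as the update command describes."""
        self.remove(source_id)
        self.add(source_id, xml_text)

    def lookup(self, md5_hex):
        """Return all entries whose hash matches md5_hex."""
        wanted = md5_hex.strip().lower()
        return [e for e in self.entries if e["md5"] == wanted]
```

Tagging each entry with the identifier of its source is what lets remove delete exactly the hashes that came from one URL without touching the rest of the table.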

Outputs:

Flow/logic required:

Design:

  • We define a module as each class, plus the main/md5awsum file.

How many modules?

  • Search: Input handler/parser
  • MD5lib: performs the hashing
  • Rainbowtable: performs the file input parsing and searching
  • Repository manager: manages config file entries (adding/removing) and adds/removes entries in the master hash table repository
  • Configuration
  • libcURL: prewritten library used to download files from the internet
  • XML parser: prewritten XML parser library

Classes/methods for each module/component?

  • MD5lib:
    • Public constructor (takes a filename, stores it as a private variable, and calls the private method digest)
    • Public method get (returns the private property hash as a string)
    • Private method digest (void, takes no parameters; calculates the hash of the file specified in the constructor)
    • Private properties: filename (string), hash (string)
  • Rainbowtable:
    • Constructor (takes filename and stores filename in a private string variable)
    • Public method search (takes in a string which is a hash)
  • Search:
    • Constructor
    • Public method search: searches the table at location filename by calling the XML parser class and going through every entry. For each entry with a matching hash property, it prints all of that entry's properties to standard output
    • Private string (filename specified in constructor)
    • “Tree”: instance of XML table from XML parser class
  • Repository manager:
    • Constructor (no longer takes an instance of the config class, since the config class is a singleton)
    • Add method (takes a URL string): see the add command in the Analysis section
    • Remove method: see the remove command in the Analysis section
    • Update method (calls the remove method and then the add method)
    • Method getCommand (command line args): returns an int value that represents a specific command
  • Configuration (takes URLs in and out of repository file):
    • There only needs to be one shared instance, so this should be a singleton (rather than passing around an instance)
    • Constructor (takes no parameters): parses the config file
    • Add method (takes in URL): adds the URL to a map called URLmap as the key. The value for that key is the ID associated with that URL in the rainbowtable. It then rewrites the config file from its own entries (the values in that map, plus a string attribute holding the location of the hash table)
    • Remove method (takes in URL)
    • Public method lookup (takes in int): finds the key-value pair whose ID is the given integer and returns the associated URL as a string
  • Main (command line arguments): passes them to the input handler, then switches on the result; each case performs the appropriate actions
  • Shared classes/methods across all modules?
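
The Configuration class described above might look roughly like the following Python sketch. The counter-based ID scheme, the `url_map` attribute name, and the return values of `add` and `remove` are assumptions made for illustration. Parsing and rewriting the config file are omitted.

```python
class Configuration:
    """Singleton sketch: one shared URL-to-ID map for the whole program."""

    _instance = None

    def __new__(cls):
        # Create the shared instance on first use, then always return it.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance.url_map = {}   # URL -> integer ID (URLmap in the draft)
            cls._instance._next_id = 1   # assumed ID scheme: simple counter
        return cls._instance

    def add(self, url):
        """Register url and return its ID; re-adding keeps the old ID."""
        if url not in self.url_map:
            self.url_map[url] = self._next_id
            self._next_id += 1
        return self.url_map[url]

    def remove(self, url):
        """Forget url and return the ID it had, or None if unknown."""
        return self.url_map.pop(url, None)

    def lookup(self, source_id):
        """Return the URL registered under source_id, or None."""
        for url, known_id in self.url_map.items():
            if known_id == source_id:
                return url
        return None
```

Because every call to `Configuration()` returns the same object, the repository manager can obtain the configuration wherever it needs it instead of receiving an instance through its constructor.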

Execution Plan

  • Divide coding tasks? Teams
    • Team 1: Code hash and rainbowtable module
    • Team 2: Code repository manager and configuration module
    • Team 3: Code main and command line parser
  • What will the makefile look like?
  • Planned deadlines to carry out implementation/testing on each module
    • November 14: 1st unit test (3 weeks before the project deadline of Dec. 5)
    • November 21: 2nd unit test (fix problems found from 1st test)