3. How it works - grahamjevon/ReG GitHub Wiki

In a nutshell

When you run the program, it will import your selected Excel or CSV file, generate references, and finally export your transformed dataset as a new Excel or CSV file. The original file will remain unchanged.

The program also provides some data validation. Most notably, it checks the data for single child records.

Let's walk through the program

When you run the program, the following sequence will occur:

1. Enter a prefix [Mandatory]

The prefix you enter will be applied to the beginning of every reference number that is generated.
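This page does not spell out ReG's numbering scheme, so the following is only a minimal sketch of how a prefix can lead every structured reference. It assumes references of the form `PREFIX/1/2/3` with `/` as the separator, rows listed in depth-first order (each parent directly before its children), and a hypothetical three-level hierarchy. The names `HIERARCHY` and `generate_references` are illustrative, not ReG's code.

```python
# Hypothetical configured hierarchy, top level first (an assumption for this sketch).
HIERARCHY = ["Series", "File", "Item"]


def generate_references(levels, prefix, hierarchy=HIERARCHY, sep="/"):
    """Assign a reference to each row, assuming rows are in depth-first order."""
    counters = []  # counters[d] = running number at depth d under the current parent
    refs = []
    for level in levels:
        depth = hierarchy.index(level)  # raises ValueError for unknown levels
        if depth > len(counters):
            raise ValueError(f"{level!r} row has no parent at the level above")
        # Forget counters for deeper levels, then start or advance this level's counter.
        del counters[depth + 1:]
        if depth == len(counters):
            counters.append(1)
        else:
            counters[depth] += 1
        refs.append(prefix + sep + sep.join(str(n) for n in counters))
    return refs
```

For example, the rows `Series, File, Item, Item, File, Series` with the prefix `ABC` would receive `ABC/1`, `ABC/1/1`, `ABC/1/1/1`, `ABC/1/1/2`, `ABC/1/2` and `ABC/2`.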

2. Enter the name of the file you want to upload [Mandatory]

You need to enter the full filename including the file extension.

Example: DatasetFilename.xlsx
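A minimal sketch of the filename check, assuming only the two formats this page mentions (`.xlsx` and `.csv`). The function name `check_filename` and the format labels are hypothetical; a real run would also confirm that the file exists before trying to read it.

```python
from pathlib import Path

# Assumed mapping from extension to a format label (not ReG's actual configuration).
SUPPORTED = {".xlsx": "excel", ".csv": "csv"}


def check_filename(name):
    """Return (path, format) or raise ValueError if the extension is missing or unsupported."""
    path = Path(name.strip())
    suffix = path.suffix.lower()
    if not suffix:
        raise ValueError(f"{name!r} has no extension; enter e.g. DatasetFilename.xlsx")
    if suffix not in SUPPORTED:
        raise ValueError(f"unsupported file type {suffix!r}")
    return path, SUPPORTED[suffix]
```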

3. Select worksheet [Conditional]

If the worksheet name in the configuration file is not present in the uploaded file, the program will display a list of the worksheet names that are present. It will ask you to enter the name of the worksheet you want to upload.

Screenshot of ReG software
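The worksheet fallback can be sketched as follows. The list of sheet names is assumed to come from the Excel reader (for example pandas' `ExcelFile.sheet_names` or openpyxl's `Workbook.sheetnames`); `choose_worksheet` and its `ask` parameter are hypothetical, with `ask` defaulting to `input` so the prompt can be replaced in tests.

```python
def choose_worksheet(available, configured, ask=input):
    """Use the configured worksheet if present; otherwise list the sheets and prompt."""
    if configured in available:
        return configured
    print("Worksheet", repr(configured), "not found. Available worksheets:")
    for name in available:
        print(" ", name)
    while True:
        answer = ask("Enter the name of the worksheet to upload: ").strip()
        if answer in available:
            return answer
        print(repr(answer), "is not one of the listed worksheets.")
```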

4. Select hierarchy column [Conditional]

If the configured name of the column containing the hierarchy data is not present in the uploaded file, the program will display a numbered list of all the column names. It will ask you to enter the number of the column that contains the hierarchy data.

Screenshot of ReG software
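Selecting the hierarchy column by number works much like the worksheet prompt, except that the user picks from a numbered list. This is a minimal sketch with a hypothetical function name; it assumes the column headers are available as a list of strings.

```python
def choose_hierarchy_column(columns, configured, ask=input):
    """Use the configured column if present; otherwise show a numbered list and prompt."""
    if configured in columns:
        return configured
    print("Column", repr(configured), "not found. Columns in this file:")
    for number, name in enumerate(columns, start=1):
        print(f"  {number}. {name}")
    while True:
        answer = ask("Enter the number of the column containing the hierarchy: ").strip()
        # Accept only a whole number that matches one of the listed columns.
        if answer.isdigit() and 1 <= int(answer) <= len(columns):
            return columns[int(answer) - 1]
        print("Please enter a number between 1 and", len(columns))
```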

5. Overwrite data in the "Reference" column [Conditional]

If the "Reference" column already exists in the uploaded file AND it contains data, the program will display the contents of that column and ask whether you wish to continue. If you continue, the data in that column will be overwritten.

  • To continue, enter: Y
  • To quit, enter: N

Screenshot of ReG software
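A minimal sketch of the overwrite check, assuming each spreadsheet row is represented as a dictionary keyed by column name. The function name is hypothetical; returning `False` stands for the program quitting.

```python
def confirm_overwrite(rows, column="Reference", ask=input):
    """Return True if it is safe to write references, False if the user chose N."""
    existing = [row.get(column) for row in rows if row.get(column) not in (None, "")]
    if not existing:
        return True  # column absent or empty: nothing to overwrite
    print(f'The "{column}" column already contains data:')
    for value in existing:
        print(" ", value)
    while True:
        answer = ask("Continue and overwrite? (Y/N): ").strip().upper()
        if answer in ("Y", "N"):
            return answer == "Y"
```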

6. Unexpected data in the hierarchy column [Conditional]

If the hierarchy column contains data not found in the configured hierarchy, the program will notify you. It will display a list of expected values and a separate list of unexpected values.

Screenshot of ReG software

You have two options to resolve this:

  • Amend unexpected values
  • Build a bespoke hierarchy

If you choose not to proceed with either option, the program will quit because it needs to recognise all of the data in the hierarchy column in order to generate structured reference numbers.
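Detecting unexpected values amounts to comparing the hierarchy column against the configured levels. In this sketch (hypothetical names), `find_unexpected` lists each unexpected value once in order of first appearance, and `amend` applies a mapping of corrections, corresponding to the "Amend unexpected values" option. Building a bespoke hierarchy would correspond to passing a different `hierarchy` list.

```python
def find_unexpected(values, hierarchy):
    """Return values not in the hierarchy, each listed once, in order of first appearance."""
    expected = set(hierarchy)
    unexpected = []
    for value in values:
        if value not in expected and value not in unexpected:
            unexpected.append(value)
    return unexpected


def amend(values, corrections):
    """Replace each unexpected value using a {wrong: right} mapping; others pass through."""
    return [corrections.get(value, value) for value in values]
```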

7. Single child records [Conditional]

If the hierarchy contains single child records (e.g. a Series that contains only one File), the program will inform you. It will:

  • Tell you the total number of single child records found
  • List every spreadsheet row number where these are found (these lists will be grouped by hierarchical level)
  • Export a subset of the data that contains just the single child records and their parents

Screenshot of ReG software
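Finding single child records can be sketched by tracking each row's parent and counting children per parent. This sketch assumes depth-first row order, children exactly one level below their parent, and a single header row (so the first data row is spreadsheet row 2). All function names are hypothetical.

```python
from collections import defaultdict


def find_single_children(levels, hierarchy):
    """Return sorted (child, parent) row-index pairs where the child is its parent's only child."""
    ancestors = []                # ancestors[d] = index of the open record at depth d
    children = defaultdict(list)  # parent index -> list of child indexes
    for i, level in enumerate(levels):
        depth = hierarchy.index(level)
        if depth > len(ancestors):
            raise ValueError(f"row index {i}: {level!r} has no parent")
        del ancestors[depth:]
        if depth > 0:
            children[ancestors[depth - 1]].append(i)
        ancestors.append(i)
    return sorted((kids[0], parent) for parent, kids in children.items() if len(kids) == 1)


def report(levels, pairs, first_data_row=2):
    """Print the total and return spreadsheet row numbers of single children, grouped by level."""
    by_level = defaultdict(list)
    for child, _parent in pairs:
        by_level[levels[child]].append(child + first_data_row)
    print(f"{len(pairs)} single child record(s) found")
    for level, rows in by_level.items():
        print(f"  {level}: rows {', '.join(map(str, rows))}")
    return dict(by_level)


def subset_indexes(pairs):
    """Row indexes for the exported subset: each single child plus its parent."""
    return sorted({i for pair in pairs for i in pair})
```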

The lists of row numbers and the exported subset will help you investigate why these single child records are present and whether they should remain.

You then have five options:

  1. Delete children - Delete all of the single child records
  2. Delete parents - Delete all of the parents of single child records
  3. Keep - Keep all the records (i.e. do not delete any records)
  4. Choose by level - Make the above three decisions on a level-by-level basis (e.g. you might want to keep all of the single child records at the "File" level, but delete the single child records at the "Item" level).
  5. Cancel - Quit the program
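The first four options can be sketched as one function that takes a decision per level (options 1 to 3 simply apply the same decision to every level). It takes the (child, parent) row-index pairs of single child records as input and returns the kept rows and the deleted rows, the latter being the kind of subset that could serve as an audit trail. This is a hypothetical sketch: this page does not describe how ReG restructures the remaining records after a parent is deleted, so here the selected rows are simply removed. Cancel is not modelled.

```python
def apply_decisions(rows, levels, pairs, decisions):
    """Split rows into (kept, deleted).

    decisions maps the single child's level to "children", "parents" or "keep";
    levels with no entry default to "keep".
    """
    to_delete = set()
    for child, parent in pairs:
        choice = decisions.get(levels[child], "keep")
        if choice == "children":
            to_delete.add(child)
        elif choice == "parents":
            to_delete.add(parent)
        elif choice != "keep":
            raise ValueError(f"unknown decision {choice!r}")
    kept = [row for i, row in enumerate(rows) if i not in to_delete]
    deleted = [row for i, row in enumerate(rows) if i in to_delete]
    return kept, deleted
```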

Exported files [Mandatory and Conditional]

When the program is finished, it will automatically export at least one file. This will contain all the generated references. It will also contain any changes to the data that you made along the way (e.g. deleted records or altered terms in the hierarchy column).

The program might also generate two additional files:

  • If any single child records are found, a subset of data containing just the single child records and their parents will be exported.

  • If you choose to delete any records, a subset of data containing the deleted records will be exported in a separate file. This provides an audit trail should you need to check the deletions later.

The filenames of any exported files will include a timestamp indicating when the program was run. This will ensure that every exported file is unique.

Screenshot of ReG software
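A minimal sketch of timestamped output names, leaving the original file untouched. The `_label_YYYYMMDD-HHMMSS` pattern and labels such as `references` are assumptions, not ReG's actual naming; the label keeps the files from a single run apart.

```python
from datetime import datetime
from pathlib import Path


def timestamped_name(original, label, when=None):
    """Build e.g. DatasetFilename_references_20240131-090507.xlsx next to the original."""
    if when is None:
        when = datetime.now()
    path = Path(original)
    stamp = when.strftime("%Y%m%d-%H%M%S")
    return path.with_name(f"{path.stem}_{label}_{stamp}{path.suffix}")
```

With second-level resolution, names from separate runs are unique as long as the runs start at least one second apart.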