CSV Dump Comparison - x-ian/dha-mis-migration GitHub Wiki

Raw data comparison

Instead of relying on spot checks and most likely incomplete UI tests, as much as possible the whole migration should be tested automatically and as a whole.

The 'raw' datasets (before and after migration) can be quite easily compared. For this CSV dumps from every table are generated both from the old Access backend DB as well the new MSSQL DB.

CSV data dump from Access

  1. Open VB Editor of DB
  2. Import file dha-mis-migration/access/CsvExport.bas
  3. Run VB Macro ExportAll (CSV files are created in My Documents)
  4. Post process obs file
sed -i.bak 's/-0.00/0.00/g'  obs-access.csv   
sed -i.bak 's/5.0e-03/0.00/g'  obs-access.csv   
sed -i.bak 's/9.9e-03/0.00/g'  obs-access.csv   
sed -i.bak 's/3.0e-03/0.00/g'  obs-access.csv   
sed -i.bak 's/2.0e-03/0.00/g'  obs-access.csv 
sed -i.bak 's/7.0e-03/0.00/g'  obs-access.csv   
sed -i.bak 's/9.69e-03/0.00/g'  obs-access.csv 
sed -i.bak 's/9.69e-06/0.00/g'  obs-access.csv 
sed -i.bak 's/8.0e-03/0.00/g'  obs-access.csv 
sed -i.bak 's/3.52e-03/0.00/g'  obs-access.csv 
sed -i.bak 's/2.17e-03/0.00/g'  obs-access.csv 
sed -i.bak 's/9.8e-03/0.00/g'  obs-access.csv 
sed -i.bak 's/9.0e-03/0.00/g'  obs-access.csv 
sed -i.bak 's/9.8e-03/0.00/g'  obs-access.csv 
sed -i.bak 's/1.0e-03/0.00/g'  obs-access.csv 
sed -i.bak 's/9.0e-03/0.00/g'  obs-access.csv 
sed -i.bak 's/5.0e-04/0.00/g'  obs-access.csv 
sed -i.bak 's/387304.0/387304.00/g'  obs-access.csv 
sed -i.bak 's/133323.0/133323.00/g'  obs-access.csv 
sed -i.bak 's/133000.0/133000.00/g'  obs-access.csv 

CSV data dump from SQL Server

  1. Create local login user root/root (in SSMS under tree element database/security) and run these commands in sqlcmd prompt:
use HIVData
go
exec sp_addrolemember 'db_owner', 'root'
go
  1. Check file dha-mis-migration/nodesjs/tedious-csv-dump.js for correct credentials and DB connect configs
  2. Run Batch file dha-mis-migration/nodesjs/tedious-csv-dump.bat (CSV files are created in Desktop/csv-export-sql)

CSV data dump from SQL Server through Access

Comparing CSV data dumps within the Access connections (one when connecting to Access backend DB and the other one when connecting to the MSSQL backend). But as a new column rowversion/timestamp is added through the migration, the dumps aren't easily comparable and require a post-process to remove this data again for every table.

For the obs table with a bash shell these statements can be used to compare the dumps

awk -F',' '{print $1,$2,$3,$4}' obs_msqsql.csv | tr ' ' ',' > obs_mssql2.csv
diff -w obs_access.csv obs_mssql2.csv

Compare all CSV files of both directories

E.g. under Windows with WinMerge or under *nix with diff. Note that obs-access.csv results in an out-of-memory error of WinMerge.

Expected differences are mainly due to different representations of the value 0 within float datatype.

Table Def overview/comparison

Besides the raw data, also the table structure and its details can be externally verified.

Access:

  1. Import file dha-mis-migration/access/CsvExportTableDefinition.bas
  2. Run Macro ExportallTableDefs (CSV file is created in Desktop)

SQL Server:

  1. Check file dha-mis-migration/nodesjs/tedious-tabledef-dump.js for correct credendials and DB connect configs
  2. Run Batch file dha-mis-migration/nodesjs/tedious-tabledef-dump.bat (CSV file is created in Desktop)

CSV dump from Queries

Similar to the raw data of every single table, all queries are supposed to still return the same values. Sadly Access doesn't allow to script the invocation of queries with parameters without any user interaction, so for now only param-less (Non-action) queries are included in the scripts.

With Access Backend:

  1. Open frontend

  2. Run script ExportAllQueries

  3. Copy all CSV files from Documents to new folder (e.g. csv-export-de-access-queries)

  4. Copy Desktop\File.txt to new folder

  5. Sort all lines for each file. Should be possible with PowerShell, but took forever: gci | ForEach-Object { gc $.FullName | sort > "$($.BaseName)" }. Instead under Unix this was much faster: find . -type f -exec sort -o {} {} ;

  6. Repeat for other frontends

MSSQL Backend:

  1. Repeat steps from above Access Backend and adjust accordingly, e.g. create folder csv-export-de-sql-queries

Compare results:

  1. Use a file diff (e.g. WinMerge under Windows) to search for (relevant) differences

Conclusion

With the defined migration process and the above comparisons no relevant difference in the data is detected pre- and post-migration.

⚠️ **GitHub.com Fallback** ⚠️