CSV Dump Comparison - x-ian/dha-mis-migration GitHub Wiki
Instead of relying on spot checks and most likely incomplete UI tests, as much as possible the whole migration should be tested automatically and as a whole.
The 'raw' datasets (before and after migration) can be quite easily compared. For this CSV dumps from every table are generated both from the old Access backend DB as well the new MSSQL DB.
- Open VB Editor of DB
- Import file dha-mis-migration/access/CsvExport.bas
- Run VB Macro ExportAll (CSV files are created in My Documents)
- Post process obs file
sed -i.bak 's/-0.00/0.00/g' obs-access.csv
sed -i.bak 's/5.0e-03/0.00/g' obs-access.csv
sed -i.bak 's/9.9e-03/0.00/g' obs-access.csv
sed -i.bak 's/3.0e-03/0.00/g' obs-access.csv
sed -i.bak 's/2.0e-03/0.00/g' obs-access.csv
sed -i.bak 's/7.0e-03/0.00/g' obs-access.csv
sed -i.bak 's/9.69e-03/0.00/g' obs-access.csv
sed -i.bak 's/9.69e-06/0.00/g' obs-access.csv
sed -i.bak 's/8.0e-03/0.00/g' obs-access.csv
sed -i.bak 's/3.52e-03/0.00/g' obs-access.csv
sed -i.bak 's/2.17e-03/0.00/g' obs-access.csv
sed -i.bak 's/9.8e-03/0.00/g' obs-access.csv
sed -i.bak 's/9.0e-03/0.00/g' obs-access.csv
sed -i.bak 's/9.8e-03/0.00/g' obs-access.csv
sed -i.bak 's/1.0e-03/0.00/g' obs-access.csv
sed -i.bak 's/9.0e-03/0.00/g' obs-access.csv
sed -i.bak 's/5.0e-04/0.00/g' obs-access.csv
sed -i.bak 's/387304.0/387304.00/g' obs-access.csv
sed -i.bak 's/133323.0/133323.00/g' obs-access.csv
sed -i.bak 's/133000.0/133000.00/g' obs-access.csv
- Create local login user root/root (in SSMS under tree element database/security) and run these commands in sqlcmd prompt:
use HIVData
go
exec sp_addrolemember 'db_owner', 'root'
go
- Check file dha-mis-migration/nodesjs/tedious-csv-dump.js for correct credentials and DB connect configs
- Run Batch file dha-mis-migration/nodesjs/tedious-csv-dump.bat (CSV files are created in Desktop/csv-export-sql)
Comparing CSV data dumps within the Access connections (one when connecting to Access backend DB and the other one when connecting to the MSSQL backend). But as a new column rowversion/timestamp is added through the migration, the dumps aren't easily comparable and require a post-process to remove this data again for every table.
For the obs table with a bash shell these statements can be used to compare the dumps
awk -F',' '{print $1,$2,$3,$4}' obs_msqsql.csv | tr ' ' ',' > obs_mssql2.csv
diff -w obs_access.csv obs_mssql2.csv
E.g. under Windows with WinMerge or under *nix with diff. Note that obs-access.csv results in an out-of-memory error of WinMerge.
Expected differences are mainly due to different representations of the value 0 within float datatype.
Besides the raw data, also the table structure and its details can be externally verified.
Access:
- Import file dha-mis-migration/access/CsvExportTableDefinition.bas
- Run Macro ExportallTableDefs (CSV file is created in Desktop)
SQL Server:
- Check file dha-mis-migration/nodesjs/tedious-tabledef-dump.js for correct credendials and DB connect configs
- Run Batch file dha-mis-migration/nodesjs/tedious-tabledef-dump.bat (CSV file is created in Desktop)
Similar to the raw data of every single table, all queries are supposed to still return the same values. Sadly Access doesn't allow to script the invocation of queries with parameters without any user interaction, so for now only param-less (Non-action) queries are included in the scripts.
With Access Backend:
-
Open frontend
-
Run script ExportAllQueries
-
Copy all CSV files from Documents to new folder (e.g. csv-export-de-access-queries)
-
Copy Desktop\File.txt to new folder
-
Sort all lines for each file. Should be possible with PowerShell, but took forever: gci | ForEach-Object { gc $.FullName | sort > "$($.BaseName)" }. Instead under Unix this was much faster: find . -type f -exec sort -o {} {} ;
-
Repeat for other frontends
MSSQL Backend:
- Repeat steps from above Access Backend and adjust accordingly, e.g. create folder csv-export-de-sql-queries
Compare results:
- Use a file diff (e.g. WinMerge under Windows) to search for (relevant) differences
With the defined migration process and the above comparisons no relevant difference in the data is detected pre- and post-migration.