SQLDiff - w4111/w4111.github.io GitHub Wiki

Group Members:

  • ah4100 Robin Hwang
    • wrote Problem and Tech's solution, Alternatives + Pro/Cons, 4111 Relevance
  • cr2157 Christian Romero
    • Wrote the tutorial, example

Name of the Technology/Tool

Database Comparison

When developing and maintaining databases, developers need to compare databases to see/understand how they differ. It could be something from comparing a prod database to the dev version of that database, checking how a database changed after running a migration, or doing due diligence on data consistency. SQLDIFF is a command-line utility that comes with SQLite specifically designed to solve the issues of directly comparing SQLite databases. Similar to the comparison function EXCEPT in SQL, where EXCEPT compares the rows between two tables, SQLDIFF will compare the difference in rows between two SQLite databases. Instead of having to manually check each table one-by-one which is time-intensive, especially as databases get bigger, SQLDiff will allow you to find the exact rows that match and don't match between two databases. The following example illustrates a very common use case:

Developers often test new features in a developer database before changes are committed to the production database. Say a dev creates a feature that requires a phone number and so they add a 'phone_number' column to the USERS table. The feature tests well and is committed to production. However, the prod database wasn't updated to have a 'phone_number' column for the USERS table. Thus when the feature is accessed the application crashes since the feature is trying to pull from a column that doesn't exist. With a simple check using SQLDiff, it would've been immediate that there are no equivalent rows between the prod/dev database in the USERS table since the dev database has an extra column. This is a great example of how comparing two databases can be a useful check for developers.

Alternatives and Pros/Cons of SQLDIFF

Although SQLDIFF does solve the unique problem of Database comparison, it's scope is quite limited as it is completely restricted to SQLite. Thus, we have competitors for both SQLite and DBMS's in general.

  • SQLite DIFF is essentially a paid version of SQLDIFF that is more robust. Some key differences are 1) schema is compared in addition to the data. So it's not just rows that are being compared but rather tables, columns, Triggers, Views, and Indexes. SQLite Diff gives a much more in-depth true comparison between two database files. 2) There is also a graphical component, changed/updated schema and data are displayed with yellow highlighting to show the difference between two files in a user interface.

  • RedGate SQL Data Compare is a powerful enterprise tool that works for SQL Server. Data Compare does a multi-layered analysis that looks at table structures, column definitions, relationships between tables, stored procedures, and security settings. Each element is examined to find the differences between two databases in a visual interface that shows exactly where the differences are. It goes beyond just comparing row by row per table and looks at the fundamental structure of the database. Of course, this is an enterprise tool so it's going to be significantly more robust than a small add-on like SQLite DIFF or an included functionality like SQLDIFF.

These two examples highlight the types of alternatives to SQLite's SQLDIFF, on one hand you have small applications like SQLite DIFF that are a bit more robust and make it easier to spot differences in data. On the other hand, you have powerful enterprise level applications that not only show you the difference between table data but also goes one step further and shows the full difference between internal table structure. Most popular databases have some form of built-in functionality similar to SQLDiff or have paid enterprise applications that do what SQLDiff does and more. Overall, database comparison tools all can point to differences in data (i.e. records), and thus the use case as always will determine the tool we choose. If we are just purely looking for differences in records and nothing more then SQLDIFF is the most straightforward and easiest to work with, but if more functionality such as an analysis or data visualization is needed then it makes more sense to use a more robust paid tool such as RedGate's SQL Data Compare.

SQLDIFF and 4111

  • The core functionality of SQLDIFF is built upon the idea of set difference, similar to EXCEPT, although in this case the set difference is comparing the whole database file as opposed to just tables.

  • The actual matching mechanism SQLDIFF is using can be either the row ID of tables or a primary key, SQLDIFF will compare the values of the rowid/primary key of records from both databases. As discussed in class, this ensures that the rows being matched are unique ensuring that the match is accurate. Once SQLDIFF matches either the specified rowid/primary key from both records it will then check the contents of both records to determine a match. This approach applies the concept of record equivalence we discussed in class, where two records are compared field by field to determine if they truly contain the same information and are indeed equivalent. In addition, the concept of entity integrity is baked in because SQLDIFF assumes that each record will have a unique identifier. (Although records generally will have unique identifiers it is not always 100% true that records will have a primary key)

  • ACID PRINCIPLES

SQLite is a transactional database where all changes/queries are Atomic, Consistent, Isolated and Durable. Specifically, SQLite implements serializable transactions that are ACID regardless of crashes. This is very important as this allows SQLDIFF to be accurate and consistent.

  1. Atomicity - Since Atomicity guarantees that transactions are either fully complete or do not have an effect, this ensures that when two SQLite Databases are being compared SQLDIFF will always be comparing complete, valid states of the databases and not their messy in-between states. Meaning that SQLDIFF never sees partial or incomplete transaction effects as this would make comparisons meaningless because partial/incomplete transactions don't represent the true current state of the database.

  2. Consistency - Consistency ensures that DBMS's always satisfy integrity constraints, this is a need for SQLDIFF because if the databases it is comparing don't satisfy integrity constraints then it would be impossible to compare records. When SQLDIFF matches records using primary keys or rowids, it relies on this consistency property to ensure these identifiers maintain their meaning and uniqueness across both database states. Without consistency, there is no guarantee that these identifiers are consistent rendering comparison meaningless.

  3. Isolation - This property ensures that SQLDIFF gets a stable, non-changing view of the two compared databases during the time it is reading records. Although databases are always being adjusted, isolation guarantees that when SQLDIFF is taking place both databases are non-changing for the time of the execution. Without isolation, SQLDIFF can't make an accurate comparison as records could potentially be changing dynamically.

  4. Durability - Conversely, SQLDIFF can also help us confirm this concept by examining the differences between databases after a crash to see the differences in data. However, similar to above, durability does assure that the committed transactions are persistent giving us the current-most accurate state of the database regardless of failures and crashes of the compared databases. This allows SQLDIFF to make the most accurate up-to-date reads regardless of crashes and failures.

So in turn, SQLDIFF is built on various ideas and concepts we have learned in class. These concepts ensure that the databases and records being compared will be accurate during the time of comparison ensuring that SQLDIFF can produce the necessary, correct results and therefore find meaningful differences of the compared databases at hand.

Tutorial

You must download sqldiff through here and extract

Google Colab Tutorial