Using ViewMaterializer - adamfoneil/SqlServerUtil GitHub Wiki

Use ViewMaterializer with SQL Server change tracking as a way to optimize performance of slow views by incrementally synchronizing them to an ordinary table. Ordinary tables are much faster to query.

Imagine you have a view in your SQL Server database that provides a lot of calculations, but the view is too slow to query at a volume your users want to see. I've seen good results with indexed views for this problem, but I've also seen them provide no benefit without a clear reason. Indexed views also make it harder to implement changes later specifically because, by definition, they're schema bound. For these reasons, I wanted to provide a way to leverage SQL Server change tracking to address the slow view problem, while requiring minimal application code. Change tracking by itself is really powerful, and great for this kind of problem. But it has several moving pieces that are hard to put together yourself if you're starting from scratch. Thus, the ViewMaterializer<T> class makes this great SQL Server feature easier to leverage.

Before you begin, note that SQL Server change tracking needs to be turned on in your database, and then needs to be turned on for select tables. In my integration test, the code for this is here, but in a real app, you would turn these on and leave them on rather than toggling them constantly like my test is doing.

I'll refer to this example from my integration test, but I'll introduce it in pieces.

To use ViewMaterializer<T>, the first thing you need to figure out are the primary key columns of the view you want to optimize. This is based on the assumption that a view -- slow or otherwise -- is aggregating data along one or more reporting dimensions. The T generic argument represents these dimensions. In my example, I'm using a table of made-up sales data, and the view I'm optimizing groups by three columns. The combination of these columns Region, Year, and ItemClass describes a unique row in this view. Therefore, in order to implement a ViewMaterializer for it, we need to create a class with these three properties:

public class SalesMaterializerResult
{
    public string Region { get; set; }
    public int Year { get; set; }
    public string ItemClass { get; set; }
}

Next, we can create our ViewMaterializer class like this:

public class SalesMaterializer : ViewMaterializer<SalesMaterializerResult>
{
}

But this code won't compile because ViewMaterializer<T> is abstract, meaning it has several members we need to implement. Let's start with the easiest. We need to tell our materializer class what our (slow) view is along with the table we'll be incrementally merging into. We implement the SourceView and IntoTable properties accordingly. You can find the view definition here. It's not necessary for the IntoTable to exist. I typically put such tables in an rpt schema to indicate they are "reporting" tables. The IntoTable is created dynamically at runtime here.

public class SalesMaterializer : ViewMaterializer<SalesMaterializerResult>
{
    protected override string SourceView => "dbo.SalesHistoryTotals";
    protected override string IntoTable => "rpt.SalesHistoryTotals";
}

The final piece we have to implement is the GetChangesAsync abstract method. This is where change tracking comes in, and is how we tell the ViewMaterializer what view data has changed since the last time it was merged. This part is harder to explain, so I'll just show the code, then unpack it piece by piece:

public class SalesMaterializer : ViewMaterializer<SalesMaterializerResult>
{
    protected override string SourceView => "dbo.SalesHistoryTotals";
    protected override string IntoTable => "rpt.SalesHistoryTotals";

    protected override async Task<IEnumerable<SalesMaterializerResult>> GetChangesAsync(SqlConnection connection, long version)
    {
        return await connection.QueryAsync<SalesMaterializerResult>(
            @"SELECT 
                [sh].[Region],
                YEAR([sh].[Date]) AS [Year], 	
                UPPER(LEFT([sh].[ItemNumber], 2)) AS [ItemClass]
           FROM 
                CHANGETABLE(CHANGES dbo.SalesHistory, @version) AS [changes]
	        INNER JOIN [dbo].[SalesHistory] [sh] ON [changes].[Id]=[sh].[Id]
           GROUP BY
	        [sh].[Region],
	        YEAR([sh].[Date]),
	        UPPER(LEFT([sh].[ItemNumber], 2))", new { version });
        }
    }

The idea behind GetChangesAsync is to ask change tracking what has changed in our base table dbo.SalesHistory since the last time it was synchronized. That's what CHANGETABLE(CHANGES dbo.SalesHistory, @version) means. But we need the data about changes to have our dbo.SalesHistoryTotals view columns. Since the CHANGETABLE alone doesn't have the Region, Year, and ItemClass columns, I join it to the base table dbo.SalesHistory then perform the same grouping that the SourceView does. In this way, we can get incremental changes to the base table in a way that's compatible with the view we're merging.

The version argument is a magic number incremented by SQL Server, and managed internally by the ViewMaterializer<T> class. This is used to identify rows modified or deleted after the last time the view was synchronized. Every time you call the ExecuteAsync method to perform a merge, this version number is captured so that only rows modified after the last merge are found.

In the integration test example, you can see that I create random data, then perform two arbitrary updates of it, each time merging changes to the IntoTable. At the end of the test, I assert that the source view and the reporting table have identical data.