Full usage walkthrough (tutorial) - Jusas/AzureTableDataStore GitHub Wiki

Overview

This page acts as a guide to all the operations that TableDataStore provides, a sort of tutorial that should cover pretty much everything.

Constructors

There are two constructors for TableDataStore<TData>: one that takes connection strings, and another that takes StorageCredentials and StorageSharedKeyCredential. You can also choose whether the storage table and blob container should be created automatically if they do not yet exist. You may also name the instance and define the RowKey and PartitionKey properties of the entity data model, which is required when the model's properties are not explicitly marked with the TableRowKey and TablePartitionKey attributes.

Examples below:

var storageConnStr = "DefaultEndpointsProtocol=https;AccountName=...";
var tableName = "mytable";
var containerName = "myblobs";

new TableDataStore<MyEntityType>(
    tableStorageConnectionString: storageConnStr, 
    blobStorageConnectionString: storageConnStr,
    tableName: tableName, 
    createTableIfNotExist: true, 
    blobContainerName: containerName,
    createContainerIfNotExist: true,
    storeName: "default",
    partitionKeyProperty: "UserGroup",
    rowKeyProperty: "UserId"
);

var tableCreds = new StorageCredentials("mytablestorage", "...");
var blobCreds = new StorageSharedKeyCredential("mytablestorage", "...");

new TableDataStore<MyEntityType>(
    tableStorageCredentials: tableCreds, 
    blobStorageCredentials: blobCreds,
    tableName: tableName, 
    createTableIfNotExist: true, 
    blobContainerName: containerName,
    createContainerIfNotExist: true,
    storeName: "default",
    partitionKeyProperty: "UserGroup",
    rowKeyProperty: "UserId"
);

Additional settings and client side validation

Since parallelism is desirable, table operations that are not batched, as well as blob operations, are executed in parallel where possible. The number of parallel table tasks is limited by the ParallelTableOperationLimit property, and the number of parallel blob tasks by the ParallelBlobBatchOperationLimit property.

Additionally, if you'd prefer to spot possible issues early, you can enable client side validation by setting UseClientSideValidation to true. Client side validation checks the entity properties against the documented table entity rules before any table API calls are made, and throws an AzureTableDataStoreEntityValidationException<TData> if any validation issues are found.

EntityPropertyConverterOptions can be changed if the default value of _ as entity property name delimiter isn't suitable for you (the delimiter character cannot be present in property names).

// This defaults to false
store.UseClientSideValidation = true;

// Allows setting how many asynchronous Azure Storage Table operations (sub-batches in 
// batch inserts/merges, or individual inserts/merges when not using batching) can be 
// run in parallel per call.
// This defaults to 8.
store.ParallelTableOperationLimit = 8;

// Allows setting how many parallel asynchronous Azure Storage Blob operations can be 
// initiated per a sub-batch of 100 entities (so a maximum of 
// ParallelTableOperationLimit * ParallelBlobBatchOperationLimit blob operations can 
// be running in parallel).
// This defaults to 8.
store.ParallelBlobBatchOperationLimit = 8;

// You can override the property name delimiter.
store.EntityPropertyConverterOptions = new EntityPropertyConverterOptions() 
    { PropertyNameDelimiter = "_" };
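If client side validation is enabled, validation failures surface as the AzureTableDataStoreEntityValidationException<TData> mentioned above. A minimal sketch of catching it (assuming myEntity is an entity with a property value that breaks the table entity rules; the exception members beyond Message are not shown here):

```csharp
try
{
    // With UseClientSideValidation = true, invalid entities are rejected
    // before any table API calls are made.
    await store.InsertAsync(BatchingMode.None, myEntity);
}
catch (AzureTableDataStoreEntityValidationException<MyEntityType> ex)
{
    // No requests were made; fix the offending properties and retry.
    Console.WriteLine(ex.Message);
}
```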

Dependency Injection

The TableDataStore implements the ITableDataStore<TData> and INamedTableDataStore interfaces. Instances are commonly injected with DI, and this is easy to set up. In a standard ASP.NET Core use case:

public void ConfigureServices(IServiceCollection services)
{
    var storageConnStr = "DefaultEndpointsProtocol=https;AccountName=...";
    var tableName = "mytable";
    var containerName = "myblobs";

    services.AddTransient<ITableDataStore<MyEntityType>>(serviceProvider =>
        new TableDataStore<MyEntityType>(
            tableStorageConnectionString: storageConnStr, 
            tableName: tableName, 
            createTableIfNotExist: true, 
            blobContainerName: containerName,
            createContainerIfNotExist: true));
}

Now, let's say you have multiple instances of the same type that you want to inject, perhaps targeting another storage. To identify them you can use the store name:

services.AddTransient<ITableDataStore<MyEntityType>>(serviceProvider =>
        new TableDataStore<MyEntityType>(
            storeName: "store1",
            tableStorageConnectionString: storageConnStr, 
            tableName: tableName, 
            createTableIfNotExist: true, 
            blobContainerName: containerName,
            createContainerIfNotExist: true));

services.AddTransient<ITableDataStore<MyEntityType>>(serviceProvider =>
        new TableDataStore<MyEntityType>(
            storeName: "store2",
            tableStorageConnectionString: storageConnStr2, 
            tableName: tableName, 
            createTableIfNotExist: true, 
            blobContainerName: containerName,
            createContainerIfNotExist: true));

...and in the constructor where you want to use them, inject an IEnumerable<ITableDataStore<MyEntityType>>. You can then use an extension method to choose the named instance:

public class MyClass
{
    private ITableDataStore<MyEntityType> _tds;

    public MyClass(IEnumerable<ITableDataStore<MyEntityType>> tableDataStores)
    {
        _tds = tableDataStores.NamedInstance("store2");
    }
}

Inserts

Task InsertAsync(BatchingMode batchingMode, params TData[] entities);

Inserts new entities into table storage (and their blobs into blob storage). Usage is very simple:

public class MyEntityType
{
    [TableRowKey]
    public string Id { get; set; }

    [TablePartitionKey]
    public string Group { get; set; }

    public string Text { get; set; }
}


var myEntity = new MyEntityType() 
{
    Id = "test1",
    Group = "test",
    Text = "abcdef"
};

await store.InsertAsync(BatchingMode.None, myEntity);

Now let's say we have 150 entities to insert. This is where batching becomes useful:

var myEntities = Enumerable.Range(0,150).Select(x => new MyEntityType() 
{
    Id = $"test{x}",
    Group = "test",
    Text = "abcdef"
}).ToArray();

await store.InsertAsync(BatchingMode.Strong, myEntities);

What happens here is that the Strong batching mode combines the entities into batches of at most 100 entities, and only one API call is made per batch (two batches in total), so this is significantly faster than making 150 individual API calls to table storage.

Let's say our entity has a LargeBlob property in it, meaning that each entity's blob will need to be uploaded as well:

public class MyEntityType
{
    [TableRowKey]
    public string Id { get; set; }

    [TablePartitionKey]
    public string Group { get; set; }

    public string Text { get; set; }

    public LargeBlob ImageFile { get; set; }
}

var myEntities = Enumerable.Range(0, 150).Select(x => new MyEntityType() 
{
    Id = $"test{x}",
    Group = "test",
    Text = "abcdef",
    ImageFile = new LargeBlob("default.png", new FileStream("files/default.png", ...), "image/png")
}).ToArray();

// We must use loose batching if we want to use any batching here.
await store.InsertAsync(BatchingMode.Loose, myEntities);

Here the table inserts are performed as batched operations, so again two table API calls are made, but we also have to make 150 blob storage API calls: one per LargeBlob per entity. Internally, each table API call is followed by the related blob API calls, and the blob calls are only made if the parent table operation succeeds. Some of the blob operations may still fail for one reason or another, and you'll need to decide what to do in those cases. For now, let's just assume everything went ok.

Replacing entities

So let's assume we have the previous 150 entities with blobs in the storage. Now we want to replace a few of those.

Task InsertOrReplaceAsync(BatchingMode batchingMode, params TData[] entities);

As the name implies we may insert or replace entities with this method. Contrary to the InsertAsync method, this one allows replacing existing entities. Replacing really works just like insert. It's worth noting that when you replace an entity, LargeBlob blobs will also get replaced. If you set a LargeBlob property to null, it effectively deletes the existing blob.

Again, when working with LargeBlob properties we have the options of BatchingMode.None or BatchingMode.Loose.

var myEntities = Enumerable.Range(0, 10).Select(x => new MyEntityType() 
{
    Id = $"test{x}",
    Group = "test",
    Text = $"value {x}",
    ImageFile = new LargeBlob($"entity.png", new FileStream($"files/entity{x}.png", ...), "image/png")
}).ToArray();

await store.InsertOrReplaceAsync(BatchingMode.Loose, myEntities);
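To illustrate the null-blob behavior mentioned above, a sketch (assuming "test0" exists from the earlier inserts):

```csharp
// Replacing an entity with its LargeBlob property set to null
// effectively deletes the existing blob of that entity.
var replacement = new MyEntityType()
{
    Id = "test0",
    Group = "test",
    Text = "no image anymore",
    ImageFile = null
};

await store.InsertOrReplaceAsync(BatchingMode.None, replacement);
```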

Merging (updating) entities

Entities can be updated either in full, or you can select only some properties to be updated. We can also use ETags to ensure we're updating the current version (optimistic concurrency). There are two overloads, one that takes the plain TData type entities and one that takes DataStoreEntity<TData> type entities. This DataStoreEntity wrapper class contains the ETag and Timestamp properties and that overload is to be used when you want to target specific ETags. Likewise the GetWithMetadataAsync, FindWithMetadataAsync, ListWithMetadataAsync and EnumerateWithMetadataAsync methods return entities wrapped into the DataStoreEntity<TData> class.

Task MergeAsync(BatchingMode batchingMode, Expression<Func<TData, object>> selectMergedPropertiesExpression, 
    LargeBlobNullBehavior largeBlobNullBehavior = LargeBlobNullBehavior.IgnoreProperty,
    params TData[] entities);

Task MergeAsync(BatchingMode batchingMode, Expression<Func<TData, object>> selectMergedPropertiesExpression, 
    LargeBlobNullBehavior largeBlobNullBehavior = LargeBlobNullBehavior.IgnoreProperty,
    params DataStoreEntity<TData>[] entities);

So to update the Text and ImageFile properties of some entities, using optimistic concurrency, we'd do:

// Returns IList<DataStoreEntity<MyEntityType>>
var existingEntities = await store.FindWithMetadataAsync(x => x.Id == "test1" || x.Id == "test2");

existingEntities[0].Value.Text = "updated";
existingEntities[0].Value.ImageFile = new LargeBlob("updated.png", new FileStream(...), "image/png");
existingEntities[1].Value.Text = "updated";
existingEntities[1].Value.ImageFile = null;

await store.MergeAsync(BatchingMode.Loose, entity => new { entity.Text, entity.ImageFile }, 
    LargeBlobNullBehavior.DeleteBlob /* one blob will get deleted */, existingEntities);

In the above example we updated the Text and ImageFile properties of two entities, and effectively deleted the blob of one entity, since we set the property to null and called MergeAsync with the LargeBlobNullBehavior.DeleteBlob behavior (with LargeBlobNullBehavior.IgnoreProperty the null value would have been ignored). We also used the ETags (transparently) to ensure we applied the update to those specific versions of the entities: had the entities been updated in the meantime, the operation would have failed.
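For comparison, a merge without ETag checks uses the plain TData overload shown above. A sketch (the keys used here are assumed to exist from the earlier inserts):

```csharp
// Merge only the Text property, without ETag checks (last write wins).
// Only the RowKey, PartitionKey and the selected properties matter here.
var entity = new MyEntityType() { Id = "test5", Group = "test", Text = "patched" };

await store.MergeAsync(BatchingMode.None, e => new { e.Text },
    LargeBlobNullBehavior.IgnoreProperty, entity);
```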

Select expressions

The Merge operations allow a select expression that chooses which properties of the entity are to be merged (akin to a patch operation). This expression can be left as null, in which case all properties are updated. If not, an expression returning a new anonymous object is expected. This is perhaps not the optimal way, but was selected on the basis of not needing to type property names as strings.

So a valid merge select expression will look like:

entity => new 
{
    entity.MyInt,
    entity.MyStringValue,
    entity.MySubObject.MySomeValue,
    entity.MySubObject.MyOtherValue,
    entity.MyLargeBlob
}

Note: if you have object properties (a class with properties), it is not enough to select just the parent object: the expression must contain the exact properties being selected. In the case above, entity.MySubObject alone is not enough and would not include entity.MySubObject.MySomeValue and entity.MySubObject.MyOtherValue.

Retrieving entities

Get

To get a single (or the first returned) entity that matches the query, use the GetAsync/GetWithMetadataAsync methods:

Task<TData> GetAsync(Expression<Func<TData, bool>> queryExpression, 
    Expression<Func<TData, object>> selectExpression = null);
Task<TData> GetAsync(Expression<Func<TData, DateTimeOffset, bool>> queryExpression, 
    Expression<Func<TData, object>> selectExpression = null);

Task<DataStoreEntity<TData>> GetWithMetadataAsync(
    Expression<Func<TData, bool>> queryExpression, 
    Expression<Func<TData, object>> selectExpression = null);
Task<DataStoreEntity<TData>> GetWithMetadataAsync(
    Expression<Func<TData, DateTimeOffset, bool>> queryExpression, 
    Expression<Func<TData, object>> selectExpression = null);

Examples:

// Get a single entity.
var test1 = await store.GetAsync(x => x.Id == "test1");

// Get a single entity with Timestamp and ETag.
var test1WithMetadata = await store.GetWithMetadataAsync(x => x.Id == "test1");

// Get a single entity whose Group is "test" and Timestamp is older than 2019-01-01
var someOldEntity = await store.GetAsync((x, timestamp) => x.Group == "test" && 
    timestamp < DateTimeOffset.Parse("2019-01-01T00:00:00.000Z"));

// Get a single entity, but retrieve only the selected properties (Text).
var test2 = await store.GetAsync(x => x.Id == "test2", x => new { x.Text });

Find

To find all (or a limited set of) matching entities, use the FindAsync and FindWithMetadataAsync methods:

Task<IList<TData>> FindAsync(
    Expression<Func<TData, bool>> queryExpression, 
    Expression<Func<TData, object>> selectExpression = null, 
    int? limit = null);
Task<IList<TData>> FindAsync(
    Expression<Func<TData, DateTimeOffset, bool>> queryExpression, 
    Expression<Func<TData, object>> selectExpression = null, 
    int? limit = null);

Task<IList<DataStoreEntity<TData>>> FindWithMetadataAsync(
    Expression<Func<TData, bool>> queryExpression, 
    Expression<Func<TData, object>> selectExpression = null, 
    int? limit = null);
Task<IList<DataStoreEntity<TData>>> FindWithMetadataAsync(
    Expression<Func<TData, DateTimeOffset, bool>> queryExpression, 
    Expression<Func<TData, object>> selectExpression = null, 
    int? limit = null);

Examples:

// Get 20 first entities with Group (PartitionKey) "test".
var test20 = await store.FindAsync(x => x.Group == "test", null, 20);

// Get all "old" entities, retrieve only the Text property, with ETags and Timestamps.
var oldTime = new DateTimeOffset(2019, 1, 1, 0, 0, 0, TimeSpan.Zero);
var oldOnes = await store.FindWithMetadataAsync((x, timestamp) => x.Group == "test" && 
    timestamp < oldTime, x => new { x.Text });

Query examples

Queries are always entered as LINQ expressions and support the basic operators ==, !=, >, >=, <, <= and !. These can be combined with parentheses to build more complex conditions. Additionally, the expressions allow a separate DateTimeOffset parameter which represents the entity's Timestamp.

The expressions are later converted into filter strings, and not every trick in the book has been covered. Most of the time there should be no issues, but do not attempt anything complex, including but not limited to:

  • Contains() operations: strings only support the ==, !=, >, >=, <, <= operators
  • Queries on collections: collections are not supported by Azure Tables and are therefore not searchable
  • Plain boolean expressions (entity => entity.IsAlive): these are not supported, at least for now, for technical reasons, while the ! operator (entity => !entity.IsAlive) is; use entity => entity.IsAlive == true instead

Strings can be compared in expressions with the >, >=, <, <= operators using the AsComparable() string extension, as shown in the examples below.

// Example entity type.
public class MyLittleEntity
{
    [TableRowKey]
    public string MyEntityId { get; set; }
    [TablePartitionKey]
    public string MyEntityPartitionKey { get; set; }

    public string MyStringValue { get; set; }
    public List<string> MyListOfStrings { get; set; }
    public DateTime MyDate { get; set; }
    public Guid MyGuid { get; set; }
    public long MyLong { get; set; }
    public int MyInt { get; set; }
    public bool MyBool { get; set; }
    public byte[] MyBytes { get; set; }
    public EntitySubClass MySubClass { get; set; }
    public Dictionary<string, EntitySubClass> MyDictionaryOfSubClasses { get; set; }
    public LargeBlob MyLargeBlob { get; set; }
}

public class EntitySubClass
{
    public string MyValue { get; set; }
}

// Example expressions:

await store.GetAsync(entity => entity.MyEntityId == "123" && entity.MyEntityPartitionKey == "items");

await store.FindAsync(entity => entity.MyLong > 5 && 
    (entity.MyInt == 0 || entity.MyDate > DateTime.UtcNow));

await store.FindAsync(entity => entity.MySubClass.MyValue == "Hello" || !entity.MyBool);

// Can use lt, lte, gt, gte operators with AsComparable()
await store.FindAsync(entity => entity.MyStringValue.AsComparable() > "Hello".AsComparable());

// DateTimeOffsets can be used.
var past = new DateTimeOffset(1999, 1, 1, 0, 0, 0, TimeSpan.Zero);
await store.FindAsync( (entity, timestamp) => entity.MyLong < 99 && timestamp < past );



// Things that may seem valid but will not work:

// Invalid: collections are not supported in queries and are serialized into strings in table rows
await store.FindAsync(entity => entity.MyDictionaryOfSubClasses["one"].MyValue == "foo");

// Invalid: table storage does not support this
await store.FindAsync(entity => entity.MyStringValue.Contains("a"));

// Invalid: no operator present, unable to interpret, use entity.MyBool == true
await store.FindAsync(entity => entity.MyBool);

// Invalid: LargeBlobs are not stored in separate properties, instead they're serialized
// into a single JSON string, and therefore cannot be used in queries.
await store.FindAsync(entity => entity.MyLargeBlob.Filename == "image.jpg");

List

To list all entities, or a limited number of them, use the ListAsync and ListWithMetadataAsync methods:

Task<IList<TData>> ListAsync(
    Expression<Func<TData, object>> selectExpression = null, 
    int? limit = null);
Task<IList<DataStoreEntity<TData>>> ListWithMetadataAsync(
    Expression<Func<TData, object>> selectExpression = null, 
    int? limit = null);

Example:

// Gets ALL the entities in the table.
var entities = await store.ListAsync();

Enumerate

Lastly, if you wish to enumerate through entities page by page, the EnumerateWithMetadataAsync methods allow you to do that:

Task EnumerateWithMetadataAsync(Expression<Func<TData, bool>> queryExpression, 
    int entitiesPerPage, EnumeratorFunc<DataStoreEntity<TData>> enumeratorFunc, 
    TableContinuationToken continuationToken = null);
Task EnumerateWithMetadataAsync(Expression<Func<TData, DateTimeOffset, bool>> queryExpression, 
    int entitiesPerPage, EnumeratorFunc<DataStoreEntity<TData>> enumeratorFunc, 
    TableContinuationToken continuationToken = null);

EnumeratorFunc<DataStoreEntity<TData>> is a delegate that gets called once per page:

public delegate Task<bool> EnumeratorFunc<T>(List<T> entities, TableContinuationToken continuationToken) 
    where T:new();

If the function returns false, the enumeration will stop. If it returns true, the enumeration will continue to the next page.

Examples:

async Task<bool> MyEnumerator(List<DataStoreEntity<MyEntityType>> entities, 
    TableContinuationToken continuationToken)
{
    foreach(var entity in entities) 
    {
        // Do something
    }
    // Return true to continue enumeration.
    // We could for example collect entities from one page of results to display on a web page.
    // In that case we'd return false, save the TableContinuationToken, and then to get the next page
    // we could call the EnumerateWithMetadataAsync method using that continuation token.
    return true;
}

await store.EnumerateWithMetadataAsync(x => x.Group == "test", 100, MyEnumerator);

Enumeration is useful when you want to do things with a large set of results without actually retrieving them all at once. It can also be used to create paged result sets.
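The paging pattern described in the example's comments can be sketched roughly like this (the variable names are illustrative, not part of the library):

```csharp
// Fetch one page of 25 results, stop the enumeration, and keep the
// continuation token around for the next request.
var page = new List<DataStoreEntity<MyEntityType>>();
TableContinuationToken lastToken = null;

await store.EnumerateWithMetadataAsync(x => x.Group == "test", 25,
    (entities, continuationToken) =>
    {
        page.AddRange(entities);
        lastToken = continuationToken;
        // Returning false stops the enumeration after this page.
        return Task.FromResult(false);
    });

// Later, to get the next page, pass the saved token back in.
await store.EnumerateWithMetadataAsync(x => x.Group == "test", 25,
    (entities, continuationToken) =>
    {
        page.AddRange(entities);
        return Task.FromResult(false);
    },
    lastToken);
```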

Counting rows

Since Azure Table Storage does not really provide a simple method to count rows, a workaround has been provided:

Task<long> CountRowsAsync();
Task<long> CountRowsAsync(Expression<Func<TData, bool>> queryExpression);

These methods simply count rows by retrieving results. For large tables this can take a long time, since each result set can contain a maximum of 1000 entities, so multiple requests have to be made.
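Usage is straightforward, following the signatures above:

```csharp
// Count every row in the table (may make multiple requests for large tables).
long totalRows = await store.CountRowsAsync();

// Count only the rows that match a query.
long testRows = await store.CountRowsAsync(x => x.Group == "test");
```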

Deleting entities

Deleting entities is fairly straightforward. However, once again, if your entities contain LargeBlob properties, keep in mind that a deletion requires at least two operations: one to table storage and one to blob storage. As with other requests, batching is possible and the same limitations with LargeBlob properties apply.

Task DeleteAsync(BatchingMode batchingMode, params TData[] entities);
Task DeleteAsync(BatchingMode batchingMode, params (string partitionKey, string rowKey)[] entityIds);
Task DeleteAsync(BatchingMode batchingMode, Expression<Func<TData, bool>> queryExpression);

Deletion can be done by supplying existing entities to the DeleteAsync method. Notably, the only thing that really matters is that the RowKey and PartitionKey properties of the entities are set. Alternatively, you can call the method with (PartitionKey, RowKey) tuples. It's also possible to use a query to select the entities to delete; this convenience overload first performs a query to retrieve the entities and then deletes them.

Examples:

// Delete with known partition and row keys
await store.DeleteAsync(BatchingMode.None, ("test", "test1"), ("test", "test2"));

// Grab some entities and call delete on them
var someEntities = await store.FindAsync(x => x.Group == "test");
await store.DeleteAsync(BatchingMode.Loose, someEntities);

// Find and delete - effectively doing exactly the same thing as above.
await store.DeleteAsync(BatchingMode.Loose, x => x.Group == "test");

Deleting table and blob container

The entire table and the related blob container can be deleted with a single call. This may be useful if you're using tables for temporary storage.

await store.DeleteTableAsync(deleteBlobContainer: true);