Azure Table Storage (ATS) is Microsoft’s cloud-based offering for massively scalable, non-relational data storage. First introduced in 2012, its popularity has grown rapidly thanks to its promise of cheap data storage, lightning-fast performance and instant scalability.
However, it’s worth noting that while performance is top-tier, there are still things that you as a developer need to be aware of in order to keep your data flowing as quickly as possible.
Before going through these points it’s worth first covering the main performance aspects to consider when deciding on an ATS implementation.
Unlike a traditional relational database, entities in ATS are indexed by only two fields: the PartitionKey and the RowKey. Each of those properties is a string which can hold up to 1 KiB of data.
The PartitionKey (PK) identifies the partition in which an entity is stored. This is the key to the scalability of ATS, as your partitions will be spread out across multiple servers in the Azure cloud infrastructure. Microsoft’s published scalability target for a single partition is up to 2,000 entities per second (for 1 KiB entities), although actual throughput can go up or down depending on server load. This is worth bearing in mind when dealing with areas of your data which will be accessed very often.
The RowKey (RK) uniquely identifies an entity within each partition. The PK and RK together form the primary key of the entity stored in ATS.
The approach you take to creating your PK determines what kind of performance and scalability you will see from ATS. For example, if you have a few partitions, each holding a large number of entities, you will limit the throughput and scaling possibilities of each partition, but you will be able to use Entity Group Transactions across more of your entities, since these only work on entities that share a partition. Conversely, if you have lots of partitions with few entities in each, the table will be massively scalable, but requesting a large range of data from it may require multiple server calls.
As such, the way in which PKs are generated is vital to how your data will be accessed and needs to be considered on a case-by-case basis.
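As a rough illustration of key design, the sketch below derives a date-based PK and a sequence-based RK for a transaction entity. The `EntityKey` type, the `transaction_key` helper and the choice of one partition per day are assumptions made for this example only, not a recommended scheme for every workload.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class EntityKey:
    """The two indexed fields of an ATS entity."""
    partition_key: str
    row_key: str


def transaction_key(occurred_at: datetime, sequence: int) -> EntityKey:
    """Build keys for a hypothetical transaction entity.

    Assumption: one partition per calendar day, and a zero-padded
    sequence number that is unique within that day.
    """
    partition_key = occurred_at.strftime("%Y%m%d")
    row_key = f"transaction_{sequence:05d}"
    return EntityKey(partition_key, row_key)


key = transaction_key(datetime(2018, 7, 17, 9, 30), 897)
print(key.partition_key, key.row_key)  # 20180717 transaction_00897
```

Switching the format string to include the hour would produce more, smaller partitions, which is the trade-off described above.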
Now that your data is held in ATS and you’ve got a plan for structuring your partitions, you can retrieve your data. The fields you are querying in the table determine what type of query will be made to the table store. There are four types of query: Point, Row scan, Partition scan and Table scan.
The most performant of these is the Point query, which provides both the exact PK and RK. Because the two together form the primary key of the entity, ATS can use the clustered index to retrieve it directly.
A Row scan query is where you provide the PartitionKey only and are looking for entities solely within a single partition. Given that this type of query does not cross partition boundaries, its performance is excellent.
A Partition scan query provides multiple PK values. As such it requires ATS to scan through multiple partitions which may reside on multiple servers. This type of query should be avoided where possible.
By far the least performant query is the Table scan. This type of query requires ATS to scan every entity in the table across all partitions, which, as mentioned above, can be on different servers. It may therefore require multiple calls to retrieve the data, as well as comparisons against potentially millions of records. As you can imagine, this type of query does not scale and needs to be avoided at all costs.
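The four query types above can be summarised as a small decision function. This is only a sketch using this article’s terminology; `classify_query` and its parameters are hypothetical names, and it looks solely at which key fields a filter supplies.

```python
from typing import Sequence


def classify_query(partition_keys: Sequence[str], has_row_key: bool) -> str:
    """Name the query type implied by the key fields present in a filter.

    partition_keys: the PartitionKey values the filter tests for equality.
    has_row_key:    True if the filter also pins an exact RowKey.
    """
    distinct = set(partition_keys)
    if not distinct:
        # No PartitionKey at all: every partition has to be scanned.
        return "table scan"
    if len(distinct) > 1:
        # Several partitions, possibly on different servers.
        return "partition scan"
    # Exactly one partition from here on.
    return "point" if has_row_key else "row scan"


print(classify_query(["20180717"], True))               # point
print(classify_query(["20180717"], False))              # row scan
print(classify_query(["20180717", "20180716"], True))   # partition scan
print(classify_query([], False))                        # table scan
```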
Use point queries wherever possible
Structure your entities so that commonly accessed, critical information is retrievable from a point query:
PartitionKey eq '20180717' and RowKey eq 'transaction_00897'
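A point query filter like the one above can be assembled with a small helper. The sketch below is plain string building; `odata_literal` and `point_filter` are hypothetical names. It doubles any single quote inside a value, which is how OData string literals escape quotes.

```python
def odata_literal(value: str) -> str:
    """Quote a string for use in an OData filter, doubling embedded quotes."""
    return "'" + value.replace("'", "''") + "'"


def point_filter(partition_key: str, row_key: str) -> str:
    """Build a filter that pins both keys, i.e. a point query."""
    return (
        f"PartitionKey eq {odata_literal(partition_key)} "
        f"and RowKey eq {odata_literal(row_key)}"
    )


print(point_filter("20180717", "transaction_00897"))
# PartitionKey eq '20180717' and RowKey eq 'transaction_00897'
```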
Use row scan queries to retrieve collections of data
Where you need to retrieve a collection of entities, use a single PartitionKey to group them so that the query stays within one partition:
PartitionKey eq '20180717' and UserId eq 'usr_00198930'
Don’t cross partitions when querying
Including multiple PartitionKeys in the query will result in a Partition scan query which should be avoided:
(PartitionKey eq '20180717' or PartitionKey eq '20180716') and RowKey eq 'usr_00198930'
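One possible alternative, assuming the partitions of interest are known up front and few in number, is to issue a separate point query per partition rather than a single filter that spans them. The sketch below only generates the filter strings; `quote` and `point_filters` are hypothetical helpers, and whether several small queries beat one cross-partition query should be measured for your own data.

```python
from typing import Iterable, List


def quote(value: str) -> str:
    """Quote a string for an OData filter, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def point_filters(partition_keys: Iterable[str], row_key: str) -> List[str]:
    """Return one point query filter per partition instead of one 'or' filter."""
    return [
        f"PartitionKey eq {quote(pk)} and RowKey eq {quote(row_key)}"
        for pk in partition_keys
    ]


for query_filter in point_filters(["20180717", "20180716"], "usr_00198930"):
    print(query_filter)
# PartitionKey eq '20180717' and RowKey eq 'usr_00198930'
# PartitionKey eq '20180716' and RowKey eq 'usr_00198930'
```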
Never use table scan queries
They cross partitions which may be on different servers and offer the worst performance possible in ATS:
Name eq 'John Doe' and Age ge 18
Use the appropriate PartitionKey granularity for your use case
The most performant method of structuring your partitions will depend on how much data you want to store in a partition and how often it’s accessed. It’s best to plan and test your intended structure to ensure its performance and scalability once it’s populated with data.
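Before loading real data, a quick way to compare candidate granularities is to count how many entities each scheme would place in a partition. The sketch below does this for made-up sample timestamps, with day-level and hour-level keys as the two candidates; `partition_sizes` is a hypothetical helper, and the counts say nothing about access frequency, which still needs testing against the live service.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import Iterable


def partition_sizes(timestamps: Iterable[datetime], key_format: str) -> Counter:
    """Count entities per partition for a strftime-style PartitionKey format."""
    return Counter(ts.strftime(key_format) for ts in timestamps)


# Made-up sample: one entity every 10 minutes over two full days.
start = datetime(2018, 7, 17)
sample = [start + timedelta(minutes=10 * i) for i in range(288)]

daily = partition_sizes(sample, "%Y%m%d")
hourly = partition_sizes(sample, "%Y%m%d%H")

print(len(daily), max(daily.values()))    # 2 144
print(len(hourly), max(hourly.values()))  # 48 6
```

Day-level keys give two large partitions here, while hour-level keys give 48 small ones, so a query for a whole day would touch 24 partitions under the second scheme.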