how.wtf

Implement version control in DynamoDB

Thomas Taylor

Amazon DynamoDB is a fully managed NoSQL database service provided by AWS that enables developers to quickly store data for their applications. In this article, I will show how to implement version control in DynamoDB to record changes to data over time.

What is version control in DynamoDB

DynamoDB does not natively support version control on a per-item basis. If you need to record changes to your data over time, the application must handle it. Luckily, there is a paradigm that supports storing multiple versions of the same data: duplication.

How to implement versioning in DynamoDB

We have a few options for storing versioned data in DynamoDB. For the purposes of this tutorial, we will use a single table design: i.e., using a partition key and a sort key to manage multiple data types.

Creating a table with versioning in DynamoDB

For the remaining sections of this tutorial, we’ll leverage the same single table.

Create a table

To begin, let’s define a table using the AWS CLI:

aws dynamodb create-table \
    --table-name table \
    --attribute-definitions \
        AttributeName=PK,AttributeType=S \
        AttributeName=SK,AttributeType=S \
    --key-schema \
        AttributeName=PK,KeyType=HASH \
        AttributeName=SK,KeyType=RANGE \
    --provisioned-throughput \
        ReadCapacityUnits=5,WriteCapacityUnits=5

I specified a partition key named PK and a sort key named SK. Together they form the table's primary key.

Time versioning

Time-based versioning may be useful for applications that need to store the state of data at particular points in time.

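DynamoDB key structure for time-based versioning: PK holds the object identifier (for example, file#1), and SK holds the ISO 8601 timestamp of the version.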

Inserting example data

To showcase the power of DynamoDB, let’s insert some values for a file object with an identifier of 1.

aws dynamodb put-item \
    --table-name table \
    --item '{"PK":{"S":"file#1"},"SK":{"S":"2024-01-13T11:25:27-05:00"}}'

and another:

aws dynamodb put-item \
    --table-name table \
    --item '{"PK":{"S":"file#1"},"SK":{"S":"2024-01-13T11:32:13-05:00"}}'

and one more:

aws dynamodb put-item \
    --table-name table \
    --item '{"PK":{"S":"file#1"},"SK":{"S":"2024-01-13T11:37:58-05:00"}}'
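The timestamps above were typed by hand. As a minimal sketch, the item could also be generated by a small helper. The function names build_time_item and now_utc are hypothetical, and the choice of UTC is my assumption (the examples above use a -05:00 offset); whichever offset you pick, every sort key should use the same one so the keys stay sortable.

```shell
# Hypothetical helper: build the DynamoDB item JSON for one time-based version.
# $1 = file identifier, $2 = ISO 8601 timestamp used as the sort key.
build_time_item() {
    printf '{"PK":{"S":"file#%s"},"SK":{"S":"%s"}}\n' "$1" "$2"
}

# Assumption: timestamps are generated in UTC so every sort key shares one offset.
now_utc() {
    date -u +"%Y-%m-%dT%H:%M:%SZ"
}

# Print the item that would be written for file 1 at the current time.
build_time_item 1 "$(now_utc)"

# To write it, pass the JSON to the same put-item command used above:
# aws dynamodb put-item --table-name table --item "$(build_time_item 1 "$(now_utc)")"
```

The helper only prints JSON, so it can be checked locally before anything is written to the table.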

Query for latest versions

Because ISO 8601 timestamps that share the same format and UTC offset sort lexicographically in chronological order, we can ask DynamoDB for the following:

  1. Grab the latest 100 versions
  2. Grab all the versions in a specific interval (year, month, day, etc.)

Grab the latest 100 versions, using --max-items to cap the number of items the AWS CLI returns:

aws dynamodb query \
    --table-name table \
    --key-condition-expression "PK=:pk" \
    --expression-attribute-values '{":pk":{"S":"file#1"}}' \
    --no-scan-index-forward \
    --max-items 100

The --no-scan-index-forward flag returns the records in descending sort key order (newest first) rather than the default ascending order.

Output:

{
    "Items": [
        {
            "PK": {
                "S": "file#1"
            },
            "SK": {
                "S": "2024-01-13T11:37:58-05:00"
            }
        },
        {
            "PK": {
                "S": "file#1"
            },
            "SK": {
                "S": "2024-01-13T11:32:13-05:00"
            }
        },
        {
            "PK": {
                "S": "file#1"
            },
            "SK": {
                "S": "2024-01-13T11:25:27-05:00"
            }
        }
    ],
    "Count": 3,
    "ScannedCount": 3,
    "ConsumedCapacity": null
}
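To see why the descending query returns the newest version first, here is a minimal sketch that sorts the same three sort keys locally. DynamoDB orders string sort keys by their UTF-8 bytes, which LC_ALL=C sort mimics for these ASCII values. The newest_first function name is only for illustration.

```shell
# The three sort keys inserted above, in insertion order.
timestamps='2024-01-13T11:25:27-05:00
2024-01-13T11:32:13-05:00
2024-01-13T11:37:58-05:00'

# Sort as plain bytes (LC_ALL=C) in reverse, so the newest key comes first.
newest_first() {
    printf '%s\n' "$timestamps" | LC_ALL=C sort -r
}

newest_first
```

The printed order matches the order of the Items array in the query output. This only holds while every key uses the same format and offset; mixing offsets can make byte order disagree with chronological order.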

Grab all versions by a specific interval:

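Using begins_with or BETWEEN in the key condition expression, we can query for specific dates.

In the example below, I query for every version whose sort key starts with 2024-01-13T11:3, that is, versions recorded from 11:30 through 11:39 on January 13, 2024: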

aws dynamodb query \
    --table-name table \
    --key-condition-expression "PK=:pk and begins_with(SK, :sk)" \
    --expression-attribute-values '{":pk":{"S":"file#1"},":sk":{"S":"2024-01-13T11:3"}}' \
    --no-scan-index-forward

Output:

{
    "Items": [
        {
            "PK": {
                "S": "file#1"
            },
            "SK": {
                "S": "2024-01-13T11:37:58-05:00"
            }
        },
        {
            "PK": {
                "S": "file#1"
            },
            "SK": {
                "S": "2024-01-13T11:32:13-05:00"
            }
        }
    ],
    "Count": 2,
    "ScannedCount": 2,
    "ConsumedCapacity": null
}
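The same begins_with query works for coarser intervals by shortening the prefix. A minimal sketch of deriving that prefix from a timestamp follows; interval_prefix and interval_values are hypothetical helper names, and the prefix lengths assume the ISO 8601 layout used in the examples above.

```shell
# Hypothetical helper: cut an ISO 8601 timestamp down to a begins_with prefix.
# $1 = interval (year, month, day, hour), $2 = timestamp.
interval_prefix() {
    case "$1" in
        year)  length=4 ;;   # e.g. 2024
        month) length=7 ;;   # e.g. 2024-01
        day)   length=10 ;;  # e.g. 2024-01-13
        hour)  length=13 ;;  # e.g. 2024-01-13T11
        *) echo "unknown interval: $1" >&2; return 1 ;;
    esac
    printf '%s\n' "$2" | cut -c "1-$length"
}

# Build the --expression-attribute-values JSON for the begins_with query.
# $1 = file identifier, $2 = interval, $3 = timestamp inside that interval.
interval_values() {
    printf '{":pk":{"S":"file#%s"},":sk":{"S":"%s"}}\n' "$1" "$(interval_prefix "$2" "$3")"
}

interval_values 1 day 2024-01-13T11:37:58-05:00
```

The printed JSON can be passed to --expression-attribute-values in the query shown earlier. The prefix is taken from the timestamp as stored, so a "day" here means a day in the offset the sort keys were written with.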

Number versioning

For applications that want to maintain a “latest” version with the ability to roll back to a prior version, a number-based versioning paradigm works well.

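DynamoDB key structure for number-based versioning: PK holds the object identifier (for example, file#2), and SK is either metadata for the currently selected version or version#<n> for a stored version.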

Inserting example data

Let’s insert items for a file object with an identifier of 2 and two different versions.

The first item is the metadata item for file#2. It holds the attributes of the currently selected version, so the application can fetch the latest values with a single read.

aws dynamodb put-item \
    --table-name table \
    --item '{"PK":{"S":"file#2"},"SK":{"S":"metadata"},"version":{"S":"2"},"foo":{"S":"baz"}}'

The second item will contain version 1’s information.

aws dynamodb put-item \
    --table-name table \
    --item '{"PK":{"S":"file#2"},"SK":{"S":"version#1"},"version":{"S":"1"},"foo":{"S":"bar"}}'

The third item will contain version 2’s information.

aws dynamodb put-item \
    --table-name table \
    --item '{"PK":{"S":"file#2"},"SK":{"S":"version#2"},"version":{"S":"2"},"foo":{"S":"baz"}}'

For this method, we duplicate the attributes and values of version#2 onto the metadata item, so a single read of the metadata item returns the latest version.
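Because the version item and the metadata item are written by separate put-item calls, a failure between the two could leave them out of sync. One option, which is my suggestion rather than part of the steps above, is to write both in one transaction with aws dynamodb transact-write-items. The sketch below only builds the request JSON; new_version_transaction is a hypothetical helper, and version 3 with the value qux is made-up example data.

```shell
# Hypothetical helper: emit a TransactItems document that writes a version
# item and the matching metadata item together.
# $1 = file identifier, $2 = version number, $3 = value of the "foo" attribute.
# Note: values are inserted as-is, so they must not contain JSON special characters.
new_version_transaction() {
    attrs=$(printf '"version":{"S":"%s"},"foo":{"S":"%s"}' "$2" "$3")
    pk=$(printf '"PK":{"S":"file#%s"}' "$1")
    printf '[{"Put":{"TableName":"table","Item":{%s,"SK":{"S":"version#%s"},%s}}},' "$pk" "$2" "$attrs"
    printf '{"Put":{"TableName":"table","Item":{%s,"SK":{"S":"metadata"},%s}}}]\n' "$pk" "$attrs"
}

new_version_transaction 2 3 qux

# To apply it:
# aws dynamodb transact-write-items --transact-items "$(new_version_transaction 2 3 qux)"
```

Both Put operations succeed or fail together, which keeps the metadata item consistent with the newest version item.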

Query for latest versions

Let’s query for all versions:

aws dynamodb query \
    --table-name table \
    --key-condition-expression "PK=:pk and begins_with(SK, :sk)" \
    --expression-attribute-values '{":pk":{"S":"file#2"},":sk":{"S":"version#"}}' \
    --no-scan-index-forward

Output:

{
    "Items": [
        {
            "version": {
                "S": "2"
            },
            "SK": {
                "S": "version#2"
            },
            "PK": {
                "S": "file#2"
            },
            "foo": {
                "S": "baz"
            }
        },
        {
            "version": {
                "S": "1"
            },
            "SK": {
                "S": "version#1"
            },
            "PK": {
                "S": "file#2"
            },
            "foo": {
                "S": "bar"
            }
        }
    ],
    "Count": 2,
    "ScannedCount": 2,
    "ConsumedCapacity": null
}
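One caveat with string sort keys such as version#1 and version#2: they sort byte by byte, so version#10 lands before version#2. A minimal sketch of one way around this is to zero-pad the number. The version_sk helper and the width of 6 digits are my assumptions, not part of the key structure used above.

```shell
# Hypothetical helper: zero-pad the version number so string order matches
# numeric order. The width of 6 digits is an assumption for illustration.
version_sk() {
    printf 'version#%06d\n' "$1"
}

# Unpadded keys: byte order puts version#10 before version#2.
printf 'version#%s\n' 1 2 10 | LC_ALL=C sort

# Padded keys: byte order matches numeric order.
for n in 1 2 10; do version_sk "$n"; done | LC_ALL=C sort
```

With padded keys, the descending query above keeps returning the highest version first once a file has ten or more versions.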

The user decides they want version#1 to be the selected version for file#2. To satisfy the request, perform the following steps:

  1. Modify the metadata item’s version attribute to 1
  2. Duplicate the version’s attributes onto the metadata item

A single put-item call that overwrites the metadata item does both:

aws dynamodb put-item \
    --table-name table \
    --item '{"PK":{"S":"file#2"},"SK":{"S":"metadata"},"version":{"S":"1"},"foo":{"S":"bar"}}'

The next time we fetch the latest version, it will point to version 1:

aws dynamodb get-item \
    --table-name table \
    --key '{"PK":{"S":"file#2"},"SK":{"S":"metadata"}}'

Output:

{
    "Item": {
        "version": {
            "S": "1"
        },
        "SK": {
            "S": "metadata"
        },
        "PK": {
            "S": "file#2"
        },
        "foo": {
            "S": "bar"
        }
    }
}
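The rollback can also be derived from the stored version item instead of retyping its attributes. A minimal sketch, assuming the compact JSON layout used in the put-item commands in this article; to_metadata_item is a hypothetical helper name.

```shell
# Hypothetical helper: turn a version item into the metadata item by swapping
# the sort key. Assumes the compact JSON layout used in the put-item commands.
to_metadata_item() {
    printf '%s\n' "$1" | sed 's/"SK":{"S":"version#[0-9]*"}/"SK":{"S":"metadata"}/'
}

# The version#1 item exactly as it was inserted earlier.
version_1='{"PK":{"S":"file#2"},"SK":{"S":"version#1"},"version":{"S":"1"},"foo":{"S":"bar"}}'

to_metadata_item "$version_1"

# To apply the rollback:
# aws dynamodb put-item --table-name table --item "$(to_metadata_item "$version_1")"
```

The printed item is the same JSON passed to put-item in the rollback step, so every attribute of the chosen version is carried over to the metadata item.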
