Each item in a DynamoDB table has a maximum size limit of 400 KB, including both the attribute names and values. This limit applies to all data types: strings, numbers, and binary data.
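Before writing, it can help to sanity-check an item's size on the client. The sketch below approximates DynamoDB's accounting (UTF-8 bytes of attribute names plus values) for string, binary, and numeric attributes only; `estimate_item_size` is a hypothetical helper, not part of boto3, and the service's exact byte counting may differ slightly.

```python
# Rough client-side estimate of a DynamoDB item's size: attribute names
# plus values, with strings measured as UTF-8 bytes. An approximation
# for sanity checks, not the service's exact accounting.
DYNAMODB_MAX_ITEM_BYTES = 400 * 1024


def estimate_item_size(item: dict) -> int:
    size = 0
    for name, value in item.items():
        size += len(name.encode("utf-8"))
        if isinstance(value, str):
            size += len(value.encode("utf-8"))
        elif isinstance(value, (bytes, bytearray)):
            size += len(value)
        elif isinstance(value, (int, float)):
            size += len(str(value))  # crude approximation for numbers
        else:
            raise TypeError(f"unhandled attribute type: {type(value)}")
    return size


item = {"pk": "lorem", "data": "x" * 500_000}
print(estimate_item_size(item) > DYNAMODB_MAX_ITEM_BYTES)  # prints True
```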
Here are the three best ways to work around this size limit:
Partition the data

Split the data into chunks that fit under the limit, store each chunk as its own item under a shared partition key, and reassemble the chunks on read.

import boto3


def partition_data(data, size):
    """Split data into chunks of at most `size` characters."""
    return [data[i:i + size] for i in range(0, len(data), size)]


# 100 paragraphs of Lorem ipsum
lorem = """
Lorem ipsum dolor sit amet, consectetur adipiscing elit,
sed do eiusmod tempor incididunt ut labore et dolore magna
aliqua. Sem integer vitae justo eget magna. At tellus at
urna condimentum mattis pellentesque id. Habitasse...
"""

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("lorem")
partition_key = "lorem"
sort_key_prefix = "p#"  # p for partition

# Write chunks to DynamoDB. The chunk size is in characters, so keep it
# conservatively below 400 KB for multi-byte text. Zero-pad the chunk
# index so the sort key orders correctly past ten chunks.
chunks = partition_data(lorem, 5000)
for i, c in enumerate(chunks):
    table.put_item(
        Item={"pk": partition_key, "sk": f"{sort_key_prefix}{i:05d}", "data": c}
    )

# Read chunks from DynamoDB
query_kwargs = {
    "KeyConditionExpression": "pk = :pk and begins_with(sk, :sk)",
    "ExpressionAttributeValues": {":pk": partition_key, ":sk": sort_key_prefix},
    "ScanIndexForward": True,
}
response = table.query(**query_kwargs)

# Query for all paginated results if applicable.
items = response["Items"]
while "LastEvaluatedKey" in response:
    response = table.query(**query_kwargs, ExclusiveStartKey=response["LastEvaluatedKey"])
    items.extend(response["Items"])

# Concatenate the data field from all the items
lorem_from_dynamodb = "".join(i["data"] for i in items)
print(lorem == lorem_from_dynamodb)  # prints True
Compress the data
Reduce the size of your data with compression. Algorithms such as gzip (DEFLATE, available in Python via zlib) can shrink text data significantly, often enough to fit an oversized payload under the limit.
import zlib

import boto3


def compress_data(data):
    """Compress a string to bytes with zlib."""
    return zlib.compress(data.encode())


# 100 paragraphs of Lorem ipsum
lorem = """
Lorem ipsum dolor sit amet, consectetur adipiscing elit,
sed do eiusmod tempor incididunt ut labore et dolore magna
aliqua. Sem integer vitae justo eget magna. At tellus at
urna condimentum mattis pellentesque id. Habitasse...
"""

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("lorem")
partition_key = "lorem"

table.put_item(Item={"pk": partition_key, "data": compress_data(lorem)})

response = table.get_item(Key={"pk": partition_key})
data = response["Item"]["data"]
lorem_from_dynamodb = zlib.decompress(bytes(data)).decode()
print(lorem_from_dynamodb == lorem)  # prints True
For large payloads, you can combine both techniques: partition the data and compress each chunk before writing it.

import zlib

import boto3


def partition_data(data, size):
    """Split data into chunks of at most `size` characters."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def compress_data(data):
    """Compress a string to bytes with zlib."""
    return zlib.compress(data.encode())


# 100 paragraphs of Lorem ipsum
lorem = """
Lorem ipsum dolor sit amet, consectetur adipiscing elit,
sed do eiusmod tempor incididunt ut labore et dolore magna
aliqua. Sem integer vitae justo eget magna. At tellus at
urna condimentum mattis pellentesque id. Habitasse...
"""

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("lorem")
partition_key = "lorem"
sort_key_prefix = "p#"  # p for partition

# Write compressed chunks to DynamoDB. Zero-pad the chunk index so the
# sort key orders correctly past ten chunks.
chunks = partition_data(lorem, 50000)
for i, c in enumerate(chunks):
    table.put_item(
        Item={
            "pk": partition_key,
            "sk": f"{sort_key_prefix}{i:05d}",
            "data": compress_data(c),
        }
    )

# Read chunks from DynamoDB
query_kwargs = {
    "KeyConditionExpression": "pk = :pk and begins_with(sk, :sk)",
    "ExpressionAttributeValues": {":pk": partition_key, ":sk": sort_key_prefix},
    "ScanIndexForward": True,
}
response = table.query(**query_kwargs)

# Query for all paginated results if applicable.
items = response["Items"]
while "LastEvaluatedKey" in response:
    response = table.query(**query_kwargs, ExclusiveStartKey=response["LastEvaluatedKey"])
    items.extend(response["Items"])

# Decompress and concatenate the data field from all the items
lorem_from_dynamodb = "".join(
    zlib.decompress(bytes(i["data"])).decode() for i in items
)
print(lorem_from_dynamodb == lorem)  # prints True
Store the data in S3
Consider storing the data itself in S3, and keep only a reference to the object as an attribute value in DynamoDB.
import boto3

# 100 paragraphs of Lorem ipsum
lorem = """
Lorem ipsum dolor sit amet, consectetur adipiscing elit,
sed do eiusmod tempor incididunt ut labore et dolore magna
aliqua. Sem integer vitae justo eget magna. At tellus at
urna condimentum mattis pellentesque id. Habitasse...
"""

bucket_name = "bucket_name"
object_key = "object_key"
partition_key = "lorem"
s3_key = f"s3://{bucket_name}/{object_key}"

# Store data in S3 object
s3 = boto3.client("s3")
s3.put_object(Bucket=bucket_name, Key=object_key, Body=lorem.encode())

# Store reference to S3 object in DynamoDB
dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("lorem")
table.put_item(Item={"pk": partition_key, "s3_key": s3_key})

# Get reference to S3 object from DynamoDB
response = table.get_item(Key={"pk": partition_key})
s3_key = response["Item"]["s3_key"]

# Read contents of S3 object. Strip the "s3://" prefix, then split the
# bucket from the key on the first "/" only, so keys containing slashes
# stay intact.
bucket, key = s3_key[5:].split("/", 1)
response = s3.get_object(Bucket=bucket, Key=key)
lorem_from_s3 = response["Body"].read().decode()
print(lorem_from_s3 == lorem)  # prints True