The DynamoDB documentation describes the query operation as follows:
A query operation searches only primary key attribute values and supports a subset of comparison operators on key attribute values to refine the search process.
and the scan operation:
A scan operation scans the entire table. You can specify filters to apply to the results to refine the values returned to you, after the complete scan.
A query operation uses the joint hash and range index and is available only for tables with a composite primary key. A scan, on the other hand, reads every item in the table and only then filters the results.
Assume that we have a Users table and a Posts table which look like:
Get first K versions of a post:
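As a minimal sketch of this query, assume the Posts table has a composite primary key made of a hypothetical `post_id` string attribute (hash key) and a numeric `version` attribute (range key); neither name is taken from the tables above. The function only builds the request parameters. With boto3 they could be passed to a low-level client as `client.query(**params)`.

```python
def first_k_versions_params(post_id, k, table_name="Posts"):
    """Build Query parameters for the first k versions of one post.

    Assumes a composite primary key: hash key `post_id` (string) and
    range key `version` (number). Both attribute names are hypothetical.
    """
    if k < 1:
        raise ValueError("k must be a positive integer")
    return {
        "TableName": table_name,
        # Only the hash key is constrained, so every version of the post matches.
        "KeyConditionExpression": "post_id = :pid",
        "ExpressionAttributeValues": {":pid": {"S": post_id}},
        # Results are ordered by the range key; True means ascending order.
        "ScanIndexForward": True,
        # Stop after k items have been evaluated.
        "Limit": k,
    }


params = first_k_versions_params("post-42", 3)
print(params["KeyConditionExpression"], params["Limit"])
```

Because the range key already orders the versions, the query reads only the k items it returns rather than the whole table.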
Get all users in a specific city:
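If the city is not part of the Users table's primary key, this lookup has to be a scan with a filter. The sketch below assumes a hypothetical `city` string attribute and takes the page-fetching function as an argument so that it runs without AWS access. With boto3, `scan_page` could be `client.scan`. It follows `LastEvaluatedKey` because a single scan call evaluates only a limited amount of data before returning.

```python
def users_in_city(scan_page, city, table_name="Users"):
    """Collect every user item whose `city` attribute equals `city`.

    `scan_page` is any callable that accepts Scan keyword arguments and
    returns a dict with "Items" and, when more data remains,
    "LastEvaluatedKey" (the shape of a DynamoDB Scan response).
    The `city` attribute name is an assumption made for illustration.
    """
    params = {
        "TableName": table_name,
        # The filter runs after items are read, so the whole table is still scanned.
        "FilterExpression": "#c = :city",
        "ExpressionAttributeNames": {"#c": "city"},
        "ExpressionAttributeValues": {":city": {"S": city}},
    }
    items = []
    while True:
        page = scan_page(**params)
        items.extend(page.get("Items", []))
        last_key = page.get("LastEvaluatedKey")
        if last_key is None:
            return items
        # Resume the scan where the previous page stopped.
        params["ExclusiveStartKey"] = last_key


def fake_scan(**kwargs):
    """Stand-in for a real client: returns two already-filtered pages."""
    if "ExclusiveStartKey" not in kwargs:
        return {
            "Items": [{"user_id": {"S": "u1"}}],
            "LastEvaluatedKey": {"user_id": {"S": "u1"}},
        }
    return {"Items": [{"user_id": {"S": "u7"}}]}


print(len(users_in_city(fake_scan, "Seattle")))
```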
Performance and Cost Considerations
Operation Speed: A query operation is expected to be very fast, only marginally slower than a get operation. A scan operation, on the other hand, can take anywhere from 50-100 ms to a few hours to complete, depending on the size of the table.
Read Unit Cost: For a query operation, the read units consumed depend on the total size of all the items returned. If, for example, a query returns 20 items with a total size of 20.1 KB, it consumes 21 read units (assuming that the operation finishes within a second). A scan, by contrast, reads every item in the table, so for any reasonably sized table it will consume all the provisioned read units until it finishes.
Looked at another way, the total time a full scan needs is at least T = S / (R * 2), where S is the total size of the table in kilobytes and R is the number of read units provisioned for the table. The factor of 2 appears because scan reads are eventually consistent and consume half the read units of consistent reads. For a 1 GB table provisioned with 100 read units, this comes to approximately 84 minutes. A single scan operation would not last 84 minutes, though, because DynamoDB evaluates only 1 MB of data per operation before filtering and returning the results. Scanning the entire table would therefore require about 1000 scan operations.
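The arithmetic above can be checked with a short sketch. It assumes the cost model used in this post (one read unit per kilobyte, rounded up) and decimal units (1 GB = 1,000,000 KB, 1 MB = 1,000 KB). The function names are illustrative only.

```python
import math

KB_PER_MB = 1000  # decimal units, an assumption matching the 1000-scan figure


def query_read_units(total_kb):
    """Read units for a query, at one unit per KB rounded up."""
    return math.ceil(total_kb)


def scan_seconds(table_kb, read_units):
    """Lower bound T = S / (R * 2) for an eventually consistent full scan."""
    return table_kb / (read_units * 2)


def scan_calls(table_kb, page_kb=KB_PER_MB):
    """Number of scan requests when each one evaluates at most page_kb of data."""
    return math.ceil(table_kb / page_kb)


one_gb_kb = 1000 * KB_PER_MB
print(query_read_units(20.1))             # read units for the 20.1 KB query
print(scan_seconds(one_gb_kb, 100) / 60)  # minutes for a 1 GB table at 100 read units
print(scan_calls(one_gb_kb))              # scan requests needed for the full table
```

With these assumptions the scan estimate is 5000 seconds, a little over 83 minutes, which is in line with the figure quoted above.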
Operation Overhead: Since a scan operation can consume all of a table's read units, it can slow down other operations on that table by starving them of read capacity.
When modelling data for DynamoDB, one must try to minimize any potential scan operations. Designing tables for performance and ways to minimize the impact of scan operations is covered in my next post: DynamoDB: Modelling data for performance