- Fundamental Unit of Capacity: A shard is the core building block of a Kinesis data stream. It determines the stream's throughput in terms of data ingestion and retrieval.
- Sequence of Records: Each shard maintains an ordered sequence of data records. Records are immutable once written: you cannot modify or delete individual records, and they leave the stream only when the retention period (24 hours by default) expires.
- Scaling Mechanism: The number of shards in your Kinesis data stream directly affects its overall capacity. You can add or remove shards to scale your stream's ability to handle data.
Key Capacity Metrics of a Shard
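- Data Ingest: A single shard supports writes of up to 1 MB/second and up to 1,000 records per second. Whichever limit is reached first applies.
- Data Retrieval: A single shard supports reads of up to 2 MB/second. This read capacity is shared among all standard consumers of the shard; consumers registered with enhanced fan-out each receive their own 2 MB/second per shard.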
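The per-shard limits above can be turned into a rough sizing estimate. The following is a minimal sketch, assuming a single standard (shared-throughput) consumer group and steady traffic; the function name `shards_needed` and its parameters are illustrative and not part of any AWS API.

```python
import math

# Per-shard limits quoted in the list above.
WRITE_MB_PER_SHARD = 1.0
WRITE_RECORDS_PER_SHARD = 1000
READ_MB_PER_SHARD = 2.0


def shards_needed(write_mb_per_s, records_per_s, read_mb_per_s):
    """Estimate the shard count needed to stay within every per-shard limit."""
    by_write_bytes = math.ceil(write_mb_per_s / WRITE_MB_PER_SHARD)
    by_write_records = math.ceil(records_per_s / WRITE_RECORDS_PER_SHARD)
    by_read_bytes = math.ceil(read_mb_per_s / READ_MB_PER_SHARD)
    # The tightest limit decides; a stream always has at least one shard.
    return max(1, by_write_bytes, by_write_records, by_read_bytes)


print(shards_needed(3.5, 2500, 7.0))  # -> 4
```

In this example the write volume (3.5 MB/second) and the read volume (7 MB/second) each require four shards, while the record rate alone would need only three. Real workloads are bursty, so some headroom above the estimate is usually sensible.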
How Shards Work
- Producers: Applications acting as producers send data records to a Kinesis data stream.
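- Partition Keys: Every record written to the stream must include a partition key. Kinesis applies an MD5 hash to the key, producing a 128-bit integer, and assigns the record to the shard whose hash key range contains that value. Records with the same partition key therefore land on the same shard and keep their relative order, and a large number of distinct keys helps spread data evenly across shards.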
- Consumers: Applications that process the data records from the stream are consumers. Consumers typically read data in batches from multiple shards in parallel.
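The mapping from partition key to shard can be illustrated locally without calling AWS. The sketch below assumes the hash key space is split evenly across shards (as for a stream that has not been resharded) and that the MD5 digest is read as a big-endian unsigned integer; the helper names `hash_key`, `even_ranges`, and `shard_for` are hypothetical.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 produces a 128-bit value


def hash_key(partition_key):
    """Map a partition key to a 128-bit integer via MD5."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")  # assumption: big-endian interpretation


def even_ranges(shard_count):
    """Split the hash key space into contiguous, inclusive ranges (assumed even)."""
    size = HASH_SPACE // shard_count
    ranges = []
    for i in range(shard_count):
        start = i * size
        # The last range absorbs any remainder so the whole space is covered.
        end = HASH_SPACE - 1 if i == shard_count - 1 else start + size - 1
        ranges.append((start, end))
    return ranges


def shard_for(partition_key, ranges):
    """Return the index of the range that contains the key's hash."""
    value = hash_key(partition_key)
    for index, (start, end) in enumerate(ranges):
        if start <= value <= end:
            return index
    raise ValueError("hash value not covered by any range")


ranges = even_ranges(4)
for key in ("device-17", "device-17", "device-42"):
    print(key, "-> shard index", shard_for(key, ranges))
```

The two `device-17` records always resolve to the same index, which is what preserves per-key ordering. If most traffic uses only a handful of keys, a few shards receive most of the load regardless of how many shards the stream has.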
Managing Shards
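- Provisioning: When creating a Kinesis data stream in provisioned capacity mode, you specify the initial number of shards. In on-demand capacity mode, Kinesis manages the shard count for you.
- Resharding: You can increase or decrease the number of shards while the stream is running, using the Kinesis API (UpdateShardCount, or SplitShard and MergeShards for individual shards) or the AWS Management Console, to adjust the throughput of your stream as needed.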
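Splitting one shard into two requires choosing the hash key at which the second child shard begins. The following is a minimal sketch, assuming `client` is a boto3 Kinesis client and `shard` is a shard description shaped like the entries returned by `list_shards`; the helper names `split_point` and `split_evenly` are hypothetical, and the sketch omits error handling and waiting for the stream to become active again.

```python
def split_point(shard):
    """Return the midpoint of a shard's hash key range as a decimal string."""
    key_range = shard["HashKeyRange"]
    start = int(key_range["StartingHashKey"])
    end = int(key_range["EndingHashKey"])
    if start >= end:
        raise ValueError("shard covers a single hash key and cannot be split")
    return str((start + end) // 2)


def split_evenly(client, stream_name, shard):
    """Ask Kinesis to split the shard at the midpoint of its hash key range."""
    client.split_shard(
        StreamName=stream_name,
        ShardToSplit=shard["ShardId"],
        NewStartingHashKey=split_point(shard),
    )


# Small illustrative range; real ranges span values up to 2**128 - 1.
example = {
    "ShardId": "shardId-000000000000",
    "HashKeyRange": {"StartingHashKey": "0", "EndingHashKey": "99"},
}
print(split_point(example))  # -> 49
```

With boto3 installed and credentials configured, `client` would come from `boto3.client("kinesis")`. To change the total shard count without picking split points yourself, `update_shard_count` with `ScalingType="UNIFORM_SCALING"` is the simpler route.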
Use Cases
- Real-time Data Processing: Process high-volume data streams for applications like website clickstream analysis or IoT device telemetry.
- Log Aggregation: Collect and process logs from various applications or systems in a central location.
- Event-Driven Architectures: Use Kinesis to trigger actions downstream in response to specific data events.