[MLS-C01] [Exploratory Data Analysis] Kinesis Data Streams

Posted by Oscaner on June 16, 2022

  • Massively scalable and durable real-time data streaming service
  • Continuously capture gigabytes of data per second from thousands of sources
    • Website clickstreams
    • Database event streams
    • Social media feeds
    • Financial transactions
    • IoT events
    • Application logs
  • Enables real-time analytics
    • Real-time dashboards
    • Real-time anomaly detection
    • Dynamic pricing
    • Real-time fraud detection

Key Concepts

  1. Data Producer

An application that emits data records as they are generated

  2. Data Consumer

An AWS service or distributed Kinesis application that retrieves data from Kinesis Data Streams

  3. Shard

A shard is the base throughput unit of a Kinesis data stream

Data producers assign partition keys to records

Partition keys ultimately determine which shard ingests the data record for a data stream

Data consumers retrieve data from all shards in a stream as the data is generated
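To make the partition-key routing concrete, here is a minimal local sketch. Kinesis maps each partition key to a 128-bit integer with MD5 and picks the shard whose hash key range contains that value. The even split of the hash space and the function names `even_shard_ranges` and `shard_for_key` are assumptions for illustration only; a real stream's ranges depend on how it has been split and merged.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 yields a 128-bit integer hash key


def even_shard_ranges(shard_count):
    """Split the hash key space into contiguous ranges (assumed even split)."""
    size = HASH_SPACE // shard_count
    ranges = []
    for i in range(shard_count):
        start = i * size
        # The last shard absorbs any remainder so the whole space is covered.
        end = HASH_SPACE - 1 if i == shard_count - 1 else start + size - 1
        ranges.append((start, end))
    return ranges


def shard_for_key(partition_key, ranges):
    """Return the index of the shard whose range contains the key's MD5 hash."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_key = int.from_bytes(digest, "big")
    for index, (start, end) in enumerate(ranges):
        if start <= hash_key <= end:
            return index
    raise ValueError("hash key outside every shard range")


ranges = even_shard_ranges(4)
# The same partition key always lands on the same shard.
print(shard_for_key("user-42", ranges) == shard_for_key("user-42", ranges))
```

Because records with the same partition key always go to the same shard, a key with few distinct values (for example a country code) can concentrate traffic on one shard, while a high-cardinality key such as a user ID tends to spread it out.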

  4. Data Stream

A logical grouping of shards

A data stream retains data for 24 hours by default; the retention period can be extended up to 365 days

Putting Data Into Streams

Data producers put data into Amazon Kinesis data streams using the Kinesis Data Streams APIs

  • Amazon Kinesis Producer Library
    • Highly configurable library that puts data into an Amazon Kinesis data stream
    • Simple, asynchronous, reliable interface to achieve high producer throughput
  • Amazon Kinesis Agent
    • Pre-built Java application that collects and sends data to your Amazon Kinesis stream
    • Install the agent on web servers, log servers, and database servers
    • The agent monitors a set of files and continuously sends new data to your stream
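For the plain API route mentioned above, a producer can be sketched as a single `put_record` call, which takes `StreamName`, `Data`, and `PartitionKey`. The helper name `put_click_event`, the stream name, and the event fields below are hypothetical; the client is passed in so the function can be exercised without AWS credentials.

```python
import json


def put_click_event(client, stream_name, event):
    """Send one event to a stream, using the user ID as the partition key."""
    return client.put_record(
        StreamName=stream_name,
        Data=json.dumps(event).encode("utf-8"),  # payload is sent as bytes
        PartitionKey=str(event["user_id"]),      # decides the target shard
    )


# Typical wiring (needs boto3 and AWS credentials, so it is not executed here):
#   import boto3
#   client = boto3.client("kinesis")
#   put_click_event(client, "clickstream-demo", {"user_id": 42, "page": "/home"})
```

One call per record is the simplest form; for higher throughput a producer would usually batch records, which is part of what the Kinesis Producer Library handles for you.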

Big Data Architecture

Additional Key Points

  • Shards are append-only logs
  • Each shard contains a sequence of records ordered by arrival time
  • One shard can ingest up to 1,000 data records per second or 1 MB/sec, whichever limit is reached first
  • Specify the number of shards needed when you create a stream
  • Add or remove shards dynamically as throughput changes, via the API, Lambda, or Auto Scaling
  • With enhanced fan-out: each shard allows 1 MB/sec in, and 2 MB/sec out for each registered consumer
  • Without enhanced fan-out: each shard allows 1 MB/sec in, and 2 MB/sec out shared across all consumers
  • Monitor shard-level metrics in Amazon Kinesis Data Streams
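The per-shard limits above can be turned into a rough shard-count estimate. This is a back-of-the-envelope sketch, not an official sizing tool: the function name `required_shards` is made up, and it assumes every consumer reads every record, so total read traffic is the write rate times the number of consumers unless enhanced fan-out gives each consumer its own 2 MB/sec per shard.

```python
import math

INGEST_MB_PER_SHARD = 1.0        # 1 MB/sec in per shard
INGEST_RECORDS_PER_SHARD = 1000  # 1,000 records/sec in per shard
READ_MB_PER_SHARD = 2.0          # 2 MB/sec out per shard


def required_shards(write_mb_per_sec, records_per_sec, consumers,
                    enhanced_fan_out=False):
    """Estimate the shard count from the tightest of the three limits."""
    by_bytes = write_mb_per_sec / INGEST_MB_PER_SHARD
    by_records = records_per_sec / INGEST_RECORDS_PER_SHARD
    if enhanced_fan_out:
        # Each consumer has dedicated read throughput, so size for one consumer.
        read_mb = write_mb_per_sec
    else:
        # All consumers share the same 2 MB/sec per shard.
        read_mb = write_mb_per_sec * consumers
    by_reads = read_mb / READ_MB_PER_SHARD
    return max(1, math.ceil(max(by_bytes, by_records, by_reads)))


# 5 MB/sec of writes, 2,000 records/sec, 4 consumers sharing read throughput
print(required_shards(5, 2000, 4))                         # 10
print(required_shards(5, 2000, 4, enhanced_fan_out=True))  # 5
```

In the first call the shared read limit dominates (20 MB/sec of reads at 2 MB/sec per shard); with enhanced fan-out the write limit of 1 MB/sec per shard becomes the binding constraint instead.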

Labs


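This article was written by Oscaner and is licensed under the Creative Commons Attribution 4.0 International License.
Unless marked as reposted or sourced elsewhere, all articles on this site are original or translated by this site; please credit the author before reposting.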