Sunday, June 10, 2018

Amazon Kinesis

Kinesis:


Kinesis is a managed service; the applications that consume a stream can be run on EC2 instances.
Similar to Kafka.

Records in a stream are accessible for 24 hours by default; this can be extended up to 7 days by enabling extended data retention.

The maximum size of a data blob (the data payload before Base64-encoding) in one record is 1 megabyte (MB).
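A producer can check this limit before sending. The snippet below is only a minimal sketch: it assumes 1 MB means 1,048,576 bytes, and `fits_in_record` is a hypothetical helper, not part of any AWS library.

```python
# Assumption: the 1 MB limit is interpreted as 1,048,576 bytes.
MAX_BLOB_BYTES = 1024 * 1024


def fits_in_record(data: bytes) -> bool:
    """Return True if the raw payload (before Base64 encoding) is within the limit."""
    return len(data) <= MAX_BLOB_BYTES


if __name__ == "__main__":
    print(fits_in_record(b"x" * 1000))                  # small payload
    print(fits_in_record(b"x" * (MAX_BLOB_BYTES + 1)))  # one byte too large
```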

Kinesis storage:


1. Streams

2. Record - The unit of data of the Kinesis data stream, which is composed of a sequence number, a partition key, and a data blob.

In simple terms, the data blob is the actual message.

3. Shards - Data is stored in shards and replicated across Availability Zones, where it is available for applications to consume.

4. Streams are made of multiple shards. Each shard can ingest up to 1 MB/sec and up to 1,000 records per second.

Stream = shard1 + shard2 + shard3 ...

As the volume of stream data increases, the number of shards can be increased; it can be reduced again when inflow drops.

5. Partitioning is also available: if customer_id is chosen as the partition key, HASH(customer_id) is computed and records with the same customer ID always go to the same shard.

6. APIs are available to read from Kinesis. The client libraries handle the complexity of reading from multiple shards in a distributed setup, so the experience is like reading from a single source.

7. Auto scaling can be enabled, which can spin up a new EC2 instance when the number of shards is increased.
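The shard limits and partition-key routing from the list above can be sketched in a few lines. This is an illustration only, not the service's actual code: Kinesis hashes the partition key with MD5 into a 128-bit value, but the even split of the hash range across shards is an assumption made here, and `shards_needed` and `shard_for_key` are hypothetical helper names.

```python
import hashlib
import math


def shards_needed(mb_per_sec: float, records_per_sec: float) -> int:
    """Estimate shards for a write load from the per-shard limits in the notes
    (1 MB/sec and 1,000 records/sec per shard)."""
    by_bytes = math.ceil(mb_per_sec / 1.0)
    by_records = math.ceil(records_per_sec / 1000.0)
    return max(1, by_bytes, by_records)


def shard_for_key(partition_key: str, shard_count: int) -> int:
    """Map a partition key to a shard index.

    Assumption: the 128-bit hash space is split evenly across shards.
    """
    hash_value = int(hashlib.md5(partition_key.encode("utf-8")).hexdigest(), 16)
    range_size = 2 ** 128 // shard_count
    # Clamp so hashes in the leftover tail of the range land on the last shard.
    return min(hash_value // range_size, shard_count - 1)


if __name__ == "__main__":
    print(shards_needed(5, 2500))    # bytes are the bottleneck here
    print(shards_needed(0.5, 3500))  # record count is the bottleneck here
    # The same customer ID is routed to the same shard on every call.
    print(shard_for_key("customer-42", 3) == shard_for_key("customer-42", 3))
```

Because the shard index depends only on the key's hash, a single very busy customer ID can overload one shard even when the stream as a whole is under its limits.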

Features of Amazon Kinesis

Real-time processing − It allows information such as stock trade prices to be collected and analyzed in real time; otherwise we would need to wait for a data-out report.

Easy to use − Using Amazon Kinesis, we can create a new stream, set its requirements, and start streaming data quickly.

High throughput, elastic − Throughput scales with the number of shards, which can be increased or reduced to match the rate of incoming data.

Integrate with other Amazon services − It can be integrated with Amazon Redshift, Amazon S3 and Amazon DynamoDB.

Build Kinesis applications − Amazon Kinesis provides developers with client libraries that enable the design and operation of real-time data processing applications. Add the Amazon Kinesis Client Library to a Java application and it will notify the application when new data is available for processing.

Cost-efficient − Amazon Kinesis is cost-efficient for workloads of any scale. Pay as we go for the resources used and pay hourly for the throughput required.

Source: https://www.youtube.com/watch?v=ZROcwFis7wI


Queries:

1. How to programmatically increase shards, or is it automatic?
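A possible starting point for this query: Kinesis exposes an UpdateShardCount API for changing the shard count explicitly. The snippet below is a minimal sketch, assuming a boto3 Kinesis client is passed in; `scale_stream` and the stream name in the usage comment are hypothetical.

```python
def scale_stream(client, stream_name: str, target_shards: int) -> dict:
    """Request a new shard count for a stream.

    `client` is assumed to be a boto3 Kinesis client,
    e.g. boto3.client("kinesis").
    """
    return client.update_shard_count(
        StreamName=stream_name,
        TargetShardCount=target_shards,
        ScalingType="UNIFORM_SCALING",
    )


# Usage (needs boto3 and AWS credentials; "my-stream" is a placeholder):
#   import boto3
#   scale_stream(boto3.client("kinesis"), "my-stream", 4)
```

Taking the client as a parameter keeps the function easy to try against a stub before pointing it at a real stream.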