Wednesday, March 28, 2018

Kafka Connect


Kafka Connect: https://www.confluent.io/product/connectors/

Kafka Connect is a framework included in Apache Kafka that integrates Kafka with other systems. Its purpose is to make it easy to add new systems to your scalable and secure stream data pipelines.

To copy data between Kafka and another system, users instantiate Kafka Connectors for the systems they want to pull data from or push data to. Source Connectors import data from another system into Kafka (e.g. rows from a relational database), and Sink Connectors export data from Kafka into another system (e.g. the contents of a Kafka topic to an HDFS file).
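A minimal sketch of instantiating a sink connector, assuming a running Connect worker whose REST API accepts a POST to /connectors with a JSON body holding a connector `name` and its `config`. The config keys follow the HDFS connector quickstart (`connector.class`, `tasks.max`, `topics`, `hdfs.url`, `flush.size`) plus `format.class`, which here is assumed to select the JsonFormat class linked further down. The connector name `hdfs-sink`, the topic `test_hdfs` and the HDFS URL are hypothetical placeholders.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class HdfsSinkConfigSketch {

    // Builds the JSON body for POST /connectors on the Connect REST API.
    // No JSON escaping is done, which is only safe for the plain values used here.
    static String connectorRequest(String name, Map<String, String> config) {
        String fields = config.entrySet().stream()
                .map(e -> "    \"" + e.getKey() + "\": \"" + e.getValue() + "\"")
                .collect(Collectors.joining(",\n"));
        return "{\n  \"name\": \"" + name + "\",\n  \"config\": {\n" + fields + "\n  }\n}";
    }

    static Map<String, String> hdfsSinkConfig() {
        // LinkedHashMap keeps keys in insertion order so the output is readable.
        Map<String, String> config = new LinkedHashMap<>();
        config.put("connector.class", "io.confluent.connect.hdfs.HdfsSinkConnector");
        config.put("tasks.max", "1");
        config.put("topics", "test_hdfs");               // hypothetical topic
        config.put("hdfs.url", "hdfs://localhost:9000"); // hypothetical NameNode URL
        config.put("flush.size", "3");                   // records written before a file is committed
        config.put("format.class", "io.confluent.connect.hdfs.json.JsonFormat"); // assumed JSON output
        return config;
    }

    public static void main(String[] args) {
        System.out.println(connectorRequest("hdfs-sink", hdfsSinkConfig()));
    }
}
```

The printed body could then be sent to the worker (for example with curl) to create the connector; the same keys can also go into a properties file when running Connect in standalone mode.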


https://kafka.apache.org/documentation/#connectapi (see the Transformations section)

https://docs.confluent.io/current/connect/connect-hdfs/docs/hdfs_connector.html
https://cwiki.apache.org/confluence/display/KAFKA/KIP-66%3A+Single+Message+Transforms+for+Kafka+Connect
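The Transformations section and KIP-66 describe Single Message Transforms: lightweight per-record functions applied in the order listed in the connector's `transforms` property, where each transform returns a modified record or null to drop it. The sketch below is a stdlib-only model of that contract, not the real `org.apache.kafka.connect.transforms.Transformation` API; `SimpleRecord`, `insertField` and `requireField` are hypothetical stand-ins.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

public class TransformChainSketch {

    // Simplified stand-in for a Connect record: a topic plus a map-shaped value.
    static final class SimpleRecord {
        final String topic;
        final Map<String, Object> value;

        SimpleRecord(String topic, Map<String, Object> value) {
            this.topic = topic;
            this.value = value;
        }
    }

    // Applies the transforms in order; a null result drops the record,
    // so the remaining transforms never see it.
    static List<SimpleRecord> applyChain(List<SimpleRecord> records,
                                         List<UnaryOperator<SimpleRecord>> chain) {
        List<SimpleRecord> out = new ArrayList<>();
        for (SimpleRecord r : records) {
            SimpleRecord current = r;
            for (UnaryOperator<SimpleRecord> t : chain) {
                current = t.apply(current);
                if (current == null) break;
            }
            if (current != null) out.add(current);
        }
        return out;
    }

    // Hypothetical transform: copies the value and adds a constant field.
    static UnaryOperator<SimpleRecord> insertField(String field, Object constant) {
        return r -> {
            Map<String, Object> v = new HashMap<>(r.value);
            v.put(field, constant);
            return new SimpleRecord(r.topic, v);
        };
    }

    // Hypothetical filter: drops records whose value lacks the given field.
    static UnaryOperator<SimpleRecord> requireField(String field) {
        return r -> r.value.containsKey(field) ? r : null;
    }

    public static void main(String[] args) {
        List<SimpleRecord> input = List.of(
                new SimpleRecord("events", Map.of("id", 1, "user", "a")),
                new SimpleRecord("events", Map.of("id", 2)));
        List<SimpleRecord> output = applyChain(input,
                List.of(requireField("user"), insertField("source", "kafka-connect")));
        for (SimpleRecord r : output) {
            System.out.println(r.topic + " " + r.value);
        }
    }
}
```

Only the first record survives the filter and gains the `source` field. Keeping each step a pure record-to-record function is what lets transforms be chained and reused across connectors.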

A few links for customizing the connectors or writing your own connectors.

The HDFS connector JAR files are under /home/gorrepat/confluent/confluent-4.0.0/share/java/kafka-connect-hdfs in the local Confluent 4.0.0 installation.


https://github.com/confluentinc/schema-registry/blob/master/avro-converter/src/main/java/io/confluent/connect/avro/AvroConverter.java


https://github.com/confluentinc/kafka-connect-hdfs/blob/master/src/main/java/io/confluent/connect/hdfs/json/JsonFormat.java

Custom logic (fingerprint removal) needs to be added to the write() method of https://github.com/confluentinc/kafka-connect-hdfs/blob/master/src/main/java/io/confluent/connect/hdfs/json/JsonRecordWriterProvider.java
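A minimal, stdlib-only sketch of what that custom step might look like, under two assumptions not stated above: the record value reaches write() as a flat map, and the fingerprint is a top-level field literally named `fingerprint`. The class, the hand-rolled JSON serializer and the `write(Writer, Map)` signature are all hypothetical and do not reproduce the connector's actual code; the point is only the order of operations: copy the value, drop the field, then emit one JSON object per line.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.LinkedHashMap;
import java.util.Map;

public class FingerprintRemovalSketch {

    // Hypothetical field name; the post does not say how the fingerprint is stored.
    static final String FINGERPRINT_FIELD = "fingerprint";

    // Returns a copy without the fingerprint field; the input map is left untouched.
    static Map<String, Object> stripFingerprint(Map<String, Object> value) {
        Map<String, Object> copy = new LinkedHashMap<>(value);
        copy.remove(FINGERPRINT_FIELD);
        return copy;
    }

    // Minimal JSON for a flat map of strings, numbers and booleans.
    // A real writer would use a JSON library and escape control characters too.
    static String toJson(Map<String, Object> value) {
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, Object> e : value.entrySet()) {
            if (!first) sb.append(',');
            first = false;
            sb.append(quote(e.getKey())).append(':');
            Object v = e.getValue();
            if (v == null) sb.append("null");
            else if (v instanceof Number || v instanceof Boolean) sb.append(v);
            else sb.append(quote(v.toString()));
        }
        return sb.append('}').toString();
    }

    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Stand-in for the customized write(): strip the field, then write one JSON object per line.
    static void write(Writer out, Map<String, Object> value) throws IOException {
        out.write(toJson(stripFingerprint(value)));
        out.write("\n");
    }

    public static void main(String[] args) throws IOException {
        Map<String, Object> value = new LinkedHashMap<>();
        value.put("id", 42);
        value.put("name", "alice");
        value.put("fingerprint", "a1b2c3");
        StringWriter out = new StringWriter();
        write(out, value);
        System.out.print(out); // {"id":42,"name":"alice"}
    }
}
```

Copying before removing keeps the original record value intact, which matters if anything else (for example a later transform or error handling) still reads it.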
