Part 1: Introduction to Kafka

LiYen Yoong
Mar 27, 2020

The idea behind Kafka came about because, when there are multiple source systems and target systems, every integration between them needs its own configuration, and each of these configurations comes with difficulties around:

  • Protocol — how the data is transported (example: HTTP, REST, TCP, etc.).
  • Data format — how the data is parsed (example: CSV, JSON, binary, etc.).
  • Data schema — how the data is shaped and may change.

On top of that, each source system bears an increased load from the many connections.

Why Apache Kafka?

Kafka decouples the data streams and systems: source systems publish their data to Kafka, and target systems read it from Kafka, so the two sides need no direct integrations with each other.

What is Apache Kafka?

Apache Kafka is a high-throughput distributed messaging system (or streaming platform). It was created at LinkedIn and is now an open-source project maintained by the Apache Software Foundation, with Confluent as a major contributor.

Data streams can come from anywhere: websites, microservices, financial transactions, etc. Once the data is in Kafka, you can feed it into your databases, analytics systems, email systems, and so on.
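To make the producing side concrete, here is a minimal sketch using the Kafka Java client. The broker address (localhost:9092), the topic name (website-events), and the example key and value are all assumptions for illustration, not anything fixed by the article.

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class EventProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            // Assumed local broker; point this at your own cluster.
            props.put("bootstrap.servers", "localhost:9092");
            props.put("key.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // Publish one website event to the hypothetical "website-events" topic.
                producer.send(new ProducerRecord<>(
                        "website-events", "user-42", "{\"page\":\"/home\"}"));
            }
        }
    }

Once events land in the topic this way, any number of downstream systems can read them without the website knowing those systems exist.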

Kafka is used for two broad classes of applications:

  • Building real-time streaming data pipelines that reliably get data between systems or applications.
  • Building real-time streaming applications that transform or react to the streams of data, as the sketch after this list shows.
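As an illustration of the second class, here is a minimal Kafka Streams sketch that reacts to one stream and writes a transformed stream back out. The topic names (input-topic, output-topic), the application id, and the transformation itself (upper-casing each value) are placeholders chosen for the example.

    import java.util.Properties;
    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.kstream.KStream;

    public class UppercaseApp {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "uppercase-app");
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG,
                    Serdes.String().getClass());
            props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG,
                    Serdes.String().getClass());

            StreamsBuilder builder = new StreamsBuilder();
            // Read from an input topic, transform each value, write to an output topic.
            KStream<String, String> source = builder.stream("input-topic");
            source.mapValues(value -> value.toUpperCase())
                  .to("output-topic");

            new KafkaStreams(builder.build(), props).start();
        }
    }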

Kafka Concepts

  • Kafka runs as a cluster of one or more servers that can span multiple data centres.
  • The Kafka cluster stores streams of records in categories called topics.
  • Each record consists of a key, a value, and a timestamp, all of which the consumer sketch below reads back.
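A minimal consumer sketch ties these concepts together: it subscribes to the hypothetical website-events topic used earlier (again assuming a local broker, and a made-up group id) and prints each record's key, value, and timestamp.

    import java.time.Duration;
    import java.util.List;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class EventConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("group.id", "demo-group");
            props.put("key.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(List.of("website-events"));
                // A real application would poll in a loop; one poll suffices here.
                ConsumerRecords<String, String> records =
                        consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("key=%s value=%s timestamp=%d%n",
                            record.key(), record.value(), record.timestamp());
                }
            }
        }
    }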

Originally published at http://liyenz.wordpress.com on March 27, 2020.
