Part 2: Introduction to Kafka
I wrote an introduction to Kafka a while ago without touching its technical side or use cases. I will not explain each use case in detail for now, but there are a couple of terms worth being familiar with before reading this post. I used an image downloaded from the Internet to illustrate them.
There are four core APIs (Application Programming Interfaces) we need to know:
- The Producer API allows an application to publish a stream of records to one or more Kafka topics.
- The Consumer API allows an application to subscribe to one or more topics and process the stream of records produced to them.
- The Streams API allows an application to act as a stream processor, consuming an input stream from one or more topics and producing an output stream to one or more output topics, effectively transforming the input streams to output streams.
- The Connector API allows building and running reusable producers or consumers that connect Kafka topics to existing applications or data systems. For example, a connector to a relational database might capture every change to a table.
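The core idea behind the Streams API can be sketched in plain Python. This is only a toy illustration of "consume an input stream, transform it, produce an output stream"; the topic names and records are invented, and real Kafka Streams does this continuously and fault-tolerantly:

```python
# Toy illustration of the Streams idea: read records from an input
# "topic" (here, just a list), transform each record, and write the
# result to an output "topic".
input_topic = ["hello world", "hello kafka"]
output_topic = []

for record in input_topic:            # consume the input stream
    transformed = record.upper()      # the stream-processing step
    output_topic.append(transformed)  # produce to the output topic

print(output_topic)  # -> ['HELLO WORLD', 'HELLO KAFKA']
```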
We can run Kafka on a single server (node) or in cluster mode with multiple nodes (Kafka brokers). Producers are processes that publish data, or a stream of records, into Kafka topics within a broker (they push messages). A consumer pulls records off one or more Kafka topics and processes the stream of records produced to them.
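To make the push/pull distinction concrete, here is a tiny in-memory sketch, not a real Kafka client (`ToyBroker`, the topic name, and the records are all made up for illustration): producers append records to a topic's log, and each consumer pulls from an offset that it tracks itself.

```python
from collections import defaultdict

class ToyBroker:
    """A tiny in-memory stand-in for a Kafka broker: each topic is an
    append-only log, and consumers pull records by offset."""
    def __init__(self):
        self.topics = defaultdict(list)  # topic name -> list of records

    def produce(self, topic, record):
        """Producer side: push (append) a record onto a topic's log."""
        self.topics[topic].append(record)
        return len(self.topics[topic]) - 1  # offset of the new record

    def consume(self, topic, offset=0):
        """Consumer side: pull all records at or after the given offset."""
        return self.topics[topic][offset:]

broker = ToyBroker()
broker.produce("page-views", {"user": "alice", "page": "/home"})
broker.produce("page-views", {"user": "bob", "page": "/cart"})

# A consumer pulls from offset 0 and tracks its own position.
records = broker.consume("page-views", offset=0)
print(len(records))  # -> 2
```

The key design point this mimics is that the broker does not push to consumers: records sit in the log, and consumers decide when and from which offset to read.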
Main parts of Kafka system:
- Broker: Handles all requests from clients (produce, consume, and metadata) and keeps data replicated within the cluster. There can be one or more brokers in a cluster.
- Zookeeper: A separate coordination service that keeps the state of the cluster (brokers, topics, users).
- Producer: Sends records to a broker.
- Consumer: Consumes batches of records from the broker.
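As a sketch of how a producer finds the brokers, a minimal producer configuration might look like this (the host and port are placeholders for your own cluster; the property names are from the standard Java client):

```properties
# Broker(s) the producer first contacts to discover the cluster
bootstrap.servers=localhost:9092
# How record keys and values are serialized into bytes on the wire
key.serializer=org.apache.kafka.common.serialization.StringSerializer
value.serializer=org.apache.kafka.common.serialization.StringSerializer
```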
For now, I will save the explanation of Zookeeper for another blog post. In my self-learning course, the instructor shared some use cases for Kafka:
- Messaging system
- Activity tracking
- Application logs gathering
- Stream processing with Spark or the Kafka Streams API.
- Decoupling system dependencies.
- Integration with Spark, Flink, Hadoop, Storm and other Big Data technologies.
References:
https://www.cloudkarafka.com/blog/2016-11-30-part1-kafka-for-beginners-what-is-apache-kafka.html
https://docs.confluent.io/
https://kafka.apache.org/
Originally published at http://liyenz.wordpress.com on April 4, 2020.