Part 2: Introduction to Kafka

LiYen Yoong
3 min read · Apr 4, 2020

I wrote an introduction to Kafka a while ago without touching on its technical side or its use cases. I will not explain each use case in detail for now. There is some jargon to become familiar with in this blog, and I used an image I downloaded from the Internet to illustrate it.

Image: https://images.app.goo.gl/A6PnHPocHe8yJeveA

There are four core APIs (Application Programming Interfaces) we need to know:

  • The Producer API allows an application to publish a stream of records to one or more Kafka topics (a minimal producer sketch follows this list).
  • The Consumer API allows an application to subscribe to one or more topics and process the stream of records produced to them (a matching consumer sketch appears a little further down).
  • The Streams API allows an application to act as a stream processor, consuming an input stream from one or more topics and producing an output stream to one or more output topics, effectively transforming the input streams to output streams.
  • The Connector API allows building and running reusable producers or consumers that connect Kafka topics to existing applications or data systems. For example, a connector to a relational database might capture every change to a table.
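
To make the Producer API concrete, here is a minimal Java sketch. It assumes a recent kafka-clients library, a broker reachable at localhost:9092, and a hypothetical topic named "events"; treat the names and settings as illustrative, not prescriptive.

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class SimpleProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            // Assumed: a broker running locally on the default port.
            props.put("bootstrap.servers", "localhost:9092");
            props.put("key.serializer", StringSerializer.class.getName());
            props.put("value.serializer", StringSerializer.class.getName());

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // Publish one record to the (hypothetical) "events" topic.
                producer.send(new ProducerRecord<>("events", "key-1", "hello kafka"));
                producer.flush(); // make sure the record is actually sent before exit
            }
        }
    }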

We can run Kafka on a single server (node) or in cluster mode with multiple nodes (Kafka brokers). Producers are processes that publish data or streams of records (push messages) into Kafka topics within the broker. A consumer pulls records off one or more Kafka topics and processes the streams of records produced to them.
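
To show the pull side described above, here is a consumer sketch under the same assumptions (a local broker and the hypothetical "events" topic). The group.id setting is what lets several consumer instances share a topic's partitions.

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.serialization.StringDeserializer;

    public class SimpleConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // assumed local broker
            props.put("group.id", "demo-group"); // consumers in a group share partitions
            props.put("key.deserializer", StringDeserializer.class.getName());
            props.put("value.deserializer", StringDeserializer.class.getName());
            props.put("auto.offset.reset", "earliest"); // start from the beginning if no offset

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("events"));
                while (true) { // sketch loops forever; stop with Ctrl+C
                    // The consumer pulls a batch of records from the broker.
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                    for (ConsumerRecord<String, String> record : records) {
                        System.out.printf("offset=%d key=%s value=%s%n",
                                record.offset(), record.key(), record.value());
                    }
                }
            }
        }
    }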

Main parts of Kafka system:

  • Broker: Handles all requests from clients (produce, consume, and metadata) and keeps data replicated within the cluster. There can be one or more brokers in a cluster (a topic-creation sketch follows this list).
  • Zookeeper: A separate coordination service that keeps the state of the cluster (brokers, topics, users).
  • Producer: Sends records to a broker.
  • Consumer: Consumes batches of records from the broker.
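
As a small illustration of a client talking to a broker, here is a sketch that creates a topic with Kafka's Java AdminClient. It again assumes a local single-broker setup and the hypothetical "events" topic name.

    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.NewTopic;

    public class CreateTopicDemo {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // assumed local broker
            try (AdminClient admin = AdminClient.create(props)) {
                // 3 partitions, replication factor 1 (fine for a single-broker sandbox;
                // a real cluster would use a higher replication factor).
                NewTopic topic = new NewTopic("events", 3, (short) 1);
                admin.createTopics(Collections.singleton(topic)).all().get();
            }
        }
    }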

For now, I will keep the explanation of Zookeeper for another blog. In my self-learning course, the instructor shared some use cases for Kafka:

  • Messaging system
  • Activity tracking
  • Gathering application logs
  • Stream processing with Spark or the Kafka Streams API (see the sketch after this list)
  • Decoupling system dependencies
  • Integration with Spark, Flink, Hadoop, Storm, and other Big Data technologies
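
For the stream-processing use case, here is a minimal Kafka Streams sketch. It assumes the kafka-streams library plus hypothetical "events" and "events-uppercased" topics: it reads each record from the input topic, upper-cases the value, and writes the result to the output topic.

    import java.util.Properties;
    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.kstream.KStream;

    public class UppercaseStream {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "demo-streams-app"); // hypothetical app id
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed local broker
            props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
            props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

            StreamsBuilder builder = new StreamsBuilder();
            // Consume from the input topic, transform each value, produce to the output topic.
            KStream<String, String> input = builder.stream("events");
            input.mapValues(value -> value.toUpperCase()).to("events-uppercased");

            KafkaStreams streams = new KafkaStreams(builder.build(), props);
            streams.start();
            // Close the topology cleanly when the JVM shuts down.
            Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
        }
    }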

References:
https://www.cloudkarafka.com/blog/2016-11-30-part1-kafka-for-beginners-what-is-apache-kafka.html
https://docs.confluent.io/
https://kafka.apache.org/

Originally published at http://liyenz.wordpress.com on April 4, 2020.
