Part 2: Introduction to Kafka
I wrote an introduction to Kafka a while ago without touching its technical side or use cases. I will not explain each use case in detail for now, but there are a couple of terms worth being familiar with before reading this post. I used an image downloaded from the Internet to illustrate them.
There are four core APIs (Application Programming Interfaces) we need to know:
- The Producer API allows an application to publish a stream of records to one or more Kafka topics.
- The Consumer API allows an application to subscribe to one or more topics and process the stream of records produced to them.
- The Streams API allows an application to act as a stream processor, consuming an input stream from one or more topics and producing an output stream to one or more output topics, effectively transforming the input streams to output streams.
- The Connector API allows building and running reusable producers or consumers that connect Kafka topics to existing applications or data systems. For example, a connector to a relational database might capture every change to a table.
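The core idea behind the Streams API can be sketched in plain Python. This is only a toy illustration of "consume an input stream, transform it, produce an output stream"; the topic names and records are invented, and real Kafka Streams does this continuously and fault-tolerantly:

```python
# Toy illustration of the Streams idea: read records from an input
# "topic" (here, just a list), transform each record, and write the
# result to an output "topic".
input_topic = ["hello world", "hello kafka"]
output_topic = []

for record in input_topic:            # consume the input stream
    transformed = record.upper()      # the stream-processing step
    output_topic.append(transformed)  # produce to the output topic

print(output_topic)  # -> ['HELLO WORLD', 'HELLO KAFKA']
```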
We can run Kafka on a single server (node) or in cluster mode with multiple nodes (Kafka brokers). Producers are processes that publish data, or a stream of records, into Kafka topics within a broker (they push messages). A consumer pulls records off one or more Kafka topics and processes the stream of records produced to them.
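To make the push/pull distinction concrete, here is a tiny in-memory sketch, not a real Kafka client (`ToyBroker`, the topic name, and the records are all made up for illustration): producers append records to a topic's log, and each consumer pulls from an offset that it tracks itself.

```python
from collections import defaultdict

class ToyBroker:
    """A tiny in-memory stand-in for a Kafka broker: each topic is an
    append-only log, and consumers pull records by offset."""
    def __init__(self):
        self.topics = defaultdict(list)  # topic name -> list of records

    def produce(self, topic, record):
        """Producer side: push (append) a record onto a topic's log."""
        self.topics[topic].append(record)
        return len(self.topics[topic]) - 1  # offset of the new record

    def consume(self, topic, offset=0):
        """Consumer side: pull all records at or after the given offset."""
        return self.topics[topic][offset:]

broker = ToyBroker()
broker.produce("page-views", {"user": "alice", "page": "/home"})
broker.produce("page-views", {"user": "bob", "page": "/cart"})

# A consumer pulls from offset 0 and tracks its own position.
records = broker.consume("page-views", offset=0)
print(len(records))  # -> 2
```

The key design point this mimics is that the broker does not push to consumers: records sit in the log, and consumers decide when and from which offset to read.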
Main parts of Kafka system:
- Broker: Handles all requests from clients (produce, consume, and metadata) and keeps data replicated within the cluster. There can be one or more brokers in a cluster.
- Zookeeper: A separate coordination service that keeps the state of the cluster (brokers, topics, users).
- Producer: Sends records to a broker.
- Consumer: Consumes batches of records from the broker.
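As a sketch of how a producer finds the brokers, a minimal producer configuration might look like this (the host and port are placeholders for your own cluster; the property names are from the standard Java client):

```properties
# Broker(s) the producer first contacts to discover the cluster
bootstrap.servers=localhost:9092
# How record keys and values are serialized into bytes on the wire
key.serializer=org.apache.kafka.common.serialization.StringSerializer
value.serializer=org.apache.kafka.common.serialization.StringSerializer
```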
For now, I will save the explanation of Zookeeper for another blog post. In my self-learning course, the instructor shared some use cases for Kafka:
- Messaging system
- Activity tracking
- Application logs gathering
- Stream processing with Spark or the Kafka Streams API.
- Decoupling system dependencies.
- Integration with Spark, Flink, Hadoop, Storm and other Big Data technologies.
References:
https://www.cloudkarafka.com/blog/2016-11-30-part1-kafka-for-beginners-what-is-apache-kafka.html
https://docs.confluent.io/
https://kafka.apache.org/
Originally published at http://liyenz.wordpress.com on April 4, 2020.