Kafka
- AKA Apache Kafka.
- Open-source distributed event streaming platform.
- Ensuring a continuous flow and interpretation of data so that the right information is at the right place, at the right time.
- Battle-tested, distributed, highly scalable, elastic, fault-tolerant, and secure solution.
- Can be deployed on bare-metal hardware, virtual machines, and containers.
- Supports on-premise servers and cloud environments.
Event streaming
- Digital equivalent of the human body’s Central Nervous System (CNS).
- Technological foundation for the ‘always-on’ world, where the user of software is other software.
- It means:
- Capturing data in real-time.
- From different event sources: e.g. databases, sensors, mobile devices, cloud services, and software applications.
- In the form of streams of events.
- To store these event streams durably for:
- Later retrieval.
- Manipulating.
- Processing.
- Reacting to the event streams in real-time as well as retrospectively.
- Routing the event streams to different destinations.
Event streaming use cases
- Process payments and financial transactions in real-time (e.g. stock exchanges, banks, and insurance companies).
- Track and monitor cars, trucks, fleets, and shipments in real-time (e.g. logistics and the automotive industry).
- Capture and analyze sensor data from IoT devices or other equipment (e.g. inspections with robots).
- Collect and immediately react to customer interactions and orders.
- Monitor patients in hospital care and predict changes in condition.
- Foundation for data platforms, event-driven architectures, and microservices.
Key capabilities
- Pub/sub pattern.
- Storing streams of events durably and reliably.
> [!NOTE]
> Kafka’s performance is effectively constant with respect to data size, so storing data for a long time is perfectly fine.
- Live or retrospective processing.
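The capabilities above (publish/subscribe, durable storage, live or retrospective processing) can be sketched with a toy in-memory broker. This is illustrative only; real Kafka is a distributed, replicated commit log, and all names here are made up:

```python
from collections import defaultdict

class MiniBroker:
    """Toy pub/sub broker: producers append events to a topic's log,
    consumers read from any offset at their own pace."""

    def __init__(self):
        self.topics = defaultdict(list)  # topic name -> append-only list of events

    def publish(self, topic, event):
        self.topics[topic].append(event)

    def consume(self, topic, offset=0):
        # Reading does not delete events, so the same stream can be
        # processed live or replayed retrospectively, as often as needed.
        return self.topics[topic][offset:]

broker = MiniBroker()
broker.publish("payments", {"amount": 42})
broker.publish("payments", {"amount": 7})
print(broker.consume("payments"))            # full stream from the beginning
print(broker.consume("payments", offset=1))  # replay from a later offset
```

The key point the sketch captures is that consuming is a read, not a removal: storage and processing are decoupled.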
How it works
- A distributed system consisting of servers and clients.
- Clients: SDKs that read, write, and process streams of events.
- Communicates via a high-performance TCP network protocol.
- Producers and consumers are fully decoupled and agnostic of each other, resulting in high scalability.
Glossary
- # Topic:
- A channel for categorizing events.
- A topic is similar to a folder in a filesystem.
- Multi-producer and multi-subscriber.
- Every topic can be replicated, even across geo-regions or datacenters, so that there are always multiple brokers that have a copy of the data. A common production setting is a replication factor of 3, i.e., there will always be three copies of your data.
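A hypothetical round-robin assignment can illustrate how a replication factor of 3 spreads each partition's copies over distinct brokers (Kafka's real replica placement is more elaborate; broker names here are placeholders):

```python
def assign_replicas(num_partitions, brokers, replication_factor=3):
    """Sketch: give each partition `replication_factor` replicas,
    each on a different broker, shifting the starting broker per partition."""
    return {
        p: [brokers[(p + i) % len(brokers)] for i in range(replication_factor)]
        for p in range(num_partitions)
    }

# 3 partitions spread over 4 brokers, replication factor 3:
print(assign_replicas(3, ["b1", "b2", "b3", "b4"]))
```

With a replication factor of 3, losing any single broker still leaves two brokers holding a copy of every partition.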
- # Event:
- AKA record or message.
- Usually has a key, value, timestamp, and optional metadata headers.
- Similar to the files in a folder (topic).
- Can be read as often as needed (but can also guarantee to process events exactly-once).
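A sketch of an event's shape, following the fields listed above (the concrete values and header names are made up for illustration, not an exact wire format):

```python
import time

# Illustrative event: key, value, timestamp, and optional metadata headers.
event = {
    "key": "alice",                              # often used for partitioning
    "value": {"action": "payment", "amount": 200},
    "timestamp": int(time.time() * 1000),        # epoch milliseconds
    "headers": {"source": "web"},                # optional metadata
}
```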
- # Partitioning:
- Topics are partitioned.
- A topic is spread over a number of "buckets" located on different Kafka brokers.
- Important for scalability, because it allows client apps to read/write data from/to many brokers at the same time.
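Routing events with the same key into the same "bucket" can be sketched as a hash of the key modulo the partition count (Kafka's default partitioner hashes the key with murmur2; MD5 is used here only as a stand-in deterministic hash):

```python
import hashlib

def partition_for(key: str, num_partitions: int) -> int:
    """Sketch: deterministically map a key to one of num_partitions buckets,
    so all events with the same key land in the same partition (and thus
    stay in order relative to each other)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

# Same key -> same partition, every time:
assert partition_for("user-42", 6) == partition_for("user-42", 6)
```

Because the mapping is per-key, different keys spread across brokers (scalability) while each key's events stay ordered within one partition.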
- # Producer:
- Client apps that send (publish) data to Kafka topics.
- # Consumer:
- Client apps that receive data from Kafka topics by subscribing to the events.
- # Server:
- A cluster of one or more servers that can span multiple datacenters or cloud regions.
- Some of these servers form the storage layer, called the brokers.
- Some manage data distribution.
Docker wurstmeister/kafka
- Version format mirrors the Kafka format: `<scala version>-<kafka version>`.
- Customize any Kafka parameter by adding it as an environment variable.
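For illustration, a minimal docker-compose service under these conventions (the image tag, ports, and variable values are placeholder assumptions to adapt, not a tested configuration):

```yaml
# Tag follows <scala version>-<kafka version>; every KAFKA_* environment
# variable is translated into the matching Kafka parameter
# (e.g. KAFKA_BROKER_ID -> broker.id).
kafka:
  image: wurstmeister/kafka:2.13-2.8.1
  ports:
    - "9092:9092"
  environment:
    KAFKA_BROKER_ID: 1
    KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
    KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://localhost:9092
```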