Kafka
- AKA Apache Kafka.
- Open-source distributed event streaming platform.
- Ensuring a continuous flow and interpretation of data so that the right information is at the right place, at the right time.
- Battle-tested, distributed, highly scalable, elastic, fault-tolerant, and secure solution.
- Can be deployed on bare-metal hardware, virtual machines, and containers.
- Supports on-premise servers and cloud environments.
Event streaming
- Digital equivalent of the human body’s Central Nervous System (CNS).
- Technological foundation for the ‘always-on’ world, where the user of software is other software.
- It means:
- Capturing data in real-time.
- From different event sources: e.g. databases, sensors, mobile devices, cloud services, and software applications.
- In the form of streams of events.
- To store these event streams durably for:
- Later retrieval.
- Manipulating.
- Processing.
- Reacting to the event streams in real-time as well as retrospectively.
- Routing the event streams to different destinations.
Event streaming use cases
- Process payments and financial transactions in real-time (e.g. stock exchanges, banks, and insurance companies).
- Track and monitor cars, trucks, fleets, and shipments in real-time (e.g. logistics and the automotive industry).
- Capture and analyze sensor data from IoT devices or other equipment (e.g. inspections with robots).
- Collect and immediately react to customer interactions and orders.
- Monitor patients in hospital care and predict changes in condition.
- Foundation for data platforms, event-driven architectures, and microservices.
Key capabilities
- Pub/sub pattern.
- Storing streams of events durably and reliably.
> [!NOTE]
> Kafka’s performance is effectively constant with respect to data size, so storing data for a long time is perfectly fine.
- Live or retrospective processing.
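The capabilities above (publish/subscribe, durable storage, live or retrospective processing) can be sketched with a toy in-memory broker. This is illustrative only; real Kafka is a distributed, replicated commit log, and all names here are made up:

```python
from collections import defaultdict

class MiniBroker:
    """Toy pub/sub broker: producers append events to a topic's log,
    consumers read from any offset at their own pace."""

    def __init__(self):
        self.topics = defaultdict(list)  # topic name -> append-only list of events

    def publish(self, topic, event):
        self.topics[topic].append(event)

    def consume(self, topic, offset=0):
        # Reading does not delete events, so the same stream can be
        # processed live or replayed retrospectively, as often as needed.
        return self.topics[topic][offset:]

broker = MiniBroker()
broker.publish("payments", {"amount": 42})
broker.publish("payments", {"amount": 7})
print(broker.consume("payments"))            # full stream from the beginning
print(broker.consume("payments", offset=1))  # replay from a later offset
```

The key point the sketch captures is that consuming is a read, not a removal: storage and processing are decoupled.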
How it works
- A distributed system consisting of servers and clients.
- Clients: SDKs that read, write, and process streams of events.
- Communicates via a high-performance TCP network protocol.
- Producers and consumers are fully decoupled and agnostic of each other, resulting in high scalability.
Glossary
- # Topic:
- A channel for categorizing events.
- A topic is similar to a folder in a filesystem.
- Multi-producer and multi-subscriber.
- Every topic can be replicated, even across geo-regions or datacenters, so that there are always multiple brokers that have a copy of the data. A common production setting is a replication factor of 3, i.e., there will always be three copies of your data.
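A hypothetical round-robin assignment can illustrate how a replication factor of 3 spreads each partition's copies over distinct brokers (Kafka's real replica placement is more elaborate; broker names here are placeholders):

```python
def assign_replicas(num_partitions, brokers, replication_factor=3):
    """Sketch: give each partition `replication_factor` replicas,
    each on a different broker, shifting the starting broker per partition."""
    return {
        p: [brokers[(p + i) % len(brokers)] for i in range(replication_factor)]
        for p in range(num_partitions)
    }

# 3 partitions spread over 4 brokers, replication factor 3:
print(assign_replicas(3, ["b1", "b2", "b3", "b4"]))
```

With a replication factor of 3, losing any single broker still leaves two brokers holding a copy of every partition.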
- # Event:
- AKA record or message.
- Usually has a key, value, timestamp, and optional metadata headers.
- Similar to the files in a folder (topic).
- Can be read as often as needed (but can also guarantee to process events exactly-once).
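A sketch of an event's shape, following the fields listed above (the concrete values and header names are made up for illustration, not an exact wire format):

```python
import time

# Illustrative event: key, value, timestamp, and optional metadata headers.
event = {
    "key": "alice",                              # often used for partitioning
    "value": {"action": "payment", "amount": 200},
    "timestamp": int(time.time() * 1000),        # epoch milliseconds
    "headers": {"source": "web"},                # optional metadata
}
```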
- # Partitioning:
- Topics are partitioned.
- A topic is spread over a number of "buckets" located on different Kafka brokers.
- Important for scalability, because it allows client apps to read/write data from/to many brokers at the same time.
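Routing events with the same key into the same "bucket" can be sketched as a hash of the key modulo the partition count (Kafka's default partitioner hashes the key with murmur2; MD5 is used here only as a stand-in deterministic hash):

```python
import hashlib

def partition_for(key: str, num_partitions: int) -> int:
    """Sketch: deterministically map a key to one of num_partitions buckets,
    so all events with the same key land in the same partition (and thus
    stay in order relative to each other)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

# Same key -> same partition, every time:
assert partition_for("user-42", 6) == partition_for("user-42", 6)
```

Because the mapping is per-key, different keys spread across brokers (scalability) while each key's events stay ordered within one partition.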
- # Producer:
- Client apps that send (publish) data to Kafka topics.
- # Consumer:
- Client apps that receive data from Kafka topics by subscribing to the events.
- # Server:
- A cluster of one or more servers that can span multiple datacenters or cloud regions.
- Some of these servers form the storage layer, called the brokers.
- Some manage data distribution.
Docker wurstmeister/kafka
- Version format mirrors the Kafka format: `<scala version>-<kafka version>`.
- Customize any Kafka parameter by adding it as an environment variable.
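For illustration, a minimal docker-compose service under these conventions (the image tag, ports, and variable values are placeholder assumptions to adapt, not a tested configuration):

```yaml
# Tag follows <scala version>-<kafka version>; every KAFKA_* environment
# variable is translated into the matching Kafka parameter
# (e.g. KAFKA_BROKER_ID -> broker.id).
kafka:
  image: wurstmeister/kafka:2.13-2.8.1
  ports:
    - "9092:9092"
  environment:
    KAFKA_BROKER_ID: 1
    KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
    KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://localhost:9092
```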