Brokers & Queues
Over the past decade, driven by growing business complexity, intense competition, and emerging technology stacks such as cloud computing, many firms have migrated, or are still migrating, their software and systems to distributed architectures such as microservices and event-driven architecture.
When designing a distributed system and complex software with many independent components, several key questions and considerations arise:
How can the system and its components be informed about changes and/or failures in services?
How can we send messages to components to request information or command a service to perform a task efficiently?
How can these components and services communicate with each other efficiently?
To address these questions, we need to understand how queues and brokers differ and what each offers, so we can identify messaging best practices for distributed systems. We will also examine two of the most popular brokers in this article.
Message
Before diving into the definitions of queues and brokers, it's crucial to understand a fundamental element that underpins both: the message. A message is simply a unit of data, a payload plus optional metadata, exchanged between a sender and a receiver. Without a clear grasp of what a message is, the concepts of queues and brokers won't be very meaningful.
Message Queue
Message queues deliver messages on a first-in, first-out (FIFO) principle: the message that enters the queue first is the first to be delivered. In other words, queues guarantee that messages are delivered in the sequence they were added.
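The FIFO behavior described above can be sketched with a minimal in-memory queue. This is purely illustrative; real message queues add persistence, acknowledgments, and networking on top of this core idea:

```python
from collections import deque

class MessageQueue:
    """A minimal in-memory FIFO message queue (illustrative sketch only)."""

    def __init__(self):
        self._messages = deque()

    def enqueue(self, message):
        # New messages join the back of the queue.
        self._messages.append(message)

    def dequeue(self):
        # The oldest message leaves first, preserving arrival order.
        return self._messages.popleft() if self._messages else None

queue = MessageQueue()
for msg in ["order-created", "payment-received", "order-shipped"]:
    queue.enqueue(msg)

# Messages come out in exactly the order they went in.
print(queue.dequeue())  # order-created
print(queue.dequeue())  # payment-received
```

Using `deque` rather than a plain list keeps both enqueue and dequeue O(1).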
Moreover, they help architects and developers in designing systems with loosely coupled services that operate independently. They also have the capability to store messages until they are reliably delivered to their intended recipients, thus enhancing the system's resilience.
Message Broker
A message broker is an intermediary service that routes, transforms, and delivers messages between applications. In addition to significant advantages like enhancing reliability, simplifying deployment, and enabling modular components, brokers also have a few drawbacks. Here, we'll delve into two notable disadvantages.
Increased Architecture Complexity: The primary drawback of brokers is that they introduce an extra layer of complexity to the system, since they require configuration, maintenance, and resource management. Several approaches can mitigate this complexity, including:
- Evaluating the necessity of brokers in software
- Optimizing and maintaining resource allocation in brokers
- Utilizing automation tools to streamline configuration processes
Single Point of Failure: In distributed systems, message brokers play a crucial role because they are often the only service responsible for handling communication between components. If a message broker fails, communication between components is disrupted, potentially resulting in system-wide failure. To address this single point of failure, several strategies can be implemented:
- Deploying multiple instances of the message broker across different servers to ensure high availability
- Implementing load balancing mechanisms to not only improve performance but also enhance fault tolerance
- Backing up message broker data and defining a disaster recovery plan
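The first strategy can be sketched as a simple failover loop that tries each broker replica in turn. The names `connect_with_failover` and `fake_connect` are hypothetical, and `connect` stands in for a real client library's connect call:

```python
def connect_with_failover(broker_addresses, connect):
    """Try each broker address in order; return the first live connection.

    `connect` is a placeholder for a real messaging client's connect call.
    """
    last_error = None
    for address in broker_addresses:
        try:
            return connect(address)
        except ConnectionError as error:
            last_error = error  # this broker is down; try the next replica
    raise ConnectionError(f"all brokers unreachable: {last_error}")

# Simulated connect function: only the second broker is up.
def fake_connect(address):
    if address != "broker-2:5672":
        raise ConnectionError(f"{address} refused")
    return f"connected to {address}"

print(connect_with_failover(["broker-1:5672", "broker-2:5672"], fake_connect))
# connected to broker-2:5672
```

Production clients usually combine this with retries, backoff, and health checks, but the core idea is the same: no single broker address is a hard dependency.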
With this fundamental understanding of brokers established, let's now explore the pros and cons of the most widely used message brokers currently available in the market.
Comparison & Contrast
RabbitMQ
Messaging Paradigm: In RabbitMQ, the messaging system is centered around three key concepts for routing and delivering events and messages:
Messages and Message Flow: A message is a unit of data sent between producers and consumers. The message flow describes how this data moves from the producers, who generate the messages, to the consumers, who receive and process them.
Producers & Consumers: They are applications designed to function as either senders or receivers of messages.
Exchange: It is responsible for routing messages to their appropriate destinations. The exchange type (direct, fanout, topic, or headers) defines how messages are routed to queues.
Furthermore, RabbitMQ uses the Advanced Message Queuing Protocol (AMQP) to facilitate communication between components, supporting both publish/subscribe and point-to-point messaging models.
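To make the topic exchange type concrete, here is a sketch of AMQP-style routing-key matching, where bindings use dot-separated words, `*` matches exactly one word, and `#` matches zero or more. This is a simplified illustration, not RabbitMQ's actual matcher:

```python
def topic_matches(binding, routing_key):
    """Return True if an AMQP-style topic binding matches a routing key.

    '*' matches exactly one dot-separated word; '#' matches zero or more.
    """
    def match(pattern, words):
        if not pattern:
            return not words            # both exhausted -> match
        head, rest = pattern[0], pattern[1:]
        if head == "#":
            # '#' may swallow zero or more words.
            return any(match(rest, words[i:]) for i in range(len(words) + 1))
        if words and (head == "*" or head == words[0]):
            return match(rest, words[1:])
        return False

    return match(binding.split("."), routing_key.split("."))

print(topic_matches("logs.*", "logs.error"))      # True
print(topic_matches("logs.#", "logs.app.error"))  # True
print(topic_matches("logs.*", "logs.app.error"))  # False
```

A topic exchange evaluates each queue binding this way and delivers a copy of the message to every queue whose binding matches.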
Architecture
Broker: At the core of RabbitMQ lies the broker, the component responsible for receiving and delivering messages. It serves as a central messaging hub, efficiently managing exchanges, queues, and bindings.
Nodes: In a RabbitMQ architecture, a single server running a RabbitMQ instance is referred to as a node. To achieve scalability, multiple nodes are typically used.
Cluster: It refers to a collection of RabbitMQ nodes collaborating as a single unit. Grouping multiple RabbitMQ nodes into a cluster enhances both scalability and high availability in your system architecture.
Plugins: They enhance RabbitMQ by adding extra functionalities and extending its features. They can be enabled or disabled as needed. One such example is the Federation plugin, which connects RabbitMQ brokers to other brokers. Another widely used example is authentication and authorization plugins. These plugins facilitate the use of external systems, such as LDAP, for user management and access control.
Scalability: RabbitMQ is typically scaled vertically, meaning that its capacity can be enhanced by adding resources such as CPU or memory to a single instance. This approach is particularly valuable in large-scale distributed systems, where the demand for scalability may fluctuate across different services at varying times. By scaling vertically, RabbitMQ ensures that it can efficiently handle growing workloads without significant architectural changes.
In addition to vertical scaling, RabbitMQ can also be scaled horizontally, where multiple broker nodes are added to form a cluster. This allows for load distribution and fault tolerance, making it highly adaptable to environments with unpredictable traffic patterns.
Latency: RabbitMQ is inherently designed to deliver messages with minimal latency, ensuring swift and efficient message processing. However, several factors can significantly influence and degrade this performance. These include the size of the message, inherent network latency, the need for message persistence, and the overhead of serialization and deserialization processes. Each of these elements contributes to potential delays in message delivery, despite the broker's optimized architecture.
Moreover, optimizing latency in RabbitMQ often involves carefully tuning both the system architecture and message-handling strategies. Employing techniques like batching messages, fine-tuning resource allocation, or leveraging faster storage solutions can help alleviate the effects of these latency-inducing factors.
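The batching technique mentioned above can be sketched as a buffer that flushes either when it is full or when the oldest buffered message has waited long enough. `BatchingPublisher` and `send_batch` are hypothetical names for illustration:

```python
import time

class BatchingPublisher:
    """Accumulate messages and ship them in batches (illustrative sketch).

    Batching amortizes per-message overhead (network round trips,
    serialization) at the cost of a small added delay per message.
    """

    def __init__(self, send_batch, max_batch=100, max_wait_s=0.05):
        self.send_batch = send_batch   # callable that ships a list of messages
        self.max_batch = max_batch
        self.max_wait_s = max_wait_s
        self.buffer = []
        self.first_buffered_at = None

    def publish(self, message):
        if not self.buffer:
            self.first_buffered_at = time.monotonic()
        self.buffer.append(message)
        # Flush when the batch is full or the oldest message waited too long.
        if (len(self.buffer) >= self.max_batch or
                time.monotonic() - self.first_buffered_at >= self.max_wait_s):
            self.flush()

    def flush(self):
        if self.buffer:
            self.send_batch(self.buffer)
            self.buffer = []

sent = []
publisher = BatchingPublisher(sent.append, max_batch=3, max_wait_s=5.0)
for i in range(7):
    publisher.publish(f"msg-{i}")
publisher.flush()  # ship any leftover partial batch
print([len(batch) for batch in sent])  # [3, 3, 1]
```

Tuning `max_batch` and `max_wait_s` is exactly the latency-versus-throughput trade-off: larger batches mean fewer round trips but a longer worst-case wait for the first message in each batch.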
Durability: Durability in RabbitMQ enables the creation of a resilient and dependable messaging system by ensuring that both queues and exchanges are configured in durable mode, thus supporting message persistence. This guarantees that messages are retained even in the event of a broker restart, ensuring that no critical data is lost. By persisting messages to disk, RabbitMQ provides a safeguard against unexpected failures, offering a robust solution for handling long-running processes and critical operations.
Nevertheless, it is essential to recognize that this durability comes at a cost. Persisting messages to disk introduces a trade-off, as it can potentially impact system performance. Writing messages to disk is inherently slower than keeping them in memory, and depending on the volume of data and the system's workload, this can lead to performance degradation.
Use Cases: RabbitMQ serves as an excellent solution for managing message flows between different components within microservices and event-driven architectures. Its capability to act as a mediator or task distributor in orchestrated EDA scenarios makes it an invaluable tool in ensuring seamless communication and coordination across various services.
Beyond its core function of message brokering, RabbitMQ enhances system reliability by providing fault tolerance and message persistence. This ensures that even in the event of service failures, important messages are not lost, thereby supporting high availability in distributed systems.
Apache Kafka
In the modern landscape of distributed systems, Apache Kafka stands out as the premier solution for handling real-time data processing. Originally developed by LinkedIn, Kafka serves as a robust distributed streaming platform and a publish/subscribe messaging system. At its core, Kafka brokers act as a central hub, efficiently distributing and delivering messages across networks. Its design allows businesses, architects, and developers to manage vast streams of data with reliability and speed.
Beyond its foundational role, Apache Kafka is engineered with several key features that make it indispensable for contemporary data-driven applications. It boasts fault tolerance, ensuring that data remains intact even in the event of system failures. Kafka’s architecture also supports high-throughput and scalability, allowing for seamless expansion as the volume of data increases, all while maintaining low latency in message delivery, making it ideal for applications requiring near-instantaneous data processing.
When considering a messaging broker for a product, understanding the full potential of Kafka is essential. Its ability to handle large-scale data operations, coupled with its resilience and performance, makes it a go-to choice for many organizations aiming to maintain efficient data pipelines.
Messaging Paradigm: At its core, Apache Kafka is built on a sophisticated distributed message log, which serves as the foundation for both its architecture and messaging paradigm. A distributed message log is a data structure that ensures messages are stored, ordered, and persisted across various nodes within a distributed system. This architecture allows Kafka to provide fault tolerance, high availability, and horizontal scalability while maintaining the integrity of message delivery.
Kafka’s distributed message log offers a unique advantage by enabling real-time stream processing and the capability to replay messages, allowing consumers to retrieve data at any point in time. Unlike traditional messaging systems, Kafka decouples message producers and consumers, enabling asynchronous communication and enhanced throughput in large-scale systems.
Moreover, Kafka’s partitioning mechanism ensures that messages are evenly distributed across multiple brokers, thereby optimizing load balancing and increasing the overall system’s efficiency.
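The core of that partitioning mechanism is a hash-then-modulo mapping from message key to partition. Kafka's default partitioner uses murmur2 over the key bytes; the sketch below uses `crc32` purely to illustrate the idea:

```python
import zlib

def assign_partition(key, num_partitions):
    """Map a message key to a partition (illustrative sketch).

    Messages with the same key always land in the same partition,
    which is what preserves per-key ordering in Kafka.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions

# Same key -> same partition, every time.
print(assign_partition("user-42", 6) == assign_partition("user-42", 6))  # True
```

Because the mapping depends only on the key, all events for `user-42` are consumed in order, while different keys spread across partitions and brokers for load balancing.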
Architecture: The architecture of Apache Kafka has undergone significant evolution, particularly since version 2.8.0. Earlier versions relied heavily on ZooKeeper for cluster coordination and metadata management, but with the introduction of the quorum-based controller, Kafka transitioned to an architecture that no longer depends on ZooKeeper. This new controller allows brokers to handle metadata management internally, streamlining operations, improving scalability, and enhancing resilience by eliminating the need for an external service like ZooKeeper to maintain coordination.
The quorum-based controller not only enhances Kafka's ability to handle larger clusters but also aligns Kafka with contemporary trends in distributed computing, where eliminating external dependencies is increasingly prioritized for simplifying infrastructure. This advancement positions Kafka to better serve a wider range of use cases, from real-time analytics to event-driven architectures, as it continues to evolve into a more autonomous and efficient platform.
In the ZooKeeper-based architecture, the Kafka cluster is composed of five core components—brokers, producers, consumers, topics, and partitions—that work in tandem to facilitate distributed messaging. At the heart of the cluster are the brokers, which manage the data flows and ensure message persistence. Producers send messages to brokers, while consumers retrieve them from designated topics. Topics are further divided into partitions, enabling parallelism and scalability. Overseeing these components in Kafka versions prior to 2.8.0 is ZooKeeper, which manages metadata, keeps track of the cluster state, and ensures proper synchronization across nodes.
Let's explore each of these components and understand how they work in more detail.
ZooKeeper: This component mainly handles cluster coordination tasks such as membership detection and leader/follower election. It also maintains information about brokers and keeps track of consumers and producers. Additionally, ZooKeeper stores vital metadata about the Kafka cluster, including broker configuration, topic settings, and partition assignments.
Producers: They are applications or services whose primary role is sending or publishing data to Kafka brokers, where it is stored in topics. Producers send messages to topics asynchronously, meaning they do not have to wait for each message to be acknowledged before sending the next one.
Brokers: Think of them as the core building blocks of a Kafka cluster, distributed across several servers; together they make up the cluster. Each broker, also referred to as a node, has its own unique numeric identifier and, as briefly mentioned, acts as the heart of the Kafka cluster.
Topics: They are logical channels used to organize and classify messages within an Apache Kafka cluster, and each topic is divided into partitions. Strictly speaking, a topic is a category that collects messages from producers and delivers them to consumers.
Partitions: Apache Kafka achieves scalability and parallelism through partitions, which combine to form a topic. These partitions are essential for Kafka's ability to scale horizontally. Each message is assigned an offset, an identifier that is unique within its partition.
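A single partition behaves like an append-only log with dense offsets, which is what enables consumers to resume from a position or replay history. A minimal sketch:

```python
class PartitionLog:
    """A minimal append-only log with per-message offsets (sketch only)."""

    def __init__(self):
        self._records = []

    def append(self, record):
        offset = len(self._records)  # offsets are dense and unique per partition
        self._records.append(record)
        return offset

    def read(self, from_offset):
        # Consumers track their own offset and can re-read (replay) old data.
        return self._records[from_offset:]

log = PartitionLog()
for event in ["signup", "login", "purchase"]:
    log.append(event)

print(log.read(0))  # full replay: ['signup', 'login', 'purchase']
print(log.read(2))  # resume from offset 2: ['purchase']
```

Unlike a queue, reading does not remove records: each consumer simply advances its own offset, so many consumers can read the same partition independently.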
Scalability: It refers to the capability of a system to manage increasing workloads effectively, whether anticipated or sudden. In the context of Apache Kafka, scalability is achieved by efficiently distributing data across multiple brokers and partitions within a Kafka cluster. This allows Kafka to accommodate growing workloads seamlessly, ensuring that performance remains consistent as data volume or processing demand escalates.
Apache Kafka's architectural design inherently supports scalability through several key features. By dividing topics into multiple partitions, Kafka ensures that data can be processed in parallel. Partition replication enhances fault tolerance, while its distributed architecture allows new brokers to be added to the cluster dynamically, ensuring that Kafka can grow horizontally with ease. This contrasts with systems like RabbitMQ, which typically scale vertically by upgrading the resources of a single machine.
Complementing Kafka’s ability to scale horizontally, its distributed nature provides a robust foundation for fault tolerance and high availability. Each partition's replication ensures that data is not lost in case of broker failures, allowing Kafka to maintain operational efficiency even in the event of hardware or network issues.
In addition, Kafka’s scalability ensures that organizations can manage long-term growth without significant architectural overhauls. Whether scaling for larger data sets, higher throughput, or more complex workloads, Kafka’s partitioning model allows for fine-grained control over performance optimization.
Latency: Since Kafka is designed to support various types of applications, such as real-time and near-real-time workloads, latency is a crucial, game-changing metric. It refers to the elapsed time between when a producer publishes an event or message and when a consumer consumes it.
In every application, latency can stem from various factors, some of which we can influence while others are beyond our control. Here, we consider the overall delay in Kafka, known as end-to-end latency, which comprises three main parts, as follows:
Producer/Consumer Latency: The network that moves data between a producer and a broker, or between a broker and a consumer, is crucial to application performance and a common source of Kafka latency. Additional steps such as serialization and deserialization of data can also add to this latency.
Broker (Kafka) Processing Latency: When an event or message reaches the broker, it gets stored on the disk and later retrieved when requested by the consumer. This process relies on input/output operations, which can affect performance. Furthermore, factors like the number of message replications and schema validation (if applied) could play a significant role in increasing processing phase latency.
Cluster Latency: Typically, Apache Kafka operates with several nodes (brokers) and partitions. As you add more brokers, the internal network latency tends to increase.
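Putting the three parts together, end-to-end latency is simply the time from publish to consumption, and the breakdown helps identify where to optimize. The numbers below are hypothetical, for illustration only:

```python
def end_to_end_latency_ms(produce_ts_ms, consume_ts_ms):
    """End-to-end latency: time from publish to consumption (sketch).

    In practice you would record the produce timestamp in a message header
    at the producer and compare it to the clock at the consumer (assuming
    reasonably synchronized clocks).
    """
    return consume_ts_ms - produce_ts_ms

# Hypothetical breakdown of the three contributions described above:
producer_send_ms = 4    # serialization + network to broker
broker_ms = 12          # disk I/O, replication, schema validation
consumer_fetch_ms = 5   # network from broker + deserialization
total = producer_send_ms + broker_ms + consumer_fetch_ms
print(f"end-to-end latency = {total} ms")  # end-to-end latency = 21 ms
```

Measuring each component separately (producer send time, broker processing, consumer fetch) tells you whether to tune the client, the broker, or the cluster topology.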
Durability: Apache Kafka offers a robust durability feature, ensuring that events or messages are securely stored and retained, even in the face of system failures or downtime. This durability is achieved through several key mechanisms, with one of the most fundamental being data replication. Kafka replicates data across multiple brokers (nodes), creating redundancy that ensures even if one broker fails, the data remains accessible from other replicated copies. This high-availability model is central to Kafka’s ability to guarantee that no message is lost.
Another critical mechanism that underpins Kafka's durability is the acknowledgment system. Kafka allows producers to receive confirmation (acknowledgment) only once a message has been successfully written and replicated across the cluster (for example, with the acks=all setting), and not before. This ensures that the producer is aware of the message's secure storage and its replication within the cluster, adding an extra layer of reliability in message delivery.
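The effect of the producer acks setting can be simulated with a small sketch. `produce_with_acks` is a hypothetical helper; real Kafka clients perform the replica writes and confirmations internally:

```python
def produce_with_acks(message, replica_writes, acks="all"):
    """Simulate Kafka-style producer acknowledgments (illustrative sketch).

    `replica_writes` is a list of booleans, one per replica, True meaning
    the write succeeded there. acks="all" requires every replica to
    confirm; acks="1" only the leader (first replica); acks="0" fires
    and forgets.
    """
    if acks == "0":
        return True                  # no confirmation requested
    if acks == "1":
        return replica_writes[0]     # leader confirmation only
    return all(replica_writes)       # full in-sync replication

writes = [True, True, False]         # one follower failed to persist
print(produce_with_acks("order-1", writes, acks="1"))    # True
print(produce_with_acks("order-1", writes, acks="all"))  # False
```

The trade-off mirrors the durability discussion: stricter acknowledgment settings give stronger guarantees at the cost of higher producer latency.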
The log retention policy, together with the two mechanisms above, adds a strong durability guarantee to the cluster. There are two types of retention policies: size-based and time-based.
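Both retention policies can be illustrated with a small pruning sketch. Note that real Kafka prunes whole log segments rather than individual records, and its size limit is in bytes; a record count is used here only to keep the idea simple:

```python
def apply_retention(records, now_s, max_age_s=None, max_count=None):
    """Drop old log records per time- and size-based retention (sketch).

    `records` is a list of (timestamp_s, payload) tuples in append order.
    """
    if max_age_s is not None:
        # Time-based policy: drop records older than the age limit.
        records = [r for r in records if now_s - r[0] <= max_age_s]
    if max_count is not None and len(records) > max_count:
        # Size-based policy (simplified to a count): keep the newest records.
        records = records[-max_count:]
    return records

log = [(100, "a"), (200, "b"), (300, "c"), (400, "d")]
print(apply_retention(log, now_s=450, max_age_s=200))  # [(300, 'c'), (400, 'd')]
print(apply_retention(log, now_s=450, max_count=2))    # [(300, 'c'), (400, 'd')]
```

In Kafka these limits correspond to topic configurations such as retention.ms and retention.bytes; whichever limit is hit first triggers deletion.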
Use Cases: Apache Kafka has become a cornerstone solution for addressing a wide range of business requirements, including message queuing, log aggregation, and sophisticated event sourcing. It also excels in change data capture (CDC), real-time data streaming and analysis, data integration, and enabling seamless communication between microservices. With its robust architecture and ability to handle high-throughput, low-latency data streams, Kafka empowers businesses to process and transport vast volumes of data efficiently and reliably.
Kafka’s versatility allows it to serve as a backbone for modern data pipelines, supporting diverse use cases from transactional systems to complex event-driven architectures. Whether capturing real-time data from various sources or ensuring reliable communication between distributed systems, Kafka's scalability and fault tolerance ensure that businesses can build solutions that evolve with their growing data needs.
Wrapping Up
Queues and brokers are essential components in distributed systems, providing reliable messaging between different services and applications. A message queue acts as a buffer, holding messages until the receiving system is ready to process them. This allows for asynchronous communication, meaning senders and receivers don’t need to be active at the same time, which greatly improves system resilience and performance. Message brokers, on the other hand, manage this communication by routing, transforming, or even storing the messages.
RabbitMQ and Apache Kafka are two of the most popular message brokers, each with its unique strengths. RabbitMQ follows a traditional queue-based architecture, where messages are routed to different consumers using sophisticated routing patterns. It is known for its simplicity, ease of setup, and versatility in handling both small-scale and enterprise-level tasks. RabbitMQ excels in situations where you need complex routing logic and guarantees of message delivery through acknowledgment and persistence mechanisms.
Apache Kafka, in contrast, focuses on handling large volumes of data streams with high throughput and scalability. Instead of traditional queues, Kafka uses a distributed log-based system where messages are stored in topics, and consumers can read them at their own pace. Kafka’s architecture allows it to retain messages for a configurable amount of time, enabling features like replaying messages and real-time data processing.