- Title: [[The Big Little Guide to Message Queues]]
- Type: #source/article
- Author:
- Reference: https://sudhir.io/the-big-little-guide-to-message-queues/
- Published at: [[2020-12-20]]
- Reviewed at: [[2021-06-21]]
- Links: [[Choosing between AWS messaging services]] [[MQTT]] [[Kafka]] [[Message Queues]] [[Concurrency]] [[Control Flow]]
---
## Questions
- message queue vs streaming
- disruptor pattern
## Topics covered
- What are message queues and their history
- Useful mental models
- Delivery guarantees - at-least-once, at-most-once, and exactly-once
- Message ordering and FIFO guarantees, and how they affect parallelism and performance.
- Patterns for fan-out (delivering 1 message to many systems) and fan-in (receiving messages from many systems into 1)
- Pros and cons of popular message queue systems today.
## What are message queues
Systems for transferring information between two other systems. "Messages" can be data, metadata, signals, or a combination. The sending and receiving systems could be processes on the same computer, loosely coupled modules in the same app, or services on different computers or tech stacks.
The message queue system itself isn't too concerned with the contents of the message; it's concerned with the publishers, subscribers, and semantics of message delivery. Message queues embody the [[end-to-end principle of network design]] where it's up to the publisher(s) and subscriber(s) to define the content and meaning of the messages.
## Why do we need them
Systems need to exchange information. Sometimes APIs make sense (when synchronous information exchange and control is needed). Often, systems are more decoupled. APIs end up being more brittle because they couple the scale and availability of the systems involved. Moving files around is usually a kludge for not having a message queue.
By decoupling the systems, rather than making all your systems [[High Availability | HA]], you just need to focus on making the message queue [[High Availability | HA]].
> There are plenty of ways of getting these messages around using API calls, file systems, or many other abuses of the natural order of things; but all of these are ad-hoc implementations of the message queue that we sometimes refuse to acknowledge we need.
### What's the catch
Don't use a queue when you don't need to. Queues reduce the complexity of systems that need to communicate, but a single system that doesn't need to communicate is even simpler. Sometimes you need to be synchronized and coupled, and a synchronous API call is the better fit (e.g. a web server and data service in an OLTP application, or time-sensitive applications where processing latency needs to be predictable).
Queues tend to increase individual message processing latency in exchange for throughput.
If we do have 2 loosely coupled systems that need to exchange information in a durable way, message queues are indispensable *(where did I get the impression that message queues were ephemeral and best attempt?)*
## Semantics
### At-least-once
Guarantees delivery with an efficient implementation at the cost of the receiver *possibly* receiving the message multiple times.
Most common, easiest to implement (correctly), and easiest to reason about. If the message queue system / broker has a message for a recipient, it keeps sending it until you acknowledge it.
The application determines if dupes are acceptable or if you need to de-dupe at some point. e.g. if you're moving money, you'll want to uniquely identify each message so you only take the action once.
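A minimal sketch of an at-least-once consumer loop, assuming an SQS-style broker via boto3 (the queue URL, the in-memory de-dupe store, and the `handle` function are illustrative, not from the article):
```python
import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/orders"  # hypothetical

processed_ids = set()  # in production this would be a durable store

def handle(body: str) -> None:
    print("processing", body)  # application-specific work goes here

while True:
    resp = sqs.receive_message(
        QueueUrl=QUEUE_URL, MaxNumberOfMessages=10, WaitTimeSeconds=20
    )
    for msg in resp.get("Messages", []):
        # At-least-once means duplicates are possible, so de-dupe on an
        # identity -- ideally a key the sender put in the body; the broker's
        # MessageId is used here only for brevity.
        if msg["MessageId"] not in processed_ids:
            handle(msg["Body"])
            processed_ids.add(msg["MessageId"])
        # Ack by deleting. If we crash before this line, the broker
        # re-delivers after the visibility timeout -- that's the guarantee.
        sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])
```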
### At-most-once
It's a less common pattern that pushes delivery retries to the end clients. Used when a repeated message would be disastrous but missing messages are OK. ^[e.g. akka, while not a message queue, has at-most-once message delivery semantics to improve delivery speed and reduce actor mailbox complexity. It expects clients to use the ask pattern when a response is needed https://www.lightbend.com/blog/how-akka-works-exactly-once-message-delivery] It can either be very technically complicated to implement (e.g. the broker makes a lot of effort to record delivery to make sure it never sends a message more than once), or technically simple, where the system makes a best attempt at delivery with no retries (similar to how a router just passes along UDP packets with no local memory of them).
### Exactly-once
tl;dr: exactly-once delivery is impossible, so we need exactly-once *processing* – using at-least-once delivery we can guarantee the receiver never misses messages, and it can implement the appropriate de-duplicating processing.
> "The holy grail of messaging and the fountain of a lot of snake-oil".
Everyone starts out thinking "how hard could it be" and either moves to at-least-once + message identity or tries to sell an exactly-once system to those who haven't figured out at-least-once yet.
**Fundamental truths**
- Senders and receivers are imperfect (even with perfect code, the systems they're running on might crash)
- Networks are imperfect: packets can get dropped, connections closed, and requests restarted... it's the classic [[Byzantine Generals Problem]]
Systems that claim *exactly-once* are, at best, making a narrow claim about the messaging system itself while ignoring the end-to-end properties once you include the clients. E.g. a broker might implement some form of ordering, locking, hashing, timers, and identity tokens to ensure that a message (once received) is never re-delivered after it's been acknowledged. Nothing prevents the sender from re-sending, or the receiver from dropping an acknowledgment and re-receiving a message.
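A minimal sketch of the at-least-once + message identity approach ("exactly-once processing"), assuming the sender attaches a unique key to each message; the sqlite storage and function names are illustrative:
```python
import sqlite3

db = sqlite3.connect("processed.db")
db.execute("CREATE TABLE IF NOT EXISTS processed (key TEXT PRIMARY KEY)")

def apply_side_effect(body: str) -> None:
    print("processing", body)  # e.g. move the money

def process_once(key: str, body: str) -> None:
    try:
        # Recording the key and doing the work in one transaction means a
        # redelivered duplicate trips the PRIMARY KEY constraint and is
        # skipped. This only holds when the side effect lives in the same
        # transactional store as the de-dupe record.
        with db:
            db.execute("INSERT INTO processed (key) VALUES (?)", (key,))
            apply_side_effect(body)
    except sqlite3.IntegrityError:
        pass  # duplicate delivery: already processed, just ack it
```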
## Ordering vs Parallelism
> "Doing work in a sequence and doing multiple pieces of work at the same time are always at conflict with each other."
- **Ordering -> A speed limit on the queue**
- **Parallelism -> a system can scale out almost indefinitely** ^[e.g. the AWS SQS queue with 250 billion messages being consumed at 1M messages / sec https://twitter.com/timbray/status/1246157403663388672]
- **Many applications that appear to *require* FIFO can usually be made parallel with a bit of creativity**
All your messages can be delivered in order, imposing a limit on processing (hence delivery) throughput because only 1 receiver can process a message at a time, *or* you get to consume your messages in parallel, sacrificing [[First in first out | FIFO]] ordering guarantees.
A distributed lock across parallel consumers, in addition to being complex, makes the problem equivalent to single-consumer FIFO ordering. If you're not careful about lease timeouts you could end up in either a deadlock (e.g. the lock holder dies) or with multiple readers (e.g. if the distributed lock is implemented with an election scheme and clients are partitioned).
[[First in first out | FIFO]] is usually implemented by only attempting the delivery of a single message, waiting for an ack before sending the next one. This means that even if there are multiple consumers, only one will be processing a message at a time. e.g. financial transactions, respecting rate limits, or generally any message whose domain design assumes it will always be processed in order.
[[First in first out | FIFO]] ordering places more burden on the queuing system. It needs to keep up with the messages delivered to it while having the same internal constraints as coordinating parallel receivers would: only one agent handling messages at a time, while still providing availability and acting as a buffer between mismatched message arrival and delivery rates.
Many systems support some form of tagging (e.g. [group headings](https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/using-messagegroupid-property.html)) to only impose FIFO on some messages while the rest don't have any ordering guarantees. e.g. all messages under the heading of "payments" need to be FIFO and all under "orders" need to be FIFO, but not with respect to each other.
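A sketch of per-group FIFO using SQS's message group IDs via boto3 (the queue URL, group names, and payload are made up for illustration):
```python
import boto3

sqs = boto3.client("sqs")
FIFO_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/events.fifo"  # hypothetical

# Messages sharing a MessageGroupId are delivered in order, one at a time;
# different groups ("payments" vs "orders") can still flow in parallel.
sqs.send_message(
    QueueUrl=FIFO_URL,
    MessageBody='{"event": "payment_captured", "amount": 42}',
    MessageGroupId="payments",
    MessageDeduplicationId="payment-123",  # required unless content-based de-dupe is enabled
)
```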
**Parallel ≠ Random**
Ordering is usually best effort. Order is generally maintained over large enough time windows.
A queue could deliver messages completely randomly, but it usually doesn't. The system would have to go out of its way to re-order messages when it's really just trying to deliver them as it received them. Out-of-order messages happen as a function of variability in message processing speeds along the chain of components from sender to receiver.
## Fan-out / Fan-in
Fan-out is when the same message (from a single producer) needs to be delivered to multiple receivers.
The inverse, fan-in is when messages from multiple different senders (published to multiple different queues) need to be read by a single (possibly parallel) consumer. e.g. a receiver that's archiving all messages delivered or sending notifications from different types of events (order confirmed, payment failed...)
#to-do figure out how to describe topics
Naive approaches to implementing topic -> queue fan-out can lead to costly write amplification.
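A toy illustration (not modeled on any real broker) of that write amplification: one logical publish to a topic with N subscriber queues becomes N physical writes.
```python
from collections import deque

# topic -> subscriber queues (all names are made up)
queues = {name: deque() for name in ("billing", "analytics", "audit")}
subscriptions = {"orders": list(queues)}

def publish(topic: str, message: str) -> int:
    writes = 0
    for queue_name in subscriptions[topic]:
        queues[queue_name].append(message)  # a full copy per subscriber
        writes += 1
    return writes

print(publish("orders", "order #1 created"))  # 1 logical publish -> 3 writes
```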
## Poison Pills & Dead Letters
In synchronous API calls you can directly signal a caller that arguments are invalid. In an async message passing system, the sender has already moved on and the message queue needs to somehow deliver the message. If you don't ack the message, an at-least-once system will keep re-sending it (thinking you didn't get it). You need a way to quarantine the bad messages.
Messages we aren't set up to handle or didn't expect (since sender and receiver are decoupled, we may get messages we never intended to receive), or messages we expected but that are invalid (a poison pill), can be sent to a dead letter queue. A separate consumer can handle the dead letter queue (e.g. by alerting).
You need to guard poison-pill -> dead-letter queue forwarding against [[Denial of Service Attacks | DoS attacks]].
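A sketch of that quarantine in SQS terms, assuming boto3 and placeholder ARNs: a redrive policy moves a message to the dead-letter queue after it has been received (and not acked) too many times.
```python
import json
import boto3

sqs = boto3.client("sqs")

sqs.set_queue_attributes(
    QueueUrl="https://sqs.us-east-1.amazonaws.com/123456789012/orders",  # hypothetical
    Attributes={
        "RedrivePolicy": json.dumps({
            "deadLetterTargetArn": "arn:aws:sqs:us-east-1:123456789012:orders-dlq",
            # After 5 failed receives the poison pill is quarantined instead
            # of being re-delivered forever.
            "maxReceiveCount": "5",
        })
    },
)
```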
## The Q-List
A survey of message queue systems.
### AWS SNS & AWS SQS
Each provides a primitive piece of the puzzle. SNS allows a client to publish to a topic. SQS provides queues that a producer can send to and a consumer can receive from. You get pub-sub fan-out by sending a message to an SNS topic (rather than directly to a specific SQS queue); the topic can be configured to deliver to multiple SQS queues for consumption by each queue's clients.
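A sketch of that fan-out wiring with boto3 (all ARNs are placeholders; each queue also needs an access policy allowing SNS to send to it, omitted here):
```python
import boto3

sns = boto3.client("sns")

topic_arn = "arn:aws:sns:us-east-1:123456789012:order-events"  # hypothetical
for queue_arn in (
    "arn:aws:sqs:us-east-1:123456789012:billing",
    "arn:aws:sqs:us-east-1:123456789012:analytics",
):
    sns.subscribe(TopicArn=topic_arn, Protocol="sqs", Endpoint=queue_arn)

# One publish now lands a copy in every subscribed queue.
sns.publish(TopicArn=topic_arn, Message='{"event": "order_confirmed"}')
```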
SNS and SQS support Parallel and [[First in first out | FIFO]] ordering with different pricing and capacity guarantees for each. AWS has the concept of a *message group ID* to denote group headings for messages that need to be FIFO.
Hosted / fully managed and can scale up (indefinitely in the case of non-FIFO queues).
Receivers poll the queue to pull messages from it.
#to-do It seems that you can have a high cardinality of message group IDs (e.g. AWS docs suggest using a different ID for each message you want to process in parallel, but de-duping on the receiving end).
### AWS EventBridge
Hides the nitty-gritty mechanisms of topics and queues and exposes a single event bus where all messages must match a specific structure. The standard format includes the message topic as part of the message. The bus can then route each message to multiple configurable destinations – an SQS queue, webhooks, etc. – and the whole bus can be configured with routing rules and standard components (e.g. archiving, auditing, monitoring, alerting, replay).
### Google Pub/Sub
Integrates queues and topics in a single service. Topics are what senders post to. Queues are called *subscriptions* and are created when receivers subscribe to a topic.
Receivers can poll their *subscription*, or pub/sub can be configured like a [[webhook]] and `POST` the message to a server endpoint. The response code controls the ack and back-pressure.
Hosted / fully managed.
Ordering is handled by an *ordering key* – messages with the same ordering key are processed sequentially.
In addition to matching the SQS+SNS features, you can also look at historical records and seek (bulk-ack or replay messages) using timestamps.
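A sketch of publishing with an ordering key via the google-cloud-pubsub client (project and topic names are made up; the subscription must also have message ordering enabled):
```python
from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient(
    publisher_options=pubsub_v1.types.PublisherOptions(enable_message_ordering=True)
)
topic_path = publisher.topic_path("my-project", "payments")  # hypothetical

# Messages sharing an ordering key are delivered sequentially; different
# keys flow in parallel -- the same trade-off as SQS message groups.
for i in range(3):
    publisher.publish(topic_path, data=f"payment {i}".encode(), ordering_key="account-42")
```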
## Resources
- NATS introduction and rationale by Derek Collison [SCaLE 13x Derek Collison NATS A new nervous system for distributed cloud platforms](https://www.youtube.com/watch?v=t_USxxOGzcw)