The extensibility of OpenMessaging benchmarks

March 16, 2018

Matteo Merli, Luc Perkins

Let’s be honest: performance benchmarks in the software industry typically range somewhere between unhelpful and downright misleading. Far too often, we’ve seen companies rushing to sell tools and platforms, including pub-sub messaging systems, by cherry-picking workloads to showcase a specific system’s strengths or focusing on weird edge cases rather than realistic scenarios. We’ve seen this happen in benchmarks in a variety of domains, from databases to cloud services to hardware. We think that the industry deserves better, especially in the increasingly crucial messaging space. And so we’re excited that the OpenMessaging Project, of which Streamlio is a founding member, is stepping up to forge a bold new path that has the potential to change how we approach benchmarks.

Enter OpenMessaging benchmarks

The OpenMessaging Project is an effort to develop an open standard for messaging systems. Streamlio is a founding member of the project, alongside Yahoo!, Alibaba, DiDi, and others. The OpenMessaging benchmarks project was created as a sub-project under the auspices of the broader OpenMessaging Project. The benchmarks project operates in the same spirit as the umbrella project, and we’ve been active from the very beginning.

All the code for the OpenMessaging benchmarks project is available on GitHub, in the openmessaging/openmessaging-benchmark repo.

We talked a bit about our involvement in the creation of the benchmarks project in a previous post.

We’re extremely excited about the OpenMessaging benchmarks project because it’s simply a better approach to benchmarks in just about every way we can think of.

Core goals

The OpenMessaging benchmarks project is rooted in a small set of goals:

Goal | Description
Realism | Benchmark workloads should mirror real production workloads along all the important axes, including message size, message production rate from clients, and number of topic partitions.
Extensibility | Contributors to the benchmarking project should be able to easily create new workloads as well as add support for new messaging systems and new cloud platforms.
Flexibility | The benchmarks should cover workloads that vary along a variety of axes and should include both streaming/real-time and queuing use cases.
Openness | It’s called OpenMessaging for a reason. All relevant benchmarking code should be open source, anyone should be able to scrutinize what’s there (and submit pull requests!), and membership in the OpenMessaging Project is open to anyone, including companies who compete with one another directly.
Standardization | Benchmarked workloads should be directly analogous across messaging platforms. That means no comparing apples and oranges, thus no tailoring workloads to the strengths and weaknesses of specific platforms.
Self-serve | Anyone should be able to run the benchmarks themselves in a real cloud-based environment, not just on a laptop (consistent with the realism goal). Benchmarks shouldn’t be proprietary or even “independent.” What you do with the results (publish them in blog posts or whitepapers, parse them using an analytics stack) is up to you. Instructions for running benchmarks are currently available for Apache Kafka and Apache Pulsar, with others on the way soon.

In this post, we’ll focus mostly on realism and extensibility and show you how to create your own benchmarking workloads.

The importance of realistic workloads

The current OpenMessaging benchmark workloads were designed with realism in mind. Workload configurations can vary along a number of crucial axes:

Axis | Why it’s important
Number of topics | Production-ready messaging systems need to support many topics, potentially thousands or even millions
Partitions per topic | Topic partitioning is essential to speedy performance in messaging systems
Message size | Some use cases demand messages with tiny payloads while others demand payloads encompassing huge JSON objects, large files, etc.
Message production rate | Pub-sub systems need to stand up to heavy load on the “publish” side and demonstrate that they can handle intense bursts of activity from producers
Longevity | Messaging systems need to stand up to load for long periods of time. Very few use cases will be well served by a system that’s blazingly fast for 5 minutes but then slows down markedly after that, for example due to memory issues.

The goal of realism goes hand in hand with the goal of extensibility. To run workloads that are realistic for you and your use cases, you need to be able to quickly create new ones and even, potentially, contribute them to the project. We’ll show you how to do precisely that in the next section.

Creating your own workloads

Workloads are really the heart of all benchmarks, regardless of which type of system you’re dealing with. In the OpenMessaging benchmarks project, creating a new workload doesn’t require you to write code at all. Instead, all you need to do is create a YAML configuration file and specify some values, and you have a new workload. The current workload configurations are in the workloads folder of the benchmarks repo.

When you create a new workload you can run it on your own or, even better, submit a pull request to the repo and have it added permanently.

First, let’s have a look at an existing workload:

name: 1 topic / 1 partition / 1Kb
topics: 1
partitionsPerTopic: 1
messageSize: 1024
payloadFile: "payload/payload-1Kb.data"
subscriptionsPerTopic: 1
producersPerTopic: 1
producerRate: 50000
consumerBacklogSizeGB: 0
testDurationMinutes: 15

As you can see from the configuration, this workload involves:

  • one topic
  • one partition, one producer, and one subscription on that single topic
  • messages that are 1 kilobyte each (the payload is the payload/payload-1Kb.data file named in the configuration)
  • clients producing 50,000 messages per second for 15 minutes
  • no message backlog

But let’s say that you wanted to change it up a bit and create a workload that involves:

  • three topics
  • five partitions per topic
  • 5 KB (5,120 bytes) per message
  • clients producing 500 messages per second for 30 minutes

You could specify that workload using this YAML file (called 3-topics-500-rate-5kb.yaml) and add it to the workloads folder in the benchmarks repo:

name: 3 topics / 5 partitions/topic / 5kb / 500 msg/sec
topics: 3
partitionsPerTopic: 5
messageSize: 5120
payloadFile: "payload/payload-5kb.data" # We would need to create this file
subscriptionsPerTopic: 1
producersPerTopic: 1
producerRate: 500
testDurationMinutes: 30
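As the comment in the configuration notes, the payload file doesn't exist yet. One way to create it is sketched below in Python. The post doesn't describe what the existing payload files contain, so filling the file with random bytes is an assumption, and the write_payload helper is hypothetical rather than something shipped with the benchmarks project.

```python
import os
from pathlib import Path


def write_payload(path, size_bytes):
    """Write a file of exactly size_bytes random bytes and return its size on disk."""
    path = Path(path)
    # Create the parent folder (for example payload/) if it is missing.
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(os.urandom(size_bytes))
    return path.stat().st_size


if __name__ == "__main__":
    # 5120 matches the messageSize value in the workload above.
    size = write_payload("payload/payload-5kb.data", 5120)
    print(f"wrote {size} bytes")
```

Run from the root of the benchmarks repo, this would produce a payload file whose size matches the messageSize value in the workload.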

With your new workload configuration in place, you could deploy benchmarking infrastructure on Amazon Web Services (see instructions for Apache Kafka and Apache Pulsar), and then run your workload on Pulsar like this:

$ bin/benchmark \
  --drivers driver-pulsar/pulsar.yaml \
  workloads/3-topics-500-rate-5kb.yaml

If you’d set up Kafka infrastructure, you could run the workload like this:

$ bin/benchmark \
  --drivers driver-kafka/kafka.yaml \
  workloads/3-topics-500-rate-5kb.yaml

The results for your new workload would then be written to a file with a name like 3-topics-500-rate-5kb-Pulsar-2018-03-14-00-59-14.json. And that’s it! You just created, ran, and obtained results for a custom messaging workload without touching a single line of code. Even better, that workload could be run on any supported messaging platform.
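If you collect many result files, the file name alone already tells you which workload and which system a run belongs to. The Python sketch below splits a name into those parts. The pattern (workload file name, then driver name, then timestamp) is inferred from the single example above and may not hold for every driver; parse_result_name is a hypothetical helper, and the sketch deliberately makes no assumptions about the fields inside the JSON file.

```python
import re
from datetime import datetime

# Assumed pattern, inferred from one example file name:
# <workload>-<driver>-<YYYY-MM-DD-HH-MM-SS>.json
RESULT_NAME = re.compile(
    r"^(?P<workload>.+)-(?P<driver>[^-]+)-(?P<ts>\d{4}(?:-\d{2}){5})\.json$"
)


def parse_result_name(filename):
    """Split a result file name into (workload, driver, timestamp)."""
    match = RESULT_NAME.match(filename)
    if match is None:
        raise ValueError(f"unexpected result file name: {filename}")
    timestamp = datetime.strptime(match.group("ts"), "%Y-%m-%d-%H-%M-%S")
    return match.group("workload"), match.group("driver"), timestamp


workload, driver, ts = parse_result_name(
    "3-topics-500-rate-5kb-Pulsar-2018-03-14-00-59-14.json"
)
print(workload, driver, ts.isoformat())
```

Grouping files by the workload part would then let you compare runs of the same workload across systems.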

A fresh start for messaging benchmarks

We’re confident that the OpenMessaging benchmarks project will emerge as the standard for the industry and eventually cover all major messaging platforms. Kafka and Pulsar are a good start (with RabbitMQ support coming soon), and we invite other communities to add support for other systems.

The new workload exercise we went through above shows just how easy it is to create new workloads. We invite you to experiment with your own workloads. If you have any interesting results to share, please let us know at info@streaml.io!