Multi-tenant messaging with Apache Pulsar

September 26, 2017

Matteo Merli, Sijie Guo

In previous blog posts, we have described several reasons why Apache Pulsar is an enterprise-grade messaging solution that you should consider for your real-time businesses. In the following series of blog posts, we’ll take a deep dive into enterprise-grade features like storage durability to prevent data loss, multi-tenancy, geo-replication, and encryption and security.

In this post we’ll focus on multi-tenant messaging in Apache Pulsar. Multi-tenancy is the ability of a single instance of software to serve multiple tenants. A tenant is a group of users that have the same “view” of the system. Apache Pulsar, as an enterprise message hub, has supported multi-tenancy from day one because it needed to serve the demanding requirements of Yahoo, and no existing open source system available at the time was designed for multi-tenancy, including the common log abstraction system Apache Kafka. Creating multiple instances of Pulsar for various users or functions was simply not acceptable, as that makes it harder to share real-time data across departments and creates silos.

As an enterprise-grade messaging system, multi-tenancy in Pulsar was designed to meet the following requirements:

  • Ensure that strict SLAs are met
  • Guarantee isolation between tenants
  • Enforce resource utilization quotas
  • Provide per-tenant and system-wide security
  • Ensure low-cost operations and simpler manageability

Apache Pulsar meets the above requirements by:

  • Establishing security with authentication, authorization, and ACLs (Access Control Lists) for each tenant.
  • Enforcing storage quotas for each tenant.
  • Defining all isolation mechanisms as policies that can be changed at runtime, enabling low-cost operations and simple manageability.

Pulsar in 30 seconds

To help understand how Pulsar achieves multi-tenancy, let’s take a quick look at Pulsar’s messaging model.

As in many other pub-sub systems, an application that feeds data into Pulsar is called a producer, while an application that consumes data from Pulsar is a consumer. Consumer applications are also sometimes referred to as subscribers. Following the general pub-sub pattern, the topic is the core message construct in Pulsar. Loosely speaking, a topic represents a channel into which producers append data and from which consumers pull data. A group of consumers forms a subscription on a topic. Different groups of consumers can choose their own preferred way of consuming messages from the same topic: exclusive, shared, or failover. The different subscription modes are shown in Figure 1.

Figure 1. Pulsar’s subscription modes: exclusive, shared, and failover.
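
To make the three subscription modes concrete, here is a minimal toy model in Python. It is not the Pulsar client API: the `Subscription` class, its methods, and the rule that the first attached consumer is the active one are all assumptions made for illustration.

```python
from enum import Enum


class SubscriptionMode(Enum):
    EXCLUSIVE = "exclusive"
    SHARED = "shared"
    FAILOVER = "failover"


class Subscription:
    """Toy model of one subscription on a topic (hypothetical, not Pulsar's API)."""

    def __init__(self, name, mode):
        self.name = name
        self.mode = mode
        self.consumers = []  # consumer names, in attach order
        self._next = 0       # round-robin cursor, used in shared mode only

    def attach(self, consumer):
        # An exclusive subscription admits a single consumer.
        if self.mode is SubscriptionMode.EXCLUSIVE and self.consumers:
            raise RuntimeError("exclusive subscription already has a consumer")
        self.consumers.append(consumer)

    def detach(self, consumer):
        self.consumers.remove(consumer)

    def dispatch(self, message):
        """Return (consumer, message) for the consumer that receives the message."""
        if not self.consumers:
            raise RuntimeError("no consumers attached")
        if self.mode is SubscriptionMode.SHARED:
            # Shared: spread messages across all consumers round-robin.
            consumer = self.consumers[self._next % len(self.consumers)]
            self._next += 1
        else:
            # Exclusive and failover: a single active consumer gets everything.
            # In failover mode the remaining consumers are standbys.
            consumer = self.consumers[0]
        return consumer, message
```

With two consumers attached, a shared subscription alternates between them, while a failover subscription sends everything to the first consumer and switches to the second only after the first detaches.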

Because Pulsar was built from the ground up to support multi-tenancy, topics are organized under two multi-tenancy-specific resources: properties and namespaces. A property represents a tenant in the system. A tenant can provision multiple namespaces within its property, and each namespace can contain any number of topics. The namespace is the basic administrative unit for a tenant in Pulsar: at the namespace level you can set ACLs, fine-tune replication settings, manage geo-replication of message data across clusters, control message expiry, and perform other critical operations.

Figure 2. Three separate tenants in one Pulsar deployment.
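
The property, namespace, and topic hierarchy can be sketched as a small data model. The class names, the `message_ttl_seconds` policy key, and the `property/namespace/topic` string layout below are hypothetical and chosen only to illustrate the nesting; they are not Pulsar's actual data structures or naming scheme.

```python
from dataclasses import dataclass, field


@dataclass
class Namespace:
    """Administrative unit: policies are attached here, topics live here."""
    name: str
    policies: dict = field(default_factory=dict)  # e.g. ACLs, message expiry
    topics: set = field(default_factory=set)


@dataclass
class Property:
    """A tenant, which owns any number of namespaces."""
    name: str
    namespaces: dict = field(default_factory=dict)

    def create_namespace(self, name, **policies):
        ns = Namespace(name, dict(policies))
        self.namespaces[name] = ns
        return ns


def topics_of(prop):
    """List every topic under one tenant as 'property/namespace/topic'."""
    return sorted(
        f"{prop.name}/{ns.name}/{topic}"
        for ns in prop.namespaces.values()
        for topic in ns.topics
    )


# Two tenants in one deployment, each with its own namespaces and topics.
shopping = Property("shopping")
orders = shopping.create_namespace("orders", message_ttl_seconds=3600)
orders.topics.update({"created", "shipped"})

inventory = Property("inventory")
inventory.create_namespace("stock").topics.add("levels")
```

Because policies hang off the namespace rather than the individual topic, a tenant can change expiry or access rules for many topics in one place.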

For a more complete introduction to Pulsar, you can read the Introduction to Pulsar post. Next, we’ll take a look at the mechanisms that Pulsar uses to achieve multi-tenancy.

Security

The first step to achieving multi-tenancy is to ensure that a given tenant can access only the topics for which it has permissions, and cannot see or access any others. This is achieved via a pluggable authentication and authorization mechanism.

In Pulsar, when a client connects to a message broker, the broker uses an authentication plugin to establish the identity of this client and then (potentially) assigns that client a role token. This role token is a string, such as admin or application-1, that can represent a single client or multiple clients. Role tokens are used to control permissions for clients to produce or consume on certain topics and to administer the configuration for tenant properties.

Out of the box, Pulsar supports two authentication providers: TLS client auth and Athenz, an authentication system created by Yahoo. You can also implement your own authentication provider. For more details, check out Pulsar’s documentation.
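
The idea of a pluggable authentication provider that maps a connecting client to a role token can be sketched as follows. This is an illustration of the pattern only: the interface, the `tls_common_name` credential key, and the class names are hypothetical and are not Pulsar's actual plugin API.

```python
from abc import ABC, abstractmethod


class AuthenticationError(Exception):
    pass


class AuthenticationProvider(ABC):
    """Pluggable interface: turn client credentials into a role token."""

    @abstractmethod
    def authenticate(self, credentials: dict) -> str:
        """Return the role token, or raise AuthenticationError."""


class StaticCertificateProvider(AuthenticationProvider):
    """Maps the common name of a client certificate to a role (toy example)."""

    def __init__(self, roles_by_common_name):
        self._roles = dict(roles_by_common_name)

    def authenticate(self, credentials):
        common_name = credentials.get("tls_common_name")
        if common_name not in self._roles:
            raise AuthenticationError(f"unknown client: {common_name!r}")
        return self._roles[common_name]


class Broker:
    """The broker depends only on the interface, so providers can be swapped."""

    def __init__(self, provider: AuthenticationProvider):
        self._provider = provider

    def connect(self, credentials):
        return self._provider.authenticate(credentials)


broker = Broker(StaticCertificateProvider({"orders.example": "application-1"}))
role = broker.connect({"tls_common_name": "orders.example"})  # "application-1"
```

Several clients can present credentials that resolve to the same role token, which is how one role can represent multiple clients.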

After a client’s role token is identified by the authentication provider, Pulsar brokers use an authorization provider to determine what clients are authorized to do. Authorization is managed at the property level, which means that you can have multiple authorization schemes active in a single Pulsar cluster. You could, for example, create a shopping property that has one set of roles and applies to a shopping application used by your company, while an inventory property would be used only by an inventory application. Permissions are managed at the namespace level, that is, within properties. You can grant specific roles permission to perform operations, such as produce and consume, on a namespace. For details on how to configure authorization at the property level and grant permissions on namespaces, please check out Pulsar’s documentation.
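
Namespace-level permissions keyed by role can be sketched as a simple deny-by-default table. The `NamespaceAcl` class and its method names are hypothetical; the sketch only shows how a role token, a namespace, and an operation combine into an access decision.

```python
from collections import defaultdict

PRODUCE, CONSUME = "produce", "consume"


class NamespaceAcl:
    """Toy permission table: (property, namespace, role) -> allowed actions."""

    def __init__(self):
        self._grants = defaultdict(set)

    def grant(self, prop, namespace, role, actions):
        self._grants[(prop, namespace, role)].update(actions)

    def revoke(self, prop, namespace, role):
        self._grants.pop((prop, namespace, role), None)

    def is_allowed(self, prop, namespace, role, action):
        # Deny by default: a role can do only what was explicitly granted.
        return action in self._grants.get((prop, namespace, role), set())


acl = NamespaceAcl()
acl.grant("shopping", "orders", "application-1", {PRODUCE})
```

Here `application-1` may produce to topics in `shopping/orders` but may not consume from them, and it has no access at all to namespaces under the `inventory` property.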

Ultimately, authentication and authorization isolate tenants from accessing topics and performing actions that they don’t have permissions for. Next, let’s take a look at how Pulsar applies resource isolation on tenants to meet tenant SLAs.

Isolation

Besides isolation for the sake of security, tenants of a shared messaging system expect to meet their SLAs, which requires Pulsar to isolate them from one another in terms of robustness and performance. Pulsar achieves this through soft isolation (disk quotas, flow control, and throttling) and hard isolation, which physically confines some tenants to a subset of brokers (for serving) and BookKeeper bookies (for storage).

Before describing the mechanisms used for achieving isolation, let’s examine what an Apache Pulsar cluster installation looks like. Figure 3 below illustrates a typical installation. A Pulsar cluster is composed of a set of brokers (for serving pub-sub traffic), bookies (for message storage), and an Apache ZooKeeper ensemble for coordination and configuration management. The Pulsar broker is the component that receives and delivers messages. The bookies are Apache BookKeeper servers that provide durable storage for messages until they are consumed.

Figure 3. A typical installation of Apache Pulsar.

Soft Isolation

Brokers and bookies are the physical resources that are typically shared by producers and consumers. Pulsar provides various mechanisms on both the broker and bookie side to protect tenants and enable them to meet their SLAs.

Storage

Apache Pulsar uses Apache BookKeeper as the durable storage system for messages. Each bookie in Apache BookKeeper can efficiently serve hundreds of thousands of ledgers (each ledger is a segment of a topic). BookKeeper can achieve this because it’s designed for I/O isolation. On an individual bookie, there is a single journal (on its own dedicated disk device) that aggregates all the writes appended to it. Messages are then periodically flushed in the background and stored on separate storage disk devices. Such an I/O architecture provides isolation between writes and reads. This means that a tenant can read as fast as possible, maxing out the I/O on the storage devices, while the write throughput and latency remain unaffected.

Besides I/O isolation, different tenants can configure different storage quotas for different namespaces. Pulsar also provides mechanisms for tenants to take specified actions when the quotas are filled up, such as blocking message producing, throwing an exception, or dropping older messages.
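
The three quota actions mentioned above (blocking the producer, raising an error, or dropping older messages) can be sketched with a toy backlog. The class, enum names, and byte-based limit are assumptions made for illustration and do not reflect Pulsar's configuration keys or internals.

```python
from collections import deque
from enum import Enum


class QuotaAction(Enum):
    HOLD_PRODUCER = "hold"      # producer must wait until space frees up
    REJECT_PRODUCER = "reject"  # producer receives an error
    EVICT_OLDEST = "evict"      # oldest messages are dropped to make room


class QuotaExceeded(Exception):
    pass


class Backlog:
    """Toy per-namespace backlog with a size quota (hypothetical model)."""

    def __init__(self, limit_bytes, action):
        self.limit_bytes = limit_bytes
        self.action = action
        self.messages = deque()
        self.size = 0

    def publish(self, payload: bytes) -> bool:
        """Return True if stored, False if the producer must wait and retry."""
        if len(payload) > self.limit_bytes:
            raise QuotaExceeded("message is larger than the whole quota")
        if self.size + len(payload) > self.limit_bytes:
            if self.action is QuotaAction.HOLD_PRODUCER:
                return False
            if self.action is QuotaAction.REJECT_PRODUCER:
                raise QuotaExceeded("storage quota exceeded")
            # EVICT_OLDEST: drop from the head until the new message fits.
            while self.size + len(payload) > self.limit_bytes:
                self.size -= len(self.messages.popleft())
        self.messages.append(payload)
        self.size += len(payload)
        return True
```

Which action is appropriate depends on the tenant: a pipeline that must not lose data would hold or reject producers, while a metrics feed might prefer to drop the oldest messages.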

Pulsar brokers

In addition to the bookie level, Pulsar also provides multiple mechanisms at the broker level to meet SLAs. First, everything in the Pulsar broker happens asynchronously. The amount of memory that is used is also capped per broker. Whenever a broker is overloaded on CPU or memory usage, traffic can be quickly shifted (manually or automatically) to less loaded brokers. A load manager component in each Pulsar broker is dedicated to this task.

One thing to call out here is that Pulsar can quickly shift traffic between brokers to meet SLAs because it separates the serving layer from the storage layer. This makes brokers almost completely stateless. Unlike in other messaging systems, where a partition of messages is only stored on a subset of brokers, Pulsar brokers don’t store any data locally. The cost of moving a topic from one broker to another broker is minimal. This results in fast traffic rebalancing and the ability to quickly protect tenants.
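
Because brokers hold no message data, shifting load amounts to reassigning topic ownership. The post does not describe the load manager's algorithm, so the following is only a greedy sketch under assumed inputs: the `rebalance` function, the 0.85 threshold, and the per-topic load shares are all hypothetical.

```python
def rebalance(load, assignments, threshold=0.85):
    """Move topics off brokers whose utilisation exceeds `threshold`.

    load:        broker name -> utilisation in [0, 1]
    assignments: broker name -> list of (topic, load_share) it serves
    Both dicts are updated in place.
    Returns the list of (topic, from_broker, to_broker) moves performed.
    """
    moves = []
    # Visit the most loaded brokers first.
    for broker in sorted(load, key=load.get, reverse=True):
        while load[broker] > threshold and assignments[broker]:
            target = min(load, key=load.get)  # least loaded broker right now
            topic, share = assignments[broker][-1]
            if target == broker or load[target] + share > threshold:
                break  # no broker can take this topic without overloading
            # No data is copied: only ownership of the topic changes.
            assignments[broker].pop()
            assignments[target].append((topic, share))
            load[broker] -= share
            load[target] += share
            moves.append((topic, broker, target))
    return moves
```

In a system where brokers also stored the data for their partitions, each such move would additionally require copying the partition's messages, which is what makes rebalancing slow elsewhere.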

Second, a flow control protocol is deployed on both the message production and message consumption side. On the producer side, tenants are able to configure limits on in-flight messages for both brokers and bookies. This will slow down users trying to publish faster than the system can absorb. On the consumer side, tenants are able to configure limits on the outstanding messages that brokers can deliver to consumers.
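
The consumer-side limit on outstanding messages can be sketched as follows. The `FlowControlledDispatcher` class is a hypothetical broker-side model, not Pulsar's implementation: it delivers messages only while the number of unacknowledged messages stays below a configured cap.

```python
from collections import deque


class FlowControlledDispatcher:
    """Toy broker-side view of one consumer with a cap on outstanding messages."""

    def __init__(self, max_outstanding):
        self.max_outstanding = max_outstanding
        self.outstanding = 0    # delivered but not yet acknowledged
        self.backlog = deque()  # waiting to be delivered
        self.delivered = []     # delivery order, kept for inspection

    def enqueue(self, message):
        self.backlog.append(message)
        self._drain()

    def ack(self):
        # An acknowledgement frees one slot, which may unblock the backlog.
        if self.outstanding == 0:
            raise RuntimeError("nothing to acknowledge")
        self.outstanding -= 1
        self._drain()

    def _drain(self):
        # Deliver as many messages as the remaining slots allow.
        while self.backlog and self.outstanding < self.max_outstanding:
            self.delivered.append(self.backlog.popleft())
            self.outstanding += 1
```

With a cap of two, the third and fourth messages stay in the backlog until the consumer acknowledges an earlier one, so a slow consumer cannot force the broker to push unbounded data at it.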

Finally, on the consumer side, Pulsar can also throttle message delivery to consumers at a specified rate. This prevents consumers from consuming messages faster than the system can sustain.
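
Rate-based throttling is commonly implemented with a token bucket. The post does not say which algorithm Pulsar uses, so the sketch below is a generic token bucket rather than Pulsar's dispatcher; the class name and parameters are assumptions. The clock is passed in explicitly so that the behavior is deterministic.

```python
class TokenBucket:
    """Generic token bucket: at most `rate_per_sec` on average, `burst` at once."""

    def __init__(self, rate_per_sec, burst, now=0.0):
        self.rate = rate_per_sec
        self.capacity = burst
        self.tokens = float(burst)  # start full
        self.last = now             # time of the last refill

    def try_acquire(self, now, n=1):
        """Return True if `n` messages may be delivered at time `now`."""
        # Refill in proportion to elapsed time, never above capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False


bucket = TokenBucket(rate_per_sec=2, burst=2)
first = [bucket.try_acquire(now=0.0) for _ in range(3)]  # [True, True, False]
later = bucket.try_acquire(now=0.5)                      # True: one token refilled
```

A dispatcher would call `try_acquire` before each delivery and hold the message back whenever it returns `False`.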

All of these software mechanisms ensure that producers and consumers can meet their SLAs.

Hard isolation

The mechanisms described above are used to ensure that Pulsar can meet tenants’ SLAs when they are sharing resources (brokers and bookies). In some circumstances, however, applications want physical resource isolation too. Pulsar achieves this by providing an option to isolate certain tenants or namespaces to a particular set of brokers. This ensures that these tenants or namespaces can fully use the resources on that particular set of brokers.

This option can also be used for experimenting with different configurations, debugging and quickly reacting to any unexpected situations that happen in production. For example, a particular user might be triggering a bad behavior in the broker that can impact performance for other tenants. In such cases, this particular tenant can then be physically isolated to a subset of brokers that will not serve the traffic from any other tenants until a proper fix that correctly handles the condition can be deployed.

Besides physically isolating traffic on brokers, you can also isolate the traffic on the bookies that are used for storing messages. This can be configured by specifying a certain placement policy on namespaces.
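
Pinning namespaces to a dedicated set of brokers can be sketched as a pattern-matching step in front of broker selection. The `IsolationPolicy` class, the regular expressions, and the broker names are hypothetical; the same idea would apply to choosing bookies under a placement policy.

```python
import re


class IsolationPolicy:
    """Toy policy: namespaces matching one pattern use brokers matching another."""

    def __init__(self, namespace_pattern, broker_pattern):
        self.namespace_re = re.compile(namespace_pattern)
        self.broker_re = re.compile(broker_pattern)


def eligible_brokers(namespace, brokers, policies):
    """Return the brokers that are allowed to serve `namespace`."""
    for policy in policies:
        if policy.namespace_re.fullmatch(namespace):
            # Isolated namespace: only its dedicated brokers qualify.
            return [b for b in brokers if policy.broker_re.fullmatch(b)]
    # Everyone else shares the brokers that no policy has reserved.
    return [
        b for b in brokers
        if not any(p.broker_re.fullmatch(b) for p in policies)
    ]


brokers = ["broker-1", "broker-2", "broker-3", "broker-4"]
policies = [IsolationPolicy(r"shopping/.*", r"broker-[12]")]
```

Since the policy is just data consulted at assignment time, it can be added or removed at runtime, for example to quarantine a misbehaving tenant until a fix is deployed.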

The mechanisms used by Pulsar here can be treated as a lightweight version of having multiple clusters for different tenants, except that you don’t have to set those clusters up separately. This achieves the exact same physical isolation with a single cluster and simpler operations.

Conclusion

Apache Pulsar is a true multi-tenant messaging system that provides multi-level isolation between resources. In this blog post, we have examined the mechanisms that Pulsar uses to achieve multi-tenancy: authentication and authorization for security isolation; flow control, throttling, and storage quotas for isolation when sharing physical resources; and placement policies for physical resource isolation. We hope that this helps you better understand Apache Pulsar and its enterprise-grade multi-tenancy support. In the next blog post, we’ll examine another enterprise-grade feature in Apache Pulsar: geo-replication.

If you’re interested in Pulsar, you may want to participate in the Pulsar community.

For more information about the Apache Pulsar project in general, please visit the official website at http://pulsar.apache.org/ and follow the project on Twitter @apache_pulsar.