I’ve been excited about HashiCorp Nomad for years. Schedulers are crucial components in many large-scale software systems, and Nomad is powerful, it provides a straightforward API and simple abstractions (such as jobs), and its distribution as a precompiled binary makes deployment a breeze. It also has great documentation and a very nice UI. Building out Nomad support for Apache Heron was surprisingly simple and, I believe, a big win for the project.
Not only can Apache Heron now be run using an existing Nomad cluster—potentially in a large-scale production environment—but we also chose Nomad to be used as the scheduler for Apache Heron’s so-called standalone cluster mode due to Nomad’s operational simplicity (more on that below). Standalone cluster mode allows users to quickly and easily set up a lightweight distributed cluster to run Apache Heron processing topologies on their own machine, and thus to have a ready-made development and testing environment with very little effort.
Even better, when using Nomad with the raw fork/exec driver, the user doesn’t even need admin/root privileges to run the cluster. Integrating Heron with Nomad potentially lowers the barrier to entry for many users.
Figure 1 shows how Heron uses Nomad to manage topologies:
When you submit a new topology to Heron using the Heron CLI, it interacts with the Heron API server, which creates a logical and physical plan for the topology and uploads any topology artifacts to a storage system (this aspect of Heron is also highly configurable/extensible). The API server then provides Nomad with a logical and physical plan for the topology. Nomad then uses uploaded topology artifacts to schedule actual topology spout and bolt instances (containers) as Nomad jobs.
Figure 2 below shows what a physical plan for a topology might look like, with the topology’s spout and bolt instances spread across containers.
The physical plan for this topology consists of five containers, each containing a number of instances of various components scheduled in a round robin fashion. Heron uses the Nomad scheduler to translate each container into a Nomad job specification and then submit those jobs to a Nomad cluster.
Containers are submitted as separate Nomad jobs (instead of as a single Nomad job) for two reasons:
Submitting each container as a separate Nomad job gives Apache Heron the flexibility of scheduling.
You can use the Heron CLI to submit topologies to a Nomad cluster just as you would to any other cluster management tool or scheduler. Here’s an example command for submitting a Java topology to a Nomad cluster:
$ heron submit nomad \ ~/.heron/examples/heron-api-examples.jar \ com.twitter.heron.examples.api.ExclamationTopology \ Test1
$ ~/.heron/bin/heron-nomad status
Here’s some example output for the topology running in the Nomad cluster:
ID Type Priority Status Submit Date test117cb18c9-192d-4271-8d42-4a5e62a93c89-0 service 50 running 02/26/18 19:26:06 UTC test117cb18c9-192d-4271-8d42-4a5e62a93c89-1 service 50 running 02/26/18 19:26:06 UTC test117cb18c9-192d-4271-8d42-4a5e62a93c89-2 service 50 running 02/26/18 19:26:06 UTC test117cb18c9-192d-4271-8d42-4a5e62a93c89-3 service 50 running 02/26/18 19:26:06 UTC test117cb18c9-192d-4271-8d42-4a5e62a93c89-4 service 50 running 02/26/18 19:26:06 UTC
This output shows a single running topology consisting of 5 total containers. You can also check on the status of the topology using the Heron UI. In the UI you can see that the topology is running and also examine relevant stats for the topology.
You can also kill a Heron topology running in a Nomad cluster using the Heron CLI:
$ heron kill nomad test1
In this section I’ve provided an overview of Apache Heron’s integration with Nomad. For specific details on how to deploy to an existing Nomad cluster, please see the official Heron documentation.
As I mentioned above, we’ve introduced a new runtime mode for Apache Heron called standalone cluster mode, which we added to enable users to deploy a Heron cluster using just a few simple commands. Under the hood, a Heron standalone cluster is a Nomad cluster. Theoretically, we could’ve chosen any number of schedulers for standalone cluster mode, but we chose Nomad because it’s so lightweight and easy to use and also because users can interact with their standalone cluster using Nomad’s very good tools, such as the Nomad Web UI, to interact with their Heron cluster.
Starting up a Heron standalone cluster running on Nomad is very simple. First, set the number nodes you’d like in the cluster (you’ll be prompted to provide host information about the nodes in your cluster):
$ heron-admin standalone set
With the configuration in place, you can start the cluster:
$ heron-admin standalone cluster start
$ heron submit standalone \ ~/.heron/examples/heron-api-examples.jar \ com.twitter.heron.examples.api.WordCountTopology \ test1
You can tear down a running cluster like this;
$ heron-admin standalone cluster stop
For details on how to deploy a Heron standalone cluster, see the official Heron documentation.
Now that you’ve seen how they work together, you can try out both systems for yourself. You can get started with Heron at https://apache.github.io/incubator-heron/docs/getting-started/ and Nomad at https://www.hashicorp.com/products/nomad.
If you have questions or problems, feel free to ask questions on the official Apache Heron mailing list. Please note, though, that we’ll soon be transitioning this mailing list to Apache: email@example.com