Experiences porting Heron to Kubernetes

September 18, 2017

John Crawford from ndustrial.io

As users of Storm for the past three years, with multiple topologies and projects in production, we became pretty capable operators and maintainers of our Storm topologies and felt we could handle most problems that came our way. With the rise of microservices-based architectures, however, deploying applications, APIs, and worker services in containers became a very attractive option for our use cases, and the efficiency gains of containerization made it an obvious choice.

But there was always a gaping hole in our plan: how in the world do we get Storm to run properly in containers? There have been attempts to run Storm on various container orchestration platforms, but Storm's architecture just wasn't a good fit, and from what we saw there didn't seem to be much interest in the community in making it one. Even if you did get Storm to run on Kubernetes or DC/OS, it felt like forcing a square peg into a round hole, and you knew you couldn't trust it in a production environment.

Enter Heron

The announcement of Heron being released to the open-source community was a game changer for us. Once the project was adapted in the following months to run inside Docker containers, it was easy to see the potential for deploying it on many container orchestration environments. Since Kubernetes has so much community support and momentum behind it, and many companies already run the platform in production, it was important to us that Heron, which has been trusted in production on top of Mesos at Twitter for over three years, be adapted to work on Kubernetes.

In my experience with the framework, Heron is modular in all the important places, one of which is the scheduler. With built-in Docker support for running each Heron Instance, all the steps needed to get Heron running on the container orchestration platform of your choice fall to the Heron scheduler. The scheduler is responsible for producing the physical plan: how many containers it will use and how the pieces of the topology are distributed across those containers, as well as actually deploying those containers to the orchestration platform.
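
For reference, here is a simplified sketch of the contract a scheduler implements. The shape mirrors Heron's scheduler SPI, but exact packages and signatures may differ between releases, so treat this as illustrative rather than definitive:

```java
import java.util.List;

import com.twitter.heron.proto.scheduler.Scheduler;
import com.twitter.heron.spi.common.Config;
import com.twitter.heron.spi.packing.PackingPlan;

// Simplified sketch of Heron's scheduler contract; implement this
// to target a new orchestration platform such as Kubernetes.
public interface IScheduler {
  void initialize(Config config, Config runtime);              // one-time setup with cluster/runtime config
  boolean onSchedule(PackingPlan packing);                     // deploy containers according to the plan
  List<String> getJobLinks();                                  // e.g., dashboard URLs for the running job
  boolean onKill(Scheduler.KillTopologyRequest request);       // tear the topology down
  boolean onRestart(Scheduler.RestartTopologyRequest request); // restart one container, or all of them
  boolean onUpdate(Scheduler.UpdateTopologyRequest request);   // reconcile against a new packing plan
  void close();                                                // release any held resources
}
```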

I know what you're thinking: how am I going to figure out where to place those containers? The two difficult pieces here are:

  1. How should individual components be placed on the Heron instances (containers)?
  2. Where and how should these containers be deployed on my cluster?

The really cool thing here is that I don't have to care about the first problem, unless I really, really want to, because that piece is modular as well. It's called a Packing Plan in Heron: you can write your own if you want, and the user can specify separately which plan they'd like to use.
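
To give a feel for how pluggable this is, a packing algorithm is roughly a class of the following shape. This is a sketch based on Heron's packing SPI; the class name is made up and the exact interface details may vary by release:

```java
import com.twitter.heron.api.generated.TopologyAPI;
import com.twitter.heron.spi.common.Config;
import com.twitter.heron.spi.packing.PackingPlan;

// Rough shape of a pluggable packing algorithm: given a topology, decide how
// its instances are grouped into containers. "EvenSpreadPacking" is a
// hypothetical name; a real implementation would implement Heron's packing SPI.
public class EvenSpreadPacking {
  private TopologyAPI.Topology topology;

  public void initialize(Config config, TopologyAPI.Topology inputTopology) {
    this.topology = inputTopology;
  }

  public PackingPlan pack() {
    // Decide the container count and instance placement here, and return the
    // plan the scheduler will turn into actual containers (Pods, on Kubernetes).
    throw new UnsupportedOperationException("placement logic goes here");
  }

  public void close() { }
}
```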

The second difficult piece is covered by the orchestration engine, Kubernetes (or whichever other platform I'd like to use). You leave the heavy lifting to Heron, which interacts with Kubernetes to decide where the containers should be placed.

Figure 1. High-level Kubernetes submission diagram

As with most orchestration platforms, the tricky part is always the networking, but Kubernetes makes that pretty easy. Since Heron instances need to talk to each other and expose their own ports for different services, making each Pod's internal IP available to its Docker container was very important. Once the Pods are able to communicate with each other, Heron's internal mechanisms take over and handle all the coordination required to actually start the topology.
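
As an illustration (not necessarily how Heron's scheduler does it internally), here is one way to hand the Pod's IP to a container using the Kubernetes downward API via the official Kubernetes Java client. The container name, image, and environment variable name are all hypothetical:

```java
import io.kubernetes.client.openapi.models.V1Container;
import io.kubernetes.client.openapi.models.V1EnvVar;
import io.kubernetes.client.openapi.models.V1EnvVarSource;
import io.kubernetes.client.openapi.models.V1ObjectFieldSelector;

public class PodIpExample {
  public static V1Container heronContainer() {
    // The downward API resolves status.podIP at runtime, so the process inside
    // the container can advertise an address that other Pods can reach.
    return new V1Container()
        .name("heron-instance")          // hypothetical container name
        .image("example/heron:latest")   // hypothetical image
        .addEnvItem(new V1EnvVar()
            .name("POD_IP")              // hypothetical env var read at startup
            .valueFrom(new V1EnvVarSource()
                .fieldRef(new V1ObjectFieldSelector()
                    .fieldPath("status.podIP"))));
  }
}
```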

It's important to note that, based on the packing plan and the configuration set by the user, we still set limits on the amount of RAM and CPU each container can consume, which is really nice. You can individually allocate more or less RAM and CPU to individual bolts and spouts within each topology, a very useful capability introduced in Heron that dovetails nicely with how orchestration platforms control resource consumption.
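
For example, Heron's topology configuration API lets you size containers and individual components along these lines. This is a sketch: the component names are made up, and the method names follow Heron's API as of its early open-source releases, so check your version:

```java
import com.twitter.heron.api.Config;
import com.twitter.heron.common.basics.ByteAmount;

public class ResourceConfigExample {
  public static Config topologyConfig() {
    Config conf = new Config();
    // Cap what each container may request from the orchestrator...
    conf.setContainerRamRequested(ByteAmount.fromGigabytes(4));
    conf.setContainerCpuRequested(2.0f);
    // ...and size individual components inside those containers.
    conf.setComponentRam("tweet-spout", ByteAmount.fromGigabytes(1));  // hypothetical spout
    conf.setComponentRam("count-bolt", ByteAmount.fromMegabytes(512)); // hypothetical bolt
    return conf;
  }
}
```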

Less use of ZooKeeper (but still a dependency)

The best improvement from a reliability perspective (based on our experience) when switching from Storm to Heron was how little ZooKeeper is used within the cluster. ZooKeeper is only used for discovering other instances and for storing a small amount of necessary configuration. Heron simply has no need to talk to ZooKeeper constantly, which was one of the main issues we encountered with Storm and caused a few grey hairs to form over time. ZooKeeper is still a dependency, however, so you will need a ZooKeeper cluster reachable from all Pods, whether it runs within Kubernetes or somewhere else accessible to the cluster.

Executing Heron Commands

When implementing a new scheduler for Heron, as I touched on above, it really comes down to first figuring out how you'll deploy the containers and then implementing the different commands (submit, kill, activate, deactivate, and update). The deactivate and activate commands essentially pause and unpause the topology.

The more interesting command is update, which allows the user to scale a given component within the topology without having to build a new JAR to submit to the cluster. You would use this when you need to scale a component up to handle higher load and back down once the load subsides. You can imagine some pretty cool use cases here once you integrate with an external system to tune the topology based on external monitoring.

When using Heron on Kubernetes, this means we add and remove Pods on the fly when the command is executed, based on the output of the Packing Plan, which tells the scheduler whether to add or remove containers. This is all exposed via an interface, so once a scheduler implements it, the result is enormously powerful and simple to use; little thought needs to be put into understanding the inner workings of Heron. This is a big deal for many reasons, but most importantly it masks many of the complexities of a sophisticated system like Heron while still giving you just enough flexibility in the right places to get the job done.
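
To make that concrete, the update path essentially boils down to diffing the running plan against the proposed one. Here is a minimal sketch of that reconciliation; createPod and deletePod are hypothetical stand-ins for the real Kubernetes API calls:

```java
import java.util.HashSet;
import java.util.Set;

// Minimal sketch of update-time reconciliation: diff the container ids in the
// running packing plan against the proposed plan, then add/remove Pods to match.
public class UpdateReconciler {
  public void applyUpdate(Set<Integer> running, Set<Integer> proposed) {
    Set<Integer> toAdd = new HashSet<>(proposed);
    toAdd.removeAll(running);       // containers only in the new plan -> scale up
    Set<Integer> toRemove = new HashSet<>(running);
    toRemove.removeAll(proposed);   // containers no longer in the plan -> scale down
    toAdd.forEach(this::createPod);
    toRemove.forEach(this::deletePod);
  }

  private void createPod(int containerId) { /* call the Kubernetes API to create a Pod */ }
  private void deletePod(int containerId) { /* call the Kubernetes API to delete a Pod */ }
}
```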

While I've outlined the conveniences of deploying Heron on Kubernetes, you'll probably want to get your hands dirty deploying Heron to an existing (or new) Kubernetes cluster. We'll walk through a detailed tutorial in a follow-on blog post.

About the guest author

John Crawford is the Co-Founder and CTO at ndustrial.io. ndustrial.io is focused on eliminating waste in manufacturing and industrial operations, whether it be waste of raw resources and energy, waste of products, wasteful processes, waste of software code, or waste of time, by harnessing the power of real-time data.