The Heron Stream Processing Engine on Google Kubernetes Engine

How to deploy a Heron topology on a Google Kubernetes Engine (GKE) cluster

Chris Kellogg

November 28, 2017

Heron is a real-time, distributed, fault-tolerant stream processing engine developed at Twitter and used by major tech companies such as Twitter, Microsoft, and Google. One of Heron's great features is that it was designed to be extensible, which you can read more about in this blog post. That extensibility allows Heron to run in many different environments. This post will walk you through the steps of deploying a Heron topology on Google Kubernetes Engine (GKE).

Video walkthrough

If you’re more of a visual learner, check out the YouTube video below. Otherwise, keep reading for text instructions.

Setting up the Google Cloud Platform Project

To get started with GKE, we first need to set up a Google Cloud Platform project. Let's head over to the Google Cloud Console and create one. Here's the link:

https://console.cloud.google.com/projectselector/kubernetes

After logging in we can press the Create button to set up a new project.

Next, we need to enter a name for our new project. I'm going to call it heron-kubernetes-cluster. After entering a name for the project we can hit the Create button. It will take a few minutes for the project to get set up and initialized.
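
If you prefer the command line, the same thing can be done with the gcloud tool that we install in the next section. Project IDs must be globally unique, so substitute your own ID for the placeholder below:

$ gcloud projects create <your-unique-project-id> --name="heron-kubernetes-cluster"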

Install the gcloud tool

A common way to interact with Google Cloud Platform is the gcloud command line tool. You can install it from here:

https://cloud.google.com/sdk/downloads#interactive

Here are the steps to install the gcloud tool:

  1. Install the client

    $ curl https://sdk.cloud.google.com | bash
    
  2. Restart the shell

    $ exec -l $SHELL
    
  3. Initialize the client

    $ gcloud init
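
If you want to double-check the installation, gcloud can report its version and the account it is authenticated with:

$ gcloud version
$ gcloud auth list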
    

Install kubectl

Now that we have gcloud installed, we need to install kubectl, the command line interface for running commands against Kubernetes clusters. The gcloud tool provides an easy way to install it:

$ gcloud components install kubectl
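
You can verify that kubectl is on your path and check the client version:

$ kubectl version --client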

Configure gcloud defaults

Let's configure some defaults for the gcloud tool. First, list all current GCP projects:

$ gcloud projects list
PROJECT_ID             NAME                      PROJECT_NUMBER
astral-chassis-184421  heron-kubernetes-cluster  564237646564

Your project ID will differ from this one, of course. Set the default project to your project ID:

$ gcloud config set project astral-chassis-184421
# Remember to use your own project ID, not this one
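
It can also be convenient to set a default compute zone so you don't have to specify one on every command. The zone below is just an example (it is the zone the cluster in this walkthrough ends up in), so pick whichever zone you prefer:

$ gcloud config set compute/zone us-central1-c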

Create a GKE cluster

We are now ready to launch a GKE cluster. We’ll create a three-node cluster by running the following command. This will take about 5-10 minutes to complete.

$ gcloud container clusters create heron-cluster \
  --machine-type=n1-standard-4 \
  --num-nodes=3 \
  --local-ssd-count=2

Next, we need to set up our kubectl credentials so that we can access the cluster we just created and run commands against it. The gcloud tool provides an easy way to do this with the following steps.

  1. List clusters

    $ gcloud container clusters list
    
    NAME           ZONE           MASTER_VERSION  MASTER_IP      MACHINE_TYPE   NODE_VERSION  NUM_NODES  STATUS
    heron-cluster  us-central1-c  1.7.8-gke.0     35.184.35.179  n1-standard-4  1.7.8 *       3          RUNNING
    
  2. Get credentials for the cluster

    $ gcloud container clusters get-credentials heron-cluster
    
  3. Verify that the credentials were properly set

    $ kubectl get pods --all-namespaces
    
    NAMESPACE     NAME                                                      READY     STATUS    RESTARTS   AGE
    kube-system   event-exporter-v0.1.7-958884745-45f95                     2/2       Running   0          5m
    kube-system   fluentd-gcp-v2.0.9-7z3x0                                  2/2       Running   0          5m
    kube-system   fluentd-gcp-v2.0.9-tzfdz                                  2/2       Running   0          5m
    kube-system   fluentd-gcp-v2.0.9-vcgnx                                  2/2       Running   0          5m
    kube-system   heapster-v1.4.3-865849417-x5n1f                           3/3       Running   0          5m
    kube-system   kube-dns-3468831164-9mq8r                                 3/3       Running   0          5m
    kube-system   kube-dns-3468831164-xkg6z                                 3/3       Running   0          5m
    kube-system   kube-dns-autoscaler-244676396-6460d                       1/1       Running   0          5m
    kube-system   kube-proxy-gke-heron-cluster-default-pool-3323a650-ncj0   1/1       Running   0          5m
    kube-system   kube-proxy-gke-heron-cluster-default-pool-3323a650-sq8r   1/1       Running   0          5m
    kube-system   kube-proxy-gke-heron-cluster-default-pool-3323a650-vz5l   1/1       Running   0          5m
    kube-system   kubernetes-dashboard-1265873680-4dw1g                     1/1       Running   0          5m
    kube-system   l7-default-backend-3623108927-bsv88                       1/1       Running   0          5m
    
  4. Check out the Kubernetes dashboard. First, open up a proxy to the cluster:

    $ kubectl proxy -p 8001
    

    Now you can visit http://localhost:8001/ui in your browser to see the Kubernetes dashboard.
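
As one last sanity check, you can confirm that all three nodes registered with the cluster and are in the Ready state:

$ kubectl get nodes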

Heron on GKE

Now that we have a GKE cluster running, we're ready to install the Heron components on it so that we can deploy a Heron topology.

Install Heron Components

We will install four components:

  Apache ZooKeeper: acts as the state manager and provides coordination for running topologies
  Apache BookKeeper: a scalable, fault-tolerant, low-latency storage service that we will use as a repository for the code artifacts of our Heron jobs. You can read more about it in this post on the Streamlio blog.
  Heron tools: the topology metrics tracker and the Heron UI dashboard
  Heron API Server: allows us to submit jobs to the GKE cluster

  1. Install ZooKeeper

    $ kubectl create -f https://raw.githubusercontent.com/twitter/heron/master/deploy/kubernetes/gcp/zookeeper.yaml
    

    You can check on the status of the ZooKeeper pods at any time:

    $ kubectl get pods
    NAME                                  READY     STATUS    RESTARTS   AGE
    zk-0                                  1/1       Running   0          1m
    

    Wait until ZooKeeper is running before proceeding to the next steps.

  2. Install BookKeeper

    $ kubectl create -f https://raw.githubusercontent.com/twitter/heron/master/deploy/kubernetes/gcp/bookkeeper.yaml
    
  3. Install Heron tools

    $ kubectl create -f https://raw.githubusercontent.com/twitter/heron/master/deploy/kubernetes/gcp/tools.yaml
    
  4. Install the Heron API Server

    $ kubectl create -f https://raw.githubusercontent.com/twitter/heron/master/deploy/kubernetes/gcp/bookkeeper-apiserver.yaml
    

Now, let’s verify that everything is installed and running properly:

$ kubectl get pods

NAME                                  READY     STATUS    RESTARTS   AGE
bookie-autorecovery-818030636-0qtbl   1/1       Running   0          4m
bookie-autorecovery-818030636-rgn07   1/1       Running   0          4m
bookie-r2g4r                          1/1       Running   0          4m
bookie-wdq6k                          1/1       Running   0          4m
bookie-xqqhc                          1/1       Running   0          4m
heron-apiserver-2992471003-x89l1      2/2       Running   0          1m
heron-tracker-1063616089-g2b9z        2/2       Running   0          1m
zk-0                                  1/1       Running   0          5m

Let's also make sure that the kubectl proxy is still running, and then confirm that the Heron API server is reachable through it:

$ curl http://localhost:8001/api/v1/proxy/namespaces/default/services/heron-apiserver:9000/api/v1/version
{
   "heron.build.git.revision" : "bf9fe93f76b895825d8852e010dffd5342e1f860",
   "heron.build.git.status" : "Clean",
   "heron.build.host" : "ci-server-01",
   "heron.build.time" : "Sun Oct  1 20:42:18 UTC 2017",
   "heron.build.timestamp" : "1506890538000",
   "heron.build.user" : "release-agent1",
   "heron.build.version" : "0.17.0"
}

You can view the Heron UI by visiting http://localhost:8001/api/v1/proxy/namespaces/default/services/heron-ui:8889 in your browser.

Install Heron Client

The GKE cluster is now ready to run a Heron topology. In order to submit one to the cluster, we need to set up our local environment by installing the Heron client.

Download the client install script, make it executable, and then run it:

$ wget https://github.com/twitter/heron/releases/download/0.17.1/heron-client-install-0.17.1-darwin.sh
$ chmod +x heron-*.sh
$ ./heron-client-install-0.17.1-darwin.sh --user
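
Note that the script above is the macOS (darwin) build of the installer. If you are on Linux, grab the installer for your platform from the same releases page (the exact filename will differ) and run it the same way:

$ chmod +x heron-client-install-0.17.1-<platform>.sh
$ ./heron-client-install-0.17.1-<platform>.sh --user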

Now add Heron to your path:

$ vim ~/.bash_profile

Add the following line to your Bash profile file:

# Setting for heron
export PATH=$PATH:$HOME/bin

Restart the shell:

$ exec -l $SHELL

Now verify that the Heron client is working:

$ heron

You should see output like this:

usage: heron [-h] <command> <options> ...

optional arguments:
  -h, --help           show this help message and exit

Available commands:
    activate           Activate a topology
    deactivate         Deactivate a topology
    help               Prints help for commands
    kill               Kill a topology
    restart            Restart a topology
    submit             Submit a topology
    update             Update a topology
    version            Print version of heron-cli
    config             Config properties for a cluster

Getting more help:
  heron help <command> Prints help and options for <command>

For detailed documentation, go to http://heronstreaming.io
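
You can also confirm which version of the client is installed; it should match the 0.17.1 release downloaded above:

$ heron version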

Submit the topology

Now it's time to submit our first topology to our GKE cluster. The following command will launch one of the prepackaged example topologies.

Before running this command, make sure that the kubectl proxy is running.

$ heron submit kubernetes \
  --service-url=http://localhost:8001/api/v1/proxy/namespaces/default/services/heron-apiserver:9000 \
  ~/.heron/examples/heron-api-examples.jar \
  com.twitter.heron.examples.api.AckingTopology acking

You should see log output like this if the topology submission is successful:

[2017-11-25 16:31:09 -0800] [INFO]: Successfully launched topology 'acking'
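
You can also check that Kubernetes has scheduled pods for the topology. Pod names vary, but they typically include the topology name:

$ kubectl get pods | grep acking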

We have now successfully submitted a topology and can head over to the Heron UI to check out the topology metrics. Here’s the URL:

http://localhost:8001/api/v1/proxy/namespaces/default/services/heron-ui:8889

And that’s it! We have successfully deployed a Heron topology on a GKE cluster.
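
When you're done experimenting, you can tear things down to avoid charges. Something like the following should kill the topology (using the same service URL as before) and then delete the GKE cluster entirely:

$ heron kill kubernetes acking \
  --service-url=http://localhost:8001/api/v1/proxy/namespaces/default/services/heron-apiserver:9000
$ gcloud container clusters delete heron-cluster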