To help you try out the Streamlio platform, we’ve created a sandbox deployment that you can install and run on your laptop.
What the sandbox contains
The Streamlio Sandbox packages next-generation streaming data processing technology into a single Docker image:
- Apache Pulsar provides durable messaging and stream-native processing
- Apache BookKeeper provides zero-data-loss stream storage
The Docker image also contains an example word count Pulsar Function. That function does the following:
- Consumes randomly chosen sentences published to a Pulsar topic by a Pulsar producer
- Splits incoming sentences into individual words
- Counts the occurrences of each word over a time interval
- Periodically publishes those counts to a Pulsar topic that is then read by a Pulsar consumer
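The steps above can be sketched in plain Python. This is only an illustration of the logic, not the function that ships in the image: the names split_words and WordCounter are hypothetical, and the sketch leaves out the time-interval handling and the Pulsar topics entirely.

```python
from collections import Counter


def split_words(sentence):
    """Split a sentence into lowercase words."""
    return sentence.lower().split()


class WordCounter(object):
    """Accumulates word counts and hands them out as a snapshot."""

    def __init__(self):
        self.counts = Counter()

    def process(self, sentence):
        # Count every word in the incoming sentence.
        self.counts.update(split_words(sentence))

    def snapshot(self):
        # A plain dict, similar in shape to the counts the consumer prints.
        return dict(self.counts)


counter = WordCounter()
for s in ["the cow jumped over the moon", "snow white and the seven dwarfs"]:
    counter.process(s)
result = counter.snapshot()
```

In a real function, something like snapshot() would be called on a timer and its result published to the output topic.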
Setup
Your initial setup steps depend on how you choose to run the sandbox. Whichever method you choose, you’ll need Python 2.7+ installed on your system, as well as the latest Python pulsar-client library. You can install it using pip:
$ pip install pulsar-client --upgrade
Running the sandbox
There are two ways that you can run the sandbox:
- You can run the image using Docker
- You can run the image on a Kubernetes cluster
Run the sandbox image using Docker
To run the Streamlio sandbox using Docker, you’ll first need to install Docker for your platform.
The Docker image for the Streamlio sandbox is available via Docker Hub. You can run it using this command:
$ docker run -d \
--name streamlio-sandbox \
-p 6650:6650 \
-p 8080:8080 \
-p 8000:8000 \
streamlio/sandbox
You can check to make sure the container is running using docker ps, which should output something like this:
CONTAINER ID IMAGE ...
c90100be5ea8 streamlio/sandbox ...
Shut down and remove the container
Once you’re finished experimenting with the Streamlio sandbox, you can kill the running container:
$ docker kill streamlio-sandbox
Once the container is stopped, you can remove it:
$ docker rm streamlio-sandbox
Run the sandbox on Kubernetes
You can run the Streamlio sandbox on a running Kubernetes cluster using just a few kubectl commands. First, apply the YAML configuration:
$ kubectl apply -f \
https://raw.githubusercontent.com/streamlio/sandbox/master/kubernetes/streamlio-sandbox.yaml
The streamlio/sandbox Docker image is fairly large, so it may take a minute or more to pull the image and start it up. You can watch the progress of the installation:
$ kubectl get pods -w -l app=streamlio-sandbox
Once the STATUS changes to Running, you can connect to the running pod using kubectl’s port-forward command:
$ kubectl port-forward \
$(kubectl get pods \
-l app=streamlio-sandbox \
-o=jsonpath='{.items[0].metadata.name}') \
6650:6650 \
8080:8080 \
8000:8000
This will open all the ports necessary for running the example. You can now proceed with the rest of the example.
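If you want to confirm that the forwarded ports are reachable before moving on, a small check like the following may help. It is a minimal sketch using only the Python standard library; the helper names port_is_open and check_sandbox_ports are hypothetical, and the port numbers are the three used in the commands above. The same check works for the Docker setup.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect((host, port))
        return True
    except (socket.error, socket.timeout):
        return False
    finally:
        sock.close()


def check_sandbox_ports(host="localhost", ports=(6650, 8080, 8000)):
    """Map each sandbox port to whether it currently accepts connections."""
    return {port: port_is_open(host, port) for port in ports}
```

Calling check_sandbox_ports() returns a dictionary keyed by port number. An open port only shows that something is listening, not that Pulsar has finished starting up.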
When you’re finished, you can remove the sandbox from your cluster:
$ kubectl delete -f \
https://raw.githubusercontent.com/streamlio/sandbox/master/kubernetes/streamlio-sandbox.yaml
Run the producer and consumer scripts
There are two Python scripts in the sandbox that act as a Pulsar producer and consumer, respectively. You can fetch them like this:
$ wget https://raw.githubusercontent.com/streamlio/sandbox/master/producer.py
$ wget https://raw.githubusercontent.com/streamlio/sandbox/master/consumer.py
Once the sandbox is running, start up the consumer (just make sure to wait a few seconds after you’ve started the sandbox):
$ python consumer.py
If you get an error along the lines of Exception: Pulsar error: ConnectError, try waiting a few seconds and retrying. If that doesn’t work, run docker ps to check on the status of the running container.
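The "wait a few seconds and retry" advice can also be automated. The helper below is a generic sketch, not part of the sandbox scripts: the name retry and its parameters are hypothetical, and it retries on any exception rather than on a specific Pulsar error type.

```python
import time


def retry(action, attempts=5, delay_seconds=2.0, sleep=time.sleep):
    """Call action() until it succeeds or the attempts (at least 1) run out."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception as error:  # e.g. a connection failure during startup
            last_error = error
            if attempt < attempts:
                sleep(delay_seconds)
    raise last_error


# Possible use with pulsar-client (assumed here: the subscription name
# "sandbox-check" is made up; "wordcount" is the output topic in this guide):
#
#   import pulsar
#   client = pulsar.Client("pulsar://localhost:6650")
#   consumer = retry(lambda: client.subscribe("wordcount", "sandbox-check"))
```

The sleep parameter exists only so that the delay can be replaced in tests.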
Initially, no messages will be published to the topic that the consumer is listening on. This will change when you start up the producer:
$ python producer.py
Once you start up the producer, you should begin to see messages like this via the consumer:
Received message: {"a": 273,"ago": 273,"am": 273,"an": 273,"and": 547,"apple": 273,"at": 273,"away": 273,"cow": 274,"day": 273,"doctor": 273,"dwarfs": 274,"four": 273,"i": 273,"jumped": 274,"keeps": 273,"moon": 274,"nature": 273,"over": 274,"score": 273,"seven": 547,"snow": 274,"the": 1095,"two": 273,"white": 274,"with": 273,"years": 273}
Received message: {"a": 284,"ago": 284,"am": 283,"an": 284,"and": 568,"apple": 284,"at": 283,"away": 284,"cow": 283,"day": 284,"doctor": 284,"dwarfs": 284,"four": 284,"i": 283,"jumped": 283,"keeps": 284,"moon": 283,"nature": 283,"over": 283,"score": 284,"seven": 568,"snow": 284,"the": 1134,"two": 283,"white": 284,"with": 283,"years": 284}
Received message: {"a": 294,"ago": 294,"am": 293,"an": 294,"and": 588,"apple": 294,"at": 293,"away": 294,"cow": 294,"day": 294,"doctor": 294,"dwarfs": 294,"four": 294,"i": 293,"jumped": 294,"keeps": 294,"moon": 294,"nature": 293,"over": 294,"score": 294,"seven": 588,"snow": 294,"the": 1176,"two": 293,"white": 294,"with": 293,"years": 294}
Received message: {"a": 304,"ago": 304,"am": 303,"an": 304,"and": 608,"apple": 304,"at": 303,"away": 304,"cow": 305,"day": 304,"doctor": 304,"dwarfs": 304,"four": 304,"i": 303,"jumped": 305,"keeps": 304,"moon": 305,"nature": 303,"over": 305,"score": 304,"seven": 608,"snow": 304,"the": 1218,"two": 303,"white": 304,"with": 303,"years": 304}
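Each consumer line is the prefix "Received message: " followed by a JSON object that maps words to counts, so the output is easy to post-process. The following is a minimal sketch assuming that line format; parse_counts and top_words are hypothetical helper names, and the sample line is a shortened version of the output above.

```python
import json

PREFIX = "Received message: "


def parse_counts(line):
    """Return the word-count dict from one consumer output line, or None."""
    if not line.startswith(PREFIX):
        return None
    return json.loads(line[len(PREFIX):])


def top_words(counts, n=3):
    """Return the n most frequent (word, count) pairs, highest first."""
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))[:n]


line = 'Received message: {"and": 547,"cow": 274,"the": 1095,"two": 273}'
counts = parse_counts(line)
print(top_words(counts))
```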
The producer, in turn, should be producing output like this:
Sending message - four score and seven years ago
Sending message - i am at two with nature
Sending message - i am at two with nature
Sending message - four score and seven years ago
Sending message - an apple a day keeps the doctor away
Sending message - the cow jumped over the moon
Sending message - snow white and the seven dwarfs
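Output like this could come from a producer along the following lines. This is a sketch under assumptions, not the contents of producer.py: the sentence list is copied from the sample output, the helper publish_sentences is hypothetical, and the message count and pause are arbitrary. The service URL uses port 6650 from the run commands, and sentences is the input topic named in this guide.

```python
import random
import time

# Sentences taken from the sample producer output above.
SENTENCES = [
    "four score and seven years ago",
    "i am at two with nature",
    "an apple a day keeps the doctor away",
    "the cow jumped over the moon",
    "snow white and the seven dwarfs",
]


def publish_sentences(send, count, pause_seconds=0.0, choose=random.choice):
    """Send `count` randomly chosen sentences through the `send` callable."""
    sent = []
    for _ in range(count):
        sentence = choose(SENTENCES)
        print("Sending message - %s" % sentence)
        send(sentence.encode("utf-8"))
        sent.append(sentence)
        if pause_seconds:
            time.sleep(pause_seconds)
    return sent


def main():
    # Imported here so the helper above works without the client library.
    import pulsar

    client = pulsar.Client("pulsar://localhost:6650")
    producer = client.create_producer("sentences")
    try:
        publish_sentences(producer.send, count=100, pause_seconds=0.1)
    finally:
        client.close()
```

Passing the send callable in separately keeps the sentence-picking logic testable without a running broker.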
Get the current function status:
$ curl http://localhost:8080/admin/v2/functions/public/default/wordcount/status
{
"functionStatusList": [{
"running": true,
"numProcessed": "2347",
"numSuccessfullyProcessed": "2347",
"lastInvocationTime": "1530237837516",
"instanceId": "0"
}]
}
If your output looks something like that, then the sandbox is working! That means that you now have an end-to-end, real-time, stateful processing platform powered by Apache Pulsar (incubating), Pulsar Functions, and Apache BookKeeper running on your laptop.
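The status response can also be checked from a script. The sketch below assumes the JSON shape shown above, including the counters being returned as strings; function_is_healthy and fetch_status are hypothetical helper names, and treating "processed equals successfully processed" as healthy is a choice made for this example.

```python
import json


def function_is_healthy(status_json):
    """Return True if every instance is running and has no failed messages."""
    status = json.loads(status_json)
    instances = status.get("functionStatusList", [])
    if not instances:
        return False
    for instance in instances:
        processed = int(instance.get("numProcessed", "0"))
        succeeded = int(instance.get("numSuccessfullyProcessed", "0"))
        if not instance.get("running") or processed != succeeded:
            return False
    return True


def fetch_status(url="http://localhost:8080/admin/v2/functions/public/default/wordcount/status"):
    """Fetch the raw status JSON from the admin endpoint used above."""
    try:
        from urllib.request import urlopen  # Python 3
    except ImportError:
        from urllib2 import urlopen  # Python 2
    return urlopen(url).read().decode("utf-8")
```

With the sandbox running, function_is_healthy(fetch_status()) should return True once the function has started processing messages.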
Examine Pulsar topics
You can get insight into Pulsar topics using the Pulsar Dashboard. The sandbox uses two topics: sentences and wordcount. You can get info on those topics by navigating to http://localhost:8000/stats/namespace/public/default/ in your browser.
The Pulsar Dashboard updates once every minute.
You can see the input and output topics in Pulsar:

You can also drill down into the stats of the input topic queue (named sentences):

You can also take a look at the wordcount topic, which contains word count results:
