Daniel Ferreira Jorge is Director of Operations at STICORP.
Adopting any new technology, open source or not, is usually seen as a big risk even when it offers significant technical advantages. I wanted to share a summary of our experience with Apache Pulsar, a key technology we've adopted to support our applications, because there are certainly other companies like ours that could also benefit from taking a closer look at Pulsar.
STICORP is a software company based in Brazil. We provide software solutions that help over 7,000 customers automate and manage tax reporting, helping them ensure compliance, avoid penalties, and identify opportunities to save money on taxes. That requires managing a large volume of documents and a large number of processing workflows triggered by activity that happens constantly at our customers: invoices, shipments, and payments all generate documents that our software needs to process. In a single day, one customer can generate over 200,000 documents for our software to handle.
Our software obtains these documents automatically through integrations with client and government systems and organizes them in a manner that allows the customer to analyze the data behind them. The content and complexity of these documents depend on the specific business process being addressed. For example, there are different types of documents for invoices, shipments, and payments exchanged between customers and their suppliers, plus additional documents that record when activities such as these have been completed.
Handling all of the different information in these documents requires a workflow consisting of several steps. The documents arrive in an encrypted form, so the first step is to decrypt them. Then the raw message content in each document is extracted as an XML file and converted to JSON. From there it is split into multiple topics and published onto an event bus. Ultimately the data lands downstream in a Couchbase NoSQL database, where other applications can access it.
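To make those steps concrete, here is a minimal sketch of the per-document flow in Python. The decryption stub, field names, and topic-naming scheme are illustrative assumptions, not our actual implementation; the real pipeline handles far richer document schemas.

```python
import json
import xml.etree.ElementTree as ET

def decrypt(payload: bytes) -> bytes:
    # Placeholder for the real decryption step; the actual cipher and
    # key management depend on the client/government integration.
    return payload

def xml_to_dict(xml_text: str) -> dict:
    # Flatten a simple XML document into a dict of tag -> text.
    root = ET.fromstring(xml_text)
    return {child.tag: child.text for child in root}

def process_document(payload: bytes, customer_id: str) -> tuple[str, str]:
    # Decrypt, parse the XML, convert to JSON, and choose a topic.
    xml_text = decrypt(payload).decode("utf-8")
    record = xml_to_dict(xml_text)
    # Route by document type, e.g. one topic per customer per type
    # (hypothetical naming scheme).
    topic = f"{customer_id}-{record.get('type', 'unknown')}"
    return topic, json.dumps(record)

topic, message = process_document(
    b"<doc><type>invoice</type><total>150.00</total></doc>", "customer-42"
)
print(topic)  # customer-42-invoice
```

Routing on a field extracted from the document itself is what lets one generic pipeline fan a customer's traffic out across many type-specific topics.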
Performance and scalability are critical for us. Because of the nature of the transactions that generate these documents, we can see sudden spikes of up to 25,000 messages per second for a single customer. It's also important for us to be able to support large numbers of topics: breaking up the information in documents into multiple topics makes it much easier to organize and manage how we connect the right data to the right workflow processors, but it also means that we may have 30 different topics for a single customer.
We had originally deployed Apache Kafka to provide this event bus. Although we had a stable Kafka infrastructure and the decision to change wasn't easy, we came to realize that Kafka was not the best technology for our needs. Kafka was not originally designed for the cloud-native world we live in today, and because of that we spent a lot of time making it work for our application. The main challenge we had with Kafka was that it wasn't good at handling large numbers of topics. In addition, Kafka's architecture made scaling painful for us: because Kafka brokers are stateful, expanding or shrinking a Kafka cluster requires rebalancing partitions, which hurts our performance and latency and limits how easily and quickly we can react to changes in workload.
Looking for alternatives, we learned about Apache Pulsar and decided to evaluate it. Because Apache Kafka and Apache Pulsar use similar messaging concepts, we found that 100% of our Kafka use cases could be implemented with Pulsar in exactly the same way they were implemented with Kafka. That was one of the things that made the decision to switch to Pulsar easier.
We also saw some important differences in Pulsar's architecture and design. A key difference is the decoupling of storage from brokers: Pulsar brokers are stateless, with message data stored in a separate Apache BookKeeper layer, whereas Kafka stores data on the brokers themselves. That architectural difference allows Pulsar to do things that are difficult or impossible in Kafka, such as adding or removing brokers quickly without rebalancing stored data.
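Pulsar's topic model also fits our per-customer layout well: topics are organized under tenants and namespaces, so each customer's topics can be grouped in a namespace of their own. The names below are hypothetical, for illustration only:

```text
persistent://sticorp/customer-42/invoices
persistent://sticorp/customer-42/shipments
persistent://sticorp/customer-42/payments
```

Because Pulsar policies such as retention and quotas can be set at the namespace level, grouping topics this way lets us manage each customer's data flow as a unit.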
Apache Pulsar simply offered too many benefits to ignore. We made the decision to implement Pulsar and have been very happy with it. We’ve already moved over 30% of our production data flow to Pulsar and aim to have all of it running through Pulsar within the next six months.