4 uses of Kafka you’ll see inside Ebury

Engineering

by José Machado on January 27, 2022

As Ebury fully embraces new architectures supporting our business growth, we’ve selected Kafka to be a backbone of system integrations. Here you will find 4 cases where it has been implemented successfully:

1 – Data Migration (a.k.a. Database Consolidation Service)

Ebury has one system that has been leased to a client with all the same capabilities in the past. As time went by, this client was acquired by Ebury, and now it’s part of the same organisation. That created a problem: Now we have one system running over 2 different infrastructures, which means 2 instances of its database. In order to consolidate this system into a single one, we need to consolidate its data.

Since it’s a complex system, which grew apart in 2 different places, there are some reasons why we cannot just use a simple database-replication from point A to point B, such as primary key conflicts.

Let’s say you have a client, with primary key equals 42 in the source DB. Since both DBs grew apart from each other, the client with the same primary key (42) in the destination DB will be a different client.

Kafka to the rescue. Kafka doesn’t mean just the main component, but all of Kafka’s ecosystem, including Kafka Connect and specific CDC connectors.

The CDC Connector reads directly the database logs as they’re written, generating events for each insert/update/delete on it, allowing us to capture all the changes in the DB without impacting it.

Each table has its own topic inside Kafka, which means that the DB Consolidation Service can then read the appropriate topics, and consolidate the information on the destination database, generating the primary key translations as needed.

2 – Account Details Service (a.k.a. Microservice Pattern)

Allowing our clients to create their own accounts is a key process inside our business.

But the account creation process has several caveats. Each account must go through a validation process, and feed a number of different applications. That means each application must manage the accounts in the same way, and must be synchronised regarding the account status, such as “enabled” or “disabled”.

In order to solve this problem, we created a microservice which will manage all the client requests and handle the business validation of the accounts, and for each status change, this service will generate a Domain Event [1] that will feed the legacy applications.

These events range from “New Account Created”, to “Account Deactivated” and all the in-betweens necessary to manage the accounts, in a single place. The legacy applications can then listen to these events, and enable/disable the accounts as needed.

Also, changes in the accounts (such as an email update), will also generate domain events, and keep the whole system updated.

Since the process is asynchronous, we can trigger additional Sagas [1] for the accounts as they are enabled and/or disabled, allowing a higher level of automation

3 – Beneficiary Query Service – CQRS Pattern

Command Query Responsibility Segregation is a pattern where you separate your data-changing statements (commands) from your queries [2]. Although not necessary, a common pattern is to separate your command database from your queries databases, in order to have different workloads on different machines, and synchronise data between these two. This separation is effective if the synchronisation between the two databases is done in (almost) real-time.

We apply this concept in a legacy application, where all the commands are stored (insert a new client, change address of this other client, change telephone of this third one, etc), and for each change, a new Domain Event is generated, updating another database used specifically for queries.

Kafka acts as the data bus allowing synchronous command requests to be unbound from eventually consistent data in our query services.

It may not sound that impressive at first, but what if I told you that this simple concept reduced the workload on this legacy application, while also allowing an output of queries 50x faster on the new service and opens up options such as geo distribution of query services to put our systems and data closer to our clients worldwide?

Also, using CQRS opens up a door to use more advanced architectures such as Event Sourcing [3].

4 – Data lake

Another use we have for Kafka is to serve as our main backbone for feeding a data lake that provides an environment for big query analysis of our data.

As our applications grow, there is no need to add new and convoluted integrations on these systems, as the Kafka connectors used already provide a great level of adaptability to data changes, while also providing almost real time data feed to our analysts

By leveraging the Kafka ecosystem here for data delivery, we can focus our resources on the important part of the problem, which is data analysis.

No silver bullet

There is no single piece of technology that will solve all problems. Having Kafka to enable systems integration, for CQRS or Event Sourcing is great, but also brings its own challenges, as the CAP Theorem still applies.

Also, asynchronous communication is always harder to manage than synchronous communication, which means that we still have CRUD applications where it’s rational to have a simpler approach.

That being said, it’s a fact that Kafka high-performance, high-availability and high-scalability enabled faster development cycles and better usage of our resources.

References:

[1] You can check more details on Domain Events and Sagas on the links below:

[2] CQRS patterns:

[3] Event Sourcing

Cookie	Duration	Description
cookielawinfo-checkbox-advertisement	1 year	Set by the GDPR Cookie Consent plugin, this cookie is used to record the user consent for the cookies in the "Advertisement" category .
cookielawinfo-checkbox-analytics	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Analytics".
cookielawinfo-checkbox-functional	11 months	The cookie is set by GDPR cookie consent to record the user consent for the cookies in the category "Functional".
cookielawinfo-checkbox-necessary	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary".
cookielawinfo-checkbox-others	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Other.
cookielawinfo-checkbox-performance	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Performance".
viewed_cookie_policy	11 months	The cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data.

Cookie	Duration	Description
_ga	2 years	The _ga cookie, installed by Google Analytics, calculates visitor, session and campaign data and also keeps track of site usage for the site's analytics report. The cookie stores information anonymously and assigns a randomly generated number to recognize unique visitors.
_ga_5ZETTGME4T	2 years	This cookie is installed by Google Analytics.
_gat_gtag_UA_51187572_43	1 minute	This cookie is set by Google and is used to distinguish users.
_gid	1 day	Installed by Google Analytics, _gid cookie stores information on how visitors use a website, while also creating an analytics report of the website's performance. Some of the data that are collected include the number of visitors, their source, and the pages they visit anonymously.
CONSENT	16 years 4 months	These cookies are set via embedded youtube-videos. They register anonymous statistical data on for example how many times the video is displayed and what settings are used for playback.No sensitive data is collected unless you log in to your google account, in that case your choices are linked with your account, for example if you click “like” on a video.

Cookie	Duration	Description
IDE	1 year 24 days	Google DoubleClick IDE cookies are used to store information about how the user uses the website to present them with relevant ads and according to the user profile.
test_cookie	15 minutes	The test_cookie is set by doubleclick.net and is used to determine if the user's browser supports cookies.
VISITOR_INFO1_LIVE	5 months 27 days	A cookie set by YouTube to measure bandwidth that determines whether the user gets the new or old player interface.
YSC	session	YSC cookie is set by Youtube and is used to track the views of embedded videos on Youtube pages.
yt-remote-connected-devices	never	These cookies are set via embedded youtube-videos.
yt-remote-device-id	never	These cookies are set via embedded youtube-videos.
yt.innertube::nextId	never	These cookies are set via embedded youtube-videos.
yt.innertube::requests	never	These cookies are set via embedded youtube-videos.