To move value around the world reliably and securely for our clients, Ebury is connected to the SWIFT network, the world’s leading provider of financial messaging services.
More than 10,000 financial institutions are currently connected to the network, enabling international funds transfers across more than 200 countries.
In a nutshell, we send and receive messages to and from the network – mainly about payments that we send to our clients’ beneficiaries, incoming funds that we receive from clients or third parties, and credits or debits that maintain our balances with the liquidity providers.
The reconciliation process
Ebury must ensure that its internal business activity and records match all externally received information. A financial institution has many reasons to maintain a rigorous reconciliation process: client balances, accounting, regulatory and compliance requirements, fraud prevention, and bounced checks.
For transactions that go through the SWIFT network, we are interested in one specific type of incoming message: the MT942, which contains the detailed list of entries debited, credited, or booked to the accounts.
One of the first steps of the complex reconciliation process is to classify the entries that we receive in the MT942 message into different groups. Examples of groups include liquidity provider activity, client funds, client returned funds, and company account movements.
Correct classification into those groups lets the reconciliation process run automatically, without manual intervention. This avoids human mistakes and makes it possible to handle a workload of hundreds of entries every day.
The first attempt to classify the transaction entries into the groups was to analyse the messages and identify their patterns manually. The technical solution was to introduce a list of if-statements after parsing the MT942 message entries to define the group.
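As an illustration only, a rule list of this kind might look like the sketch below. The function name, keywords, and group labels are hypothetical and are not taken from the real implementation; the sketch only shows the shape of the approach, where every misclassified entry tends to add another branch and the order of the branches starts to matter.

```python
def classify_entry(narrative: str) -> str:
    """Assign a parsed MT942 entry to a group using hand-written rules.

    All keywords and group names here are hypothetical examples.
    """
    text = narrative.upper()
    # Rules are checked in order, so more specific patterns must come first.
    if "RETURN" in text:
        return "client_returned_fund"
    if "LIQUIDITY" in text:
        return "liquidity_provider"
    if "CLIENT" in text:
        return "client_fund"
    if "INTERNAL" in text:
        return "company_account_movement"
    # Anything that no rule matches is left for manual review.
    return "unclassified"
```

Here an entry such as "Return of client payment" is classified as a returned fund only because the "RETURN" rule is checked before the "CLIENT" rule. Swapping the two branches would silently change the result.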
Initially, it was efficient and straightforward to quickly add a new rule every time an entry was wrongly classified. However, the classification accuracy decreased to less than 75% when we grew to several thousand customers, introduced more liquidity providers, and opened more accounts. Additionally, it became impossible to analyse thousands of entries and understand the independent patterns that could group them correctly.
We concluded that we needed to research and test a solution that would classify entries more accurately and enable automatic reconciliation for most transactions.
Machine learning-based classification
The engineering and product teams started to evaluate a new way to approach the problem – wondering if a machine learning-based approach would find the patterns and result in better accuracy.
Before running the experiments, the team first defined the input as the content of the MT942 message and generated the matrix of TF-IDF features using scikit-learn's TfidfVectorizer. Then, a dataset was extracted with all the previously classified entries to be used as input for the experiments – 80% of the data to train the model and 20% to validate its results.
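The preparation step can be sketched with scikit-learn as follows. The entry texts and labels are invented for illustration, and fitting the vectorizer on the training split only is an assumption of this sketch; the team's actual preprocessing is not described in detail.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split

# Hypothetical entry texts and groups; real MT942 content looks different.
entries = [
    "client deposit ref 1001",
    "client deposit ref 1002",
    "client deposit ref 1003",
    "client deposit ref 1004",
    "returned funds ref 2001",
    "returned funds ref 2002",
    "returned funds ref 2003",
    "liquidity provider settlement 3001",
    "liquidity provider settlement 3002",
    "liquidity provider settlement 3003",
]
labels = (
    ["client_fund"] * 4
    + ["client_returned_fund"] * 3
    + ["liquidity_provider"] * 3
)

# Hold out 20% of the labelled entries for validation.
train_text, test_text, train_labels, test_labels = train_test_split(
    entries, labels, test_size=0.2, random_state=0
)

# Learn the vocabulary and IDF weights from the training split,
# then reuse them to transform the validation split.
vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(train_text)
X_test = vectorizer.transform(test_text)

print(X_train.shape, X_test.shape)
```

Both matrices share the same columns, one per term in the training vocabulary, so a classifier trained on `X_train` can score `X_test` directly.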
The first experiment used MultinomialNB, a generative naive Bayes classifier that is one of the most popular choices for text classification. The results were not promising, because this method works best when the features are close to conditionally independent of one another given the class.
The following experiment used a discriminative classifier called LogisticRegression, which does not assume conditional independence of the features. The model generated by the classifier resulted in an accuracy of 99%.
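A comparison of the two classifiers could be set up roughly as below. This is a minimal sketch: the data is made up, `validation_accuracy` is a hypothetical helper, and `max_iter=1000` is an arbitrary choice, so the scores it prints say nothing about the figures reported above.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical labelled entries, already split into train and validation sets.
train_text = [
    "client deposit ref 1001",
    "client deposit ref 1002",
    "returned funds ref 2001",
    "returned funds ref 2002",
    "liquidity provider settlement 3001",
    "liquidity provider settlement 3002",
]
train_labels = [
    "client_fund",
    "client_fund",
    "client_returned_fund",
    "client_returned_fund",
    "liquidity_provider",
    "liquidity_provider",
]
test_text = ["client deposit ref 1009", "returned funds ref 2009"]
test_labels = ["client_fund", "client_returned_fund"]


def validation_accuracy(classifier) -> float:
    """Train TF-IDF + classifier on the training set and score the validation set."""
    model = make_pipeline(TfidfVectorizer(), classifier)
    model.fit(train_text, train_labels)
    return accuracy_score(test_labels, model.predict(test_text))


scores = {
    "MultinomialNB": validation_accuracy(MultinomialNB()),
    "LogisticRegression": validation_accuracy(LogisticRegression(max_iter=1000)),
}
for name, score in scores.items():
    print(f"{name}: {score:.2f}")
```

Wrapping the vectorizer and the classifier in one pipeline keeps the comparison fair, since both classifiers see exactly the same features.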
Today, the team’s model classifies all entries inside the MT942 messages into their corresponding groups in real time.
The model learns only from historical data, so we have to retrain it when its accuracy decreases. Ideally, we would automatically train a new model, compare its accuracy with that of the current one, and promote it to replace the old one.
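One possible shape for that promotion step is sketched below. It is not an existing part of the system: `accuracy` and `choose_model` are hypothetical helpers, and the only assumption about a model is that it exposes a `predict` method, as scikit-learn estimators and pipelines do.

```python
from typing import Protocol, Sequence


class Classifier(Protocol):
    """Anything with a scikit-learn style predict method."""

    def predict(self, texts: Sequence[str]) -> Sequence[str]: ...


def accuracy(model: Classifier, texts: Sequence[str], labels: Sequence[str]) -> float:
    """Fraction of entries for which the model predicts the expected group."""
    predictions = model.predict(texts)
    correct = sum(1 for predicted, expected in zip(predictions, labels) if predicted == expected)
    return correct / len(labels)


def choose_model(
    current: Classifier,
    candidate: Classifier,
    texts: Sequence[str],
    labels: Sequence[str],
) -> Classifier:
    """Return the candidate only if it beats the current model on the same data."""
    current_score = accuracy(current, texts, labels)
    candidate_score = accuracy(candidate, texts, labels)
    if candidate_score > current_score:
        return candidate
    # On a tie or a regression, keep the model that is already in use.
    return current
```

Scoring both models on the same recent, labelled entries is the important design choice here, because accuracy figures measured on different datasets are not comparable.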