Cracking the Code: Supervised vs. Unsupervised Machine Learning

The Scale of Financial Crime

Fintechs and banks are adopting a risk-based approach to detecting illegal activity, aiming to intercept illicit funds before they enter their ecosystem. Many are turning to artificial intelligence (AI) solutions that harness the strength of machine learning (ML). They want to strengthen their organization’s controls by uncovering previously undetected risks, while continuing to grow their business and remain compliant with regulation.

In this evolving world of financial crime detection, where AI and ML play a pivotal role in unraveling the complexities of illicit activity, there is debate, and perhaps confusion, over which technique best fits the problem. Among the arsenal of ML techniques, two prominent methods stand out: unsupervised and supervised learning.

Supervised vs Unsupervised Machine Learning

Machine learning is a process that utilizes algorithms to enable computers to learn without being explicitly programmed. In simpler terms, these algorithms can absorb information and make informed predictions based on it.

Supervised machine learning

Supervised machine learning requires labeled input data, where each instance is associated with a known outcome or class. The algorithm learns from this labeled data and uses it to make predictions or classifications on new, unseen data. In this systematic process, the algorithm compares each new instance with the patterns it learned from the labeled examples before reaching a conclusion.
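A minimal sketch of this idea, using only the Python standard library: a nearest-centroid classifier learns one average point per label from labeled examples, then assigns new data to the closest one. The features (amount in thousands, transactions per day), the labels, and every value are hypothetical and chosen purely for illustration. A production model would be far richer.

```python
from math import dist
from statistics import mean


def fit_centroids(samples, labels):
    """Learn one centroid (the per-feature mean) for each label."""
    grouped = {}
    for features, label in zip(samples, labels):
        grouped.setdefault(label, []).append(features)
    return {
        label: tuple(mean(column) for column in zip(*rows))
        for label, rows in grouped.items()
    }


def predict(centroids, features):
    """Assign the label whose centroid is closest to the new point."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


# Hypothetical labeled history: (amount in thousands, transactions per day)
samples = [(0.1, 2), (0.3, 3), (0.2, 1), (9.0, 40), (12.0, 55), (10.5, 48)]
labels = ["normal", "normal", "normal", "suspicious", "suspicious", "suspicious"]

centroids = fit_centroids(samples, labels)
prediction_small = predict(centroids, (0.25, 2))    # close to the "normal" examples
prediction_large = predict(centroids, (11.0, 50))   # close to the "suspicious" examples
```

The model can only ever return a label it has already seen, which is why the quality and coverage of the labels matter so much.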

This approach is straightforward and effective where there’s a clear distinction between normal and abnormal behavior, because explicitly labeled examples guide the learning process. The result can be a significant improvement in the accuracy of decisions. Because the model is trained against known outcomes, its outputs can be checked directly against those labels, although no model guarantees the correct answer every time. Where behavior falls outside the labeled examples, or randomness is introduced, the algorithm’s predictions may falter and its outcomes occasionally diverge.

Further, supervised machine learning can improve customer insights by analyzing collected data to refine the picture of normal and abnormal behavior. The labeled data can be adjusted to produce outputs in line with an organization’s risk level and business goals.

The exclusive use of supervised learning presents several hurdles, notably the substantial time and effort required for data labeling and model training. It also heightens the risk of amplifying pre-existing biases and overlooking complexities. A model trained solely on labeled data tends to replicate historical decision-making trends and institutional norms, creating a feedback loop that perpetuates entrenched biases.

For instance, if analysts historically made assumptions regarding specific customer demographics like nationality or occupation, disproportionately flagging their activities as suspicious, the model learns from these patterns and perpetuates the same biases over time.
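One way to make this risk visible before training is to compare how often historical labels flagged each group. The sketch below is illustrative only: the group names, the counts, and the `flag_rate_by_group` helper are hypothetical, and a large gap is a prompt for review rather than proof of bias.

```python
from collections import defaultdict


def flag_rate_by_group(records):
    """records: iterable of (group, was_flagged) pairs taken from historical labels."""
    counts = defaultdict(lambda: [0, 0])  # group -> [flagged, total]
    for group, was_flagged in records:
        counts[group][0] += int(was_flagged)
        counts[group][1] += 1
    return {group: flagged / total for group, (flagged, total) in counts.items()}


# Hypothetical label history: group_a was flagged far more often than group_b
history = (
    [("group_a", True)] * 30 + [("group_a", False)] * 70
    + [("group_b", True)] * 5 + [("group_b", False)] * 95
)

rates = flag_rate_by_group(history)
disparity = rates["group_a"] / rates["group_b"]  # roughly six times higher
```

A supervised model trained on this history would see group membership as a strong predictor of the label, whether or not the underlying behavior differed.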

Transaction monitoring built on supervised models or fixed rules can be ill-equipped to address the increasing sophistication of financial criminals. As financial crimes become more complex and innovative, financial institutions must respond accordingly. Rule-based systems in particular are resource-hungry, provide only partial coverage, are prone to false positives, and can unintentionally introduce bias. These inefficiencies leave firms vulnerable to bad actors and regulatory fines.

Unsupervised machine learning

An alternative approach is unsupervised machine learning, a dynamic and evolving system that learns the normal behavior of clients from historical unlabeled data. Without explicit instructions on how to interpret each dataset, it has to infer its own rules and structure the information based on similarities, differences, and patterns, using techniques such as clustering, association, and dimensionality reduction.

Once a baseline of normality is established, the system can then detect patterns and anomalies without explicit guidance.
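As a rough illustration of "baseline first, anomalies second", the sketch below summarizes unlabeled history with a median and a median absolute deviation (MAD), then flags values that sit far from that baseline. This is one simple statistical technique chosen for illustration, not the method of any particular product. The payment amounts are hypothetical, and the 3.5 cutoff is a commonly used convention treated here as an assumption.

```python
from statistics import median


def fit_baseline(values):
    """Summarize 'normal' as the median and the median absolute deviation (MAD)."""
    center = median(values)
    mad = median(abs(v - center) for v in values)
    return center, mad


def is_anomaly(value, baseline, threshold=3.5):
    """Flag values whose robust z-score exceeds the threshold."""
    center, mad = baseline
    if mad == 0:
        # No spread in the history: anything different from the center stands out
        return value != center
    # 0.6745 rescales the MAD so the score is comparable to a standard z-score
    robust_z = 0.6745 * abs(value - center) / mad
    return robust_z > threshold


# Hypothetical unlabeled history of one client's payment amounts
history = [120, 95, 130, 110, 105, 125, 115, 100]
baseline = fit_baseline(history)

typical_flagged = is_anomaly(118, baseline)    # within the usual range
unusual_flagged = is_anomaly(5000, baseline)   # far outside the usual range
```

No label ever tells the code what "suspicious" means. It only knows what is usual for this client, which is also why a flagged value still needs interpretation.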

Unsurprisingly, unsupervised learning is commonly used for exploratory analysis of vast amounts of data, in tasks such as anomaly detection, big data visualization, and customer segmentation. It is particularly effective at detecting anomalies or unusual patterns in financial data without labeled examples, which is crucial for identifying emerging threats and previously unseen forms of financial crime, and for revealing hidden insights and relationships within the data.

This approach makes it easier for businesses to gain insights from data when no labels are present, helping them understand the underlying structure of a dataset and identify patterns and relationships without a human having to teach the system. With the explosion of data in various fields, unsupervised learning is becoming increasingly important in extracting valuable insights from large datasets. Moreover, unsupervised machine learning can drastically reduce false positives, potentially to as little as 1% of alerts, delivering substantial operational advantages for compliance teams.

However, unsupervised learning does not use pre-existing labeled data to identify specific outcomes. Because the input data is not labeled and the algorithms do not know the expected output in advance, a level of human interpretation may be required.

Semi-supervised learning

A combined proposition is semi-supervised learning, which uses both labeled and unlabeled data to train AI models for classification and regression tasks. This reduces the reliance on labeled data, allowing organizations to allocate resources more efficiently, and improves model performance by leveraging unlabeled data where obtaining a large amount of labeled data is challenging. It is a practical solution that can deliver greater value, with better outputs and lower cost (once the manual resource has been accounted for).
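One common form of semi-supervised learning is self-training: start from a handful of labeled examples, let the model label the unlabeled points it is confident about, and refit. The sketch below does this with a simple nearest-centroid model. The data, the `margin` confidence rule, and the function names are all assumptions made for illustration, and the code expects at least two distinct labels.

```python
from math import dist
from statistics import mean


def centroids_of(points_by_label):
    """Compute the per-feature mean of the points held under each label."""
    return {
        label: tuple(mean(column) for column in zip(*points))
        for label, points in points_by_label.items()
    }


def self_train(labeled, unlabeled, margin=2.0, max_rounds=10):
    """Pseudo-label unlabeled points that are clearly closer to one centroid."""
    points_by_label = {}
    for features, label in labeled:
        points_by_label.setdefault(label, []).append(features)
    remaining = list(unlabeled)
    for _ in range(max_rounds):
        centroids = centroids_of(points_by_label)
        still_unsure = []
        for point in remaining:
            ranked = sorted(centroids, key=lambda label: dist(centroids[label], point))
            nearest, runner_up = ranked[0], ranked[1]
            # Confident only if the runner-up is at least `margin` times farther away
            if dist(centroids[runner_up], point) >= margin * dist(centroids[nearest], point):
                points_by_label[nearest].append(point)
            else:
                still_unsure.append(point)
        if len(still_unsure) == len(remaining):
            break  # nothing new was labeled in this round
        remaining = still_unsure
    return centroids_of(points_by_label), remaining


# Hypothetical data: two labeled examples, five unlabeled ones
labeled = [((0.2, 2), "normal"), ((10.0, 45), "suspicious")]
unlabeled = [(0.1, 1), (0.4, 3), (9.0, 40), (12.0, 50), (5.0, 23)]

final_centroids, unresolved = self_train(labeled, unlabeled)
```

Points the model cannot place with confidence are returned in `unresolved` rather than forced into a class. In practice these are natural candidates for analyst review, which keeps manual labeling effort focused on the ambiguous cases.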

A machine learning solution, whether supervised, unsupervised, or semi-supervised, needs to perform accurately in a dynamic environment where business goals change, regulatory requirements adapt, and financial crime activities evolve. The answer lies in examining the specific problem being addressed, the quality and amount of data, and whether relevant tools and experience exist to build and manage rule-based models:

- What is the type and quantity of data available? And therefore, what type of algorithms would best support the organization?
- Is there the capacity to invest time and resources into structuring labeled datasets, and could this be managed and maintained long-term?
- What are the immediate and long-term goals? Is the priority accurate decision-making with predicted behavior, or is the goal the ability to discover new trends and hidden patterns?

Four things to consider when assessing whether a current machine learning solution is equipped to support the fight against financial crime:

1. Is the solution “unbiased”? Does the ML spot anomalies in the data itself, with no predetermined rules?
2. Can the solution analyze data as a blank slate, i.e., can it quickly “learn” what is normal for that dataset and spot anomalies intuitively?
3. Is it adaptable? Can anomalies (unknown activity) be detected in any customer and product environment, even if unanticipated criminality has developed?
4. Does it provide accurate outputs? Inaccurate flagging generates large volumes of false positives that do not merit investigation, wasting valuable resources.
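The fourth question can be put into numbers once analysts have reviewed a batch of alerts. The sketch below computes the share of alerts confirmed as genuinely suspicious (precision) and the share closed as false positives. The review outcomes and the `alert_quality` helper are hypothetical. Note that this "false positive share of alerts" is not the same as the statistical false positive rate, which would also require counting the transactions that were never flagged.

```python
def alert_quality(dispositions):
    """Summarize alert outcomes.

    dispositions: list of booleans, True when an analyst confirmed the alert
    as genuinely suspicious, False when it was closed as a false positive.
    """
    total = len(dispositions)
    if total == 0:
        return {"alerts": 0, "precision": None, "false_positive_share": None}
    confirmed = sum(dispositions)
    return {
        "alerts": total,
        "precision": confirmed / total,
        "false_positive_share": (total - confirmed) / total,
    }


# Hypothetical review results for eight alerts: two confirmed, six closed
reviewed = [True, False, False, True, False, False, False, False]
summary = alert_quality(reviewed)
```

Tracking these two figures over time gives a simple, comparable view of whether a change in model or rules actually reduces wasted investigation effort.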

Blog writer: David Segev, Chief Scientist at ThetaRay
