AI and machine learning help detect, predict and prevent fraud - IBM Watson Data Science Meetup
1. AI and machine learning help detect, predict and prevent fraud
Nina Lozo
Data & AI Technical Professional
IBM South East Europe,
nina.lozo@rs.ibm.com
IBM Watson Data Science Meetup
8. Data Science in Fraud Detection
A data science platform is complementary to other fraud detection systems
• React: Predictive models can quickly identify changing patterns in fraud and react to them in real time
• Improve: Data science can help derive new fraud detection rules, which can be used to improve the business process
• Achieve more: Data science can increase the rate of fraud detection
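As a rough illustration of the "Improve" point (deriving new fraud detection rules from data), the sketch below searches for the single threshold on one numeric feature that best separates labeled fraud from legitimate transactions, in effect a one-level decision stump. The feature (transaction amount), the toy data, and the use of F1 as the selection criterion are assumptions for illustration, not a method described in this deck.

```python
from typing import List, Tuple

def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = harmonic mean of precision and recall; 0.0 when undefined."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def derive_threshold_rule(values: List[float], labels: List[int]) -> Tuple[float, float]:
    """Find the threshold t for the rule 'flag if value >= t' that maximizes F1.

    values: one numeric feature per transaction (hypothetical, e.g. amount)
    labels: 1 = confirmed fraud, 0 = legitimate
    Returns (best_threshold, best_f1).
    """
    best_t, best_f1 = float("inf"), 0.0
    # Every observed value is a candidate cut point.
    for t in sorted(set(values)):
        tp = sum(1 for v, y in zip(values, labels) if v >= t and y == 1)
        fp = sum(1 for v, y in zip(values, labels) if v >= t and y == 0)
        fn = sum(1 for v, y in zip(values, labels) if v < t and y == 1)
        score = f1_score(tp, fp, fn)
        if score > best_f1:
            best_t, best_f1 = t, score
    return best_t, best_f1

# Toy data (hypothetical): transaction amounts with fraud labels.
amounts = [20.0, 35.0, 50.0, 80.0, 120.0, 900.0, 950.0, 1200.0, 60.0, 1100.0]
is_fraud = [0, 0, 0, 0, 0, 1, 0, 1, 0, 1]
threshold, score = derive_threshold_rule(amounts, is_fraud)
print(f"Rule: flag if amount >= {threshold} (F1 = {score:.2f})")
```

A rule found this way is easy for a fraud team to read and review, which matters when, as on the Nedbank slide, predictors need to be accepted by the fraud team.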
15. Advanced analytics techniques can dramatically improve the effectiveness and efficiency of fraud management…
"Where once fraud was detected by risk functions flagging suspect transactions for manual review, firms can now use neural networks based on unsupervised and supervised architectures to monitor dubious activities."
– McKinsey & Company, Fraud management: Recovering value through next-generation solutions
16. AI drives real business value in fraud prevention
• Faster screening updates
• Enhanced screening models
• Enhanced accuracy of fraud profiling
• Enhanced identity verification
• Centralization of fraud processes
• Enhanced fraud analytics tools
• Lower cost of fraud infrastructure
• Reduced fraud false positive rates
• Improved investigations process
• Facilitate investigation case management
• Automated fraud reporting
• Comply with voluntary and mandated regulations while differentiating competitive position
• Reduced costs of payment fraud losses
• Reduced costs of fraud screening & monitoring
• Reduced cost of fraud investigations
• Reduced cost of compliance reporting of fraud
17. AI enables you to predict the likelihood of fraud and proactively act upon insight to drive better prevention
Capture: Data collection delivers an accurate view of customer attitudes and opinions
Predict: Predictive capabilities bring repeatability to ongoing decision making, and drive confidence in your results and decisions
Act: Unique deployment technologies and methodologies maximize the impact of analytics in your operation
[Diagram: Data Collection, Deployment Technologies, Platform; Deep Learning, Data Mining, Machine Learning; Detect, Predict, Analyze]
18. Strategies to stay ahead
Move from fraud detection to fraud prediction
• Easily & seamlessly move from sandbox to production
• Connect data science models with real-time data
• Deploy predictive models into business process
Upskill the team to do more data science
• Create more “citizen data scientists” with visual modeling
• Train advanced ML models without a data science degree
• Make fraud detection easier
Stay ahead of fraudsters with latest ML models
• Empower data scientists to get ahead of fraudsters
• Enable deep learning and neural networks
• Get latest models and frameworks in fraud prediction
Make faster and more accurate prediction
• Leverage more unstructured data like text and images
• Easy tooling to train models in a few clicks
• Leverage pre-trained APIs to jump-start the development process
19.
“With the data mining system, we generated productivity savings of nearly 80 percent.”
Francisco Ruiz, Head of Compliance, Bancolombia
Solution
Deployed predictive data-modeling software that helped the bank more easily and quickly detect transactions that were part of potential money-laundering operations
Solution prevents, detects and reports potentially fraudulent banking activities that may stem from criminals and terrorists
Challenges
Need to analyze millions of daily transactions to identify current and potential fraud
Move from a labor-intensive decentralized system to a more automated process
Results
Reveals 40% more suspicious transactions by automatically identifying the most likely fraudulent activities. Increases reporting capabilities by 200% and analyst productivity by 80%
Discovers the latest money-laundering techniques by capturing data from 700 branches and 2,300 ATMs in six countries.
Aggregates multiple transaction activities with centralized reporting for more precision in detecting financial relationships.
20.
"The IBM Data Science Elite team was able to help direct our operating model, and skills, towards a deeper and more integrated structure."
Guy Taylor, Head of Data & Data-Driven Intelligence
Challenges
Fraudulent activity is very rare relative to all online banking activity, making it difficult to predict and posing a reputational risk to the bank
The current fraud alert system has a very high false positive rate, lowering customer satisfaction
Results
Reduce the number of alerts that fraud responders must review, and reduce missed fraudulent activity
Assist fraud responders in identifying which suspicious activities are most likely to be fraudulent.
21. Nedbank: Predict Fraudulent Online Banking Activity
OBJECTIVE
• Use supervised machine learning to predict fraudulent activity within Nedbank's mobile banking system
CHALLENGES
• Fraudulent activity is very rare relative to all online banking activity (0.004% of sessions)
• ~500M actions/month
• Predictors need to be accepted by fraud team
OVERVIEW
• Currently uses a decision-rule based system to flag suspicious transactions for review by fraud responders
• High false positive rate, low false negative rate
• Missed fraudulent activity is costly
• Large volume of alerts places a burden on responders
FIRST MODELING APPROACH
• Augment existing system by predicting which alerts on individual activity are correct
• False positives: 94% (current system) vs. 48% (with augmentation)
• False negatives: 4% (current system) vs. 7% (with augmentation)
SECOND MODELING APPROACH
• Predict which user sessions are fraudulent within first 10 seconds
• False positives: 95% (current system) vs. 85% (ML model)
• False negatives: 17% (current system) vs. 6% (ML model)
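The slide does not define how its false positive percentages are computed; one plausible reading of figures like 94 to 95% is the share of alerts that turn out to be legitimate activity. The sketch below shows, under that assumption, why a detector that catches most fraud can still produce mostly false alerts when only 0.004% of sessions are fraudulent. The session count, recall, and false-alarm rate are hypothetical values, not Nedbank figures.

```python
def alert_breakdown(n_sessions: int, fraud_rate: float,
                    recall: float, false_alarm_rate: float) -> dict:
    """Expected alert counts for a detector applied to n_sessions.

    fraud_rate:       share of sessions that are fraudulent
    recall:           share of fraudulent sessions the detector flags
    false_alarm_rate: share of legitimate sessions the detector flags
    """
    fraud = n_sessions * fraud_rate
    legit = n_sessions - fraud
    true_alerts = fraud * recall
    false_alerts = legit * false_alarm_rate
    total_alerts = true_alerts + false_alerts
    return {
        "alerts": total_alerts,
        # Share of alerts that are legitimate activity (false discovery rate).
        "false_alert_share": false_alerts / total_alerts,
        # Share of fraudulent sessions that are never flagged (miss rate).
        "missed_fraud_share": 1 - recall,
    }

# 0.004% of sessions fraudulent, as on the slide; detector rates are hypothetical.
stats = alert_breakdown(n_sessions=1_000_000, fraud_rate=0.00004,
                        recall=0.96, false_alarm_rate=0.001)
print(f"alerts: {stats['alerts']:.0f}, "
      f"false alerts: {stats['false_alert_share']:.1%}, "
      f"missed fraud: {stats['missed_fraud_share']:.1%}")
```

With these assumed rates, roughly 96% of alerts are false even though only 0.1% of legitimate sessions are flagged, because legitimate sessions outnumber fraudulent ones by about 25,000 to 1.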
22.
“Before this solution, the minimum time it took to settle a claim was three days. Now, the low-risk claims that pass down the ‘immediate’ channel can be settled within an hour.”
Anesh Govender, Head of Finance, Reporting and Salvage at Santam
Solution
Santam chose IBM for the range of functionality, flexibility, and its ability to integrate with an existing system
The company’s core claims management system resided on a mainframe platform that still met the company’s needs
The solution integrated different kinds of rules from across the infrastructure, including process rules from the company’s business process management software system, decision and agility rules from SPSS software itself, and override
Challenges
Fraud losses accounted for 6 to 10 percent of premium costs annually for Santam customers
Needed a solution that more effectively assessed risk and separated potentially fraudulent claims from lower-risk ones, in order to prevent fraud, reduce other costs and increase efficiency
Results
Identified a major fraud ring less than 30 days after implementation.
Saved more than USD 2.5 million in payouts to fraudulent customers, and nearly USD 5 million in total repudiations.
Reduced claims processing time on low-risk claims by nearly 90 percent.
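Santam's low-risk claims "pass down the ‘immediate’ channel" while riskier claims get more scrutiny. A minimal sketch of that kind of score-based triage is shown below, assuming a model already produces a fraud probability per claim. The "standard" and "investigate" channel names, the two cut-off values, and the claim fields are hypothetical; the deck does not describe how Santam's rules were configured.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Claim:
    claim_id: str
    fraud_score: float  # model-estimated probability of fraud, 0.0 to 1.0

# Hypothetical cut-offs; in practice these would be tuned against
# investigation capacity and the cost of missed fraud.
IMMEDIATE_MAX = 0.10
STANDARD_MAX = 0.50

def route(claim: Claim) -> str:
    """Assign a claim to a handling channel based on its fraud score."""
    if claim.fraud_score < IMMEDIATE_MAX:
        return "immediate"      # settle quickly, minimal due diligence
    if claim.fraud_score < STANDARD_MAX:
        return "standard"       # normal assessment
    return "investigate"        # refer to the fraud investigation team

def triage(claims: List[Claim]) -> Dict[str, List[str]]:
    """Group claim IDs by handling channel."""
    channels: Dict[str, List[str]] = {"immediate": [], "standard": [], "investigate": []}
    for c in claims:
        channels[route(c)].append(c.claim_id)
    return channels

claims = [Claim("C1", 0.03), Claim("C2", 0.27), Claim("C3", 0.81), Claim("C4", 0.08)]
print(triage(claims))
```

Keeping the cut-offs as named constants makes it easy to move claims between channels as the business adjusts its tolerance for risk versus settlement speed.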
23.
“The [Watson] Studio gives us the ability to process millions and millions of records and to be able to act real time.”
Julio Sánchez, Global Analytics Lead - Accenture Center for IBM Technologies, Accenture
Solution
Analyzes internal company data and external data to determine the risk factors and risk level associated with each service user, and alerts audit managers to risky behavior
Challenges
Detecting the likelihood of fraudulent behavior, such as an individual posing as a legitimate customer who receives a service but won’t pay for it
The ability to identify and trace new anomalous behaviors through continuous monitoring is vital for companies to take preemptive action against future costly occurrences of fraud
Results
Processes millions of records and acts in real time
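Continuous monitoring for new anomalous behavior can be sketched with per-user running statistics: each new event is scored against that user's own history, then folded into it. The example below uses Welford's online mean/variance algorithm and a z-score cut-off; this is a swapped-in, generic technique, not Accenture's method. The monitored quantity (daily usage), the cut-off k = 3, and the minimum history of 5 events are assumptions.

```python
import math
from collections import defaultdict

class RunningStats:
    """Welford's online algorithm for mean and variance."""
    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def std(self) -> float:
        # Sample standard deviation; 0.0 until there are two observations.
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0

class AnomalyMonitor:
    """Flag events that deviate strongly from each user's own history."""
    def __init__(self, k: float = 3.0, min_history: int = 5) -> None:
        self.k = k
        self.min_history = min_history
        self.stats = defaultdict(RunningStats)

    def observe(self, user_id: str, value: float) -> bool:
        s = self.stats[user_id]
        is_anomaly = False
        # Only score once there is enough history for a stable baseline.
        if s.n >= self.min_history and s.std() > 0:
            z = abs(value - s.mean) / s.std()
            is_anomaly = z > self.k
        # Update the baseline after scoring, so an event is not compared with itself.
        s.update(value)
        return is_anomaly

monitor = AnomalyMonitor()
usage = [10.0, 12.0, 11.0, 9.0, 10.5, 11.5, 95.0]  # hypothetical daily usage
flags = [monitor.observe("user-42", v) for v in usage]
print(flags)
```

Because each user's state is a few numbers updated in constant time, this style of check scales to millions of records; a production version might exclude confirmed anomalies from the baseline so that fraud does not become "normal".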
24.
Reduces payments on fraudulent claims and improves its ability to collect payments from other insurance companies
Solution
IPCC implemented a solution to rapidly identify and investigate suspicious claims and to expedite handling of non-suspicious claims in order to improve customer satisfaction.
Challenges
IPCC needed ways to automate the workflows and data gathering related to fraudulent and subrogated automobile claims.
Results
Accelerated payments collection
Reduced costs of claims payments
Yielded an annual return on investment (ROI) of 403% for direct and indirect benefits and a payback within 3 months
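The two IPCC figures can be checked against each other with simple arithmetic. The sketch assumes ROI = (annual benefit - cost) / cost and that the benefit accrues evenly across the year; the slide does not state how its ROI was calculated, so this is only one possible reading.

```python
def payback_months(annual_roi: float) -> float:
    """Months to recover the investment, given annual ROI as a fraction.

    Assumes ROI = (annual_benefit - cost) / cost and that the benefit
    accrues evenly across the year. Cost is normalized to 1.
    """
    cost = 1.0
    annual_benefit = cost * (1 + annual_roi)
    monthly_benefit = annual_benefit / 12
    return cost / monthly_benefit

months = payback_months(4.03)  # 403% annual ROI, as reported for IPCC
print(f"Payback after about {months:.1f} months")
```

Under these assumptions, a 403% annual ROI implies payback after roughly 2.4 months, which is consistent with the reported "payback within 3 months".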
26. Where do we go from here?
An IBM-led AI Journey Workshop provides the strategy and expertise to transform your business into a
cognitive enterprise and unlocks the full potential of your data with AI.
• Briefing & Vision: Identify your unique business challenges and needs. Explore how AI is transforming every industry.
• AI Journey Workshop: Partake in an IBM-led half- or full-day workshop to explore your use case and scope out potential solutions.
• Design & Validate: Work with IBM subject matter experts to fully define the scope and success criteria for an AI solution.
• Implement & Deliver: Delivery and deployment of the agreed-upon AI solution, tailored specifically to your business needs.
• Conclude & Expand: Explore how to further accelerate your organization’s AI Journey with IBM.
Using AI/ML for fraud detection is not new. However, a typical organization contains multiple fraud departments, each with its own internal point solution that monitors fraud for a specific channel, product, or fraud type. Structured and unstructured data are collected internally and externally, but very few of these point solutions share data. Each uses different analytical techniques across channels and transaction systems, so the institution lacks a complete view of its risk exposure. It cannot see patterns or behaviors that would signal fraudulent activity crossing multiple business lines, because the observation space is too narrow.
Combatting fraud and performing investigative action demands an end-to-end data science platform. Such a platform lets an organization scale analysis with ready access to public clouds, private clouds and on-premises systems. It also speeds modeling, training and deployment, and simplifies collaboration among data scientists, risk analysts, investigators, and other subject matter experts while maintaining a strong governance and security posture. Further, to respond to new types of fraud, waste and abuse while minimizing false negatives and accelerating response, the platform needs to continuously ingest real-time data, monitor and detect fraudulent activities, spot anomalies, and adapt as patterns change.
Because fraud is rare, the classes in fraud detection data are highly imbalanced, which makes detection challenging.
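To make the imbalance concrete: at the 0.004% fraud rate quoted for Nedbank, a "model" that always predicts "legitimate" is 99.996% accurate yet catches no fraud. One common mitigation is to weight the rare class more heavily during training. The sketch below computes inverse-frequency ("balanced") class weights using the formula n_samples / (n_classes * count), the same formula scikit-learn documents for `class_weight="balanced"`; the label counts are hypothetical.

```python
from collections import Counter
from typing import Dict, List

def balanced_class_weights(labels: List[int]) -> Dict[int, float]:
    """Inverse-frequency weights: n_samples / (n_classes * count_of_class)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}

def majority_baseline_accuracy(labels: List[int]) -> float:
    """Accuracy of a 'model' that always predicts the most common class."""
    most_common_count = Counter(labels).most_common(1)[0][1]
    return most_common_count / len(labels)

# Hypothetical sample: 4 fraudulent sessions out of 100,000 (0.004%).
labels = [1] * 4 + [0] * 99_996
print(f"Always-legitimate accuracy: {majority_baseline_accuracy(labels):.3%}")
print(balanced_class_weights(labels))
```

These weights can be passed to any learner that accepts per-class or per-sample weights. Weighting changes what the model optimizes, so the resulting decision threshold should still be checked against false positive and false negative rates rather than accuracy.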
The shift to digital and mobile customer platforms has led to transactions being executed more quickly, leaving banks and processors with less time to identify and counteract fraud and recover the underlying funds. As quickly as new technology is used to identify fraudsters, they find new ways of defrauding the bank. For instance, identity theft is shifting from card skimming to account takeovers (ATO). Another example is synthetic identity fraud, in which fraudsters combine fragments of stolen or fake information to create a new identity and apply for financial products.
Upskill the team to do more data science:
Single platform for all model development, regardless of expertise (open source coding frameworks for data scientists plus visual programming tools for LOB experts and business analysts)
Align business and technical teams to respond rapidly and routinely to new threats
Bring together models developed by different departments and provide a 360-degree view of patterns and behaviors
Increase efficiency in handling “known” threats while continuously improving models for new threats
Faster discovery and deployment:
Support the full end-to-end AI lifecycle by integrating seamlessly with data management (to understand your data and know where it is), with deployment (to get to production faster), and with governance that protects against bias and promotes trust through traceability
Fraud detection is complex and ever-changing.
- To get ahead, data scientists need deep learning: GPU acceleration (to train faster), integration with the most popular deep learning frameworks (to get the latest models and stay ahead of fraudsters), and visual tools (to build complex neural networks visually and train more advanced models).
Organizations need to identify anomalies accurately and efficiently at the level of accounts, merchants, cardholders and locations.
False positives require manual investigation, which can be supported by content analytics across primary internal and external data sources.
Fraud detection, meaning detecting fraudulent behavior after it occurs, forces companies to set aside money and resources for the inevitable losses they will incur, costing financial institutions millions of dollars and damaging the customer experience. Financial institutions need to get in front of the problem and focus on fraud prevention.
Advanced analytics techniques can dramatically improve the effectiveness and efficiency of fraud management. The integration of high-quality data sources (such as digital communications, geospatial data, and satellite imagery), the use of more sophisticated modeling techniques (such as machine learning, deep learning, and natural-language processing), and the introduction of automation technologies (such as natural-language generation and cognitive-computing algorithms) are transforming the way companies approach risk management.
FASTER SCREENING UPDATES: Higher detection of connected frauds through faster updates to lists/models from monitoring analytics
ENHANCED SCREENING MODELS: Higher fraud detection through enhanced fraud screening models
ENHANCED ACCURACY OF FRAUD PROFILING: Higher fraud detection through enhanced scoring in fraud screening
ENHANCED IDENTITY VERIFICATION: Higher fraud detection through automated visual identity verification processes
CENTRALIZATION OF FRAUD PROCESSES: Standardize, consolidate & automate fraud modeling across the enterprise
ENHANCED FRAUD ANALYTICS TOOLS: Improve productivity of fraud analysts with model building & testing acceleration tools
LOWER COST FRAUD INFRASTRUCTURE: Reduce the cost of fraud analytics systems through the use of Big Data technologies
REDUCED FRAUD FALSE POSITIVE RATES: Reduce the number of fraud investigations through lower false positive rates
IMPROVED INVESTIGATIONS PROCESS: Improve productivity of fraud investigators by providing content analytics across primary internal and external data sources
FACILITATE INVESTIGATION CASE MANAGEMENT: Improve multi-person fraud investigations through the use of case management and collaboration tools
AUTOMATED FRAUD REPORTING: Reduce legal costs of fraud cases through enhanced fraud discovery documentation
The process
Use predictive analytics to help predict the likelihood of fraud
Use data mining for clustering, classification, and segmenting data to find patterns and associations related to fraud
Use machine learning to detect anomalies in transactions and predict whether transactions are fraudulent
Use text/web mining to analyze unstructured data (for example, sentiment analysis or variable extraction) to flag fraudulent activity
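The "predict the likelihood of fraud" and "predict whether transactions are fraudulent" steps can be illustrated with the simplest probabilistic classifier, logistic regression. The sketch below trains one with batch gradient descent in pure Python so every step is visible; in practice a library or a tool such as SPSS Modeler would do this. The two features (amount scaled to 0-1, a new-device flag), the toy data, the learning rate and the epoch count are all hypothetical.

```python
import math
from typing import List

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X: List[List[float]], y: List[int],
                   lr: float = 0.5, epochs: int = 2000) -> List[float]:
    """Fit logistic regression weights (bias first) by batch gradient descent."""
    n_features = len(X[0])
    w = [0.0] * (n_features + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            p = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))
            err = p - yi  # gradient of log-loss w.r.t. the linear score
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        # Step against the average gradient over the training set.
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w

def fraud_probability(w: List[float], x: List[float]) -> float:
    """Model-estimated probability that transaction x is fraudulent."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))

# Hypothetical features: [amount scaled to 0-1, 1 if new device else 0]
X = [[0.1, 0], [0.2, 0], [0.15, 1], [0.3, 0], [0.9, 1], [0.8, 1], [0.95, 0], [0.25, 0]]
y = [0, 0, 0, 0, 1, 1, 1, 0]
w = train_logistic(X, y)
print(f"low-risk:  {fraud_probability(w, [0.1, 0]):.2f}")
print(f"high-risk: {fraud_probability(w, [0.9, 1]):.2f}")
```

Because the output is a probability rather than a yes/no flag, it can feed thresholds, triage channels, or alert ranking downstream.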
Connect insights on why fraud happens, deploy them into production, and move from predicting fraud to preventing it.
Data science makes fraud detection faster by enabling non-data scientists (analysts, SMEs) to train advanced ML models through visual modeling, without a data science degree.
Fraud detection is complex and ever-changing. To get ahead of fraudsters, data scientists need deep learning: GPU acceleration (to train faster), integration with the most popular deep learning frameworks (to get the latest models and stay ahead of fraudsters), and visual tools (to build complex neural networks visually and train more advanced models).
For unstructured data (text and images), IBM is the only vendor that can provide easy tooling (train models in a few clicks, image segmentation, sentiment analysis) through pre-trained APIs and easy-to-use visual tools.
Link to Reference Profile: http://w3-01.ibm.com/sales/ssi/cgi-bin/ssialias?infotype=CR&subtype=NA&htmlfid=0GLOS-87TQH2&appname=crmd#attachments
AML Compliance
Processes more than 1.3 million transactions per day
Predictive modeling allows the bank to narrow down the number of transactions requiring detailed analysis by 95 percent, saving resources and speeding report production.
Reduced the number of customers analyzed in each segment from 4,000 to 130, allowing for more targeted and cost-effective analysis
Solution synopsis
A bank in Colombia wanted to adhere to stricter governmental regulations on reporting potentially fraudulent transactions. It deployed IBM SPSS Modeler to centralize and automate its analysis of 1.3 million transactions per day. Because the new system can identify potential fraud more easily and quickly, the bank can focus more precisely on between 5,000 and 6,000 transactions.
Special handling instructions
The client has agreed to be a reference for sales situations. The status of any installation or implementation can change, so you should always contact the Primary Contact or Additional Contact named in the reference prior to discussing it with your client. Any public use, such as in marketing materials, on WWW sites, in press articles, etc., requires specific approval from the client. It is the responsibility of the person or any organization planning to use this reference to make sure that this is done. The IBM representative will, as appropriate, contact the client for review. You should not contact the client directly.
Business need
Bancolombia, a private bank based in Medellin, Colombia, serves 6 million customers in six countries. It needs to adhere to stricter governmental reporting requirements instituted in 2008, and to analyze millions of daily transactions to identify current and potential fraud. With its decentralized system, staff has to routinely analyze 120,000 customers and transactions per week. The bank wants to evolve from that labor-intensive decentralized system based on strict rules and parameters to a more automated one that would better detect unusual patterns or behavior.
Solution implementation
Bancolombia deployed IBM SPSS Modeler to improve its ability to identify potential money-laundering and other fraudulent activities. It increased the speed and precision of its compliance reporting, integrated and centralized data from its multiple branches and its lines of business, and substantially lowered the cost of analyzing individual transactions. It can now identify transactional activities that may have been distributed among multiple entities in order to circumvent statutory limits and currency regulations.
Benefits of the solution
By using IBM SPSS Modeler, Bancolombia was able to reduce the number of transactions it analyzed from 120,000 per week to between 5,000 and 6,000. By reducing the number of transactions it had to analyze, it generated productivity savings of up to 80 percent. The increased efficiency it gained from IBM SPSS Modeler also enabled it to increase the number of “suspicious operation” reports it files with the government from 400 to 1,200. At the same time, it has been able to submit reports of higher quality, which gives the government more information to pursue potential fraud. Previously, only 57 percent of the bank’s reports received the highest ratings for quality and thoroughness. With the new system, 97 percent of the reports receive the highest rating.
What Makes it Smarter:
- Intelligent: Reveals 40 percent more suspicious transactions by automatically identifying the most likely fraudulent activities. Increases reporting capabilities by 200 percent and analysis productivity by 80 percent.
- Instrumented: Discovers the latest money-laundering techniques by capturing account data from 700 branches and 2,300 ATMs in six countries.
- Interconnected: Aggregates multiple transaction activities with centralized reporting for more precision in detecting financial relationships.
Additional Smarter Planet information:
- Intelligent: Reveals 40 percent more suspicious transactions by automatically mining 1.3 million transactions per day and identifying the transactions most likely to be fraudulent. Previously, bank employees manually analyzed 120,000 customers and transactions per week. Because the new system can identify potential fraud more easily and more quickly, it can focus more precisely on a smaller number of transactions, between 5,000 and 6,000. Further efficiency gains come from the automated solution’s productivity savings of nearly 80 percent, achieved by reducing the number of staff needed to review massive transaction volume, while increasing reporting by 200 percent.
- Instrumented: Discovers the latest money-laundering techniques and increases accuracy by capturing data from both commercial and personal accounts, and from 700 branches and 2,300 ATMs in six countries. The bank uses two specific identifiers crucial to understanding deviations: expected transactional patterns for different commercial segments, and normal transaction patterns for individual customers within each segment. By defining expected and typical patterns, the bank can then mine the data to identify either unusual transactions or sudden changes in behavior.
- Interconnected: Aggregates activities that may be distributed among multiple entities in order to circumvent statutory limits and currency regulations by centralizing reporting from the bank’s 700 branches. This allows the bank to more precisely detect relationships between those who deposit money and those who receive money.
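The two baselines described for Bancolombia (expected patterns per commercial segment, normal patterns per customer within the segment) can be sketched as two independent deviation checks on each transaction. The example uses the Iglewicz-Hoaglin modified z-score (median and median absolute deviation, cut-off 3.5), a generic robust-statistics technique swapped in for illustration; it is not the bank's actual SPSS Modeler configuration, and the amounts are hypothetical.

```python
from statistics import median
from typing import Dict, List

def modified_z(value: float, history: List[float]) -> float:
    """Iglewicz-Hoaglin modified z-score of value against a history (median/MAD based)."""
    med = median(history)
    mad = median([abs(h - med) for h in history])
    if mad == 0:
        # No spread in the history: any different value is maximally unusual.
        return 0.0 if value == med else float("inf")
    return 0.6745 * (value - med) / mad

def check_transaction(amount: float, segment_history: List[float],
                      customer_history: List[float],
                      threshold: float = 3.5) -> Dict[str, bool]:
    """Compare a transaction with its segment's and its customer's typical amounts."""
    return {
        "unusual_for_segment": abs(modified_z(amount, segment_history)) > threshold,
        "unusual_for_customer": abs(modified_z(amount, customer_history)) > threshold,
    }

# Hypothetical amounts for one commercial segment and one customer in it.
segment_amounts = [1200.0, 1500.0, 1100.0, 1800.0, 1300.0, 1600.0, 1400.0]
customer_amounts = [1250.0, 1300.0, 1280.0, 1320.0, 1270.0]
result = check_transaction(1900.0, segment_amounts, customer_amounts)
print(result)
```

Here the transaction looks normal for the segment but is a sudden change for this particular customer, which is why the customer-level baseline adds information a segment-only check would miss.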
Link to Reference Profile: http://w3-01.ibm.com/sales/ssi/cgi-bin/ssialias?infotype=RF&subtype=CS&htmlfid=SANS-985HX2&appname=crmd
Public Case Study: http://www-03.ibm.com/software/businesscasestudies?synkey=P366760E07052S87
Client Name: Santam Insurance
About the client - Santam is South Africa’s largest short-term insurance company with assets of over R17 billion (US$ 1.88 billion). It provides personal, commercial, agricultural, and specialist insurance policies throughout South Africa and holds additional businesses in Zimbabwe, Malawi, Uganda, Tanzania and Zambia
Business Need: Santam faced the challenge of operating in an environment where fraud was estimated to account for between 6 and 10 percent of all premium revenue, because of the difficulty of managing complex claims while maintaining a high level of customer service. To solve this problem, Santam sought a more personalized method for managing each claim and prioritizing the effort needed to successfully investigate and mediate it.
Solution Summary: Santam’s Head of Finance, Reporting and Salvage sought to automate and manage these claims through a predictive analytics solution. Santam’s vendor evaluation process narrowed the choice to SAS Institute or Olrac SPSolutions (an IBM Partner), which was offering an IBM SPSS-based solution. Santam wanted to create an advanced predictive analytics deployment and was confident that Olrac SPSolutions had the skills and prior experience necessary to build the required functionality on an IBM SPSS base.
Results: Saved R17.9 million (US$ 1.98 million) in the first four months of use
Benefits: The detection of an insurance fraud syndicate that had previously gone undetected; the use of predictive analytics to categorize claims also reduced the time and cost associated with settling cases; low-risk cases no longer have to go through the exhaustive due diligence that previously took at least three days to perform. Now, approximately 50 percent of these claims are accelerated through improved categorization. Fifteen percent of claims, or about 54,000 claims, can be processed in less than an hour, representing a 95 percent savings in time
https://www.ibm.com/case-studies/Accenture
Fraud detection in the telecommunications space is a major focus area for Accenture’s business.
Infinity Property & Casualty Corporation (IPCC) is a provider of personal automobile insurance with an emphasis on nonstandard auto insurance.
Nonstandard auto insurance provides coverage to drivers who, because of their driving record, age, or vehicle type, represent higher than normal risks and pay higher rates for coverage. The company’s products provide insurance coverage for liability to others for bodily injury and property damage, and for physical damage to an insured’s vehicle from collision and various other damages. IPCC distributes its products primarily through the Web and a network of independent agencies.
An IBM-led AI Journey Workshop provides the strategy and expertise to transform your business into a cognitive enterprise and unlocks the full potential of your data with AI. AI Journey Workshops are complimentary, from IBM to you.
• Develop an actionable use case and roadmap to the future.
• Design a high-level solution in support of the given use case.
• Identify gaps and plan to address them through detailed design.
Once the backdrop of the auto insurance claims triaging story has been established, a best practice for this demo is to show the end result of what’s being built so your audience knows what they’re heading toward.
In this case, we’re using Watson Studio to explore the available data from the insurance company’s system of record, to pull out data of interest for the claims adjusters, and to build, train, and deploy a claims fraud probability model. We’re also taking the location data, using it to get weather data for the date and time of the loss event, and plotting the locations of interest on a map.
Watson Studio features self-service tools designed for many different kinds of knowledge workers, ranging from business analysts who look for GUI tools with MS Excel-like functionality to data scientists who need tools that make developing and training neural nets easier. There are two important points here:
- Teams can mix and match tooling (for example, use Data Refinery to prepare a data set, then train a model on that prepared data set in a notebook or canvas tool).
- All of these tools share a common environment, with governance tools, security, and management interfaces. No longer does a team with specialized skills need to throw work over the fence; now, these skilled users are on the same platform, where it’s easy for people to share results and work together.