The document provides an introduction to the HPCC Systems open source platform. It describes how HPCC Systems can be used to solve challenges in detecting insurance fraud and bust out fraud. It also outlines the core workflow of learning from data to make decisions, and highlights key capabilities like high performance computing, a data-centric language, and an integrated delivery system for data and analytics. Examples are given of how HPCC Systems has been applied in various industries.
3. Why HPCC Systems - Example 1: Insurance Collusion
THE CHALLENGE
• Detecting insurance claim fraud
• The insurance company's data found a connection between only two of the seven claims and identified only one other claim as weakly connected
4. Example 1: Insurance Collusion
THE SOLUTION
Customer claim data is linked with the LexisNexis® Risk Solutions data using the HPCC Systems platform
THE RESULT
• The results showed two family groups interconnected on all of these seven claims
• The links were much stronger than the carrier data previously supported
[Figure labels: Family 1, Family 2]
5. Example 2: Bust Out Fraud
THE CHALLENGE
• Individual (unconnected) accounts were defaulting
• Some accounts were flagged as fraud once contact was lost with individuals
• It was challenging for the financial institution to understand the depth and width of the fraud
6. Example 2: Bust Out Fraud
THE SOLUTION
5 million accounts flagged with active, known fraud, charge-off and preemptively closed tags
THE RESULT
• 31 accounts associated with 3 fraud accounts (1 degree of separation)
• 212 accounts associated with 2 known charge-off accounts (2 degrees of separation)
• Identified ring leader with 8 1st-degree associates and 72 2nd-degree associates
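The degree-of-separation counts in this example can be illustrated with a breadth-first search over account links. This is a minimal sketch in Python, not the ECL used on the platform; the account names, the `links` list, and the function name are hypothetical and chosen only for illustration.

```python
from collections import deque

def degrees_of_separation(links, seeds, max_degree=2):
    """Return {account: degree} for accounts within max_degree hops of any seed."""
    # Build an undirected adjacency map from the (a, b) link pairs.
    graph = {}
    for a, b in links:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)

    degree = {s: 0 for s in seeds}
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        if degree[node] == max_degree:
            continue  # do not expand beyond the requested depth
        for neighbor in graph.get(node, ()):
            if neighbor not in degree:
                degree[neighbor] = degree[node] + 1
                queue.append(neighbor)
    return degree

# Hypothetical data: F1 is a known fraud account.
links = [("F1", "A"), ("F1", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
result = degrees_of_separation(links, seeds=["F1"])
first = sorted(a for a, d in result.items() if d == 1)
second = sorted(a for a, d in result.items() if d == 2)
```

Here `first` holds the accounts one link away from a known fraud account and `second` those two links away; accounts further out are not returned.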
7. The Core Workflow – Learn and Make Decisions
Introduction to HPCC Systems
8. LEARN Workflow
• Raw historic data (e.g. duplicate names, unclean phone numbers)
• Profile, Clean and Normalize
• Entity Disambiguation and Linking (aka MDM)
• Social Network Graph Creation
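The cleaning and linking steps of the LEARN workflow can be sketched as follows. This is a deliberately simplified Python illustration: it assumes 10-digit phone numbers and links records only on an exact match of cleaned name and phone, whereas real entity disambiguation typically uses fuzzier matching. All function names and sample records are hypothetical.

```python
import re
from collections import defaultdict

def normalize_phone(raw):
    """Keep digits only; return the last 10 digits, or None if fewer are present."""
    digits = re.sub(r"\D", "", raw or "")
    return digits[-10:] if len(digits) >= 10 else None

def normalize_name(raw):
    """Lowercase, strip punctuation, and collapse whitespace."""
    cleaned = re.sub(r"[^a-z\s]", "", (raw or "").lower())
    return " ".join(cleaned.split())

def link_records(records):
    """Group record ids that share the same (name, phone) key after cleaning."""
    groups = defaultdict(list)
    for rec in records:
        key = (normalize_name(rec["name"]), normalize_phone(rec["phone"]))
        groups[key].append(rec["id"])
    return dict(groups)

# Hypothetical raw records with duplicate names and unclean phone numbers.
records = [
    {"id": 1, "name": "John  Smith", "phone": "(555) 123-4567"},
    {"id": 2, "name": "john smith.", "phone": "+1 555.123.4567"},
    {"id": 3, "name": "Jane Doe", "phone": "555-987-6543"},
]
entities = link_records(records)
```

Records 1 and 2 collapse into one entity once their names and phone numbers are normalized; record 3 stays separate.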
9. DECISION Workflow (Real-time or Batch)
• Customer Inquiry Data
• Social Network Graph Analysis
• Outcome
10. The Data Centric Approach
A single source of data is insufficient to overcome inaccuracies in the data. The holes are inaccuracies found in the data.
Our platform is built on the premise of absorbing data from multiple data sources and transforming it into highly intelligent social network graphs that can be manipulated to extract the non-obvious value. The holes in the core data have been eliminated.
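The idea of closing holes by combining sources can be sketched in a few lines. This Python example is an illustrative assumption, not the platform's method: it only fills missing fields with the first non-missing value found, and the source names and fields are hypothetical.

```python
def merge_sources(*sources):
    """Combine per-entity records, filling missing (None) fields from later sources."""
    merged = {}
    for source in sources:
        for entity_id, fields in source.items():
            target = merged.setdefault(entity_id, {})
            for name, value in fields.items():
                # Keep the first non-missing value seen for each field.
                if target.get(name) is None and value is not None:
                    target[name] = value
                else:
                    target.setdefault(name, None)
    return merged

# Hypothetical sources, each with a different hole for the same person.
carrier = {"p1": {"address": None, "phone": "5551234567"}}
public_records = {"p1": {"address": "12 Main St", "phone": None}}
combined = merge_sources(carrier, public_records)
```

Neither source alone has a complete record for `p1`, but the merged view does.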
11. Our Solutions Are Powered by HPCC Systems at Their Core
• Grid computing
• Data-centric language (ECL)
• Integrated delivery system that offers data plus analytics
Big Data: Structured Records, Unstructured Records, News Articles, Proprietary Data, Public Records
Unstructured and Structured Content
High Performance Computing Cluster Platform (HPCC)
Analysis Applications
Key Capabilities
• Over 4 petabytes of content
• 50 billion records
• 10,000 sources
• 7.5 billion unique name and address combinations
• Multi-bureau/multi-source models and bureau roll-over support
• Extensive experience leveraging atomic level data, combining and leveraging disparate data
• Approximately 400 models deployed (custom and flagship)
• Data and analytics
• Identity verification and authentication
• Fraud detection and prevention
• Investigation
• Screening
• Receivables management
[Diagram labels: Fusion; Linking; Refinery; Open Source Components; Complex Analysis; Clustering Analysis; Link Analysis; Entity Resolution]
[Industries: Financial Services; Government; Health Care; Insurance; Legal; Retail; Scientific Technical Medical; Exhibitions]
13. STRIKE Technology Overview
Data sources: SAP, Oracle, ERP, RDBMS, Flat Files, IoT Terminals, JMS, Others
INTERLOK (Integration System)
Data Integration
• Connect
• Integrate
• Schedule
• Transform
Thor (Standardization & Aggregation System)
Standardization
• Clean
• Profile
• Normalize
Aggregation
• Master Data Creation
• Relationship Analysis
• Predictive Analysis
• Business Intelligence
ROXIE (Query Delivery System)
DSP (Visualization System)
14. ECL: A Powerful Data Flow Language
[Figure panels: How you code; How the system executes it]
15. Graph Data can be Represented Using Native Support for Hierarchical Data and Index Pointers
16. STRIKE Technology Layer View
[Diagram labels: Data Connect; Analytics Tools; Common Programming Language; Data Science Portal; Cleaning; MDM; Dashboard Creator; Workflow Builder; ECL; Thor; ROXIE; Interlok; Profiling; Normalization; Predictive Analysis; Business Intelligence; Attribute Creation; Relationship Analysis; SALT; KEL]
18. Industry Example: Smart Hat
THE CHALLENGE
• 4,000 workers die and millions are injured annually while working on the industrial floor
• Very high cost for maintaining safety for businesses
19. Example 2: Smart Hat
THE SOLUTION
[Diagram components: Sensor equipped Wi-Fi hardhats; Central Monitoring Station; Prediction Engine]
1. Factory readings (temp, pressure, CO, CO2)
2. Real-time alerts
3. Update monitoring station
4. Emergency updates
THE OUTCOME: Produced an industrial wearable that uses IoT and wireless communications systems to protect and empower industrial workers.
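The reading-to-alert flow of the Smart Hat example can be sketched as a simple threshold check. This Python illustration is an assumption about how such a check might look, not the product's logic; the limit values, field names, and hat identifiers are hypothetical, and pressure is omitted for brevity.

```python
# Hypothetical alert limits; real limits depend on site safety policy.
LIMITS = {"temp_c": 40.0, "co_ppm": 35.0, "co2_ppm": 5000.0}

def check_reading(reading):
    """Return the sorted names of measurements that exceed their limit."""
    return sorted(k for k, limit in LIMITS.items() if reading.get(k, 0.0) > limit)

def process(readings):
    """Yield (hat_id, exceeded) alerts for readings with any exceeded limit."""
    for reading in readings:
        exceeded = check_reading(reading)
        if exceeded:
            yield reading["hat_id"], exceeded

# Hypothetical readings from two hardhats.
readings = [
    {"hat_id": "hat-1", "temp_c": 25.0, "co_ppm": 5.0, "co2_ppm": 600.0},
    {"hat_id": "hat-2", "temp_c": 45.0, "co_ppm": 50.0, "co2_ppm": 700.0},
]
alerts = list(process(readings))
```

In a deployment along these lines, each alert would be forwarded to the central monitoring station; here the alerts are simply collected in a list.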
20. Next Generation HPCC Systems Goal
• Real-time data collection, analysis and alerting to enable IoT
• Enable event driven workflows like managing Blockchain ledgers and Driver Behavior
[Diagram labels: FACTORIES; HOME; OUTSIDE; OFFICES; CITIES; VEHICLES; Autonomous Vehicles & Driver Behavior; Security & Energy; Public Health, Safety, Security & Transportation; Logistics & Navigation; Safety, Operations & Equipment Optimization; Automation & Security]
21. Example: Healthcare Blockchain (Napster for contracts)
Enables Event Driven Contracts
• Doctor informs patient that they need to exercise
• Patient agrees to exercise regimen
• Smart Contract with an initial $ value is created
• Contract details are updated to a shared ledger
• Patient exercises
• Patient wearable records exercise information
• Patient wearable updates analytics engine periodically
• Analytics Engine updates the ledger by adding or decreasing value of the contract
[Diagram labels: Shared Ledger; steps numbered 1 to 8]
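The event-driven contract described in this example can be sketched in Python. This is an illustrative model only: the in-memory list stands in for the shared ledger, and the class name, the fixed `step` adjustment, and the minutes-based target are all hypothetical assumptions rather than details from the slides.

```python
from dataclasses import dataclass, field

@dataclass
class ExerciseContract:
    value: float
    ledger: list = field(default_factory=list)

    def record(self, event, value):
        # Append-only log standing in for the shared ledger.
        self.ledger.append((event, value))

def create_contract(initial_value):
    """Create a contract with an initial value and log its creation."""
    contract = ExerciseContract(value=initial_value)
    contract.record("created", initial_value)
    return contract

def apply_exercise_update(contract, minutes, target_minutes, step):
    """Add `step` when the target is met, otherwise subtract it (never below zero)."""
    delta = step if minutes >= target_minutes else -step
    contract.value = max(0.0, contract.value + delta)
    contract.record("updated", contract.value)
    return contract

# Hypothetical run: one period meets the target, the next does not.
contract = create_contract(100.0)
apply_exercise_update(contract, minutes=30, target_minutes=20, step=5.0)
apply_exercise_update(contract, minutes=10, target_minutes=20, step=5.0)
```

Each wearable update becomes an event that adjusts the contract value and appends an entry to the ledger, mirroring the add-or-decrease step in the list above.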