Big Data. New Physics.
And Geospatial “Superfood”
© 2014 IBM Corporation
Jeff Jonas, IBM Fellow
Chief Scientist, Context Computing
Email: jeffjonas@us.ibm.com
Blog: www.jeffjonas.typepad.com
Twitter: http://www.twitter.com/jeffjonas
About the Speaker
Jeff Jonas
IBM Fellow, Chief Scientist for Context Computing
Founder and Chief Scientist of Systems Research & Development (SRD), acquired by IBM in 2005
Has been designing, building, and deploying entity resolution systems for three decades
This technology is used today by defense & intelligence, financial institutions, humanitarian efforts and more
Today: Primarily focused on ‘sensemaking on streams’ with special attention towards privacy and civil liberties protections
”The data must find the data and the relevance must find the user.”
Trend: Organizations Are Getting Dumber
[Figure: chart over time of computing power growth, available observation space, sensemaking algorithms, and context; the gap between available observation space and context is labeled "Enterprise Amnesia"]
WHY?
Algorithms at Dead End.
You Can’t Squeeze Knowledge Out of a Pixel.
[Figure: a lone data point, the email address scrila34@msn.com, labeled "No Context"]
Context, definition
Better understanding something by taking into account the things around it.
I ducked as the bat flew my way.
Another exciting baseball game …
Information in Context … and Accumulating
[Figure: the email address scrila34@msn.com connected to a Top 200 customer list, Twitter, LinkedIn, and career history; labels: Customer, Job Applicant, Twitter Influencer, AML Investigation]
The Puzzle Metaphor
Imagine an ever-growing pile of puzzle pieces of varying sizes, shapes and colors
What it represents is unknown – there is no picture on hand
Is it one puzzle, 15 puzzles, or 1,500 different puzzles?
Some pieces are duplicates, missing, incomplete, low quality, or have been misinterpreted
Some pieces may even be professionally fabricated lies
Until you take the pieces to the table and attempt assembly, you don’t know what you are dealing with
[Figure: puzzle piece counts with percentages: 270 pieces (90%), 200 pieces (66%), 150 pieces (50%), 6 pieces (2%); 30 pieces (10%) are duplicates]
Puzzling Images: Courtesy Ravensburger © 2011
First Discovery
More Data Finds Data
Duplicates in Front of Your Eyes
First Duplicate Found Here
Incremental Context – Incremental Discovery
6:40pm START
22min “Hey, this one is a duplicate!”
35min “I think some pieces are missing.”
37min “Looks like a bunch of hillbillies on a porch.”
44min “Hillbillies, playing guitars, sitting on a porch, near a barber sign … and a banjo!”
47min “We should take the sky and grass off the table.”
2hr “Let’s switch sides, and see if we can make sense of this from different perspectives.”
2hr10m “Wait, there are three … no, four puzzles.”
2hr17m “We need a bigger table.”
2hr18m “I think you threw in a few random pieces.”
How Context Accumulates
With each new observation, one of three assertions is made: 1) unassociated; 2) placed near like neighbors; or 3) connected
Must favor the false negative
New observations sometimes reverse earlier assertions
Some observations produce novel discovery
The emerging picture helps focus collection interests
As the working space expands, computational effort increases
Given sufficient observations, there can come a tipping point
Thereafter, confidence improves while computational effort decreases!
[Figure: unique identities plotted against observations, marking an overstated population and the true population]
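The three assertions above can be sketched as a toy incremental resolver. This is a minimal sketch under stated assumptions, not G2's algorithm. Observations are sets of hypothetical `(feature_type, value)` pairs. An observation is connected only when it shares at least two features with a known entity, which is a crude way to favor the false negative. A single shared feature only places it near like neighbors. The sketch does not model reversing earlier assertions.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    entity_id: int
    features: set = field(default_factory=set)   # (feature_type, value) pairs
    neighbors: set = field(default_factory=set)  # ids of "like" entities nearby


class Resolver:
    """Toy incremental resolver (hypothetical): each observation is evaluated
    against everything seen so far, at the moment it arrives."""

    CONNECT_THRESHOLD = 2  # assumption: 2+ shared features needed to connect

    def __init__(self):
        self.entities = {}
        self.next_id = 0

    def observe(self, features):
        features = set(features)
        # Score every known entity by the number of features it shares.
        scores = {eid: len(e.features & features) for eid, e in self.entities.items()}
        best = max(scores, key=scores.get, default=None)
        if best is not None and scores[best] >= self.CONNECT_THRESHOLD:
            # Assertion 3: connected; the entity accumulates the new features.
            self.entities[best].features |= features
            return "connected", best
        new = Entity(self.next_id, features)
        self.entities[new.entity_id] = new
        self.next_id += 1
        like = {eid for eid, score in scores.items() if score > 0}
        if like:
            # Assertion 2: too little evidence to connect, so keep it separate
            # (favoring the false negative) but remember its like neighbors.
            new.neighbors |= like
            for eid in like:
                self.entities[eid].neighbors.add(new.entity_id)
            return "placed near neighbors", new.entity_id
        # Assertion 1: nothing in common with anything seen so far.
        return "unassociated", new.entity_id


r = Resolver()
print(r.observe({("name", "Mark Smith"), ("dob", "6/12/1978")}))
print(r.observe({("name", "Mark Smith"), ("phone", "(707) 433-0000")}))
print(r.observe({("name", "Mark Smith"), ("dob", "6/12/1978"), ("ssn", "443-43-0000")}))
```

These hypothetical observations behave as follows. The second shares only a name with the first, so it stays a separate identity near its neighbor. The third shares a name and date of birth with the first, so it is connected to it.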
Counting Is Difficult
Mark Smith
6/12/1978
Mark R Smith
(707) 433-0000
DL: 00001234
6/12/1978
443-43-0000
File 1
File 2
The Rise and Fall of a Population
[Figure: unique identities plotted against observations, relative to the true population]
Data Triangulation
New Record
Mark Smith
6/12/1978
Mark R Smith
(707) 433-0000
DL: 00001234
Mark Randy Smith
443-43-0000
DL: 00001234
6/12/1978
443-43-0000
File 1
File 2
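Triangulation can be sketched with a union-find structure. Any two records that share a value for a strong identifier end up in the same entity, and this holds transitively. This is a minimal sketch that assumes exact matches on SSN and driver's license only. Records A and B simplify the slide for illustration, and which file holds which feature is an assumption. `resolve` is a hypothetical helper.

```python
from collections import defaultdict


def resolve(records, strong_keys=("ssn", "dl")):
    """Group record ids into entities. Records sharing a value for any strong
    key are unioned, so a new record can bridge two older ones."""
    parent = {rid: rid for rid in records}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    first_seen = {}  # (key, value) -> first record id carrying that value
    for rid, rec in records.items():
        for key in strong_keys:
            if key in rec:
                kv = (key, rec[key])
                if kv in first_seen:
                    union(rid, first_seen[kv])
                else:
                    first_seen[kv] = rid

    groups = defaultdict(set)
    for rid in records:
        groups[find(rid)].add(rid)
    return sorted(sorted(group) for group in groups.values())


older = {
    "A": {"name": "Mark Smith", "dob": "6/12/1978", "ssn": "443-43-0000"},
    "B": {"name": "Mark R Smith", "phone": "(707) 433-0000", "dl": "00001234"},
}
print(resolve(older))  # [['A'], ['B']]: nothing links the two records
new = {"name": "Mark Randy Smith", "ssn": "443-43-0000", "dl": "00001234"}
print(resolve({**older, "new": new}))  # [['A', 'B', 'new']]: the new record bridges them
```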
Big Data [in context]. New Physics.
More data: better predictions
– Lower false positives
– Lower false negatives
More data: bad data good
– Suddenly glad your data is not perfect
More data: less compute
[Figure: Big Data as a “Pile of ____” versus Information In Context]
One Form of Context: “Expert Counting”
Is it 5 people each with 1 account … or is it 1 person with 5 accounts?
Is it 20 cases of H1N1 in 20 cities … or one case reported 20 times?
If one cannot count … one cannot estimate vector or velocity (direction and speed).
Without vector and velocity … prediction is nearly impossible.
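The H1N1 question shows why velocity depends on counting correctly. The sketch below counts cases per week with and without deduplication and takes the week-over-week change as a crude speed estimate. It assumes entity resolution has already assigned each report a hypothetical `case_key`. The feed is invented: one case reported once in week 1, then re-reported 19 times in week 2.

```python
from collections import Counter


def weekly_counts(reports, dedupe):
    """reports: iterable of (week, case_key). With dedupe=True each case is
    counted once, in the week it was first reported. case_key stands in for
    whatever entity resolution concluded (an assumption here)."""
    seen = set()
    counts = Counter()
    for week, case_key in sorted(reports):
        if dedupe:
            if case_key in seen:
                continue
            seen.add(case_key)
        counts[week] += 1
    return counts


def velocity(counts, week):
    """Change in the count from the previous week (a crude speed estimate)."""
    return counts[week] - counts[week - 1]


# Hypothetical feed: one case reported in week 1, then re-reported 19 times in week 2.
reports = [(1, "case-1")] + [(2, "case-1")] * 19
naive = weekly_counts(reports, dedupe=False)
expert = weekly_counts(reports, dedupe=True)
print(velocity(naive, 2), velocity(expert, 2))  # 18 -1
```

Counted naively, the outbreak appears to be accelerating. Counted as distinct cases, no new case appeared in week 2.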
Entity Resolution Demonstration
DECEASED PERSON
George Balston
YOB: 1951 SSN: 5598
DOD: 1995
VOTER
George F Balston
YOB: 1951 D/L: 4801
13070 SW Karen Blvd Apt 7
Beaverton, OR 97005
Last voted: 2008
Under best practices in voter matching, a match on name and year of birth alone is insufficient proof that two records describe the same person. Many different people in the U.S. share a name and year of birth.
Human review is required.
Unfortunately, there can be many thousands of cases just like this, and state election offices don’t have the staff or budget to review them all manually.
Now Consider This Tertiary DMV Record
DMV
George F Balston
YOB: 1951 SSN: 5598 D/L: 4801
3043 SW Clementine Blvd Apt 210
Beaverton, OR 97005
The DMV record contains enough features to match both the voter record (name, year of birth, and driver’s license) and the deceased person’s record (name, year of birth, and SSN). For the sake of argument, let’s say it matches the voter best.
Features Accumulate
The voter/DMV record now shares a name, year of birth, and SSN with the deceased person’s record. Under voter matching best practices, this evidence would be sufficient to determine that this voter is likely deceased.
This case no longer needs human review.
As features accumulate, it becomes possible to resolve previously unresolvable identity records.
Useful Insight Revealed!
As events and transactions accumulate, detection of relevance improves.
Here we can see that George, who died in 1995, voted in 2008.
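The demonstration's decision rule can be sketched as follows. This is a minimal sketch, not the product's logic. The loose name comparison (first and last tokens only), the field names, and the decision rules are assumptions modeled on the slide text. The DMV record is joined to the voter because, as the slide stipulates, it matches the voter best.

```python
def name_key(name):
    """Assumed loose name comparison: first and last tokens, case-folded,
    so 'George F Balston' and 'George Balston' compare equal."""
    parts = name.casefold().split()
    return parts[0], parts[-1]


def shared(a, b):
    """Features two records (or accumulated entities) agree on."""
    out = {"name"} if name_key(a["name"]) == name_key(b["name"]) else set()
    for key in ("yob", "ssn", "dl"):
        if a.get(key) is not None and a.get(key) == b.get(key):
            out.add(key)
    return out


def assess(voter, deceased):
    """Hypothetical rule modeled on the slides: name + year of birth alone
    needs human review; name + year of birth + SSN is sufficient."""
    s = shared(voter, deceased)
    if {"name", "yob", "ssn"} <= s:
        return "likely deceased"
    if {"name", "yob"} <= s:
        return "human review"
    return "no match"


def accumulate(entity, record):
    """Merge a record into an entity; existing values win (a simplification)."""
    merged = dict(entity)
    for key, value in record.items():
        merged.setdefault(key, value)
    return merged


deceased = {"name": "George Balston", "yob": 1951, "ssn": "5598", "dod": 1995}
voter = {"name": "George F Balston", "yob": 1951, "dl": "4801", "last_voted": 2008}
dmv = {"name": "George F Balston", "yob": 1951, "ssn": "5598", "dl": "4801"}

print(assess(voter, deceased))           # human review
print(sorted(shared(voter, dmv)))        # ['dl', 'name', 'yob']: DMV joins the voter
voter_dmv = accumulate(voter, dmv)       # the voter entity now carries the SSN
print(assess(voter_dmv, deceased))       # likely deceased
print(voter_dmv["last_voted"] > deceased["dod"])  # True: voted after date of death
```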
Expert Counting: Degrees of Difficulty
Exactly Same: Bob Jones, 123455 vs. Bob Jones, 123455
Fuzzy: Bob Jones, 123455 vs. Robert T Jonnes, 000123455
Incompatible Features: Bob Jones, 123455 vs. bjones@hotmail
Deceit: Bob Jones, 123455 vs. Ken Wells, 550119
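The “Fuzzy” case can be illustrated with three common normalizations: mapping a nickname to a formal first name, tolerating one typo in the surname (Levenshtein distance), and ignoring leading zeros in an identifier. This is a minimal sketch. The nickname table, the distance threshold of 1, and the `classify` helper are hypothetical. The “Deceit” pair correctly fails to match on its features alone, because catching it requires accumulated context.

```python
# Tiny hypothetical nickname table; real systems use much larger curated lists.
NICKNAMES = {"bob": "robert", "bobby": "robert", "bill": "william", "billy": "william"}


def levenshtein(a, b):
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def normalize_name(name):
    """Return (formal first name, last name), ignoring middle initials."""
    parts = name.casefold().split()
    return NICKNAMES.get(parts[0], parts[0]), parts[-1]


def normalize_id(value):
    return value.lstrip("0") or "0"


def classify(a, b):
    """a, b: (name, id) pairs. Returns 'exactly same', 'fuzzy match' or 'no match'."""
    if a == b:
        return "exactly same"
    (name_a, id_a), (name_b, id_b) = a, b
    first_a, last_a = normalize_name(name_a)
    first_b, last_b = normalize_name(name_b)
    if (first_a == first_b and levenshtein(last_a, last_b) <= 1
            and normalize_id(id_a) == normalize_id(id_b)):
        return "fuzzy match"
    return "no match"


print(classify(("Bob Jones", "123455"), ("Robert T Jonnes", "000123455")))  # fuzzy match
print(classify(("Bob Jones", "123455"), ("Ken Wells", "550119")))          # no match
```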
Deceit Detection Using Context Accumulation
Deceit: Bob Jones, 123455 vs. Ken Wells, 550119
Feature Accumulation
[Figure: further records accumulate features: Robert Jones, 123455, POB 13452, DOB 03/12/73; Ken Wells, 550119, POB 999911, DOB 03/12/73, gw3e56@hotmail.com; and Bob Jones, POB 13452, DOB 03/12/73, gw3e56@hotmail.com, which ties the two identities together]
Resolved!
Skilled adversaries use “channel separation” to avoid detection.
[Figure: Cell Phone #1 (Unknown), Cell Phone #2 (Unknown), Passport #1 (William A.), and Bank Acct #1 (Billy K.) appear unrelated]
Detection requires “channel consolidation.”
William A aka Billy K.
• Cell Phone #1
• Cell Phone #2
• Bank Acct #1
• Passport #1
Take Note
To catch clever criminals, one must ...
1) Collect observations the adversary doesn’t know you have
2) Or, be able to perform computation over your observations in a manner the adversary cannot fathom
InfoSphere Identity Insight v8
New Think About Expert Counting
Exactly Same: Bob Jones, 123455 vs. Bob Jones, 123455
Fuzzy: Bob Jones, 123455 vs. Robert T Jonnes, 000123455
Incompatible Features: Bob Jones, 123455 vs. bjones@hotmail
Deceit: Bob Jones, 123455 vs. Ken Wells, 550119
Key Features Enable Expert Counting
People: Name, Address, Date of Birth, Phone, Passport, Nationality, Biometric, Etc.
Cars: License Plate No., VIN, Make, Model, Year, Color, Etc.
Router: Serial Number, MAC Address, IP Address, Make, Model, Firmware Version, Etc.
Consider Lying Identical Twins
[Figure: two identical passports, each reading #123, Sue, 3/3/84, Uberstan, Exp 2011; fingerprint, DNA, and the most trusted authority all vouch: “Same person – trust me.”]
The same thing cannot be in two places … at the same time.
Two different things cannot occupy the same space … at the same time.
Space & Time Enables Absolute Disambiguation
People: When, Where, Name, Address, Date of Birth, Phone, Passport, Nationality, Biometric, Etc.
Cars: When, Where, License Plate No., VIN, Make, Model, Year, Color, Etc.
Router: When, Where, Serial Number, MAC Address, IP Address, Make, Model, Firmware Version, Etc.
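The space-time principle can be turned into a simple feasibility test. Suppose two observations of what claims to be one thing imply travel faster than is physically plausible, for example two identical passports presented in two cities at once. Then they must be two things. This is a minimal sketch under several assumptions: observations are `(latitude, longitude, unix_seconds)` tuples, distance is great-circle (haversine) distance, the speed ceiling is a hypothetical 1,000 km/h, and simultaneous sightings within 100 m count as the same place.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    h = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(h))


def can_be_same(obs_a, obs_b, max_speed_kmh=1000.0):
    """obs = (lat, lon, unix_seconds). False means one thing would have had to
    be in two places at once, or travel faster than the assumed ceiling."""
    (lat1, lon1, t1), (lat2, lon2, t2) = obs_a, obs_b
    dist = haversine_km(lat1, lon1, lat2, lon2)
    hours = abs(t2 - t1) / 3600
    if hours == 0:
        return dist < 0.1  # assumed co-location tolerance of 100 m
    return dist / hours <= max_speed_kmh


salem = (44.94, -123.03, 0)
seattle_same_time = (47.61, -122.33, 0)
seattle_six_hours_later = (47.61, -122.33, 6 * 3600)
print(can_be_same(salem, seattle_same_time))        # False: two different things
print(can_be_same(salem, seattle_six_hours_later))  # True: a plausible drive
```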
“Life Arcs” Are Also Telling
Bill Smith, 4/13/67, Salem, Oregon
Bill Smith, 4/13/67, Seattle, Washington
Address History
Tampa, FL 2008-2008
Biloxi, MS 2005-2008
NY, NY 1996-2005
Tampa, FL 1984-1996
Address History
San Diego, CA 2005-2009
San Fran, CA 2005-2005
Phoenix, AZ 1990-2005
San Jose, CA 1982-1990
OMG
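Life arcs can be compared mechanically. If two address histories place a person in different cities during the same years, the two records likely describe different people, even though name and date of birth agree. This is a minimal sketch that assumes year granularity and exact city-name matching. The histories are copied from the slide. Which history belongs to the Salem record and which to the Seattle record cannot be recovered from the text, so they are labeled A and B.

```python
def cities_by_year(history):
    """history: list of (city, start_year, end_year), years inclusive."""
    by_year = {}
    for city, start, end in history:
        for year in range(start, end + 1):
            by_year.setdefault(year, set()).add(city)
    return by_year


def conflicting_years(h1, h2):
    """Years in which both histories place the person somewhere, but never in
    the same city (year granularity and exact names are assumptions)."""
    a, b = cities_by_year(h1), cities_by_year(h2)
    return sorted(y for y in a.keys() & b.keys() if not (a[y] & b[y]))


history_a = [("Tampa, FL", 2008, 2008), ("Biloxi, MS", 2005, 2008),
             ("NY, NY", 1996, 2005), ("Tampa, FL", 1984, 1996)]
history_b = [("San Diego, CA", 2005, 2009), ("San Fran, CA", 2005, 2005),
             ("Phoenix, AZ", 1990, 2005), ("San Jose, CA", 1982, 1990)]

conflicts = conflicting_years(history_a, history_b)
print(len(conflicts), conflicts[0], conflicts[-1])  # 25 1984 2008
```

Every overlapping year from 1984 through 2008 places the two Bill Smiths in different cities, which is strong evidence of two people.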
Space-Time-Travel
Cell phones are generating a staggering amount of geolocational data – 600B transactions per day being created in the US alone
This data is being “de-identified” and shared with third parties – in volume and in real time
Your movement quickly reveals where you spend your time (e.g., evenings vs. working hours)
Re-identification (figuring out who is who) is somewhat trivial
And, oh so powerful predictions …
The 10 People I Spend the Most Time With
(Not at Home and Not at Work)
1. Michelle J
2. Renee M
3. Peggy M
4. Erin E
5. Joshua J
6. Ivan X
7. Bob Y
8. Amanda H
9. Dane J
10. Wesley R
He must be following me!
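A list like this can be derived from location pings by counting how often someone else is in the same place at the same time, outside home and work. This is a minimal sketch. It assumes pings are already bucketed into hypothetical `(person, cell_id, hour)` tuples and that the subject's home and work cells are known. The people and places in the example are invented.

```python
from collections import Counter


def top_companions(pings, subject, excluded_cells, n=10):
    """pings: iterable of (person, cell_id, hour_bucket).
    Counts the hour buckets in which each other person shares a cell with
    `subject`, skipping the subject's home/work cells (an assumed input)."""
    present = {}  # (cell, hour) -> set of people seen there
    for person, cell, hour in pings:
        present.setdefault((cell, hour), set()).add(person)
    counts = Counter()
    for (cell, hour), people in present.items():
        if subject in people and cell not in excluded_cells:
            for other in people - {subject}:
                counts[other] += 1
    return counts.most_common(n)


pings = [
    ("me", "home", 1), ("x", "home", 1),                     # ignored: at home
    ("me", "cafe", 2), ("y", "cafe", 2), ("z", "cafe", 2),
    ("me", "gym", 3), ("y", "gym", 3),
    ("z", "cafe", 3),                                        # not with "me"
]
print(top_companions(pings, "me", {"home", "work"}))  # [('y', 2), ('z', 1)]
```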
Consequences
Space-time-travel data is the ultimate biometric
It will enable enormous opportunity
It will unravel one’s secrets
It will challenge existing notions of privacy
Adoption is now accelerating at a blistering pace
[Theatrical Pause]
The G2 | Sensemaking Project
The G2 Vision
1) Evaluate each new observation against previous observations.
2) Determine if what is being observed is relevant.
3) Deliver this actionable insight to its consumer … fast enough to do something about it while it is still happening.
4) Do this with sufficient accuracy and scale to really matter.
Uniquely G2
Real “Context Computing”
– Complete Context: Contextualize diverse observations, each observation benefiting from others
– Current Context: Real-time, incremental integration
– Conflicting Context: High tolerance for disagreement, confusion and uncertainty
– Self-Correcting Context: New observations able to reverse earlier assertions
Engineered ground-up for cloud compute … in support of hemisphere-scale data
Introduce new data sources (e.g., geospatial), new entity types (e.g., vessels), new features (e.g., MAC addresses) … without schema change or re-engineering
From sense to respond in under 200 ms – fast enough to do something about the transaction while it is still happening
Unprecedented number of Privacy by Design (PbD) features baked in
Privacy by Design (PbD)
1. Full Attribution
2. Tamper Resistant Audit Log
3. Information Transfer Accounting
4. Data Tethering
5. False Negative Favoring
6. Self-Correcting False Positives
7. Analytics on Anonymized Data
http://jeffjonas.typepad.com/jeff_jonas/2012/06/privacy-by-design-in-the-era-of-big-data.html
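One widely used way to approach “Analytics on Anonymized Data” is to replace each feature with a keyed one-way hash (HMAC-SHA256) before it is shared. Two parties holding the same key can then find matches by comparing hashes without exchanging plaintext. This is a minimal sketch of that general technique, not a description of G2's implementation. The normalization rule, the key, and the helper names are assumptions. Hashing preserves only exact equality, so all normalization must happen before hashing. Key management is out of scope here.

```python
import hashlib
import hmac


def normalize(value: str) -> str:
    """Case-fold and collapse whitespace so trivial variants hash alike."""
    return " ".join(value.casefold().split())


def anonymize(value: str, key: bytes) -> str:
    """Keyed one-way hash of a normalized feature value."""
    return hmac.new(key, normalize(value).encode("utf-8"), hashlib.sha256).hexdigest()


def tokenize(records: dict, key: bytes) -> dict:
    """Each party runs this locally and shares only the resulting tokens."""
    return {rid: anonymize(value, key) for rid, value in records.items()}


def match(tokens_a: dict, tokens_b: dict) -> list:
    """Pairs of record ids whose anonymized features are identical."""
    index = {}
    for rid, token in tokens_a.items():
        index.setdefault(token, []).append(rid)
    return sorted((a, b) for b, token in tokens_b.items() for a in index.get(token, []))


key = b"shared-secret-for-illustration-only"  # hypothetical shared key
party_a = tokenize({"a1": "DL 009900991"}, key)
party_b = tokenize({"b7": "dl  009900991", "b8": "DL 000000001"}, key)
print(match(party_a, party_b))  # [('a1', 'b7')]
```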
Example: Self-Correcting False Positive
Record 1: John T Smith Jr, 123 Main Street, 703 111-2000, DOB: 03/12/1984
Record 2: John T Smith, 123 Main Street, 703 111-2000, DL: 009900991
A plausible claim: these two people are the same
Until this record comes into view
Record 3: John T Smith Sr, 123 Main Street, 703 111-2000, DL: 009900991
Which reveals this is a FALSE POSITIVE
[Figure: record 2 is reassigned from record 1 to record 3]
New Best Practice: FIXED IN REAL-TIME (not end of month)
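Self-correction can be sketched by re-resolving the affected records whenever a new one arrives, instead of treating earlier merges as final. This is a minimal sketch under assumed rules, not G2's algorithm. A shared driver's license number counts as strong evidence and is applied first. The same name (ignoring Jr/Sr) at the same address counts as weaker evidence and is applied only when it would not put a Jr and a Sr in one entity. Rebuilding from scratch is the simplest way to show the reversal, not an efficient one. `re_resolve` is a hypothetical helper.

```python
SUFFIXES = ("Jr", "Sr")


def suffix(name):
    last = name.split()[-1]
    return last if last in SUFFIXES else None


def base_name(name):
    return " ".join(p for p in name.split() if p not in SUFFIXES)


def re_resolve(records):
    """records: dict id -> dict. Rebuilds entities from scratch on every call,
    so a newly arrived record can reverse an earlier merge."""
    clusters = {rid: {rid} for rid in records}  # rid -> its (shared) cluster set

    def suffixes(cluster):
        return {suffix(records[r]["name"]) for r in cluster} - {None}

    def merge(a, b):
        ca, cb = clusters[a], clusters[b]
        if ca is cb or len(suffixes(ca) | suffixes(cb)) > 1:
            return  # already together, or merging would mix Jr and Sr
        ca |= cb
        for r in cb:
            clusters[r] = ca

    ids = sorted(records)
    pairs = [(a, b) for i, a in enumerate(ids) for b in ids[i + 1:]]
    for a, b in pairs:  # pass 1: strong evidence, same driver's license
        dl = records[a].get("dl")
        if dl and dl == records[b].get("dl"):
            merge(a, b)
    for a, b in pairs:  # pass 2: weaker evidence, same base name and address
        ra, rb = records[a], records[b]
        if base_name(ra["name"]) == base_name(rb["name"]) and ra["address"] == rb["address"]:
            merge(a, b)
    unique = {id(c): c for c in clusters.values()}
    return sorted(sorted(c) for c in unique.values())


records = {
    1: {"name": "John T Smith Jr", "address": "123 Main Street", "phone": "703 111-2000", "dob": "03/12/1984"},
    2: {"name": "John T Smith", "address": "123 Main Street", "phone": "703 111-2000", "dl": "009900991"},
    3: {"name": "John T Smith Sr", "address": "123 Main Street", "phone": "703 111-2000", "dl": "009900991"},
}
before = {k: v for k, v in records.items() if k != 3}
print(re_resolve(before))   # [[1, 2]]: plausible, but a false positive
print(re_resolve(records))  # [[1], [2, 3]]: record 2 moves to the Sr
```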
Use Cases
Maritime Domain Awareness
New system lets authorities track suspicious ships
http://www.asiaone.com/print/News/Latest%2BNews/Science%2Band%2BTech/Story/A1Story20130703-434337.html
Voter Registration Modernization
David Becker (Pew Charitable Trusts) and Jeff Jonas (IBM) Discuss How G2 Has Helped Modernize Voter Registration in America
http://ibmreferencehub.com/STG/ibm_executive_edge_2013/#gensession_daytwo_jonasbecker
Closing Thoughts
Wish This on the Adversary
[Figure: the “Enterprise Amnesia” chart of computing power growth, available observation space, sensemaking algorithms, and context over time]
Wish This for Yourself: Better Sensemaking Skills
[Figure: chart of computing power growth, available observation space, sensemaking algorithms, and context over time]
State of the Union: Isolated Analytics
[Figure: structured data analytics, unstructured data analytics, and social network analytics operating separately between the observation space and action]
The Future: General Purpose Context Accumulation
Data Finds Data. Relevance Finds You.
This is G2
[Figure: the observation space feeding information in context, delivered to a consumer (an analyst, a system, the sensor itself, etc.)]
The most competitive organizations are going to make sense of what they are observing fast enough to do something about it while they are observing it.
Related Blog Posts
Algorithms At Dead-End: Cannot Squeeze Knowledge Out Of A Pixel
Puzzling: How Observations Are Accumulated Into Context
Big Data. New Physics.
On A Smarter Planet … Some Organizations Will Be Smarter-er Than Others
Your Movements Speak for Themselves: Space-Time Travel Data is Analytic Super-Food!
When Federated Search Bites
Data Finds Data
Structuring Unstructured Data
Fantasy Analytics
Questions?
Email: jeffjonas@us.ibm.com
Blog: www.jeffjonas.typepad.com
Twitter: http://www.twitter.com/jeffjonas
Traction part 2 - EOS Model JAX Bridges.Traction part 2 - EOS Model JAX Bridges.
Traction part 2 - EOS Model JAX Bridges.Anamaria Contreras
 
(Best) ENJOY Call Girls in Faridabad Ex | 8377087607
(Best) ENJOY Call Girls in Faridabad Ex | 8377087607(Best) ENJOY Call Girls in Faridabad Ex | 8377087607
(Best) ENJOY Call Girls in Faridabad Ex | 8377087607dollysharma2066
 
MAHA Global and IPR: Do Actions Speak Louder Than Words?
MAHA Global and IPR: Do Actions Speak Louder Than Words?MAHA Global and IPR: Do Actions Speak Louder Than Words?
MAHA Global and IPR: Do Actions Speak Louder Than Words?Olivia Kresic
 
APRIL2024_UKRAINE_xml_0000000000000 .pdf
APRIL2024_UKRAINE_xml_0000000000000 .pdfAPRIL2024_UKRAINE_xml_0000000000000 .pdf
APRIL2024_UKRAINE_xml_0000000000000 .pdfRbc Rbcua
 
Church Building Grants To Assist With New Construction, Additions, And Restor...
Church Building Grants To Assist With New Construction, Additions, And Restor...Church Building Grants To Assist With New Construction, Additions, And Restor...
Church Building Grants To Assist With New Construction, Additions, And Restor...Americas Got Grants
 

Kürzlich hochgeladen (20)

FULL ENJOY Call girls in Paharganj Delhi | 8377087607
FULL ENJOY Call girls in Paharganj Delhi | 8377087607FULL ENJOY Call girls in Paharganj Delhi | 8377087607
FULL ENJOY Call girls in Paharganj Delhi | 8377087607
 
Pitch Deck Teardown: Geodesic.Life's $500k Pre-seed deck
Pitch Deck Teardown: Geodesic.Life's $500k Pre-seed deckPitch Deck Teardown: Geodesic.Life's $500k Pre-seed deck
Pitch Deck Teardown: Geodesic.Life's $500k Pre-seed deck
 
Kenya’s Coconut Value Chain by Gatsby Africa
Kenya’s Coconut Value Chain by Gatsby AfricaKenya’s Coconut Value Chain by Gatsby Africa
Kenya’s Coconut Value Chain by Gatsby Africa
 
Corporate Profile 47Billion Information Technology
Corporate Profile 47Billion Information TechnologyCorporate Profile 47Billion Information Technology
Corporate Profile 47Billion Information Technology
 
Fordham -How effective decision-making is within the IT department - Analysis...
Fordham -How effective decision-making is within the IT department - Analysis...Fordham -How effective decision-making is within the IT department - Analysis...
Fordham -How effective decision-making is within the IT department - Analysis...
 
8447779800, Low rate Call girls in Uttam Nagar Delhi NCR
8447779800, Low rate Call girls in Uttam Nagar Delhi NCR8447779800, Low rate Call girls in Uttam Nagar Delhi NCR
8447779800, Low rate Call girls in Uttam Nagar Delhi NCR
 
No-1 Call Girls In Goa 93193 VIP 73153 Escort service In North Goa Panaji, Ca...
No-1 Call Girls In Goa 93193 VIP 73153 Escort service In North Goa Panaji, Ca...No-1 Call Girls In Goa 93193 VIP 73153 Escort service In North Goa Panaji, Ca...
No-1 Call Girls In Goa 93193 VIP 73153 Escort service In North Goa Panaji, Ca...
 
Independent Call Girls Andheri Nightlaila 9967584737
Independent Call Girls Andheri Nightlaila 9967584737Independent Call Girls Andheri Nightlaila 9967584737
Independent Call Girls Andheri Nightlaila 9967584737
 
Ten Organizational Design Models to align structure and operations to busines...
Ten Organizational Design Models to align structure and operations to busines...Ten Organizational Design Models to align structure and operations to busines...
Ten Organizational Design Models to align structure and operations to busines...
 
Japan IT Week 2024 Brochure by 47Billion (English)
Japan IT Week 2024 Brochure by 47Billion (English)Japan IT Week 2024 Brochure by 47Billion (English)
Japan IT Week 2024 Brochure by 47Billion (English)
 
Darshan Hiranandani [News About Next CEO].pdf
Darshan Hiranandani [News About Next CEO].pdfDarshan Hiranandani [News About Next CEO].pdf
Darshan Hiranandani [News About Next CEO].pdf
 
PSCC - Capability Statement Presentation
PSCC - Capability Statement PresentationPSCC - Capability Statement Presentation
PSCC - Capability Statement Presentation
 
Global Scenario On Sustainable and Resilient Coconut Industry by Dr. Jelfina...
Global Scenario On Sustainable  and Resilient Coconut Industry by Dr. Jelfina...Global Scenario On Sustainable  and Resilient Coconut Industry by Dr. Jelfina...
Global Scenario On Sustainable and Resilient Coconut Industry by Dr. Jelfina...
 
Flow Your Strategy at Flight Levels Day 2024
Flow Your Strategy at Flight Levels Day 2024Flow Your Strategy at Flight Levels Day 2024
Flow Your Strategy at Flight Levels Day 2024
 
Annual General Meeting Presentation Slides
Annual General Meeting Presentation SlidesAnnual General Meeting Presentation Slides
Annual General Meeting Presentation Slides
 
Traction part 2 - EOS Model JAX Bridges.
Traction part 2 - EOS Model JAX Bridges.Traction part 2 - EOS Model JAX Bridges.
Traction part 2 - EOS Model JAX Bridges.
 
(Best) ENJOY Call Girls in Faridabad Ex | 8377087607
(Best) ENJOY Call Girls in Faridabad Ex | 8377087607(Best) ENJOY Call Girls in Faridabad Ex | 8377087607
(Best) ENJOY Call Girls in Faridabad Ex | 8377087607
 
MAHA Global and IPR: Do Actions Speak Louder Than Words?
MAHA Global and IPR: Do Actions Speak Louder Than Words?MAHA Global and IPR: Do Actions Speak Louder Than Words?
MAHA Global and IPR: Do Actions Speak Louder Than Words?
 
APRIL2024_UKRAINE_xml_0000000000000 .pdf
APRIL2024_UKRAINE_xml_0000000000000 .pdfAPRIL2024_UKRAINE_xml_0000000000000 .pdf
APRIL2024_UKRAINE_xml_0000000000000 .pdf
 
Church Building Grants To Assist With New Construction, Additions, And Restor...
Church Building Grants To Assist With New Construction, Additions, And Restor...Church Building Grants To Assist With New Construction, Additions, And Restor...
Church Building Grants To Assist With New Construction, Additions, And Restor...
 

Jeff Jonas: Big Data. New Physics.

  • 1. Big Data. New Physics. And Geospatial “Superfood”. Jeff Jonas, IBM Fellow, Chief Scientist, Context Computing. Email: jeffjonas@us.ibm.com Blog: www.jeffjonas.typepad.com Twitter: http://www.twitter.com/jeffjonas © 2014 IBM Corporation
  • 2. About the Speaker: Jeff Jonas, IBM Fellow, Chief Scientist for Context Computing. Founder and Chief Scientist of Systems Research & Development (SRD), acquired by IBM in 2005. Has been designing, building and deploying entity resolution systems for three decades. This technology is used today by defense & intelligence, financial institutions, humanitarian efforts and more. Today: primarily focused on ‘sensemaking on streams’, with special attention to privacy and civil liberties protections.
  • 3. “The data must find the data and the relevance must find the user.”
  • 4. Trend: Organizations Are Getting Dumber. [Chart over Time: Computing Power Growth, Available Observation Space, Context, Sensemaking Algorithms; label: “Enterprise Amnesia”]
  • 5. Trend: Organizations Are Getting Dumber. WHY? [Chart over Time: Computing Power Growth, Available Observation Space, Context, Sensemaking Algorithms]
  • 6. Algorithms at Dead End. You Can’t Squeeze Knowledge Out of a Pixel.
  • 7. No Context: scrila34@msn.com
  • 8. Context, definition: Better understanding something by taking into account the things around it.
  • 9. I ducked as the bat flew my way. Another exciting baseball game …
  • 10. Information in Context … and Accumulating. [Diagram: scrila34@msn.com linked to Customer (Top 200), Twitter (Influencer), LinkedIn (Career History), Job Applicant, AML Investigation]
  • 11. The Puzzle Metaphor: Imagine an ever-growing pile of puzzle pieces of varying sizes, shapes and colors. What it represents is unknown: there is no picture on hand. Is it one puzzle, 15 puzzles, or 1,500 different puzzles? Some pieces are duplicates, missing, incomplete, low quality, or have been misinterpreted. Some pieces may even be professionally fabricated lies. Until you take the pieces to the table and attempt assembly, you don’t know what you are dealing with.
  • 12. Puzzling Images: Courtesy Ravensburger © 2011. [Piles: 270 pieces (90%), 200 pieces (66%), 150 pieces (50%), 6 pieces (2%), 30 pieces (10%, duplicates)]
  • 13. [Image]
  • 14. [Image]
  • 15. First Discovery
  • 16. More Data Finds Data
  • 17. Duplicates in Front of Your Eyes
  • 18. First Duplicate Found Here
  • 19. [Image]
  • 20. Incremental Context – Incremental Discovery: 6:40pm START. 22min: “Hey, this one is a duplicate!” 35min: “I think some pieces are missing.” 37min: “Looks like a bunch of hillbillies on a porch.” 44min: “Hillbillies, playing guitars, sitting on a porch, near a barber sign … and a banjo!”
  • 21. [Image: 150 pieces, 50%]
  • 22. Incremental Context – Incremental Discovery: 47min: “We should take the sky and grass off the table.” 2hr: “Let’s switch sides, and see if we can make sense of this from different perspectives.” 2hr10m: “Wait, there are three … no, four puzzles.” 2hr17m: “We need a bigger table.” 2hr18m: “I think you threw in a few random pieces.”
  • 23. [Image]
  • 24. How Context Accumulates: With each new observation, one of three assertions is made: 1) unassociated; 2) placed near like neighbors; or 3) connected. Must favor the false negative. New observations sometimes reverse earlier assertions. Some observations produce novel discovery. The emerging picture helps focus collection interests. As the working space expands, computational effort increases. Given sufficient observations, there can come a tipping point. Thereafter, confidence improves while computational effort decreases!
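The three assertions can be illustrated with a short Python sketch. It is not G2's algorithm: observations are plain dicts of features, the hypothetical `STRONG_FEATURES` set and the rule "connect only on a shared strong feature plus one other agreeing feature" are assumptions, and "near like neighbors" is approximated as sharing any single feature. The strict connect rule is one simple way to favor the false negative.

```python
from dataclasses import dataclass, field

# Hypothetical set of features strong enough to help justify a connection.
STRONG_FEATURES = {"ssn", "dl", "passport"}


@dataclass
class Entity:
    eid: int
    features: dict = field(default_factory=dict)  # feature name -> set of values


def shared_features(entity, obs):
    """Names of features on which the observation agrees with the entity."""
    return {k for k, v in obs.items() if v in entity.features.get(k, set())}


def observe(entities, obs):
    """Make one of three assertions about a new observation and update context."""
    best, best_shared = None, set()
    for entity in entities:
        shared = shared_features(entity, obs)
        if len(shared) > len(best_shared):
            best, best_shared = entity, shared
    # Favor the false negative: connect only on a strong feature plus corroboration.
    if best is not None and best_shared & STRONG_FEATURES and len(best_shared) >= 2:
        for k, v in obs.items():
            best.features.setdefault(k, set()).add(v)
        return f"connected to entity {best.eid}"
    new = Entity(len(entities) + 1, {k: {v} for k, v in obs.items()})
    entities.append(new)
    if best is not None:  # some feature agreed, but not enough to connect
        return f"placed near entity {best.eid} as new entity {new.eid}"
    return f"unassociated: new entity {new.eid}"


entities = []
print(observe(entities, {"name": "Mark Smith", "dob": "6/12/1978"}))
# unassociated: new entity 1
print(observe(entities, {"name": "Mark Smith", "ssn": "443-43-0000"}))
# placed near entity 1 as new entity 2
print(observe(entities, {"name": "Mark Smith", "ssn": "443-43-0000", "dl": "00001234"}))
# connected to entity 2
```

The second observation shares only a name, so it is kept separate but nearby; the third shares an SSN and a name with it and is connected, carrying its driver's license into the accumulated context.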
  • 25. Overstated Population [Chart: Unique Identities vs. Observations, relative to the True Population]
  • 26. Counting Is Difficult: File 1: Mark Smith, 6/12/1978, 443-43-0000. File 2: Mark R Smith, (707) 433-0000, DL: 00001234.
  • 27. The Rise and Fall of a Population [Chart: Unique Identities vs. Observations, relative to the True Population]
  • 28. Data Triangulation: File 1: Mark Smith, 6/12/1978, 443-43-0000. File 2: Mark R Smith, (707) 433-0000, DL: 00001234. New Record: Mark Randy Smith, 443-43-0000, DL: 00001234.
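The counting problem on slides 25 through 28 can be made concrete with a union-find (disjoint-set) sketch. As a simplifying assumption, records are linked whenever they share an exact SSN or driver's license value; real entity resolution weighs many more features. The unique-identity count after each record shows the rise and fall the charts describe: two identities until the new record bridges File 1 and File 2.

```python
class UnionFind:
    """Disjoint-set forest tracking which records resolve to one identity."""

    def __init__(self):
        self.parent = {}

    def add(self, x):
        self.parent.setdefault(x, x)

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)

    def count(self):
        return len({self.find(x) for x in self.parent})


# Assumption: an exact match on either of these features links two records.
LINKING_FEATURES = ("ssn", "dl")

records = [
    ("file1", {"name": "Mark Smith", "dob": "6/12/1978", "ssn": "443-43-0000"}),
    ("file2", {"name": "Mark R Smith", "phone": "(707) 433-0000", "dl": "00001234"}),
    ("new", {"name": "Mark Randy Smith", "ssn": "443-43-0000", "dl": "00001234"}),
]

uf = UnionFind()
seen = {}    # (feature, value) -> first record id that carried it
counts = []  # unique identities after each observation
for rid, features in records:
    uf.add(rid)
    for feat in LINKING_FEATURES:
        if feat in features:
            key = (feat, features[feat])
            if key in seen:
                uf.union(rid, seen[key])
            else:
                seen[key] = rid
    counts.append(uf.count())

print(counts)  # [1, 2, 1]
```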
  • 29. Big Data [in context]. New Physics. More data: better predictions (lower false positives, lower false negatives). More data: bad data good (suddenly glad your data is not perfect). More data: less compute.
  • 30. Big Data: Pile of ____ vs. Information In Context
  • 31. One Form of Context: “Expert Counting”. Is it 5 people each with 1 account … or is it 1 person with 5 accounts? Is it 20 cases of H1N1 in 20 cities … or one case reported 20 times? If one cannot count … one cannot estimate vector or velocity (direction and speed). Without vector and velocity … prediction is nearly impossible.
  • 32. Entity Resolution Demonstration
  • 33. Entity Resolution Demonstration. DECEASED PERSON: George Balston, YOB: 1951, SSN: 5598, DOD: 1995. VOTER: George F Balston, YOB: 1951, D/L: 4801, 13070 SW Karen Blvd Apt 7, Beaverton, OR 97005, Last voted: 2008. When it comes to best practices in voter matching, a match on only name and year of birth is insufficient proof: many different people in the U.S. share a name and year of birth, so human review is required. Unfortunately, there can be many thousands of cases just like this, and state election offices don’t have the staff or budget to review them all manually.
  • 34. Now Consider This Tertiary DMV Record. DECEASED PERSON: George Balston, YOB: 1951, SSN: 5598, DOD: 1995. VOTER: George F Balston, YOB: 1951, D/L: 4801, 13070 SW Karen Blvd Apt 7, Beaverton, OR 97005, Last voted: 2008. DMV: George F Balston, YOB: 1951, SSN: 5598, D/L: 4801, 3043 SW Clementine Blvd Apt 210, Beaverton, OR 97005. The DMV record contains enough features to match the voter record (name, year of birth and driver’s license), the deceased person’s record (name, year of birth and SSN), or both. For the sake of argument, let’s say it matches the voter best.
  • 35. Features Accumulate. DECEASED PERSON: George Balston, YOB: 1951, SSN: 5598, DOD: 1995. VOTER: George F Balston, YOB: 1951, D/L: 4801, 13070 SW Karen Blvd Apt 7, Beaverton, OR 97005, Last voted: 2008. DMV: George F Balston, YOB: 1951, SSN: 5598, D/L: 4801, 3043 SW Clementine Blvd Apt 210, Beaverton, OR 97005. The voter/DMV record now shares a name, year of birth and SSN with the deceased person. Under voter matching best practices, this evidence is sufficient to determine that this voter is likely deceased, so the case no longer needs human review.
  • 36. Useful Insight Revealed! VOTER: George F Balston, YOB: 1951, D/L: 4801, 13070 SW Karen Blvd Apt 7, Beaverton, OR 97005, Last voted: 2008. DMV: George F Balston, YOB: 1951, SSN: 5598, D/L: 4801, 3043 SW Clementine Blvd Apt 210, Beaverton, OR 97005. DECEASED PERSON: George Balston, YOB: 1951, SSN: 5598, DOD: 1995. As features accumulate, it becomes possible to resolve previously unresolvable identity records. As events and transactions accumulate, detection of relevance improves. Here we can see that George, who died in 1995, voted in 2008.
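The voter walk-through can be sketched as a small rule table plus feature accumulation. The rules are assumptions paraphrasing the slides (name plus year of birth alone goes to human review; name, year of birth and a shared SSN or driver's license counts as a match), and the hypothetical `name_key` function, which simply ignores middle names and initials, stands in for real name matching.

```python
def name_key(name):
    """Crude name key: first and last token, ignoring middle names/initials."""
    tokens = name.lower().split()
    return tokens[0], tokens[-1]


def match_level(a, b):
    """Hypothetical rules: 'match', 'review' (name + YOB only) or 'none'."""
    if name_key(a["name"]) != name_key(b["name"]) or a.get("yob") != b.get("yob"):
        return "none"
    for strong in ("ssn", "dl"):
        if strong in a and a[strong] == b.get(strong):
            return "match"
    return "review"


def merge(a, b):
    """Accumulate features: b's values fill in whatever a lacks."""
    return {**b, **a}


deceased = {"name": "George Balston", "yob": 1951, "ssn": "5598", "dod": 1995}
voter = {"name": "George F Balston", "yob": 1951, "dl": "4801", "last_voted": 2008}
dmv = {"name": "George F Balston", "yob": 1951, "ssn": "5598", "dl": "4801"}

print(match_level(voter, deceased))   # review: name + YOB only
print(match_level(dmv, voter))        # match: shared driver's license
voter_entity = merge(voter, dmv)      # the voter now also carries the SSN
if match_level(voter_entity, deceased) == "match":
    if voter_entity["last_voted"] > deceased["dod"]:
        print("Voted in", voter_entity["last_voted"], "after death in", deceased["dod"])
```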
  • 37. Expert Counting: Degrees of Difficulty. Exactly Same: Bob Jones 123455 / Bob Jones 123455. Fuzzy: Bob Jones 123455 / Robert T Jonnes 000123455. Incompatible Features: Bob Jones 123455 / bjones@hotmail. Deceit: Bob Jones 123455 / Ken Wells 550119.
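The “Fuzzy” case can be illustrated with feature normalization before comparison. A minimal sketch, assuming a tiny hypothetical nickname table, identifiers compared after stripping leading zeros, and surnames allowed to differ by at most one edit (Levenshtein distance). Production matchers use much richer name and identifier rules; this only shows why Bob Jones 123455 and Robert T Jonnes 000123455 can be scored as the same.

```python
# Hypothetical nickname table; a real system would use a curated name library.
NICKNAMES = {"bob": "robert", "bobby": "robert", "rob": "robert", "bill": "william"}


def levenshtein(a, b):
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def normalize(name, ident):
    """(canonical first name, surname, identifier without leading zeros)."""
    tokens = name.lower().split()
    first = NICKNAMES.get(tokens[0], tokens[0])
    return first, tokens[-1], ident.lstrip("0")


def fuzzy_same(rec_a, rec_b, max_edits=1):
    first_a, last_a, id_a = normalize(*rec_a)
    first_b, last_b, id_b = normalize(*rec_b)
    return (first_a == first_b and id_a == id_b
            and levenshtein(last_a, last_b) <= max_edits)


print(fuzzy_same(("Bob Jones", "123455"), ("Robert T Jonnes", "000123455")))  # True
print(fuzzy_same(("Bob Jones", "123455"), ("Ken Wells", "550119")))           # False
```

Note that the Deceit pair fails every test here; resolving it takes accumulated context rather than better fuzzy matching.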
  • 38. Deceit Detection Using Context Accumulation. Deceit: Bob Jones 123455 / Ken Wells 550119. Feature Accumulation: Robert Jones 123455, POB 13452, DOB 03/12/73; Ken Wells 550119, POB 999911, DOB 03/12/73; gw3e56@hotmail.com appears with both. Resolved! Bob Jones / Robert Jones 123455 / Ken Wells 550119: DOB 03/12/73, POB 13452, gw3e56@hotmail.com.
  • 39. Skilled adversaries use “channel separation” to avoid detection: Cell Phone #1 (Unknown), Cell Phone #2 (Unknown), Passport #1 (William A.), Bank Acct #1 (Billy K.).
  • 40. Detection requires “channel consolidation”: William A. aka Billy K.: Cell Phone #1, Cell Phone #2, Bank Acct #1, Passport #1.
  • 41. Take Note: To catch clever criminals, one must either 1) collect observations the adversary doesn’t know you have, or 2) be able to perform compute over your observations in a manner the adversary cannot fathom.
  • 42. InfoSphere Identity Insight v8
  • 43. New Think About Expert Counting. Exactly Same: Bob Jones 123455 / Bob Jones 123455. Fuzzy: Bob Jones 123455 / Robert T Jonnes 000123455. Incompatible Features: Bob Jones 123455 / bjones@hotmail. Deceit: Bob Jones 123455 / Ken Wells 550119.
  • 44. Key Features Enable Expert Counting. People: Name, Address, Date of Birth, Phone, Passport, Nationality, Biometric, Etc. Cars: License Plate No., VIN, Make, Model, Year, Color, Etc. Router: Serial Number, MAC Address, IP Address, Make, Model, Firmware Version, Etc.
  • 45. Consider Lying Identical Twins. [Diagram: two identical passports (#123, Sue, 3/3/84, Uberstan, Exp 2011); Fingerprint; DNA; Most Trusted Authority: “Same person – trust me.”]
  • 46. The same thing cannot be in two places … at the same time. Two different things cannot occupy the same space … at the same time.
  • 47. Space & Time Enables Absolute Disambiguation. People: When, Where, Name, Address, Date of Birth, Phone, Passport, Nationality, Biometric, Etc. Cars: When, Where, License Plate No., VIN, Make, Model, Year, Color, Etc. Router: When, Where, Serial Number, MAC Address, IP Address, Make, Model, Firmware Version, Etc.
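The space-time principle can be turned into a simple test: two observations attributed to one entity are incompatible if moving between them would require an implausible speed. This sketch uses the standard haversine great-circle distance; the 1,000 km/h ceiling, the 100 m same-instant tolerance and the approximate city coordinates are illustrative assumptions.

```python
import math
from datetime import datetime, timedelta

EARTH_RADIUS_KM = 6371.0
MAX_SPEED_KMH = 1000.0  # assumed ceiling, roughly airliner cruising speed
SAME_PLACE_KM = 0.1     # assumed tolerance for simultaneous observations


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometers."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def could_be_same(obs_a, obs_b, max_speed_kmh=MAX_SPEED_KMH):
    """Return False if a single entity could not have produced both
    (timestamp, lat, lon) observations."""
    t_a, lat_a, lon_a = obs_a
    t_b, lat_b, lon_b = obs_b
    dist = haversine_km(lat_a, lon_a, lat_b, lon_b)
    hours = abs((t_b - t_a).total_seconds()) / 3600
    if hours == 0:
        return dist <= SAME_PLACE_KM
    return dist / hours <= max_speed_kmh


t0 = datetime(2014, 1, 1, 12, 0)
salem, seattle = (44.94, -123.03), (47.61, -122.33)  # approximate city centers
print(could_be_same((t0, *salem), (t0, *seattle)))                          # False
print(could_be_same((t0, *salem), (t0 + timedelta(hours=1), *seattle)))     # True
print(could_be_same((t0, *salem), (t0 + timedelta(minutes=10), *seattle)))  # False
```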
  • 48. “Life Arcs” Are Also Telling. Bill Smith, 4/13/67, Salem, Oregon. Address History: Tampa, FL 2008-2008; Biloxi, MS 2005-2008; NY, NY 1996-2005; Tampa, FL 1984-1996. Bill Smith, 4/13/67, Seattle, Washington. Address History: San Diego, CA 2005-2009; San Fran, CA 2005-2005; Phoenix, AZ 1990-2005; San Jose, CA 1982-1990.
  • 49. OMG
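The life-arc comparison applies the same principle to address histories: two records claiming residence in different cities over clearly overlapping years probably describe different people. A sketch, assuming year-granularity intervals and treating a single shared boundary year (e.g., a move during 2005) as plausible rather than conflicting.

```python
def conflicting_stays(history_a, history_b):
    """Return stays in different cities whose years overlap by more than a
    shared boundary year (assumed to allow for a move within that year)."""
    conflicts = []
    for city_a, start_a, end_a in history_a:
        for city_b, start_b, end_b in history_b:
            overlap = min(end_a, end_b) - max(start_a, start_b)
            if city_a != city_b and overlap >= 1:
                conflicts.append((city_a, city_b, max(start_a, start_b), min(end_a, end_b)))
    return conflicts


bill_salem = [("Tampa, FL", 2008, 2008), ("Biloxi, MS", 2005, 2008),
              ("NY, NY", 1996, 2005), ("Tampa, FL", 1984, 1996)]
bill_seattle = [("San Diego, CA", 2005, 2009), ("San Fran, CA", 2005, 2005),
                ("Phoenix, AZ", 1990, 2005), ("San Jose, CA", 1982, 1990)]

for conflict in conflicting_stays(bill_salem, bill_seattle):
    print(conflict)
```

Each Bill Smith's own history yields no conflicts, while comparing the two yields four overlapping stays in different cities (for example Biloxi and San Diego from 2005 to 2008), which argues against merging them despite the shared name and date of birth.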
  • 50. Space-Time-Travel: Cell phones are generating a staggering amount of geolocational data: 600B transactions per day created in the US alone. This data is being “de-identified” and shared with third parties, in volume and in real time. Your movement quickly reveals where you spend your time (e.g., evenings vs. working hours). Re-identification (figuring out who is who) is somewhat trivial. And, oh, so powerful predictions …
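To see why “de-identified” location data still reveals so much, consider how little code it takes to label a device's likely home and work locations. A sketch over hypothetical (device, hour, cell) pings; the hour windows chosen for “evening” and “working hours” are assumptions.

```python
from collections import Counter, defaultdict

EVENING_HOURS = set(range(20, 24)) | set(range(0, 6))  # assumed "at home" window
WORK_HOURS = set(range(9, 17))                         # assumed "at work" window


def anchor_locations(pings):
    """Map each device to its most frequent evening cell and working-hours cell."""
    evening = defaultdict(Counter)
    work = defaultdict(Counter)
    for device, hour, cell in pings:
        if hour in EVENING_HOURS:
            evening[device][cell] += 1
        elif hour in WORK_HOURS:
            work[device][cell] += 1
    devices = set(evening) | set(work)
    return {
        d: {
            "home": evening[d].most_common(1)[0][0] if evening[d] else None,
            "work": work[d].most_common(1)[0][0] if work[d] else None,
        }
        for d in devices
    }


pings = [  # hypothetical pseudonymous device id, hour of day, cell id
    ("device-7f3a", 22, "cell-A"), ("device-7f3a", 23, "cell-A"), ("device-7f3a", 2, "cell-A"),
    ("device-7f3a", 10, "cell-B"), ("device-7f3a", 14, "cell-B"), ("device-7f3a", 18, "cell-C"),
]
print(anchor_locations(pings))
# {'device-7f3a': {'home': 'cell-A', 'work': 'cell-B'}}
```

The pseudonymous id never changes the result: a home/work pair like this is exactly the kind of signal that makes re-identification easy.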
  • 51. The 10 People I Spend the Most Time With (Not at Home and Not at Work): 1. Michelle J 2. Renee M 3. Peggy M 4. Erin E 5. Joshua J 6. Ivan X 7. Bob Y 8. Amanda H 9. Dane J 10. Wesley R. He must be following me!
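A list like this can be produced by counting co-location: how often another device is seen in the same cell during the same time window, excluding one's own home and work cells. A sketch, assuming hypothetical (device, time bucket, cell) observations and precomputed home/work cells; each shared slot is counted once per device.

```python
from collections import Counter


def top_companions(observations, me, anchors, k=10):
    """Rank other devices by how many (time bucket, cell) slots they share with
    `me`, ignoring slots at my own home or work cells."""
    my_slots = {
        (bucket, cell)
        for device, bucket, cell in observations
        if device == me and cell not in anchors.get(me, set())
    }
    shared = Counter()
    counted = set()
    for device, bucket, cell in observations:
        slot = (bucket, cell)
        if device != me and slot in my_slots and (device, slot) not in counted:
            counted.add((device, slot))
            shared[device] += 1
    return shared.most_common(k)


obs = [  # hypothetical (device, hour bucket, cell) observations
    ("me", 1, "cafe"), ("dev-1", 1, "cafe"), ("dev-2", 1, "cafe"),
    ("me", 2, "gym"), ("dev-1", 2, "gym"), ("dev-1", 2, "gym"),
    ("me", 3, "office"), ("dev-3", 3, "office"),
]
anchors = {"me": {"home-cell", "office"}}  # assumed precomputed home/work cells
print(top_companions(obs, "me", anchors))  # [('dev-1', 2), ('dev-2', 1)]
```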
  • 52. Consequences: Space-time-travel data is the ultimate biometric. It will enable enormous opportunity. It will unravel one’s secrets. It will challenge existing notions of privacy. Adoption is now accelerating at a blistering pace.
  • 53. [Theatrical Pause]
  • 54. The G2 | Sensemaking Project
  • 55. The G2 Vision: 1) Evaluate each new observation against previous observations. 2) Determine whether what is being observed is relevant. 3) Deliver this actionable insight to its consumer … fast enough to do something about it while it is still happening. 4) Do this with sufficient accuracy and scale to really matter.
  • 56. Uniquely G2: Real “Context Computing”. Complete Context: contextualize diverse observations, each observation benefiting from the others. Current Context: real-time, incremental integration. Conflicting Context: high tolerance for disagreement, confusion and uncertainty. Self-Correcting Context: new observations able to reverse earlier assertions. Engineered from the ground up for cloud compute … in support of hemisphere-scale data. Introduce new data sources (e.g., geospatial), new entity types (e.g., vessels) and new features (e.g., MAC addresses) … without schema change or re-engineering. From sense to respond in under 200 ms: fast enough to do something about the transaction while it is still happening. An unprecedented number of Privacy by Design (PbD) features baked in.
  • 57. Privacy by Design (PbD): 1. Full Attribution; 2. Tamper Resistant Audit Log; 3. Information Transfer Accounting; 4. Data Tethering; 5. False Negative Favoring; 6. Self-Correcting False Positives; 7. Analytics on Anonymized Data. (http://jeffjonas.typepad.com/jeff_jonas/2012/06/privacy-by-design-in-the-era-of-big-data.html)
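Item 2 (Tamper Resistant Audit Log) can be approximated with a hash chain, a common technique for tamper-evident logs: each entry stores the SHA-256 hash of the previous entry, so editing, reordering or deleting an earlier entry breaks verification. This is a generic sketch, not a description of how G2 implements its audit log; it makes tampering detectable rather than impossible, and the log records shown are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64


def _digest(prev_hash, record):
    payload = prev_hash + json.dumps(record, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append(log, record):
    """Append a record chained to the hash of the previous entry."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"record": record, "prev": prev_hash, "hash": _digest(prev_hash, record)})


def verify(log):
    """Recompute every link. Edited, reordered or removed interior entries fail;
    detecting truncation of the tail requires storing the latest hash elsewhere."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash or entry["hash"] != _digest(prev_hash, entry["record"]):
            return False
        prev_hash = entry["hash"]
    return True


log = []
append(log, {"user": "analyst1", "action": "search", "query": "George Balston"})
append(log, {"user": "analyst1", "action": "export", "records": 1})
print(verify(log))  # True
log[0]["record"]["query"] = "edited"
print(verify(log))  # False
```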
  • 58. Example: Self-Correcting False Positive. Record 1: John T Smith Jr, 123 Main Street, 703 111-2000, DOB: 03/12/1984. Record 2: John T Smith, 123 Main Street, 703 111-2000, DL: 009900991. A plausible claim: these two people are the same … until record 3 comes into view: John T Smith Sr, 123 Main Street, 703 111-2000, DL: 009900991. This reveals the claim is a FALSE POSITIVE.
  • 59. Example: Self-Correcting False Positive. Record 1: John T Smith Jr, 123 Main Street, 703 111-2000, DOB: 03/12/1984. Records 2 and 3, now resolved together: John T Smith / John T Smith Sr, 123 Main Street, 703 111-2000, DL: 009900991. New Best Practice: FIXED IN REAL-TIME (not end of month).
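The Jr/Sr example can be sketched as re-resolving whenever a record arrives. This uses constrained agglomerative clustering, a generic technique swapped in for illustration rather than G2's method: pairs are merged strongest first, and clusters are never joined if that would put a Jr and a Sr together. The evidence weights (`WEIGHTS`) and `THRESHOLD` are hypothetical.

```python
from itertools import combinations

# Hypothetical evidence weights: a shared driver's license counts most.
WEIGHTS = {"address": 1, "phone": 1, "dl": 3}
THRESHOLD = 2  # hypothetical minimum score for a merge


def suffix(name):
    last = name.split()[-1]
    return last if last in ("Jr", "Sr") else None


def compatible(a, b):
    """Conflicting generational suffixes (Jr vs. Sr) forbid a merge."""
    sa, sb = suffix(a["name"]), suffix(b["name"])
    return sa is None or sb is None or sa == sb


def score(a, b):
    return sum(w for f, w in WEIGHTS.items() if f in a and a[f] == b.get(f))


def resolve(records):
    """Re-resolve all records from scratch: merge the strongest pairs first and
    never join clusters that would contain incompatible records."""
    cluster = {rid: {rid} for rid in records}
    pairs = sorted(
        ((score(records[a], records[b]), a, b) for a, b in combinations(records, 2)),
        reverse=True,
    )
    for s, a, b in pairs:
        ca, cb = cluster[a], cluster[b]
        if s < THRESHOLD or ca is cb:
            continue
        if all(compatible(records[x], records[y]) for x in ca for y in cb):
            merged = ca | cb
            for rid in merged:
                cluster[rid] = merged
    unique = {frozenset(c) for c in cluster.values()}
    return sorted(sorted(c) for c in unique)


records = {
    1: {"name": "John T Smith Jr", "address": "123 Main Street",
        "phone": "703 111-2000", "dob": "03/12/1984"},
    2: {"name": "John T Smith", "address": "123 Main Street",
        "phone": "703 111-2000", "dl": "009900991"},
}
print(resolve(records))  # [[1, 2]]: plausible, but a false positive
records[3] = {"name": "John T Smith Sr", "address": "123 Main Street",
              "phone": "703 111-2000", "dl": "009900991"}
print(resolve(records))  # [[1], [2, 3]]: corrected when record 3 arrives
```

With records 1 and 2 alone, address and phone are enough to merge them. When record 3 arrives, its shared driver's license outweighs that link, record 2 joins record 3, and the Jr/Sr constraint keeps record 1 apart. Re-running everything is quadratic; a real-time system would presumably re-evaluate only the records a new observation touches.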
  • 60. Use Cases. Maritime Domain Awareness: “New system lets authorities track suspicious ships” (http://www.asiaone.com/print/News/Latest%2BNews/Science%2Band%2BTech/Story/A1Story20130703-434337.html). Voter Registration Modernization: David Becker (Pew Charitable Trusts) and Jeff Jonas (IBM) Discuss How G2 Has Helped Modernize Voter Registration in America (http://ibmreferencehub.com/STG/ibm_executive_edge_2013/#gensession_daytwo_jonasbecker)
  • 61. Closing Thoughts
  • 62. Wish This on the Adversary: Enterprise Amnesia. [Chart over Time: Computing Power Growth, Available Observation Space, Context, Sensemaking Algorithms]
  • 63. Wish This for Yourself: Better Sensemaking Skills. [Chart over Time: Computing Power Growth, Available Observation Space, Context, Sensemaking Algorithms]
  • 64. State of the Union: Isolated Analytics. [Diagram: Observation Space; Structured Data Analytics, Unstructured Data Analytics, Social Network Analytics; Action]
  • 65. The Future: General Purpose Context Accumulation. Data Finds Data. Relevance Finds You. This is G2. [Diagram: Observation Space; Information In Context; Consumer (an analyst, a system, the sensor itself, etc.)]
  • 66. The most competitive organizations are going to make sense of what they are observing fast enough to do something about it while they are observing it.
  • 67. Related Blog Posts: Algorithms At Dead-End: Cannot Squeeze Knowledge Out Of A Pixel; Puzzling: How Observations Are Accumulated Into Context; Big Data. New Physics.; On A Smarter Planet … Some Organizations Will Be Smarter-er Than Others; Your Movements Speak for Themselves: Space-Time Travel Data is Analytic Super-Food!; When Federated Search Bites; Data Finds Data; Structuring Unstructured Data; Fantasy Analytics
  • 68. Questions? Email: jeffjonas@us.ibm.com Blog: www.jeffjonas.typepad.com Twitter: http://www.twitter.com/jeffjonas
  • 69. Big Data. New Physics. And Geospatial “Superfood”. Jeff Jonas, IBM Fellow, Chief Scientist, Context Computing.