Nick Patience, Director Product Marketing & Strategy at Recommind - Big Data: The role of The CIO in dealing with large amounts of unstructured information
1. BIG DATA: THE ROLE OF THE CIO IN DEALING WITH
LARGE AMOUNTS OF UNSTRUCTURED INFORMATION
Nick Patience, Director, product marketing & strategy
March 19 2013
RECOMMIND PROPRIETARY & CONFIDENTIAL | 1
2. ABOUT RECOMMIND
Founded in 2001
450+ employees
Recognized leader by top analyst
firms
˗ Gartner Magic Quadrant Leader
˗ IDC MarketScape Leader
Offices in San Francisco, Boston,
NYC, London, Bonn & Sydney
3. WHAT WE DO…
Software solutions &
infrastructure for large-
volume unstructured
information management
and analysis
4. PRODUCTS AND SOLUTIONS
[Diagram: product stack, top to bottom]
Solutions: Vertical Markets, Enterprise Applications, 3rd Party
CORE Platform: NoSQL Database, Analytics, Enriched Index
Enterprise Data: Databases, Machine Data, Office Documents, System Logs, Social Media, ESI, Email, Web, XML
6. AGENDA
Big Data and the importance of analysing both
structured and unstructured information
Role of the CIO in helping to alleviate risk &
compliance issues within the enterprise
How to categorise, find, manage and analyse
information from disparate repositories into one
overarching platform
7. BIG DATA RISKS &
OPPORTUNITIES
AND WHAT YOU CAN DO TO AVOID ONE AND EMBRACE THE OTHER
8. BIG DATA DEFINITION
“Data that exceeds the processing capacity of conventional database systems. The data is too big, moves too fast, or doesn’t fit the structures of your database architectures.”
Source: Edd Dumbill, Forbes
9. VOLUME, VARIETY AND VELOCITY
FURTHER DEFINE BIG DATA
VOLUME (petabytes)
North America: >3500
Europe: >2000
Japan: >400
China: >250
Middle East: >200
India: >50
Latin America: >50
VARIETY
People to people: email, social networks, blogs
People to machine: medical devices, ecommerce, bank transactions
Machine to machine: sensors, GPS, bar code scanners
VELOCITY
2.9 million emails sent every second
20 hours of video uploaded every minute
50 million tweets per day
Source: McKinsey, comScore, Radicati
10. TRADITIONAL VS. BIG DATA
TRADITIONAL DATA | BIG DATA
Gigabytes to Terabytes | Petabytes to Exabytes
Centralised | Distributed
Structured | Unstructured
Known Relationships | Complex, Undefined Interrelationships
11. MASSIVE GROWTH IN UNSTRUCTURED CONTENT
Worldwide Corporate Data Growth
80% of Data Growth is Unstructured
[Chart: data volume in exabytes (axis 0 to 45,000) by year, 2010 to 2020; series: Structured Data, Unstructured Data]
Source: IDC, The Digital Universe, Dec 2012
12. WHAT BIG DATA IS NOT…
[Image pair: THEN and NOW]
Doing what you did before, just scaled up
13. BUT…
You can do things now you could not do
a few years ago because of:
• New analytics techniques
• Faster, more powerful computers
• New architectures, such as the cloud
14. AGENDA
Big Data and the importance of analysing both
structured and unstructured information
Role of the CIO in helping to alleviate risk &
compliance issues within the enterprise
How to categorise, find, manage and analyse
information from disparate repositories into one
overarching platform
16. INFORMATION VALUE DECLINES OVER
TIME, COST AND RISK DO NOT
[Chart: information value, cost and risk over time; annotated gaps: risk-value delta, cost-value delta]
17. SPECIFIC BIG DATA RISKS
• Unmanaged file servers can pose legal and compliance risks
• Unmanaged information represents a risk because it makes information hard to find
• When litigation occurs, if information cannot be found, organisations may ultimately face court sanctions
• Compliance risks: sensitive client information that resides on unmanaged servers may be misused, lost or even destroyed
19. BIG DATA MARKET OPPORTUNITIES BY
INDUSTRY
Source: Gartner
20. BIG DATA INVESTMENTS BY INDUSTRY
[Chart: big data investment plans by industry; series: This Year, Next Year, Within 2 Years]
Source: Gartner
21. BIG DATA ANALYTICS - OPPORTUNITIES
Recommendation Engines: Match and recommend users to one another or to products and services by understanding profiles and buyer behavior.
Sentiment Analysis: Determine how consumers feel about particular companies, brands or products from Tweets and Facebook posts.
Marketing Campaign Analysis: Improve accuracy of forecasting and prediction of buyer behavior by reviewing increasingly granular data, click streams and call details.
Fraud Analytics: Identify fraudulent activity and stolen credit cards through active monitoring of customer behavior, historical and transaction data.
22. FRAUD, WASTE AND ABUSE (FWA) IN
HEALTHCARE
According to a 2010 white paper by the US National Health Care Anti-Fraud Association (NHCAA):
• The US Federal Bureau of Investigation (FBI) estimates that 3-10% of the $2.34 trillion spent on healthcare in 2008 was lost to fraud
• This represents $70-$234 billion annually
• $234 billion is roughly equivalent to the gross domestic product (GDP) of Finland
Source: http://www.nhcaa.org/media/5994/whitepaper_oct10.pdf
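The loss range quoted on this slide follows directly from the two inputs it cites. As a quick arithmetic check, here is a minimal sketch; the function and constant names are illustrative and do not come from the NHCAA paper.

```python
# Figures quoted on the slide: total 2008 US healthcare spend and the FBI's loss range.
TOTAL_SPEND_USD = 2.34e12
LOSS_SHARE_LOW, LOSS_SHARE_HIGH = 0.03, 0.10


def fraud_loss_range(total, low_share, high_share):
    """Return the (low, high) estimated annual loss in billions of US dollars."""
    return total * low_share / 1e9, total * high_share / 1e9


low_bn, high_bn = fraud_loss_range(TOTAL_SPEND_USD, LOSS_SHARE_LOW, LOSS_SHARE_HIGH)
print(f"Estimated loss: ${low_bn:.0f}-${high_bn:.0f} billion")
```

The lower bound works out to about $70.2 billion, which the slide rounds to $70 billion.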
23. BIG DATA ANALYTICS - OPPORTUNITIES
Customer Churn: Evaluate customer behavior to identify patterns that indicate which customers are most likely to leave for a competing vendor.
Network Management: Ingest data from servers, storage devices and other hardware to monitor network activity, diagnose bottlenecks.
Contract Analysis: Mine large volumes of transactional data and documentation to determine risk and exposure of financial assets.
Patent Analysis: Comb through enormous volumes of text-based information and prior art to assist in the development of new products, guide portfolio strategies.
24. MITIGATING BIG DATA RISKS
THROUGH DEFENSIBLE DELETION
25. THE LIFECYCLE OF DATA & DEFENSIBLE DELETION
26. DEFENSIBLE DELETION: PIPE DREAM OR
REALITY?
• Survey by Enterprise Strategy Group in Q4 2012
• 253 business and IT professionals familiar with their organisation’s data
disposition policies (all organisations currently dispose of data)
- 36% IT professionals
- 64% business professionals
• Midmarket (100 to 999 employees) and enterprise-class (1,000+
employees) organisations
- 32% midmarket
- 68% enterprise-class
• Multiple verticals
35. PERCENT OF TOTAL AMOUNT OF DATA
DISPOSED OF ANNUALLY
On average, what percentage of your organisation’s total amount of data do you
estimate is disposed of on an annual basis? (N=171)
[Bar chart: percentage of respondents per answer band. Bands: Less than 5%, 5% to 10%, 11% to 15%, 16% to 20%, 21% to 25%, More than 25%. Bar values, highest to lowest: 37%, 25%, 16%, 11%, 9%, 2%]
37. AGENDA
Big Data and the importance of analysing both
structured and unstructured information
Role of the CIO in helping to alleviate risk &
compliance issues within the enterprise
How to categorise, find, manage and analyse
information from disparate repositories into one
overarching platform
38. CUSTOMER CASE STUDY –
US DEPT. OF ENERGY
39. CASE STUDY – US DEPARTMENT OF ENERGY
The Challenge
We have thousands of users
generating many records a day. How
do we manage this information like an
asset so that it can be useful and we
comply with the government’s records
management mandate?
40. EMAIL & RECORDS MANAGEMENT AT US
DEPARTMENT OF ENERGY
Need to preserve history
Importance of vital records for continuity of operations if
emergencies arise
Need to provide copies of records for legal actions or FOIA
legal requests
Lack of motivation to categorize content
41. AUTOMATIC CATEGORISATION APPROACH
[Diagram: Journaling -> Drop-Off Library (uncategorized content) -> Auto Categorization -> categorized content -> Content Organizer -> move to Site 1, Site 2, Site 3 or Site 4]
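The flow on this slide (uncategorized content arrives, is categorized automatically, then is moved to a destination) can be illustrated with a toy rule-based router. This is only a sketch under stated assumptions: it is not the Department of Energy's or Recommind's implementation, the keywords are invented, and the names `RULES`, `categorize` and `route` are hypothetical. The category names are borrowed from the accuracy slide that follows.

```python
from collections import Counter

# Hypothetical keyword rules; the category names come from the accuracy slide,
# the keywords are invented for illustration only.
RULES = {
    "Budget Records": ("budget", "appropriation", "forecast"),
    "Travel Records": ("itinerary", "per diem", "travel"),
    "Procurement Records": ("purchase order", "vendor", "invoice"),
}
DROP_OFF = "Uncategorized"


def categorize(text: str) -> str:
    """Return the category whose keywords occur most often, else DROP_OFF."""
    lowered = text.lower()
    hits = Counter()
    for category, keywords in RULES.items():
        hits[category] = sum(lowered.count(k) for k in keywords)
    category, count = hits.most_common(1)[0]
    return category if count > 0 else DROP_OFF


def route(emails):
    """Group emails by destination, mimicking the 'move' step to a target site."""
    sites = {}
    for email in emails:
        sites.setdefault(categorize(email), []).append(email)
    return sites


for site, items in route(["Vendor invoice attached", "Lunch on Friday?"]).items():
    print(site, "<-", items)
```

Anything that matches no rule stays in the uncategorized bucket, which plays the role of the drop-off library here. Keyword rules of this kind are brittle, which is the usual argument for learned classifiers.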
42. IMPACT
Requires one system administrator/engineer &
two people to manage the electronic records
center – for 1,000 users
End users more productive due to no longer
having to categorize content
43. EMAIL CATEGORISATION ACCURACY
[Bar chart: categorisation accuracy (axis 0% to 100%) by record category: Administrative Notices, Budget Records, Customer Service Improvement, IT Management, Procurement Records, Travel Records; legend: Rule-Based; average accuracy: 86%]
44. SIMILAR USE CASES
Email compliance in financial services
˗ Email archiving captures emails from target employees
˗ Random sampling & manual review of emails
˗ Automatic sampling, initial review & assignment to senior reviewers is more cost and time
efficient, accurate & defensible
Predictive coding in e-Discovery
˗ Predictive Sampling to estimate the percentage of responsive documents
˗ Predictive Analytics (Concepts, Phrases, Smart Filters) to find potentially relevant documents
˗ Complete iterative cycle until zero documents are computer-suggested or responsive
˗ Use Predictive Sampling to QC the non-reviewed documents
Predictive modelling in healthcare:
˗ Find at-risk patients using guided data mining against a pre-built, validated predictive model for a
specific issue such as hospital-acquired conditions
˗ Predict the patients who should be isolated upon arrival, and the most reliable approach to
screening
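The Predictive Sampling step listed under e-Discovery above amounts to estimating a proportion from a random sample. The sketch below shows one common way to do that, using a simple random sample and a normal-approximation confidence interval. It is an assumption-laden illustration, not Recommind's method: the corpus is synthetic and the name `estimate_responsive_rate` is hypothetical.

```python
import math
import random


def estimate_responsive_rate(labels, sample_size, z=1.96, seed=0):
    """Estimate the share of responsive documents from a simple random sample.

    labels: list of booleans standing in for reviewer decisions (synthetic data).
    Returns (point_estimate, lower_bound, upper_bound), normal approximation.
    """
    rng = random.Random(seed)
    sample = rng.sample(labels, sample_size)
    p = sum(sample) / sample_size
    margin = z * math.sqrt(p * (1 - p) / sample_size)
    return p, max(0.0, p - margin), min(1.0, p + margin)


# Hypothetical corpus: 10,000 documents, 1,500 of them responsive.
corpus = [True] * 1500 + [False] * 8500
rate, low, high = estimate_responsive_rate(corpus, sample_size=400)
print(f"estimated responsive rate: {rate:.1%} (95% CI {low:.1%} to {high:.1%})")
```

The same routine could be pointed at the non-reviewed set for the QC step: a low estimated rate there suggests few responsive documents were missed.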
45. DIFFERENT USE CASES, DIFFERENT ROIs
Optimize Value
• Predictive Analytics
• Operational Efficiencies
• Business Intelligence
Lower Costs
• Storage Management
• Personnel optimization
• Operational efficiencies
Minimize Risk
• Security Breaches
• eDiscovery Costs
• Data Leakage
• Regulatory Inquiries
46. CUSTOMER CASE STUDY –
SWISS RE INSURANCE
47. SWISS RE - ACCESS, COMPLIANCE & EDISCOVERY
100s of TB data
Index once, use many
True 360 degree view of enterprise data
Based on CORE platform
Custom-built apps
[Diagram: CORE platform components: NoSQL Database, Analytics, Enriched Index]
48. PRODUCTS AND SOLUTIONS
[Diagram: product stack, top to bottom]
Solutions: Vertical Markets, Enterprise Applications, 3rd Party
CORE Platform: NoSQL Database, Analytics, Enriched Index
Enterprise Data: Databases, Machine Data, Office Documents, System Logs, Social Media, ESI, Email, Web, XML
49. WHAT MAKES CORE UNIQUE?
FIND: Powerful and scalable indexing and retrieval; keyword and language-agnostic machine learning
CONNECT: Unstructured information “joins”
ANALYSE: Unstructured data extraction & analytics
ACT: Delivers ability to confidently act on data
50. SUMMARY
Big Data and the importance of analysing both structured and
unstructured information
˗ What it is
˗ What it is not
˗ Risks & opportunities
Role of the CIO in helping to alleviate risk & compliance issues
within the enterprise
˗ Defensible deletion
˗ Categorization – US Dept of Energy
How to categorise, find, manage and analyse information from
disparate repositories into one overarching platform
˗ CORE platform
˗ Swiss Re
Unmanaged file servers can pose legal and compliance risks and may cause vast disruption if a company is asked to produce documents to a court or for an internal enquiry, and this is an essential starting point in that process. Unmanaged information also represents a risk because it makes it hard to find information, particularly when much of that information is unstructured. When litigation occurs, if information cannot be found, organisations may ultimately face court sanctions or settle on punitive terms. Compliance risks: sensitive client information that resides on unmanaged servers may be misused, lost or even destroyed.
Who is answering this - ?
How do we preserve email in accordance with the Department’s records management policies and regulatory requirements?
• Up to 10k users with thousands of emails a day whose emails are not being classified
• Data everywhere
• Over 50,000 emails received a day
• Inconsistent classification when done manually
• Lack of ownership
Requirements
• Easy for employees to use
• Complies with the Department’s RM policies
• Automatically categorizes records
• Operates within the Department’s information architecture
• Is easily modifiable to meet future needs