
Using Machine Learning to Capture Data Meaning and Wrangle it to Liberate its Value

3,413 views


There’s no doubt that everyone wants access to data to improve operations, make better decisions, and innovate to make a difference. The good news is that data exists; indeed, it is everywhere. The bad news is that it is hard to figure out what data is available, what it means, and whether it is trustworthy. If you’re lucky enough to solve this first challenge, you’ll then face disparate formats, incompatible relationships, and varying levels of quality that will prevent you from turning data into real insight and value.
Gianthomas Volpe from Alation and Bertrand Cariou from Trifacta will explain how they have solved these incredibly complex challenges using applied machine learning in their solutions and leveraging the unlimited scaling capabilities of Hadoop.

Published in: Technology


  1. Using Machine Learning to Capture Data Meaning and Wrangle it to Liberate its Value. Gianthomas Volpe, Head of European Customer Development, Alation; Bertrand Cariou, Senior Director Solutions, Trifacta
  2. Alation: Data Discovery & Curation Across the Enterprise
  3. Overwhelming Path through Systems and Documentation: Spreadsheets; Databases; Hadoop; Wikis; Emails, Chat; Business Glossary; Source Code; BI Tools
  4. Complexity Leads to Tribal Knowledge and Silos. Where is the finance data? How do you join these tables? How is that KPI calculated? Is this field up to date? Is this report broken? Who is an expert on this? What does that mean? Is there an approved query? Is it ok to share this data?
  5. Modern catalogs enable you to self serve with context. [logo] helps you find recommended hotels and restaurants for your vacation; [logo] helps you learn about your entire professional network; [logo] helps you quickly find the most relevant pages on the internet. Learn from a Broader Community; Trust the Information You Find; Scale with Growing Data Volumes
  6. Tap into the Knowledge of the Entire Organization: Find and converse with data experts; Embedded Glossaries and Data Wikis; Receive usage-based recommendations
  7. Learn about and trust the information you get: Understand who uses the data; See where the data came from; Find annotations and flags
  8. Scale across all growing data environments: Search enriched by popularity rankings and annotations; Smart Data Documentation Suggestions; Automated Catalog Creation and Updates
  9. Trifacta: Self-Service Data Preparation
  10. Before Data Can Be Reported, It Has to Be Prepared. To drive more value here, you have to make this more efficient. Stages shown: Data; Discovery; Structuring; Cleaning; Enriching & Blending; Optimizing & Publishing. Labels shown: 80% of the time spent; 20%
  11. Before Data Can Be Reported, It Has to Be Prepared. To drive more value here, you have to make this more efficient. Stages shown: Data; Discovery; Structuring; Cleaning; Enriching & Blending; Optimizing & Publishing. Labels shown: 20% of the time spent; 20%. Applied Machine Learning to User Experience
  12. Machine Learning at Every Step of the Data Preparation Process: Clean; Structure; Enrich; Validate; Publish; Discover. Also shown: Business Data; Machine Generated; Third-party data; Reporting & modeling; Data Analyst
  13. Trifacta Self-Service Experience: VISUAL; INTERACTIVE; SCALABLE; PREDICTIVE
  14. Usability Inspiration: Hints of Intelligent Interfaces. Type-ahead uses context and data to predict your search term and preview results
  15. VISUAL: data insight and complex manipulation made easy. Identifies multiple data types out of the box, even on unstructured data. Identifies outliers that take time to find and skew your results. Groups records based on patterns in the data to harmonize datasets. *Potter’s Wheel: An Interactive Data Cleaning System; Raman, Hellerstein; University of California, Berkeley (2001)
  16. INTERACTIVE: proactive suggestions and visual feedback reduce cycle time. Automatically identifies join keys for blending multiple data sets. Proactively suggests data preparation tasks based on interaction. Previews results to immediately see the impact & accuracy of transformations. *10X faster results found in research. *Predictive Interaction for Data Transformation; Heer, Hellerstein, Kandel; Stanford University & University of California, Berkeley (2015)
  17. Alation + Trifacta: Self-Service Data Democratization
  18. Alation + Trifacta = Self-Service Data Democratization. Metadata, Logs & APIs; Search, Discover & Collaborate; Data Wrangling; Analysis & Consumption. Data lakes bring tremendous value to organizations. Trifacta + Alation allows for an open and integrated stack for wrangling, discovery and governance
  19. Search Alation Catalog from Trifacta
  20. Wrangle Data from the Alation Catalog
  21. Alation Lineage Integration with Trifacta
  22. Alation + Trifacta Demo. INSURANCE ANALYTICS: Claims Analytics, Risk Modeling and Loss Forecasting
  23. Claim Analysis Demo
      • Calculate paid and incurred claims amounts
      • Flatten a development triangle and extrapolate a 5-year payment
      • Claim Reporting: report claim difference between prior and current year, paid and outstanding amounts for each LOB, etc.
      • Claim Exceptions: closed claims, new claims, missing claims
  24. Preparing Data for Claims Risk Modeling and Loss Forecasting: Clean Current Claims; Identify Claim Exceptions; Calculate paid and incurred delta; Calculate the development factor; Extrapolate Payment for 5 years; Automate
  25. Using Machine Learning to Capture Data Meaning and Wrangle it to Liberate its Value. Gianthomas Volpe, Head of European Customer Development, Alation; Bertrand Cariou, Senior Director Solutions, Trifacta
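Slides 23 and 24 name the demo steps (flatten a development triangle, calculate the development factor, extrapolate payments for 5 years) without showing the arithmetic. The following is a minimal sketch of how those steps might look, assuming the standard volume-weighted chain-ladder method; the deck does not say which method the demo used. The triangle values and all function names are hypothetical and are not taken from Trifacta or Alation.

```python
# Illustrative sketch only: made-up data and hypothetical function names.
# Assumes the standard volume-weighted chain-ladder method, which the deck
# does not confirm as the method used in the demo.

# Cumulative paid claims per origin year, one entry per development year.
triangle = {
    2012: [100.0, 150.0, 180.0, 198.0, 198.0],
    2013: [200.0, 300.0, 360.0, 396.0],
    2014: [300.0, 450.0, 540.0],
    2015: [400.0, 600.0],
    2016: [500.0],
}


def flatten_triangle(tri):
    """Turn the triangle into long-format rows: (origin_year, dev_year, paid)."""
    return [
        (origin, dev, paid)
        for origin, row in sorted(tri.items())
        for dev, paid in enumerate(row, start=1)
    ]


def development_factors(tri):
    """Volume-weighted age-to-age factors between consecutive development years."""
    n_dev = max(len(row) for row in tri.values())
    factors = []
    for dev in range(n_dev - 1):
        # Use only origin years where both development years are observed.
        rows = [r for r in tri.values() if len(r) > dev + 1]
        factors.append(sum(r[dev + 1] for r in rows) / sum(r[dev] for r in rows))
    return factors


def extrapolate(tri, factors):
    """Project each origin year out to the last development year."""
    completed = {}
    for origin, row in tri.items():
        full = list(row)
        for dev in range(len(row) - 1, len(factors)):
            full.append(full[-1] * factors[dev])
        completed[origin] = full
    return completed


factors = development_factors(triangle)
completed = extrapolate(triangle, factors)

for origin, row in sorted(completed.items()):
    print(origin, [round(v, 1) for v in row])
```

The long format returned by `flatten_triangle` is the shape most reporting tools expect, while the factor calculation is easier on the triangle itself, which is why the sketch keeps both representations.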

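Slide 16 says the product automatically identifies join keys for blending data sets, but the deck does not describe how. As a rough illustration of the general idea only, the sketch below ranks column pairs by value overlap, a common heuristic. It is not Trifacta's algorithm, and the table contents, column names, `candidate_join_keys`, and the `min_overlap` threshold are all hypothetical.

```python
# Illustrative heuristic only: not the vendor's method. Tables are modeled as
# plain dicts of column name -> list of values; all names here are made up.

claims = {
    "claim_id": [1, 2, 3, 4],
    "policy_no": ["P1", "P2", "P2", "P3"],
    "paid": [100.0, 250.0, 80.0, 0.0],
}
policies = {
    "policy_no": ["P1", "P2", "P3", "P4"],
    "lob": ["Auto", "Home", "Auto", "Home"],
}


def candidate_join_keys(left, right, min_overlap=0.8):
    """Rank (left column, right column) pairs by the share of distinct
    left values that also appear in the right column."""
    scores = []
    for lcol, lvals in left.items():
        lset = {v for v in lvals if v is not None}
        if not lset:
            continue
        for rcol, rvals in right.items():
            rset = {v for v in rvals if v is not None}
            overlap = len(lset & rset) / len(lset)
            if overlap >= min_overlap:
                scores.append((overlap, lcol, rcol))
    # Highest overlap first.
    return sorted(scores, reverse=True)


matches = candidate_join_keys(claims, policies)
print(matches)
```

A real implementation would also need to weigh data types, uniqueness, and column names; overlap alone can suggest spurious keys on small integer columns.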