Speakers
Simon Elliston Ball – Solutions Architect, Hortonworks
Adam Morton – Enterprise Data Architect, Admiral Group plc
• Over 10 years' experience in Data Warehousing, Business Intelligence and Analytics
• Working at Admiral for the past 2 years delivering a greenfield Enterprise Data Warehouse as part of an overall Data Architecture modernisation programme
The Admiral Group
Admiral Group has grown from a small start-up to one of the largest car insurance providers in the UK, with a presence in seven countries.
Our strategy is simple: to continue to progress in the UK car insurance market whilst taking what we do well to new markets and products; keep doing what we're doing and do it better year after year.
Admiral – International Operations
Admiral employs more than 7,000 people at its offices in the UK, Spain, Italy, France, USA, Canada and India.
"People who like what they do, do it better"
R&D at Admiral
• A strong history of using data to drive innovation, which needs to continue
• A new function aimed at testing and learning through technology
• Time-boxed iterative efforts of no more than 4–6 weeks
• Fail-fast approach: either success or failure can end a PoC early
• Understand ‘Big Data’ and trial Hadoop ecosystem projects
Why Telematics?
• Scalability – A product with large potential and potentially huge data volumes
• Timeliness – Data and scoring were processed in batch; how quickly could this be done?
• Granularity – Suppliers provide aggregated data; could map matching be improved?
• Event Notification – Can we respond quickly to near-real-time (NRT) events in the data?
• Data Enrichment – Opportunity to uncover further insights by integrating additional data sources
Objectives of the Telematics PoC
• Scalability – Prove that data storage and high-performance analytics can be accomplished cost-effectively on large data sets
• Timeliness – Reduce scoring time
• Data Enrichment
• NRT data processing – acting on events such as proximity to an airport
• Improve stability and flexibility
• Test the viability of a cloud solution
• Data Visualisation
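Acting on proximity to an airport amounts to a geofence check on each incoming GPS fix. A minimal sketch, assuming fixes arrive as decimal-degree latitude/longitude; the `Airport` type, the example coordinates and the 5 km radius are hypothetical, and the haversine great-circle distance is one common choice rather than necessarily the method the PoC used.

```python
import math
from typing import NamedTuple, Optional, Sequence

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


class Airport(NamedTuple):
    name: str
    lat: float  # decimal degrees
    lon: float  # decimal degrees


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def nearby_airport(lat: float, lon: float, airports: Sequence[Airport],
                   radius_km: float = 5.0) -> Optional[Airport]:
    """Return the closest airport within radius_km of the fix, or None."""
    best, best_dist = None, radius_km
    for ap in airports:
        dist = haversine_km(lat, lon, ap.lat, ap.lon)
        if dist <= best_dist:
            best, best_dist = ap, dist
    return best


# Hypothetical airport list; a real job would load reference data.
AIRPORTS = [Airport("Example Airport", 51.40, -3.34)]

fix_near = nearby_airport(51.41, -3.35, AIRPORTS)  # about 1.3 km away
fix_far = nearby_airport(52.00, -3.34, AIRPORTS)   # about 67 km away
```

In a streaming job the same check would run per fix; emitting an event only when the result changes from `None` to an airport would avoid repeated alerts for one visit.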
Technical Challenges – Networking and Security
• Privacy-sensitive data
• Third-party sources
• Real-time data
8 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
There’s a VPN, it will be fine!
[Diagram: Telematics Provider DC, Third Party vNET, Admiral vNET, External Users, Internal Users]
Kafka SSL
[Diagram: Telematics Provider DC, Admiral vNET, Kafka (K) over SSL, External Users, Internal Users]
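Connecting to Kafka over SSL means every client needs TLS settings. A minimal sketch of the client-side configuration, built in Python and rendered as Java `.properties` lines; the property names are standard Kafka client settings, while the broker address, file paths, passwords and the optional use of client certificates (mutual TLS) are assumptions, since the slides do not say how the PoC configured them.

```python
from typing import Dict, Optional


def kafka_ssl_client_properties(
    bootstrap_servers: str,
    truststore_path: str,
    truststore_password: str,
    keystore_path: Optional[str] = None,
    keystore_password: str = "",
    key_password: str = "",
) -> Dict[str, str]:
    """Kafka client settings for a TLS-encrypted connection.

    The keystore entries are only needed when the broker also requires
    client certificates (mutual TLS); that is an assumption here.
    """
    props = {
        "bootstrap.servers": bootstrap_servers,
        "security.protocol": "SSL",
        "ssl.truststore.location": truststore_path,  # CA certs used to trust the broker
        "ssl.truststore.password": truststore_password,
    }
    if keystore_path is not None:
        props["ssl.keystore.location"] = keystore_path  # this client's own certificate
        props["ssl.keystore.password"] = keystore_password
        props["ssl.key.password"] = key_password
    return props


def to_properties_text(props: Dict[str, str]) -> str:
    """Render settings as key=value lines, sorted for stable output."""
    return "\n".join(f"{key}={value}" for key, value in sorted(props.items()))


# Hypothetical values for illustration only.
client_props = kafka_ssl_client_properties(
    "kafka.example.internal:9093",
    "/etc/kafka/client.truststore.jks",
    "changeit",
)
print(to_properties_text(client_props))
```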
Ingest with NiFi
[Diagram: Telematics Provider DC, Other Providers, Kafka (K), HDF, Admiral vNET, External Users, Internal Users]
Real-time Scoring
• Clean-up done in NiFi
– Basic data correctness
– Format changes
• Fed to Kafka
• Spark Streaming
– Near-real-time requirement
– Mixing Scala RDD and DataFrame code
– Integrating with a map matching library
• Output fed into Kafka
– Kafka-to-WebSockets bridge for real-time visualisation
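The clean-up the slides assign to NiFi (basic data correctness and format changes) can be sketched as a per-record function. This is plain Python rather than NiFi processors, and the input schema (`device_id`, epoch-second `ts`, `lat`, `lon`) is hypothetical, since the provider's format is not described; treating (0, 0) as a missing fix is likewise an assumption.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def clean_record(raw: str) -> Optional[str]:
    """Validate one raw JSON GPS fix and normalise its format.

    Returns a canonical JSON string, or None if the record should be dropped.
    """
    try:
        rec = json.loads(raw)
        device = str(rec["device_id"]).strip()
        lat, lon = float(rec["lat"]), float(rec["lon"])
        # Format change: epoch seconds -> ISO-8601 UTC timestamp
        timestamp = datetime.fromtimestamp(int(rec["ts"]), tz=timezone.utc).isoformat()
    except (KeyError, TypeError, ValueError, OverflowError, OSError):
        return None  # malformed JSON, missing field, bad number or bad timestamp

    # Basic data correctness checks (NaN fails both range checks)
    if not device or not (-90.0 <= lat <= 90.0) or not (-180.0 <= lon <= 180.0):
        return None
    if lat == 0.0 and lon == 0.0:
        return None  # assumed "no GPS fix" placeholder

    # Fixed key order and rounded coordinates for downstream consumers
    return json.dumps(
        {"device_id": device, "timestamp": timestamp,
         "lat": round(lat, 6), "lon": round(lon, 6)},
        sort_keys=True,
    )


cleaned = clean_record('{"device_id": " abc ", "ts": 0, "lat": 51.5, "lon": -3.2}')
```

Records that return `None` would be routed to a failure path rather than fed to Kafka, so the streaming scorer only sees well-formed fixes.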
Batch Scoring
• More Spark!
• Zeppelin for ease of use and interactivity
• Productionised into batch Spark jobs
SAS on Hive
• Spark as ETL engine
• Hive for large-scale processing
• SAS connector using Hive
• ORC as a file format
– Significantly smaller than JSON
– Much faster to process
Technical Challenges – Map Matching
• GPS data is messy
• Open data sources based on road networks
• Snapping to the nearest road is fast, but not very accurate
• Hidden Markov Models: know where you're going, and where you've been.
• Open source to the rescue…
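The Hidden Markov Model approach treats each GPS fix's candidate road positions as hidden states and picks the most likely sequence with the Viterbi algorithm, so a single noisy fix cannot jump the trace onto a parallel road. A minimal sketch of that idea, not Barefoot's implementation: the Gaussian emission score (parameter `sigma`), the exponential transition score comparing route distance with straight-line distance (parameter `beta`), the caller-supplied `route_dist` function and the toy two-road network are all assumptions chosen for illustration.

```python
from typing import Callable, Dict, Hashable, List, Sequence, Tuple

# (candidate road position id, distance from the GPS fix to that position in metres)
Candidate = Tuple[Hashable, float]


def viterbi_map_match(
    candidates: Sequence[Sequence[Candidate]],
    straight_dist: Sequence[float],
    route_dist: Callable[[Hashable, Hashable], float],
    sigma: float = 5.0,
    beta: float = 10.0,
) -> List[Hashable]:
    """Most likely road position for each GPS fix (every fix needs >= 1 candidate).

    straight_dist[t] is the straight-line distance from fix t-1 to fix t;
    route_dist(a, b) is the driving distance between road positions a and b.
    """
    def emission(d: float) -> float:
        # Log-probability (up to a constant) of Gaussian GPS noise
        return -0.5 * (d / sigma) ** 2

    def transition(a: Hashable, b: Hashable, t: int) -> float:
        # Penalise routes much longer or shorter than the straight line
        return -abs(route_dist(a, b) - straight_dist[t]) / beta

    # score[c] = best log-probability of any path ending in candidate c
    score: Dict[Hashable, float] = {c: emission(d) for c, d in candidates[0]}
    backpointers: List[Dict[Hashable, Hashable]] = []
    for t in range(1, len(candidates)):
        new_score: Dict[Hashable, float] = {}
        pointers: Dict[Hashable, Hashable] = {}
        for c, d in candidates[t]:
            prev = max(score, key=lambda p: score[p] + transition(p, c, t))
            new_score[c] = score[prev] + transition(prev, c, t) + emission(d)
            pointers[c] = prev
        score = new_score
        backpointers.append(pointers)

    # Walk the back-pointers from the best final state
    best = max(score, key=score.get)
    path = [best]
    for pointers in reversed(backpointers):
        best = pointers[best]
        path.append(best)
    return path[::-1]


# Toy example: road A and a parallel road B 20 m away; fixes every 100 m on A,
# but the middle fix is noisy and lies closer to B.
candidates = [
    [(("A", 0), 3.0), (("B", 0), 17.0)],
    [(("A", 1), 12.0), (("B", 1), 8.0)],
    [(("A", 2), 2.0), (("B", 2), 18.0)],
]
straight = [0.0, 100.0, 100.0]  # index 0 is unused


def route(a, b):
    # Hypothetical network: 100 m along the same road, 600 m detour to switch
    return 100.0 if a[0] == b[0] else 600.0


nearest = [min(cands, key=lambda cd: cd[1])[0] for cands in candidates]
matched = viterbi_map_match(candidates, straight, route)
```

Nearest-road snapping picks road B for the middle fix; the HMM keeps the whole trace on road A because switching roads would imply an implausible 600 m detour between consecutive fixes.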
Barefoot – Map Matching
• https://github.com/bmwcarit/barefoot
• Docker-based service
• PostGIS map server loaded from OSM data
• Serializable map, distributed in Spark
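Distributing a serializable map means building it once, shipping the serialized bytes to each worker and deserializing once per partition rather than once per GPS point; in Spark this is the role of broadcast variables and `mapPartitions`. A minimal single-process sketch of the pattern using `pickle`; the toy road data, the `RoadMap` class and its nearest-vertex lookup are hypothetical stand-ins for Barefoot's map and matcher, and plain Python lists stand in for Spark partitions.

```python
import pickle
from typing import Dict, Iterable, Iterator, List, Tuple

Point = Tuple[float, float]  # planar coordinates in metres


class RoadMap:
    """Toy road map: road id -> polyline vertices."""

    def __init__(self, roads: Dict[str, List[Point]]):
        self.roads = roads

    def nearest_road(self, p: Point) -> str:
        # Nearest vertex only; a real matcher projects onto road segments.
        def sq_dist(road: str) -> float:
            return min((p[0] - x) ** 2 + (p[1] - y) ** 2 for x, y in self.roads[road])
        return min(self.roads, key=sq_dist)


def match_partition(map_bytes: bytes, points: Iterable[Point]) -> Iterator[str]:
    """Worker side, like a mapPartitions function: deserialize once, reuse per point."""
    road_map = RoadMap(pickle.loads(map_bytes))
    for p in points:
        yield road_map.nearest_road(p)


# Driver side: serialize the road data once (plain dicts and lists pickle safely).
roads = {"A": [(0, 0), (100, 0), (200, 0)], "B": [(0, 50), (100, 50), (200, 50)]}
map_bytes = pickle.dumps(roads)

# Each inner list stands in for one Spark partition of GPS points.
partitions = [[(10, 5), (150, 45)], [(90, 20)]]
results = [list(match_partition(map_bytes, part)) for part in partitions]
```

Deserializing per partition keeps the cost proportional to the number of partitions, not the number of GPS points, which matters when the map is large.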
Next Steps
• Completing knowledge transfer workshops with Hortonworks
• How do we move from a PoC to production-ready?
• Establishing an in-house R&D function
• Deciding on the tools and frameworks to use within a PoC environment in the future

TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demoHarshalMandlekar2
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 

Recently uploaded (20)

Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 

Admiral Group

  • 1. Speakers: Simon Elliston Ball – Solutions Architect, Hortonworks; Adam Morton – Enterprise Data Architect, Admiral Group plc • Over 10 years' experience in Data Warehousing, Business Intelligence and Analytics • Working at Admiral for the past 2 years, delivering a greenfield Enterprise Data Warehouse as part of an overall Data Architecture modernisation programme
  • 2. The Admiral Group: Admiral Group has grown from a small start-up to one of the largest car insurance providers in the UK, with a presence in seven countries. Our strategy is simple: to continue to progress in the UK car insurance market while taking what we do well to new markets and products. Keep doing what we’re doing and do it better year after year.
  • 3. Admiral – International Operations Admiral employs more than 7,000 people at its offices in the UK, Spain, Italy, France, USA, Canada and India. "People who like what they do, do it better"
  • 4. R&D at Admiral • Strong history of using data to drive innovation, which needs to continue • New function aimed at testing and learning through technology • Time-boxed iterative efforts of no more than 4–6 weeks • Fail-fast approach: success or failure can end a PoC early • Understand ‘Big Data’ and trial Hadoop ecosystem projects
  • 5. Why Telematics? • Scalability – A product with large growth potential and possibly huge data volumes • Timeliness – Data and scoring were processed in batch; how quickly can this be done? • Granularity – Suppliers provide aggregated data; could map matching be improved? • Event Notification – Can we respond quickly to near-real-time (NRT) events in the data? • Data Enrichment – Opportunity to uncover further insights by integrating with interesting data sources
  • 6. Objectives of the Telematics PoC • Scalability – Prove that data storage and high-performance analytics can be accomplished cost-effectively on large data sets • Timeliness – Reduce scoring time • Data Enrichment • NRT data processing – acting on events such as proximity to an airport • Improve stability and flexibility • Test the viability of a cloud solution • Data Visualisation
  • 7. Technical Challenges – Networking and Security • Privacy-sensitive data • Third-party sources • Real-time data
  • 8. © Hortonworks Inc. 2011 – 2016. All Rights Reserved. There’s a VPN, it will be fine! (Diagram: Admiral vNET, Third Party vNET, Telematics Provider DC, External Users, Internal Users)
  • 9. Kafka SSL (Diagram: Admiral vNET, Telematics Provider DC, External Users, Internal Users, Kafka (K) connections secured with SSL)
  • 10. Ingest with NiFi (Diagram: Admiral vNET, Telematics Provider DC, External Users, Internal Users, Kafka (K), HDF, Other Providers)
  • 11. Real-time Scoring • Clean-up done in NiFi: basic data correctness, format changes • Fed to Kafka • Spark Streaming: near-real-time requirement; mixing Scala RDD and DataFrame code; integrating with a map matching library • Output fed into Kafka: Kafka-to-WebSockets bridge for real-time visualization
  • 12. Batch Scoring • More Spark! • Zeppelin for ease of use and interactive work • Productionized into batch Spark jobs
  • 13. SAS on Hive • Spark as the ETL engine • Hive for large-scale processing • SAS connector using Hive • ORC as the file format: significantly smaller than JSON and much faster to process
  • 14. Technical Challenges – Map Matching • GPS data is messy • Open data sources are based on roads • Snapping to the nearest road is fast, but not very accurate • Hidden Markov Models: know where you’re going, and where you’ve been • Open source to the rescue…
  • 15. Barefoot – Map Matching • https://github.com/bmwcarit/barefoot • Docker-based service • PostGIS map server loaded from OpenStreetMap (OSM) data • Serializable map, distributed in Spark
  • 16. Next Steps • Completing knowledge-transfer workshops with Hortonworks • How do we move from a PoC to a production-ready system? • Establishing an in-house R&D function • Deciding which tools and frameworks to use in future PoC environments
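Slide 6 lists acting on NRT events "such as proximity to an airport" as a PoC objective, but the deck does not say how that check was implemented. A minimal sketch, assuming a circular geofence tested with the haversine great-circle distance, could look like the Python below. The fence list, the approximate Cardiff Airport coordinates and the 2 km radius are illustrative assumptions, not values from the talk.

```python
import math

# Mean Earth radius in kilometres (spherical approximation).
EARTH_RADIUS_KM = 6371.0088

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two WGS84 points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

# Hypothetical geofences: (name, lat, lon, radius_km). Coordinates are approximate.
AIRPORT_FENCES = [
    ("Cardiff Airport", 51.3967, -3.3433, 2.0),
]

def airport_events(point, fences=AIRPORT_FENCES):
    """Return the names of every geofence that contains the (lat, lon) point."""
    lat, lon = point
    return [name for name, f_lat, f_lon, radius in fences
            if haversine_km(lat, lon, f_lat, f_lon) <= radius]
```

In a streaming job, a function like this would be applied to each incoming GPS fix, and any non-empty result would be emitted as an event. With many fences, a spatial index would normally replace the linear scan.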
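Slide 11 says that basic data-correctness checks and format changes were done in NiFi before the data was fed to Kafka. The deck does not list the actual rules. The Python function below is therefore only a sketch of the kind of per-record check that step implies. The raw field names (`device_id`, `epoch_ms`, `lat`, `lon`, `speed_mps`), the units and the speed threshold are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

@dataclass(frozen=True)
class GpsPoint:
    device_id: str
    ts: datetime      # timezone-aware UTC timestamp of the fix
    lat: float        # WGS84 degrees
    lon: float        # WGS84 degrees
    speed_kmh: float

MAX_PLAUSIBLE_SPEED_KMH = 300.0  # hypothetical threshold for a passenger car

def normalise(raw: dict) -> Optional[GpsPoint]:
    """Convert a raw provider record into a GpsPoint, or return None if it fails basic checks.

    The raw field names used here are hypothetical, not the provider's real schema.
    """
    try:
        point = GpsPoint(
            device_id=str(raw["device_id"]),
            # Format change: epoch milliseconds -> UTC datetime.
            ts=datetime.fromtimestamp(int(raw["epoch_ms"]) / 1000, tz=timezone.utc),
            lat=float(raw["lat"]),
            lon=float(raw["lon"]),
            # Format change: metres per second -> km/h (missing speed defaults to 0).
            speed_kmh=float(raw.get("speed_mps", 0.0)) * 3.6,
        )
    except (KeyError, TypeError, ValueError, OverflowError, OSError):
        return None  # missing field or unparseable value
    # Written as "not (in range)" so that NaN values are rejected too.
    if not (-90.0 <= point.lat <= 90.0 and -180.0 <= point.lon <= 180.0):
        return None
    if point.lat == 0.0 and point.lon == 0.0:
        return None  # (0, 0) is a common placeholder for "no GPS fix"
    if not (0.0 <= point.speed_kmh <= MAX_PLAUSIBLE_SPEED_KMH):
        return None
    return point
```

Rejected records return `None` rather than raising. A pipeline could then route them to a separate error queue instead of stopping the stream.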
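Slide 14 contrasts snapping each GPS fix to the nearest road ("fast, but not very good") with Hidden Markov Models that "know where you’re going, and where you’ve been". The toy Python below illustrates that difference with Viterbi decoding over candidate roads, using planar coordinates in metres. It is not Barefoot's implementation. The emission model (Gaussian on distance to the road) is standard. The transition model, however, is a deliberate simplification: a flat penalty for switching roads. Production matchers usually compare route distances between candidates instead. The `sigma` and `switch_log_prob` values are assumptions.

```python
import math

def point_segment_distance(p, a, b):
    """Euclidean distance from point p to segment a-b (planar coordinates, metres)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0.0:
        return math.hypot(px - ax, py - ay)
    # Clamp the projection parameter so the closest point stays on the segment.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len2))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def nearest_road(points, roads):
    """Baseline: snap every fix independently to its closest road."""
    return [min(roads, key=lambda r: point_segment_distance(p, *roads[r])) for p in points]

def hmm_match(points, roads, sigma=4.0, switch_log_prob=math.log(0.05)):
    """Viterbi decoding of the most likely road sequence.

    Emission: Gaussian log-likelihood of the GPS error (distance to the road).
    Transition: a flat log-penalty for changing road (a simplifying assumption).
    """
    names = list(roads)

    def emission(p, r):
        d = point_segment_distance(p, *roads[r])
        return -0.5 * (d / sigma) ** 2

    def transition(q, r):
        return 0.0 if q == r else switch_log_prob

    # score[r]: best log-probability of any path ending on road r.
    score = {r: emission(points[0], r) for r in names}
    back = []  # one backpointer dict per step after the first
    for p in points[1:]:
        prev, score, ptr = score, {}, {}
        for r in names:
            best = max(names, key=lambda q: prev[q] + transition(q, r))
            score[r] = prev[best] + transition(best, r) + emission(p, r)
            ptr[r] = best
        back.append(ptr)
    # Trace back from the most likely final road.
    path = [max(names, key=score.get)]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]
```

For example, take two parallel roads `A` (y = 0) and `B` (y = 12) and fixes at (10, 2), (30, 3), (50, 7) and (70, 2). Nearest-road snapping jumps to `B` for the third, noisy fix. The HMM keeps the whole trip on `A`, because one short excursion is less likely than a single large GPS error.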

Editor's Notes

  1. Launched in 1993, Admiral Group is an insurance company based in Cardiff, UK. It has grown from a start-up to become a household name in UK car insurance. Historically, the business model has been simple and straightforward: “keep doing what we’re doing and do it better year after year”. Admiral has a culture that encourages people to innovate and suggest new ways of working, whether through new products, processes or technology. All staff are shareholders, each holding a small stake in the company.
  2. Admiral is also the youngest company in the FTSE 100, employing more than 7,000 staff worldwide. Our philosophy at Admiral is that people who like what they do, do it better, so we go out of our way to make coming to work here enjoyable. As a result, Admiral Group is consistently voted one of the top 5 best places to work in each location where it has an office.