Introduction to HPCC Systems®
Powered by LexisNexis Risk Solutions
Ignacio Calvo
Senior Software Engineer
07/03/2016
1. A brief history of HPCC
2. Architecture with use case
3. Integration
4. Q&A
Mapflow : Geospatial
LexisNexis : global markets
HPCC : BigData
A brief history of HPCC
Case study
THE CHALLENGE
For a given X and Y coordinate, calculate the following within a specified radius:
• Total number of policies
• Total value of policies
Update each record with this information.
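The challenge above can be sketched in a few lines of Python. This is only an illustration of the problem, not the ECL solution from the talk: the `Policy` fields, the sample coordinates and the brute-force all-pairs comparison are assumptions, and each policy is counted inside its own radius.

```python
from dataclasses import dataclass
from math import hypot


@dataclass
class Policy:
    # Hypothetical record layout; the slides do not define one.
    policy_id: str
    x: float
    y: float
    value: float
    nearby_count: int = 0
    nearby_value: float = 0.0


def annotate_with_radius_totals(policies, radius):
    """Brute-force O(N^2) pass: compare every policy with every policy."""
    for p in policies:
        # A policy is always within its own radius (distance 0).
        in_range = [q for q in policies if hypot(p.x - q.x, p.y - q.y) <= radius]
        p.nearby_count = len(in_range)
        p.nearby_value = sum(q.value for q in in_range)
    return policies


policies = [
    Policy("A", 0.0, 0.0, 100.0),
    Policy("B", 3.0, 4.0, 250.0),   # exactly 5 units away from A
    Policy("C", 50.0, 50.0, 75.0),  # far away from both
]
annotate_with_radius_totals(policies, radius=5.0)
for p in policies:
    print(p.policy_id, p.nearby_count, p.nearby_value)
```

The all-pairs loop is what makes this an N² problem: doubling the number of records quadruples the comparisons, which is why a single machine struggles at scale.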
Data Flow Oriented Big Data Platform
Raw data from several sources
Thor (data refinery)
• Shared Nothing MPP Architecture
• Commodity Hardware
• Batch ETL and Analytics
ECL
• Easy to use
• Implicitly Parallel
• Compiles to C++
ROXIE (data delivery)
• Shared Nothing MPP Architecture
• Commodity Hardware
• Real-time Indexed Based Query
• Low Latency, Highly Concurrent and Highly Redundant
ESP Middleware Services
Batch requests for scoring and analytics
Batch Processed Data
Batch, Subscribers, Portal
Thor – The Batch Processing Analytics Engine
[Diagram: raw data from several sources; Thor; ECL; ROXIE; batch reporting requests; reporting]
Massively parallel Extract, Transform and Load (ETL) engine
• Built from the ground up as a parallel data environment
Enables data integration on a scale not previously available
• Current LexisNexis person data build process generates 350 billion intermediate results at peak
Suitable for:
• Massive joins/merges
• Massive sorts and transformations
• Any N² problem
“Identify and catalog all the stars in the Milky Way galaxy”
ROXIE – The Real-Time Analytics Delivery Engine
A massively parallel, high-throughput, structured query response engine
Ultra fast due to its read-only nature
Allows indices to be built onto data for efficient multi-user retrieval of data
Suitable for:
• Volumes of structured queries
• Full text ranked Boolean search
“I want the star Alpha Centauri”
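The idea of building an index onto read-only data can be illustrated with a small Python sketch. It does not use any ROXIE API; the in-memory list, the `build_index` helper and the `lookup` helper are assumptions made for the example.

```python
from collections import defaultdict

# Read-only data set; a tiny stand-in for data published by the batch side.
stars = [
    {"name": "Alpha Centauri", "constellation": "Centaurus"},
    {"name": "Sirius", "constellation": "Canis Major"},
    {"name": "Proxima Centauri", "constellation": "Centaurus"},
]


def build_index(records, key):
    """Map each key value to the positions of the records that carry it."""
    index = defaultdict(list)
    for position, record in enumerate(records):
        index[record[key]].append(position)
    return dict(index)


def lookup(records, index, value):
    """Answer a query from the index instead of scanning every record."""
    return [records[i] for i in index.get(value, [])]


# Built once, then shared by every reader because the data never changes.
name_index = build_index(stars, "name")
result = lookup(stars, name_index, "Alpha Centauri")
print(result)
```

Because the data is never modified after the index is built, many concurrent readers can share the same index without locking, which is one plausible reading of "ultra fast due to its read-only nature".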
ECL – The Data Flow Oriented Programming Language
• An easy to use, data-centric programming language optimized for large-scale data management and query processing
• Highly efficient: automatically distributes workload across all nodes
• Industry analysts: “80% more efficient than C++, Java and SQL; 1/3 reduction in programmer time to maintain/enhance existing applications”
• Benchmark against SQL (5 times more efficient) for code generation
• Automatic parallelization and synchronization of sequential algorithms for parallel and distributed processing. Compiles to C++
• Large library of built-in modules to handle common data manipulation tasks. Can embed / import : C++, Python, JavaScript, R, Java
Declarative programming language … powerful, extensible, implicitly parallel, maintainable, complete and homogeneous
Graph viewer
A Robust — and Proven — Platform for IoT
ROXIE
HPCC Systems Platform
Data Collection
Rules Execution
Alert Delivery
Search
BI
• Real-time indexed based search
• Real-time rules execution
• Alert call back
• Real-time store
• Real-time analytics on real-time data
• Long term store
• Batch analytics
Distributed Massively Parallel Architecture
Real-time Services
Thor
Cassandra
Lambda architecture
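A minimal sketch of the Lambda pattern, assuming a simple per-sensor event count: a batch view is recomputed from the long-term store, a speed view covers events that arrived since the last batch run, and a query merges the two. The function names and the event shape are hypothetical and are not HPCC Systems APIs.

```python
from collections import Counter


def batch_view(historical_events):
    """Batch layer: recompute totals from the full long-term store."""
    return Counter(event["sensor"] for event in historical_events)


def speed_view(recent_events):
    """Speed layer: totals for events not yet covered by a batch run."""
    return Counter(event["sensor"] for event in recent_events)


def serve(batch, speed, sensor):
    """Serving layer: merge both views to answer a query."""
    return batch.get(sensor, 0) + speed.get(sensor, 0)


historical = [{"sensor": "s1"}, {"sensor": "s1"}, {"sensor": "s2"}]
recent = [{"sensor": "s1"}, {"sensor": "s3"}]

batch = batch_view(historical)
speed = speed_view(recent)
print(serve(batch, speed, "s1"))
```

When the next batch run absorbs the recent events, the speed view can be discarded and rebuilt from empty, so errors in the real-time path are eventually corrected by the batch path.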
HPCC: Internet of Things Architecture
ROXIE
• REST
• SOAP
• Websocket
• IPv6
• 6LoWPAN
• UDP
• uIP
• DTLS
• MQTT
• CoAP
• ROLL
• XMPP-IoT
• Mihini/M3DA
Thor
Index Updates
• AMQP
• DDS
• LLAP
• LWM2M
• SSI
• IOTDB
• SensorML
• IPSO
• Telehash
• TSMP
• NanoIP
• ONS 2.0
Adapter
HPCC Systems Technology: Big Data Is Our Core Competency
SPEED
• Scales to extreme workloads quickly and easily
• Increased development speed leads to faster production/delivery
• Improves developer productivity
CAPACITY
• Enables massive joins, merges, transformations, sorts, or tough N² problems
• Increases business responsiveness
• Accelerates creation of new services via rapid prototyping capabilities
• Offers a platform for collaboration and innovation leading to better results
COST SAVINGS
• Leverages commodity hardware so fewer people can do much more in less time
• Uses IT resources efficiently via sharing and higher system utilization
• Open source since 2011
• Grid computing
• Data-centric language (ECL)
• Integrated delivery system that offers data plus analytics
Our Solutions Are Powered by HPCC at Their Core
Big Data
• Structured Records
• Unstructured Records
• News Articles
• Proprietary Data
• Public Records
Unstructured and Structured Content
High Performance Computing Cluster Platform (HPCC)
Analysis Applications
Key Capabilities
• Over 4 petabytes of content
• 50 billion records
• 20,000 sources
• 8.9 billion unique name and address combinations
• Multi-bureau/multi-source models and bureau roll-over support
• Extensive experience leveraging atomic level data, combining and leveraging disparate data
• Approximately 400 models deployed (custom and flagship)
• Data and analytics
• Identity verification and authentication
• Fraud detection and prevention
• Investigation
• Screening
• Receivables management
Fusion, Linking, Refinery
Financial Services, Government, Health Care, Insurance, Legal, Retail
Open Source Components
Complex Analysis, Clustering Analysis, Link Analysis, Entity Resolution
Example : Understanding People Relations Helps Us Predict Risk
• 8.9 B unique name/address combos
• 4 B property records
• 37 M unique businesses
• 417 M criminal records
• 269 M auto and home claim records
• 188.5 M unique cell phones
• 16.5 B consumer records
• 3.7 B motor vehicle registrations
Aliases …
• K.R. Jones
• Kathy Jones
• Kathy R. Jones
• Kathy Schroeder
Personal info …
• SSN xxx-xx-xxxxx
• Mobile Phone 630.555.9876
Lived at …
• 321 High St., Chicago, IL 60540 (2000 – 2013)
• 123 Avenue, San Francisco, CA 94107 (2013 – Present)
Owns …
• Boat License #414567
• Car VIN #RGSWA04A87B1xxxxx
Involved in …
• DUI Case #4859xxx-xxx
• Felony Indictment, Chicago C#0404-xxx
Filed for …
• Bankruptcy, September 12, 2013
• Loan Application, January 30, 2015
Four Petabytes of Information :
• 50 billion records
• 20,000 sources
• Several million records added daily
• Collect largest, broadest, deepest, most accurate, up-to-date repository of public record and contributory data
• Clean and standardize the data
• Identify unique entities using sophisticated learning techniques
• Create the social relationships
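The "identify unique entities" step can be illustrated with a deliberately simple stand-in. The sketch below is not the learning-based linking the slide refers to: it is a rule-based union-find that merges records sharing an identifier. The record layout, the placeholder identifiers (`S1`, `P1`, ...) and the extra "J. Smith" record are assumptions made for the example.

```python
# Hypothetical, already-cleaned records with placeholder identifiers.
records = [
    {"id": 1, "name": "K.R. Jones", "ssn": "S1", "phone": None},
    {"id": 2, "name": "Kathy Jones", "ssn": "S1", "phone": "P1"},
    {"id": 3, "name": "Kathy Schroeder", "ssn": None, "phone": "P1"},
    {"id": 4, "name": "J. Smith", "ssn": "S2", "phone": "P2"},
]


def link_entities(records, keys=("ssn", "phone")):
    """Group records that share any identifier, directly or transitively."""
    parent = {r["id"]: r["id"] for r in records}

    def find(x):
        # Walk to the cluster root, shortening the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    first_seen = {}  # (field, value) -> id of the first record carrying it
    for r in records:
        for key in keys:
            value = r[key]
            if value is None:
                continue
            token = (key, value)
            if token in first_seen:
                union(r["id"], first_seen[token])
            else:
                first_seen[token] = r["id"]

    clusters = {}
    for r in records:
        clusters.setdefault(find(r["id"]), []).append(r["name"])
    return sorted(clusters.values())


entities = link_entities(records)
print(entities)
```

Here "Kathy Schroeder" joins the "Jones" cluster only through a shared phone number, which shows how aliases with no field in common can still end up linked through an intermediate record.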
Performance
Intel Xeon / 16 cores
            qsort      New merge sort
33M rows    11.464s    1.433s
503M rows   29.9s      24.2s
Power 8 / 160 execution threads
            qsort      New merge sort
33M rows    26.5s      4.0s
503M rows   120.0s     18.0s
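To make the timings easier to compare, the speedup of the new merge sort over qsort can be computed as qsort time divided by merge sort time. The snippet below uses only the figures from the slide; the dictionary layout is just a convenience for the example.

```python
# (machine, data size) -> (qsort seconds, new merge sort seconds), from the slide.
timings = {
    ("Intel Xeon / 16 cores", "33M rows"): (11.464, 1.433),
    ("Intel Xeon / 16 cores", "503M rows"): (29.9, 24.2),
    ("Power 8 / 160 execution threads", "33M rows"): (26.5, 4.0),
    ("Power 8 / 160 execution threads", "503M rows"): (120.0, 18.0),
}

# Speedup = old time / new time; values above 1 mean the merge sort is faster.
speedups = {key: qsort / merge for key, (qsort, merge) in timings.items()}

for (machine, rows), speedup in speedups.items():
    print(f"{machine}, {rows}: {speedup:.2f}x")
```

On these numbers the gain ranges from roughly 1.2x (Intel Xeon, 503M rows) to roughly 8x (Intel Xeon, 33M rows), with the Power 8 runs at about 6.6x for both data sizes.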
Integration
• Embed / import : C++, Python, JavaScript, R, Java
• HDFS to HPCC Connector
• Amazon Web Services (AWS)
• JDBC Driver
Integration : JDBC Driver
Why HPCC?
• Efficient MPP + sub-second queries
• Consistent support, all in one platform
• Scales out to thousands of nodes
• Great learning curve
• Fast development
• Open source since 2011 : Apache 2.0
• Reliable, mature : 10+ years in production
Next steps
• Virtual Machine image
• Online training : vouchers available
• Documentation
• Forum : online community
• External testimonies and use cases
• Meetups
Useful Links
• HPCC Meetups : http://www.meetup.com/HPCC-Dublin-Big-Data
• HPCC Systems: https://hpccsystems.com/
• Community forums: https://hpccsystems.com/bb
• The HPCC Systems blog: https://hpccsystems.com/resources/blog
• Online training: learn.lexisnexis.com/hpcc
• Summit: https://hpccsystems.com/community/events/2015-hpcc-systems-engineering-summit-community-day
• HPCC on YouTube: https://www.youtube.com/user/HPCCSystems/videos
• GitHub: https://github.com/hpcc-systems
• Lambda architecture : http://cdn.hpccsystems.com/whitepapers/Lambda.pdf
• Performance : https://hpccsystems.com/resources/blog/lchapman/look-whats-coming-soon-hpcc-systems-600-beta-2
• JDBC Driver : https://hpccsystems.com/download/third-party-integrations/hpcc-jdbc-driver
• HDFS to HPCC Connector : http://cdn.hpccsystems.com/install/h2h/1.4.4-1/docs/HDFS_to_HPCC_Connector-1.4.4-1.pdf
• HPCC on AWS : https://aws.hpccsystems.com/aws/getting_started/
hpccsystems.com

08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGSujit Pal
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 

Kürzlich hochgeladen (20)

Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAG
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 

HUG Ireland Event - HPCC Presentation Slides

  • 1. Introduction to HPCC Systems®, powered by LexisNexis Risk Solutions. Ignacio Calvo, Senior Software Engineer. 07/03/2016
  • 2. (1) A brief history of HPCC; (2) Architecture with use case; (3) Integration; (4) Q&A
  • 3. A brief history of HPCC. Mapflow: geospatial. LexisNexis: global markets. HPCC: Big Data.
  • 4. Case study: the challenge. For a given X and Y coordinate, calculate the following within a specified radius: the total number of policies and the total value of policies. Update each record with this information.
  • 5. Data Flow Oriented Big Data Platform
      • Diagram labels: raw data from several sources; ESP middleware services; batch, subscribers, portal; batch requests for scoring and analytics; batch processed data
      • Thor (data refinery): shared-nothing MPP architecture; commodity hardware; batch ETL and analytics
      • ECL: easy to use; implicitly parallel; compiles to C++
      • ROXIE (data delivery): shared-nothing MPP architecture; commodity hardware; real-time index-based queries; low latency, highly concurrent and highly redundant
  • 6. Thor – The Batch Processing Analytics Engine
      • Massively parallel Extract, Transform and Load (ETL) engine, built from the ground up as a parallel data environment
      • Enables data integration on a scale not previously available: the current LexisNexis person data build process generates 350 billion intermediate results at peak
      • Suitable for: massive joins/merges; massive sorts and transformations; any N² problem
      • “Identify and catalog all the stars in the Milky Way galaxy”
  • 7. ROXIE – The Real-Time Analytics Delivery Engine
      • A massively parallel, high-throughput, structured query response engine
      • Ultra fast due to its read-only nature
      • Allows indices to be built onto data for efficient multi-user retrieval of data
      • Suitable for: volumes of structured queries; full-text ranked Boolean search
      • “I want the star Alpha Centauri”
  • 8. ECL – The Data Flow Oriented Programming Language
      • An easy-to-use, data-centric programming language optimized for large-scale data management and query processing
      • Highly efficient: automatically distributes workload across all nodes
      • Industry analysts: “80% more efficient than C++, Java and SQL; 1/3 reduction in programmer time to maintain/enhance existing applications”
      • Benchmark against SQL (5 times more efficient) for code generation
      • Automatic parallelization and synchronization of sequential algorithms for parallel and distributed processing; compiles to C++
      • Large library of built-in modules to handle common data manipulation tasks
      • Can embed / import: C++, Python, JavaScript, R, Java
      • Declarative programming language: powerful, extensible, implicitly parallel, maintainable, complete and homogeneous
  • 10. A Robust and Proven Platform for IoT
      • HPCC Systems Platform: data collection; rules execution; alert delivery; search; BI
      • Capabilities: real-time index-based search; real-time rules execution; alert call back; real-time store; real-time analytics on real-time data; long-term store; batch analytics
      • Diagram labels: distributed massively parallel architecture; real-time services; ROXIE; Thor; Cassandra
  • 13. HPCC: Internet of Things Architecture
      • Diagram labels: ROXIE; Thor; index updates; adapter
      • Protocols and standards listed: REST, SOAP, Websocket, IPv6, 6LoWPAN, UDP, uIP, DTLS, MQTT, CoAP, ROLL, XMPP-IoT, Mihini/M3DA, AMQP, DDS, LLAP, LWM2M, SSI, IOTDB, SensorML, IPSO, Telehash, TSMP, NanoIP, ONS 2.0
  • 14-16. HPCC Systems Technology: Big Data Is Our Core Competency
      • Speed: scales to extreme workloads quickly and easily; increased speed of development leads to faster production and delivery; improves developer productivity
      • Capacity: enables massive joins, merges, transformations, sorts, or tough N² problems; increases business responsiveness; accelerates creation of new services via rapid prototyping capabilities; offers a platform for collaboration and innovation, leading to better results
      • Cost savings: leverages commodity hardware so fewer people can do much more in less time; uses IT resources efficiently via sharing and higher system utilization; open source since 2011
  • 17. Our Solutions Are Powered by HPCC at Their Core
      • High Performance Computing Cluster Platform (HPCC): grid computing; data-centric language (ECL); integrated delivery system that offers data plus analytics
      • Big Data (unstructured and structured content): structured records; unstructured records; news articles; proprietary data; public records
      • Key capabilities: over 4 petabytes of content; 50 billion records; 20,000 sources; 8.9 billion unique name and address combinations; multi-bureau/multi-source models and bureau roll-over support; extensive experience leveraging atomic-level data, combining and leveraging disparate data; approximately 400 models deployed (custom and flagship)
      • Applications: data and analytics; identity verification and authentication; fraud detection and prevention; investigation; screening; receivables management
      • Diagram labels: Analysis; Fusion; Linking; Refinery; Open Source Components; Complex Analysis; Clustering Analysis; Link Analysis; Entity Resolution; Financial Services; Government; Health Care; Insurance; Legal; Retail
  • 18-19. Example: Understanding People Relations Helps Us Predict Risk
      • 8.9 B unique name/address combos; 4 B property records; 37 M unique businesses; 417 M criminal records; 269 M auto and home claim records; 188.5 M unique cell phones; 16.5 B consumer records; 3.7 B motor vehicle registrations
      • Four petabytes of information: 50 billion records; 20,000 sources; several million records added daily
      • Collect the largest, broadest, deepest, most accurate, up-to-date repository of public record and contributory data
      • Clean and standardize the data
      • Identify unique entities using sophisticated learning techniques
      • Create the social relationships
      • Example record. Personal info: SSN xxx-xx-xxxxx; mobile phone 630.555.9876. Aliases: K.R. Jones, Kathy Jones, Kathy R. Jones, Kathy Schroeder. Lived at: 321 High St., Chicago, IL 60540 (2000 – 2013); 123 Avenue, San Francisco, CA 94107 (2013 – Present). Owns: boat license #414567; car VIN #RGSWA04A87B1xxxxx. Involved in: DUI case #4859xxx-xxx; felony indictment Chicago C#0404-xxx. Filed for: bankruptcy (September 12, 2013); loan application (January 30, 2015).
  • 20. Performance
      Intel Xeon / 16 cores              qsort      New merge sort
        33M rows                         11.464s    1.433s
        503M rows                        29.9s      24.2s
      Power 8 / 160 execution threads    qsort      New merge sort
        33M rows                         26.5s      4.0s
        503M rows                        120.0s     18.0s
  • 21. Integration
      • Embed / import: C++, Python, JavaScript, R, Java
      • HDFS to HPCC Connector
      • Amazon Web Services (AWS)
      • JDBC Driver
  • 23. Why HPCC?
      • Efficient MPP + sub-second queries
      • Consistent support, all in one platform
      • Scales out to thousands of nodes
      • Great learning curve
      • Fast development
      • Open source since 2011: Apache 2.0
      • Reliable, mature: 10+ years in production
  • 24. Next steps
      • Virtual Machine image
      • Online training: vouchers available
      • Documentation
      • Forum: online community
      • External testimonies and use cases
      • Meetups
  • 25. Useful Links (HPCC Systems online resources)
      • HPCC Meetups: http://www.meetup.com/HPCC-Dublin-Big-Data
      • HPCC Systems: https://hpccsystems.com/
      • Community forums: https://hpccsystems.com/bb
      • The HPCC Systems blog: https://hpccsystems.com/resources/blog
      • Online training: learn.lexisnexis.com/hpcc
      • Summit: https://hpccsystems.com/community/events/2015-hpcc-systems-engineering-summit-community-day
      • HPCC on YouTube: https://www.youtube.com/user/HPCCSystems/videos
      • GitHub: https://github.com/hpcc-systems
      • Lambda architecture: http://cdn.hpccsystems.com/whitepapers/Lambda.pdf
      • Performance: https://hpccsystems.com/resources/blog/lchapman/look-whats-coming-soon-hpcc-systems-600-beta-2
      • JDBC Driver: https://hpccsystems.com/download/third-party-integrations/hpcc-jdbc-driver
      • HDFS to HPCC Connector: http://cdn.hpccsystems.com/install/h2h/1.4.4-1/docs/HDFS_to_HPCC_Connector-1.4.4-1.pdf
      • HPCC on AWS: https://aws.hpccsystems.com/aws/getting_started/
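
The case-study challenge on slide 4 (for each record, count and sum the policies within a given radius, then update the record) can be illustrated with a small single-machine sketch. This is not the ECL solution from the talk, which the slides do not show. It is a brute-force Python illustration in which every name (`Policy`, `nearby_count`, `nearby_value`, `annotate_with_radius_totals`) is hypothetical. It assumes planar X/Y coordinates with Euclidean distance, a radius that includes its boundary, and that each policy counts itself. Comparing every record against every other is the kind of N² workload the slides describe as suited to Thor.

```python
from dataclasses import dataclass
from math import hypot


@dataclass
class Policy:
    # All field names are assumptions made for this illustration.
    policy_id: str
    x: float
    y: float
    value: float
    nearby_count: int = 0      # total number of policies within the radius
    nearby_value: float = 0.0  # total value of policies within the radius


def annotate_with_radius_totals(policies, radius):
    """Update each policy with the count and total value of all policies
    (itself included) lying within `radius` of its X/Y coordinate.

    Brute force: every policy is compared with every other, so the work
    grows with the square of the number of records.
    """
    for target in policies:
        count = 0
        total = 0.0
        for other in policies:
            # Straight-line distance on a flat plane (an assumption).
            if hypot(other.x - target.x, other.y - target.y) <= radius:
                count += 1
                total += other.value
        target.nearby_count = count
        target.nearby_value = total
    return policies


if __name__ == "__main__":
    sample = [
        Policy("a", 0.0, 0.0, 100.0),
        Policy("b", 3.0, 4.0, 200.0),
        Policy("c", 10.0, 10.0, 50.0),
    ]
    for p in annotate_with_radius_totals(sample, radius=6.0):
        print(p.policy_id, p.nearby_count, p.nearby_value)
```

On real geospatial data the distance function would need to account for the coordinate system in use, and a spatial index or grid partitioning would normally replace the inner loop; both are left out here to keep the sketch short.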