Rust & Apache Arrow @ RMS
Apache Arrow @ RMS
March 2019
The World's Leading Catastrophe Risk
Modeling Company
From earthquakes, hurricanes, and floods to terrorism and
infectious diseases, RMS helps financial institutions and
public agencies understand, quantify, and manage risk
So what do we actually do?
● Models
○ We have complex models for various types of risk
■ Fire, flood, earthquakes, etc
○ Our customers run our models against their portfolios of risk items (e.g.
properties) to understand financial impact
○ The models produce a lot of data
● Interactive Queries
○ Insurance analysts are similar to data scientists
○ Lots of result data to slice and dice and visualize
○ Low latency analytics on relatively large datasets
■ Too much for a SQL database, but not petabyte scale
RMS Datastore Stack
Intelligent query parsing, rewriting
and routing.
Cost-based optimizations.
Ability to use different query
engines depending on use case or
size of data set.
Query Service 1.0
● Native Query Execution
○ Scala code, using Apache Arrow and Parquet libraries
○ Column-based file readers with projection push-down
○ Row-based query execution
○ Apache Arrow for the type system
● Performance
○ Order of magnitude improvements compared to Spark for some use cases
○ Slower than Spark for other use cases (larger data sets, JOINs, etc)
● SQL Interface
○ Apache Hive for our internal SQL dialect
○ Apache Hive protocol for compatibility with ODBC/JDBC drivers
○ REST API for integration with microservices
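A minimal sketch of projection push-down feeding row-based execution, written in Rust for consistency with the rest of the deck (the Query Service itself was Scala). `ColumnarFile` and `read_rows` are hypothetical names, and an in-memory map stands in for a Parquet file; this is an illustration of the idea, not the Query Service code.

```rust
use std::collections::HashMap;

/// Stand-in for a columnar file: column name -> values.
/// Assumes every column has the same length.
pub struct ColumnarFile {
    pub columns: HashMap<String, Vec<i64>>,
}

impl ColumnarFile {
    /// Read only the columns named in `projection` and assemble rows from them.
    /// Columns that are not projected are never touched, which is the point
    /// of pushing the projection down into the reader.
    pub fn read_rows(&self, projection: &[&str]) -> Result<Vec<Vec<i64>>, String> {
        let selected: Vec<&Vec<i64>> = projection
            .iter()
            .map(|name| {
                self.columns
                    .get(*name)
                    .ok_or_else(|| format!("unknown column: {}", name))
            })
            .collect::<Result<_, _>>()?;

        let num_rows = selected.first().map_or(0, |c| c.len());

        // Row-based execution would then consume these rows one at a time.
        Ok((0..num_rows)
            .map(|i| selected.iter().map(|col| col[i]).collect())
            .collect())
    }
}
```

On a wide table, a query that names only a handful of columns pays only for those columns at read time.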
Query Service Conclusions & Next Steps
● The Query Service was successful
○ Reduced TCO (fewer Spark nodes required)
○ Improved performance for interactive queries
● In my spare time I had been working on an open source project called
DataFusion
○ DataFusion started out as a generic Rust query engine
○ I felt that Rust was much better suited than the JVM
○ I learned a lot more about Apache Arrow and the benefits of columnar
processing
● So how could we leverage this at RMS?
○ I donated the initial Rust implementation of Apache Arrow and later donated
DataFusion as well
Why Columnar?
Row vs Column
Source code available:
https://github.com/andygrove/row-vs-col-rs
Compares:
● Rust Vec<Row>
● Rust Vec<Column>
● Rust Vec<Array> // Apache Arrow
Columnar benefits:
● Cache pipelining
● SIMD (Single instruction, multiple data)
● GPU vectorized processing
(higher is better)
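The row and column layouts being compared can be sketched with the standard library alone. The `Row` fields and function names here are hypothetical, and this is not the code from the linked repository, just an illustration of why a columnar scan touches less memory.

```rust
/// Row-oriented layout: one struct per record (field names are hypothetical).
#[derive(Clone, Debug)]
pub struct Row {
    pub id: u32,
    pub value: f64,
    pub label: String,
}

/// Column-oriented layout: one contiguous vector per field.
#[derive(Default, Debug)]
pub struct Columns {
    pub id: Vec<u32>,
    pub value: Vec<f64>,
    pub label: Vec<String>,
}

impl Columns {
    /// Transpose a slice of rows into columns.
    pub fn from_rows(rows: &[Row]) -> Self {
        let mut cols = Columns::default();
        for r in rows {
            cols.id.push(r.id);
            cols.value.push(r.value);
            cols.label.push(r.label.clone());
        }
        cols
    }
}

/// Summing over rows strides across whole structs, so the unused
/// `id` and `label` fields travel through the cache alongside `value`.
pub fn sum_rows(rows: &[Row]) -> f64 {
    rows.iter().map(|r| r.value).sum()
}

/// Summing a column walks one dense `f64` slice, the access pattern
/// that caches and auto-vectorizers handle well.
pub fn sum_column(cols: &Columns) -> f64 {
    cols.value.iter().sum()
}
```

Both functions return the same result; the difference is only in how much unrelated data each one drags through memory.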
Apache Arrow
Apache Arrow
● Standardized language-independent columnar memory format
○ for flat and hierarchical data
○ organized for efficient analytic operations on modern hardware
■ Vectorized processing, SIMD, GPU
● Implementations available for many programming languages
○ C, C++, C#, Go, Java, JavaScript, MATLAB, Python, R, Ruby, and Rust.
● Zero-copy interprocess communication
○ IPC metadata defined in FlatBuffers format
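To make the memory format concrete, here is a simplified nullable 32-bit integer array in the spirit of the Arrow layout: a dense values buffer plus a validity bitmap in which a set bit means the slot holds a value. The type and method names are hypothetical and this is not the `arrow` crate's API; real Arrow buffers also carry alignment and padding rules that this sketch ignores.

```rust
/// Simplified nullable Int32 array: values buffer + validity bitmap.
pub struct Int32Array {
    values: Vec<i32>,
    validity: Vec<u8>,
    len: usize,
}

impl Int32Array {
    pub fn from_options(data: &[Option<i32>]) -> Self {
        let len = data.len();
        let mut values = Vec::with_capacity(len);
        let mut validity = vec![0u8; (len + 7) / 8];
        for (i, item) in data.iter().enumerate() {
            match item {
                Some(v) => {
                    values.push(*v);
                    // Least-significant-bit numbering within each byte.
                    validity[i / 8] |= 1 << (i % 8);
                }
                // Null slots still occupy a value slot, so every element
                // stays at a fixed offset in the values buffer.
                None => values.push(0),
            }
        }
        Int32Array { values, validity, len }
    }

    pub fn len(&self) -> usize {
        self.len
    }

    fn is_valid(&self, i: usize) -> bool {
        (self.validity[i / 8] & (1 << (i % 8))) != 0
    }

    /// Returns `None` for null slots and for out-of-range indices.
    pub fn get(&self, i: usize) -> Option<i32> {
        if i < self.len && self.is_valid(i) {
            Some(self.values[i])
        } else {
            None
        }
    }

    pub fn null_count(&self) -> usize {
        (0..self.len).filter(|&i| !self.is_valid(i)).count()
    }

    /// Sum of the non-null values, widened to avoid overflow.
    pub fn sum(&self) -> i64 {
        (0..self.len)
            .filter(|&i| self.is_valid(i))
            .map(|i| self.values[i] as i64)
            .sum()
    }
}
```

Keeping nulls in a separate bitmap leaves the values buffer dense and fixed-width, which is what makes vectorized kernels straightforward to write.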
Apache Arrow
● Computational libraries
○ C++ libraries that leverage LLVM (Gandiva, donated by Dremio)
○ NVIDIA CUDA support
● Query Engines
○ Ursa Labs initiative
■ C++ query engine
○ DataFusion
■ Rust query engine
Apache Arrow
● 3 years as a top-level project
● Project Management Committee (PMC) members work for ...
○ Cloudera, Databricks, DataStax, Dremio, Hortonworks, Looker, MapR, RMS,
RStudio, Salesforce, Twitter, UC Berkeley RISELab, Ursa Labs, WeWork,
Workday
● Committers work for ...
○ Amazon, CERN, Google, IBM
● Also many individual contributors
● Companies providing financial support (via Ursa Labs)
○ NVIDIA, ODSC, RStudio, Two Sigma
Huge overhead converting
between different data formats
and duplicating data.
Zero-copy data access
Exchange metadata and pointers
to Arrow arrays
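The idea of exchanging metadata and pointers instead of data can be sketched inside a single process with a reference-counted buffer. `ArrayView` is a hypothetical type, and this is only an analogy: Arrow's zero-copy IPC shares buffers across processes, whereas the sketch below shares one allocation between views in the same program.

```rust
use std::sync::Arc;

/// A view over a shared, immutable buffer. Cloning or slicing a view copies
/// only the pointer, offset and length, never the underlying values.
#[derive(Clone)]
pub struct ArrayView {
    buffer: Arc<[f64]>,
    offset: usize,
    len: usize,
}

impl ArrayView {
    /// Moves the data into a shared allocation once; all later views reuse it.
    pub fn new(data: Vec<f64>) -> Self {
        let len = data.len();
        ArrayView { buffer: Arc::from(data), offset: 0, len }
    }

    /// Create a sub-view by adjusting metadata only.
    pub fn slice(&self, offset: usize, len: usize) -> Self {
        assert!(offset + len <= self.len, "slice out of bounds");
        ArrayView {
            buffer: Arc::clone(&self.buffer),
            offset: self.offset + offset,
            len,
        }
    }

    pub fn values(&self) -> &[f64] {
        &self.buffer[self.offset..self.offset + self.len]
    }

    /// True when both views point at the same underlying allocation.
    pub fn shares_buffer_with(&self, other: &ArrayView) -> bool {
        Arc::ptr_eq(&self.buffer, &other.buffer)
    }
}
```

A consumer that receives a view can read the values immediately, with no conversion or duplication step in between.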
DataFusion
Rust-native in-memory query engine for Apache Arrow
Why Rust
● See https://www.rust-lang.org/ for detailed information
● My take
○ Speed of C++ with the safety of Java
○ Memory efficient (no GC)
○ Predictable performance
○ Lower TCO
○ Forces you to think about what you are doing
■ Thread safety has to be explicit
■ Memory management has to be explicit
○ The compiler acts as a peer reviewer … tough but fair
DataFusion current functionality
● SQL query planner and optimizer
● Supported SQL features
○ Projection (SELECT)
○ Selection (WHERE)
○ Aggregates (MIN, MAX, SUM)
● Expressions
○ Identifiers (column names)
○ Literal values
● Operators
○ Arithmetic (+, -, *, /, %)
○ Comparison (<, <=, =, >=, >, !=, etc)
○ Binary (AND, OR)
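The expression and operator categories above can be sketched as a small expression tree evaluated over whole columns. All names here (`Expr`, `Op`, `Values`, `evaluate`) are hypothetical and the sketch handles only `f64` columns; it illustrates the general technique, not DataFusion's actual types.

```rust
/// Operators from the list above.
#[derive(Clone, Copy, Debug)]
pub enum Op { Add, Sub, Mul, Div, Mod, Lt, LtEq, Eq, NotEq, GtEq, Gt, And, Or }

#[derive(Debug)]
pub enum Expr {
    Column(usize), // identifier, resolved to a column index
    Literal(f64),
    Binary(Box<Expr>, Op, Box<Expr>),
}

/// Result of evaluating an expression over a batch: one value per row.
#[derive(Debug, PartialEq)]
pub enum Values {
    Float(Vec<f64>),
    Bool(Vec<bool>),
}

fn zip_map<T>(a: &[f64], b: &[f64], f: impl Fn(f64, f64) -> T) -> Vec<T> {
    a.iter().zip(b).map(|(x, y)| f(*x, *y)).collect()
}

/// Evaluate `expr` column-at-a-time. Assumes all columns have equal length.
pub fn evaluate(expr: &Expr, batch: &[Vec<f64>]) -> Result<Values, String> {
    let rows = batch.first().map_or(0, |c| c.len());
    match expr {
        Expr::Column(i) => batch
            .get(*i)
            .cloned()
            .map(Values::Float)
            .ok_or_else(|| format!("no column at index {}", i)),
        // A literal is broadcast to one value per row.
        Expr::Literal(v) => Ok(Values::Float(vec![*v; rows])),
        Expr::Binary(left, op, right) => {
            let l = evaluate(left, batch)?;
            let r = evaluate(right, batch)?;
            apply(*op, l, r)
        }
    }
}

fn apply(op: Op, l: Values, r: Values) -> Result<Values, String> {
    match (l, r) {
        (Values::Float(a), Values::Float(b)) => Ok(match op {
            Op::Add => Values::Float(zip_map(&a, &b, |x, y| x + y)),
            Op::Sub => Values::Float(zip_map(&a, &b, |x, y| x - y)),
            Op::Mul => Values::Float(zip_map(&a, &b, |x, y| x * y)),
            Op::Div => Values::Float(zip_map(&a, &b, |x, y| x / y)),
            Op::Mod => Values::Float(zip_map(&a, &b, |x, y| x % y)),
            Op::Lt => Values::Bool(zip_map(&a, &b, |x, y| x < y)),
            Op::LtEq => Values::Bool(zip_map(&a, &b, |x, y| x <= y)),
            Op::Eq => Values::Bool(zip_map(&a, &b, |x, y| x == y)),
            Op::NotEq => Values::Bool(zip_map(&a, &b, |x, y| x != y)),
            Op::GtEq => Values::Bool(zip_map(&a, &b, |x, y| x >= y)),
            Op::Gt => Values::Bool(zip_map(&a, &b, |x, y| x > y)),
            Op::And | Op::Or => return Err("AND/OR require boolean operands".to_string()),
        }),
        (Values::Bool(a), Values::Bool(b)) => {
            let pairs = a.iter().zip(&b);
            match op {
                Op::And => Ok(Values::Bool(pairs.map(|(x, y)| *x && *y).collect())),
                Op::Or => Ok(Values::Bool(pairs.map(|(x, y)| *x || *y).collect())),
                _ => Err(format!("{:?} requires numeric operands", op)),
            }
        }
        _ => Err("operand types do not match".to_string()),
    }
}
```

A WHERE clause then reduces to evaluating a boolean expression to a mask, and a projection to evaluating one expression per output column.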
Demo Time
PoC of a Rust-based Query Service using Apache Arrow
Benchmarks!
SELECT
riskitem_occupancyId,
occupancy_occupancyName,
SUM(risk_totalTIV)
FROM
ContractPrimaryRealPropertyView_1234
GROUP BY
riskitem_occupancyId,
occupancy_occupancyName
● Spark
○ Running in local mode
○ Parquet files on local SSD
○ Cached DataFrames
● DataFusion
○ Arrow format “MemTable”
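For readers unfamiliar with how such a query executes, the GROUP BY / SUM above amounts to a hash aggregation over three columns. The sketch below assumes the column types (`i32`, `String`, `f64`) and uses a hypothetical function name; it shows the general technique, not DataFusion's implementation or the benchmark code.

```rust
use std::collections::HashMap;

/// Hash aggregation equivalent to:
///   SELECT id, name, SUM(tiv) ... GROUP BY id, name
/// Column types are assumptions made for illustration.
pub fn sum_tiv_by_occupancy(
    occupancy_id: &[i32],
    occupancy_name: &[String],
    total_tiv: &[f64],
) -> Vec<(i32, String, f64)> {
    assert_eq!(occupancy_id.len(), occupancy_name.len());
    assert_eq!(occupancy_id.len(), total_tiv.len());

    // Group key borrows the name from the input column instead of cloning it.
    let mut groups: HashMap<(i32, &str), f64> = HashMap::new();
    for i in 0..occupancy_id.len() {
        let key = (occupancy_id[i], occupancy_name[i].as_str());
        *groups.entry(key).or_insert(0.0) += total_tiv[i];
    }

    let mut out: Vec<(i32, String, f64)> = groups
        .into_iter()
        .map(|((id, name), sum)| (id, name.to_string(), sum))
        .collect();

    // Hash map iteration order is unspecified, so sort for stable output.
    out.sort_by(|a, b| (a.0, &a.1).cmp(&(b.0, &b.1)));
    out
}
```

Only the three referenced columns are read, which matters on a table that is hundreds of columns wide.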
Benchmark Results
EC2 c5.18xlarge instance:
● 72 vCPUs
● 144 GB RAM
● SSD (100 IOPS / 3000 burst)
Data set:
● 5 million risk items
● Wide table (~600 columns)
● ~16 GB on disk
(Chart not reproduced; higher is better)
DataFusion Roadmap
● DataFrame-style API for building logical query plans, as alternative to SQL
● Parallel Query Execution (threads, partitions)
● Support for more data sources (Parquet, JSON)
● More complete SQL support (joins, subqueries, columnar UDFs)
● Distributed Execution
○ Distributed query planner & optimizer
○ Kubernetes & Docker deployment model
○ Apache Arrow Flight protocol for streaming data between nodes
Apache Arrow is a “do-ocracy” where the individual contributors get to decide the
roadmap, but here are some things that I am planning on working on
Want to contribute?
● Great time to get involved!
○ The code base is still relatively small
■ Core Arrow library is 6k LOC
■ DataFusion is 4k LOC
○ Small number of regular contributors
○ Where to start?
■ https://cwiki.apache.org/confluence/display/ARROW/Rust+JIRA+Dashboard
○ Try adding DataFusion as a crate dependency
Thanks! Questions?
Contact Details
▪ @AndyGrove73
▪ andy.grove@rms.com
▪ https://www.linkedin.com/in/andygrove
Arrow Resources:
▪ @ApacheArrow
▪ https://arrow.apache.org
▪ https://github.com/apache/arrow
