SlideShare a Scribd company logo
1 of 61
Download to read offline
Abstract
The world of big data involves an ever changing field of players. Much as SQL stands as a lingua
franca for declarative data analysis, Apache Beam aims to provide a portable standard for expressing
robust, out-of-order data processing pipelines in a variety of languages across a variety of platforms.
In a way, Apache Beam is a glue that can connect the Big Data ecosystem together; it enables users to
"run-anything-anywhere".
This talk will briefly cover the capabilities of the Beam model for data processing, as well as the
current state of the Beam ecosystem. We'll discuss Beam architecture and dive into the portability
layer. We'll offer a technical analysis of the Beam's powerful primitive operations that enable true and
reliable portability across diverse environments. Finally, we'll demonstrate a complex pipeline running
on multiple runners in multiple deployment scenarios (e.g. Apache Spark on Amazon Web Services,
Apache Flink on Google Cloud, Apache Apex on-premise), and give a glimpse at some of the
challenges Beam aims to address in the future.
This session is a (Intermediate) talk in our IoT and Streaming track. It focuses on Apache Flink,
Apache Kafka, Apache Spark, Cloud, Other and is geared towards Architect, Data Scientist, Data
Analyst, Developer / Engineer, Operations / IT audiences.
Realizing the promise of
portable data processing
with Apache Beam
Davor Bonaci
PMC Chair, Apache Beam
Senior Software Engineer, Google Inc.
Apache Beam: Open Source data processing APIs
● Expresses data-parallel batch and streaming
algorithms using one unified API
● Cleanly separates data processing logic
from runtime requirements
● Supports execution on multiple distributed
processing runtime environments
Apache Beam is
a unified programming model
designed to provide
efficient and portable
data processing pipelines
Agenda
1. Road to the first stable release
2. Expressing data-parallel pipelines with the Beam model
3. The Beam vision for portability
a. Parallel and portable pipelines in practice
4. Extensibility to integrate the entire Big Data ecosystem
Apache Beam at DataWorks Summit
● Realizing the promise of portable data processing with Apache Beam
○ Speaker: Davor Bonaci, Google
○ Wednesday @ 11:30 am
● Stateful processing of massive out-of-order streams with Apache Beam
○ Speaker: Kenneth Knowles, Google
○ Wednesday @ 3:00 pm
● Birds-of-a-feather: IoT, Streaming and Data Flow
○ Panel: Yolanda Davis, Davor Bonaci, P. Taylor Goetz, Sriharsha Chintalapani,
and Joseph Nimiec
○ Thursday @ 5:00 pm
Road to the
first stable release
State of the project
What we accomplished so far?
02/01/2016
Enter Apache
Incubator
5/16/2017
First stable
release
Early 2016
Design for use cases,
begin refactoring
Late 2016
Community growth
Early 2017
API stabilization
06/14/2016
1st incubating
release
01/10/2017
Graduation as a
top-level project
Announcing the first stable release (5/16/17)
Expressing
data-parallel pipelines
with the Beam model
A unified model for batch and
streaming
Processing time vs. event time
The Beam Model: asking the right questions
What results are calculated?
Where in event time are results calculated?
When in processing time are results materialized?
How do refinements of results relate?
PCollection<KV<String, Integer>> scores = input
.apply(Sum.integersPerKey());
The Beam Model: What is being computed?
The Beam Model: What is being computed?
PCollection<KV<String, Integer>> scores = input
.apply(Window.into(FixedWindows.of(Duration.standardMinutes(2)))
.apply(Sum.integersPerKey());
The Beam Model: Where in event time?
The Beam Model: Where in event time?
PCollection<KV<String, Integer>> scores = input
.apply(Window.into(FixedWindows.of(Duration.standardMinutes(2))
.triggering(AtWatermark()))
.apply(Sum.integersPerKey());
The Beam Model: When in processing time?
The Beam Model: When in processing time?
PCollection<KV<String, Integer>> scores = input
.apply(Window.into(FixedWindows.of(Duration.standardMinutes(2))
.triggering(AtWatermark()
.withEarlyFirings(
AtPeriod(Duration.standardMinutes(1)))
.withLateFirings(AtCount(1)))
.accumulatingFiredPanes())
.apply(Sum.integersPerKey());
The Beam Model: How do refinements relate?
The Beam Model: How do refinements relate?
Customizing What Where When How
3
Streaming
4
Streaming
+ Accumulation
1
Classic
Batch
2
Windowed
Batch
The Beam vision for
portability
Write once,
run anywhere“
”
Beam Vision: mix and match SDKs and runtimes
● The Beam Model: the abstractions
at the core of Apache Beam
Runner 1 Runner 3Runner 2
● Choice of SDK: Users write their
pipelines in a language that’s
familiar and integrated with their
other tooling
● Choice of Runners: Users choose
the right runtime for their current
needs -- on-prem / cloud, open
source / not, fully managed / not
● Scalability for Developers: Clean
APIs allow developers to contribute
modules independently
The Beam Model
Language A Language CLanguage B
The Beam Model
Language A
SDK
Language C
SDK
Language B
SDK
● Beam’s Java SDK runs on multiple
runtime environments, including:
○ Apache Apex
○ Apache Spark
○ Apache Flink
○ Google Cloud Dataflow
○ [in development] Apache Gearpump
● Cross-language infrastructure is in
progress.
○ Beam’s Python SDK currently runs
on Google Cloud Dataflow
Beam Vision: as of June 2017
Beam Model: Fn Runners
Apache
Spark
Cloud
Dataflow
Beam Model: Pipeline Construction
Apache
Flink
Java
Java
Python
Python
Apache
Apex
Apache
Gearpump
Example Beam Runners
Apache Spark
● Open-source
cluster-computing
framework
● Large ecosystem of
APIs and tools
● Runs on premise or in
the cloud
Apache Flink
● Open-source
distributed data
processing engine
● High-throughput and
low-latency stream
processing
● Runs on premise or in
the cloud
Google Cloud Dataflow
● Fully-managed service
for batch and stream
data processing
● Provides dynamic
auto-scaling,
monitoring tools, and
tight integration with
Google Cloud
Platform
How to think about Apache Beam?
How do you build an abstraction layer?
Apache
Spark
Cloud
Dataflow
Apache
Flink
????????
????????
Beam: the intersection of runner functionality?
Beam: the union of runner functionality?
Beam: the future!
Categorizing Runner Capabilities
https://beam.apache.org/
documentation/runners/capability-matrix/
Parallel and portable
pipelines in practice
A Use Case
Getting Started with Apache Beam
Quickstarts
● Java SDK
● Python SDK
Example walkthroughs
● Word Count
● Mobile Gaming
Extensive documentation
Extensibility to integrate the
entire Big Data ecosystem
Integrating
Up, Down, and
Sideways
“
”
Extensibility points
● Software Development Kits (SDKs)
● Runners
● Domain-specific extensions (DSLs)
● Libraries of transformations
● IOs
● File systems
Software Development Kits (SDKs)
Runner 1 Runner 3Runner 2
The Beam Model
Language A
SDK
Language C
SDK
Language B
SDK
Runners
Runner 1 Runner 3Runner 2
The Beam Model
Language A
SDK
Language C
SDK
Language B
SDK
Domain-specific extensions (DSLs)
The Beam Model
Language A
SDK
Language C
SDK
Language B
SDK
DSL 2 DSL 3DSL 1
Libraries of transformations
The Beam Model
Language A
SDK
Language C
SDK
Language B
SDK
Library 2 Library 3Library 1
IO connectors
The Beam Model
Language A
SDK
Language C
SDK
Language B
SDK
IO
connector
2
IO
connector
3
IO
connector
1
File systems
The Beam Model
Language A
SDK
Language C
SDK
Language B
SDK
File system
2
File system
3
File system
1
Ecosystem integration
● I have an engine
→ write a Beam runner
● I want to extend Beam to new languages
→ write an SDK
● I want to adopt an SDK to a target audience
→ write a DSL
● I want a component can be a part of a bigger data-processing pipeline
→ write a library of transformations
● I have a data storage or messaging system
→ write an IO connector or a file system connector
Apache Beam is
a glue that integrates
the big data ecosystem
Learn more and get involved!
Apache Beam
https://beam.apache.org
Join the Beam mailing lists!
user-subscribe@beam.apache.org
dev-subscribe@beam.apache.org
Follow @ApacheBeam on Twitter
Apache Beam is
a unified programming model
designed to provide
efficient and portable
data processing pipelines
Still coming up...
● Stateful processing of massive out-of-order streams with Apache Beam
○ Speaker: Kenneth Knowles, Google
○ Wednesday @ 3:00 pm
● Birds-of-a-feather: IoT, Streaming and Data Flow
○ Panel: Yolanda Davis, Davor Bonaci, P. Taylor Goetz, Sriharsha Chintalapani,
and Joseph Nimiec
○ Thursday @ 5:00 pm

More Related Content

What's hot

Why Kubernetes as a container orchestrator is a right choice for running spar...
Why Kubernetes as a container orchestrator is a right choice for running spar...Why Kubernetes as a container orchestrator is a right choice for running spar...
Why Kubernetes as a container orchestrator is a right choice for running spar...DataWorks Summit
 
Enabling Modern Application Architecture using Data.gov open government data
Enabling Modern Application Architecture using Data.gov open government dataEnabling Modern Application Architecture using Data.gov open government data
Enabling Modern Application Architecture using Data.gov open government dataDataWorks Summit
 
Combining Machine Learning frameworks with Apache Spark
Combining Machine Learning frameworks with Apache SparkCombining Machine Learning frameworks with Apache Spark
Combining Machine Learning frameworks with Apache SparkDataWorks Summit/Hadoop Summit
 
Storage Requirements and Options for Running Spark on Kubernetes
Storage Requirements and Options for Running Spark on KubernetesStorage Requirements and Options for Running Spark on Kubernetes
Storage Requirements and Options for Running Spark on KubernetesDataWorks Summit
 
Running secured Spark job in Kubernetes compute cluster and integrating with ...
Running secured Spark job in Kubernetes compute cluster and integrating with ...Running secured Spark job in Kubernetes compute cluster and integrating with ...
Running secured Spark job in Kubernetes compute cluster and integrating with ...DataWorks Summit
 
The Future of Hadoop by Arun Murthy, PMC Apache Hadoop & Cofounder Hortonworks
The Future of Hadoop by Arun Murthy, PMC Apache Hadoop & Cofounder HortonworksThe Future of Hadoop by Arun Murthy, PMC Apache Hadoop & Cofounder Hortonworks
The Future of Hadoop by Arun Murthy, PMC Apache Hadoop & Cofounder HortonworksData Con LA
 
What's new in apache hive
What's new in apache hive What's new in apache hive
What's new in apache hive DataWorks Summit
 
Hadoop in the Cloud - The what, why and how from the experts
Hadoop in the Cloud - The what, why and how from the expertsHadoop in the Cloud - The what, why and how from the experts
Hadoop in the Cloud - The what, why and how from the expertsDataWorks Summit/Hadoop Summit
 
Improving Python and Spark Performance and Interoperability with Apache Arrow...
Improving Python and Spark Performance and Interoperability with Apache Arrow...Improving Python and Spark Performance and Interoperability with Apache Arrow...
Improving Python and Spark Performance and Interoperability with Apache Arrow...Databricks
 
From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...
From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...
From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...DataWorks Summit
 
Lessons learned from running Spark on Docker
Lessons learned from running Spark on DockerLessons learned from running Spark on Docker
Lessons learned from running Spark on DockerDataWorks Summit
 
Zero ETL analytics with LLAP in Azure HDInsight
Zero ETL analytics with LLAP in Azure HDInsightZero ETL analytics with LLAP in Azure HDInsight
Zero ETL analytics with LLAP in Azure HDInsightDataWorks Summit
 
End-to-End Security and Auditing in a Big Data as a Service Deployment
End-to-End Security and Auditing in a Big Data as a Service DeploymentEnd-to-End Security and Auditing in a Big Data as a Service Deployment
End-to-End Security and Auditing in a Big Data as a Service DeploymentDataWorks Summit/Hadoop Summit
 
Modernizing Business Processes with Big Data: Real-World Use Cases for Produc...
Modernizing Business Processes with Big Data: Real-World Use Cases for Produc...Modernizing Business Processes with Big Data: Real-World Use Cases for Produc...
Modernizing Business Processes with Big Data: Real-World Use Cases for Produc...DataWorks Summit/Hadoop Summit
 
Preventative Maintenance of Robots in Automotive Industry
Preventative Maintenance of Robots in Automotive IndustryPreventative Maintenance of Robots in Automotive Industry
Preventative Maintenance of Robots in Automotive IndustryDataWorks Summit/Hadoop Summit
 
Startup Case Study: Leveraging the Broad Hadoop Ecosystem to Develop World-Fi...
Startup Case Study: Leveraging the Broad Hadoop Ecosystem to Develop World-Fi...Startup Case Study: Leveraging the Broad Hadoop Ecosystem to Develop World-Fi...
Startup Case Study: Leveraging the Broad Hadoop Ecosystem to Develop World-Fi...DataWorks Summit
 
Sherlock: an anomaly detection service on top of Druid
Sherlock: an anomaly detection service on top of Druid Sherlock: an anomaly detection service on top of Druid
Sherlock: an anomaly detection service on top of Druid DataWorks Summit
 

What's hot (20)

Why Kubernetes as a container orchestrator is a right choice for running spar...
Why Kubernetes as a container orchestrator is a right choice for running spar...Why Kubernetes as a container orchestrator is a right choice for running spar...
Why Kubernetes as a container orchestrator is a right choice for running spar...
 
Spark Uber Development Kit
Spark Uber Development KitSpark Uber Development Kit
Spark Uber Development Kit
 
Enabling Modern Application Architecture using Data.gov open government data
Enabling Modern Application Architecture using Data.gov open government dataEnabling Modern Application Architecture using Data.gov open government data
Enabling Modern Application Architecture using Data.gov open government data
 
Combining Machine Learning frameworks with Apache Spark
Combining Machine Learning frameworks with Apache SparkCombining Machine Learning frameworks with Apache Spark
Combining Machine Learning frameworks with Apache Spark
 
Storage Requirements and Options for Running Spark on Kubernetes
Storage Requirements and Options for Running Spark on KubernetesStorage Requirements and Options for Running Spark on Kubernetes
Storage Requirements and Options for Running Spark on Kubernetes
 
Running secured Spark job in Kubernetes compute cluster and integrating with ...
Running secured Spark job in Kubernetes compute cluster and integrating with ...Running secured Spark job in Kubernetes compute cluster and integrating with ...
Running secured Spark job in Kubernetes compute cluster and integrating with ...
 
The Future of Hadoop by Arun Murthy, PMC Apache Hadoop & Cofounder Hortonworks
The Future of Hadoop by Arun Murthy, PMC Apache Hadoop & Cofounder HortonworksThe Future of Hadoop by Arun Murthy, PMC Apache Hadoop & Cofounder Hortonworks
The Future of Hadoop by Arun Murthy, PMC Apache Hadoop & Cofounder Hortonworks
 
What's new in apache hive
What's new in apache hive What's new in apache hive
What's new in apache hive
 
Hadoop in the Cloud - The what, why and how from the experts
Hadoop in the Cloud - The what, why and how from the expertsHadoop in the Cloud - The what, why and how from the experts
Hadoop in the Cloud - The what, why and how from the experts
 
Improving Python and Spark Performance and Interoperability with Apache Arrow...
Improving Python and Spark Performance and Interoperability with Apache Arrow...Improving Python and Spark Performance and Interoperability with Apache Arrow...
Improving Python and Spark Performance and Interoperability with Apache Arrow...
 
From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...
From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...
From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...
 
Lessons learned from running Spark on Docker
Lessons learned from running Spark on DockerLessons learned from running Spark on Docker
Lessons learned from running Spark on Docker
 
Zero ETL analytics with LLAP in Azure HDInsight
Zero ETL analytics with LLAP in Azure HDInsightZero ETL analytics with LLAP in Azure HDInsight
Zero ETL analytics with LLAP in Azure HDInsight
 
End-to-End Security and Auditing in a Big Data as a Service Deployment
End-to-End Security and Auditing in a Big Data as a Service DeploymentEnd-to-End Security and Auditing in a Big Data as a Service Deployment
End-to-End Security and Auditing in a Big Data as a Service Deployment
 
Modernizing Business Processes with Big Data: Real-World Use Cases for Produc...
Modernizing Business Processes with Big Data: Real-World Use Cases for Produc...Modernizing Business Processes with Big Data: Real-World Use Cases for Produc...
Modernizing Business Processes with Big Data: Real-World Use Cases for Produc...
 
Securing Hadoop in an Enterprise Context
Securing Hadoop in an Enterprise ContextSecuring Hadoop in an Enterprise Context
Securing Hadoop in an Enterprise Context
 
Debunking Common Myths in Stream Processing
Debunking Common Myths in Stream ProcessingDebunking Common Myths in Stream Processing
Debunking Common Myths in Stream Processing
 
Preventative Maintenance of Robots in Automotive Industry
Preventative Maintenance of Robots in Automotive IndustryPreventative Maintenance of Robots in Automotive Industry
Preventative Maintenance of Robots in Automotive Industry
 
Startup Case Study: Leveraging the Broad Hadoop Ecosystem to Develop World-Fi...
Startup Case Study: Leveraging the Broad Hadoop Ecosystem to Develop World-Fi...Startup Case Study: Leveraging the Broad Hadoop Ecosystem to Develop World-Fi...
Startup Case Study: Leveraging the Broad Hadoop Ecosystem to Develop World-Fi...
 
Sherlock: an anomaly detection service on top of Druid
Sherlock: an anomaly detection service on top of Druid Sherlock: an anomaly detection service on top of Druid
Sherlock: an anomaly detection service on top of Druid
 

Similar to Realizing the promise of portable data processing with Apache Beam

Realizing the Promise of Portable Data Processing with Apache Beam
Realizing the Promise of Portable Data Processing with Apache BeamRealizing the Promise of Portable Data Processing with Apache Beam
Realizing the Promise of Portable Data Processing with Apache BeamDataWorks Summit
 
Present and future of unified, portable and efficient data processing with Ap...
Present and future of unified, portable and efficient data processing with Ap...Present and future of unified, portable and efficient data processing with Ap...
Present and future of unified, portable and efficient data processing with Ap...DataWorks Summit
 
Portable batch and streaming pipelines with Apache Beam (Big Data Application...
Portable batch and streaming pipelines with Apache Beam (Big Data Application...Portable batch and streaming pipelines with Apache Beam (Big Data Application...
Portable batch and streaming pipelines with Apache Beam (Big Data Application...Malo Denielou
 
Realizing the promise of portability with Apache Beam
Realizing the promise of portability with Apache BeamRealizing the promise of portability with Apache Beam
Realizing the promise of portability with Apache BeamJ On The Beach
 
Portable Streaming Pipelines with Apache Beam
Portable Streaming Pipelines with Apache BeamPortable Streaming Pipelines with Apache Beam
Portable Streaming Pipelines with Apache Beamconfluent
 
System to generate speech to text in real time
System to generate speech to text in real timeSystem to generate speech to text in real time
System to generate speech to text in real timeSaptarshi Chatterjee
 
Learn more about the tremendous value Open Data Plane brings to NFV
Learn more about the tremendous value Open Data Plane brings to NFVLearn more about the tremendous value Open Data Plane brings to NFV
Learn more about the tremendous value Open Data Plane brings to NFVGhodhbane Mohamed Amine
 
.NET per la Data Science e oltre
.NET per la Data Science e oltre.NET per la Data Science e oltre
.NET per la Data Science e oltreMarco Parenzan
 
Experiences using CouchDB inside Microsoft's Azure team
Experiences using CouchDB inside Microsoft's Azure teamExperiences using CouchDB inside Microsoft's Azure team
Experiences using CouchDB inside Microsoft's Azure teamBrian Benz
 
Flink Forward Berlin 2018: Robert Bradshaw & Maximilian Michels - "Universal ...
Flink Forward Berlin 2018: Robert Bradshaw & Maximilian Michels - "Universal ...Flink Forward Berlin 2018: Robert Bradshaw & Maximilian Michels - "Universal ...
Flink Forward Berlin 2018: Robert Bradshaw & Maximilian Michels - "Universal ...Flink Forward
 
Microservice-based software architecture
Microservice-based software architectureMicroservice-based software architecture
Microservice-based software architectureArangoDB Database
 
HBaseCon2017 Efficient and portable data processing with Apache Beam and HBase
HBaseCon2017 Efficient and portable data processing with Apache Beam and HBaseHBaseCon2017 Efficient and portable data processing with Apache Beam and HBase
HBaseCon2017 Efficient and portable data processing with Apache Beam and HBaseHBaseCon
 
Big mountain data and dev conference apache pulsar with mqtt for edge compu...
Big mountain data and dev conference   apache pulsar with mqtt for edge compu...Big mountain data and dev conference   apache pulsar with mqtt for edge compu...
Big mountain data and dev conference apache pulsar with mqtt for edge compu...Timothy Spann
 
Accelerating Apache Hadoop through High-Performance Networking and I/O Techno...
Accelerating Apache Hadoop through High-Performance Networking and I/O Techno...Accelerating Apache Hadoop through High-Performance Networking and I/O Techno...
Accelerating Apache Hadoop through High-Performance Networking and I/O Techno...DataWorks Summit/Hadoop Summit
 
Apache Arrow: Present and Future @ ScaledML 2020
Apache Arrow: Present and Future @ ScaledML 2020Apache Arrow: Present and Future @ ScaledML 2020
Apache Arrow: Present and Future @ ScaledML 2020Wes McKinney
 
Bringing the Power and Familiarity of .NET, C# and F# to Big Data Processing ...
Bringing the Power and Familiarity of .NET, C# and F# to Big Data Processing ...Bringing the Power and Familiarity of .NET, C# and F# to Big Data Processing ...
Bringing the Power and Familiarity of .NET, C# and F# to Big Data Processing ...Michael Rys
 
"The Vision API Maze: Options and Trade-offs," a Presentation from the Khrono...
"The Vision API Maze: Options and Trade-offs," a Presentation from the Khrono..."The Vision API Maze: Options and Trade-offs," a Presentation from the Khrono...
"The Vision API Maze: Options and Trade-offs," a Presentation from the Khrono...Edge AI and Vision Alliance
 
Serverless Data Platform
Serverless Data PlatformServerless Data Platform
Serverless Data PlatformShu-Jeng Hsieh
 
Minko - Targeting Flash/Stage3D with C++ and GLSL
Minko - Targeting Flash/Stage3D with C++ and GLSLMinko - Targeting Flash/Stage3D with C++ and GLSL
Minko - Targeting Flash/Stage3D with C++ and GLSLMinko3D
 
Source-to-source transformations: Supporting tools and infrastructure
Source-to-source transformations: Supporting tools and infrastructureSource-to-source transformations: Supporting tools and infrastructure
Source-to-source transformations: Supporting tools and infrastructurekaveirious
 

Similar to Realizing the promise of portable data processing with Apache Beam (20)

Realizing the Promise of Portable Data Processing with Apache Beam
Realizing the Promise of Portable Data Processing with Apache BeamRealizing the Promise of Portable Data Processing with Apache Beam
Realizing the Promise of Portable Data Processing with Apache Beam
 
Present and future of unified, portable and efficient data processing with Ap...
Present and future of unified, portable and efficient data processing with Ap...Present and future of unified, portable and efficient data processing with Ap...
Present and future of unified, portable and efficient data processing with Ap...
 
Portable batch and streaming pipelines with Apache Beam (Big Data Application...
Portable batch and streaming pipelines with Apache Beam (Big Data Application...Portable batch and streaming pipelines with Apache Beam (Big Data Application...
Portable batch and streaming pipelines with Apache Beam (Big Data Application...
 
Realizing the promise of portability with Apache Beam
Realizing the promise of portability with Apache BeamRealizing the promise of portability with Apache Beam
Realizing the promise of portability with Apache Beam
 
Portable Streaming Pipelines with Apache Beam
Portable Streaming Pipelines with Apache BeamPortable Streaming Pipelines with Apache Beam
Portable Streaming Pipelines with Apache Beam
 
System to generate speech to text in real time
System to generate speech to text in real timeSystem to generate speech to text in real time
System to generate speech to text in real time
 
Learn more about the tremendous value Open Data Plane brings to NFV
Learn more about the tremendous value Open Data Plane brings to NFVLearn more about the tremendous value Open Data Plane brings to NFV
Learn more about the tremendous value Open Data Plane brings to NFV
 
.NET per la Data Science e oltre
.NET per la Data Science e oltre.NET per la Data Science e oltre
.NET per la Data Science e oltre
 
Experiences using CouchDB inside Microsoft's Azure team
Experiences using CouchDB inside Microsoft's Azure teamExperiences using CouchDB inside Microsoft's Azure team
Experiences using CouchDB inside Microsoft's Azure team
 
Flink Forward Berlin 2018: Robert Bradshaw & Maximilian Michels - "Universal ...
Flink Forward Berlin 2018: Robert Bradshaw & Maximilian Michels - "Universal ...Flink Forward Berlin 2018: Robert Bradshaw & Maximilian Michels - "Universal ...
Flink Forward Berlin 2018: Robert Bradshaw & Maximilian Michels - "Universal ...
 
Microservice-based software architecture
Microservice-based software architectureMicroservice-based software architecture
Microservice-based software architecture
 
HBaseCon2017 Efficient and portable data processing with Apache Beam and HBase
HBaseCon2017 Efficient and portable data processing with Apache Beam and HBaseHBaseCon2017 Efficient and portable data processing with Apache Beam and HBase
HBaseCon2017 Efficient and portable data processing with Apache Beam and HBase
 
Big mountain data and dev conference apache pulsar with mqtt for edge compu...
Big mountain data and dev conference   apache pulsar with mqtt for edge compu...Big mountain data and dev conference   apache pulsar with mqtt for edge compu...
Big mountain data and dev conference apache pulsar with mqtt for edge compu...
 
Accelerating Apache Hadoop through High-Performance Networking and I/O Techno...
Accelerating Apache Hadoop through High-Performance Networking and I/O Techno...Accelerating Apache Hadoop through High-Performance Networking and I/O Techno...
Accelerating Apache Hadoop through High-Performance Networking and I/O Techno...
 
Apache Arrow: Present and Future @ ScaledML 2020
Apache Arrow: Present and Future @ ScaledML 2020Apache Arrow: Present and Future @ ScaledML 2020
Apache Arrow: Present and Future @ ScaledML 2020
 
Bringing the Power and Familiarity of .NET, C# and F# to Big Data Processing ...
Bringing the Power and Familiarity of .NET, C# and F# to Big Data Processing ...Bringing the Power and Familiarity of .NET, C# and F# to Big Data Processing ...
Bringing the Power and Familiarity of .NET, C# and F# to Big Data Processing ...
 
"The Vision API Maze: Options and Trade-offs," a Presentation from the Khrono...
"The Vision API Maze: Options and Trade-offs," a Presentation from the Khrono..."The Vision API Maze: Options and Trade-offs," a Presentation from the Khrono...
"The Vision API Maze: Options and Trade-offs," a Presentation from the Khrono...
 
Serverless Data Platform
Serverless Data PlatformServerless Data Platform
Serverless Data Platform
 
Minko - Targeting Flash/Stage3D with C++ and GLSL
Minko - Targeting Flash/Stage3D with C++ and GLSLMinko - Targeting Flash/Stage3D with C++ and GLSL
Minko - Targeting Flash/Stage3D with C++ and GLSL
 
Source-to-source transformations: Supporting tools and infrastructure
Source-to-source transformations: Supporting tools and infrastructureSource-to-source transformations: Supporting tools and infrastructure
Source-to-source transformations: Supporting tools and infrastructure
 

More from DataWorks Summit

Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisDataWorks Summit
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiDataWorks Summit
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...DataWorks Summit
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...DataWorks Summit
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal SystemDataWorks Summit
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExampleDataWorks Summit
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberDataWorks Summit
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixDataWorks Summit
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiDataWorks Summit
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsDataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureDataWorks Summit
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EngineDataWorks Summit
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...DataWorks Summit
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudDataWorks Summit
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiDataWorks Summit
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerDataWorks Summit
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouDataWorks Summit
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkDataWorks Summit
 

More from DataWorks Summit (20)

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

Recently uploaded

The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsRoshan Dwivedi
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilV3cube
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 

Recently uploaded (20)

The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 

Realizing the promise of portable data processing with Apache Beam

  • 1. Abstract The world of big data involves an ever changing field of players. Much as SQL stands as a lingua franca for declarative data analysis, Apache Beam aims to provide a portable standard for expressing robust, out-of-order data processing pipelines in a variety of languages across a variety of platforms. In a way, Apache Beam is a glue that can connect the Big Data ecosystem together; it enables users to "run-anything-anywhere". This talk will briefly cover the capabilities of the Beam model for data processing, as well as the current state of the Beam ecosystem. We'll discuss Beam architecture and dive into the portability layer. We'll offer a technical analysis of the Beam's powerful primitive operations that enable true and reliable portability across diverse environments. Finally, we'll demonstrate a complex pipeline running on multiple runners in multiple deployment scenarios (e.g. Apache Spark on Amazon Web Services, Apache Flink on Google Cloud, Apache Apex on-premise), and give a glimpse at some of the challenges Beam aims to address in the future. This session is a (Intermediate) talk in our IoT and Streaming track. It focuses on Apache Flink, Apache Kafka, Apache Spark, Cloud, Other and is geared towards Architect, Data Scientist, Data Analyst, Developer / Engineer, Operations / IT audiences.
  • 2. Realizing the promise of portable data processing with Apache Beam Davor Bonaci PMC Chair, Apache Beam Senior Software Engineer, Google Inc.
  • 3. Apache Beam: Open Source data processing APIs ● Expresses data-parallel batch and streaming algorithms using one unified API ● Cleanly separates data processing logic from runtime requirements ● Supports execution on multiple distributed processing runtime environments
  • 4. Apache Beam is a unified programming model designed to provide efficient and portable data processing pipelines
  • 5. Agenda 1. Road to the first stable release 2. Expressing data-parallel pipelines with the Beam model 3. The Beam vision for portability a. Parallel and portable pipelines in practice 4. Extensibility to integrate the entire Big Data ecosystem
  • 6. Apache Beam at DataWorks Summit ● Realizing the promise of portable data processing with Apache Beam ○ Speaker: Davor Bonaci, Google ○ Wednesday @ 11:30 am ● Stateful processing of massive out-of-order streams with Apache Beam ○ Speaker: Kenneth Knowles, Google ○ Wednesday @ 3:00 pm ● Birds-of-a-feather: IoT, Streaming and Data Flow ○ Panel: Yolanda Davis, Davor Bonaci, P. Taylor Goetz, Sriharsha Chintalapani, and Joseph Nimiec ○ Thursday @ 5:00 pm
  • 7. Road to the first stable release State of the project
  • 8. What we accomplished so far? 02/01/2016 Enter Apache Incubator 5/16/2017 First stable release Early 2016 Design for use cases, begin refactoring Late 2016 Community growth Early 2017 API stabilization 06/14/2016 1st incubating release 01/10/2017 Graduation as a top-level project
  • 9. Announcing the first stable release (5/16/17)
  • 10. Expressing data-parallel pipelines with the Beam model A unified model for batch and streaming
  • 11. Processing time vs. event time
  • 12. The Beam Model: asking the right questions What results are calculated? Where in event time are results calculated? When in processing time are results materialized? How do refinements of results relate?
  • 13. PCollection<KV<String, Integer>> scores = input .apply(Sum.integersPerKey()); The Beam Model: What is being computed?
  • 14. The Beam Model: What is being computed?
  • 15. PCollection<KV<String, Integer>> scores = input .apply(Window.into(FixedWindows.of(Duration.standardMinutes(2))) .apply(Sum.integersPerKey()); The Beam Model: Where in event time?
  • 16. The Beam Model: Where in event time?
  • 17. PCollection<KV<String, Integer>> scores = input .apply(Window.into(FixedWindows.of(Duration.standardMinutes(2)) .triggering(AtWatermark())) .apply(Sum.integersPerKey()); The Beam Model: When in processing time?
  • 18. The Beam Model: When in processing time?
  • 19. PCollection<KV<String, Integer>> scores = input .apply(Window.into(FixedWindows.of(Duration.standardMinutes(2)) .triggering(AtWatermark() .withEarlyFirings( AtPeriod(Duration.standardMinutes(1))) .withLateFirings(AtCount(1))) .accumulatingFiredPanes()) .apply(Sum.integersPerKey()); The Beam Model: How do refinements relate?
  • 20. The Beam Model: How do refinements relate?
  • 21. Customizing What Where When How 3 Streaming 4 Streaming + Accumulation 1 Classic Batch 2 Windowed Batch
  • 22. The Beam vision for portability Write once, run anywhere“ ”
  • 23. Beam Vision: mix and match SDKs and runtimes ● The Beam Model: the abstractions at the core of Apache Beam Runner 1 Runner 3Runner 2 ● Choice of SDK: Users write their pipelines in a language that’s familiar and integrated with their other tooling ● Choice of Runners: Users choose the right runtime for their current needs -- on-prem / cloud, open source / not, fully managed / not ● Scalability for Developers: Clean APIs allow developers to contribute modules independently The Beam Model Language A Language CLanguage B The Beam Model Language A SDK Language C SDK Language B SDK
  • 24. ● Beam’s Java SDK runs on multiple runtime environments, including: ○ Apache Apex ○ Apache Spark ○ Apache Flink ○ Google Cloud Dataflow ○ [in development] Apache Gearpump ● Cross-language infrastructure is in progress. ○ Beam’s Python SDK currently runs on Google Cloud Dataflow Beam Vision: as of June 2017 Beam Model: Fn Runners Apache Spark Cloud Dataflow Beam Model: Pipeline Construction Apache Flink Java Java Python Python Apache Apex Apache Gearpump
  • 25. Example Beam Runners Apache Spark ● Open-source cluster-computing framework ● Large ecosystem of APIs and tools ● Runs on premise or in the cloud Apache Flink ● Open-source distributed data processing engine ● High-throughput and low-latency stream processing ● Runs on premise or in the cloud Google Cloud Dataflow ● Fully-managed service for batch and stream data processing ● Provides dynamic auto-scaling, monitoring tools, and tight integration with Google Cloud Platform
  • 26. How to think about Apache Beam?
  • 27. How do you build an abstraction layer? Apache Spark Cloud Dataflow Apache Flink ???????? ????????
  • 28. Beam: the intersection of runner functionality?
  • 29. Beam: the union of runner functionality?
  • 32. Parallel and portable pipelines in practice A Use Case
  • 33.
  • 34.
  • 35.
  • 36.
  • 37.
  • 38.
  • 39.
  • 40.
  • 41.
  • 42.
  • 43.
  • 44.
  • 45.
  • 46.
  • 47.
  • 48. Getting Started with Apache Beam Quickstarts ● Java SDK ● Python SDK Example walkthroughs ● Word Count ● Mobile Gaming Extensive documentation
  • 49. Extensibility to integrate the entire Big Data ecosystem Integrating Up, Down, and Sideways “ ”
  • 50. Extensibility points ● Software Development Kits (SDKs) ● Runners ● Domain-specific extensions (DSLs) ● Libraries of transformations ● IOs ● File systems
  • 51. Software Development Kits (SDKs) Runner 1 Runner 3Runner 2 The Beam Model Language A SDK Language C SDK Language B SDK
  • 52. Runners Runner 1 Runner 3Runner 2 The Beam Model Language A SDK Language C SDK Language B SDK
  • 53. Domain-specific extensions (DSLs) The Beam Model Language A SDK Language C SDK Language B SDK DSL 2 DSL 3DSL 1
  • 54. Libraries of transformations The Beam Model Language A SDK Language C SDK Language B SDK Library 2 Library 3Library 1
  • 55. IO connectors The Beam Model Language A SDK Language C SDK Language B SDK IO connector 2 IO connector 3 IO connector 1
  • 56. File systems The Beam Model Language A SDK Language C SDK Language B SDK File system 2 File system 3 File system 1
  • 57. Ecosystem integration ● I have an engine → write a Beam runner ● I want to extend Beam to new languages → write an SDK ● I want to adopt an SDK to a target audience → write a DSL ● I want a component can be a part of a bigger data-processing pipeline → write a library of transformations ● I have a data storage or messaging system → write an IO connector or a file system connector
  • 58. Apache Beam is a glue that integrates the big data ecosystem
  • 59. Learn more and get involved! Apache Beam https://beam.apache.org Join the Beam mailing lists! user-subscribe@beam.apache.org dev-subscribe@beam.apache.org Follow @ApacheBeam on Twitter
  • 60. Apache Beam is a unified programming model designed to provide efficient and portable data processing pipelines
  • 61. Still coming up... ● Stateful processing of massive out-of-order streams with Apache Beam ○ Speaker: Kenneth Knowles, Google ○ Wednesday @ 3:00 pm ● Birds-of-a-feather: IoT, Streaming and Data Flow ○ Panel: Yolanda Davis, Davor Bonaci, P. Taylor Goetz, Sriharsha Chintalapani, and Joseph Nimiec ○ Thursday @ 5:00 pm