SlideShare ist ein Scribd-Unternehmen logo
1 von 49
Scala: Lingua Franca of Fast
Data
Jamie Allen
Sr. Director of Global Solutions Architects
• Why Scala?
• Who is doing this?
• What is Fast Data?
• Architecting for Fast Data
Agenda
• Cloud portability versus native control
• Application correctness versus speed of development
• Modularity versus global namespace
• Concise syntax versus boilerplate
• Multi-threaded simplicity via abstractions versus low-level control
Tradeoffs
• REPL
• Type safety
• Modularity
• Concise syntax
• Multi-threaded simplicity
• Data-centric semantics
• Managed runtime for cloud portability
• Ecosystem
Scala is the local optimum
Scala is the local optimum
The JVM is a primary reason for Scala’s success
• No REPL or Notebook
• Not a data-centric language, particularly collections semantics
Why not Java?
• Data-centric language, has all of the wonderful collections semantics we
want
• No type safety
• No modularity
Why not Python?
• Weak type safety
• Collections are too elemental
• Native execution is a non-starter, so Go is the only option
• Garbage collection is not generational
Why not Go or C++?
• Scala just so happened to fit well in this space
• Performance
• Correctness
• Conciseness
• Scala will evolve
• Other languages will come in time
Scala is NOT the end of the road
Who is doing this?
One Caveat: Apache Beam and TensorFlow
Why Scala?
At the time we started, I really wanted a PL that supports a
language-integrated interface (where people write
functions inline, etc)… However, I also wanted to be on the
JVM in order to easily interact with the Hadoop filesystem
and data formats for that. Scala was the only somewhat
popular JVM language that offered this kind of
functional syntax and was also statically typed (letting
us have some control over performance), so we chose
that. Today there might be an argument to make the first
version of the API in Java with Java 8, but we also
benefitted from other aspects of Scala in Spark, like type
inference, pattern matching, actor libriaries, etc.
Matei Zaharia, creator of Spark
What is Fast Data?
A bit of history: Hadoop
YARN
HDFS
MR job #1
MR job #2
Flume Sqoop
DBs
Slave Node
DiskDiskDiskDiskDisk
Node Mgr
Data Node
Master
Resource
Manager
Name Node
Worker
Hadoop strengths
• Lowest capital expenditure for big data
• Excellent for ingesting and integrating diverse datasets
• Flexible
• Classic analytics (aggregations and data warehousing)
• Machine learning
Hadoop weaknesses
• Complex administration
• YARN requires dedicated cluster
• MapReduce foibles
• Poor performance
• Imperative programming model
• No stream processing support
Fast Data with Spark
Spark
• 100x faster as a replacement for Hadoop MapReduce
• Uses much fewer machines and resources
• Incredible support from the community and enterprise
Spark use cases
• Primarily anomaly detection
• Risk management
• Fraud detection
• Odds recalculation
• Spam filters
• Update search engine results quickly
• Spark had it with RDDs
• They removed it with the DataFrames API
• Brought it back with DataSets, but not as comprehensively as RDDs
Type safety
Why not Flink?
• Flink has much better stream handling for low latency systems that Spark
currently lacks
• Event timing
• Watermarks
• Triggers
• Exactly-once semantics (assuming connections hold)
• Pipeline portability via Apache Beam integration
Why not Flink?
Architecting for Fast Data
This isn’t enough
Old and busted
Traditional application architectures
and platforms are obsolete.
Gartner
How do we avoid messing this up?
• At the API
• In our source
• For our data
We want isolation
Wikipedia, Creative Commons, created by DFoerster
We want realistic data management
• Use CQRS and Event Sourcing, not CRUD
• Transactions, especially distributed, will not work
• Consistency is an anti-pattern at scale
• Distributed locks and shared data will limit you
• Data fabrics break all of these conventions
Think in terms of compensation, not prevention.
Kevin Webber, Lightbend
We want to ACID v2
• Associativity, not Atomicity
• Commutativity, not Consistency
• Idempotent, not Isolation
• Distributed, not Durable
Wikipedia, Creative Commons, created by Weston.pace
New hotness
Fast DataArchitecture
HTTP/
REST
Learning Spark
• Go to http://bigdatauniversity.com, built by IBM
20160524 ibm fast data meetup

Weitere ähnliche Inhalte

Was ist angesagt?

Building a Real-Time Forecasting Engine with Scala and Akka
Building a Real-Time Forecasting Engine with Scala and Akka Building a Real-Time Forecasting Engine with Scala and Akka
Building a Real-Time Forecasting Engine with Scala and Akka Lightbend
 
Digital Transformation with Kubernetes, Containers, and Microservices
Digital Transformation with Kubernetes, Containers, and MicroservicesDigital Transformation with Kubernetes, Containers, and Microservices
Digital Transformation with Kubernetes, Containers, and MicroservicesLightbend
 
Journey to the Modern App with Containers, Microservices and Big Data
Journey to the Modern App with Containers, Microservices and Big DataJourney to the Modern App with Containers, Microservices and Big Data
Journey to the Modern App with Containers, Microservices and Big DataLightbend
 
Patterns of Distributed Application Design
Patterns of Distributed Application DesignPatterns of Distributed Application Design
Patterns of Distributed Application DesignGlobalLogic Ukraine
 
How To Build, Integrate, and Deploy Real-Time Streaming Pipelines On Kubernetes
How To Build, Integrate, and Deploy Real-Time Streaming Pipelines On KubernetesHow To Build, Integrate, and Deploy Real-Time Streaming Pipelines On Kubernetes
How To Build, Integrate, and Deploy Real-Time Streaming Pipelines On KubernetesLightbend
 
Lessons From HPE: From Batch To Streaming For 20 Billion Sensors With Lightbe...
Lessons From HPE: From Batch To Streaming For 20 Billion Sensors With Lightbe...Lessons From HPE: From Batch To Streaming For 20 Billion Sensors With Lightbe...
Lessons From HPE: From Batch To Streaming For 20 Billion Sensors With Lightbe...Lightbend
 
Building Streaming And Fast Data Applications With Spark, Mesos, Akka, Cassan...
Building Streaming And Fast Data Applications With Spark, Mesos, Akka, Cassan...Building Streaming And Fast Data Applications With Spark, Mesos, Akka, Cassan...
Building Streaming And Fast Data Applications With Spark, Mesos, Akka, Cassan...Lightbend
 
Model-driven and low-code development for event-based systems | Bobby Calderw...
Model-driven and low-code development for event-based systems | Bobby Calderw...Model-driven and low-code development for event-based systems | Bobby Calderw...
Model-driven and low-code development for event-based systems | Bobby Calderw...HostedbyConfluent
 
Hybrid Kafka, Taking Real-time Analytics to the Business (Cody Irwin, Google ...
Hybrid Kafka, Taking Real-time Analytics to the Business (Cody Irwin, Google ...Hybrid Kafka, Taking Real-time Analytics to the Business (Cody Irwin, Google ...
Hybrid Kafka, Taking Real-time Analytics to the Business (Cody Irwin, Google ...HostedbyConfluent
 
Lightbend Fast Data Platform
Lightbend Fast Data PlatformLightbend Fast Data Platform
Lightbend Fast Data PlatformLightbend
 
Death of the dumb pipes: Using Apache Kafka® for Integration projects
Death of the dumb pipes: Using Apache Kafka® for Integration projectsDeath of the dumb pipes: Using Apache Kafka® for Integration projects
Death of the dumb pipes: Using Apache Kafka® for Integration projectsHostedbyConfluent
 
Journey to the Modern App with Containers, Microservices and Big Data
Journey to the Modern App with Containers, Microservices and Big DataJourney to the Modern App with Containers, Microservices and Big Data
Journey to the Modern App with Containers, Microservices and Big DataLightbend
 
Micro Services Architecture
Micro Services ArchitectureMicro Services Architecture
Micro Services ArchitectureRanjan Baisak
 
Kafka Summit SF 2017 - Worldwide Scalable and Resilient Messaging Services wi...
Kafka Summit SF 2017 - Worldwide Scalable and Resilient Messaging Services wi...Kafka Summit SF 2017 - Worldwide Scalable and Resilient Messaging Services wi...
Kafka Summit SF 2017 - Worldwide Scalable and Resilient Messaging Services wi...confluent
 
Stuff About CQRS
Stuff About CQRSStuff About CQRS
Stuff About CQRSthinkddd
 
Event Sourcing - Greg Young
Event Sourcing - Greg YoungEvent Sourcing - Greg Young
Event Sourcing - Greg YoungJAXLondon2014
 
Why Kafka Works the Way It Does (And Not Some Other Way) | Tim Berglund, Conf...
Why Kafka Works the Way It Does (And Not Some Other Way) | Tim Berglund, Conf...Why Kafka Works the Way It Does (And Not Some Other Way) | Tim Berglund, Conf...
Why Kafka Works the Way It Does (And Not Some Other Way) | Tim Berglund, Conf...HostedbyConfluent
 
Jay Kreps, Confluent | Kafka Summit SF 2019 Keynote ft. Dev Tagare, Lyft + Pr...
Jay Kreps, Confluent | Kafka Summit SF 2019 Keynote ft. Dev Tagare, Lyft + Pr...Jay Kreps, Confluent | Kafka Summit SF 2019 Keynote ft. Dev Tagare, Lyft + Pr...
Jay Kreps, Confluent | Kafka Summit SF 2019 Keynote ft. Dev Tagare, Lyft + Pr...confluent
 
Docker in the Enterprise
Docker in the EnterpriseDocker in the Enterprise
Docker in the EnterpriseSaul Caganoff
 
Building FoundationDB
Building FoundationDBBuilding FoundationDB
Building FoundationDBFoundationDB
 

Was ist angesagt? (20)

Building a Real-Time Forecasting Engine with Scala and Akka
Building a Real-Time Forecasting Engine with Scala and Akka Building a Real-Time Forecasting Engine with Scala and Akka
Building a Real-Time Forecasting Engine with Scala and Akka
 
Digital Transformation with Kubernetes, Containers, and Microservices
Digital Transformation with Kubernetes, Containers, and MicroservicesDigital Transformation with Kubernetes, Containers, and Microservices
Digital Transformation with Kubernetes, Containers, and Microservices
 
Journey to the Modern App with Containers, Microservices and Big Data
Journey to the Modern App with Containers, Microservices and Big DataJourney to the Modern App with Containers, Microservices and Big Data
Journey to the Modern App with Containers, Microservices and Big Data
 
Patterns of Distributed Application Design
Patterns of Distributed Application DesignPatterns of Distributed Application Design
Patterns of Distributed Application Design
 
How To Build, Integrate, and Deploy Real-Time Streaming Pipelines On Kubernetes
How To Build, Integrate, and Deploy Real-Time Streaming Pipelines On KubernetesHow To Build, Integrate, and Deploy Real-Time Streaming Pipelines On Kubernetes
How To Build, Integrate, and Deploy Real-Time Streaming Pipelines On Kubernetes
 
Lessons From HPE: From Batch To Streaming For 20 Billion Sensors With Lightbe...
Lessons From HPE: From Batch To Streaming For 20 Billion Sensors With Lightbe...Lessons From HPE: From Batch To Streaming For 20 Billion Sensors With Lightbe...
Lessons From HPE: From Batch To Streaming For 20 Billion Sensors With Lightbe...
 
Building Streaming And Fast Data Applications With Spark, Mesos, Akka, Cassan...
Building Streaming And Fast Data Applications With Spark, Mesos, Akka, Cassan...Building Streaming And Fast Data Applications With Spark, Mesos, Akka, Cassan...
Building Streaming And Fast Data Applications With Spark, Mesos, Akka, Cassan...
 
Model-driven and low-code development for event-based systems | Bobby Calderw...
Model-driven and low-code development for event-based systems | Bobby Calderw...Model-driven and low-code development for event-based systems | Bobby Calderw...
Model-driven and low-code development for event-based systems | Bobby Calderw...
 
Hybrid Kafka, Taking Real-time Analytics to the Business (Cody Irwin, Google ...
Hybrid Kafka, Taking Real-time Analytics to the Business (Cody Irwin, Google ...Hybrid Kafka, Taking Real-time Analytics to the Business (Cody Irwin, Google ...
Hybrid Kafka, Taking Real-time Analytics to the Business (Cody Irwin, Google ...
 
Lightbend Fast Data Platform
Lightbend Fast Data PlatformLightbend Fast Data Platform
Lightbend Fast Data Platform
 
Death of the dumb pipes: Using Apache Kafka® for Integration projects
Death of the dumb pipes: Using Apache Kafka® for Integration projectsDeath of the dumb pipes: Using Apache Kafka® for Integration projects
Death of the dumb pipes: Using Apache Kafka® for Integration projects
 
Journey to the Modern App with Containers, Microservices and Big Data
Journey to the Modern App with Containers, Microservices and Big DataJourney to the Modern App with Containers, Microservices and Big Data
Journey to the Modern App with Containers, Microservices and Big Data
 
Micro Services Architecture
Micro Services ArchitectureMicro Services Architecture
Micro Services Architecture
 
Kafka Summit SF 2017 - Worldwide Scalable and Resilient Messaging Services wi...
Kafka Summit SF 2017 - Worldwide Scalable and Resilient Messaging Services wi...Kafka Summit SF 2017 - Worldwide Scalable and Resilient Messaging Services wi...
Kafka Summit SF 2017 - Worldwide Scalable and Resilient Messaging Services wi...
 
Stuff About CQRS
Stuff About CQRSStuff About CQRS
Stuff About CQRS
 
Event Sourcing - Greg Young
Event Sourcing - Greg YoungEvent Sourcing - Greg Young
Event Sourcing - Greg Young
 
Why Kafka Works the Way It Does (And Not Some Other Way) | Tim Berglund, Conf...
Why Kafka Works the Way It Does (And Not Some Other Way) | Tim Berglund, Conf...Why Kafka Works the Way It Does (And Not Some Other Way) | Tim Berglund, Conf...
Why Kafka Works the Way It Does (And Not Some Other Way) | Tim Berglund, Conf...
 
Jay Kreps, Confluent | Kafka Summit SF 2019 Keynote ft. Dev Tagare, Lyft + Pr...
Jay Kreps, Confluent | Kafka Summit SF 2019 Keynote ft. Dev Tagare, Lyft + Pr...Jay Kreps, Confluent | Kafka Summit SF 2019 Keynote ft. Dev Tagare, Lyft + Pr...
Jay Kreps, Confluent | Kafka Summit SF 2019 Keynote ft. Dev Tagare, Lyft + Pr...
 
Docker in the Enterprise
Docker in the EnterpriseDocker in the Enterprise
Docker in the Enterprise
 
Building FoundationDB
Building FoundationDBBuilding FoundationDB
Building FoundationDB
 

Ähnlich wie 20160524 ibm fast data meetup

Big data pipeline with scala by Rohit Rai, Tuplejump - presented at Pune Scal...
Big data pipeline with scala by Rohit Rai, Tuplejump - presented at Pune Scal...Big data pipeline with scala by Rohit Rai, Tuplejump - presented at Pune Scal...
Big data pipeline with scala by Rohit Rai, Tuplejump - presented at Pune Scal...Thoughtworks
 
Big Data pipeline with Scala by Rohit Rai, Tuplejump - presented at Pune Scal...
Big Data pipeline with Scala by Rohit Rai, Tuplejump - presented at Pune Scal...Big Data pipeline with Scala by Rohit Rai, Tuplejump - presented at Pune Scal...
Big Data pipeline with Scala by Rohit Rai, Tuplejump - presented at Pune Scal...Thoughtworks
 
Scala and Spark are Ideal for Big Data
Scala and Spark are Ideal for Big DataScala and Spark are Ideal for Big Data
Scala and Spark are Ideal for Big DataJohn Nestor
 
Sa introduction to big data pipelining with cassandra & spark west mins...
Sa introduction to big data pipelining with cassandra & spark   west mins...Sa introduction to big data pipelining with cassandra & spark   west mins...
Sa introduction to big data pipelining with cassandra & spark west mins...Simon Ambridge
 
Scala and Spark are Ideal for Big Data - Data Science Pop-up Seattle
Scala and Spark are Ideal for Big Data - Data Science Pop-up SeattleScala and Spark are Ideal for Big Data - Data Science Pop-up Seattle
Scala and Spark are Ideal for Big Data - Data Science Pop-up SeattleDomino Data Lab
 
Big Data Introduction - Solix empower
Big Data Introduction - Solix empowerBig Data Introduction - Solix empower
Big Data Introduction - Solix empowerDurga Gadiraju
 
Building Asynchronous Applications
Building Asynchronous ApplicationsBuilding Asynchronous Applications
Building Asynchronous ApplicationsJohan Edstrom
 
Big Data (NJ SQL Server User Group)
Big Data (NJ SQL Server User Group)Big Data (NJ SQL Server User Group)
Big Data (NJ SQL Server User Group)Don Demcsak
 
Scaing databases on the cloud
Scaing databases on the cloudScaing databases on the cloud
Scaing databases on the cloudImaginea
 
Scaling Databases On The Cloud
Scaling Databases On The CloudScaling Databases On The Cloud
Scaling Databases On The CloudImaginea
 
Using SparkML to Power a DSaaS (Data Science as a Service): Spark Summit East...
Using SparkML to Power a DSaaS (Data Science as a Service): Spark Summit East...Using SparkML to Power a DSaaS (Data Science as a Service): Spark Summit East...
Using SparkML to Power a DSaaS (Data Science as a Service): Spark Summit East...Spark Summit
 
Infinispan, Data Grids, NoSQL, Cloud Storage and JSR 347
Infinispan, Data Grids, NoSQL, Cloud Storage and JSR 347Infinispan, Data Grids, NoSQL, Cloud Storage and JSR 347
Infinispan, Data Grids, NoSQL, Cloud Storage and JSR 347Manik Surtani
 
Apache Spark Fundamentals
Apache Spark FundamentalsApache Spark Fundamentals
Apache Spark FundamentalsZahra Eskandari
 
Store
StoreStore
StoreESUG
 
Apache Spark for Beginners
Apache Spark for BeginnersApache Spark for Beginners
Apache Spark for BeginnersAnirudh
 
Scala adoption by enterprises
Scala adoption by enterprisesScala adoption by enterprises
Scala adoption by enterprisesMike Slinn
 

Ähnlich wie 20160524 ibm fast data meetup (20)

Big data pipeline with scala by Rohit Rai, Tuplejump - presented at Pune Scal...
Big data pipeline with scala by Rohit Rai, Tuplejump - presented at Pune Scal...Big data pipeline with scala by Rohit Rai, Tuplejump - presented at Pune Scal...
Big data pipeline with scala by Rohit Rai, Tuplejump - presented at Pune Scal...
 
Big Data pipeline with Scala by Rohit Rai, Tuplejump - presented at Pune Scal...
Big Data pipeline with Scala by Rohit Rai, Tuplejump - presented at Pune Scal...Big Data pipeline with Scala by Rohit Rai, Tuplejump - presented at Pune Scal...
Big Data pipeline with Scala by Rohit Rai, Tuplejump - presented at Pune Scal...
 
Scala and Spark are Ideal for Big Data
Scala and Spark are Ideal for Big DataScala and Spark are Ideal for Big Data
Scala and Spark are Ideal for Big Data
 
Sa introduction to big data pipelining with cassandra & spark west mins...
Sa introduction to big data pipelining with cassandra & spark   west mins...Sa introduction to big data pipelining with cassandra & spark   west mins...
Sa introduction to big data pipelining with cassandra & spark west mins...
 
Scala and Spark are Ideal for Big Data - Data Science Pop-up Seattle
Scala and Spark are Ideal for Big Data - Data Science Pop-up SeattleScala and Spark are Ideal for Big Data - Data Science Pop-up Seattle
Scala and Spark are Ideal for Big Data - Data Science Pop-up Seattle
 
Big Data Introduction - Solix empower
Big Data Introduction - Solix empowerBig Data Introduction - Solix empower
Big Data Introduction - Solix empower
 
DataOps with Project Amaterasu
DataOps with Project AmaterasuDataOps with Project Amaterasu
DataOps with Project Amaterasu
 
Apache Spark on HDinsight Training
Apache Spark on HDinsight TrainingApache Spark on HDinsight Training
Apache Spark on HDinsight Training
 
Building Asynchronous Applications
Building Asynchronous ApplicationsBuilding Asynchronous Applications
Building Asynchronous Applications
 
Spark
SparkSpark
Spark
 
Big Data (NJ SQL Server User Group)
Big Data (NJ SQL Server User Group)Big Data (NJ SQL Server User Group)
Big Data (NJ SQL Server User Group)
 
Scaing databases on the cloud
Scaing databases on the cloudScaing databases on the cloud
Scaing databases on the cloud
 
Scaling Databases On The Cloud
Scaling Databases On The CloudScaling Databases On The Cloud
Scaling Databases On The Cloud
 
Using SparkML to Power a DSaaS (Data Science as a Service): Spark Summit East...
Using SparkML to Power a DSaaS (Data Science as a Service): Spark Summit East...Using SparkML to Power a DSaaS (Data Science as a Service): Spark Summit East...
Using SparkML to Power a DSaaS (Data Science as a Service): Spark Summit East...
 
Infinispan, Data Grids, NoSQL, Cloud Storage and JSR 347
Infinispan, Data Grids, NoSQL, Cloud Storage and JSR 347Infinispan, Data Grids, NoSQL, Cloud Storage and JSR 347
Infinispan, Data Grids, NoSQL, Cloud Storage and JSR 347
 
Spark Workshop
Spark WorkshopSpark Workshop
Spark Workshop
 
Apache Spark Fundamentals
Apache Spark FundamentalsApache Spark Fundamentals
Apache Spark Fundamentals
 
Store
StoreStore
Store
 
Apache Spark for Beginners
Apache Spark for BeginnersApache Spark for Beginners
Apache Spark for Beginners
 
Scala adoption by enterprises
Scala adoption by enterprisesScala adoption by enterprises
Scala adoption by enterprises
 

Mehr von shinolajla

20180416 reactive is_a_product_rs
20180416 reactive is_a_product_rs20180416 reactive is_a_product_rs
20180416 reactive is_a_product_rsshinolajla
 
20180416 reactive is_a_product
20180416 reactive is_a_product20180416 reactive is_a_product
20180416 reactive is_a_productshinolajla
 
20161027 scala io_keynote
20161027 scala io_keynote20161027 scala io_keynote
20161027 scala io_keynoteshinolajla
 
20160520 what youneedtoknowaboutlambdas
20160520 what youneedtoknowaboutlambdas20160520 what youneedtoknowaboutlambdas
20160520 what youneedtoknowaboutlambdasshinolajla
 
20160317 lagom sf scala
20160317 lagom sf scala20160317 lagom sf scala
20160317 lagom sf scalashinolajla
 
Effective Akka v2
Effective Akka v2Effective Akka v2
Effective Akka v2shinolajla
 
20150411 mutability matrix of pain scala
20150411 mutability matrix of pain scala20150411 mutability matrix of pain scala
20150411 mutability matrix of pain scalashinolajla
 
Reactive applications tools of the trade huff po
Reactive applications   tools of the trade huff poReactive applications   tools of the trade huff po
Reactive applications tools of the trade huff poshinolajla
 
20140228 fp and_performance
20140228 fp and_performance20140228 fp and_performance
20140228 fp and_performanceshinolajla
 
Effective akka scalaio
Effective akka scalaioEffective akka scalaio
Effective akka scalaioshinolajla
 
Real world akka recepies v3
Real world akka recepies v3Real world akka recepies v3
Real world akka recepies v3shinolajla
 
Effective actors japanesesub
Effective actors japanesesubEffective actors japanesesub
Effective actors japanesesubshinolajla
 
Effective Actors
Effective ActorsEffective Actors
Effective Actorsshinolajla
 
Taxonomy of Scala
Taxonomy of ScalaTaxonomy of Scala
Taxonomy of Scalashinolajla
 

Mehr von shinolajla (16)

20180416 reactive is_a_product_rs
20180416 reactive is_a_product_rs20180416 reactive is_a_product_rs
20180416 reactive is_a_product_rs
 
20180416 reactive is_a_product
20180416 reactive is_a_product20180416 reactive is_a_product
20180416 reactive is_a_product
 
20161027 scala io_keynote
20161027 scala io_keynote20161027 scala io_keynote
20161027 scala io_keynote
 
20160520 what youneedtoknowaboutlambdas
20160520 what youneedtoknowaboutlambdas20160520 what youneedtoknowaboutlambdas
20160520 what youneedtoknowaboutlambdas
 
20160317 lagom sf scala
20160317 lagom sf scala20160317 lagom sf scala
20160317 lagom sf scala
 
Effective Akka v2
Effective Akka v2Effective Akka v2
Effective Akka v2
 
20150411 mutability matrix of pain scala
20150411 mutability matrix of pain scala20150411 mutability matrix of pain scala
20150411 mutability matrix of pain scala
 
Reactive applications tools of the trade huff po
Reactive applications   tools of the trade huff poReactive applications   tools of the trade huff po
Reactive applications tools of the trade huff po
 
20140228 fp and_performance
20140228 fp and_performance20140228 fp and_performance
20140228 fp and_performance
 
Effective akka scalaio
Effective akka scalaioEffective akka scalaio
Effective akka scalaio
 
Cpu Caches
Cpu CachesCpu Caches
Cpu Caches
 
Real world akka recepies v3
Real world akka recepies v3Real world akka recepies v3
Real world akka recepies v3
 
Effective actors japanesesub
Effective actors japanesesubEffective actors japanesesub
Effective actors japanesesub
 
Effective Actors
Effective ActorsEffective Actors
Effective Actors
 
Taxonomy of Scala
Taxonomy of ScalaTaxonomy of Scala
Taxonomy of Scala
 
CPU Caches
CPU CachesCPU Caches
CPU Caches
 

Kürzlich hochgeladen

Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 

Kürzlich hochgeladen (20)

Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 

20160524 ibm fast data meetup

  • 1. Scala: Lingua Franca of Fast Data Jamie Allen Sr. Director of Global Solutions Architects
  • 2. • Why Scala? • Who is doing this? • What is Fast Data? • Architecting for Fast Data Agenda
  • 3. • Cloud portability versus native control • Application correctness versus speed of development • Modularity versus global namespace • Concise syntax versus boilerplate • Multi-threaded simplicity via abstractions versus low-level control Tradeoffs
  • 4. • REPL • Type safety • Modularity • Concise syntax • Multi-threaded simplicity • Data-centric semantics • Managed runtime for cloud portability • Ecosystem Scala is the local optimum
  • 5. Scala is the local optimum
  • 6. The JVM is a primary reason for Scala’s success
  • 7. • No REPL or Notebook • Not a data-centric language, particularly collections semantics Why not Java?
  • 8. • Data-centric language, has all of the wonderful collections semantics we want • No type safety • No modularity Why not Python?
  • 9. • Weak type safety • Collections are too elemental • Native execution is a non-starter, so Go is the only option • Garbage collection is not generational Why not Go or C++?
  • 10. • Scala just so happened to fit well in this space • Performance • Correctness • Conciseness • Scala will evolve • Other languages will come in time Scala is NOT the end of the road
  • 11. Who is doing this?
  • 12.
  • 13. One Caveat: Apache Beam and TensorFlow
  • 14. Why Scala? At the time we started, I really wanted a PL that supports a language-integrated interface (where people write functions inline, etc)… However, I also wanted to be on the JVM in order to easily interact with the Hadoop filesystem and data formats for that. Scala was the only somewhat popular JVM language that offered this kind of functional syntax and was also statically typed (letting us have some control over performance), so we chose that. Today there might be an argument to make the first version of the API in Java with Java 8, but we also benefitted from other aspects of Scala in Spark, like type inference, pattern matching, actor libriaries, etc. Matei Zaharia, creator of Spark
  • 15. What is Fast Data?
  • 16. A bit of history: Hadoop
  • 17. YARN HDFS MR job #1 MR job #2 Flume Sqoop DBs Slave Node DiskDiskDiskDiskDisk Node Mgr Data Node Master Resource Manager Name Node Worker
  • 18. Hadoop strengths • Lowest capital expenditure for big data • Excellent for ingesting and integrating diverse datasets • Flexible • Classic analytics (aggregations and data warehousing) • Machine learning
  • 19. Hadoop weaknesses • Complex administration • YARN requires dedicated cluster • MapReduce foibles • Poor performance • Imperative programming model • No stream processing support
  • 20. Fast Data with Spark
  • 21.
  • 22. Spark • 100x faster as a replacement for Hadoop MapReduce • Uses much fewer machines and resources • Incredible support from the community and enterprise
  • 23. Spark use cases • Primarily anomaly detection • Risk management • Fraud detection • Odds recalculation • Spam filters • Update search engine results quickly
  • 24. • Spark had it with RDDs • They removed it with the DataFrames API • Brought it back with DataSets, but not as comprehensively as RDDs Type safety
  • 25. Why not Flink? • Flink has much better stream handling for low latency systems that Spark currently lacks • Event timing • Watermarks • Triggers • Exactly-once semantics (assuming connections hold) • Pipeline portability via Apache Beam integration
  • 30. Traditional application architectures and platforms are obsolete. Gartner
  • 31. How do we avoid messing this up?
  • 32. • At the API • In our source • For our data We want isolation Wikipedia, Creative Commons, created by DFoerster
  • 33. We want realistic data management • Use CQRS and Event Sourcing, not CRUD • Transactions, especially distributed, will not work • Consistency is an anti-pattern at scale • Distributed locks and shared data will limit you • Data fabrics break all of these conventions Think in terms of compensation, not prevention. Kevin Webber, Lightbend
  • 34. We want to ACID v2 • Associativity, not Atomicity • Commutativity, not Consistency • Idempotent, not Isolation • Distributed, not Durable Wikipedia, Creative Commons, created by Weston.pace
  • 37.
  • 38.
  • 39.
  • 40.
  • 41.
  • 42.
  • 43.
  • 44.
  • 45.
  • 46.
  • 47.
  • 48. Learning Spark • Go to http://bigdatauniversity.com, built by IBM

Hinweis der Redaktion

  1. One version of the emerging Fast Data Architecture. For today’s talk, I won’t go through it in detail, but this reflects some industry trends among open-source tools (popularity of Spark Streaming, Kafka, and Cassandra), plus our view of the Typesafe Reactive Platform as the glue that integrates it all, lets you implement the rest of the micro services you need, and provides the low latency streaming through Akka Streams that Spark doesn’t provide. The last slide has a link to a white paper I wrote that goes through this diagram in detail, along with several example use cases.
  2. It’s very common to combine Spark Streaming, Kafka, and Cassandra (or another distributed, NoSQL database). In fact, this “troika” has become a de facto standard component of modern, stream-oriented architectures.
  3. While Spark, Kafka, and Cassandra are de facto standard core components, you also need a platform for resource management and other needs, plus “glue” to tie everything together. Hence, the “SMACK stack”: S - Spark (and Scala?) M - Mesos A - Akka C - Cassandra K - Kafka
  4. Discuss all the components in the context of data flow. First, you will have streaming data from many sources, web requests and other external Internet traffic, you’ll have services communicating with you. If they follow the Reactive Streams standard, then they can be ingested by Akka Streams. Finally, lots of internal data like logs, files FTPed into your environment, etc. will be ingested.
  5. Reactive Streams follow a standard for an out-of-band, resilient backpressure mechanism that provides flow control. It only describes the behavior of a single stream with 1+ producers and 1+ consumers, but these streams are composable; a graph of reactive streams effectively provide end-to-end flow control.
  6. Use the Lightbend Reactive Platform as “glue” for the other large components, implemented as microservices. Web requests (e.g., REST) can be handled by Play, which you can also use to implement user interfaces. Reactive Stream-Compliant sources can be ingested by Akka Streams, which supports a low-latency, per-event processing model.
  7. Use Kafka to ingest “ephemeral” stream data (which can’t be replied if lost downstream). Kafka has enormous scalability and durability. It’s a great place to capture your streams, which can then be processed with Akka Streams or with Spark Streaming.
  8. One use of Kafka is to solve the problem of N*M direct links between producers and consumers. This is hard to manage and it couples services to directly, which is fragile when a given service needs to be scaled up through replication or replacement and sometimes in the protocol that both ends need to speak.
  9. So Kafka can function as a central hub, yet it’s distributed and scalable so it isn’t a bottleneck or single point of failure.
  10. For sophisticated stream processing, where richer analytics (like “online” machine learning) is required and higher latencies can be tolerated, use Spark Streaming, which implements a mini batch model, where data is processed in “chunks”, defined by time windows as small as 1 second.
  11. Processing results can be written back to Kafka for downstream consumption. For permanent storage, right a database, like Cassandra, when fast record-level CRUD is required, or write to a distributed file system, when cheaper storage is desired and “table-scan” access patterns are most important.
  12. This architecture is agnostic to the platform. We like Mesos, the next-generation, general-purpose infrastructure for cluster resource and application management, but it works with YARN, too. You can deploy on premise (e.g., on “bare metal” hardware) or in a cloud environment.