Spark Streaming-as-a-Service with Kafka and YARN
Jim Dowling
KTH Royal Institute of Technology, Stockholm
Senior Researcher, SICS
CEO, Logical Clocks AB
Spark Streaming-as-a-Service in Sweden
• SICS ICE: datacenter research environment
• Hopsworks: Spark/Flink/Kafka/TensorFlow/Hadoop-as-a-service
– Built on Hops Hadoop (www.hops.io)
– >130 active users
Hadoop is not a cool kid anymore!
Hadoop’s Evolution
2009 2016
?
Tiny Brain
(NameNode, ResourceMgr)
Huge Body (DataNodes)
Build out Hadoop’s Brain with External
Weakly Consistent MetaData Services
Google-Glass Approach to Intelligence
NameNodes
NDB
HDFS Client
DataNodes
>37X Capacity
>16 X
Throughput
HopsFS
Larger Brains => Bigger, Faster*
16x
Performance
on Spotify Workload
*Usenix FAST 2017, HopsFS: Scaling Hierarchical File System Metadata Using NewSQL Databases
Hopsworks
• Projects
– Datasets/Files
– Topics
– Jobs/Notebooks
Hadoop
• Clusters
• Users
• Jobs/Applications
• Files
• ACLs
• Sys Admins
• Kerberos
Larger Brains => More Intelligent*
*HMGA2 gene mutations correlated with increased intracranial volume as well as enhanced IQ.
http://newsroom.ucla.edu/releases/international-team-uncovers-new-231989
User-Friendly Concepts
http://www.ibtimes.co.uk/embargoed-8pm-25th-jan-size-matters-brain-size-relative-body-size-indicates-animals-ability-1539994
YARN Spark Streaming Support
• Apache Kafka
• ELK Stack
– Real-time Logs
• Grafana/InfluxDB
– Monitoring
Hopsworks
YARN aggregates logs on job completion
http://mkuthan.github.io/blog/2016/09/30/spark-streaming-on-yarn/
Kafka Self-Service UI
Manage & Share
• Topics
• ACLs
• Avro Schemas
Logs
Elasticsearch, Logstash, Kibana (ELK Stack)
Monitoring/Alerting
InfluxDB and Grafana
metrics.properties: StreamingMetrics.streaming.lastReceivedBatch_records == 0
Zeppelin for Prototyping Streaming Apps
[https://github.com/knockdata/spark-highcharts]
Debugging Spark with Dr. Elephant
• Analyzes Spark jobs for errors and common problems using pluggable heuristics
• Doesn’t show killed jobs
• No online support for streaming apps yet
Integration as Microservices in Hopsworks
• Project-based Multi-tenancy
• Self-Service UI
• Simplifying Spark Streaming Apps
Projects in Hopsworks
Proj-All
Proj-X
Proj-42
Shared Topic
/Projs/My/Data
CompanyDB
User roles
Data Owner
- Import/Export data
- Manage Membership
- Share DataSets, Topics
Data Scientist
- Write and Run code
Self-Service Administration – No Administrator Needed
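The two roles above can be modelled as a small permission table. This is a minimal sketch, not Hopsworks code: the class, enum, and method names are hypothetical, and each role is granted only the permissions listed on the slide (whether a Data Owner can also run code is not stated, so the sketch does not assume it).

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical model of the slide's roles; none of these names come from Hopsworks.
final class ProjectRoles {

    enum Permission {
        IMPORT_EXPORT_DATA,
        MANAGE_MEMBERSHIP,
        SHARE_DATASETS_AND_TOPICS,
        WRITE_AND_RUN_CODE
    }

    enum Role {
        // Permissions copied from the "Data Owner" bullets.
        DATA_OWNER(EnumSet.of(
                Permission.IMPORT_EXPORT_DATA,
                Permission.MANAGE_MEMBERSHIP,
                Permission.SHARE_DATASETS_AND_TOPICS)),
        // Permissions copied from the "Data Scientist" bullet.
        DATA_SCIENTIST(EnumSet.of(Permission.WRITE_AND_RUN_CODE));

        private final Set<Permission> permissions;

        Role(Set<Permission> permissions) {
            this.permissions = permissions;
        }

        /** Returns true if this role was granted the given permission. */
        boolean can(Permission permission) {
            return permissions.contains(permission);
        }
    }

    public static void main(String[] args) {
        System.out.println(Role.DATA_OWNER.can(Permission.MANAGE_MEMBERSHIP));
        System.out.println(Role.DATA_SCIENTIST.can(Permission.MANAGE_MEMBERSHIP));
    }
}
```

Keeping the table in one enum means a membership change by a Data Owner only has to swap the member's role, with no administrator involved.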
Notebooks, Data sharing and Quotas
• Zeppelin Notebooks in HDFS, Jobs launcher UI.
• Sharing is not Copying
– Datasets/Topics
• Per-Project quotas
– Storage in HDFS
– CPU in YARN (Uber-style Pricing)
Dynamic roles
alice@gmail.com
ProjectA
Authenticate
ProjectB
HopsFS
YARN
Kafka
SSL/TLS
Certificates
Secure
Impersonation
ProjectA__alice
ProjectB__alice
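The naming pattern in this diagram (one identity per project and user, such as ProjectA__alice) can be sketched as follows. The helper is hypothetical and only illustrates the pattern; taking the part of the e-mail address before the "@" as the user name is an assumption drawn from the example, not a documented Hopsworks rule.

```java
// Hypothetical helper; the class and method names are illustrative, not Hopsworks API.
final class ProjectUsers {
    private static final String SEPARATOR = "__";

    /** Builds a project-specific user name from a project name and an e-mail address. */
    static String projectUser(String project, String email) {
        if (project == null || project.isEmpty() || email == null || email.isEmpty()) {
            throw new IllegalArgumentException("project and email must be non-empty");
        }
        // Assumption: the user name is the local part of the e-mail address.
        int at = email.indexOf('@');
        String localPart = at >= 0 ? email.substring(0, at) : email;
        return project + SEPARATOR + localPart;
    }

    public static void main(String[] args) {
        System.out.println(projectUser("ProjectA", "alice@gmail.com")); // ProjectA__alice
        System.out.println(projectUser("ProjectB", "alice@gmail.com")); // ProjectB__alice
    }
}
```

Because the same person gets a different identity in each project, HopsFS, YARN, and Kafka can enforce access per project without knowing about the underlying account.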
Look Ma, no Kerberos
• Each project-specific user is issued an SSL/TLS (X.509) certificate for both authentication and encryption.
• Services are also issued SSL/TLS certificates.
– Same root CA as user certs
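On the client side, certificate-based authentication for Kafka comes down to a handful of SSL settings. The sketch below builds them with plain java.util.Properties; the property keys are the standard Kafka client SSL settings, while the class name, file paths, and the use of a single password for every entry are assumptions made for illustration.

```java
import java.util.Properties;

// Hypothetical helper that assembles Kafka client SSL settings for one project user.
final class SslClientProps {

    static Properties forProjectUser(String keystorePath, String truststorePath, String password) {
        Properties props = new Properties();
        // Use TLS for both encryption and client authentication.
        props.setProperty("security.protocol", "SSL");
        // Keystore: holds the project user's X.509 certificate and private key.
        props.setProperty("ssl.keystore.location", keystorePath);
        props.setProperty("ssl.keystore.password", password);
        props.setProperty("ssl.key.password", password);
        // Truststore: holds the root CA shared by user and service certificates.
        props.setProperty("ssl.truststore.location", truststorePath);
        props.setProperty("ssl.truststore.password", password);
        return props;
    }

    public static void main(String[] args) {
        // Paths and password are placeholders.
        Properties props = forProjectUser(
                "/tmp/ProjectA__alice.keystore.jks",
                "/tmp/ProjectA__alice.truststore.jks",
                "changeit");
        props.forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```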
Simplifying Spark Streaming Apps
• Spark Streaming Applications need to know
– Credentials
• Hadoop, Kafka, InfluxDb, Logstash
– Endpoints
• Kafka Broker, Kafka SchemaRegistry, ResourceManager,
NameNode, InfluxDB, Logstash
• The HopsUtil API hides this complexity.
– Location/security transparent Spark applications
Secure Streaming App with Kafka
Developer
1. Discover: Schema Registry and Kafka/InfluxDB/ELK Endpoints
2. Create: Kafka Properties file with certs and broker details
3. Create: Producer/Consumer using Kafka Properties
4. Download: the Schema for the Topic from the Schema Registry
5. Distribute: X.509 certs to all hosts on the cluster
6. Cleanup securely
These steps are replaced by calls to the HopsUtil API
Operations
https://github.com/hopshadoop/hops-kafka-examples
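To give a feel for the manual work in the certificate steps (writing certs to local disk, then cleaning up), here is a minimal sketch using only the JDK. It is not how HopsUtil is implemented: the class name and file names are hypothetical, and plain file deletion is used as a stand-in for "secure" cleanup, which in practice may require more than this.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: writes keystore/truststore bytes to a temp directory
// and deletes them again when closed.
final class MaterializedCerts implements AutoCloseable {
    private final Path dir;
    private final Path keystore;
    private final Path truststore;

    MaterializedCerts(byte[] keystoreBytes, byte[] truststoreBytes) {
        try {
            dir = Files.createTempDirectory("certs");
            keystore = Files.write(dir.resolve("keystore.jks"), keystoreBytes);
            truststore = Files.write(dir.resolve("truststore.jks"), truststoreBytes);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    Path keystore() {
        return keystore;
    }

    Path truststore() {
        return truststore;
    }

    /** Removes the materialized files and their directory. */
    @Override
    public void close() {
        try {
            Files.deleteIfExists(keystore);
            Files.deleteIfExists(truststore);
            Files.deleteIfExists(dir);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Placeholder bytes stand in for real keystore contents.
        try (MaterializedCerts certs = new MaterializedCerts(new byte[] {1, 2, 3}, new byte[] {4, 5})) {
            System.out.println("keystore at " + certs.keystore());
        }
    }
}
```

Using try-with-resources ties the lifetime of the files to the block that needs them, so the cleanup step cannot be forgotten on an error path.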
Streaming Producer in Hopsworks
JavaSparkContext jsc = new JavaSparkContext(sparkConf);
String topic = HopsUtil.getTopic(); // Optional
SparkProducer producer = HopsUtil.getSparkProducer();
Map<String, String> message = …
producer.produce(message);
Streaming Consumer in Hopsworks
JavaStreamingContext jssc =
    new JavaStreamingContext(sparkConf, Durations.seconds(2));
String topic = HopsUtil.getTopic(); // Optional
String consumerGroup = HopsUtil.getConsumerGroup(); // Optional
SparkConsumer consumer = HopsUtil.getSparkConsumer(jssc);
JavaInputDStream<ConsumerRecord<String, byte[]>> messages =
    consumer.createDirectStream();
jssc.start();
Less code to write
https://github.com/hopshadoop/hops-kafka-examples
import java.util.Properties;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;

// brokerList, restApp and SCHEMA_REGISTRY_URL are defined elsewhere in the application.
Properties props = new Properties();
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, brokerList);
props.put(SCHEMA_REGISTRY_URL, restApp.restConnect);
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
    org.apache.kafka.common.serialization.StringSerializer.class);
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
    io.confluent.kafka.serializers.KafkaAvroSerializer.class);
props.put(ProducerConfig.ACKS_CONFIG, "1");
// Hard-coded keystore path and passwords
props.put("ssl.keystore.location", "/var/ssl/kafka.client.keystore.jks");
props.put("ssl.keystore.password", "test1234");
props.put("ssl.key.password", "test1234");
// Hard-coded Avro schema for the topic
String userSchema =
    "{\"namespace\": \"example.avro\", \"type\": \"record\", \"name\": \"User\"," +
    "\"fields\": [{\"name\": \"name\", \"type\": \"string\"}]}";
Schema.Parser parser = new Schema.Parser();
Schema schema = parser.parse(userSchema);
GenericRecord avroRecord = new GenericData.Record(schema);
avroRecord.put("name", "testUser");
Producer<String, Object> producer = new KafkaProducer<>(props);
ProducerRecord<String, Object> message =
    new ProducerRecord<>("topicName", avroRecord);
producer.send(message);
Lots of Hard-Coded Endpoints Here!
SparkProducer producer = HopsUtil.getSparkProducer();
Map<String, String> message = …
producer.produce(message);
Massively Simplified Code for
Secure Spark Streaming/Kafka
Distributing Certs for Spark Streaming
Components: alice@gmail.com, Hopsworks, Distributed Database, YARN Private LocalResources, Spark Streaming App, HopsUtil
1. Launch Spark Job
2. Get certs, service endpoints
3. YARN Job, config
4. Materialize certs
5. Read Certs
6. Get Schema
7. Consume / Produce
8. Read ACLs for authentication
Multi-Tenant IoT Scenario
Sensor Node (x6)
Field Gateway
Storage
Analysis
Ingestion
ACME
Evil Corp
IoT Cloud Platform
DontBeEvil Corp
IoT Scenario
ACME, DontBeEvil Corp, Evil Corp
AWS, Google Cloud, Oracle Cloud
User Apps control IoT Devices
IoT Company: Analyze Data, Data Services for Clients
ACME, DontBeEvil Corp, Evil Corp
Cloud-Native Analytics Solution
ACME
S3
[Authorization]
GCS
Oracle
IoT Company
Each customer needs its own Analytics Infrastructure
Hopsworks Solution using Projects
IoT Company Project
Spark Streaming App
Gateway Topic
ACME Project
ACME Topic
ACME Dataset
Data Stream
Analytics Reports
Hopsworks Solution
ACME
Spark
Streaming App
[Authorized]
ACME
Dataset
ACME Topic
ACME Analytics
Reports
Spark Batch
Job
ACME Project
Karamel/Chef for Automated Installation
Google Compute Engine
BareMetal
DEMO
Hops Roadmap
• HopsFS
– HA support for Multi-Data-Center
– Small files, 2-Level Erasure Coding
• HopsYARN
– TensorFlow with isolated GPUs
• Hopsworks
– P2P Dataset Sharing
– Jupyter, Presto, Hive
Summary
• Hops is a new distribution of Hadoop
– Tinker-friendly and open-source.
• Hopsworks provides first-class support for
Spark-Streaming-as-a-Service
– With support services like Kafka, ELK Stack,
Zeppelin, Grafana/InfluxDB.
Hops Team
Active:
Jim Dowling, Seif Haridi, Tor Björn Minde, Gautier Berthou, Salman Niazi, Mahmoud Ismail,
Theofilos Kakantousis, Ermias Gebremeskel, Antonios Kouzoupis, Alex Ormenisan, Roberto
Bampi, Fabio Buso, Fanti Machmount Al Samisti, Braulio Grana, Adam Alpire, Zahin Azher Rashid,
Robin Andersso, ArunaKumari Yedurupaka, Tobias Johansson, August Bonds, Tiago Brito, Filotas
Siskos.
Alumni:
Vasileios Giannokostas, Johan Svedlund Nordström, Rizvi Hasan, Paul Mälzer, Bram Leenders, Juan
Roca, Misganu Dessalegn, K “Sri” Srijeyanthan, Jude D’Souza, Alberto Lorente, Andre Moré, Ali
Gholami, Davis Jaunzems, Stig Viaene, Hooman Peiro, Evangelos Savvidis, Steffen Grohsschmiedt,
Qi Qi, Gayana Chandrasekara, Nikolaos Stanogias, Daniel Bali, Ioannis Kerkinos, Peter Buechler,
Pushparaj Motamari, Hamid Afzali, Wasif Malik, Lalith Suresh, Mariano Valles, Ying Lieu.
Hops
Thank You.
We totally understand it’s going to be
America First Spark Streaming first, but
can we take this chance to say
Hopsworks second!
http://www.hops.io
@hopshadoop

Weitere ähnliche Inhalte

Was ist angesagt?

Cracking the nut, solving edge ai with apache tools and frameworks
Cracking the nut, solving edge ai with apache tools and frameworksCracking the nut, solving edge ai with apache tools and frameworks
Cracking the nut, solving edge ai with apache tools and frameworksTimothy Spann
 
Matt Franklin - Apache Software (Geekfest)
Matt Franklin - Apache Software (Geekfest)Matt Franklin - Apache Software (Geekfest)
Matt Franklin - Apache Software (Geekfest)W2O Group
 
Pass data community summit - 2021 - Real-Time Streaming in Azure with Apache ...
Pass data community summit - 2021 - Real-Time Streaming in Azure with Apache ...Pass data community summit - 2021 - Real-Time Streaming in Azure with Apache ...
Pass data community summit - 2021 - Real-Time Streaming in Azure with Apache ...Timothy Spann
 
ApacheCon 2021: Apache NiFi 101- introduction and best practices
ApacheCon 2021:   Apache NiFi 101- introduction and best practicesApacheCon 2021:   Apache NiFi 101- introduction and best practices
ApacheCon 2021: Apache NiFi 101- introduction and best practicesTimothy Spann
 
Using the flipn stack for edge ai (flink, nifi, pulsar)
Using the flipn stack for edge ai (flink, nifi, pulsar)Using the flipn stack for edge ai (flink, nifi, pulsar)
Using the flipn stack for edge ai (flink, nifi, pulsar)Timothy Spann
 
Osacon 2021 hello hydrate! from stream to clickhouse with apache pulsar and...
Osacon 2021   hello hydrate! from stream to clickhouse with apache pulsar and...Osacon 2021   hello hydrate! from stream to clickhouse with apache pulsar and...
Osacon 2021 hello hydrate! from stream to clickhouse with apache pulsar and...Timothy Spann
 
Using the FLiPN stack for edge ai (flink, nifi, pulsar)
Using the FLiPN stack for edge ai (flink, nifi, pulsar)Using the FLiPN stack for edge ai (flink, nifi, pulsar)
Using the FLiPN stack for edge ai (flink, nifi, pulsar)Timothy Spann
 
StreamNative FLiP into scylladb - scylla summit 2022
StreamNative   FLiP into scylladb - scylla summit 2022StreamNative   FLiP into scylladb - scylla summit 2022
StreamNative FLiP into scylladb - scylla summit 2022Timothy Spann
 
Music city data Hail Hydrate! from stream to lake
Music city data Hail Hydrate! from stream to lakeMusic city data Hail Hydrate! from stream to lake
Music city data Hail Hydrate! from stream to lakeTimothy Spann
 
DBCC 2021 - FLiP Stack for Cloud Data Lakes
DBCC 2021 - FLiP Stack for Cloud Data LakesDBCC 2021 - FLiP Stack for Cloud Data Lakes
DBCC 2021 - FLiP Stack for Cloud Data LakesTimothy Spann
 
Hail hydrate! from stream to lake using open source
Hail hydrate! from stream to lake using open sourceHail hydrate! from stream to lake using open source
Hail hydrate! from stream to lake using open sourceTimothy Spann
 
Real time stock processing with apache nifi, apache flink and apache kafka
Real time stock processing with apache nifi, apache flink and apache kafkaReal time stock processing with apache nifi, apache flink and apache kafka
Real time stock processing with apache nifi, apache flink and apache kafkaTimothy Spann
 
Big mountain data and dev conference apache pulsar with mqtt for edge compu...
Big mountain data and dev conference   apache pulsar with mqtt for edge compu...Big mountain data and dev conference   apache pulsar with mqtt for edge compu...
Big mountain data and dev conference apache pulsar with mqtt for edge compu...Timothy Spann
 
Kafka & Hadoop - for NYC Kafka Meetup
Kafka & Hadoop - for NYC Kafka MeetupKafka & Hadoop - for NYC Kafka Meetup
Kafka & Hadoop - for NYC Kafka MeetupGwen (Chen) Shapira
 
Cracking the nut, solving edge ai with apache tools and frameworks
Cracking the nut, solving edge ai with apache tools and frameworksCracking the nut, solving edge ai with apache tools and frameworks
Cracking the nut, solving edge ai with apache tools and frameworksTimothy Spann
 
Spark optimization
Spark optimizationSpark optimization
Spark optimizationAnkit Beohar
 
Streaming from the cloud
Streaming from the cloudStreaming from the cloud
Streaming from the clouddmulford
 
Big data conference europe real-time streaming in any and all clouds, hybri...
Big data conference europe   real-time streaming in any and all clouds, hybri...Big data conference europe   real-time streaming in any and all clouds, hybri...
Big data conference europe real-time streaming in any and all clouds, hybri...Timothy Spann
 

Was ist angesagt? (19)

FLiP Into Trino
FLiP Into TrinoFLiP Into Trino
FLiP Into Trino
 
Cracking the nut, solving edge ai with apache tools and frameworks
Cracking the nut, solving edge ai with apache tools and frameworksCracking the nut, solving edge ai with apache tools and frameworks
Cracking the nut, solving edge ai with apache tools and frameworks
 
Matt Franklin - Apache Software (Geekfest)
Matt Franklin - Apache Software (Geekfest)Matt Franklin - Apache Software (Geekfest)
Matt Franklin - Apache Software (Geekfest)
 
Pass data community summit - 2021 - Real-Time Streaming in Azure with Apache ...
Pass data community summit - 2021 - Real-Time Streaming in Azure with Apache ...Pass data community summit - 2021 - Real-Time Streaming in Azure with Apache ...
Pass data community summit - 2021 - Real-Time Streaming in Azure with Apache ...
 
ApacheCon 2021: Apache NiFi 101- introduction and best practices
ApacheCon 2021:   Apache NiFi 101- introduction and best practicesApacheCon 2021:   Apache NiFi 101- introduction and best practices
ApacheCon 2021: Apache NiFi 101- introduction and best practices
 
Using the flipn stack for edge ai (flink, nifi, pulsar)
Using the flipn stack for edge ai (flink, nifi, pulsar)Using the flipn stack for edge ai (flink, nifi, pulsar)
Using the flipn stack for edge ai (flink, nifi, pulsar)
 
Osacon 2021 hello hydrate! from stream to clickhouse with apache pulsar and...
Osacon 2021   hello hydrate! from stream to clickhouse with apache pulsar and...Osacon 2021   hello hydrate! from stream to clickhouse with apache pulsar and...
Osacon 2021 hello hydrate! from stream to clickhouse with apache pulsar and...
 
Using the FLiPN stack for edge ai (flink, nifi, pulsar)
Using the FLiPN stack for edge ai (flink, nifi, pulsar)Using the FLiPN stack for edge ai (flink, nifi, pulsar)
Using the FLiPN stack for edge ai (flink, nifi, pulsar)
 
StreamNative FLiP into scylladb - scylla summit 2022
StreamNative   FLiP into scylladb - scylla summit 2022StreamNative   FLiP into scylladb - scylla summit 2022
StreamNative FLiP into scylladb - scylla summit 2022
 
Music city data Hail Hydrate! from stream to lake
Music city data Hail Hydrate! from stream to lakeMusic city data Hail Hydrate! from stream to lake
Music city data Hail Hydrate! from stream to lake
 
DBCC 2021 - FLiP Stack for Cloud Data Lakes
DBCC 2021 - FLiP Stack for Cloud Data LakesDBCC 2021 - FLiP Stack for Cloud Data Lakes
DBCC 2021 - FLiP Stack for Cloud Data Lakes
 
Hail hydrate! from stream to lake using open source
Hail hydrate! from stream to lake using open sourceHail hydrate! from stream to lake using open source
Hail hydrate! from stream to lake using open source
 
Real time stock processing with apache nifi, apache flink and apache kafka
Real time stock processing with apache nifi, apache flink and apache kafkaReal time stock processing with apache nifi, apache flink and apache kafka
Real time stock processing with apache nifi, apache flink and apache kafka
 
Big mountain data and dev conference apache pulsar with mqtt for edge compu...
Big mountain data and dev conference   apache pulsar with mqtt for edge compu...Big mountain data and dev conference   apache pulsar with mqtt for edge compu...
Big mountain data and dev conference apache pulsar with mqtt for edge compu...
 
Kafka & Hadoop - for NYC Kafka Meetup
Kafka & Hadoop - for NYC Kafka MeetupKafka & Hadoop - for NYC Kafka Meetup
Kafka & Hadoop - for NYC Kafka Meetup
 
Cracking the nut, solving edge ai with apache tools and frameworks
Cracking the nut, solving edge ai with apache tools and frameworksCracking the nut, solving edge ai with apache tools and frameworks
Cracking the nut, solving edge ai with apache tools and frameworks
 
Spark optimization
Spark optimizationSpark optimization
Spark optimization
 
Streaming from the cloud
Streaming from the cloudStreaming from the cloud
Streaming from the cloud
 
Big data conference europe real-time streaming in any and all clouds, hybri...
Big data conference europe   real-time streaming in any and all clouds, hybri...Big data conference europe   real-time streaming in any and all clouds, hybri...
Big data conference europe real-time streaming in any and all clouds, hybri...
 

Andere mochten auch

Shug meetup Hops Hadoop
Shug meetup Hops HadoopShug meetup Hops Hadoop
Shug meetup Hops HadoopJim Dowling
 
Hopsfs 10x HDFS performance
Hopsfs 10x HDFS performanceHopsfs 10x HDFS performance
Hopsfs 10x HDFS performanceJim Dowling
 
Operational Tips for Deploying Spark
Operational Tips for Deploying SparkOperational Tips for Deploying Spark
Operational Tips for Deploying SparkDatabricks
 
Unha pequena escolma
Unha pequena escolmaUnha pequena escolma
Unha pequena escolmamfpicado
 
【クラーク高校】模擬選挙授業用資料
【クラーク高校】模擬選挙授業用資料【クラーク高校】模擬選挙授業用資料
【クラーク高校】模擬選挙授業用資料Tetsuya Sato
 
Butterfly
ButterflyButterfly
Butterflyayazuob
 
News reports
News reportsNews reports
News reportsmicsan95
 
2008 photo identification facial metrical and morphological features in south...
2008 photo identification facial metrical and morphological features in south...2008 photo identification facial metrical and morphological features in south...
2008 photo identification facial metrical and morphological features in south...Marta Flores
 
Nueva droga alerta-6135
Nueva droga alerta-6135Nueva droga alerta-6135
Nueva droga alerta-6135bulletfor
 
【模擬選挙×マニフェストスイッチ】開票結果について
【模擬選挙×マニフェストスイッチ】開票結果について【模擬選挙×マニフェストスイッチ】開票結果について
【模擬選挙×マニフェストスイッチ】開票結果についてTetsuya Sato
 
Wcm remedies concl
Wcm remedies conclWcm remedies concl
Wcm remedies conclpjain30688
 
Billy Elliot Transcript
Billy Elliot TranscriptBilly Elliot Transcript
Billy Elliot Transcriptmicsan95
 
Retos de la ciencia para el siglo xxi
Retos de la ciencia  para  el  siglo xxiRetos de la ciencia  para  el  siglo xxi
Retos de la ciencia para el siglo xxiMaura Salcedo
 
Ensayo de oratoria. alexander
Ensayo de oratoria. alexanderEnsayo de oratoria. alexander
Ensayo de oratoria. alexanderAlexrivesaia
 
Negociere.curs 9
Negociere.curs 9Negociere.curs 9
Negociere.curs 9mihut888
 
kelas11 smk-biologi-pertanian_ameilia-dkk
 kelas11 smk-biologi-pertanian_ameilia-dkk kelas11 smk-biologi-pertanian_ameilia-dkk
kelas11 smk-biologi-pertanian_ameilia-dkkdedibiru
 
Retos de la ciencia para el siglo XXI
Retos de la ciencia  para  el  siglo XXIRetos de la ciencia  para  el  siglo XXI
Retos de la ciencia para el siglo XXIMaura Salcedo
 

Andere mochten auch (20)

Shug meetup Hops Hadoop
Shug meetup Hops HadoopShug meetup Hops Hadoop
Shug meetup Hops Hadoop
 
Hopsfs 10x HDFS performance
Hopsfs 10x HDFS performanceHopsfs 10x HDFS performance
Hopsfs 10x HDFS performance
 
Operational Tips for Deploying Spark
Operational Tips for Deploying SparkOperational Tips for Deploying Spark
Operational Tips for Deploying Spark
 
Haya Exports
Haya ExportsHaya Exports
Haya Exports
 
Modul html
Modul htmlModul html
Modul html
 
Unha pequena escolma
Unha pequena escolmaUnha pequena escolma
Unha pequena escolma
 
Monografia bioestadisstica
Monografia bioestadissticaMonografia bioestadisstica
Monografia bioestadisstica
 
【クラーク高校】模擬選挙授業用資料
【クラーク高校】模擬選挙授業用資料【クラーク高校】模擬選挙授業用資料
【クラーク高校】模擬選挙授業用資料
 
Butterfly
ButterflyButterfly
Butterfly
 
News reports
News reportsNews reports
News reports
 
2008 photo identification facial metrical and morphological features in south...
2008 photo identification facial metrical and morphological features in south...2008 photo identification facial metrical and morphological features in south...
2008 photo identification facial metrical and morphological features in south...
 
Nueva droga alerta-6135
Nueva droga alerta-6135Nueva droga alerta-6135
Nueva droga alerta-6135
 
【模擬選挙×マニフェストスイッチ】開票結果について
【模擬選挙×マニフェストスイッチ】開票結果について【模擬選挙×マニフェストスイッチ】開票結果について
【模擬選挙×マニフェストスイッチ】開票結果について
 
Wcm remedies concl
Wcm remedies conclWcm remedies concl
Wcm remedies concl
 
Billy Elliot Transcript
Billy Elliot TranscriptBilly Elliot Transcript
Billy Elliot Transcript
 
Retos de la ciencia para el siglo xxi
Retos de la ciencia  para  el  siglo xxiRetos de la ciencia  para  el  siglo xxi
Retos de la ciencia para el siglo xxi
 
Ensayo de oratoria. alexander
Ensayo de oratoria. alexanderEnsayo de oratoria. alexander
Ensayo de oratoria. alexander
 
Negociere.curs 9
Negociere.curs 9Negociere.curs 9
Negociere.curs 9
 
kelas11 smk-biologi-pertanian_ameilia-dkk
 kelas11 smk-biologi-pertanian_ameilia-dkk kelas11 smk-biologi-pertanian_ameilia-dkk
kelas11 smk-biologi-pertanian_ameilia-dkk
 
Retos de la ciencia para el siglo XXI
Retos de la ciencia  para  el  siglo XXIRetos de la ciencia  para  el  siglo XXI
Retos de la ciencia para el siglo XXI
 

Ähnlich wie Spark summit-east-dowling-feb2017-full

Multi-tenant Flink as-a-service with Kafka on Hopsworks
Multi-tenant Flink as-a-service with Kafka on HopsworksMulti-tenant Flink as-a-service with Kafka on Hopsworks
Multi-tenant Flink as-a-service with Kafka on HopsworksJim Dowling
 
Jim Dowling - Multi-tenant Flink-as-a-Service on YARN
Jim Dowling - Multi-tenant Flink-as-a-Service on YARN Jim Dowling - Multi-tenant Flink-as-a-Service on YARN
Jim Dowling - Multi-tenant Flink-as-a-Service on YARN Flink Forward
 
Secure Streaming-as-a-Service with Kafka/Spark/Flink in Hopsworks
Secure Streaming-as-a-Service with Kafka/Spark/Flink in HopsworksSecure Streaming-as-a-Service with Kafka/Spark/Flink in Hopsworks
Secure Streaming-as-a-Service with Kafka/Spark/Flink in HopsworksTheofilos Kakantousis
 
Hopsworks Secure Streaming as-a-service with Kafka Flinkspark - Theofilos Kak...
Hopsworks Secure Streaming as-a-service with Kafka Flinkspark - Theofilos Kak...Hopsworks Secure Streaming as-a-service with Kafka Flinkspark - Theofilos Kak...
Hopsworks Secure Streaming as-a-service with Kafka Flinkspark - Theofilos Kak...Evention
 
Cloud-based Data Lake for Analytics and AI
Cloud-based Data Lake for Analytics and AICloud-based Data Lake for Analytics and AI
Cloud-based Data Lake for Analytics and AITorsten Steinbach
 
Big Data, Ingeniería de datos, y Data Lakes en AWS
Big Data, Ingeniería de datos, y Data Lakes en AWSBig Data, Ingeniería de datos, y Data Lakes en AWS
Big Data, Ingeniería de datos, y Data Lakes en AWSjavier ramirez
 
Introducing Amazon EMR Release 5.0 - August 2016 Monthly Webinar Series
Introducing Amazon EMR Release 5.0 - August 2016 Monthly Webinar SeriesIntroducing Amazon EMR Release 5.0 - August 2016 Monthly Webinar Series
Introducing Amazon EMR Release 5.0 - August 2016 Monthly Webinar SeriesAmazon Web Services
 
Aws-What You Need to Know_Simon Elisha
Aws-What You Need to Know_Simon ElishaAws-What You Need to Know_Simon Elisha
Aws-What You Need to Know_Simon ElishaHelen Rogers
 
Hopsworks - Self-Service Spark/Flink/Kafka/Hadoop
Hopsworks - Self-Service Spark/Flink/Kafka/HadoopHopsworks - Self-Service Spark/Flink/Kafka/Hadoop
Hopsworks - Self-Service Spark/Flink/Kafka/HadoopJim Dowling
 
On-premise Spark as a Service with YARN
On-premise Spark as a Service with YARN On-premise Spark as a Service with YARN
On-premise Spark as a Service with YARN Jim Dowling
 
Budapest Data/ML - Building Modern Data Streaming Apps with NiFi, Flink and K...
Budapest Data/ML - Building Modern Data Streaming Apps with NiFi, Flink and K...Budapest Data/ML - Building Modern Data Streaming Apps with NiFi, Flink and K...
Budapest Data/ML - Building Modern Data Streaming Apps with NiFi, Flink and K...Timothy Spann
 
End-to-End Spark/TensorFlow/PyTorch Pipelines with Databricks Delta
End-to-End Spark/TensorFlow/PyTorch Pipelines with Databricks DeltaEnd-to-End Spark/TensorFlow/PyTorch Pipelines with Databricks Delta
End-to-End Spark/TensorFlow/PyTorch Pipelines with Databricks DeltaDatabricks
 
Stream processing on mobile networks
Stream processing on mobile networksStream processing on mobile networks
Stream processing on mobile networkspbelko82
 
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...Precisely
 
AWS re:Invent 2016 - Scality's Open Source AWS S3 Server
AWS re:Invent 2016 - Scality's Open Source AWS S3 ServerAWS re:Invent 2016 - Scality's Open Source AWS S3 Server
AWS re:Invent 2016 - Scality's Open Source AWS S3 ServerScality
 
Hadoop in Practice (SDN Conference, Dec 2014)
Hadoop in Practice (SDN Conference, Dec 2014)Hadoop in Practice (SDN Conference, Dec 2014)
Hadoop in Practice (SDN Conference, Dec 2014)Marcel Krcah
 
JConWorld_ Continuous SQL with Kafka and Flink
JConWorld_ Continuous SQL with Kafka and FlinkJConWorld_ Continuous SQL with Kafka and Flink
JConWorld_ Continuous SQL with Kafka and FlinkTimothy Spann
 

Ähnlich wie Spark summit-east-dowling-feb2017-full (20)

Multi-tenant Flink as-a-service with Kafka on Hopsworks
Multi-tenant Flink as-a-service with Kafka on HopsworksMulti-tenant Flink as-a-service with Kafka on Hopsworks
Multi-tenant Flink as-a-service with Kafka on Hopsworks
 
Jim Dowling - Multi-tenant Flink-as-a-Service on YARN
Jim Dowling - Multi-tenant Flink-as-a-Service on YARN Jim Dowling - Multi-tenant Flink-as-a-Service on YARN
Jim Dowling - Multi-tenant Flink-as-a-Service on YARN
 
Secure Streaming-as-a-Service with Kafka/Spark/Flink in Hopsworks
Secure Streaming-as-a-Service with Kafka/Spark/Flink in HopsworksSecure Streaming-as-a-Service with Kafka/Spark/Flink in Hopsworks
Secure Streaming-as-a-Service with Kafka/Spark/Flink in Hopsworks
 
Hopsworks Secure Streaming as-a-service with Kafka Flinkspark - Theofilos Kak...
Hopsworks Secure Streaming as-a-service with Kafka Flinkspark - Theofilos Kak...Hopsworks Secure Streaming as-a-service with Kafka Flinkspark - Theofilos Kak...
Hopsworks Secure Streaming as-a-service with Kafka Flinkspark - Theofilos Kak...
 
Cloud-based Data Lake for Analytics and AI
Cloud-based Data Lake for Analytics and AICloud-based Data Lake for Analytics and AI
Cloud-based Data Lake for Analytics and AI
 
Big Data, Ingeniería de datos, y Data Lakes en AWS
Big Data, Ingeniería de datos, y Data Lakes en AWSBig Data, Ingeniería de datos, y Data Lakes en AWS
Big Data, Ingeniería de datos, y Data Lakes en AWS
 
Introducing Amazon EMR Release 5.0 - August 2016 Monthly Webinar Series
Introducing Amazon EMR Release 5.0 - August 2016 Monthly Webinar SeriesIntroducing Amazon EMR Release 5.0 - August 2016 Monthly Webinar Series
Introducing Amazon EMR Release 5.0 - August 2016 Monthly Webinar Series
 
Aws-What You Need to Know_Simon Elisha
Aws-What You Need to Know_Simon ElishaAws-What You Need to Know_Simon Elisha
Aws-What You Need to Know_Simon Elisha
 
Hopsworks - Self-Service Spark/Flink/Kafka/Hadoop
Hopsworks - Self-Service Spark/Flink/Kafka/HadoopHopsworks - Self-Service Spark/Flink/Kafka/Hadoop
Hopsworks - Self-Service Spark/Flink/Kafka/Hadoop
 
Using Data Lakes
Using Data LakesUsing Data Lakes
Using Data Lakes
 
On-premise Spark as a Service with YARN
On-premise Spark as a Service with YARN On-premise Spark as a Service with YARN
On-premise Spark as a Service with YARN
 
Using Data Lakes
Using Data LakesUsing Data Lakes
Using Data Lakes
 
Budapest Data/ML - Building Modern Data Streaming Apps with NiFi, Flink and K...
Budapest Data/ML - Building Modern Data Streaming Apps with NiFi, Flink and K...Budapest Data/ML - Building Modern Data Streaming Apps with NiFi, Flink and K...
Budapest Data/ML - Building Modern Data Streaming Apps with NiFi, Flink and K...
 
End-to-End Spark/TensorFlow/PyTorch Pipelines with Databricks Delta
End-to-End Spark/TensorFlow/PyTorch Pipelines with Databricks DeltaEnd-to-End Spark/TensorFlow/PyTorch Pipelines with Databricks Delta
End-to-End Spark/TensorFlow/PyTorch Pipelines with Databricks Delta
 
Stream processing on mobile networks
Stream processing on mobile networksStream processing on mobile networks
Stream processing on mobile networks
 
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...
 
AWS re:Invent 2016 - Scality's Open Source AWS S3 Server
AWS re:Invent 2016 - Scality's Open Source AWS S3 ServerAWS re:Invent 2016 - Scality's Open Source AWS S3 Server
AWS re:Invent 2016 - Scality's Open Source AWS S3 Server
 
Hadoop in Practice (SDN Conference, Dec 2014)
Hadoop in Practice (SDN Conference, Dec 2014)Hadoop in Practice (SDN Conference, Dec 2014)
Hadoop in Practice (SDN Conference, Dec 2014)
 
Using Data Lakes
Using Data LakesUsing Data Lakes
Using Data Lakes
 
JConWorld_ Continuous SQL with Kafka and Flink
JConWorld_ Continuous SQL with Kafka and FlinkJConWorld_ Continuous SQL with Kafka and Flink
JConWorld_ Continuous SQL with Kafka and Flink
 

Spark summit-east-dowling-feb2017-full

  • 1. Spark Streaming-as-a-Service with Kafka and YARN. Jim Dowling, KTH Royal Institute of Technology, Stockholm; Senior Researcher, SICS; CEO, Logical Clocks AB
  • 2. Spark Streaming-as-a-Service in Sweden • SICS ICE: datacenter research environment • Hopsworks: Spark/Flink/Kafka/Tensorflow/Hadoop-as-a-service – Built on Hops Hadoop (www.hops.io) – >130 active users
  • 3. Hadoop is not a cool kid anymore!
  • 5. Hadoop’s Evolution, 2009 to 2016, then ? Tiny Brain (NameNode, ResourceMgr), Huge Body (DataNodes)
  • 6. Build out Hadoop’s Brain with External Weakly Consistent MetaData Services Google-Glass Approach to Intelligence
  • 8. Larger Brains => Bigger, Faster* 16x Performance on Spotify Workload *Usenix FAST 2017, HopsFS: Scaling Hierarchical File System Metadata Using NewSQL Databases
  • 9. Larger Brains => More Intelligent* Hadoop concepts: • Clusters • Users • Jobs/Applications • Files • ACLs • Sys Admins • Kerberos. Hopsworks (User-Friendly Concepts): • Projects – Datasets/Files – Topics – Jobs/Notebooks. *HMGA2 gene mutations correlated with increased intracranial volume as well as enhanced IQ. http://newsroom.ucla.edu/releases/international-team-uncovers-new-231989 http://www.ibtimes.co.uk/embargoed-8pm-25th-jan-size-matters-brain-size-relative-body-size-indicates-animals-ability-1539994
  • 10. Spark Streaming Support. YARN: aggregates logs on job completion http://mkuthan.github.io/blog/2016/09/30/spark-streaming-on-yarn/ Hopsworks: • Apache Kafka • ELK Stack – Real-time Logs • Grafana/InfluxDB – Monitoring
  • 11. Kafka Self-Service UI Manage & Share • Topics • ACLs • Avro Schemas
  • 14. Zeppelin for Prototyping Streaming Apps [https://github.com/knockdata/spark-highcharts]
  • 15. Debugging Spark with Dr. Elephant • Analyzes Spark jobs for errors and common problems using pluggable heuristics • Doesn’t show killed jobs • No online support for streaming apps yet
  • 16. Integration as Microservices in Hopsworks • Project-based Multi-tenancy • Self-Service UI • Simplifying Spark Streaming Apps
  • 17. Projects in Hopsworks (diagram: projects Proj-42, Proj-X and Proj-All; a Shared Topic; /Projs/My/Data; CompanyDB)
  • 18. User roles. Data Owner: - Import/Export data - Manage Membership - Share DataSets, Topics. Data Scientist: - Write and Run code. Self-Service Administration – No Administrator Needed
  • 19. Notebooks, Data sharing and Quotas • Zeppelin Notebooks in HDFS, Jobs launcher UI. • Sharing is not Copying – Datasets/Topics • Per-Project quotas – Storage in HDFS – CPU in YARN (Uber-style Pricing)
  • 21. Look Ma, no Kerberos • Each project-specific user is issued an SSL/TLS (X.509) certificate for both authentication and encryption. • Services are also issued SSL/TLS certificates. – Same root CA as user certs
  • 22. Simplifying Spark Streaming Apps • Spark Streaming Applications need to know – Credentials • Hadoop, Kafka, InfluxDb, Logstash – Endpoints • Kafka Broker, Kafka SchemaRegistry, ResourceManager, NameNode, InfluxDB, Logstash • The HopsUtil API hides this complexity. – Location/security transparent Spark applications
  • 23. Secure Streaming App with Kafka. Developer operations: 1. Discover: Schema Registry and Kafka/InfluxDB/ELK Endpoints 2. Create: Kafka Properties file with certs and broker details 3. Create: Producer/Consumer using Kafka Properties 4. Download: the Schema for the Topic from the Schema Registry 5. Distribute: X.509 certs to all hosts on the cluster 6. Cleanup securely. These steps are replaced by calls to the HopsUtil API https://github.com/hopshadoop/hops-kafka-examples
  • 24. Streaming Producer in HopsWorks JavaSparkContext jsc = new JavaSparkContext(sparkConf); String topic = HopsUtil.getTopic(); //Optional SparkProducer producer = HopsUtil.getSparkProducer(); Map<String, String> message = …; producer.produce(message);
  • 25. Streaming Consumer in HopsWorks JavaStreamingContext jssc = new JavaStreamingContext(sparkConf,Durations.seconds(2)); String topic = HopsUtil.getTopic(); //Optional String consumerGroup = HopsUtil.getConsumerGroup(); //Optional SparkConsumer consumer = HopsUtil.getSparkConsumer(jssc); JavaInputDStream<ConsumerRecord<String, byte[]>> messages = consumer.createDirectStream(); jssc.start();
  • 26. Less code to write https://github.com/hopshadoop/hops-kafka-examples Without HopsUtil (Lots of Hard-Coded Endpoints Here!): Properties props = new Properties(); props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, brokerList); props.put(SCHEMA_REGISTRY_URL, restApp.restConnect); props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, org.apache.kafka.common.serialization.StringSerializer.class); props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, io.confluent.kafka.serializers.KafkaAvroSerializer.class); props.put("producer.type", "sync"); props.put("serializer.class", "kafka.serializer.StringEncoder"); props.put("request.required.acks", "1"); props.put("ssl.keystore.location", "/var/ssl/kafka.client.keystore.jks"); props.put("ssl.keystore.password", "test1234"); props.put("ssl.key.password", "test1234"); ProducerConfig config = new ProducerConfig(props); String userSchema = "{\"namespace\": \"example.avro\", \"type\": \"record\", \"name\": \"User\"," + "\"fields\": [{\"name\": \"name\", \"type\": \"string\"}]}"; Schema.Parser parser = new Schema.Parser(); Schema schema = parser.parse(userSchema); GenericRecord avroRecord = new GenericData.Record(schema); avroRecord.put("name", "testUser"); Producer<String, Object> producer = new Producer<String, Object>(config); ProducerRecord<String, Object> message = new ProducerRecord<>("topicName", avroRecord); producer.send(message); With HopsUtil: SparkProducer producer = HopsUtil.getSparkProducer(); Map<String, String> message = …; producer.produce(message); Massively Simplified Code for Secure Spark Streaming/Kafka
  • 27. Distributing Certs for Spark Streaming (diagram: Alice@gmail.com, Hopsworks, Distributed Database, YARN Private LocalResources, Spark Streaming App with HopsUtil) 1. Launch Spark Job 2. Get certs, service endpoints 3. YARN Job, config 4. Materialize certs 5. Read Certs 6. Get Schema 7. Consume/Produce 8. Read ACLs for authentication
  • 28. Multi-Tenant IoT Scenario (diagram: Sensor Nodes, Field Gateway, IoT Cloud Platform with Ingestion, Storage and Analysis; tenants ACME, Evil Corp, DontBeEvil Corp)
  • 29. IoT Scenario (diagram: ACME, DontBeEvil Corp and Evil Corp on AWS, Google Cloud and Oracle Cloud) User Apps control IoT Devices. IoT Company: Analyze Data, Data Services for Clients
  • 30. Cloud-Native Analytics Solution (diagram: IoT Company, ACME, Spark Streaming App, [Authorization], S3, GCS, Oracle) Each customer needs its own Analytics Infrastructure
  • 31. IoT Company Project GatewayTopic Hopsworks Solution using Projects ACME ProjectACMETopic ACME Dataset Data Stream Analytics Reports
  • 32. Hopsworks Solution (diagram: ACME Project containing ACME Topic, Spark Streaming App [Authorized], ACME Dataset, Spark Batch Job, ACME Analytics Reports)
  • 33. Karamel/Chef for Automated Installation Google Compute Engine BareMetal
  • 34. DEMO
  • 35. Hops Roadmap • HopsFS – HA support for Multi-Data-Center – Small files, 2-Level Erasure Coding • HopsYARN – Tensorflow with isolated GPUs • Hopsworks – P2P Dataset Sharing – Jupyter, Presto, Hive
  • 36. Summary • Hops is a new distribution of Hadoop – Tinker-friendly and open-source. • Hopsworks provides first-class support for Spark-Streaming-as-a-Service – With support services like Kafka, ELK Stack, Zeppelin, Grafana/InfluxDB.
  • 37. Hops Team. Active: Jim Dowling, Seif Haridi, Tor Björn Minde, Gautier Berthou, Salman Niazi, Mahmoud Ismail, Theofilos Kakantousis, Ermias Gebremeskel, Antonios Kouzoupis, Alex Ormenisan, Roberto Bampi, Fabio Buso, Fanti Machmount Al Samisti, Braulio Grana, Adam Alpire, Zahin Azher Rashid, Robin Andersso, ArunaKumari Yedurupaka, Tobias Johansson, August Bonds, Tiago Brito, Filotas Siskos. Alumni: Vasileios Giannokostas, Johan Svedlund Nordström, Rizvi Hasan, Paul Mälzer, Bram Leenders, Juan Roca, Misganu Dessalegn, K “Sri” Srijeyanthan, Jude D’Souza, Alberto Lorente, Andre Moré, Ali Gholami, Davis Jaunzems, Stig Viaene, Hooman Peiro, Evangelos Savvidis, Steffen Grohsschmiedt, Qi Qi, Gayana Chandrasekara, Nikolaos Stanogias, Daniel Bali, Ioannis Kerkinos, Peter Buechler, Pushparaj Motamari, Hamid Afzali, Wasif Malik, Lalith Suresh, Mariano Valles, Ying Lieu.
  • 38. Thank You. We totally understand it’s going to be America First Spark Streaming first, but can we take this chance to say Hopsworks second! http://www.hops.io @hopshadoop
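
Editor's sketch for slides 23 and 27: the manual work that the slides say HopsUtil replaces (create Kafka properties with certs and broker details, materialize certs on a host, clean up afterwards) might look roughly like the following when done by hand. This is a minimal illustration under stated assumptions, not the HopsUtil implementation. The class name `ManualKafkaSslSetup`, its method names, the file names, the broker address and the password are all hypothetical. The property keys are standard Kafka client settings. The directory permissions assume a POSIX file system, and cleanup here is a plain delete, not a secure wipe.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Comparator;
import java.util.List;
import java.util.Properties;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper for illustration; not part of HopsUtil or Kafka. */
final class ManualKafkaSslSetup {

    /** Write keystore and truststore bytes into a directory that only the owner can enter. */
    static Path materialize(byte[] keystore, byte[] truststore) throws IOException {
        // POSIX-only: the directory is created with rwx for the owner and nothing for others.
        Path dir = Files.createTempDirectory("certs",
                PosixFilePermissions.asFileAttribute(PosixFilePermissions.fromString("rwx------")));
        Files.write(dir.resolve("keystore.jks"), keystore);
        Files.write(dir.resolve("truststore.jks"), truststore);
        return dir;
    }

    /** Build Kafka client properties that point at the materialized cert files. */
    static Properties kafkaProps(String brokers, Path certDir, String password) {
        Properties p = new Properties();
        p.setProperty("bootstrap.servers", brokers);
        p.setProperty("security.protocol", "SSL");
        p.setProperty("ssl.keystore.location", certDir.resolve("keystore.jks").toString());
        p.setProperty("ssl.keystore.password", password);
        p.setProperty("ssl.key.password", password);
        p.setProperty("ssl.truststore.location", certDir.resolve("truststore.jks").toString());
        p.setProperty("ssl.truststore.password", password);
        return p;
    }

    /** Delete the cert files and then their directory. */
    static void cleanup(Path certDir) throws IOException {
        List<Path> deepestFirst;
        try (Stream<Path> paths = Files.walk(certDir)) {
            // Reverse order puts files before the directory that contains them.
            deepestFirst = paths.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path path : deepestFirst) {
            Files.delete(path);
        }
    }

    public static void main(String[] args) throws IOException {
        // Placeholder bytes stand in for real keystore contents.
        Path dir = materialize(new byte[] {1, 2, 3}, new byte[] {4, 5, 6});
        Properties props = kafkaProps("broker.example.com:9091", dir, "changeit");
        System.out.println(props.getProperty("ssl.keystore.location"));
        cleanup(dir);
        System.out.println("certs removed: " + !Files.exists(dir));
    }
}
```

Every application would have to repeat this on every host it runs on, which is the burden the slides argue a single HopsUtil call removes.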