JSUG Study Session: SpringOne Platform 2016 Report - Takuya Saeki
Self-introduction
佐伯 拓哉 (Takuya Saeki)
• Affiliation
Mitsui Knowledge Industry Co., Ltd. (MKI), R&D Department
• Interests
Anything and everything cloud native
Data Microservices in the Cloud
By Mark Pollack
HTTP
JMS
Kafka
File
HDFS
Cassandra
HAWQ
JDBC
Real Time Analytics
Spring XD Streams
Container Container
gpfdist
Cassandra
jms
http
ZooKeeper
Message Broker
XD Admin
stream1 = http | cassandra
stream2 = jms | gpfdist
On Metal/VMs
Spring XD Limitations
• How to scale up/down instances at runtime?
• How to upgrade/downgrade module instances at runtime?
• How to specify resources unique to each module, e.g. memory?
• Container architecture leads to parent/child class loader issues
• Too many libraries in root class path
Refactoring to a Microservice Architecture
From "multiple modules embedded in a container"
to "stand-alone apps that each run on their own"
From "a proprietary runtime"
to "delegating to the platform"
Data Microservices
• Stand-alone, production grade applications focused on data processing
• Communicating with ‘lightweight mechanisms’ – messaging middleware
• ‘Event Driven’ - Microservices
"Write programs that do one thing and do it well."
"Write programs to work together."
"Write programs to handle standard input and output (text streams),
because that is a universal interface."
$ cat book.txt | tr ' ' '\n' | tr '[:upper:]' '[:lower:]' |
tr -d '[:punct:]' |
grep -v '[^a-z]' |
sort | uniq -c | sort -rn | head
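The same word-frequency pipeline can be sketched as a chain of small steps in plain Java. This is only an illustration of the "small programs joined by a uniform interface" idea; the class and method names are made up for this example, and unlike `grep -v '[^a-z]'` the filter below also drops empty tokens.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class WordFrequency {
    // Roughly mirrors: tr ' ' '\n' | tr upper lower | tr -d punct | grep | sort | uniq -c | sort -rn | head
    static List<Map.Entry<String, Long>> topWords(String text, int limit) {
        Map<String, Long> counts = Arrays.stream(text.split("\\s+"))   // one word per element
            .map(String::toLowerCase)                                  // fold case
            .map(w -> w.replaceAll("\\p{Punct}", ""))                  // strip ASCII punctuation
            .filter(w -> w.matches("[a-z]+"))                          // keep purely alphabetic words
            .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
        return counts.entrySet().stream()
            // Highest count first; ties broken alphabetically so the result is deterministic.
            .sorted(Map.Entry.<String, Long>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Long>comparingByKey()))
            .limit(limit)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(topWords("The cat and the hat. The end!", 3));
    }
}
```

Each stage only consumes and produces a stream of strings, which is the role the message broker plays between stream apps.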
Application Types
• Long lived Stream applications
• Spring Cloud Stream
• Short lived Task Applications
• Spring Cloud Task
Spring Cloud Stream
• Spring Boot based event-driven microservice framework
• Opinionated primitives for streaming applications
• Persistent Publish/Subscribe semantics
• Consumer Groups
• Partitioning
• Pluggable messaging middleware bindings
• Programming model focused on input/output objects
• Adaptable to different event processing APIs
Spring Cloud Stream Applications
Cassandra
java -jar cassandrasink-1.0.0.RELEASE.jar \
  --ingestQuery=<some cql> \
  --spring.cassandra.keyspace=tweetdata \
  --spring.cloud.stream.bindings.input.destination=dataDest
http
java -jar twittersource-1.0.0.RELEASE.jar \
  --consumerKey=XYZ \
  --consumerSecret=ABC \
  --spring.cloud.stream.bindings.output.destination=dataDest
Stream orchestration in Spring Cloud Data Flow
Stream Definition: ingest = twitterstream | cassandra
Spring Cloud Stream Applications: map DSL names to maven/docker artifacts
dataflow:>stream create --name ingest --definition "twitterstream | cassandra" --deploy
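To make the orchestration step concrete, here is a minimal sketch of how a pipe-delimited definition could be turned into per-app input and output destinations. The `<stream>.<app>` naming follows the `s1.http` destination shown on the later slides; the class, the `bind` method, and the parsing rules are hypothetical and are not Spring Cloud Data Flow's actual parser.

```java
import java.util.ArrayList;
import java.util.List;

public class StreamDslSketch {
    /** One app in the pipeline plus the destinations it reads from and writes to (null = none). */
    static final class Binding {
        final String app;
        final String input;
        final String output;

        Binding(String app, String input, String output) {
            this.app = app;
            this.input = input;
            this.output = output;
        }

        @Override
        public String toString() {
            return app + " [in=" + input + ", out=" + output + "]";
        }
    }

    static List<Binding> bind(String streamName, String definition) {
        String[] segments = definition.trim().split("\\s*\\|\\s*");
        // Keep only the app name; options such as --fieldName=lang are ignored in this sketch.
        String[] apps = new String[segments.length];
        for (int i = 0; i < segments.length; i++) {
            apps[i] = segments[i].split("\\s+")[0];
        }
        List<Binding> bindings = new ArrayList<>();
        for (int i = 0; i < apps.length; i++) {
            // Each app reads from the destination named after its upstream neighbour.
            String input = (i == 0) ? null : streamName + "." + apps[i - 1];
            String output = (i == apps.length - 1) ? null : streamName + "." + apps[i];
            bindings.add(new Binding(apps[i], input, output));
        }
        return bindings;
    }

    public static void main(String[] args) {
        System.out.println(bind("ingest", "twitterstream | cassandra"));
    }
}
```

The source and sink end up sharing one destination name, which is what the `output.destination` and `input.destination` properties express when the apps are started by hand.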
Features: Persistent Messaging
HTTP log
s1.http DLQ
Message Broker
• Production Ready
• RabbitMQ
• Kafka
• Experimental
• JMS
• Google PubSub
• Planned
• Kinesis
s1 = http | log
Features: Named Destinations
HTTP
JMS
S3
myInputDestination
s1 = http > :myInputDestination
s2 = jms > :myInputDestination
s3 = aws-s3 > :myInputDestination
Features: Consumer Groups
HTTP
s1.http
HDFS
s1 = http | hdfs
s2 = :s1.http > counter
COUNTER
group: s1 group: s2
HDFS
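The consumer-group behaviour on this slide can be sketched in plain Java: every group subscribed to a destination sees each message once, and inside a group the instances take turns. This is an in-memory illustration only; the class and its methods are invented for the example, and real brokers do not promise strict round-robin ordering.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ConsumerGroupSketch {
    // group name -> one inbox per instance of that group
    private final Map<String, List<List<String>>> groups = new LinkedHashMap<>();
    // group name -> index of the instance that receives the next message
    private final Map<String, Integer> next = new LinkedHashMap<>();

    /** Registers a group with at least one instance and returns each instance's inbox. */
    List<List<String>> subscribe(String group, int instances) {
        List<List<String>> inboxes = new ArrayList<>();
        for (int i = 0; i < instances; i++) {
            inboxes.add(new ArrayList<>());
        }
        groups.put(group, inboxes);
        next.put(group, 0);
        return inboxes;
    }

    /** Delivers the message once per group, rotating across that group's instances. */
    void publish(String message) {
        for (Map.Entry<String, List<List<String>>> entry : groups.entrySet()) {
            int index = next.get(entry.getKey());
            entry.getValue().get(index).add(message);
            next.put(entry.getKey(), (index + 1) % entry.getValue().size());
        }
    }

    public static void main(String[] args) {
        ConsumerGroupSketch destination = new ConsumerGroupSketch(); // stands in for s1.http
        List<List<String>> hdfs = destination.subscribe("s1", 2);    // two hdfs instances
        List<List<String>> counter = destination.subscribe("s2", 1); // one counter instance
        for (String message : new String[] {"a", "b", "c"}) {
            destination.publish(message);
        }
        System.out.println("group s1: " + hdfs);
        System.out.println("group s2: " + counter);
    }
}
```

Group `s1` splits the three messages between its two instances, while group `s2` receives all three, which matches the tap `s2 = :s1.http > counter`.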
Simple Real Time Analytics
tweets = twitterstream | hdfs
analytics = :ingest.twitterstream > field-value-counter --fieldName=lang
HTTP
s1.http
HDFS COUNTER
Data Flow Server
REST API
Spring Cloud Stream Programming Model
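import org.springframework.cloud.stream.annotation.EnableBinding;
import org.springframework.cloud.stream.annotation.StreamListener;
import org.springframework.cloud.stream.messaging.Processor;
import org.springframework.messaging.handler.annotation.SendTo;

// Binds the Processor's "input" and "output" channels to the messaging middleware.
@EnableBinding(Processor.class)
public class TransformProcessor {
    // Receives each payload from "input" and sends the upper-cased result to "output".
    @StreamListener("input")
    @SendTo("output")
    public String transform(String s) {
        return s.toUpperCase();
    }
}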
Spring Cloud Stream Programming Model
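import org.springframework.cloud.stream.annotation.EnableBinding;
import org.springframework.cloud.stream.annotation.Input;
import org.springframework.cloud.stream.annotation.Output;
import org.springframework.cloud.stream.annotation.StreamListener;
import org.springframework.cloud.stream.messaging.Processor;
import reactor.core.publisher.Flux;

@EnableBinding(Processor.class)
public class TransformProcessor {
    // Reactive variant: the input is exposed as a Flux and mapped to the output Flux.
    @StreamListener
    @Output("output")
    public Flux<String> transform(@Input("input") Flux<String> input) {
        return input.map(s -> s.toUpperCase());
    }
}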
Spring Cloud Stream Programming Model
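// Also needs: import static java.time.Duration.ofSeconds;
// WordCount is a (word, count) value class that the slide does not show.
@EnableBinding(Processor.class)
public class TransformProcessor {
    @StreamListener
    @Output("output")
    public Flux<WordCount> countWords(@Input("input") Flux<String> words) {
        // Open a 5-second window every second, then count each word within the window.
        return words.window(ofSeconds(5), ofSeconds(1))
            .flatMap(window -> window.groupBy(word -> word)
                .flatMap(group -> group.reduce(0, (counter, word) -> counter + 1)
                    .map(count -> new WordCount(group.key(), count))));
    }
}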
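For readers unfamiliar with overlapping windows, the counting that `window(ofSeconds(5), ofSeconds(1))` leads to can be sketched without Reactor. The code below is a batch-style approximation under stated assumptions: events carry explicit non-negative timestamps, windows are `[start, start + size)` and open at multiples of the step, and the `Event` class and `count` method are invented for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SlidingWindowCount {
    /** A word observed at a given time (milliseconds, assumed non-negative). */
    static final class Event {
        final long millis;
        final String word;

        Event(long millis, String word) {
            this.millis = millis;
            this.word = word;
        }
    }

    /** Returns window start -> (word -> count) for windows [start, start + sizeMs) opened every stepMs. */
    static Map<Long, Map<String, Integer>> count(List<Event> events, long sizeMs, long stepMs) {
        Map<Long, Map<String, Integer>> windows = new TreeMap<>();
        for (Event e : events) {
            // Earliest window start that still covers this event.
            long first = e.millis < sizeMs ? 0 : ((e.millis - sizeMs) / stepMs + 1) * stepMs;
            // The event belongs to every window that opened at or before its timestamp.
            for (long start = first; start <= e.millis; start += stepMs) {
                windows.computeIfAbsent(start, s -> new TreeMap<>()).merge(e.word, 1, Integer::sum);
            }
        }
        return windows;
    }

    public static void main(String[] args) {
        List<Event> events = Arrays.asList(
            new Event(0, "a"), new Event(1500, "a"), new Event(1500, "b"), new Event(6000, "a"));
        count(events, 5000, 1000).forEach((start, counts) -> System.out.println(start + " -> " + counts));
    }
}
```

Because the window is longer than the step, one word contributes to up to five windows, which is why the reactive version emits several `WordCount` values per word.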
Platform Runtimes
Docker Swarm
Apache YARN
Apache Mesos + Marathon
Spring Cloud Data Flow Deployment Platforms
Data Flow Server
REST API
Deployer SPI
SCDF Flo
SCDF Shell
Spring Cloud Data Flow Streams
gpfdist
cassandra
jms
http
stream1 = http | cassandra
stream2 = jms | gpfdist
Message Broker
Data Flow Server DB
Platform Runtime
Deployment: Partitioning and Instance Count
[Diagram: a LoadBalancer in front of 2 http instances, feeding 3 work instances and 4 hdfs instances]
stream create s1 --definition "http | work | hdfs"
stream deploy s1 --propertiesFile ingest.properties
app.http.count=2
app.work.count=3
app.hdfs.count=4
app.http.producer.partitionKeyExpression=payload.id
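How a partition key maps to one of the downstream instances can be sketched as a hash modulo the instance count. This is an assumption-labelled illustration of the general technique, not a copy of the framework's code: the class and method are invented, and `Math.floorMod` is used here simply to keep the index non-negative.

```java
public class PartitionSketch {
    /** Picks a partition index in [0, partitionCount) from the key's hash code. */
    static int partitionFor(Object key, int partitionCount) {
        return Math.floorMod(key.hashCode(), partitionCount);
    }

    public static void main(String[] args) {
        int workInstances = 3; // corresponds to app.work.count=3
        for (int id = 1; id <= 6; id++) {
            // The key plays the role of payload.id from partitionKeyExpression.
            System.out.println("payload.id=" + id + " -> work instance " + partitionFor(id, workInstances));
        }
    }
}
```

The point of keying on `payload.id` is that every message with the same id reaches the same `work` instance, so per-id state can stay local to that instance.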
Deployment: Resource Management
[Diagram: 2 http instances and 3 work instances]
app.work.spring.cloud.deployer.cloudfoundry.memory=2048
Spring Cloud Task
• Spring Boot based framework for short lived processes
• Auto-configuration provides a task repository and pluggable data source
• Result of each process persists beyond the life of the task for future
reporting
• Tasks can be any arbitrary short lived code
• Well integrated with Spring Batch
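The idea that a task's result outlives the process can be sketched as a small in-memory repository that records each run. The recorded fields follow the ones the deck lists for the Data Flow Server DB (task name, start time, end time, exit code, exit message); the class, its methods, and the exit-code convention (0 for success, 1 for failure) are assumptions made for this example, not Spring Cloud Task's API.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

public class TaskRepositorySketch {
    /** What is remembered about one run of a task. */
    static final class TaskExecution {
        final String taskName;
        final Instant startTime;
        Instant endTime;
        int exitCode;
        String exitMessage;

        TaskExecution(String taskName, Instant startTime) {
            this.taskName = taskName;
            this.startTime = startTime;
        }
    }

    private final List<TaskExecution> executions = new ArrayList<>();

    /** Runs the body and records start, end, and outcome, whether it succeeds or fails. */
    TaskExecution run(String taskName, Runnable body) {
        TaskExecution execution = new TaskExecution(taskName, Instant.now());
        executions.add(execution);
        try {
            body.run();
            execution.exitCode = 0;
        } catch (RuntimeException ex) {
            execution.exitCode = 1;
            execution.exitMessage = ex.getMessage();
        } finally {
            execution.endTime = Instant.now();
        }
        return execution;
    }

    List<TaskExecution> history() {
        return executions;
    }

    public static void main(String[] args) {
        TaskRepositorySketch repository = new TaskRepositorySketch();
        repository.run("jdbc2hdfs", () -> System.out.println("copying rows..."));
        repository.run("broken", () -> { throw new IllegalStateException("source table missing"); });
        for (TaskExecution e : repository.history()) {
            System.out.println(e.taskName + " exit=" + e.exitCode + " message=" + e.exitMessage);
        }
    }
}
```

In the real framework the repository is backed by a data source, so the history remains queryable after the task's process has exited.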
Task Orchestration in Spring Cloud Data Flow
>task create jdbc2hdfs --sql='select * from table'
>task launch jdbc2hdfs
jdbc2hdfs
Data Flow Server DB
Task Name
Start Time
End Time
Exit Code
Exit Message
Last Updated Time
Parameters
task-event
Message Broker
job-execution-events
step-execution-events
item-read-events
item-process-events
item-write-events
skip-events
Spring Cloud Data Flow Tasks
spark
Data Flow Server DB
http | task-launcher
sqoop
Message Broker
task-event
Platform Runtime
Spring Cloud Task Programming Model
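import java.text.SimpleDateFormat;
import java.util.Date;
import org.springframework.boot.CommandLineRunner;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cloud.task.configuration.EnableTask;
import org.springframework.context.annotation.Bean;

@SpringBootApplication
@EnableTask
public class ExampleApplication {
    // The runner is the task body; @EnableTask turns on recording of the run in the task repository.
    @Bean
    public CommandLineRunner commandLineRunner() {
        return strings ->
            System.out.println("Executed at: " +
                new SimpleDateFormat().format(new Date()));
    }

    public static void main(String[] args) {
        SpringApplication.run(ExampleApplication.class, args);
    }
}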
Provided Applications
• ~60 stream and task apps
• https://github.com/spring-cloud/spring-cloud-stream-app-starters
• https://github.com/spring-cloud/spring-cloud-task-app-starters/
• Customize provided apps - http://start-scs.cfapps.io/
• Create new stream/task apps - http://start.spring.io/
• Easy import of provided apps/tasks
• dataflow> app import --uri http://bit.ly/1-0-2-GA-stream-applications-kafka-maven
UI: Dashboard with Designer
XD to SCDF - Terminology
Spring XD       Spring Cloud Data Flow
XD-Admin        Data Flow Server (local, CF, YARN, k8s, Mesos)
XD-Container    N/A
Modules         Applications
Admin UI        Dashboard
Message Bus     Binders
Job             Task
DEMO
Upcoming features
• Some ‘porting’ from XD
• Batch Job DSL + Designer
• Role based access
• Looking forward
• Spring Cloud Sleuth
• JavaDSL
• In-place application version upgrades with Spinnaker
• Application Groups
• Polyglot
• Expanded analytics with Redis and Python/R ecosystem
• More provided apps/tasks
Related Talks
• Building Resilient and Evolutionary Data Microservices – Tuesday 2:00pm
• Cloud Native Java – Tuesday 2:00pm
• Task Madness - Modern On Demand Processing – Tuesday 2:40pm
• Spinnaker – Land of a 1000 Builds – Tuesday 5:00pm
• Spring and Big Data – Tuesday 5:00pm
• Migrating from Spring XD to Spring Cloud Data Flow – Thursday 10:10am
• Orchestrate All the Things! with Spring Cloud Data Flow – Thursday 11:10am
• Cloud Native Streaming and Event-Driven Microservices – Wednesday 4:20pm
Get Started…
• http://cloud.spring.io/spring-cloud-dataflow/
• http://cloud.spring.io/spring-cloud-stream/
• http://cloud.spring.io/spring-cloud-task/
• https://github.com/spring-cloud/spring-cloud-deployer

Editor's Notes

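  1. Cover.
  2. Cloud native: culture and organizational change, platforms, application development and delivery.
  3. Mark Pollack: development lead for Spring Cloud Data Flow and Spring XD. Overview: what changed in the move from Spring XD to Spring Cloud Data Flow, and an outline of Spring Cloud Data Flow.
  4. Rather than focusing on the process as ETL does, day-to-day work more often involves thinking about the data sources themselves and where the data should be stored. Example: collecting information used by a mobile game from other in-house systems over JMS. Monitoring and analyzing data in real time as it flows gives insight into what is happening right now: is latency acceptable, are response times normal? Health monitoring of the platform matters too. Beyond that, the focus shifts to the data flowing across systems: the variety of event types and how often events are sent. These are the essential questions to think about, and Spring Cloud Data Flow aims to solve data-flow problems from this perspective.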
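  5. About Spring XD: streams are written in a UNIX-like DSL; ZooKeeper orchestrates the modules; Tomcat runs inside each container and apps are deployed into it; a message broker handles messaging.
  6. Issues for the next generation. Scaling up and down: just adding instances meant tearing down and recreating the whole stream. Application changes: likewise, processing had to be stopped and restarted. Resources: there was no way to specify CPU, disk, memory, or JVM memory. These are problems of a proprietary runtime: the risk of continuing to maintain a custom XD container, and the risk of staying on a Tomcat base. Running multiple apps in a single container/runtime is itself a problem: the root class path contains many libraries, and a specific version of a module cannot be used, which imposes many constraints on users.
  7. Instead of deploying multiple apps into Tomcat, stand-alone apps are now deployed. Even an app with an HTTP endpoint does not have to be a web app. This is a major shift: modules are now Spring Boot apps. Use of platforms: rather than pouring effort into a proprietary runtime, standard platforms are used, so non-functional requirements can be left to the platform and effort can go to what really matters. With these two major changes in place, the result was released as Spring Cloud Data Flow. The user interface and DSL were kept as they were so that users can migrate seamlessly.
  8. [Martin Fowler] A microservice is, in substance, a stand-alone application. With microservices the way data is transported changes: splitting into microservices without thought increases the number of messages, raises the input rate, and causes delays. Communication uses lightweight messaging mechanisms, that is, lightweight messaging middleware such as ZeroMQ or RabbitMQ. Microservices are built in an opinionated way, without concern for what lies outside them; building them to communicate with each other through messaging middleware lets data be carried between them. UNIX philosophy: the microservice architecture embodies the UNIX philosophy, and designing distributed systems with these tenets leads to refined architectures. The pipeline on the slide is an example of how this is written in practice; data processing does essentially the same thing. This style was implemented in Spring XD and carried over to Spring Cloud Data Flow.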
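  9. The two application types in Spring Cloud Data Flow and their corresponding frameworks. XD was a huge project containing many elements, so the two frameworks were split out. When the orchestration part is not needed, the stream-processing part can be used on its own.
  10. The messaging mechanisms are based on the Apache Kafka model. Concepts adopted from Kafka: persistent pub/sub (messages are retained rather than lost, and consumers receive them when convenient); consumer groups (the same data is received by multiple consumers); partitioning (messages are distributed so the application can scale). Messaging middleware options: although inspired by Kafka, other products such as RabbitMQ can be used. Programming model: a simple model consistent with the Spring ecosystem. Like an old-time telephone switchboard operator, you focus only on inputs and outputs and connect them with a cable, without worrying about the middleware.
  11. A Stream app is an ordinary Spring Boot application. The bindings specify how it communicates with the messaging middleware: give output.destination and input.destination the same name. This is the same whatever the messaging middleware is, so there is no need to care about it here.
  12. What happens when Spring Cloud Data Flow does the orchestration? The stream apps used as modules are ones registered with SCDF in advance.
  13. Spring Cloud Data Flow sets up and manages destinations on the message broker. For the dead letter queue, too, it finds the right topic for the middleware and sends to it. The supported messaging middleware is as listed on the slide.
  14. Data can also be aggregated from multiple sources.
  15. An example of sending the same data to different apps at the same time. Within a consumer group, messages are distributed round-robin.
  16. A handy feature: the dashboard can display charts. It works when the aggregated results are stored in Redis.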
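  17. @EnableBinding: Source, Processor, Sink.
  18. Support for Project Reactor.
  19. Platforms are spreading explosively. Spring Cloud Data Flow does not have its own runtime and instead leaves that to the platform.
  20. User side: REST API. Platform side: SPI. OpenShift support is a community contribution.
  21. The DB next to the Data Flow Server stores the stream definitions. A common setup runs the message broker and DB on the same platform as the applications. Since every element of SCDF runs on the platform, everything is operated under the platform's lifecycle.
  22. Changing instance counts and controlling how data is partitioned are both specified through deployment properties. partitionKeyExpression specifies the data used to route output to partitions.
  23. Memory, disk, and CPU can be configured in a way that is common to all platforms.
  24. For a Task application you need to know even whether the run succeeded or failed, so the task repository manages its state. Basically anything can run as a task, but integration with Spring Batch is a particular strength.
  25. With task create and task launch, the Data Flow Server registers the task in the DB. Task apps can talk to the message broker just as Stream apps do. For example, task status changes can be published to a stream that sends an email. Spring Batch events can also be picked up from the message bus: for instance, a stream receives batch job events, counts the errors, and shows the result on the dashboard.
  26. Tasks can be launched from within a stream. Task and Batch integration. Spark and Sqoop are officially supported.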
  27. @EnableTask
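  28. The provided apps are mainly in the data space. Not every XD application has been migrated yet, but the stated plan is to expand the set within the next year and to support Azure Storage and Google Storage as well. The customization site is a fork of start.spring.io that lets you combine starter apps into a customized app. To create a new application from scratch, just use start.spring.io as usual. app import imports the whole set at once.
  29. A web dashboard with a GUI designer is included.
  30. Summary.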