Spark
Global Innovation Nights #1
Kawakami Tomoki @
Speeding up
business processes
in enterprise
applications
Agenda
- Profile
- About Worksap
- Our R&D history of distributed processing
- Case1: Accounting summary
- Case2: Salary calculation
Profile
Kawakami Tomoki, Engineer
Career
2012 Joined Works Applications
2014 Joined a new ERP project, AI WORKS
2015 Joined the distributed processing team
Work
- Platform development for distributed
processing
- Speeding up business processes
- Helping communication between Japanese
and foreign engineers
# Not a science major
# Hobbies: travel & foreign languages
Q. Why is a Japanese person presenting?
This is a "Global"
Innovation Night!
You are just
a Japanese person
speaking English
Video
http://www3.nhk.or.jp/nhkworld/en/news/videos/20160601145647836/
A. I'm in the middle
of global development
19 English speakers, 2 Japanese speakers
Global Innovation Nights - Spark
Develops and sells
ERP package software:
COMPANY & AI WORKS
Development offices in
Tokyo, Shanghai,
Singapore, and India
Founded in 1996
4,917 employees
(as of April 1, 2016)
High usability &
high speed (100 ms)
Built on Cassandra,
Spark, YARN, Kafka,
Elasticsearch, etc.
Cloud-native
application
A.I. built in
Screenshots
Our R&D history of distributed processing
2005
Released a multi-node, parallel COBOL process for
the salary calculation batch
2010
Converted COBOL to Java
Built a multi-threaded processing framework for batches
2012
Evaluated Hadoop for salary calculation
(RDB)
2013
Started R&D for AI WORKS; chose Cassandra
2014
Evaluated Hadoop for the financial summary
2015
Chose Spark
Developed the salary calculation batch and the financial
summary batch with Spark
Developed a platform for distributed processing
2016
Developed more batches with Spark
Background (2005): users requested
a faster batch; its structure was
similar to the current Spark
batch
Background (2010): the number of CPU cores
increased, so processing
on one machine became more efficient;
users could not
manage a cluster
Background (2012): not a good match
with the RDB
Background (2013-2014): on the cloud, it is easy
to get resources
on demand;
performance
is more stable
with multiple nodes
than with one node;
OSS is becoming
popular
in enterprises
Background (2015): easy-to-learn
interface;
better
performance
than Hadoop;
a growing trend
Speeding up business processes with Spark
Case1: Accounting summary
Before
1,500,000 records
3 types of summary
→ 5 hrs
Behind
the summary
Records are summarized to create financial reports
(B/S, P/L, trial balance)
In SQL: "select sum(amount) from journals
group by item, section, product, xxx"
For each summary type, every combination must be calculated
Example:
Number of financial items = 1,000
Summary by section (7,000 sections)
= 1,000 × 7,000 combinations
Summary by section & product (3,000 products)
= 1,000 × 7,000 × 3,000 combinations
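Multiplying out the example, and reading 7,000 and 3,000 as the numbers of sections and products, gives the size of the potential key space for each summary type:

```latex
\begin{aligned}
N_{\text{section}} &= 1{,}000 \times 7{,}000 = 7 \times 10^{6} \\
N_{\text{section, product}} &= 1{,}000 \times 7{,}000 \times 3{,}000 = 2.1 \times 10^{10}
\end{aligned}
```

Adding one more grouping dimension multiplies the number of potential combinations by that dimension's size, which suggests why the summary is heavy to compute.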
After
1,500,000 records
3 types of summary
→ 5 hrs
→ 15 min (x 23)
(10 executors)
Impact on the user's
business
No overtime needed
to create
financial statements!
Without business merit,
speed-up has no meaning
Process image
recordRdd
.flatMapToPair() ← add keys by item and summary type
.reduceByKey() ← sum up
.mapToPair() ← re-key by summary type and term
.groupByKey() ← collect each group into the same partition
.foreach() ← update the DB
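To make the chain above concrete without a cluster, here is a minimal local sketch that mimics each step with plain Java collections instead of Spark. It assumes a hypothetical JournalRecord with item, section, product, term, and amount fields, two hypothetical summary types (by section, and by section and product), and that the term is already carried in the first key so it can be used when re-keying. It illustrates the data flow only and is not the authors' implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical journal line; field names are illustrative only.
final class JournalRecord {
    final String item;
    final String section;
    final String product;
    final String term;
    final long amount;

    JournalRecord(String item, String section, String product, String term, long amount) {
        this.item = item;
        this.section = section;
        this.product = product;
        this.term = term;
        this.amount = amount;
    }
}

final class AccountingSummarySketch {
    // flatMapToPair: emit one (key, amount) pair per summary type.
    // Key layout (an assumption): "type|term|combination".
    static List<Map.Entry<String, Long>> keyByItemAndType(JournalRecord r) {
        List<Map.Entry<String, Long>> pairs = new ArrayList<>();
        pairs.add(Map.entry("SECTION|" + r.term + "|" + r.item + "/" + r.section, r.amount));
        pairs.add(Map.entry("SECTION_PRODUCT|" + r.term + "|"
                + r.item + "/" + r.section + "/" + r.product, r.amount));
        return pairs;
    }

    static Map<String, Map<String, Long>> summarize(List<JournalRecord> records) {
        // reduceByKey: sum the amounts for each key.
        Map<String, Long> sums = new TreeMap<>();
        for (JournalRecord r : records) {
            for (Map.Entry<String, Long> p : keyByItemAndType(r)) {
                sums.merge(p.getKey(), p.getValue(), Long::sum);
            }
        }
        // mapToPair + groupByKey: re-key by "type|term" and gather each group together.
        Map<String, Map<String, Long>> grouped = new TreeMap<>();
        for (Map.Entry<String, Long> e : sums.entrySet()) {
            String[] parts = e.getKey().split("\\|", 3); // [type, term, combination]
            String typeAndTerm = parts[0] + "|" + parts[1];
            grouped.computeIfAbsent(typeAndTerm, k -> new TreeMap<>()).put(parts[2], e.getValue());
        }
        // In the Spark job, foreach would now write each (type, term) group to the DB.
        return grouped;
    }
}
```

In a real job the same steps would map onto JavaRDD.flatMapToPair and JavaPairRDD.reduceByKey and groupByKey. Summing before grouping keeps the shuffled data small, because reduceByKey combines values within each partition before the shuffle.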
Techniques
1. Split tasks into proper units.
One record in a journal
< one journal
< all journals for one summary type
< all journals for all summary types
2. Use Cassandra counters
Pros: higher concurrency
Cons: not idempotent
3. Combine with stream processing
Spark batch processing alone makes the UX worse.
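The trade-off in technique 2 can be illustrated with an in-memory stand-in for a summary table. SummaryStoreSketch is hypothetical and is not the Cassandra driver API. A counter column accepts increments from many tasks without a read-before-write, which is where the extra concurrency comes from. However, Spark may re-run a task after a failure, and replaying increments adds the same amount twice. Overwriting with a total the task computed itself is idempotent.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// In-memory stand-in for a summary table (hypothetical, not the Cassandra driver API).
final class SummaryStoreSketch {
    private final Map<String, Long> rows = new ConcurrentHashMap<>();

    // Counter-style write, analogous to CQL "UPDATE ... SET total = total + ? WHERE ...":
    // no read is needed first, but every call adds its delta, including replays.
    void increment(String key, long delta) {
        rows.merge(key, delta, Long::sum);
    }

    // Overwrite-style write of a total computed by the task itself:
    // replaying it leaves the same value, so it is idempotent.
    void put(String key, long total) {
        rows.put(key, total);
    }

    long get(String key) {
        return rows.getOrDefault(key, 0L);
    }
}
```

Calling increment("Sales/S1", 150) twice, as a retried task would, leaves 300; calling put("Sales/S1", 150) twice leaves 150.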
Case2: Salary calculation
Before
100,000 employees
1 month
→ 1 hr
Behind
the calculation
Requires a lot of information
to calculate many types of salary
Calculate
- Fixed salary
- Overtime
- Health insurance
- Residential tax
- Pension
etc.
Information
- Class
- Evaluation
- Attendance
- Overtime
- Paid leave
- Family
- Address
- Previous income
etc.
After
100,000 employees
1 month
→ 1 hr
→ 1 min (x 60)
(10 executors × 4 cores)
Impact on the user's
business
Even if there is
trouble, salaries can quickly
be re-calculated
Salaries can easily
be simulated
Process image
employeeKeyRdd
.map() ← read data
.foreachPartition() ← calculate salaries
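A common reason to use foreachPartition instead of a per-record foreach is that per-partition setup, such as opening a database session, is paid once per partition rather than once per employee. The following Spark-free sketch simulates that pattern. PartitionedSalaryRun and its fields are hypothetical, and the salary calculation itself is left as a placeholder.

```java
import java.util.ArrayList;
import java.util.List;

// Spark-free simulation of foreachPartition; class and fields are hypothetical.
final class PartitionedSalaryRun {
    int sessionsOpened = 0;                          // times the expensive per-partition setup ran
    final List<String> processed = new ArrayList<>();

    // Body that foreachPartition would run once for each partition.
    void processPartition(List<String> employeeIds) {
        sessionsOpened++;                            // e.g. open one DB session for the whole partition
        for (String id : employeeIds) {
            processed.add(id);                       // placeholder for: read data, calculate salary, write result
        }
    }

    void run(List<List<String>> partitions) {
        for (List<String> partition : partitions) {
            processPartition(partition);
        }
    }
}
```

With two partitions holding five employees in total, the setup runs twice rather than five times.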
Techniques
1. Use de-normalized tables in Cassandra
Easier to get the many pieces of data needed for one employee
2. Find the best number of employees per partition
To maximize Cassandra throughput
By trial and error
About 2,000 employees
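With roughly 2,000 employees per partition (the value the team found by trial and error), the partition count follows from a ceiling division; for 100,000 employees that gives 50 partitions. Below is a minimal sketch using a hypothetical EmployeePartitioner helper. In a Spark job, the count could be passed to JavaRDD.repartition(int).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for sizing partitions by a target number of employees.
final class EmployeePartitioner {
    // Ceiling division: enough partitions so none holds more than perPartition employees.
    static int partitionCount(int employees, int perPartition) {
        if (perPartition <= 0) {
            throw new IllegalArgumentException("perPartition must be positive");
        }
        return (employees + perPartition - 1) / perPartition;
    }

    // Split IDs into consecutive chunks of at most perPartition elements.
    static <T> List<List<T>> chunk(List<T> ids, int perPartition) {
        List<List<T>> parts = new ArrayList<>(partitionCount(ids.size(), perPartition));
        for (int i = 0; i < ids.size(); i += perPartition) {
            parts.add(ids.subList(i, Math.min(i + perPartition, ids.size())));
        }
        return parts;
    }
}
```

partitionCount(100_000, 2_000) returns 50. One more employee pushes it to 51, so the last partition may be much smaller than the target.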
Platform
Screenshots
Developers are not bothered
by setting up
environments
Developers can
quickly try their
batches
Resource
management
We are exploring the possibilities of Spark
to speed up
more business processes

Weitere ähnliche Inhalte

Was ist angesagt?

Implementing BDD at scale for agile and DevOps teams
Implementing BDD at scale for agile and DevOps teamsImplementing BDD at scale for agile and DevOps teams
Implementing BDD at scale for agile and DevOps teamsLaurent PY
 
DevOps and Integrated Deployment
DevOps and Integrated DeploymentDevOps and Integrated Deployment
DevOps and Integrated DeploymentJoshua Drew
 
Tetap Agile dengan Arsitektur Monolith - Ziya El Arief
Tetap Agile dengan Arsitektur Monolith - Ziya El AriefTetap Agile dengan Arsitektur Monolith - Ziya El Arief
Tetap Agile dengan Arsitektur Monolith - Ziya El AriefDicodingEvent
 
DSG App Transformation Case Study
DSG App Transformation Case StudyDSG App Transformation Case Study
DSG App Transformation Case StudyVMware Tanzu
 
Xcode eXtreme Programming - #pragmamark 2014, Milan
Xcode eXtreme Programming - #pragmamark 2014, MilanXcode eXtreme Programming - #pragmamark 2014, Milan
Xcode eXtreme Programming - #pragmamark 2014, MilanGiulio Roggero
 
Top programming languages for DevOps
Top programming languages for DevOpsTop programming languages for DevOps
Top programming languages for DevOpsMetricoid Technology
 
Testing and beyond at startups
Testing and beyond at startupsTesting and beyond at startups
Testing and beyond at startupsMona Soni
 
Software Development Studio Cubex
Software Development Studio CubexSoftware Development Studio Cubex
Software Development Studio CubexDima Barr
 
Shippable DevOps platform overview
Shippable DevOps platform overviewShippable DevOps platform overview
Shippable DevOps platform overviewShippable
 
DevOps , A quick introduction
DevOps , A quick introductionDevOps , A quick introduction
DevOps , A quick introductionMostafa Hashkil
 
Embracing Change As An Open Source Product
Embracing Change As An Open Source ProductEmbracing Change As An Open Source Product
Embracing Change As An Open Source ProductShatarupa Nandi
 
What is DevOps | DevOps Introduction | DevOps Training | DevOps Tutorial | Ed...
What is DevOps | DevOps Introduction | DevOps Training | DevOps Tutorial | Ed...What is DevOps | DevOps Introduction | DevOps Training | DevOps Tutorial | Ed...
What is DevOps | DevOps Introduction | DevOps Training | DevOps Tutorial | Ed...Edureka!
 
Optimizing API Documentation: Some Guidelines and Effects
Optimizing API Documentation: Some Guidelines and EffectsOptimizing API Documentation: Some Guidelines and Effects
Optimizing API Documentation: Some Guidelines and EffectsPronovix
 

Was ist angesagt? (20)

Implementing BDD at scale for agile and DevOps teams
Implementing BDD at scale for agile and DevOps teamsImplementing BDD at scale for agile and DevOps teams
Implementing BDD at scale for agile and DevOps teams
 
DevOps and Integrated Deployment
DevOps and Integrated DeploymentDevOps and Integrated Deployment
DevOps and Integrated Deployment
 
Tetap Agile dengan Arsitektur Monolith - Ziya El Arief
Tetap Agile dengan Arsitektur Monolith - Ziya El AriefTetap Agile dengan Arsitektur Monolith - Ziya El Arief
Tetap Agile dengan Arsitektur Monolith - Ziya El Arief
 
WPMResume
WPMResumeWPMResume
WPMResume
 
Todd_Emelo
Todd_EmeloTodd_Emelo
Todd_Emelo
 
From Dev to Ops
From Dev to OpsFrom Dev to Ops
From Dev to Ops
 
DSG App Transformation Case Study
DSG App Transformation Case StudyDSG App Transformation Case Study
DSG App Transformation Case Study
 
DevOps seminar ppt
DevOps seminar ppt DevOps seminar ppt
DevOps seminar ppt
 
Xcode eXtreme Programming - #pragmamark 2014, Milan
Xcode eXtreme Programming - #pragmamark 2014, MilanXcode eXtreme Programming - #pragmamark 2014, Milan
Xcode eXtreme Programming - #pragmamark 2014, Milan
 
Dev ops
Dev opsDev ops
Dev ops
 
Top programming languages for DevOps
Top programming languages for DevOpsTop programming languages for DevOps
Top programming languages for DevOps
 
Testing and beyond at startups
Testing and beyond at startupsTesting and beyond at startups
Testing and beyond at startups
 
Software Development Studio Cubex
Software Development Studio CubexSoftware Development Studio Cubex
Software Development Studio Cubex
 
Shippable DevOps platform overview
Shippable DevOps platform overviewShippable DevOps platform overview
Shippable DevOps platform overview
 
DevOps , A quick introduction
DevOps , A quick introductionDevOps , A quick introduction
DevOps , A quick introduction
 
Embracing Change As An Open Source Product
Embracing Change As An Open Source ProductEmbracing Change As An Open Source Product
Embracing Change As An Open Source Product
 
What is DevOps | DevOps Introduction | DevOps Training | DevOps Tutorial | Ed...
What is DevOps | DevOps Introduction | DevOps Training | DevOps Tutorial | Ed...What is DevOps | DevOps Introduction | DevOps Training | DevOps Tutorial | Ed...
What is DevOps | DevOps Introduction | DevOps Training | DevOps Tutorial | Ed...
 
DevOps
DevOpsDevOps
DevOps
 
Optimizing API Documentation: Some Guidelines and Effects
Optimizing API Documentation: Some Guidelines and EffectsOptimizing API Documentation: Some Guidelines and Effects
Optimizing API Documentation: Some Guidelines and Effects
 
What is DevOps?
What is DevOps?What is DevOps?
What is DevOps?
 

Andere mochten auch

Kubernetesを触ってみた
Kubernetesを触ってみたKubernetesを触ってみた
Kubernetesを触ってみたKazuto Kusama
 
Kubernetesにまつわるエトセトラ(主に苦労話)
Kubernetesにまつわるエトセトラ(主に苦労話)Kubernetesにまつわるエトセトラ(主に苦労話)
Kubernetesにまつわるエトセトラ(主に苦労話)Works Applications
 
Kubernetes - how to orchestrate containers
Kubernetes - how to orchestrate containersKubernetes - how to orchestrate containers
Kubernetes - how to orchestrate containersinovex GmbH
 
情報セキュリティと標準化I 第6回-公開用
情報セキュリティと標準化I 第6回-公開用情報セキュリティと標準化I 第6回-公開用
情報セキュリティと標準化I 第6回-公開用Ruo Ando
 
SensorBee: Stream Processing Engine in IoT
SensorBee: Stream Processing Engine in IoTSensorBee: Stream Processing Engine in IoT
SensorBee: Stream Processing Engine in IoTDaisuke Tanaka
 
SensorBeeでChainerをプラグインとして使う
SensorBeeでChainerをプラグインとして使うSensorBeeでChainerをプラグインとして使う
SensorBeeでChainerをプラグインとして使うDaisuke Tanaka
 
ストリーム処理とSensorBee
ストリーム処理とSensorBeeストリーム処理とSensorBee
ストリーム処理とSensorBeeDaisuke Tanaka
 
形式手法とalloyの紹介
形式手法とalloyの紹介形式手法とalloyの紹介
形式手法とalloyの紹介Daisuke Tanaka
 
ドメイン駆動設計 基本を理解する
ドメイン駆動設計 基本を理解するドメイン駆動設計 基本を理解する
ドメイン駆動設計 基本を理解する増田 亨
 

Andere mochten auch (12)

Demystifying kubernetes
Demystifying kubernetesDemystifying kubernetes
Demystifying kubernetes
 
Kubernetesを触ってみた
Kubernetesを触ってみたKubernetesを触ってみた
Kubernetesを触ってみた
 
Kubernetesにまつわるエトセトラ(主に苦労話)
Kubernetesにまつわるエトセトラ(主に苦労話)Kubernetesにまつわるエトセトラ(主に苦労話)
Kubernetesにまつわるエトセトラ(主に苦労話)
 
Kubernetes - how to orchestrate containers
Kubernetes - how to orchestrate containersKubernetes - how to orchestrate containers
Kubernetes - how to orchestrate containers
 
情報セキュリティと標準化I 第6回-公開用
情報セキュリティと標準化I 第6回-公開用情報セキュリティと標準化I 第6回-公開用
情報セキュリティと標準化I 第6回-公開用
 
SensorBee: Stream Processing Engine in IoT
SensorBee: Stream Processing Engine in IoTSensorBee: Stream Processing Engine in IoT
SensorBee: Stream Processing Engine in IoT
 
20141001
2014100120141001
20141001
 
SensorBeeでChainerをプラグインとして使う
SensorBeeでChainerをプラグインとして使うSensorBeeでChainerをプラグインとして使う
SensorBeeでChainerをプラグインとして使う
 
ストリーム処理とSensorBee
ストリーム処理とSensorBeeストリーム処理とSensorBee
ストリーム処理とSensorBee
 
形式手法とalloyの紹介
形式手法とalloyの紹介形式手法とalloyの紹介
形式手法とalloyの紹介
 
SensorBeeのご紹介
SensorBeeのご紹介SensorBeeのご紹介
SensorBeeのご紹介
 
ドメイン駆動設計 基本を理解する
ドメイン駆動設計 基本を理解するドメイン駆動設計 基本を理解する
ドメイン駆動設計 基本を理解する
 

Ähnlich wie Global Innovation Nights - Spark

Running Cognos on Hadoop
Running Cognos on HadoopRunning Cognos on Hadoop
Running Cognos on HadoopSenturus
 
QuantiQ TEQ Day : Dynamics NAV
QuantiQ TEQ Day : Dynamics NAVQuantiQ TEQ Day : Dynamics NAV
QuantiQ TEQ Day : Dynamics NAVQuantiQ Technology
 
BALWANT SINGH_RESUME
BALWANT SINGH_RESUMEBALWANT SINGH_RESUME
BALWANT SINGH_RESUMEBalwant Singh
 
Adinarayana_Resume
Adinarayana_ResumeAdinarayana_Resume
Adinarayana_ResumeAdi Narayana
 
Going open source with small teams
Going open source with small teamsGoing open source with small teams
Going open source with small teamsJamie Thomas
 
DOES16 London - Gebrian uit de Bulten & Vincent van Kooten - The Road to Enab...
DOES16 London - Gebrian uit de Bulten & Vincent van Kooten - The Road to Enab...DOES16 London - Gebrian uit de Bulten & Vincent van Kooten - The Road to Enab...
DOES16 London - Gebrian uit de Bulten & Vincent van Kooten - The Road to Enab...Gene Kim
 
Ankit agrawal cognos report_developer
Ankit agrawal cognos report_developerAnkit agrawal cognos report_developer
Ankit agrawal cognos report_developerAnkit Agrawal
 
Ankit agrawal cognos report_developer
Ankit agrawal cognos report_developerAnkit agrawal cognos report_developer
Ankit agrawal cognos report_developerAnkit Agrawal
 
Anjan SAP ABAP Resume with PAN Card
Anjan SAP ABAP Resume with PAN CardAnjan SAP ABAP Resume with PAN Card
Anjan SAP ABAP Resume with PAN CardAnjan Bera
 
Serverless projects at Myplanet
Serverless projects at MyplanetServerless projects at Myplanet
Serverless projects at MyplanetDaniel Zivkovic
 
SOCIALIZE YOUR SAP ERP THROUGH INTEGRATE D DIGITAL EXPERIENCE PLATFORMS
SOCIALIZE YOUR SAP ERP THROUGH INTEGRATE D DIGITAL EXPERIENCE PLATFORMSSOCIALIZE YOUR SAP ERP THROUGH INTEGRATE D DIGITAL EXPERIENCE PLATFORMS
SOCIALIZE YOUR SAP ERP THROUGH INTEGRATE D DIGITAL EXPERIENCE PLATFORMSAndrea Fontana
 
Chirag Gupta SAP 66 -consistent with all
Chirag Gupta SAP 66 -consistent with allChirag Gupta SAP 66 -consistent with all
Chirag Gupta SAP 66 -consistent with allChirag Gupta
 
Application Migration: How to Start, Scale and Succeed
Application Migration: How to Start, Scale and SucceedApplication Migration: How to Start, Scale and Succeed
Application Migration: How to Start, Scale and SucceedVMware Tanzu
 
Iman Mukhopadhyay_Resume
Iman Mukhopadhyay_ResumeIman Mukhopadhyay_Resume
Iman Mukhopadhyay_ResumeIman Mukherjee
 
Building the BI system and analytics capabilities at the company based on Rea...
Building the BI system and analytics capabilities at the company based on Rea...Building the BI system and analytics capabilities at the company based on Rea...
Building the BI system and analytics capabilities at the company based on Rea...GameCamp
 

Ähnlich wie Global Innovation Nights - Spark (20)

Running Cognos on Hadoop
Running Cognos on HadoopRunning Cognos on Hadoop
Running Cognos on Hadoop
 
Divya
DivyaDivya
Divya
 
QuantiQ TEQ Day : Dynamics NAV
QuantiQ TEQ Day : Dynamics NAVQuantiQ TEQ Day : Dynamics NAV
QuantiQ TEQ Day : Dynamics NAV
 
BALWANT SINGH_RESUME
BALWANT SINGH_RESUMEBALWANT SINGH_RESUME
BALWANT SINGH_RESUME
 
Resume - Nagaraj G B
Resume - Nagaraj G BResume - Nagaraj G B
Resume - Nagaraj G B
 
Adinarayana_Resume
Adinarayana_ResumeAdinarayana_Resume
Adinarayana_Resume
 
Sudipta Ghosh
Sudipta GhoshSudipta Ghosh
Sudipta Ghosh
 
Going open source with small teams
Going open source with small teamsGoing open source with small teams
Going open source with small teams
 
DOES16 London - Gebrian uit de Bulten & Vincent van Kooten - The Road to Enab...
DOES16 London - Gebrian uit de Bulten & Vincent van Kooten - The Road to Enab...DOES16 London - Gebrian uit de Bulten & Vincent van Kooten - The Road to Enab...
DOES16 London - Gebrian uit de Bulten & Vincent van Kooten - The Road to Enab...
 
Resume
ResumeResume
Resume
 
Ankit agrawal cognos report_developer
Ankit agrawal cognos report_developerAnkit agrawal cognos report_developer
Ankit agrawal cognos report_developer
 
Ankit agrawal cognos report_developer
Ankit agrawal cognos report_developerAnkit agrawal cognos report_developer
Ankit agrawal cognos report_developer
 
Anjan SAP ABAP Resume with PAN Card
Anjan SAP ABAP Resume with PAN CardAnjan SAP ABAP Resume with PAN Card
Anjan SAP ABAP Resume with PAN Card
 
Serverless projects at Myplanet
Serverless projects at MyplanetServerless projects at Myplanet
Serverless projects at Myplanet
 
Resume_Swarnali
Resume_SwarnaliResume_Swarnali
Resume_Swarnali
 
SOCIALIZE YOUR SAP ERP THROUGH INTEGRATE D DIGITAL EXPERIENCE PLATFORMS
SOCIALIZE YOUR SAP ERP THROUGH INTEGRATE D DIGITAL EXPERIENCE PLATFORMSSOCIALIZE YOUR SAP ERP THROUGH INTEGRATE D DIGITAL EXPERIENCE PLATFORMS
SOCIALIZE YOUR SAP ERP THROUGH INTEGRATE D DIGITAL EXPERIENCE PLATFORMS
 
Chirag Gupta SAP 66 -consistent with all
Chirag Gupta SAP 66 -consistent with allChirag Gupta SAP 66 -consistent with all
Chirag Gupta SAP 66 -consistent with all
 
Application Migration: How to Start, Scale and Succeed
Application Migration: How to Start, Scale and SucceedApplication Migration: How to Start, Scale and Succeed
Application Migration: How to Start, Scale and Succeed
 
Iman Mukhopadhyay_Resume
Iman Mukhopadhyay_ResumeIman Mukhopadhyay_Resume
Iman Mukhopadhyay_Resume
 
Building the BI system and analytics capabilities at the company based on Rea...
Building the BI system and analytics capabilities at the company based on Rea...Building the BI system and analytics capabilities at the company based on Rea...
Building the BI system and analytics capabilities at the company based on Rea...
 

Mehr von Works Applications

Gitで安定マスターブランチを手に入れる
Gitで安定マスターブランチを手に入れるGitで安定マスターブランチを手に入れる
Gitで安定マスターブランチを手に入れるWorks Applications
 
Javaでつくる本格形態素解析器
Javaでつくる本格形態素解析器Javaでつくる本格形態素解析器
Javaでつくる本格形態素解析器Works Applications
 
新入社員が多い中で効果的なレビューを行うための方法 レビューの準備からフィードバックまでの工夫
新入社員が多い中で効果的なレビューを行うための方法 レビューの準備からフィードバックまでの工夫新入社員が多い中で効果的なレビューを行うための方法 レビューの準備からフィードバックまでの工夫
新入社員が多い中で効果的なレビューを行うための方法 レビューの準備からフィードバックまでの工夫Works Applications
 
RDB脳でCassandra / MSAを始めた僕達が、分散Drivenなトランザクション管理にたどり着くまで / A journey to a...
RDB脳でCassandra / MSAを始めた僕達が、分散Drivenなトランザクション管理にたどり着くまで / A journey to a...RDB脳でCassandra / MSAを始めた僕達が、分散Drivenなトランザクション管理にたどり着くまで / A journey to a...
RDB脳でCassandra / MSAを始めた僕達が、分散Drivenなトランザクション管理にたどり着くまで / A journey to a...Works Applications
 
Cassandraに不向きなcassandraデータモデリング基礎 / Data Modeling concepts for NoSQL weak point
Cassandraに不向きなcassandraデータモデリング基礎 / Data Modeling concepts for NoSQL weak pointCassandraに不向きなcassandraデータモデリング基礎 / Data Modeling concepts for NoSQL weak point
Cassandraに不向きなcassandraデータモデリング基礎 / Data Modeling concepts for NoSQL weak pointWorks Applications
 
SpotBugs(FindBugs)による 大規模ERPのコード品質改善
SpotBugs(FindBugs)による 大規模ERPのコード品質改善SpotBugs(FindBugs)による 大規模ERPのコード品質改善
SpotBugs(FindBugs)による 大規模ERPのコード品質改善Works Applications
 

Mehr von Works Applications (8)

Gitで安定マスターブランチを手に入れる
Gitで安定マスターブランチを手に入れるGitで安定マスターブランチを手に入れる
Gitで安定マスターブランチを手に入れる
 
Javaでつくる本格形態素解析器
Javaでつくる本格形態素解析器Javaでつくる本格形態素解析器
Javaでつくる本格形態素解析器
 
新入社員が多い中で効果的なレビューを行うための方法 レビューの準備からフィードバックまでの工夫
新入社員が多い中で効果的なレビューを行うための方法 レビューの準備からフィードバックまでの工夫新入社員が多い中で効果的なレビューを行うための方法 レビューの準備からフィードバックまでの工夫
新入社員が多い中で効果的なレビューを行うための方法 レビューの準備からフィードバックまでの工夫
 
RDB脳でCassandra / MSAを始めた僕達が、分散Drivenなトランザクション管理にたどり着くまで / A journey to a...
RDB脳でCassandra / MSAを始めた僕達が、分散Drivenなトランザクション管理にたどり着くまで / A journey to a...RDB脳でCassandra / MSAを始めた僕達が、分散Drivenなトランザクション管理にたどり着くまで / A journey to a...
RDB脳でCassandra / MSAを始めた僕達が、分散Drivenなトランザクション管理にたどり着くまで / A journey to a...
 
Cassandraに不向きなcassandraデータモデリング基礎 / Data Modeling concepts for NoSQL weak point
Cassandraに不向きなcassandraデータモデリング基礎 / Data Modeling concepts for NoSQL weak pointCassandraに不向きなcassandraデータモデリング基礎 / Data Modeling concepts for NoSQL weak point
Cassandraに不向きなcassandraデータモデリング基礎 / Data Modeling concepts for NoSQL weak point
 
形態素解析
形態素解析形態素解析
形態素解析
 
Erpと自然言語処理
Erpと自然言語処理Erpと自然言語処理
Erpと自然言語処理
 
SpotBugs(FindBugs)による 大規模ERPのコード品質改善
SpotBugs(FindBugs)による 大規模ERPのコード品質改善SpotBugs(FindBugs)による 大規模ERPのコード品質改善
SpotBugs(FindBugs)による 大規模ERPのコード品質改善
 

Kürzlich hochgeladen

Engineering Mechanics Chapter 5 Equilibrium of a Rigid Body
Engineering Mechanics  Chapter 5  Equilibrium of a Rigid BodyEngineering Mechanics  Chapter 5  Equilibrium of a Rigid Body
Engineering Mechanics Chapter 5 Equilibrium of a Rigid BodyAhmadHajasad2
 
IT3401-WEB ESSENTIALS PRESENTATIONS.pptx
IT3401-WEB ESSENTIALS PRESENTATIONS.pptxIT3401-WEB ESSENTIALS PRESENTATIONS.pptx
IT3401-WEB ESSENTIALS PRESENTATIONS.pptxSAJITHABANUS
 
solar wireless electric vechicle charging system
solar wireless electric vechicle charging systemsolar wireless electric vechicle charging system
solar wireless electric vechicle charging systemgokuldongala
 
ChatGPT-and-Generative-AI-Landscape Working of generative ai search
ChatGPT-and-Generative-AI-Landscape Working of generative ai searchChatGPT-and-Generative-AI-Landscape Working of generative ai search
ChatGPT-and-Generative-AI-Landscape Working of generative ai searchrohitcse52
 
Technology Features of Apollo HDD Machine, Its Technical Specification with C...
Technology Features of Apollo HDD Machine, Its Technical Specification with C...Technology Features of Apollo HDD Machine, Its Technical Specification with C...
Technology Features of Apollo HDD Machine, Its Technical Specification with C...Apollo Techno Industries Pvt Ltd
 
cloud computing notes for anna university syllabus
cloud computing notes for anna university syllabuscloud computing notes for anna university syllabus
cloud computing notes for anna university syllabusViolet Violet
 
Landsman converter for power factor improvement
Landsman converter for power factor improvementLandsman converter for power factor improvement
Landsman converter for power factor improvementVijayMuni2
 
SUMMER TRAINING REPORT ON BUILDING CONSTRUCTION.docx
SUMMER TRAINING REPORT ON BUILDING CONSTRUCTION.docxSUMMER TRAINING REPORT ON BUILDING CONSTRUCTION.docx
SUMMER TRAINING REPORT ON BUILDING CONSTRUCTION.docxNaveenVerma126
 
Power System electrical and electronics .pptx
Power System electrical and electronics .pptxPower System electrical and electronics .pptx
Power System electrical and electronics .pptxMUKULKUMAR210
 
Nodal seismic construction requirements.pptx
Nodal seismic construction requirements.pptxNodal seismic construction requirements.pptx
Nodal seismic construction requirements.pptxwendy cai
 
The relationship between iot and communication technology
The relationship between iot and communication technologyThe relationship between iot and communication technology
The relationship between iot and communication technologyabdulkadirmukarram03
 
Multicomponent Spiral Wound Membrane Separation Model.pdf
Multicomponent Spiral Wound Membrane Separation Model.pdfMulticomponent Spiral Wound Membrane Separation Model.pdf
Multicomponent Spiral Wound Membrane Separation Model.pdfGiovanaGhasary1
 
Transforming Process Safety Management: Challenges, Benefits, and Transition ...
Transforming Process Safety Management: Challenges, Benefits, and Transition ...Transforming Process Safety Management: Challenges, Benefits, and Transition ...
Transforming Process Safety Management: Challenges, Benefits, and Transition ...soginsider
 
Dev.bg DevOps March 2024 Monitoring & Logging
Dev.bg DevOps March 2024 Monitoring & LoggingDev.bg DevOps March 2024 Monitoring & Logging
Dev.bg DevOps March 2024 Monitoring & LoggingMarian Marinov
 
A Seminar on Electric Vehicle Software Simulation
A Seminar on Electric Vehicle Software SimulationA Seminar on Electric Vehicle Software Simulation
A Seminar on Electric Vehicle Software SimulationMohsinKhanA
 
Clutches and brkesSelect any 3 position random motion out of real world and d...
Clutches and brkesSelect any 3 position random motion out of real world and d...Clutches and brkesSelect any 3 position random motion out of real world and d...
Clutches and brkesSelect any 3 position random motion out of real world and d...sahb78428
 
Mohs Scale of Hardness, Hardness Scale.pptx
Mohs Scale of Hardness, Hardness Scale.pptxMohs Scale of Hardness, Hardness Scale.pptx
Mohs Scale of Hardness, Hardness Scale.pptxKISHAN KUMAR
 

Kürzlich hochgeladen (20)

Engineering Mechanics Chapter 5 Equilibrium of a Rigid Body
Engineering Mechanics  Chapter 5  Equilibrium of a Rigid BodyEngineering Mechanics  Chapter 5  Equilibrium of a Rigid Body
Engineering Mechanics Chapter 5 Equilibrium of a Rigid Body
 
IT3401-WEB ESSENTIALS PRESENTATIONS.pptx
IT3401-WEB ESSENTIALS PRESENTATIONS.pptxIT3401-WEB ESSENTIALS PRESENTATIONS.pptx
IT3401-WEB ESSENTIALS PRESENTATIONS.pptx
 
Lecture 2 .pdf
Lecture 2                           .pdfLecture 2                           .pdf
Lecture 2 .pdf
 
solar wireless electric vechicle charging system
solar wireless electric vechicle charging systemsolar wireless electric vechicle charging system
solar wireless electric vechicle charging system
 
ChatGPT-and-Generative-AI-Landscape Working of generative ai search
ChatGPT-and-Generative-AI-Landscape Working of generative ai searchChatGPT-and-Generative-AI-Landscape Working of generative ai search
ChatGPT-and-Generative-AI-Landscape Working of generative ai search
 
Technology Features of Apollo HDD Machine, Its Technical Specification with C...
Technology Features of Apollo HDD Machine, Its Technical Specification with C...Technology Features of Apollo HDD Machine, Its Technical Specification with C...
Technology Features of Apollo HDD Machine, Its Technical Specification with C...
 
Lecture 4 .pdf
Lecture 4                              .pdfLecture 4                              .pdf
Lecture 4 .pdf
 
cloud computing notes for anna university syllabus
cloud computing notes for anna university syllabuscloud computing notes for anna university syllabus
cloud computing notes for anna university syllabus
 
Landsman converter for power factor improvement
Landsman converter for power factor improvementLandsman converter for power factor improvement
Landsman converter for power factor improvement
 
Lecture 2 .pptx
Lecture 2                            .pptxLecture 2                            .pptx
Lecture 2 .pptx
 
SUMMER TRAINING REPORT ON BUILDING CONSTRUCTION.docx
SUMMER TRAINING REPORT ON BUILDING CONSTRUCTION.docxSUMMER TRAINING REPORT ON BUILDING CONSTRUCTION.docx
SUMMER TRAINING REPORT ON BUILDING CONSTRUCTION.docx
 
Power System electrical and electronics .pptx
Power System electrical and electronics .pptxPower System electrical and electronics .pptx
Power System electrical and electronics .pptx
 
Nodal seismic construction requirements.pptx
Nodal seismic construction requirements.pptxNodal seismic construction requirements.pptx
Nodal seismic construction requirements.pptx
 
The relationship between iot and communication technology
The relationship between iot and communication technologyThe relationship between iot and communication technology
The relationship between iot and communication technology
 
Multicomponent Spiral Wound Membrane Separation Model.pdf
Multicomponent Spiral Wound Membrane Separation Model.pdfMulticomponent Spiral Wound Membrane Separation Model.pdf
Multicomponent Spiral Wound Membrane Separation Model.pdf
 
Transforming Process Safety Management: Challenges, Benefits, and Transition ...
Transforming Process Safety Management: Challenges, Benefits, and Transition ...Transforming Process Safety Management: Challenges, Benefits, and Transition ...
Transforming Process Safety Management: Challenges, Benefits, and Transition ...
 
Dev.bg DevOps March 2024 Monitoring & Logging
Dev.bg DevOps March 2024 Monitoring & LoggingDev.bg DevOps March 2024 Monitoring & Logging
Dev.bg DevOps March 2024 Monitoring & Logging
 
A Seminar on Electric Vehicle Software Simulation
A Seminar on Electric Vehicle Software SimulationA Seminar on Electric Vehicle Software Simulation
A Seminar on Electric Vehicle Software Simulation
 
Clutches and brkesSelect any 3 position random motion out of real world and d...
Clutches and brkesSelect any 3 position random motion out of real world and d...Clutches and brkesSelect any 3 position random motion out of real world and d...
Clutches and brkesSelect any 3 position random motion out of real world and d...
 
Mohs Scale of Hardness, Hardness Scale.pptx
Mohs Scale of Hardness, Hardness Scale.pptxMohs Scale of Hardness, Hardness Scale.pptx
Mohs Scale of Hardness, Hardness Scale.pptx
 

Global Innovation Nights - Spark

  • 1. Spark Global Innovation Nights #1 Kawakami Tomoki @
  • 2. Spark Global Innovation Nights #1 Kawakami Tomoki @ Speed-up business process in enterprise application
  • 3. Agenda - Profile - About Worksap - Our R&D history of distributed processing - Case1: Accounting summary - Case2: Salary calulation
  • 4. Profile Kawakami Tomoki, Engineer Career 2012 Enter Works Applications 2014 Join new ERP project, AI WORKS 2015 Join distributed processing team Work - Platform development for distribute processing - Speed-up business processes - Help communication between Japanese and foreign engineers # Not scientific major # Hobby: Travel & Foreign language
  • 5. Q. Why Japanese people person!
  • 6. Q. Why Japanese people person! This is “Global” innovation night! You are just a Japanese speaking English
  • 8. A. I'm in the middle of global development
  • 9. A. I'm in the middle of global development 19 English speakers 2 Japanese speakers
  • 11. Develops and sells ERP package software: COMPANY and AI WORKS. Development offices in Tokyo, Shanghai, Singapore, and India. Founded in 1996; 4,917 employees (as of 1 April 2016)
  • 13. High usability and high speed = 100 ms. Using Cassandra, Spark, YARN, Kafka, Elasticsearch, etc. Cloud-native application. A.I. built in
  • 17. Our R&D history of distributed processing
  • 18. Our R&D history of distributed processing: 2005 Released a multi-node, parallel COBOL process for the salary calculation batch; 2010 COBOL-to-Java conversion, multi-thread processing framework for batches; 2012 Hadoop verification for salary calculation (RDB); 2013 Started R&D for AI WORKS, chose Cassandra; 2014 Hadoop verification for the financial summary; 2015 Chose Spark, developed the salary calculation and financial summary batches with Spark, developed a platform for distributed processing; 2016 More batches developed with Spark
  • 19. Our R&D history of distributed processing (timeline as in slide 18), annotated: Request from users to speed up the batch; similar structure to the current Spark batch
  • 20. Our R&D history of distributed processing (timeline as in slide 18), annotated: Number of cores increased; more efficient to process on one machine; users cannot manage a cluster
  • 21. Our R&D history of distributed processing (timeline as in slide 18), annotated: Not a good match with RDB
  • 22. Our R&D history of distributed processing (timeline as in slide 18), annotated: On the cloud, it's easy to get resources on demand; performance is more stable with multiple nodes than with one node; OSS is getting popular in enterprises
  • 23. Our R&D history of distributed processing (timeline as in slide 18), annotated: Easy-to-learn interface; better performance than Hadoop; trend
  • 26. Before: 1,500,000 records, 3 types of summary → 5 hrs
  • 27. Behind the summary: Records are summarized to create financial reports (B/S, P/L, trial balance). In SQL: “select sum(amount) from journals group by item, section, product, xxx”. For each summary type, all combinations must be calculated. Example: number of financial items = 1,000; summary by section (7,000 sections) = 1,000 * 7,000 combinations; summary by section and product (3,000 products) = 1,000 * 7,000 * 3,000 combinations
  • 28. After: 1,500,000 records, 3 types of summary: 5 hrs → 15 min (x 23), 10 executors
  • 29. Impact on users' business: No overtime to create financial statements! No merit, no meaning
  • 30. Process image: RecordRdd .flatMapToPair() ← add key by item and type .reduceByKey() ← sum up .mapToPair() ← add key by type and term .groupByKey() ← collect into the same partition .foreach() ← update DB
  • 31. Techniques: 1. Split tasks into the proper unit: one record in one journal < one journal < all journals for one summary type < all journals for all summary types. 2. Use Cassandra counters. Pros: higher concurrency. Cons: not idempotent. 3. Combine with stream processing: a Spark batch alone makes the UX worse.
  • 34. Behind the calculation: Calculating many types of salary requires a lot of information. Calculate: fixed salary, overtime, health insurance, residential tax, pension, etc. Information: class, evaluation, attendance, overtime, paid leave, family, address, previous income, etc.
  • 35. After: 100,000 employees; 1 month → 1 hr → 1 min (x 60); 10 executors * 4 cores
  • 36. Impact on users' business: Even if there is trouble, salaries can be recalculated quickly; salaries can easily be simulated
  • 37. Process image: EmployeeKeyRdd .map() ← read data .foreachPartition() ← calculate salary
  • 38. Techniques: 1. Use a denormalized table in Cassandra: it is easier to get all the data for an employee. 2. Find the best number of employees per partition to maximize Cassandra throughput: by trial and error, about 2,000 employees.
  • 41. Screenshots: Developers are not bothered by setting up the environment; developers can quickly try their batches; resource management
  • 42. We are exploring the possibilities of Spark to speed up more business processes
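The example combination counts on slide 27 multiply out as follows. They are upper bounds on the number of grouped rows, since a "group by" only yields groups for combinations that actually occur in the journals.

```latex
\begin{aligned}
1{,}000 \times 7{,}000 &= 7 \times 10^{6} && \text{(summary by section)}\\
1{,}000 \times 7{,}000 \times 3{,}000 &= 2.1 \times 10^{10} && \text{(summary by section and product)}
\end{aligned}
```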
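The five-stage process image on slide 30 can be replayed on an in-memory list to make the data flow concrete. This is a minimal sketch that uses plain Java collections instead of Spark RDDs, so it runs without a cluster; each comment names the Spark stage it imitates. The record fields, the two summary types `byItem` and `byItemSection`, and the `type|term|dims` key format are hypothetical, since the slides do not show the real schema or DB update.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Plain-Java imitation of the summary stages; names and key format are hypothetical. */
class SummaryPipelineSketch {

    /** Hypothetical journal line. */
    static final class JournalRecord {
        final String item;
        final String section;
        final String term;
        final long amount;

        JournalRecord(String item, String section, String term, long amount) {
            this.item = item;
            this.section = section;
            this.term = term;
            this.amount = amount;
        }
    }

    /** Like flatMapToPair: one (key, amount) pair per summary type for a single record. */
    static List<Map.Entry<String, Long>> keyByItemAndType(JournalRecord r) {
        List<Map.Entry<String, Long>> pairs = new ArrayList<>();
        pairs.add(Map.entry("byItem|" + r.term + "|" + r.item, r.amount));
        pairs.add(Map.entry("byItemSection|" + r.term + "|" + r.item + "/" + r.section, r.amount));
        return pairs;
    }

    /** Runs all stages; the returned map stands in for the DB: "type|term" -> (dims -> total). */
    static Map<String, Map<String, Long>> run(List<JournalRecord> records) {
        // Like reduceByKey: sum the amounts that share the same full key.
        Map<String, Long> sums = new HashMap<>();
        for (JournalRecord r : records) {
            for (Map.Entry<String, Long> pair : keyByItemAndType(r)) {
                sums.merge(pair.getKey(), pair.getValue(), Long::sum);
            }
        }

        // Like mapToPair + groupByKey: re-key each sum by "type|term" and collect the group.
        Map<String, List<Map.Entry<String, Long>>> grouped = new HashMap<>();
        for (Map.Entry<String, Long> e : sums.entrySet()) {
            String key = e.getKey();
            int cut = key.indexOf('|', key.indexOf('|') + 1); // position of the second '|'
            grouped.computeIfAbsent(key.substring(0, cut), k -> new ArrayList<>())
                   .add(Map.entry(key.substring(cut + 1), e.getValue()));
        }

        // Like foreach: one "DB update" per group; an in-memory map replaces the real DB.
        Map<String, Map<String, Long>> db = new TreeMap<>();
        for (Map.Entry<String, List<Map.Entry<String, Long>>> group : grouped.entrySet()) {
            Map<String, Long> table = new TreeMap<>();
            for (Map.Entry<String, Long> row : group.getValue()) {
                table.put(row.getKey(), row.getValue());
            }
            db.put(group.getKey(), table);
        }
        return db;
    }
}
```

For example, three journal lines for item `Sales` in term `2016-04` (100 and 25 in section `Tokyo`, 50 in `Osaka`) yield `Sales` = 175 under `byItem|2016-04`, and `Sales/Tokyo` = 125 and `Sales/Osaka` = 50 under `byItemSection|2016-04`. Reducing before grouping means the grouping stage only moves the already-summed rows, not every journal line.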
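Technique 2 on slide 31 trades idempotency for concurrency. Spark re-runs failed tasks, so the same write can be applied more than once. The sketch below uses hypothetical in-memory stand-ins, not the Cassandra driver: `CounterTable` accumulates deltas, similar in spirit to a CQL counter update such as `UPDATE summary SET total = total + 175 WHERE id = 'Sales|2016-04'`, while `ValueTable` overwrites like a regular column. Table, column and key names are illustrative.

```java
import java.util.HashMap;
import java.util.Map;

class CounterIdempotencySketch {

    /** Stand-in for a counter column: every write is a delta added to the stored total. */
    static final class CounterTable {
        private final Map<String, Long> totals = new HashMap<>();
        void add(String key, long delta) { totals.merge(key, delta, Long::sum); }
        long get(String key) { return totals.getOrDefault(key, 0L); }
    }

    /** Stand-in for a regular column: every write replaces the stored value. */
    static final class ValueTable {
        private final Map<String, Long> values = new HashMap<>();
        void put(String key, long value) { values.put(key, value); }
        long get(String key) { return values.getOrDefault(key, 0L); }
    }

    /** Applies the same counter write `attempts` times, as a retried task would. */
    static long counterAfterRetries(long delta, int attempts) {
        CounterTable table = new CounterTable();
        for (int i = 0; i < attempts; i++) {
            table.add("Sales|2016-04", delta);
        }
        return table.get("Sales|2016-04");
    }

    /** Applies the same overwrite `attempts` times, as a retried task would. */
    static long valueAfterRetries(long value, int attempts) {
        ValueTable table = new ValueTable();
        for (int i = 0; i < attempts; i++) {
            table.put("Sales|2016-04", value);
        }
        return table.get("Sales|2016-04");
    }
}
```

Counters let many tasks add to the same row without a client-side read-modify-write, which is where the higher concurrency comes from; the price is that a replayed delta is counted again (two attempts of +175 give 350), whereas replaying an overwrite leaves 175.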
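The salary process image on slide 37 ends with foreachPartition; in Spark's Java API, `JavaRDD.foreachPartition` takes a function over an `Iterator` of the partition's elements. A common reason to use it is to pay setup costs, such as opening a database session, once per partition instead of once per employee. This sketch imitates that pattern without Spark; `Employee`, `FakeSession` and the salary formula (fixed salary plus overtime pay) are hypothetical simplifications, far smaller than the inputs listed on slide 34.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

class SalaryPartitionSketch {

    /** Hypothetical employee row; the real batch reads far more data per employee. */
    static final class Employee {
        final String id;
        final long fixedSalary;
        final long overtimeHours;
        final long overtimeRate;

        Employee(String id, long fixedSalary, long overtimeHours, long overtimeRate) {
            this.id = id;
            this.fixedSalary = fixedSalary;
            this.overtimeHours = overtimeHours;
            this.overtimeRate = overtimeRate;
        }
    }

    /** Records what the fake sessions did, so the per-partition setup is observable. */
    static final class SessionLog {
        int sessionsOpened = 0;
        final Map<String, Long> salaries = new HashMap<>();
    }

    /** Hypothetical stand-in for an expensive resource such as a database session. */
    static final class FakeSession implements AutoCloseable {
        private final SessionLog log;

        FakeSession(SessionLog log) {
            this.log = log;
            log.sessionsOpened++;
        }

        void writeSalary(String employeeId, long amount) {
            log.salaries.put(employeeId, amount);
        }

        @Override
        public void close() {
            // A real session would release its connections here.
        }
    }

    /** Hypothetical formula: fixed salary plus overtime pay; taxes and insurance omitted. */
    static long calculateSalary(Employee e) {
        return e.fixedSalary + e.overtimeHours * e.overtimeRate;
    }

    /** What a foreachPartition body does: open one session, then loop over the partition. */
    static void processPartition(Iterator<Employee> partition, SessionLog log) {
        try (FakeSession session = new FakeSession(log)) {
            while (partition.hasNext()) {
                Employee e = partition.next();
                session.writeSalary(e.id, calculateSalary(e));
            }
        }
    }

    /** Simulates Spark invoking the body once per partition. */
    static SessionLog processAll(List<List<Employee>> partitions) {
        SessionLog log = new SessionLog();
        for (List<Employee> partition : partitions) {
            processPartition(partition.iterator(), log);
        }
        return log;
    }
}
```

For example, two partitions holding three employees open two sessions, not three. In Spark, the body of `processPartition` would be the lambda passed to `foreachPartition`.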
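Technique 2 on slide 38 settles on about 2,000 employees per partition. For the 100,000 employees on slide 35, that means 100,000 / 2,000 = 50 partitions. The sketch below shows that arithmetic and a simple contiguous split of employee keys; the helper names are hypothetical, and because the 2,000 figure came from trial and error on one workload, it would need re-measuring elsewhere.

```java
import java.util.ArrayList;
import java.util.List;

class EmployeeChunkingSketch {

    /** Partitions needed so that no partition holds more than `perPartition` employees. */
    static int partitionsFor(int employees, int perPartition) {
        if (perPartition <= 0) throw new IllegalArgumentException("perPartition must be positive");
        return (employees + perPartition - 1) / perPartition; // ceiling division
    }

    /** Splits employee keys into consecutive chunks of at most `perPartition` keys. */
    static <T> List<List<T>> chunk(List<T> keys, int perPartition) {
        if (perPartition <= 0) throw new IllegalArgumentException("perPartition must be positive");
        List<List<T>> chunks = new ArrayList<>();
        for (int start = 0; start < keys.size(); start += perPartition) {
            int end = Math.min(start + perPartition, keys.size());
            chunks.add(new ArrayList<>(keys.subList(start, end)));
        }
        return chunks;
    }
}
```

In Spark itself, one option is to pass the computed count to `JavaRDD.repartition(int)`; that method redistributes records with a shuffle rather than splitting an ordered list, so partition sizes would be approximately, not exactly, 2,000.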