COBOL to Apache Spark

Oct 28, 2017
Ville Misaki
System Strategy Department,
Rakuten Card Co., Ltd.

2
 Ville Misaki
 Senior Software Engineer
 Technology Strategy Group,
System Strategy Department,
Rakuten Card Co., Ltd
 Career
 15+ years; 3 years at Rakuten
 In Finland, the Netherlands, Japan
 Java (EE), Perl, C++, web systems, relational
databases, performance optimization & security

3
 Oracle OpenWorld 2017
 Case Study: Credit Card Core System
with Exalogic, Exadata, Oracle Cloud
Machine (CON4994) => Link
 JavaOne 2017
 Java EE 7 with Apache Spark for the
World’s Largest Credit Card Core
Systems (CON4998) => Link

4
Part 1 – Perfect Design
1. About Rakuten Card
2. Background
3. Platform Migration
4. Data Migration
5. Software Migration
Part 2 – Harsh Reliability
6. Performance
7. Apache Spark
8. Judgement Day
9. Into the Future

6
Unified brand, ecosystems around the world.

7
• Top-level credit card company in Japan
• Core of the Rakuten ecosystem
• Third in total transaction volume in 2016, and growing rapidly

9
Core Systems
Web Systems
External Systems
Intra Systems

10
Mainframe
• Old architecture (more than 20 years)
• High cost
• Limited capacity and performance
• Low maintainability
• Vendor lock-in
• Limited security
• For more details, see the session “From Mainframe to Java EE” at 16:00 today

11
Phase of the improvement – 3.0
• 1.0 Initial phase: outsource-based, just started; vendor lock-in.
• 2.0 In-house development: in-house development, differentiating with lower costs and faster delivery.
• 3.0 Standardization: standardized system architecture, for both hardware and software.
Achieved: Current Standard Architecture

13
Core Systems
Old: Mainframe
New: Oracle Exalogic + Exadata + ZFS Servers

14
             Old          New
App Server   COBOL        WebLogic Server
Database     Network DB   Oracle Database
WebLogic Server
• Financial de facto standard
• Java EE compliant
• Mature, since 1997
Oracle Database
• Financial de facto standard
• ISO/IEC 9075 SQL compliant
• Mature, since 1983

16
ISAM
VSAM
NDB Oracle Database
Copy & Convert

17
• Data conversion
  • Network database to relational database
  • ISAM/VSAM data to relational database
  • Legacy Japanese character set to Unicode
  • Fix data inconsistencies
• Scale
  • Terabytes of live production data
  • Less than 24 hours

18
• Offline migration
  • Freeze data during migration
  • Full migration, not incremental
  • Customers mostly unaffected
• Data & system migration
  • At the same time
  • Cannot be split into phases
Cached

19
[Diagram: ISAM, VSAM, NDB → Replication → Mirror (ISAM, VSAM, NDB) → Copy & Convert → Oracle Database]

21
Req.
Source
code
Application
Platform
Hardware
Reimplement
Convert
Emulate

22
Reimplement
  Pro: optimal performance; low maintenance cost
  Con: expensive; takes a long time; risky; difficult to test
Emulate
  Pro: development unchanged; easy to test; easy to migrate
  Con: low performance; future questionable
Convert
  Pro: flexible cost vs. schedule; case-by-case fixes; easy to test
  Con: legacy code remains; low-performance points need to be addressed
Requirements?

23
Pro
• Easy to test
• Easy to migrate
• Easy to test
Con
• Expensive
• Risky
• Difficult migration
• Low performance
Requirements: 2x performance, no regression, minimal downtime

25
Japanese COBOL
Source code
Java Source code
Customized
source code
converter
 Convert from Japanese
COBOL to Java EE
 Keep original core
business logic

26
Java
From Web Systems,
For New Logic
COBOL
From Old System,
converted to Java
 Ease of migration, resource re-use
 Introduce power of Java EE
 Introduce converter from YPS to Java
“Dual Source Architecture”
Japanese
COBOL
 Japanese source code
 Almost abandoned
 No books, no community
Old New

27
New Logic
(Java EE)
Application Server
(Java EE)
Legacy Logic
(Mainframe)
Build
Deploy
Japanese
COBOL
Convert to
COBOL
Convert
to Java
COBOL
Java
Compile
WAR
Converter
 Two sources,
single binary
 Easy to operate
Java
Byte Code
Compile
Java

28
BIG-IP
Real-time Servers
(WebLogic)
Batch Servers
(Spark & Java)
Façade
Rich clients Façade
Façade
Intranet
External
Intra
Exadata
Mail
Form
BIG-IP
Façade
BIG-IP
External
customers
Scheduler
CoreBusinessLogicAPIs
Operation
terminal
Web
browser
Old New

33
[Diagram: batch network beginning at "Start", with two jobs marked "Slow"]
• Batches are run as networks
  • Hierarchical
  • Critical path
  • Time window
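A minimal sketch of why the critical path matters for the time window: the network cannot finish earlier than its longest chain of dependent jobs. The class, job names, and durations below are hypothetical and not taken from the talk; the sketch assumes the network is acyclic and that every prerequisite has been added as a job.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical batch network: each job has a duration and a list of prerequisite jobs. */
final class BatchNetwork {
    private final Map<String, Long> minutes = new LinkedHashMap<>();
    private final Map<String, List<String>> prerequisites = new HashMap<>();

    /** Registers a job; every name in 'after' must also be registered (assumption of this sketch). */
    void addJob(String name, long durationMinutes, String... after) {
        minutes.put(name, durationMinutes);
        prerequisites.put(name, new ArrayList<>(Arrays.asList(after)));
    }

    /** Earliest finish of a job = latest finish among its prerequisites + its own duration. */
    private long earliestFinish(String job, Map<String, Long> memo) {
        Long cached = memo.get(job);
        if (cached != null) {
            return cached;
        }
        long start = 0;
        for (String p : prerequisites.get(job)) {
            start = Math.max(start, earliestFinish(p, memo));
        }
        long finish = start + minutes.get(job);
        memo.put(job, finish);
        return finish;
    }

    /** Length of the critical path: the earliest time the whole network can finish. */
    long criticalPathMinutes() {
        Map<String, Long> memo = new HashMap<>();
        long longest = 0;
        for (String job : minutes.keySet()) {
            longest = Math.max(longest, earliestFinish(job, memo));
        }
        return longest;
    }
}
```

Speeding up a job that is not on the critical path does not shorten the overall run, which is why the slow jobs on that path are the ones worth parallelizing.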

34
• Automatic code conversion
  • COBOL program flow emulated in Java
  • COBOL-like data structures in Java
• DB access logic
  • Business logic built on network DB
  • NDB and RDB are good at different tasks

35
• COBOL vs. Java
  • GO TO statement: imitation is complex
  • Sub-program calls: heavy
  • No local variables: tight coupling
  • No libraries: copy-and-paste code
  • Few shared data structures: copy-and-paste definitions
  • No shared enums or constants: magic numbers
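To illustrate why imitating GO TO is complex, here is one common way a converter could emulate it in Java: each paragraph becomes a label, and a loop dispatches on "the label to run next". This is a sketch of the general technique, not the converter used in the talk; the class, labels, and logic are made up for illustration.

```java
/** Sketch: three COBOL-style paragraphs connected by GO TO, emulated with a label loop. */
final class GotoEmulation {
    enum Label { INIT, ADD_NEXT, DONE }

    /** Sums 1..limit the way a GO TO loop would: ADD_NEXT keeps jumping back to itself. */
    static int sumUpTo(int limit) {
        // In COBOL these would be program-wide fields rather than locals.
        int counter = 0;
        int total = 0;
        Label next = Label.INIT;
        while (true) {
            switch (next) {
                case INIT:
                    counter = 1;
                    total = 0;
                    next = Label.ADD_NEXT;      // GO TO ADD-NEXT
                    break;
                case ADD_NEXT:
                    if (counter > limit) {
                        next = Label.DONE;      // GO TO DONE
                        break;
                    }
                    total += counter;
                    counter++;
                    next = Label.ADD_NEXT;      // GO TO ADD-NEXT (loop by jumping)
                    break;
                case DONE:
                    return total;
            }
        }
    }
}
```

The control flow is preserved exactly, but the result reads nothing like idiomatic Java, and every jump costs a dispatch through the switch.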

36
• COBOL data structures
  • Fixed length, hard-coded
  • String-based
  • Data block inside the program
  • Often thousands of fields
• Hierarchical fields
  • Content is joined/split automatically
  • Variable namespace under each parent
  • As many as five levels deep
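A minimal sketch of what a "COBOL-like data structure in Java" might look like: one fixed-length character block, with each field defined as an offset and a length into it. Reading a parent field returns its children joined, and writing the parent splits the content across them. The class and the record layout are hypothetical, chosen only to show the behaviour.

```java
import java.util.Arrays;

/** Sketch of a COBOL-style record: a fixed-length character block with (offset, length) fields. */
final class FixedRecord {
    // Hypothetical layout, not from the talk:
    //   CUSTOMER      offset 0, length 12
    //     ID          offset 0, length 4
    //     NAME        offset 4, length 8
    //       FIRST     offset 4, length 4
    //       LAST      offset 8, length 4
    private final char[] buffer;

    FixedRecord(int length) {
        buffer = new char[length];
        Arrays.fill(buffer, ' ');   // unset alphanumeric fields read as spaces
    }

    /** Stores the value left-justified in the field, truncating or padding with spaces. */
    void set(int offset, int length, String value) {
        for (int i = 0; i < length; i++) {
            buffer[offset + i] = i < value.length() ? value.charAt(i) : ' ';
        }
    }

    /** Reads a field; reading a parent field returns its children joined together. */
    String get(int offset, int length) {
        return new String(buffer, offset, length);
    }
}
```

With this layout, setting FIRST to "Taro" and LAST to "Yama" makes NAME read as "TaroYama", and writing "HanakoS" to NAME changes both children at once. Because everything is a string slice of one block, every access pays for copying and parsing.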

38
• Logic optimized for NDB
  • Read sequentially
  • Data pre-sorted
  • Data pre-formatted
• Emulate in RDB
  • Uphill battle
                    NDB    RDB
Search              Slow   Fast
Sequential access   Fast   Slow
Sorting             Slow   Fast
Formatting          Fast   Slow

39
• New system must be faster
• Time until launch: 1 year

40
• Options?
  • Redesign and re-implement from scratch
    • Not feasible
  • Optimize framework
    • Limited effectiveness
  • Parallelize batches
    • Elastic brute force

43
[Diagram: Bootstrap and Scheduler with six cluster nodes and shared memory]

44
1. Making business logic parallel
  • Independent processing
2. I/O
  • Data transferred over network
3. Data ordering
  • Shuffles

45
• Problem: input data rows are not independent!
• Red flags
  • Fields not initialized for each row
  • Code forks early (header & data?)
• Legacy code analysis
• Refactor
  • Fields to local variables
  • Extract data structures
  • Initialize data for each row
• Run & see
[Diagram: rows 1, 2, 3, labeled "Reference?"]
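The first red flag can be shown with a small, made-up example: a program-wide field that is only assigned when a row carries a value silently leaks into the following rows, so the rows cannot be processed independently. The names and the discount logic below are hypothetical.

```java
/** Sketch: a hidden row dependency caused by a field that is not initialized per row. */
final class RowIndependence {

    /** Legacy style: 'discount' is a program-wide field, set only when a row provides one. */
    static final class LegacyProcessor {
        private int discount;   // never reset, so it carries over to the next row

        int process(int amount, Integer rowDiscount) {
            if (rowDiscount != null) {
                discount = rowDiscount;
            }
            return amount - discount;
        }
    }

    /** Refactored: the field is now a local variable, initialized for every row. */
    static int process(int amount, Integer rowDiscount) {
        int discount = (rowDiscount != null) ? rowDiscount : 0;
        return amount - discount;
    }
}
```

The refactored version gives a different result for a row that follows a discounted one, so it is only correct if the carry-over was unintended. When the carry-over is real business behaviour, the related rows have to stay together instead.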

46
1. Group related rows together
2. Process header rows separately
3. Modify business logic

47
Group related rows together
• Custom data reader
  • Multiple rows behave like one row
  • Process each row of a group in a loop, on the same node
• Pro
  • Business logic not modified
• Con
  • Relationships may be too complex
  • Groups may grow too big
ID  Data
1   …
1   …
2   …
3   …
3   …
4   …
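The grouping idea can be sketched in plain Java as a reader that hands out consecutive rows with the same ID as one unit. This is not the reader from the talk; the class names are hypothetical, and the sketch assumes the input already arrives clustered by ID, as in the table above.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Sketch: turn a stream of rows into groups of consecutive rows sharing an ID. */
final class GroupingReader {

    /** Hypothetical input row. */
    static final class Row {
        final int id;
        final String data;

        Row(int id, String data) {
            this.id = id;
            this.data = data;
        }
    }

    /** Groups consecutive rows with equal IDs; assumes rows are already clustered by ID. */
    static List<List<Row>> group(Iterator<Row> rows) {
        List<List<Row>> groups = new ArrayList<>();
        List<Row> current = null;
        while (rows.hasNext()) {
            Row row = rows.next();
            if (current == null || current.get(0).id != row.id) {
                current = new ArrayList<>();    // ID changed: start a new group
                groups.add(current);
            }
            current.add(row);
        }
        return groups;
    }
}
```

Each group can then be treated as a single record for distribution, with the unmodified business logic looping over the rows of the group on one node. A very large group still ends up on a single node, which is the second drawback listed above.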

48
Process header rows separately
• Run business logic for header rows first
  • Collect result in NavigableMap
• Run business logic for data rows
  • Initialize data from previous header
  • floorKey(dataRowIndex)
• Pro
  • Minimal changes to business logic
• Con
  • Relationships may be too complex
ID  Type  Data
1   Head  …
1   Data  …
1   Data  …
2   Head  …
2   Data  …
3   Head  …
3   Data  …
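A minimal sketch of the header lookup, assuming each row has a sequential index and the header pass produces one result per header row. The class name and the use of a plain String as the header result are placeholders. The slide names floorKey; this sketch uses the closely related floorEntry, which returns the key and value in one lookup.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Sketch: find the header that applies to a data row by its row index. */
final class HeaderLookup {
    // Row index of each header row -> result of running the header logic (placeholder type).
    private final NavigableMap<Long, String> headers = new TreeMap<>();

    /** First pass: record the result for a header row. */
    void putHeader(long rowIndex, String headerResult) {
        headers.put(rowIndex, headerResult);
    }

    /** Second pass: the header for a data row is the closest header at or before its index. */
    String headerFor(long dataRowIndex) {
        Map.Entry<Long, String> entry = headers.floorEntry(dataRowIndex);
        if (entry == null) {
            throw new IllegalStateException("No header before data row " + dataRowIndex);
        }
        return entry.getValue();
    }
}
```

Once the map is built it is read-only, so it can be shared with every worker, and each data row can be initialized from its header without depending on the row processed before it.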

49
Modify business logic
• A row relationship can be removed if it is
  • Unintentional (a bug)
  • An unnecessary optimization
  • Carrying data that could be retrieved another way
• Pro
  • High chance of good performance
• Con
  • High chance of new bugs

50
• Input and output data must be shared
  • Network storage
  • How long does it take to copy 200 GB?
[Diagram: sequences of Transfer and Process steps, some marked "Heavy"]
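As a rough answer to the 200 GB question, the lower bound is simply size divided by bandwidth. The link speeds used here (1 Gbit/s and 10 Gbit/s) are assumptions for illustration, not figures from the talk, and the calculation ignores protocol overhead and storage limits.

```java
/** Back-of-the-envelope transfer time; ignores protocol overhead and disk throughput. */
final class TransferTime {

    /** Seconds needed to move 'gigabytes' (decimal GB) over a link of 'gigabitsPerSecond'. */
    static double seconds(double gigabytes, double gigabitsPerSecond) {
        return gigabytes * 8.0 / gigabitsPerSecond;   // 8 bits per byte
    }

    public static void main(String[] args) {
        System.out.println(seconds(200, 1) / 60 + " minutes at 1 Gbit/s");
        System.out.println(seconds(200, 10) / 60 + " minutes at 10 Gbit/s");
    }
}
```

Under these assumptions the copy takes 1600 seconds (about 27 minutes) at 1 Gbit/s and 160 seconds at 10 Gbit/s, in each direction. For a batch that does little processing per record, that transfer time can outweigh whatever is gained by running in parallel.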

51
• Sequential batches rely on ordering
  • Tricky to keep in Spark
  • Safe operations: map, filter, zip
  • Unsafe operations: join, group, sort
[Diagram: chains of Process steps, some separated by a Shuffle]
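One way to survive an order-destroying step is to tag every row with its original position beforehand and sort by that tag afterwards. The sketch below shows the idea in plain Java with no Spark classes; the shuffle is simulated by the caller, and all names are hypothetical. In Spark, the analogous approach would be to attach an index with zipWithIndex and sort on it later, at the cost of one more shuffle.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: keep the original row order across a step that does not preserve it. */
final class OrderRestore {

    /** A row value paired with its position in the original input. */
    static final class Indexed<T> {
        final long index;
        final T value;

        Indexed(long index, T value) {
            this.index = index;
            this.value = value;
        }
    }

    /** Tags each row with its original position before any reordering happens. */
    static <T> List<Indexed<T>> tag(List<T> rows) {
        List<Indexed<T>> tagged = new ArrayList<>();
        for (int i = 0; i < rows.size(); i++) {
            tagged.add(new Indexed<>(i, rows.get(i)));
        }
        return tagged;
    }

    /** Restores the original order by sorting on the saved position. */
    static <T> List<T> restore(List<Indexed<T>> reordered) {
        List<Indexed<T>> copy = new ArrayList<>(reordered);
        copy.sort((Indexed<T> a, Indexed<T> b) -> Long.compare(a.index, b.index));
        List<T> values = new ArrayList<>();
        for (Indexed<T> item : copy) {
            values.add(item.value);
        }
        return values;
    }
}
```

Operations such as map and filter never need this, because they handle each row where it already is; the tag only pays off around joins, groupings, and sorts.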

52
• Good for
  • Heavy processing
  • Independent input data records
  • One input, multiple outputs
  • Unordered data
• Not so great for
  • Little processing
  • Dependencies between data records
  • Merging multiple data sources

55
[Diagram: timeline across Saturday, Sunday, and Monday]

58
Next Phase
• 1.0 Initial phase: outsource-based, just started; vendor lock-in.
• 2.0 In-house development: in-house development, differentiating with lower costs and faster delivery.
• 3.0 Standardization: standardized system architecture, for both hardware and software.
• 4.0 Data Optimized: overwhelming differentiation, with an enabling architecture for customer-centric service.
Achieved: Current Standard Architecture
Next: 4.0 Data Optimized
