"What does it take to transform a legacy mainframe COBOL system into a state-of-the-art Java EE platform? How does the Apache Spark clustering framework fit into all of this? Attend this session to find out, with concrete solutions to some of the major problems of turning a procedural program into an object-oriented one and of parallelizing sequential processing."
1. Oct 28, 2017
Ville Misaki
System Strategy Department,
Rakuten Card Co., Ltd.
2. 2
Ville Misaki
Senior Software Engineer
Technology Strategy Group,
System Strategy Department,
Rakuten Card Co., Ltd
Career
15+ years; 3 years at Rakuten
In Finland, the Netherlands, Japan
Java (EE), Perl, C++, web systems, relational
databases, performance optimization & security
3. 3
Oracle OpenWorld 2017
Case Study: Credit Card Core System with Exalogic, Exadata, Oracle Cloud Machine (CON4994)
JavaOne 2017
Java EE 7 with Apache Spark for the World’s Largest Credit Card Core Systems (CON4998)
4. 4
Part 1 – Perfect Design
1. About Rakuten Card
2. Background
3. Platform Migration
4. Data Migration
5. Software Migration
Part 2 – Harsh Reliability
6. Performance
7. Apache Spark
8. Judgement Day
9. Into the Future
10. 10
Mainframe
Old architecture – >20 years
High cost
Limited capacity and performance
Low maintainability
Vendor lock-in
Limited security
For more details, check session “From Mainframe to Java EE” at 16:00 today
11. 11
Phase of the improvement – 3.0
1.0 Initial phase
Outsource based, just started. Vendor lock-in.
2.0 In-house development
In-house development, differentiate with lower costs and faster delivery.
3.0 Standardization (achieved, current standard architecture)
Standardized system architecture, both for hardware and software.
14. 14
Layer       Old         New
App Server  COBOL       WebLogic Server
Database    Network DB  Oracle Database
WebLogic Server
Financial de-facto standard
Java EE compliant
Matured, from 1997
Oracle Database
Financial de-facto standard
ISO/IEC 9075 SQL compliant
Matured, from 1983
17. 17
Data Conversion
Network database to relational database
ISAM/VSAM data to relational database
Legacy Japanese character set to Unicode
Fix data inconsistencies
Scale
Terabytes of live production data
In less than 24 hours
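The character set conversion step can be sketched as follows. This is a minimal illustration, not the migration tool itself: the slides do not name the legacy encoding, so `Shift_JIS` is an assumption, and the class and method names are hypothetical. The strict decoder reports malformed bytes instead of silently replacing them, which is one way to surface data inconsistencies during conversion.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;

final class CharsetConversionSketch {
    // Assumption: Shift_JIS stands in for the legacy encoding, which the slides do not name.
    static final Charset LEGACY = Charset.forName("Shift_JIS");

    /** Decodes one fixed-length legacy field into a Java (Unicode) string. */
    static String toUnicode(byte[] legacyBytes) {
        try {
            String decoded = LEGACY.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)       // fail loudly instead of
                    .onUnmappableCharacter(CodingErrorAction.REPORT)  // silently inserting U+FFFD
                    .decode(ByteBuffer.wrap(legacyBytes))
                    .toString();
            return decoded.replaceAll(" +$", ""); // drop trailing ASCII space padding
        } catch (CharacterCodingException e) {
            throw new IllegalArgumentException("Inconsistent legacy data", e);
        }
    }

    public static void main(String[] args) {
        // Bytes 93 FA 96 7B encode U+65E5 U+672C in Shift_JIS; two padding spaces follow.
        byte[] field = {(byte) 0x93, (byte) 0xFA, (byte) 0x96, (byte) 0x7B, 0x20, 0x20};
        System.out.println(toUnicode(field));
    }
}
```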
18. 18
Offline migration
Freeze data during migration
Full migration – not incremental
Customers mostly unaffected
Data & System migration
At the same time
Cannot be split into phases
Cached
22. 22
Reimplement
Pro
• Optimal performance
• Low maintenance cost
Con
• Expensive
• Takes a long time
• Risky
• Difficult to test
Emulate
Pro
• Development unchanged
• Easy to test
• Easy to migrate
Con
• Development unchanged
• Low performance
• Future questionable
Convert
Pro
• Flexible cost vs. schedule
• Case-by-case fixes
• Easy to test
Con
• Legacy code remains
• Low performance points need to be addressed
Requirements?
23. 23
Reimplement
Pro
• Optimal performance
• Low maintenance cost
Con
• Expensive
• Takes a long time
• Risky
• Difficult migration
Emulate
Pro
• Development unchanged
• Easy to test
• Easy to migrate
Con
• Development unchanged
• Low performance
• Future questionable
Convert
Pro
• Flexible cost vs. schedule
• Case-by-case fixes
• Easy to test
Con
• Legacy code remains
• Low performance points need to be addressed
Requirements: 2x performance, no regression, minimal downtime
25. 25
Japanese COBOL source code => customized source code converter => Java source code
Convert from Japanese COBOL to Java EE
Keep original core business logic
26. 26
“Dual Source Architecture”
Old: Japanese COBOL
Japanese source code
Almost abandoned
No books, no community
New: Java and COBOL
Java: from web systems, for new logic
COBOL: from old system, converted to Java
Ease of migration, resource re-use
Introduce power of Java EE
Introduce converter from YPS to Java
27. 27
[Figure: build and deploy pipeline]
Legacy logic (mainframe): Japanese COBOL, converted to COBOL, then converted to Java by the converter
New logic (Java EE): Java
Both compiled to Java byte code and built into one WAR
WAR deployed to the application server (Java EE)
Two sources, single binary
Easy to operate
28. 28
[Figure: system architecture, old vs. new. Components shown: external customers (web browser), intranet users (operation terminal, rich clients), mail, form, BIG-IP, façades, real-time servers (WebLogic), batch servers (Spark & Java), scheduler, core business logic APIs, Exadata.]
34. 34
Automatic code conversion
COBOL program flow emulated in Java
COBOL-like data structures in Java
DB access logic
Business logic built on network DB
NDB and RDB are good at different tasks
35. 35
COBOL vs. Java
Goto statement – imitation is complex
Sub-program calls – heavy
No local variables – tight coupling
No libraries – copy&paste code
Few shared data structures – copy&paste definition
No shared enum/constant – magic numbers
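Why imitating GO TO gets complex can be seen in a small sketch. This is one common way to emulate jumps in Java, a label variable driving a switch inside a loop; it is an assumption for illustration, not necessarily what the converter generates. All names are hypothetical.

```java
final class GotoEmulationSketch {
    // Each COBOL paragraph becomes a label; GO TO becomes an assignment to `next`.
    private enum Label { READ_NEXT, ADD_UP, FINISH, EXIT }

    /** Sums the input until the first negative value, written in jump style. */
    static int sumUntilNegative(int[] input) {
        int index = 0;
        int value = 0;
        int total = 0;
        Label next = Label.READ_NEXT;
        while (next != Label.EXIT) {
            switch (next) {
                case READ_NEXT:
                    if (index >= input.length) {
                        next = Label.FINISH;          // GO TO FINISH at end of input
                        break;
                    }
                    value = input[index++];
                    next = (value < 0) ? Label.FINISH : Label.ADD_UP;
                    break;
                case ADD_UP:
                    total += value;
                    next = Label.READ_NEXT;           // GO TO READ-NEXT
                    break;
                case FINISH:
                    next = Label.EXIT;
                    break;
                default:
                    throw new IllegalStateException("Unknown label " + next);
            }
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sumUntilNegative(new int[] {1, 2, 3, -1, 5}));
    }
}
```

The same logic as a plain `for` loop would take three lines, which is the gap between emulated and idiomatic code.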
36. 36
COBOL data structures
Fixed length – hard-coded
String-based
Data block inside program
Often thousands of fields
Hierarchical fields
Content is joined/split automatically
Variable namespace under each parent
Even five levels deep
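A COBOL-like record can be imitated in Java with one fixed-length character buffer in which every field is an (offset, length) window. The sketch below is an assumption about how such a structure might look, with a hypothetical two-field layout; it shows how a parent field is joined from, and split into, its children automatically because they share the same characters.

```java
import java.util.Arrays;

final class FixedRecordSketch {
    // Hypothetical layout: CUSTOMER (14 chars) = ID (6 chars) + NAME (8 chars).
    static final int CUSTOMER_OFFSET = 0, CUSTOMER_LENGTH = 14;
    static final int ID_OFFSET = 0, ID_LENGTH = 6;
    static final int NAME_OFFSET = 6, NAME_LENGTH = 8;

    private final char[] buffer = new char[CUSTOMER_LENGTH];

    FixedRecordSketch() {
        Arrays.fill(buffer, ' '); // fixed-length fields are space-filled by default
    }

    /** Writes a value into a field, space-padded or truncated to the field length. */
    void set(int offset, int length, String value) {
        for (int i = 0; i < length; i++) {
            buffer[offset + i] = i < value.length() ? value.charAt(i) : ' ';
        }
    }

    /** Reads a field; a parent field simply spans the ranges of its children. */
    String get(int offset, int length) {
        return new String(buffer, offset, length);
    }

    public static void main(String[] args) {
        FixedRecordSketch record = new FixedRecordSketch();
        record.set(ID_OFFSET, ID_LENGTH, "000042");
        record.set(NAME_OFFSET, NAME_LENGTH, "SATO");
        System.out.println("[" + record.get(CUSTOMER_OFFSET, CUSTOMER_LENGTH) + "]");
    }
}
```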
38. 38
Logic optimized for NDB
Read sequentially
Data pre-sorted
Data pre-formatted
Emulate in RDB
Uphill battle
Operation          NDB   RDB
Search             Slow  Fast
Sequential access  Fast  Slow
Sorting            Slow  Fast
Formatting         Fast  Slow
44. 44
1. Making business logic parallel
Independent processing
2. I/O
Data transferred over network
3. Data ordering
Shuffles
45. 45
Problem: input data rows are not independent!
Red flags
Fields not initialized for each row
Code forks early (header & data?)
Legacy code analysis
Refactor
Fields to local variables
Extract data structures
Initialize data for each row
Run & see
[Figure: rows 1, 2, 3; does a row reference an earlier row?]
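The red flag "fields not initialized for each row" and the refactoring "fields to local variables" can be illustrated with a small before/after sketch. The pricing rule and all names are hypothetical; whether the carried-over value is a bug or intended behaviour still has to come out of the legacy code analysis.

```java
final class RowStateSketch {
    /** Before: a field carries state from one row to the next. */
    static final class LegacyStyle {
        private int discount; // set for some rows, never reset

        int price(int amount, boolean member) {
            if (member) {
                discount = 10;
            }
            return amount - discount; // a non-member row inherits the previous discount
        }
    }

    /** After: everything is local and initialized per row, so rows are independent. */
    static int price(int amount, boolean member) {
        int discount = member ? 10 : 0;
        return amount - discount;
    }

    public static void main(String[] args) {
        LegacyStyle legacy = new LegacyStyle();
        System.out.println(legacy.price(100, true));
        System.out.println(legacy.price(100, false)); // result depends on the row before it
        System.out.println(price(100, false));        // result depends only on this row
    }
}
```

Only the second form can be handed to different nodes in any order without changing the result.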
46. 46
1. Group related rows together
2. Process header rows separately
3. Modify business logic
47. 47
Group related rows together
Custom data reader
Multiple rows behave like one row
Process each group row in a loop, on
the same node
Pro
Business logic not modified
Con
Relationships may be too complex
Groups may grow too big
ID Data
1 …
1 …
2 …
3 …
3 …
4 …
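The grouping idea can be sketched in plain Java, without Spark, as a reader that collects consecutive rows with the same ID into one group. This is a simplified assumption about what such a custom data reader does; the `Row` type and method names are hypothetical, and it relies on related rows being adjacent in the input, as in the table above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

final class GroupingReaderSketch {
    static final class Row {
        final int id;
        final String data;

        Row(int id, String data) {
            this.id = id;
            this.data = data;
        }
    }

    /** Collects consecutive rows with the same ID into one group. */
    static List<List<Row>> readGroups(Iterator<Row> rows) {
        List<List<Row>> groups = new ArrayList<>();
        List<Row> current = null;
        while (rows.hasNext()) {
            Row row = rows.next();
            if (current == null || current.get(0).id != row.id) {
                current = new ArrayList<>(); // ID changed: start a new group
                groups.add(current);
            }
            current.add(row);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Row> input = Arrays.asList(new Row(1, "a"), new Row(1, "b"), new Row(2, "c"),
                new Row(3, "d"), new Row(3, "e"), new Row(4, "f"));
        for (List<Row> group : readGroups(input.iterator())) {
            // Each group is one unit of work; its rows are processed in a loop on one node.
            System.out.println("ID " + group.get(0).id + ": " + group.size() + " row(s)");
        }
    }
}
```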
48. 48
Process header rows separately
Run business logic for header rows first
Collect result in NavigableMap
Run business logic for data rows
Initialize data from previous header
floorKey(dataRowIndex)
Pro
Minimal changes to business logic
Con
Relationships may be too complex
ID Type Data
1 Head …
1 Data …
1 Data …
2 Head …
2 Data …
3 Head …
3 Data …
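The `NavigableMap` lookup can be sketched as below. `floorKey` returns the greatest key less than or equal to the argument, so for a data row it finds the index of the closest preceding header row. The class name, the use of row indexes as keys, and the `String` state are assumptions for illustration.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

final class HeaderLookupSketch {
    /** Result of pass 1: index of each header row mapped to the state computed from it. */
    private final NavigableMap<Long, String> headerStateByIndex = new TreeMap<>();

    /** Pass 1: run the business logic for a header row and keep its result. */
    void addHeader(long rowIndex, String state) {
        headerStateByIndex.put(rowIndex, state);
    }

    /** Pass 2: a data row takes its initial state from the closest preceding header. */
    String stateFor(long dataRowIndex) {
        Long headerIndex = headerStateByIndex.floorKey(dataRowIndex);
        if (headerIndex == null) {
            throw new IllegalStateException("No header before row " + dataRowIndex);
        }
        return headerStateByIndex.get(headerIndex);
    }

    public static void main(String[] args) {
        HeaderLookupSketch lookup = new HeaderLookupSketch();
        // Row indexes follow the table above: headers at rows 0, 3 and 5.
        lookup.addHeader(0, "header 1");
        lookup.addHeader(3, "header 2");
        lookup.addHeader(5, "header 3");
        System.out.println(lookup.stateFor(4));
    }
}
```

Once the map is built, each data row can be processed independently, because everything it needs from its header is available through the lookup.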
49. 49
Modify business logic
Row relationship could be removed, if it’s
Unintentional (a bug)
For unnecessary optimization
Data that could be retrieved otherwise
Pro
High chance for good performance
Con
High chance for new bugs
50. 50
Input and output data must be shared
Network storage
How long does it take to copy 200 GB?
[Figure: timelines of alternating Transfer and Process steps, where either the transfer or the processing is the heavy part]
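A rough answer to the 200 GB question is simple arithmetic. The link speeds below are assumptions (the slides do not state the network bandwidth), and the calculation ignores protocol overhead and disk speed, so real transfers take longer.

```java
final class TransferTimeSketch {
    /** Seconds to move `gigabytes` (10^9 bytes each) over a link of `gigabitsPerSecond`. */
    static double seconds(double gigabytes, double gigabitsPerSecond) {
        double bits = gigabytes * 8e9;
        return bits / (gigabitsPerSecond * 1e9);
    }

    public static void main(String[] args) {
        // Hypothetical link speeds: 1 Gbit/s and 10 Gbit/s.
        for (double link : new double[] {1, 10}) {
            System.out.printf("200 GB over %.0f Gbit/s: %.0f s%n", link, seconds(200, link));
        }
    }
}
```

At 1 Gbit/s this is 1600 seconds, about 27 minutes, for one copy in one direction. Input and output both cross the network, so transfer can easily outweigh processing for light jobs.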
51. 51
Sequential batches rely on ordering
Tricky to keep in Spark
Safe operations: map, filter, zip
Unsafe operations: join, group, sort
[Figure: parallel Process steps separated by Shuffle steps]
52. 52
Good for
Heavy processing
Independent input data records
One input, multiple outputs
Unordered data
Not so great for
Little processing
Dependencies between data records
Merging multiple data sources
58. 58
Next Phase
1.0 Initial phase
Outsource based, just started. Vendor lock-in.
2.0 In-house development
In-house development, differentiate with lower costs and faster delivery.
3.0 Standardization (achieved, current standard architecture)
Standardized system architecture, both for hardware and software.
4.0 Data Optimized (next)
Overwhelming differentiation, with an architecture that enables customer-centric service.