[RakutenTechConf2013] [C-1] Rakuten new infrastructure
1. Rakuten new infrastructure
- Why we are trying to build new things, Vol. 01, Oct 26, 2013 @ Rakuten Technology Conference 2013
Osamu Iwasaki
Vice Group Manager
Server Platform Group / Network Administration Group
Global Infrastructure Development Department, Rakuten, Inc.
http://www.rakuten.co.jp/
2. Self introduction
Name : Osamu Iwasaki
Role : Network / Cloud Engineer & Manager
Twitter @osamuiwasaki
Skype osamu.iwasaki
Vice Group Manager
Server Platform Group / Network Administration Group
Global Infrastructure Development Department
Committee member of JANOG (JApan Network Operators’ Group)
Project Manager / Designer of the Rakuten Private Cloud system (RIaaS) and the new data center fabric network.
3. Index
1. Introduction / Current situation
- Today’s Rakuten infrastructure status
2. Our legacy infrastructure
- What the problems are
3. Why we are changing our infrastructure
- Simplicity / Automation / Cost reduction and a tech challenge!!
4. New infrastructure
- What the benefits are
5. Cases
- Use cases from the new infrastructure
6. Future
- What we are thinking about for the next step
5. Rakuten’s infrastructure
Several data center (DC) locations around the Tokyo area
Each location is an active data center
Cost efficiency, scalability, disaster recovery
Huge server resources for Rakuten Ichiba services
RIaaS, Rakuten’s private cloud system
-> Automation tools
-> Huge resources
Rakuten scalable fabric network
-> Easy to scale out
6. Our traffic history
[Figure: traffic history (Gbps, 0 to 160) with the Super Sale and Victory Sale peaks marked]
Peak traffic during the Victory Sale exceeded 140 Gbps, which was over 5% of Japan’s Internet traffic.
7. Network traffic trend from 2012/Jan (Super Sale traffic focus)
[Figure: peak traffic (Gbps, 0 to 160) at each Super Sale and at the Victory Sale]
Traffic by source (Gbps):
Source            2012/Jun  2012/Dec  2013/Mar  2013/Jun  2013/Sep  2013/Oct (VS)
SuperSale CDN         60.0      78.9      69.1      75.8      73.7          127.6
RakutenDC             12.7      14.2      12.8      12.5      11.7           12.9
Total                 72.7      93.1      81.9      88.3      85.4          140.5
(VS = Victory Sale)
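As a quick consistency check on the table, here is a minimal Python sketch that confirms each Total equals SuperSale CDN plus RakutenDC and computes the CDN share per sale. The values are transcribed from the slide; the helper names are illustrative only.

```python
# Traffic at each sale in Gbps, transcribed from the slide table.
periods = ["2012/Jun", "2012/Dec", "2013/Mar", "2013/Jun", "2013/Sep", "2013/Oct (VS)"]
cdn = [60.0, 78.9, 69.1, 75.8, 73.7, 127.6]
rakuten_dc = [12.7, 14.2, 12.8, 12.5, 11.7, 12.9]
total = [72.7, 93.1, 81.9, 88.3, 85.4, 140.5]


def totals_consistent(part_a, part_b, totals, tol=0.05):
    """True if part_a[i] + part_b[i] matches totals[i] within tol Gbps for every column."""
    return all(abs(a + b - t) <= tol for a, b, t in zip(part_a, part_b, totals))


def share(part, whole):
    """Fraction of `whole` carried by `part`."""
    return part / whole


if __name__ == "__main__":
    print("columns add up:", totals_consistent(cdn, rakuten_dc, total))
    for period, c, t in zip(periods, cdn, total):
        print(f"{period:>14}: CDN share {share(c, t):.1%}")
    print(f"total growth, 2012/Jun -> 2013/Oct: {total[-1] / total[0]:.2f}x")
```

Run as written, the columns add up, and the CDN share rises from about 82.5% in June 2012 to about 90.8% at the Victory Sale, while RakutenDC traffic stays roughly flat at 12 to 14 Gbps: the CDN absorbed almost all of the peak growth.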
9. RIaaS, our private cloud history
[Figure: monthly RIaaS resources (0 to 12,000) from Jun through Sep of the following year, annotated “Enhancements doubled for Super Sale” and “Resource transfer to new data center”]
About a year ago, we started with 300 VMs. Now around 10,000 VMs are running for Rakuten Ichiba services: over 30 times year over year!!!
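The 300-to-10,000 VM growth can be turned into an implied monthly rate. A minimal sketch, assuming "about 1 year" means exactly 12 months and constant compound growth (the real curve was clearly uneven, with jumps around Super Sales):

```python
import math


def growth_factor(start: float, end: float) -> float:
    """How many times larger `end` is than `start`."""
    return end / start


def compound_rate(start: float, end: float, periods: int) -> float:
    """Constant per-period rate r such that start * (1 + r) ** periods == end."""
    return (end / start) ** (1.0 / periods) - 1.0


def doubling_time(rate: float) -> float:
    """Number of periods needed to double at a constant rate."""
    return math.log(2) / math.log1p(rate)


START_VMS = 300    # from the slide: "we started from 300 VMs"
END_VMS = 10_000   # from the slide: "around 10000 VMs"
MONTHS = 12        # assumption: "about 1 year" taken as 12 months

factor = growth_factor(START_VMS, END_VMS)
monthly = compound_rate(START_VMS, END_VMS, MONTHS)

if __name__ == "__main__":
    print(f"growth: {factor:.1f}x")
    print(f"implied monthly growth: {monthly:.1%}")
    print(f"doubling time: {doubling_time(monthly):.1f} months")
```

Under these assumptions the fleet grew about 33x (consistent with "over 30 times"), roughly 34% per month, doubling about every 2.4 months.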
10. Number of servers set up
[Figure: servers set up (0 to 1,400), physical machines vs. virtualization; over 90% virtual]
Over the past two years, our server builds have shifted from physical to virtual.
15. Motivation for change
Simplification
Tooling
Automation
Cost reduction
and... technology challenge!!!
A challenging spirit is the most important thing!
18. Concept
[Diagram: Internet; Rakuten DC global network; Rakuten service network (gb/at/db-net); subsidiaries (Rakuten XXX); Rakuten shared infra exchange with RIaaS, Storage, Backup, BigData, etc.]
• The Rakuten DC global network is the data-center-side global IP network.
• Internet connectivity will be provided from the Rakuten global network to each network, including the Rakuten environments.
19. Concept
[Diagram: Internet; Rakuten DC global network; Rakuten service network (gb/at/db-net); subsidiaries (Rakuten XXX); Rakuten shared infra exchange with RIaaS, Storage, Backup, etc.]
• Shared services (RIaaS, Storage, Backup, etc.) will be provided to each subsidiary from the shared infrastructure, as in this image.
• All traffic is separated using virtualization technologies.
20. Data center network overview
[Diagram components: Internet; other DC / regional DC; DC core networks A and B; VPN router; AZ1 and AZ2 routers; subsidiary gateway router; RIaaS and legacy segments in AZ1 and in AZ2; subsidiary networks; management network]
We separate availability zones (AZs) within the DC to minimize the impact of major failures.
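The point of separating availability zones is to bound how much of a service one failure can take down. A minimal sketch, assuming a service's instances are spread round-robin across the two zones on the slide; the placement policy and instance count are hypothetical, not Rakuten's actual scheduler:

```python
from collections import Counter


def place_round_robin(n_instances: int, zones: list[str]) -> list[str]:
    """Assign instance i to zones[i % len(zones)]."""
    return [zones[i % len(zones)] for i in range(n_instances)]


def surviving_fraction(placement: list[str], failed_zone: str) -> float:
    """Fraction of instances still running if every instance in failed_zone is lost."""
    alive = sum(1 for zone in placement if zone != failed_zone)
    return alive / len(placement)


ZONES = ["AZ1", "AZ2"]

if __name__ == "__main__":
    spread = place_round_robin(10, ZONES)
    packed = ["AZ1"] * 10  # everything in one zone, for comparison
    print("spread placement:", dict(Counter(spread)))
    print("AZ1 fails, spread:", surviving_fraction(spread, "AZ1"))
    print("AZ1 fails, packed:", surviving_fraction(packed, "AZ1"))
```

With instances spread across both zones, losing AZ1 leaves half the capacity running; with everything packed into AZ1, the same failure takes the service down entirely. Real capacity planning would also keep enough headroom in each zone to absorb the other's load.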
21. Fabric network physical architecture
[Diagram: a Layer 3 spine tier (spine switches) over a Layer 2 leaf tier (leaf switches); border leaf switches connect through L3 switches to other DCs]
Spine-leaf architecture:
Easy to operate, enhance, standardize quality, and scale out.
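Part of why a spine-leaf fabric is easy to scale out is that its size and bandwidth follow from a few simple ratios. A minimal sketch, assuming each leaf has exactly one uplink to every spine; the port counts and speeds are hypothetical, not Rakuten's actual hardware:

```python
from dataclasses import dataclass


@dataclass
class Leaf:
    server_ports: int      # downlinks to servers (hypothetical)
    server_port_gbps: int
    uplinks: int           # one uplink per spine, so this equals the spine count
    uplink_gbps: int


def oversubscription(leaf: Leaf) -> float:
    """Downlink bandwidth divided by uplink bandwidth on one leaf."""
    return (leaf.server_ports * leaf.server_port_gbps) / (leaf.uplinks * leaf.uplink_gbps)


def fabric_limits(leaf: Leaf, spine_ports: int) -> dict:
    """Scale limits of a two-tier fabric where every leaf connects once to every spine."""
    max_leaves = spine_ports  # each leaf consumes one port on every spine
    return {
        "spines": leaf.uplinks,
        "max_leaves": max_leaves,
        "max_server_ports": max_leaves * leaf.server_ports,
        # Any leaf reaches any other leaf via leaf -> spine -> leaf through any spine,
        # so there are as many equal-cost paths as there are spines.
        "equal_cost_paths": leaf.uplinks,
    }


if __name__ == "__main__":
    leaf = Leaf(server_ports=48, server_port_gbps=10, uplinks=4, uplink_gbps=40)
    print("oversubscription:", oversubscription(leaf))
    print(fabric_limits(leaf, spine_ports=32))
```

In this model, adding a leaf adds server ports without touching existing cabling until the spines run out of ports, and adding a spine (given a free uplink port on each leaf) adds an equal-cost path and lowers oversubscription. Both are the same repeatable step, which is what makes quality easy to standardize.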
22. Fabric network logical architecture
[Diagram: Internet, routers, and the DC core network above a scalable fabric network connecting shared services (e.g. Storage), RIaaS, and physical servers]
Adopting Ethernet Fabric:
- Flat network structure
- Every network path is active
- VRF and tagged VLANs enable remote configuration and no more cabling work
Therefore, we can provide a flexible and scalable network structure.
Simple and scalable network architecture.
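With VRFs and tagged VLANs, adding an isolated network segment becomes a configuration change rather than a cabling job. A minimal sketch of that bookkeeping, assuming one VRF per tenant and fabric-wide unique VLAN IDs; the `Fabric` class and tenant names are hypothetical and not any switch vendor's API:

```python
from dataclasses import dataclass, field

# 802.1Q reserves VLAN IDs 0 and 4095; VLAN 1 is the usual default VLAN, so start at 2.
VLAN_MIN, VLAN_MAX = 2, 4094


@dataclass
class Fabric:
    next_vlan: int = VLAN_MIN
    vrfs: dict[str, str] = field(default_factory=dict)                  # tenant -> VRF name
    segments: dict[tuple[str, str], int] = field(default_factory=dict)  # (tenant, segment) -> VLAN ID

    def add_segment(self, tenant: str, segment: str) -> int:
        """Provision a tenant segment in configuration only; returns its VLAN ID."""
        key = (tenant, segment)
        if key in self.segments:
            return self.segments[key]  # idempotent: re-adding returns the same VLAN
        if self.next_vlan > VLAN_MAX:
            raise RuntimeError("VLAN ID space exhausted")
        self.vrfs.setdefault(tenant, f"vrf-{tenant}")
        self.segments[key] = self.next_vlan
        self.next_vlan += 1
        return self.segments[key]

    def same_vrf(self, tenant_a: str, tenant_b: str) -> bool:
        """Segments can route to each other only inside the same VRF."""
        return self.vrfs[tenant_a] == self.vrfs[tenant_b]


if __name__ == "__main__":
    fabric = Fabric()
    print(fabric.add_segment("ichiba", "web"))        # 2
    print(fabric.add_segment("ichiba", "db"))         # 3
    print(fabric.add_segment("subsidiary-a", "web"))  # 4
    print(fabric.same_vrf("ichiba", "subsidiary-a"))  # False
```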
23. Reduce Costs, Improve Agility / delivery time
Item            2011 (legacy)   2013 (RIaaS)
Cost            $10,000         $1,800
Delivery time   6 weeks         5 days, 15 minutes
Components shown: enterprise storage, VLAN networks, firewall and load balancer, IDS, security, and monitoring
The legacy model has a high price and a long delivery time; the RIaaS model is cheaper and delivers faster.
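The headline numbers on this slide translate into simple ratios. A minimal sketch, assuming the $10,000 / 6 weeks figures describe the 2011 legacy model and the $1,800 / "5 days" figures the 2013 RIaaS model; the separate 15-minute figure is left out because the slide does not say what it measures:

```python
def reduction(old: float, new: float) -> float:
    """Fractional decrease going from old to new."""
    return 1.0 - new / old


def speedup(old: float, new: float) -> float:
    """How many times faster new is than old."""
    return old / new


LEGACY_COST_USD, RIAAS_COST_USD = 10_000, 1_800
LEGACY_DAYS = 6 * 7  # "6 weeks"
RIAAS_DAYS = 5       # "5 days" (assumed to be the RIaaS delivery time)

if __name__ == "__main__":
    print(f"cost reduction: {reduction(LEGACY_COST_USD, RIAAS_COST_USD):.0%}")
    print(f"delivery speedup: {speedup(LEGACY_DAYS, RIAAS_DAYS):.1f}x")
```

That is an 82% cost reduction and roughly 8x faster delivery under these assumptions.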
24. Cost comparison: physical server vs. RIaaS
[Figure: cost breakdown (DC cost, storage, compute) of a 1U server vs. RIaaS; RIaaS is over 50% cheaper]
RIaaS, our private cloud system, dramatically reduces our costs.
25. RIaaS: Concept Roadmap
Stages: RIaaS, RIaaS Phase 2, RIaaS Phase 3
• Multisite BCP: RIaaS at the East DC and West DC, balanced for disaster recovery
• Multi-tenant structure
  • RIaaS for the whole Rakuten Group, including subsidiaries
• Lean/powerful/scalable cloud service
  • Reinforced architecture: high-density servers
  • Premium high-end storage, commercial hypervisor
  • Speedy server construction using the RIaaS management console
26. Database Platform in Rakuten
Shuichiro Makigaki
Datastore Platform Group, Global Infrastructure Development Department
27. Self Introduction
• Joined Rakuten as a new graduate in April 2012
• Working on database and storage technology for the next-generation Rakuten infrastructure as a platform
28. Agenda
1. Past MySQL Problems
2. Clustrix
• Introduction
• Benefits
3. Usage in Production
4. HA, Multiple Cluster Management
5. And some demos!
29. Past MySQL Problems
× Manual sharding
× Manual server management
× Long lead time
× Offline maintenance
× 90% of CPU is NOT used!
[Diagram: application servers in front of many manually sharded master DB servers, each with several slave DB servers]
We need a new database platform for “As a Service”!
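One reason manual sharding hurts operationally: with a hand-rolled hash-mod-N scheme, adding a shard changes the home of most rows, so every expansion becomes a large data migration. A minimal sketch, assuming such a scheme (the slides do not say which sharding scheme the old MySQL setup used):

```python
import hashlib


def shard_for(key: str, n_shards: int) -> int:
    """Pick a shard as hash(key) mod N, a common hand-rolled sharding rule."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_shards


def moved_fraction(keys: list[str], old_n: int, new_n: int) -> float:
    """Fraction of keys whose shard changes when going from old_n to new_n shards."""
    moved = sum(1 for k in keys if shard_for(k, old_n) != shard_for(k, new_n))
    return moved / len(keys)


KEYS = [f"user:{i}" for i in range(10_000)]

if __name__ == "__main__":
    for old_n, new_n in [(4, 5), (4, 8)]:
        print(f"{old_n} -> {new_n} shards: {moved_fraction(KEYS, old_n, new_n):.0%} of rows move")
```

For uniformly distributed hashes, growing from 4 to 5 shards moves about 80% of rows, and even doubling to 8 moves about half. That rebalancing is the kind of work that automatic data distribution removes from the application team.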
32. Clustrix - Introduction
Clustrix is an appliance database server:
• MySQL compatible
• Distributed and scalable, with ACID guarantees
• Automatic fault tolerance
33. Clustrix - Benefits
No manual sharding
• Automatic data distribution
No manual fault tolerance
• Automatic!
• Single-point VIP access
Scalable
Online schema change
Support
[Diagram: the application (APP) connects through a single VIP with no sharding logic; data slices Data1 … DataZ are distributed automatically across Server1 … ServerX]
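The "automatic data distribution" and "automatic fault tolerance" bullets can be illustrated with a generic replica-placement rule. A minimal sketch, assuming each data slice is stored on two distinct servers chosen by simple rotation; this is a hypothetical illustration, not Clustrix's actual placement algorithm:

```python
from collections import Counter


def place_slices(n_slices: int, nodes: list[str], replicas: int = 2) -> dict[int, list[str]]:
    """Put each slice on `replicas` distinct nodes, rotating the starting node per slice."""
    if replicas > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {s: [nodes[(s + r) % len(nodes)] for r in range(replicas)] for s in range(n_slices)}


def survives(placement: dict[int, list[str]], failed: str) -> bool:
    """True if every slice keeps at least one copy on a surviving node."""
    return all(any(node != failed for node in owners) for owners in placement.values())


NODES = ["Server1", "Server2", "Server3"]

if __name__ == "__main__":
    placement = place_slices(6, NODES)
    print(placement)
    print("copies per node:", dict(Counter(n for owners in placement.values() for n in owners)))
    print("survives any single failure:", all(survives(placement, n) for n in NODES))
```

Because the two copies of a slice never share a node, any single server can fail without data becoming unavailable, and the rotation keeps the load even (four copies per node here). The application still sees one endpoint, the VIP, and never needs to know which server holds which slice.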
34. HA, Multiple Cluster Management
[Diagram: production Cluster1 and Cluster2 linked by bi-directional replication (for BCP); single-node staging and development clusters on RIaaS; shared Backup, Monitoring, NFS, and GlusterFS]
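With bi-directional replication, both clusters accept writes, so auto-generated keys must not collide. A common technique in MySQL-style multi-master setups is to interleave ID ranges with the `auto_increment_increment` and `auto_increment_offset` settings. The slides do not say whether this setup used it, so treat the sketch below as an illustrative assumption that only simulates the resulting sequences:

```python
from itertools import count, islice


def id_sequence(offset: int, increment: int):
    """IDs offset, offset + increment, ... as an empty table would get them with
    auto_increment_offset=offset and auto_increment_increment=increment."""
    return count(start=offset, step=increment)


def first_ids(offset: int, increment: int, n: int) -> list[int]:
    """The first n IDs a node with these settings would hand out."""
    return list(islice(id_sequence(offset, increment), n))


if __name__ == "__main__":
    cluster1 = first_ids(offset=1, increment=2, n=5)  # [1, 3, 5, 7, 9]
    cluster2 = first_ids(offset=2, increment=2, n=5)  # [2, 4, 6, 8, 10]
    print(cluster1, cluster2)
    print("collisions:", set(cluster1) & set(cluster2))  # set()
```

Each cluster draws from its own residue class, so rows inserted on either side can replicate to the other without primary-key conflicts; other conflicts, such as concurrent updates to the same row, still need their own handling.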
35. Usage in Production
[Figure: number of DBs (0 to 200) and data size (GB, 0 to 2,500) for Cluster1 and Cluster2]
36. For Database as a Service
No Lead Time
Charge on demand
Charge on DB size
Private PaaS integration
Self-management tool
Demo
38. Demo1
Create DB from private PaaS (RPaaS)!
1. Create an application
2. Log in to the PaaS
• rpaas login
3. Push the application
• rpaas push
44. Case1
RIaaS benefits for the Super/Victory Sale
Quick delivery time!!
Physical server construction takes a long time…
RIaaS, Rakuten’s private cloud system:
-> IaaS platform
-> Automation tools
-> Huge resources
We could provide server resources for the Super Sale at any time, as emergency server enhancements.
45. Case2
Layer 2 extension between our data centers
[Diagram: Internet and GSLB in front of the current DC and the new DC; an L2 extension network bridges the two DCs for server migration]
We bridge the two data center networks so that servers can migrate from physical to virtual without changing network settings.
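The value of the L2 extension is that a migrated server can keep its IP address, because its subnet now spans both data centers. A minimal sketch of that check using Python's standard `ipaddress` module; the subnets are hypothetical documentation ranges (RFC 5737), not Rakuten's addressing:

```python
import ipaddress


def needs_renumbering(server_ip: str, target_subnets: list[str]) -> bool:
    """True if server_ip fits none of the subnets reachable in the target DC."""
    ip = ipaddress.ip_address(server_ip)
    return not any(ip in ipaddress.ip_network(net) for net in target_subnets)


STRETCHED = "192.0.2.0/24"       # hypothetical subnet bridged across both DCs
NEW_DC_ONLY = "198.51.100.0/24"  # hypothetical subnet local to the new DC

if __name__ == "__main__":
    server = "192.0.2.10"
    print("with L2 extension:", needs_renumbering(server, [STRETCHED, NEW_DC_ONLY]))  # False
    print("without L2 extension:", needs_renumbering(server, [NEW_DC_ONLY]))         # True
```

Keeping the address means the server's network settings stay untouched during the move, matching the slide's "without network setting change".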
47. Future plan
Fast delivery / Self service
BCP / DR infrastructure
Global expansion
48. All data center services via a self-service portal
2014 onward: software-defined data center services with self service (VDC)
Delivery time: 5 days, 15 minutes (today) → 3 minutes (target)
The next step is the fastest possible delivery time, through a self-service portal for all of Rakuten.
49. BCP / DR network concept
[Diagram: East DC (Tokyo A and Tokyo B availability zones) and West DC (Osaka availability zones) connected over the Rakuten network and Internet VPN; Otemachi DC or the Rakuten network for IX connection; Internet (ISP network) and public networks in Tokyo and Osaka]
We plan to expand to other locations to mitigate disaster risk.
Communication cost is still high when preparing all the equipment.
For physical server users, we can save 1,203 JPY for users with guaranteed CPU/memory; for low-resource users, the gap is 2,998 JPY. For cloud users, the figure is 5,210 for performance-guaranteed users and the gap is 10,110. We can also save OPEX: we will increase headcount by only 10% in 2014 while applications grow by 40%, which means we can save 30% of OPEX by providing the cloud. If a 20% OPEX saving comes from our cloud effort, we can save 10M JPY per year.
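The OPEX estimate in these notes is simple arithmetic. A minimal sketch that reproduces it; the implied OPEX base is an inference from the 20% and 10M JPY figures, not a number stated in the notes, and the ratio-based figure is shown only as an alternative way to combine the same growth rates:

```python
APP_GROWTH = 0.40               # "App growth 40%"
HEADCOUNT_GROWTH = 0.10         # "we just increase people 10% in 2014"
CLOUD_SAVING = 0.20             # "If 20% OPEX saving is our Cloud effort"
ANNUAL_SAVING_JPY = 10_000_000  # "we can save 10M JPY/year"

# The notes' estimate: difference of the two growth rates.
saving_as_stated = APP_GROWTH - HEADCOUNT_GROWTH

# Alternative reading: staff needed if headcount had scaled with apps (1.40x) vs. actual (1.10x).
saving_as_ratio = 1 - (1 + HEADCOUNT_GROWTH) / (1 + APP_GROWTH)

# Inference (not in the notes): the OPEX base for which 20% equals 10M JPY.
implied_opex_base_jpy = ANNUAL_SAVING_JPY / CLOUD_SAVING

if __name__ == "__main__":
    print(f"stated estimate: {saving_as_stated:.0%}")
    print(f"ratio-based estimate: {saving_as_ratio:.1%}")
    print(f"implied OPEX base: {implied_opex_base_jpy:,.0f} JPY/year")
```

The subtraction gives the notes' 30%, while comparing headcount ratios gives about 21%, so the exact figure depends on how the two growth rates are combined.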
Build SDDC capabilities so that customers can provision without negotiation and can understand their SLA and cost via the self-service portal.