31. A story about an attempt to move Rakuten Rakuma's
organization and system to microservices
Aug 4, 2022
Hiroki Kishi
EC Incubation Department > Rakuma Development Section
Rakuten Group, Inc.
32. About me
Hiroki Kishi
Biography
~2016: Backend engineer at a navigation application company
• Developed various functions and performed operation and maintenance as a Java engineer
2016-2018: Joined Fablic Co.
• Developed various functions and performed operation and maintenance as a Ruby engineer
2018-present:
• Transferred to the current Rakuma Development Section following the absorption merger with Rakuten, Inc.
• Engaged in managing various engineering organizations as an engineering manager
• Planning and implementing various measures for the expanded organization
Hobbies
• My home, my car, cooking
33. Today’s topics
Launched in 2012 as "Fril" by a venture company, Rakuten Rakuma had grown by 2022 to a scale beyond comparison with its initial launch!
I personally felt that we were approaching the point where both the system and the organization needed to be re-architected to keep up with the growth of the service, and over the past year we have been exploring the path to microservices.
Today, I would like to share some of what I worked on while re-architecting the organizational side of the service (with a little about the technical side).
I hope that the mistakes we made and the things I learned from them will be useful to you!
41. Microservice working group
Managers
Find boundaries in the domain structure and organizational structure that make up Rakuten Rakuma.
Technical specialists
Validate technical concepts and carry out the system cut-outs.
Outside advisor
43. FYI: Technical study of system microservicing
Following the model proposed by Spotify, we considered not cutting out microservices from the beginning but extracting them via a modular monolith, an approach that can be rolled back while experimenting. Once the split reaches the DB, it becomes difficult to roll back.
We extensively studied distributed transactions, operations that straddle boundaries, and communication methods between microservices (REST, GraphQL, gRPC), among other topics, and verified the results with a single endpoint.
Source: Internal document
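As a rough illustration of why the modular monolith stage stays easy to roll back, here is a minimal Python sketch. It is not the Rakuma implementation: `CampaignService`, `InProcessCampaignService`, `RemoteCampaignService`, and `build_campaign_service` are hypothetical names invented for this example, and the "transport" stands in for whatever REST, GraphQL, or gRPC client would be chosen.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Campaign:
    campaign_id: int
    title: str


class CampaignService(Protocol):
    """Public boundary of the (hypothetical) campaign module."""

    def active_campaigns(self, user_id: int) -> list[Campaign]: ...


class InProcessCampaignService:
    """Modular-monolith stage: same process and deployment, separate module."""

    def __init__(self) -> None:
        # Stand-in for data that still lives in the shared database.
        self._campaigns = {1: [Campaign(10, "Summer coupon")]}

    def active_campaigns(self, user_id: int) -> list[Campaign]:
        return list(self._campaigns.get(user_id, []))


class RemoteCampaignService:
    """Later stage: same interface, but backed by a network call."""

    def __init__(self, transport) -> None:
        # `transport` is any callable taking a user id and returning rows;
        # in a real system it would wrap a REST, GraphQL, or gRPC client.
        self._transport = transport

    def active_campaigns(self, user_id: int) -> list[Campaign]:
        rows = self._transport(user_id)
        return [Campaign(row["id"], row["title"]) for row in rows]


def build_campaign_service(use_remote: bool, transport=None) -> CampaignService:
    """Single switch point: flipping the flag back is the rollback."""
    if use_remote:
        if transport is None:
            raise ValueError("a transport is required for the remote service")
        return RemoteCampaignService(transport)
    return InProcessCampaignService()


def home_screen_titles(service: CampaignService, user_id: int) -> list[str]:
    """Caller code depends only on the boundary, not on the implementation."""
    return [c.title for c in service.active_campaigns(user_id)]
```

Because callers see only the boundary, switching between the two implementations is a one-line change. In this sketch both implementations could still read the same tables; once the data itself is split into a separate DB, that simple switch is no longer enough, which matches the rollback concern on the slide.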
47. Each area has its own mission
Vendor sellers, individual sellers, individual buyers
Listing functionalities
• Can manage inventory for vendors
Item DB
• Store large numbers of products in the data store efficiently
Search engine
• Fast search
Search, recommendation, display
• Easy to find, precise
In-app campaign
• A system that makes it easy for marketers to develop a variety of measures
Campaign (distribution)
• Speedy delivery in large amounts
Accounting
• Automation of manual tasks
• Accuracy
Shipping
• Reduction of cumbersome packaging work
• Providing a variety of convenient shipping features
Customer support
• Functionalities for staff to quickly resolve issues
• Development of features that step in to solve problems
SRE, technology infrastructure
• Stable handling of large amounts of traffic and large amounts of data, and deployment efficiency
Payment
• Support a variety of payment methods
• Reduce fees to payment vendors
48. Understanding in KPI tree
Sell and buy a lot in Rakuma
• A lot of items sold
  • Sold by individuals
  • Sold by businesses
• A lot of items bought
  • Items are found accurately
  • Items are viewed through campaigns
System1 System2 System3
Each node of the KPI tree, which breaks down the business goal, has its own KPIs and system performance requirements. When multiple teams develop on the same system, however, conflicts become more likely, which in turn worsens the developer experience (DX).
As a result, delivery slows down and system troubles occur frequently.
49. Understanding in KPI tree
Sell and buy a lot in Rakuma
• A lot of items sold
  • Sold by individuals
  • Sold by businesses
• A lot of items bought
  • Items are found accurately
  • Items are viewed through campaigns
System1 System2 System3 System4
When a system is tied to each node and the systems are deployed independently, there should be fewer conflicts.
DX, including development, maintenance, and operational efficiency, will improve, and delivery will be faster with fewer system problems (ideally).
However, if we split the system purely for technical convenience, the conflicts would continue. So we interviewed the teams about their daily work over a long period in order to carefully search for the natural breaks in the system.
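The contrast between the two KPI-tree slides can be sketched in a few lines of Python. The node names come from the slides, but the node-to-system assignments below are assumptions made only for illustration; the slides do not state which node maps to which system.

```python
from collections import defaultdict

# Leaf KPI nodes taken from the slides.
KPI_LEAVES = [
    "Sold by individuals",
    "Sold by businesses",
    "Items are found accurately",
    "Items are viewed through campaigns",
]


def shared_systems(node_to_system: dict[str, str]) -> dict[str, list[str]]:
    """Return every system that serves more than one KPI node.

    A shared system is where teams with different KPIs are likely to
    collide on releases and performance requirements.
    """
    by_system: dict[str, list[str]] = defaultdict(list)
    for node, system in node_to_system.items():
        by_system[system].append(node)
    return {s: nodes for s, nodes in by_system.items() if len(nodes) > 1}


# Assumed "before" mapping: four nodes squeezed onto three systems.
before = {
    "Sold by individuals": "System1",
    "Sold by businesses": "System1",
    "Items are found accurately": "System2",
    "Items are viewed through campaigns": "System3",
}

# Assumed "after" mapping: one independently deployed system per node.
after = dict(zip(KPI_LEAVES, ["System1", "System2", "System3", "System4"]))

print(shared_systems(before))  # {'System1': ['Sold by individuals', 'Sold by businesses']}
print(shared_systems(after))   # {}
```

A check like this only finds structural overlap. It says nothing about whether the chosen boundaries match how people actually work, which is why the interviews were needed.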
50. Criteria for selecting an area to split out
Reach
• Size of area/system
• Number of engineers involved
• Number of operators involved
• Number of users affected
Impact
• Debt repayment: magnitude of
issue and effect of resolution
Confidence
• Expected likely effect (risk of
counterproductive effect)
• Domain proficiency of team
members
Effort
• Execution difficulty
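These four criteria are the components of the RICE framework, whose commonly used score is Reach × Impact × Confidence ÷ Effort. The following Python sketch shows how candidate areas could be ranked that way. The slides do not say that numeric scores were actually computed, and every number below is a made-up placeholder, not data from Rakuma.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    reach: float       # e.g. engineers, operators, and users affected
    impact: float      # relative scale, e.g. 1 (low) to 3 (high)
    confidence: float  # 0.0 to 1.0
    effort: float      # e.g. person-months; must be positive

    def rice_score(self) -> float:
        """Standard RICE score: reach * impact * confidence / effort."""
        if self.effort <= 0:
            raise ValueError("effort must be positive")
        return self.reach * self.impact * self.confidence / self.effort


def rank(candidates: list[Candidate]) -> list[Candidate]:
    """Highest score first."""
    return sorted(candidates, key=lambda c: c.rice_score(), reverse=True)


# Placeholder values for illustration only.
candidates = [
    Candidate("Payment", reach=100, impact=2, confidence=0.75, effort=20),
    Candidate("Campaign (distribution)", reach=40, impact=3, confidence=0.5, effort=6),
]

for c in rank(candidates):
    print(f"{c.name}: {c.rice_score():.1f}")
# Campaign (distribution): 10.0
# Payment: 7.5
```

A single number hides the qualitative factors listed under each criterion, such as team proficiency, so a score like this is better treated as a starting point for discussion than as a decision.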
51. Example
Campaign functions (distribution)
Speedy mass distribution
• Clearly used by different people, so it is easy to separate.
• Needs separate performance tuning
• Can be decoupled and scaled individually, with different languages, etc.
• Lack of team proficiency
• An area that requires many unexpected responses
• The area itself is expanding
• Constant resource shortages
• Teams absorb spillover work
• The domain and the engineers and operators involved are limited in number, not too many.
• Target: Users whose retention we want to increase (a lot)
• The incentive logic is insufficient, or performance cannot keep up with the volume we want to deliver, so eliminating these negatives would have a large effect
• Development potential is great because there is no limit to the number of appeals that can be drawn out
Good as an area choice, but difficult to start now...
Reach
• Size of area/system
• Number of engineers involved
• Number of operators involved
• Number of users affected
Impact
• Debt repayment: magnitude of
issue and effect of resolution
• Developmental: Prospects for
significant changes in the future
Confidence
• Expected likely effect (risk of
counterproductive effect)
• Domain proficiency of team
members
Effort
• Execution difficulty
52. Due to the departure* and parental leave of the members driving the initiative, the working group could no longer continue, so the move to microservices was stopped.
*The departure was not caused by this project.
54. Retrospective and outlook
No position from which to talk with the business organization on an equal footing
• We failed to build a foundation for quantifying system issues as business issues and proposing a balance with business projects.
• Also, no one led the entire development organization, so proposals risked being taken as arguments from the proposer's own position.
Management could not devote enough time and money to considering microservices.
• Because of their wide perspective, they have many other responsibilities and roles to play.
Next action
• Since conducting interviews from scratch is not efficient, we will start with a small number of people to make the process more efficient.
• We want to add a full-time member with authority.
57. Wrap-up
• A service that has been evolving for 10 years will lose development efficiency unless it is re-architected.
• “Re-architecting” applies not only to the system but also to the organization.
• Aim for a state where the system is neatly connected to the KPI tree of the business.
Don’t stop after a single trial. Keep going.
• Just clarifying the complicated linkage between the KPI tree and the system is extremely difficult.
It cannot be done in your spare time.
• We tried to conceptualize it in many ways besides the shop metaphor and the RICE framework.
• We need a role that can translate system issues into business issues and can develop, promote, and maintain specific plans for the future.
58. One more thing…
• Beyond microservices, it is an eternal challenge for development organizations to keep a system healthy by building code cleanup, such as re-architecture, refactoring, and technical debt elimination, into the business plan.
• Acquiring vocabulary that others can understand means being able to tell a convincing story. That requires a deep understanding of the organization's mission, KPI structure, and technical and organizational architectural methods, and a realistic plan derived from that understanding (in that sense, it is not just about vocabulary).
• In other words, we need someone who is not locked into a specific business area, is responsible for the entire development organization, and can coordinate with the business organization.
i.e. CTO…?
To be continued...
Source: Internal document