Choosing a repository structure is one of the most important decisions an engineering team makes. Where possible, the decision is made at the start of the project.
In most enterprises, teams tend to follow a similar pattern in their choice of monorepo versus multi-repo. Google, for example, uses a monorepo, which means its projects and code are stored in a single, large repository.
But which approach should we take - monorepo or multi-repo?
What are the pros/cons of each? Why did I advocate for multi-repo with all clients that I worked with? Are there criteria to help teams decide?
This session will discuss the context, challenges and advantages of each of the two approaches.
2. Thank you all for being here.
And a big thank you to the DevOps Institute for
hosting this event and inviting me over.
I hope you all have a wonderful session.
3. A brief about me -
• Currently, I am working as an Assistant Director,
Cloud Practice at EY (Ernst & Young).
• Before that, I led the CCOE (Cloud
Centre of Excellence) team at Accenture.
• I have more than 10 years of IT experience, and have
been working on public cloud technologies since 2017.
4. Expectation management -
• This is not an introductory session on how to get started
with source control repositories or Git.
• It is expected that the audience for this session is already
familiar with Git, or any other source control repository
tool.
• This talk is primarily targeted at software developers,
DevOps architects, and build/release engineers who want
to know more about different repository structures, their
pros and cons, and how to choose between them.
5. Agenda -
• Section I - An introduction to Monorepo, and Polyrepo.
• Section II - How does Google do Monorepo.
• Section III - Monorepo vs Polyrepo: Advantages and Disadvantages.
• Section IV - What or how to choose?
• Section V - Conclusion and moving forward.
7. Section I - An introduction to Monorepo, and Polyrepo.
8. What is a repository?
In version control systems, a repository is a data structure
that stores metadata for a set of files or a directory structure.
9. Types/patterns of repo
Primarily, there are two types or patterns
when it comes to repository structures.
10. But, why should you be interested?
The way an enterprise or software company organizes
its codebase may seem like a trivial topic, but
in reality it has a huge impact on
- how fast the development teams can make changes.
- how fast they can get those changes released in production.
- how well developers can communicate and
collaborate with each other.
- how fast engineering can deliver credible
business value to the end users.
12. What is a Monorepo?
• It is a single repository that contains more than one logical
project (e.g. an iOS client and a web-application)
• These projects are most likely unrelated, loosely connected
or can be connected by other means (e.g. via dependency
management tools)
• The repository is large in many ways:
- Number of commits
- Number of branches and/or tags
- Number of files tracked
- Size of content tracked (as measured by looking at the
.git directory of the repository)
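Two of the size measures above can be approximated straight from the file system. The following is a minimal sketch, not a definitive tool: the function names are made up for illustration, and counting working-tree files is only a rough proxy for "files tracked". Commit and branch counts need Git itself and are not covered here.

```python
import os
from pathlib import Path


def directory_size_bytes(root: Path) -> int:
    """Sum the sizes of all regular files under root (symlinks are skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            if not path.is_symlink():
                total += path.stat().st_size
    return total


def count_files(root: Path, skip: str = ".git") -> int:
    """Count files under root, ignoring the version-control directory."""
    count = 0
    for _dirpath, dirnames, filenames in os.walk(root):
        # Prune the metadata directory in place so os.walk does not enter it.
        dirnames[:] = [d for d in dirnames if d != skip]
        count += len(filenames)
    return count
```

Calling `directory_size_bytes(Path("my-repo") / ".git")` gives the size of the repository metadata, and `count_files(Path("my-repo"))` gives the number of files in the checkout, where `my-repo` is a placeholder path.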
13. Sample Monorepo structure
The Monorepo structure consists of a single
code repository with a hierarchical directory
structure that includes several projects.
A proposed structure is shown here, where each
microservice/project/application can be
owned by a different team.
One single repo
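Because every project sits in its own top-level directory of the single repo, tooling can work out which projects a change touches just by looking at file paths. The sketch below illustrates that idea under assumptions: the project names (apple, banana, grocery) are borrowed from the Polyrepo example later in this deck, and the team names and the `affected_projects` helper are hypothetical.

```python
from pathlib import PurePosixPath

# Hypothetical top-level project directories and their owning teams.
PROJECT_OWNERS = {
    "apple": "team-apple",
    "banana": "team-banana",
    "grocery": "team-grocery",
}


def affected_projects(changed_files: list[str]) -> set[str]:
    """Return the projects whose directory contains at least one changed file."""
    affected = set()
    for changed in changed_files:
        parts = PurePosixPath(changed).parts
        # Only the first path component decides which project a file belongs to.
        if parts and parts[0] in PROJECT_OWNERS:
            affected.add(parts[0])
    return affected
```

A build system could use such a mapping to build and test only the affected projects, and to ask only the owning teams for review, rather than treating every commit as a change to the whole repo.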
14. Who uses Monorepo?
• Google uses a homegrown version-control system
to host one large codebase visible to, and used by,
most of the software developers in the company.
• Google's monolithic software repository is used by
95% of its software developers worldwide.
15. Who uses Monorepo?
• With thousands of commits a week across hundreds of thousands
of files, Facebook’s main source repository is enormous — many
times larger than even the Linux kernel, which checked in at 17
million lines of code and 44,000 files in 2013.
• In its performance tests, Facebook used a test repository
with the following characteristics:
- 4 million commits
- Linear history
- ~1.3 million files
- The size of the .git directory was roughly 15GB
- The size of the index file was 191MB
16. What is a polyrepo?
A polyrepo architecture means using multiple
repositories, rather than one repository.
For example, a polyrepo can use a repo for a web app
project, a repo for a mobile app project, and a repo
for a server app project.
Polyrepo is also known as many-repo or multi-repo.
17. Sample Polyrepo structure
As you can see, in a polyrepo, each logical
unit of the code (in this case, services/domains)
gets its own individual repo.
So, in this case, the Monorepo shown earlier has
been broken down into 3 separate repositories.
One repo for apple
One repo for banana
One repo for grocery
18. High-level summary of Section I –
Two types of repo structure are available today:
Monorepo, where you have all projects/products of an
enterprise in a single repo
versus
Polyrepo, where you create multiple repositories,
each one dedicated to a project/product/logical
segregation of a product.
19. Section II - How does Google do Monorepo.
20. Scale of Google Monorepo
Used by 95% of engineers at Google, as of 2015 data
24. High-level summary of Section II –
Google has been able to successfully adopt Monorepo at huge scale,
across billions of lines of code.
While doing that, it has enjoyed the benefits that come with adopting
Monorepo, like code reuse, easier dependency management, etc.
But at the same time, to be able to use Monorepo properly across
such a big enterprise, it had to make significant investments
of both time and effort in its tooling, practices and patterns.
25. Section III - Monorepo vs Polyrepo: Advantages and Disadvantages.
26. Advantages and Disadvantages of Monorepo
Advantages:
• Single source of truth
• Strong collaboration across teams
• Standard coding, architectural and testing patterns
• Simplified dependency management
• Easy refactoring and code reuse
Disadvantages:
• IDE lag
• Git slowdown
• Broken master
• Long build times
• Codebase complexity
• Tooling investment
27. Monorepos are sometimes called
monolithic repositories, but they
should not be confused with monolithic
architecture, which is a
software development practice
for writing self-contained applications.
28. Advantages and Disadvantages of Polyrepo
Advantages:
• Clear/strong ownership
• Smaller code base
• Narrow clones
• Fast build times
• Isolated master breakage
Disadvantages:
• Integration issues
• Code searching and sharing
• Functionality duplication
• Dependency hell
• Silos
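One of the Polyrepo disadvantages on this slide, dependency hell, often shows up as different repos pinning different versions of the same shared library. The following is a rough sketch of how that drift could be detected, assuming each repo's pinned dependencies have already been collected into a dictionary. The repo names, library names and the `find_version_conflicts` helper are all hypothetical.

```python
from collections import defaultdict


def find_version_conflicts(
    repo_dependencies: dict[str, dict[str, str]],
) -> dict[str, dict[str, list[str]]]:
    """Map each library pinned at more than one version to {version: [repos]}."""
    versions_by_library = defaultdict(lambda: defaultdict(list))
    for repo, dependencies in repo_dependencies.items():
        for library, version in dependencies.items():
            versions_by_library[library][version].append(repo)
    # Keep only the libraries on which the repos disagree.
    return {
        library: {version: sorted(repos) for version, repos in versions.items()}
        for library, versions in versions_by_library.items()
        if len(versions) > 1
    }


# Hypothetical pinned dependencies, one entry per repository.
EXAMPLE_REPOS = {
    "apple": {"shared-auth": "1.2.0", "http-client": "2.0.0"},
    "banana": {"shared-auth": "1.4.0", "http-client": "2.0.0"},
    "grocery": {"shared-auth": "1.2.0"},
}
```

With the example data, `find_version_conflicts(EXAMPLE_REPOS)` reports `shared-auth` as conflicting and leaves `http-client` out, since every repo that uses it pins the same version. In a Monorepo this kind of check is usually unnecessary, because there is one copy of each shared library.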
30. Can I change from Monorepo to Polyrepo and vice-versa?
Yes, you can, and you may have to, depending on the
state of your application/company at a given time.
Uber moved from Monorepo to Polyrepo and back again.
31. Section IV - What or how to choose?
32. So, what should I choose?
The answer, like many things in
software architecture, is -
33. Monorepo vs Polyrepo
It is very much like the concept of a glass
being half-empty or half-full.
This topic has itself been the subject of a
never-ending debate among engineers and software
professionals.
Nowadays, this is more of a subjective choice
that an organization makes and is related to
its engineering team’s toolchain and culture.
34. Don’t follow big companies blindly
Many times in enterprises, engineers tend to look at big
companies, and try to replicate their practices and patterns,
without understanding the monumental effort that has been
invested in making those initiatives work for them, at scale.
It’s never a good idea to just go with an approach because
Google/Facebook/Twitter said so.
Instead, the enterprise engineering team should try to measure
how the approaches impact the organisation, and then take a
decision that works for them.
35. Section V - Conclusion and moving forward.
36. A forward thinking approach
What if we could take the advantages of both
approaches, minimize the disadvantages,
and come up with our own repo structure
that works for our company and our teams?
The ultimate goal is to come up with a repo
structure that facilitates faster development,
and does not get in the way of your
development teams.
37. You need to strike a delicate
balance between not storing
everything in a single repo and,
at the same time, keeping the
number of repositories to a
manageable level.
38. Hybrid repo
Any solution should be tailored
for the specific requirements and
needs of the organization.
For example:
- One single Monorepo for all
infrastructure libraries,
cross-SL API, security
baselines, etc.
- Individual service and utility teams
continue to live in their own separate
repositories, one per team.
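The hybrid rule in this example can be written down as a simple placement function. This is only a sketch under assumptions: the repo name `platform-monorepo`, the component kinds and the `<team>-repo` naming scheme are invented for illustration, and each organization would define its own.

```python
# Hypothetical name of the one shared Monorepo.
SHARED_MONOREPO = "platform-monorepo"

# Hypothetical kinds of component that live in the shared Monorepo.
SHARED_KINDS = {"infrastructure-library", "cross-sl-api", "security-baseline"}


def repo_for(component_kind: str, team: str) -> str:
    """Return the repository a component belongs in under the hybrid layout."""
    if component_kind in SHARED_KINDS:
        # Shared building blocks all go into the single Monorepo.
        return SHARED_MONOREPO
    # Everything else lives in the owning team's own repository.
    return f"{team}-repo"
```

Keeping the rule this explicit makes it easy to review and change as the organization grows, which matters because the right split is expected to shift over time.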