See the webinar: http://perforce.com/resources/presentations/webinars/dev-talk-avoid-git-bloat-submodule-hell
How do you avoid the burden of working with bloated, monolithic Git repositories and sidestep the complexity of submodules?
Perforce Git Fusion lets you incrementally break up large Git repositories into small functional repositories. You can break off chunks for different teams and build localized processes around those smaller units, while also maintaining the "big" build. That way, you’ve got a pathway for migrating to a much more Agile delivery system.
Join Perforce engineer and Git user, Russ Tremain, as he discusses how Git Fusion can help you…
* Avoid the hassle of splitting up large repos using standard Git tools
* Create new repositories—either in Git or in Perforce—by picking and choosing the content you want
* Prepare for your growth in products and teams
PRESENTED BY: Russ Tremain
Build Engineer, Perforce
Russ Tremain is a veteran Software Engineer who currently specializes in advanced automation frameworks for software build, test, and release. He holds degrees in Computer and Information Science, and Information Studies from UC Santa Cruz and UC Berkeley, respectively. Russ has authored and actively participates in several open source projects, including the "Cado" language, which he uses to develop structured source code transformations.
TODAY’S PRESENTATION
Hello, my name is Russ Tremain, and today I'm going to talk a little bit about how we have used Git Fusion here at Perforce to solve some interesting integration problems. I will cover some basic ways in which you can use Git Fusion to integrate content from different sources into your company's content.
But first, a little background.
LINUS’S GIT
Git was developed by Linus Torvalds to support Linux kernel development, and is geared to adding a constant flow of patches from numerous contributors to the hundreds of Linux kernel modules that are now available. Often, these patches are transmitted via email lists, and Git has facilities for easily generating and applying patches from email messages.
WHY DEVELOPERS LIKE GIT
Paramount among the features that Linus wanted is the ability to see the entire history of a source file that is compiled into the Linux kernel. Why should a developer have to hook up to a central repository in order to see a diff?
The answer is, the developer should not have to! At the time, many other software developers had arrived at a similar conclusion, and there was a period when Mercurial, Bazaar, and other SCMs blossomed to support the distributed development model.
GIT, THE SORCERER’S APPRENTICE
Git has become enormously popular for development in recent years. There are now millions of Git repositories out in the wild. The key word here is "millions" - there is a reason for that. Many of these repositories are merely "clones" of others, and do not contain unique content. An organization adopting Git as a pure development tool thus becomes embroiled in a sort of Sorcerer's Apprentice problem - cloned repositories soon become yet another problem that has to be managed. The flow of commits between all of these repositories can be difficult for one developer to manage on an individual laptop, let alone for an IT organization that is responsible for securing intellectual property for hundreds or even thousands of employee contributors.
GIT IN THE REAL WORLD
Various solutions have evolved to address the problem of out-of-control cloning. Many organizations adopt a Central Repository model, where Git developers essentially are required to work in a centralized flow that is more widely understood. Former Perforce VP of Product Technology Laura Wingerd popularized this idea in her book "Practical Perforce", ironically in the same year that Git was first released. The Central Repository Model, or "Mainline Model", as Wingerd calls it, is very popular, because it provides a simple, well understood method that can support any organization's software development workflows in a standard way.
EMBRACING GIT IN THE ENTERPRISE
By now, Git has essentially won the distributed SCM wars, and has become the dominant choice for most new open source projects, and for many new companies. Git has become a core skill for new software engineers, and so supporting Git for these developers is an imperative that Perforce recognized early, and as a result developed Git Fusion.
GIT FUSION ALLOWS DEVELOPERS TO TAKE CHARGE
With Git Fusion, Perforce elevates support for Git to the Enterprise level, allowing the seamless inter-operation of Git clients with back-end Perforce repositories. Developers thereby gain the ability to leverage a wide variety of Git open source technology and content for software development, while IT gets all the cool redundancy, brokering, and replication features that Perforce offers.
So what are some of these advantages to the software developer?
To start with, let's talk about submitting a Git Repository to Perforce.
Once a developer has a Perforce account, and IT has provided a Git Fusion server, that developer can re-parent Git repositories directly into a Perforce client view. This is a key concept - it is the Perforce Client View that can be leveraged not only to support the incorporation of existing Git repositories, but also for re-mapping those repositories into new Perforce views, and thus into new Git repositories.
This can all be controlled by the developer with minimal support from IT, using the developer's existing Perforce credentials. IT can go about its business.
DEVELOPERS CAN MANAGE THEIR OWN IMPORTS
Secondly, how a software project evolves into Perforce is totally under the control of the developer. Does the code come from CVS? Subversion? Mercurial? RCS? Even SCCS? This is only of concern to the developer, and there are a wide variety of tools that can be used to migrate those sources into Git. Once it is in Git form, one merely reparents the repository to point to a Perforce view, and the migration is complete.
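As a rough sketch of what "reparenting" means on the Git side: the repository's remote is pointed at its new home and the history is pushed there. In the example below a local bare repository stands in for the repository served by Git Fusion. The directory names are hypothetical, and the real remote URL would be whatever your Git Fusion administrator provides.

```shell
#!/bin/sh
# Sketch: re-parent an existing Git repository onto a new remote and push it.
# A local bare repository is used as a stand-in for the Git Fusion server.
set -eu

work=$(mktemp -d)

# Stand-in for the repository served by Git Fusion (hypothetical name).
git init -q --bare "$work/fusion-standin.git"

# An existing project that has already been migrated into Git form.
git init -q "$work/project"
cd "$work/project"
git -c user.name="Dev" -c user.email="dev@example.com" \
    commit -q --allow-empty -m "Imported history"

# Re-parent: point "origin" at the new home. If "origin" already exists,
# use "git remote set-url origin <new-url>" instead of "git remote add".
git remote add origin "$work/fusion-standin.git"

# Push every branch and tag to the new parent.
git push -q origin --all
git push -q origin --tags

# Count the commits that arrived on the stand-in remote.
pushed=$(git --git-dir="$work/fusion-standin.git" rev-list --all --count)
echo "commits pushed: $pushed"
```

The same two steps (set the remote, push everything) apply whatever tool was used to get the sources into Git in the first place.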
REFACTORING GIT WITH PERFORCE GIT FUSION
Third, after the repository is in Perforce, it can be remapped to any number of views that then become Git repositories as required by functional needs.
For example, suppose you have a large, bloated Git repository that has grown to one or two gigabytes in size. Under normal circumstances, the source code for the project may only be a small percentage of that overall size, and yet when the developer creates a local clone, all of the content is brought down. Git-based solutions to this problem have evolved, such as "shallow" cloning, but this complicates the developer’s workflow.
A better solution would be to use Git Fusion to re-map the repository into functional views, such as a view designed for tech-pubs, one designed for QA, another for development, and finally, a "build" view that may encompass all of the other views, in order to drive a full-scale release process.
I personally have had to do this sort of mapping in Git, and it was a hellish project that I would not want to have to repeat. Try explaining “filter-branch” to your mom.
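For comparison, here is a minimal sketch of the standard-Git route alluded to above: splitting one directory out of a repository with git filter-branch. The repository layout (src and tests directories) is invented for illustration. A real split usually also has to deal with multiple branches, tags, and large binary history, which is where the pain comes from.

```shell
#!/bin/sh
# Sketch: carve the tests/ directory out of a repository using plain Git.
set -eu

# Newer Git versions print a warning and pause before filter-branch runs;
# this variable suppresses that.
export FILTER_BRANCH_SQUELCH_WARNING=1

work=$(mktemp -d)

# A small stand-in for the "big" repository (hypothetical layout).
git init -q "$work/big"
cd "$work/big"
mkdir src tests
echo "code" > src/main.txt
echo "check" > tests/case.txt
git add .
git -c user.name="Dev" -c user.email="dev@example.com" \
    commit -q -m "Add src and tests"

# Work on a clone so the original history is left untouched.
git clone -q "$work/big" "$work/tests-only"
cd "$work/tests-only"

# Rewrite history so that only tests/ remains, promoted to the repo root.
git filter-branch --subdirectory-filter tests HEAD >/dev/null 2>&1

# List what is left in the rewritten repository.
files=$(git ls-files)
echo "$files"
```

Every additional view (tech-pubs, QA, build) would need its own rewritten clone, and none of them would stay in sync with the original afterwards.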
HOW TO REMAP YOUR GIT REPOSITORY
To demonstrate how easy this is, here's an example where I reparent and submit my Cado open source project to Perforce, and then remap it into 3 new views - one containing the base source code, one containing the regression tests, and another containing the Cado template sources.
WHAT IS CADO?
Briefly, Cado is a template-based code-generation language that can be used to transform or create textual content. I have used it to convert wiki mark-up languages, generate large build systems, generate Java class hierarchies, etc. Cado is a programmer's friend.
Diagram of Demo
DEMO
MANAGING OUTSIDE CONTRIBUTORS
Okay, that's all pretty cool, but let's think about the problem from another angle: using Git Fusion to manage workflow from outside contractors.
Engineering teams sometimes need help, and hire outside contractors to add capacity during crunch times. Often, management resists giving those developers full access to the company resources. IT may not be well set up to monitor and grant privileges to contractors.
With Git Fusion, contractors can be given a view into the source repository adequate to support their development, without requiring full access to a company's network. There are multiple options for managing the outside development, without ever exposing Perforce servers outside of the firewall. Here is a simple way you might approach the problem.
[DIAGRAM]
The above diagram shows one approach, in which Xyz Corporation has hired Acme Consulting to help on project X. The project manager at Acme Consulting uses Git Fusion to clone the remote repository hosted at Xyz Corp:
$ git clone --bare xyzcorp:x
$ cd x.git
$ git config receive.denyNonFastForwards true
This can be a “bare” Git repository, and we recommend that you require straight-line merges, i.e., set receive.denyNonFastForwards to true in your shared clone of the Git Fusion repository at Xyz Corp. This will make integration with the development team at Xyz easier.
In this scenario, Xyz Corp has decided to manage the contributions from Acme via a single Perforce user account, which is called “acme”. However, Acme may have several developers working on the project at their end, and so the project manager at Xyz Corp keeps track of them in the Git Fusion user map, which is stored in the Perforce repository in the file //.git-fusion/users/p4gf_usermap. Unknown developers will not be allowed to integrate. From the Perforce repository view, all submits will be done by the “acme” user account, but the original contributor’s name will be retained in the Git history.
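One way a project manager might work out which contributors need an entry in the user map is to list the distinct authors recorded in the Git history. The sketch below uses a throwaway repository with made-up author names and addresses. The syntax of the user map itself is described in the Git Fusion manual and is not shown here.

```shell
#!/bin/sh
# Sketch: list the distinct commit authors in a repository.
set -eu

work=$(mktemp -d)
git init -q "$work/x"
cd "$work/x"

# Three commits from two hypothetical Acme developers.
git -c user.name="Ann" -c user.email="ann@acme.example" \
    commit -q --allow-empty -m "one"
git -c user.name="Bob" -c user.email="bob@acme.example" \
    commit -q --allow-empty -m "two"
git -c user.name="Ann" -c user.email="ann@acme.example" \
    commit -q --allow-empty -m "three"

# Author name and email are stored in each commit, independent of
# whichever account later pushes the commits.
authors=$(git log --format='%an <%ae>' | sort -u)
echo "$authors"
```

Because the author fields travel with each commit, the individual contributor remains visible in the Git history even when every submit arrives through the shared account.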
Meanwhile, the build team for Xyz Corp can continue to use their existing build infrastructure, driven by Perforce submits. No change in their process is necessary, other than the normal work of incorporating new product code for release.
Diagram
HOW WE USE GIT AT PERFORCE
As Git Fusion has evolved, we have used it in several different ways at Perforce.
First of all, the GF dev team uses Git. Drink your own Ale.
In build, we have used it to support versioning of Jenkins configurations. Jenkins doesn’t know that all of its configurations are being stored in Perforce, because it is using Git as its source control interface to version the configuration edits.
I have used Git Fusion in a couple of different ways. The first was to migrate a software project for dumping out Electric Commander sources. I initially was working in CVS, because some of the other pieces I needed were also in CVS. I then migrated it to Git, and from there into Perforce.
On the same project, I used Git as an interface to Perforce in order to develop the SCM synchronization for the Ecdump project. This meant that I could make all of my mistakes in throw-away Git repos. Once the synchronization piece was done, I flipped a switch and sent it into Perforce via Git Fusion. There are some more details of how I did this in Part II of my blog series on the Perforce blog site.
Once the EC configurations are safe and sound in the Perforce repository, we can use any of the Perforce visual tools to figure out what happened when.
The Ecdump project is now released to open-source, so that our customers can benefit. When I released it, I only had to re-parent the local repository to point to my open-source site, and push it on out. I can incorporate any fixes via the external-GF-Perforce pathway.
SUMMARY
Git Fusion offers new options for working: as we showed in the demo, some teams can adopt Git for their development, while other teams can use the visual and drag-and-drop interfaces to Perforce. We do it here, at Perforce.
We have also shown how you can solve the remapping problem for Git, using Perforce, for FREE! Did I mention FREE before?
Give it a shot!
RESOURCES
Blog, part 1: http://www.perforce.com/blog/130702/using-git-api-perforce-part-1
Blog, part 2: http://www.perforce.com/blog/130722/using-git-api-perforce-part-2
Download Git Fusion: http://perforce.com/product/components/git-fusion
Git Fusion Manual: http://www.perforce.com/perforce/doc.current/manuals/git-fusion/
Cado & VSPMS open source home: http://github.com/russt/
Cado downloads: http://sourceforge.net/projects/cado/files/