10. Originally we just wanted to make a git hosting site.
In fact, that was the first tagline.
11. git repository hosting
git repository hosting.
That's what we wanted to do: give us and our friends a place to share git repositories.
12. It's not easy to set up a git repository. It never was.
But back in 2007 I really wanted to.
13. I had seen Torvalds' talk on YouTube about git.
But it wasn't really about git - it was more about distributed version control.
It answered many of my questions and clarified DVCS ideas.
I still wasn't sold on the whole idea, and I had no idea what it was good for.
14. CVS
is stupid
But when Torvalds says "CVS is stupid"
15. and so are
you
"and so are you," the natural reaction for me is...
17. At the time the biggest and best free hosting site was repo.or.cz.
18. Right after I had seen the Torvalds video, the god project was posted up on repo.or.cz
I was interested in the project so I finally got a chance to try it out with some other people.
19. Namely this guy, Tom Preston-Werner.
Seen here in his famous "I put ketchup on my ketchup" shirt.
20. I managed to make a few contributions to god before realizing that repo.or.cz was not different.
git was not different.
Just more of the same - centralized, inflexible code hosting.
21. This is what I always imagined.
No rules. Project belongs to you, not the site. Share, fork, change - do what you want.
Give people tools and get out of their way. Less ceremony.
22. So, we set off to create our own site.
A git hub - learning, code hosting, etc.
26. What's special about GitHub is that people use the site in spite of git.
Many git haters use the site because of what it is: more than a place to host
git repositories, it's a place to share code with others.
27. a brief
history
So that's how it all started.
Now I want to (briefly) cover some milestones and events.
29. 2008 january
We launched the beta in January at Steff's on 2nd Street in San Francisco's SOMA district.
The first non-GitHub user was wycats, and the first project was merb-core.
They wanted to use the site for their refactoring and 0.9 branch.
30. 2008 april
A few short months after that we launched to the public.
31. 2009 january
In January of this year, we were awarded the "Best Bootstrapped Startup"
by TechCrunch.
32. 2009 april
Then in April we were featured as some of the best young tech entrepreneurs
in BusinessWeek.
(Finally something to show mom)
33. 2009 june
Our Firewall Install, something we'd been talking about since practically
day one, was launched in June of 2009.
34. 2009 september
And in September we moved to Rackspace, our current hosting provider.
(Which some of you may have noticed.)
35. Along the way we managed to pick up Scott Chacon, our VP of R&D
42. .com as opposed to FI, which I'm not going to get into today.
You'll have to invite PJ out if you want to hear about that.
43. the
web app
As everyone knows, a web "site" is really a bunch of different components.
Some of them generate and deliver HTML to you, but most of them don't.
Either way, let's start with the HTMLy parts.
44. rails
We use Ruby on Rails 2.2.2 as our web framework.
It's kept up to date with all the security patches and includes custom patches we've added
ourselves, as well as patches we've cherry-picked from more recent versions of Rails.
45. We found out Rails was moving to GitHub in March 2008, after we had reached out to
them and they had turned us down.
So it was a bit of a surprise.
46. rails
But there are entire presentations on Rails, so I'm not going to get further
into it here.
As for whether it scales or not, we'll let you know when we find out. Because so far
it hasn't come close to presenting a problem.
48. We badly wanted this, but didn't want to invest the time upgrading.
So using a few open source libraries we've wrapped our Rails 2.2.2 instance in Rack.
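The trick is just the Rack calling convention: anything with a `call(env)` method returning `[status, headers, body]` is a Rack app, so a pre-Rack dispatcher can be hidden behind that interface. A minimal sketch (all class names invented; `LegacyDispatcher` stands in for the Rails 2.2.2 dispatcher, not our actual code):

```ruby
# LegacyDispatcher stands in for the Rails 2.2.2 dispatcher; any object
# with a call(env) method returning [status, headers, body] is a Rack app.
class LegacyDispatcher
  def call(env)
    [200, { "Content-Type" => "text/html" }, ["<h1>hello from rails</h1>"]]
  end
end

# A trivial middleware in the spirit of Rack::Bug: it wraps another app
# and decorates the response on the way out.
class ServerHeader
  def initialize(app, name)
    @app  = app
    @name = name
  end

  def call(env)
    status, headers, body = @app.call(env)
    [status, headers.merge("X-Served-By" => @name), body]
  end
end

app = ServerHeader.new(LegacyDispatcher.new, "github-fe1")
status, headers, _body = app.call({})
```

Once the app speaks Rack, middleware stacks up in front of it without the framework knowing or caring.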
49. Now we can use awesome Rack middleware like Rack::Bug in GitHub
50. In fact, the Coderack competition is about to open voting to the public this week.
Coders created and submitted dozens of Rack middleware for the competition.
I was a judge so I got to see the submissions already. Some of my favorites
were...
56. unicorn
- 0 downtime deploys
- protects against bad rails startup
- migrations handled the old-fashioned way
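The zero-downtime part is unicorn's SIGUSR2 upgrade: a new master boots the app alongside the old one, and only once the new workers come up cleanly does the old master get told to quit, which is also what protects against a bad Rails startup. A unicorn.rb sketch in the style of unicorn's own example config (not our actual settings):

```ruby
# config/unicorn.rb -- a sketch, not GitHub's actual configuration.
worker_processes 4
preload_app true      # boot the app in the master; a broken startup
                      # fails the new master before it serves traffic

before_fork do |server, worker|
  # On SIGUSR2 unicorn re-execs a new master next to the old one.
  # When the new workers fork cleanly, quit the old master: that is
  # the zero-downtime deploy.
  old_pid = "#{server.config[:pid]}.oldbin"
  if File.exist?(old_pid) && server.pid != old_pid
    begin
      Process.kill(:QUIT, File.read(old_pid).to_i)
    rescue Errno::ENOENT, Errno::ESRCH
    end
  end
end
```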
57. nginx
For serving static content and slow clients, we use nginx
nginx is pretty much the greatest http server ever
it's simple, fast, and has a great module system
70. smoke
Kinda.
Eventually we needed to move all of our git repositories off of our web servers.
Today our HTTP servers are distinct from our git servers. The two communicate using smoke
71. smoke
"Grit in the cloud"
Instead of reading and writing from the disk, Grit makes Smoke calls
The reading and writing then happens on our file servers
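The shape of the idea, as a sketch (all names here are invented; the real Smoke sends these calls over the network to a file server rather than to an in-process stub):

```ruby
# FileServerStub pretends to be a backend machine with the repo on
# local disk; in production this code runs on the file servers.
class FileServerStub
  def rev_parse(repo, ref)
    { "repo" => repo, "ref" => ref, "sha" => "a" * 40 }
  end
end

# The Grit-facing side: same questions as before ("what sha is HEAD?"),
# but answered remotely. In production this would be an RPC call
# instead of a local method send.
class SmokeClient
  def initialize(backend)
    @backend = backend
  end

  def call(method, *args)
    @backend.send(method, *args)
  end
end

smoke  = SmokeClient.new(FileServerStub.new)
result = smoke.call(:rev_parse, "mojombo/god", "HEAD")
```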
73. bert-rpc
bert : erlang ::
json : javascript
BERT is a serialization format based on Erlang's binary term format
BERT-RPC is really great at dealing with large binaries
Which is a lot of what we do
74. bert-rpc
we have four file servers, each running bert-rpc servers
our front ends and job queue make RPC calls to the backend servers
77. chimney
We have a proprietary library called chimney
It routes the smoke. I know, don't blame me.
78. chimney
All user routes are kept in Redis
Chimney is how our BERT-RPC clients know which server to hit
It falls back to a local cache and auto-detection if Redis is down
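A toy version of that lookup order, with a Hash standing in for Redis (names and structure are illustrative, not Chimney's actual code):

```ruby
class Chimney
  def initialize(redis)
    @redis = redis
    @local = {}              # last known routes, used if Redis is down
  end

  def server_for(user)
    host = begin
      @redis[user]           # primary: ask Redis for the user's file server
    rescue
      nil                    # Redis unreachable: fall through
    end
    host ||= @local[user]    # fallback: local cache of past answers
    @local[user] = host if host
    host
  end
end

routes = Chimney.new({ "mojombo" => "fs1.example.com" })
```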
79. chimney
It can also be told a backend is down.
We optimized for "connection refused," but in reality that wasn't the real problem.
80. proxymachine
All anonymous git clones hit the front end machines
the git-daemon connects to proxymachine, which uses chimney to proxy your
connection between the front end machine and the back end machine (which holds
the actual git repository)
very fast, transparent to you
82. ssh
Sometimes you need to access a repository over ssh
In those instances, you ssh to an fe and we tunnel your connection to
the appropriate backend
To figure that out we use chimney
83. jobs
We do a lot of work in the background at GitHub
92. solr
Solr is basically an HTTP interface on top of Lucene. This makes it pretty simple
to use in your code.
We use solr because of its ability to incrementally add documents to
an index.
93. Here I am searching for my name in source code
94. solr
We've had some problems making it stable but luckily the guys at Pivotal
have given us some tips
Like bumping the Java heap size.
Whatever that means
100. fragments
Formerly we invalidated most of our fragments using a generation scheme,
where you put a number into a bunch of related keys and increment it
when you want all those caches to be missed (thus creating new cache
entries with fresh data)
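The scheme looks roughly like this, with a Hash standing in for memcached (key names invented):

```ruby
cache = {}   # stands in for memcached

# Every key for a resource embeds that resource's generation number.
def generation(cache, name)
  cache["gen:#{name}"] ||= 1
end

def fragment_key(cache, name, part)
  "#{name}:v#{generation(cache, name)}:#{part}"
end

# Write a fragment under the current generation (v1).
cache[fragment_key(cache, "repo/42", "sidebar")] = "<ul>...</ul>"

# Bump the generation: every v1 key is now unreachable, so all
# fragments for repo/42 miss at once and get rebuilt with fresh data.
cache["gen:repo/42"] += 1

stale = cache.key?(fragment_key(cache, "repo/42", "sidebar"))
```

The old entries are never deleted, only orphaned, which is exactly the memory cost described next.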
101. fragments
But we had high cache eviction due to low ram and hardware constraints, and found
that scheme did more harm than good.
We also noticed some cached data we wanted to remain forever was being evicted due
to the slabs with generational keys filling up fast
102. page
We cache entire pages using nginx's memcached module
Lots of HTML, but also other data which gets hit a lot and changes rarely:
103. page
- network graph json
- participation graph data
Always looking to stick more into page caches
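The nginx side of that can look roughly like this (locations, key format, and upstream name are illustrative, not our actual config):

```nginx
# Serve straight from memcached; fall back to the app on a miss.
location /graphs/ {
    set $memcached_key "page:$uri";
    memcached_pass 127.0.0.1:11211;
    default_type  application/json;
    error_page 404 502 = @app;   # cache miss: render dynamically
}

location @app {
    proxy_pass http://unicorn;
}
```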
104. object
We do basic object caching of ActiveRecord objects such as
repositories and users all over the place
Caches are invalidated whenever the objects are saved
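The invalidate-on-save idea in miniature (a Hash stands in for memcached, and plain Ruby for ActiveRecord, where this would live in an after_save callback):

```ruby
CACHE = {}   # stands in for memcached

class Repository
  attr_accessor :id, :name

  def initialize(id, name)
    @id   = id
    @name = name
  end

  def cache_key
    "repository:#{id}"
  end

  # Persisting to the database is elided; the important part is that
  # every write drops the now-stale cache entry.
  def save
    CACHE.delete(cache_key)
  end
end

repo = Repository.new(1, "grit")
CACHE[repo.cache_key] = repo     # cached on first read
repo.name = "grit2"
repo.save                        # invalidated on write
fresh_miss = CACHE.key?(repo.cache_key)
```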
105. associations
We also cache associations as arrays of IDs
Grab the array, then do a get_multi on its contents to get a list of objects
That way we don't have to worry about caching stale objects
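A sketch of the two-step read (key names invented; `values_at` on a Hash plays the role of memcached's one-round-trip get_multi):

```ruby
# The association cache holds only IDs; the objects themselves live
# under their own per-object keys, invalidated on save as above.
cache = {
  "user:1:repo_ids" => [3, 7],
  "repository:3"    => { "id" => 3, "name" => "grit" },
  "repository:7"    => { "id" => 7, "name" => "god" },
}

ids  = cache["user:1:repo_ids"]
keys = ids.map { |id| "repository:#{id}" }
# get_multi fetches all keys in one round trip; values_at stands in
# for it against our Hash.
repos = cache.values_at(*keys)
names = repos.map { |r| r["name"] }
```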
110. walker
For most big apps, you need to write a caching layer
that knows your business domain
Generic, catch-all caching libraries probably won't do
121. sha asset id
Instead of using timestamps for asset ids, which may end up hitting the disk
multiple times on each request, we set the asset id to be the sha of the last commit
which modified a javascript or css file
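One way to wire that up in a Rails 2.x app, as a sketch (paths are illustrative): Rails reads `RAILS_ASSET_ID` for its asset URLs instead of stat-ing each file, so the id is computed once at boot.

```ruby
# config/environment.rb (sketch): tie the asset id to the last commit
# touching js/css, so browser caches bust exactly when those files
# change and no per-request disk stat is needed.
sha = `git log -1 --format=%h -- public/javascripts public/stylesheets`.strip
ENV["RAILS_ASSET_ID"] = sha unless sha.empty?
```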
125. bundling
Google's Closure Compiler for JavaScript.
We don't use the most aggressive setting because it means changing
your JavaScript to appease the compression gods,
which we haven't committed to yet.
126. scripty 301
Again, for most of these tricks you need to really pay
attention to your app.
One example is Scriptaculous' wiki
127. scripty 301
When we changed our wiki URL structure, we set up dynamic 301 redirects
for the old URLs.
Scriptaculous' old wiki was getting hit so much we put the redirect into nginx itself -
this took strain off our web app and made the redirects happen almost instantly
128. ajax loading
We also load data in via ajax in many places.
Sometimes a piece of information will just take too long to retrieve
In those instances, we usually load it in with ajax
129.
130.
131. If Walker sees that it doesnât have all the information it needs, it kicks off a job
to stick that information in memcached.
132. We then periodically hit a URL which checks if the information is in memcached or not.
If it is, we get it and rewrite the page with the new information.
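The polling endpoint's logic boils down to this (names invented, a Hash standing in for memcached):

```ruby
# Return the data once the background job has written it; otherwise
# tell the client to keep polling.
def poll(cache, key)
  if cache.key?(key)
    [200, cache[key]]   # ready: the client rewrites the page with it
  else
    [202, nil]          # not yet: the client polls again
  end
end

cache  = {}
first  = poll(cache, "graph:1")       # background job hasn't finished
cache["graph:1"] = '{"weeks":[]}'     # the job writes its result
second = poll(cache, "graph:1")
```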
149. test unit
We mostly use Ruby's test/unit.
We've experimented with other libraries including test/spec, shoulda, and RSpec, but in the end
we keep coming back to test/unit
150. git ïŹxtures
As many of our fixtures are git repositories, we specify in the test what sha
we expect to be the HEAD of that fixture.
This means we can completely delete a git repository in one test, then have it back in
pristine state in another. We plan to move all our fixtures to a similar git-based system in the future.
151. ci joe
We use ci joe, a continuous integration server, to run our tests after each push.
He then notifies us if the tests fail.
155. staging
We also always deploy the current branch to staging
This means you can be working on your branch, someone else can be working on theirs,
and you don't need to worry about reconciling the two to test out a feature
One of the best parts of Git
158. security@
github.com
we get weekly emails to our security email (that people find on the security page)
and people are always grateful when we can reassure them or answer their question
159. consultant
if you can, find a security consultant to poke your site for XSS vulnerabilities
having your target audience be developers helps, too