This document summarizes a presentation on managing open source software in the GitHub era. It discusses how open source development and distribution have evolved from centralized models to the decentralized model exemplified by GitHub. This shift has introduced new challenges for open source compliance, such as tracking the large number of dependencies between projects and properly attributing and licensing snippets of code. The presentation provides best practices that organizations can use to reduce risk, such as vetting dependency sources and embedding license information.
2. Managing Open Source in the GitHub Era
Agenda
• Introduction
• Understanding basic issues for OSS compliance
• Understanding new issues for OSS compliance in the GitHub Era
• Best practices to reduce risk
• Latest trends for process and tools to manage open source compliance
• Questions
3. Most Common OSS License Obligations
• Copyright and license notices
• Attribution obligations
• “Copyleft” obligations
– Licensing of derivative works
– Change notices
– Offer to provide source code
• Carve-out for OSS in your license agreements
4. Key OSS Compliance Challenges
• Tracking acquisition and use of open source
• Getting OSS information from suppliers
• Delivering OSS information to customers
– Attribution notice creation and delivery
– Copyleft: source code packaging and delivery
5. The “GitHub Era”
• The decentralized and distributed model of Git represents many of the new OSS trends
• More individuals engaged directly
• Smaller projects/components with many more dependencies
• Forking is encouraged → exponential growth in the number of copies of popular components
• Explosion in the number of distinct OSS components used in products and systems: from dozens to hundreds to thousands or more
6. Growth of component repositories
• In January 2011 there were fewer than 80K components available in the main component repositories (Maven, CPAN, PyPI, RubyGems)
• In December 2014 there were more than 500K components and counting (including NPM, Bower, GoDoc, Packagist, NuGet)
• In 2014, new components were added to these repositories at a rate of over 10,000 new component-versions per month
Source: http://www.modulecounts.com/
7. GitHub – more background
• Provides Git-based services
• Git is a version control and content management tool created by Linus Torvalds (GPL v2)
• GitHub's key attributes are easy code sharing and collaboration
• JavaScript is dominant
– Other languages: Ruby, Java, PHP, Python, C/C++
• Started in 2008
– Over 17 million repos and 7.8 million users claimed
• New public open source component repositories over the last 12 months:
– Over 350K created per month (excluding forks)
– Over 10,000 created daily
Source: nexB research data, GitHub API, 2013-11/2014-11
8. Evolution of OSS Development
OLD OSS
• Centralized development
• CVS, Subversion
• Project leader is benevolent dictator
• Fewer, larger components
• Push releases
• C/C++, Java
• SourceForge, Maven
• L/GPL v2, BSD, MIT
• Desktops and servers
NEW OSS
• Decentralized development
• Git / GitHub
• Each developer forks code at any time
• More, smaller components
• Pull releases
• JavaScript, Ruby, Scala, Go
• RubyGems, NPM
• MIT, Apache, L/GPL v3
• Mobile and Cloud
9. Evolution of OSS Compliance Challenges
OLD OSS
• Components without a license
• OSS code downloaded to internal codebase and compiled locally (vendored)
• Distribution means shipment or download
• Snippets
NEW OSS
• Many more components without a license
• Deep external dependencies provisioned live from the web at deployment or runtime
• Distribution via network / Internet deployment
• Many more snippets
10. Challenges – Missing licenses
• No license from the copyright holder means that you do not have a right to copy or re-use the software
• A license at the project / README level helps, but…
• Ambiguous without notices in source files
• License information is lost when code is partially copied
• Not a new problem, but the scale is increasing rapidly
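Scanning a codebase for files that carry no license or copyright marker near the top is one way to catch the "ambiguous without notices in source files" problem early. The sketch below is a simple heuristic, not a license scanner. The marker patterns, the 30-line header window and the file suffixes are all assumptions. A hit only shows that a marker is present, not that the notice is correct.

```python
import re
from pathlib import Path
from typing import List

# Heuristic markers (assumed, not exhaustive) suggesting a file carries a notice.
NOTICE_PATTERNS = [
    re.compile(r"SPDX-License-Identifier:\s*\S+"),
    re.compile(r"\bcopyright\b", re.IGNORECASE),
    re.compile(r"\blicensed under\b", re.IGNORECASE),
]
HEADER_LINES = 30  # assumption: notices normally sit near the top of a file
SOURCE_SUFFIXES = (".js", ".py", ".rb", ".go", ".java")


def has_license_notice(text: str, header_lines: int = HEADER_LINES) -> bool:
    """Return True if any notice marker appears in the first header_lines lines."""
    header = "\n".join(text.splitlines()[:header_lines])
    return any(pattern.search(header) for pattern in NOTICE_PATTERNS)


def files_missing_notices(root: str, suffixes=SOURCE_SUFFIXES) -> List[str]:
    """List source files under root (sorted relative paths) that have no notice marker."""
    base = Path(root)
    missing = []
    for path in sorted(base.rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            text = path.read_text(encoding="utf-8", errors="replace")
            if not has_license_notice(text):
                missing.append(path.relative_to(base).as_posix())
    return missing
```

The output is a review queue. A developer or auditor still has to establish where each flagged file came from.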
12. Challenges – Snippets
• Many snippet-sharing / educational web sites have vague or no license terms
– Someone who posts a code snippet or code example does not usually think about an explicit license
– Terms of service are the typical default
• Stack Overflow example
– Major source of advice about coding, including code snippets
– Stack Overflow's license is CC-BY-SA, which is effectively copyleft
13. Challenges – JavaScript example
• Accelerated usage on servers and clients, especially mobile
• Very common to mash up snippets of JavaScript from multiple origins and compile/minify them into a single file for execution efficiency
– License information is often lost when extracting snippets
– The most restrictive license applies to the combined JavaScript file
• jQuery core components are MIT-licensed, but components named jquery-xxxxx may be copyleft-licensed
– Executing JS on the client could be considered distribution
– That, in turn, could have a copyleft impact on server-side code
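The rule that the most restrictive license governs a concatenated or minified file can be sketched as a reduction over the licenses of its snippets. The ranking table here is hypothetical and deliberately coarse. It ignores license compatibility, which a real review must check. A snippet whose license information was lost is treated as the highest risk rather than silently assumed to be permissive.

```python
from typing import Iterable, Optional

# Hypothetical ranking, least to most restrictive; a real table needs legal review
# and would also have to check compatibility between licenses.
RESTRICTIVENESS = {"MIT": 0, "BSD-3-Clause": 0, "LGPL-2.1": 1, "GPL-2.0": 2}
UNKNOWN_RANK = 3  # lost or unrecognized license information ranks above everything


def most_restrictive(licenses: Iterable[Optional[str]]) -> str:
    """Return the license that governs a bundle of snippets.

    None or an unrecognized identifier yields "UNKNOWN"; an empty bundle yields "NONE".
    On ties, the first license seen at the highest rank is returned.
    """
    worst, worst_rank = "NONE", -1
    for lic in licenses:
        rank = RESTRICTIVENESS.get(lic, UNKNOWN_RANK) if lic else UNKNOWN_RANK
        if rank > worst_rank:
            worst = lic if rank < UNKNOWN_RANK else "UNKNOWN"
            worst_rank = rank
    return worst


# Hypothetical bundle: two permissive snippets and one GPL-2.0 snippet.
print(most_restrictive(["MIT", "BSD-3-Clause", "GPL-2.0"]))  # prints GPL-2.0
```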
14. Healthcare.gov case study
• Healthcare.gov uses JavaScript code from Datatables (jQuery plug-in under BSD 3-clause or GPL v2)
• The Weekly Standard accused HHS of removing copyright & license notices from the borrowed code
• Our analysis determined that the developers did not remove notices: they created their own Datatables.js file from snippets of other Datatables project files that did not contain license notices
• HHS quickly corrected this case, but the error indicates poor guidance to developers
See http://www.dejacode.org/healthcare_case_study.html
15. Challenges – Managing dependencies
• Java, JavaScript, Ruby, Go and many newer languages automate provisioning of required components, aka dependencies
• Automation is convenient for developers, but adds risk
– Dependent components may not be provisioned until deployment or runtime
– Dependencies may be deep and recursive
– Automatically provisioned components may contain “hidden” security, quality or licensing issues
– Accurate attribution for OSS components may be very complex
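Dependencies may be deep and recursive, so the set of components to review is the transitive closure of the dependency graph, not just the direct dependencies a developer declared. The sketch below assumes the graph has already been extracted into a dictionary that maps each hypothetical `name@version` component to its direct dependencies. It walks the graph iteratively and tolerates cycles.

```python
from typing import Dict, List, Set


def transitive_deps(graph: Dict[str, List[str]], root: str) -> Set[str]:
    """Return every component reachable from root, excluding root itself.

    Iterative depth-first walk; the seen set makes cycles harmless.
    """
    seen: Set[str] = set()
    stack = list(graph.get(root, []))
    while stack:
        component = stack.pop()
        if component == root or component in seen:
            continue
        seen.add(component)
        stack.extend(graph.get(component, []))
    return seen


# Hypothetical graph: "app" declares only a@1.0 and b@2.0 directly,
# and c@0.3 points back at a@1.0 to show that cycles are handled.
graph = {
    "app": ["a@1.0", "b@2.0"],
    "a@1.0": ["c@0.3"],
    "b@2.0": [],
    "c@0.3": ["d@1.1", "a@1.0"],
}
print(sorted(transitive_deps(graph, "app")))  # prints ['a@1.0', 'b@2.0', 'c@0.3', 'd@1.1']
```

Here two declared dependencies expand to four components, and each of them needs license and attribution review.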
16. Solutions – Dependency Management
• A basic solution is “vendoring”: explicitly controlling the provisioning of third-party components
• Soft vendoring: define an exact list of third-party component-versions from known/vetted repositories
• Hard vendoring: physically copy the third-party component-versions to a /vendor folder in your codebase
• Different repositories / platforms provide different tools
• Maven and others for Java
• .gitmodules file for Git
• Godep for Go, NPM for Node.js, Bundler for Ruby, etc.
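Soft vendoring can be enforced mechanically. Given the exact list of component-versions a project commits to, a check can confirm that every entry is pinned to a single version and comes from an approved repository. In this sketch the `Pin` record, the allowlist URLs and the "dotted numbers only" version rule are hypothetical simplifications. Real ecosystems keep such lists in their own formats, such as Bundler's Gemfile.lock, and a production check would read those instead.

```python
import re
from dataclasses import dataclass
from typing import List

# Hypothetical allowlist of vetted repositories (e.g. internal mirrors).
VETTED_SOURCES = {
    "https://mirror.example.com/maven",
    "https://mirror.example.com/npm",
}
# Simplifying assumption: an exact pin is a dotted numeric version such as "1.4.2";
# ranges like "^2.0", "~1.2", ">=1.0", "1.x" or "*" are rejected.
EXACT_VERSION = re.compile(r"\d+(\.\d+)*")


@dataclass(frozen=True)
class Pin:
    name: str
    version: str
    source: str


def vendoring_violations(pins: List[Pin]) -> List[str]:
    """Return one message per rule broken; an empty list means the list complies."""
    problems = []
    for pin in pins:
        if not EXACT_VERSION.fullmatch(pin.version):
            problems.append(f"{pin.name}: version '{pin.version}' is not an exact pin")
        if pin.source not in VETTED_SOURCES:
            problems.append(f"{pin.name}: source '{pin.source}' is not a vetted repository")
    return problems
```

Running a check like this in continuous integration means an unpinned or unvetted component fails the build before it is provisioned.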
17. Compliance in the GitHub Era
• Open source code is evolving and expanding too quickly for traditional source code scanning and matching techniques
– The number of possible matches increases with each fork
– Many or most of the open source components may not actually be in your codebase (dependencies)
• Focusing risk management on components rather than snippets is even more important
• The accelerating proliferation of languages, platforms and repositories requires active management and coordination across business, engineering and legal teams
18. Compliance in the GitHub Era
• Adapt policies to specific languages and platforms upfront:
– Define acceptable licenses in the context of the technology and usage
• Distributed as a software package or as a Cloud-based service?
• What does copyleft mean in this context?
– Create a lightweight process for identifying and resolving provenance gaps / issues
– Evaluate preferred sources for provisioning components
– Determine the best dependency management approach for each technology
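Defining acceptable licenses "in context of the technology and usage" amounts to a lookup keyed by both the license and the way the product is delivered. The sketch below encodes that idea. The licenses listed and every decision value are hypothetical placeholders that a legal team would set, and any unlisted license falls back to review. The GPL-2.0 row illustrates the slide's question about what copyleft means in context: its obligations attach to distribution, so a service-only deployment may be treated differently from a shipped package.

```python
from enum import Enum


class Usage(Enum):
    PACKAGE = "distributed as a software package"
    SERVICE = "run as a cloud-based service"


# Hypothetical policy table: licenses and decisions are placeholders for a legal team to set.
POLICY = {
    "MIT": {Usage.PACKAGE: "approve", Usage.SERVICE: "approve"},
    "Apache-2.0": {Usage.PACKAGE: "approve", Usage.SERVICE: "approve"},
    "GPL-2.0": {Usage.PACKAGE: "review", Usage.SERVICE: "approve"},
    "AGPL-3.0": {Usage.PACKAGE: "review", Usage.SERVICE: "review"},
}


def decide(license_id: str, usage: Usage) -> str:
    """Look up the decision for a license in a given usage; unlisted licenses go to review."""
    return POLICY.get(license_id, {}).get(usage, "review")
```

Keeping the policy as data lets the same check run for every language and platform, with only the table differing per product line.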
19. Compliance in the GitHub Era
• Embed open source provenance data in your codebase
– As close to the code as possible
– Adapt techniques to leverage existing tools and data from each platform / repository
– Use a simple approach to document provenance data if it is missing from the original project
– Instrument your build processes to identify the components that you actually use in each deployed product
• Most accurate way to track and fulfill OSS obligations
• Fully automate attribution documentation
• Redistribution, if applicable, has extra steps
See also https://github.com/dejacode/about-code-tool
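Provenance data kept next to the code can drive attribution documentation directly. This sketch assumes a hypothetical sidecar format of plain `key: value` lines stored beside each vendored component. The field names are illustrative and are not the format used by about-code-tool. The code parses each record and builds a plain-text notice. A record with missing fields raises an error, so provenance gaps surface during the build rather than after release.

```python
from typing import Dict, List

# Hypothetical minimum fields for an attribution entry.
REQUIRED_FIELDS = ("name", "version", "license", "copyright")


def parse_provenance(text: str) -> Dict[str, str]:
    """Parse 'key: value' lines; blank lines and '#' comments are skipped.

    Only the first ':' separates key from value, so URLs in values survive.
    """
    record: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if sep:
            record[key.strip().lower()] = value.strip()
    return record


def attribution_notice(records: List[Dict[str, str]]) -> str:
    """Build a plain-text attribution notice sorted by component name and version.

    Raises ValueError if a record lacks a required field, so provenance gaps
    fail the build instead of shipping an incomplete notice.
    """
    sections = []
    for rec in sorted(records, key=lambda r: (r.get("name", ""), r.get("version", ""))):
        missing = [field for field in REQUIRED_FIELDS if not rec.get(field)]
        if missing:
            raise ValueError(f"{rec.get('name', '?')}: missing {', '.join(missing)}")
        sections.append(f"{rec['name']} {rec['version']}\n{rec['copyright']}\nLicense: {rec['license']}")
    return "\n\n".join(sections)
```

An instrumented build would call `attribution_notice` only on records for components that actually ended up in the deployed product.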
20. Compliance in the GitHub Era
• Establish a central database of open source and third-party components
– Collect provenance data for all products across languages and platforms
– Document all effective component dependencies
– Harmonize open source compliance by product across languages and platforms
• Current solutions are available from several vendors, but no OSS solution is available today
See also https://enterprise.dejacode.com/landing/
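A central database can start small, with tables for components, products and the link between them. Recording the platform on the link lets npm, Maven and other dependencies land in one place. This is a minimal sketch using Python's built-in `sqlite3` module, and the table and column names are hypothetical.

```python
import sqlite3
from typing import List

SCHEMA = """
CREATE TABLE IF NOT EXISTS component (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    version TEXT NOT NULL,
    license TEXT,
    UNIQUE (name, version)
);
CREATE TABLE IF NOT EXISTS product (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
-- One row per component a product actually uses, on any language or platform.
CREATE TABLE IF NOT EXISTS product_component (
    product_id INTEGER NOT NULL REFERENCES product(id),
    component_id INTEGER NOT NULL REFERENCES component(id),
    platform TEXT NOT NULL,
    PRIMARY KEY (product_id, component_id)
);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def record_usage(conn, product, name, version, license_id, platform) -> None:
    """Register a component-version and link it to a product; repeated calls are ignored."""
    conn.execute("INSERT OR IGNORE INTO product (name) VALUES (?)", (product,))
    conn.execute(
        "INSERT OR IGNORE INTO component (name, version, license) VALUES (?, ?, ?)",
        (name, version, license_id),
    )
    conn.execute(
        """INSERT OR IGNORE INTO product_component (product_id, component_id, platform)
           SELECT p.id, c.id, ? FROM product p, component c
           WHERE p.name = ? AND c.name = ? AND c.version = ?""",
        (platform, product, name, version),
    )


def licenses_for_product(conn: sqlite3.Connection, product: str) -> List[str]:
    """Distinct licenses of all components used by one product, sorted."""
    rows = conn.execute(
        """SELECT DISTINCT c.license
           FROM product_component pc
           JOIN product p ON p.id = pc.product_id
           JOIN component c ON c.id = pc.component_id
           WHERE p.name = ?
           ORDER BY c.license""",
        (product,),
    )
    return [row[0] for row in rows]
```

Because queries are keyed by product rather than by repository, one question ("which licenses does this product ship?") covers every language and platform it uses.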
22. About nexB Inc.
• nexB offers:
– DejaCode, a central business system for managing software components
– Software analysis/audit services for products and for acquisitions
• 350+ software audit projects completed to date
– Aggregated audited codebases > 3 billion lines of source code
– Aggregated value of the acquisition transactions > $5B
• See DejaCode at www.dejacode.com
23. Contacts
• O’Melveny & Myers
Heather Meeker
hmeeker@omm.com
+1 650 473 2635
Subscribe to news and event alerts at http://heathermeeker.squarespace.com/
• nexB Inc.
Michael Herzog
mjherzog@nexB.com
+1 650 380 0680
24. Resources – OSS Licensing Trends
• Neil McAllister, “Study: Most projects on GitHub not open source licensed”
http://www.theregister.co.uk/2013/04/18/github_licensing_study/
• Matt Asay, “Open Source Is Old School, Says The GitHub Generation”
http://readwrite.com/2013/05/15/open-source-is-old-school-says-the-github-generation
• Richard Fontana, “Post open source software, licensing and GitHub”
http://opensource.com/law/13/8/github-poss-licensing
• Simon Phipps, “GitHub finally takes open source licenses seriously”
http://www.infoworld.com/article/2611422/open-source-software/github-finally-takes-open-source-licenses-seriously.html
• Armin Ronacher, “Licensing in a Post Copyright World”
http://lucumr.pocoo.org/2013/7/23/licensing/
25. Resources – OSS Language / Repo Trends
• GitHub growth and language trends
http://redmonk.com/dberkholz/2013/01/21/github-will-hit-5-million-users-within-a-year/
http://redmonk.com/dberkholz/2014/05/02/github-language-trends-and-the-fragmenting-landscape/
http://beust.com/weblog/2014/05/03/language-popularity-on-github/
http://redmonk.com/dberkholz/2014/09/26/githubs-vanishing-acceleration/
• Repository package growth statistics
http://www.modulecounts.com/
• GitHub Users Worldwide
http://aasen.in/github_globe/