How is distributed version control
software being used?

Daniel M German
University of Victoria
Git
● Software engineers are moving towards git
  – GitHub a major reason
● And other DVCs
The Promise of Git

From: http://thkoch2001.github.io/whygitisbetter/
Challenge 1
● Personal repos are beyond reach
● Local commits might never be observable
Challenge 2: History
“History is written by the victors”
Niccolò Machiavelli
Rebasing changes history
Save history before it is lost!
DVC model: a clone is a branch
Super-repository
● Collection of repositories cloned (recursively) from the same repo
  – At least one per developer
    ● On their personal computer
  – At least one public repository
    ● The blessed
  – In git, no way to trace them
Moving commits across the superRepo
Method
Push: done at source, needs write access to destination
Pull: done at destination, needs read access to source
Email: source creates patch, recipient applies it
Merging in DVCs
● If not all commits in destination
  – Create temp branch
  – Copy commits
● Merge locally
● If created,
  – delete temp branch
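The merge steps on this slide can be sketched with plain sets. This is only a toy model, not how git works internally: a repository is assumed to be a dictionary mapping branch names to sets of commit ids, and the names `merge_from` and `tmp-merge` are hypothetical.

```python
def merge_from(dest, src_commits, branch="master", temp="tmp-merge"):
    """Merge src_commits into dest[branch], following the slide's steps.

    dest maps branch names to sets of commit ids (a toy model).
    Returns True if a temp branch had to be created.
    """
    missing = set(src_commits) - dest[branch]
    created = bool(missing)
    if created:
        # Not all commits are in the destination:
        # create a temp branch and copy the commits into it.
        dest[temp] = set(src_commits)
    # Merge locally. In this set model a merge is a union; with nothing
    # missing it leaves the branch unchanged.
    dest[branch] = dest[branch] | dest.get(temp, set())
    if created:
        # The temp branch was created only for this merge, so delete it.
        del dest[temp]
    return created


repo = {"master": {"a1", "b2"}}
first = merge_from(repo, {"b2", "c3"})   # c3 is missing in the destination
second = merge_from(repo, {"a1"})        # nothing missing this time
```

The point of the sketch is the bookkeeping: the temp branch exists only for the duration of the merge, so nothing in the destination records where the commits came from.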
Ecosystem of Repos
Can we learn from Linux?
Life of a Patch in Linux
How can we observe them?
We have to find them!
Snapshot Mining
Continuous Mining
Continuous Mining

● Every interval
  – Mine as many repos as known
    ● What is new?
    ● What has been deleted?
Continuous Mining
● Challenges:
  1) Finding the repositories
  2) Logging their commits
Proactive vs Reactive implementation
  • Logging vs discovery
ContinuousMining of Linux
● Linux has no centralized logging
  – Nobody really knows what the superRepo is
  – Commits flow without any event broadcasting mechanism
● Where do we find the activity?
  – Repos
  – Commits
Repos and Committers
● Most repos will have a known set of persons committing to them
  – Simplest case: its owner is the only committer
  – Extreme case: repo is used as centralized version control system: everybody commits to it
Semiautomatic Process
● Every 3 hrs, ask every repo:
  – What new commits do you have?
  – What commits did you delete?
  – Automatically resolve propagations
    ● Commits might propagate before we scan
● Daily:
  – Are commits in repo by unknown committers?
    ● Answer:
      – is there a new repo? or is committer new to repo?
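The daily check for unknown committers could look roughly like the sketch below. It assumes each repo already has a recorded set of known committers. The function name, repo names and committer names are made up for illustration, and deciding whether a flagged committer means a new repo or a new committer is left to a person, which is what makes the process semiautomatic.

```python
def unknown_committers(commits, known):
    """Return {repo: committers} for committers not yet known for that repo.

    commits: iterable of (repo, committer) pairs seen in the daily check.
    known:   dict mapping repo -> set of committers already tied to it.
    """
    flagged = {}
    for repo, committer in commits:
        if committer not in known.get(repo, set()):
            flagged.setdefault(repo, set()).add(committer)
    return flagged


# Hypothetical data: "bob" has never committed to repo-a before.
known = {"repo-a": {"alice"}}
seen = [("repo-a", "alice"), ("repo-a", "bob")]
flagged = unknown_committers(seen, known)
```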
Implementation
● Running since Nov. 2011
  – Currently scans 650 repos every 3 hrs
  – Retrieved
    ● 2.3 million commits (compared to 400k in Linus repo)
    ● 109 million records in propagation table
      <commit-id, added|deleted, repo, when>
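A minimal sketch of how one scan might turn into propagation records of the form `<commit-id, added|deleted, repo, when>`. It assumes the set of commit ids visible in a repo is already available for the previous and current scan (one way to obtain it might be `git rev-list --all`). The function name, repo name, commit ids and timestamp are invented for the example.

```python
from datetime import datetime, timezone


def diff_scan(repo, previous, current, when):
    """Compare two scans of one repo and emit propagation records.

    previous/current are sets of commit ids seen at each scan.
    Each record is (commit_id, "added" | "deleted", repo, when).
    """
    records = [(c, "added", repo, when) for c in sorted(current - previous)]
    records += [(c, "deleted", repo, when) for c in sorted(previous - current)]
    return records


when = datetime(2012, 1, 1, 3, 0, tzinfo=timezone.utc)  # illustrative only
records = diff_scan("example-repo", {"a1", "b2"}, {"b2", "c3"}, when)
```

Here `c3` yields an "added" record and `a1` a "deleted" one, while `b2` is present in both scans and produces nothing. This suggests why the table grows much faster than the number of distinct commits: the same commit id is recorded again for every repo it reaches.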
Discovery of new repos
Is one better than the other?
● RQ1
  – Does continuousMining uncover a larger development ecosystem than snapMining?
● RQ2
  – Does continuousMining expose any missing information, or bias in the recorded history of the project recovered using snapMining?
                                   Snapshot (Linus)   Continuous
No. Repos                          1                  479
Commits                            64k                533k
Non-merge Commits                  59k                485k
Unique Non-merges                  58k                135k
%unique non-merges                 98.9%              27.9%
Non-merges that reached Blessed                       43.1%
Different authors emails           3434               5646
Different authors                  2883               4575
Different committers emails        283                1185
Different committers               245                1058
● RQ2
  – Does continuousMining expose any missing information, or bias in the recorded history of the project recovered using snapMining?
Commit vs Patches

● Commit ids are insufficient to track patches
● Large amount of work not reaching blessed
The data in blessed is biased
● 37.9% of patches arriving at blessed did not arrive in their original commit
Arrival of Commits at Blessed
Arrival of Commits at Blessed...
● We can classify patches as a new feature or bug-fix
So what? (the reviewer will ask)
● What can we do with this data?
  – For researchers: enable empirical studies of activities previously invisible
  – For practitioners: Implement traceability of
    ● Commits and
    ● Repos
Empirical study
● RQ. What are the characteristics of the repos in the Linux Super-repository?
The Repos
The Repos
● X: activity (in commits)
● Y: ratio of commits accepted by Linus to total commits
● Shape:
  – Triangles: official repos
  – Circles: non-official repos
● Size:
  – Smaller: consume commits
  – Larger: produce commits
● Color: merge/commit ratio
  – Grey: never merge
  – “Cooler”: high ratio
  – “Warmer”: lower ratio
Consumers are very active
Propagation
● RQ: How do repositories interact, and how do commits propagate across repositories?
The Latency

Time of Authorship

Time of Commit
The Interaction
Linux Dashboard
● We asked two Linux maintainers:
  – Can this info be useful?
● Answer:
  – “Yes”
… but not for what we expected...
Tracking commits in Linux
● Need to track patches, not commits
  – Particularly important in consumer repositories
  – Need to cross-reference commits
    ● What commits contain the same patch?
    ● What commits are mentioned in the log?
  – Some repos track commits from blessed via cherry-picking
    ● Commit ids are useless
    ● So they annotate log with the origin commit id
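Both cross-referencing questions on this slide can be approximated in a few lines. The sketch below is an assumption-laden illustration, not the tooling used in the study. `patch_key` hashes only the added and removed lines of a diff, so two commits carrying the same change at different line offsets get the same key (git ships its own `git patch-id` command for this purpose, and this sketch does not reproduce its algorithm). `origin_commits` looks for the `(cherry picked from commit ...)` line that `git cherry-pick -x` appends to a log message. All function names and sample data are hypothetical.

```python
import hashlib
import re
from collections import defaultdict

# Annotation appended to the log message by `git cherry-pick -x`.
CHERRY_RE = re.compile(r"\(cherry picked from commit ([0-9a-f]{7,40})\)")


def patch_key(diff_text):
    """Hash only added/removed lines, ignoring hunk headers and context."""
    changed = [
        line.rstrip()
        for line in diff_text.splitlines()
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    ]
    return hashlib.sha1("\n".join(changed).encode("utf-8")).hexdigest()


def same_patch_groups(commits):
    """commits: dict commit_id -> diff text. Return groups sharing one patch."""
    groups = defaultdict(list)
    for commit_id, diff in commits.items():
        groups[patch_key(diff)].append(commit_id)
    return [sorted(ids) for ids in groups.values() if len(ids) > 1]


def origin_commits(log_message):
    """Return the commit ids named in cherry-pick annotations of a log."""
    return CHERRY_RE.findall(log_message)


# Hypothetical commits: the first two carry the same change at different offsets.
diffs = {
    "aaa111": "@@ -1,2 +1,2 @@\n-old line\n+new line\n",
    "bbb222": "@@ -10,2 +10,2 @@\n-old line\n+new line\n",
    "ccc333": "@@ -1 +1 @@\n-foo\n+bar\n",
}
groups = same_patch_groups(diffs)
origins = origin_commits("Fix thing\n\n(cherry picked from commit 0123abc)")
```

A content-based key like this is deliberately crude: a patch that was edited while being applied gets a different key, so the grouping gives a lower bound on how many commits share a patch.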
Example
Has it reached linux-next before blessed?
● Commits should pass through linux-next before arriving at blessed.
● If not, potential issue
● Hard to do with current tools:
  ● Patches change commit id
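Given arrival events keyed by patch rather than by commit id, the linux-next check reduces to comparing first-arrival times. The sketch below assumes such events are available as `(patch_key, repo, when)` tuples. The function name, the repo labels and the integer timestamps are illustrative assumptions, not the dashboard's actual design.

```python
def skipped_linux_next(arrivals, staging="linux-next", blessed="blessed"):
    """Return patch keys that reached `blessed` without first reaching `staging`.

    arrivals: iterable of (patch_key, repo, when) "added" events, where
    `when` is any comparable timestamp.
    """
    first_seen = {}
    for key, repo, when in arrivals:
        per_repo = first_seen.setdefault(key, {})
        if repo not in per_repo or when < per_repo[repo]:
            per_repo[repo] = when
    flagged = []
    for key, seen in first_seen.items():
        if blessed not in seen:
            continue  # has not reached blessed yet, nothing to check
        if staging not in seen or seen[staging] > seen[blessed]:
            flagged.append(key)
    return sorted(flagged)


# Hypothetical events: p1 follows the expected path, p2 never appears in
# linux-next, p3 shows up there only after it reached blessed.
arrivals = [
    ("p1", "linux-next", 1), ("p1", "blessed", 5),
    ("p2", "blessed", 3),
    ("p3", "blessed", 2), ("p3", "linux-next", 4),
]
flagged = skipped_linux_next(arrivals)
```

Keying on the patch instead of the commit id matters here because a patch rebased or cherry-picked on its way to blessed would otherwise look like two unrelated commits.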