How is distributed version control software being used?

Daniel M German
University of Victoria
Git
● Software engineers are moving towards git
  – And other DVCs
● GitHub a major reason
The Promise of Git

From: http://thkoch2001.github.io/whygitisbetter/
Challenge 1
● Personal repos are beyond reach
● Local commits might never be observable
Challenge 2: History
“History is written by the victors”
Niccolò Machiavelli
Rebasing changes history
Save history before it is lost!
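
Why rebasing changes history follows from how git names commits: a commit id is the SHA-1 of the commit object, and that object records the parent id. Replaying the same change on a different parent therefore yields a different id. A minimal sketch of that hashing, where the tree id, parent ids, and identity string are made-up values for illustration (a real rebase usually changes the tree and the committer timestamp as well):

```python
import hashlib

def commit_id(tree, parent, author, committer, message):
    """Hash a commit object the way git does: SHA-1 over 'commit <size>\\0' + body."""
    body = (
        f"tree {tree}\n"
        f"parent {parent}\n"
        f"author {author}\n"
        f"committer {committer}\n"
        f"\n{message}\n"
    ).encode()
    header = f"commit {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()

# Hypothetical values, not taken from any real repository.
TREE = "a" * 40
WHO = "A Developer <dev@example.org> 1300000000 +0000"

original = commit_id(TREE, "1" * 40, WHO, WHO, "Fix null dereference")
rebased = commit_id(TREE, "2" * 40, WHO, WHO, "Fix null dereference")

# Same change, same author, same message; only the parent differs.
print(original != rebased)
```

Once the original commit is dropped from every repository that held it, nothing in the rebased commit points back to it.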
DVC model: a clone is a branch
Super-repository
● Collection of repositories cloned (recursively) from the same repo
  – At least one per developer
    ● On their personal computer
  – At least one public repository
    ● The blessed
  – In git, no way to trace them
Moving commits across the superRepo

Method  How it works
Push    Done at source, needs write access to destination
Pull    Done at destination, needs read access to source
Email   Source creates patch, recipient applies it
Merging in DVCs
● If not all commits in destination
  – Create temp branch
  – Copy commits
● Merge locally
● If created,
  – delete temp branch
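
The steps above can be sketched with a toy model in which a repository is a mapping from branch names to sets of commit ids. This is an illustration under loud assumptions: the function name, the "temp" and "master" branch names, and modelling a merge as a set union (no merge commit is created) are all hypothetical simplifications, not how git is implemented.

```python
def merge_from(dest, source_commits, into="master"):
    """Merge source_commits into dest[into]; return the commits that had to be copied."""
    known = set().union(*dest.values())
    missing = set(source_commits) - known
    created_temp = False
    if missing:
        # Not all commits are in the destination: create a temp branch and copy them.
        dest["temp"] = set(source_commits)
        created_temp = True
    # Merge locally (toy model: the branch simply gains the commits).
    dest[into] |= set(source_commits)
    if created_temp:
        # The temp branch was only a vehicle for the copy.
        del dest["temp"]
    return missing

repo = {"master": {"c1", "c2"}}
copied = merge_from(repo, {"c2", "c3"})
print(copied, repo)
```

The point of the model is that, after the merge, the destination keeps no record of which repository the copied commits came from.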
Ecosystem of Repos
Can we learn from Linux?
Life of a Patch in Linux
How can we observe them?
We have to find them!
Snapshot Mining
Continuous Mining
Continuous Mining
● Every interval
  – Mine as many repos as known
    ● What is new?
    ● What has been deleted?
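
One way to picture a single mining pass is to compare the set of commit ids seen in each repo now against the set seen on the previous pass. The sketch below assumes a caller-supplied `list_commits(repo)` function (hypothetical; in practice it might wrap `git rev-list --all` on a fetched copy) and keeps the previous sets in a plain dictionary.

```python
def diff_snapshots(previous, current):
    """Return (added, deleted) commit ids between two scans of one repo."""
    return current - previous, previous - current

def mine_once(repos, last_seen, list_commits):
    """Scan every known repo once and report what is new and what was deleted."""
    events = []
    for repo in repos:
        current = set(list_commits(repo))
        added, deleted = diff_snapshots(last_seen.get(repo, set()), current)
        events += [(commit, "added", repo) for commit in sorted(added)]
        events += [(commit, "deleted", repo) for commit in sorted(deleted)]
        last_seen[repo] = current  # becomes "previous" for the next interval
    return events

# Hypothetical data standing in for two consecutive scans.
fake_repos = {"dev-repo": {"a", "b"}}
seen = {}
first = mine_once(["dev-repo"], seen, lambda r: fake_repos[r])
fake_repos["dev-repo"] = {"b", "c"}  # "a" was rebased away, "c" is new
second = mine_once(["dev-repo"], seen, lambda r: fake_repos[r])
print(first, second)
```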
Continuous Mining
● Challenges:
  1) Finding the repositories
  2) Logging their commits
● Proactive vs Reactive implementation
● Logging vs discovery
ContinuousMining of Linux
● Linux has no centralized logging
  – Nobody really knows what the superRepo is
  – Commits flow without any event broadcasting mechanism
● Where do we find the activity?
  – Repos
  – Commits
Repos and Committers
● Most repos will have a known set of persons committing to them
  – Simplest case: its owner is the only committer
  – Extreme case: repo is used as centralized version control system: everybody commits to it
Semiautomatic Process
● Every 3 hrs, ask every repo:
  – What new commits do you have?
  – What commits did you delete?
  – Automatically resolve propagations
    ● Commits might propagate before we scan
● Daily:
  – Are commits in repo by unknown committers?
● Answer:
  – Is there a new repo? Or is committer new to repo?
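
The daily check can be sketched as a lookup against the known committers of each repo. The names here (`unknown_committers`, the repo and e-mail values) are hypothetical; the sketch only flags candidates, and deciding whether a flag means a new repo or a new committer stays a manual step, as the slide says.

```python
def unknown_committers(known, observed):
    """Return (repo, committer) pairs where the committer is not known for that repo."""
    flagged = []
    for repo, committer in observed:
        if committer not in known.get(repo, set()):
            flagged.append((repo, committer))
    return flagged

# Hypothetical data: one repo with a single known committer (its owner).
known = {"alice-repo": {"alice@example.org"}}
observed = [
    ("alice-repo", "alice@example.org"),
    ("alice-repo", "bob@example.org"),  # new to the repo, or pulled from an unseen repo?
]
flags = unknown_committers(known, observed)
print(flags)
```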
Implementation
● Running since Nov. 2011
  – Currently scans 650 repos every 3 hrs
  – Retrieved
    ● 2.3 million commits (compared to 400k in Linus repo)
    ● 109 million records in propagation table
      <commit-id, added|deleted, repo, when>
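
A possible shape for such a propagation table, sketched with Python's built-in sqlite3 module. The table and column names are assumptions (the fourth field is called `seen_at` because WHEN is an SQL keyword), the rows are invented, and the slides do not say which database the real system uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE propagation (
        commit_id TEXT NOT NULL,
        event     TEXT NOT NULL CHECK (event IN ('added', 'deleted')),
        repo      TEXT NOT NULL,
        seen_at   TEXT NOT NULL
    )
""")

# Invented rows: a commit appears in a developer repo, reaches blessed,
# and is later deleted from the developer repo.
conn.executemany(
    "INSERT INTO propagation VALUES (?, ?, ?, ?)",
    [
        ("c1", "added", "dev-repo", "2012-01-01T00:00"),
        ("c1", "added", "blessed", "2012-01-05T03:00"),
        ("c1", "deleted", "dev-repo", "2012-01-06T06:00"),
    ],
)

def repos_holding(commit_id):
    """Repos whose most recent event for this commit is 'added'."""
    query = """
        SELECT repo FROM propagation AS p
        WHERE commit_id = ? AND event = 'added'
          AND seen_at = (SELECT MAX(seen_at) FROM propagation
                         WHERE commit_id = p.commit_id AND repo = p.repo)
        ORDER BY repo
    """
    return [row[0] for row in conn.execute(query, (commit_id,))]

print(repos_holding("c1"))
```

Because every scan only appends events, the table keeps the history of where a commit has been even after it disappears from a repo.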
Discovery of new repos
Is one better than the other?
● RQ1
  – Does continuousMining uncover a larger development ecosystem than snapMining?
● RQ2
  – Does continuousMining expose any missing information, or bias in the recorded history of the project recovered using snapMining?
                                   Snapshot (Linus)   Continuous
No. of repos                       1                  479
Commits                            64k                533k
Non-merge commits                  59k                485k
Unique non-merges                  58k                135k
% unique non-merges                98.9%              27.9%
Non-merges that reached Blessed                       43.1%
Different author emails            3434               5646
Different authors                  2883               4575
Different committer emails         283                1185
Different committers               245                1058
● RQ2
  – Does continuousMining expose any missing information, or bias in the recorded history of the project recovered using snapMining?
Commit vs Patches
● Commit ids are insufficient to track patches
● Large amount of work not reaching blessed
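
To track a patch rather than a commit, one option is to fingerprint the change itself. The sketch below hashes only the added and removed lines of a unified diff, ignoring hunk positions, context lines, and whitespace. It is an illustration of the idea, not the method used in this study and not git's own algorithm (git ships a `git patch-id` command for a similar purpose). The function name and sample diffs are hypothetical.

```python
import hashlib

def patch_fingerprint(diff_text):
    """Hash the changed lines of a unified diff, ignoring position and whitespace."""
    changed = []
    for line in diff_text.splitlines():
        if line.startswith(("+++", "---")):
            continue  # file headers (simplification: file names are ignored)
        if line.startswith(("+", "-")):
            # Keep the +/- marker, drop all whitespace inside the line.
            changed.append(line[0] + "".join(line[1:].split()))
    return hashlib.sha1("\n".join(changed).encode()).hexdigest()

# The same change applied at two different places in history.
diff_in_dev_repo = "--- a/f.c\n+++ b/f.c\n@@ -10,3 +10,3 @@\n ctx\n-old();\n+new();\n"
diff_in_blessed = "--- a/f.c\n+++ b/f.c\n@@ -42,3 +42,3 @@\n other ctx\n-old();\n+new();\n"
another_change = "--- a/f.c\n+++ b/f.c\n@@ -10,3 +10,3 @@\n ctx\n-old();\n+newer();\n"

print(patch_fingerprint(diff_in_dev_repo) == patch_fingerprint(diff_in_blessed))
```

Two commits with different ids but the same fingerprint are candidates for being the same patch.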
The data in blessed is biased
● 37.9% of patches arriving at blessed did not arrive in their original commit
Arrival of Commits at Blessed
Arrival of Commits at Blessed...
● We can classify patches as a new feature or bug-fix
So what? (the reviewer will ask)
● What can we do with this data?
  – For researchers: enable empirical studies of activities previously invisible
  – For practitioners: implement traceability of
    ● Commits and
    ● Repos
Empirical study
● RQ: What are the characteristics of the repos in the Linux Super-repository?
The Repos
The Repos
● X: activity (in commits)
● Y: ratio of commits accepted by Linus to total commits
● Shape:
  – Triangles: official repos
  – Circles: non-official repos
● Size:
  – Smaller: consume commits
  – Larger: produce commits
● Color: merge/commit ratio
  – Grey: never merge
  – “Cooler”: high ratio
  – “Warmer”: lower ratio
Consumers are very active
Propagation
● RQ: How do repositories interact, and how do commits propagate across repositories?
The Latency

Time of Authorship

Time of Commit
The Interaction
Linux Dashboard
● We asked two Linux maintainers:
  – Can this info be useful?
● Answer:
  – “Yes”
… but not for what we expected...
Tracking commits in Linux
● Need to track patches, not commits
  – Particularly important in consumer repositories
  – Need to cross-reference commits
    ● What commits contain the same patch?
    ● What commits are mentioned in the log?
  – Some repos track commits from blessed via cherry-picking
    ● Commit ids are useless
    ● So they annotate log with the origin commit id
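
One such annotation is the line that `git cherry-pick -x` appends to the log message: "(cherry picked from commit <id>)". A small sketch that recovers origin ids from that line; the function name and the sample message are hypothetical, and other annotation styles used by individual repos are not covered.

```python
import re

# Matches the trailer written by `git cherry-pick -x`.
CHERRY_PICK_RE = re.compile(r"\(cherry picked from commit ([0-9a-f]{7,40})\)")

def origin_commits(log_message):
    """Return the origin commit ids mentioned in a cherry-pick annotation."""
    return CHERRY_PICK_RE.findall(log_message)

# Invented log message with an invented 40-character id.
sample_id = "ab12" * 10
message = f"Fix buffer overflow in parser\n\n(cherry picked from commit {sample_id})\n"
print(origin_commits(message))
```

Cross-referencing these ids against the propagation records links a consumer repo's commit back to the commit it was copied from, even though the two ids differ.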
Example
Has it reached linux-next before blessed?
● Commits should pass through linux-next before arriving at blessed.
● If not, potential issue
● Hard to do with current tools:
  ● Patches change commit id
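
Given propagation records keyed by a patch identifier instead of a commit id, the check reduces to comparing first-arrival times. A minimal sketch under stated assumptions: records are `(patch_key, event, repo, when)` tuples, `patch_key` is some identifier that survives cherry-picks and rebases, and the repo names and dates are invented.

```python
from datetime import datetime

def first_arrival(records, patch_key, repo):
    """Earliest time the patch was seen as 'added' in the repo, or None."""
    times = [when for key, event, r, when in records
             if key == patch_key and r == repo and event == "added"]
    return min(times) if times else None

def passed_through_next(records, patch_key, next_repo="linux-next", blessed="blessed"):
    """True/False once the patch is in blessed; None if it has not arrived there."""
    at_blessed = first_arrival(records, patch_key, blessed)
    if at_blessed is None:
        return None
    at_next = first_arrival(records, patch_key, next_repo)
    return at_next is not None and at_next <= at_blessed

# Invented records for two patches.
records = [
    ("p1", "added", "linux-next", datetime(2012, 3, 1)),
    ("p1", "added", "blessed", datetime(2012, 3, 20)),
    ("p2", "added", "blessed", datetime(2012, 3, 21)),  # never seen in linux-next
]
print(passed_through_next(records, "p1"), passed_through_next(records, "p2"))
```

A False result only flags a potential issue; a patch may also have passed through linux-next between two scans.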
How Linux uses Git