The Ultimate Debian
  Database
  Israel Herraiz
  <israel.herraiz@upm.es>

  Davis, CA, July 26th 2012



Download these slides at http://slideshare.net/herraiz/the-ultimate-debian-database
Outline

1. Debian: what it is and its sources of data

2. The UDD: what it is and where to get it

3. What has been done and what we can do




                                             1 / 25
1. Debian: what it is and
its sources of data

                            2 / 25
Debian

• GNU/Linux software distribution
   •   Goal: to deliver an entirely and exclusively free
       distribution
• Maintained by volunteers
• Bureaucratic organization (policies, constitution,
  social contract)
• Release when ready
• > 10 years of history
• > 500 MSLOC
• > 15k packages
                                                           3 / 25
Debian Releases




                  4 / 25
5 / 25
Debian Source Packages




                         6 / 25
Source and Binary Packages

• A source package generates one or more binary
  packages
                                 octave-core

                                 octave-doc

   octave
                                  liboctave

                                 liboctave-dev


                                                 7 / 25
Package uploads

• There is no shared source repository as in other software
  projects
  •   Although developers may privately use version
      control systems
• When a bug is fixed, a new version is uploaded
  •   Uploads == commits




                                                      8 / 25
Source Packages metadata


Source: octave
Section: math
Priority: extra
Maintainer: Debian Octave Group <pkg-octave-devel@lists.alioth.debian.org>
Uploaders: Thomas Weber <tweber@debian.org>, Sébastien Villemot <sebastien.villemot@ens.fr>
DM-Upload-Allowed: yes
Build-Depends: gfortran, debhelper (>= 9), automake, dh-autoreconf, texinfo, …
Standards-Version: 3.9.3
Homepage: http://www.octave.org/
Vcs-Git: git://git.debian.org/git/pkg-octave/octave.git
Vcs-Browser: http://git.debian.org/?p=pkg-octave/octave.git




                                                                                 9 / 25
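The source package metadata above is plain "Field: value" text, so it is easy to load into a script. The following is a minimal sketch, assuming one stanza per string and continuation lines that start with whitespace; the helper name parse_stanza is hypothetical and not part of any Debian tool.

```python
def parse_stanza(text):
    """Parse one control stanza into a dict of field name -> value."""
    fields = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        if line[0] in " \t" and current is not None:
            # A line starting with whitespace continues the previous field.
            fields[current] += " " + line.strip()
        else:
            # Split on the first colon only, so URLs in values survive.
            name, _, value = line.partition(":")
            current = name.strip()
            fields[current] = value.strip()
    return fields


# A shortened copy of the stanza shown on the slide.
SOURCE_STANZA = """\
Source: octave
Section: math
Priority: extra
Maintainer: Debian Octave Group <pkg-octave-devel@lists.alioth.debian.org>
Uploaders: Thomas Weber <tweber@debian.org>, Sébastien Villemot
 <sebastien.villemot@ens.fr>
Standards-Version: 3.9.3
Vcs-Git: git://git.debian.org/git/pkg-octave/octave.git
"""

source = parse_stanza(SOURCE_STANZA)
print(source["Source"])
print(source["Uploaders"])
```

Splitting on the first colon only is what keeps values such as the Vcs-Git URL intact.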
Binary Packages metadata
Package: octave
Priority: extra
Section: math
Installed-Size: 4760
Maintainer: Ubuntu Developers <ubuntu-devel-discuss@lists.ubuntu.com>
Architecture: amd64
Version: 3.6.1-1ubuntu1ppa1~precise1
Recommends: gnuplot, libatlas3gf-base
Replaces: octave3.2
Suggests: octave-info, octave-doc, octave-htmldoc
Depends: libamd2.2.0 (>= 1:3.4.0), libarpack2 (>= 2.1), …
Conflicts: octave3.2
Filename: pool/main/o/octave/octave_3.6.1-1ubuntu1ppa1~precise1_amd64.deb
Size: 1746050
MD5sum: 2c431556d6cf98fd8a341e865ac63058
SHA1: b333c49e6f6cb7d4445378020dfffdb5a1626de7
Description: GNU Octave language for numerical computations…
                                                                        10 / 25
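Relationship fields such as Depends and Recommends are comma-separated lists of package names with optional version constraints. The sketch below splits such a field into (name, operator, version) tuples. It is an illustration only: the function name parse_relations is hypothetical, and alternatives written with "|" or architecture qualifiers are simply skipped rather than handled.

```python
import re

# One relation: a package name, optionally followed by "(operator version)".
RELATION = re.compile(
    r"^\s*(?P<name>[a-z0-9][a-z0-9+.\-]*)"
    r"(?:\s*\(\s*(?P<op><<|<=|=|>=|>>)\s*(?P<version>[^)\s]+)\s*\))?\s*$"
)


def parse_relations(value):
    """Split a relationship field into (name, operator, version) tuples."""
    relations = []
    for item in value.split(","):
        match = RELATION.match(item)
        if match:  # items this sketch does not understand are skipped
            relations.append((match["name"], match["op"], match["version"]))
    return relations


# Only the entries visible on the slide; the elided rest is left out.
deps = parse_relations("libamd2.2.0 (>= 1:3.4.0), libarpack2 (>= 2.1)")
recommends = parse_relations("gnuplot, libatlas3gf-base")
print(deps)
print(recommends)
```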
Debian Popcon: Tracking Installations

• Popularity: total install counts
  •   Recent use (< 30 days)
  •   Old use (> 30 days)
• Data collected daily
• Users voluntarily opt in
  •   Source of bias

                                                12 / 25
Debian Bugs

• People find bugs in binary packages
  •   ~500 bugs per month
• But bugs are linked to source packages
• Bugs can be
  •   Accepted and solved in Debian
  •   Rejected
  •   Forwarded to upstream
• Everything else, similar to other bug tracking
  systems
  •   Life cycle, comments, severity levels…
                                               13 / 25
2. The UDD: what it is and
where to get it

                             14 / 25
Research work: main paper (at MSR 2010)




                                          15 / 25
Other papers at MSR 2010




                           16 / 25
What is the UDD?

• PostgreSQL database with all the information from
  the sources described so far
  •   http://udd.debian.org
• New dumps available every two days
  •   ~ 500 MB bz2
• Used for some Debian internal services
• Schema too complex and too big for a slide 
• Technical detail: you need a Debian-based
  system to load the dump of the UDD

                                                17 / 25
Debian sources of data

• Sources / Packages metadata
• Bugs
  •   including *all* archived bugs
      •   1995-96-97
• Carnivore
• Debtags
• Popularity Contest
• DEHS
• Lintian
• Migrations to testing
• Uploads
  •   All the way back to 1998!
• New packages queue
• Translations status
• Orphaned packages
• Screenshots
                                                              18 / 25
!

    19 / 25
Bear in mind!

• You can also obtain the source code of the
  packages
  •   Easy to automate
• And the modifications done by the Debian
  maintainers
• So add product metrics to the set of data
  sources
• But this is not included in the UDD


                                           20 / 25
3. What has been done and
what we can do

                            21 / 25
What kind of questions does Debian answer with the UDD?

• High-priority packages that have release-critical (RC)
  bugs
• Developers with very buggy and/or outdated
  packages
• Who uploaded this package to the unstable
  release?
• Who reported the RC bugs since the last
  release?


                                                      22 / 25
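As an illustration of the first question, here is a sketch of the kind of query involved. It does not use the real UDD schema: the tables packages and bugs, their columns, and all rows are invented for the example, and SQLite stands in for PostgreSQL so that the snippet runs anywhere. The severities critical, grave and serious are the ones Debian treats as release-critical.

```python
import sqlite3

# Hypothetical toy schema and made-up rows, NOT the actual UDD tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE packages (source TEXT PRIMARY KEY, priority TEXT);
CREATE TABLE bugs (id INTEGER PRIMARY KEY, source TEXT,
                   severity TEXT, done INTEGER);
INSERT INTO packages VALUES
  ('alpha', 'required'), ('beta', 'optional'), ('gamma', 'important');
INSERT INTO bugs VALUES
  (1, 'alpha', 'serious', 0),
  (2, 'alpha', 'minor', 0),
  (3, 'beta', 'grave', 0),
  (4, 'gamma', 'critical', 1);
""")

# High-priority packages with at least one open release-critical bug.
rows = conn.execute("""
SELECT p.source, COUNT(*) AS rc_bugs
FROM packages AS p
JOIN bugs AS b ON b.source = p.source
WHERE p.priority IN ('required', 'important', 'standard')
  AND b.severity IN ('critical', 'grave', 'serious')
  AND b.done = 0
GROUP BY p.source
ORDER BY rc_bugs DESC, p.source
""").fetchall()

for source_name, rc_bugs in rows:
    print(source_name, rc_bugs)
```

Against a loaded UDD dump the same join would have to be rewritten with the table and column names of the actual schema.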
Some questions answered in the literature

• The popularity bias
  •   http://oa.upm.es/9585/
  •   Open source projects get more bug reports if
      they are popular
  •   The actual number of bugs is not related to the
      number of bugs reported
  •   So more bugs actually means more quality
      •   Well, at least more people who decide to use the
          software


                                                             23 / 25
The popularity bias


[Figure: scatter plot of Log(Bugs) (y-axis) against Log(installations) (x-axis), with the label "Required packages"]
                                         24 / 25
Summary

• Packages and sources metadata
     •   And source code
• Bugs
     •   All the way back to 1995-96-97!
• Popularity contest
• Maintainer activity (uploads)
     •   All the way back to 1998!
• And much more…
• Now, what do you think we can do with this?

                                                25 / 25

The Ultimate Debian Database

  • 1. The Ultimate Debian Database Israel Herraiz <israel.herraiz@upm.es> Davis, CA, July 26th 2012 Download these slides at http://slideshare.net/herraiz/the-ultimate-debian-database
  • 2. Outline 1. Debian: what is it and sources of data 2. The UDD: what is it and where to get it 3. What has been done and what we can do 1 / 25
  • 3. 1. Debian: what is it and sources of data 2 / 25
  • 4. Debian • GNU/Linux software distribution • Goal: to deliver an entirely and exclusively free distribution • Maintained by volunteers • Bureaucratic organization (policies, constitution, social contract) • Release when ready • > 10 years history • > 500 MSLOC • > 15k packages 3 / 25
  • 8. Source and Binary Packages • A source package generates one or more binary packages octave-core octave-doc octave liboctave liboctave-dev 7 / 25
  • 9. Package uploads
      • There are no repositories like in other software projects
          • Although developers may privately use version control systems
      • When a bug is fixed, a new version is uploaded
          • Uploads == commits
  • 10. Source Packages metadata
      Source: octave
      Section: math
      Priority: extra
      Maintainer: Debian Octave Group <pkg-octave-devel@lists.alioth.debian.org>
      Uploaders: Thomas Weber <tweber@debian.org>, Sébastien Villemot <sebastien.villemot@ens.fr>
      DM-Upload-Allowed: yes
      Build-Depends: gfortran, debhelper (>= 9), automake, dh-autoreconf, texinfo ….
      Standards-Version: 3.9.3
      Homepage: http://www.octave.org/
      Vcs-Git: git://git.debian.org/git/pkg-octave/octave.git
      Vcs-Browser: http://git.debian.org/?p=pkg-octave/octave.git
  • 11. Binary Packages metadata
      Package: octave
      Priority: extra
      Section: math
      Installed-Size: 4760
      Maintainer: Ubuntu Developers <ubuntu-devel-discuss@lists.ubuntu.com>
      Architecture: amd64
      Version: 3.6.1-1ubuntu1ppa1~precise1
      Recommends: gnuplot, libatlas3gf-base
      Replaces: octave3.2
      Suggests: octave-info, octave-doc, octave-htmldoc
      Depends: libamd2.2.0 (>= 1:3.4.0), libarpack2 (>= 2.1), …
      Conflicts: octave3.2
      Filename: pool/main/o/octave/octave_3.6.1-1ubuntu1ppa1~precise1_amd64.deb
      Size: 1746050
      MD5sum: 2c431556d6cf98fd8a341e865ac63058
      SHA1: b333c49e6f6cb7d4445378020dfffdb5a1626de7
      Description: GNU Octave language for numerical computations…
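The metadata shown on these slides uses a simple "Field: value" layout, so it can be read with a few lines of code. The following is a minimal sketch in Python, assuming a single stanza and ignoring alternatives (`a | b`) and architecture qualifiers in relationship fields. The function names `parse_stanza` and `parse_relations` are made up for this illustration, and the sample stanza is a shortened excerpt of the slide (the elided part of `Depends` is simply left out).

```python
import re

# Shortened excerpt of the binary package metadata from the slide.
# The part of Depends that the slide elides with "…" is omitted here.
STANZA = """\
Package: octave
Priority: extra
Section: math
Architecture: amd64
Version: 3.6.1-1ubuntu1ppa1~precise1
Recommends: gnuplot, libatlas3gf-base
Suggests: octave-info, octave-doc, octave-htmldoc
Depends: libamd2.2.0 (>= 1:3.4.0), libarpack2 (>= 2.1)
"""


def parse_stanza(text):
    """Parse one metadata stanza into a dict mapping field name to value."""
    fields = {}
    last = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line[0] in " \t" and last is not None:
            # A line starting with whitespace continues the previous field.
            fields[last] += "\n" + line.strip()
        else:
            # Split at the first colon only: values may contain colons
            # themselves (for example the epoch in "1:3.4.0").
            name, _, value = line.partition(":")
            last = name.strip()
            fields[last] = value.strip()
    return fields


def parse_relations(value):
    """Split a Depends-style value into (package, constraint or None) pairs."""
    relations = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        match = re.match(r"^([^\s(]+)\s*(?:\(([^)]*)\))?$", item)
        if match:
            relations.append((match.group(1), match.group(2)))
        else:
            # Anything this sketch does not understand is kept verbatim.
            relations.append((item, None))
    return relations


fields = parse_stanza(STANZA)
depends = parse_relations(fields["Depends"])
recommends = parse_relations(fields["Recommends"])
```

Here `depends` holds each dependency together with its version constraint, while `recommends` holds package names with no constraint attached.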
  • 13. Debian Popcon: Tracking Installations
      • Popularity: total install counts
          • Recent use (< 30 days)
          • Old use (beyond 30 days)
      • Data collected daily
      • Users voluntarily opt in
          • Source of bias
  • 14. Debian Bugs
      • People find bugs in binary packages
          • ~500 bugs per month
      • But bugs are linked to source packages
      • Bugs can be
          • Accepted and solved in Debian
          • Rejected
          • Forwarded to upstream
      • Everything else, similar to other bug tracking systems
          • Life cycle, comments, severity levels…
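Because bugs are found in binary packages but linked to source packages, any per-package bug count needs a binary-to-source lookup first. The following Python sketch shows one way to do that aggregation. The mapping is the octave example from the "Source and Binary Packages" slide; the list of reports is hypothetical and only serves to exercise the code, and falling back to the binary name for unknown packages is an assumption of this sketch.

```python
from collections import Counter

# Source -> binary packages, as in the octave example on the slides.
BINARIES_BY_SOURCE = {
    "octave": ["octave-core", "octave-doc", "liboctave", "liboctave-dev"],
}


def binary_to_source(binaries_by_source):
    """Invert the source -> binaries mapping into binary -> source."""
    return {
        binary: source
        for source, binaries in binaries_by_source.items()
        for binary in binaries
    }


def bugs_per_source(reported_against, binaries_by_source):
    """Count bug reports per source package.

    reported_against lists, for each report, the binary package it was
    filed against. Unknown binaries are counted under their own name
    (an assumption made for this sketch).
    """
    lookup = binary_to_source(binaries_by_source)
    counts = Counter()
    for binary in reported_against:
        counts[lookup.get(binary, binary)] += 1
    return counts


# Hypothetical reports, not real data.
reports = ["octave-core", "octave-doc", "octave-core", "liboctave-dev"]
counts = bugs_per_source(reports, BINARIES_BY_SOURCE)
```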
  • 15. 2. The UDD: what is it and where to get it
  • 16. Research work: main paper (at MSR 2010)
  • 17. Other papers at MSR 2010
  • 18. What is the UDD?
      • PostgreSQL database with all the information from the sources described so far
      • http://udd.debian.org
      • New dumps available every two days
          • ~ 500 MB bz2
      • Used for some Debian internal services
      • Schema too complex and too big for a slide
      • Technical detail: you need a Debian-based system to load the dump of the UDD
  • 19. Debian sources of data
      • Sources / Packages
      • Lintian metadata
      • Migrations to testing
      • Bugs
          • including *all* archived bugs
          • 1995-96-97
      • Uploads
          • All the way back to 1998!
      • New packages queue
      • Carnivore
      • Translations status
      • Debtags
      • Orphaned packages
      • Popularity Contest
      • Screenshots
      • DEHS
  • 20. ! 19 / 25
  • 21. Bear in mind!
      • You can also obtain the source code of the packages
          • Easy to automate
      • And the modifications done by the Debian maintainers
      • So add product metrics to the set of data sources
      • But this is not included in the UDD
  • 22. 3. What has been done and what we can do
  • 23. What kind of questions does Debian solve with the UDD?
      • High priority packages that have Release Candidate blocker bugs
      • Developers with very buggy and/or outdated packages
      • Who uploaded this package to the unstable release?
      • Who reported the RC bugs since the last release?
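As an illustration of the first question, the Python sketch below runs a join between packages and bugs. It is only a toy: the tables `packages` and `bugs`, their columns, and the sample rows are hypothetical and are not the real UDD schema, and it uses an in-memory SQLite database instead of PostgreSQL so that it runs without a UDD dump. Treating the priorities required, important and standard as "high priority" is an assumption; critical, grave and serious are the severities Debian treats as release-critical.

```python
import sqlite3

# Hypothetical, heavily simplified schema. The real UDD is a PostgreSQL
# database whose schema is far larger than this.
SCHEMA = """
CREATE TABLE packages (name TEXT PRIMARY KEY, priority TEXT);
CREATE TABLE bugs (
    id INTEGER PRIMARY KEY,
    package TEXT,
    severity TEXT,
    done INTEGER  -- 1 if the bug is closed, 0 if still open
);
"""

QUERY = """
SELECT p.name, COUNT(*) AS rc_bugs
FROM packages AS p
JOIN bugs AS b ON b.package = p.name
WHERE p.priority IN ('required', 'important', 'standard')
  AND b.severity IN ('critical', 'grave', 'serious')
  AND b.done = 0
GROUP BY p.name
ORDER BY rc_bugs DESC, p.name
"""


def build_demo_db():
    """Create an in-memory database filled with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO packages VALUES (?, ?)",
        [("pkg-a", "required"), ("pkg-b", "optional"), ("pkg-c", "important")],
    )
    conn.executemany(
        "INSERT INTO bugs VALUES (?, ?, ?, ?)",
        [
            (1, "pkg-a", "serious", 0),   # open RC bug, high priority package
            (2, "pkg-a", "minor", 0),     # open, but not release-critical
            (3, "pkg-b", "grave", 0),     # RC bug, but package is optional
            (4, "pkg-c", "critical", 1),  # RC bug, but already closed
        ],
    )
    return conn


def high_priority_rc_packages(conn):
    """Return (package, open RC bug count) rows for high priority packages."""
    return conn.execute(QUERY).fetchall()


conn = build_demo_db()
rows = high_priority_rc_packages(conn)
```

Against the real database the same idea would be written as a PostgreSQL query over the actual UDD tables, whose names and columns have to be taken from the published schema.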
  • 24. Some questions solved in the literature
      • The popularity bias
          • http://oa.upm.es/9585/
      • Open source projects get more bug reports if they are popular
      • The actual number of bugs is not related to the number of bugs reported
      • So more bugs actually means more quality
          • Well, at least more people who decide to use the software
  • 25. The popularity bias
      • [Figure: plot with axes Log(Bugs) and Log(installations); label: Required packages]
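One way to put a number on the relationship drawn in such a log-log plot is the Pearson correlation between the logarithm of the installation count and the logarithm of the bug count. The Python sketch below assumes that this is a reasonable summary; it is not necessarily the analysis used in the paper. The function names are made up for this illustration, and packages with a zero count are dropped because their logarithm is undefined.

```python
import math


def log_log_pairs(installations, bugs):
    """Return (log10(installations), log10(bugs)) for strictly positive pairs."""
    return [
        (math.log10(i), math.log10(b))
        for i, b in zip(installations, bugs)
        if i > 0 and b > 0
    ]


def log_log_correlation(installations, bugs):
    """Pearson correlation between log(installations) and log(bugs)."""
    pairs = log_log_pairs(installations, bugs)
    if len(pairs) < 2:
        raise ValueError("need at least two packages with non-zero counts")
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return covariance / math.sqrt(var_x * var_y)
```

The inputs would be per-package installation counts from the Popularity Contest and per-package bug counts, both of which the slides list among the UDD data sources.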
  • 26. Summary
      • Packages and sources metadata
          • And source code
      • Bugs
          • All the way back to 1995-96-97!
      • Popularity contest
      • Maintainers activity (uploads)
          • All the way back to 1998!
      • And much more….
      • Now, what do you think we can do with this?