Hg version control for bioinformaticians
1. Giovanni Dall'Olio,
IBE (UPF-CEXS)
Introduction to version control
and hg for our bioinformatics
group
2. What is hg?
● Programmers use software to keep track of all
the versions of the code they write. These are
called Version Control Systems (VCS)
● Many programs implement a VCS; the best
known are CVS, Subversion, Git, hg (Mercurial)
and Bazaar
● Git, hg and Bazaar are newer and based on an
improved paradigm called Distributed Version
Control System (DVCS)
3. How will hg be useful for us?
● Keep versions of the scripts we create
● Also for the datasets, results, etc.
● Have a common and official version of the
pipeline and the scripts, on bitbucket.org
● Everybody works on their own computer, on their
own copy of the scripts; every once in a while,
they merge it with the official version
4. Installing hg
● Hg can run on any operating system
● On Linux, install it through your software center
or package manager, for example on Debian/Ubuntu:
● sudo apt-get install mercurial
● On other operating systems, go to
http://mercurial.selenic.com/ and download the installer
5. Initial hg configuration
● Hg stores its configuration in a file called:
● ~/.hgrc on Unix
● C:\Documents and Settings\your_name\.hgrc on Windows
● Open it and write your username:
[ui]
username = Giovanni Dall'Olio <dalloliogm@gmail.com>
7. The basic operations of a VCS
● Creating a repository
● Equivalent to saying 'start keeping track of the
versions of the files in this project'
● Adding files to the repository
● Files are not tracked unless you say so
● Committing changes
● Saving a version of the current state of the files
● Pushing the changes and merging them with
the standard version
9. Effect of creating a new repo
● A hidden directory (.hg) will be created
● From now on, other hg commands can be run
in this directory
10. Adding files to the repo
● By default, no files are added to the repository
● This means that if you create a new file in the
directory, hg will not track it
12. Files are not added automatically to
the repo
● The command:
● hg log file.txt
● should return the history of changes to the file
file.txt. Since the file is not in the repo yet,
nothing is shown
13. hg add
● To add a file to the repository, use hg add
● This tells hg to record all the changes made
to that file
14. Committing changes
● The most important operation in VCS is the
commit
● This operation saves the state of the tracked
files and associates it with a version
● One commit → one version
15. Committing a change
● We have added the file file.txt to the repo
● This is a change compared to the previous
version (where this file was not present)
● So we have to record it with a commit
17. Effects of adding a file and
committing
● From now on, all the changes made to the file
will be tracked
18. What is being 'committed'?
● Every time you commit a new version, hg
stores the set of changes since the previous
version
● Some older VCS stored a copy of all the files for
each version
● => very high disk usage
● By storing only the changes, hg occupies less
space and makes it easier to compare versions
19. Hg diff
● The hg diff command will show the differences
between the file and its last committed version
20. Hg log
● Hg log will show the history of the changes in
the repository
22. The story continues..
● The basic operations in a VCS are adding files
to tracking and committing changes
● Next week we will see how to keep a copy of
our repository on a remote server, and how to
collaborate with other people
● Now I will show you some examples of using a
version control system
23. Example: backup
● Imagine that you remove a file or a directory
from your project by mistake
● With a VCS, you can revert to the previous
version and get the files back
24. Example: tracking code
● VCS were developed to track changes in
code
● Return to the point where you have made a mistake
or a typo
● Implementing a parallel version of the code, like
trying a different library or approach (branching)
● Remember what you have been doing, when you
have to change code written months ago
25. Example: releasing a software
● Mr. Werewolf publishes software to predict
when the moon will be full
● The code gets adopted by the werewolf
community. Papers are published using it
● At a certain point, another werewolf discovers a
bug in the code. It will be possible to find the
version where the error was introduced and
identify all the versions affected
26. Example: tracking data
● Version control can be applied to a dataset
● Example: Mr. Dracula wants to write a paper on
the quality of the blood in his neighborhood.
Every time he gets new data, he commits a
change
27. Tracking everything else
● VCS can be applied to many kinds of files
● Binary files can be stored, but their diffs and
merges are usually not meaningful
● OpenOffice documents can be tracked (they
are XML inside a zip archive)
28. Tracking huge files
● Hg stores the differences between two versions
● Storing all of the 1000 Genomes (1000g) data would take:
● Some gigabytes to store a compressed version of the
files
● Less space to store the following commits (but these
commits will take time)
● It may not be worth putting gigabytes of data
under version control
● No solution to date
● Some hg extensions for big files
29. How frequently should I commit?
● Everybody has their own philosophy
● Some people prefer to commit every small
change
● Others prefer to make only one big commit every day
● As a general rule:
● The bigger the commit, the harder it is to
integrate if there are conflicts
● It's up to you to decide
30. How to write the perfect commit
messages
● One or two sentences
● Avoid generic messages
● “new changes”, “fixed bugs”
● Use tags like 'Fix', 'Add', 'Config', etc.:
● “Fix: error when reading file”
● “Add: new function for plotting results”
● Mention the files changed if you think it may be
useful:
● “Implemented new sorting algorithm in sorting.py”