In the last post I reached the conclusion that every programmer should keep at least the following versions of a piece of software:
- The one in volatile memory, when editing it.
- One in persistent local memory, when editing and testing it.
- One in persistent memory, representing the latest tested version.
- One in persistent memory for every released version, representing that released version.
But some more versions may be needed.
Consider this. You have a tested version and you need to add several features. You may implement one feature, test it, release it to the repository, and then proceed with the next feature. When the next feature is released, it overwrites the previously released version and, as no official release has been published in the meantime, that version is lost forever. If you later want to examine the changes applied to that version, you find the changes of all the features mixed together, and you may find it hard to tell which change was made for which feature.
Alternatively, you may implement all the features before the release; but in that case it is even harder to assign a specific change to a given feature.
Sometimes you introduce a defect when you try to implement a feature. In such a case it is useful to pinpoint exactly the changes that you made for that feature, possibly several months or years earlier. When correcting defects in the maintenance stage, it is typical to correct three or four defects a day. Therefore, counting about 20 working days a month, a proper versioning scheme should store up to 80 versions every month. And this is for a single programmer.
If archiving a version is done by copying all the source code to persistent storage, such a versioning scheme may be unwieldy, because a complex software system may consist of several tens of megabytes of source code, and therefore the repository may grow to several tens of gigabytes in a couple of years. If you need to find out when a specific line was changed by searching tens of gigabytes of source code, you are going to lose a lot of time.
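To make the estimate concrete, here is a back-of-the-envelope calculation. The 20 working days a month and the 20 MB per full copy are my own illustrative assumptions, not measurements; only the "three or four defects a day" and the "couple of years" come from the reasoning above.

```python
# All figures are illustrative assumptions, not measurements.
FIXES_PER_DAY = 4            # upper end of "three or four defects a day"
WORKING_DAYS_PER_MONTH = 20  # assumed
MONTHS = 24                  # "a couple of years"
SNAPSHOT_MB = 20             # assumed size of one full copy of the source code

versions_per_month = FIXES_PER_DAY * WORKING_DAYS_PER_MONTH
total_versions = versions_per_month * MONTHS
total_mb = total_versions * SNAPSHOT_MB

print(versions_per_month)  # 80
print(total_versions)      # 1920
print(total_mb)            # 38400 MB, i.e. roughly 38 GB
```

With these numbers, a single programmer archiving full copies would fill almost 40 GB in two years.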
If you consider that a feature applied in less than a couple of hours usually changes a very small number of the lines in your source code, the way to simplify your repository lies in storing only one (or a few) complete versions, plus the differences between every version and the next or the previous one. This “delta encoding” results in a much smaller repository when every version changes only a few lines. Of course, if most of the lines have been changed, a complete copy is better. In addition, to save space, the repository should be compressed, as source code is highly compressible.
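As a rough illustration of the idea, here is a minimal sketch of delta encoding plus compression, using only Python's standard `difflib`, `json` and `zlib` modules. The function names (`make_delta`, `apply_delta`, `pack`, `unpack`, `checkout`) and the delta format are hypothetical, chosen just for this example; real version control systems use their own, more refined formats.

```python
import difflib
import json
import zlib

def make_delta(old, new):
    """Return the edits that turn `old` into `new` (both are lists of lines)."""
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    # Keep only the changed regions: a slice of `old` and its replacement lines.
    return [(i1, i2, new[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag != "equal"]

def apply_delta(old, delta):
    """Rebuild the newer version from `old` and a delta made by make_delta."""
    result = list(old)
    # Apply the edits from the end, so that earlier indices stay valid.
    for i1, i2, lines in reversed(delta):
        result[i1:i2] = lines
    return result

def pack(obj):
    """Serialize a full version or a delta and compress it."""
    return zlib.compress(json.dumps(obj).encode("utf-8"))

def unpack(blob):
    """Inverse of pack (tuples come back as lists, which apply_delta accepts)."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))

def checkout(base_blob, delta_blobs, n):
    """Rebuild version n (0 is the base) by applying the first n deltas."""
    lines = unpack(base_blob)
    for blob in delta_blobs[:n]:
        lines = apply_delta(lines, unpack(blob))
    return lines

# Three versions of a tiny source file.
v1 = ["def area(w, h):", "    return w * h", "", "print(area(2, 3))"]
v2 = ["def area(w, h):", "    return w * h", "", "print(area(2, 5))"]
v3 = ["def area(w, h):", "    # width times height", "    return w * h",
      "", "print(area(2, 5))"]

# The repository holds one complete version and two small deltas.
base_blob = pack(v1)
delta_blobs = [pack(make_delta(v1, v2)), pack(make_delta(v2, v3))]

print(checkout(base_blob, delta_blobs, 2) == v3)  # True
```

Here the repository keeps the oldest version complete and rebuilds newer ones by applying the deltas forward; storing the latest version complete and the deltas backward would work just as well.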
The problem is that no one likes to handle the delta encoding and the data compression by hand. Therefore a specific software system to handle the source code repository is highly desirable.
All this holds for a single developer. But if the programming team consists of more than one developer who needs to access the same source code, there is the added problem of synchronizing the accesses to the repository by the various developers.
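To see what can go wrong, imagine two developers who start from the same version and both save their work: without any check, the second save silently discards the first one. The sketch below shows one possible guard, which rejects a commit that is not based on the latest revision. The class and method names are hypothetical, it runs in a single process, and it is only meant to illustrate the problem, not how any particular tool solves it.

```python
class StaleRevisionError(Exception):
    """Raised when a commit is based on a revision that is no longer the latest."""

class SharedRepository:
    def __init__(self, lines):
        self._revisions = [list(lines)]

    def checkout(self):
        """Return (revision number, working copy) for the latest revision."""
        return len(self._revisions) - 1, list(self._revisions[-1])

    def commit(self, base_revision, lines):
        """Store a new revision, but only if it was edited from the latest one."""
        latest = len(self._revisions) - 1
        if base_revision != latest:
            raise StaleRevisionError(
                f"based on revision {base_revision}, latest is {latest}")
        self._revisions.append(list(lines))
        return latest + 1

repo = SharedRepository(["x = 1", "y = 2"])
alice_rev, alice_copy = repo.checkout()
bob_rev, bob_copy = repo.checkout()

alice_copy[0] = "x = 10"
bob_copy[1] = "y = 20"

repo.commit(alice_rev, alice_copy)   # accepted: becomes revision 1
try:
    repo.commit(bob_rev, bob_copy)   # rejected: Bob's copy lacks Alice's change
    bob_rejected = False
except StaleRevisionError:
    bob_rejected = True

print(bob_rejected)  # True
```

After the rejection, Bob has to check out revision 1, reapply his change to it, and commit again; this is exactly the kind of chore that a dedicated tool can assist with.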
Therefore, our desired source code repository handling software should also coordinate concurrent accesses by several users.
Such a system exists. Actually there are many of them. Some people call this category of software “Revision Control System”, others “Version Control System”. Some are proprietary, others are open-source. Some are definitely obsolete. The most famous modern open-source version control systems, in order of increasing ease of use and decreasing flexibility, are: Git, Mercurial, Subversion.
Git is extremely flexible, but also very hard to use and not well implemented on platforms other than Linux.
Subversion has the limitation of being centralized. A centralized system is no problem when there are only one or two developers and only one main line of development, but if there are three or more developers, or if there are several lines of development and some changes have to be applied to several of them, a centralized system is not as useful as a distributed system like Git or Mercurial.
Therefore I think that Mercurial is a reasonable compromise between ease-of-use, portability, and flexibility.