My previous posts about version control systems considered the need for multiple versions of the source code. With a centralized version control system (VCS for short), like Subversion, the versions of the source code are:
- The one in volatile memory, when editing it.
- One in persistent local memory, when editing and testing it.
- One, handled by the VCS, in persistent memory, representing the latest tested version.
- One, handled by the VCS, in persistent memory, for every version released to the users, representing that version.
- One, handled by the VCS, in persistent memory for every version released to other developers or saved before a sweeping change, representing that released or saved version.
Actually, when a programmer wants to release the latest version to other developers or to the users, or wants to save a correct version before a major change that, at least for some time, will not be as correct as the saved one, he releases the current version from his local work area to the repository. That procedure is named “commit” in (almost) all VCSes.
After a commit, the repository has a new latest release, which is supposed to be of quite good quality. That latest release is the reference release for all the developers. Sometimes a developer needs to download the latest version, to get the changes applied to the software by other developers. This procedure is usually named “update” (because the local version is updated using the latest changes applied to the repository) or “pull” (because a version is moved from the repository to the local work area by a command issued from the work area).
The latest release in the repository may be released to the users or not, and may be pulled by some, all, or no developers, according to need. Of course, its changes, if not discarded, are there to stay. If further commits add more changes, all those changes accumulate, and when a user or a developer gets the latest version, he gets all the changes of all the commits.
If a particular version is released to the users, or is considered a checkpoint before a major change, it is usually marked with a particular number or a particular name. Such a mark is named a “tag”. Therefore, in a repository there may be versions 876, 877, 878, and so on, and version 877, as it is released to the users, is tagged as “3.1.0”, while the other versions are not tagged.
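To make the relation between version numbers and tags concrete, here is a minimal sketch in Python. It is only a toy model, not how Subversion or any other VCS stores its data: the `revisions` and `tags` dictionaries, the snapshot descriptions, and the functions `resolve` and `tag_of` are all names invented for this illustration.

```python
# Hypothetical toy model: revision number -> description of that snapshot.
revisions = {
    876: "fix login bug",
    877: "polish release notes",
    878: "start big refactoring",
}

# A tag is just a readable name attached to one revision number.
tags = {"3.1.0": 877}


def resolve(ref):
    """Return the revision number for a tag name or a revision number."""
    revision = tags.get(ref, ref)
    if revision not in revisions:
        raise KeyError(f"unknown revision or tag: {ref!r}")
    return revision


def tag_of(revision):
    """Return the tag attached to a revision, or None if it is untagged."""
    for name, tagged in tags.items():
        if tagged == revision:
            return name
    return None
```

With this model, asking for “3.1.0” and asking for 877 lead to the same snapshot, while 876 and 878 can be reached only by their numbers.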
Actually, the most compelling reason to adopt a VCS is to handle concurrent changes to the code base by several developers. Usually, all the developers need to read most of the source code, but to change only a relatively small part of it. What should happen if two programmers change the same source file and then release their changed versions? Of course it is highly undesirable to allow one programmer to silently overwrite the changes of the other. Several solutions have been historically adopted:
- To keep all the sources in a single persistent version; when a user begins to edit a file, the editor automatically locks that file on persistent storage, so that no other user using that editor may edit that file; when the file is saved, every editor that is viewing that file is notified of the change, so that its volatile version is immediately updated, and the file is unlocked.
- To provide commands to download source files from the repository to the local work area, and simultaneously lock them. This command is named “check out”. Every attempt to check out a locked file fails. Of course a developer may copy other files from the repository without checking them out, but then he cannot change them. When a developer wants to release an edited file to other developers, or simply to allow other developers to check it out, he performs a “check in” command, which copies the file from the local work area to the repository and simultaneously unlocks it.
These two solutions suffer from several disadvantages, and therefore they are no longer used for source code. One of them is that if a developer forgets to check a file in, no other developer can safely edit it. Another defect is that a developer may check in a syntactically incorrect version, and then no other developer is able to compile the software system.
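The check out / check in scheme can be sketched in a few lines of Python. This is a toy model under my own assumptions, not the behavior of any specific tool: the class `LockingRepository` and its methods `check_out`, `read`, and `check_in` are hypothetical names, and files are reduced to strings held in memory.

```python
class LockingRepository:
    """Toy lock-based repository: one lock per file, held by one developer."""

    def __init__(self, files):
        self.files = dict(files)  # file name -> content
        self.locks = {}           # file name -> developer holding the lock

    def check_out(self, name, developer):
        """Copy a file to the caller and lock it; fail if already locked."""
        content = self.files[name]
        holder = self.locks.get(name)
        if holder is not None:
            raise PermissionError(f"{name} is locked by {holder}")
        self.locks[name] = developer
        return content

    def read(self, name):
        """Copy a file without locking it (the copy must not be changed)."""
        return self.files[name]

    def check_in(self, name, developer, content):
        """Store the edited file and release the lock."""
        if self.locks.get(name) != developer:
            raise PermissionError(f"{name} is not checked out by {developer}")
        self.files[name] = content
        del self.locks[name]
```

If one developer calls `check_out("main.c", "alice")` and never calls `check_in`, every later `check_out` of `main.c` by anyone else raises `PermissionError`, which is exactly the first disadvantage described above.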
Therefore, a better solution is to have a complete persistent version of the source code in the local work area of every developer. That version is lock-free, and each programmer can test it to ensure at least that it compiles and starts.
The problems are two:
- When a developer wants to release his changes from his local work area to the repository, the repository may have already been updated by another developer.
- When a developer wants to update his local work area from the repository, the local work area may have been already changed by him.
The first problem is solved by first updating the local work area with the latest version in the repository, and then committing the changes. The work area keeps in a hidden file the number of the repository version from which it was last updated. If a commit is attempted on a repository whose latest version is different from the one on which the work area is based, the commit fails.
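The rule that a commit fails when the work area is out of date can be sketched as follows. This is a simplified model under stated assumptions: the whole source tree is a single string, the hidden file is replaced by a `base_revision` attribute, and `update` simply overwrites the local content instead of merging. The names `Repository`, `WorkArea`, and `OutOfDateError` are invented for the example.

```python
class OutOfDateError(Exception):
    """Raised when a commit is based on a revision that is no longer the latest."""


class Repository:
    def __init__(self, content=""):
        self.history = [content]  # history[n] is the content of revision n

    @property
    def latest(self):
        return len(self.history) - 1

    def commit(self, base_revision, content):
        # Refuse the commit if somebody else committed in the meantime.
        if base_revision != self.latest:
            raise OutOfDateError(
                f"work area is based on {base_revision}, latest is {self.latest}"
            )
        self.history.append(content)
        return self.latest


class WorkArea:
    def __init__(self, repo):
        self.repo = repo
        self.update()

    def update(self):
        # Assumption: no merging here, local content is simply replaced.
        self.base_revision = self.repo.latest
        self.content = self.repo.history[self.base_revision]

    def commit(self):
        self.base_revision = self.repo.commit(self.base_revision, self.content)
```

If two work areas start from revision 0 and the first one commits, the second one gets `OutOfDateError` on its own commit until it calls `update`.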
The second problem is harder. The “update” or “pull” command has a merging capability that, if a file is changed only in the repository or only in the local work area, keeps the changed file. That capability is also able to merge two files when both have been changed, as long as the changes do not affect the same lines. If a line has been changed both in the repository and in the local work area, the merge fails, but it generates an auxiliary merged file in which both versions of the lines that couldn’t be merged are highlighted. Then the developer needs to edit that file manually, deciding how to resolve the conflict, and then declare to the VCS that the merge conflict has been resolved.
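The merging rule can be illustrated with a deliberately simplified sketch. It assumes that lines are only changed in place, never inserted or deleted, so the common ancestor, the local file, and the repository file all have the same number of lines; real tools use a proper three-way diff and do not need that restriction. The function name `merge_lines` and the exact marker texts are my own choices for the example.

```python
def merge_lines(base, local, remote):
    """Merge two edited copies of a file, line by line, against their ancestor.

    base, local and remote are lists of lines of equal length (assumption:
    no insertions or deletions). Returns (merged_lines, number_of_conflicts).
    """
    if not (len(base) == len(local) == len(remote)):
        raise ValueError("this sketch only handles in-place line changes")

    merged = []
    conflicts = 0
    for base_line, local_line, remote_line in zip(base, local, remote):
        if local_line == remote_line:
            merged.append(local_line)    # unchanged, or same change on both sides
        elif local_line == base_line:
            merged.append(remote_line)   # changed only in the repository
        elif remote_line == base_line:
            merged.append(local_line)    # changed only in the work area
        else:
            # Changed on both sides: keep both versions, highlighted.
            conflicts += 1
            merged += [
                "<<<<<<< local",
                local_line,
                "=======",
                remote_line,
                ">>>>>>> repository",
            ]
    return merged, conflicts
```

When the returned conflict count is zero, the merged lines can be used as they are; otherwise the developer has to edit the highlighted regions by hand, as described above.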