Usually the files used for a medium-to-large software development project are not stored in a single directory, but in several subdirectories of a project root directory.
Why are files stored in several directories?
In general, there are four reasons to use directories:
- To allow storing different files that have the same name.
- To allow handling a group of files with a single command.
- To allow long pathnames in limited filesystems (like MS-DOS).
- To avoid slowdowns when many files are stored in a single directory in limited filesystems (like MS-DOS).
The last two reasons no longer apply to modern operating systems.
Development files have distinguishing suffixes (“.c”, “.h”, “.o” or “.obj”); therefore, files of different types have different names anyway and can already be handled as separate groups.
For development files, the two remaining reasons thus become:
- To allow storing different files of the same type that have the same name.
- To allow handling a group of files of the same type with a single command.
So let’s ideally start a software development project by putting all the files in a single directory, and let’s find out which additional directories we need.
Separating source files
First of all, it is useful to separate source files from generated files.
This is because we want to list source files, back them up, and handle them with a revision control system, while on the other hand we want to clear all generated files with a single command.
A generated file is not a kind of source file, and a source file is not a kind of generated file; therefore it is not reasonable to put generated files in a subdirectory of the directory containing source files, or vice versa.
So we have two parallel directories: “source files”, or “src” for short, and “generated files”, or “gen” for short.
To simply run a program, we go into the “gen” directory, forgetting that there is a “src” directory, and launch the generated program. Thus that directory should contain everything needed to run the program. Therefore, if a source file is actually needed “as is” to run the program, the build command should simply copy it from the “src” directory to the “gen” directory.
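The copy step described above could look like the following minimal sketch in Python. The function name `copy_runtime_files` and its `runtime_names` parameter are hypothetical, introduced only for illustration; a real build system would normally express the same thing as a build rule.

```python
import shutil
from pathlib import Path


def copy_runtime_files(src_dir, gen_dir, runtime_names):
    """Copy source files that are needed unchanged at run time into gen_dir.

    runtime_names is a hypothetical list of file names, relative to src_dir.
    Returns the list of destination paths that were written.
    """
    src_dir, gen_dir = Path(src_dir), Path(gen_dir)
    copied = []
    for name in runtime_names:
        target = gen_dir / name
        # Create intermediate directories so that nested names also work.
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src_dir / name, target)
        copied.append(target)
    return copied
```

After this step the “gen” directory is self-sufficient: the program can be run from there even if the “src” directory is absent.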
Separating generated configurations
With some development tools, like C and C++ compilers, there are several build options, like optimization flags, run-time checks (asserts), and processor architecture targets, and often a single developer wants to keep generated versions for several option sets. Let’s call a set of build options a “configuration”.
Typically, one wants to keep a “debug” configuration and a “release” configuration, but more are common. Therefore there is a need to keep several “generated files” directories, one for each supported configuration. Every one of these directories “is a” generated directory, and therefore it is reasonable to put them under the “gen” directory.
Up to now we have the following directory structure:
<project>
|
+-- src
|
+-- gen
    |
    +-- debug
    |
    +-- release
In such a way we can erase with a single command all generated files for all configurations, or only all generated files for a single configuration.
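As a rough illustration of that single command, here is a small Python sketch. The function name `clean_generated` is an assumption of this example, not part of any tool; it only relies on the “gen/<configuration>” layout shown above.

```python
import shutil
from pathlib import Path


def clean_generated(project_root, configuration=None):
    """Delete generated files under <project_root>/gen.

    With configuration=None every configuration is erased; otherwise only
    the directory gen/<configuration> is erased. Returns the removed path,
    or None if there was nothing to remove.
    """
    gen_dir = Path(project_root) / "gen"
    target = gen_dir if configuration is None else gen_dir / configuration
    if not target.is_dir():
        return None
    shutil.rmtree(target)
    return target
```

Calling `clean_generated(root, "debug")` removes only the debug output, while `clean_generated(root)` removes the whole “gen” tree; the “src” directory is never touched in either case.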
What if we have two or more orthogonal criteria to classify configurations?
For example, we may need debug and release versions for both x86-32bit and x86-64bit.
There are three options, shown by the following structures.
<project>
|
+-- src
|
+-- gen
    |
    +-- debug
    |   |
    |   +-- x86-32bit
    |   |
    |   +-- x86-64bit
    |
    +-- release
        |
        +-- x86-32bit
        |
        +-- x86-64bit
Here above, first a distinction between debug and release configurations is made, and then, for each of them, there is a distinction between the processor architectures.
<project>
|
+-- src
|
+-- gen
    |
    +-- x86-32bit
    |   |
    |   +-- debug
    |   |
    |   +-- release
    |
    +-- x86-64bit
        |
        +-- debug
        |
        +-- release
Here above, first a distinction between processor architectures is made, and then, for each of them, there is a distinction between the debug and release configurations.
<project>
|
+-- src
|
+-- gen
    |
    +-- x86-32bit-debug
    |
    +-- x86-32bit-release
    |
    +-- x86-64bit-debug
    |
    +-- x86-64bit-release
Here above, there is a “flat” structure.
I think the best structure is the last one, i.e. the flat structure.
One can still select all debug configurations with the pattern “*-debug”, and all the 64bit configurations with the pattern “*-64bit-*”.
The two previous structures, instead, do not allow both selections.
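The two patterns mentioned above can be tried out with Python’s standard `fnmatch` module, which implements shell-style wildcards. The helper `select` and the `CONFIGURATIONS` list are illustrative names; the directory names are the ones from the flat structure.

```python
from fnmatch import fnmatch

# Directory names of the flat structure shown above.
CONFIGURATIONS = [
    "x86-32bit-debug",
    "x86-32bit-release",
    "x86-64bit-debug",
    "x86-64bit-release",
]


def select(pattern, names=CONFIGURATIONS):
    """Return the configuration names matching a shell-style pattern."""
    return [name for name in names if fnmatch(name, pattern)]


print(select("*-debug"))    # ['x86-32bit-debug', 'x86-64bit-debug']
print(select("*-64bit-*"))  # ['x86-64bit-debug', 'x86-64bit-release']
```

Because every classification criterion is encoded in the directory name, any criterion can be selected with one wildcard pattern.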
Separating other types of file
And where do we put pictures or other binary source files, and developer documentation?
Binary source files are of course source files; their name cannot clash with source code because of filename extension; they should be put under revision control even if they are binary; collective commands can isolate them by their filename extension. Therefore they should stay in the “src” directory, together with source code.
Developer documentation, like requirements or UML diagrams, may have a source version, for optimal editing and revision control, and a processed version, for optimal rendering. The source version, or the only version, should stay in the “src” directory, while the processed version, if any, should go in the “gen” tree, provided there will be no need to edit it manually.
This raises the following question.
If there is only one kind of processing for a document, but 8 different configurations for processing source code, should we replicate the processing of the documentation for every configuration?
Well, such a procedure wouldn’t be so awful, as documents are usually easily processed and do not require inordinate amounts of storage space. However, the following scheme may be better.
<project>
|
+-- src
|
+-- gen
    |
    +-- x86-32bit-debug
    |
    +-- x86-32bit-release
    |
    +-- x86-32bit-doc
    |
    +-- x86-64bit-debug
    |
    +-- x86-64bit-release
    |
    +-- x86-64bit-doc
    |
    +-- -doc
The x86-32bit-doc directory contains all the processed documents specific to the x86-32bit processor architecture; the x86-64bit-doc directory contains all the processed documents specific to the x86-64bit processor architecture; and the -doc directory contains all the processed documents independent of the processor architecture.
But there is another, harder, question. What about code generated by design tools like UML tools or GUI designers?
The best code generation tools are two-way tools, a.k.a. round-trip engineering tools. With such tools, when a change is made to the model in the tool, that change is applied to the generated code without losing any code previously written by the user, provided that the user followed certain conventions; and when the generated code is edited in a text editor following those conventions, that change is automatically parsed and applied to the model in the tool. For such two-way tools, both the model and the generated code should be considered source code, as both may be explicitly changed by a user.
There is a redundancy in that, but it cannot be easily avoided.
Actually I don’t think that code generating tools are a real advantage for constructing easily maintainable software. They are RAD tools, but I think that Rapid Application Development invariably means Slow Application Maintenance.
One-way tools are even worse for software quality, but at least their generated code may be put into the generated code directory tree, with no redundancy in the “src” directory.
Automated test programs
And where do we put test program sources, test data, and test program executables?
Test program sources should of course be considered source code.
The only issue is whether they should be in the same directory as the source code, in a subdirectory, or in a parallel directory.
It is rather typical for a test source file to have the same name as a system-under-test source file, and in any case there is a need to handle with a single command all test source files or all non-test source files.
Five reasonable schemes (omitting the “gen” directory, and using the word “sut” for “System-Under-Test”, i.e. the main software product) come to my mind:
<project>
|
+-- sut
|
+-- test
Here above, test source code and SUT source code are parallel in the project root. This solution has the disadvantage of disallowing a single command to handle all source files.
<project>
|
+-- src
    |
    +-- test
Here above, test source code is in a subdirectory of the directory containing the SUT source code.
<project>
|
+-- src
    |
    +-- file.cpp
    |
    +-- test_file.cpp
Here above, test source code and SUT source code are in the same directory. Test source code files are distinguished by the fact that their names begin with “test_”. This solution has the disadvantage of disallowing a single command to handle all SUT (i.e. non-test) files.
<project>
|
+-- src
    |
    +-- sut_file.cpp
    |
    +-- test_file.cpp
Here above, test source code and SUT source code are in the same directory. Test source code files are distinguished by the fact that their names begin with “test_”, and SUT source code files by the fact that their names begin with “sut_”. This solution has the disadvantage of forcing prefixes on all file names.
<project>
|
+-- src
    |
    +-- sut
    |
    +-- test
Here above, test source code and SUT source code are parallel in the source code directory. This solution has the disadvantage of using the weird name “sut” for the directory containing non-test files. If possible, it is better to keep application files easily reachable and not in a hard-to-understand directory. Even renaming it “app” or “main” is not really better.
The only solution for which no disadvantage has been pointed out is the second. Therefore it should be the solution of choice. If needed, several test directories may be created, like “unit_test”, “system_test”, “stress_test”.
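To illustrate why the second scheme allows all three selections (all sources, test sources only, SUT sources only), here is a minimal Python sketch. The function names, the “.cpp” default suffix and the “test” default directory name are assumptions of this example; it also assumes that SUT files sit directly inside “src” and that the only subdirectories of “src” are test directories.

```python
from pathlib import Path


def all_sources(src_dir, suffix=".cpp"):
    """Every source file with the given suffix, tests included."""
    return sorted(Path(src_dir).rglob("*" + suffix))


def list_test_sources(src_dir, suffix=".cpp", test_dirs=("test",)):
    """Only the files inside the named test subdirectories of src_dir."""
    found = []
    for name in test_dirs:
        directory = Path(src_dir) / name
        if directory.is_dir():
            found.extend(directory.rglob("*" + suffix))
    return sorted(found)


def list_sut_sources(src_dir, suffix=".cpp"):
    """Only the non-test files, i.e. those directly inside src_dir."""
    return sorted(Path(src_dir).glob("*" + suffix))
```

Note that a test file and a SUT file may share the same name, for example “src/file.cpp” and “src/test/file.cpp”, without any clash.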
Regarding test data, well, there shouldn’t be much of it, as a large amount would be hard to maintain.
If a stress test needs a lot of data, an algorithm should be implemented to generate it on the fly, or to generate it (in the “gen” subtree) as part of the build process.
Given that there is not a huge amount of test data stored persistently, it should be considered part of the source data, and therefore stored in the “src” subtree.
Regarding test executable programs, they of course should go in the “gen” subtree. If they share the configuration options of the SUT they should be in the same directory as the SUT generated files. Otherwise one or more parallel directories may be created in the “gen” directory.
A software product (program or library) may use several kinds of libraries.
- External non-standard libraries, i.e. libraries created and maintained by another organization.
- In-house libraries, i.e. libraries maintained by the same organization, and used by several projects.
- Project-specific libraries, i.e. libraries created as part of a single project.
Regarding external libraries, like Boost, it’s better to put them in specific directories outside of the application project directories. These may be where system libraries are installed, or elsewhere. Then give the build system the paths of those directories.
Regarding in-house libraries, they should be treated as external libraries, but not installed where system libraries are installed. In addition, although this is not a directory issue, every major application should be registered as a dependent of such libraries, so that a single command can build a library, run the tests for the library, rebuild all dependent projects, and possibly run all the automated tests for all dependent projects. If all of this passes with no errors, it is strong evidence that the change to the library does not break existing code. Of course the library maintainer should keep a recent copy of all dependent projects. Likewise, every project should declare its dependency on the library in its build script.
Regarding project-specific libraries, they are covered in the following section.
A project could generate several targets. Such a situation is described by Microsoft Visual Studio as a “solution” referencing several “projects”, one per target.
There are many reasons for generating several targets (programs or libraries) from a single source code base. Many are wrong, but some are right.
Whatever the reason, such a project needs to be organized. I think that it is better to put all the source files in the same directory, and possibly all the target files in the same directory too. Of course every file should have a distinct name, but that’s not so bad.
Sometimes there is a need to recreate the development environment of a previous revision of the software, for various reasons. And sometimes several different revisions of the same software should coexist on the same PC for years. How to organize them?
Of course different branches share neither source code nor generated code, but they give the same name to different files; therefore they should reside in different subtrees, while sharing a revision control repository. It is reasonable to put them in parallel subtrees.
As a conclusion, here is a typical directory organization for a project.
<project name>
|
+-- trunk
|   |
|   +-- src
|   |   |
|   |   +-- u_test
|   |   |
|   |   +-- sys_test
|   |
|   +-- gen
|       |
|       +-- x86-32bit-debug
|       |
|       +-- x86-32bit-release
|       |
|       +-- x86-64bit-debug
|       |
|       +-- x86-64bit-release
|       |
|       +-- doc
+-- rev238
    ...
<in-house library name>
    ...
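For anyone who wants to try this organization, the following Python sketch creates the empty skeleton of one branch. The names `BRANCH_DIRS` and `create_branch_layout` are hypothetical; the directory names are those of the tree above, and nothing here is tied to a particular build or revision control tool.

```python
from pathlib import Path

# Directories of one branch, relative to the branch root.
BRANCH_DIRS = [
    "src/u_test",
    "src/sys_test",
    "gen/x86-32bit-debug",
    "gen/x86-32bit-release",
    "gen/x86-64bit-debug",
    "gen/x86-64bit-release",
    "gen/doc",
]


def create_branch_layout(project_root, branch="trunk"):
    """Create the directory skeleton of one branch under project_root.

    Existing directories are left untouched. Returns the list of leaf
    directories that make up the layout.
    """
    branch_root = Path(project_root) / branch
    created = []
    for relative in BRANCH_DIRS:
        directory = branch_root / relative
        directory.mkdir(parents=True, exist_ok=True)
        created.append(directory)
    return created
```

Calling it again with `branch="rev238"` would create a parallel subtree for that revision, next to “trunk”.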