Announcement of cpp-mmf C++ open-source library

In early August I published a stable version of the following project on GitHub:

https://github.com/carlomilanesi/cpp-mmf/

To install it, click the “Download ZIP” button at the right of its GitHub page, and read the file “manual.html”.

As explained in the Readme page, it is a C++98 (or later) library that encapsulates access to memory-mapped files on every POSIX-compliant operating system (including Unix, FreeBSD, Linux, and Mac OS X) and on Microsoft Windows.

For those who don’t know, memory-mapped files are very useful for optimizing random access to binary files, and very convenient because they let you manipulate file contents as if they were byte arrays. The library allows both read-only and read-write file access.
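
For readers who have never used memory-mapped files, here is a minimal sketch of the raw POSIX calls that a wrapper library like this one hides (this is not the cpp-mmf API, only an illustration of the underlying mechanism; the file name "data.bin" is a placeholder):

#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <cstdio>

int main()
{
    // "data.bin" is just a placeholder name for this illustration.
    int fd = open("data.bin", O_RDONLY);
    if (fd == -1) return 1;
    struct stat sb;
    if (fstat(fd, &sb) == -1) { close(fd); return 1; }

    // Map the whole file into the address space, for reading only.
    void* addr = mmap(0, sb.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (addr == MAP_FAILED) { close(fd); return 1; }
    const char* bytes = static_cast<const char*>(addr);

    // Random access: the file contents behave like an in-memory byte array.
    if (sb.st_size > 100)
        std::printf("Byte 100 is %d\n", bytes[100]);

    munmap(addr, sb.st_size);
    close(fd);
    return 0;
}

The library encapsulates this boilerplate, and the corresponding Microsoft Windows calls, behind a portable interface.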

Some people also use Microsoft Windows memory-mapped files to implement shared memory between processes. The cpp-mmf library is not meant for that. If you need a platform-independent shared-memory C++ library, use another library, like Qt or Boost.

Speaking of Boost, that library collection already contains a platform-independent library for accessing memory-mapped files, but I think it has at least the following shortcomings:

  • It cannot be installed separately, forcing you to install all of Boost (around half a gigabyte when expanded).
  • It is very slow to compile, because it includes a lot of header files. For example, recompiling a small Linux program that uses cpp-mmf takes 0.5 seconds, while recompiling an equivalent Linux program that uses Boost takes 2.5 seconds, that is, five times as long.
  • It generates large object files and large executable files. For example, the above cpp-mmf program, when stripped of debug symbols, is 9 KB, while the corresponding Boost program is 28 KB, that is, about three times as large.
  • It has no tutorial.

My cpp-mmf library is stable, meaning that I now have no intention of changing it. However, I have tested it only on 64-bit Windows 7 (with the Visual C++ and GCC compilers) and on 32-bit Linux Mint (with the Clang and GCC compilers), and only with small files, i.e. from a few bytes to a few megabytes.

If someone points out a bug, a missing important feature, or an unsupported important platform, I may change it to satisfy such requests.

Comments and suggestions are welcome, preferably from people who actually use or want to use this library.


Announcement of Cpp-Measures C++ library

Last September I published the following project on GitHub:

https://github.com/carlomilanesi/cpp-measures/

As explained in the Readme page, it is a C++11 library that encapsulates numbers in objects tagged by their unit of measurement. It takes a novel approach to this well-known problem, and it also adds support for handling 2D and 3D vector physical quantities.
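
To give an idea of the general approach (the sketch below is not the actual cpp-measures API; the unit tags and the class name are invented for illustration), a unit of measurement can be encoded as a type tag, so that mixing incompatible units becomes a compile-time error:

#include <cstdio>

// Invented unit tags and class; this is NOT the cpp-measures API.
struct metres {};
struct seconds {};

template <class Unit>
class measure
{
public:
    explicit measure(double v): value_(v) { }
    double value() const { return value_; }
    // Only measures with the same unit tag can be added together.
    measure operator+(measure other) const
    { return measure(value_ + other.value_); }
private:
    double value_;
};

int main()
{
    measure<metres> length(3.5);
    measure<metres> width(1.5);
    measure<seconds> duration(2.0);
    measure<metres> total = length + width;        // OK: same unit.
    // measure<metres> wrong = length + duration;  // Rejected at compile time.
    std::printf("total = %f metres\n", total.value());
    (void)duration;
    return 0;
}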

It is still in development, but it has reached sufficient maturity for production use.

Comments and suggestions are welcome, preferably from people who actually use or want to use this library.


Grammar style for code comments

There are several commenting styles in use.

Some programmers use a terse style like:

// Get value

Others write sentences in plain English, like:

// This should get the current measured value.

When using natural language, some programmers use the imperative form, like:

// Get the value.

while others use the indicative form, like:

// It gets the value.

sometimes shortened to:

// Gets the value.

Of course, it is better to use a uniform style; and if several programmers work on the same code, a rule should be written to guide them toward such a uniform style.

I propose (and, when possible, use) the following rule:

Inside the body of a function, always use the imperative form; outside use the indicative form.

The rationale is the following.

The body of a function is the implementation of an algorithm, and therefore it is procedural in nature, or, as it is often said, “imperative”. It is imperative because its statements are meant as orders that the programmer sends to the computer. Of course, here I am talking only about imperative languages.

Comments inside function bodies are usually pseudo-code, i.e. higher-level code that is typically written before the code itself as a specification, and then implemented by writing real code. If the code is quite self-explanatory, there is no need for comments; comments are explanations of somewhat cumbersome implementations. Such comments are a kind of natural-language imperative programming, and therefore they should use the imperative form.

For example, here is a good comment:

// Check input until you reach end-of-file.
while ((ch = getchar()) != EOF) ;

and here is a bad comment for the same code:

// It checks input until end-of-file is encountered.

The comments written just before function bodies are meant to explain the purpose of the function, not the implementation. They say what the function does, not how.

They are more a kind of declarative specification than a kind of imperative programming, even if the function is quite imperative and the function name is an imperative verb. These comments are not natural-language imperative programming; they are natural-language specification. Therefore, an indicative verb is more appropriate.

Here is a good example of such a comment:

/// It returns the square root of "x", if "x" is non-negative.
/// Undefined otherwise.
double sqrt(double x);

Adopt a company library to strike a balance between flexibility and simplicity

When choosing a company programming standard, there are always competing choices, with different assets and liabilities.

The main dilemma may be described as “flexibility vs simplicity”.

A general-purpose programming language is more flexible, while a special-purpose programming language, like the so-called 4GLs, is simpler.

A large library or framework is more flexible, while a simple one is … simpler.

An extensible library or framework is more flexible, while a non-extensible one is usually simpler, and it is surely simpler than a library that has been radically extended by a client.

Let’s see an example. One user-interface component displays text in a font that cannot be changed by the client. Another one uses a default font, and allows clients to set the font, but only at design-time. A third one uses a default font, and allows clients to set the font both at design-time and at run-time.

Most applications need to use only a handful of carefully chosen fonts, and sometimes to switch between them at run-time. If the library forces a single font, or forces the font to be chosen at design-time, it may be too rigid. But if it allows choosing among all the installed fonts, it may be unnecessarily complex, and it may run into several problems, like the absence of the chosen font on the user’s computer, or the illegibility of the resulting text.

I think that the best solution is the following.

A very powerful library is adopted, allowing all needed features, even at the cost of being unwieldy and complex. In the extreme case, it is the API of the operating system and graphical environment.

A specific-purpose library is developed within the company. Such a library uses the adopted powerful library, and exposes an API to a category of application software. It is developed by system programmers, that is, programmers who know the powerful library well, but know little of the application domain.

All application programmers are forbidden to use any library except the in-house developed specific-purpose library.

Whenever an application requirement cannot be satisfied by the current specific-purpose library, either that requirement is dropped, or the library is extended to satisfy it; in any case, the application code is never allowed to access the general-purpose library directly.

For example, the specific-purpose library declares a handful of fonts, among which the application code must choose the desired font. If application code really requires another font, the library is extended by adding that font to its set.
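
Here is a hypothetical sketch of that architecture (all the names are invented; gui_toolkit stands for whatever powerful library has been adopted): the specific-purpose layer exposes only an enumeration of approved fonts, and translates it into the toolkit’s settings.

#include <string>
#include <cstdio>

// Stand-in for the adopted general-purpose library (invented API,
// for illustration only).
namespace gui_toolkit
{
    class Label
    {
    public:
        explicit Label(const char* text): text_(text) { }
        // The powerful library accepts any font family, size and weight.
        void set_font(const char* family, int size, bool bold)
        {
            std::printf("%s: %s %d %s\n", text_.c_str(), family, size,
                bold ? "bold" : "regular");
        }
    private:
        std::string text_;
    };
}

// The in-house specific-purpose library: the only layer that application
// programmers are allowed to use.
namespace company_ui
{
    // The handful of fonts among which application code must choose.
    enum Font { body_font, heading_font, monospace_font };

    class Label
    {
    public:
        explicit Label(const char* text, Font font = body_font): impl_(text)
        {
            // Translate the company font into the toolkit's own settings.
            switch (font)
            {
            case heading_font:   impl_.set_font("Sans", 16, true);  break;
            case monospace_font: impl_.set_font("Mono", 10, false); break;
            default:             impl_.set_font("Sans", 10, false); break;
            }
        }
    private:
        gui_toolkit::Label impl_; // the powerful library is never exposed
    };
}

int main()
{
    company_ui::Label title("Report", company_ui::heading_font);
    company_ui::Label note("Footnote"); // gets the default body font
    return 0;
}

Application code can only choose among the declared fonts; if another font is really needed, only the company_ui layer is extended.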

With this architecture, there is no absolute limit to what applications may do, but application code remains simple and consistent, and typically the resulting user interface also comes out quite consistent.


Using components effectively

For the last couple of decades, major software gurus have been predicting that the future of software development is component-based software, as an evolution of object-oriented software.

By “component-based”, I mean using third-party object-oriented libraries by instantiating objects whose classes are declared in those libraries, without necessarily inheriting from those classes. Inheritance is seen as an advanced option to extend the features of an already rather complete object.

This kind of component-based software has been rather successful for some styles of software: for example, software built using RAD tools (like Visual Basic or Delphi), and software using the Microsoft COM model.

Large software systems, however, are often not based on components, because component-based implementations have several liabilities. Let’s examine them.

The most notorious one is nicknamed “DLL hell”: the dependency problem that arises when there are several executable modules and not all of them are updated at the same time.

Another problem is the “monster class” problem. If a single class must provide every feature that a client may need, it must include a huge number of features. For example, the Microsoft .NET Framework contains the Windows.Forms subsystem, a (large) GUI library, which contains the DataGridView class, a widget to display and edit a scrollable table of items that may be bound to an external collection or to a database.

This class has 1 public constructor, 153 public properties, 20 protected properties, 87 public methods, 287 protected methods, and 187 public events. In all, there are 428 public members and 307 protected members. The nice thing about object-oriented programming is that client-code programmers can ignore the implementation of members; they just need to know their interface. But learning an interface composed of 428 members is a huge job! And that is just for those who want to instantiate the class. For those who want to inherit from it, the members to know become 735.

Actually, most users of that class don’t know all the members, but stick to the ones they really need. That, however, amounts to bad programming, as you waste time finding the member you need, and often choose the wrong one.

I think that, to be really usable, a class should not have more than a few dozen members. The Windows.Forms DataGridView class exceeds that limit by a factor of ten.

Anyway, every one of those hundreds of members has been added for a purpose, and for every member some programmer may need just that member, so it is difficult to remove unneeded ones.

A better solution is the following.

Most programming shops (software houses or software development departments) use a single library component several times. Maybe they build a large application with several instances of that component, or they build several similar small applications, each of them using that component one or more times.

Typically, all the instances of a single component, as used by a single programming shop, use (or should use) no more than a few dozen of its members.

Even more typically, all the instances of a single component, as used by a single programming shop, set (or should set) many properties to the same value. For example, a user-interface component has a Font property with a default value; in that programming shop, the component is always (or should always be) used with a different font.

Another common case is that a property of a component, as used by a single programming shop, has in 90% of cases the same value, different from the default one, and in the other 10% of cases it has other values. By simply using the library component, programmers are forced to specify that value for every instance.

Here is a solution for all these cases.

A new custom component is built by the programming shop. That component is just a simplified and specialized version of the library component. It is simplified in that it publishes only a small fraction of the members published by the original class. It is specialized in that it has default values different from those of the original class.

Of course, to ease the implementation, this new class contains the old class, and therefore it is just a façade over it. It shouldn’t use public inheritance, as its instances shouldn’t be seen by clients as instances of the original class.
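
Here is a hypothetical sketch of such a façade (the names are invented, and big_lib::DataGrid is just a stand-in for a large third-party component): the new class contains the library component, publishes only the few members the shop needs, and applies the shop’s defaults in its constructor.

#include <string>

// Stand-in for a large third-party grid component with hundreds of members
// (invented API, for illustration only).
namespace big_lib
{
    class DataGrid
    {
    public:
        DataGrid(): font_("Serif"), row_height_(18), read_only_(false) { }
        void set_font(const std::string& f) { font_ = f; }
        void set_row_height(int h) { row_height_ = h; }
        void set_read_only(bool b) { read_only_ = b; }
        // ... and hundreds of other properties, methods and events ...
    private:
        std::string font_;
        int row_height_;
        bool read_only_;
    };
}

// The shop's simplified and specialized component: it contains the library
// component (no public inheritance), publishes only a few members, and
// applies the shop's own defaults.
class CompanyGrid
{
public:
    CompanyGrid()
    {
        impl_.set_font("Sans");    // shop-wide default, not the library's
        impl_.set_row_height(24);
    }
    // One of the few published members.
    void set_read_only(bool b) { impl_.set_read_only(b); }
private:
    big_lib::DataGrid impl_;
};

int main()
{
    CompanyGrid grid;        // already configured with the shop's defaults
    grid.set_read_only(true);
    return 0;
}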

Specifying proper default values has two advantages:

* It is easier to develop a consistent look-and-feel.

* Less application code (or manual property setting) is necessary.

Having fewer members has two advantages:

* For programmers, it is easier to learn how to use that class in their code and to understand the code of other programmers who use that class.

* For programmers, it is easier to write correct programs, as fewer properties may have wrong values.

The simplification may be a restructuring of the members, not just an elimination. For example, instead of simply removing the BackColor and ForeColor properties, an enumeration of color styles may be substituted for them.

The simplification may also be an aggregation of widgets. For example, instead of using a Label and an EditBox separately, a single EditField may combine both features.

I think that many people would think: “But there are already zillions of beautiful custom components!”

What I really think is that no given set of components can satisfy all needs, and therefore every programming shop should implement its own set of components.

How should such a component be designed?

First, a toy application (or set of applications) containing all use cases for the needed component should be built. Then, a component having only the members needed by that toy application should be built and used by that application. Finally, the component should be used in real projects.

When a shortcoming of the component is encountered, it should be carefully determined which of the following cases applies:

  • The application is not really standard conforming, and therefore it should be changed, without changing the component.
  • The component is not powerful enough to support the standard guidelines, and therefore it should be fixed or extended. If a component feature already used by some installed applications is to be changed, a new incompatible version of the component may be generated, and therefore a migration guide or a migration tool is required.
  • The standard guidelines are not flexible enough to support the application requirements. A change in the guidelines may make existing applications or components non-conforming; in such cases, the previous two cases then apply.

Using this technique, the components begin as really minimal, and grow as applications require more features, hopefully remaining much simpler than the monster components contained in libraries created by other organizations.


When to use C and when C++?

The C++ programming language was created to supplant the C programming language. It has been quite a successful language, but C is still used in many projects. When is it better to stick to C, and when has the time come to migrate to C++?

First of all, the most important aspects in choosing any programming language are the developers’ knowledge of it and the existence of a code base to maintain.

Secondary factors are the availability of good tools and good libraries for the language, and a large and active community of developers.

But given that a completely new big project, or a set of completely new projects, is about to start, that the programming team already has some experience with both C and C++, and that high-quality tools are available for the target platform, which language should be used?

I think that the size of the program to build should be used as the discriminating factor, and the size of the program should be measured by its addressable memory, using the following rule.

If the executable binary software to construct will need to address no more than 64 KB of code and no more than 64 KB of data (static + stack + heap), then use the C language; otherwise use C++.

For old MS-DOS programmers, this requirement amounts to: if you are going to use the “small” memory model, using only “near” pointers, use the C language; otherwise use C++.

Fifteen years ago, perhaps the threshold could have been bigger (say, 1 MB for code + data), because at that time C++ had some drawbacks that have since been overcome.

The rationale for such a threshold is twofold:

  • If a program must be very strict in its memory use, then with C it is easier to manage resource consumption, while with C++ the design abstractions may let some resource waste slip in.
  • If a program must be small both in written code and in used libraries, the design abstractions and the large libraries of C++ are of little use.

But there is another important rule:

If less than 20% of the developers in the team have read at least 1500 pages about C++, forget about using it.

This amounts to saying: “A successful C++ team should have at least one knowledgeable C++ developer for every five developers, and to become a knowledgeable C++ developer a person should read at least 1500 pages of books in all.” If this requirement is not fulfilled, study more, or hire C++ experts, or stick to the C language. Of course, all the developers on a C++ team must know some C++, but it is enough that some of them are experts, to design the architecture and solve problems. The others will begin with routine programming, and will become experts with time.

This is because C++ is difficult to use correctly (i.e. to make working software), and even more difficult to use well (i.e. to make efficient and maintainable software).


RAD ups and downs

Since around 1995, the acronym RAD has been used for “Rapid Application Development”.

It denotes a development tool, or an entire development environment, that allows a programmer to quickly create and incrementally change an application whose requirements fall within the application field of the tool or environment itself.

The most famous RAD environments are Microsoft Visual Basic and Borland Delphi, but there are many more.

There are also RAD tools that may be integrated into a non-RAD environment, like Glade.

As no developer wants to take more time than necessary to build a given piece of software, and as not every kind of software development is done using RAD techniques, obviously such techniques have limitations or disadvantages that prevent their use for developing some kinds of applications.

Let’s first examine the limitations of RAD techniques.

RAD techniques are component-based. Using a RAD tool means placing in a window the icons representing the needed components.

All RAD tools are component-based, but not all component-based software tools are RAD tools.

In addition to being component-based, RAD tools add “direct manipulation”, that is, such tools allow the user to manage the components using a graphical application, seeing immediately the effects of his/her changes.

There are advantages and disadvantages both with component-based software development tools and with direct-manipulation software development tools. Let’s see them separately.

There are essentially two kinds of RAD techniques: form designers and database accessors. These techniques are combined in data-bound controls, but may be used independently.

If the greater part of an application is not about dialog boxes or database access, RAD techniques are of little use. For example, no one uses a RAD tool to build a real-time application, a device driver, a system administration script, a computer game, or a compiler, as RAD techniques are not useful for such kinds of software. For some other applications they may be of some use, because those applications have some forms, or make some database accesses, but a large part of the application presents other challenges.

It may also be the case that RAD components are not flexible enough: they may impose a look-and-feel different from the one desired by the developer, or an inefficient database-access technique.

Of course, one may search for a better alternative, but it may be hard to find the optimal component anywhere; and one may develop his/her own components, but that may be hard to do, as it requires advanced programming techniques.

Component programming has progressed much over the years.

First-generation RAD tools had no custom-component support.

Second-generation RAD tools allowed constructing custom components using a lower-level language: for example, Visual Basic VBX controls had to be written in C.

Third-generation RAD tools allowed constructing custom components using the same language used by application programmers, but the generated libraries had to be installed separately and ran up against versioning problems: for example, Visual Basic OCX controls could be written in Visual Basic itself, but had to be installed and registered on the user’s computer.

Fourth-generation RAD tools allow constructing custom components using the same language used by application programmers, and incorporate them inside the application (using a linker), needing no separate installation: for example, Delphi VCL components or .NET UserControls.

Another trend is that, while first-generation components could be created only at design-time, latest-generation components can optionally be created also at run-time, using specific code statements.

Except for maintaining legacy software, there is now no reason to use previous-generation tools.

The main issue with component-based software is that it is hard for a component vendor to provide all the features requested by all users of the component (i.e. application programmers), and no more. The situation is analogous to that of any software library vendor, but with some differences.

A software library vendor may provide a huge library to support a great many use cases. The size of such a huge library may slow down the compilation process somewhat, but the linker includes in the generated executable only the routines and data structures actually used. Therefore, using a small part of a huge library does not damage the performance of the resulting application.

A component, instead, first of all shows all its members in the property page, and if such lists (properties, methods, events) are very long, they may be daunting to navigate. In addition, a component is added entirely to the generated executable; if only a small portion of it is needed, that may disproportionately increase the data size, the code size, and the time needed to initialize and update all the data. Admittedly, components are typically used for large-grained features, which in any case use several tens of kilobytes and take several milliseconds to process. But if there is a need for a feature using less than ten kilobytes, or whose update time is less than one millisecond, a RAD component may be inappropriate.

Regarding direct manipulation, this paradigm is quite similar to that of WYSIWYG word processors. For simple tasks, this kind of word processor is much easier to use than tagged text, like troff, TeX, or SGML, and indeed such processors experienced a boom in usage not only among consumers and isolated clerks, but also in large organizations.

The main issues with WYSIWYG word processors are:

  • A non-standard and often obscure file format, which may be processed only by using the word-processing application itself.
  • A non-obvious state of the word processor and of the document, which often generates unexpected results (for example, it is not easy for a user to distinguish between a hard line break and a word wrap).
  • Possibly inconsistent formatting, if direct formatting is applied instead of style sheets.

Those issues are replicated in direct-manipulation RAD tools as:

  • Non-standard and often obscure settings, which may be processed only by using the RAD tool itself.
  • A non-obvious state of the RAD tool and of the application source, which often generates unexpected results (for example, it is not easy for a developer to understand the order in which the components will be processed).
  • Possibly inconsistent settings, if settings are applied directly to every component instead of through a routine.

In addition, direct-manipulation RAD tools have the following issues:

  • Many components do not provide internationalization support, because the developer has to write a specific text of a specific length.
  • Many components do not provide automatic layout facilities: the developer has to place components at specific positions on the form, and a layout change requires a long and cumbersome series of manual re-alignments.
  • Many RAD tools generate code that calls, or is called by, user code. Some tools are two-way, meaning that they immediately interpret changes the developer makes to the generated code and update the component properties accordingly, while one-way tools forbid or ignore such changes. Both kinds of tools have issues.
  • In the case of one-way tools, the code generated by the RAD tool is not really source code, and so it should be kept among intermediate files; but it has to be processed by the compiler like source code, and so it should be kept among source files.
  • In the case of two-way tools, the code generated by the RAD tool is not really source code, but it may be intermixed with code written by the developer, which is source code, and so it must be handled partly as generated code and partly as source code.
  • Code generated by RAD tools may not respect coding standards, and may be hard for developers to understand when debugging.
  • Code generated by RAD tools may be less efficient than equivalent code written manually by a developer.
  • Many RAD tools favor mixing user-interface logic with application logic and with data-access logic. The resulting code is simpler in simple cases, but it is more tightly coupled, and therefore less easily maintainable, as every change in requirements may force major changes all over the application.
  • Many RAD tools have smart properties that may be used like simple variables, but actually access the user-interface toolkit or a database when read or written. Using such properties in loops may be inefficient and hard to test and debug.
  • Many RAD tools have smart properties that generate events when changed, and an arbitrary routine may be attached to such events. For example, when a statement changes the value of a check-box widget, that change may automatically call the same routine that is called when the user checks or unchecks that widget. Such behavior makes the code inefficient and hard to test and debug, and sometimes it may cause infinite recursion (see the sketch below).

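As a hypothetical sketch of this last issue (not taken from any particular RAD framework), consider a check-box component whose property setter fires the same change handler that a user click would fire:

#include <cstdio>

// Minimal stand-in for a RAD-style widget whose property setter fires the
// same event handler that a user click would fire (invented, not taken from
// any particular framework).
class CheckBox
{
public:
    CheckBox(): checked_(false), on_changed_(0) { }
    void set_on_changed(void (*handler)(CheckBox&)) { on_changed_ = handler; }
    bool checked() const { return checked_; }
    void set_checked(bool value)
    {
        if (checked_ == value) return; // guard against redundant changes;
                                       // without it, a handler that sets the
                                       // property again would recurse forever
        checked_ = value;
        if (on_changed_) on_changed_(*this); // fired for programmatic changes too
    }
private:
    bool checked_;
    void (*on_changed_)(CheckBox&);
};

void on_option_changed(CheckBox& box)
{
    // This routine runs both when the user clicks the widget and when any
    // statement in the program assigns the property.
    std::printf("option is now %s\n", box.checked() ? "on" : "off");
}

int main()
{
    CheckBox option;
    option.set_on_changed(on_option_changed);
    option.set_checked(true); // a plain-looking assignment triggers UI logic
    return 0;
}
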
In conclusion, when is it better to use a RAD tool, and when not?

It is important to notice that a project may have subprojects and that a RAD tool may be used for some subprojects and not for others.

The choice to use a RAD tool for a given (sub)project depends mainly on the following conditions:

  • The size of the (sub)project in man-months: the smaller the (sub)project, the better it is to use a RAD tool.
  • The probability that the resulting software will actually be used for a long time: the likelier it is that the (sub)project will have a short life, the better it is to use a RAD tool.
  • Whether the (sub)project mainly involves data entry or database interfacing: the bigger the portion of development dedicated to designing data-entry forms or to database interfacing, the better it is to use a RAD tool.

In short, use a RAD tool for a small (sub)project that probably will be used only for a short time (if at all) and that has many forms with many fields; but don’t use a RAD tool for a large (sub)project that probably will be used for a long time and that has many features unrelated to forms or databases.

For those who cannot think of software with a short life expectancy, here are some examples of projects that have an expected usage life of less than 6 months:

  • Software to organize specific political or social events.
  • Prototypes of larger applications.
  • Experimental software (if successful, it may be used for a long time, but it will probably be unsuccessful).
  • Software to migrate data from a legacy information system to a newly adopted information system.

Of course, it is hard to decide when a project is of average size, or will probably be used for an average time, or has some forms but also some other features.

And it is also hard in the case of a project that is large but is only a collection of forms, and so on.

As a rule of thumb, I would suggest the following rules, where by “size of the project” I mean the effort needed to deliver the first version to end users.

  • If a (sub)project is larger than 12 man-months and has some data-entry-intensive parts, split it into a data-entry-intensive subproject and a computation-intensive subproject, if possible. This makes it possible to use the best tool for each job.
  • For a (sub)project that requires less than 10 man-months, if it is data-entry-intensive or if it has an expected usage life of less than 6 months, use a RAD tool; otherwise, don’t.
  • For a (sub)project that requires more than 10 man-months, don’t use a RAD tool.