My desired features for a Rust debugger

Backward stepping

Most debuggers provide the following commands:

  • Step into the next function (reaching its beginning-point)
  • Hop over the next function call
  • Jump to the end of the current function
  • Run (until next break-point or end of process)

But the commands that are most needed are rarely provided. They are:

  • Back-step into the previous function (reaching its end-point)
  • Back-hop over the previous function call
  • Back-jump to the beginning of the current function
  • Back-run (until previous break-point or beginning of process)

The key combinations for such commands must be customizable, as many people are already used to specific key combinations for debugging.

Stop-points are not lines

Some debuggers treat any source-code line as a possible point where the program can stop (a stop-point). That is not true, for three reasons:

  • Many lines do not contain executable code, and so the program cannot stop at such lines.
  • Many statements span several lines, even if they contain no function calls. There is no point in stepping through such a line, so the whole statement should be a single stop-point.
  • Many statements contain function calls, sometimes disguised as operators. Such statements contain as many stop-points as the contained function calls, plus one. Consider this statement:
     let a = f(g(3 + a), h(k()));

    It contains 4 function calls, and so it contains 5 stop-points:
    – After having entered the statement, before calling “g”.
    – After having called “g”, before calling “k”.
    – After having called “k”, before calling “h”.
    – After having called “h”, before calling “f”.
    – After having called “f”, before assigning a value to “a”.

So, when a program is stopped, there shouldn’t be a current line but a current stop-point, that is, a slice of source code that usually does not begin at the beginning of a line, does not end at the end of a line, and may span several lines. Here are the 5 stop-points of the previous example statement:

let a = f(g(3 + a), h(k()));
          ^^^^^^^^
let a = f(g(3 + a), h(k()));
                      ^^^
let a = f(g(3 + a), h(k()));
                    ^^^^^^
let a = f(g(3 + a), h(k()));
        ^^^^^^^^^^^^^^^^^^^
let a = f(g(3 + a), h(k()));
^^^^^^^^^^^^^^^^^^^^^^^^^^^

To show the current stop-point, it should be highlighted in a different color, and in addition a marker should be shown at the beginning of its first line, because the current stop-point may be off-screen.
Break-points are particular stop-points, selected to stop the execution of the program. They, too, should be marked both as source-code slices, and as beginning-of-line markers.

Smart watches

When the program is stopped, it should be possible to view the list of all live variables and function arguments (even the shadowed ones), in reverse order of appearance (and therefore in order of future destruction).
For each variable, its source-code name, its type, and its stack contents should be shown automatically. When a variable is selected, its “{:?}” contents should be shown in a scrollable window.
Such a variable list should have a horizontal line where each stack frame begins, marked with the name of the function that begins there.
Each variable should have a “watch” check-box. If that check-box is checked, the contents of the variable are shown even when the variable is not selected. The contents are updated whenever the program stops, and the whole watch line should be colored differently if the value has changed with respect to the previous stop.
Each variable should also have a “stop when it changes” check-box. If that check-box is checked, the program stops whenever the value of the variable changes (at least for changes performed by safe statements).

Call stack

When the program is stopped, a list of the current function calls should be shown. If a function call is selected in this list, the source-code view shows the source code of that function, and the watches view scrolls to the variables of that stack frame.

Post-mortem

If a program is executed in debugging mode, there should be a debug dump whenever a panic occurs. Such a dump should contain enough information to perform all the debugging operations that would be possible if the program were run in the debugger and stopped at the statement causing the panic.

Changing data

It can be useful to be able to change values in memory. This should be easy to implement, as long as no allocation or deallocation is performed. For example, it should be easy to allow changing the value of a numeric variable, while it can be hard to allow extending the value of a string variable.

Moving around

It can be useful to be able to move the current stop-point within the current function, i.e. to skip or to re-run some statements.

Conclusion

Summarizing, I think GDB is a very bad debugging tool, while the Smalltalk debugger and the Elm debugger are very good. For Rust, I wish for a debugger that has the best features of Smalltalk and Elm.

Look here to see a Smalltalk debugger in action: https://www.youtube.com/watch?v=1kuoS796vNw.

Look here to see an Elm debugger in action: https://www.youtube.com/watch?v=RUeLd7T7Xi4.

Posted in Uncategorized | 2 Comments

Announcement of cpp-mmf C++ open-source library

In the first days of August I published a stable version of the following project on GitHub:

https://github.com/carlomilanesi/cpp-mmf/

To install it, press the “Download ZIP” button at the right of its GitHub page, and read the file “manual.html”.

As explained in the Readme page, it is a C++98 (or later) library to encapsulate access to memory-mapped files in every POSIX-compliant operating system (including Unix, FreeBSD, Linux, and Mac OS X) and in Microsoft Windows.

For those who don’t know, memory-mapped files are very useful to optimize random access to binary files, and are very convenient because they let you manipulate their contents as if they were byte arrays. The library allows both read-only and read-write file access.

Some people use Microsoft Windows memory-mapped files also for implementing shared memory between processes. The cpp-mmf library is not meant to do that. If you need a platform-independent shared-memory C++ library, use another library, like Qt or Boost.

Speaking of Boost, that library collection already contains a platform-independent library for accessing memory-mapped files, but I think it has at least the following shortcomings:

  • It cannot be installed separately, forcing you to install all of Boost (around half a gigabyte when expanded).
  • It is very slow to compile, because it includes a lot of header files. For example, the recompilation of a small Linux program that uses cpp-mmf takes 0.5 seconds, while the recompilation of an equivalent Linux program that uses Boost takes 2.5 seconds, that is, five times as long.
  • It generates large object files and large executable files. For example, the above program, when stripped of debug symbols, is 9 KB long, while the corresponding Boost program is 28 KB long, that is, three times as large.
  • It has no tutorial.

My cpp-mmf library is stable, meaning that I now have no intention of changing it. However, I have tested it only with 64-bit Windows 7 (with the Visual C++ and GCC compilers) and with 32-bit Linux Mint (with the Clang and GCC compilers), and only with small files, i.e. from a few bytes to a few megabytes.

If someone points out a bug, an important missing feature, or an important unsupported platform, I may change it to satisfy such requests.

Comments and suggestions are welcome, preferably from people who actually use or want to use this library.


Announcement of Cpp-Measures C++ library

Last September I published the following project on GitHub:

https://github.com/carlomilanesi/cpp-measures/

As explained in the Readme page, it is a C++11 library to encapsulate numbers in objects tagged by their unit of measurement. It uses a novel approach to this well-known problem, and it also adds support for handling 2D and 3D vector physical magnitudes.

It is still in development, but it has reached a maturity sufficient for production use.

Comments and suggestions are welcome, preferably from people who actually use or want to use this library.


Grammar style for code comments

There are several commenting styles in use.

Some programmers use a terse style like:

// Get value

Others write sentences in plain English language, like:

// This should get the current measured value.

When using natural language, some programmers use imperative form, like:

// Get the value.

while others use the indicative form, like:

// It gets the value.

sometimes shortened as:

// Gets the value.

Of course, it is better to use a uniform style; and if several programmers access the same code, a rule should be written to guide programmers toward such a uniform style.

I propose (and, when possible, use) the following rule:

Inside the body of a function, always use the imperative form; outside use the indicative form.

The rationale is the following.

The body of a function is the implementation of an algorithm, and therefore it is procedural in nature or, as is often said, “imperative”. It is imperative because its statements are meant as orders that the programmer sends to the computer. Of course, here I am talking only about imperative languages.

Comments inside function bodies are usually pseudo-code, i.e. higher-level code that is typically written before the code itself as a specification; the comment is then implemented by writing real code. If the code is quite self-explanatory, there is no need for comments. Comments are explanations for somewhat cumbersome implementations. These comments are a kind of natural-language imperative programming, and therefore they should use the imperative form.

For example, here is a good comment:

// Check input until you reach end-of-file.
while ((ch = getchar()) != EOF) ;

and here is a bad comment for the same code:

// It checks input until end-of-file is encountered.

The comments written just before function bodies are meant to explain the purpose of the function, not its implementation. They say what the function does, not how.

They are more a kind of declarative specification than a kind of imperative programming, even if the function is quite imperative and the function name is an imperative verb. These comments are not natural language imperative programming, they are natural language specification. Therefore an indicative verb is more appropriate.

Here is a good example of such a comment:

/// It returns the square root of "x", if "x" is non-negative.
/// Undefined otherwise.
double sqrt(double x);
double sqrt(double x);

Adopt a company library to strike a balance between flexibility and simplicity

When choosing a company programming standard, there are always competing choices, with different assets and liabilities.

The main dilemma may be described as “flexibility vs simplicity”.

A general-purpose programming language is more flexible, while a specific-purpose programming language, like the so-called 4GLs, is simpler.

A large library or framework is more flexible, while a simple one is … simpler.

An extensible library or framework is more flexible, while a non-extensible one is usually simpler, and it is surely simpler than a library that has been radically extended by a client.

Let’s see an example. One user-interface component displays texts in a font that cannot be changed by the client. Another one uses a default font and allows clients to set the font, but only at design-time. A third one uses a default font and allows clients to set the font both at design-time and at run-time.

Most applications need to use only a handful of carefully chosen fonts, and sometimes to switch between them at run-time. If the library forces a single font, or forces choosing the font at design-time, it may be too rigid. But if it allows choosing among all the installed fonts, it may be unnecessarily complex, and it may run into several problems, like the absence of the chosen font on the user’s computer, or the illegibility of the resulting text.

I think that the best solution is the following.

A very powerful library is adopted, allowing all needed features, even at the cost of being unwieldy and complex. At the extreme, it is the API of the operating system and of the graphical environment.

A specific-purpose library is developed in the company. Such a library uses the adopted powerful library, and exposes an API to a category of application software. It is developed by system programmers, that is, programmers who know the powerful library well but know little of the application domain.

All application programmers are forbidden to use any library except the in-house developed specific-purpose library.

Whenever an application requirement cannot be satisfied by the current specific-purpose library, either that requirement is removed, or the library is extended to satisfy it; in any case, the application code is never allowed to access the general-purpose library directly.

For example, the specific-purpose library declares a handful of fonts, among which the application code must choose the desired font. If another font is really required by the application code, the library is extended by adding that font to its set of fonts.

Using this architecture, there is no absolute limit to what the applications may do, but the application code remains simple and consistent, and typically the resulting user interface also comes out quite consistent.


Using components effectively

For the last couple of decades, major software gurus have been predicting that the future of software development is component-based software, as an evolution of object-oriented software.
By “component-based”, I mean using third-party object-oriented libraries by instantiating objects whose classes are declared in those libraries, without necessarily inheriting from those classes. Inheritance is seen as an advanced option to extend the features of an already rather complete object.
This kind of component-based software has had rather good success for some styles of software, for example software built using RAD tools (like Visual Basic or Delphi), and software using the Microsoft COM model.
Large software systems are often not based on components, because many component-based implementations have many liabilities. Let’s see them.

The most notorious one is nicknamed “DLL hell”: the dependency problem of having several executable modules and not updating all of them at the same time.

Another problem is the “monster class” problem. If a single class must provide all the features that a client may need, it must include a huge number of them. For example, the Microsoft .NET Framework contains the Windows.Forms subsystem, a (large) GUI library, which contains the DataGridView class, a widget to display and edit a scrollable table of items that may be bound to an external collection or to a database.

This class has 1 public constructor, 153 public properties, 20 protected properties, 87 public methods, 287 protected methods, and 187 public events. In all, there are 428 public members and 307 protected members. The nice thing about object-oriented programming is that client-code programmers can ignore the implementation of members; they just need to know their interface. But learning an interface composed of 428 members is a huge job! And that is just for those who want to instantiate the class. For those who want to inherit from it, the members to know become 735.

Actually, most users of that class don’t know all its members, but stick to the members they really need. But that amounts to bad programming, as you waste time finding the member you need, and often choose the wrong one.

I think that, to be really usable, a class should not have more than a few dozen members. The Windows.Forms DataGridView class exceeds that limit by a factor of ten.

Anyway, every one of those hundreds of members has been added for a purpose, and for every member some programmer may need just that one, so it is difficult to remove unneeded members.

A better solution is the following.

Most programming shops (software houses or software development departments) use a single library component several times. Maybe they build a large application with several instances of that component, or they build several similar small applications, each of them using that component one or more times.

Typically, all the instances of a single component as used by a single programming shop use, or should use, no more than a few dozen members of that component.

Even more typically, all the instances of a single component, as used by a single programming shop, set, or should set, many properties to the same value. For example, a user-interface component has a Font property with a default value; in a given programming shop, that component is always used, or should always be used, with another font.

Another common case is that a property of a component, as used by a single programming shop, has the same non-default value in 90% of cases, and other values in the remaining 10% of cases. Simply by using the component, programmers are forced to specify that value for every instance.

Here a solution for all these cases is presented.

A new custom component is built by the programming shop. That component is just a simplified and specialized version of the library component. It is simplified in that it publishes only a small fraction of the members published by the original class. It is specialized in that it has default values different from those of the original class.

Of course, to ease the implementation, this new class contains the old class, and is therefore just a façade over it. It shouldn’t use public inheritance, as its instances shouldn’t be seen by clients as instances of the original class.

Specifying proper default values has two advantages:

* It is easier to develop a consistent look-and-feel.

* Less application code (or manual property setting) is necessary.

Having fewer members has two advantages:

* For programmers, it is easier to learn how to use that class in their code and to understand the code of other programmers that use that class.

* For programmers, it is easier to write correct programs, as fewer properties may have wrong values.

The simplification may be a restructuring of the members, not just an elimination. For example, instead of simply removing the BackColor and ForeColor properties, an enumeration of color styles may be substituted for them.

The simplification may also be an agglomeration of widgets. For example, instead of using a Label and an EditBox separately, a single EditField may contain both features.

I think that many people would think: “But there are already zillions of beautiful custom components!”

What I really think is that no given set of components can satisfy all needs, and therefore every programming shop should implement its own set of components.

How is a component designed?

First, a toy application (or set of applications) containing all the use cases for the needed component should be built. Then, a component having only the members needed by that toy application should be built and used by that application. Then, the component should be used in real projects.

When a shortcoming of the component is encountered, it should be carefully determined which of the following cases applies:

  • The application does not really conform to the standard, and therefore it should be changed, without changing the component.
  • The component is not powerful enough to support the standard guidelines, and therefore it should be fixed or extended. If a component feature already used by some installed applications is to be changed, a new incompatible version of the component may be generated, and therefore a migration guide or a migration tool is required.
  • The standard guidelines are not flexible enough to support the application requirements. A change in the guidelines may result in a non-conformity of existing applications or components; in such cases, the previous two cases apply consequently.

Using this technique, the components begin really minimal and grow as applications require more features, while hopefully remaining much simpler than the monster components contained in libraries created by other organizations.


When to use C and when C++?

The C++ programming language was created to supplant the C programming language. It has been a quite successful language, but C is still used in many projects. When is it better to stick to C, and when has the migration to C++ come of age?

First of all, the most important factors in choosing any programming language are the developers’ knowledge of it, and the existence of a code base to maintain.

Secondary factors are the availability of good tools and good libraries for the language, and a large and active community of developers.

But given that a completely new big project, or a set of completely new projects, is about to start, that the programming team already has some experience in using both C and C++, and that high-quality tools are available for the target platform, which language should be used?

I think that the size of the program to build should be used as the discriminating factor, and that the size of the program should be measured by its addressable memory, using the following rule.

If the executable binary software to construct will need to address no more than 64 KB of code and no more than 64 KB of data (static + stack + heap), then use the C language; otherwise use C++.

For old MS-DOS programmers, this requirement amounts to: if you are going to use the “small” memory model, using only “near” pointers, use C language; otherwise use C++.

Fifteen years ago, perhaps the threshold could have been higher (say, 1 MB for code + data), because at that time C++ had some drawbacks that have since been overcome.

The rationale for such a threshold is twofold:

  • If a program should be very strict in its memory use, then by using C it is easier to manage resource consumption, while by using C++ the design abstractions may make some resource waste slip in.
  • If a program should be small both in written code and in used libraries, the design abstractions and the large libraries of C++ are of little use.

But there is another important rule:

If less than 20% of the developers in the team have read at least 1500 pages about C++, forget about using it.

This amounts to saying: “A successful C++ team should have at least one knowledgeable C++ developer for every five developers, and to become a knowledgeable C++ developer a person should read at least 1500 pages of books in all”. If this requirement is not fulfilled, study more, or hire C++ experts, or stick to the C language. Of course, all C++ developers must know some C++, but it is enough that some of them are experts, to design the architecture and solve problems. The others will begin with routine programming, and will become experts with time.

This is so, because C++ is difficult to use correctly (i.e. to make working software), but it is even more difficult to use well (i.e. to make efficient and maintainable software).
