July 26, 2010

The Diabolical Genius of C++

I have been feeling a strange pull to revisit C++ lately. Partly this is because C++ continues to be the language of choice for certain domains such as games and graphics programming that I find interesting. But more importantly, I have been curious to see what I would make of C++ if I took a fresh look at it all these years later, with the benefit of all the practical and theoretical expertise in programming and programming languages that I have acquired in the meantime.

I remember when the first edition of Effective C++ by Scott Meyers came out. I was very tempted to buy it, but in those days computer books were ridiculously expensive, so I bought Bjarne Stroustrup’s equally fresh-off-the-press The C++ Programming Language, 2nd edition, instead, on the logic that it was the official reference and so its utility would stand the test of time. (Something you couldn’t count on with most technical books back then.)

So given that Meyers’ book is still in print (in its 3rd edition) almost 20 years later, I figured it was the best place to go to reacquaint myself with C++.

Overall, I found Effective C++ to be a good book with good advice (though I might quibble here or there) and an excellent reminder of what programming C++ is like. What I rediscovered was that C++ is both a paragon and an abomination of programming language design. It is in a way a poster child for the title of my blog, “Philosophy Made Manifest”, in the sense that it is so exquisitely the logical outcome of its philosophical premises. More particularly, it is the synthesis of two quite different philosophies of software construction, and the extent to which it is a paragon or an abomination is a direct result of the relative compatibility and incompatibility of these two paradigms, in the sense of Thomas Kuhn’s The Structure of Scientific Revolutions.

To give you a sense of what these two philosophies are like, I can use my own early development as a programmer as an example. Like many programmers of my generation, my first language was a flavour of BASIC. BASIC was a good language to get your feet wet with programming, but too much of it was “magic”, in the sense that it insulated you from the real workings of the machine. It was fine for relatively simple programs, but once you got to more complex applications on a machine with memory measured in kilobytes, it didn’t really give you enough awareness or control of your environment to manage your resources.

The next step up from there was often assembly code or even raw machine language, and that looked like alphanumeric gibberish rather than comprehensible language.

So when I discovered C, it was a revelation. Here was a language that had a comprehensible syntax like BASIC but that really allowed you to specify exactly how you were using your resources. By this time, I had 1MB of RAM and clock speeds in the MHz, which seemed like a lot at the time, but for some of the applications I was interested in, you still had to juggle your resources to make things fit, and C let you do that. With C, I went from being a dabbler in programming to being a programmer.
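
To make that concrete, here is the flavour of it. (A minimal sketch of my own devising, not code from back then; the names and sizes are invented.) In C, you ask for exactly the memory you need, and you, not some runtime “magic”, are responsible for handing it back:

    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        /* Ask for exactly what you need -- no more, no less. */
        size_t count = 1024;
        int *buffer = (int *)malloc(count * sizeof(int));
        if (buffer == NULL) {
            fprintf(stderr, "out of memory\n");
            return 1;
        }
        for (size_t i = 0; i < count; i++)
            buffer[i] = (int)i;
        printf("last element: %d\n", buffer[count - 1]);

        /* You decide when the memory goes back -- there is no collector. */
        free(buffer);
        return 0;
    }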

Moreover, C helped to give me an entrée into the world of assembly. The C compiler I used allowed me to generate assembly code as output, and I became very familiar with how my C code was translated to the machine. Sometimes I even wrote super-optimized functions in assembly for maximum speed and efficiency and called them from C.

But this was the first sign of trouble in paradise. Two things became apparent to me around this time.

The first was that the “C with assembly” approach wasn’t very portable or maintainable. Different versions of the 80x86 architecture, of DOS (and soon Windows), and of the compiler and associated libraries, made work done this way very fragile.

The second was that the low-level approach of C was very awkward for higher-level tasks such as GUI design, where what you wanted was reusable components that interacted on an event model. And this is the locus of the paradigm shift: from programmer as juggler of resources on a machine to programmer as designer of solutions in the problem space.
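
If you have never tried it, here is roughly what “reusable components on an event model” look like when hand-rolled in C. (An invented sketch, not real toolkit code: structs for the components, function pointers for the events, and all the wiring done by hand.)

    #include <stdio.h>

    typedef struct Button Button;

    struct Button {
        const char *label;
        void (*on_click)(Button *self);  /* the "event handler" */
    };

    static void say_hello(Button *self)
    {
        printf("%s was clicked\n", self->label);
    }

    int main(void)
    {
        Button ok = { "OK", say_hello };
        ok.on_click(&ok);  /* in a real GUI, an event loop would do this */
        return 0;
    }

It works, but every ounce of the abstraction is paid for by the programmer, by hand, down in the solution space.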

I think this dichotomy is a major one in software development, and it is not the exclusive domain of the C/C++ world. I’ve seen it play out in the Java world too.

Many technical people are (reasonably enough) oriented towards the technical side of things. They’re focused on the solution space. They have staked their professional mastery on understanding the arcane details of various languages, platforms, libraries and tools. This leads to a natural tendency to believe that the role of the programmer is to redefine the problem until it fits the bounds of the available solutions. They seek to turn the problem at hand into the proverbial nail for the hammer they have.

This is not a wholly bad phenomenon. In fact, in the kind of resource-poor environment my C-programming self was contending with, this kind of shoe-horning was a necessity for being able to accomplish anything of value. And since resources are never completely unlimited, some of this thinking always ends up being essential on a technical project of any scope.

But the world changed as more computing resources became widely available, and richer, more user-friendly GUIs and applications came to be the norm. Customers for software products and the software teams that built them started to want a more problem-focused approach to software. Instead of reworking the problem to fit the technical solution, there was more value in squeezing the technical solution into the mould of a more natural, conceptual model of the problem space being addressed.

And this is where Object-Oriented Programming (OOP) came in. The classes and objects that were the ++ to C gave developers a new set of abstractions to capture such a model in the source code. In principle, OOP was supposed to remove the focus from the implementation details of the application and place it on the modelling of the entities and activities associated with what the user wanted the application to do. Again, in principle, the programmer of an OOP language was supposed to cede control over some of the low-level resource allocation issues to the language implementation, and focus on the world according to the user. Some OOP languages did go quite a way down this road.

C++, however, made a different choice. It decided to try to fulfill both paradigms. Want to micro-manage resource usage? No problem: C++ includes C. Want to work at the higher-level abstraction of OOP? No problem: C++ has all the OOP features you want.
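
The two faces can even live in the same short program. Here is a contrived sketch of my own making, with the resource-juggler’s C++ in the first half of main and the problem-modeller’s in the second:

    #include <cstdio>
    #include <cstdlib>
    #include <cstring>
    #include <string>

    // The problem-space style: no visible memory management at all.
    class Greeting {
    public:
        explicit Greeting(const std::string &text) : text_(text) {}
        void deliver() const { std::printf("%s\n", text_.c_str()); }
    private:
        std::string text_;  // the library juggles the bytes for us
    };

    int main()
    {
        // The C paradigm: raw bytes, managed by hand.
        char *buf = static_cast<char *>(std::malloc(64));
        if (buf != NULL) {
            std::strcpy(buf, "managed by hand");
            std::puts(buf);
            std::free(buf);
        }

        // The OOP paradigm: the abstraction manages itself.
        Greeting g("managed by the abstraction");
        g.deliver();
        return 0;
    }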

In some ways, C++ has been wildly successful in its goals. It succeeds in having all the features of both so that the developer is free to choose his approach. But it is exactly this blend that makes it such a nightmare from a design perspective.

As I was reading Meyers’ book, I was struck by how often he says of some C++ feature “there is a very simple rule for this in C++, with a few specific exceptions”, and the exceptions turn out to be mind-melting and highly unintuitive to anyone trying to take the “high-level” approach in their application. The language is designed to “help” you by doing certain things automatically, but it often doesn’t do the thing that seems most reasonable, because it doesn’t want to conflict with the programmer’s freedom to operate at a lower level of control.
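
One of Meyers’ items, “Know what functions C++ silently writes and calls”, is the canonical example. The following is my own illustration of the trap, not code from the book: leave the copy operations off a class that owns a raw pointer, and the compiler obligingly writes shallow ones for you.

    #include <cstring>

    class Buffer {
    public:
        explicit Buffer(const char *s) : data_(new char[std::strlen(s) + 1]) {
            std::strcpy(data_, s);
        }
        ~Buffer() { delete[] data_; }
        // No copy constructor or copy assignment declared, so the compiler
        // silently generates versions that copy the pointer, not the array.
    private:
        char *data_;
    };

    int main()
    {
        Buffer a("hello");
        Buffer b = a;  // compiler-generated copy: b now aliases a's array
        return 0;      // both destructors delete the same array --
                       // undefined behaviour, courtesy of the "help"
    }

The cure (writing your own copy operations, or forbidding copying outright) is exactly the kind of low-level bookkeeping the high-level programmer was hoping to leave behind.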

I think this illustrates a general principle of design: simple solutions can only exist for focused design philosophies. When you try to be “all things to all people” you necessarily end up with complicated, hard-to-manage solutions.

Having identified the “original sin” of C++, I think we still need to give it its due. Decades later, it is still going strong in application domains such as bleeding-edge games and graphics and in device-embedded controllers – domains where the spirit of scarce resources is still alive and well, and where the need exists for both the large-scale organizing principles of OOP and the “down-to-the-bare-metal” optimization of resources.

Given that the pendulum in popular programming languages has swung back to the spirit of BASIC (interpreted languages that are “fun” and where resource allocation is mostly “magic”), I wonder if the end of Moore’s Law spells a return to the spirit of C++. If so, any successor to C++ will have to start where it left off and learn from both the bad and the good of its diabolical genius.