When to Automate - Perfect Abstractions

07 February 2022

There is an idea in software development that if a particular task can be automated, then it should be. That way the developer can keep their focus on the areas of the project that create value, while leaving the tedious, repetitive steps to the computer. I’m in favor of this approach, but there is a huge caveat lurking around the corner - namely, automating processes one does not understand well enough yet.

Forming abstractions - “boxing” work that is done

In this field, engineers build abstractions on top of other abstractions. While writing a Python routine, I haven’t ever had to think about the state of currents and voltages in my CPU. Most of the time I don’t even need to think about the operating system I run, or, speaking of Python, lower-level details such as memory management. These are abstracted away by the language itself in such a way that I am not expected to deal with pointers in the classic sense, or to clean up claimed memory after use. I sometimes call this design a (nearly) perfect abstraction.

I often find myself using metaphors while forming abstractions. In the case of a perfect abstraction in my terminology, implementation details are well hidden, even in case of errors. In my own words, such an artifact is ready to be “boxed”: its details can be safely ignored, and it is ready for other pieces of machinery to be built on top of it. I suspect that what I call a box in this context is conceptually pretty close to what most developers would call a “layer”, though I prefer to emphasize that it is sealed, hence my word for it.

Well-sealed boxes are scalable

In terms of software, I define a well-sealed “box” as an artifact that:

  • has well defined boundaries,
  • is well-tested,
  • is well-documented,
  • is able to handle expected errors.

I define the value of a box as the quotient of how much time and effort it can save and its cost, where the cost manifests itself in the difficulty of learning its interface and the amount of maintenance it requires.

Maintenance costs may be reduced in two ways: they can either be avoided in the first place by keeping the project’s scope narrow and thus the implementation simple, or, when that is not possible, they can still be lowered by keeping the mechanics of the artifact clean, straight to the point, and just as complex as they need to be (though not less complex).

Boxes of this kind may appear in various forms:

  • modules, classes and routines,
  • libraries,
  • scripts,
  • package manager artifacts,
  • build pipelines,
  • GUI applications,
  • physical hardware such as the computer itself or a simple toaster,
  • etc.

Torn boxes (frequently in the form of happy-path-only boxes)

I like how a well-chosen metaphor immediately suggests good practices. If the above criteria are accepted for a box, then examples of torn boxes could be:

  • a script that sometimes prints a stack trace or leaves its context in an undefined state due to poor error handling,
  • a package manager artifact that does not conform to standards or fails to meet expectations in other ways,
    • a package manager being used only as a build cache, instead of an actual build cache, just because Linux packages were impulsively chosen for this purpose without inspecting the requirements (this one is quite random since I did it recently…),
  • a non-idempotent build pipeline that creates hidden state in the system as a side effect, without bothering to clean it up or at the very least to document this deficiency,
  • a function that raises an error from the domain of its implementation rather than from its own domain (see an example of this below).

Example of a well-sealed box in the form of a function

Take a look at a simple function from the API of a fictional package manager:

/// @brief Installs the package called \p name. Does nothing if it is installed already.
/// @throws PackageDoesNotExistException
void ensure_package_installed(const char * name);

I call this well sealed, because:

  • it is known exactly what this does since its boundaries are well defined:
    • the exact behaviour is documented and can be easily guessed from its name,
    • it only throws exceptions from the same domain,
  • and let’s suppose it is well tested for all relevant edge cases.

Mechanisms of this kind are not expected to bombard the consuming developer with low-level details.
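To make the sealing concrete, here is a minimal sketch of how such a function might keep its errors within its own domain. Only the function name and `PackageDoesNotExistException` come from the declaration above; the `StorageError` type, the `storage_lookup` helper and the in-memory `known_packages` and `installed_packages` sets are hypothetical stand-ins for a real package index. Since the function is documented to do nothing when the package is already installed, the sketch throws nothing in that case.

```cpp
#include <cassert>
#include <set>
#include <stdexcept>
#include <string>

// Domain-level error: the only kind of exception the consumer should ever see.
struct PackageDoesNotExistException : std::runtime_error {
    explicit PackageDoesNotExistException(const std::string& name)
        : std::runtime_error("package does not exist: " + name) {}
};

// Hypothetical low-level error, standing in for e.g. a database failure.
struct StorageError : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Hypothetical in-memory stand-ins for the package index and the system state.
static const std::set<std::string> known_packages = {"zlib", "fmt"};
static std::set<std::string> installed_packages;

// Hypothetical implementation detail: looks up a package in the "storage".
static void storage_lookup(const std::string& name) {
    if (known_packages.count(name) == 0) {
        throw StorageError("no row found for key '" + name + "'");
    }
}

/// @brief Installs the package called \p name. Does nothing if it is installed already.
/// @throws PackageDoesNotExistException
void ensure_package_installed(const char* name) {
    assert(name != nullptr);  // precondition: a null name is a programming error
    const std::string package(name);
    if (installed_packages.count(package) != 0) {
        return;  // already installed: calling again is a no-op
    }
    try {
        storage_lookup(package);
    } catch (const StorageError&) {
        // Translate the implementation-level error into the domain of the API.
        throw PackageDoesNotExistException(package);
    }
    installed_packages.insert(package);
}
```

The `try`/`catch` is the seal: whatever the storage layer throws, the caller only ever has to know about packages.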

Example of a poorly sealed function

A counterexample would be this same function if it lacked information on its behaviour, or if it threw an SQLite query error in the case of a non-existent package.

A developer unfamiliar with its nature may have to dig into the implementation to find out why the problem occurred. Not even a note in the documentation on this behavior can mitigate the problem entirely, since the note still has to be looked up in order to work out the actual problem from the SQLite exception - although it certainly helps.

Such a note is what I call a “first-aid help text”. It provides a quick and fairly bearable workaround for the problem, but does not entirely solve it.
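For contrast, here is a sketch of the poorly sealed variant, with a first-aid note in its documentation. `SqlQueryError` is a hypothetical stand-in for whatever exception a real SQLite wrapper would throw, the function is renamed to `ensure_package_installed_leaky` only to set it apart, and `package_index` and `try_install` are likewise made up for illustration.

```cpp
#include <cassert>
#include <set>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for an error type owned by the storage layer
// (imagine an exception thrown by an SQLite wrapper).
struct SqlQueryError : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Hypothetical in-memory stand-in for the package index.
static const std::set<std::string> package_index = {"zlib", "fmt"};

/// @brief Installs the package called \p name.
/// @note First-aid help text: an SqlQueryError saying "query returned no rows"
///       most likely means that the package does not exist.
void ensure_package_installed_leaky(const char* name) {
    assert(name != nullptr);  // precondition: a null name is a programming error
    if (package_index.count(name) == 0) {
        // The implementation detail escapes: the caller now has to know
        // that packages are looked up with an SQL query.
        throw SqlQueryError("query returned no rows");
    }
    // The actual installation is omitted from this sketch.
}

// Consumer code is forced to interpret a storage-level error.
bool try_install(const char* name) {
    try {
        ensure_package_installed_leaky(name);
        return true;
    } catch (const SqlQueryError&) {
        // Only the first-aid note tells us that this means "no such package".
        return false;
    }
}
```

The note makes `try_install` possible to write, but the consumer still has to carry knowledge about the storage layer around.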

On “first-aid help texts”

Behavior of this kind forces the developer to look under the hood when an error occurs and to learn about the implementation details of the system, destroying the value of the automation, which is to keep the developer on track with their work. However, once the automation exists, there is also a non-negligible chance that a note like that is the only relevant and up-to-date information on the quirks of the mechanism. In that case, I think having only first-aid documentation is far better than having an artifact that works only in one specific set of unknown circumstances (including additional layers of “magic” generated by poorly executed attempts at fixing it).

Maybe it is not that bad to have a script that fails under some well-known circumstances. It is a problem, however, if that script is itself the source of truth on the intention or on the requirements - which happens to be the case when, at the time of fixing, a quick and poor programmatic solution was chosen instead of the ugly, cheap, but much less risky “first-aid note”.

How to avoid creating torn boxes?

I can only speak from my own experience: I tend to create them when I rush into automating something that I don’t understand well enough. Ironically, by creating a box with the intent of maintaining architectural scalability, I actually put an obstacle in its way by sealing an incomplete design and thus creating a shaky foundation to build upon.

In my case, I think the solution is to be ready to postpone sealing. Actions I can take (or avoid taking) in my currently relevant projects:

  • If a dependent sub-component of a project changes frequently, I just shouldn’t create a Conan or Debian package artifact of it. If I don’t rely on that package during development then it serves no purpose, and if I do, additional complexity may arise very easily that instantly generates frustration. It is probably a better idea to symlink a folder and build the dependency with the main project. That also instantly eliminates the possible problem of binary incompatibilities or package version mismatches, since there is only one version - the one that is compiled upon triggering the build.
  • It is okay if I use e.g. CMake as long as I understand every line of the CMakeLists.txt that I wrote. However, it is very bad if I force automation without properly understanding the problem and my tools, achieving the wanted result somehow, anyhow. The existence of such a solution maintains the illusion of saving time in the long term. Hacks like these are usually very hard to understand. This is not a problem if I never have to work with that particular corner of the project again, but solutions like these tend to break frequently. Breakage forces me to open them up, and some seconds later creates an urge to have a word or two with my past self.
  • Generally, I should avoid solutions that generate frustration upon revisiting. Even if a line (or fifteen, more probably) solves a problem right now, it will psychologically discourage me from opening up that particular project again. That is especially dangerous for hobby projects, where the risk of abandonment is high.
  • Intuitively, spending hours just doing research, planning and learning seems to be a waste of time since no line of code is produced. That is actually far from the truth, and I know that. It is different, however, to know it and to have an intuition about it (to “feel” it). Therefore I should resist the urge to start coding as long as I’m not confident enough in what I am about to do, and keep looking (and, if I have a chance, asking even more questions).

By leaving the boxes open, I keep the chance to make them more mature. As a new rule of thumb, as long as open boxes don’t hinder me more than sealing them would, I should leave them ready for modification. It is less of a problem to have some lower-level instructions in my codebase here and there than to have code that seemingly belongs to the right domain but is built on the wrong abstraction.

Thanks for reading!

I’m a huge fan of programming conferences for seeking inspiration. I’m sharing two talks with you that helped me form the conclusions above. I did not exactly repeat their thoughts here, but they helped me arrive at some of the takeaways.

A Philosophy of Software Design by John Ousterhout

“Good Enough” Architecture by Stefan Tilkov