It's interesting that there are almost no software projects that version themselves with just 1, 2, 3, 4, 5, 6, etc. Even if you think semver is for space shuttles and not todo apps, you're still not a monster, you wouldn't just call your next version "4". You'll look up the last version you published, guess at what the schema was, and then make number go up more or less depending on how much progress you made since then.
Let's imagine that the progress was enumerated in a hypothetical "changelog", ignoring the detail of whether it's written down or not, and see where it takes us:
- changelog: a list of changes since the last published version
- new version: project these changes, and the absolute level of project quality, onto the three-dimensional vector space of non-negative integers “x.y.z”, path-dependent on the previously published three-vector
That sounds silly, but it's not. There are real cultural norms around what level of quality is required to call something “1.0”, what kind of a change is big enough to count as “2.0”, and how silly it would be to publish a “52.0”. You could definitely train a neural net to approximate “version ≈ f(git)”, and the results would feel about right, but I bet the correlations would be surprising. “0.x” probably has less to do with the quality of the code than the quality of the coder’s childhood.
## Risk, time, and effort
I have a bicycle. If I gave this bike to a semver-compliant mechanic, they would say one of three things to me when their work was complete:
1. If you don't adjust the way you ride, you will crash.
2. You can ride it the same way, but I added some extra gears you can use if you want.
3. One of the bolts was loose, so I tightened it.
Obviously, (1) is a more important message than (2) or (3). A more subtle distinction is that (2) carries more risk than (3). Even the best mechanics aren't perfect, and a mistake is more likely when installing a whole new system than when tightening a loose bolt.
## From opinion to fact
For millennia, people described temperature as: freezing, cold, warm, hot, boiling, melting. Even today, that's the scale that I use. Does my kid need a jacket if it's 65°F? I have no idea. But if it's cold? Yes, they need a jacket if it's cold.
It’s a funny thing, because we really couldn’t do better than this for a very, very long time, until 1701 when Newton’s anti-fraud work at the Royal Mint led him to invent an apparatus and absolute scale¹ for measuring the “degrees of heat” at which various metals melted.
Project quality is a bit like temperature. Cold : bad :: warm : good, etc. And numbers are better than words, so programmers have defined 0.x to mean bad, and 1.x to mean good. Like everything made by programmers, it goes from 0 to 1 with no meaningful gradations in between. And like every absolute estimate made by programmers, it is useless.
## From subjective absolute to objective relative
Temperature has two great properties: it is objective (everyone gets the same answer) and it is absolute (you can just say “50°F”, you aren’t forced to say “30°F hotter than the thing next to it”).
Many programmers are tempted to make their version numbers absolute by defining the version as either 0.x (bad quality / unstable) or 1.x (good quality / stable). But in so doing, they make their version number subjective.
We have had objective units of length for many millennia, and so we have pyramids and temples which are thousands of years old. But we didn’t have an objective measure of temperature until 1700, and it just so happens that we started harnessing thermal power on an exponential singularity curve almost immediately after that.
So if you have to choose to build your house on a subjective or an objective foundation, I would advise you choose the latter. Even if it means trading absolute for relative.
Because we have right now, today, an objective and standardized scale which describes how a dynamic system intended for usage by humans has changed. This system allows a human to take any two versions from any point in the history of the system, and trivially calculate, in their head:

- whether integration work will be required or optional
- whether new capabilities will be available
- relative integration risk
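To make that head-math concrete, here's a minimal sketch in Java; the class and method names are illustrative, not from any real library:

```java
// A sketch of the in-your-head calculation: given two semver strings,
// report what the upgrade means. Names here are hypothetical.
public class SemverDiff {
    static int[] parse(String v) {
        String[] p = v.split("\\.");
        return new int[] {
            Integer.parseInt(p[0]), Integer.parseInt(p[1]), Integer.parseInt(p[2])
        };
    }

    static String compare(String from, String to) {
        int[] a = parse(from), b = parse(to);
        if (b[0] != a[0]) return "integration work REQUIRED (compatibility broken)";
        if (b[1] > a[1]) return "new capabilities available, integration optional";
        return "fixes only, lowest integration risk";
    }

    public static void main(String[] args) {
        System.out.println(compare("3.1.4", "4.0.0")); // integration work REQUIRED
        System.out.println(compare("3.1.4", "3.2.0")); // new capabilities available
        System.out.println(compare("3.1.4", "3.1.5")); // fixes only
    }
}
```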
And you slump sideways in your ergonomic chair, not using semver, but nevertheless pouring massive, unsure effort into projecting a subjective measure of absolute quality onto a number? When you could instead trivially assign that number using an objective measure of relative quality? Only programmers and LLMs waste time putting numbers on subjective assessments, but at least LLMs are improving over time.
## Version can be a pure function of the changelog
Far and wide, projects have versioning systems that are bad. And yet, a lot of software is good. So apparently I’ve wasted your time, and versions don’t matter that much. But changelogs absolutely do. And there are a lot of projects in Maven Central and NPM which have no changelogs, but every single one of them has versions.
It's the opposite of what you want! The meaningful thing? Optional. The ambiguous, often meaningless thing? REQUIRED. If there's only enough energy to do one thing, spend it all on the changelog! As Julius Caesar said at the RubyConf in 49 BC: “The changelog is cast, let the versions fall where they may.”
If all you have are version numbers, all you can guess at is the psychology of the authors. But if you have a changelog, then you can calculate Good Version Numbers automatically:
There is some disagreement over how to name the digits in `x.y.z`. I'm going to use `breaking.added.fixed`. If you're using the standard keepachangelog format, it is a trivial computation to turn a changelog into a version bump:
```
## [Unreleased]
### Added
- `foo()` can now accept `bar` as an input
- **BREAKING** you now have to call `fooInit()` before any call to `foo()`

## [3.1.4] - 2020-01-02
...
```
When it comes time to cut a release, just follow this algorithm:
1. Find `\n## [Unreleased]`.
2. Scan from there until you find `\n## [x.y.z] - yyyy-mm-dd`, which will be the last published version.
3. Within the string that you just extracted:
   - Can you find `**BREAKING**`? If so, bump `breaking`, which functions as a concise compatibility guarantee.
   - Else, can you find `\n### Added`? If so, bump `added`, which functions as a new feature advertisement.
   - Else, bump `fixed`, which functions as a signal of low integration risk.
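If you'd rather not do it by hand, here's a minimal sketch of that algorithm in Java, assuming a keepachangelog-format `CHANGELOG.md`. The class name and defaults are hypothetical, not Spotless Changelog's actual API:

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NextVersion {
    /** Computes the next version from a keepachangelog-format changelog. */
    static String compute(String changelog) {
        int unreleased = changelog.indexOf("\n## [Unreleased]");
        if (unreleased < 0) throw new IllegalArgumentException("no Unreleased section");
        // Find the last published version header, e.g. "## [3.1.4] - 2020-01-02"
        Matcher m = Pattern.compile("\\n## \\[(\\d+)\\.(\\d+)\\.(\\d+)\\]").matcher(changelog);
        if (!m.find(unreleased)) return "0.1.0"; // nothing published yet (assumed default)
        int breaking = Integer.parseInt(m.group(1));
        int added = Integer.parseInt(m.group(2));
        int fixed = Integer.parseInt(m.group(3));
        // Everything between the two headers is the unreleased section
        String section = changelog.substring(unreleased, m.start());
        if (section.contains("**BREAKING**")) return (breaking + 1) + ".0.0";
        if (section.contains("\n### Added")) return breaking + "." + (added + 1) + ".0";
        return breaking + "." + added + "." + (fixed + 1);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(compute(Files.readString(Path.of("CHANGELOG.md"))));
    }
}
```

Run against the example changelog above, the `**BREAKING**` entry wins and the result is `4.0.0`.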
If you happen to be using the JVM, Spotless Changelog has implemented this dead-simple logic as a library and also as a Gradle plugin. CONTENT MARKETING, BABY, THAT'S HOW YOU DO IT. THAT'S HOW YOU GET THE STARS.
## Pre-1.0, 0.x, and other forms of performative insecurity
As we just showed, one of the things your version can easily be is (concise compatibility guarantee).(new feature advertisement).(lowest downside risk to upgrade), but that is an unpopular way for authors to pick their version. The most popular way to use the version string is to communicate that the author holds their code (or at least its public API) in low regard. Across all of NPM in 2014:
- 82% of packages are maintained by impostor syndromes publishing as 0.x
- 14% of packages are maintained by Dunning-Krugers who publish as 1.x
- 3% of packages are maintained by engineers, with a wide spectrum of self-confidence, who nonetheless turn the crank and publish (concise compatibility guarantee).(new feature advertisement).(lowest downside risk to upgrade)
If a library has a version, and no one depends on it, does it even have a version? Who cares! But once someone has decided to use your library as a dependency, who cares how good you think it is. Your user(s?) thinks that it's good! Or at least good enough. The terrible thing about “0.x” is that the more unstable a codebase is, the more valuable (concise compatibility guarantee).(new feature advertisement).(lowest downside risk to upgrade) would be!
But habits are what they are, and you're going to keep publishing things with 0.x. I will judge you for that, but Spotless Changelog won't. It will just increment the “added” version (“0.1.0”, “0.2.0”, “0.3.0”, etc.) whether your changelog has `**BREAKING**` or just `### Added`. In terms of 3D vector space, this is exactly analogous to smashing an `R.G.B` image into `0.R+G.B`.
But in terms of information content, it's far worse. The `0.R+G.B` image transformation preserves intensity information, which is by far the most important signal in vision. But the `0.breaking+added.fixed` transformation loses compatibility information, which is by far the most important signal in a version string. For an information content analogue, we have to remove intensity information from the image².
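A toy illustration of that fold (not Spotless Changelog's actual code), showing how two very different release histories collapse to the same string:

```java
public class Smash {
    // Pre-1.0 fold: breaking and added get summed into the middle digit,
    // so a breaking release and a feature release become indistinguishable.
    static String smashTo0x(int breaking, int added, int fixed) {
        return "0." + (breaking + added) + "." + fixed;
    }

    public static void main(String[] args) {
        System.out.println(smashTo0x(2, 5, 1)); // 0.7.1
        System.out.println(smashTo0x(1, 6, 1)); // 0.7.1 (same string, very different upgrade risk)
    }
}
```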
One of the few good use cases for `0.x` is to make sure that the publish process is actually working. But once you've got that working, it's time for `1.x`. If nobody else uses it, then it doesn't matter what you picked. And if they do, they're better off if you're giving them semver. This is how solemn the `0.x` to `1.x` bump should feel.
## In summary
- Changelogs are more useful than versions.
- It is easy to remove an unnecessary degree of freedom by setting `version = f(changelog)`.
- This constraint lets you ignore versions, which don't matter very much.
- This constraint will force your versions to have more information in them.
- This constraint will better conceal your embarrassing insecurity.
Discussion on reddit.
¹ I think his temperature scale gets undersold. The straightforward part is that he used a linseed oil thermometer (that idea had been around for a while), and defined freezing water as 0, and body temperature as 1 (his innovation to make it repeatable for other people). But the tricky part is that he wanted to measure melting metals, and glass and oil thermometers will melt and boil away at those super high temperatures.

The genius part is that he took a giant thick piece of iron and heated it until red hot. Then he’d take the molten sample of the metal in question and pour it on the red-hot iron. Take that whole contraption, set it out in a windy spot, and time how long it takes for the molten sample to solidify. He had to invent a whole other law of physics, Newton’s law of cooling, and solve it to turn a cooling time into a temperature.

So he’s got a linseed oil thermometer, and an absolute scale that anyone can calibrate. And he’s got a totally different molten-metal thermometer, but no way to calibrate it against the ice and body temperature scale. But by good luck, tin melts at 450°F, and linseed oil doesn’t boil until 650°F. So you’ve got a pretty big stretch where you can mix various alloys of tin, which each melt at a different temperature, and use those to calibrate the linseed ice-and-body-temp scale all the way up to the solidifying times of molten metal cooling in the wind. Dude was nuts.