Family spaghetti of programming languages

Created: 2019-02-08
Updated: 2019-02-10

This is my attempt at making a family tree of programming languages. (It's technically a directed acyclic graph.) Notably absent are certain multiparadigm languages, such as Java.
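To make the tree-versus-DAG distinction concrete, here's a minimal sketch. The edge list is purely illustrative (it is not taken from the actual graph source), and the function names are made up for this example. A family tree would give every language at most one parent; a DAG only requires that no language ends up as its own ancestor.

```python
from collections import defaultdict, deque

# Illustrative (parent, child) edges only; NOT the real graph data.
EDGES = [
    ("ABC", "Python"),
    ("CLU", "Python"),
    ("C++", "D"),
]


def has_single_parents(edges):
    """True if every node has at most one parent, as in a tree or forest."""
    parents = defaultdict(set)
    for parent, child in edges:
        parents[child].add(parent)
    return all(len(p) <= 1 for p in parents.values())


def is_acyclic(edges):
    """Kahn's algorithm: True if the directed graph contains no cycle."""
    children = defaultdict(list)
    indegree = defaultdict(int)
    nodes = set()
    for parent, child in edges:
        children[parent].append(child)
        indegree[child] += 1
        nodes.update((parent, child))

    # Start from the nodes that have no parents at all.
    queue = deque(n for n in nodes if indegree[n] == 0)
    visited = 0
    while queue:
        node = queue.popleft()
        visited += 1
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    # Any node never visited sits on a cycle or downstream of one.
    return visited == len(nodes)


print(has_single_parents(EDGES), is_acyclic(EDGES))
```

With two parents pointing at the same child, the first check fails while the second still passes, which is exactly the "technically a DAG" situation.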

Here's the source code. I'm sure it's full of mistakes, so please send fixes, patches and errata to % base64 -d <<< 'bWFpbHRvOm1lQGVya2luLnBhcnR5' and I'll update this post. You'll also notice that the splines take absurdly roundabout routes through the graph. That's Graphviz doing its best to make sense of this humongous mess. If you know of a better way to do it, let me know.
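For readers who haven't used Graphviz: a graph like this is written in the DOT language as a plain list of edges. The sketch below is not the actual source file. It only shows the general shape, using two relationships mentioned in the errata; the helper names and the graph name "languages" are assumptions made for the example.

```python
def quote(name):
    """Wrap a node name in double quotes, escaping characters DOT treats specially."""
    escaped = name.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def to_dot(edges, graph_name="languages"):
    """Render (parent, child) pairs as a DOT digraph."""
    lines = ["digraph " + graph_name + " {"]
    for parent, child in edges:
        lines.append("    " + quote(parent) + " -> " + quote(child) + ";")
    lines.append("}")
    return "\n".join(lines)


# Two edges taken from the errata list; everything else is omitted.
EDGES = [("C++", "D"), ("ABC", "Python")]

print(to_dot(EDGES))
```

Saving the output as languages.dot and running `dot -Tsvg languages.dot -o languages.svg` should render it. Names like C++ need the quoting, since an unquoted DOT identifier can't contain a plus sign.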

Errata: I'd like to thank

Thra11 from Lobste.rs for pointing out that D descends from C++, not C.
Marc Paquette for pointing out that I should include ABC as a Python influence and letting me use his research on shell genealogy. (See below.)
Jonatan Andersson for pointing out that Newsqueak is not a Smalltalk language.
Rainer Joswig for pointing out that Logo is older than Scheme, correcting me about XLISP versions and sending me a helpful graph on the history of Lisp dialects. (See below.)

Background

There are a lot of programming languages. A lot. Rosetta Code lists 714 of them, as of the date of this post. Just like human languages, they're really difficult to map onto family trees, because they continuously evolve and have numerous dialects of varying mutual intelligibility, informal dialects, different incompatible versions and divergent implementations.

However, unlike natural languages, programming languages have a significantly more problematic property that makes them nigh impossible to cleanly reduce into pretty graphs: They're arbitrarily constructed. A natural language simply descends from a previous language, and languages form trees. Of course, interlanguage influence is widespread (and let's not forget that there are pidgins, creoles and, of course, constructed languages). Even then, it's still perfectly possible (if not feasible) to trace each language's ancestor back.

The reason tracing genealogical roots is infeasible in software is simply that we "unorganically" take various influences from here and there and sculpt a whole new project. This gets especially problematic with newer multi-paradigm languages that borrow something from everyone! Whilst borrowing a few words from another language doesn't alter a natural language's lineage, borrowing a considerable amount of semantics or syntax from another language completely alters a programming language's lineage.

For instance, ask yourself: Where does Python come from? Graphs I found online gave me various ridiculous answers, such as C, Perl and even Java. Of course, you say to yourself, the answer is ABC, right? CLU? Not quite. How about Rust? Where does it come from? The answer is, unfortunately, a bit of everything.

Last August, I was chatting with my friend Daniil (who's also proofreading this blogpost right now) on IRC and he showed me the blogpost he was working on, an Introduction to OCaml. He had made a very simple family tree of the three main functional programming language families from which all else descends. (I exaggerate, but you get my point.)

A tree depicting three groups — Dan's functional programming family tree

I thought that was fascinating. I tried to find similar graphs but, much to my chagrin, they were all either way too reductionistic, or, well, seemed to have strange preconceptions about what defines a language (see notes below). Hard to blame them, really.

Instead of taking such an approach, I tried to follow what the languages' respective users (or enthusiasts, in the case of historic languages) go by and set out to make my very own incomplete and sloppy graph.

Unfortunately, I gave up halfway through and mothballed it in a single Graphviz dot file and completely forgot about it by October.

Just today, I rediscovered it in the random junk I had stashed in my gopherspace. I brought it up in #lobsters and talked about how I made it. Much to my surprise, people encouraged me to make a blogpost out of it. So, there you go.

Methodology

You can probably guess that I'm not well-versed in a good portion of these languages or their historical backgrounds. I picked the most straightforward method: I simply asked people who are, and consulted the mighty information superhighway when that failed. Wikipedia has "influences" and "influenced" sections in the infoboxes of programming language articles, just like those of artists and philosophers. Unfortunately, it's rather lax about what can be considered an influence, so I had to narrow it down a bit with independent research. Languages that openly describe themselves as being based on a single language are the best in this respect.

I considered design choices as of the language's birth more important than later additions, even if they were slowly eclipsed by foreign influences in later versions. I ended up including many insignificant languages and excluding equally many significant ones.

A few notes

Lisps

Is Lisp a single language? Are the fragmented Lisp "dialects" from the Lisp Machine era separate languages? Are CLOS and Loops languages at all, as some graphs insinuate? Is Racket a Scheme implementation or a Scheme-based language on its own? (Of course, there are no single answers to these questions. After all, a language is a dialect with an army and fleet!)

Honestly, Lisp was especially problematic, as most recent Lisps can be said to descend from Common Lisp and Scheme at the same time, which tells you nothing at all. For instance, PicoLisp and Carp are pigeonholed into descending from Common Lisp, but they're actually completely independent projects. Edit: I ended up attaching them to the now-generic "Lisp" node.

Rainer Joswig sent me this immensely helpful graph on Lisp genealogy, which I might use to rewrite this branch later on.

Basics

This was the simplest one and it contains the fewest significant dialects of any family. Basic vendors were very straightforward about taking a dialect and expanding on it, fitting it to their own equipment and shipping it as is.

Shells

This was the most painful one. There's a myriad of different Unix shell implementations out there with wildly varying degrees of compatibility, and it's nigh-impossible to map them into a neat tree. So I ended up pruning away 80% of it.

Later, I used portions of Marc Paquette's shell ancestry project to rewrite this portion, with various modifications. Thanks Marc!

Both images and the source file are licensed under CC-BY-SA 4.0.