expressivity - LeFreq/Singularity GitHub Wiki

My take on quantifying language expressivity ("Ending the Language Wars") derives from the work of Andrey Kolmogorov, one of the founders of Algorithmic Information Theory. The idea is that you can quantify the complexity of something by measuring how big a computer program you need to reproduce it. In this case, we'll be doing the inverse: measuring the simplicity. How small can your language make your code?

Proper expressivity means more than a host of sophisticated constructs for common operations. It also means a reduction of ambiguity in what is happening, so that a glance lets the reader understand what the programmer was saying.

XXX yet to be integrated: We'll define this idea of "simplification" as two factors. The first is text-wise reduction: fewer characters needed to express complex concepts (a sort of reversal of Algorithmic Information Theory). The equivalent in literature is the use of sophisticated words that say a lot in a few characters without increasing ambiguity of meaning. Text that doesn't get turned into code (comments, docstrings) doesn't count. The second factor is the less easy-to-quantify concept of maintainability. Fleshing this out, it clearly has to do with how easily one can establish programmer consensus for the given task, i.e. how many programmers of the language would put it back the same way you've expressed it, or would otherwise agree on the best implementation of a given problem.

I will define the Kolmogorov Quotient so that higher numbers for a given language denote a reduction in complexity (i.e. greater succinctness) when solving the problem in that language.

Once the basic premise and a methodology are agreed to, the result differs only by a rough constant for any specific implementation/architecture. That is, as long as the architecture is the same across all measurements, the number should be valid and comparable between languages.

It could be implemented something like this: implement a language that maps machine op-codes to mnemonics (like Assembly) and measure the number of bytes of machine code needed to perform a standard suite(*) of common, non-threaded programming tasks (call it machine_language_count). Then code that exact functionality in the language you want to measure (without using external libraries) and count the number of bytes of source code you used (call it test_language_count).
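As a rough illustration of the source-code side of that count, here is a minimal sketch, assuming the language being measured is Python itself. The helper name `source_byte_count` is hypothetical, and so is the choice to count only token bytes (comments and whitespace layout are left out); docstrings are still counted in this sketch.

```python
import io
import tokenize

# Token types that carry no program meaning for the purpose of this count.
# Assumption: comments and layout (newlines, indentation) are not counted.
_SKIPPED = {
    tokenize.COMMENT,
    tokenize.NL,
    tokenize.NEWLINE,
    tokenize.INDENT,
    tokenize.DEDENT,
    tokenize.ENDMARKER,
}


def source_byte_count(source: str) -> int:
    """Return the number of UTF-8 bytes in the code-bearing tokens of `source`."""
    total = 0
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in _SKIPPED:
            continue
        total += len(tok.string.encode("utf-8"))
    return total


# Hypothetical one-line program; the trailing comment is not counted.
sample = "x = 1  # set x\n"
test_language_count = source_byte_count(sample)
```

The machine-code side (machine_language_count) would have to come from an assembler or a compiled binary and is not covered here.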

The expressivity of your language, then, is machine_language_count / test_language_count. The elegance of your program is 2/#_of_equivalent_programs * this number. By "equivalent programs" is meant the number of programs that fluent programmers of the language agree are good and equivalent solutions to the problem. Whether the output should be the same for every input is untested (there may be good solutions that don't cover every extreme case). Languages that have a large number of equivalent programs say something interesting about either the programming language or the types of programmers the language attracts.

Expressivity should always be greater than 1.0; otherwise there was no purpose to your language, since machine language implemented the task more succinctly. The closer elegance gets to 1.0, the closer the code is to its maximum elegance.
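The two ratios can be sketched in a few lines of Python. This takes the formulas literally as written above; the function names and the byte counts in the example are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def expressivity(machine_language_count: int, test_language_count: int) -> float:
    """Ratio of machine-code bytes to source-code bytes for the same task suite."""
    if machine_language_count <= 0 or test_language_count <= 0:
        raise ValueError("byte counts must be positive")
    return machine_language_count / test_language_count


def elegance(expressivity_value: float, equivalent_programs: int) -> float:
    """2 / (number of equivalent programs) times the expressivity, as stated in the text."""
    if equivalent_programs < 1:
        raise ValueError("there must be at least one equivalent program")
    return 2 / equivalent_programs * expressivity_value


# Hypothetical measurements: 4096 bytes of machine code versus 512 bytes of source.
score = expressivity(4096, 512)

# Hypothetical consensus: fluent programmers agree on 4 equivalent solutions.
program_elegance = elegance(score, 4)

# A language only justifies itself if it beats machine code on succinctness.
worthwhile = score > 1.0
```

With these made-up inputs the expressivity is 8.0 and the elegance is 4.0.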

One might consider replacing keywords and function names with minimal symbols/identifiers for the purpose of measuring, so as not to get caught up in particular naming choices. However, one eventually sees that names and keywords are a significant source of expression, and therefore of expressivity, and that the only other thing really being measured is the language's architectural tools provided over assembly, such as parametrized functions, objects, containers, and type safety vs. runtime choices. The answer to the former is to fine-tune your spoken language into shorter forms, not to sacrifice readability.

  • Credit also to Gregory Chaitin

Out of date: (*) "standard suite of common programming tasks...": I see two main categories:
  • Data-processing suite, limited to simple text I/O (computation towards the machine)
  • GUI suite (computation towards the user)
The purpose of this idea is to end the tiring language wars about whose language is the best. By giving a quantitative metric, people can at least argue better.

Once you've made an expressive or higher-level language, you can move onto elegance.
