Design of the HNix code base - haskell-nix/hnix Wiki

Original URL: https://github.com/haskell-nix/hnix/wiki/Design-of-the-HNix-code-base

Module connections over the project

Module connections give the idea of the project design & structure:

Simplified idealistic (non-transitive) dependency graph:

Shows the main real semantic dependencies between modules. simple Update: (2021-07-23)

Real graph of module imports:

real Update: (2021-07-23)

HNix & HNix-Store-Core & HNix-Store-Remote nontransitive connections:

all-simple Update: (2021-02-02)

Real imports between HNix & HNix-Store-Core & HNix-Store-Remote:

all-real Update: (2021-02-02)

Welcome

Welcome to the HNix code! You may notice some strange things as you venture into this realm, so this document is meant to prepare you, dear reader, for the secrets of the labyrinth as we've designed them.

The first thing to note is that HNix was primarily designed so that Haskell authors could use it to craft custom tooling around the Nix ecosystem. Thus, it was never fully intended for just the end user. As a result, we use a great deal of abstraction so that these enterprising Haskell authors may add new behavior at various points: within the type hierarchy, the value representation, and the evaluator.

To this end, you'll see a lot of type variables floating around, almost everywhere. These provide many of the "injection points" mentioned above. There is a strict convention followed for the naming of these variables, the lexicon for which is stated here.

Notes on basic Nix types and the recursion schemes implementation

Constants (NAtom)

-- | Atoms are values that evaluate to themselves.
-- In other words - this is a constructors that are literals in Nix.
-- This means that
-- they appear in both the parsed AST (in the form of literals) and
-- the evaluated form as themselves.
-- Once HNix parsed or evaluated into atom - that is a literal
-- further after, for any further evaluation it is in all cases stays
-- constantly itself.
-- "atom", Ancient Greek \( atomos \) - "indivisible" particle,
-- indivisible expression.
data NAtom
  -- | An URI like @https://[email protected]
  = NURI Text
  -- | An integer. The c nix implementation currently only supports
  -- integers that fit in the range of 'Int64'.
  | NInt Integer
  -- | A floating point number
  | NFloat Float
  -- | Booleans. @[email protected] or @[email protected]
  | NBool Bool
  -- | Null values. There's only one of this variant: @[email protected]
  | NNull
  deriving (Eq, Ord, Generic, Typeable, Data, Show, Read, NFData, Hashable)

Nix types and their core type system

HNix uses recursion schemes in the core part of the project to express and administer the recursive nature of the Nix language in the most effective implementation.

Recursion schemes DSL engine is a hassle to implement, but after it is done, the further handling is quite easy.

What is F-algebra

*F type is F/f (Functor) for its particular F-algebra.

Literally, the full description of what is F-algebra and its parts, translated into Haskell:

Screenshot-2021-02-05-19:36:21edit

The recursion-schemes perspective is to view recursive types as fixed points of functors.

(quoting Oleg GrenRus)

The core idea ... is that instead of writing recursive functions on a recursive datatype, we prefer to write non-recursive functions on a related, non-recursive datatype we call the "base functor".

(quoting Gabriel Gonzales)

Actually, even in Haskell recursion is not completely first-class because the compiler does a terrible job of optimizing recursive code. This is why F-algebras and F-coalgebras are pervasive in high-performance Haskell libraries like vector, because they transform recursive code to non-recursive code, and the compiler does an amazing job of optimizing non-recursive code.

2018-04: Johns "Program Reduction: A Win for Recursion Schemes during HNix creation.

So the core of the Nix DSL type system:

Since:

NValueF and NValue' are used only to implement the NValue, and all other project works with NValue exclusively. NValue can be met everywhere in execution code: primOps, builtins, effect, language conversion.

-- | 'NValue' is the most reduced form of a 'NExpr' after the evaluation is
--   completed. 's' is related to the type of errors that might occur during
--   construction or use of a value.
data NValueF p m r
    = NVConstantF NAtom
     -- | A string has a value and a context, which can be used to record what a
     -- string has been build from
    | NVStrF NixString
    | NVPathF FilePath
    | NVListF [r]
    | NVSetF (AttrSet r) (AttrSet SourcePos)
    | NVClosureF (Params ()) (p -> m r)
      -- ^ A function is a closed set of parameters representing the "call
      --   signature", used at application time to check the type of arguments
      --   passed to the function. Since it supports default values which may
      --   depend on other values within the final argument set, this
      --   dependency is represented as a set of pending evaluations. The
      --   arguments are finally normalized into a set which is passed to the
      --   function.
      --
      --   Note that 'm r' is being used here because effectively a function
      --   and its set of default arguments is "never fully evaluated". This
      --   enforces in the type that it must be re-evaluated for each call.
    | NVBuiltinF String (p -> m r)
      -- ^ A builtin function is itself already in normal form. Also, it may
      --   or may not choose to evaluate its argument in the production of a
      --   result.


-- | At the time of constructor, the expected arguments to closures are values
--   that may contain thunks. The type of such thunks are fixed at that time.
newtype NValue' t f m a =
  NValue
    {
    -- | Applying F-algebra functor (@[email protected]) to the F-algebra carrier (forming the \( F(A) \)).
    _nValue :: f (NValueF (NValue t f m) m a)
    }


-- | 'NValue t f m' is
--   a value in head normal form (it means only the tip of it has been
--   evaluated to the normal form, while the rest of it is in lazy
--   not evaluated form (thunk), this known as WHNF).
--
--   An action 'm (NValue t f m)' is a pending evaluation that
--   has yet to be performed.
--
--   An 't' is either:
--     * a pending evaluation.
--     * a value in head normal form.
--
--   The 'Free' structure is used here to represent the possibility that
--   Nix language allows cycles that may appear during normalization.
type NValue t f m = Free (NValue' t f m) t

So it is real-deal elegant F-algebra. It implements Nix language in the most direct way from Haskell, Nix becomes a DSL of Haskell from now on. With Haskell type-checking system. Moreover, F-algebra is very convenient implementation - it is an endlessly conveniently transformable thing through just several basic operations, as would be seen further. The transformable thing that is fully recursive in its nature, but from now on that does not bother us anymore, since it is under control.

Type variables

There are NValue* t f m and such, where:

t is the type of thunks. It turns out that HNix doesn't actually need to know how thunks are represented, at all. It only needs to know that the interface can be honored: pending action that yield values may be turned into thunks, and thunks can later be forced into values.

f is the type of a comonadic and applicative functor that gets injected at every level of a value's recursive structure. In the standard evaluation scheme, this is used to provide "provenance" information to track which expression context a value originated from (e.g., this 10 came from that expression "5 + 5" over in this file, here).

m is the "monad of evaluation", which must support at least the features required by the evaluator. The fact that the user can evaluate in his own base monad makes it possible to create custom builtins that make use of arbitrary effects.

v is the type of values, which is almost always going to be NValue t f m, though it could be NValueNF t f m, the type of normal form values. Very few points in the code are generic over both.

Base funtor data types (NValueF a) also call "unfixed".

Recursion schemes

Morphim that is transformations between 2 F-algebras called recursion scheme. So recursion schemes are just simply different operations on initial F-algebra. Why it is said as just any operations? Because any f a (of kind >1) - is a functor, so any F a b c d.. to infinite complexity - is also a functor. And the only remained case is kind 1 type. And surprise - it is isomorphic to the Id a, and that is a functor, so from 1-type F-algebra also can be constructed: Id a -> a, and as regular F-algebra they are the first-class citizen for recursion schemes, there is no longer any difference for us.

Recursion scheme - transformation takes the initial(term) F-algebra and returns terminal(term) F-algebra. Folds, unfolds, traverses are just F-algebra transformations that are particular recursion scheme type. For folds, unfolds, traverses we need to provide the implementation ("algebra" *term) of how particularly fold, unfold, traverse, and we do for the recursion scheme just the same.

The possible operations: Cheat-sheet. Put in the other words: Table.

For example in parsers - are so called "unfixed" types for/from recursion schemes (common recursive operations), these types are a generalization of the recursive data types into open (non-explicitly recursive, but allowed through polymorphism) form. So Fixed data type versions - are (equivalent to/are) regular data types to work with (aka list), and the *F - are their generalizations (one layer open for possible recursion (through self-application) - but without explicit recursion). *F is a type that both can become recursive structure by applying Fix to it, and as a single layer of the structure - internal value/data can be extracted from it in the initial form. For example - from the list cell the value extracts in the initial form, and it is impossible to extract initial data from the list (recursive structure) without referring to list cells (without referring to building blocks - the recursive structure can transform only into the recursive structure - which never terminates into receiving initial data), so addressing the cell in a list - is a unFix of a list to the outermost cell, and from the cell, the data can be extracted.

Fixed types - are the actual presentation of the data types.

Other notes

Use of Relude

The project currently uses relude, which is a straightforward helper prelude that currently helps to reduce the default Haskell technical debts in the code.

Use of OverloadedStrings

The project enables OverloadedStrings by default.

Use of ApplicativeDo

The project enables ApplicativeDo by default, so maybe think about wrapping the (=<<) & (<=<) into do block, if there may some optimization be possible.

Project uses Abstracting Definitional Interpreters technique

Essentially, ADI allows for evaluation what recursion schemes allow for representation: allows threading layers through existing structure, only in this case through behavior. Roughly: ADI allows 1:1 map evaluation of Nix language to Haskell language evaluation, inherit & passthough the nature of GHC Haskell evaluation into HNix Nix evaluation.

History

Some of the history is at the end of The Haskell Cast: Episode 13 - John Wiegley on Categories and Compilers.