Qi Meeting Oct 10 2025 - drym-org/qi GitHub Wiki

Chai for Zero Or More

Qi Meeting October 10 2025


Summary

Sam presented Chai, a proof-of-concept compiler for the Qi core language that performs constraint unification to infer input and output arities for flows as finely as possible, generating fixed-arity lambdas instead of variadic ones wherever it can. We also caught up on the latest rabbit holes and other diversions.

Sam here: I've dropped some clarifications on a few aspects that I did not communicate well during the meeting. Hopefully they add clarity and not confusion.

Background

Sam recently became possessed by an idea for optimizing Qi flows to take advantage of arity information while at the same time keeping the flow runtime abstracted. He woke up from a nap one day and had it all figured out!

Chai

Approach

Chai's essential approach to compiling flows is to:

  1. Label the various input and output arities in the input flow syntax order to be able to treat these arities as logical variables.

  2. Solve constraints on these variables to narrow down the arities as finely as possible.

At this step, separate assertions about an arity resolve to the most specific arity. For example, (> 5) and 6 resolve to 6.

  3. Compile the resulting enriched representation to "connections," which are effectively graph nodes with in-edges and out-edges.

  4. Generate (codegen) a single lambda from each connection, one that is as optimized (specific) as possible from an arity standpoint.

Sam here: each connect or other flow at the codegen stage generally compiles to a let-values form when the exact arity is known. Otherwise a call-with-values is generated to handle varying output arity, or apply is generated to handle varying input arity.
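To make that concrete, here is a hedged sketch in plain Racket (not Chai's actual output; the compose-* names are hypothetical) of the three codegen shapes for composing a flow f into a flow g:

```racket
#lang racket

;; Hypothetical sketch (not Chai's actual codegen): composing f into g
;; under three levels of arity knowledge.

;; Exact output arity of f known (here: exactly 2 values):
;; a fixed-arity let-values form can be generated.
(define (compose-exact f g)
  (lambda (x)
    (let-values ([(a b) (f x)])
      (g a b))))

;; Output arity of f unknown: fall back to call-with-values.
(define (compose-varying-out f g)
  (lambda (x)
    (call-with-values (lambda () (f x)) g)))

;; Input arity unknown as well: fall back to apply on a rest argument.
(define (compose-varying-in f g)
  (lambda args
    (call-with-values (lambda () (apply f args)) g)))
```

Only the first version hands the Racket compiler a fixed-arity lambda it can optimize; the other two pay for their generality at every call.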

This overall sequence of operations can be roughly seen in the high level implementation in the main compile function:

(define (compile floe)
  (define floe^ #R(add-lvars #R floe))
  (define cst* #R(generate-constraints (set) floe^))
  (define subst #R(solve-constraints cst*))
  (define floe-arity #R(annotate-arity subst floe^))
  (define floe-arity^ #R(add-vars floe-arity))
  (define conne #R(extract-connections null floe-arity^))
  (define conne^ #R(conne:optimize conne))
  (conne:codegen (floe-in floe-arity^)
                 (floe-out floe-arity^)
                 conne^))

It can be even more clearly seen as:

(require qi)
(define (compile floe)
  (~> (floe)
      add-lvars
      (ε (as floe^))
      (generate-constraints (set) _)
      solve-constraints
      (annotate-arity floe^)
      add-vars
      (ε (-< (~> floe-in (as in))
             (~> floe-out (as out))))
      (extract-connections null _)
      conne:optimize
      (conne:codegen in out _)))

(FTFY 😜)

Implementation

Sam pointed out some interesting features of the implementation.

Constraint Solver

First, he chose to implement a self-contained constraint solver: he realized that a full arithmetic solver wasn't needed, and that something like miniKanren, too, was more sophisticated than necessary here.

Sam here: in some ways a miniKanren-style solver was overkill. The solver implementation in Chai works towards a single answer without backtracking. Also, the Chai solver encodes an idea of refining partial answers (varying arities) to complete answers (exact arities).

The constraints are recursively evaluated until they reach a fixed point, e.g., an arity value like "0 or more."

Sam here: the solver processes each constraint in a set. Each constraint is removed from the set, updates the answer, and may return more constraints. The solver stops when no further progress can be made on the set of constraints.
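As a much-simplified model of this idea (illustrative names only, not Chai's implementation), one could represent an arity as an interval whose upper bound may be +inf.0, and intersect assertions as they are processed:

```racket
#lang racket

;; Toy model of refining partial answers (varying arities) to complete
;; answers (exact arities). An arity is an interval (lo . hi), where
;; hi may be +inf.0 ("lo or more"). Names are illustrative only.

(define (meet a b)
  ;; Intersect two arity intervals; #f signals an unsatisfiable arity.
  (define lo (max (car a) (car b)))
  ;; Take the smaller upper bound by hand (min would coerce an exact
  ;; bound to inexact when compared against +inf.0).
  (define hi (if (<= (cdr a) (cdr b)) (cdr a) (cdr b)))
  (and (<= lo hi) (cons lo hi)))

;; e.g. the "(> 5) and 6 resolves to 6" example from above:
;; (meet '(6 . +inf.0) '(6 . 6))  yields '(6 . 6)

(define (solve constraints)
  ;; Fold each (variable . interval) assertion into the answer.
  ;; (The real solver also lets a constraint emit new constraints,
  ;; and stops when no further progress is made.)
  (for/fold ([answer (hash)]) ([c (in-list constraints)])
    (define current (hash-ref answer (car c) '(0 . +inf.0)))
    (define refined (meet current (cdr c)))
    (unless refined
      (error 'solve "unsatisfiable arity for ~a" (car c)))
    (hash-set answer (car c) refined)))
```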

At this, Eutro recited the obligatory version of Greenspun's Tenth Rule:

Any sufficiently complicated program contains an ad hoc, informally-specified, bug-ridden, slow implementation of Prolog.

We all nodded our heads sagely in agreement and Sam pled guilty.

Form-specific Arity Inference

Sam then proceeded to explain more about the implementation, including that Chai independently handles each core Qi form (currently implemented for a subset of Core Qi), inferring arities based on knowledge of the specific semantics of the form.

For select, for instance (whose syntax is (select n:number ...)), the inference is done by finding the largest number indicated (say N), which then tells us that the input arity must be at least N. The output arity is determined simply as the number of values selected.

Likewise, for block, the largest number N gives us similar information about the input arity, and additionally, tells us that the output arity is at least N minus the number of blocked values, without determining it exactly.
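A sketch of these two inference rules, with hypothetical helper names rather than Chai's actual API:

```racket
#lang racket

;; Hedged sketch (names are hypothetical, not Chai's API): deriving
;; arity constraints from the index lists of (select n ...) and
;; (block n ...).

(define (select-arity ns)
  ;; Input must supply at least the largest selected index; the
  ;; output arity is exactly the number of selections.
  (values `(>= ,(apply max ns))    ; input-arity constraint
          `(= ,(length ns))))      ; output-arity constraint

(define (block-arity ns)
  ;; Input must supply at least the largest blocked index; the output
  ;; is the (unknown) input arity minus the blocked count, so we only
  ;; learn a lower bound, not the exact arity.
  (values `(>= ,(apply max ns))
          `(>= ,(- (apply max ns) (length ns)))))
```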

Forms like -< and == are more complex, where it's necessary to define a concept of an "arity sum" over the component flows. This concept isn't a simple number as it must account for the logical ambiguity in the knowledge about the component arities, e.g., "arity exactly 1" vs "arity somewhere between 2 and 5" and "arity 0 or more," and so on.
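One way to model such an arity sum (a hedged sketch; Chai's actual representation may differ) is to read each arity as an interval whose upper bound may be +inf.0, and add intervals componentwise:

```racket
#lang racket

;; Hypothetical model of the "arity sum" over the component flows of
;; -< or ==: an arity is an interval (lo . hi), hi possibly +inf.0.

(define (arity+ a b)
  ;; Componentwise addition; an unbounded component (+inf.0, i.e.
  ;; "0 or more" and the like) propagates into the sum.
  (cons (+ (car a) (car b))
        (+ (cdr a) (cdr b))))

;; "exactly 1" + "between 2 and 5" = "between 3 and 6"
;; "exactly 1" + "0 or more"       = "1 or more"
```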

ground () is special. Currently it's handled by simply removing it from the produced syntax, but that doesn't appear to be the right way to do it, and it might require developing a different approach.

Sam here: ground works fine in Qi with function call and return semantics. If there is a flavor of Qi that works by sending/receiving values on channels then the current strategy would be incorrect and cause the flows to block.

as is a tricky one, as it compiles to a connection just like the other forms. But whereas the other forms are trivially representable as graph nodes with input and output arities, as doesn't suggest an obvious representation. Recently, in the context of formalizing Qi semantics, we talked about formalizing bindings as extra, invisible values in the flow. That appears to be the approach Sam has converged on as well. He said that as does a "fake out," ostensibly producing no output, but from a connection standpoint it can be seen to have as many outputs as inputs, just with "different names."

Nicer Error Messages?

When there is an error in resolving constraints, that typically indicates a compile-time error, for instance, if the input arity of a (select 5) happens to have already been inferred as (< 3) based on the output arity of the preceding form. We agreed that it would be valuable to raise this error immediately rather than let it become a runtime error, but at the moment, the error says something like "unification failed" or "constraint not satisfied." It would be better to provide a proper Qi syntax error in terms of the surface syntax used, e.g., "select: Arity error: Expects 0-3 values."

This always ends up being a rabbit hole!

Eutro also lamented that a lot of arity information is only available at runtime, and it would be nice if we could time travel to inform the compiler of this wealth of arity information that suddenly becomes available at runtime. But we don't know whether "staging" or JIT approaches that interleave compilation and execution would, in some form, be possible for DSLs hosted on Racket and Syntax Spec.

Insights

In solving and compiling different flows to the common "connection" core IR form, Sam noticed some interesting properties of Qi that weren't obvious on the surface, for instance, identities between different flows, like fanout being equivalent to some specific arrangement of a tee junction.

Sam here: (-< _ _ _) is equivalent to (fanout 3)

It's possible that such identities could inform Qi normal form or further reduction of the Qi core language.

Incorporating Into Qi?

One concern that Sam had with integrating the implementation into the standard Qi compiler was around how it would connect with deforestation. Currently, deforestation follows a special pattern circumventing ordinary code generation and producing escaped continuation-passing lambdas.

We agreed that we could for the moment simply ignore the interaction between arity inference and deforestation. Hypothetically, it could be valuable for deforestation to employ this information to compile to, say, unary vs. variadic streams. Currently we support only unary streams; variadic streams could be beneficial for compiling core forms like >< that are currently very slow. But for now, we should be able to handle these safely as independent, noninteracting compilation pathways.

Towards this goal, we felt that it could make sense to introduce this "arity" pass after deforestation, as the last step before code generation.

Difficulties

Dominik observed that in its current form and under this proposal, the arity inference happens too late in compilation to be useful for the rest of the compilation process. Additionally, the "connection" abstraction currently forms the basis for an entirely orthogonal code generation process, operating on the entire Qi core language. Based on these aspects, it may take significant refactoring in order to be usefully integrated into the Qi compiler.

Discussing further, we identified 3 main components to the implementation which could profitably be decomposed and incorporated into the compilation pipeline at different stages. These components were:

  1. Arity inference

  2. Graph representation of flows through compilation to "connections"

  3. Code generation to optimized lambdas

We discussed these in turn.

Decomposing Components: Arity Inference

For the first (arity inference), the options were:

A. Do it in the penultimate compiler pass (not useful to the rest of the compiler).

B. Do it at an early stage and enrich the IR (either literally or via syntax properties).

C. Implement it as a helper module that can be queried on-demand in each compiler pass that is interested in arity information.

An Early Arity Pass?

We had already discounted (A). For (B), we considered that arity information could expand the set of equivalences that could be found in normalization, and so, it could be appropriate to annotate the syntax with this information as early as during expansion or as part of normalization.

For an example of "expanding the set of equivalences": in (~> (-< add2 +) ...), naively, we cannot simplify anything. But with arity inference, we would know that + here is equivalent to add2, and we could rewrite the flow as (~> (-< add2 add2) ...). That could then benefit from a hypothetical normalization rule rewriting (~> (-< f f) ...) to (~> f (-< _ _) ...), in the absence of effects.
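To see why such a rule would be semantics-preserving in the absence of effects, here is the identity rendered in plain Racket (illustrative names; f stands for any effect-free unary flow and g consumes its two copies):

```racket
#lang racket

;; Plain-Racket rendering of the hypothetical rewrite
;;   (~> (-< f f) g)  ==>  (~> f (-< _ _) g)
;; which is valid when f has no effects.

(define ((fanout-then f g) x)   ; (~> (-< f f) g): apply f twice
  (g (f x) (f x)))

(define ((then-fanout f g) x)   ; (~> f (-< _ _) g): apply f once, copy
  (let ([v (f x)])
    (g v v)))
```

The rewritten form also computes f only once, which is the practical payoff.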

Having an arity-enriched IR at this early stage would enable effectively all compiler passes to have this arity information available.

An On-Demand Arity Inference Module?

Option (C) sidesteps the question of when to do the inference by enabling any pass to request this information.

A drawback of (C) is that it duplicates work, as the arity inference may be done multiple times over the course of compilation, instead of just once.

On the other hand, an advantage of (C) is that it avoids introducing nonlocal considerations into the design of each compiler pass. That is, if arity inference is done in one pass and expected to be available in downstream passes, then the design of a pass must consider where in the sequence of compilation it occurs, to know whether this information would be available or not. Instead, with (C), developers could explicitly request the information when needed, and the pass would "commute" with other passes and be easily reorderable if necessary.

Syntax Property vs IR

We also considered that in either (B) or (C), we could end up complicating the IR for downstream passes, making this downstream IR distinct from that used in upstream passes. Sam pointed out that syntax properties can solve this issue by leaving the literal syntax unmodified, thus not affecting pattern matching in any compiler pass, while at the same time encoding the desired information in the syntax.

Arity API

Additionally, Sam pointed out that the actual representation of the inferred arity data could remain abstracted behind APIs.

Two Aspects of Arity Optimization

Also in connection with syntax properties, we discussed that flows defined using define-flow are independently compiled by the Qi compiler. Through this process of compilation, we would have inferred arity information about the flow that we could make available to other flows that refer to this one. This is similar to the problem we addressed with inlining recently, where we sought to make the uncompiled syntax of one flow available to another for inlining purposes.

This problem could be uniformly handled using syntax properties --- that is, the arity metadata could be attached to this binding as a syntax property using the same syntax as what's used by the proposed arity optimization pass.

But we realized that this is actually an instance of a different arity problem we have discussed in the past --- that of informing the compiler about the arities of function identifiers in general, not just Qi definitions in particular. For that problem, the proposed solution was to maintain a compile-time table, which would be initialized to contain arities of built-in Racket functions. We realized that define-flow would now be able to simply add an entry for the defined flow to this general arity table, without requiring a Qi-specific solution here.
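A minimal sketch of such a table (illustrative names; a real version would be a compile-time table keyed on identifiers, e.g. a free-identifier table, rather than a runtime hash on symbols):

```racket
#lang racket

;; Hypothetical sketch of the proposed arity table. Symbols are used
;; as keys here purely for simplicity; none of these names are Qi's
;; actual API.

(define arity-table (make-hash))

(define (register-arity! name in-arity out-arity)
  (hash-set! arity-table name (cons in-arity out-arity)))

(define (lookup-arity name)
  (hash-ref arity-table name #f))

;; The table would be seeded with arities of built-in Racket
;; functions...
(register-arity! 'car 1 1)
(register-arity! 'cons 2 1)

;; ...and define-flow would simply add an entry for each defined flow.
(register-arity! 'my-flow 2 3)
```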

Information Loss?

There are two special cases to consider that we lightly broached in the meeting:

  1. An earlier pass infers the arity of a syntax object for which subsequent transformations preclude inference of the arity at a later stage (e.g., a flow definition for which arity information is available in the table is inlined).

  2. An earlier pass infers arity of a syntax object that is subsequently obscured through transformations (e.g., the syntax property is unwittingly dropped).

Always requiring passes to request the analysis afresh (i.e., option (C)) would address (2), but it would not address (1).

However, in case (1), the arity information derives not from analysis but from the table. Since it is the inlining (or other) transformation that obscures the link to the table, it becomes the responsibility of the code performing that transformation to preserve the information in some other way, if we want it preserved. For this purpose, we could employ a syntax property, but this does not require a preceding arity compiler pass.

So the question remained, should arity inference be done as an early pass or as a helper module?

Analysis vs Transformation

Eutro proposed conceptually distinguishing a compiler transformation from an analysis informing a transformation. She felt that it makes sense for compiler passes to request analyses, and that it isn't a good idea for the analysis to become part of the IR --- only the resulting transformation.

When there is possibility of information loss (as discussed above), we could additionally consider that if a transformation ever destroys information in the IR (or which is derivable from the IR, e.g., by looking up an identifier in a table) that may be necessary to an analysis that may be desirable in a later pass, the transforming agency is responsible for preserving the information in an appropriate way, such as in a lookup table or in a syntax property.

Based on this distinction, we came to a solution reconciling all the points that had been brought up:

A Reconciled Solution

  1. Implement arity inference as a helper module providing an analysis in the form of enriched syntax.

  2. The enriched syntax could encode the arity information as syntax properties, thus keeping the literal syntax unmodified and avoiding the need for custom pattern matching in the requesting pass.

  3. Passes rewriting function identifiers would be responsible for attaching any associated arity metadata (from the arity table) as syntax properties in the same format as what the arity analysis produces.

  4. The actual structure of the annotations could remain abstracted behind an arity analysis API that would allow compiler passes to ask any questions we can think of that would be useful, with the actual implementation abstracted behind the API.

Decomposing Components: Connections

"Connections" are currently a core form of the specialized IR Sam invented for Chai. We discussed that we could simply promote this to be a Qi core form, and then the arity-based optimization pass (occurring right before code generation, i.e., before Qi0->Racket) could produce this form as output for any optimizations it does.

We also felt that compiling flows to connections could provide a basis for graph visualizations, which Dominik has been keen on exploring for a long time. For this to faithfully represent an entire flow, however, we would likely need a version of arity compilation that compiles all flows to connections, not just the ones it is able to make useful inferences on and optimize. This may involve treating currently unmodeled core forms (like feedback) in a generic way, simply wrapping them with the equivalent of (connect f 0-or-more). (In the actual arity pass implementation, we would likely "unwrap" such a form to the underlying flow so that it can go through the regular code generation pathway.) So this compilation to connect, too, could be abstracted as a helper module (consulted in the arity pass), allowing us to request on-demand compilation to connect forms, for purposes like building visualizations, separately from mainline compilation.

Decomposing Components: Code Generation

That left code generation of "connections" to Racket. With connect being promoted as a new Qi core form, this seemed straightforward, as we could simply introduce a pattern matching rule in the qi0->racket code generation macro which could perhaps even use Chai's existing code generation implementation directly.

Any syntax that isn't optimized by the arity pass would simply be codegen'd as before. Syntax that is optimized as a connection would be compiled by this new rule.

We didn't discuss whether this approach would work when bindings are involved, as bindings are processed in a "process bindings" pass subsequent to code generation. If this is a problem, we could potentially, in the arity pass, abstain from optimizing the code if bindings are present.

Plan for Chai

To summarize and synthesize the provisional plan for integration that we came to (leaving details in the foregoing discussion):

  1. Implement an arity analysis module and a connect compilation module.

  2. Add an arity optimization pass prior to code generation, compiling core Qi syntax to connect where inference produces useful arity constraints.

  3. Add a connect core form, including notating the syntax in the Syntax Spec expander and adding the codegen clause for it in qi0->racket.

Diversions

Emacs

Ever since Straight.el came out, the Emacs ecosystem has felt like it is in the middle of a transition that has been years, even decades, in the making. Since then, it has become apparent that package management is most properly thought of as having two distinct aspects: installation (including dependency resolution) on the one hand, and configuration (e.g., load and evaluation order) on the other. Installation tools like Straight.el have introduced superficial integrations with configuration tools like Use Package for the sake of convenience. Yet such superficial integrations constitute a crude coupling of these two separate concerns that is at times awkward. For example, shared keywords like :after apply only to configuring packages and not to installing them --- thus, Straight+Use Package is unable to resolve forward references to dependencies.

Eutro has been working on a more holistic, systemd-like init system for Emacs this summer, called xup. It's intended to address some of these limitations by making the conditions on configuration and installation explicit while managing them independently. For instance, we might want a particular "unit" to be enabled only on an Android device, but this unit may share packages with other units intended for activation on other platforms, and there may even be units that share common packages (e.g., a "Racket" unit and a "Clojure" unit) that may both be active at the same time. In order to efficiently and correctly resolve these dependencies (e.g., not installing redundant packages, not attempting to install the same package twice, etc.), Eutro plans on expanding xup specifications to an intermediate representation that can be analyzed and reconciled prior to generating ELisp. She's now trying to get xup to an MVP state where she can write her own Emacs config in it.

Along the same (and complementary) lines, Sid has recently been working on a tool, called elacarte, for managing Emacs package recipes. He believes that, in a Straight.el world, these recipes are now the true primitive of package distribution in the Emacs community, rather than the tarballs that are still used in centralized package archives like MELPA. Sid hopes that making recipe management convenient for users and tools could provide an efficient alternative to centralized tarball-hosting archives, decentralizing authority and minimizing costly intermediate steps to package distribution.

Gradschool

Both Dominik and, now, Eutro, are officially grad students! We neglected to celebrate this milestone with another trivia challenge, to give everyone the chance to challenge Dominik for the computing trivia crown. Perhaps next week!

The Year of Syntax Spec

We noted that there were at least three talks at RacketCon this year connected to Syntax Spec. It's an exciting time for DSLs in Racket!

Musicians of the Racket Community

We observed how there were many music-related talks at RacketCon this year, and realized that everyone in the room had done some music composition or performance. Sam showed off a vast collection of instruments, both familiar and esoteric, that happened to be within reach.

Next Steps

(Some of these are carried over from last time)

  • Incorporate Chai into Qi.
  • Implement a compile-time table of known arities and add an entry to it as part of define-flow.
  • Ready the inlining PR to be merged and tag for code review.
  • Write phase 1 unit tests for inlining.
  • Define the define-producer, define-transformer, and define-consumer interface for extending deforestation (also encapsulating both naive and stream semantics in the definition), and re-implement existing operations using it.
  • Implement the remaining producers and transformers in racket/list for qi/list.
  • Attach a deforested syntax property in the deforestation pass, and use it in compiler rules tests (instead of string matching).
  • Improve qi/list and deforestation testing by writing a macro to simultaneously test the expansion and the semantics.
  • Investigate whether the deforested operations could be expressed using a small number of core forms like #%producer, #%transformer, #%consumer, and #%stream.
  • Decide on whether there will be any deforested operations provided in (require qi) (without (require qi/list))
  • Review which racket/list forms are actually needed in qi/list
  • Come up with a good way to validate the syntactic arguments to range using contracts.
  • Start organizing qi-lib into qi and qi/base collections
  • Publish qi/class in some form.
  • Implement DAG-like binding rules for branching forms [Syntax Spec parallel binding-spec PR]
  • Formalize Qi's semantics using Redex.
  • Incorporate effects and bindings into Qi's pen-and-paper semantic model.
  • Return to developing Qi's theory of effects, including accounting for binding rules.
  • Write a proof-of-concept for implementing code generation from abstractions of "flow" and "connective tissue" that are set by a context parameter.
  • Why is range-map-car slower against Racket following the Qi 5 release?
  • Decide on appropriate reference implementations to use for comparison in the new benchmarks report and add them.
  • Add reader flow syntax in #lang qi
  • Develop a backwards-incompatibility migration tool using Resyntax, to be used in the next Qi release.

Attendees

Dominik, Eutro, Sam, Sid
