Qi Meeting Aug 20 2025 - drym-org/qi GitHub Wiki
Doing More With Less
Qi Meeting August 20 2025
Adjacent meetings: Previous | Up | Next
Summary
We debugged an issue blocking deforesting producers and found a way forward. We also discussed representing streams in the core language (aka "Qi Enterprise Edition").
Background
Dominik has been working on refactoring deforestation so that the stream semantics of each sequence operation are encapsulated in the definition, instead of being separately injected within the compiler by matching the name of the form. Refactoring transformers in this way went smoothly, but producers presented some challenges due to the need to inject implicit producers when there aren't any explicitly indicated.
Defining Producers
With recent progress, stream transformers are already well on their way to being uniformly defined with define-deforestable, and Dominik doesn't anticipate any special challenges to incorporating all remaining transformers. So, alongside making incremental progress on that, he's also been exploring stream producers as the next target for this kind of streamlining, to reveal and confront any unknowns there.
He first created a producer variant of define-deforestable, which now accepts a #:producer flag marking the definition as a producer through the various stages of compilation.
He then used this form to define the implicit producer, list->cstream, which converts a list into a stream. This form is injected into the syntax at the start of a deforestable sequence, if no explicit stream producer (such as range) is indicated.
So far, so good.
Deforestable Patterns
For singleton stream components that occur in isolation, there is no performance benefit to deforestation, and indeed, perhaps a cost, so we use the naive or "fallback" implementation in such cases. For any nontrivial sequence, of length > 1, we match it against the following comprehensive, yet simple, patterns, corresponding to 4 distinct cases.
- transformer*
- producer, transformer*
- transformer*, consumer
- producer, transformer*, consumer
The last of these represents the full lifecycle of a stream --- something produces it, it may undergo various transformations, and then something consumes it. When the input syntax matches this last pattern, nothing needs to be done aside from compiling these matched forms together to their implied fused stream (rather than separately to their respective fallback runtimes).
But in each of the former three cases, the stream is incomplete. Either a producer is missing, or a consumer, or both. In these cases, in order to have a well-formed stream, we must inject the implicit stream or unstream operation at either or both ends of the pipeline.
Formerly, this was being done entirely in the compiler by injecting a simple datum literal for either a producer or a consumer, and then generating the runtime from this well-formed stream.
But now that streams are officially (though minimally, at this time) part of the core language via the #%deforestable form, Dominik took it upon himself to explicitly define the stream and unstream operations as (trivial) deforestable operations, which simply convert streams to and from lists (their current behavior). Thus, the patterns in the compiler could match actual, formally complete, streams, instead of relying on crude datum matching on manufactured syntax.
This seemed simple enough.
The Right Place at the Wrong Time
Dominik first defined the implicit stream operators using define-deforestable, which expands to a desired #%deforestable. For the fallback implementation, he simply used identity. Technically, list->stream should probably produce some kind of stream implementation, which would require a corresponding stream->list to convert it back to a list, which, together should equate to the identity operation, but as we will likely be removing all instances of list->stream and stream->list during the deforestation pass, we do not expect these forms to appear in the compiled output, so using identity for a fallback that would be disregarded anyway was a natural choice.
Next, he modified syntax classes to match list->stream and stream->list literally instead of as datums. This presented some awkwardness as the deforestation pass of the compiler now needed to depend on these forms which are provided by the list.rkt module which isn't part of the compiler. But that may be OK for now --- we just want to get it working first, and can clean up the implementation once we have a good starting point.
But things rapidly got more complicated from there. The list.rkt module (corresponding to (require qi/list)) contains macros and not core forms. This meant that by the time the deforestation pass is in operation, the list->stream and stream->list forms have already been expanded to the corresponding #%deforestable syntax. So we cannot match simply list->stream but rather, need to pre-expand this form, on demand, and then match against the expanded syntax.
Initial efforts here led to some weird errors, partially stemming from the onion layers of expansion and compilation. We were dealing with problems straddling Qi expansion and compilation within the broader Racket expansion phase, which would be followed by Racket compilation, and some standard Racket utilities for expansion were producing strange errors within this setting. At this point, there were enough balls in the air that failures became hard to trace.
For one thing, all of the errors were being reported by "cstream->list," yet they didn't seem to have anything to do with this form. Dominik explained that this is being implicated simply because it's in the form position in the fused stream, and Racket assumes that the literal identifier at that position is the responsible party when reporting errors. So cstream->list was just in the right place at the wrong time, and wasn't usually the cause of these errors. This was just one of the issues making errors hard to trace.
A Series of Unfortunate Events
Right away, when Dominik had finished giving us the overview, we noticed a typo --- the stream producing macro was called list->stream, but the compiler was attempting to pre-expand list->cstream. There was also another typo, #%:deforestable instead of #%deforestable! Could it be as simple as that? No. Unfortunately, fixing these just led to a different error.
We soon noticed another problem in the pattern matching --- the define-deforestable #:producer syntax was now attaching a P tag to the syntax metadata to indicate to the compiler that it's a producer, but unlike matching transformers where we have a #:when clause to look for a T tag, there was no corresponding #:when clause to look for P in the producer class. We made that change, and that led to yet another error, but now we were getting somewhere, as the errors made more sense.
The new error was something to the effect of "qi-macro instance cannot be expanded." We realized that this was due to the use of local-expand to pre-expand the list->stream macro, in an unusual setting for the Racket expander where we are actually in the middle of Qi expansion. Sid recalled an on-demand Qi expansion utility we have that might be usable here instead, expand-flow from (submod flow/extended/expander invoke). We tried it and ran into a few more errors, but after phase shifting the import from the "syntax" to the "template" phase, the errors were gone!
Looking Ahead
We made some observations and identified some remaining issues to think about.
Nonideal Aspects
The current thrust on deforestation is just to get it to work properly in an end to end way, and we've already identified some aspects for improvement as we get further along, including:
- duplication of the match pattern in the new unified transformer syntax class (
fst-new) - duplication of the
#%deforestableclause parser in the deforestation pass and in code generation - dependency on pre-expansion
list.rktin the post-expansion compiler - attaching artificial names during expansion including "
cstream" just to be able to inject something that can be matched in the compiler and pass our current test suite!
These could be improved individually, but Sid felt that formally including streams in the core language would likely address all of these at one go.
Making Streams Integral in the Core Language
Although streams are formally part of the core language, they currently share a single core form, #%deforestable, that encodes all stream semantics. This was a gateway to including streams in the core language and thereby avoid having to match and rewrite Racket syntax.
Since that introduction, we've discussed broadening the core form representation of streams to explicit representatives for each stream type such as producers and consumers. This would allow us to define a simple stream "algebra" in the core language as normalization and optimization rules, distinct from their implementation. This ability to reason logically about streams during compilation could support advanced debugging tools and compiling to different backends (e.g., other than CPS lambdas), and also allow the use of this stream algebra to optimize other aspects of the language such as core values pipelines including >< and pass.
Based on the recent challenges, we discussed adding stream and unstream as core forms, as this would avoid the retrograde dependency on list.rkt while supporting stream and fallback semantics in a natural way (i.e., as stream composition and code generation, respectively).
All told, so far, based on past and present discussions, we've discerned the following candidate additions to the Qi core language:
stream--- stream constructor (generic, but initially tied to lists)producertransformerconsumerfused-streamunstream--- reconstruct the input data type (generic but initially tied to lists)
It will of course take further development to understand how to neatly fit these into the standard hierarchy of expansion and compilation, and assess the feasibility of this approach. To make progress in this direction, some questions we'd need to answer are:
- Are the above core forms sufficient?
- How can we parametrize these forms to derive all sequence operations (e.g., those in
racket/list)? - How to derive naive runtimes from the same parametrization?
- What are the rules of "stream algebra"?
As Dominik had anticipated early on, the more we work on stream implementations of the operations in racket/list (which we consider to be a sufficiently rich representation of sequence operations), the more it's revealing what the necessary primitive stream concepts are. Eventually, the formal inclusion of such concepts as core forms could significantly simplify our deforestation pipeline while at the same time making it more useful for diverse optimizations, fulfilling the dream of Qi Enterprise Edition™.
Validating range Input
Formerly, we were validating the flow inputs to range, which were received as ordinary function arguments. This was in some cases resulting in even better error messages than Racket itself was providing! But now that these "inputs" are provided syntactically, we no longer validate them. We will need to think about how to modify the implementation to ideally provide the former level of validation. Last week we had discussed matching the arguments and validating them in the syntax template, but this presents some challenges as these macros generate Qi rather than Racket, and contracts are naively a Racket facility. The contracts themselves would likely therefore need to be part of the Racket runtimes, which may then necessitate some way to encode these in the #%deforestable syntax.
Doing More With Less
Dominik pointed out that with recent work, the deforestation core is getting smaller even while doing more and gaining more features.
The machinery to fuse the stream is now completely decoupled from the definitions of fusable streams. The transformer runtime is already generic and not specific to lists.
The "prepare" step for stream components --- which does some limited normalization of the syntax and bundling of state, and also attaches contracts --- can also likely be further simplified.
Consumers?
With the work on producers unblocked, the plan is to keep going and see if there are any further challenges. Dominik plans to remove the old producer syntax classes next, retaining only the unified fsp-new. Once we're convinced that producers, like transformers, are manageable with the existing infrastructure (including any necessary modifications), then consumers would be the next target for investigation.
Top Chef, Etc.
In this week's edition, Stephen prepared some tasty pasta for dinner, with mushrooms and homegrown tomatoes. Sid made some upma for lunch after the meeting, with an extra dollop of ghee at the end for good measure.
Sid is going to Yosemite Valley this weekend. 🏞
Next Steps
(Some of these are carried over from last time)
- Come up with a good way to validate the syntactic arguments to
rangeusing contracts. - Define the implicit consumer using
define-deforestable - Incorporate effects and bindings into Qi's pen-and-paper semantic model.
- Implement the remaining producers in
racket/listforqi/list. - Implement the remaining transformers in
racket/listforqi/list. - Remove unused parsing code for deforestable forms (esp. for producers).
- Verify whether
qi/list'srangeworks the same asracket/list'srangewrt. reals and rationals. - Improve
qi/listand deforestation testing by writing a macro to simultaneously test the expansion and the semantics. - Attach a
deforestedsyntax property in the deforestation pass, and use it in compiler rules tests (instead of string matching). - Incorporate producers and consumers into
define-deforestable's unified approach for naive/deforested semantics. - Incorporate other transformers in the parametrization proof-of-concept to see if a runtime could be shared among all transformers.
- Review which
racket/listforms are actually needed inqi/list - Implement DAG-like binding rules for branching forms
- Implement
tryexception binding and start a PR for review. - Return to developing Qi's theory of effects, including accounting for binding rules.
- Write phase 1 unit tests for inlining.
- Create an issue for the general bindings syntax to get feedback on it.
- Formalize Qi's semantics using Redex.
- Start organizing
qi-libintoqiandqi/basecollections - Invite Sam to tell us about data sharing approaches used in the
ukelibrary - Fix linking of bindings issue in Qi docs.
- Ready the inlining PR to be merged and tag for code review.
- Publish
qi/classin some form. - Verify whether the
call-with-valuesperformance discrepancy exists on ARM machines, too. - Define the
define-producer,define-transformer, anddefine-consumerinterface for extending deforestation, and re-implement existing operations using it. - Review whether the deforested operations could be expressed using a small number of core forms like
#%producer,#%transformer,#%consumer, and#%stream. - Write more reliable nonlocal benchmarks (in
vlibench) - Undertake a "fantastic voyage" to get to the bottom of the performance puzzle in using
compose-with-values. [see also: PR #191] - Implement more fusable stream components like
drop,append, andmember. [issue #118] - Why is
range-map-carslower against Racket following the Qi 5 release? - Resolve the issue with bindings being prematurely evaluated. [issue syntax-spec#61]
- Fix the bug in using bindings in deforestable forms like
range[issue #195] - Write a proof-of-concept compiling Qi to another backend (such as threads or futures), and document the recipe for doing this.
- Review Cover's methodology for checking coverage (e.g. wrt. phases) [related issue: #189]
- Improve unit testing infrastructure for deforestation.
- Decide on appropriate reference implementations to use for comparison in the new benchmarks report and add them.
- Decide on whether there will be any deforestation in the Qi core, upon
(require qi)(without(require qi/list))
Attendees
Dominik, Sid, Stephen