Qi Meeting Aug 27 2025 - drym-org/qi GitHub Wiki
Qi Meeting August 27 2025
Sam presented his dataframe and data processing libraries which reprovide Qi. Dominik shared updates on the deforestation refactor for producers. Eutro shared some initial reduction and congruence rules for formalizing semantics of core Qi forms. We discussed possible ways in which we might use Jacqueline's Resyntax library.
Dominik has been working on refactoring deforestation to streamline how operations are defined.
Sam teased us some time ago that he's been hacking on a functional dataframe library that uses Qi.
Eutro has been leading us on the road to formalizing Qi semantics.
Jacqueline recently added improvements to Resyntax that allow it to be used with any #lang.
Sam has been working on an immutable dataframe library for data science applications. He started down this road simply because he wanted to avoid making Excel spreadsheets, preferring instead to take a data science approach in his favorite programming language. Today, he finally had a chance to stop by and show us what he's been working on.
And boy, was it worth the wait!
By the looks of things, Sam quietly released Uke and Machete earlier this year. Uke is an immutable dataframe library, and Machete provides ways of slicing and manipulating dataframes.
To frame what these libraries are about, it's useful to know that some time ago, Alex Harsányi released a data-frame library providing a dataframe type along with efficient operations for manipulating such dataframes in place.
Then, a few years ago at RacketCon (as it happens, during the same session in which Qi was introduced!), Tulip Amalie announced Sawzall, a library for functionally manipulating these dataframe objects using the threading macro. It leverages the observation that a series of functional transformations is a natural way to approach data science tasks.
Although Sawzall is incredibly expressive, Tulip mentioned at the time it was released that one issue to be addressed is the performance penalty of functionally transforming a mutable datatype (e.g., using the threading macro). They observed that this necessitates constructing a fresh, transformed dataframe after every operation, which involves a lot of copying, and expressed a desire to address it down the line.
Uke and Machete follow in the tradition of data-frame and Sawzall. The essential difference is that Uke implements an immutable dataframe type. Functional operations on this type produce fresh values, yet much of the underlying data is shared. For instance, obtaining a new dataframe containing only the first ten rows of an existing one is just a matter of making a new "index" with this bound, a constant-time operation. Further slicing of the data may be done by composing such indexes rather than by constructing fresh dataframes.
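To make the index idea concrete, here is a minimal sketch in plain Racket, assuming a frame is a set of shared column vectors viewed through an index (a procedure from view rows to underlying rows) plus a row count. The names frame, make-frame, frame-head, frame-every-other, and frame-ref are hypothetical and are not Uke's API. Taking the first k rows only changes the bound, and further slicing composes a new mapping with the existing index, so no column data is copied.

```racket
#lang racket/base
;; Hypothetical sketch, not Uke's API: an immutable frame is shared column
;; vectors viewed through an index (view row -> underlying row) and a bound.
(struct frame (columns   ; immutable hash: column name -> shared vector
               index     ; procedure: view row -> underlying row
               nrows))   ; number of rows visible in this view

(define (make-frame columns)
  (define n (vector-length (car (hash-values columns))))
  (frame columns (λ (i) i) n))

;; First k rows: same columns, same index, smaller bound. O(1), no copying.
(define (frame-head fr k)
  (frame (frame-columns fr) (frame-index fr) (min k (frame-nrows fr))))

;; Every other row: compose a new mapping with the existing index.
(define (frame-every-other fr)
  (define idx (frame-index fr))
  (frame (frame-columns fr)
         (λ (i) (idx (* 2 i)))
         (quotient (+ (frame-nrows fr) 1) 2)))

;; Look up a cell by going through the index into the shared column.
(define (frame-ref fr col i)
  (vector-ref (hash-ref (frame-columns fr) col) ((frame-index fr) i)))

(define fr (make-frame (hash 'x (vector 10 20 30 40 50))))
(define view (frame-every-other (frame-head fr 4)))
(frame-nrows view)     ; → 2
(frame-ref view 'x 1)  ; → 30
```

A real implementation would likely represent indexes more compactly than a chain of closures, since each composition here adds one more procedure call per lookup.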
The Machete library takes advantage of this property to provide a limited set of data wrangling utilities drawing inspiration from Sawzall, while Uke's immutable design (which draws inspiration from Pandas) avoids the copying cost that Tulip had identified. Machete also extends and reprovides Qi in place of the threading macro, enabling even more fluency in functional data transformations.
These are still at an early stage, of course, and likely much less comprehensive than Sawzall and data-frame at the moment. Perhaps there could be some profitable interactions with Sawzall and data-frame development here!
Sam showed us an example of representing data and then doing some aggregations, presenting the results using Dominik's uni-table library.
We noted that he was using Racket's map with a nested flow, and suggested using qi/list instead, which resulted in some simplification (and, of course, is faster!).
We also noted some uses of lambda. Sid said that this was blasphemy, and that flow should be used there instead. But in that particular instance, the lambda was needed to introduce and use various bindings, and Sam retorted that using Qi's as form for that feels like another kind of blasphemy. Touché, Sam.
Sam mentioned that in some cases, a Machete user may choose to name an intermediate data frame. But in many cases they may prefer to just rely on a formulaic name autogenerated from the context, in terms of the names of the operations and the variables involved.
To do this, he considered many options and eventually seemed to converge on writing a version of a Qi macro-defining form that uses a compile-time datatype to store syntax from different stages of compilation, i.e., the name of the operation being used and the name of the variable being operated on. It would then use this to compose a name for the resulting data as part of the produced expansion.
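The core of composing a name from syntax can be sketched as a plain Racket macro. This is a hypothetical illustration (define-derived is an invented name), not Machete's implementation and not a Qi macro-defining form; it only shows how an identifier can be built at compile time from the operation and variable names using format-id from racket/syntax.

```racket
#lang racket/base
(require (for-syntax racket/base racket/syntax))
;; Hypothetical illustration: (define-derived op var) defines a binding whose
;; name is composed at compile time as var-op.
(define-syntax (define-derived stx)
  (syntax-case stx ()
    [(_ op var)
     ;; Take lexical context from var, so the new name is visible at the use site.
     (with-syntax ([name (format-id #'var "~a-~a" #'var #'op)])
       #'(define name (op var)))]))

(define scores '(3 1 2))
(define-derived reverse scores) ; defines scores-reverse
(define-derived length scores)  ; defines scores-length
scores-reverse ; → '(2 1 3)
scores-length  ; → 3
```

Borrowing the lexical context of the variable is what makes the generated name usable by the surrounding code; a version that needs to carry more information across expansion steps is where a compile-time datatype, as Sam described, would come in.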
But after trying to extend the core qi-macro type in this way (soon after the meeting ended), he found he was unable to do it:
```racket
(begin-for-syntax
  (struct my-qi qi-macro ()))
;; struct: parent struct type not defined;
;; identifier does not name struct type information in: qi-macro
```
Sid verified that the same code works with Qi 3.0, when qi-macro was defined by Qi itself; it is now defined by Syntax Spec, which evidently offers a more limited extension interface. But, we wondered, is that by design or by accident? If there isn't a straightforward alternative way to write this macro, it might be worth investigating further and potentially reporting upstream.
Some time ago, Jair presented about the One True Model of Programming, Flow-Oriented Object Oriented Programming, or FLOOP. He made an almost identical observation about using Racket objects with Qi as Tulip had made about using mutable dataframes with the threading macro: it's inefficient for large-scale tasks, since every step involves making a copy. Could some of the approaches Sam took in Uke be applied to writing an immutable object system for use in The One True Model of Programming, FLOOP?
We've often discussed compiling flows to alternative backends, leveraging different implementations of the core concept of "flow."
Strategies we've considered typically involve using an entirely different code generation module than the standard one. For instance, in place of the flow.rkt module that defines Qi's central flow macro, a new flow-alternate.rkt module could invoke the Qi expander and then compile the resulting expanded syntax using a distinct compiler, to define an alternative flow macro. This has the benefit of sharing the common Qi expander but the drawback of duplicating compilation steps such as normalization.
We've also considered a language composition scheme that defines an expander, a compiler, and a codegen as composable units. Languages composed from such units are themselves composable if they share a common "base" language. This would allow the expander and optimizing compiler to be shared across variants, but each would employ a distinct code generation module.
Sam proposed yet another alternative some time ago, which we briefly discussed today:
- Implement code generation in terms of explicit abstractions of (1) a flow and (2) the "connective tissue" between flows (we'll see examples below)
- Obtain the implementation of those abstractions in the compiler from a context parameter (to use Rhombus's term for the more ambiguous "parameter" used in Racket).
- For the usual (require qi), these abstractions would be implemented by functions and call-with-values, respectively.
- Other qi/flow-alternate modules could set this parameter appropriately for futures, channels, or anything else.
This would allow the sharing of the entire Qi codebase, including the expander, the compiler, and even the codegen, across all of these distinct variants of Qi.
It sounds like a promising option, and writing a proof-of-concept would probably be a good way to explore it. One interesting (but incidental) point: attempting to implement the standard flow connective tissue as call-with-values some time ago led to a puzzling performance degradation, so if we pursue this option, it might be worth going on a "fantastic voyage" to investigate the cause.
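As a starting point for such a proof-of-concept, here is a runtime sketch of Sam's idea, under simplifying assumptions: a "flow" is a plain Racket procedure, the "connective tissue" is a combinator joining two flows, and it is read from a Racket parameter at run time rather than chosen by the compiler's code generation as in the actual proposal. All names (values-connective, current-connective, thread-flows, logging-connective) are hypothetical; the logging connective merely stands in for a futures- or channels-based one.

```racket
#lang racket/base
;; Hypothetical runtime sketch of "flow" + "connective tissue" abstractions.

;; Default connective tissue: feed all values produced by f into g.
(define (values-connective f g)
  (λ args (call-with-values (λ () (apply f args)) g)))

;; Stand-in for the context parameter that selects the connective tissue.
(define current-connective (make-parameter values-connective))

;; Stand-in for code generated for (~> f ...): join the flows left to
;; right with whichever connective is in effect when this is called.
(define (thread-flows . fs)
  (define connect (current-connective))
  (for/fold ([acc values]) ([f (in-list fs)])
    (connect acc f)))

;; An alternative connective that records each set of intermediate values,
;; standing in for futures, channels, etc.
(define step-log '())
(define (logging-connective f g)
  (λ args
    (define vs (call-with-values (λ () (apply f args)) list))
    (set! step-log (cons vs step-log))
    (apply g vs)))

(define (square x) (* x x))
((thread-flows + square) 1 2) ; → 9

(define logged
  (parameterize ([current-connective logging-connective])
    (thread-flows + square)))
(logged 1 2)       ; → 9
(reverse step-log) ; → '((1 2) (3))
```

The flows themselves are unchanged between the two runs; only the parameterized connective differs, which is the property that would let one codebase serve several backends.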
Dominik shared an update on deforesting producers. It sounds like producers are now essentially done.
Specifically, the syntax classes for producers have been unified into a common one (placeholder name: fsp-new), similarly to transformers (fst-new), and the definition of range now explicitly includes both naive as well as stream semantics.
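The idea of giving a producer both naive and stream semantics can be illustrated generically. The following is a sketch only: it is not Qi's internal stream representation, and range-naive, range-stream, step-stream, stream-map, and stream-foldl are hypothetical names. The naive semantics builds a list; the stream semantics is a step function plus a state, so a transformer and consumer can be fused with it without an intermediate list.

```racket
#lang racket/base
;; Illustrative only: not Qi's internals. A producer with two semantics.

;; Naive semantics: eagerly build the whole list.
(define (range-naive lo hi)
  (if (< lo hi) (cons lo (range-naive (add1 lo) hi)) '()))

;; Stream semantics: step returns #f when done, else (cons value next-state).
(struct step-stream (step state))

(define (range-stream lo hi)
  (step-stream (λ (i) (and (< i hi) (cons i (add1 i)))) lo))

;; A transformer wraps the step function; no intermediate list is built.
(define (stream-map f s)
  (define step (step-stream-step s))
  (step-stream (λ (st)
                 (define r (step st))
                 (and r (cons (f (car r)) (cdr r))))
               (step-stream-state s)))

;; A consumer drives the fused pipeline to completion.
(define (stream-foldl f acc s)
  (define step (step-stream-step s))
  (let loop ([st (step-stream-state s)] [acc acc])
    (define r (step st))
    (if r (loop (cdr r) (f (car r) acc)) acc)))

(define (square x) (* x x))
(foldl + 0 (map square (range-naive 0 5)))                ; → 30
(stream-foldl + 0 (stream-map square (range-stream 0 5))) ; → 30
```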
As part of getting it to work, we introduced a dependency of the compiler deforestation pass on the qi/list module, which is a list-oriented module currently bundled with qi-lib.
We discussed ways to address this now that the pipeline is working. One easy option is to simply move the definition of list->cstream to a different module. This would avoid the dependency on qi/list, which is especially undesirable, but the compiler would still depend on a definition that requires the Qi expander, which is the core problem.
There may be some good long-term options for handling this (such as the proposed explicit representation of streams in the core language, aka Qi Enterprise Edition™), but for the short term, the main thing is to not introduce a dependency of the compiler on the expander. For this, one option we came up with was to expand the list->cstream definition by hand and then paste that expanded syntax into a new definition in a convenient compiler module. This new definition could then be used for matching purposes, e.g., aliased to list->cstream in a syntax class.
Dominik said he's ready to try consumers next. Unlike producers, consumers have only one distinguishing attribute: the end expression, which is spliced into and wraps the full deforested pipeline. This should make consumers relatively easy to migrate ("Famous last words" - Ben, on Discord).
Currently, consumers like car, cadr, caddr, and so on, all "internally expand" to a use of the common list-ref stream runtime. But their fallback runtimes are all distinct, directly leveraging the correspondingly named Racket forms.
We considered whether this could be simplified by defining these as ordinary Qi macros that expand to list-ref, which itself would be the only form defined using define-deforestable. That way, both stream and naive runtimes would be shared.
We observed that list-ref may not be as optimized as a form like car, so that this approach might entail a performance cost. Yet, optimizing (list-ref 0) to car is technically a concern of the Racket compiler. Some time ago we had discussed, based on input from Sam P and Sam T, that efforts in the Qi project should also consider the broader stack in which it participates, and we converged on the principle:
Whenever an optimization would help Qi, if it isn't specifically derived from Qi's theory of optimization, it should be considered a candidate for promotion to the Racket compiler.
So, perhaps, if the Racket compiler isn't already optimizing (list-ref 0) to car, then based on this principle, we should consider implementing this optimization in Racket rather than sidestep it at the Qi level. This might be worth asking about in Racket forums.
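To make the list-ref idea concrete, here is a sketch in plain Racket rather than Qi (nth, my-car, my-cadr, and my-caddr are hypothetical names, not qi/list's definitions). The car-like forms are ordinary macros over a single nth form, so only nth would need both a naive and a stream runtime. The literal-index case shows the kind of (list-ref 0) to car specialization that, per the principle above, would ideally live in the Racket compiler rather than in Qi.

```racket
#lang racket/base
(require (for-syntax racket/base))
;; Hypothetical sketch: one list-ref-like form, with car-likes as macros over it.
(define-syntax (nth stx)
  (syntax-case stx ()
    ;; Literal 0 is specialized to car; this is the optimization that
    ;; arguably belongs in the Racket compiler.
    [(_ lst 0) #'(car lst)]
    [(_ lst n) #'(list-ref lst n)]))

(define-syntax-rule (my-car lst) (nth lst 0))
(define-syntax-rule (my-cadr lst) (nth lst 1))
(define-syntax-rule (my-caddr lst) (nth lst 2))

(my-car '(a b c))   ; → 'a
(my-cadr '(a b c))  ; → 'b
(my-caddr '(a b c)) ; → 'c
```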
Jacqueline has recently made a big improvement to Resyntax that allows it to be used with any #lang. We discussed whether Qi could potentially make use of this powerful refactoring tool to help users write flows more effectively.
Sid discussed some options with her on Discord and brought them up in the meeting:
- A migration tool to ease backwards-incompatible transitions.
We occasionally need to make backwards-incompatible changes. But each time we do, it takes a lot of care and effort to work with community members, test existing codebases and migrate them if necessary, and coordinate across varying schedules of various contributors and users.
Of course, we are happy to make this necessary effort, each time, but it would make everyone's lives easier if we could automate this process somehow.
One option is to write a set of Resyntax rules for each such major transition which could be coupled to a purpose-built #lang, for instance, #lang qi5-to-6.
Then, we could write a simple tool to (1) rewrite all modules using Qi to use #lang qi5-to-6, (2) execute the Resyntax ruleset to migrate the code, (3) rewrite the modules back to their original #lang lines.
This would allow such transitions to be relatively painless for developers and users.
Eutro suggested that we could potentially even use a metalanguage prefix for this, resembling #lang qi5-to-6 racket. If it's possible to use such a prefix with Resyntax, then the wrapping "entry and exit" script could just use simple regexes and wouldn't need to store any information about lines changed. We felt we should discuss this possibility with Jacqueline.
- User-facing suggestions.
Eutro had the brilliant idea to implement an ASCII→Unicode set of rules, encouraging translations like flow → ☯, sep → △, and amp → ><!
She was only kidding. While Jacqueline would love this idea, the textual names of the forms are, of course, just fine.
More seriously, as a simple example, (~> (~> f)) could be written as (~> f), which itself is just f. We have such rules in the normalization pass of the compiler, but it could also be useful to present these translations to users so that their own code is as simple as possible.
An even more useful example came up on Discord: using in-range or Racket's range together with map, filter, etc. (whether Racket's or Qi's) is suboptimal, since mixing Racket sequence operations with Qi's forms reduces the benefits gained from Qi's deforestation. Jacqueline pointed out that Resyntax could be used to suggest using the Qi forms in such cases, which isn't only a matter of style but also provides significant performance gains.
Unfortunately, as Resyntax's design is coupled to the #lang specifically, it cannot be modulated just by the presence of a (require qi) in an otherwise #lang racket (or whatever) module. This means that in order to present user-facing refactoring suggestions, we would need to use it as part of a #lang that integrates Qi out of the box.
The good news is that such languages are in the works in the #lang raqit laboratory. The bad news is, these languages are still in the early experimental stage and aren't likely to see the light of day anytime soon. We talked about the immediate needs of #lang raqit not long ago. Resolving them would certainly move the project along!
As an aside regarding user-facing refactoring rules, we discussed that while there are many such simplifications and equivalences that would be appropriate as refactoring rules, there are in principle even more of them that would come up in the compiler (e.g., in the normalization pass). This is both because patterns would be generated and encountered in the compiler that wouldn't be used by users, and also because the compiler can do strictly more powerful context-sensitive transformations. Therefore, we felt that we should expect Resyntax refactoring rules to be a strict subset of compiler rewrite (esp. normalization) rules.
On the other hand, it's possible that there could exist two flows f₁ and f₂ that are equivalent from the compiler's perspective, where f₁ is faster than f₂ but f₂ is simpler than f₁. Then, since the compiler rewrites f₂ to f₁ anyway, it would nevertheless make sense for a refactoring rule to recommend f₁ → f₂!
So, although we expect refactoring rules to generally translate to optimization rules, it's possible that there are exceptions and that it isn't a strict subset relation.
We felt this theoretical relationship would be useful to keep in mind, as it would likely inform the development of both refactoring as well as optimization rules.
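The (~> (~> f)) example above can be sketched as a rewrite over quoted s-expressions. This is illustrative only: it is neither a Resyntax rule nor Qi's normalization pass, both of which operate on syntax objects and would, for instance, check that ~> really refers to Qi's ~>, which this sketch ignores. simplify is a hypothetical name, and it assumes flattening nested ~> is semantics-preserving (threading is associative).

```racket
#lang racket/base
;; Illustrative rewrite on quoted data, not syntax objects:
;;   (~> ... (~> g ...) ...)  =>  (~> ... g ... ...)   flatten nesting
;;   (~> f)                   =>  f                    unwrap singleton
(define (simplify e)
  (cond
    [(and (pair? e) (eq? (car e) '~>))
     ;; Simplify subterms first, then splice in any nested ~> bodies.
     (define args
       (apply append
              (for/list ([a (in-list (map simplify (cdr e)))])
                (if (and (pair? a) (eq? (car a) '~>)) (cdr a) (list a)))))
     (if (= (length args) 1) (car args) (cons '~> args))]
    [(pair? e) (map simplify e)]
    [else e]))

(simplify '(~> (~> f)))       ; → 'f
(simplify '(~> f (~> g h) k)) ; → '(~> f g h k)
```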
We felt that option (1) is something we could realistically aim to do as part of the next major release, Qi 6, and that we could spend a couple of meetings implementing it as part of release preparations, with Jacqueline's help. As a specific project on which to focus our efforts, renaming Qi's values-oriented pass form to allow could be a good candidate.
For user-facing refactoring rules, it looks like that would need to be part of the longer-term #lang raqit effort.
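For the migration tool in option (1), the "entry and exit" steps with a metalanguage prefix could be as simple as a pair of regex rewrites, as Eutro suggested. The sketch below assumes a hypothetical prefix named qi5-to-6 (which does not exist yet) and that #lang appears at the very start of each module; wrap-lang and unwrap-lang are invented names. Step (2), running the Resyntax ruleset, is deliberately left out, since how it would be invoked with such a prefix is the open question to raise with Jacqueline.

```racket
#lang racket/base
;; Hypothetical sketch of steps (1) and (3) of the migration tool.
;; Assumes #lang is at the very start of the module text.

(define migration-prefix "qi5-to-6") ; hypothetical metalanguage prefix

;; Step (1): "#lang racket ..." => "#lang qi5-to-6 racket ..."
(define (wrap-lang src)
  (regexp-replace #px"^#lang\\s+" src
                  (string-append "#lang " migration-prefix " ")))

;; Step (3): "#lang qi5-to-6 racket ..." => "#lang racket ..."
(define (unwrap-lang src)
  (regexp-replace (pregexp (string-append "^#lang\\s+"
                                          (regexp-quote migration-prefix)
                                          "\\s+"))
                  src
                  "#lang "))

(define original "#lang racket\n(require qi)\n(~> (3) sqr)\n")
(wrap-lang original)
;; → "#lang qi5-to-6 racket\n(require qi)\n(~> (3) sqr)\n"
(equal? (unwrap-lang (wrap-lang original)) original) ; → #t
```

Because the prefix is added and removed purely textually, the wrapper doesn't need to remember which lines it changed, which is the appeal of the metalanguage approach.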
Eutro showed some examples laying the groundwork for formalizing the language in a way that we could encode in Redex. This boiled down to specifying (1) reduction relations and (2) congruence relations, for each core form.
A reduction relation for ~> resembled:
```
(~> (v ...) f g ...)
  →
(~> () (gen (f v ...)) g ...)
```
We noted that it's convenient to assume that (gen (values v ...)) is equivalent to (gen v ...), even though the former isn't actually supported by gen today. The main constraint we need to abide by here is that each term must be expressed in terms of floes, and technically, (gen (values v ...)) isn't a valid floe. But we convinced ourselves this would be formally OK by writing a Qi macro resembling:
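```racket
(require qi)
;; Ignore the flow's inputs; produce every value of each vfloe, in order.
(define-qi-syntax-rule (gen-values vfloe ...)
  (esc (λ _ (apply values (append (call-with-values (λ () vfloe) list) ...)))))
```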
…which has the desired (though not actual) gen behavior, and which, as a Qi macro, is in fact formally a floe whose semantics reduce to a formalization of esc. This characterization is sufficient for our purposes, formally, but we don't yet know if it's the best way to go about it.
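To see the ~> reduction rule in motion, here is a toy small-step reducer in plain Racket (not Redex). Assumptions, none of which come from the formalization itself: flows are plain procedures; (f v ...) is evaluated eagerly and its values are collected into a gen node, matching the gen-values reading above; and the two extra rules marked "assumed" (how a gen node consumes its position, and how a finished ~> yields its values) are illustrative guesses. step, run, and the gen struct are hypothetical names.

```racket
#lang racket/base
;; Toy reducer. A term (~> (v ...) f g ...) is represented as (list '~> vals f g ...).
(struct gen (vals) #:transparent)

(define (step t)
  (define vals (cadr t))
  (define f (caddr t))
  (define gs (cdddr t))
  (if (gen? f)
      ;; assumed rule: (~> (v ...) (gen u ...) g ...) --> (~> (u ...) g ...)
      (list* '~> (gen-vals f) gs)
      ;; the rule above: (~> (v ...) f g ...) --> (~> () (gen (f v ...)) g ...)
      (list* '~> '() (gen (call-with-values (λ () (apply f vals)) list)) gs)))

;; assumed rule: (~> (v ...)) with no flows left yields v ...
(define (run t)
  (if (null? (cddr t))
      (apply values (cadr t))
      (run (step t))))

(run (list '~> '(1 2) + (λ (x) (* x 10)))) ; → 30
```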
In addition to reduction relations, Eutro established a congruence relation for a handful of forms, something like:
```
                 f₁ → f₂
─────────────────────────────────────────
(~> f ... f₁ g ...) → (~> f ... f₂ g ...)
```
A small number of such rules was sufficient to specify each form.
On Discord, Ben had also raised the possibility of a "point-free" formalization of the semantics, that is, a formalization without reference to arguments.
I remain deeply curious to see if Qi's semantics are specified primarily in the shape ((flow …) arg …) or just (flow …); the former is about the behavior on concrete values, while the latter is basically comparing lambda terms, so proofs are more "interesting" (look up functional extensionality, which lets you say the terms are equivalent if they give you the same result for all inputs: back to treating values like the former again!).
Food for thought!
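For reference, functional extensionality, the principle Ben mentions, is available in Lean 4 core as funext; the statement below uses Nat → Nat only for concreteness. It shows how a point-free equation f = g can be discharged by a pointwise proof, which is the round trip Ben describes.

```lean
-- Functional extensionality: if f and g agree on every input, then f = g.
example (f g : Nat → Nat) (h : ∀ x, f x = g x) : f = g := funext h
```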
Every year or so, Dominik revisits his favorite rasterizer and reimplements it using some hot new technology. That time arrived this year when Matthew announced that the new "parallel threads" feature in Racket is ready for testing. This feature promises to combine the simple interface of ordinary threads with the parallel performance of futures.
Dominik translated the rasterizer implementation from using futures to using these new parallel threads, and got it to work.
He also did some benchmarking of the new feature against futures and established that there are at least no isolated crashes (as the BC implementation sometimes did), and that there is no performance degradation when compared with futures. To be more thorough, it would be necessary to test the new feature on a GUI application, which it sounded like Bogdan has already started doing.
Regarding whether these new parallel threads would always provide performance gains over traditional threads, Dominik said that it's more that it enables new possibilities for performance gains, and it would be useful to first reflect on whether data could be partitioned in some way to take advantage of these possibilities. "Once you can partition the data, you can do magic."
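To illustrate Dominik's point about partitioning, here is a minimal sketch that splits a computation (a sum of squares over 0 to n-1) into contiguous chunks and computes each chunk in its own future. It uses the existing racket/future API (future, touch) rather than the new parallel threads, whose interface isn't covered here; chunked-sum-of-squares is a hypothetical name. Each chunk works only on its own range with plain arithmetic and shares no mutable state, which is what makes the work safe to run in parallel.

```racket
#lang racket/base
(require racket/future)
;; Sketch: partition 0..n-1 into k contiguous chunks, one future per chunk.
(define (chunked-sum-of-squares n k)
  (define size (quotient (+ n k -1) k)) ; chunk size, rounded up
  (define fs
    (for/list ([c (in-range k)])
      (define lo (* c size))
      (define hi (min n (+ lo size)))   ; empty range if lo >= n
      (future (λ () (for/sum ([i (in-range lo hi)]) (* i i))))))
  ;; Combine the independent partial sums.
  (for/sum ([f (in-list fs)]) (touch f)))

(chunked-sum-of-squares 10 3) ; → 285
```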
Eutro has been working on her init system for Emacs, "xup" (pronounced ex-up), and along the way discovered a bug in EIEIO, Emacs Lisp's Object-oriented system, related to object inheritance. She plans to submit a bug report on this decades-old codebase. That's how you know you're deep in a rabbit hole.
As part of this effort, she's developed an industry-grade logging system that color codes different tiers of logs (e.g., error, warning, etc.), and also provides the exact line number and column of their origin. Unfortunately, it turns out that Emacs only annotates syntax with such accurate source locations during the process of byte compilation, meaning that at present, these locations can only be reported for byte compiled code! Racket really spoils us.
She's also developed an elisp profiling tool and visualizer that promises to be very handy during development.
Sid mentioned that users of his Emacs tools have sometimes reported performance problems when they see these tools in the stack trace during a slow Emacs operation. The only surefire way to escape culpability in such cases was to avoid showing up in the stack trace at all (e.g., by avoiding using Emacs's advice for implementing user features). But with a visualizer like the one Eutro is developing, it would be easy, and more helpful, to enable users to identify the true cause of slowness in these cases.
Sid expressed the fervent hope that Eutro will emerge from at least some of these very interesting rabbit holes, so that he can use these promising tools!
Next Steps
(Some of these are carried over from last time)
- Develop a backwards-incompatibility migration tool using Resyntax, to be used in the next Qi release.
- Check with Jacqueline whether a metalanguage prefix could be associated with a Resyntax ruleset (as opposed to only a pure #lang).
- Follow up on whether qi-macro should be subtypable (e.g., for Uke/Machete).
- Write a proof-of-concept for implementing code generation from abstractions of "flow" and "connective tissue" that are set by a context parameter.
- Come up with a good way to validate the syntactic arguments to range using contracts.
- Define the implicit consumer using define-deforestable.
- Incorporate effects and bindings into Qi's pen-and-paper semantic model.
- Implement the remaining producers in racket/list for qi/list.
- Implement the remaining transformers in racket/list for qi/list.
- Verify whether qi/list's range works the same as racket/list's range wrt. reals and rationals.
- Improve qi/list and deforestation testing by writing a macro to simultaneously test the expansion and the semantics.
- Attach a deforested syntax property in the deforestation pass, and use it in compiler rules tests (instead of string matching).
- Incorporate consumers into define-deforestable's unified approach for naive/deforested semantics.
- Incorporate other transformers in the parametrization proof-of-concept to see if a runtime could be shared among all transformers.
- Review which racket/list forms are actually needed in qi/list.
- Implement DAG-like binding rules for branching forms.
- Implement try exception binding and start a PR for review.
- Return to developing Qi's theory of effects, including accounting for binding rules.
- Write phase 1 unit tests for inlining.
- Create an issue for the general bindings syntax to get feedback on it.
- Formalize Qi's semantics using Redex.
- Start organizing qi-lib into qi and qi/base collections.
- Fix linking of bindings issue in Qi docs.
- Ready the inlining PR to be merged and tag for code review.
- Publish qi/class in some form.
- Verify whether the call-with-values performance discrepancy exists on ARM machines, too.
- Define the define-producer, define-transformer, and define-consumer interface for extending deforestation, and re-implement existing operations using it.
- Review whether the deforested operations could be expressed using a small number of core forms like #%producer, #%transformer, #%consumer, and #%stream.
- Write more reliable nonlocal benchmarks (in vlibench).
- Undertake a "fantastic voyage" to get to the bottom of the performance puzzle in using compose-with-values. [see also: PR #191]
- Implement more fusable stream components like drop, append, and member. [issue #118]
- Why is range-map-car slower against Racket following the Qi 5 release?
- Resolve the issue with bindings being prematurely evaluated. [issue syntax-spec#61]
- Fix the bug in using bindings in deforestable forms like range. [issue #195]
- Write a proof-of-concept compiling Qi to another backend (such as threads or futures), and document the recipe for doing this.
- Review Cover's methodology for checking coverage (e.g. wrt. phases). [related issue: #189]
- Improve unit testing infrastructure for deforestation.
- Decide on appropriate reference implementations to use for comparison in the new benchmarks report and add them.
- Decide on whether there will be any deforestation in the Qi core, upon (require qi) (without (require qi/list)).
Attendees
Dominik, Eutro, Sam, Sid