Qi Meeting Mar 7 2025 - drym-org/qi GitHub Wiki

The Best of Three Worlds

Qi Meeting Mar 7 2025

Adjacent meetings: Previous | Up | Next

Summary

We discussed what generic deforestation with Qi might look like, and did some light benchmarking of Racket's sequence type. We also revisited the age-old debate of list comprehensions vs higher-order functional sequences, and came upon some fresh insights.

Background

We've been looking into Clojure transducers more lately, and realizing that due to the DSL-based approach employed in Qi, if we are able to exhibit generic operation like transducers, we could potentially do it with a simpler interface. This could be a compelling feature that Qi could make available to Racket-based languages.

List Comprehensions vs Higher-Order Functions: Fight! (Again)

The eternal battle. Some people favor list comprehensions while others favor sequences of higher-order functional operations like map. Which approach is better?

Both are alternatives to specialized handwritten recursions that compute precisely the output you are interested in, and both introduce abstractions that make it possible to express the computations at a high level.

One main advantage of list comprehensions is that, as they are declarative, they typically perform much better than functional sequences, as they effectively compile to such handwritten recursions.

On the other hand, the main drawback is that they are "sugar that you can't compose" as you do functions. In order to add an operation to a comprehension, it's sometimes necessary to use a different form altogether (e.g., in Racket, use for/first instead of for/list), or modify the existing one in non-uniform ways (e.g., either add a new #:when condition in the bindings clause (to filter the input sequence), or transform the iterated expression (say, add an additional mapping function)).

Yet, despite this loss of uniform composability, many find list comprehensions to be an elegant linguistic style for describing such operations, a style which entails conservative comprehensions like Racket's for forms, to more luxurious instances like Common Lisp's LOOP.

For the composable higher-order function approach, a language like Ruby supports "fluent" chaining of operations:

1.upto(1_000).find_all{|i| i.even?}.map{|i| i*i}

The output at each stage is an "enumerable" (an iterator), so that, in principle, this could compute values one at a time as needed. But due to the constraints of honoring the invariants of a singular language (Ruby itself) and providing a consistent order of effects, it must fully build the intermediate collections at each stage, so that it cannot actually iterate through the input one at a time.

Clojure, via transducers, allows you to have composable operations of this kind that also avoid constructing intermediate representations, achieving the best of both worlds: the performance of a handwritten version and the composability of the functional approach. But once again, due to the constraint to honor language invariants, it must provide transducer semantics via a distinct interface, even though users widely advocate using it always. As a consequence, the idiomatic and pervasive threading macro does not exhibit this performant behavior, and instead behaves more like Ruby's fluent operations.

One of the big, as yet unexplored, possibilities of Racket's language-oriented programming approach is that different languages could have different invariants (that is, different guarantees about the meaning of programs) and at the same time coexist and interoperate. This allows a language like Qi to use ordinary, intuitive syntax to express a computation that nevertheless exhibits different, more efficient (e.g. deforested) semantics than the host language does in other settings. If this is done in a sound way, with clear boundaries defined with the host language, this could unlock new possibilities that are inaccessible to other language ecosystems.

Thus, the combination of Racket + Qi retains the best of three worlds — the composability of the functional style, the performance of a handwritten recursion, and while exhibiting a simple and idiomatic interface. Racket and Qi achieve this together because they are distinct languages.

Yet, it's early days, and we are still a long way from providing something as general and composable as transducers!

Extensible List Deforestation

The immediate next step on that long journey is to implement a simple extension interface for our existing deforested runtime.

We discussed whether adding new core forms like #%producer, #%transformer and #%consumer would help simplify the propagation of attributes from the syntax through to the compiler. It especially seems to hinge on whether there might be operations that don't fit neatly into one of these categories, or if they might fall into a different category depending on their position in the sequence, or if there are other such special cases.

One possible benefit we saw is that formally including specific deforestable semantics in the core language, via the addition of these forms, could perhaps allow us to naturally compile Qi core values operations (like (~> (pass odd?) (>< sqr))) to these forms in some cases, which we have talked about in the past as a way to address Qi's "core performance issue" in specific cases.

The right approach remains unclear at this time, so it likely makes the most sense to first implement the interface in terms of the existing #%deforestable form before attempting any such simplifications.

Generic Deforestation

Looking past that immediate horizon, we considered a few different options for achieving generic deforestation with Qi. The important thing to note at the outset is that Qi's fusion runtime operates in terms of streams of values and is therefore already generic. But today, it implicitly assumes a list input and output.

Explicit User Invocation

The user could explicitly indicate conversion to and from a list:

(~> (my-vector) vector->list (filter odd?) (map sqr) list->vector)

This has the advantage of explicitness and ease of implementation. We would just need to recognize vector->list and list->vector as a producer and a consumer, respectively, and then this would avoid any intermediate representations.

The drawback is that, from the user perspective, the fusion pipeline operates in terms of lists and is not generic.

Modules for Each Type

Another option is to provide modules like qi/vector, analogous to qi/list, which implicitly adds the vector ↔ list conversions, and further, provides forms like vector-map, vector-filter, etc., that compile to fusable streams (just like qi/list).

This would provide implicit deforestation of vectors and other types without requiring explicit conversion by the user, but the interface would still be type-specific rather than generic, and thus not as seamless.

A Generic `qi/sequence` Module

A third possibility is to piggyback off of the generic behavior already exhibited by Racket's sequence type, and provide a "type-specific" qi/sequence module specific to the sequence type, which itself is a generic type.

This library would include forms like sequence-map and sequence-filter, which would compile to fusable streams.

This option would exhibit generic behavior, but the interface would be verbose by expecting fully qualified, "sequence-" prefixed, generic names. We could potentially name the APIs map and filter, instead, to mitigate this issue. But there may be performance issues from using the generic sequence type in all cases (as we established later, see below).

Runtime Generic Dispatch

A final option is to offer a qi/seq module that offers generic operations named simply map, filter, and so on, and which implicitly infers appropriate type conversions on either end of the pipeline based on runtime generic dispatch. That is, something like this:

(cond [(list? v) list->cstream-next]
      [(vector? v) vector->cstream-next]
      ...
      [(sequence? v) sequence->cstream-next])

A few things to note:

As the dispatch is a one-time cost, it would not make a difference to asymptotic performance.
Yet, even though the operation is generic, the performance would retain characteristics of the input type. For instance, generic operation of Racket's sequence type may come with some overhead in producing individual values (which we quantify below) that a vector or list would not have, and that would reflect on the performance of the entire pipeline.
The type would only be known at the producer end, so we would need to have some syntactic mechanism by which to propagate this type information at compile time to subsequent stages of the pipeline through to the terminal stage, which may need to reconstruct a sequence of the source type.
As Racket's sequence type is the fallback used here, it means that we would inherit its full genericity, including being able to deforest operations on I/O ports. This would achieve a level of generality comparable to Clojure's tranducers.

This would likely be the most challenging to implement, as we would need to conditionally generate different fused operations based on the input type, and also propagate some information at compile time.

We felt that this option seemed the most compelling, but we wanted to run the interface by other Qi contributors, especially Ben, to see if there's anything about its behavior that a user may find surprising.

Benchmarking Racket's `sequence` Type

Racket's sequence type is generic and often lazy (a stream is a sequence). We wanted to get a basic sense of its performance, in order to help us choose the best option from the above list.

To do this, we used Qi's existing "smoke" benchmarks, which aren't suitable for precision benchmarking, but which offer a consistent rough idea of performance. These benchmarks compare Qi vs Racket implementations of various algorithms and operations.

We modified the filter-map-foldl implementation. The original Racket version is:

(define (filter-map-foldl lst)
  (foldl + 0 (map sqr (filter odd? lst))))

While the Qi version is:

(define-flow filter-map-foldl
  (~> (filter odd?) (map sqr) (foldl + 0)))

The results show:

 (filter-map-foldl 0.28)

That is, the Qi version is about 3-4x faster.

We first tried modifying the Racket implementation to:

(sequence-fold + 0 (sequence-map sqr (sequence-filter odd? (in-list lst))))

The result showed:

 (filter-map-foldl 0)

due to rounding tolerance, but based on the actual measured times of 38284 ms vs the original of 804 ms, we discerned that this resulted in a ~50x slowdown(!), so that the Qi version was now 150-200x faster.

We then tried:

(define (filter-map-foldl lst)
  (for/fold ([v 0])
            ([e (in-sequences (in-list lst))]
             #:when (odd? e))
    (+ v (sqr e))))

This explicitly (lazily) constructs a generic sequence from the input list, and should give us some idea of the cost of this. The results showed:

 (filter-map-foldl 0.35)

That is, we are now back at baseline performance of the Racket higher-order function version.

We then modified it by removing the explicit sequence construction and retaining just in-list. This should allow the list type to be inferred and avoid use of the generic sequence entirely, in the compiled version of the list comprehension:

(define (filter-map-foldl lst)
  (for/fold ([v 0])
            ([e (in-list lst)]
             #:when (odd? e))
    (+ v (sqr e))))

As expected, this brought the performance up to parity with the Qi version.

 (filter-map-foldl 1)

In comparison with the preceding version, that told us that the overhead of a generic sequence is responsible for the 3x slowdown from list-specific operation. We expect this to also characterize deforestation of sequences vs lists.

Next Steps

(Some of these are carried over from last time)

Define the define-producer, define-transformer, and define-consumer interface for extending deforestation, and re-implement existing operations using it.
Implement more fusable stream components like drop, append, and member. [issue #118]
Why is range-map-car slower against Racket following the Qi 5 release?
Why is the benefit of call-with-values in isolation not manifesting when incorporated into the Qi compiler? [PR #191]
Ensure (eventually) that both codegen and deforested runtimes are included in the info struct and not in the IR.
Resolve the issue with bindings being prematurely evaluated. [issue syntax-spec#61]
Fix the bug in using bindings in deforestable forms like range [issue #195]
Write a proof-of-concept compiling Qi to another backend (such as threads or futures), and document the recipe for doing this.
Review Cover's methodology for checking coverage (e.g. wrt. phases) [related issue: #189]
Improve unit testing infrastructure for deforestation.
Decide on appropriate reference implementations to use for comparison in the new benchmarks report and add them.
Decide on whether there will be any deforestation in the Qi core, upon (require qi) (without (require qi/list))

Attendees

Dominik, Sid