20210408 DEV M1 R4 Manifests and DSLs

M1 R4 Thoughts

In this revision we are down in the weeds on code. There are several technical observations. Manifests are a new insight. DSLs come back to the foreground.

Reviewing Product Stories

We're prototyping a mass-market consumer experience that is a cross between a future-oriented web browser and a desktop. End users can fetch apps over the web, cache them locally, and run them persistently with fine-grained control over permissions. The use case is aimed squarely at a spatial computing stack, say for AR glasses, and in fact HTML is not provided at all initially.

Let's talk it through in terms of a story in the agile sense (See: https://www.atlassian.com/agile/project-management/epics-stories-themes). To drill down on this a bit more - we currently imagine a user engaging with our product in this way:

  1. The user runs our product from their typical operating system or environment (running it like, say, any other app, tool or service).

  2. Our app opens what appears to be a desktop within a desktop.

  3. The desktop view shows all sub-apps that a user has downloaded and allows fine-grained control over what's running and over permissions per app.

  4. There is also a non-maskable bar that has a URL input box and a "return to desktop" button.

  5. Users can type in a URL, and this fetches or updates any sub-app from that site and runs it.

  6. Our sub-apps may paint to a shared page, or they may be relegated to a separate tab or page (depending on user intent).

Revisiting the Microkernel Story

This is a piece we have written, and it runs, but we are still not totally happy with it. I'd prefer a third party to own this.

To deliver the user experience we needed some kind of core or kernel that can load up WASM modules, persist them, run them, track them, permission them and so on. This presentation, which touches on wasmCloud, is worth watching: https://www.youtube.com/watch?v=vqBtoPJoQOE

Currently our kernel can dynamically load and preemptively multitask portable "units of computation" or "services". The major pieces right now are:

  1. A bootstrapping entry point that kicks off a broker and all of the built-in services.
  2. A broker service that implements a task registry and pub/sub for other services using message channels (a minimal sketch follows this list).
  3. Rust services - built-in or native capabilities such as user input and display output.
  4. WASM services - WASM modules loaded dynamically at the user's request which can drive other parts of the system.
  5. Display Service - although not strictly part of the "core", it is an important part of the product (discussed elsewhere).
  6. Desktop Service - discussed elsewhere.
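
To make the broker idea above concrete, here is a minimal sketch of a task registry plus pub/sub over standard channels. Everything here (the Broker and Message names, the "display" topic) is illustrative rather than our actual kernel API - the real broker also has to cross the WASM boundary:

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};

/// A message routed between services; the payload is kept trivial here.
#[derive(Clone, Debug)]
struct Message {
    topic: String,
    data: Vec<u8>,
}

/// The broker owns a registry of subscribers per topic.
struct Broker {
    subscribers: HashMap<String, Vec<Sender<Message>>>,
}

impl Broker {
    fn new() -> Self {
        Broker { subscribers: HashMap::new() }
    }

    /// A service registers interest in a topic and gets a receiving channel back.
    fn subscribe(&mut self, topic: &str) -> Receiver<Message> {
        let (tx, rx) = channel();
        self.subscribers.entry(topic.to_string()).or_default().push(tx);
        rx
    }

    /// Publishing fans a message out to every live subscriber of its topic.
    fn publish(&mut self, msg: Message) {
        if let Some(subs) = self.subscribers.get_mut(&msg.topic) {
            // Drop subscribers whose receiving end has gone away.
            subs.retain(|tx| tx.send(msg.clone()).is_ok());
        }
    }
}

fn main() {
    let mut broker = Broker::new();
    let display_rx = broker.subscribe("display");

    broker.publish(Message { topic: "display".into(), data: b"draw".to_vec() });

    // The display "service" drains its channel; in the real kernel this
    // would run on its own thread.
    for msg in display_rx.try_iter() {
        println!("display service got: {:?}", msg);
    }
}
```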

We continue to observe that other developers tend to think of WASM modules as just "accessing built-in services" - they don't tend to think of these modules as a "computational sea" of modules talking to each other. They tend to anchor their work on the idea of a privileged space outside of the computational soup. We don't have that luxury.

Manifests -> Packages & Modules

This is a new observation. I now see that I do not actually want to run only WASM modules. Rather, I want to bootstrap a pile of pieces and let those pieces drive what they load in turn.

User experiences on the web are typically delivered as a granular collection of layout, code and graphics, usually driven by Javascript, with some emerging WASM support lately. This is somewhat different from "applications", where the code is typically delivered as a blob and the application may fetch additional pieces, usually graphics, later.

We're seeking to deliver user experiences in an app-centric way (although nothing should prevent code from leaning on layout grammars such as HTML). Note that full persistence is not always entirely possible, because larger assets are dynamically fetched and waste space on the client if unused. Note also that developers will still want to use lightweight scripting languages (not everybody wants to program in Rust or some other heavy-duty language), so a package doesn't necessarily have to contain any WASM at all.

A browser typically loads and presents a collection of related files and resources. Traditional web browsers often fetch a manifest in the form of an index.html document, which can either declaratively or procedurally specify additional files to load. We also see tools like Webpack and Web Bundles, and Unity has its own packaging system.

For us, in an application-centric paradigm, the right way to tackle this seems to be a similarly flexible manifest approach, which can then drive the loading of scripts, code and other assets. It would be nice to support both Javascript and WASM entry points. A hedged sketch of what that could look like follows.
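
As a thought experiment, here is what a manifest could look like and how the kernel might ingest it. The JSON shape, field names and serde-based loader are all assumptions for illustration, not a committed format (Cargo.toml would need serde with the "derive" feature, plus serde_json):

```rust
use serde::Deserialize;

/// One loadable asset: a WASM module, a script, a model, a texture...
#[derive(Debug, Deserialize)]
struct Asset {
    kind: String, // e.g. "wasm", "script", "glb"
    url: String,
}

/// The manifest drives what gets fetched; note it does not have to contain
/// any WASM at all - a script entry point is equally valid.
#[derive(Debug, Deserialize)]
struct Manifest {
    name: String,
    entry: String, // which asset boots the app
    assets: Vec<Asset>,
}

fn main() -> Result<(), serde_json::Error> {
    // In the real flow this JSON would be fetched from the URL the user typed.
    let raw = r#"{
        "name": "friendfinder",
        "entry": "main.wasm",
        "assets": [
            { "kind": "wasm",   "url": "main.wasm" },
            { "kind": "script", "url": "glue.js" },
            { "kind": "glb",    "url": "face_marker.glb" }
        ]
    }"#;

    let manifest: Manifest = serde_json::from_str(raw)?;
    println!("loading {} with entry {}", manifest.name, manifest.entry);
    for asset in &manifest.assets {
        println!("  would fetch {} ({})", asset.url, asset.kind);
    }
    Ok(())
}
```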

Scripting

This is a new observation. I see now that I actually do want lightweight scripting.

We will support Javascript for scripting; developers are not forced to do everything in WASM blobs. Some places where this is being done are https://crates.io/crates/scriptit and https://deno.land/ . One question is whether Javascript should be the single entry point, or whether multiple entry points should be supported - such as going straight to a WASM module?
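
As a sketch of the deno_core route linked above: construct a runtime and hand it developer-supplied source. The deno_core API shifts between releases, so treat the exact signatures here as assumptions; what matters is the shape - a host-owned runtime that evaluates script the manifest points at.

```rust
use deno_core::{JsRuntime, RuntimeOptions};

fn main() {
    // A runtime with default options; a real embedding would register
    // host ops so scripts can talk to the broker.
    let mut runtime = JsRuntime::new(RuntimeOptions::default());

    // A trivial script standing in for developer-supplied glue code.
    // In our model this source would come from an asset named in the manifest.
    let source = "const x = 20 + 1; x * 2";

    runtime
        .execute_script("<manifest-entry>", source)
        .expect("script evaluation failed");
}
```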

DSL and Display Rendering Story

This is a new observation. I see that DSLs are mandatory for displays at least.

In hindsight, DSLs (domain-specific languages) may actually need to be supported formally by the kernel. WebGPU and Makepad both effectively impose their own DSL. It looks like we have to endorse some approach - or else how do people paint anything to a display at all? We need some agreement about "what to say to the GPU to make it paint a box or a line". By imposing any grammar we force all users to use that grammar...
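
For a feel of the level this sits at, here is a toy draw-command grammar - deliberately not the Makepad or WebGPU grammar - that an app could emit and a display service could translate into actual GPU calls:

```rust
/// A tiny display DSL: no retained scene graph, just draw commands per frame.
#[derive(Clone, Debug)]
enum DrawCmd {
    /// Axis-aligned rectangle in normalized device coordinates, RGBA color.
    Rect { x: f32, y: f32, w: f32, h: f32, color: [f32; 4] },
    /// A line segment from (x0, y0) to (x1, y1).
    Line { x0: f32, y0: f32, x1: f32, y1: f32, color: [f32; 4] },
}

fn main() {
    // An app would publish a Vec<DrawCmd> to the display service each frame;
    // the display service owns the translation into real GPU work.
    let frame = vec![
        DrawCmd::Rect { x: -0.5, y: -0.5, w: 1.0, h: 1.0, color: [1.0, 0.0, 0.0, 1.0] },
        DrawCmd::Line { x0: -1.0, y0: 0.0, x1: 1.0, y1: 0.0, color: [1.0; 4] },
    ];
    for cmd in &frame {
        println!("{:?}", cmd);
    }
}
```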

This is a fun conversation that speaks to these issues: https://news.ycombinator.com/item?id=22941224

Display issues exist in a stack. I tend to see display as "just another service", but it's somewhat privileged, and there may need to be some core arbitration over this important resource. App developers will want to be able to compose full stacks like so:

  1. Widgets, buttons, user input handlers and high level layout capabilities
  2. 3d models, animation, geometry (load up glb files for example)
  3. Physics
  4. Collision
  5. WebXR abstractions for devices and displays
  6. Lower level rendering in a cross platform way

It is notable how DSLs can be used to throw work "over the fence" between units of computation. This is commonly seen with shader languages, which throw intention from the CPU to the GPU. We'll drill down on this more later, but it's worth noting that we're using makepad::render right now, and it's a good platform-independent abstraction layer for us. It provides our widgets, our UX and all interactions with the display hardware.

Camera Input Device

This is a summary of the issues we are facing with getting frames from the camera.

It has been a pain. I got it running but it was so much work and it doesn't bode well for writing other device drivers.

Ultimately I had to reach into Objective-C from Rust to get the video system on Apple hardware to give me frames. There are serious issues with performance and bandwidth that remain unexamined.

It's very concerning how mediocre the Rust tools and libraries are in this area. Support for the Apple M1 is poor. Support for the Surface Go is poor. Support for webcams on the Windows Subsystem for Linux is also poor.

Revisiting "how things bind to each other" (Messaging, IDLs and so on)

We want to let things call other things. This entangles the idea of messages, the idea of formalized interfaces, and the question of how things are physically bound to each other.

Ongoing concerns:

  1. [Channels] We currently implement channels between WASM modules. This is "ok" but could be better, and we will need to do better for high-performance applications. For example, a camera app needs to throw a one-megabyte buffer to a computer vision algorithm at up to 120fps (see the sketch after this list).

  2. [Dynamic linking] I can see a case where WASM modules may want to link directly against other WASM modules. Do we force third-party developers to do this at their end (fetching an amalgamated blob), or can we offer late binding at our end? I do see some support for this in wasmtime. It's not critical immediately.

  3. [WebIDL / function endpoints for services] Wasmtime of course offers WASI (a POSIX-like system interface), and its authors speculate about other capabilities. But the entire philosophy of statically known services creates a need for some kind of central organization that reviews what is in the "core", and this will slow down release cycles. The main thing we want to do here is avoid this "death by committee" pattern. We're looking for a late-binding pattern that still allows fast passing of traffic between modules. Notably all services, especially devices, are threads, so any static function interface to capabilities (such as "open a display window", "play a sound" or "save a file") will on the back end have to lock a mutex.
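
On the channel-performance point in item 1, one cheap mitigation is to share large buffers rather than copy them - send a reference-counted pointer over the channel instead of the megabyte of pixels. A minimal sketch, with illustrative sizes and names:

```rust
use std::sync::mpsc::channel;
use std::sync::Arc;
use std::thread;

fn main() {
    let (tx, rx) = channel::<Arc<Vec<u8>>>();

    // Camera side: allocate a ~1 MB frame, then share it. Sending the Arc
    // moves a pointer across the channel, not the megabyte of pixels.
    let producer = thread::spawn(move || {
        for _ in 0..120 {
            let frame = Arc::new(vec![0u8; 1_000_000]);
            tx.send(frame).expect("vision service hung up");
        }
    });

    // Vision side: read frames without copying them. The iterator ends when
    // the producer drops its sender.
    for frame in rx.iter() {
        let _first_pixel = frame[0];
    }

    producer.join().unwrap();
}
```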

There's a lot of thinking about this - ranging from the very simple "How to communicate between Rust and WASM" to the more abstract "How to run WASM modules together in the cloud". Here are a few good links:

  1. https://alexene.dev/2020/08/17/webassembly-without-the-browser-part-1.html
  2. https://www.youtube.com/watch?v=vqBtoPJoQOE
  3. https://docs.wasmtime.dev/examples-rust-hello-world.html
  4. https://docs.wasmtime.dev/examples-rust-wasi.html
  5. https://docs.wasmtime.dev/examples-rust-multi-value.html
  6. https://hacks.mozilla.org/2019/03/standardizing-wasi-a-webassembly-system-interface/
  7. https://labs.imaginea.com/talk-the-nuts-and-bolts-of-webassembly/
  8. https://kevinhoffman.medium.com/introducing-wapc-dc9d8b0c2223
  9. https://github.com/wasmCloud/wascap
  10. https://github.com/wasmCloud
  11. https://www.ralphminderhoud.com/blog/rust-ffi-wrong-way/
  12. https://doc.rust-lang.org/nomicon/ffi.html
  13. https://www.youtube.com/watch?v=B8a01m8B6LU
  14. https://rise.cs.berkeley.edu/projects/erdos/
  15. https://www.w3.org/2018/12/games-workshop/slides/08-web-idl-bindings.pdf

Desktop UX Thoughts

One of the embedded apps or services that we write is the desktop itself. It's basically "just another app". The desktop implements our browser UX and is going to deliver the services touched on above:

  1. Paints a URL input bar and lets the user type in a URL for a document (this currently runs).
  2. Loads and runs a WASM module on demand (this also runs).
  3. Provides a nice UX for reviewing all fetched modules, setting fine-grained permissions and starting/stopping them (a sketch of the underlying record follows this list).
  4. Supports both a mixed-reality mode, where apps can render to a shared view, and layers that force apps into separate windows.
  5. Maintains an un-blockable area to prevent bad actors from obscuring the user display.
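
For item 3, here is a sketch of the per-module record such a UX would manipulate. The capability strings and field names are hypothetical:

```rust
use std::collections::HashSet;

/// One fetched module as the desktop sees it.
#[derive(Debug)]
struct ModuleRecord {
    url: String,                  // where the module was fetched from
    running: bool,                // the start/stop toggle exposed in the UX
    permissions: HashSet<String>, // e.g. "camera", "display", "network"
}

impl ModuleRecord {
    fn grant(&mut self, cap: &str) {
        self.permissions.insert(cap.to_string());
    }
    fn revoke(&mut self, cap: &str) {
        self.permissions.remove(cap);
    }
    fn allowed(&self, cap: &str) -> bool {
        self.permissions.contains(cap)
    }
}

fn main() {
    let mut app = ModuleRecord {
        url: "https://example.com/friendfinder".into(),
        running: false,
        permissions: HashSet::new(),
    };
    app.grant("camera");
    app.running = true;
    println!("camera allowed: {}", app.allowed("camera"));
    app.revoke("camera");
    println!("camera allowed after revoke: {}", app.allowed("camera"));
}
```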

Friendfinder App Idea

I'm starting to think about richer user apps. We could have a real-time friend recognizer using code I have already written. This would be a WASM module service intended as a demo to exercise computer vision and the system as a whole. This user-loaded app fetches frames from the camera, segments out faces, and then paints the faces onto the display. It's meant to be a real-world test of what this browser can be used for, and it's what I am building towards in my test app currently. A sketch of the pipeline follows.
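
Here is that pipeline as a sketch - three stages wired by channels, with a placeholder detector standing in for the real computer vision. In the actual system each stage is a separate service or WASM module, and the types here are purely illustrative:

```rust
use std::sync::mpsc::channel;
use std::thread;

type Frame = Vec<u8>;
/// A detected face: (x, y, w, h) in pixels - a stand-in for real output.
type FaceBox = (u32, u32, u32, u32);

fn main() {
    let (frame_tx, frame_rx) = channel::<Frame>();
    let (face_tx, face_rx) = channel::<Vec<FaceBox>>();

    // Camera stage: emits a few dummy frames.
    thread::spawn(move || {
        for _ in 0..3 {
            frame_tx.send(vec![0u8; 640 * 480]).ok();
        }
    });

    // Vision stage: "segments" faces; here just a placeholder detector.
    thread::spawn(move || {
        for _frame in frame_rx.iter() {
            face_tx.send(vec![(100, 80, 64, 64)]).ok();
        }
    });

    // Display stage: would paint each face onto the view; here it just logs.
    for faces in face_rx.iter() {
        println!("paint {} face(s): {:?}", faces.len(), faces);
    }
}
```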

Questions that are emerging

  1. We want to leverage third-party tools, conventions and practices - but it is not entirely clear how to do so. For example, what's the right way to use capabilities such as WebRTC? What does it even mean to use the WebRTC.rs crate in a native Rust app? How would one use these capabilities - would it provide a way to access a local webcam, for example? How is it bound in Servo? Can it be used outside of a browser? How does this work in a multi-threaded environment?

  2. Should external capabilities be exposed to WASM modules using a WebIDL or WASI pattern of statically linked, pre-defined function calls? This runs contrary to the hope of supporting late-bound devices with novel capabilities (if all devices have to go through a kind of committee review process, it could take years for new capabilities to emerge at the interface level - it would be better if novel device drivers could simply be dynamically bound and then messaged using some kind of messaging architecture).

  3. Is there a role for Javascript as a kind of scripting/messaging glue that allows communication between WASM modules and also drives them? Or can WASM modules avoid resorting to Javascript as a glue layer?

  4. What are the common patterns for how units of computation talk to each other in say a framework such as Servo?

  5. If we compare and contrast an engine such as Lumberyard, Unity or even Bevy against an engine such as Servo - what pressures are encouraging people to develop 3D apps on the web rather than directly on dedicated high-performance 3D engines?

  6. What are the right abstractions over underlying hardware services such as the GPU? Should we be using something like Vulkan as a grammar, or should we provide a higher-level DSL? If we provide our own DSL, will anybody use it?