20210523 DEV M2 R5 Thoughts

M2 R5

May 20th General Thoughts

We shipped M1 (a microkernel with a trivial app on top of it - in this case a camera that did some face segmentation). Now we are thinking about the M2 goal - which is to deliver an experience that lets users participate, even novices. I am leaning towards a building-block, author-and-edit, game-like 3D world. Something crude but pretty that helps exercise the system.

One technical concern is that I'd like to support WebXR and the high-level powers that we see in Servo. I'd like to have JavaScript, WebGPU and so on. Can I leverage power tools like, say, Bevy3D or Piston or parts of Servo? This system starts to look a bit like a game engine itself; especially if I want to deliver a toy authoring environment, which presumably needs 3D, networking, physics and authoring. So - how and where do I tie into third party tools? How easy can I make that?

Also I have better insight on "browsers" as a whole and what they do, their strengths and weaknesses. A browser is an agreement between many parties about how to interact, what is right or wrong, what is technically acceptable, what is socially acceptable. I am starting to see that there is a large ecosystem of distributed participants - that a governance or foundation structure is needed - not just a piece of code.

Reviewing what we accomplished

In our last iteration I was primarily focused on a pre-emptive multithreading kernel with a focus on inter-thread messaging, built on top of WASM modules (with help from [email protected]). As a use case test I added support for my MacBook's webcam, and then bound that to a face-tracker. This was a pain; I had to bind Rust to the libraries that Apple provides - basically binding to Objective-C - and it was not well documented. This pushes me now to focus more on a purely virtual and more lightweight user-participatory experience for this next iteration. I also feel like exercising a user story around authoring of rich content for novice users. Furthermore, I recognize now that a browser is more than code; it is policies and ethics, and will require more of a foundation approach. Therefore overall the next iteration should articulate both a "user story" and an "ecosystem story" and focus less on a "low level technical foundation".

PRESENT: A fresher / healthier understanding of the Web - that it is not just technical but also social, political and governance-based:

"Web Browsing" as the act exists today has several subtle features that provide value for users in the west. Browser vendors and Standards Bodies work together. Like any industry there are highly technical and ongoing conversations, RFCs, conferences and so on that bring together hardware developers, software developers, consumer advocates and private interests to push forward browser technology as a whole.

Here is a sketch of the general state of affairs today:

  1. OSI layers -> formalizing and defining everything from transport up to the protocol layer (HTTP/DNS) is critical for allowing open participation and interop.

  2. DNS specifically -> a single namespace turns out to be the absolutely critical invention for an open service that is not owned by any one party (well, other than ICANN).

  3. W3C, ISO, ANSI, IETF, ECMA -> Standards bodies that formalize standards such as the HTML scene graph; critical for an open web. Notably there are new standards that escaped the control of these bodies (because these bodies are far too slow moving) such as BitTorrent, Coil and Handshake.

  4. Law / Ethics -> bodies such as Mozilla.org and EFF.org focused on a "healthy" Internet. This is different from the top-down censorship that Facebook has to impose; basically communities need to self-police. It's worth looking at [REBOOT](https://thereboot.com/) for conversations here as well.

  5. Browsers -> Google, Apple, Mozilla, Microsoft, Puma, Brave.

  6. Content -> Blogs, websites and now increasingly Facebook, Twitter, Instagram, YouTube and other large curated content providers.

  7. Tools -> A medley of various tools for authoring content and allowing novices to participate. We do see a concentration of tooling in "mega-apps" at the moment (Facebook), but it feels like the future is less about planetary-scale apps and more about smaller "group apps" for specific groups of people to interact together; apps that move people together as a group and curate their experience; basically like games.

NEAR TERM FUTURE: Technical, Social, Political concerns of Browsers:

A "mature" web ecosystem continues to grow from today; probably coming back to center to dominate the world again IF it supports stronger models of computation.

Right now the web is a fragile service that is adjacent to walled gardens, and walled gardens dominate. As well, the web primarily exists in the West, not in Asia. And there is also some concern that for the next billion users, as the web saturates third world countries, it will end up colonizing minds with the bulk of its existing content, use patterns and philosophies rather than being a fresh place where those communities can make it their own.

But I see pressures pushing it back to center - basically the need to be able to have conversations across the planet. If somebody in China cannot run an app that somebody in America wrote - there will be pressure to fix that. If somebody in America cannot pay somebody in China online for a story, app or experience, there will be pressure to fix that.

In my mind a "mature" web will therefore return to being a dominant force in the world, that is a common platform for human interaction.

However, browser vendors and standards bodies will need to take a larger systems perspective than they do today, or they will be pushed aside by the uncontrolled and wild energy of the masses. Organizations today are fixated on protecting users from harm rather than on ecosystem health. An ecosystem health perspective is one that focuses on making sure artists get paid - not bubble-wrapping users against any risk. Specifically, the web will support many payment schemes as top-tier market players lose dominance on payments. Either standards bodies will be a part of that or they will be pushed aside.

Apple (at the moment) can be called out specifically and uniquely as a runaway dominant platform provider and has an especially strong role in crippling the web ecosystem. Apple limits other competitors, for example by blocking user tracking in Facebook, and also constrains competitors such as Epic. Apple also doesn't support micropayments in the browser. This all directs revenue to the Apple native walled garden. This dominance is likely to crumble under legal pressures and pressures from other vendors.

COMPUTATIONAL SEAS

An over-arching macrotrend driving the future, and all of us, is the rise of computation. Browsers are valuable insofar as they permit computation at the edge. Rendering text (or any specific user experience) is not important.

Probably the most significant validation of the thinking here is observing the rise of cell tower compute; basically a form of edge computing that brings computation "closer" to consumers in geo-regional islands. In the past we had this pattern:

ON DEVICE COMPUTE <------------------------> CLOUD COMPUTE

Now we are seeing this emerging:

ON DEVICE COMPUTE <----> EDGE COMPUTE <----> CLOUD COMPUTE

Computation will want to move as close to the users as possible for minimum latency. There may be a rise in geo-local or regional group applications as well; a new category of applications that are focused on geo-regional issues such as up-to-date SLAM maps for vehicle navigation, parking, package delivery, geo-regional games, consumer data caching, Stadia-like thin-client gaming experiences, AI and so on, where latency is critical. There will likely be many different vendors of edge compute hardware: Verizon, AT&T and other telecom providers. This edge compute hardware will want to dynamically move compute from cell tower to cell tower, or down to the device, or up to remote clouds, and it will move from AMD to Intel to Apple hardware. There will be significant pressure to formally define permissions and throttling in a portable way. Docker will likely be used, but there will also be pressure for lighter-weight threadlets that are focused on tiny tasks.

This applies to us because we want to be attached to those computational seas. We want to be able to have a cloud app move to a consumer device, run locally for lowest latency, and then move back to the edge or to the cloud based on load balancing and performance requirements.
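
To make that concrete, here is a minimal Rust sketch of what a relocatable unit of compute might declare so that a host can place it on device, edge or cloud. Every type and field name here (Tier, TaskDescriptor, place) is invented for illustration; nothing like this exists in the codebase yet.

```rust
// Hypothetical sketch only: a descriptor for a relocatable unit of compute.
// None of these types exist in the current codebase; they illustrate the idea
// that a task declares its needs and the host picks the closest tier that fits.

#[derive(Debug, Clone, Copy, PartialEq)]
enum Tier {
    Device, // on the user's hardware, lowest latency
    Edge,   // cell tower / regional compute
    Cloud,  // remote data center
}

#[derive(Debug)]
struct TaskDescriptor {
    name: String,
    max_latency_ms: u32, // hard latency budget the task declares
    state_bytes: u64,    // how much state has to move with the task
}

/// Pick the closest tier that satisfies the declared latency budget, given
/// rough round-trip estimates supplied by the host at load-balancing time.
fn place(task: &TaskDescriptor, device_has_capacity: bool, edge_rtt_ms: u32, cloud_rtt_ms: u32) -> Tier {
    if device_has_capacity {
        Tier::Device
    } else if edge_rtt_ms <= task.max_latency_ms {
        Tier::Edge
    } else if cloud_rtt_ms <= task.max_latency_ms {
        Tier::Cloud
    } else {
        Tier::Edge // nothing meets the budget; fall back to the nearest tier
    }
}

fn main() {
    let slam = TaskDescriptor {
        name: "slam-mesher".into(),
        max_latency_ms: 30,
        state_bytes: 50_000_000,
    };
    // Device is busy, edge round trip is 12ms, cloud is 80ms -> Edge wins.
    println!("{} placed at {:?}", slam.name, place(&slam, false, 12, 80));
}
```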

COMPUTER VISION

Coming out of discussions with Gary Bradski @ OpenCV, it looks like we may want to sketch out (in our examples) how we think a browser will store, share and persist the context-aware information about the world that it collects; how other modules running on that browser are permissioned to access and use it; what questions they can ask; and how we notify users about those permissions.

In our conversations we looked a lot at near-future farming, where individual farmers are modifying their field robots, teaching them to recognize new plants, new diseases, new phenomena; publishing that to their robots and their friends' robots; and being dynamically responsive to granular issues on the ground in their geography.

Also, in other conversations I've been hearing a lot about delivery services, geo games, local ambient situational awareness in real time, real time 3d meshes of neighborhoods. This also looks like it is creating both a CV pressure and a low latency computation pressure.

Basically it looks like it is finally time for computer vision to be mature in the browser. This means that browsers will sense the world and provide context awareness. Future browsers are more like car computing platforms specifically - supporting realtime sensing, running multiple competing apps, digesting and performing semantic analysis, segmentation and partitioning; acting as a user advocate, agent and proxy. It will not be possible to navigate future urban landscapes without a browser; the world will be too dangerous, and too invisible, for unassisted humans to navigate. Street signage and lighting will go away and safe zones will be specified for un-augmented people within the decade. We already see that it is extremely difficult to navigate the modern freeway system without a device; a Lyft or Uber driver, for example, cannot acquire "the knowledge" fast enough to be an effective driver.
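
As a thought experiment, here is a tiny Rust sketch of the kind of permissioned "context store" described above: sensing modules publish facts about the world, and other modules can only query the scopes a user has granted them. All the names (ContextStore, Scope, Fact) are hypothetical placeholders, not an existing API.

```rust
// Hypothetical sketch of a local context store that sensing modules write
// into and other modules query through a permission gate. Everything below
// is invented for illustration; nothing like it exists in the kernel yet.

use std::collections::HashMap;

#[derive(Debug, Clone)]
enum Fact {
    Plane { label: String, height_m: f32 },   // e.g. "floor", "wall"
    Object { label: String, confidence: f32 }, // e.g. a recognized plant disease
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Scope {
    Geometry,  // planes, meshes, anchors
    Semantics, // labels like "person", "stop sign", "tomato blight"
}

struct ContextStore {
    facts: Vec<(Scope, Fact)>,
    grants: HashMap<String, Vec<Scope>>, // module id -> scopes the user approved
}

impl ContextStore {
    fn new() -> Self {
        Self { facts: Vec::new(), grants: HashMap::new() }
    }

    fn grant(&mut self, module: &str, scope: Scope) {
        self.grants.entry(module.to_string()).or_default().push(scope);
    }

    fn publish(&mut self, scope: Scope, fact: Fact) {
        self.facts.push((scope, fact));
    }

    /// A module may only read facts in scopes the user explicitly granted.
    fn query(&self, module: &str, scope: Scope) -> Vec<&Fact> {
        let allowed = self.grants.get(module).map_or(false, |s| s.contains(&scope));
        if !allowed {
            return Vec::new(); // or surface a permission prompt to the user
        }
        self.facts.iter().filter(|(s, _)| *s == scope).map(|(_, f)| f).collect()
    }
}

fn main() {
    let mut store = ContextStore::new();
    store.publish(Scope::Geometry, Fact::Plane { label: "floor".into(), height_m: 0.0 });
    store.publish(Scope::Semantics, Fact::Object { label: "tomato blight".into(), confidence: 0.87 });

    store.grant("field-robot-app", Scope::Semantics);
    println!("{:?}", store.query("field-robot-app", Scope::Semantics)); // allowed
    println!("{:?}", store.query("field-robot-app", Scope::Geometry));  // denied: empty
}
```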

THINGS A FUTURE BROWSER COULD TACKLE (That we do not have bandwidth to do ourselves):

  1. Future Ethics -> See XRSI (https://xrsi.org). We have to think about protecting users in XR, in sensitive and vulnerable real-world spaces, and we also have to think about the next billion users and how not to just end up colonizing their minds with our own noise. But as well, we cannot be the gate-keepers. People need to be allowed to make their own mistakes, and to be more than just safely protected in our titanium bubble-wrap from what we think of as risks. This means identifying the stakeholders (extremely hard), making them a part of decision making bodies (also hard), and rotating the decision makers to prevent concentrations of power by the incumbents (also hard).

  2. Decentralized Storage, Computation, Identity -> Another critical trend is to encapsulate new network models such as web3, public ledgers, IPFS and so on. Identity will not be managed entirely by service providers but will be ultimately signed with keys that users own. It's worth looking at SOLID as well here.

  3. App Discovery, Certification -> There will need to emerge app scoring and categorization services that help filter noise. For example, in an elder-care scenario a caretaker may wish to share a series of interactive experiences that are deemed appropriate, perhaps calming, not too intense; and they will want to be able to quickly discover and filter for a safe set of experiences as validated by an extended trust graph of peers.

  4. Trust Graphs -> There will need to emerge a richer capability for scoring and evaluating content and content emitters, to help prevent spam and bad actors from injecting noise into social networks (a tiny sketch of trust-graph filtering follows after this list).

  5. Payments -> A mature Internet lets people get paid. It takes an ecosystem view. This has been a complete disaster on the part of the existing web and it's the main reason why the web is going to die if we don't fix it. The Web Payments API is not good enough. We must support microtransactions.
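
The trust-graph idea in items 3 and 4 can be made concrete with a small Rust sketch: only surface experiences vouched for by people within a few hops of the caretaker's own trust graph. The graph shape and the simple "vouched by" model here are invented placeholders, not a proposal for the real scoring scheme.

```rust
// Hypothetical sketch of trust-graph filtering: a caretaker only sees
// experiences vouched for by peers within a few hops of their own graph.

use std::collections::{HashMap, HashSet, VecDeque};

struct TrustGraph {
    // who -> people they directly trust
    edges: HashMap<String, Vec<String>>,
}

impl TrustGraph {
    /// Everyone reachable from `root` within `max_hops` trust edges.
    fn trusted_within(&self, root: &str, max_hops: usize) -> HashSet<String> {
        let mut seen: HashSet<String> = HashSet::new();
        let mut queue = VecDeque::from([(root.to_string(), 0usize)]);
        while let Some((person, hops)) = queue.pop_front() {
            if !seen.insert(person.clone()) || hops == max_hops {
                continue; // already visited, or at the edge of the trust radius
            }
            for next in self.edges.get(&person).into_iter().flatten() {
                queue.push_back((next.clone(), hops + 1));
            }
        }
        seen
    }
}

fn main() {
    let graph = TrustGraph {
        edges: HashMap::from([
            ("caretaker".to_string(), vec!["nurse".to_string()]),
            ("nurse".to_string(), vec!["therapist".to_string()]),
        ]),
    };
    // Experiences paired with whoever vouched for them.
    let experiences = vec![("calm-garden", "therapist"), ("loud-shooter", "stranger")];
    let trusted = graph.trusted_within("caretaker", 2);
    for (name, voucher) in experiences {
        if trusted.contains(voucher) {
            println!("show {name} (vouched by {voucher})");
        }
    }
}
```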

WHAT WE WANT TO DO FOR M2

I want to focus on a user facing app for this next iteration. I am thinking of a single experience but one that lets multiple users participate. I will probably do a mish mash of these ideas:

  1. BLOX. I wrote https://github.com/anselm/blox a few years ago. A very lightweight script driven interactive tool could exercise many parts of an architecture and also be participatory for users.

  2. SHARED SPACE. A simple multiplayer shared space also would be a good exercise. Pick characters, walk around a virtual space. This would help answer questions and would be fun.

  3. CAMERA FILTERS. It could be fun to just let users apply dynamic filters to a camera especially if there was time of flight (painterly, skeleton segmentation, bokeh).

NEXT STEPS PRAGMATICALLY: WHAT TO CREATE / CODE / WRITE / BUILD

I see the next rev as consisting of these pieces:

  1. Microkernel more or less as it exists? I am still not 100% pleased with this. There's still an argument for switching to a third-party system such as Fastly, Deno or https://github.com/faasm/faasm . Existing systems have a conceptual design defect where they presume that the WASM blobs are primarily clients of an array of impressive external services, rather than part of an ecosystem of services that talk to each other. Inter-module messaging is emaciated, while messaging through a security layer to some magical undefined external set of services is beefed up.

  2. Dynamic Binding. I may try to add more robust, formal, signature-based invocations between modules - possibly even dynamic binding, with native function calls being exposed to modules. Basically I don't have some magical all-powerful "system" layer that modules talk to - modules only talk to each other (a minimal sketch of this routing idea follows after this list). See: https://dl.acm.org/doi/abs/10.1145/3412841.3442045 for example.

  3. Javascript. I want JavaScript glue. I don't want to write fun stuff in Rust - I just want to play and mess around. So I need to include JS... this also leans back more towards Servo.

  4. WebXR. I need to figure out how to include the WebXR stack. The question then becomes should I just include Servo?

  5. WebGPU. Another Servo-related technology that needs including... I think WebGPU is the right way to talk to the bottom end of the graphics hardware. The question then is: do I bypass Makepad or do I have multiple display engines? (Makepad is currently my display engine.) I am leaning towards multiple, and then talking through Makepad once they tack in WebGPU. See https://github.com/gfx-rs/wgpu-rs

  6. Scene State. While I had hoped to avoid any concept of a "scene" at the kernel level, I am now leaning towards a service you can interrogate for semantic discovery of shared inter-app scene information - floors, walls and so on. I think this will be a module that other modules can register state with. It will probably be a retained scene graph and it will probably live in the display module. Effectively this acts like any ordinary scene graph except it may not have some of the fancier stuff such as behaviors. The role of the scene graph here is not merely organizing state for human developer comprehension but also being a source of truth about the world and what is in it. Cameras can sense the world and populate this database, and then other applications can rely on this state to add their own artifacts. There may be high-level semantic intent expression as well - where applications can express what they "want" at a high level, such as "I am a painting and I want to be parallel to a nearby unoccupied wall at eye level" or "I am an avatar and I want to start out on the ground near a launch point in an unoccupied area marked as walkable". I see this scene graph as containing typical concepts such as geometry, lights, cameras, colliders and physics constraints. Richer capabilities such as particle effects and rich behaviors may be worth exercising as separate modules. I think an ECS pattern such as Bevy3D uses may be useful (a rough sketch of the intent idea follows after this list).

  7. UX Toolkits. May not do. I want something like https://docs.microsoft.com/en-us/windows/mixed-reality/mrtk-unity/ and possibly https://flutter.dev . Rust is a terrible disaster as far as UX goes today, however: https://dev.to/davidedelpapa/rust-gui-introduction-a-k-a-the-state-of-rust-gui-libraries-as-of-january-2021-40gl

  8. Multiplayer. I think I will include multiplayer support "in the box". If I am providing a scene graph then I may as well provide physics and networking of state as a given. This does start to take on game-like capabilities at the core.

  9. UX / MISSION STATEMENTS AND BOARD. The mission statement for the project needs to include forward-facing ideas around a certification board (some group of people to certify content), trust graphs, ethics and payments. These are not necessarily things we build "today", but we want to set aside space for them. This starts to look more like a foundation mandate. Also, some pieces may be outside our scope (separate foundations), but I do want to set the "tone" of our work.

  10. Manifests -> We have to define not just a single WASM blob runner but an idea of an "application" consisting of many parts. These will also tie into package signing, certification and package discovery. Basically we need to make sure the ecosystem of tools is "safe" for adoption and use (a sketch of what such a manifest might carry follows after this list).

  11. Computer Vision Context-Aware Computing Database -> We need some kind of local service that can store world state, and other modules want to be able to interrogate it. This may be part of the display module scene graph idea (see also the context store sketch in the COMPUTER VISION section above).
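
Here is a minimal sketch of the signature-based, module-to-module routing mentioned in item 2, assuming invocations are keyed by a "module/method@version" string and there is no privileged system layer. The types (Router, Handler) and the signature format are assumptions, not the kernel's real API.

```rust
// Hypothetical sketch of signature-based invocation between modules (item 2).
// Instead of a privileged "system" layer, each module registers the message
// signatures it answers, and the kernel only routes module-to-module calls.

use std::collections::HashMap;

type Handler = fn(&str) -> String; // (json-ish payload in, payload out)

struct Router {
    // "module/method@version" -> handler
    handlers: HashMap<String, Handler>,
}

impl Router {
    fn new() -> Self {
        Self { handlers: HashMap::new() }
    }

    /// A module declares a callable signature at load time.
    fn register(&mut self, signature: &str, handler: Handler) {
        self.handlers.insert(signature.to_string(), handler);
    }

    /// Any other module can invoke by signature; unknown signatures fail
    /// loudly rather than falling through to a magic external service.
    fn invoke(&self, signature: &str, payload: &str) -> Result<String, String> {
        match self.handlers.get(signature) {
            Some(h) => Ok(h(payload)),
            None => Err(format!("no module exports {signature}")),
        }
    }
}

fn segment_faces(payload: &str) -> String {
    format!("{{\"faces\": 1, \"source\": {payload}}}")
}

fn main() {
    let mut router = Router::new();
    router.register("vision/segment_faces@1", segment_faces);

    // Another module calls it by signature rather than through a system API.
    println!("{:?}", router.invoke("vision/segment_faces@1", "\"frame-42\""));
    println!("{:?}", router.invoke("vision/unknown@1", "{}"));
}
```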
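
And a rough sketch of the scene-state idea in item 6: entities carry plain components plus high-level "intents", and the scene service resolves intents into concrete transforms using sensed world state (faked as constants here). This is deliberately not Bevy's actual API; it only illustrates the ECS-plus-intent shape with invented names.

```rust
// Hypothetical sketch of the shared scene-state module (item 6): entities carry
// components, and applications can express placement "intents" that a resolver
// satisfies against sensed world state. All names are illustrative only.

#[derive(Debug, Clone, Copy)]
struct Vec3 { x: f32, y: f32, z: f32 }

#[derive(Debug, Clone)]
enum Component {
    Transform(Vec3),
    Geometry { mesh: String },
    Light { lumens: f32 },
    Collider { radius_m: f32 },
    // Semantic intent: resolved by the scene service, not by the app itself.
    Intent(Intent),
}

#[derive(Debug, Clone)]
enum Intent {
    OnWallAtEyeLevel,          // "I am a painting..."
    OnWalkableGroundNearSpawn, // "I am an avatar..."
}

struct Entity { id: u64, components: Vec<Component> }

struct Scene { entities: Vec<Entity> }

impl Scene {
    /// Turn declared intents into concrete transforms using sensed world state
    /// (here faked as constants; in practice supplied by the vision modules).
    fn resolve_intents(&mut self) {
        for e in &mut self.entities {
            let mut resolved = Vec::new();
            for c in &e.components {
                if let Component::Intent(intent) = c {
                    let pos = match intent {
                        Intent::OnWallAtEyeLevel => Vec3 { x: 0.0, y: 1.6, z: -2.0 },
                        Intent::OnWalkableGroundNearSpawn => Vec3 { x: 0.5, y: 0.0, z: 0.5 },
                    };
                    resolved.push(Component::Transform(pos));
                }
            }
            e.components.extend(resolved);
        }
    }
}

fn main() {
    let mut scene = Scene {
        entities: vec![Entity {
            id: 1,
            components: vec![
                Component::Geometry { mesh: "painting.glb".into() },
                Component::Intent(Intent::OnWallAtEyeLevel),
            ],
        }],
    };
    scene.resolve_intents();
    println!("{:?}", scene.entities[0].components);
}
```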
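
Finally, a sketch of what an application manifest (item 10) might carry: multiple WASM parts, requested permissions, and a publisher key that ties into signing and certification. The field names and the rough on-disk shape shown in the comment are assumptions, not a settled format.

```rust
// Hypothetical sketch of an application manifest (item 10): an "application"
// is a signed bundle of WASM parts rather than a single blob.
//
// A manifest on disk might look roughly like:
//
//   name = "shared-space"
//   version = "0.1.0"
//   publisher_key = "ed25519:..."
//   [[parts]]
//   wasm = "scene.wasm"
//   permissions = ["scene:write", "net:multiplayer"]

struct PartManifest {
    wasm: String,             // path or hash of the WASM blob
    permissions: Vec<String>, // capabilities this part asks the user for
}

struct AppManifest {
    name: String,
    version: String,
    publisher_key: String, // ties into package signing / certification
    parts: Vec<PartManifest>,
}

impl AppManifest {
    /// Placeholder check: a real implementation would verify a signature over
    /// the manifest and each part against the publisher key and a trust graph.
    fn is_plausibly_signed(&self) -> bool {
        !self.publisher_key.is_empty() && !self.parts.is_empty()
    }
}

fn main() {
    let app = AppManifest {
        name: "shared-space".into(),
        version: "0.1.0".into(),
        publisher_key: "ed25519:demo-key".into(),
        parts: vec![PartManifest {
            wasm: "scene.wasm".into(),
            permissions: vec!["scene:write".into(), "net:multiplayer".into()],
        }],
    };
    println!("{} v{} ok to load: {}", app.name, app.version, app.is_plausibly_signed());
}
```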
