Notes - ddrago/Bose-Frames-Audio-AR Wiki

This section is intended to act as a journal of my progress during the project. It contains problems encountered, solutions found, and general thoughts.


I still can't get the Unity Bose Frames SDK to work, so instead I am using the source code of a developer who open-sourced his own implementation online. With it I can now access motion sensor data and gesture data.


Attempting to simulate a gaze-tracking-controlled mouse cursor. So far, two hours in, there is a bit of progress. Thanks to a paper, I realized that we don't need the "roll" variable and that there is a handy function to approximate gaze tracking from head tracking. I still have to test it, though. In the meantime, I am testing plain head tracking, in particular with just yaw, and trying various formulas for mapping it to the cursor position.
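The formulas I tried are not recorded here. As a rough sketch of the simplest candidate, a clamped linear mapping from yaw to a horizontal cursor position could look like the following. The function name, the sign convention (0 = straight ahead, positive = head turned right) and the 30 degree half-range are all assumptions for illustration, not measured values.

```typescript
// Hypothetical helper: map a yaw angle in degrees to a horizontal pixel position.
// Assumes 0 = looking straight ahead and positive yaw = head turned right.
// halfRangeDeg is an assumed comfortable head-turn range, not a measured one.
function yawToCursorX(
  yawDeg: number,
  screenWidthPx: number,
  halfRangeDeg: number = 30
): number {
  // Clamp so that turning past the range pins the cursor to the screen edge.
  const clamped = Math.max(-halfRangeDeg, Math.min(halfRangeDeg, yawDeg));
  // Normalise to 0..1, then scale to pixels.
  const t = (clamped + halfRangeDeg) / (2 * halfRangeDeg);
  return t * screenWidthPx;
}
```

With a 1000 px wide screen, looking straight ahead puts the cursor in the middle, and any yaw beyond the half-range leaves it at the edge.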

However, here I encountered a problem! Where the Frames place the 0 degree point of yaw does not seem to be consistent. More experimenting is needed to understand whether there is a criterion behind it and whether I can somehow adjust it manually, just like in the Unity demo.
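One possible way to adjust the zero point manually, sketched under the assumption that the raw yaw arrives in degrees: remember the raw reading at a moment when the user is known to look straight ahead, then report every later reading relative to it. The class and function names are made up for this note.

```typescript
// Wrap any angle in degrees into the range [-180, 180).
function wrap180(deg: number): number {
  return ((((deg + 180) % 360) + 360) % 360) - 180;
}

// Hypothetical recentering helper (not part of any SDK).
class YawRecenter {
  private zeroDeg = 0;

  // Call while the user is looking straight ahead, e.g. on a gesture.
  calibrate(rawYawDeg: number): void {
    this.zeroDeg = rawYawDeg;
  }

  // Yaw relative to the calibrated forward direction, in [-180, 180).
  relative(rawYawDeg: number): number {
    return wrap180(rawYawDeg - this.zeroDeg);
  }
}
```

The wrap matters when the raw zero sits near the 0/360 boundary: after calibrating at a raw yaw of 350, a raw reading of 10 comes out as 20 degrees to the right rather than -340.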

An extra problem: apparently the yaw, at least, continuously increases over time even though the device is not moving at all.
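If the drift turns out to be roughly constant over time (an assumption I have not verified), it could be estimated from a short recording taken with the device at rest and then subtracted from later readings. A minimal sketch with hypothetical names:

```typescript
// One yaw reading taken while the device is known to be at rest.
interface RestSample {
  tSec: number;   // timestamp in seconds
  yawDeg: number; // raw, unwrapped yaw in degrees
}

// Estimate a constant drift rate (degrees per second) from the first and
// last samples of a stationary recording. Assumes the drift is linear.
function estimateDriftRate(samples: RestSample[]): number {
  const first = samples[0];
  const last = samples[samples.length - 1];
  if (samples.length < 2 || !first || !last) return 0;
  const dt = last.tSec - first.tSec;
  return dt > 0 ? (last.yawDeg - first.yawDeg) / dt : 0;
}

// Remove the drift accumulated since the reference time t0Sec.
function removeDrift(
  rawYawDeg: number,
  tSec: number,
  t0Sec: number,
  rateDegPerSec: number
): number {
  return rawYawDeg - rateDegPerSec * (tSec - t0Sec);
}
```

This only helps if the drift really is close to linear; if it varies, a second reference such as a compass heading would be needed.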

A new question came to my mind: all this time I have assumed the UI to be egocentric. But how do we make the UI smart enough to know which way the user's body is oriented? If we do not keep track of that, the egocentric display will either be placed at an arbitrary angle from the user's orientation or surround the user, "wrapping" around them. Either option would be rather problematic if the user were to use the UI while walking somewhere! How do we solve such an issue? TODO: check whether there are solutions to smartly re-orient the device as the user moves. TODO2: develop a conversational interface in the meantime, since this would not rely on user movement/orientation at all.


Meeting with Steve was really helpful. He made me notice that the prototype and idea I was working on were not a failure at all! I had assumed that, because you move around the city, an exocentric/egocentric menu that the user navigates via head tracking (where the menu is at a fixed distance from the user but at a variable angle) would be useless: moving around the city, you also change the direction of your head, making head tracking inconsistent. However, he argued that the interface might be recalibrated via a gesture, and that it might communicate its position to the user through a 3D-positioned sound. If the menu occupies a 180 degree range of the user's 360 degree space, then the user would hear the menu-locator noise behind them if they headed in the direction opposite to the one the menu is currently oriented towards. Then, if the user performed a particular gesture, the menu would reposition itself in front of the user. This is a great idea, and it may be realized using Google's Resonance SDK, though I still need to test the latter to see if it works.
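The geometry of that idea can be sketched without any audio library. Assuming the menu is anchored at a bearing in degrees and the head yaw is measured in the same frame, the signed angle between them tells whether the locator sound should come from the front or from behind, and the recalibration gesture simply moves the anchor to the current head direction. All names below are hypothetical and none of this uses the Resonance SDK.

```typescript
// Signed angle from where the head points to the menu centre, in [-180, 180).
function relativeBearing(menuBearingDeg: number, headYawDeg: number): number {
  const d = (menuBearingDeg - headYawDeg) % 360;
  return d >= 180 ? d - 360 : d < -180 ? d + 360 : d;
}

// The menu spans 180 degrees, so its centre is "behind" the user once it
// is more than 90 degrees away from the head direction.
function isMenuBehind(menuBearingDeg: number, headYawDeg: number): boolean {
  return Math.abs(relativeBearing(menuBearingDeg, headYawDeg)) > 90;
}

// On the recalibration gesture, re-anchor the menu in front of the user.
function recenterMenu(headYawDeg: number): number {
  return headYawDeg;
}
```

The relative bearing is also the angle at which a spatial audio engine would need to place the locator sound.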

Furthermore, he approved my idea to focus the project on a particular goal: develop three possible UIs, namely a head-tracking-based one, a conversational-interface-based one and a gesture-based one. Then test and compare them to highlight the possible advantages of each.

Finally, he recommended that I work on the literature review and prototype development at the same time, rather than just doing the prototype as I have been.


During the week I developed a functional head-tracking interface (HTI) where the user can select a different action with a gesture depending on the part of the screen they are looking at. This is a significant step forward, and only a few tasks remain before I have a functionally complete prototype. However, this depends on the project I want to tackle. If I want to compare menu types (i.e. HTI, conversational interface (CI) and gesture-based interface (GBI)), I still need to complete prototypes of the latter two.
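The selection logic can be sketched roughly as follows, assuming the screen is split into equal-width columns with one action per column. The real prototype may divide the screen differently, and the function names are made up for illustration.

```typescript
// Index of the column the cursor is in, for regionCount equal-width columns.
function regionIndex(
  cursorXPx: number,
  screenWidthPx: number,
  regionCount: number
): number {
  const idx = Math.floor((cursorXPx / screenWidthPx) * regionCount);
  // Clamp so that a cursor on the right edge still maps to the last column.
  return Math.max(0, Math.min(regionCount - 1, idx));
}

// Called when a gesture arrives: returns the action for the region being
// looked at, or undefined if there are no actions.
function actionOnGesture(
  actions: string[],
  cursorXPx: number,
  screenWidthPx: number
): string | undefined {
  return actions[regionIndex(cursorXPx, screenWidthPx, actions.length)];
}
```

For example, with three actions on a 900 px wide screen, a gesture made while the cursor is at x = 450 triggers the middle action.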

A meeting today with Steve potentially made the situation easier, however. He told me that I could focus my project on the following question: what is the most appropriate menu item and menu shape for an HTI? How should the items of a menu be arranged (cross, pie, half-pie, screen, row, column, etc.) to improve UX? How many items can we then fit, and how big can the items be?
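For the angular layouts (pie, half-pie), the last question reduces to simple arithmetic: the number of items is the arc divided by the smallest width a user can reliably target. A sketch, where the minimum width is a placeholder that the user tests would have to determine:

```typescript
// How many items fit in an arc if each needs at least minItemDeg of width.
function maxItems(arcDeg: number, minItemDeg: number): number {
  return Math.floor(arcDeg / minItemDeg);
}

// Index of the item a yaw angle falls on, for items laid out left to right
// across an arc centred on 0 degrees. Returns -1 when looking outside the arc.
function pieItemAt(yawDeg: number, arcDeg: number, itemCount: number): number {
  const fromLeft = yawDeg + arcDeg / 2;
  if (fromLeft < 0 || fromLeft >= arcDeg) return -1;
  return Math.floor(fromLeft / (arcDeg / itemCount));
}
```

With a half-pie (180 degrees) and an assumed minimum item width of 30 degrees, six items fit, and looking straight ahead lands on the item at index 3.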

Either way, now that I have the input of the HTI essentially solved, I need to focus on the output, which involves using the Resonance SDK.

In the meeting, Steve also proposed a way to tackle recalibration of the device. He gave me a Google Pixel 4 Android phone from which I can read directional data. With this, Steve suggested, we might be able to infer the current direction of the user without having to rely on the Frames, whose sensor data is already being used to drive the "cursor" with which the user navigates the menu/UI. Using the phone's directional data this way, we can have a truly egocentric UI that does not need to be recalibrated every time the user takes a turn! In summary: the Frames' directional data would be used as a "cursor" to navigate menu items, and the phone's directional data would be used to estimate the user's current 3D direction.
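In terms of yaw only, the summary boils down to a subtraction: the cursor angle is the head direction minus the body direction. The sketch below assumes both devices report yaw in degrees against the same reference (for example magnetic north), which still has to be checked on the real hardware; the function name is hypothetical.

```typescript
// Head direction relative to the body, in [-180, 180).
// framesYawDeg: yaw reported by the Frames (head direction).
// phoneHeadingDeg: heading reported by the phone (assumed body direction).
function headRelativeToBody(
  framesYawDeg: number,
  phoneHeadingDeg: number
): number {
  const d = (framesYawDeg - phoneHeadingDeg) % 360;
  return d >= 180 ? d - 360 : d < -180 ? d + 360 : d;
}
```

If the user turns a street corner while keeping the head in the same position relative to the body, both readings change by the same amount and the cursor angle stays put, which is exactly the property the egocentric menu needs.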


For the past weeks I have not been able to do much, to be honest. However, I did briefly try to get Resonance to work, and that failed. It was weird, since it did not seem to give me any actual error; the audio just did not play, no matter the audio output. If Resonance does not work I am a bit fucked, since it is the only way I know of to implement 3D audio and/or augmented reality. I did spend relatively little time on it, though, and only tried it on the web rather than in Android apps, which it can also work with. More attempts to come.


Did almost nothing during the exam and coursework season. Only now have I started to play with Resonance again and, as I am writing this, with Web Audio. I still have not managed to get Resonance to work, but Web Audio does indeed make a sound in the webpage! Chrome has a rather annoying (but useful, I guess) rule that it will not start a sound without an associated user interaction, so I performed this experiment in Mozilla Firefox instead. This in and of itself is a very good start, since I can ultimately roughly replicate all the augmented audio effects myself if I do not get Resonance and Unity to work!
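A common way to live with the Chrome rule is to resume the audio context from inside a user gesture handler. The sketch below relies only on the `state` property and the `resume()` method of the Web Audio `AudioContext`; the helper name and the small interface are my own, introduced so the logic can be exercised outside a browser.

```typescript
// Minimal shape of the parts of a Web Audio AudioContext used here.
interface ResumableContext {
  state: string; // "suspended", "running" or "closed"
  resume(): Promise<void>;
}

// Resume the context only if the browser left it suspended.
function resumeIfSuspended(ctx: ResumableContext): Promise<void> {
  return ctx.state === "suspended" ? ctx.resume() : Promise.resolve();
}

// In a page this would be wired to a user gesture, for example:
//   const ctx = new AudioContext();
//   document.addEventListener("click", () => resumeIfSuspended(ctx), { once: true });
```

After the first click, sounds scheduled on the context should start playing in Chrome as well.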


Have been trying for the past week to get Web Audio to work. I am not even stuck at the spatialization step; I get stopped at just playing a single tune. I get a cross-origin error when loading my file, in both Firefox and Chrome. Apparently this happens when a page opened directly from the local drive (C:\), which is the case here, tries to access local files. I now have to try to make an actual fucking website, which I am not thrilled about and which is going to be its own whole can of worms. This rabbit hole keeps getting deeper and deeper. I just want to play a sound on a webpage.


It took me a while, but I did manage to make a localhost web app on Node.js. So now I just have to load the WAV file to see whether I can play sounds and whether the cross-origin error is solved.

Although I still was not able to play an audio file, I can now play a synthetically generated sine wave from the web app. This is something, at the very least. So far Unity has failed and Resonance has failed, so now that Web Audio has failed to play a file too, I am really not sure what to do. I still have the chance to follow this one Web Audio tutorial on spatialization online, but I am backed into a corner, really. What do I do if I don't get that to work either?
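For reference, a sine tone needs no file at all, which is why it sidesteps the cross-origin problem. The sketch below computes the raw samples; it is not necessarily how my web app produces its tone, and the function name and the default sample rate are assumptions.

```typescript
// Generate the samples of a sine tone with amplitude 1.
function sineSamples(
  frequencyHz: number,
  durationSec: number,
  sampleRate: number = 44100
): Float32Array {
  const n = Math.round(durationSec * sampleRate);
  const out = new Float32Array(n);
  for (let i = 0; i < n; i++) {
    // Phase advances by 2*pi*f/sampleRate radians per sample.
    out[i] = Math.sin((2 * Math.PI * frequencyHz * i) / sampleRate);
  }
  return out;
}

// In a browser, these samples could be copied into an AudioBuffer with
// copyToChannel and played through an AudioBufferSourceNode.
```

A one second, 440 Hz tone at the default rate is `sineSamples(440, 1)`, which returns 44100 samples.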


Summary of the problems I have with using Unity for the Bose_frames_sdk. The main problem is that the demos do not work, and I cannot get them to work. In particular, they do not seem to connect to the position and gesture sensors. I am currently trying to edit the Wearable Control object to use one of the aforementioned providers as the Editor Default Provider, as instructed in the documentation, but I am not sure what to modify.

During Mark's meeting: I should keep in mind that Bluetooth introduces a significant latency of around 200 ms that other VR devices would not have. As such, it may be best to just use the USB connector.
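To get a feeling for what 200 ms means for head tracking: during a head turn, the cursor lags behind the real head direction by roughly the turn speed multiplied by the latency. A tiny sketch, where the turn speed used in the example is an assumed figure rather than a measurement:

```typescript
// Approximate angular lag of the cursor during a constant-speed head turn.
function angularLagDeg(turnSpeedDegPerSec: number, latencyMs: number): number {
  return (turnSpeedDegPerSec * latencyMs) / 1000;
}
```

At an assumed turn speed of 100 degrees per second, 200 ms of latency corresponds to a lag of about 20 degrees, which is a large fraction of a typical menu item's width.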