Week 10 Diary - MoradEnCours/VR-AI-Driven-Therapy GitHub Wiki
Progress So Far
I have been looking into feedback modalities and how they might affect people's perceptions when using this kind of system, and I have tried to think more about what the research questions could be.
For my update so far: I have thought more about the response-time delay, ever since you brought up the idea of having the therapist look like it is taking notes during delays.
It actually made me think of a previous experiment I helped Shaun with as a participant, where he looked into different emotional modalities for a social robot called Qoobo. That made me wonder: what about focusing on the feedback modalities presented while people wait for the therapist to complete its response?
Since there is currently no convenient technology that allows near-instant responses from a generative virtual agent, it could be interesting to look into users' perceptions of latency and see what feedback they prefer to have while waiting.
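To make the waiting-feedback idea concrete, it could be prototyped as a small latency-masking wrapper: if the agent's response takes longer than some threshold, trigger the chosen feedback modality until the response arrives. A minimal Python sketch, where `generate`, `show_feedback`, and `hide_feedback` are hypothetical callbacks standing in for the real agent and UI hooks:

```python
import asyncio

async def respond_with_feedback(generate, show_feedback, hide_feedback,
                                threshold=0.3):
    """Await the (possibly slow) response generator; if it takes longer than
    `threshold` seconds, trigger the waiting feedback (spinner UI,
    note-taking animation, pre-recorded audio, ...) until it completes."""
    task = asyncio.ensure_future(generate())
    try:
        # shield() keeps the generation running even if this wait times out
        return await asyncio.wait_for(asyncio.shield(task), timeout=threshold)
    except asyncio.TimeoutError:
        show_feedback()
        try:
            return await task
        finally:
            hide_feedback()
```

Fast responses would then skip the feedback entirely, which also gives a natural way to log how often (and for how long) each participant actually saw the waiting feedback.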
So for example:
- Default condition (no feedback modalities)** - no UI* / no sequence animation / no pre-recorded audio
- Condition A - UI shown / no sequence animation / no pre-recorded audio
- Condition B - UI shown / sequence animation shown / no pre-recorded audio
- Condition D - no UI / sequence animation shown / no pre-recorded audio
- ...and so on, up to the final condition (all feedback modalities) - UI shown / sequence animation shown / pre-recorded audio played
*UI indicator = for instance, showing the text that usually appears before the audio (as in ChatGPT), or a loading-spinner icon
**(I'm curious whether simply being in a virtual environment and being able to look around while having a VA to talk to is what mainly matters to some people, in which case they might not even care about the virtual agent having a body.)
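Since the conditions above are combinations of three binary feedback factors (UI indicator, sequence animation, pre-recorded audio), the full condition space is a 2x2x2 factorial and can be enumerated mechanically; a quick sketch, with illustrative factor names rather than final condition labels:

```python
from itertools import product

# The three binary feedback factors from the condition list above
# (names are illustrative, not final condition labels).
factors = {
    "ui_indicator": (False, True),
    "sequence_animation": (False, True),
    "prerecorded_audio": (False, True),
}

# Cartesian product of the factor levels: 2 x 2 x 2 = 8 conditions
conditions = [dict(zip(factors, combo))
              for combo in product(*factors.values())]

for i, cond in enumerate(conditions):
    print(f"Condition {i}: {cond}")
```

The first entry is the default condition (all feedback off) and the last is the all-modalities condition, with the six partial combinations in between.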
One RQ then could be: "Do people consider the feedback given during the latency to have been useful/helpful or possibly the reverse?"
If they found it useful, another RQ could be: "How would they rank the feedback modalities from best to worst?"
I'm not sure if this falls under the same banner as the RQs above, but another question might be: "What other modalities might people suggest could be useful in VR when interacting with a virtual agent?" (I don't know what keywords to look up to find out whether something like this has been done before.)
One more RQ could try to evaluate people's trust in a system connected through these sorts of APIs: "How much trust and comfort do people feel when using such a system? What sorts of use cases / therapies would they use, or avoid using, this kind of system for?"
That's what I have been thinking about since the last meeting. I'd appreciate any guidance; let me know what you think!
Next Steps
- Look into a research paper which Mathieu suggested
Questions or Blockers
Meeting Notes
A study investigating the perception of latency in agent feedback sounds very interesting.
A first reaction I have is that you have essentially a 2x2x2 experiment (so technically 8 conditions). This is quite a lot and therefore a between-subjects design might be too much (you would need too many subjects). At the same time, a within-subjects design (where every subject sees all or many conditions) poses the question of habituation and order effects (e.g. seeing a feedback UI first then no-UI might be odd).
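If the within-subjects route is taken, the order and habituation effects mentioned above are usually mitigated by counterbalancing the presentation order across participants. One standard option is a Williams Latin square, which works for an even number of conditions (such as the 8 here) and balances first-order carryover effects; a sketch:

```python
def williams_square(n):
    """Balanced Latin square (Williams design) for an even number n of
    conditions: every condition appears once in each ordinal position, and
    every ordered pair of adjacent conditions occurs exactly once across
    the n orders, balancing first-order carryover effects."""
    assert n % 2 == 0, "this construction assumes an even n"
    # First row alternates low and high condition indices: 0, 1, n-1, 2, n-2, ...
    first, lo, hi = [0], 1, n - 1
    for k in range(1, n):
        if k % 2 == 1:
            first.append(lo)
            lo += 1
        else:
            first.append(hi)
            hi -= 1
    # Each subsequent row shifts every condition index by 1 (mod n)
    return [[(c + i) % n for c in first] for i in range(n)]

# One presentation order per participant (or per group of participants)
for order in williams_square(8):
    print(order)
```

Participants would be assigned to the 8 orders in rotation, so a sample size that is a multiple of 8 keeps the design balanced.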
Another comment I have is that if that is indeed the research question, then do we specifically need a virtual therapy setting? It might introduce some specific expectations. At the same time it gives a very concrete context which may be a good idea.
Finally, regardless of what you choose, a little focused literature review on that question would be useful. You should at least read up on the so-called "incremental" dialogue architectures (e.g. Paetzel, Maike, Ramesh Manuvinakurike, and David DeVault. "'So, which one is it?' The effect of alternative incremental architectures in a high-performance game-playing agent." Proceedings of the 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue. 2015.). Those attempt to resolve the issue of dialogue-generation delay by processing user speech as soon as it is produced, whereas traditional dialogue management (as with LLMs) by default waits until the user has completely finished before starting to process their speech. What measures did the authors use to evaluate the usefulness of incremental dialogue management? Your method will probably never be as satisfactory (it isn't very reactive in dialogue), but would your UI/animation/pre-recorded-audio methods be sufficient to satisfy users' expectations?