CUDA - makingthematrix/gailibrary GitHub Wiki

This is something I had in mind from the very beginning, but it's a complex subject. I want to try implementing it only when the rest of GAI is complete.

CUDA is a platform that lets us use Nvidia graphics cards for general-purpose parallel computing. It takes a different approach from running concurrent code on a CPU. Greatly simplifying, it works like this: in the standard CPU approach we run many functions at once, and these functions may access the same data (which, hopefully, is constant). In the CUDA approach we run only one function at a time, but on many data structures at once. The data structures differ in their values, so the result of the computation will differ for each of them, but the algorithm is one and the same.
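The contrast can be sketched in plain Scala (no GPU involved). The `Cell` class and the functions here are invented for illustration; the point is only the shape of the two models:

```scala
// Invented example data: a "cell" with a single numeric value.
final case class Cell(id: Int, health: Int)

object ParallelModels {
  val cells: Vector[Cell] = Vector.tabulate(4)(i => Cell(i, health = 10 * (i + 1)))

  // CPU-style concurrency: many functions may run at once (imagine each
  // on its own thread), all reading the same, hopefully constant, data.
  def cpuStyle(): Vector[Int] = {
    val fns: Vector[Cell => Int] = Vector(_.health, _.health / 2)
    fns.flatMap(f => cells.map(f))
  }

  // CUDA-style: one function at a time, applied to every cell at once.
  // Conceptually each GPU thread holds one cell; the algorithm is
  // identical for all of them, only the data differs.
  def cudaStyle(f: Cell => Int): Vector[Int] =
    cells.map(f) // one map ≈ one kernel launch over all cells
}
```

On a real GPU the `map` in `cudaStyle` would be a kernel launch, with one thread per cell; the Scala version only mimics the data layout of that model.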

In order to use CUDA in GAI, the main loop has to be reorganized. Instead of processing cells sequentially and running functions in parallel, we will run functions in sequence, but on all cells at once. This also means that lazy vals are forbidden: every value a function uses must already be computed, so functions tagged as lazy vals are run before the main functions, even if in the end they turn out not to be necessary. Temporary variables can be used instead.

Still, some functions can't run in the CUDA mode: e.g. those that update the graphs. In the example with two soldiers attacking the player, part of their decision-making process was to tag the node each of them wanted to go to. Because we processed the cells in sequence, one of them tagged its node first and the other had to choose another one. That wouldn't work in the CUDA approach, because the tagging function would run on both cells at the same time. The programmer would have to either come up with something else, or set some sort of a "non-CUDA-compatible" flag on this function, so that it runs either before or after the CUDA-compatible ones.
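A rough sketch of the reorganized loop might look like the code below. All names here (`CellFunction`, the flag split into `nonCudaFns` and `cudaFns`) are hypothetical, not GAI's actual API; it only shows the order of execution:

```scala
// Invented example data: a cell with a bag of named values.
final case class Cell(id: Int, values: Map[String, Int])

object MainLoop {
  type CellFunction = Cell => Cell

  // Old loop: process cells one by one, applying every function to a cell
  // before moving on to the next cell.
  def oldLoop(cells: Vector[Cell], fns: Vector[CellFunction]): Vector[Cell] =
    cells.map(cell => fns.foldLeft(cell)((c, f) => f(c)))

  // CUDA-style loop: run functions in sequence, each over all cells at once.
  // Functions flagged as non-CUDA-compatible (e.g. graph updates) are run
  // first, cell by cell; the rest are applied to the whole population.
  def cudaLoop(cells: Vector[Cell],
               nonCudaFns: Vector[CellFunction],
               cudaFns: Vector[CellFunction]): Vector[Cell] = {
    val prepared = cells.map(c => nonCudaFns.foldLeft(c)((acc, f) => f(acc)))
    cudaFns.foldLeft(prepared)((cs, f) => cs.map(f)) // each map ≈ one kernel launch
  }
}
```

For purely per-cell functions the two loops compute the same result; the difference is only which axis (cells or functions) is traversed in the outer loop, and that is exactly what makes the inner `map` offloadable to a GPU.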

And even if the programmer manages all that, she would probably still want to write a second set of functions, used by GAI on computers without CUDA, because the functions prepared for CUDA will not be optimal there. For example, it would be better to keep some of them as lazy vals, because sometimes they would not have to be called at all.

So why use this approach at all? True, for an FPS game, where the player usually interacts with only a handful of NPCs at a time, it might be a total waste of effort. But consider games like Total War or Starcraft. With CUDA, every individual NPC on the scene could have its own cell in GAI - something that without CUDA would be too slow. Without CUDA the programmer has to resort to grouping NPCs into units, and decisions are made for the whole unit together, as if it had one mind: all soldiers in the unit attack at once, all of them retreat at once, during a maneuver they all perform the same action, and so on. With CUDA, reactions may differ from one NPC to another even when there are hundreds of them on the battlefield. That makes the whole experience much more natural and may result in interesting emergent behaviours - not planned by the programmer, but still reasonable and intriguing from the player's point of view.