Sprint 35 - Adam-Poppenheimer/Civ-Clone GitHub Wiki
Goals
- A startup time for a Tiny map at max player count of no more than 45 seconds.
- A startup time for a Small map at max player count of no more than 45 seconds.
- A startup time for a Standard map at max player count of no more than 45 seconds.
- A startup time for a Large map at max player count of no more than 45 seconds.
- A startup time for a Huge map at max player count of no more than 45 seconds.
Risks and Mitigation
- Map generation is a major, CPU-intensive expense in the current implementation. Multithreading would ostensibly help, but map generation wasn't built to handle multithreading at all. Trying to parallelize that process could create huge problems, or simply eat up hours and hours of time.
  - I don't know if there's a clear way to resolve this problem, if it even is a problem. I do know that map generation isn't in a brilliant place right now. It overproduces mountains, underproduces hills, and doesn't seem to handle arctic regions very well. It might be sensible to take another pass at the whole architecture of map generation, improving it not just for multithreading but for other qualities as well. Regardless, I'll need to take a careful look at it before I come to any conclusions. And I should avoid the instinct to immediately re-architect, since that's been a major problem in the past.
- It's been so long since I attended to map generation that I can't really remember how it works. That makes it much more likely that optimization attempts will break the system severely.
  - Again, I'll need to take a close look at map generation and remind myself of how it functions. If it proves too opaque, it might be a candidate for redesign.
- I've already done a lot to reduce memory requirements for my maps, and I suspect there's not a lot more that I can remove. If larger map sizes are too memory-hungry, it's not clear what else I can do to optimize.
  - I know for a fact that there are a lot of redundant pieces of data hanging around. HexMeshes seem to instantiate unnecessary submeshes even when they have no assigned vertices. That might carry some expense with it. I do a lot of material instantiation, which might be memory-intensive. It probably makes the graphics card run slower, at least. I've no doubt that memory leaks are also an issue, so perhaps I could save some memory by fixing them. There might also be redundant components on terrains, chunks, and HexMeshes I can throw away. And there's always the possibility of tweaking the bake texture's resolution to save some space. I don't think I can do without the pair of bake textures, but if I can get away with just one, that'll substantially decrease memory usage. Profiling the program will be useful here, I'm sure.
- Culling across my various render passes has proven quite expensive, perhaps because I'm spending a bunch of time culling chunks that couldn't possibly contribute to a particular render pass. I can think in the abstract of ways to speed up this culling process, but I'm not sure I can do it given the current way chunks are refreshed.
  - Whether and how I can save time on culling depends on what, exactly, Unity is doing when it culls geometry. For instance, is it intelligent enough to toss aside entire objects whose AABB is outside of the camera's frustum? If it's not, I could gain some speed by manually enabling the terrains I require and disabling them once I'm done. Then, once the map's finished loading, I can turn all my chunks back on (or the ones that are visible, at least). Another problem might be the technique I'm using to occlude water when doing terrain baking. Baking without rendering the terrain's geometry could save some time. I might be able to streamline the baking process further by being more careful about land and water.
Review
Unfortunately, I waited a week between the end of this sprint and the beginning of its review process, so its events are not fresh in my mind.
This sprint ended up being somewhat unfocused, though I did make some meaningful progress. Partway through the sprint it occurred to me that framerate is probably a bigger issue than map loading time, especially given how poor the framerate is currently. And so I started drifting into performance more generally.
Before I did this, however, I continued to attend to setup times. I noticed that HexGrid.AttachChunksToCells was running incredibly slowly, taking 9-10 seconds on a Small map and causing a huge number of heap allocations. This turned out to be a dominant factor in the cost of map generation, and it arose from the naive method by which I checked for chunk/cell overlap. In an hour or two I managed to come up with a solution that paired chunks with cells much more carefully and intelligently. This completely removed AttachChunksToCells as a performance bottleneck and substantially reduced the burden of map generation in general.
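The fix itself isn't spelled out here. As a rough illustration of the general idea, one common way to avoid testing every chunk against every cell is to convert each cell's bounding box directly into the range of chunk indices it can touch. The sketch below is in Python rather than the project's C#, and it assumes chunks tile the map in a uniform grid starting at the origin; the function name, parameters, and data layout are all hypothetical.

```python
from collections import defaultdict

def attach_chunks_to_cells(cell_bounds, chunk_width, chunk_height, chunks_x, chunks_z):
    """Map each chunk index (cx, cz) to the ids of the cells overlapping it.

    cell_bounds: dict of cell id -> (min_x, min_z, max_x, max_z) in world units.
    Assumes (hypothetically) that chunks form a uniform grid starting at (0, 0).
    """
    cells_by_chunk = defaultdict(list)
    for cell_id, (min_x, min_z, max_x, max_z) in cell_bounds.items():
        # Turn the cell's bounds into a clamped range of chunk indices
        # instead of checking the cell against every chunk on the map.
        first_cx = max(0, int(min_x // chunk_width))
        last_cx = min(chunks_x - 1, int(max_x // chunk_width))
        first_cz = max(0, int(min_z // chunk_height))
        last_cz = min(chunks_z - 1, int(max_z // chunk_height))
        for cx in range(first_cx, last_cx + 1):
            for cz in range(first_cz, last_cz + 1):
                cells_by_chunk[(cx, cz)].append(cell_id)
    return cells_by_chunk
```

With this shape, the work per cell depends only on how many chunks the cell actually spans (usually one or two), not on the total chunk count, and the only allocations are the result lists themselves.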
After that, I spent 2 1/2 days working on better chunk refreshing. I noticed that my program was spending a tremendous amount of time waiting. Semaphore.WaitForSignal and other locking methods were consistently large contributors to the cost of loading chunks. I figured this was caused by CPU/GPU locking and looked for ways to prevent it.
This turned out to be quite a task. I started out by modifying OrientationBaker so that multiple chunks could be refreshed at the same time. This required creating OrientationSubBakers that served the needs of individual chunks, and also required chunks to wait for their assigned SubBaker to finish before attempting to create their heightmap tasks. I also made use of AsyncGPUReadback to avoid hangs from things like ReadPixels().
I then spent some time applying the new OrientationBaker paradigm to TerrainBaker so that multiple bake textures could be processed at once. That required a little bit more complexity because I needed to support multiple texture sizes, but ended up being fairly straightforward.
After that, I noticed that TerrainBaker had a ton of unnecessary culling tasks to perform because it was trying to render all of the bake contributions from all of the chunks on the map. So I used a simple layering technique to reduce the cost of culling considerably.
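The layering technique isn't described in detail. A minimal sketch of how such a scheme might work, assuming it relies on Unity-style layers and a camera culling mask (a bitmask with one bit per layer): move only the contributions of the chunk being baked onto a reserved layer, and have the bake camera consider that layer alone. The Python below only simulates the bitmask logic; `ACTIVE_BAKE_LAYER`, `BakeContribution`, and the function names are hypothetical.

```python
from dataclasses import dataclass

DEFAULT_LAYER = 0
ACTIVE_BAKE_LAYER = 8  # hypothetical layer reserved for the chunk being baked

@dataclass
class BakeContribution:
    chunk: int
    layer: int = DEFAULT_LAYER

def layer_mask(*layers):
    """Build a culling mask with one bit set per layer."""
    mask = 0
    for layer in layers:
        mask |= 1 << layer
    return mask

def considered_by_camera(contributions, culling_mask):
    """Return the contributions whose layer bit is set in the culling mask."""
    return [c for c in contributions if culling_mask & (1 << c.layer)]

def contributions_for_bake(contributions, chunk):
    """Put one chunk's contributions on the bake layer and filter by it."""
    for c in contributions:
        c.layer = ACTIVE_BAKE_LAYER if c.chunk == chunk else DEFAULT_LAYER
    return considered_by_camera(contributions, layer_mask(ACTIVE_BAKE_LAYER))
```

The point of the mask is that everything on other layers is rejected by a single bit test, so the bake camera never has to cull the contributions of chunks it isn't baking.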
I ended the multithreading improvements by adding multithreaded alphamap calculations. This didn't take long, and it's not clear how useful it'll be given the relative ease of alphamap calculations. But hopefully it helps.
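As a rough sketch of why alphamaps parallelize easily: each chunk's alphamap depends only on that chunk's own samples, so the chunks can be handed to a worker pool as independent jobs. The Python below is illustrative only. The data layout (rows of per-sample texture weights), the fallback for all-zero samples, and every name are assumptions, and in CPython the global interpreter lock limits the speedup for pure-Python math, unlike C# worker threads.

```python
from concurrent.futures import ThreadPoolExecutor

def normalize(weights):
    """Scale one sample's texture weights so they sum to 1."""
    total = sum(weights)
    if total == 0:
        # Assumed fallback: paint the first texture when nothing is weighted.
        return [1.0] + [0.0] * (len(weights) - 1)
    return [w / total for w in weights]

def compute_alphamap(raw_weights):
    """raw_weights: rows of per-sample texture weights for a single chunk."""
    return [[normalize(sample) for sample in row] for row in raw_weights]

def compute_alphamaps(raw_by_chunk, max_workers=4):
    """Compute every chunk's alphamap as an independent job on a thread pool."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(compute_alphamap, raw_by_chunk.values())
        return dict(zip(raw_by_chunk.keys(), results))
```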
By the time I'd done all this, my maps were loading much faster. It's hard to say yet if I've achieved my performance goals, since things are still changing often, but I definitely got Small maps loading in under 45 seconds and Standard maps around that time.
After that, I started turning to the framerate problem, which had become the dominant problem with the codebase. I quite quickly discovered that culling was a huge expense, probably because I was trying to render every single terrain on the map all at once. With a bit of math I created a system that efficiently figures out which chunks are in the camera's frustum and which aren't. It enables everything visible and disables everything invisible. This helped considerably with culling.
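The math behind that system isn't given here. One standard way to decide whether a chunk can be seen is to test its axis-aligned bounding box against the frustum's planes, checking only the box corner that lies furthest along each plane's normal. The Python sketch below shows that test under stated assumptions: planes are `(nx, ny, nz, d)` tuples with normals pointing into the frustum, chunks are plain dicts, and all names are hypothetical. The test is conservative, so a box near a frustum corner may occasionally be reported visible when it isn't.

```python
def aabb_outside_plane(box_min, box_max, plane):
    """True if the whole box lies on the negative side of the plane."""
    nx, ny, nz, d = plane
    # Pick the corner furthest along the plane normal; if even that
    # corner is behind the plane, the entire box is behind it.
    px = box_max[0] if nx >= 0 else box_min[0]
    py = box_max[1] if ny >= 0 else box_min[1]
    pz = box_max[2] if nz >= 0 else box_min[2]
    return nx * px + ny * py + nz * pz + d < 0

def aabb_in_frustum(box_min, box_max, planes):
    """Conservative visibility test against inward-facing planes."""
    return not any(aabb_outside_plane(box_min, box_max, p) for p in planes)

def refresh_chunk_visibility(chunks, planes):
    """Enable chunks whose bounds may be visible and disable the rest."""
    for chunk in chunks:
        chunk["enabled"] = aabb_in_frustum(chunk["min"], chunk["max"], planes)
```

For a mostly top-down camera over a regular chunk grid, the same result could likely be reached more cheaply by projecting the frustum onto the ground and enabling only the chunk indices inside that footprint, but whether the project does this is not stated.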
During this time, I started to realize that updating terrain materials was a huge per-frame expense. I spent a bunch of time trying to figure out why that is and came to a number of conclusions. First of all, the number of textures painted onto a given terrain has a huge impact on draw calls. Every 4 textures demand an additional draw call that re-processes every single vertex on the terrain. I already knew this, but I didn't know it was inducing material updates. So I reduced the number of textures in my terrains to 8 and improved performance a bit. I'd imagine I can reduce this cost even further by handing terrains only the textures they need for the structures they have, but that's future work.
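The "one extra draw call per 4 textures" rule reduces to a ceiling division. A small Python sketch of the arithmetic, assuming every terrain has at least one texture (the constant and function names are made up for illustration):

```python
import math

TEXTURES_PER_PASS = 4  # each terrain pass blends up to 4 textures

def terrain_passes(texture_count):
    """Draw calls needed to paint texture_count textures on one terrain."""
    return math.ceil(texture_count / TEXTURES_PER_PASS)

print(terrain_passes(8))   # 2
print(terrain_passes(12))  # 3
print(terrain_passes(4))   # 1
```

So a terrain with 8 textures is drawn twice, and one that could get by with 4 or fewer (a water-heavy chunk, say) would be drawn once.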
I also learned that there's a significant material-configuring overhead tied to the number of terrains. Apparently every new terrain needs to update all its splat materials, plus probably a bunch of other things, and that takes a lot of time. So I decided to double the dimensions of my terrain (and its supporting assets). That considerably improved per-frame performance, though I don't have a good grasp on how it might've affected other parts of the codebase.
After the above work, I ended up with a much more performant codebase that's starting to narrow in on my performance targets. Unfortunately, map creation is now so complex and spread across so many frames that it's very difficult to figure out what's going on. I'll need to make major organizational changes if I want to get Huge maps under control. And towards the end of the sprint I found myself at a loss as to what I should be doing. The sprint often lacked focus and discipline, and while I made a lot of progress, I also wasted a lot of time by being confused and without guidance.
Retrospective
What went well?
- I managed to use the frame debugger to successfully diagnose a complex and important problem. And in the process I've learned a decent amount about terrains and how they tend to function.
- I've substantially improved the startup and per-frame efficiency of my codebase.
- I successfully navigated a fairly extensive and complicated CPU/GPU locking problem, which feels like a pretty significant multithreading task. My ability to address it suggests I'm learning a lot about how to manage both pieces of hardware simultaneously.
What could be improved?
- I'm having a hard time dividing performance goals into self-contained tasks, which is causing focus and morale problems. I need to find a better task paradigm than the one I've been using. Perhaps I could use hypothesis-based issues? Present a suggestion that may or may not work, and resolve the task when the suggestion's been implemented. That might help.
- I think I've lost focus on the long-term planning side of things. It's no longer clear if quick Huge map loading times are particularly important in the face of everything I need to do. Hell, I don't even have a clear sense of what still needs to get done. I should probably spend some time getting my long-term, big-picture priorities in line so that future sprints will be more fruitful.
- I bet I can reduce the number of draw calls by handing Terrains only the textures they need. Doing so will complicate alphamap calculations considerably, but I think it'll be worth it if I can get some chunks (especially water-heavy ones) rendering with half as many draw calls.
- It occurs to me that I can probably change the heightmap resolution of different chunks based on the complexity of their features. Chunks that have only flatland and water don't need as much heightmap resolution as those with hills and mountains, which probably don't need as much as those with rivers. Reducing heightmap resolution will substantially reduce the cost of calculating heightmaps and rendering the resulting terrain, which would be a major win.
- My map generation and rendering process is now so complex and spread across so many frames that it's become almost impossible to profile. I should consider pulling chunk refreshing away from coroutines and towards a dedicated class that can perform all the rendering calculations in a single frame if desired. I could probably take the IEnumerators produced for the coroutines and run through them manually. Or I could figure out how to aggregate contributions across multiple frames, perhaps with a new profiling tool. I just need to aggregate all the work so I can wrap my head around it.
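The idea of running through the coroutine enumerators manually can be sketched with Python generators, which behave much like C# IEnumerators: a driver steps the routine until it is exhausted, descending into any nested routine it yields, and times the whole thing as one unit. Everything below is a hypothetical illustration, including the routine names. It also assumes no step waits on the GPU or a timer; a routine that yields until an async readback completes could not simply be drained in one frame this way.

```python
import time
from types import GeneratorType

def run_to_completion(routine):
    """Drain a generator-based routine in one go.

    Returns (steps, seconds) so the whole refresh can be profiled as a unit.
    """
    start = time.perf_counter()
    steps = 0
    stack = [routine]
    while stack:
        try:
            yielded = next(stack[-1])
        except StopIteration:
            stack.pop()  # this routine is finished; resume its parent
            continue
        steps += 1
        if isinstance(yielded, GeneratorType):
            stack.append(yielded)  # run the nested routine before resuming
    return steps, time.perf_counter() - start

def bake_heightmap(log):
    log.append("heightmap-start")
    yield  # a coroutine would wait a frame here
    log.append("heightmap-end")

def refresh_chunk(log):
    log.append("orientation")
    yield
    yield bake_heightmap(log)  # nested routine, like yielding an IEnumerator
    log.append("alphamap")
```

Calling `run_to_completion(refresh_chunk(log))` performs the whole refresh inside a single call, so a profiler sees one contiguous block of work instead of slices scattered across frames.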