08 ‐ Archives - DaymareOn/hdtSMP64 GitHub Wiki

The User Guide Changelog before the wiki

1.0.0: Initial version

1.1.0: Added

  1. the console commands,
  2. the section on what FSMP does and doesn’t do,
  3. the “About CUDA” section,
  4. the Congratulations section.

1.2.0: Added

  1. the mistakenly forgotten ousnius in the credits, kudos to him!
  2. The “how to update config quickly” section
  3. The definition of the actor states in the “smp list” and “smp detail” console commands.
  4. Auto determination of the Skyrim version
  5. FSMPM – The FSMP MCM
  6. PrivateProfileRedirector SE - Faster game start (INI file cacher)

1.3.0: Added

  1. Fixing the jittering on BHUNP

1.4.0: Updated

  1. Updated the compilation howto for 1.6.1130

1.5.0: Updated

  1. Updated the compilation howto for 1.6.1170

1.6.0: Updated

  1. Updated the compilation howto for Skyrim GOG 1.6.1179

FSMP and the previous ways to manage HDT-SMP

FSMP – Faster HDT-SMP

It was forked from Karonar1’s source code and reintegrates Alandtse’s source code for VR.

  • Original upload: 22 October 2021
  • For 1.5.97, VR, AE

Alandtse source code for VR

Alandtse’s code, forked from Karonar1’s code.

Karonar1 source code

Karonar1’s code, forked from aers’s code.

  • Last improvement: 1st June 2021

HDT-SMP (Skinned Mesh Physics)

Aers’s excellent mod, forked from hydrogensaysHDT’s source code.

hydrogensaysHDT source code

He is the original author.

Easy wind

It is a repackaging by yours truly of HDT SMP Wind, which nexusid1234 built from a fork of aers’s code. The source code of that fork is not available.

CUDA

The CUDA-specific algorithms have not been supported since 3.0.0. Although some parts could run faster on the GPU, the time cost of moving data from the CPU to the GPU and back was too high for this to yield an overall speed improvement.

Here is the info on CUDA/Radeon that was previously in the Readme:

CUDA support

CUDA support is disabled by default, but can be enabled in configs.xml or from the console. It will automatically fall back to the CPU algorithm if you do not have any CUDA-capable cards. However, it does not check the capabilities of any cards it finds, so it may crash if your card is too old. It was developed for a GeForce 10 series card, so it should work on those or anything newer.

The absolute minimum required compute capability is 3.5, to support dynamic parallelism. However, the plugin is compiled for compute capability 5.0 for better performance with atomic operations. Therefore you will need at least a Maxwell card to use the stock plugin, or a late model Kepler card if you compile it yourself.
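The two thresholds above can be read as a simple three-way classification. The following is a minimal sketch, not code from the plugin: the function name `classify_card` and the result labels are hypothetical, and the compute capability is assumed to be supplied as a `(major, minor)` pair that the caller has already obtained from the driver.

```python
from typing import Tuple

# Thresholds taken from the text above.
MIN_CAPABILITY = (3, 5)    # needed for dynamic parallelism
STOCK_CAPABILITY = (5, 0)  # what the stock plugin is compiled for


def classify_card(capability: Tuple[int, int]) -> str:
    """Classify a card by its (major, minor) compute capability.

    Hypothetical helper: returns "stock" if the stock plugin should run,
    "self-compiled" if only a custom build targeting 3.5 could run,
    and "unsupported" otherwise.
    """
    # Tuples compare element by element, so (3, 5) < (5, 0) < (6, 1).
    if capability >= STOCK_CAPABILITY:
        return "stock"
    if capability >= MIN_CAPABILITY:
        return "self-compiled"
    return "unsupported"
```

For example, a GeForce 10 series card reports compute capability 6.1, so `classify_card((6, 1))` gives `"stock"`, while `classify_card((3, 5))` gives `"self-compiled"`.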

If you have more than one CUDA-capable card, you can select the one used for physics calculations in the configuration file. The code is designed to allow it to be changed at runtime, but there is currently no console command to do this. By default it will use device 0, which is usually the most powerful card available.

The following parts of the collision algorithm are currently GPU accelerated:

  • Vertex position calculations for skinned mesh bodies
  • Collider bounding box calculations for per-vertex and per-triangle shapes
  • Aggregate bounding box calculation for internal nodes of collider trees
  • Building collider lists for the final collision check
  • Sphere-sphere and sphere-triangle collision checks
  • Merging collision results (note that this may reduce performance for objects with lots of bones, as the merge buffer can get quite big - still working on this!)
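To make two of the steps above concrete, here is a CPU-side sketch of an aggregate bounding box calculation and a sphere-sphere collision check. It only illustrates the geometry, under the assumption that bounding boxes are axis-aligned. It is not the plugin's CUDA code, and the names `Aabb`, `merge_aabbs` and `spheres_overlap` are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class Aabb:
    """Axis-aligned bounding box given by its minimum and maximum corners."""
    lo: Vec3
    hi: Vec3


def merge_aabbs(boxes: Iterable[Aabb]) -> Aabb:
    """Return the smallest box enclosing all child boxes of a tree node."""
    boxes = list(boxes)
    if not boxes:
        raise ValueError("an internal node needs at least one child box")
    lo = tuple(min(b.lo[i] for b in boxes) for i in range(3))
    hi = tuple(max(b.hi[i] for b in boxes) for i in range(3))
    return Aabb(lo, hi)


def spheres_overlap(c1: Vec3, r1: float, c2: Vec3, r2: float) -> bool:
    """Two spheres touch or overlap when the distance between their
    centers is at most the sum of their radii (compared squared to
    avoid a square root)."""
    dist_sq = sum((a - b) ** 2 for a, b in zip(c1, c2))
    return dist_sq <= (r1 + r2) ** 2
```

On a GPU, the per-vertex or per-triangle versions of these checks would typically run as one thread per element. The sketch keeps them sequential for readability.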

The following parts are still CPU-based:

  • Converting collision results to manifolds for the Bullet library to work with
  • And, of course, the solver itself, which is part of the Bullet library, not the HDT-SMP plugin

This is still experimental, and may not give good performance. The old CPU collision algorithm was heavily optimized, so matching its framerate is not easy. You will need a high end GPU and a low end CPU to see any real performance benefits.

  • On a 6850K processor (6 cores, 3.6GHz) with a 1080Ti GPU, framerate in crowded areas is a little worse than with the CPU-only algorithm. Most of the time, both algorithms easily reach the framerate cap at 60fps.
  • On the same hardware with only two cores enabled, the total collision time is around 2-3 times faster on GPU than CPU. Of course, this translates to a less impressive framerate difference, because collision checking is only one part of the HDT-SMP algorithm.
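The gap between collision speedup and framerate gain follows Amdahl's law: only the collision share of the frame gets faster. The sketch below assumes a hypothetical share of the frame time spent on collision checking; the original Readme does not give that number, so the 40% in the usage note is purely illustrative.

```python
def frame_speedup(collision_fraction: float, collision_speedup: float) -> float:
    """Overall frame speedup by Amdahl's law.

    collision_fraction: share of the frame time spent on collision
        checking before the speedup (hypothetical input, 0.0 to 1.0).
    collision_speedup: how many times faster collision checking becomes.
    """
    if not 0.0 <= collision_fraction <= 1.0:
        raise ValueError("collision_fraction must be between 0 and 1")
    if collision_speedup <= 0.0:
        raise ValueError("collision_speedup must be positive")
    # The untouched part keeps its cost; the collision part shrinks.
    remaining = (1.0 - collision_fraction) + collision_fraction / collision_speedup
    return 1.0 / remaining
```

If collision checking took 40% of the frame and became 2.5 times faster, `frame_speedup(0.4, 2.5)` evaluates to about 1.32, so the frame as a whole would only be about a third faster.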

Radeon support?

Nope, sorry. CUDA and nVidia cards are pretty much the industry standard for scientific computing, so that's what I use. In any case, I can't support GPU architectures that I don't have. The plugin should work fine in CPU mode with any type of card - I have tested it with Radeon 7850 and no nVidia drivers installed.

The plugin will use the most powerful available CUDA-capable card, regardless of whether it's being used for anything else. In theory, it's possible to have a Radeon as the primary graphics card, and an nVidia card in the same machine for CUDA and physics. I've tested this with two GeForce cards (a 1080Ti and a 1030) in the same system, but not a Radeon and a GeForce.

SMP console commands for the CUDA version

smp gpu

(For the CUDA version only) Toggles the CUDA collision algorithm, if there is at least one CUDA device available. If there is no device available, it does nothing.

smp timing

(For the CUDA version only) Starts a timing sequence for the collision detection algorithm. The next 200 frames will switch between CPU and GPU collision. Once complete, the mean and standard deviation of the timings for the two collision algorithms are displayed on the console.
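The statistics reported by such a timing run can be sketched as follows. This is not the plugin's implementation: it assumes the algorithms strictly alternate every frame, starting with the CPU, and it uses the sample standard deviation, neither of which is confirmed by the text above. The name `summarize_timings` is hypothetical.

```python
from statistics import mean, stdev
from typing import Dict, Sequence, Tuple


def summarize_timings(frame_times_ms: Sequence[float]) -> Dict[str, Tuple[float, float]]:
    """Split alternating per-frame collision timings and summarize them.

    Assumption: even-indexed frames used the CPU algorithm and
    odd-indexed frames used the GPU algorithm.
    Returns (mean, sample standard deviation) for each algorithm.
    """
    cpu = frame_times_ms[0::2]
    gpu = frame_times_ms[1::2]
    if len(cpu) < 2 or len(gpu) < 2:
        raise ValueError("need at least two samples per algorithm")
    return {
        "cpu": (mean(cpu), stdev(cpu)),
        "gpu": (mean(gpu), stdev(gpu)),
    }
```

With 200 frames this yields 100 samples per algorithm. Alternating the algorithms instead of running them back to back means both are measured under roughly the same scene load.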