Solver - nasa/gunns GitHub Wiki

Solver

Config Data

Just like links, the GUNNS solver has config data that is very important to set up correctly for your network. These parameters are set in the network's container shape in GunnsDraw.

convergenceTolerance

Only applicable to Non-Linear networks. This is the amount of change in node potential from one minor step to the next that all nodes must drop below or equal to before the Non-Linear network solution converges. For instance, if this value is 0.001 and a node's minor step potential solutions are {1.0, 1.1, 1.11, 1.111, ...}, then that node converges on the 4th minor step, because 1.111 - 1.11 = 0.001. The entire network converges on the first minor step where all of the nodes' delta potentials are at or below this value. This delta carries over from the previous major step: if the final solution of the previous major step was 1.111, and the first minor step of the next major step is 1.112, then that node has already converged on that minor step.
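The convergence rule can be sketched in Python. This is a minimal illustration, not the GUNNS solver code: the function name `first_converged_minor_step` and its arguments are hypothetical, and it only models the per-node delta check described above, including the carry-over of the previous major step's final potentials.

```python
from typing import Optional, Sequence


def first_converged_minor_step(
    previous_final: Sequence[float],
    minor_steps: Sequence[Sequence[float]],
    tolerance: float,
) -> Optional[int]:
    """Return the 1-based minor step on which every node has converged.

    previous_final: node potentials at the end of the previous major step.
    minor_steps:    one list of node potentials per minor step.
    Returns None if no minor step converges (a step limit would then apply).
    """
    last = list(previous_final)
    for step, potentials in enumerate(minor_steps, start=1):
        # A node has converged when its change since the last solution is
        # at or below the tolerance; the network needs all nodes converged.
        if all(abs(p - q) <= tolerance for p, q in zip(potentials, last)):
            return step
        last = list(potentials)
    return None
```

For example, with a tolerance of 0.125, a previous final potential of 0.0 and minor step potentials 1.0, 1.5, 1.75, 1.875, the function returns 4. If the previous final potential had been 1.0, it would return 1, because of the carry-over. These values are powers of two so that the floating-point deltas are exact.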

The network solution is not guaranteed to be any more accurate than this value. Smaller tolerance values give better accuracy but require more minor steps to converge, which uses more CPU. Excessively small values can prevent the network from converging before one of the step limits has been exceeded (see below). Larger values need fewer minor steps to converge and use less CPU, but give less accurate solutions.

So there is a trade-off between this value and the CPU cost of your network. Generally you should set this to the largest amount of inaccuracy you can get away with. In other words, if all of the measured outputs of your network are no more accurate than 1.0 (say because potential sensors have an accuracy of 1.0), then use a value of 1.0.

minLinearizationP

Some links, particularly fluid conductors and thermal radiation conductors, linearize a non-linear flow equation as part of their normal operation. This value sets the minimum delta-potential across the link at which the link will apply the linearization. At delta-potentials smaller than this value, the link drops to a purely linear relationship, like a GunnsBasicConductor.

This is useful for minimizing noise in the solution. At very small delta-potentials, links that perform linearizations of a non-linear flow function tend to get very large conductance values, which increases noise.

For this term, pick a value that is less than the minimum delta-potential across all conductors that you want to simulate. This is in the units of potential for your aspect: kPa for fluid, K for thermal, volts for electrical. A value in the range of 1e-6 to 1e-3 is usually a safe guess.
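A minimal sketch of why the floor helps, assuming a hypothetical link whose flow follows a square-root law, flow = C * sqrt(|dP|). The function name and the clamping scheme are illustrative assumptions, not the GUNNS link code. Linearizing gives an admittance of flow / dP = C / sqrt(|dP|), which grows without bound as dP approaches zero; evaluating it at no less than minLinearizationP keeps the admittance constant, and so purely linear, below the threshold.

```python
import math


def linearized_admittance(conductivity: float, delta_p: float,
                          min_linearization_p: float) -> float:
    """Admittance of a hypothetical link with flow = C * sqrt(|dP|).

    The linearized admittance is flow / dP = C / sqrt(|dP|).  Below
    min_linearization_p the delta-potential is clamped, so the admittance
    stops growing and the link behaves like a fixed linear conductor.
    """
    dp = max(abs(delta_p), min_linearization_p)
    return conductivity / math.sqrt(dp)
```

With C = 1.0 and a floor of 1e-6, any delta-potential below 1e-6 yields the same admittance of about 1000, instead of 1e6 at dP = 1e-12.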

minorStepLimit

Only applicable to Non-Linear networks. This is the maximum number of minor steps that can be performed during any major step. This limit exists to prevent a network from getting stuck in an infinite loop if it isn't converging, and to set a maximum CPU load that the network can use. This number must be > 0 and should be greater than the decompositionLimit. This allows for extra DELAY or REJECT minor steps after a network has converged.

decompositionLimit

Only applicable to Non-Linear networks. This sets the maximum number of Cholesky Decompositions that can be performed during a major step. This limits the CPU load of a network when it is failing to converge. This value must be > 0 and should be less than the minorStepLimit.
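The rules for the two limits can be captured in a small check. This hypothetical helper is not part of GUNNS; it only encodes the constraints stated above: both limits must be greater than zero, and decompositionLimit should be less than minorStepLimit so that extra DELAY or REJECT minor steps remain available after the network converges.

```python
from typing import List


def check_step_limits(minor_step_limit: int, decomposition_limit: int) -> List[str]:
    """Return a list of problems with a minorStepLimit/decompositionLimit pair."""
    problems = []
    if minor_step_limit <= 0:
        problems.append("minorStepLimit must be > 0")
    if decomposition_limit <= 0:
        problems.append("decompositionLimit must be > 0")
    if decomposition_limit >= minor_step_limit:
        # Leaves no room for DELAY/REJECT minor steps after convergence.
        problems.append("decompositionLimit should be less than minorStepLimit")
    return problems
```

For example, check_step_limits(20, 10) returns an empty list, while check_step_limits(10, 10) reports that decompositionLimit should be less than minorStepLimit.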

Solution Methods

Sparsity

Most GUNNS networks have a very sparse (http://en.wikipedia.org/wiki/Sparse_matrix) admittance matrix, with nodes averaging 2 or 3 conductances to other nodes. The most notable exceptions are some passive thermal networks that have radiative paths between almost every combination of nodes.

GUNNS takes advantage of sparsity in a few ways:

  • By using an Islands solving mode, we greatly speed up the Cholesky decomposition when the matrix sparsity leads to the existence of separate islands.
  • The upcoming GUNNS & ROSES upgrade includes an option to use hardware-accelerated sparse solving on the GPU. However, this will only be advantageous for very large networks.
  • Our Cholesky decomposition skips operations on zeroes in [A] in its innermost loop (where most of the CPU time is spent). We found that at low compiler optimization levels (none or -g), this makes the decomposition almost as fast as at -O2, whereas normally -g would be 2-4 times slower.
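Both ideas can be sketched in plain Python. This is an illustration under simplifying assumptions, not the GUNNS implementation: the hypothetical `find_islands` treats any nonzero off-diagonal term of [A] as a connection and groups nodes with a breadth-first search, and `cholesky_skip_zeros` is a textbook dense Cholesky factorization (A = L * L^T) whose innermost loop skips zero terms. Each island's submatrix could then be factored on its own.

```python
import math
from collections import deque
from typing import List

Matrix = List[List[float]]


def find_islands(a: Matrix) -> List[List[int]]:
    """Group node indices into islands (connected components) of [A]."""
    n = len(a)
    seen = [False] * n
    islands = []
    for start in range(n):
        if seen[start]:
            continue
        seen[start] = True
        island, pending = [], deque([start])
        while pending:
            i = pending.popleft()
            island.append(i)
            for j in range(n):
                # A nonzero off-diagonal term means nodes i and j are connected.
                if j != i and a[i][j] != 0.0 and not seen[j]:
                    seen[j] = True
                    pending.append(j)
        islands.append(sorted(island))
    return islands


def cholesky_skip_zeros(a: Matrix) -> Matrix:
    """Return lower-triangular L with A = L * L^T for symmetric positive-definite A."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for j in range(n):
        diag = a[j][j] - sum(low[j][k] ** 2 for k in range(j))
        if diag <= 0.0:
            raise ValueError("matrix is not positive definite")
        low[j][j] = math.sqrt(diag)
        for i in range(j + 1, n):
            total = a[i][j]
            for k in range(j):
                # Innermost loop: skip the multiply-subtract when either
                # factor is zero, which is common for a sparse [A].
                if low[i][k] == 0.0 or low[j][k] == 0.0:
                    continue
                total -= low[i][k] * low[j][k]
            low[i][j] = total / low[j][j]
    return low
```

Skipping a zero term changes nothing numerically, since subtracting 0 * x leaves the sum unchanged; it only saves the arithmetic.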

A common approach to sparse matrices that GUNNS does not use is compression of the admittance matrix. We assume that compressing the matrix is computationally expensive, and only worth doing if it has to be done once (at initialization or offline) or rarely during run. But since GUNNS allows the network topology to change from pass to pass (which we count as one of its strengths), the connections in [A] can change every pass, and re-compressing [A], or re-optimizing its compression, would be required every pass.
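To illustrate why a compressed form would have to be rebuilt, the sketch below records the nonzero pattern of [A] in compressed sparse row (CSR) form. CSR is chosen here only as a common example of compression; the text does not name a specific scheme, and the helper name is hypothetical. Connecting one extra conductor changes the stored index arrays themselves, not just the values.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def csr_pattern(a: Matrix) -> Tuple[List[int], List[int]]:
    """Return the (row_ptr, col_index) arrays of a CSR layout of [A].

    Only the positions of nonzero terms are recorded; the values would be
    stored in a parallel array that is omitted here.
    """
    row_ptr, col_index = [0], []
    for row in a:
        col_index.extend(j for j, value in enumerate(row) if value != 0.0)
        row_ptr.append(len(col_index))
    return row_ptr, col_index
```

For a 3-node network where node 2 is isolated, the pattern is ([0, 2, 4, 5], [0, 1, 0, 1, 2]); after a conductor between nodes 1 and 2 is connected, it becomes ([0, 2, 5, 7], [0, 1, 0, 1, 2, 1, 2]), so any storage or ordering planned from the old pattern is no longer valid.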

Debugging

Debug Admittance Matrix "Slices"

Whenever you look at the network's Admittance Matrix [A] directly in Trick View, you are seeing it in its decomposed state, which is not very useful. For debugging problems with the solution you usually want to see [A] prior to its decomposition.

The solver class contains several attributes that allow you to see the pre-decomposition [A]. For even moderate-sized networks, [A] is much too large to view as a whole. Instead, we can show you one "slice" at a time. A slice is either one row of [A] or its diagonal, saved into a one-dimensional array for viewing on Trick View, etc.

Rows are useful because each row corresponds to one node in the network, so a row of [A] shows all of the conductance effects incident on that node in [A].

The diagonal is useful because all capacitance effects show up in it, so you can see all of the node capacitances in one view.

In addition to specifying which slice you want to see, you can also specify which minor step to look at. For Non-Linear networks, this is important since [A] can be re-built and decomposed on every minor step, and you might be interested in the unique state of [A] on a specific minor step.

To use the slice feature, set these attributes of the solver object:

  • mDebugDesiredSlice: Controls which slice of the admittance matrix is recorded for debug. A valid row number records that row of the admittance matrix into mDebugSavedSlice. A value below zero or above the last row records the diagonal of the matrix.
  • mDebugDesiredStep: Controls which specific minor step (1, 2 ...) the admittance matrix is recorded on for debug. A value of 0 (the default) turns off this feature, and <0 records the last minor step, whichever that ends up being on any particular major step. To save CPU time, only use this feature when needed for debugging.

When the above are set to a valid combination, the solver records that slice into the mDebugSavedSlice array every major step. This array has size mNetworkSize and starts counting from zero, so to view the entire slice in Trick View, set it to display [0 - (mNetworkSize - 1)].

For example, to see the row in [A] for Node 3 on the final minor step of every major step, set mDebugDesiredSlice = 3, mDebugDesiredStep = -1. To see all the diagonals on minor step 1, set mDebugDesiredSlice = -1, mDebugDesiredStep = 1. To turn off this feature when not in use, set mDebugDesiredStep = 0.
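The selection rules for mDebugDesiredSlice and mDebugDesiredStep can be modeled as a small function. This is a hypothetical sketch of the behavior described above, not the solver's code: it receives the pre-decomposition [A] and the current minor step, and returns the slice that would be recorded, or None when nothing is recorded on that minor step.

```python
from typing import List, Optional

Matrix = List[List[float]]


def debug_slice(a: Matrix, desired_slice: int, desired_step: int,
                minor_step: int, is_last_minor_step: bool) -> Optional[List[float]]:
    """Model of which slice of the pre-decomposition [A] gets recorded."""
    if desired_step == 0:
        return None                      # feature turned off
    if desired_step > 0 and minor_step != desired_step:
        return None                      # not the requested minor step
    if desired_step < 0 and not is_last_minor_step:
        return None                      # waiting for the last minor step
    n = len(a)
    if 0 <= desired_slice < n:
        return list(a[desired_slice])    # one row = one node's conductances
    return [a[i][i] for i in range(n)]   # out of range: the diagonal
```

For example, debug_slice(A, 1, -1, 3, True) returns row 1 of A on the last minor step, and debug_slice(A, -1, 1, 1, False) returns the diagonal on minor step 1.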

Debug Nodes

This debug mode is useful for Non-Linear networks. It shows a node's potential solution for each minor step. This is useful for seeing the converging or non-converging trend of the node's potential during all the minor steps in a major step, and specific values at each minor step that might point to problems.

To use this feature, set the following:

  • mDebugDesiredNode: Controls which node's minor step potentials are recorded into mDebugSavedNode. A value <0 or above the last node number turns off this feature. The solver sets this term to -1 to pause recording when the network fails to converge.

As long as the network is converging, setting the above to a valid node # stores its minor step potentials into the mDebugSavedNode array on every major step. As soon as the network fails to converge, this will freeze to keep showing the minor steps on the first major step that failed. The first position in the array, [0], records the node # being recorded. The subsequent array positions (1, 2 ... mMinorStepLimit) show that node's potential solution on those minor steps. To view the entire node history in Trick View, set it to display [1 - mMinorStepLimit].

To turn this feature off, set mDebugDesiredNode = -1.
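The layout of mDebugSavedNode and the freeze-on-failure behavior can be modeled as below. This is a hypothetical sketch, not the solver's code. It assumes that unused trailing positions are cleared to zero when fewer minor steps run than the limit allows; the text does not say what happens to them.

```python
from typing import List, Sequence


class DebugNodeRecorder:
    """Model of the mDebugDesiredNode / mDebugSavedNode behavior.

    saved[0] holds the recorded node number; saved[1:] holds that node's
    potential on minor steps 1, 2, ... up to the minor step limit.
    """

    def __init__(self, minor_step_limit: int) -> None:
        self.desired_node = -1
        self.saved: List[float] = [0.0] * (minor_step_limit + 1)

    def record_major_step(self, potentials: Sequence[float], converged: bool,
                          num_nodes: int) -> None:
        """potentials: the node's solution on each minor step of this major step."""
        if not 0 <= self.desired_node < num_nodes:
            return                                  # off, or frozen after a failure
        history = list(potentials)[: len(self.saved) - 1]
        # Assumption: unused trailing positions are cleared to zero.
        padding = [0.0] * (len(self.saved) - 1 - len(history))
        self.saved = [float(self.desired_node)] + history + padding
        if not converged:
            self.desired_node = -1                  # freeze on the first failure
```

Once a major step fails to converge, desired_node becomes -1, so later major steps leave the saved history of the failed step untouched until a node number is entered again.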

Minor Step Log

This describes the updated step logger in v19.2.

The GUNNS solver contains an internal minor step log that can record the inputs and outputs of minor steps and save them to an output file. This is mainly useful for debugging non-linear networks, but it can be used for linear networks as well.

The logger is the .mStepLog object in the solver. This object is of type GunnsMinorStepLog, found in gunns/core/GunnsMinorStepLog.hh. Every solver has one, but by default it is inactive. The log takes a bit of set-up to use properly. This describes how to set it up and use it in a Trick sim.

Set up the asynchronous thread & job:

Add a Trick job to call the step logger's updateAsync() function, and assign this job to a Trick PROCESS_TYPE_ASYNC_CHILD thread. This is an asynchronous thread that restarts at a certain interval, but allows its jobs to run longer than that interval without interfering with Trick's real-time threads. The updateAsync() function does the actual file write -- we put it in an async thread to prevent interference with real-time, since file I/O can be slow. There are several files to update to make all this happen:

  • First add the job to your sim object's .sm file. This should be the same .sm that holds your network. The network already has init and scheduled run jobs, and possibly also a restart job. This new job will go along with them. For example in a .sm file, where 'threadAsync' is a thread identifier argument passed to the sim object, and RATE is a scheduled model execution rate for 'scheduled' job types (but will be ignored by Trick since we'll be in an async thread):
            /*------------------------------------ Async Jobs ------------------------------*/
            CthreadAsync (RATE, "scheduled") system.subsystem.network.netSolver.mStepLog.updateAsync();
  • Next, create a thread for such asynchronous models in your sim by adding a new line in the sim bus icd folder's thread_definition.icd file, like so:
ASYNC_THREAD     SystemSubsystemSimObject_async

  • Finally, use the simulation input file to tell Trick that the thread is to be asynchronous, with a restart interval. For example in an input file:
# Set thread 'ASYNC_THREAD' to be asynchronous and attempt to restart every 0.1 seconds:
trick.exec_set_thread_process_type(CTH.getThreadId("ASYNC_THREAD"), trick.PROCESS_TYPE_ASYNC_CHILD)
trick.exec_set_thread_async_cycle_time(CTH.getThreadId("ASYNC_THREAD"), 0.1)

# Optionally, set the CPU affinity of the thread.  In this example we use the same CPU number as the thread ID.
trick.exec_set_thread_cpu_affinity(CTH.getThreadId("ASYNC_THREAD"), CTH.getThreadId("ASYNC_THREAD"))
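The reason for the asynchronous thread can be illustrated with Python's standard threading and queue modules. This is only an analogy, not how Trick schedules jobs: the real-time loop hands finished rows to a queue and never waits on the disk, while a separate writer thread, playing the role of updateAsync(), drains the queue and does the slow file I/O. The names `async_writer`, `run_demo` and `STOP` are hypothetical.

```python
import os
import queue
import tempfile
import threading
from typing import List, Optional

STOP = None  # sentinel: tells the writer thread to finish


def async_writer(rows: "queue.Queue[Optional[str]]", path: str) -> None:
    """Writer thread: drain rows from the queue and do the slow file I/O."""
    with open(path, "a", encoding="utf-8") as out:
        while True:
            row = rows.get()
            if row is STOP:
                break
            out.write(row + "\n")


def run_demo(steps: int) -> List[str]:
    """Simulate a real-time loop that hands rows off, then read the file back."""
    path = os.path.join(tempfile.mkdtemp(), "step_log.csv")
    rows: "queue.Queue[Optional[str]]" = queue.Queue()
    writer = threading.Thread(target=async_writer, args=(rows, path))
    writer.start()
    for step in range(1, steps + 1):
        rows.put(f"{step},SUCCESS")  # returns immediately; no disk access here
    rows.put(STOP)
    writer.join()
    with open(path, encoding="utf-8") as f:
        return f.read().splitlines()
```

run_demo(3) returns ["1,SUCCESS", "2,SUCCESS", "3,SUCCESS"]; the loop that produced those rows never touched the file.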

Set up the unfreeze job:

The step logger has an updateFreeze() function, which is used to resize the log to a new number of steps. This is done in Freeze to avoid interfering with real-time, since this process deletes and re-allocates a lot of memory, which is slow. We actually use Trick's 'unfreeze' job type: these jobs run on exit from Freeze, before entering real-time for Run.

You don't have to add this job, but the step logger's log resize function won't work without it.

This job goes into the same .sm file as your network's other jobs. For example in a .sm file:

            /*----------------------------------- Freeze Jobs ------------------------------*/
            ("unfreeze") system.subsystem.network.netSolver.mStepLog.updateFreeze();
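The deferred resize can be modeled as follows. The class and attribute names are hypothetical, loosely mirroring mInputData.mLogSteps and updateFreeze(): a new size requested during Run is only stored, and the buffer is re-allocated in the unfreeze hook, so the slow allocation never happens in a real-time frame.

```python
from typing import List, Optional


class ResizableStepLog:
    """Model of a log whose capacity changes only on unfreeze."""

    def __init__(self, log_steps: int) -> None:
        self.requested_steps = log_steps   # like mInputData.mLogSteps
        self.buffer: List[Optional[str]] = [None] * log_steps

    def update_freeze(self) -> None:
        # Unfreeze hook: the only place the buffer is (re)allocated.
        if self.requested_steps != len(self.buffer):
            self.buffer = [None] * max(self.requested_steps, 0)
```

Setting requested_steps from 100 to 200 during Run leaves the buffer at 100 entries until update_freeze() runs on the way out of Freeze.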

Configure initial logger state in input file:

With the above changes, you can re-build the sim. Prior to running the sim, add a few lines to your Trick input file to configure the step logger's initial state. There are 3 parameters to set:

  • mStepLog.mInputData.mPath: this is a text string of the folder path, relative to the Trick sim folder, where the output files will be written. It should include the trailing folder separator '/'. For example: './hsLog/'. We recommend putting the output files in the same folder as the Health & Status log. If you don't provide a path here, the logger defaults it to './', which is the sim folder.
  • mStepLog.mInputData.mLogSteps: this is the number of minor steps that will be recorded in each log output file. This defaults to zero. It should be set to > 0 or else the logger won't output a file.
  • mStepLog.mInputData.mModeCommand: this is an enumeration of the initial command state of the logger. The options are:
    • trick.GunnsMinorStepLogInputData.PAUSE: this leaves the logger in a paused state, which won't record new data from the solver. This is the default state.
    • trick.GunnsMinorStepLogInputData.RECORD_AUTO: this causes the logger to record data from the solver, and automatically write an output file when the solver fails to converge in a major step.
    • trick.GunnsMinorStepLogInputData.RECORD_SNAP: this causes the logger to record data from the solver, but the logger will only write an output file when the user manually changes this to SNAP during run.

For example, here are all three of the above set in an input file:

# logger is a reference to the step logger, for easy reference below:
logger = system.subsystem.network.netSolver.mStepLog
# Now configure the 3 terms as above:
logger.mInputData.mModeCommand = trick.GunnsMinorStepLogInputData.RECORD_AUTO
logger.mInputData.mLogSteps = 200
logger.mInputData.mPath = "./hsLog/"

You can also change those 3 values during run via Trick View. However, a change to mLogSteps won't take effect and resize the log until you go to Freeze and back to Run again, and only if you have set up the unfreeze job as above.

Output files and contents:

The logger outputs a comma-separated value (.csv) file. Each file name combines the step logger's variable name with the network major step count at the time the file write was initiated. For example: system.subsystem.network.netSolver.mStepLog_522.csv.

The contents of the file are one row per solver minor step, with a header row at the top. The columns contain:

  • Major step count
  • Minor step count within the current major step
  • Decomposition count within the current major step
  • Overall network solution result: SUCCESS, CONFIRM, REJECT, DELAY, DECOMP_LIMIT, MATH_FAIL
  • Network node potentials of the minor step solution
  • Network node convergence (delta-potential between minor steps)
  • For each link in the network:
    • Link result of its confirmSolutionAcceptable call from the solver for the minor step (CONFIRM, DELAY, or REJECT)
    • Link's admittance matrix that went into this minor step's system of equations
    • Link's source vector that went into this minor step's system of equations
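A minimal Python sketch for scanning a log file with the standard csv module. It assumes only what is described above: one header row, then one row per minor step whose first four columns are the major step count, minor step count, decomposition count and overall solution result. The exact header text and the remaining columns are not assumed, and the helper names are hypothetical. It also pulls the major step number out of a file name such as system.subsystem.network.netSolver.mStepLog_522.csv.

```python
import csv
import re
from typing import Iterable, List, Optional, Tuple


def major_step_from_filename(filename: str) -> Optional[int]:
    """Extract the major step count from a name like '..._522.csv'."""
    match = re.search(r"_(\d+)\.csv$", filename)
    return int(match.group(1)) if match else None


def non_success_steps(lines: Iterable[str]) -> List[Tuple[int, int, int, str]]:
    """Return (major, minor, decompositions, result) for rows that are not SUCCESS.

    lines can be an open file object or any iterable of CSV text lines.
    """
    reader = csv.reader(lines)
    next(reader, None)  # skip the header row
    found = []
    for row in reader:
        if len(row) < 4:
            continue  # ignore blank or truncated rows
        major, minor, decomps, result = row[:4]
        result = result.strip()
        if result != "SUCCESS":
            found.append((int(major), int(minor), int(decomps), result))
    return found
```

In a script, a file can be passed directly: with open(name, newline="") as f: print(non_success_steps(f)).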