Constraint Solvers

Constraints

ange-yaghi / engine simulator

Ten Minute Physics

Position Based Dynamics

Projective Dynamics

Differentiable Projective Dynamics

Projective Dynamics Description:

As shown in sections 1.3.2 (The Global Solve) and 1.3.3 (A Simple Example) of the paper Scalable Partitioning for Parallel Position Based Dynamics, Fratarcangeli (2018), the final system $A x = b$ is as follows: $$A_{ii} = m_i/dt^2 + \sum_j k_{ij}$$ $$A_{ij} = -k_{ij} \quad (j \neq i)$$

$$\vec b_i = (m_i/dt^2) \vec p_i + \sum_j k_{ij} \vec d_{ij}$$

where:

  • $dt$ is the time-step.
  • $m_i$ is the mass of point $i$.
  • $k_{ij}$ is the stiffness of the bond between points $i,j$.
  • $\vec d_{ij}=l_{ij}(\vec x_i - \vec x_j)/| \vec x_i - \vec x_j|$ is the bond vector between points $i,j$ renormalized to its rest length $l_{ij}$.
  • $\vec p_i = \vec x_i + \vec v_i dt + \vec f^{ext}_i (1/m_i)dt^2$ is the predicted new position of point $i$ ignoring the constraints.
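
To make the assembly concrete, here is a minimal dense sketch in plain C++ (the names Vec3, Bond and assemble_system are hypothetical, for illustration only, not FireCore's actual implementation) that builds $A$ and $\vec b$ from the formulas above:

#include <vector>
#include <cmath>

struct Vec3 { double x, y, z; };
struct Bond { int i, j; double k, l0; };   // endpoint indices, stiffness k_ij, rest length l_ij

// pos : current positions x_i;  pred : predicted positions p_i
// A   : dense n x n matrix (row-major);  b : right-hand-side vectors
void assemble_system( const std::vector<Vec3>& pos, const std::vector<Vec3>& pred,
                      const std::vector<double>& mass, const std::vector<Bond>& bonds,
                      double dt, std::vector<double>& A, std::vector<Vec3>& b ){
    const int    n    = (int)pos.size();
    const double idt2 = 1.0/(dt*dt);
    A.assign( n*n, 0.0 );
    b.assign( n, Vec3{0,0,0} );
    for(int i=0; i<n; i++){
        A[i*n+i] = mass[i]*idt2;                                 // A_ii starts as m_i/dt^2
        b[i] = { mass[i]*idt2*pred[i].x, mass[i]*idt2*pred[i].y, mass[i]*idt2*pred[i].z };
    }
    for(const Bond& bd : bonds){
        const int i=bd.i, j=bd.j;
        Vec3 d = { pos[i].x-pos[j].x, pos[i].y-pos[j].y, pos[i].z-pos[j].z };
        const double s = bd.l0/std::sqrt(d.x*d.x + d.y*d.y + d.z*d.z);
        d = { d.x*s, d.y*s, d.z*s };                             // d_ij, renormalized to rest length
        A[i*n+i] += bd.k;   A[j*n+j] += bd.k;                    // A_ii += k_ij
        A[i*n+j] -= bd.k;   A[j*n+i] -= bd.k;                    // A_ij  = -k_ij
        b[i].x += bd.k*d.x;  b[i].y += bd.k*d.y;  b[i].z += bd.k*d.z;   // b_i += k_ij d_ij
        b[j].x -= bd.k*d.x;  b[j].y -= bd.k*d.y;  b[j].z -= bd.k*d.z;   // d_ji = -d_ij
    }
}

In practice $A$ is sparse and would be stored accordingly; the dense form here is only for readability.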

A very useful property of this matrix $A$ is that it usually does not change during the simulation, since it depends only on the masses of the points and the stiffnesses of the bonds (i.e. on the bonding topology of the molecule or mesh and the force-field parameters, which do not change during the simulation). Therefore $A$ can be pre-factored by Cholesky decomposition, or we can develop an efficient pivoting scheme for Gauss-Seidel (reordering the points), or a pre-conditioner for Conjugate Gradient.

Projective Dynamics: iterative solution vs Verlet update

Let's assume we solve the system $A x = b$ by an iterative method such as Jacobi or Gauss-Seidel. In this case we update the point positions by the formula: $$\vec x_i^{new} = \big( \vec b_i - \sum_{j \neq i} A_{ij} \vec x_j \big)/ A_{ii}$$ which expands to $$\vec x_i^{new} = \frac{ (m_i/dt^2) \vec p_i + \sum_j k_{ij} ( \vec d_{ij} + \vec x_j) }{ m_i/dt^2 + \sum_j k_{ij} }$$

For preconditioning, pivoting and rationalization of the expression it is useful to divide both the numerator and the denominator by $K_i = \sum_j k_{ij}$.

$$\vec x_i^{new} = \frac{ ((m_i/K_i)/dt^2) \vec p_i + \sum_j (k_{ij}/K_i) ( \vec d_{ij} + \vec x_j) }{ (m_i/K_i)/dt^2 + 1 }$$

Now we can define the convenient substitutions $c_i = (m_i/K_i)/dt^2$ and $w_{ij} = k_{ij}/K_i$ (notice that $\sum_j w_{ij} = K_i/K_i = 1$).

$$\vec x_i^{new} = \frac{ c_i \vec p_i + \sum_j w_{ij} ( \vec d_{ij} + \vec x_j) }{ 1 + c_i }$$

Now we can consider the change of position of point $i$, $\Delta \vec x_i = \vec x_i^{new} - \vec x_i$, during one step of the iterative algorithm (consider a Jacobi update).

$$\Delta \vec x_i = \frac{ c_i \vec p_i + \sum_j w_{ij} ( \vec d_{ij} + \vec x_j) - \vec x_i (1 + c_i) }{ 1 + c_i }$$

$$\Delta \vec x_i = \frac{ c_i ( \vec p_i - \vec x_i) + \sum_j w_{ij} ( \vec d_{ij} + \vec x_j - \vec x_i ) }{ 1 + c_i }$$ where we used $\sum_j w_{ij} = 1$ to move $\vec x_i$ inside the sum.

Now we can express the equation in terms of the predicted change of position of the point, $\Delta \vec x_i^{pred} = \vec p_i - \vec x_i = \vec v_i dt + \vec f^{ext}_i (1/m_i)dt^2$.

$$\Delta \vec x_i = \frac{ c_i \Delta \vec x_i^{pred} + \sum_j w_{ij} ( \vec d_{ij} + \vec x_j - \vec x_i ) }{ 1 + c_i }$$ Notice that $\vec d_{ij} + \vec x_j - \vec x_i$ is just the violation of bond $ij$, which vanishes when the bond is at its rest length.
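
To make the update concrete, here is a minimal sketch of one Jacobi sweep in plain C++ (the names Neigh and jacobi_sweep and the neighbor-list layout are illustrative assumptions; Vec3 is the struct from the assembly sketch above):

#include <vector>
#include <cmath>

struct Neigh { int j; double w, l0; };   // neighbor index j, weight w_ij, rest length l_ij

// One Jacobi sweep:  x_new_i = ( c_i p_i + sum_j w_ij (d_ij + x_j) ) / (1 + c_i)
void jacobi_sweep( const std::vector<Vec3>& x, const std::vector<Vec3>& p,
                   const std::vector<double>& c,
                   const std::vector<std::vector<Neigh>>& neighs,
                   std::vector<Vec3>& x_new ){
    for(int i=0; i<(int)x.size(); i++){
        Vec3 acc = { c[i]*p[i].x, c[i]*p[i].y, c[i]*p[i].z };   // c_i * p_i
        for(const Neigh& nb : neighs[i]){
            Vec3 d = { x[i].x-x[nb.j].x, x[i].y-x[nb.j].y, x[i].z-x[nb.j].z };
            const double s = nb.l0/std::sqrt(d.x*d.x + d.y*d.y + d.z*d.z);   // renormalize to l_ij
            acc.x += nb.w*( d.x*s + x[nb.j].x );                // w_ij * ( d_ij + x_j )
            acc.y += nb.w*( d.y*s + x[nb.j].y );
            acc.z += nb.w*( d.z*s + x[nb.j].z );
        }
        const double inv = 1.0/(1.0 + c[i]);
        x_new[i] = { acc.x*inv, acc.y*inv, acc.z*inv };
    }
}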

Projected Gauss-Seidel

// --- slide 37 in https://ubm-twvideo01.s3.amazonaws.com/o1/vault/GDC2014/Presentations/Coumans_Erwin_Physics_for_Game.pdf
for (int k = 0; k < numIterations; k++){
  for (int i = 0; i < numRows; i++){
    float delta = 0.0f;
    for (int j = 0; j < numRows; j++){ if(j!=i) delta += A(i,j) * x[j]; }
    x[i] = (b[i] - delta) / A(i,i);
    // Projected inequality constraints - this clamping is what separates Projected Gauss-Seidel from plain Gauss-Seidel
    if (x[i] < lo[i]) x[i] = lo[i];
    if (x[i] > hi[i]) x[i] = hi[i];
  }
}
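
The clamping to $[lo_i, hi_i]$ is what turns the linear solve into a mixed linear complementarity (MLCP) solve: e.g. for contact constraints the normal impulse is bounded from below by zero, and friction impulses are bounded by limits proportional to the normal impulse. Note that Gauss-Seidel is guaranteed to converge when $A$ is symmetric positive-definite or diagonally dominant, which the Projective Dynamics matrix above satisfies (the diagonal $m_i/dt^2 + \sum_j k_{ij}$ strictly dominates the off-diagonal $-k_{ij}$).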

PGS Impulse Solver

Parallelization of Gauss-Seidel

Conjugate Gradient Solver:

Sparse Cholesky decomposition

The matrix $A$ (in the system $A x = b$) in the Projective Dynamics constraint solver depends only on the topology (connectivity of the vertices), therefore it is efficient to pre-factor it using Cholesky decomposition. On top of that the matrix is sparse. We would like to exploit this sparsity and prevent densifying (fill-in) of the matrix during the factorization.
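
As an illustration of the pre-factor-once / solve-many pattern, here is a minimal sketch assuming the Eigen library (FireCore's own solver may be organized differently); Eigen's SimplicialLLT applies a fill-reducing ordering internally, which limits the fill-in discussed above:

#include <vector>
#include <Eigen/Sparse>

// Factor the constant PD matrix once, then reuse the factor every time step.
void run_steps( const Eigen::SparseMatrix<double>& A,
                const std::vector<Eigen::VectorXd>& rhs_per_step ){
    Eigen::SimplicialLLT<Eigen::SparseMatrix<double>> llt;
    llt.compute(A);                          // fill-reducing ordering + sparse Cholesky L L^T
    for(const Eigen::VectorXd& b : rhs_per_step){
        Eigen::VectorXd x = llt.solve(b);    // only two sparse triangular solves per step
        // ... update point positions from x ...
    }
}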

Numerical stability

  • Backward substitution is said to be more stable than forward substitution. Therefore it is recommended to use two backward substitutions rather than a forward and a backward substitution when solving $L L^T x = b$ (Cholesky).
  • The $L D L^T$ variant of the Cholesky decomposition is generally considered more numerically stable (it avoids taking square roots):
% In-place LDL^T factorization (MATLAB-style pseudocode, cf. Golub & Van Loan):
% on exit the strict lower triangle of A holds L (unit diagonal implied)
% and the diagonal of A holds D.
for j=1:n
  for i = 1:j-1
    v[i] = A[j,i] * A[i,i]                                       % v[i] = L[j,i]*D[i,i]
  end
  A[j    ,j] = A[j,j] - A[j,1:j-1]*v[1:j-1]                      % D[j,j]
  A[j+1:n,j] = ( A[j+1:n,j] - A[j+1:n,1:j-1]*v[1:j-1] )/A[j,j]   % column j of L
end 
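
For completeness, a minimal C++ sketch of the corresponding solve, assuming the in-place factor produced above is stored in a row-major dense array (the name ldlt_solve is hypothetical):

#include <vector>

// Solve (L D L^T) x = b with L (unit diagonal, implied) in the strict lower
// triangle of A and D on the diagonal; x holds b on entry and x on exit.
void ldlt_solve( const std::vector<double>& A, int n, std::vector<double>& x ){
    for(int i=0; i<n; i++){                       // forward substitution:  L z = b
        for(int j=0; j<i; j++) x[i] -= A[i*n+j]*x[j];
    }
    for(int i=0; i<n; i++) x[i] /= A[i*n+i];      // diagonal scaling:      D w = z
    for(int i=n-1; i>=0; i--){                    // backward substitution: L^T x = w
        for(int j=i+1; j<n; j++) x[i] -= A[j*n+i]*x[j];
    }
}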

Ensemble of multiple molecules and a pre-factorized direct solver

  • The matrix $A$ can be pre-calculated individually for every molecule in the system.
    • The inverse $A^{-1}$ or the Cholesky factorization $L L^T = A$ can also be pre-calculated for each molecule independently (since there are no hard bonds between the molecules).
    • We can efficiently apply the matrix $A^{-1}$, or solve $L \vec y = \vec b$ and then $L^T \vec x = \vec y$ by forward and backward substitution, in parallel for multiple molecules, resp. for multiple replicas of the system (if we run multiple replicas on GPU), because these are completely independent.
      • What is more, since the matrix $L$ is the same for every replica of the system (resp. for every molecule of the same kind), we can efficiently parallelize these operations with SIMD instructions (see the sketch after the next paragraph).

In particular, consider the expression for forward- or backward-substitution corresponding to atom $i$ in the molecule.
$$\vec x_i = ( \vec y_{i} - \sum_j L_{ij} \vec x_{j} ) / L_{ii}$$ Normally we would consider the vectors $\vec x_i$ to be 3D vectors $(x_i,y_i,z_i)$ of the position of atom $i$ in a single instance (replica) $\mu$ of the given molecule. However, we can trivially parallelize this by packing all instances of that atom into a single vector, forming a $3m$-dimensional vector where $m$ is the number of replicas or instances of the molecule in the system.
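
A sketch of this packing in plain C++ (names and the flat memory layout are assumptions; run it once per Cartesian component, or treat $m$ as $3m$): the factor $L$ is shared by all replicas, while the replica index $\mu$ is innermost, so every inner loop runs over contiguous memory and can be auto-vectorized or mapped to one GPU work-item per replica.

#include <vector>

// Forward substitution L y = b, vectorized over m replicas of the same molecule.
// L is the shared n x n factor (row-major); y[i*m + mu] is atom i in replica mu.
void forward_subst_replicas( const std::vector<double>& L, int n,
                             const std::vector<double>& b, int m,
                             std::vector<double>& y ){
    for(int i=0; i<n; i++){
        for(int mu=0; mu<m; mu++) y[i*m+mu] = b[i*m+mu];
        for(int j=0; j<i; j++){
            const double Lij = L[i*n+j];                              // same factor element for all replicas
            for(int mu=0; mu<m; mu++) y[i*m+mu] -= Lij*y[j*m+mu];     // contiguous, SIMD-friendly
        }
        const double inv = 1.0/L[i*n+i];
        for(int mu=0; mu<m; mu++) y[i*m+mu] *= inv;
    }
}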

  • cache-locality and local memory - to efficiently exploit parallelism on GPU or with SIMD operations, we should co-locate the data corresponding to the instances of the same atom in memory. This means that we should re-shuffle the arrays from apos[ia,isys] to apos[isys,ia] for this purpose. Nevertheless, this is not the optimal layout for the evaluation of non-covalent interactions (which load into local memory the atoms corresponding to the same system isys). Therefore we need to do this re-shuffle (back and forth) of the atomic positions at every step in the assembling kernel; a minimal sketch follows.
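
A minimal sketch of one direction of this re-shuffle (hypothetical names, assuming row-major flattening of the 2D index; Vec3 is the struct from the assembly sketch above):

// Transpose atom positions between the two layouts:
// src[isys*nAtoms + ia]  (system-major, good for non-covalent kernels)  ->
// dst[ia*nSys   + isys]  (atom-major, co-locates replicas for the SIMD solver).
void reshuffle( const std::vector<Vec3>& src, std::vector<Vec3>& dst, int nAtoms, int nSys ){
    for(int isys=0; isys<nSys; isys++)
        for(int ia=0; ia<nAtoms; ia++)
            dst[ia*nSys + isys] = src[isys*nAtoms + ia];
}
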
ToDo paper: Fast parallel configuration sampling of systems of small molecules by projective dynamics