Running Simulations - Xorgon/DEMOranges GitHub Wiki

Running simulations is central to this project. To make this easy and to avoid repeating code, a simulation runner has been developed that takes all of the simulation parameters and runs the simulation. It can be found in sims/simRunner/.

The following sections explain how simRunner is structured and how it works.

Program Flow

Variable declarations
- All variables must be declared at the beginning of the simulation; many are taken from the input parameters.
- All of the device buffer objects are prefixed with g to distinguish them from their host counterparts (for example, gparticle_count_array is the device buffer for particle_count_array).
Build kernels
- Kernels are built using the getKernelWithUtils utility function from util/clUtils/. This builds the kernels with all the necessary kernel utilities.
- The iterate_particle kernel is built by the calling simulation program and passed into the simulation runner, which allows greater control over how particles are iterated.
Create control volumes
- All of the control volume related arrays and buffers are initialized with the required values.
Create wall collisions
- There is no broad phase collision detection for walls, so every particle-wall collision is created once, before the simulation runs.
Create remaining buffers
- Any buffers not related to control volumes are created here.
Set kernel arguments
- All kernel arguments are set, which links buffers and variables to the relevant kernels.
- Note that the gpp_cols argument of make_pp_collisions and calculate_pp_collision has to be set again every loop because the buffer is re-created and its memory location changes.
Log particles and setup data
Run main simulation loop
Log particles at end time
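
The "Create wall collisions" step above pairs every particle with every wall because there is no broad phase for walls. The following is a minimal host-side sketch of that idea, not the project's actual implementation: the pw_pair struct and the make_all_pw_pairs function are hypothetical names introduced purely for illustration.

```c
#include <stdlib.h>

/* Hypothetical pairing record; the project's real collision struct will
 * hold more data than this. */
typedef struct {
    unsigned long particle_id;
    unsigned long wall_id;
} pw_pair;

/* Pair every particle with every wall (no broad phase).
 * Returns a malloc'd array of num_particles * num_walls entries and writes
 * that count to *num_pairs. Returns NULL (and a count of 0) if there is
 * nothing to pair or the allocation fails. The caller frees the result. */
pw_pair *make_all_pw_pairs(unsigned long num_particles, unsigned long num_walls,
                           unsigned long *num_pairs) {
    *num_pairs = num_particles * num_walls;
    if (*num_pairs == 0) return NULL;

    pw_pair *pairs = malloc(sizeof(pw_pair) * *num_pairs);
    if (pairs == NULL) {
        *num_pairs = 0;
        return NULL;
    }

    for (unsigned long p = 0; p < num_particles; p++) {
        for (unsigned long w = 0; w < num_walls; w++) {
            pairs[p * num_walls + w].particle_id = p;
            pairs[p * num_walls + w].wall_id = w;
        }
    }
    return pairs;
}
```

Because the list is the full particle-by-wall product, its length is fixed for the whole run, which is why it only needs to be built once.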

Main Simulation Loop

This section describes how the main simulation loop is structured. The key parts of the loop are mentioned here and the relevant host code snippets are included.

First, the number of particles in each control volume is counted and stored in the corresponding entry of particle_count_array. The counts are later used to sort the particles and set up the related arrays. The code below resets the host array to zero, copies it to the GPU, and runs the counting kernel; the result is copied back to the host at the start of the next step.

    if (VERBOSE) printf("    Counting particles per CV\n");
    memset(particle_count_array, 0, sizeof(cl_int) * NUMCVS); // Reset counts to 0.
    ret = intArrayToDevice(queue, gparticle_count_array, &particle_count_array, NUMCVS);
    ret = clEnqueueNDRangeKernel(queue, assign_particle_count, 1, NULL, &NUMPART, NULL, 0, NULL, NULL);
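
To make the counting step concrete, here is a serial host-side sketch of what a per-control-volume count could look like. It assumes a cubic grid of equally sized control volumes; the cv_grid struct, cv_index, and count_particles_per_cv are hypothetical names, and the project's kernel may map particles to control volumes differently.

```c
#include <string.h>

/* Assumed layout: a cube of cvs_per_edge^3 control volumes, each of side
 * cv_length, with the domain's minimum corner at `origin`. */
typedef struct {
    float origin[3];
    float cv_length;
    int cvs_per_edge;
} cv_grid;

/* Map a position (3 floats) to a linear CV index, clamping to the domain. */
int cv_index(const cv_grid *g, const float *pos) {
    int idx[3];
    for (int d = 0; d < 3; d++) {
        idx[d] = (int) ((pos[d] - g->origin[d]) / g->cv_length);
        if (idx[d] < 0) idx[d] = 0;
        if (idx[d] >= g->cvs_per_edge) idx[d] = g->cvs_per_edge - 1;
    }
    return idx[0] + g->cvs_per_edge * (idx[1] + g->cvs_per_edge * idx[2]);
}

/* Serial stand-in for a counting kernel. `positions` holds x, y, z for each
 * particle back to back. On a device, many work-items may hit the same
 * counter at once, so the increment would need to be atomic. */
void count_particles_per_cv(const cv_grid *g, const float *positions,
                            int num_particles, int *counts, int num_cvs) {
    memset(counts, 0, sizeof(int) * num_cvs); /* Reset counts to 0. */
    for (int p = 0; p < num_particles; p++) {
        counts[cv_index(g, &positions[3 * p])]++;
    }
}
```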

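The counts are then copied back to the host and used to fill the indexing array (cv_start_array), which records where each control volume's particles begin. The particles are then assigned to an array sorted by control volume (gcv_pids, control volume particle IDs). This step sorts all the particles and is what allows collisions to be created.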

    ret = intArrayToHost(queue, gparticle_count_array, &particle_count_array, NUMCVS);
    set_array_starts(particle_count_array, cv_start_array, NUMCVS);
    ulongArrayToDevice(queue, gcv_start_array, &cv_start_array, NUMCVS);
    memset(input_count_array, 0, sizeof(cl_int) * NUMCVS); // Reset counts to 0.
    intArrayToDevice(queue, ginput_count_array, &input_count_array, NUMCVS);
    if (VERBOSE) printf("    Assigning particles to CVs\n");
    ret = clEnqueueNDRangeKernel(queue, assign_particles, 1, NULL, &NUMPART, NULL, 0, NULL, NULL);
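
The implementation of set_array_starts is not shown on this page. The sketch below assumes it is an exclusive prefix sum over the counts, and pairs it with a serial stand-in for the assignment kernel. The sketch_ function names and the particle_cv input array are hypothetical.

```c
#include <string.h>

/* Assumed behaviour of set_array_starts: an exclusive prefix sum, so
 * starts[i] is the index in the sorted ID array where CV i begins. */
void sketch_set_array_starts(const int *counts, unsigned long *starts,
                             int num_cvs) {
    unsigned long running = 0;
    for (int i = 0; i < num_cvs; i++) {
        starts[i] = running;
        running += (unsigned long) counts[i];
    }
}

/* Serial stand-in for the assignment kernel. particle_cv[p] is the CV that
 * particle p lies in. input_counts tracks how many slots of each CV have
 * been filled so far, which is why it is reset to 0 before every pass. */
void sketch_assign_particles(const int *particle_cv, int num_particles,
                             const unsigned long *starts, int *input_counts,
                             int num_cvs, unsigned long *cv_pids) {
    memset(input_counts, 0, sizeof(int) * num_cvs);
    for (int p = 0; p < num_particles; p++) {
        int cv = particle_cv[p];
        cv_pids[starts[cv] + (unsigned long) input_counts[cv]] = (unsigned long) p;
        input_counts[cv]++;
    }
}
```

With this layout, the particles of control volume i occupy cv_pids[starts[i]] up to, but not including, cv_pids[starts[i] + counts[i]].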

The number of collisions that could occur is then determined using the spatial zoning technique.

    if (VERBOSE) printf("    Counting collisions\n");
    collision_count = 0;
    clEnqueueWriteBuffer(queue, gcollision_count, CL_TRUE, 0, sizeof(cl_ulong), &collision_count, 0, NULL, NULL);
    ret = clEnqueueNDRangeKernel(queue, count_pp_collisions, 1, NULL, &NUMCVS, NULL, 0, NULL, NULL);
    ret = clEnqueueReadBuffer(queue, gcollision_count, CL_TRUE, 0, sizeof(cl_ulong), &collision_count, 0, NULL,
                              NULL);
    NUMPPCOLS = collision_count;
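
As an illustration of counting candidate collisions with spatial zoning, the sketch below counts every pair of particles that share a control volume or sit in adjacent control volumes. It assumes a cubic grid and is only a serial approximation of the idea; count_candidate_pairs is a hypothetical name, and the project's count_pp_collisions kernel may apply different rules.

```c
/* Count candidate particle-particle pairs on a cubic grid of
 * cvs_per_edge^3 control volumes. counts[i] is the number of particles in
 * CV i. Pairs are formed within a CV and between adjacent CVs; a
 * neighbouring CV is only visited when its linear index is higher, so each
 * pair is counted exactly once. */
unsigned long count_candidate_pairs(const int *counts, int cvs_per_edge) {
    unsigned long total = 0;
    int n = cvs_per_edge;

    for (int z = 0; z < n; z++)
    for (int y = 0; y < n; y++)
    for (int x = 0; x < n; x++) {
        int self = x + n * (y + n * z);
        unsigned long here = (unsigned long) counts[self];

        /* Pairs inside this CV. */
        if (here > 1) total += here * (here - 1) / 2;

        /* Pairs with each adjacent CV (up to 26 neighbours). */
        for (int dz = -1; dz <= 1; dz++)
        for (int dy = -1; dy <= 1; dy++)
        for (int dx = -1; dx <= 1; dx++) {
            int nx = x + dx, ny = y + dy, nz = z + dz;
            if (nx < 0 || ny < 0 || nz < 0 || nx >= n || ny >= n || nz >= n)
                continue; /* Neighbour lies outside the domain. */
            int other = nx + n * (ny + n * nz);
            if (other > self) total += here * (unsigned long) counts[other];
        }
    }
    return total;
}
```

Particles more than one control volume apart are never paired, which is what keeps the candidate count well below the all-pairs count for large domains.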

If there are any collisions (which is almost always the case), the collision array (pp_cols) is created and all of the collisions are added to it. Because the number of collisions changes every loop, the collision array must be re-created from the new collision count in every loop. This means re-creating the device memory buffer and then re-setting the associated make_pp_collisions kernel argument, as the memory location will have changed. Once all of the collisions have been created, the kernel that calculates the collision physics is run.

    if (NUMPPCOLS > 0) {
        // Make collisions.
        if (VERBOSE) printf("    Making %llu collisions\n", (unsigned long long) collision_count);
        if (gpp_cols != 0) {
            ret = clReleaseMemObject(gpp_cols);
        }
        gpp_cols = clCreateBuffer(context, CL_MEM_READ_WRITE, sizeof(pp_collision) * NUMPPCOLS, NULL, &ret);
      
        collision_count = 0;
        clEnqueueWriteBuffer(queue, gcollision_count, CL_TRUE, 0, sizeof(cl_ulong), &collision_count, 0, NULL,
                             NULL);

        clSetKernelArg(make_pp_collisions, 4, sizeof(cl_mem), &gpp_cols);

        clEnqueueNDRangeKernel(queue, make_pp_collisions, 1, NULL, &NUMCVS, NULL, 0, NULL, NULL);

        // Calculate collisions.
        if (VERBOSE) printf("    Calculating collisions\n");
        ret = clSetKernelArg(calculate_pp_collision, 0, sizeof(cl_mem), &gpp_cols);
        ret = clEnqueueNDRangeKernel(queue, calculate_pp_collision, 1, NULL, &NUMPPCOLS, NULL, 0, NULL, NULL);
    }
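
The count, re-allocate, reset, fill pattern used above can be shown without any OpenCL calls. The sketch below is a hypothetical host-only analogue using 1D positions and a fixed interaction distance; pair_t, within_reach, count_pairs, fill_pairs, and rebuild_pairs are illustrative names only. Resetting the counter before the fill pass mirrors the collision_count = 0 write above, on the assumption that the fill pass uses the counter as its write index.

```c
#include <stdlib.h>

typedef struct { int a; int b; } pair_t; /* Hypothetical pair record. */

/* Shared predicate so that both passes agree on what counts as a pair. */
static int within_reach(float xa, float xb, float reach) {
    float d = xa - xb;
    if (d < 0) d = -d;
    return d < reach;
}

/* Pass 1: count the pairs so the array can be sized exactly. */
unsigned long count_pairs(const float *x, int n, float reach) {
    unsigned long count = 0;
    for (int i = 0; i < n; i++)
        for (int j = i + 1; j < n; j++)
            if (within_reach(x[i], x[j], reach)) count++;
    return count;
}

/* Pass 2: fill the array, using *count as the next free slot.
 * The caller must reset *count to 0 first. */
void fill_pairs(const float *x, int n, float reach, pair_t *pairs,
                unsigned long *count) {
    for (int i = 0; i < n; i++)
        for (int j = i + 1; j < n; j++)
            if (within_reach(x[i], x[j], reach)) {
                pairs[*count].a = i;
                pairs[*count].b = j;
                (*count)++;
            }
}

/* Rebuild the pair list for one loop iteration. Releases the previous
 * array, allocates a new one of exactly the right size, and fills it.
 * Returns NULL with *num_pairs == 0 if there are no pairs or the
 * allocation fails. */
pair_t *rebuild_pairs(const float *x, int n, float reach, pair_t *old,
                      unsigned long *num_pairs) {
    free(old); /* free(NULL) is a no-op, so the first call is safe. */
    *num_pairs = count_pairs(x, n, reach);
    if (*num_pairs == 0) return NULL;

    pair_t *pairs = malloc(sizeof(pair_t) * *num_pairs);
    if (pairs == NULL) {
        *num_pairs = 0;
        return NULL;
    }

    unsigned long write_index = 0; /* Reset the counter before filling. */
    fill_pairs(x, n, reach, pairs, &write_index);
    return pairs;
}
```

Because the returned pointer changes on every rebuild, anything that holds the old pointer must be updated, which is the host-side equivalent of re-setting the kernel argument after the buffer is re-created.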

If there are walls in the simulation, the particle-wall collisions will all have been set before the simulation loop runs. No broad phase collision detection is performed on particle-wall collisions because there are relatively few of them (typically ~10 walls colliding with each particle). If many more walls were to be created, broad phase collision detection should be added for particle-wall collisions.

    if (NUMWALLS > 0) {
        ret = clEnqueueNDRangeKernel(queue, calculate_pw_collision, 1, NULL, &NUMPWCOLS, NULL, 0, NULL, NULL);
    }

Finally, the particles are iterated and the loop begins again.

    if (VERBOSE) printf("    Iterating particles\n");
    ret = clEnqueueNDRangeKernel(queue, iterate_particle, 1, NULL, &NUMPART, NULL, 0, NULL, NULL);