Google Summer of Code (GSoC) 2021 - STEllAR-GROUP/hpx GitHub Wiki

Introduction

Welcome to the Google Summer of Code (GSoC) page for the HPX project. Here you can find information about student projects, proposal submission templates, advice on writing good proposals, and links to information on getting started with HPX. The STE||AR Group will apply as an organization, and our goal is to have at least five funded students working on HPX-related projects.

Requirements

Students must submit a proposal. A template for the proposal is available here, and hints for writing a good proposal can be found here.

We strongly suggest that students interested in developing a proposal for HPX discuss their ideas on the IRC channel or the mailing list to help refine the requirements and goals. Please see this page for information on how to access IRC or the mailing list. Students who actively plan and discuss projects with developers are generally ranked higher than those who do not.

We have intentionally left the descriptions of these projects vague and open to interpretation because we expect students to develop their proposals' requirements by doing initial background research on the topic and interacting with the community. It is also important to note that the suggested projects on this page are not binding: if you are interested in parallel task-based programming and have an idea for a project that would either improve HPX or demonstrate how well it applies to your problem, feel free to suggest it as a project and write a proposal for it. We will be glad to help you refine the project goals and improve your proposal, so do not leave your ideas until the last minute.

We expect students to demonstrate that they have the required level of C++ and CMake knowledge by showing us some of their previous work (e.g., a GitHub repository) or, preferably, by writing a small demonstration program using HPX that shows a simple example of something they have created themselves.

Potential Additional Funding

For students who perform at or above expectations on both GSoC evaluations, the Center for Computation and Technology (CCT) at Louisiana State University (LSU) may fund up to four additional weeks of work on the project at no more than the GSoC rate of pay. This funding is not guaranteed and is independent of the GSoC program. Students accepted for additional funding will be paid through LSU for the additional weeks and will be affiliated with LSU during that time. Additional paperwork through LSU will be required.

Tips for Prospective Students

Some of our former GSoC students who still contribute to our projects have put together the following list. All of them went through the same learning experience that prospective students are most likely facing now, so the list offers step-by-step help for getting into HPX smoothly.

The first thing we suggest is to build HPX from source using the CMake build system. An example guide to building HPX is here. Trying out different build configurations (e.g., memory allocators, OTF2 traces, CUDA support) will help you understand the capabilities of HPX as a runtime.
Once you're acquainted with the build system, we suggest you read our docs/wiki and familiarize yourself with the basic terminology (e.g., locality, LCO, futurization).
Next, we suggest you watch talks on HPX on YouTube. Doing so should give you a brief overview of the motivation behind HPX and the design of its components.
At this point, try building and playing with the examples in HPX. We also have a basic tutorial that takes you through the features and their usage with code examples.
Going through the examples may be overwhelming, so we suggest you become familiar with our way of writing code through our summer lecture series. (Hint: Pay attention to Lecture #4.)
When you're familiar with basic usage, try writing demo HPX programs (e.g., matrix-matrix multiplication). Then go through our issue tracker and see if you can find an issue you would like to investigate. Working on bugs is the easiest way to dive into the code base and contribute to HPX.
Dig into our currently active GSoC issues and the pull requests relevant to them, leave comments, and discuss them with the corresponding authors.
We highly recommend joining our IRC channel, #ste||ar, on Libera.chat, where you can ask questions and discuss issues, pull requests, and your potential GSoC project. Remember, asking questions is the key to getting started with contributing!
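A first demo program along these lines could be a matrix-matrix multiplication that computes blocks of rows as independent asynchronous tasks. The sketch below deliberately uses only the C++ standard library (`std::async` and `std::future`) so that it builds without HPX; because `hpx::async` and `hpx::future` are modeled on the standard facilities, porting it should mostly be a matter of swapping the namespace and making sure the HPX runtime is running (for example, by including `hpx/hpx_main.hpp`). The `matrix` type, the function name, and the row-block granularity are illustrative assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <future>
#include <vector>

// Dense row-major matrix stored in a flat vector (illustrative helper type).
struct matrix {
    std::size_t rows, cols;
    std::vector<double> data;
    matrix(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0.0) {}
    double& operator()(std::size_t i, std::size_t j) { return data[i * cols + j]; }
    double operator()(std::size_t i, std::size_t j) const { return data[i * cols + j]; }
};

// Computes C = A * B, launching one asynchronous task per block of rows.
// With HPX, std::async/std::future would become hpx::async/hpx::future.
matrix multiply(const matrix& a, const matrix& b, std::size_t rows_per_task = 64) {
    assert(a.cols == b.rows);
    assert(rows_per_task > 0);
    matrix c(a.rows, b.cols);
    std::vector<std::future<void>> tasks;
    for (std::size_t first = 0; first < a.rows; first += rows_per_task) {
        std::size_t last = std::min(first + rows_per_task, a.rows);
        // Each task writes a disjoint set of rows of C, so no locking is needed.
        tasks.push_back(std::async(std::launch::async, [&a, &b, &c, first, last] {
            for (std::size_t i = first; i < last; ++i)
                for (std::size_t k = 0; k < a.cols; ++k) {
                    double aik = a(i, k);
                    for (std::size_t j = 0; j < b.cols; ++j)
                        c(i, j) += aik * b(k, j);
                }
        }));
    }
    for (auto& t : tasks) t.get();  // wait for every row block (rethrows exceptions)
    return c;
}
```

Splitting by rows keeps each task's writes disjoint; tuning `rows_per_task` is a simple first experiment in task granularity.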

2021 HPX Project Ideas

There are new projects this year, and also ones revamped from previous years that are still of interest. These projects have mentors ready and waiting to help students.

Core HPX Projects

These are projects that involve making changes/improvements/extensions to the core HPX library.

Disentangle segmented algorithms
Separate the datapar algorithms
Implement shift_left and shift_right parallel algorithms
(Re-)Implement executor API on top of sender/receiver infrastructure
Add Vectorization to the par_unseq Implementations of the Parallel Algorithms
Create Generic Histogram Performance Counter
Range based Parallel Algorithms
Implement missing Parallel Algorithms
Conflict (range-based) Locks

HPX User Projects

These are projects that improve code that uses HPX. In general, the primary goal with these projects is to improve user uptake of HPX by demonstrating its use in other projects, and only minor fixes/changes/extensions should be necessary for the main HPX library itself.

Task Bench: A Parameterized Benchmark for Evaluating Parallel Runtime Performance
UI Improvements for Performance Visualization
Large File Support for HPX OTF2 Trace Visualization
Test Framework for Phylanx Algorithms
Optimizer implementation for Phylanx
Working on Blaze Tensor
Distributed solver and load balancing for Peridynamics using asynchronous parallelism

Legacy Project Ideas

These are project ideas from previous Summer of Code years that we are still interested in working on, but it might be harder to find a mentor willing to supervise a student. Therefore, we would expect only very self-motivated and capable students to select a project from the legacy category. We cannot guarantee that we will select a project from this list unless we are quite satisfied that the student can complete the work.

Add Mask Move/Assign Wrappers for Vectorization Intrinsics
Implement Your Favorite Parcelport Backend
Implement A Faster Associative Container for GIDs
Port HPX to iOS
Create A Parcelport Based on WebSockets
Script Language Bindings
All to All Communications
Distributed Component Placement
Resumable Function Implementation
Coroutine-like Interface
Port Graph500 to HPX
Port Mantevo MiniApps to HPX
Create an HPX Communicator for Trilinos Project
Add More Iterative Solvers in Phylanx
Bug Hunter
Project Template

We are looking to fund work on a number of different kinds of proposals (for more details about concrete project ideas, see below):

Extensions to existing library features
New distributed data structures and algorithms
Multiple competing proposals for the same project

Implement Iterative Solvers

Abstract: In Phylanx, we have implemented a set of iterative linear system solvers for use with Blaze, a high-performance C++ linear algebra library, including conjugate gradient (CG), preconditioned CG, BiCGSTAB, preconditioned BiCGSTAB, Arnoldi, Lanczos, and GMRES. For this project, a student would implement additional iterative solvers. See BlazeIterative for more information on the existing solvers.
Difficulty: Easy-Medium
Expected result: Implement at least one solver
Knowledge Prerequisite: C++
Mentor: Nanmiao Wu and Patrick Diehl
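For orientation, the unpreconditioned conjugate gradient (CG) iteration, the simplest of the solvers listed above, is sketched below. It uses plain `std::vector` storage instead of Blaze types, so it does not reflect the Phylanx or BlazeIterative APIs; a new solver would follow the same pattern of matrix-vector products, dot products, and vector updates. The matrix is assumed to be symmetric positive definite, and all names are illustrative.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using vec = std::vector<double>;
using mat = std::vector<vec>;  // dense, row-major; assumed symmetric positive definite

double dot(const vec& x, const vec& y) {
    double s = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) s += x[i] * y[i];
    return s;
}

vec matvec(const mat& a, const vec& x) {
    vec y(a.size(), 0.0);
    for (std::size_t i = 0; i < a.size(); ++i) y[i] = dot(a[i], x);
    return y;
}

// Solves A x = b with conjugate gradient, starting from x = 0.
// Stops when the residual norm drops below tol or after max_iter iterations.
vec conjugate_gradient(const mat& a, const vec& b, double tol = 1e-10,
                       std::size_t max_iter = 1000) {
    assert(a.size() == b.size());
    vec x(b.size(), 0.0);
    vec r = b;  // residual r = b - A x, with x = 0
    vec p = r;  // first search direction
    double rr = dot(r, r);
    for (std::size_t it = 0; it < max_iter && std::sqrt(rr) > tol; ++it) {
        vec ap = matvec(a, p);
        double alpha = rr / dot(p, ap);  // step length along p
        for (std::size_t i = 0; i < x.size(); ++i) {
            x[i] += alpha * p[i];
            r[i] -= alpha * ap[i];
        }
        double rr_new = dot(r, r);
        double beta = rr_new / rr;  // keeps search directions A-conjugate
        for (std::size_t i = 0; i < p.size(); ++i) p[i] = r[i] + beta * p[i];
        rr = rr_new;
    }
    return x;
}
```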

Task Bench: A Parameterized Benchmark for Evaluating Parallel Runtime Performance

Abstract: This project aims to add HPX to the suite of runtimes examined in the paper Task Bench: A Parameterized Benchmark for Evaluating Parallel Runtime Performance (https://arxiv.org/abs/1908.05790). Comparing HPX with other runtimes is useful for our own benchmarking, to see how well we compare, and it is also useful for publicity and for attracting users if the numbers show that HPX performs well. This project has an issue on GitHub. The student should integrate HPX into the benchmarking suite and ideally create a "superproject" capable of downloading, building, running, testing, and benchmarking all (or as many as reasonably possible) of the runtimes in one go. This might involve a CMake superproject that fetches versions of each runtime from the web, performs the builds, etc. The student will also need to optimize the benchmark for HPX by choosing and testing the right executors, parallel algorithms, and the like to get the best possible performance.

Conflict (Range-Based) Locks

Abstract: Some multi-threaded algorithms require resources that must be held using a lock, but the locking may be range-based rather than absolute. Consider a large array of N items where one task requires a small subset of the items to be locked while a second task requires a different range. If these tasks are placed into a DAG so that task2 can only run once task1 has completed, this is inefficient whenever the range of items used by task2 does not overlap the range used by task1. When many tasks operate on the array with randomly overlapping or non-overlapping regions, DAG-based task scheduling becomes highly inefficient. We need a range-based lock that can be templated over <items> and then locked/unlocked on ranges of those items, and that interacts with our future<>-based scheduling so that tasks become ready when the range they need has no locks outstanding, and so that when a task releases a lock, any other tasks that overlap the range are in turn signaled as possibly ready. (For an example of how this is used in conventional HPC programming, look up byte-range locks in MPI for parallel I/O to a single file.) A successful implementation can be extended to multi-dimensional locking (2D/3D, etc.), ideally templated over dimensions and types.
Difficulty: Medium/Hard
Expected result: A test application that creates arrays of items and randomly assigns tasks to operate on regions of those items with locking and schedules the tasks to operate in a non-conflicting way.
Knowledge Prerequisite: Thread-safe programming. Futures.
Mentor: John Biddiscombe
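To make the idea concrete, here is a minimal blocking range lock for one-dimensional, half-open ranges of item indices. It is only a sketch under simplifying assumptions: it uses `std::mutex` and `std::condition_variable` rather than futures, so it does not integrate with HPX scheduling, and the class and member names are hypothetical. A future-based version would instead hand back a future that becomes ready once no overlapping range is held, with `unlock` re-checking the waiting requests.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <list>
#include <mutex>
#include <utility>

// Hypothetical range lock over item indices; ranges are half-open [begin, end).
class range_lock {
public:
    using range = std::pair<std::size_t, std::size_t>;

    // Blocks until no held range overlaps [begin, end), then records it as held.
    void lock(std::size_t begin, std::size_t end) {
        assert(begin < end);
        std::unique_lock<std::mutex> guard(mtx_);
        cv_.wait(guard, [&] { return !overlaps_held(begin, end); });
        held_.emplace_back(begin, end);
    }

    // Non-blocking variant: returns false if an overlapping range is held.
    bool try_lock(std::size_t begin, std::size_t end) {
        assert(begin < end);
        std::lock_guard<std::mutex> guard(mtx_);
        if (overlaps_held(begin, end)) return false;
        held_.emplace_back(begin, end);
        return true;
    }

    // Releases one previously locked [begin, end) and wakes all waiters,
    // each of which re-checks whether its own range is now free.
    void unlock(std::size_t begin, std::size_t end) {
        {
            std::lock_guard<std::mutex> guard(mtx_);
            for (auto it = held_.begin(); it != held_.end(); ++it) {
                if (it->first == begin && it->second == end) {
                    held_.erase(it);
                    break;
                }
            }
        }
        cv_.notify_all();
    }

private:
    // Two half-open ranges overlap iff each one starts before the other ends.
    bool overlaps_held(std::size_t begin, std::size_t end) const {
        for (const auto& r : held_)
            if (begin < r.second && r.first < end) return true;
        return false;
    }

    std::mutex mtx_;
    std::condition_variable cv_;
    std::list<range> held_;
};
```

Because waiters only check their own range, tasks on non-overlapping regions never wait on each other, which is exactly the property the DAG-based ordering lacks.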

Create Generic Histogram Performance Counter

Abstract: HPX supports performance counters that return a set of values for each invocation. We have used this to implement performance counters that collect histograms of various characteristics related to parcel coalescing (such as the histogram of the time intervals between parcels). The idea of this project is to create a general-purpose performance counter that samples the value of any other given performance counter at given time intervals and calculates a histogram of those values. This project could be combined with the "Add more arithmetic performance counters" project.
Difficulty: Medium
Expected result: Implement a functioning performance counter which returns the histogram for any other given performance counter as collected at given time intervals.
Knowledge Prerequisite: Minimal knowledge of statistical analysis is required.
Mentor: Hartmut Kaiser and Mikael Simberg
See issue #2237 on the HPX bug tracker
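The core of such a counter is a fixed-bucket histogram fed by periodic samples. The sketch below shows only that part, under stated assumptions: the sampled counter is modeled as a plain callable returning `double`, sampling is driven by an explicit loop instead of an HPX timer, and the bucket layout (equal-width buckets over `[min, max)` plus underflow and overflow buckets) and all names are hypothetical, not the layout used by HPX's existing histogram counters.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical equal-width histogram over [min, max) with num_buckets buckets,
// plus one underflow bucket (index 0) and one overflow bucket (last index).
class histogram {
public:
    histogram(double min, double max, std::size_t num_buckets)
      : min_(min), max_(max), counts_(num_buckets + 2, 0) {
        assert(max > min && num_buckets > 0);
    }

    void add(double value) {
        if (value < min_) { ++counts_.front(); return; }
        if (value >= max_) { ++counts_.back(); return; }
        std::size_t n = counts_.size() - 2;
        auto idx = static_cast<std::size_t>((value - min_) * static_cast<double>(n) / (max_ - min_));
        if (idx >= n) idx = n - 1;  // guard against rounding at the upper edge
        ++counts_[idx + 1];
    }

    const std::vector<std::uint64_t>& counts() const { return counts_; }

private:
    double min_, max_;
    std::vector<std::uint64_t> counts_;
};

// Samples `counter` num_samples times and accumulates the values.
// A real counter would take one sample per configured time interval.
histogram sample_counter(const std::function<double()>& counter, double min,
                         double max, std::size_t buckets, std::size_t num_samples) {
    histogram h(min, max, buckets);
    for (std::size_t i = 0; i != num_samples; ++i) h.add(counter());
    return h;
}
```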

Disentangle segmented algorithms

Abstract: Currently, the segmented algorithms are intertwined with the implementation of the parallel algorithms. This project is centered around separating the segmented algorithms from the base parallel algorithms.
Difficulty: Medium/Hard
Expected result: The result should be functioning segmented parallel algorithms separated from the core algorithms.
Knowledge Prerequisite: Parallel algorithms, iterators.
Mentor: Hartmut Kaiser and Marcin Copik
See issue #5156 on the HPX bug tracker

Separate the `datapar` algorithms

Abstract: Currently, the parallel algorithms can be used with the datapar execution policy. This project is meant to separate the datapar implementations, expose them through separate algorithm specializations, and adapt the implementation to support (and rely on) the data-parallel types introduced by N4755, section 9.
Difficulty: Medium/Hard
Expected result: The result should be functioning vectorized parallel algorithms separated from the core algorithms.
Knowledge Prerequisite: Parallel algorithms, iterators, vectorization.
Mentor: Hartmut Kaiser and Marcin Copik
See issue #5157 on the HPX bug tracker

(Re-)Implement executor API on top of sender/receiver infrastructure

Abstract: P0443 will most likely be accepted for C++23. Our executor API (customization points) currently dispatches to an executor interface defined by wg21.link/p0443R3. All HPX facilities related to scheduling tasks (algorithms, future, dataflow, async, etc.) rely on the executor customization points to perform their operations. This project would (re-)implement these customization points on top of the sender/receiver infrastructure.
Difficulty: Medium
Expected result: The result should be functioning executor customization points built upon senders/receivers.
Knowledge Prerequisite: Parallel algorithms.
Mentor: Hartmut Kaiser and Marcin Copik
See issue #5219 on the HPX bug tracker
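For readers new to the sender/receiver model, the toy below shows its basic shape as described in P0443: a sender describes work, `connect` combines it with a receiver into an operation state, and `start` runs it, with the result delivered through `set_value`, `set_error`, or `set_done`. Everything here (the `toy` namespace, `just`, `then`, and the receiver types) is a simplified, hypothetical illustration; it is neither HPX's implementation nor the exact P0443 interface, which uses customization point objects and a scheduler concept.

```cpp
#include <cassert>
#include <exception>
#include <utility>

namespace toy {

// Receiver that stores the value it receives into *out.
template <typename T>
struct store_receiver {
    T* out;
    void set_value(T v) { *out = std::move(v); }
    void set_error(std::exception_ptr) {}
    void set_done() {}
};

// Operation state produced by connecting a just_sender to a receiver.
template <typename T, typename R>
struct just_op {
    T value;
    R receiver;
    void start() { receiver.set_value(std::move(value)); }
};

// Sender that completes immediately with a stored value once started.
template <typename T>
struct just_sender {
    T value;
    template <typename R>
    just_op<T, R> connect(R r) { return {std::move(value), std::move(r)}; }
};

// Receiver adaptor: applies f to the incoming value before forwarding it.
template <typename R, typename F>
struct then_receiver {
    R next;
    F f;
    template <typename T>
    void set_value(T v) { next.set_value(f(std::move(v))); }
    void set_error(std::exception_ptr e) { next.set_error(e); }
    void set_done() { next.set_done(); }
};

// Sender adaptor: runs f on the value produced by the upstream sender.
template <typename S, typename F>
struct then_sender {
    S upstream;
    F f;
    template <typename R>
    auto connect(R r) {
        return upstream.connect(then_receiver<R, F>{std::move(r), std::move(f)});
    }
};

template <typename T>
just_sender<T> just(T v) { return {std::move(v)}; }

template <typename S, typename F>
then_sender<S, F> then(S s, F f) { return {std::move(s), std::move(f)}; }

}  // namespace toy
```

Nothing runs until `start` is called, so work can be composed lazily before it is submitted; executor customization points built on this model would create and start such operation states.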

Implement `shift_left` and `shift_right` parallel algorithms

Abstract: P0769 was accepted for C++20. It adds the shift_left and shift_right algorithms, which we should implement as part of our parallel algorithms module.
Difficulty: Medium
Expected result: The result should be functioning parallel shift_left and shift_right algorithms.
Knowledge Prerequisite: Parallel algorithms.
Mentor: Hartmut Kaiser and Marcin Copik
See issue #3706 on the HPX bug tracker
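A useful first step is a sequential reference implementation that pins down the C++20 semantics, against which a parallel version can be tested. For `0 < n < size`, `shift_left` moves the element at position `i + n` to position `i` and returns the new end of the shifted range, while `shift_right` moves elements toward the end and returns the new beginning; for `n == 0` or `n >= size` neither function modifies the range. The sketch assumes random-access iterators, and the `ref_` names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// Reference semantics of std::shift_left (C++20), random-access iterators only.
template <typename RandomIt>
RandomIt ref_shift_left(RandomIt first, RandomIt last,
                        typename std::iterator_traits<RandomIt>::difference_type n) {
    assert(n >= 0);
    auto size = last - first;
    if (n == 0) return last;      // nothing moves; the whole range is kept
    if (n >= size) return first;  // every element would be shifted out
    // Source [first + n, last) overlaps destination [first, last - n), but a
    // forward move is safe because the destination starts before the source.
    return std::move(first + n, last, first);
}

// Reference semantics of std::shift_right (C++20), random-access iterators only.
template <typename RandomIt>
RandomIt ref_shift_right(RandomIt first, RandomIt last,
                         typename std::iterator_traits<RandomIt>::difference_type n) {
    assert(n >= 0);
    auto size = last - first;
    if (n == 0) return first;
    if (n >= size) return last;
    // Move backward so overlapping elements are read before being overwritten.
    return std::move_backward(first, last - n, last);
}
```

Elements left behind in the vacated part of the range are in a valid but unspecified (moved-from) state, so tests should only inspect the shifted part.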

Add Vectorization to `par_unseq` Implementations of Parallel Algorithms

Abstract: Our parallel algorithms currently don't support the par_unseq execution policy. This project is centered around implementing this execution policy for at least some of the existing algorithms (such as for_each and similar).
Difficulty: Medium/Hard
Expected result: The result should be functioning parallel algorithms when used with the par_unseq execution policy. The loop body should end up being vectorized.
Knowledge Prerequisite: Vectorization, parallel algorithms.
Mentor: Marcin Copik
See issue #2271 on the HPX bug tracker

UI Improvements for Performance Visualization

Abstract: Traveler-Integrated is a platform for visualizing the performance of parallel runtimes such as HPX. The interface allows accessing multiple datasets from different executions. However, several improvements can be made to how the interface manages large numbers of files (e.g., from historical regression runs) and organizes windows for comparing runs. In this project, you will work on the JavaScript front end to implement this interface, refining the design through user feedback.
Difficulty: Easy-Medium
Expected result: Traveler-Integrated will have a newly designed interface for managing data of multiple runs.
Knowledge Prerequisite: JavaScript.
Mentor: Kate Isaacs

Large File Support for HPX OTF2 Trace Visualization

Abstract: HPX traces are collected with APEX and written as OTF2 files with extensions. These trace files are typically visualized using a Gantt chart or a collection of timelines. The present implementation reads the entire trace file before generating the visualization. However, the OTF2 interface supports partial reading of the file and has a parallel backend. This project would modify the Gantt chart backend (C++) to utilize these features, thus supporting larger files. The project could also modify the front end to use WebGL (JavaScript) when the number of data items is large.
Difficulty: Medium-Hard
Expected result: Trace files that require more memory than is available on a single machine can still be visualized from that machine, and the time from program start to visualization is reduced by using the large-file features.
Knowledge Prerequisite: C++, JavaScript.
Mentor: Kate Isaacs

Port HPX to iOS

Abstract: HPX has already proven to run efficiently on ARM-based systems, as demonstrated with an application written for Android tablet devices. A port to handheld devices running iOS would be the next logical step! To run HPX efficiently there, we need to adapt our build system to cross-compile for iOS and add code to interface with the iOS GUI and other system services.
Difficulty: Easy-Medium
Expected result: Provide a prototype HPX application running on an iPhone or iPad
Knowledge Prerequisite: C++, Objective-C, iOS
Mentor: Hartmut Kaiser and Thomas Heller

Create A Parcelport Based on WebSockets

Abstract: Create a new parcelport that is based on WebSockets. The WebSocket++ library seems to be a perfect starting point that avoids having to dig too deeply into the WebSocket protocol.
Difficulty: Medium-Hard
Expected result: A proof of concept parcelport based on WebSockets with benchmark results
Knowledge Prerequisite: C++, knowing WebSockets is a plus
Mentor: Hartmut Kaiser and Thomas Heller

Script Language Bindings

Abstract: Design and implement Python bindings for HPX, exposing all or parts of the HPX functionality with a 'Pythonic' API. This should be possible as Python has a much more dynamic type system than C++. Using Boost.Python seems to be a good choice for this.
Difficulty: Medium
Expected result: Demonstrate functioning bindings by implementing small example scripts for different simple use cases
Knowledge Prerequisite: C++, Python
Mentor: Hartmut Kaiser

All to All Communications

Abstract: Design and implement efficient all-to-all communication LCOs. While MPI provides mechanisms for broadcasting, scattering, and gathering across all MPI processes in a communicator, HPX currently lacks this feature. It should be possible to exploit the Active Global Address Space to mimic global all-to-all communications without actually communicating with every participating locality. Different strategies should be implemented and tested. A first, very basic implementation of broadcast that tries to tackle this problem already exists. However, more strategies for granularity control and locality exploitation need to be investigated and implemented. A first version of a gather utility is also implemented.
Difficulty: Medium-Hard
Expected result: Implement benchmarks and provide performance results for the implemented algorithms
Knowledge Prerequisite: C++
Mentor: Thomas Heller and Andreas Schaefer
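One common strategy for avoiding a root that sends to every locality is a k-ary tree broadcast: each locality forwards the message only to its children, so the message reaches all N localities in roughly log_k(N) rounds. The sketch below computes the children of a locality in such a tree, with ranks rotated so that the root becomes virtual rank 0. It models localities as integer ranks, is independent of HPX's existing broadcast implementation, and the function name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Children of `rank` in a k-ary broadcast tree over num_localities ranks rooted at `root`.
// Virtual rank v (root = 0) has children k*v + 1, ..., k*v + k that are below num_localities.
std::vector<std::size_t> broadcast_children(std::size_t rank, std::size_t root,
                                            std::size_t num_localities, std::size_t arity) {
    assert(rank < num_localities && root < num_localities && arity > 0);
    std::size_t v = (rank + num_localities - root) % num_localities;  // virtual rank
    std::vector<std::size_t> children;
    for (std::size_t c = 1; c <= arity; ++c) {
        std::size_t child_v = arity * v + c;
        if (child_v >= num_localities) break;
        children.push_back((child_v + root) % num_localities);  // back to real rank
    }
    return children;
}
```

Each locality needs only its own rank, the root, N, and k to know where to forward, so no global schedule has to be communicated; varying k is one knob for the granularity experiments mentioned above.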

Distributed Component Placement

Abstract: Implement an EDSL to specify the placement policies for components. This could be done similarly to Chapel's Domain Maps (http://chapel.cray.com/tutorials/SC12/SC12-6-DomainMaps.pdf). In addition, allocators can be built on top of those domain maps for use with C++ standard library containers. This is one of the key features that would allow users to write parallel algorithms efficiently without having to worry too much about the initial placement of their distributed objects in the global address space.
Difficulty: Medium-Hard
Expected result: Provide at least one policy that automatically creates components in the global address space
Knowledge Prerequisite: C++
Mentor: Thomas Heller and Hartmut Kaiser
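As a tiny illustration of what a placement policy encodes, the functions below map a global element index to the locality that should own it under block and cyclic distributions, two of the standard layouts that Chapel's domain maps provide. The names are hypothetical and this is not an HPX API; an EDSL would let users select or compose such mappings declaratively.

```cpp
#include <cassert>
#include <cstddef>

// Block distribution: n elements split into p contiguous chunks; the first
// n % p localities receive one extra element.
std::size_t block_owner(std::size_t index, std::size_t n, std::size_t p) {
    assert(index < n && p > 0);
    std::size_t base = n / p, extra = n % p;
    std::size_t big = extra * (base + 1);  // elements held by the larger chunks
    if (index < big) return index / (base + 1);
    return extra + (index - big) / base;   // base > 0 here, since index >= big
}

// Cyclic distribution: element i is placed on locality i mod p.
std::size_t cyclic_owner(std::size_t index, std::size_t p) {
    assert(p > 0);
    return index % p;
}
```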

Resumable Function Implementation

Abstract: Implement resumable functions in either GNU g++ or Clang. This should be based on the corresponding proposal to the C++ standardization committee (see N4286). While this project is not directly related to HPX, having resumable functions available and integrated with hpx::future would improve the performance and readability of asynchronous code. This project sounds huge, but it should actually not be too difficult to realize.
Difficulty: Medium-Hard
Expected result: Demonstrating the await functionality with appropriate tests
Knowledge Prerequisite: C++, knowledge of how to extend Clang or GCC is advantageous
Mentor: Hartmut Kaiser

Coroutine-like Interface

Abstract: HPX is an excellent runtime system for task-based parallelism. In its current form, however, the results of tasks can only be expressed by returning from a function, and there are scenarios where this is not sufficient. One example is lazy ranges of integers (e.g., Fibonacci numbers, 0 to n). For those, a generator/yield construct would be perfect!
Difficulty: Easy-Medium
Expected result: Implement yield and demonstrate on at least one example
Knowledge Prerequisite: C++
Mentor: Hartmut Kaiser and Thomas Heller
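What a yield-based generator saves you from writing is a hand-written state machine like the one below, which produces Fibonacci numbers lazily, one per call. With a `yield` construct (for example, one built on C++20 coroutines), the same generator could be written as an ordinary loop that yields each value, and the language or library would keep the suspended state. The class is a hypothetical illustration, not an HPX facility.

```cpp
#include <cassert>
#include <cstdint>

// Lazy Fibonacci sequence: the state a coroutine would keep across `yield`
// points (the two most recent values) is stored explicitly in members.
class fibonacci_generator {
public:
    // Returns the next Fibonacci number: 0, 1, 1, 2, 3, 5, ...
    std::uint64_t next() {
        std::uint64_t result = a_;
        std::uint64_t sum = a_ + b_;
        a_ = b_;
        b_ = sum;
        return result;
    }

private:
    std::uint64_t a_ = 0;  // value returned by the next call
    std::uint64_t b_ = 1;  // value after that
};
```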

Port Graph500 to HPX

Abstract: Implement Graph500 using the HPX runtime system. Graph500 is the benchmark used by the HPC industry to model important factors of many modern parallel analytical workloads. The Graph500 list ranks systems by their performance on this benchmark and was designed to augment the Top500 list. The current Graph500 benchmarks are implemented using OpenMP and MPI. HPX is well suited for the fine-grained and irregular workloads of graph applications. Porting Graph500 to HPX would require replacing the inherent barrier synchronization with HPX's asynchronous communication, producing a new benchmark for the HPC community as well as an addition to the HPX benchmark suite. See http://www.graph500.org/ for information on the present Graph500 implementations.
Difficulty: Medium
Expected result: New implementation of the Graph500 benchmark.
Knowledge Prerequisite: C++
Mentor: Patricia Grubel and Thomas Heller
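The central Graph500 kernel is a breadth-first search (BFS) that produces a parent array describing the BFS tree. The conventional level-synchronous formulation, sketched sequentially below over adjacency lists, processes one frontier at a time; in distributed OpenMP/MPI codes the end of each level corresponds to a global barrier, which is the kind of synchronization an HPX port would try to replace with asynchronous, future-based communication. The function name, the graph representation, and recording the root as its own parent are assumptions made for this sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using graph = std::vector<std::vector<std::int64_t>>;  // adjacency lists

// Level-synchronous BFS returning parent[v], or -1 if v is unreachable.
// The root is recorded as its own parent.
std::vector<std::int64_t> bfs_parents(const graph& g, std::int64_t root) {
    assert(root >= 0 && root < static_cast<std::int64_t>(g.size()));
    std::vector<std::int64_t> parent(g.size(), -1);
    parent[root] = root;
    std::vector<std::int64_t> frontier{root}, next;
    while (!frontier.empty()) {
        for (std::int64_t u : frontier)
            for (std::int64_t v : g[u])
                if (parent[v] == -1) {  // first visit claims the parent
                    parent[v] = u;
                    next.push_back(v);
                }
        // In a distributed OpenMP/MPI code, a global barrier sits here
        // before the next level may start.
        frontier.swap(next);
        next.clear();
    }
    return parent;
}
```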

Port Mantevo MiniApps to HPX

Abstract: Implement a version of one or more mini-apps from the Mantevo project (http://mantevo.org/) using the HPX runtime system. We are interested in mini-applications with irregular workloads ported to HPX. Some of these are under development, and we will have access to them in addition to those listed on the site. Of those on the site, MiniFE and phdMESH would be good additions to the HPX benchmark suite. Porting the mini-apps would require porting them from C to C++ and replacing the inherent barrier synchronization with HPX's asynchronous communication. This project would be a great addition to the HPX benchmark suite and would benefit the HPC community.
Difficulty: Medium
Expected result: New implementation of a Mantevo mini-app or apps.
Knowledge Prerequisite: C, C++
Mentor: Patricia Grubel and Thomas Heller

Create An HPX Communicator for Trilinos Project Teuchos Subpackage

Abstract: The Trilinos project (http://trilinos.org/) consists of many libraries for HPC applications in several capability areas (http://trilinos.org/capability-areas/). Communication between parallel processes is handled by an abstract communication API (http://trilinos.org/docs/dev/packages/teuchos/doc/html/index.html#TeuchosComm_src), which currently has only MPI and serial implementations. Extending it with an HPX backend would allow any of the Teuchos-enabled Trilinos libraries to run in parallel using HPX in place of MPI. Of particular interest is the mesh partitioning library Zoltan2 (http://trilinos.org/packages/zoltan2/), which would be used as a test case for the new communication interface. Some new collective HPX algorithms may be required to fulfill the API requirements (see the All to All Communications project above).
Difficulty: Medium-Hard
Expected result: A demo application for partitioning meshes using HPX and Zoltan.
Knowledge Prerequisite: C, C++, (MPI)
Mentor: John Biddiscombe and Thomas Heller

Add Mask Move/Assign Wrappers for Vectorization Intrinsics

Abstract: Vectorization is a key technique for leveraging the full potential of modern CPUs. LibFlatArray is a C++ library that helps with transitioning scalar numerical algorithms on objects to vectorized implementations. It comes with expression templates that let users write code that encapsulates vector intrinsics but looks like standard mathematical data types and operations. These templates (dubbed short_vec in LibFlatArray) currently lack a mechanism to selectively set certain lanes of the vector registers via conditional masks. With this functionality, we could represent if/then/else constructs more idiomatically. Intrinsics for mask generation and application are readily available in all current vector instruction sets (Intel/ARM/IBM); we simply lack convenient, efficient wrappers to use them.
Difficulty: Medium
Expected result: Wrapper functions for comparison (to generate masks) and conditional assignment (using masks)
Knowledge Prerequisite: basic C++, vectorization via SSE, AVX/AVX2/AVX512
Mentor: Andreas Schaefer
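The semantics such wrappers would provide can be stated with a scalar emulation: a comparison yields a per-lane mask, and a masked assignment writes only the lanes whose mask bit is set, so an if/then/else over all lanes becomes a compare followed by a conditional assignment. The fixed-width `lanes` and `mask` types and the function names below are hypothetical stand-ins for `short_vec` and are not LibFlatArray's API; a real implementation would map `less_mask` and `masked_assign` onto the corresponding SSE/AVX/AVX-512 compare and masked-move or blend intrinsics, which avoids the per-lane branch used in this emulation.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical 4-lane vector and mask types (scalar emulation of SIMD registers).
constexpr std::size_t width = 4;
using lanes = std::array<double, width>;
using mask = std::array<bool, width>;

// Lane-wise comparison a < b producing a mask (cf. a vector compare intrinsic).
mask less_mask(const lanes& a, const lanes& b) {
    mask m{};
    for (std::size_t i = 0; i < width; ++i) m[i] = a[i] < b[i];
    return m;
}

// Conditional assignment: dst[i] = src[i] only where m[i] is set
// (cf. a masked move or blend intrinsic).
void masked_assign(lanes& dst, const lanes& src, const mask& m) {
    for (std::size_t i = 0; i < width; ++i)
        if (m[i]) dst[i] = src[i];
}
```

For example, `y = (x < 0) ? -x : x` over all lanes becomes: start with `y = x`, then `masked_assign(y, neg_x, less_mask(x, zero))`.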

Implement Your Favorite Parcelport Backend

Abstract: The HPX runtime system uses a module called Parcelport to deliver packages over the network. An efficient implementation of this layer is indispensable, and we are searching for new backend implementations based on CCI, UCX, or libfabric. These abstractions over various network transport layers offer fast, one-sided RDMA transfers. The purpose of this project is to explore one of them and implement a parcelport using it.
Difficulty: Medium-Hard
Expected result: A proof of concept for a chosen backend implementation with performance results
Knowledge Prerequisite: C++, Basic understanding of Network transports
Mentor: Thomas Heller

Implement a Faster Associative Container for GIDs

Abstract: The HPX runtime system uses the Active Global Address Space (AGAS) to address global objects. Objects in HPX are identified by a 128-bit unique global identifier, abbreviated as GID. The performance of HPX relies on fast lookups of GIDs in associative containers. We have experimented with binary search trees (std::map) and hash maps (std::unordered_map). However, we believe that we can implement a search data structure based on n-ary trees, tries, or radix trees that exploits the structure of GIDs to allow faster lookup and insertion.
Difficulty: Medium-Hard
Expected result: Various container approaches to choose from together with realistic benchmarks to show the performance properties
Knowledge Prerequisite: C++, Algorithms
Mentor: Thomas Heller
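One of the candidate structures is easy to sketch: a trie keyed on the 16 bytes of a 128-bit GID, where each level consumes one byte, so a lookup visits at most 16 nodes regardless of how many GIDs are stored. The `gid` struct (two 64-bit halves) and all names below are assumptions, not HPX's own GID type. An uncompressed 256-way trie like this wastes a lot of memory, which is why a radix tree that collapses single-child paths, or a different fan-out, would be the natural next variant to benchmark against `std::map` and `std::unordered_map`.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <memory>
#include <utility>

// Hypothetical 128-bit global identifier split into two 64-bit halves.
struct gid {
    std::uint64_t msb;
    std::uint64_t lsb;
};

// Byte i of the 128-bit key (0 = most significant byte of msb).
inline unsigned key_byte(const gid& g, int i) {
    std::uint64_t half = i < 8 ? g.msb : g.lsb;
    int shift = 8 * (7 - (i % 8));
    return static_cast<unsigned>((half >> shift) & 0xFF);
}

// Uncompressed 256-way trie mapping GIDs to values of type T.
template <typename T>
class gid_trie {
    struct node {
        std::array<std::unique_ptr<node>, 256> children;
        bool has_value = false;
        T value{};
    };
    node root_;

public:
    // Inserts or overwrites the value stored for key g.
    void insert(const gid& g, T value) {
        node* n = &root_;
        for (int i = 0; i < 16; ++i) {
            auto& child = n->children[key_byte(g, i)];
            if (!child) child = std::make_unique<node>();
            n = child.get();
        }
        n->value = std::move(value);
        n->has_value = true;
    }

    // Returns a pointer to the stored value, or nullptr if g is absent.
    const T* find(const gid& g) const {
        const node* n = &root_;
        for (int i = 0; i < 16 && n; ++i) n = n->children[key_byte(g, i)].get();
        return (n && n->has_value) ? &n->value : nullptr;
    }
};
```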

Working on Blaze Tensor

Abstract: Blaze Tensor implements 3D data structures (tensors) that integrate well with the Blaze library. From the Blaze website: "Blaze is an open-source, high-performance C++ math library for dense and sparse arithmetic." With its state-of-the-art Smart Expression Template implementation, Blaze combines the elegance and ease of use of a domain-specific language with HPC-grade performance, making it one of the most intuitive and fastest C++ math libraries available. Blaze Tensor follows the design goals of Blaze, so all of the above applies to it as well. There are many tasks one could work on for Blaze Tensor. For an initial list, please see here.
Difficulty: Medium-Hard
Expected result: Finish implementation (and corresponding Doxygen based API documentation) for the selected set of features
Knowledge Prerequisite: CMake and C++
Mentor: Hartmut Kaiser

Range based Parallel Algorithms

Abstract: This project requires creating new algorithm implementations that expose a range-based interface as proposed by the Ranges TS (these have since been added to C++20). We have already added several range-based parallel algorithms, but many are still missing (see here for a list of the algorithms that require work). Usually, the range-based algorithms are almost trivial to implement as they can simply forward to the existing iterator-based versions, especially after those have been adapted to support ranges (see Parallel Algorithms and Ranges). However, some algorithms will require API changes to the existing iterator-based code (mostly adding projections and accepting sentinels as end iterators).
Difficulty: Easy-Medium
Expected result: Implement as many range-based parallel algorithms as possible
Knowledge Prerequisite: CMake and C++
Mentor: Hartmut Kaiser
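The forwarding pattern described above looks roughly like this: the range overload extracts the begin and end iterators and calls the existing iterator-based algorithm, applying a projection to each element before the user's function. The sketch uses a sequential `std::for_each` in place of HPX's parallel iterator-based implementation, uses a hypothetical function name, requires C++17 for `std::invoke`, and does not handle sentinels, which is where the API changes come in.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <iterator>
#include <utility>
#include <vector>

// Identity projection (C++20 provides std::identity; defined here for C++17).
struct identity_projection {
    template <typename T>
    T&& operator()(T&& t) const noexcept { return std::forward<T>(t); }
};

// Hypothetical range-based for_each that forwards to the iterator-based version,
// calling f(proj(element)) for each element of the range.
template <typename Range, typename F, typename Proj = identity_projection>
F range_for_each(Range&& rng, F f, Proj proj = {}) {
    std::for_each(std::begin(rng), std::end(rng), [&](auto&& elem) {
        std::invoke(f, std::invoke(proj, std::forward<decltype(elem)>(elem)));
    });
    return f;
}
```

Because `std::invoke` accepts pointers to data members, a projection such as `&point::y` works without extra code, which is one reason projections fit naturally into this forwarding layer.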

Implement missing Parallel Algorithms

Abstract: C++17 added a set of parallel algorithms to the standard library. In HPX, we have implemented almost all of them; however, a handful are still missing. Please see the corresponding ticket, which lists the algorithms that have been implemented and those still up for grabs. For this project, a student would implement one or more of the remaining parallel algorithms in HPX.
Difficulty: Medium-Hard
Expected result: Implement some or all of the missing parallel algorithms
Knowledge Prerequisite: CMake and C++
Mentor: Hartmut Kaiser

Bug Hunter

Abstract: In addition to our extensive ideas list, our issue tracker contains several active tickets that are worth tackling as a separate project. Feel free to talk to us if you find something interesting. A prospective student should pick at least one ticket of medium to hard difficulty and discuss how to resolve it.
Difficulty: Medium-Hard
Expected result: The selected issues need to be fixed
Knowledge Prerequisite: C++
Mentor: Thomas Heller

Distributed solver and load balancing for Peridynamics using asynchronous parallelism

Abstract: Peridynamics is a reformulation of classical continuum mechanics (e.g., linear elastodynamics). The internal force at any point in the solid results from the interaction of that point with neighboring points within some distance ϵ. Typically, ϵ is much larger than the mesh size. As a result, the computation is more intensive and introduces more substantial data dependencies when the domain is partitioned for a parallel implementation. This project aims to develop and implement a distributed solver for Peridynamics in an existing codebase [1]. It will benefit from last year's GSoC student's work toward a similar goal for a simplified nonlocal model [2]. In [2], several challenges associated with the parallelization of nonlocal models are highlighted, and algorithms are developed to address them. In this project, we will apply the techniques from [2] to the Peridynamics problem. First, we will implement the distributed solver. Second, we will optimize the code so that each compute node performs the data exchange and the calculation on the free degrees of freedom (DoFs) simultaneously, minimizing the wait time; for a given compute node, free DoFs are those DoFs that do not depend on data owned by other compute nodes. Finally, if possible, we will add the load balancing algorithm from [2]. After GSoC, we intend to write a workshop paper based on this project's efforts and possibly present it at a computer science conference.

[1] https://github.com/nonlocalmodels/NLMech

[2] https://arxiv.org/abs/2102.03819

Difficulty: Medium-Hard
Expected result: Extend the existing shared-memory code to a distributed code
Knowledge Prerequisite: C++
Mentor: Patrick Diehl and Prashant K. Jha
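The notion of free DoFs can be made concrete with a small 1D sketch: given node positions, the compute node that owns each one, and the horizon ϵ, a DoF is free if every neighbor within ϵ is owned by the same compute node, so its update can proceed while data for the remaining DoFs is still being exchanged. The 1D geometry, the brute-force neighbor search, and the function name are simplifying assumptions for illustration and are not taken from NLMech.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Marks DoF i as free if all nodes within distance eps of x[i] share owner[i].
// O(n^2) neighbor search; a real code would use precomputed neighbor lists.
std::vector<bool> free_dofs(const std::vector<double>& x,
                            const std::vector<int>& owner, double eps) {
    assert(x.size() == owner.size());
    std::vector<bool> is_free(x.size(), true);
    for (std::size_t i = 0; i < x.size(); ++i)
        for (std::size_t j = 0; j < x.size(); ++j)
            if (std::fabs(x[j] - x[i]) <= eps && owner[j] != owner[i]) {
                is_free[i] = false;  // depends on data owned by another compute node
                break;
            }
    return is_free;
}
```

With a horizon spanning several mesh cells, the non-free band on each side of a partition boundary is correspondingly wide, which is why overlapping communication with the free-DoF computation matters more here than in local models.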

Project: Template

Abstract:
Difficulty:
Expected result:
Knowledge Prerequisite:
Mentor: