GSoC 2016 Project Ideas - STEllAR-GROUP/hpx GitHub Wiki
Table of Contents:
- Trace Generation for HPX Tasks
- Integration of the cuBLAS library within hpxcl
- Extension/Evaluation of LibGeoDecomp::Region as an alternative adjacency container for Boost Graph Library
- Add mask move/assign wrappers for vectorization intrinsics
- Newtonian physics sandbox
- Implement your favorite Parcelport backend
- Implement a faster associative container for GIDs
- HPX in the Cloud
- Hybrid Cloud-Batch-Scheduling
- Work on parallel algorithms for HPX
- Port HPX to IOS
- Augment csv Files
- Implement a Map/Reduce Framework
- Implement a Plugin Mechanism for Thread schedulers
- Create parcelport based on websockets
- Script language bindings
- All to All Communications
- Distributed Component Placement
- A C++ Runtime Replacement
- A Free Resumable functions implementation
- Coroutine like Interface
- Bug Hunter
- Port the Graph500 Benchmark to HPX
- Port Mantevo Mini App(s) to HPX
- Create an HPX communicator for Trilinos project
- Implement an HPX debugger
- Project Template
Welcome to the HPX home page for Google Summer of Code (GSoC). This page provides information about student projects, proposal submission templates, advice on writing good proposals, and links to information on getting started writing with HPX. This page is also used to collect project ideas for the Google Summer of Code 2016. The STE||AR Group will apply as an organization and our goal is to get at least two students funded.
We are looking to fund work on a number of different kinds of proposals (for more details about concrete project ideas, see below):
- extensions to existing library features,
- new distributed data structures and algorithms, and
- multiple competing proposals for the same project.
Students must submit a proposal. A template for the proposal can be found here. Hints for writing a good proposal can be found here.
We strongly suggest that students interested in developing a proposal for HPX discuss their ideas on the mailing list in order to help refine the requirements and goals. Students who actively discuss projects on the mailing list are also ranked before those that do not.
If the descriptions of these projects seem a little vague... Well, that's intentional. We are looking for students to develop requirements for their proposals by doing initial background research on the topic, and interacting with the community on the HPX mailing list ([email protected]) to help identify expectations.
- Trace Generation for HPX Tasks
- Integration of the cuBLAS library within hpxcl
- Extension/Evaluation of LibGeoDecomp::Region as an alternative adjacency container for Boost Graph Library
- Add mask move/assign wrappers for vectorization intrinsics
- Newtonian physics sandbox
- Implement your favorite Parcelport backend
- Implement a faster associative container for GIDs
- HPX in the Cloud
- Hybrid Cloud-Batch-Scheduling
- Work on parallel algorithms for HPX
- Port HPX to IOS
- Augment csv Files
- Implement a Map/Reduce Framework
- Implement a Plugin Mechanism for Thread schedulers
- Create parcelport based on websockets
- Script language bindings
- All to All Communications
- Distributed Component Placement
- A C++ Runtime Replacement
- A Free Resumable functions implementation
- Coroutine like Interface
- Bug Hunter
- Port the Graph500 Benchmark to HPX
- Port Mantevo Mini App(s) to HPX
- Create an HPX communicator for Trilinos project
- Implement an HPX debugger
- Project Template
- Abstract: Debugging task based applications is hard, one method that is used to help assess performance of an application is "trace visualization" where each task is shown on a plot as a coloured bar, with connections between bars normally indicating a communication path between processes and the size of the bar indicating duration (see image search for examples). The concept has been used for tracing MPI based applications for years, but is much more difficult with fine grained tasks as the tools used to generate traces are not designed for the job. Recently, support for generation of some task based traces was added to HPX as part of the APEX project. The objective of this project is to use libraries such as APEX, the Open Trace Format, Extrae to improve support for trace generation and analysis at task level (i.e. individual async calls). Exploration of task dependencies (DAG) and timings should be made possible.
- Difficulty: Medium
- Expected result: Generation of traces at the async task level in HPX
- Knowledge Prerequisite: C++, STL
- Mentor: John Biddiscombe (biddisco at cscs . ch) and others TBD
-
Abstract: N4310 is a C++ standardization proposal which is very likely to be accepted as a TS (technical specification) in 2015. It provides an abstraction of C++ parallel algorithms well aligned with the existing STL algorithms. HPX sorely misses a parallel and distributed algorithm module. This project should implement such a module on top of HPX. Some work in this direction has already been finished (see #1141 for our progress). While there is still some work to be done for #1141, we're especially interested in extending those parallel algorithms to work with distributed data structures, like
hpx::vector
. Some minimal work has been done in this direction as well (see #1338) but we look for interested students to continue implementing the distributed parallel algorithms. - Difficulty: Medium-Hard
- Expected result: Implementations of various parallel and distributed algorithms for HPX
- Knowledge Prerequisite: C++, STL
- Mentor: Hartmut Kaiser (hartmut.kaiser at gmail.com) and Thomas Heller (thom.heller at gmail.com)
- Abstract: HPX has already proven to run efficiently on ARM based systems. This has been demonstrated with an application written for Android tablet devices. A port to handheld devices running with IOS would be the next logical steps! In order to be able to run HPX efficiently on there, we need to adapt our build system to be able to cross compile for IOS and add a code to interface with the IOS GUI and other system services.
- Difficulty: Easy-Medium
- Expected result: Provide a prototype HPX application running on an iPhone or iPad
- Knowledge Prerequisite: C++, ObjectiveC, IOS
- Mentor: Hartmut Kaiser (hartmut.kaiser at gmail.com) and Thomas Heller (thom.heller at gmail.com)
- Abstract: A counter destination .csv file with header that contains small counter labels such that results of multiple samples with multiple input parameters can be logged with counters, but does not work correctly when counters are queried for multiple OS threads. This should be a simple fix and should be extended to multiple localities as well. Make additions to hpx that would add user define parameters to the counter destination file. This would enable a user to have their information along with the HPX counter info in one csv file. This could include input parameters as well as output such as Execution Time, or other pertinent output from the application, or the runtime (ie, number of threads etc.). Then write some python/pandas or R to do statistical processing and/or plots. Database ideas welcome, or get familiar with the APEX and Tau interfaces with HPX to do some data processing visualization using them and write up examples for users.
- Difficulty: Easy
- Expected result: .Augment csv files with user defined parameters and fix hpx-counters for multiple OS threads and interval, perform statistics and plot capabilites.
- Knowledge Prerequisite: familiarity and willingness to work with C++, python and pandas
- Mentors: Patricia Grubel (pagrubel at nmsu.edu), Hartmut Kaiser (hartmut.kaiser at gmail.com)
- Abstract: Map/Reduce frameworks are getting more and more popular for big data processing (for example Hadoop). By utilizing the unified and standards conforming API of the HPX runtime system, we believe to be able to perfectly repesent the Map/Reduce programming model. Many applications would benefit from direct support in HPX. This might include adding Hypertable or similar libraries to the mix to handle the large data sets Map/Reduce is usually used with.
- Difficulty: Medium-Hard
- Expected result: A propotypical implementation running on an order of 1000 compute nodes
- Knowledge Prerequisite: C++
- Mentor: Hartmut Kaiser (hartmut.kaiser at gmail.com) and Andreas Schäfer (andreas.schaefer at fau.de)
- Abstract: Revise the thread scheduling subsystem of HPX such that it is more pluggable and allows for more fine grain control over what scheduler to use for the execution of a particular section of the user code. The proposal to the C++ Standards committee proposing work executors (see N3562) seems to provide a good starting point for a possible interface. Also, some initial work has been done already. However, more work needs to be applied to have everything working in an acceptable manner. One of the big advantages of executors is to express locality of where tasks are supposed to run. Examples for this are interactions with GUI libraries like Qt and proper NUMA memory placement.
- Difficulty: Easy-Medium
- Expected result: All existing schedulers need to converted to the plugin system and at least one example code needs to show the advantages of executors.
- Knowledge Prerequisite: C++
- Mentor: Hartmut Kaiser (hartmut.kaiser at gmail.com) and Patricia Grubel (pagrubel at nmsu.edu)
- Abstract: Create a new parcelport which is based on websockets. The Websockets++ library seems to be a perfect starting point to avoid having to dig into the websocket protocol too deeply.
- Difficulty: Medium-Hard
- Expected result: A proof of concept parcelport based on websockets with benchmark results
- Knowledge Prerequisite: C++, knowing websockets is a plus
- Mentor: Hartmut Kaiser (hartmut.kaiser at gmail.com) and Thomas Heller (thom.heller at gmail.com)
- Abstract: Design and implement Python bindings for HPX exposing all or parts of the HPX functionality with a 'Pythonic' API. This should be possible as Python has a much more dynamic type system than C++. Using Boost.Python seems to be a good choice for this.
- Difficulty: Medium
- Expected result: Demonstrate functioning bindings by implementing small example scripts for different simple use cases
- Knowledge Prerequisite: C++, Python
- Mentor: Hartmut Kaiser (hartmut.kaiser at gmail.com) and Adrian Serio (aserio at cct.lsu.edu)
- Abstract: Design and implement efficient all-to-all communication LCOs. While MPI provides mechanisms for broadcasting, scattering and gathering with all MPI processes inside a communicator, HPX currently misses this feature. It should be possible to exploit the Active Global Address Space to mimic global all-to-all communications without the need to actually communicate with every participating locality. Different strategies should be implemented and tested. A first and very basic implementation of broadcast already exisits which tries to tackle the above described problem, however, more strategies to granularity control and locality exploitation need to be investigated an implemented. We also have a first version of a gather utility implemented.
- Difficulty: Medium-Hard
- Expected result: Implement benchmarks and provide performance results for the implemented algorithms
- Knowledge Prerequisite: C++
- Mentor: Thomas Heller (thom.heller at gmail.com) and Andreas Schäfer (andreas.schaefer at fau.de)
- Abstract: Implement a EDSL to specify the placement policies for components. This could be done similar to [Chapels Domain Maps] (http://chapel.cray.com/tutorials/SC12/SC12-6-DomainMaps.pdf). In Addition, allocators can be built on top of those domain maps to use with C++ standard library containers. This is one of the key features to allow users to efficiently write parallel algorithms without having them worried to much about the initial placement of their distributed objects in the Global Address space
- Difficulty: Medium-Hard
- Expected result: Provide at least one policy which automatically creates components in the global address space
- Knowledge Prerequisite: C++
- Mentor: Thomas Heller (thom.heller at gmail.com) and Hartmut Kaiser (hartmut.kaiser at gmail.com)
- Abstract: Turn HPX into a replacement for the C++ runtime. We currently need to manually "lift" regular functions to HPX threads in order to have all the information for user-level threading available. This project should research the steps that need to be taken to implement a HPX C++ runtime replacement and provide a first proof of concept implementation for a platform of choice.
- Difficulty: Easy-Medium
- Expected result: A proof of concept implementation and documentation on how to run HPX application without the need of an hpx_main
- Knowledge Prerequisite: C++, Dynamic Linker, Your favorite OSes ABI to start programs/link executables
- Mentor: Hartmut Kaiser (hartmut.kaiser at gmail.com) and Thomas Heller (thom.heller at gmail.com)
-
Abstract: Implement resumable functions either in g++ or clang. This should be based on the corresponding proposal to the C++ standardization committee (see N4286. While this is not a project which directly related HPX, having resumable functions available and integrated with
hpx::future
would allow to improve the performance and readability of asynchronous code. This project sounds to be huge - but it actually shouldn't be too difficult to realize. - Difficulty: Medium-Hard
-
Expected result: Demonstrating the
await
functionality with appropriate tests - Knowledge Prerequisite: C++, knowledge of how to extend clang or gcc is clearly advantageous
- Mentor: Agustín Bergé (k at fusionfenix.com) and Hartmut Kaiser (hartmut.kaiser at gmail.com)
- Abstract: HPX is an excellent runtime system for doing task based parallelism. In its current form however results of tasks can only be expressed in terms of returning from a function. However, there are scenarios where this is not sufficient. One example would be lazy ranges of integers (For example fibonacci, 0 to n, etc.). For those a generator/yield construct would be perfect!
- Difficulty: Easy-Medium
- Expected result: Implement yield and demonstrate on at least one example
- Knowledge Prerequisite: C++
- Mentor: Hartmut Kaiser (hartmut.kaiser at gmail.com) and Thomas Heller (thom.heller at gmail.com)
- Abstract: In addition to our extensive ideas list, there are several active tickets listed in our issue tracker which are worth tackling as a separate project. Feel free to talk to us if you find something which is interesting to you. A prospective student should pick at least one ticket with medium to hard difficulty and discuss how it could be solved
- Difficulty: Medium-Hard
- Expected result: The selected issues need to be fixed
- Knowledge Prerequisite: C++
- Mentor: and Thomas Heller (thom.heller at gmail.com)
- Abstract: Implement Graph500 using the HPX Runtime System. Graph500 is the benchmark used by HPC industry to model important factors of many modern parallel analytical workloads. The Graph500 list is a performance list of systems using the benchmark and was designed to augment the Top 500 list. The current Graph500 benchmarks are implemented using OpenMP and MPI. HPX is well suited for the fine-grain and irregular workloads of graph applications. Porting Graph500 to HPX would require replacing the inherent barrier synchronization with asynchronous communications of HPX, producing a new benchmark for the HPC community as well as an addition to the HPX benchmark suite. See http://www.graph500.org/ for information on the present Graph500 implementations.
- Difficulty: Medium
- Expected result: New implementation of the Graph500 benchmark.
- Knowledge Prerequisite: C++
- Mentor: Patricia Grubel (pagrubel at nmsu.edu), and Thomas Heller (thom.heller at gmail.com)
- Abstract: Implement a version of one or more mini apps from the Mantevo project (http://mantevo.org/) using HPX Runtime System. We are interested in mini applications ported to HPX that have irregular workloads. Some of these are under development and we will have access to them in addition to those listed on the site. On the site, MiniFE and phdMESH would be a good additions to include in HPX benchmark suites. Porting the mini apps would require porting the apps from C to C++ and replacing the inherent barrier sycnhronization with HPX's asynchronous communication. This project would be a great addition to the HPX benchmark suite and the HPC community.
- Difficulty: Medium
- Expected result: New implementation of a Mantevo mini app or apps.
- Knowledge Prerequisite: C, C++
- Mentor: Patricia Grubel (pagrubel at nmsu.edu) and Thomas Heller (thom.heller at gmail.com)
- Abstract: The trilinos project (http://trilinos.org/) consists of many libraries for HPC applications in several capability areas (http://trilinos.org/capability-areas/). Communication between parallel processes is handled by an abstract communication API (http://trilinos.org/docs/dev/packages/teuchos/doc/html/index.html#TeuchosComm_src) which currently has implementations for MPI and serial only. Extending the implementation with an HPX backend would permit any of the Teuchos enabled Trilinos libraries to run in parallel using HPX in place of MPI. Of particular interest is the mesh partitioning library Zoltan2 (http://trilinos.org/packages/zoltan2/) which would be used as a test case for the new communications interface. Note that some new collective HPX algorithms may be required to fulfill the API requirements (see all-to-all-communications project above).
- Difficulty: Medium-Hard
- Expected result: A demo application for partitioning meshes using HPX and Zoltan.
- Knowledge Prerequisite: C, C++, (MPI)
- Mentor: John Biddiscombe (biddisco at cscs.ch) and Thomas Heller (thom.heller at gmail.com)
- Abstract: It is currently unreasonably hard to debug large scale HPX applications. This is mainly due to the fact that debuggers don't understand user-level threads. In addition, extracting useful information out of the current runtime state proves to be incredibly hard with gdb or lldb. In order to remedy this shortcoming, HPX users need to have the ability to 1) easily attach a debugger to a running large scale application on a supercomputer and 2) Implement pretty printers and intrinsic runtime capabilities to help debugging running HPX applications. A first implementation of this is available at (https://github.com/STEllAR-GROUP/hpx/tree/master/tools/gdb)
- Difficulty: Easy-Medium
- Expected result: Major improvement of the debugging experience of HPX applications.
- Knowledge Prerequisite: C++, Python, gdb
- Mentor: Thomas Heller (thom.heller at gmail.com) and Zach Byerly (zbyerly at gmail.com)
- Abstract: The cuBLAS library is the GPU accelerated version of Basic Linear Algebra Subroutines like solving linear equation systems. This tasks are of high importance in different applications for numerical simulations. During this project the student has the possibility to obtain insights in the basic linear algebra and the integration of acceleration cards to utilize the full capability of modern super computers. The main task of the project is the wrapping of the cuBLAS subroutines inside hpx actions to integrate them in the asynchronous execution graph of hpx. Some research is invested in the applicability of the existing data structures of cuBLAS to the serialization part. Depending on this result additional wrapping functionality for the data structures need to be provided.
- Difficulty: Easy-Medium
- Expected result: Tight integration of the cuBlas functionality into hpx::future for the asynchronous integration in the hpx execution graph
- Knowledge Prerequisite: C++ and CUDA
- Mentor: Patrick Diehl (diehl at ins.uni-bonn.de) and Thomas Heller (thom.heller at gmail.com)
Extension/Evaluation of LibGeoDecomp::Region as an alternative adjacency container for Boost Graph Library
- Abstract: The Boost Graph Library (BGL) offers a set of data structures to store various kinds of graphs, together with generic algorithms to operate on these. For certain classes of graphs, which are relevant to high performance computing (HPC), the adjacency information could be stored more efficiently via a data structure we have developed for LibGeoDecomp: the Region class. Region stores basically a set of 1D/2D/3D coordinates with run-length compression. A set of 2D coordinates is equivalent to a set of directed edges of a graph. For this project we'd be interested in an adaptation of the Region interface to make it usable within BGL. The expected interface of adjacency classes is well defined within BGL.
- Difficulty: Medium
- Expected result: Adapter class or extended Region class for use in BGL, evaluation via set of relevant benchmarks
- Knowledge Prerequisite: basic C++ and basic graph theory
- Mentor: Andreas Schaefer (gentryx at gmx.de)
- Abstract: Vectorization is a key technique to leverage the full potential of modern CPUs. LibFlatArray is a C++ library which helps with transitioning scalar numerical algorithms on objects to vectorized implementations. It comes with expression templates that enable the user to write code that encapsulate vector intrinsics but appear to the user like standard mathematical datatypes and operations. These templates (dubbed short_vec in LibFlatArray) currently lack a mechanism to selectively set certain lanes of the vector registers via conditional masks. If we had this functionality we'd be able to represent if/then/else constructs way more idiomatically. Intrinsics for mask generation/application are readily available in all current vector instruction sets (Intel/ARM/IBM), we simply lack convenient/efficient wrappers to utilize them.
- Difficulty: Medium
- Expected result: Wrapper functions for comparison (to generate masks) and conditional assignment (using masks)
- Knowledge Prerequisite: basic C++, vectorization via SSE, AVX/AVX2/AVX512
- Mentor: Andreas Schaefer (gentryx at gmx.de)
- Abstract: Develop a tool which allows users to create physical simulations of simple interacting solid objects (spheres, tetrahedrons) with little effort. A force-based, time discrete algorithm would be best suited for this. Ideally users could model a scene in Blender, export it to a text file, import that file in the tool and simulate it with LibGeoDecomp. LibGeoDecomp (https://www.libgeodecomp.org) is a library for scalable computer simulations. It can leverage HPX for parallelization, which is extremely relevant in this context to even out the computational load. LibGeoDecomp has been used in a number of science projects, but the complexity of these means they're not good to introduce new users. The purpose of this project is to show the capabilities of LibGeoDecomp as well as to provide a non-trivial, yet manageable example code. The focus of this project is the physical modelling (not too hard) and embedding it in LibGeoDecomp. Stretch goals could be in situ visualization via the existing VisIt interface, more complicated object geometries, or output to POVRay. A rudimentary sketch of what we'd like to accomplish can be seen in this video: https://www.youtube.com/watch?v=x8tQGzJQmbo
- Difficulty: Easy-Medium
- Expected result: Command line tool for force-based. time-discrete simulation of Newtonian physics of simple solids based on LibGeoDecomp
- Knowledge Prerequisite: pysical modelling, basic C++
- Mentor: Andreas Schaefer (gentryx at gmx.de)
- Abstract: The HPX runtime system uses a module called Parcelport to deliver packages over the network. An efficient implementation of this layer is indispensable and we are searching for new backend implementations based on CCI (https://github.com/CCI/cci), ucx (https://github.com/openucx/ucx) or libfabric (https://github.com/ofiwg/libfabric). All of these mentioned abstractions over various network transport layers offer the ability to do fast, one-sided RDMA transfers. The purpose of this project is to explore one of these and implement a parcelport using it.
- Difficulty: Medium-Hard
- Expected result: A proof of concept for a chosen backend implementation with performance results
- Knowledge Prerequisite: C++, Basic understanding of Network transports
- Mentor: Thomas Heller (thom.heller at gmail.com)
- Abstract: The HPX runtime system uses a Active Global Address Space (AGAS) in order to resolve global objects. Those objects are identified by 128bit unique global identifiers (GID). The performance of HPX relies on fast lookups of GIDs in associative containers. We currently experimented with binary search trees (std::map) and hash maps (std::unordered_map). However, we believe that we can implement a search data structure based on n-ary trees, tries or radix trees that exploit the structure of GIDs such that it allows us to have faster lookup and insertion.
- Difficulty: Medium-Hard
- Expected result: Various different container approaches to choose from together with realistic benchmarks to show the performance properties
- Knowledge Prerequisite: C++, Algorithms
- Mentor: Thomas Heller (thom.heller at gmail.com)
- Abstract: The HPX runtime system permits the dynamic (re-)allocation of computing resources during execution. This feature is especially interesting for dynamic execution environments, i.e., Cloud Computing, as the availability of resources is not necessarily statically defined before the deployment of a computational task. Yet, the implementation of a demonstrator is not merely restricted to writing source code, but as there is currently little experience with dynamic resizing and redistribution of tasks during runtime we also want to develop a set of heuristics for HPX on dynamic execution environments.
- Difficulty: Medium-Hard
- Expected result: Sample Benchmarks and Implementations for dynamic scaling using HPX
- Knowledge Prerequisite: C++, MPI, virtualization technologies (VMs and containers)
- Mentor: Alexander Ditter (alexander.ditter at gmail dot com)
- Abstract: Currently HPX is mostly executed on conventional batch scheduled HPC systems, using tools like SLURM, TORQUE or PBS for the deployment of jobs. Along with the trend towards more dynamic execution environments, i.e., Cloud Computing, grows the need to supply available resources from existing cluster systems in order deploy software in virtual machines or containers. For this reason we want to provide an intermediate scheduler that allows the concurrent use of batch scheduling and cloud middleware on the same physical infrastructure. This meta scheduler receives requests for the deployment of virtual machines and maps them onto the batch scheduling systems that manages the cluster infrastructure. An existing approach may be extended, refactored or just considered for inspirational purposes.
- Difficulty: Medium-Hard
- Expected result: An extensible open source meta scheduler for the concurrent use of cloud middleware (e.g., Opennebula or Open Stack) on top of batch systems.
- Knowledge Prerequisite: Basic knowledge on virtualization (libvirt), software engineering and scripting language(s) - e.g., Python
- Mentor: Alexander Ditter (alexander.ditter at gmail dot com)
- Abstract:
- Difficulty:
- Expected result:
- Knowledge Prerequisite:
- Mentor: