GSoC 2014 Project Ideas - STEllAR-GROUP/hpx GitHub Wiki

Table of Contents:

  1. Introduction
  2. Requirements
  3. Project Ideas

Introduction

Welcome to the HPX home page for Google Summer of Code (GSoC). This page provides information about student projects, proposal submission templates, advice on writing good proposals, and links to information on getting started with HPX. This page is also used to collect project ideas for Google Summer of Code 2014. The STE||AR Group will apply as an organization, and our goal is to get at least two students funded.

We are looking to fund work on a number of different kinds of proposals (for more details about concrete project ideas, see below):

  • extensions to existing library features,
  • new distributed data structures and algorithms, and
  • multiple competing proposals for the same project.

Requirements

Students must submit a proposal. A template for the proposal can be found here. Hints for writing a good proposal can be found here.

We strongly suggest that students interested in developing a proposal for HPX discuss their ideas on the mailing list in order to help refine the requirements and goals. Students who actively discuss projects on the mailing list are also ranked ahead of those who do not.

If the descriptions of these projects seem a little vague... well, that's intentional. We are looking for students to develop the requirements for their proposals by doing initial background research on the topic and by interacting with the community on the HPX mailing list ([email protected]) to help identify expectations.

Project Ideas

Create an HPX backend for the ISPC Compiler

  • Abstract: The Intel SPMD Program Compiler (ISPC) is a compiler for a variant of the C programming language with extensions for "single program, multiple data" (SPMD) programming. The language follows a programming model similar to that of CUDA or OpenCL, but it targets only SIMD-capable CPUs. It builds on Clang and LLVM, using LLVM as the backend to generate code. One important feature of the language is the ability to launch asynchronous tasks, and those tasks align very well with the HPX programming model. Fortunately, the API that ISPC uses to launch new asynchronous tasks is well documented (more information here). To leverage the possibilities provided by this compiler, the purpose of this project is to provide an implementation of that API using HPX functionality. Moreover, we'd be interested in utilizing HPX's capabilities for distributed applications.
  • Difficulty: Easy-Medium
  • Expected result: The minimal expectation for this project is a functional HPX backend for ISPC. Benchmark results need to be presented showing the impact of the proposed backend.
  • Knowledge Prerequisite: C++
  • Mentor: Thomas Heller (thom.heller at gmail.com)

Create an HPX backend for Thrust

  • Abstract: Thrust is a library resembling the C++ Standard Template Library (STL). It provides an abstraction of C++ containers and parallel algorithms over different backend implementations (such as CUDA, OpenMP, and CPU). HPX sorely lacks a module of parallel and distributed containers and algorithms. This project should use Thrust to implement such a module on top of HPX.
  • Difficulty: Medium-Hard
  • Expected result: A proof of concept backend for Thrust with benchmark results
  • Knowledge Prerequisite: C++, CUDA or STL
  • Mentor: Hartmut Kaiser (hartmut.kaiser at gmail.com) and Thomas Heller (thom.heller at gmail.com)

Create an HPX backend for the Intel OpenMP Runtime library

  • Abstract: OpenMP is one of the most important entry-level tools for parallelism: it provides an easy-to-use, pragma-based API to add parallelism to almost any kind of code. The Intel OpenMP Runtime Library provides a documented API to hook any parallel runtime system (in our case HPX) into the compiler-generated parallel code. The runtime library interface is supported by almost every major compiler (the Intel Compiler Collection, the GNU Compiler Collection, and Clang). The goal of this project is to allow HPX users to write OpenMP code without having to leave the comfort zone of the HPX runtime.
  • Difficulty: Medium-Hard
  • Expected result: A proof of concept backend for OpenMP with benchmark results
  • Knowledge Prerequisite: C++, preferably some knowledge of OpenMP
  • Mentor: Bryce Adelstein-Lelbach (blelbach at cct.lsu.edu)

Optimize the BlueGene/Q port of HPX

  • Abstract: The BlueGene/Q (BG/Q) is a supercomputer architecture from IBM which is equipped with an embedded PowerPC CPU (SoC). One of the key features of this architecture is an in-order CPU design with a total of 64 hardware threads, providing fast context switches between threads. In addition, IBM equipped the BG/Q with an on-chip network interface which comes with an Active Message library (PAMI). Active Messages and one-sided communication are key concepts within HPX. Fast context switches and a networking layer tailored to the needs of HPX make this system a perfect match for HPX. In order to fully utilize such a machine, fast user-level context switching as well as a parcelport based on the PAMI library need to be provided. Access to a BG/Q will be provided.
  • Difficulty: Medium-Hard
  • Expected result: Provide benchmark results which show the benefits of the performance optimizations
  • Knowledge Prerequisite: C++, Assembly (preferably experience with the PowerPC architecture)
  • Mentor: Thomas Heller (thom.heller at gmail.com) and Hartmut Kaiser (hartmut.kaiser at gmail.com)

Port HPX to iOS

  • Abstract: HPX has already proven to run efficiently on ARM-based systems. This has been demonstrated with an application written for Android tablet devices. A port to handheld devices running iOS would be the next logical step! In order to run HPX efficiently there, we need to adapt our build system to cross-compile for iOS and add code to interface with the iOS GUI and other system services.
  • Difficulty: Easy-Medium
  • Expected result: Provide a prototype HPX application running on an iPhone or iPad
  • Knowledge Prerequisite: C++, Objective-C, iOS
  • Mentor: Hartmut Kaiser (hartmut.kaiser at gmail.com) and Thomas Heller (thom.heller at gmail.com)

Implement a Map/Reduce Framework

  • Abstract: Map/Reduce frameworks are becoming more and more popular for big data processing (for example, Hadoop). By utilizing the unified and standards-conforming API of the HPX runtime system, we believe we can represent the Map/Reduce programming model very naturally. Many applications would benefit from direct support in HPX. This might include adding Hypertable or similar libraries to the mix to handle the large data sets Map/Reduce is usually used with.
  • Difficulty: Medium-Hard
  • Expected result: A prototypical implementation running on the order of 1000 compute nodes
  • Knowledge Prerequisite: C++
  • Mentor: Hartmut Kaiser (hartmut.kaiser at gmail.com)
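To make the programming model concrete, here is a minimal, single-process sketch of the map and reduce phases of a word count. It is only an illustration under stated assumptions: `std::async` stands in for HPX task spawning, the input is assumed to be pre-split into text chunks, and the names `map_chunk`, `reduce_counts`, and `word_count` are hypothetical, not part of HPX.

```cpp
#include <future>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical sketch: std::async stands in for HPX task spawning.
using word_counts = std::map<std::string, int>;

// Map phase: count the words of one input chunk independently.
word_counts map_chunk(const std::string& chunk) {
    word_counts counts;
    std::istringstream in(chunk);
    std::string word;
    while (in >> word) ++counts[word];
    return counts;
}

// Reduce phase: merge the partial results as their futures become ready.
word_counts reduce_counts(std::vector<std::future<word_counts>>& partials) {
    word_counts total;
    for (auto& f : partials)
        for (const auto& kv : f.get()) total[kv.first] += kv.second;
    return total;
}

// Spawn one map task per chunk, then reduce.
word_counts word_count(const std::vector<std::string>& chunks) {
    std::vector<std::future<word_counts>> partials;
    for (const auto& c : chunks)
        partials.push_back(std::async(std::launch::async, map_chunk, c));
    return reduce_counts(partials);
}
```

In a distributed HPX version one would expect the map tasks to run on the localities holding each chunk; designing that API is part of the project.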

Implement a Plugin Mechanism for Thread schedulers

  • Abstract: Revise the thread scheduling subsystem of HPX such that it is more pluggable and allows for finer-grained control over which scheduler is used to execute a particular section of the user code. The proposal on executors submitted to the C++ standards committee (see N3562) seems to provide a good starting point for a possible interface. Some initial work has already been done; however, more work is needed to get everything working in an acceptable manner. One of the big advantages of executors is that they can express where tasks are supposed to run. Examples of this are interactions with GUI libraries like Qt and proper NUMA memory placement.
  • Difficulty: Easy-Medium
  • Expected result: All existing schedulers need to be converted to the plugin system, and at least one example code needs to show the advantages of executors.
  • Knowledge Prerequisite: C++
  • Mentor: Bryce Adelstein-Lelbach (blelbach at cct.lsu.edu) and Patricia Grubel ([email protected])
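The following sketch illustrates the kind of pluggable interface meant here. It is loosely modeled on the abstract `executor` class with an `add()` member described in N3562 and written in plain standard C++; `inline_executor` and `single_thread_executor` are hypothetical names, not HPX types. The single-thread executor shows how an executor can express where work runs, for example on a dedicated GUI thread.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Illustrative interface, loosely modeled on N3562; not an HPX API.
class executor {
public:
    virtual ~executor() = default;
    virtual void add(std::function<void()> closure) = 0;
};

// Runs each closure immediately on the calling thread.
class inline_executor : public executor {
public:
    void add(std::function<void()> closure) override { closure(); }
};

// Runs all closures in FIFO order on one dedicated thread.
class single_thread_executor : public executor {
public:
    single_thread_executor() : worker_([this] { run(); }) {}
    ~single_thread_executor() override {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            done_ = true;
        }
        cv_.notify_one();
        worker_.join();  // the worker drains the queue before exiting
    }
    void add(std::function<void()> closure) override {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push(std::move(closure));
        }
        cv_.notify_one();
    }

private:
    void run() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                cv_.wait(lock, [this] { return done_ || !queue_.empty(); });
                if (queue_.empty()) return;  // done_ is set and nothing is left
                task = std::move(queue_.front());
                queue_.pop();
            }
            task();  // run outside the lock
        }
    }
    std::mutex mutex_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> queue_;
    bool done_ = false;
    std::thread worker_;  // declared last so all other members exist before it starts
};
```

Code written against `executor&` can then be handed either implementation, which is the kind of per-section scheduler choice the project aims for.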

Create a parcelport based on WebSockets

  • Abstract: Create a new parcelport which is based on WebSockets. The WebSocket++ library seems to be a perfect starting point, avoiding the need to dig too deeply into the WebSocket protocol.
  • Difficulty: Medium-Hard
  • Expected result: A proof of concept parcelport based on websockets with benchmark results
  • Knowledge Prerequisite: C++, knowledge of WebSockets is a plus
  • Mentor: Hartmut Kaiser (hartmut.kaiser at gmail.com) and Thomas Heller (thom.heller at gmail.com)

Script language bindings

  • Abstract: Design and implement Python bindings for HPX, exposing all or parts of the HPX functionality with a 'Pythonic' API. This should be possible, as Python has a much more dynamic type system than C++. Using Boost.Python seems to be a good choice for this. A similar thing could be done for Lua. We suggest basing the Lua bindings on LuaBind, which is very similar to Boost.Python.
  • Difficulty: Medium
  • Expected result: Demonstrate functioning bindings by implementing small example scripts for different simple use cases
  • Knowledge Prerequisite: C++, Python or Lua
  • Mentor: Hartmut Kaiser (hartmut.kaiser at gmail.com)

All to All Communications

  • Abstract: Design and implement efficient all-to-all communication LCOs. While MPI provides mechanisms for broadcasting, scattering, and gathering among all MPI processes inside a communicator, HPX currently lacks this feature. It should be possible to exploit the Active Global Address Space to mimic global all-to-all communication without the need to actually communicate with every participating locality. Different strategies should be implemented and tested. A first, very basic implementation of broadcast already exists which tries to tackle the problem described above; however, more strategies for granularity control and locality exploitation need to be investigated and implemented.
  • Difficulty: Medium-Hard
  • Expected result: Implement benchmarks and provide performance results for the implemented algorithms
  • Knowledge Prerequisite: C++
  • Mentor: Thomas Heller (thom.heller at gmail.com)
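One classic strategy for avoiding direct communication between the root and every participating locality is a tree-based broadcast: each locality forwards the message to at most k children, so the root sends only k messages and delivery completes in roughly log_k(n) rounds. The sketch below simulates this in a single process, modeling localities as integer ranks; `tree_children` and `broadcast_rounds` are hypothetical helpers, not existing HPX LCOs.

```cpp
#include <vector>

// Hypothetical helper: children of `rank` in a k-ary spanning tree over the
// localities 0..n-1, rooted at locality 0.
std::vector<int> tree_children(int rank, int n, int k) {
    std::vector<int> children;
    for (int j = 1; j <= k; ++j) {
        int child = k * rank + j;
        if (child < n) children.push_back(child);
    }
    return children;
}

// Simulates the broadcast level by level: every locality that already holds
// the value forwards it to its own children. Returns the number of rounds.
int broadcast_rounds(int n, int k, int value, std::vector<int>& received) {
    received.assign(n, -1);  // -1 marks "not yet received"
    received[0] = value;     // the root starts with the payload
    std::vector<int> frontier{0};
    int rounds = 0;
    while (!frontier.empty()) {
        std::vector<int> next;
        for (int r : frontier)
            for (int c : tree_children(r, n, k)) {
                received[c] = received[r];  // one message r -> c
                next.push_back(c);
            }
        if (!next.empty()) ++rounds;
        frontier.swap(next);
    }
    return rounds;
}
```

With 8 localities and k = 2, the root sends only two messages and every locality has the value after three rounds; the fan-out k is one of the granularity knobs such strategies would tune.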

Distributed Component Placement

  • Abstract: Implement an EDSL (embedded domain-specific language) to specify placement policies for components. This could be done similarly to Chapel's Domain Maps (http://chapel.cray.com/tutorials/SC12/SC12-6-DomainMaps.pdf). In addition, allocators for use with C++ standard library containers can be built on top of those domain maps. This is one of the key features that would allow users to write parallel algorithms efficiently without having to worry too much about the initial placement of their distributed objects in the global address space.
  • Difficulty: Medium-Hard
  • Expected result: Provide at least one policy which automatically creates components in the global address space
  • Knowledge Prerequisite: C++
  • Mentor: Thomas Heller (thom.heller at gmail.com) and Bryce Adelstein-Lelbach (blelbach at cct.lsu.edu)
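As a rough illustration of what a single placement policy could compute, the sketch below implements a block distribution in the spirit of Chapel's Block domain map: n elements (for example, components) are split into contiguous chunks over p localities. The `block_policy` type and its members are hypothetical; an EDSL would let users express and select such policies rather than hard-coding them.

```cpp
#include <cstddef>

// Hypothetical block placement policy: indices [0, n) are split into
// contiguous chunks, one per locality; the first n % p localities get
// one extra element so that chunk sizes differ by at most one.
struct block_policy {
    std::size_t n;  // number of elements (e.g. components) to place
    std::size_t p;  // number of localities

    // Locality that owns global index i (requires i < n).
    std::size_t owner(std::size_t i) const {
        std::size_t base = n / p, extra = n % p;
        std::size_t cutoff = extra * (base + 1);  // indices in the larger chunks
        if (i < cutoff) return i / (base + 1);
        return extra + (i - cutoff) / base;
    }

    // Number of elements placed on locality l.
    std::size_t local_size(std::size_t l) const {
        return n / p + (l < n % p ? 1 : 0);
    }
};
```

For `block_policy{10, 4}` the chunk sizes are 3, 3, 2, 2; a cyclic or block-cyclic policy would only change these two functions.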

A C++ Runtime Replacement

  • Abstract: Turn HPX into a replacement for the C++ runtime. We currently need to manually "lift" regular functions to HPX threads in order to have all the information for user-level threading available. This project should research the steps that need to be taken to implement an HPX C++ runtime replacement and provide a first proof-of-concept implementation for a platform of choice.
  • Difficulty: Easy-Medium
  • Expected result: A proof-of-concept implementation and documentation on how to run HPX applications without the need for an hpx_main
  • Knowledge Prerequisite: C++, dynamic linker, your favorite OS's ABI for starting programs and linking executables
  • Mentor: Hartmut Kaiser (hartmut.kaiser at gmail.com) and Thomas Heller (thom.heller at gmail.com)

A Free Resumable functions implementation

  • Abstract: Implement resumable functions in either g++ or clang. This should be based on the corresponding proposal to the C++ standardization committee (see N3858). While this project is not directly related to HPX, having resumable functions available and integrated with hpx::future would improve the performance and readability of asynchronous code. This project may sound huge, but it actually shouldn't be too difficult to realize.
  • Difficulty: Medium-Hard
  • Expected result: Demonstrating the __resumable/__await functionality with appropriate tests
  • Knowledge Prerequisite: C++, knowledge of how to extend clang or gcc is clearly advantageous
  • Mentor: Bryce Adelstein-Lelbach (blelbach at cct.lsu.edu)

Add a mechanism to integrate C++AMP with HPXCL

  • Abstract: The HPXCL project strives to build an infrastructure integrating GPGPUs (CUDA and OpenCL based) into HPX by allowing GPU tasks to be managed transparently together with 'normal' HPX thread asynchrony. We would like to do the same (or something similar) with C++AMP. There are two implementations of C++AMP available: the original Microsoft implementation in Visual C++, and another one supported by the HSA Foundation (as announced here).
  • Difficulty: Medium-Hard
  • Expected result: Demonstrating a nicely integrated C++AMP kernel with a simple HPX program
  • Knowledge Prerequisite: C++, C++AMP, GPGPUs (CUDA or OpenCL experience might be helpful)
  • Mentor: Bryce Adelstein-Lelbach (blelbach at cct.lsu.edu), Thomas Heller (thom.heller at gmail.com), and Artur Laksberg (arturl at microsoft.com)

Coroutine like Interface

  • Abstract: HPX is an excellent runtime system for task-based parallelism. In its current form, however, the result of a task can only be delivered by returning from a function. There are scenarios where this is not sufficient. One example is lazy ranges of integers (for example, the Fibonacci numbers, 0 to n, etc.). For those, a generator/yield construct would be perfect!
  • Difficulty: Easy-Medium
  • Expected result: Implement yield and demonstrate on at least one example
  • Knowledge Prerequisite: C++
  • Mentor: Hartmut Kaiser (hartmut.kaiser at gmail.com) and Bryce Adelstein-Lelbach (blelbach at cct.lsu.edu)
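To see what yield would buy us, consider a lazy Fibonacci range written without it. In the plain C++ sketch below (the `fibonacci_generator` class is hypothetical), the state that a yield-based generator would keep implicitly between resumptions has to be stored and advanced by hand.

```cpp
#include <cstdint>

// Hand-written lazy Fibonacci generator. With a yield construct this could
// be a simple loop (hypothetical syntax):
//   for (a = 0, b = 1;; std::tie(a, b) = std::make_tuple(b, a + b)) yield a;
// Without it, the loop state (a, b) must live in member variables.
class fibonacci_generator {
public:
    // Returns the next Fibonacci number: 0, 1, 1, 2, 3, 5, ...
    std::uint64_t next() {
        std::uint64_t result = a_;
        std::uint64_t sum = a_ + b_;
        a_ = b_;
        b_ = sum;
        return result;
    }

private:
    std::uint64_t a_ = 0;
    std::uint64_t b_ = 1;
};
```

For ranges with more complex control flow (nested loops, early exits), this manual state machine grows quickly, which is exactly what a generator/yield construct would avoid.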

Parallel Sparse Matrix Container

  • Abstract: Implement an efficient sparse matrix container. LibGeoDecomp takes advantage of HPX for parallelizing computer simulations. HPX is extraordinarily well suited for irregular workloads, but LibGeoDecomp's current containers mostly address regular, dense models. A suitable algorithm needs to address vectorization on both multi-cores and accelerators (e.g. this one). An implementation would need to add hooks for LibGeoDecomp's domain decomposition infrastructure.
  • Difficulty: Medium
  • Expected result: A functional object container for sparse matrices, readily usable for LibGeoDecomp+HPX
  • Knowledge Prerequisite: C++, basic knowledge of numerical analysis and performance tuning is a plus
  • Mentor: Andreas Schäfer (andreas.schaefer at fau.de)
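For orientation, the sketch below shows the baseline most sparse containers start from: Compressed Sparse Row (CSR) storage and a sparse matrix-vector product. It is a plain C++ illustration, not the vectorization-friendly algorithm the abstract links to and not a LibGeoDecomp interface; `csr_matrix` and `spmv` are hypothetical names.

```cpp
#include <cstddef>
#include <vector>

// Compressed Sparse Row (CSR) storage: only nonzeros are kept.
struct csr_matrix {
    std::size_t rows;
    std::vector<double> values;          // nonzero values, stored row by row
    std::vector<std::size_t> col_index;  // column of each nonzero
    std::vector<std::size_t> row_start;  // rows + 1 offsets into values
};

// y = A * x. Rows are independent, so the outer loop is the natural unit
// for parallelization or domain decomposition.
std::vector<double> spmv(const csr_matrix& a, const std::vector<double>& x) {
    std::vector<double> y(a.rows, 0.0);
    for (std::size_t r = 0; r < a.rows; ++r)
        for (std::size_t k = a.row_start[r]; k < a.row_start[r + 1]; ++k)
            y[r] += a.values[k] * x[a.col_index[k]];
    return y;
}
```

The irregular inner loop (row lengths vary) is what makes plain CSR hard to vectorize; SIMD-oriented formats reorganize this layout, which is where the project's performance work would start.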

Bug Hunter

  • Abstract: In addition to our extensive ideas list, there are several active tickets in our issue tracker which are worth tackling as a separate project. Feel free to talk to us if you find something that interests you. A prospective student should pick at least one ticket of medium to hard difficulty and discuss how it could be solved.
  • Difficulty: Medium-Hard
  • Expected result: The selected issues need to be fixed
  • Knowledge Prerequisite: C++
  • Mentor: Hartmut Kaiser (hartmut.kaiser at gmail.com) and Thomas Heller (thom.heller at gmail.com)

Port Graph500 Benchmark to HPX

  • Abstract: Implement Graph500 using the HPX runtime system. Graph500 is the benchmark used by the HPC industry to model important factors of many modern parallel analytical workloads. The Graph500 list ranks systems by their performance on this benchmark and was designed to augment the TOP500 list. The current Graph500 benchmarks are implemented using OpenMP and MPI. HPX is well suited for the fine-grained and irregular workloads of graph applications. Porting Graph500 to HPX would require replacing the inherent barrier synchronization with the asynchronous communication of HPX, producing a new benchmark for the HPC community as well as an addition to the HPX benchmark suite. See http://www.graph500.org/ for information on the present Graph500.
  • Difficulty: Medium
  • Expected result: New implementation of Graph500 benchmark.
  • Knowledge Prerequisite: C++
  • Mentor: Patricia Grubel ([email protected]) and Bryce Adelstein-Lelbach (blelbach at cct.lsu.edu)
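The core search kernel of Graph500 is a breadth-first search (BFS) that produces a parent tree. The sketch below is a sequential, level-synchronous BFS in plain C++ (the `bfs_parents` name is hypothetical) that marks where OpenMP/MPI implementations of this pattern synchronize: the whole frontier must be finished before the next level starts. An HPX port would aim to replace that barrier with asynchronous, future-based communication.

```cpp
#include <vector>

// Level-synchronous BFS over an adjacency list. Returns the parent of each
// vertex (-1 if unreached); the root is recorded as its own parent.
std::vector<long> bfs_parents(const std::vector<std::vector<long>>& adj, long root) {
    std::vector<long> parent(adj.size(), -1);
    parent[root] = root;
    std::vector<long> frontier{root};
    while (!frontier.empty()) {
        std::vector<long> next;
        for (long u : frontier)
            for (long v : adj[u])
                if (parent[v] == -1) {  // first visit claims the vertex
                    parent[v] = u;
                    next.push_back(v);
                }
        // Implicit barrier: the next level is not expanded until the whole
        // current level is done. Distributed codes synchronize globally here.
        frontier.swap(next);
    }
    return parent;
}
```

In a distributed setting, the "claim the vertex" step turns into messages to the owning locality; with HPX those could be sent as asynchronous actions instead of being batched up to a barrier.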

Port Mantevo mini App(s) to HPX

  • Abstract: Implement a version of one or more mini apps from the Mantevo project (http://mantevo.org/) using the HPX runtime system. We are interested in mini applications ported to HPX that have irregular workloads. Some of these are under development, and we will have access to them in addition to those listed on the site. Of those on the site, MiniFE and phdMESH would be good additions to the HPX benchmark suites. Porting the mini apps would require porting the apps from C to C++ and replacing the inherent barrier synchronization with HPX's asynchronous communication. This project would be a great addition to the HPX benchmark suite and to the HPC community.
  • Difficulty: Medium
  • Expected result: New implementation of a Mantevo mini app or apps.
  • Knowledge Prerequisite: C, C++
  • Mentor: Patricia Grubel ([email protected]) and Bryce Adelstein-Lelbach (blelbach at cct.lsu.edu)

Project: Template

  • Abstract:
  • Difficulty:
  • Expected result:
  • Knowledge Prerequisite:
  • Mentor: