Applying to GSOC 18 - grasseau/HAhRD GitHub Wiki

Expected results from the student

  • The student's work will be devoted to building part of a software machinery that uses a CNN to identify clusters of pixels, the long-term objective being to associate the clusters with the particles entering the detector (HGCAL). The different processes developed must combine time efficiency with quality of classification/identification. The programming language for this machinery is Python (embedded C++ routines could be developed). The student evaluation will focus on the development part, not on scientific aspects or classification quality.

Proposed scheduling

Based on our GSOC'18 proposal and the guidelines below, we (applicants and mentors) must provide a detailed schedule for the 'coding period', stating the technologies used and the expected delivery deadlines. A first proposal is expected from you before March 5th; we will then iterate and refine the coding period program during the discussion period, until March 12.

For more about the scheduling, please see the GSOC Scheduling page.

Guidelines for the scheduling:

  • Since the heterogeneous hexagonal meshes do not fit well with the standard CNN tools, the HGCAL mesh must be interpolated onto a 3D rectangular mesh. This is a delicate task which must be carefully validated with different resolutions of the 3D rectangular mesh (we will use the old clustering method to validate this step). The estimated time to obtain an efficient interpolation is at least 2 weeks.
  • coding the part that filters events from the input file according to different criteria: particle type, energy, position, direction, area of the detector (to select a kind of cluster)
  • coding the synthetic event generation (synthetic events must be generated to train the CNN)
  • coding the whole software chain (convolution, pooling, ReLU, NN) and coding the automatic learning criteria
  • using GPU platforms / optimization (embedded C++)
  • visualizing the different stages of the chain
  • coding the module which computes the quality of the classification (compared with the method currently used)
  • training step on simple use cases (for example, high-energy photons or synthetic events)
  • testing / evaluating the quality of the pipeline on the simple use cases
  • adjusting the software chain
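
To make the first and fourth guidelines more concrete, here is a minimal sketch in Python/NumPy (version 1.20 or later, for `sliding_window_view`). It is not the project's code. All function names and the toy hexagonal lattice are hypothetical, and the "interpolation" is deliberately the crudest possible one: each cell's energy is deposited into the rectangular bin that contains the cell center. A real interpolation would probably share the energy between neighbouring bins, but this version already allows one useful validation, namely that the total energy is conserved at any grid resolution. One layer of the resulting grid is then passed through a convolution, a ReLU and a max-pooling step.

```python
import numpy as np


def hex_centers(n_rows, n_cols, pitch=1.0):
    """Centers of a toy hexagonal lattice (odd rows shifted by half a pitch)."""
    cols, rows = np.meshgrid(np.arange(n_cols), np.arange(n_rows))
    x = pitch * (cols + 0.5 * (rows % 2))
    y = pitch * (np.sqrt(3) / 2) * rows
    return np.column_stack([x.ravel(), y.ravel()])


def rasterize_cells(centers, energies, bounds, shape):
    """Deposit each cell energy into the rectangular bin holding its center.

    Assumption: nearest-bin deposit stands in for the real interpolation.
    """
    grid, _ = np.histogramdd(centers, bins=shape, range=bounds, weights=energies)
    return grid


def conv2d_valid(image, kernel):
    """'Valid' 2D cross-correlation, the operation used by CNN layers."""
    windows = np.lib.stride_tricks.sliding_window_view(image, kernel.shape)
    return np.einsum("ijkl,kl->ij", windows, kernel)


def relu(x):
    """Rectified linear unit: negative values are set to zero."""
    return np.maximum(x, 0.0)


def max_pool2d(x, size=2):
    """Non-overlapping max pooling; trailing rows/columns that do not fit are dropped."""
    h = x.shape[0] // size * size
    w = x.shape[1] // size * size
    return x[:h, :w].reshape(h // size, size, w // size, size).max(axis=(1, 3))


# Toy detector: 3 layers of 8 x 8 hexagonal cells with random energies.
rng = np.random.default_rng(0)
xy = hex_centers(8, 8)
n_layers = 3
centers = np.vstack(
    [np.column_stack([xy, np.full(len(xy), layer)]) for layer in range(n_layers)]
)
energies = rng.random(len(centers))

# Rectangular 3D mesh covering the toy detector (finer than the hexagon pitch).
bounds = [(0.0, 8.0), (0.0, 7.0), (0.0, float(n_layers))]
grid = rasterize_cells(centers, energies, bounds, shape=(16, 14, n_layers))

# One convolution + ReLU + pooling stage applied to the first layer.
kernel = np.ones((3, 3)) / 9.0
features = max_pool2d(relu(conv2d_valid(grid[:, :, 0], kernel)))

print("grid shape:", grid.shape, "feature map shape:", features.shape)
print("energy conserved:", np.isclose(grid.sum(), energies.sum()))
```

Changing the `shape` argument gives a quick way to compare several resolutions of the rectangular mesh, which is the validation the first guideline asks for. In the actual project, the convolution, pooling and ReLU steps would more likely come from a CNN framework running on GPUs than from hand-written NumPy.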

Other possibilities

Depending on the progress made (result quality, time efficiency, ...), several directions may be investigated:

  • adapt the software machinery to deal with more complex events (improvement of the event filter or the event generator)
  • take into account other sensors of the HGCAL detector (scintillator region)
  • improve the visualization of the intermediate states of the CNN (how to get relevant information from the different modules of the CNN in order to improve it)

Evaluating the candidates

Most of you have a Bachelor's degree in Computer Science and are Master's students. You all know Python, C++, and so on in depth. Giving you a test on CNNs with fixed tools/environments could disadvantage students who are unfamiliar with the given tool-set/environment, so it is not the best way to select a candidate. Instead, we expect you to run the starting code, to understand the project as a whole, to give us your feedback on the project through discussion with us (for example, which tool suite is, in your opinion, best suited to implementing a CNN), and finally to make a scheduling proposal for the coding period with the different tasks and the estimated work time (according to your skills). In the end, we must provide the GSOC organizers with an ordered list of 3 candidates who match our proposal well.

Be sure that you can work full time during the coding period, May 14 - August 16.

If not, we will not validate the period concerned. This is not an internship.