Home - Spark-clustering-notebook/coliseum GitHub Wiki

Welcome to the coliseum wiki!

Welcome to the wiki of Spartakus (Spark-clustering-notebook)! This wiki introduces some clustering algorithms and describes their current implementation in the software, which has been built on Spark and Spark-notebook since 2012. The notebook has a dual purpose: teaching and research.

The Wiki page is currently under initial construction, so come back soon. If you are interested in improving Spartakus (Spark-clustering-notebook) right now, contact us.

Getting started (https://github.com/Spark-clustering-notebook/coliseum)

Team

  • Mustapha Lebbah. Resp. Computer Science Department (LIPN, CNRS(UMR 7030)) of the University of Paris 13

  • Hanane Azzag. Computer Science Department (LIPN, CNRS(UMR 7030)) of the University of Paris 13

  • Tarn Duong. Computer Science Department (LIPN, CNRS(UMR 7030)) of the University of Paris 13

  • Tugdual Sarazin. Lead Data Engineer

  • Mohammed Ghesmoune. Computer Science Department (LIPN, CNRS(UMR 7030)) of the University of Paris 13

  • Gael Beck. PhD student, Computer Science Department (LIPN, CNRS(UMR 7030)) of the University of Paris 13

  • Doan Nhat Quang. ICT Lab, University of Science and Technology of Hanoi

  • Several students: Quan Cao Anh (USTH of Hanoi, 2016), Omar Masmoudi (Tunis, 2015), Hugo Driviere (IUTV, 2016), Oscar ODIC (IUTV, 2013), Camille Gerin-Roze (2013), Victor Duvert (IUTV, 2013), Aissa El Ouafi (IUTV, 2013).

  • Amine Chaibi. PhD, Data Scientist at Carrefour

    Thanks to Kensu (Andy Petrella and Xavier Tordoir) for helping us package the algorithms on spark-notebook.

Publications

  • Zaineb Chelly Dagdia, Christine Zarges, Gaël Beck, Mustapha Lebbah: A distributed rough set theory based algorithm for an efficient big data pre-processing under the spark framework. BigData 2017: 911-916
  • Tarn Duong, Gael Beck, Hanene Azzag, Mustapha Lebbah. Nearest neighbour estimators of density derivatives, with application to mean shift clustering. Pattern Recognition Letters (2016). http://dx.doi.org/10.1016/j.patrec.2016.06.021
  • Mohammed Ghesmoune, Mustapha Lebbah, and Hanane Azzag. State-of-the-art on clustering data streams (invited paper). Big Data Analytics journal, 2016
  • G. Beck, T. Duong, H. Azzag and M. Lebbah, "Distributed mean shift clustering with approximate nearest neighbours," 2016 International Joint Conference on Neural Networks (IJCNN), Vancouver, BC, 2016, pp. 3110-3115. doi: 10.1109/IJCNN.2016.7727595. http://ieeexplore.ieee.org/abstract/document/7727595/
  • Mohammed Ghesmoune, Mustapha Lebbah, and Hanane Azzag. A new growing neural gas for clustering data streams. Neural Networks, Special Issue on Neural Network Learning in Big Data, 2016. http://dx.doi.org/10.1016/j.neunet.2016.02.003
  • Mohammed Ghesmoune, Mustapha Lebbah, Hanene Azzag. Micro-Batching Growing Neural Gas for Clustering Data Streams using Spark Streaming. Procedia Computer Science journal (2015), pp. 158-166. doi: 10.1016/j.procs.2015.07.290. (Paper presented at the INNS Conference on Big Data, 8-10 August 2015, San Francisco, USA)
  • Mohammed Ghesmoune, Mustapha Lebbah, and Hanene Azzag. Clustering over data streams based on growing neural gas. In The Pacific-Asia Conference on Knowledge Discovery and Data Mining. PAKDD (2) 2015: 134-145.
  • Tugdual Sarazin, Mustapha Lebbah, and Hanane Azzag. Biclustering using Spark-MapReduce. In 2014 IEEE International Conference on Big Data, Big Data 2014, Washington, DC, USA, October 27-30, 2014, pages 58–60, 2014.
  • Tugdual Sarazin, Hanane Azzag, and Mustapha Lebbah. 2014. SOM Clustering Using Spark-MapReduce. In Proceedings of the 2014 IEEE International Parallel & Distributed Processing Symposium Workshops (IPDPSW '14). IEEE Computer Society, Washington, DC, USA, 1727-1734. DOI=10.1109/IPDPSW.2014.192

French-speaking conferences

  • Gaël Beck, Hanane Azzag, Mustapha Lebbah, Tarn Duong. Mean-shift : Clustering scalable et distribué, pp.415-425. EGC 2018
  • Zaineb Chelly Dagdia, Christine Zarges, Gaël Beck, Mustapha Lebbah. Nouveau Modèle de Sélection de Caractéristiques basé sur la Théorie des Ensembles Approximatifs pour les Données Massives, pp.377-378. EGC 2018 (Poster)
  • Mohammed Ghesmoune, Mustapha Lebbah and Hanane Azzag. G-Stream : une approche incrémentale pour le clustering de flux de données. In SFC 2015, 9-11 September 2015, Nantes.
  • Mohammed Ghesmoune, Hanane Azzag and Mustapha Lebbah. Une nouvelle méthode topologique pour le clustering de flux de données. In COSI 2015, Colloque sur l’optimisation et les systèmes d’information, Oran, 1-3 June 2015.
  • Mohammed Ghesmoune, Mustapha Lebbah, Hanane Azzag. Clustering topologique pour le flux de données. In EGC 2015, vol. RNTI-E-28, pp.137-142.
  • Tugdual Sarazin, Hanane Azzag, Mustapha Lebbah. Modèle de Biclustering dans un paradigme "Mapreduce". In EGC 2015, vol. RNTI-E-28, pp.467-468