Welcome to the coliseum wiki!
This is the wiki of Spartakus (Spark-clustering-notebook). It introduces some clustering algorithms and describes their current implementation in the software, which has been developed since 2012 using Spark and Spark-notebook. The notebook has a dual purpose: teaching and research.
This wiki is still under construction, so check back soon. If you would like to help improve Spartakus (Spark-clustering-notebook) now, please contact us.
Getting started (https://github.com/Spark-clustering-notebook/coliseum)
Team
- Mustapha Lebbah, Head of the Computer Science Department (LIPN, CNRS UMR 7030), University of Paris 13
- Hanane Azzag, Computer Science Department (LIPN, CNRS UMR 7030), University of Paris 13
- Tarn Duong, Computer Science Department (LIPN, CNRS UMR 7030), University of Paris 13
- Tugdual Sarazin, Lead Data Engineer
- Mohammed Ghesmoune, Computer Science Department (LIPN, CNRS UMR 7030), University of Paris 13
- Gael Beck, PhD student, Computer Science Department (LIPN, CNRS UMR 7030), University of Paris 13
- Doan Nhat Quang, ICT Lab, University of Science and Technology of Hanoi
- Several students: Quan Cao Anh (USTH, Hanoi, 2016), Omar Masmoudi (Tunis, 2015), Hugo Driviere (IUTV, 2016), Oscar Odic (IUTV, 2013), Camille Gerin-Roze (2013), Victor Duvert (IUTV, 2013), Aissa El Ouafi (IUTV, 2013)
- Amine Chaibi, PhD, data scientist at Carrefour
Thanks to Kensu (Andy Petrella and Xavier Tordoir) for helping us package the algorithms for Spark-notebook.
Publications
- Zaineb Chelly Dagdia, Christine Zarges, Gaël Beck, Mustapha Lebbah: A distributed rough set theory based algorithm for an efficient big data pre-processing under the spark framework. BigData 2017: 911-916
- Tarn Duong, Gael Beck, Hanene Azzag, Mustapha Lebbah. Nearest neighbour estimators of density derivatives, with application to mean shift clustering. Pattern Recognition Letters (2016). http://dx.doi.org/10.1016/j.patrec.2016.06.021
- Mohammed Ghesmoune, Mustapha Lebbah, and Hanane Azzag. State-of-the-art on clustering data streams (invited paper). Big Data Analytics, 2016.
- G. Beck, T. Duong, H. Azzag and M. Lebbah, "Distributed mean shift clustering with approximate nearest neighbours," 2016 International Joint Conference on Neural Networks (IJCNN), Vancouver, BC, 2016, pp. 3110-3115. doi: 10.1109/IJCNN.2016.7727595. http://ieeexplore.ieee.org/abstract/document/7727595/
- Mohammed Ghesmoune, Mustapha Lebbah, and Hanane Azzag. A new growing neural gas for clustering data streams. Neural Networks, Special Issue on Neural Network Learning in Big Data, 2016. http://dx.doi.org/10.1016/j.neunet.2016.02.003
- Mohammed Ghesmoune, Mustapha Lebbah, Hanene Azzag. Micro-Batching Growing Neural Gas for Clustering Data Streams using Spark Streaming. Procedia Computer Science (2015), pp. 158-166. doi: 10.1016/j.procs.2015.07.290. (Paper presented at the INNS Conference on Big Data, 8-10 August 2015, San Francisco, USA.)
- Mohammed Ghesmoune, Mustapha Lebbah, and Hanene Azzag. Clustering over data streams based on growing neural gas. In Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD (2) 2015, pp. 134-145.
- Tugdual Sarazin, Mustapha Lebbah, and Hanane Azzag. Biclustering using Spark-MapReduce. In 2014 IEEE International Conference on Big Data, Big Data 2014, Washington, DC, USA, October 27-30, 2014, pages 58–60, 2014.
- Tugdual Sarazin, Hanane Azzag, and Mustapha Lebbah. 2014. SOM Clustering Using Spark-MapReduce. In Proceedings of the 2014 IEEE International Parallel & Distributed Processing Symposium Workshops (IPDPSW '14). IEEE Computer Society, Washington, DC, USA, 1727-1734. DOI=10.1109/IPDPSW.2014.192
French-speaking conferences
- Gaël Beck, Hanane Azzag, Mustapha Lebbah, Tarn Duong. Mean-shift : Clustering scalable et distribué [Mean shift: scalable and distributed clustering], pp. 415-425. EGC 2018
- Zaineb Chelly Dagdia, Christine Zarges, Gaël Beck, Mustapha Lebbah. Nouveau Modèle de Sélection de Caractéristiques basé sur la Théorie des Ensembles Approximatifs pour les Données Massives [A new feature selection model based on rough set theory for big data], pp. 377-378. EGC 2018 (Poster)
- Mohammed Ghesmoune, Mustapha Lebbah and Hanane Azzag. G-Stream : une approche incrémentale pour le clustering de flux de données [G-Stream: an incremental approach for clustering data streams]. In SFC 2015, 9-11 September 2015, Nantes.
- Mohammed Ghesmoune, Hanane Azzag and Mustapha Lebbah. Une nouvelle méthode topologique pour le clustering de flux de données [A new topological method for clustering data streams]. In COSI 2015, Colloque sur l'optimisation et les systèmes d'information (Conference on Optimization and Information Systems), Oran, 1-3 June 2015.
- Mohammed Ghesmoune, Mustapha Lebbah, Hanane Azzag. Clustering topologique pour le flux de données [Topological clustering for data streams]. In EGC 2015, vol. RNTI-E-28, pp. 137-142.
- Tugdual Sarazin, Hanane Azzag, Mustapha Lebbah. Modèle de Biclustering dans un paradigme "Mapreduce" [A biclustering model in a MapReduce paradigm]. In EGC 2015, vol. RNTI-E-28, pp. 467-468.