Backlog - ParallelComputing2017/CNN GitHub Wiki

CNN

Create abstract factory for layer creation
Refactor Tensor to class
MNIST class
Replace vector with Neural Network class
Implement Downpour SGD
Convert to a Makefile project
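
As a starting point for the abstract factory item above, the following is a minimal sketch, not the project's actual design. Every name in it (`Layer`, `LayerFactory`, `SequentialLayerFactory`, `OpenMPLayerFactory`, `name()`) is hypothetical. It assumes the goal is to let each backend (sequential, OpenMP, CUDA, ...) supply its own layer implementations behind a common interface.

```cpp
#include <memory>
#include <string>

// Hypothetical layer interface; the real Layer class in the repo may differ.
class Layer {
public:
    virtual ~Layer() = default;
    virtual std::string name() const = 0;
};

// Placeholder layers that only report which backend and kind they are.
class NamedLayer : public Layer {
public:
    explicit NamedLayer(std::string n) : name_(std::move(n)) {}
    std::string name() const override { return name_; }
private:
    std::string name_;
};

// Abstract factory: one creation method per layer kind.
class LayerFactory {
public:
    virtual ~LayerFactory() = default;
    virtual std::unique_ptr<Layer> createConv() const = 0;
    virtual std::unique_ptr<Layer> createPool() const = 0;
};

// Concrete factory for the sequential backend (assumed name).
class SequentialLayerFactory : public LayerFactory {
public:
    std::unique_ptr<Layer> createConv() const override {
        return std::make_unique<NamedLayer>("sequential conv");
    }
    std::unique_ptr<Layer> createPool() const override {
        return std::make_unique<NamedLayer>("sequential pool");
    }
};

// Concrete factory for the OpenMP backend (assumed name).
class OpenMPLayerFactory : public LayerFactory {
public:
    std::unique_ptr<Layer> createConv() const override {
        return std::make_unique<NamedLayer>("openmp conv");
    }
    std::unique_ptr<Layer> createPool() const override {
        return std::make_unique<NamedLayer>("openmp pool");
    }
};
```

Network-building code would then take a `const LayerFactory&` and call `createConv()` / `createPool()` without knowing which backend is in use.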

Pthreads

Parallel testing

OpenMP

Parallel testing

CUDA

Create array wrapper (size 1 to n)
Use unified memory (see CUDA introduction)
Create device pointers in conv layer constructor
Free device memory on conv layer destructor
Update parameters in the activation function of conv layer
Full conv layer on device
Allocate device memory on cudaTensor constructor

OpenCL

Prepare program arguments
Implement Conv layer activation
Read kernel from file
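
For the "Read kernel from file" item, one possible approach is to load the whole `.cl` file into a `std::string` using only the standard library. This is a sketch under that assumption; the helper name `readKernelSource` is hypothetical and does not come from the repository.

```cpp
#include <fstream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper: returns the full contents of the file at `path`.
// Throws std::runtime_error if the file cannot be opened.
std::string readKernelSource(const std::string& path) {
    std::ifstream file(path, std::ios::in | std::ios::binary);
    if (!file) {
        throw std::runtime_error("cannot open kernel file: " + path);
    }
    std::ostringstream buffer;
    buffer << file.rdbuf();  // copy the whole stream into the buffer
    return buffer.str();
}
```

The returned string's `c_str()` and `size()` can then be passed as the source pointer and length to `clCreateProgramWithSource`.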

MPI

Run the sequential version on each host
Run with mini-batches
Receive the trained model from each host

Experiment

Test Error vs weight updates (iterations)
Test Error vs epochs
CUDA vs OpenCL
Max speed up by implementation

Paper

Add link to source code repository
Review more references (8/10)
