Training a new model - 3C-SCSU/Avatar GitHub Wiki

This page is only an outline, but it should give you a good start on training a new model.

Ensure that the data you're using to train the model is clean (sometimes a few rogue files seem to make it in there).
- Feel free to use file-shuffler/utilities/label_counter.py to find any non-CSV files that may be polluting your data. It also counts the columns in each CSV file, so you can verify that every file has the same columns.
Once you've ensured the data's integrity, write a script that reads in the CSV files and trains the model. (Stick as close as possible to the script at server/413_will_sara_training_nb.ipynb to ensure your model works.)
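If you want to see what this kind of check involves, here is a minimal sketch using only the Python standard library. It is not the code from label_counter.py; the function name `audit_data_dir` is made up for illustration, and it assumes your data lives in one folder tree where the first row of every CSV file is the header.

```python
import csv
from collections import Counter
from pathlib import Path


def audit_data_dir(root):
    """Return (non_csv_files, header_counts) for every file under root.

    Hypothetical helper for illustration; not part of the Avatar repository.
    """
    non_csv = []
    header_counts = Counter()
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        if path.suffix.lower() != ".csv":
            # Anything that is not a CSV file is reported as a stray file.
            non_csv.append(path)
            continue
        with path.open(newline="") as handle:
            # The first row is the header; an empty file yields an empty tuple.
            header = tuple(next(csv.reader(handle), []))
        header_counts[header] += 1
    return non_csv, header_counts
```

Calling `audit_data_dir` on your data folder returns the list of stray files and a counter keyed by header row. If the counter has more than one key, at least one CSV file has different columns from the rest.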
- Read in CSV data from each folder
  - Make sure you skip the first row, since that just contains the headers
  - Something like this is best for reading one CSV file at a time: df = pd.read_csv(filename, header=None, skiprows=1, names=[f'_c{x}' for x in range(32)] + ['label'])
  - Note the names we are giving to the columns. This is important, as your model may not work if you use different column names.
- Make sure you label each piece of data according to the action it corresponds to
  - Ex: df['label'] = "backward" # Use this for data from the backward folder
- Concatenate all the dataframes you loaded into one big dataframe
- Encode the labels with scikit-learn
  - Ex: df['label'] = LabelEncoder().fit_transform(df['label'])
- Split the data for training and testing
  - Ex: train_df, test_df = train_test_split(df, test_size=0.2)
- Convert the training and testing datasets into TensorFlow datasets
- Create a TensorFlow random forest model
- Fit the model with the training data
- Evaluate the model with the testing data to find how effective it is
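
The steps above can be sketched roughly as follows. This is not the notebook's code, so treat it as a starting point and check it against server/413_will_sara_training_nb.ipynb. It assumes one folder per action under a common data folder, and it assumes the "tensorflow random forest model" is the one from the TensorFlow Decision Forests package (`tensorflow_decision_forests`). The function names, the `seed` parameter, and the accuracy metric are illustrative choices, not taken from the project.

```python
from pathlib import Path

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Column names from this page: 32 feature columns plus the label.
COLUMNS = [f"_c{x}" for x in range(32)] + ["label"]


def load_labeled_data(data_root, actions):
    """Read every CSV in data_root/<action> and label its rows with the action."""
    frames = []
    for action in actions:
        for csv_path in sorted(Path(data_root, action).glob("*.csv")):
            # skiprows=1 drops the header row; names= applies our column names.
            df = pd.read_csv(csv_path, header=None, skiprows=1, names=COLUMNS)
            df["label"] = action
            frames.append(df)
    # One big dataframe holding the rows from every folder.
    return pd.concat(frames, ignore_index=True)


def encode_and_split(df, test_size=0.2, seed=0):
    """Encode string labels as integers, then split into train and test frames."""
    encoder = LabelEncoder()
    df = df.copy()
    df["label"] = encoder.fit_transform(df["label"])
    # random_state is an added assumption so the split is reproducible.
    train_df, test_df = train_test_split(df, test_size=test_size, random_state=seed)
    # The encoder is returned so integer predictions can be mapped back to names.
    return train_df, test_df, encoder


def train_random_forest(train_df, test_df):
    """Fit and evaluate a random forest, assuming TensorFlow Decision Forests."""
    import tensorflow_decision_forests as tfdf  # assumed library, imported lazily

    # Convert the pandas dataframes into TensorFlow datasets.
    train_ds = tfdf.keras.pd_dataframe_to_tf_dataset(train_df, label="label")
    test_ds = tfdf.keras.pd_dataframe_to_tf_dataset(test_df, label="label")
    model = tfdf.keras.RandomForestModel()
    model.compile(metrics=["accuracy"])
    model.fit(train_ds)
    # evaluate() reports the loss and accuracy on the held-out test data.
    return model, model.evaluate(test_ds, return_dict=True)
```

A typical run would call `load_labeled_data` with your data folder and the list of action folder names (for example "backward"), pass the result to `encode_and_split`, and then hand the two dataframes to `train_random_forest`. Keeping the fitted `LabelEncoder` around, rather than discarding it, lets you use `encoder.inverse_transform` later to turn predicted integers back into action names.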