Reverse Indexing - byuawsfhtl/RLL_computer_vision GitHub Wiki

Upload files to Reverse Indexing

How to Upload Data to Reverse Indexing.pdf

Get finished files from Dropbox: (Timothy Brown)

Download the 'finished-' and 'ri-' files (if present) and put them in: V:\FHSS-JoePriceResearch\data\computer_vision\reverse_indexing_results

Run through Programs:

reverse_indexing_results\python_files\RI_Pipeline.py

(Adjust the variables that control how sensitive/specific you would like the finished products to be, and set the new names of the output files.)

**Outputs should be:**

  • Training CSV (not finished; still needs images)
    • This should include dummy variables for whether v_5 was -3, -1, 1, or 3 (v_5 indicates how well the program did at identifying the word)
  • CSV containing the list of images that are 100% correct
    • This list of words will be taken out of the website because they are finished.
  • CSV containing the list of images that are 100% incorrect
    • This list of words will be removed until the model is updated and then re-inserted.
  • CSV containing a list of words that the model is good at recognizing (80%-95% accuracy?)
  • CSV containing a list of words that the model is bad at recognizing (0-5% accuracy, depending on how much we want to account for behavioral economic mistakes)
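
The v_5 dummies and the good/bad word lists above can be sketched roughly as follows. This is not the code in RI_Pipeline.py, only a minimal illustration. The `word` column name, the `v_5_is_*` column names, and the rule that a positive v_5 counts as a correct identification are all assumptions; the 0.80 and 0.05 cutoffs are the tentative figures from the list and are meant to be adjusted.

```python
from collections import defaultdict

V5_VALUES = (-3, -1, 1, 3)  # the four v_5 codes named in the list above


def add_v5_dummies(row):
    """Return a copy of row with one 0/1 column per v_5 code."""
    out = dict(row)
    v5 = int(row["v_5"])
    for value in V5_VALUES:
        # Column names such as "v_5_is_-3" are hypothetical.
        out[f"v_5_is_{value}"] = int(v5 == value)
    return out


def word_accuracy(rows):
    """Share of rows per word that count as correct.

    ASSUMPTION: v_5 > 0 means the program identified the word correctly.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for row in rows:
        totals[row["word"]] += 1
        hits[row["word"]] += int(int(row["v_5"]) > 0)
    return {word: hits[word] / totals[word] for word in totals}


def bucket_words(accuracy, good_min=0.80, bad_max=0.05):
    """Split words into 'model is good at' and 'model is bad at' lists."""
    good = sorted(w for w, a in accuracy.items() if a >= good_min)
    bad = sorted(w for w, a in accuracy.items() if a <= bad_max)
    return good, bad
```

Rows read with `csv.DictReader` can be passed in directly, since the functions convert v_5 from text to an integer themselves. An upper bound on the "good" list (for example 0.95) could be added if fully correct words are reported separately.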

THEN: Pair the training data with the images: Program?

Hopefully, this is finished. (When it is, come up with an automated way to do this.)
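
Until a dedicated program exists, one possible shape for the pairing step is sketched below. It assumes each training row names its image file in a column (here the hypothetical `image_name`) and that all images sit in a single folder. Neither assumption comes from the existing pipeline.

```python
from pathlib import Path


def pair_rows_with_images(rows, image_dir, image_column="image_name"):
    """Split training rows into those whose image file exists and those without.

    ASSUMPTION: row[image_column] holds the bare file name of the image.
    """
    available = {p.name for p in Path(image_dir).iterdir() if p.is_file()}
    paired, missing = [], []
    for row in rows:
        if row[image_column] in available:
            paired.append(row)
        else:
            missing.append(row)
    return paired, missing
```

Keeping the `missing` rows separate makes it easy to see which images still need to be tracked down before the training set is sent out.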

Odds are, we will have to upload everything to the supercomputer for Denmark.

Send Outputs:

Denmark: Finished training data set containing: [FILE PATH IN SUPERCOMPUTER]

  • image .tar
  • training.csv
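
Building the image .tar could look something like the sketch below, which uses Python's standard `tarfile` module. The function name and the choice to store each image under its bare file name are illustrative assumptions; the destination path on the supercomputer is not covered here.

```python
import tarfile
from pathlib import Path


def bundle_images(image_paths, tar_path):
    """Write the given image files into an uncompressed .tar archive."""
    with tarfile.open(str(tar_path), "w") as tar:
        for path in image_paths:
            path = Path(path)
            # Store only the file name so the archive has no local folder structure.
            tar.add(str(path), arcname=path.name)
    return tar_path
```

Storing bare file names assumes image names are unique; if they are not, the folder structure would need to be kept in `arcname`.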

BYU reverse indexing dudes:

  • Handwriting Training Data (the same data Denmark gets), if they would like it
  • CSVs with info on pictures that the algorithm doesn't need to keep looking at
  • CSV of words that the algorithm is very accurate at recognizing (95% accurate? Example: M vs. F, the number 0, etc.)

Wait for more Dropbox files to be sent over! (potentially remind the CS lab to create those if you haven't heard from them in 30+ days)

Overarching Goal:

This reverse indexing process is a really smart way to have people around the world continuously update our model on their own time. The goal of this side of the project is to make a continuously rolling wheel so that every month, we are able to get the data that people are creating and feed it back into the algorithm as smoothly as possible. Then the Reverse Indexing and Denmark teams will be able to make their models and website continuously better as the years progress.