Supercomputer segmentation - byuawsfhtl/RLL_computer_vision GitHub Wiki

Segmentation

Segmentation is cropping an image into small text boxes (snippets). This isolates individual pieces of text: the name at the top of a page, the date next to it, and the title of the document can each be segmented into its own snippet.

Running a model

See either the files in the GitHub or on the supercomputer (at ~/fsl_groups/fslg_census/tools/segmentation; the readme there contains this same information). tableSegmentation.py is a work in progress.

Before use:

  • get a model trained on your segmentation task
  • get images to run segmentation on

To use:

  1. copy the python / sh files to your project directory
  2. edit the variables at the start of segmentation.py/tableSegmentation.py to have file paths / parameters that match your project
  3. edit jobSegment.sh to have your email (for email notifications)
  4. run sbatch jobSegment.sh
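The four steps above might look like the following in practice (a sketch only; ~/my_project is a placeholder for your own project directory):

```shell
# Copy the segmentation scripts into your project directory
# (the tools directory is the one mentioned above).
cp ~/fsl_groups/fslg_census/tools/segmentation/segmentation.py ~/my_project/
cp ~/fsl_groups/fslg_census/tools/segmentation/jobSegment.sh ~/my_project/
cd ~/my_project

# After editing the variables at the top of segmentation.py and the
# email address in jobSegment.sh, submit the job to SLURM:
sbatch jobSegment.sh
```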


Troubleshooting Detectron2

If you hit issues with the current Detectron2 environment provided by the scripts, try using a different existing environment, making a new one, or (carefully) fixing the existing one.

There are existing environments that should be able to run Detectron2.

Using an existing environment

Use one of the following environments:

~/fsl_groups/fslg_census/compute/projects/segment_env
~/fsl_groups/fslg_death/compute/new_death_env
~/fsl_groups/fslg_census/compute/envs/census_box_env

If activating these environments does not work, or a new environment is needed, the steps below should help you create one.

Creating an environment

module load python/3.8
virtualenv env_name

You can activate your environment by running source env_path/bin/activate

Downloading packages for Detectron2

After activating your environment, you can begin installing packages. Install OpenCV with pip install opencv-python. To get Detectron2, follow the official installation instructions. Their table tells you the command to run to get a specific version of Detectron2 given your CUDA version (you can find this with nvidia-smi) and your torch version (install the latest torch build compatible with that CUDA version).

Your environment should be able to run Detectron2. If there are additional packages you need, activate your environment and download the necessary packages into it (most likely by using pip).
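As a concrete sketch, the install commands might look like the following. The version numbers and wheel URLs are examples only (here, CUDA 11.3 with torch 1.10); substitute the combination matching your system from the Detectron2 installation table:

```shell
# Inside the activated environment:
pip install opencv-python

# Example: torch build matching CUDA 11.3 (check yours with nvidia-smi).
pip install torch==1.10.0+cu113 torchvision==0.11.0+cu113 \
    -f https://download.pytorch.org/whl/cu113/torch_stable.html

# Example: matching prebuilt Detectron2 wheel for CUDA 11.3 / torch 1.10.
python -m pip install detectron2 \
    -f https://dl.fbaipublicfiles.com/detectron2/wheels/cu113/torch1.10/index.html
```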


OLD INSTRUCTIONS

This isn't the best method anymore, but it can help you understand older projects if they are using this method.

Create necessary directories


You will need to make the proper directories to hold the images, weights, scripts, etc. before starting the project.

You can easily make these directories by using the mkdir command:

mkdir dir_name

Example: mkdir new_project

In order to have all the proper directories:

  1. Make a general project directory (i.e. ohio_death_3). Make sure you are already in the directory that you want the general project directory to be sorted into (i.e. fsl_groups/fslg_death/compute/projects/ohio_death_3)
  2. Within the general project directory make an images directory, and a bounding boxes directory (i.e. ohio_death_3/imgs, ohio_death_3/bounding_boxes)
  3. Within the images directory make an original images directory and a segmented images directory (i.e. ohio_death_3/imgs/orig, ohio_death_3/imgs/segmented)
  4. (Optional) Within the original images directory make decompressed and compressed directories (i.e. ohio_death_3/imgs/orig/decompressed, ohio_death_3/imgs/orig/compressed)
  5. (Optional) Within the segmented images directory make a snippets directory (i.e. ohio_death_3/imgs/segmented/snippets)
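The steps above can be run as a single sequence of mkdir -p commands (using the ohio_death_3 example name; replace it with your own project name):

```shell
# -p creates parent directories as needed, so the general project
# directory and imgs/ are created implicitly.
mkdir -p ohio_death_3/bounding_boxes
mkdir -p ohio_death_3/imgs/orig/decompressed
mkdir -p ohio_death_3/imgs/orig/compressed
mkdir -p ohio_death_3/imgs/segmented/snippets
```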

Create Necessary Scripts

You'll need to make copies of, and modify, the following scripts to use the proper directories:

  • album_labels.txt (does not need to be modified)
  • snippet_mkdir.py and restructure_year.py
  • segment.sh
  • project_segment.py

Some examples of what your scripts might look like after editing:

  • ohio_segment.py (example of project_segment.py)
  • ohio_segment.sh (example of segment.sh)

Very old, not up to date

These files can be found in the Files section in Microsoft Teams (warning: deprecated), in the segmentation folder. You can download them by clicking on the three dots next to the name of the file. You can also view the file by clicking on its name.

Alternatively, you can use the cp command on the supercomputer to copy an existing file to a new location. You can also copy existing files to use as a base, and then edit them on the supercomputer.

cp location/of_old_file/filename.end location/you_want/

Example: cp ./album_labels.txt ../../data/texas

This may be the easier option.

The majority of modifications to the scripts will be adjusting the file paths and the number of classes. (The changes will be similar to those made to the Google Colab scripts.)

Move Files Over

You will now want to start making sure you have all of the proper scripts, weights, etc. in your directories.

You can move your files over from the home desktop you are using by using the scp command:

scp filename.end [email protected]:/fslhome/username/fsl_groups/fslg_groupname/compute/desired_location/of_file

Example: scp segmentation.py [email protected]:~/fsl_groups/fslg_death/compute/projects/ohio_death_3

You will need to enter your password and a Duo authentication number every time you move something over to the supercomputer.

Alternatively, you can create new files on the supercomputer by using the touch command and then copying and pasting the modified script using a text editor like nano or vim.

touch my_script.py

nano my_script.py

Example: nano new_segmentation.py

Your general project directory should contain: your images directory, your bounding boxes directory, album_labels.txt, your version of segment.sh, your version of project_segment.py, and your weights that you trained.

Your images directory should contain: your original images directory, your segmented images directory, and directory_prep.py (or your snippet_mkdir.py and your restructure_year.py).

Move the images you are running segmentation on into either your original images directory or the decompressed images directory. Depending on the project, images may be stored in different locations, so be sure you know where to find the images you need to move over. (If you have already trained weights for segmentation, you likely know where the images are and have probably moved some, if not all, of them over.)

Your snippets will either be held within your segmented images directory or the snippets directory within it. (These won't appear until a few more steps are completed.)

Example file structure:

project
|- album_labels.txt
|- bounding_boxes
   |- box1
   |- box2
   ...
|- imgs
   |- directory_prep.py
   |- restructure_year.py
   |- orig
      |- pic1.jpg
      |- pic2.jpg
      ...
   |- segmented
      |- snippets
         |- snip1.jpg
         |- snip2.jpg
         ...

Segmentation

Run directory_prep.py, or restructure_year.py and snippet_mkdir.py, on the directory holding the images you will be segmenting.

python restructure_year.py original_images_dir - Runs restructure_year.py on the original images directory to split images into 1000 subdirectories

(Make sure that the images you want to divide are in another folder inside of the original folder or the script will not work)

python snippet_mkdir.py snippet_dir - Runs the snippet_mkdir.py on the snippet directory created inside of the segmented directory

python directory_prep.py directory_name - Runs directory_prep.py on the given directory

python directory_prep.py 1927 - Runs directory_prep.py on directory 1927

This will create subdirectories so that multiple images can be processed and segmented simultaneously, speeding up the process.

Next, run your version of segment.sh on the directory of images you want to segment (usually a year) using an sbatch command:

sbatch segment.sh image_directory - Runs segment.sh on all the images in the directory listed

sbatch segment.sh 1927 - Runs segment.sh on all the images in 1927

This sbatch command will activate your environment and then run your Python segmentation script on all of the subdirectories created by directory_prep.py simultaneously.

Continue this process until you have run the script on all directories containing the images you need segmented. (If you check a slurm file and see an error occurring in the sbatch script or the .py file, cancel the job with scancel and the job number from the slurm file, e.g. scancel 12345678.)
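Submitting the job for each remaining directory can be sketched as a loop (the year names below are hypothetical examples):

```shell
# Submit one segmentation job per year directory.
for year in 1925 1926 1927; do
    sbatch segment.sh "$year"
done
```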

You can check on the progress of the segmentation:

  1. squeue -u your_username
  2. check one of the slurm files with vim slurm-12345678_100.out (substitute the appropriate numbers of the file you want to check)

Often, you will not have enough file space to run segmentation on every directory you need, so you will need to compress (and remove) files that you have completed segmentation for. This is covered in more detail in the next section.

General File Management

Because the supercomputer has a limited amount of space, we will often be decompressing and compressing files to make sure we have enough room to run segmentation. After segmenting a folder, compress it (and then remove the original files) immediately in order to conserve space on the supercomputer.

To compress:

tar -cvf new_filename.tar directory_to_compress

To decompress:

tar -xvf compressed_file.tar

Check filespace:

fslquota
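The compress-then-remove workflow above can be sketched as follows (the demo directory and file are created here only for illustration):

```shell
# Demo setup: a "finished" year directory (illustration only).
mkdir -p 1927
touch 1927/pic1.jpg

# Compress the directory, verify the archive is readable,
# then remove the originals to reclaim space.
tar -cvf 1927.tar 1927
tar -tf 1927.tar > /dev/null && rm -rf 1927
```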