Installing TextZoom - lmmx/devnotes GitHub Wiki

Models

To quickly download TextZoom's required models:

git clone git@github.com:JasonBoy1/TextZoom.git
git clone git@github.com:meijieru/crnn.pytorch.git
git clone git@github.com:Canjie-Luo/MORAN_v2.git
git clone git@github.com:ayumiymk/aster.pytorch.git

Create and run aster.pytorch/download_model.sh:

wget https://github.com/ayumiymk/aster.pytorch/releases/download/v1.0/demo.pth.tar
mkdir models
mv demo.pth.tar models/

Create and run crnn.pytorch/download_model.sh:

wget --content-disposition "https://www.dropbox.com/s/dboqjk20qjkpta3/crnn.pth?dl=1"
mv "crnn.pth" models/

Create and run MORAN_v2/download_model.sh:

wget "https://docs.google.com/uc?export=download&id=1IDvT51MXKSseDq3X57uPjOzeSYI09zip" -O "moran.pth"
mv moran.pth models/

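Then change lines 39-41 of TextZoom/src/config/super_resolution.yml to point at the downloaded weights. Relative paths are normally resolved against the directory you run main.py from (src/), and the demo log further down shows '../../aster.pytorch/models/demo.pth.tar' being loaded, so try the '../../' prefix if the '../../../' one below does not resolve: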

    rec_pretrained: '../../../aster.pytorch/models/demo.pth.tar'
    moran_pretrained: '../../../MORAN_v2/models/moran.pth'
    crnn_pretrained: '../../../crnn.pytorch/models/crnn.pth'
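
As a quick sanity check on the relative prefixes, the sketch below resolves a configured path against a working directory using only the standard library. The directory layout (/home/user/sr with the four clones side by side) and the helper name resolve_from are assumptions for illustration. That TextZoom resolves these paths against src/ is inferred from the demo log, not confirmed from its code.

```python
import posixpath


def resolve_from(workdir: str, relative: str) -> str:
    """Resolve a relative path the way a process started in workdir would."""
    return posixpath.normpath(posixpath.join(workdir, relative))


# Hypothetical layout: all four repos cloned side by side under /home/user/sr.
src_dir = "/home/user/sr/TextZoom/src"
rel = "aster.pytorch/models/demo.pth.tar"

two_up = resolve_from(src_dir, "../../" + rel)
three_up = resolve_from(src_dir, "../../../" + rel)

print(two_up)    # /home/user/sr/aster.pytorch/models/demo.pth.tar
print(three_up)  # /home/user/aster.pytorch/models/demo.pth.tar
```

With this layout only the two-level prefix lands next to the TextZoom clone. The three-level prefix would be right if the path were resolved against src/config/ instead.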

If you don't have 4 GPUs, also change the number of GPUs on line 14:

  ngpu: 1

...and the model setup is done!

Additionally, in this issue thread the author, JasonBoy1, says:

A buddy uploaded the best model weights after 42000 epochs of training on TextZoom data: https://drive.google.com/file/d/1j-g17V5kBmqS8giWNZuHe2d_GefAepCs/view?usp=sharing

So also create and run TextZoom/download_model.sh:

wget "https://docs.google.com/uc?export=download&id=1j-g17V5kBmqS8giWNZuHe2d_GefAepCs" -O "textzoom.pth"
mkdir models
mv textzoom.pth models/

To get it to run on a single GPU, you need to change the section at line 157 of interfaces/base.py (H/T yustiks) from:

                if self.config.TRAIN.ngpu == 1:
                    model.load_state_dict(torch.load(self.resume)['state_dict_G'])

to:

                if self.config.TRAIN.ngpu == 1:
                    from collections import OrderedDict

                    state_dict = torch.load(self.resume)['state_dict_G']
                    # The model is wrapped in DataParallel (see the traceback
                    # in the Demo section), so its keys start with 'module.';
                    # rename the checkpoint keys to match before loading.
                    new_state_dict = OrderedDict()

                    for k, v in state_dict.items():
                        if 'module' not in k:
                            k = 'module.' + k
                        else:
                            k = k.replace('features.module.', 'module.features.')
                        new_state_dict[k] = v

                    model.load_state_dict(new_state_dict)
                    # model.load_state_dict(torch.load(self.resume)['state_dict_G'])
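
To see what the renaming does without loading a checkpoint, here is a minimal standalone sketch of the same key logic applied to a plain dictionary. The function name add_module_prefix and the example keys are made up for illustration. They are not taken from the TextZoom checkpoint.

```python
from collections import OrderedDict


def add_module_prefix(state_dict):
    """Return a copy of state_dict with keys renamed as in the patch above."""
    renamed = OrderedDict()
    for key, value in state_dict.items():
        if "module" not in key:
            key = "module." + key
        else:
            key = key.replace("features.module.", "module.features.")
        renamed[key] = value
    return renamed


# Made-up keys standing in for a real checkpoint's 'state_dict_G'.
example = OrderedDict([
    ("block1.0.weight", 1),
    ("features.module.conv.bias", 2),
    ("module.block2.weight", 3),
])
print(list(add_module_prefix(example)))
# ['module.block1.0.weight', 'module.features.conv.bias', 'module.block2.weight']
```

Keys that already start with 'module.' pass through unchanged, so the renaming should be harmless on a checkpoint that was saved with the prefix.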

Requirements

From a quick look with grep, the requirements (saved here as requirements.txt) are:

torch
torchvision
IPython
numpy
pillow
PyYAML
easydict
editdistance
scipy
tqdm
opencv-python
lmdb
six
thop
matplotlib
  • Note that this means pytorch comes from conda but thop comes from pip:
conda create -n textzoom
conda activate textzoom
conda install cudatoolkit=11.0.3 -c conda-forge
conda install pytorch=1.7.1 torchvision=0.8.2 -c pytorch # also gets python=3.8.5
pip install $(cat requirements.txt | grep -Ev "^(torch(vision)?)$")
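
For reference, the grep filter in the pip line can be sketched in Python as below. The function name pip_requirements is hypothetical, and unlike grep this version also strips whitespace and skips blank lines.

```python
import re


def pip_requirements(lines):
    """Drop torch/torchvision, similar to grep -Ev "^(torch(vision)?)$"."""
    return [
        line for line in (raw.strip() for raw in lines)
        if line and not re.fullmatch(r"torch(vision)?", line)
    ]


sample = ["torch", "torchvision", "numpy", "thop"]
print(pip_requirements(sample))  # ['numpy', 'thop']
```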

Lastly, make sure the environment variable CUDA_VISIBLE_DEVICES is set; if it isn't, add this to your .bashrc:

export CUDA_VISIBLE_DEVICES="0"
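
If you would rather not touch .bashrc, the variable can also be given a default from Python, as long as this runs before CUDA is initialised (in practice, before importing torch). Placing it at the top of main.py is an assumption for illustration, not something taken from the TextZoom repo.

```python
import os

# Only fills in a value when the variable is not already set,
# so an existing shell export takes precedence.
os.environ.setdefault("CUDA_VISIBLE_DEVICES", "0")

print(os.environ["CUDA_VISIBLE_DEVICES"])
```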

Demo

To run the pre-trained model, create the directory images/ in the repo's top directory (alongside src/, not inside it), put the images to process in it, then cd src/ and run:

python3 main.py --demo --demo_dir='../images/' --resume='../models/textzoom.pth' --STN --mask

/home/louis/dev/sr/TextZoom/src/interfaces/base.py:155: SyntaxWarning: "is not" with a literal. Did you mean "!="?
  if self.resume is not '':
/home/louis/dev/sr/TextZoom/src/interfaces/base.py:206: SyntaxWarning: "is not" with a literal. Did you mean "!="?
  if index is not 0:
/home/louis/dev/sr/TextZoom/src/model/recognizer/stn_head.py:79: SyntaxWarning: "is" with a literal. Did you mean "=="?
  if self.activation is 'none':
loading pre-trained model from ../models/textzoom.pth 
load pred_trained aster model from ../../aster.pytorch/models/demo.pth.tar
  0%|          | 0/1 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "main.py", line 44, in <module>
    main(config, args)
  File "main.py", line 16, in main
    Mission.demo()
  File "/home/louis/dev/sr/TextZoom/src/interfaces/super_resolution.py", line 299, in demo
    images_sr = model(images_lr)
  File "/home/louis/miniconda3/envs/textzoom/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/louis/miniconda3/envs/textzoom/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 159, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/home/louis/miniconda3/envs/textzoom/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/louis/dev/sr/TextZoom/src/model/tsrn.py", line 66, in forward
    block = {'1': self.block1(x)}
  File "/home/louis/miniconda3/envs/textzoom/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/louis/miniconda3/envs/textzoom/lib/python3.8/site-packages/torch/nn/modules/container.py", line 117, in forward
    input = module(input)
  File "/home/louis/miniconda3/envs/textzoom/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/louis/miniconda3/envs/textzoom/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 423, in forward
    return self._conv_forward(input, self.weight)
  File "/home/louis/miniconda3/envs/textzoom/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 419, in _conv_forward
    return F.conv2d(input, weight, self.bias, self.stride,
RuntimeError: Given groups=1, weight of size [64, 4, 9, 9], expected input[1, 5, 32, 256] to have 4 channels, but got 5 channels instead

TODO: patch, debug further or wait for patches (seems to be under active development)

  • Perhaps report here

  • Reminder to self: conda environment is textzoom
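
One possible reading of the RuntimeError above, offered as an unverified guess: the first conv layer's weight has 4 input channels, which would match RGB plus one extra channel added by --mask, so 5 channels is what an RGBA image (for example a PNG with transparency) would give. The sketch below only does that arithmetic. The assumption that --mask appends exactly one channel is not confirmed from the TextZoom code.

```python
# Channel counts for common Pillow image modes.
MODE_CHANNELS = {"L": 1, "RGB": 3, "RGBA": 4}


def input_channels(image_mode: str, mask: bool) -> int:
    """Channels after optionally appending one mask channel to the image."""
    return MODE_CHANNELS[image_mode] + (1 if mask else 0)


print(input_channels("RGB", mask=True))   # 4
print(input_channels("RGBA", mask=True))  # 5
```

If that is the cause, converting the demo images to RGB first (for example with Pillow's Image.convert("RGB")) would be the thing to try.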
