Installing TextZoom
To quickly download TextZoom's required models:

```bash
git clone git@github.com:JasonBoy1/TextZoom.git
git clone git@github.com:meijieru/crnn.pytorch.git
git clone git@github.com:Canjie-Luo/MORAN_v2.git
git clone git@github.com:ayumiymk/aster.pytorch.git
```
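If you don't have SSH keys registered with GitHub, the equivalent HTTPS clone URLs work just as well:

```bash
git clone https://github.com/JasonBoy1/TextZoom.git
git clone https://github.com/meijieru/crnn.pytorch.git
git clone https://github.com/Canjie-Luo/MORAN_v2.git
git clone https://github.com/ayumiymk/aster.pytorch.git
```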
Create and run `aster.pytorch/download_model.sh`:

```bash
wget https://github.com/ayumiymk/aster.pytorch/releases/download/v1.0/demo.pth.tar
mkdir models
mv demo.pth.tar models/
```
Create and run `crnn.pytorch/download_model.sh`:

```bash
wget --content-disposition "https://www.dropbox.com/s/dboqjk20qjkpta3/crnn.pth?dl=1"
mkdir -p models
mv "crnn.pth" models/
```
Create and run `MORAN_v2/download_model.sh`:

```bash
wget "https://docs.google.com/uc?export=download&id=1IDvT51MXKSseDq3X57uPjOzeSYI09zip" -O "moran.pth"
mkdir -p models
mv moran.pth models/
```
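Equivalently, the three downloads above can be run as one script from the directory containing the clones (just a convenience sketch of the same wget commands, nothing new):

```bash
#!/usr/bin/env bash
set -e  # stop at the first failed download

mkdir -p aster.pytorch/models crnn.pytorch/models MORAN_v2/models

# ASTER recogniser weights (GitHub release asset)
wget -O aster.pytorch/models/demo.pth.tar \
  "https://github.com/ayumiymk/aster.pytorch/releases/download/v1.0/demo.pth.tar"

# CRNN weights (Dropbox)
wget -O crnn.pytorch/models/crnn.pth \
  "https://www.dropbox.com/s/dboqjk20qjkpta3/crnn.pth?dl=1"

# MORAN weights (Google Drive)
wget -O MORAN_v2/models/moran.pth \
  "https://docs.google.com/uc?export=download&id=1IDvT51MXKSseDq3X57uPjOzeSYI09zip"
```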
Then change `TextZoom/src/config/super_resolution.yml` lines 39-41 to:

```yaml
rec_pretrained: '../../../aster.pytorch/models/demo.pth.tar'
moran_pretrained: '../../../MORAN_v2/models/moran.pth'
crnn_pretrained: '../../../crnn.pytorch/models/crnn.pth'
```

Also change the number of GPUs (line 14) if you don't have 4:

```yaml
ngpu: 1
```
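If you prefer to script those edits, something like the following should do it (a sketch: it assumes each key appears exactly once in the file, and uses GNU sed's `-i`):

```bash
cd TextZoom/src/config
sed -i "s|rec_pretrained:.*|rec_pretrained: '../../../aster.pytorch/models/demo.pth.tar'|" super_resolution.yml
sed -i "s|moran_pretrained:.*|moran_pretrained: '../../../MORAN_v2/models/moran.pth'|" super_resolution.yml
sed -i "s|crnn_pretrained:.*|crnn_pretrained: '../../../crnn.pytorch/models/crnn.pth'|" super_resolution.yml
sed -i "s|ngpu:.*|ngpu: 1|" super_resolution.yml
cd -
```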
...and the model setup is done!
Additionally, in this issue thread the author JasonBoy1 says:

> A buddy uploaded the best model weights after 42000 epochs of training on TextZoom data: https://drive.google.com/file/d/1j-g17V5kBmqS8giWNZuHe2d_GefAepCs/view?usp=sharing
So also create and run `TextZoom/download_model.sh`:

```bash
wget "https://docs.google.com/uc?export=download&id=1j-g17V5kBmqS8giWNZuHe2d_GefAepCs" -O "textzoom.pth"
mkdir models
mv textzoom.pth models/
```
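At this point it's worth a quick sanity check that every weight file landed where the config will look for it (run from the directory containing the four clones):

```bash
for f in aster.pytorch/models/demo.pth.tar \
         crnn.pytorch/models/crnn.pth \
         MORAN_v2/models/moran.pth \
         TextZoom/models/textzoom.pth; do
  if [ -f "$f" ]; then echo "OK      $f"; else echo "MISSING $f"; fi
done
```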
To get it to run on a single GPU, you need to change the section at line 157 of `interfaces/base.py` (H/T yustiks) from:

```python
if self.config.TRAIN.ngpu == 1:
    model.load_state_dict(torch.load(self.resume)['state_dict_G'])
```
to:
```python
if self.config.TRAIN.ngpu == 1:
    state_dict = torch.load(self.resume)['state_dict_G']
    from collections import OrderedDict
    new_state_dict = OrderedDict()
    for k, v in state_dict.items():
        # Remap each checkpoint key so the 'module.' prefix sits at the front,
        # where the DataParallel-wrapped model expects it
        if 'module' not in k:
            k = 'module.' + k
        else:
            k = k.replace('features.module.', 'module.features.')
        new_state_dict[k] = v
    model.load_state_dict(new_state_dict)
    # model.load_state_dict(torch.load(self.resume)['state_dict_G'])
```
From a quick look with `grep`, the requirements are:
```
torch
torchvision
IPython
numpy
pillow
PyYAML
easydict
editdistance
scipy
tqdm
opencv-python  # imported as cv2
lmdb
six
thop
matplotlib
```
- Note that this means PyTorch comes from conda but thop comes from pip...

```bash
conda create -n textzoom
conda activate textzoom
conda install cudatoolkit=11.0.3 -c conda-forge
conda install pytorch=1.7.1 torchvision=0.8.2 -c pytorch # also gets python=3.8.5
pip install $(cat requirements.txt | grep -Ev "^(torch(vision)?)$")
```
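Before going further, a quick check that the environment resolves as expected (the versions printed should match the pins above, and CUDA should be available):

```bash
python -c "import torch, torchvision; print(torch.__version__, torchvision.__version__, torch.cuda.is_available())"
# hopefully prints something like: 1.7.1 0.8.2 True
```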
Lastly, ensure the environment variable `CUDA_VISIBLE_DEVICES` is set, else add to your `.bashrc`:

```bash
export CUDA_VISIBLE_DEVICES="0"
```
Run the pre-trained model by creating the directory `images/` in the repo's top directory (alongside `src`, not inside it), then `cd src/`:

```bash
python3 main.py --demo --demo_dir='../images/' --resume='../models/textzoom.pth' --STN --mask
```
⇣
```
/home/louis/dev/sr/TextZoom/src/interfaces/base.py:155: SyntaxWarning: "is not" with a literal. Did you mean "!="?
  if self.resume is not '':
/home/louis/dev/sr/TextZoom/src/interfaces/base.py:206: SyntaxWarning: "is not" with a literal. Did you mean "!="?
  if index is not 0:
/home/louis/dev/sr/TextZoom/src/model/recognizer/stn_head.py:79: SyntaxWarning: "is" with a literal. Did you mean "=="?
  if self.activation is 'none':
loading pre-trained model from ../models/textzoom.pth
load pred_trained aster model from ../../aster.pytorch/models/demo.pth.tar
0%| | 0/1 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "main.py", line 44, in <module>
    main(config, args)
  File "main.py", line 16, in main
    Mission.demo()
  File "/home/louis/dev/sr/TextZoom/src/interfaces/super_resolution.py", line 299, in demo
    images_sr = model(images_lr)
  File "/home/louis/miniconda3/envs/textzoom/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/louis/miniconda3/envs/textzoom/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 159, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/home/louis/miniconda3/envs/textzoom/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/louis/dev/sr/TextZoom/src/model/tsrn.py", line 66, in forward
    block = {'1': self.block1(x)}
  File "/home/louis/miniconda3/envs/textzoom/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/louis/miniconda3/envs/textzoom/lib/python3.8/site-packages/torch/nn/modules/container.py", line 117, in forward
    input = module(input)
  File "/home/louis/miniconda3/envs/textzoom/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/louis/miniconda3/envs/textzoom/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 423, in forward
    return self._conv_forward(input, self.weight)
  File "/home/louis/miniconda3/envs/textzoom/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 419, in _conv_forward
    return F.conv2d(input, weight, self.bias, self.stride,
RuntimeError: Given groups=1, weight of size [64, 4, 9, 9], expected input[1, 5, 32, 256] to have 4 channels, but got 5 channels instead
```
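The mismatch (4 channels expected, 5 received) suggests the demo images are being read with an alpha channel, so RGBA plus the extra `--mask` channel gives 5 where the network expects RGB + mask = 4. An unverified workaround sketch, flattening the demo images to plain RGB with pillow (already a requirement) before re-running:

```bash
# Unverified: strip any alpha channel from the demo images so the loader sees plain RGB
python3 - <<'EOF'
from pathlib import Path
from PIL import Image

for path in Path("../images").iterdir():  # assumes you are still in src/
    try:
        img = Image.open(path)
    except OSError:
        continue  # skip anything that isn't an image
    if img.mode != "RGB":
        img.convert("RGB").save(path)
        print(f"converted {path} ({img.mode} -> RGB)")
EOF
```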
TODO: patch, debug further or wait for patches (seems to be under active development)
- Perhaps report here
- Reminder to self: conda environment is `textzoom`