Training German Newspapers - UB-Mannheim/kraken GitHub Wiki

Training of kraken model for German newspapers

Ground Truth

Deutscher Reichsanzeiger
Ground truth for German newspaper "Deutscher Reichsanzeiger und Preußischer Staatsanzeiger" (1819–1945)
Dataset: https://github.com/UB-Mannheim/reichsanzeiger-gt

Austrian Newspapers
NewsEye / READ OCR training dataset from Austrian Newspapers (1864–1911)
Dataset: https://github.com/UB-Mannheim/AustrianNewspapers

Neue Zürcher Zeitung
Ground truth for Swiss newspaper "Neue Zürcher Zeitung" (1780–1947)
Dataset: https://github.com/UB-Mannheim/NZZ-black-letter-ground-truth

(based on STRÖBEL, Phillip; CLEMATIDE, Simon. Improving OCR of black letter in historical newspapers: the unreasonable effectiveness of HTR models on low-resolution images. 2019.)

Hakenkreuzbanner
Ground truth for a political newspaper of the Mannheim region (1931–1945)
Dataset: https://github.com/UB-Mannheim/hkb-gt (used as evaluation set only!)

Preparing data for training

git clone https://github.com/UB-Mannheim/reichsanzeiger-gt
git clone https://github.com/UB-Mannheim/AustrianNewspapers
git clone https://github.com/UB-Mannheim/NZZ-black-letter-ground-truth
git clone https://github.com/UB-Mannheim/hkb-gt

./reichsanzeiger-gt/data/download_images.sh
mv ./reichsanzeiger-gt/data/images/* ./reichsanzeiger-gt/data/reichsanzeiger-1820-1939/GT-PAGE
./hkb-gt/data/download_images_to_page.sh

fdfind -a --full-path './reichsanzeiger-gt/data/reichsanzeiger-1820-1939/' -e xml >> german_newspapers.list
fdfind -a --full-path './NZZ-black-letter-ground-truth/data' -e xml >> german_newspapers.list 
fdfind -a --full-path './AustrianNewspapers/data' -e xml >> german_newspapers.list 
fdfind -a --full-path './hkb-gt/data' -e xml >> german_newspapers.list

shuf german_newspapers.list > shuf_german_newspapers.list

export OMP_NUM_THREADS=1

Training with basemodel

Download basemodel german_print

wget https://ub-backup.bib.uni-mannheim.de/~stweil/tesstrain/kraken/german_print/german_print_best.mlmodel
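The downloaded model can then be used as the starting point for fine-tuning instead of training from scratch. A minimal sketch with the same hyperparameters as the training runs below; `-i` and `--resize` are real `ketos train` options, but the `union` value and the output path are assumptions to be checked against `ketos train --help` for your kraken version:

```shell
# Fine-tune the german_print base model on the compiled dataset.
# -i loads the base model; --resize union keeps its code table and
# adds any new characters found in the training data (assumption:
# older kraken versions call this mode "add" instead of "union").
ketos train -f binary \
  -i german_print_best.mlmodel \
  --resize union \
  -o ./20231231/basemodel/german_newspapers \
  -d cuda:0 --lag 10 -r 0.0001 -B 4 -w 0 \
  german_newspapers.arrow
```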

Create an Apache Arrow file with ketos; the resulting german_newspapers.arrow is about 4 GB in size.

time ketos compile --format-type xml --files ./shuf_german_newspapers.list  --workers 12 -o ./german_newspapers.arrow
Extracting lines ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 236003/236003 -:--:-- -:--:--
Output file written to ./german_newspapers.arrow
ketos compile --format-type xml --files ./shuf_german_newspapers.list  12 -o   25965,31s user 14011,24s system 1173% cpu 56:45,47 total
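If a reproducible train/validation/test split is wanted, it can be fixed at compile time rather than drawn randomly at training time. A sketch assuming the `--random-split` option of recent `ketos compile` versions; the 90/5/5 ratios are illustrative:

```shell
# Embed a fixed 90/5/5 train/val/test split into the arrow file
# (verify the flag with `ketos compile --help` for your version).
ketos compile --format-type xml --files ./shuf_german_newspapers.list \
  --random-split 0.9 0.05 0.05 --workers 12 -o ./german_newspapers.arrow
```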

Comparing different neural network topologies for training a basemodel

Here we test the topologies most commonly used in the Kraken (kraken, htru) and Transkribus (htr+) communities, a topology generated by OpenAI's GPT (gpt), and a unique variant derived from these topologies (sgd).

Interestingly, the results showed no significant differences in training cycles or achieved accuracies among most of these topologies. This suggests that the choice of topology may be less critical than previously thought, as long as a basic level of complexity is maintained.

The larger topologies did not show a noticeable positive impact on performance (on our validation set). This may indicate that the training data (only printed text, no handwriting) and the variation in characters and fonts are too low in complexity to fully exploit the potential of these neural networks. When training with larger datasets, especially mixed datasets (prints and manuscripts), a more complex topology might therefore be worthwhile.

More detailed information about the specific network topologies, their parameters, and the accuracies achieved can be found in the attached tables. This provides a comprehensive overview and allows interested parties to review the results in detail and evaluate them for their own applications.

Information about the parameters and training

Data preparation

The dataset contains: Reichsanzeiger-gt, AustrianNewspapers and NZZ-black-letter.

(kraken-venv) jkamlah@notebook20 ~/Coding/models/german_newspapers/kraken % time ketos compile --format-type xml --files ./shuf_german_newspapers_31-12-2023.list  --workers 12 -o ./german_newspapers_2023_12.arrow
Extracting lines ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 222610/222610 -:--:-- -:--:--
Output file written to ./german_newspapers_2023_12.arrow
ketos compile --format-type xml --files  --workers 12 -o   21375,98s user 34524,70s system 1569% cpu 59:21,94 total

kraken

This network topology is often used and recommended by Benjamin Kiessling, the main developer of Kraken.
It is rather small, but performs quite well in the evaluation.

(kraken-venv) jkamlah@notebook20 ~/Coding/models/german_newspapers/kraken % time nice ketos train -f binary  -o ./20231231/kraken/german_newspapers -d cuda:0 --lag 10 -r 0.0001 -B 4 -w 0 -s '[1,120,0,1 Cr3,13,32 Do0.1,2 Mp2,2 Cr3,13,32 Do0.1,2 Mp2,2 Cr3,9,64 Do0.1,2 Mp2,2 Cr3,9,64 Do0.1,2 S1(1x0)1,3 Lbx200 Do0.1,2 Lbx200 Do.1,2 Lbx200 Do]' german_newspapers_2023_12.arrow
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
`Trainer(val_check_interval=1.0)` was configured so validation will run at the end of the training epoch..
You are using a CUDA device ('NVIDIA RTX A4000 Laptop GPU') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul_precision('medium' | 'high')` which will trade-off precision for performance. For more details, read https://pytorch.org/docs/stable/generated/torch.set_float32_matmul_precision.html#torch.set_float32_matmul_precision
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
┏━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃    ┃ Name      ┃ Type                     ┃ Params ┃                 In sizes ┃                Out sizes ┃
┑━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━┩
β”‚ 0  β”‚ val_cer   β”‚ CharErrorRate            β”‚      0 β”‚                        ? β”‚                        ? β”‚
β”‚ 1  β”‚ val_wer   β”‚ WordErrorRate            β”‚      0 β”‚                        ? β”‚                        ? β”‚
β”‚ 2  β”‚ net       β”‚ MultiParamSequential     β”‚  4.1 M β”‚  [[1, 1, 120, 400], '?'] β”‚   [[1, 264, 1, 50], '?'] β”‚
β”‚ 3  β”‚ net.C_0   β”‚ ActConv2D                β”‚  1.3 K β”‚  [[1, 1, 120, 400], '?'] β”‚ [[1, 32, 120, 400], '?'] β”‚
β”‚ 4  β”‚ net.Do_1  β”‚ Dropout                  β”‚      0 β”‚ [[1, 32, 120, 400], '?'] β”‚ [[1, 32, 120, 400], '?'] β”‚
β”‚ 5  β”‚ net.Mp_2  β”‚ MaxPool                  β”‚      0 β”‚ [[1, 32, 120, 400], '?'] β”‚  [[1, 32, 60, 200], '?'] β”‚
β”‚ 6  β”‚ net.C_3   β”‚ ActConv2D                β”‚ 40.0 K β”‚  [[1, 32, 60, 200], '?'] β”‚  [[1, 32, 60, 200], '?'] β”‚
β”‚ 7  β”‚ net.Do_4  β”‚ Dropout                  β”‚      0 β”‚  [[1, 32, 60, 200], '?'] β”‚  [[1, 32, 60, 200], '?'] β”‚
β”‚ 8  β”‚ net.Mp_5  β”‚ MaxPool                  β”‚      0 β”‚  [[1, 32, 60, 200], '?'] β”‚  [[1, 32, 30, 100], '?'] β”‚
β”‚ 9  β”‚ net.C_6   β”‚ ActConv2D                β”‚ 55.4 K β”‚  [[1, 32, 30, 100], '?'] β”‚  [[1, 64, 30, 100], '?'] β”‚
β”‚ 10 β”‚ net.Do_7  β”‚ Dropout                  β”‚      0 β”‚  [[1, 64, 30, 100], '?'] β”‚  [[1, 64, 30, 100], '?'] β”‚
β”‚ 11 β”‚ net.Mp_8  β”‚ MaxPool                  β”‚      0 β”‚  [[1, 64, 30, 100], '?'] β”‚   [[1, 64, 15, 50], '?'] β”‚
β”‚ 12 β”‚ net.C_9   β”‚ ActConv2D                β”‚  110 K β”‚   [[1, 64, 15, 50], '?'] β”‚   [[1, 64, 15, 50], '?'] β”‚
β”‚ 13 β”‚ net.Do_10 β”‚ Dropout                  β”‚      0 β”‚   [[1, 64, 15, 50], '?'] β”‚   [[1, 64, 15, 50], '?'] β”‚
β”‚ 14 β”‚ net.S_11  β”‚ Reshape                  β”‚      0 β”‚   [[1, 64, 15, 50], '?'] β”‚   [[1, 960, 1, 50], '?'] β”‚
β”‚ 15 β”‚ net.L_12  β”‚ TransposedSummarizingRNN β”‚  1.9 M β”‚   [[1, 960, 1, 50], '?'] β”‚   [[1, 400, 1, 50], '?'] β”‚
β”‚ 16 β”‚ net.Do_13 β”‚ Dropout                  β”‚      0 β”‚   [[1, 400, 1, 50], '?'] β”‚   [[1, 400, 1, 50], '?'] β”‚
β”‚ 17 β”‚ net.L_14  β”‚ TransposedSummarizingRNN β”‚  963 K β”‚   [[1, 400, 1, 50], '?'] β”‚   [[1, 400, 1, 50], '?'] β”‚
β”‚ 18 β”‚ net.Do_15 β”‚ Dropout                  β”‚      0 β”‚   [[1, 400, 1, 50], '?'] β”‚   [[1, 400, 1, 50], '?'] β”‚
β”‚ 19 β”‚ net.L_16  β”‚ TransposedSummarizingRNN β”‚  963 K β”‚   [[1, 400, 1, 50], '?'] β”‚   [[1, 400, 1, 50], '?'] β”‚
β”‚ 20 β”‚ net.Do_17 β”‚ Dropout                  β”‚      0 β”‚   [[1, 400, 1, 50], '?'] β”‚   [[1, 400, 1, 50], '?'] β”‚
β”‚ 21 β”‚ net.O_18  β”‚ LinSoftmax               β”‚  105 K β”‚   [[1, 400, 1, 50], '?'] β”‚   [[1, 264, 1, 50], '?'] β”‚
β””β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
Trainable params: 4.1 M                                                                                                                                                                
Non-trainable params: 0                                                                                                                                                                
Total params: 4.1 M                                                                                                                                                                    
Total estimated model params size (MB): 16                                                                                                                                             
stage 0/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:58:03 β€’ 0:00:00 14.33it/s val_accuracy: 0.985 val_word_accuracy: 0.928  early_stopping: 0/10 0.98519
stage 1/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:59:05 β€’ 0:00:00 14.21it/s val_accuracy: 0.989 val_word_accuracy: 0.949  early_stopping: 0/10 0.98946
stage 2/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:59:05 β€’ 0:00:00 14.16it/s val_accuracy: 0.991 val_word_accuracy: 0.956  early_stopping: 0/10 0.99098
stage 3/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:58:52 β€’ 0:00:00 14.31it/s val_accuracy: 0.992 val_word_accuracy: 0.961  early_stopping: 0/10 0.99178
stage 4/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:58:57 β€’ 0:00:00 14.44it/s val_accuracy: 0.992 val_word_accuracy: 0.962  early_stopping: 0/10 0.99218
stage 5/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:58:57 β€’ 0:00:00 14.26it/s val_accuracy: 0.993 val_word_accuracy: 0.966  early_stopping: 0/10 0.99289
stage 6/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:58:57 β€’ 0:00:00 14.13it/s val_accuracy: 0.993 val_word_accuracy: 0.966  early_stopping: 0/10 0.99292
stage 7/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:58:57 β€’ 0:00:00 14.20it/s val_accuracy: 0.993 val_word_accuracy: 0.966  early_stopping: 1/10 0.99292
stage 8/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:58:59 β€’ 0:00:00 14.02it/s val_accuracy: 0.993 val_word_accuracy: 0.969  early_stopping: 0/10 0.99344
stage 9/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:58:54 β€’ 0:00:00 14.13it/s val_accuracy: 0.993 val_word_accuracy: 0.968  early_stopping: 1/10 0.99344
stage 10/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:59:01 β€’ 0:00:00 14.28it/s val_accuracy: 0.993 val_word_accuracy: 0.969  early_stopping: 2/10 0.99344
stage 11/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:59:02 β€’ 0:00:00 14.25it/s val_accuracy: 0.994 val_word_accuracy: 0.97  early_stopping: 0/10 0.99362
stage 12/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:59:03 β€’ 0:00:00 14.16it/s val_accuracy: 0.994 val_word_accuracy: 0.97  early_stopping: 0/10 0.99363
stage 13/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:59:00 β€’ 0:00:00 14.30it/s val_accuracy: 0.994 val_word_accuracy: 0.97  early_stopping: 0/10 0.99371
stage 14/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:58:56 β€’ 0:00:00 14.11it/s val_accuracy: 0.994 val_word_accuracy: 0.971  early_stopping: 0/10 0.99380
stage 15/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:58:59 β€’ 0:00:00 14.20it/s val_accuracy: 0.994 val_word_accuracy: 0.971  early_stopping: 0/10 0.99385
stage 16/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:58:54 β€’ 0:00:00 14.08it/s val_accuracy: 0.994 val_word_accuracy: 0.971  early_stopping: 0/10 0.99390
stage 17/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:58:57 β€’ 0:00:00 14.18it/s val_accuracy: 0.994 val_word_accuracy: 0.971  early_stopping: 1/10 0.99390
stage 18/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:58:56 β€’ 0:00:00 14.34it/s val_accuracy: 0.994 val_word_accuracy: 0.97  early_stopping: 2/10 0.99390
stage 19/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:59:02 β€’ 0:00:00 13.84it/s val_accuracy: 0.994 val_word_accuracy: 0.972  early_stopping: 0/10 0.99395
stage 20/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:58:52 β€’ 0:00:00 14.17it/s val_accuracy: 0.994 val_word_accuracy: 0.972  early_stopping: 0/10 0.99401
stage 21/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:58:54 β€’ 0:00:00 14.28it/s val_accuracy: 0.994 val_word_accuracy: 0.972  early_stopping: 1/10 0.99401
stage 22/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:58:58 β€’ 0:00:00 14.34it/s val_accuracy: 0.994 val_word_accuracy: 0.972  early_stopping: 0/10 0.99402
stage 23/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:58:55 β€’ 0:00:00 14.00it/s val_accuracy: 0.994 val_word_accuracy: 0.972  early_stopping: 0/10 0.99406
stage 24/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:58:56 β€’ 0:00:00 14.13it/s val_accuracy: 0.994 val_word_accuracy: 0.973  early_stopping: 0/10 0.99414
stage 25/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:58:57 β€’ 0:00:00 14.17it/s val_accuracy: 0.994 val_word_accuracy: 0.972  early_stopping: 1/10 0.99414
stage 26/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:59:03 β€’ 0:00:00 14.15it/s val_accuracy: 0.994 val_word_accuracy: 0.973  early_stopping: 2/10 0.99414
stage 27/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:59:06 β€’ 0:00:00 13.90it/s val_accuracy: 0.994 val_word_accuracy: 0.973  early_stopping: 0/10 0.99419
stage 28/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:59:01 β€’ 0:00:00 14.10it/s val_accuracy: 0.994 val_word_accuracy: 0.973  early_stopping: 1/10 0.99419
stage 29/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:58:51 β€’ 0:00:00 14.54it/s val_accuracy: 0.994 val_word_accuracy: 0.973  early_stopping: 2/10 0.99419
stage 30/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:58:57 β€’ 0:00:00 14.15it/s val_accuracy: 0.994 val_word_accuracy: 0.972  early_stopping: 3/10 0.99419
stage 31/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:58:57 β€’ 0:00:00 14.12it/s val_accuracy: 0.994 val_word_accuracy: 0.973  early_stopping: 4/10 0.99419
stage 32/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:58:58 β€’ 0:00:00 14.48it/s val_accuracy: 0.994 val_word_accuracy: 0.973  early_stopping: 5/10 0.99419
stage 33/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:58:58 β€’ 0:00:00 14.24it/s val_accuracy: 0.994 val_word_accuracy: 0.973  early_stopping: 6/10 0.99419
stage 34/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:58:55 β€’ 0:00:00 14.31it/s val_accuracy: 0.994 val_word_accuracy: 0.973  early_stopping: 7/10 0.99419
stage 35/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:58:57 β€’ 0:00:00 14.18it/s val_accuracy: 0.994 val_word_accuracy: 0.973  early_stopping: 8/10 0.99419
stage 36/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:58:52 β€’ 0:00:00 14.09it/s val_accuracy: 0.994 val_word_accuracy: 0.972  early_stopping: 9/10 0.99419
stage 37/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:58:55 β€’ 0:00:00 13.97it/s val_accuracy: 0.994 val_word_accuracy: 0.973  early_stopping: 0/10 0.99427
stage 38/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:59:03 β€’ 0:00:00 14.23it/s val_accuracy: 0.994 val_word_accuracy: 0.973  early_stopping: 0/10 0.99431
stage 39/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:59:00 β€’ 0:00:00 14.23it/s val_accuracy: 0.994 val_word_accuracy: 0.972  early_stopping: 1/10 0.99431
stage 40/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:58:54 β€’ 0:00:00 14.38it/s val_accuracy: 0.994 val_word_accuracy: 0.972  early_stopping: 2/10 0.99431
stage 41/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:58:55 β€’ 0:00:00 14.08it/s val_accuracy: 0.994 val_word_accuracy: 0.973  early_stopping: 3/10 0.99431
stage 42/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:58:51 β€’ 0:00:00 14.06it/s val_accuracy: 0.994 val_word_accuracy: 0.973  early_stopping: 4/10 0.99431
stage 43/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 1:00:04 β€’ 0:00:00 13.78it/s val_accuracy: 0.994 val_word_accuracy: 0.973  early_stopping: 5/10 0.99431
stage 44/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:59:54 β€’ 0:00:00 13.70it/s val_accuracy: 0.994 val_word_accuracy: 0.973  early_stopping: 6/10 0.99431
stage 45/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:59:59 β€’ 0:00:00 14.05it/s val_accuracy: 0.994 val_word_accuracy: 0.972  early_stopping: 7/10 0.99431
stage 46/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 1:00:00 β€’ 0:00:00 13.61it/s val_accuracy: 0.994 val_word_accuracy: 0.972  early_stopping: 8/10 0.99431
stage 47/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 1:00:03 β€’ 0:00:00 13.76it/s val_accuracy: 0.994 val_word_accuracy: 0.973  early_stopping: 9/10 0.99431
stage 48/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:59:58 β€’ 0:00:00 13.85it/s val_accuracy: 0.994 val_word_accuracy: 0.972  early_stopping: 10/10 0.99431
Moving best model ./20231231/kraken/german_newspapers_38.mlmodel (0.9943079948425293) to ./20231231/kraken/german_newspapers_best.mlmodel
nice ketos train -f binary -o ./20231231/krakenD/german_newspapers -d cuda:0   210504,49s user 7813,78s system 119% cpu 50:42:36,20 total
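The resulting best model can be applied with the kraken CLI. A minimal sketch using baseline segmentation; `page.png` is a placeholder input image:

```shell
# Segment a page with the baseline segmenter (-bl) and recognize
# the lines with the freshly trained recognition model.
kraken -i page.png page.txt segment -bl \
  ocr -m ./20231231/kraken/german_newspapers_best.mlmodel
```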

htru

This network topology is often used by Thibault Clérice and Alix Chagué, the main developers of HTR-United.
It is quite complex and could potentially outperform smaller networks if manuscripts or mixed datasets were used.

time nice ketos train -f binary  -o ./20231231/htru/german_newspapers -d cuda:0 --lag 10 -r 0.0001 -B 4 -w 0 -s '[1,120,0,1 Cr4,2,32,4,2 Gn32 Cr4,2,64,1,1 Gn32 Mp4,2,4,2 Cr3,3,128,1,1 Gn32 Mp1,2,1,2 S1(1x0)1,3 Lbx256 Do0.5 Lbx256 Do0.5 Lbx256 Do0.5]' german_newspapers_2023_12.arrow
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
`Trainer(val_check_interval=1.0)` was configured so validation will run at the end of the training epoch..
You are using a CUDA device ('NVIDIA RTX A4000 Laptop GPU') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul_precision('medium' | 'high')` which will trade-off precision for performance. For more details, read https://pytorch.org/docs/stable/generated/torch.set_float32_matmul_precision.html#torch.set_float32_matmul_precision
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
┏━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃    ┃ Name      ┃ Type                     ┃ Params ┃                In sizes ┃               Out sizes ┃
┑━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━┩
β”‚ 0  β”‚ val_cer   β”‚ CharErrorRate            β”‚      0 β”‚                       ? β”‚                       ? β”‚
β”‚ 1  β”‚ val_wer   β”‚ WordErrorRate            β”‚      0 β”‚                       ? β”‚                       ? β”‚
β”‚ 2  β”‚ net       β”‚ MultiParamSequential     β”‚  5.7 M β”‚ [[1, 1, 120, 400], '?'] β”‚  [[1, 264, 1, 49], '?'] β”‚
β”‚ 3  β”‚ net.C_0   β”‚ ActConv2D                β”‚    288 β”‚ [[1, 1, 120, 400], '?'] β”‚ [[1, 32, 30, 200], '?'] β”‚
β”‚ 4  β”‚ net.Gn_1  β”‚ GroupNorm                β”‚     64 β”‚ [[1, 32, 30, 200], '?'] β”‚ [[1, 32, 30, 200], '?'] β”‚
β”‚ 5  β”‚ net.C_2   β”‚ ActConv2D                β”‚ 16.4 K β”‚ [[1, 32, 30, 200], '?'] β”‚ [[1, 64, 29, 199], '?'] β”‚
β”‚ 6  β”‚ net.Gn_3  β”‚ GroupNorm                β”‚    128 β”‚ [[1, 64, 29, 199], '?'] β”‚ [[1, 64, 29, 199], '?'] β”‚
β”‚ 7  β”‚ net.Mp_4  β”‚ MaxPool                  β”‚      0 β”‚ [[1, 64, 29, 199], '?'] β”‚   [[1, 64, 7, 99], '?'] β”‚
β”‚ 8  β”‚ net.C_5   β”‚ ActConv2D                β”‚ 73.9 K β”‚   [[1, 64, 7, 99], '?'] β”‚  [[1, 128, 7, 99], '?'] β”‚
β”‚ 9  β”‚ net.Gn_6  β”‚ GroupNorm                β”‚    256 β”‚  [[1, 128, 7, 99], '?'] β”‚  [[1, 128, 7, 99], '?'] β”‚
β”‚ 10 β”‚ net.Mp_7  β”‚ MaxPool                  β”‚      0 β”‚  [[1, 128, 7, 99], '?'] β”‚  [[1, 128, 7, 49], '?'] β”‚
β”‚ 11 β”‚ net.S_8   β”‚ Reshape                  β”‚      0 β”‚  [[1, 128, 7, 49], '?'] β”‚  [[1, 896, 1, 49], '?'] β”‚
β”‚ 12 β”‚ net.L_9   β”‚ TransposedSummarizingRNN β”‚  2.4 M β”‚  [[1, 896, 1, 49], '?'] β”‚  [[1, 512, 1, 49], '?'] β”‚
β”‚ 13 β”‚ net.Do_10 β”‚ Dropout                  β”‚      0 β”‚  [[1, 512, 1, 49], '?'] β”‚  [[1, 512, 1, 49], '?'] β”‚
β”‚ 14 β”‚ net.L_11  β”‚ TransposedSummarizingRNN β”‚  1.6 M β”‚  [[1, 512, 1, 49], '?'] β”‚  [[1, 512, 1, 49], '?'] β”‚
β”‚ 15 β”‚ net.Do_12 β”‚ Dropout                  β”‚      0 β”‚  [[1, 512, 1, 49], '?'] β”‚  [[1, 512, 1, 49], '?'] β”‚
β”‚ 16 β”‚ net.L_13  β”‚ TransposedSummarizingRNN β”‚  1.6 M β”‚  [[1, 512, 1, 49], '?'] β”‚  [[1, 512, 1, 49], '?'] β”‚
β”‚ 17 β”‚ net.Do_14 β”‚ Dropout                  β”‚      0 β”‚  [[1, 512, 1, 49], '?'] β”‚  [[1, 512, 1, 49], '?'] β”‚
β”‚ 18 β”‚ net.O_15  β”‚ LinSoftmax               β”‚  135 K β”‚  [[1, 512, 1, 49], '?'] β”‚  [[1, 264, 1, 49], '?'] β”‚
β””β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
Trainable params: 5.7 M                                                                                                                                                                
Non-trainable params: 0                                                                                                                                                                
Total params: 5.7 M                                                                                                                                                                    
Total estimated model params size (MB): 22                                                                                                                                             
stage 0/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:48:05 β€’ 0:00:00 17.44it/s val_accuracy: 0.987 val_word_accuracy: 0.935  early_stopping: 0/10 0.98660
stage 1/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:49:45 β€’ 0:00:00 16.97it/s val_accuracy: 0.99 val_word_accuracy: 0.951  early_stopping: 0/10 0.98987
stage 2/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:49:49 β€’ 0:00:00 16.99it/s val_accuracy: 0.991 val_word_accuracy: 0.957  early_stopping: 0/10 0.99111
stage 3/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:50:36 β€’ 0:00:00 16.62it/s val_accuracy: 0.992 val_word_accuracy: 0.96  early_stopping: 0/10 0.99188
stage 4/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:50:32 β€’ 0:00:00 16.26it/s val_accuracy: 0.992 val_word_accuracy: 0.964  early_stopping: 0/10 0.99247
stage 5/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:50:42 β€’ 0:00:00 16.14it/s val_accuracy: 0.993 val_word_accuracy: 0.965  early_stopping: 0/10 0.99282
stage 6/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:55:55 β€’ 0:00:00 13.90it/s val_accuracy: 0.992 val_word_accuracy: 0.963  early_stopping: 1/10 0.99282
stage 7/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:52:22 β€’ 0:00:00 16.70it/s val_accuracy: 0.993 val_word_accuracy: 0.966  early_stopping: 0/10 0.99306
stage 8/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:55:13 β€’ 0:00:00 13.74it/s val_accuracy: 0.993 val_word_accuracy: 0.967  early_stopping: 0/10 0.99325
stage 9/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:52:18 β€’ 0:00:00 16.38it/s val_accuracy: 0.993 val_word_accuracy: 0.967  early_stopping: 1/10 0.99325
stage 10/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:50:37 β€’ 0:00:00 16.44it/s val_accuracy: 0.993 val_word_accuracy: 0.969  early_stopping: 0/10 0.99350
stage 11/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:50:28 β€’ 0:00:00 16.18it/s val_accuracy: 0.994 val_word_accuracy: 0.971  early_stopping: 0/10 0.99383
stage 12/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:52:08 β€’ 0:00:00 16.45it/s val_accuracy: 0.994 val_word_accuracy: 0.97  early_stopping: 1/10 0.99383
stage 13/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:50:30 β€’ 0:00:00 16.49it/s val_accuracy: 0.994 val_word_accuracy: 0.969  early_stopping: 2/10 0.99383
stage 14/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:50:32 β€’ 0:00:00 16.42it/s val_accuracy: 0.994 val_word_accuracy: 0.97  early_stopping: 3/10 0.99383
stage 15/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:50:32 β€’ 0:00:00 16.24it/s val_accuracy: 0.994 val_word_accuracy: 0.969  early_stopping: 4/10 0.99383
stage 16/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:50:31 β€’ 0:00:00 16.26it/s val_accuracy: 0.994 val_word_accuracy: 0.971  early_stopping: 0/10 0.99404
stage 17/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:56:51 β€’ 0:00:00 14.36it/s val_accuracy: 0.994 val_word_accuracy: 0.971  early_stopping: 0/10 0.99408
stage 18/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:52:07 β€’ 0:00:00 16.57it/s val_accuracy: 0.994 val_word_accuracy: 0.972  early_stopping: 1/10 0.99408
stage 19/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:50:35 β€’ 0:00:00 16.37it/s val_accuracy: 0.994 val_word_accuracy: 0.972  early_stopping: 0/10 0.99420
stage 20/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:52:38 β€’ 0:00:00 14.41it/s val_accuracy: 0.994 val_word_accuracy: 0.973  early_stopping: 0/10 0.99424
stage 21/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:52:23 β€’ 0:00:00 14.08it/s val_accuracy: 0.994 val_word_accuracy: 0.973  early_stopping: 1/10 0.99424
stage 22/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:56:45 β€’ 0:00:00 14.34it/s val_accuracy: 0.994 val_word_accuracy: 0.973  early_stopping: 0/10 0.99425
stage 23/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:50:49 β€’ 0:00:00 16.64it/s val_accuracy: 0.994 val_word_accuracy: 0.972  early_stopping: 1/10 0.99425
stage 24/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:50:37 β€’ 0:00:00 16.34it/s val_accuracy: 0.994 val_word_accuracy: 0.972  early_stopping: 2/10 0.99425
stage 25/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:50:36 β€’ 0:00:00 16.78it/s val_accuracy: 0.994 val_word_accuracy: 0.973  early_stopping: 0/10 0.99426
stage 26/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:50:38 β€’ 0:00:00 16.71it/s val_accuracy: 0.994 val_word_accuracy: 0.973  early_stopping: 0/10 0.99434
stage 27/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:50:37 β€’ 0:00:00 16.63it/s val_accuracy: 0.994 val_word_accuracy: 0.973  early_stopping: 1/10 0.99434
stage 28/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:50:38 β€’ 0:00:00 16.63it/s val_accuracy: 0.994 val_word_accuracy: 0.973  early_stopping: 0/10 0.99438
stage 29/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:52:22 β€’ 0:00:00 16.68it/s val_accuracy: 0.994 val_word_accuracy: 0.973  early_stopping: 1/10 0.99438
stage 30/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:50:37 β€’ 0:00:00 16.46it/s val_accuracy: 0.994 val_word_accuracy: 0.971  early_stopping: 2/10 0.99438
stage 31/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:50:41 β€’ 0:00:00 16.21it/s val_accuracy: 0.994 val_word_accuracy: 0.972  early_stopping: 3/10 0.99438
stage 32/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:50:41 β€’ 0:00:00 16.32it/s val_accuracy: 0.994 val_word_accuracy: 0.973  early_stopping: 4/10 0.99438
stage 33/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:50:37 β€’ 0:00:00 16.66it/s val_accuracy: 0.994 val_word_accuracy: 0.973  early_stopping: 5/10 0.99438
stage 34/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:50:42 β€’ 0:00:00 16.47it/s val_accuracy: 0.994 val_word_accuracy: 0.973  early_stopping: 0/10 0.99439
stage 35/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:50:43 β€’ 0:00:00 16.42it/s val_accuracy: 0.994 val_word_accuracy: 0.973  early_stopping: 0/10 0.99442
stage 36/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:50:43 β€’ 0:00:00 16.43it/s val_accuracy: 0.994 val_word_accuracy: 0.971  early_stopping: 1/10 0.99442
stage 37/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:54:10 β€’ 0:00:00 16.80it/s val_accuracy: 0.994 val_word_accuracy: 0.973  early_stopping: 2/10 0.99442
stage 38/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:52:29 β€’ 0:00:00 16.31it/s val_accuracy: 0.994 val_word_accuracy: 0.972  early_stopping: 3/10 0.99442
stage 39/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:50:43 β€’ 0:00:00 16.28it/s val_accuracy: 0.994 val_word_accuracy: 0.973  early_stopping: 4/10 0.99442
stage 40/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:50:44 β€’ 0:00:00 16.29it/s val_accuracy: 0.994 val_word_accuracy: 0.973  early_stopping: 5/10 0.99442
stage 41/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:52:21 β€’ 0:00:00 16.64it/s val_accuracy: 0.994 val_word_accuracy: 0.973  early_stopping: 6/10 0.99442
stage 42/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:53:59 β€’ 0:00:00 16.45it/s val_accuracy: 0.994 val_word_accuracy: 0.973  early_stopping: 7/10 0.99442
stage 43/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:52:29 β€’ 0:00:00 16.48it/s val_accuracy: 0.994 val_word_accuracy: 0.973  early_stopping: 8/10 0.99442
stage 44/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:50:41 β€’ 0:00:00 16.55it/s val_accuracy: 0.994 val_word_accuracy: 0.973  early_stopping: 9/10 0.99442
stage 45/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:50:44 β€’ 0:00:00 16.68it/s val_accuracy: 0.994 val_word_accuracy: 0.973  early_stopping: 10/10 0.99442
Moving best model ./20231231/htru/german_newspapers_35.mlmodel (0.9944181442260742) to ./20231231/htru/german_newspapers_best.mlmodel
nice ketos train -f binary -o ./20231231/htru/german_newspapers -d     183509,75s user 9714,60s system 128% cpu 41:43:41,61 total
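To compare the finished models on a held-out set (e.g. the hkb-gt evaluation set), `ketos test` reports character and word accuracy. A sketch; the ground-truth path and glob are placeholders:

```shell
# Evaluate a trained model against PAGE-XML ground truth.
ketos test -f xml \
  -m ./20231231/htru/german_newspapers_best.mlmodel \
  ./hkb-gt/data/**/*.xml
```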

htr+

This network topology was developed by CITlab and used in the Transkribus project.
It is quite complex and could potentially outperform smaller networks if manuscripts or mixed datasets were used.

time nice ketos train -f binary  -o ./20231231/htr+/german_newspapers -d cuda:0 --lag 10 -r 0.0001 -B 4 -w 0 -s '[1,128,0,1 Cr4,2,8,4,2 Cr4,2,32,1,1 Mp4,2,4,2 Cr3,3,64,1,1 Mp1,2,1,2 S1(1x0)1,3 Lbx256 Do0.5 Lbx256 Do0.5 Lbx256 Do0.5]' german_newspapers_2023_12.arrow
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
`Trainer(val_check_interval=1.0)` was configured so validation will run at the end of the training epoch..
You are using a CUDA device ('NVIDIA RTX A4000 Laptop GPU') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul_precision('medium' | 'high')` which will trade-off precision for performance. For more details, read https://pytorch.org/docs/stable/generated/torch.set_float32_matmul_precision.html#torch.set_float32_matmul_precision
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
┏━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃    ┃ Name      ┃ Type                     ┃ Params ┃                In sizes ┃               Out sizes ┃
┑━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━┩
β”‚ 0  β”‚ val_cer   β”‚ CharErrorRate            β”‚      0 β”‚                       ? β”‚                       ? β”‚
β”‚ 1  β”‚ val_wer   β”‚ WordErrorRate            β”‚      0 β”‚                       ? β”‚                       ? β”‚
β”‚ 2  β”‚ net       β”‚ MultiParamSequential     β”‚  4.8 M β”‚ [[1, 1, 128, 400], '?'] β”‚  [[1, 264, 1, 49], '?'] β”‚
β”‚ 3  β”‚ net.C_0   β”‚ ActConv2D                β”‚     72 β”‚ [[1, 1, 128, 400], '?'] β”‚  [[1, 8, 32, 200], '?'] β”‚
β”‚ 4  β”‚ net.C_1   β”‚ ActConv2D                β”‚  2.1 K β”‚  [[1, 8, 32, 200], '?'] β”‚ [[1, 32, 31, 199], '?'] β”‚
β”‚ 5  β”‚ net.Mp_2  β”‚ MaxPool                  β”‚      0 β”‚ [[1, 32, 31, 199], '?'] β”‚   [[1, 32, 7, 99], '?'] β”‚
β”‚ 6  β”‚ net.C_3   β”‚ ActConv2D                β”‚ 18.5 K β”‚   [[1, 32, 7, 99], '?'] β”‚   [[1, 64, 7, 99], '?'] β”‚
β”‚ 7  β”‚ net.Mp_4  β”‚ MaxPool                  β”‚      0 β”‚   [[1, 64, 7, 99], '?'] β”‚   [[1, 64, 7, 49], '?'] β”‚
β”‚ 8  β”‚ net.S_5   β”‚ Reshape                  β”‚      0 β”‚   [[1, 64, 7, 49], '?'] β”‚  [[1, 448, 1, 49], '?'] β”‚
β”‚ 9  β”‚ net.L_6   β”‚ TransposedSummarizingRNN β”‚  1.4 M β”‚  [[1, 448, 1, 49], '?'] β”‚  [[1, 512, 1, 49], '?'] β”‚
β”‚ 10 β”‚ net.Do_7  β”‚ Dropout                  β”‚      0 β”‚  [[1, 512, 1, 49], '?'] β”‚  [[1, 512, 1, 49], '?'] β”‚
β”‚ 11 β”‚ net.L_8   β”‚ TransposedSummarizingRNN β”‚  1.6 M β”‚  [[1, 512, 1, 49], '?'] β”‚  [[1, 512, 1, 49], '?'] β”‚
β”‚ 12 β”‚ net.Do_9  β”‚ Dropout                  β”‚      0 β”‚  [[1, 512, 1, 49], '?'] β”‚  [[1, 512, 1, 49], '?'] β”‚
β”‚ 13 β”‚ net.L_10  β”‚ TransposedSummarizingRNN β”‚  1.6 M β”‚  [[1, 512, 1, 49], '?'] β”‚  [[1, 512, 1, 49], '?'] β”‚
β”‚ 14 β”‚ net.Do_11 β”‚ Dropout                  β”‚      0 β”‚  [[1, 512, 1, 49], '?'] β”‚  [[1, 512, 1, 49], '?'] β”‚
β”‚ 15 β”‚ net.O_12  β”‚ LinSoftmax               β”‚  135 K β”‚  [[1, 512, 1, 49], '?'] β”‚  [[1, 264, 1, 49], '?'] β”‚
β””β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
Trainable params: 4.8 M                                                                                                     
Non-trainable params: 0                                                                                                     
Total params: 4.8 M                                                                                                         
Total estimated model params size (MB): 19                                                                                  
stage 0/∞ ━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:45:17 β€’ 0:00:00 18.54it/s val_accuracy: 0.983 early_stopping: 0/10   val_word_accuracy:   0.921   0.98301
stage 1/∞ ━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:46:34 β€’ 0:00:00 17.92it/s val_accuracy: 0.988 early_stopping: 0/10   val_word_accuracy:   0.944   0.98826
stage 2/∞ ━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:48:04 β€’ 0:00:00 17.12it/s val_accuracy: 0.991 early_stopping: 0/10   val_word_accuracy:   0.955   0.99067
stage 3/∞ ━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:48:17 β€’ 0:00:00 17.64it/s val_accuracy: 0.992 early_stopping: 0/10   val_word_accuracy:   0.96    0.99153 
stage 4/∞ ━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:47:39 β€’ 0:00:00 17.59it/s val_accuracy: 0.992 early_stopping: 0/10   val_word_accuracy:   0.962   0.99213
stage 5/∞ ━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:47:42 β€’ 0:00:00 17.28it/s val_accuracy: 0.993 early_stopping: 0/10   val_word_accuracy:   0.964   0.99261 
stage 6/∞ ━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:47:39 β€’ 0:00:00 17.36it/s val_accuracy: 0.993 early_stopping: 0/10   val_word_accuracy:   0.966   0.99285  
stage 7/∞ ━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:47:45 β€’ 0:00:00 17.65it/s val_accuracy: 0.993 early_stopping: 0/10   val_word_accuracy:   0.967   0.99313 
stage 8/∞ ━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:46:58 β€’ 0:00:00 18.58it/s val_accuracy: 0.993 early_stopping: 1/10   val_word_accuracy:   0.967   0.99313 
stage 9/∞ ━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:45:45 β€’ 0:00:00 18.53it/s val_accuracy: 0.993 early_stopping: 0/10   val_word_accuracy:   0.968   0.99342 
stage 10/∞ ━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:45:43 β€’ 0:00:00 18.07it/s val_accuracy: 0.993 early_stopping: 1/10   val_word_accuracy:   0.967   0.99342 
stage 11/∞ ━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:45:43 β€’ 0:00:00 18.41it/s val_accuracy: 0.994 early_stopping: 0/10   val_word_accuracy:   0.969   0.99369
stage 12/∞ ━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:45:46 β€’ 0:00:00 18.34it/s val_accuracy: 0.994 early_stopping: 0/10   val_word_accuracy:   0.97    0.99382
stage 13/∞ ━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:45:41 β€’ 0:00:00 18.84it/s val_accuracy: 0.994 early_stopping: 0/10   val_word_accuracy:   0.971   0.99389
stage 14/∞ ━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:45:45 β€’ 0:00:00 18.38it/s val_accuracy: 0.994 early_stopping: 0/10   val_word_accuracy:   0.971   0.99389 
stage 15/∞ ━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:45:45 β€’ 0:00:00 18.49it/s val_accuracy: 0.994 early_stopping: 1/10   val_word_accuracy:   0.97    0.99389
stage 16/∞ ━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:45:42 β€’ 0:00:00 18.49it/s val_accuracy: 0.994 early_stopping: 0/10   val_word_accuracy:   0.972   0.99403 
stage 17/∞ ━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:45:45 β€’ 0:00:00 18.31it/s val_accuracy: 0.993 early_stopping: 1/10   val_word_accuracy:   0.968   0.99403 
stage 18/∞ ━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:45:45 β€’ 0:00:00 18.46it/s val_accuracy: 0.994 early_stopping: 0/10   val_word_accuracy:   0.972   0.99410
stage 19/∞ ━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:45:46 β€’ 0:00:00 18.41it/s val_accuracy: 0.994 early_stopping: 0/10   val_word_accuracy:   0.972   0.99416 
stage 20/∞ ━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:45:47 β€’ 0:00:00 18.18it/s val_accuracy: 0.994 early_stopping: 1/10   val_word_accuracy:   0.971   0.99416
stage 21/∞ ━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:45:46 β€’ 0:00:00 18.49it/s val_accuracy: 0.994 early_stopping: 2/10   val_word_accuracy:   0.972   0.99416 
stage 22/∞ ━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:45:47 β€’ 0:00:00 18.20it/s val_accuracy: 0.994 early_stopping: 0/10   val_word_accuracy:   0.972   0.99419
stage 23/∞ ━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:45:49 β€’ 0:00:00 18.49it/s val_accuracy: 0.994 early_stopping: 0/10   val_word_accuracy:   0.973   0.99421
stage 24/∞ ━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:45:42 β€’ 0:00:00 18.48it/s val_accuracy: 0.994 early_stopping: 0/10   val_word_accuracy:   0.973   0.99430
stage 25/∞ ━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:45:44 β€’ 0:00:00 18.24it/s val_accuracy: 0.994 early_stopping: 1/10   val_word_accuracy:   0.973   0.99430 
stage 26/∞ ━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:45:46 β€’ 0:00:00 18.01it/s val_accuracy: 0.994 early_stopping: 2/10   val_word_accuracy:   0.972   0.99430 
stage 27/∞ ━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:45:44 β€’ 0:00:00 18.34it/s val_accuracy: 0.994 early_stopping: 3/10   val_word_accuracy:   0.972   0.99430 
stage 28/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:52:18 β€’ 0:00:00 17.95it/s val_accuracy: 0.994 val_word_accuracy: 0.972  early_stopping: 4/10 0.99430
stage 29/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:53:32 β€’ 0:00:00 14.57it/s val_accuracy: 0.994 val_word_accuracy: 0.973  early_stopping: 0/10 0.99438
stage 30/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:54:49 β€’ 0:00:00 15.01it/s val_accuracy: 0.994 val_word_accuracy: 0.973  early_stopping: 1/10 0.99438
stage 31/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:54:25 β€’ 0:00:00 15.30it/s val_accuracy: 0.994 val_word_accuracy: 0.973  early_stopping: 2/10 0.99438
stage 32/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:48:15 β€’ 0:00:00 17.84it/s val_accuracy: 0.994 val_word_accuracy: 0.973  early_stopping: 3/10 0.99438
stage 33/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:53:27 β€’ 0:00:00 14.30it/s val_accuracy: 0.994 val_word_accuracy: 0.972  early_stopping: 4/10 0.99438
stage 34/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:49:38 β€’ 0:00:00 17.53it/s val_accuracy: 0.994 val_word_accuracy: 0.973  early_stopping: 5/10 0.99438
stage 35/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:49:29 β€’ 0:00:00 14.25it/s val_accuracy: 0.995 val_word_accuracy: 0.974  early_stopping: 0/10 0.99451
stage 36/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:54:41 β€’ 0:00:00 17.24it/s val_accuracy: 0.994 val_word_accuracy: 0.973  early_stopping: 1/10 0.99451
stage 37/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:52:41 β€’ 0:00:00 17.80it/s val_accuracy: 0.994 val_word_accuracy: 0.973  early_stopping: 2/10 0.99451
stage 38/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:51:58 β€’ 0:00:00 14.66it/s val_accuracy: 0.994 val_word_accuracy: 0.973  early_stopping: 3/10 0.99451
stage 39/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:51:29 β€’ 0:00:00 17.50it/s val_accuracy: 0.994 val_word_accuracy: 0.973  early_stopping: 4/10 0.99451
stage 40/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:47:38 β€’ 0:00:00 17.61it/s val_accuracy: 0.994 val_word_accuracy: 0.973  early_stopping: 5/10 0.99451
stage 41/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:47:39 β€’ 0:00:00 17.47it/s val_accuracy: 0.994 val_word_accuracy: 0.973  early_stopping: 6/10 0.99451
stage 42/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:47:39 β€’ 0:00:00 17.41it/s val_accuracy: 0.994 val_word_accuracy: 0.972  early_stopping: 7/10 0.99451
stage 43/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:50:48 β€’ 0:00:00 14.55it/s val_accuracy: 0.994 val_word_accuracy: 0.974  early_stopping: 8/10 0.99451
stage 44/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:49:31 β€’ 0:00:00 17.31it/s val_accuracy: 0.994 val_word_accuracy: 0.973  early_stopping: 9/10 0.99451
stage 45/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:47:42 β€’ 0:00:00 17.41it/s val_accuracy: 0.994 val_word_accuracy: 0.973  early_stopping: 10/10 0.99451
Moving best model ./20231231/htr+/german_newspapers_35.mlmodel (0.9945129752159119) to ./20231231/htr+/german_newspapers_best.mlmodel
nice ketos train -f binary -o ./20231231/htr+/german_newspapers -d cuda:0  10  171697,09s user 11665,18s system 130% cpu 38:55:34,52 total

gpt

This network topology was recommended by ChatGPT, given the other networks as input.
It is quite complex and could potentially outperform smaller networks on manuscripts or mixed datasets.
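This topology halves both dimensions four times (one `Mp2,2` per conv block), so the sequence the LSTMs see is 16× shorter than the input line. A small stdlib sketch of that arithmetic, cross-checked against the In/Out sizes the trainer prints for a 120×400 input line:

```python
# Sketch: how the four Mp2,2 pooling layers shrink the input.
# Cross-check against the In/Out sizes in the trainer's summary table.
def pooled(size: int, n_pools: int, stride: int = 2) -> int:
    for _ in range(n_pools):
        size //= stride  # floor division, matching MaxPool's default behaviour
    return size

# Line height 120 (fixed by the spec) and an example line width of 400:
print(pooled(120, 4))  # -> 7   (height, collapsed into channels before the LSTMs)
print(pooled(400, 4))  # -> 25  (sequence length seen by the CTC output layer)
```

The 3×3 convolutions use stride 1 and padding 1, so only the pooling layers change the spatial size; hence the `[[1, 256, 7, 25], '?']` shape feeding the reshape layer in the table.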

time nice ketos train -f binary  -o ./20231231/gpt/german_newspapers -d cuda:0 --lag 10 -r 0.0001 -B 4 -w 0 -s '[1,120,0,1 Cr3,3,32,1,1 Gn32 Mp2,2 Cr3,3,64,1,1 Gn64 Mp2,2,2,2 Cr3,3,128,1,1 Gn128 Mp2,2,2,2 Cr3,3,256,1,1 Gn256 Mp2,2,2,2 S1(1x0)1,3 Lbx256 Do0.2 Lbx256 Do0.2 Lbx256 Do0.2]' german_newspapers_2023_12.arrow
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
`Trainer(val_check_interval=1.0)` was configured so validation will run at the end of the training epoch..
You are using a CUDA device ('NVIDIA RTX A4000 Laptop GPU') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul_precision('medium' | 'high')` which will trade-off precision for performance. For more details, read https://pytorch.org/docs/stable/generated/torch.set_float32_matmul_precision.html#torch.set_float32_matmul_precision
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
┏━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃    ┃ Name      ┃ Type                     ┃ Params ┃                 In sizes ┃                Out sizes ┃
┑━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━┩
β”‚ 0  β”‚ val_cer   β”‚ CharErrorRate            β”‚      0 β”‚                        ? β”‚                        ? β”‚
β”‚ 1  β”‚ val_wer   β”‚ WordErrorRate            β”‚      0 β”‚                        ? β”‚                        ? β”‚
β”‚ 2  β”‚ net       β”‚ MultiParamSequential     β”‚  7.9 M β”‚  [[1, 1, 120, 400], '?'] β”‚   [[1, 264, 1, 25], '?'] β”‚
β”‚ 3  β”‚ net.C_0   β”‚ ActConv2D                β”‚    320 β”‚  [[1, 1, 120, 400], '?'] β”‚ [[1, 32, 120, 400], '?'] β”‚
β”‚ 4  β”‚ net.Gn_1  β”‚ GroupNorm                β”‚     64 β”‚ [[1, 32, 120, 400], '?'] β”‚ [[1, 32, 120, 400], '?'] β”‚
β”‚ 5  β”‚ net.Mp_2  β”‚ MaxPool                  β”‚      0 β”‚ [[1, 32, 120, 400], '?'] β”‚  [[1, 32, 60, 200], '?'] β”‚
β”‚ 6  β”‚ net.C_3   β”‚ ActConv2D                β”‚ 18.5 K β”‚  [[1, 32, 60, 200], '?'] β”‚  [[1, 64, 60, 200], '?'] β”‚
β”‚ 7  β”‚ net.Gn_4  β”‚ GroupNorm                β”‚    128 β”‚  [[1, 64, 60, 200], '?'] β”‚  [[1, 64, 60, 200], '?'] β”‚
β”‚ 8  β”‚ net.Mp_5  β”‚ MaxPool                  β”‚      0 β”‚  [[1, 64, 60, 200], '?'] β”‚  [[1, 64, 30, 100], '?'] β”‚
β”‚ 9  β”‚ net.C_6   β”‚ ActConv2D                β”‚ 73.9 K β”‚  [[1, 64, 30, 100], '?'] β”‚ [[1, 128, 30, 100], '?'] β”‚
β”‚ 10 β”‚ net.Gn_7  β”‚ GroupNorm                β”‚    256 β”‚ [[1, 128, 30, 100], '?'] β”‚ [[1, 128, 30, 100], '?'] β”‚
β”‚ 11 β”‚ net.Mp_8  β”‚ MaxPool                  β”‚      0 β”‚ [[1, 128, 30, 100], '?'] β”‚  [[1, 128, 15, 50], '?'] β”‚
β”‚ 12 β”‚ net.C_9   β”‚ ActConv2D                β”‚  295 K β”‚  [[1, 128, 15, 50], '?'] β”‚  [[1, 256, 15, 50], '?'] β”‚
β”‚ 13 β”‚ net.Gn_10 β”‚ GroupNorm                β”‚    512 β”‚  [[1, 256, 15, 50], '?'] β”‚  [[1, 256, 15, 50], '?'] β”‚
β”‚ 14 β”‚ net.Mp_11 β”‚ MaxPool                  β”‚      0 β”‚  [[1, 256, 15, 50], '?'] β”‚   [[1, 256, 7, 25], '?'] β”‚
β”‚ 15 β”‚ net.S_12  β”‚ Reshape                  β”‚      0 β”‚   [[1, 256, 7, 25], '?'] β”‚  [[1, 1792, 1, 25], '?'] β”‚
β”‚ 16 β”‚ net.L_13  β”‚ TransposedSummarizingRNN β”‚  4.2 M β”‚  [[1, 1792, 1, 25], '?'] β”‚   [[1, 512, 1, 25], '?'] β”‚
β”‚ 17 β”‚ net.Do_14 β”‚ Dropout                  β”‚      0 β”‚   [[1, 512, 1, 25], '?'] β”‚   [[1, 512, 1, 25], '?'] β”‚
β”‚ 18 β”‚ net.L_15  β”‚ TransposedSummarizingRNN β”‚  1.6 M β”‚   [[1, 512, 1, 25], '?'] β”‚   [[1, 512, 1, 25], '?'] β”‚
β”‚ 19 β”‚ net.Do_16 β”‚ Dropout                  β”‚      0 β”‚   [[1, 512, 1, 25], '?'] β”‚   [[1, 512, 1, 25], '?'] β”‚
β”‚ 20 β”‚ net.L_17  β”‚ TransposedSummarizingRNN β”‚  1.6 M β”‚   [[1, 512, 1, 25], '?'] β”‚   [[1, 512, 1, 25], '?'] β”‚
β”‚ 21 β”‚ net.Do_18 β”‚ Dropout                  β”‚      0 β”‚   [[1, 512, 1, 25], '?'] β”‚   [[1, 512, 1, 25], '?'] β”‚
β”‚ 22 β”‚ net.O_19  β”‚ LinSoftmax               β”‚  135 K β”‚   [[1, 512, 1, 25], '?'] β”‚   [[1, 264, 1, 25], '?'] β”‚
β””β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
Trainable params: 7.9 M                                                                                                                                                                
Non-trainable params: 0                                                                                                                                                                
Total params: 7.9 M                                                                                                                                                                    
Total estimated model params size (MB): 31                                                                                                                                             
stage 0/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:56:08 β€’ 0:00:00 15.31it/s val_accuracy: 0.987 val_word_accuracy: 0.934  early_stopping: 0/10 0.98657
stage 1/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:57:26 β€’ 0:00:00 14.48it/s val_accuracy: 0.99 val_word_accuracy: 0.951  early_stopping: 0/10 0.98966
stage 2/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:58:58 β€’ 0:00:00 14.20it/s val_accuracy: 0.991 val_word_accuracy: 0.957  early_stopping: 0/10 0.99089
stage 3/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:57:46 β€’ 0:00:00 14.48it/s val_accuracy: 0.991 val_word_accuracy: 0.96  early_stopping: 0/10 0.99148
stage 4/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:58:43 β€’ 0:00:00 13.63it/s val_accuracy: 0.992 val_word_accuracy: 0.963  early_stopping: 0/10 0.99219
stage 5/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:58:51 β€’ 0:00:00 14.57it/s val_accuracy: 0.992 val_word_accuracy: 0.964  early_stopping: 0/10 0.99229
stage 6/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:58:56 β€’ 0:00:00 13.41it/s val_accuracy: 0.993 val_word_accuracy: 0.965  early_stopping: 0/10 0.99257
stage 7/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:59:14 β€’ 0:00:00 14.22it/s val_accuracy: 0.993 val_word_accuracy: 0.966  early_stopping: 0/10 0.99277
stage 8/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:57:20 β€’ 0:00:00 14.38it/s val_accuracy: 0.993 val_word_accuracy: 0.967  early_stopping: 0/10 0.99284
stage 9/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:57:21 β€’ 0:00:00 14.67it/s val_accuracy: 0.993 val_word_accuracy: 0.967  early_stopping: 0/10 0.99288
stage 10/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:58:21 β€’ 0:00:00 14.70it/s val_accuracy: 0.993 val_word_accuracy: 0.968  early_stopping: 0/10 0.99317
stage 11/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:58:36 β€’ 0:00:00 13.09it/s val_accuracy: 0.993 val_word_accuracy: 0.968  early_stopping: 1/10 0.99317
stage 12/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 1:00:46 β€’ 0:00:00 14.51it/s val_accuracy: 0.993 val_word_accuracy: 0.968  early_stopping: 2/10 0.99317
stage 13/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:56:53 β€’ 0:00:00 14.80it/s val_accuracy: 0.993 val_word_accuracy: 0.969  early_stopping: 0/10 0.99323
stage 14/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:56:41 β€’ 0:00:00 14.99it/s val_accuracy: 0.993 val_word_accuracy: 0.968  early_stopping: 0/10 0.99326
stage 15/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:56:41 β€’ 0:00:00 14.80it/s val_accuracy: 0.993 val_word_accuracy: 0.968  early_stopping: 1/10 0.99326
stage 16/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:56:44 β€’ 0:00:00 14.63it/s val_accuracy: 0.993 val_word_accuracy: 0.969  early_stopping: 0/10 0.99327
stage 17/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:56:42 β€’ 0:00:00 14.58it/s val_accuracy: 0.993 val_word_accuracy: 0.969  early_stopping: 0/10 0.99345
stage 18/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:57:42 β€’ 0:00:00 14.83it/s val_accuracy: 0.993 val_word_accuracy: 0.969  early_stopping: 1/10 0.99345
stage 19/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:57:43 β€’ 0:00:00 14.85it/s val_accuracy: 0.994 val_word_accuracy: 0.97  early_stopping: 0/10 0.99353
stage 20/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:56:46 β€’ 0:00:00 14.61it/s val_accuracy: 0.993 val_word_accuracy: 0.969  early_stopping: 1/10 0.99353
stage 21/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:57:12 β€’ 0:00:00 14.19it/s val_accuracy: 0.993 val_word_accuracy: 0.969  early_stopping: 2/10 0.99353
stage 22/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:58:22 β€’ 0:00:00 14.09it/s val_accuracy: 0.993 val_word_accuracy: 0.969  early_stopping: 3/10 0.99353
stage 23/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 1:00:30 β€’ 0:00:00 14.71it/s val_accuracy: 0.994 val_word_accuracy: 0.97  early_stopping: 0/10 0.99356
stage 24/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:57:27 β€’ 0:00:00 15.03it/s val_accuracy: 0.993 val_word_accuracy: 0.969  early_stopping: 1/10 0.99356
stage 25/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:56:42 β€’ 0:00:00 14.42it/s val_accuracy: 0.993 val_word_accuracy: 0.969  early_stopping: 2/10 0.99356
stage 26/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:57:46 β€’ 0:00:00 14.44it/s val_accuracy: 0.993 val_word_accuracy: 0.969  early_stopping: 3/10 0.99356
stage 27/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:58:01 β€’ 0:00:00 14.50it/s val_accuracy: 0.994 val_word_accuracy: 0.97  early_stopping: 4/10 0.99356
stage 28/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:58:45 β€’ 0:00:00 14.54it/s val_accuracy: 0.993 val_word_accuracy: 0.969  early_stopping: 5/10 0.99356
stage 29/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:57:40 β€’ 0:00:00 14.19it/s val_accuracy: 0.994 val_word_accuracy: 0.969  early_stopping: 6/10 0.99356
stage 30/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:58:45 β€’ 0:00:00 13.70it/s val_accuracy: 0.994 val_word_accuracy: 0.969  early_stopping: 7/10 0.99356
stage 31/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:56:43 β€’ 0:00:00 14.88it/s val_accuracy: 0.994 val_word_accuracy: 0.97  early_stopping: 8/10 0.99356
stage 32/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:56:29 β€’ 0:00:00 14.86it/s val_accuracy: 0.994 val_word_accuracy: 0.97  early_stopping: 0/10 0.99357
stage 33/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:59:00 β€’ 0:00:00 13.28it/s val_accuracy: 0.994 val_word_accuracy: 0.97  early_stopping: 0/10 0.99358
stage 34/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:57:43 β€’ 0:00:00 15.01it/s val_accuracy: 0.994 val_word_accuracy: 0.97  early_stopping: 0/10 0.99360
stage 35/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:59:25 β€’ 0:00:00 14.71it/s val_accuracy: 0.994 val_word_accuracy: 0.97  early_stopping: 1/10 0.99360
stage 36/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:57:10 β€’ 0:00:00 14.92it/s val_accuracy: 0.994 val_word_accuracy: 0.97  early_stopping: 2/10 0.99360
stage 37/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:57:06 β€’ 0:00:00 14.31it/s val_accuracy: 0.994 val_word_accuracy: 0.97  early_stopping: 0/10 0.99364
stage 38/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:57:20 β€’ 0:00:00 13.65it/s val_accuracy: 0.994 val_word_accuracy: 0.97  early_stopping: 1/10 0.99364
stage 39/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:58:10 β€’ 0:00:00 14.97it/s val_accuracy: 0.994 val_word_accuracy: 0.97  early_stopping: 2/10 0.99364
stage 40/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:57:05 β€’ 0:00:00 14.74it/s val_accuracy: 0.994 val_word_accuracy: 0.97  early_stopping: 3/10 0.99364
stage 41/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:57:58 β€’ 0:00:00 14.84it/s val_accuracy: 0.994 val_word_accuracy: 0.97  early_stopping: 4/10 0.99364
stage 42/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:56:26 β€’ 0:00:00 13.70it/s val_accuracy: 0.994 val_word_accuracy: 0.97  early_stopping: 5/10 0.99364
stage 43/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:56:31 β€’ 0:00:00 14.74it/s val_accuracy: 0.994 val_word_accuracy: 0.97  early_stopping: 0/10 0.99369
stage 44/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:56:08 β€’ 0:00:00 14.99it/s val_accuracy: 0.994 val_word_accuracy: 0.97  early_stopping: 1/10 0.99369
stage 45/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 1:00:15 β€’ 0:00:00 14.65it/s val_accuracy: 0.994 val_word_accuracy: 0.97  early_stopping: 2/10 0.99369
stage 46/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:57:55 β€’ 0:00:00 14.42it/s val_accuracy: 0.994 val_word_accuracy: 0.97  early_stopping: 0/10 0.99370
stage 47/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:58:27 β€’ 0:00:00 14.38it/s val_accuracy: 0.994 val_word_accuracy: 0.97  early_stopping: 1/10 0.99370
stage 48/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:58:55 β€’ 0:00:00 14.74it/s val_accuracy: 0.994 val_word_accuracy: 0.971  early_stopping: 0/10 0.99371
stage 49/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:59:27 β€’ 0:00:00 14.39it/s val_accuracy: 0.994 val_word_accuracy: 0.97  early_stopping: 1/10 0.99371
stage 50/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:59:13 β€’ 0:00:00 14.28it/s val_accuracy: 0.994 val_word_accuracy: 0.971  early_stopping: 0/10 0.99372
stage 51/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:59:01 β€’ 0:00:00 14.41it/s val_accuracy: 0.994 val_word_accuracy: 0.97  early_stopping: 1/10 0.99372
stage 52/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:58:44 β€’ 0:00:00 14.40it/s val_accuracy: 0.994 val_word_accuracy: 0.97  early_stopping: 2/10 0.99372
stage 53/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:57:38 β€’ 0:00:00 14.47it/s val_accuracy: 0.994 val_word_accuracy: 0.971  early_stopping: 3/10 0.99372
stage 54/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:57:31 β€’ 0:00:00 14.33it/s val_accuracy: 0.994 val_word_accuracy: 0.97  early_stopping: 4/10 0.99372
stage 55/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:57:33 β€’ 0:00:00 14.31it/s val_accuracy: 0.994 val_word_accuracy: 0.971  early_stopping: 0/10 0.99380
stage 56/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:57:26 β€’ 0:00:00 14.73it/s val_accuracy: 0.994 val_word_accuracy: 0.971  early_stopping: 1/10 0.99380
stage 57/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:57:31 β€’ 0:00:00 14.40it/s val_accuracy: 0.994 val_word_accuracy: 0.97  early_stopping: 2/10 0.99380
stage 58/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:56:36 β€’ 0:00:00 15.01it/s val_accuracy: 0.994 val_word_accuracy: 0.97  early_stopping: 3/10 0.99380
stage 59/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:56:34 β€’ 0:00:00 14.69it/s val_accuracy: 0.994 val_word_accuracy: 0.971  early_stopping: 0/10 0.99384
stage 60/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:56:37 β€’ 0:00:00 14.88it/s val_accuracy: 0.994 val_word_accuracy: 0.97  early_stopping: 1/10 0.99384
stage 61/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:56:39 β€’ 0:00:00 14.71it/s val_accuracy: 0.994 val_word_accuracy: 0.971  early_stopping: 2/10 0.99384
stage 62/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:56:35 β€’ 0:00:00 14.78it/s val_accuracy: 0.994 val_word_accuracy: 0.971  early_stopping: 3/10 0.99384
stage 63/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:56:38 β€’ 0:00:00 14.74it/s val_accuracy: 0.994 val_word_accuracy: 0.971  early_stopping: 4/10 0.99384
stage 64/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:56:36 β€’ 0:00:00 14.83it/s val_accuracy: 0.994 val_word_accuracy: 0.971  early_stopping: 5/10 0.99384
stage 65/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:56:36 β€’ 0:00:00 15.05it/s val_accuracy: 0.994 val_word_accuracy: 0.97  early_stopping: 6/10 0.99384
stage 66/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:56:37 β€’ 0:00:00 14.84it/s val_accuracy: 0.994 val_word_accuracy: 0.971  early_stopping: 7/10 0.99384
stage 67/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:56:36 β€’ 0:00:00 14.76it/s val_accuracy: 0.994 val_word_accuracy: 0.97  early_stopping: 8/10 0.99384
stage 68/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:56:55 β€’ 0:00:00 14.55it/s val_accuracy: 0.994 val_word_accuracy: 0.97  early_stopping: 9/10 0.99384
stage 69/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 0:56:38 β€’ 0:00:00 14.83it/s val_accuracy: 0.994 val_word_accuracy: 0.97  early_stopping: 10/10 0.99384
Moving best model ./20231231/gpt/german_newspapers_56.mlmodel (0.9943616986274719) to ./20231231/gpt/german_newspapers_best.mlmodel

sgd

This network topology was developed by Jan Kamlah in the OCR-D project, based on the other networks as input.
It is quite complex and could potentially outperform smaller networks on manuscripts or mixed datasets.
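Most of this model's 6.2 M parameters sit in the three `Lbx288` layers. As a rough cross-check of the "2.0 M" rows in the trainer's summary table, a stdlib sketch of the standard LSTM parameter arithmetic (assuming PyTorch's two-bias-vector convention):

```python
# Sketch: parameter count of one bidirectional LSTM layer, e.g. Lbx288
# on a 576-channel input, using the standard LSTM weight/bias shapes.
def bilstm_params(input_size: int, hidden_size: int) -> int:
    gates = 4 * hidden_size  # i, f, g, o gates
    # input-to-hidden + hidden-to-hidden weights, plus two bias vectors:
    per_direction = gates * (input_size + hidden_size) + 2 * gates
    return 2 * per_direction  # forward + backward directions

print(bilstm_params(576, 288))  # -> 1995264, i.e. the "2.0 M" table rows
```

The same formula reproduces the `htr+` table as well: `bilstm_params(448, 256)` gives about 1.4 M for the first `Lbx256`, and `bilstm_params(512, 256)` about 1.6 M for the later ones.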

time nice ketos train -f binary  -o ./20231231/sgd/german_newspapers -d cuda:0 --lag 10 -r 0.0001 -B 4 -w 0 -s '[1,144,0,1 Cr4,2,16,1,1 Mp4,2 Cr2,2,48,1,1, Gn24 Mp2,2 Cr2,2,72,1,1 Gn36 Mp2,2 S1(1x0)1,3 Lbx288 Do0.2,2 Lbx288 Do0.2,2 Lbx288]' german_newspapers_2023_12.arrow

GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
`Trainer(val_check_interval=1.0)` was configured so validation will run at the end of the training epoch..
You are using a CUDA device ('NVIDIA RTX A4000 Laptop GPU') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul_precision('medium' | 'high')` which will trade-off precision for performance. For more details, read https://pytorch.org/docs/stable/generated/torch.set_float32_matmul_precision.html#torch.set_float32_matmul_precision
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
┏━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃    ┃ Name      ┃ Type                     ┃ Params ┃                 In sizes ┃                Out sizes ┃
┑━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━┩
β”‚ 0  β”‚ val_cer   β”‚ CharErrorRate            β”‚      0 β”‚                        ? β”‚                        ? β”‚
β”‚ 1  β”‚ val_wer   β”‚ WordErrorRate            β”‚      0 β”‚                        ? β”‚                        ? β”‚
β”‚ 2  β”‚ net       β”‚ MultiParamSequential     β”‚  6.2 M β”‚  [[1, 1, 144, 400], '?'] β”‚   [[1, 264, 1, 49], '?'] β”‚
β”‚ 3  β”‚ net.C_0   β”‚ ActConv2D                β”‚    144 β”‚  [[1, 1, 144, 400], '?'] β”‚ [[1, 16, 143, 399], '?'] β”‚
β”‚ 4  β”‚ net.Mp_1  β”‚ MaxPool                  β”‚      0 β”‚ [[1, 16, 143, 399], '?'] β”‚  [[1, 16, 35, 199], '?'] β”‚
β”‚ 5  β”‚ net.C_2   β”‚ ActConv2D                β”‚  3.1 K β”‚  [[1, 16, 35, 199], '?'] β”‚  [[1, 48, 34, 198], '?'] β”‚
β”‚ 6  β”‚ net.Gn_3  β”‚ GroupNorm                β”‚     96 β”‚  [[1, 48, 34, 198], '?'] β”‚  [[1, 48, 34, 198], '?'] β”‚
β”‚ 7  β”‚ net.Mp_4  β”‚ MaxPool                  β”‚      0 β”‚  [[1, 48, 34, 198], '?'] β”‚   [[1, 48, 17, 99], '?'] β”‚
β”‚ 8  β”‚ net.C_5   β”‚ ActConv2D                β”‚ 13.9 K β”‚   [[1, 48, 17, 99], '?'] β”‚   [[1, 72, 16, 98], '?'] β”‚
β”‚ 9  β”‚ net.Gn_6  β”‚ GroupNorm                β”‚    144 β”‚   [[1, 72, 16, 98], '?'] β”‚   [[1, 72, 16, 98], '?'] β”‚
β”‚ 10 β”‚ net.Mp_7  β”‚ MaxPool                  β”‚      0 β”‚   [[1, 72, 16, 98], '?'] β”‚    [[1, 72, 8, 49], '?'] β”‚
β”‚ 11 β”‚ net.S_8   β”‚ Reshape                  β”‚      0 β”‚    [[1, 72, 8, 49], '?'] β”‚   [[1, 576, 1, 49], '?'] β”‚
β”‚ 12 β”‚ net.L_9   β”‚ TransposedSummarizingRNN β”‚  2.0 M β”‚   [[1, 576, 1, 49], '?'] β”‚   [[1, 576, 1, 49], '?'] β”‚
β”‚ 13 β”‚ net.Do_10 β”‚ Dropout                  β”‚      0 β”‚   [[1, 576, 1, 49], '?'] β”‚   [[1, 576, 1, 49], '?'] β”‚
β”‚ 14 β”‚ net.L_11  β”‚ TransposedSummarizingRNN β”‚  2.0 M β”‚   [[1, 576, 1, 49], '?'] β”‚   [[1, 576, 1, 49], '?'] β”‚
β”‚ 15 β”‚ net.Do_12 β”‚ Dropout                  β”‚      0 β”‚   [[1, 576, 1, 49], '?'] β”‚   [[1, 576, 1, 49], '?'] β”‚
β”‚ 16 β”‚ net.L_13  β”‚ TransposedSummarizingRNN β”‚  2.0 M β”‚   [[1, 576, 1, 49], '?'] β”‚   [[1, 576, 1, 49], '?'] β”‚
β”‚ 17 β”‚ net.O_14  β”‚ LinSoftmax               β”‚  152 K β”‚   [[1, 576, 1, 49], '?'] β”‚   [[1, 264, 1, 49], '?'] β”‚
β””β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
Trainable params: 6.2 M                                                                                                                                                                
Non-trainable params: 0                                                                                                                                                                
Total params: 6.2 M                                                                                                                                                                    
Total estimated model params size (MB): 24                                                                                                                                             
stage 0/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 1:04:58 β€’ 0:00:00 13.09it/s val_accuracy: 0.988 val_word_accuracy: 0.939  early_stopping: 0/10 0.98759
stage 1/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 1:05:29 β€’ 0:00:00 12.79it/s val_accuracy: 0.991 val_word_accuracy: 0.953  early_stopping: 0/10 0.99055
stage 2/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 1:05:34 β€’ 0:00:00 12.94it/s val_accuracy: 0.992 val_word_accuracy: 0.958  early_stopping: 0/10 0.99168
stage 3/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 1:05:33 β€’ 0:00:00 12.86it/s val_accuracy: 0.992 val_word_accuracy: 0.962  early_stopping: 0/10 0.99236
stage 4/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 1:05:37 β€’ 0:00:00 12.87it/s val_accuracy: 0.993 val_word_accuracy: 0.964  early_stopping: 0/10 0.99273
stage 5/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 1:05:41 β€’ 0:00:00 12.74it/s val_accuracy: 0.993 val_word_accuracy: 0.966  early_stopping: 0/10 0.99314
stage 6/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 1:05:47 β€’ 0:00:00 12.27it/s val_accuracy: 0.993 val_word_accuracy: 0.968  early_stopping: 0/10 0.99344
stage 7/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 1:05:48 β€’ 0:00:00 12.95it/s val_accuracy: 0.994 val_word_accuracy: 0.968  early_stopping: 0/10 0.99355
stage 8/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 1:07:48 β€’ 0:00:00 12.51it/s val_accuracy: 0.993 val_word_accuracy: 0.967  early_stopping: 1/10 0.99355
stage 9/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 1:06:48 β€’ 0:00:00 12.70it/s val_accuracy: 0.994 val_word_accuracy: 0.969  early_stopping: 0/10 0.99366
stage 10/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 1:07:07 β€’ 0:00:00 12.74it/s val_accuracy: 0.994 val_word_accuracy: 0.97  early_stopping: 0/10 0.99392
stage 11/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 1:05:43 β€’ 0:00:00 12.55it/s val_accuracy: 0.994 val_word_accuracy: 0.969  early_stopping: 1/10 0.99392
stage 12/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 1:06:31 β€’ 0:00:00 12.47it/s val_accuracy: 0.994 val_word_accuracy: 0.97  early_stopping: 0/10 0.99397
stage 13/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 1:06:42 β€’ 0:00:00 12.41it/s val_accuracy: 0.994 val_word_accuracy: 0.971  early_stopping: 0/10 0.99400
stage 14/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 1:06:32 β€’ 0:00:00 12.41it/s val_accuracy: 0.994 val_word_accuracy: 0.97  early_stopping: 0/10 0.99403
stage 15/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 1:06:19 β€’ 0:00:00 12.77it/s val_accuracy: 0.994 val_word_accuracy: 0.971  early_stopping: 1/10 0.99403
stage 16/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 1:05:27 β€’ 0:00:00 12.84it/s val_accuracy: 0.994 val_word_accuracy: 0.97  early_stopping: 2/10 0.99403
stage 17/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 1:05:25 β€’ 0:00:00 12.74it/s val_accuracy: 0.994 val_word_accuracy: 0.97  early_stopping: 3/10 0.99403
stage 18/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 1:05:26 β€’ 0:00:00 12.89it/s val_accuracy: 0.994 val_word_accuracy: 0.971  early_stopping: 0/10 0.99414
stage 19/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 1:05:29 β€’ 0:00:00 12.79it/s val_accuracy: 0.994 val_word_accuracy: 0.971  early_stopping: 1/10 0.99414
stage 20/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 1:05:34 β€’ 0:00:00 12.92it/s val_accuracy: 0.994 val_word_accuracy: 0.971  early_stopping: 2/10 0.99414
stage 21/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 1:05:25 β€’ 0:00:00 12.79it/s val_accuracy: 0.994 val_word_accuracy: 0.971  early_stopping: 3/10 0.99414
stage 22/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 1:05:26 β€’ 0:00:00 12.94it/s val_accuracy: 0.994 val_word_accuracy: 0.971  early_stopping: 0/10 0.99416
stage 23/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 1:05:26 β€’ 0:00:00 12.60it/s val_accuracy: 0.994 val_word_accuracy: 0.97  early_stopping: 1/10 0.99416
stage 24/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 1:05:31 β€’ 0:00:00 12.81it/s val_accuracy: 0.994 val_word_accuracy: 0.971  early_stopping: 2/10 0.99416
stage 25/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 1:05:28 β€’ 0:00:00 12.73it/s val_accuracy: 0.994 val_word_accuracy: 0.971  early_stopping: 0/10 0.99426
stage 26/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 1:05:27 β€’ 0:00:00 12.62it/s val_accuracy: 0.994 val_word_accuracy: 0.971  early_stopping: 1/10 0.99426
stage 27/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 1:05:31 β€’ 0:00:00 12.73it/s val_accuracy: 0.994 val_word_accuracy: 0.971  early_stopping: 2/10 0.99426
stage 28/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 1:06:26 β€’ 0:00:00 12.78it/s val_accuracy: 0.994 val_word_accuracy: 0.972  early_stopping: 0/10 0.99429
stage 29/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 1:06:34 β€’ 0:00:00 11.81it/s val_accuracy: 0.994 val_word_accuracy: 0.971  early_stopping: 1/10 0.99429
stage 30/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 1:06:30 β€’ 0:00:00 12.41it/s val_accuracy: 0.994 val_word_accuracy: 0.972  early_stopping: 0/10 0.99436
stage 31/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 1:07:38 β€’ 0:00:00 12.34it/s val_accuracy: 0.994 val_word_accuracy: 0.971  early_stopping: 1/10 0.99436
stage 32/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 1:08:01 β€’ 0:00:00 12.13it/s val_accuracy: 0.994 val_word_accuracy: 0.971  early_stopping: 2/10 0.99436
stage 33/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 1:07:04 β€’ 0:00:00 12.69it/s val_accuracy: 0.994 val_word_accuracy: 0.972  early_stopping: 3/10 0.99436
stage 34/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 1:06:15 β€’ 0:00:00 12.95it/s val_accuracy: 0.994 val_word_accuracy: 0.972  early_stopping: 4/10 0.99436
stage 35/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 1:07:30 β€’ 0:00:00 11.97it/s val_accuracy: 0.994 val_word_accuracy: 0.971  early_stopping: 5/10 0.99436
stage 36/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 1:06:06 β€’ 0:00:00 12.78it/s val_accuracy: 0.994 val_word_accuracy: 0.971  early_stopping: 6/10 0.99436
stage 37/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 1:06:40 β€’ 0:00:00 12.85it/s val_accuracy: 0.994 val_word_accuracy: 0.972  early_stopping: 7/10 0.99436
stage 38/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 1:06:42 β€’ 0:00:00 12.65it/s val_accuracy: 0.994 val_word_accuracy: 0.971  early_stopping: 8/10 0.99436
stage 39/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 1:05:41 β€’ 0:00:00 12.94it/s val_accuracy: 0.994 val_word_accuracy: 0.971  early_stopping: 9/10 0.99436
stage 40/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50088/50088 1:05:43 β€’ 0:00:00 12.49it/s val_accuracy: 0.994 val_word_accuracy: 0.971  early_stopping: 10/10 0.99436
Moving best model ./20231231/sgd/german_newspapers_30.mlmodel (0.9943616986274719) to ./20231231/sgd/german_newspapers_best.mlmodel
nice ketos train -f binary -o ./20231231/sgd/german_newspapers -d cuda:0  199080,08s user 13389,96s system 123% cpu 47:37:04,78 total
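Training ran with early stopping: the `early_stopping: n/10` counter in the log resets whenever `val_accuracy` improves and training ends once it reaches the patience limit (10 stages here, matching kraken's default lag), which is why the best model is the one from stage 30 even though training ran to stage 40. A minimal sketch of that bookkeeping (illustrative only, not kraken's actual implementation):

```python
def early_stopping(metrics, patience=10, min_delta=0.0):
    """Return (best_stage, stop_stage) for a sequence of per-stage
    validation metrics, mimicking the counter visible in the log."""
    best, best_stage, counter = float("-inf"), -1, 0
    for stage, value in enumerate(metrics):
        if value > best + min_delta:
            # improvement: remember this stage and reset the counter
            best, best_stage, counter = value, stage, 0
        else:
            counter += 1
            if counter >= patience:
                return best_stage, stage
    return best_stage, len(metrics) - 1
```

With a metric that improves up to stage 30 and then stays flat, this stops at stage 40 with stage 30 as the best model, as in the log above.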

Evaluation

The evaluation set contains pages from the newspaper Hakenkreuzbanner that were not included in the training data.
The evaluation also includes a newly trained Tesseract model (german_newspapers) and the currently best Tesseract model for historical sources (frak2021).
All topologies produce quite similar results; training on a larger dataset would be needed to probe the potential of more complex neural network structures.
For this or similar datasets, we recommend the default kraken topology, as it is the smallest model and gives the best results.
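The per-page scores below are character accuracies, i.e. 100 · (1 − CER). The numbers themselves were produced by a separate evaluation tool; purely for illustration, such a score can be computed from a ground-truth/OCR text pair via the Levenshtein distance:

```python
def cer(gt: str, ocr: str) -> float:
    """Character error rate: Levenshtein distance / length of ground truth."""
    prev = list(range(len(ocr) + 1))
    for i, g in enumerate(gt, 1):
        curr = [i]
        for j, o in enumerate(ocr, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (g != o)))  # substitution
        prev = curr
    return prev[-1] / len(gt)

def char_accuracy(gt: str, ocr: str) -> float:
    """Character accuracy in percent, 100 * (1 - CER)."""
    return 100.0 * (1.0 - cer(gt, ocr))
```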

πŸ“ Top models for /home/jkamlah/Documents/projects/OCR-D/Models/Evaluation/german_print/hkb_1941-09-03_0005 
πŸ₯‡   99.58  	Kraken_german_newspapers_default_deep
πŸ₯ˆ   99.57  	Kraken_german_newspapers_htr+
πŸ₯‰   99.57  	Kraken_german_newspapers_default
☁️   99.56  	Kraken_german_newspapers_sdg
☁️   99.56  	Kraken_german_newspapers_gpt
☁️   99.16  	Tesseract_german_newspapers
☁️   98.02  	Tesseract_frak2021
☁️   97.47  	Kraken_digi_tue
πŸ“ Top models for /home/jkamlah/Documents/projects/OCR-D/Models/Evaluation/german_print/hkb_1931-01-03_0003 
πŸ₯‡   99.77  	Kraken_german_newspapers_default
πŸ₯ˆ   99.76  	Kraken_german_newspapers_default_deep
πŸ₯‰   99.70  	Kraken_german_newspapers_gpt
☁️   99.68  	Kraken_german_newspapers_htr+
☁️   99.61  	Kraken_german_newspapers_sdg
☁️   99.50  	Kraken_digi_tue
☁️   99.12  	Tesseract_german_newspapers
☁️   98.58  	Tesseract_frak2021
πŸ“ Top models for /home/jkamlah/Documents/projects/OCR-D/Models/Evaluation/german_print/hkb_1943-04-01_0008 
πŸ₯‡   99.62  	Kraken_german_newspapers_htr+
πŸ₯ˆ   99.55  	Kraken_german_newspapers_default
πŸ₯‰   99.54  	Kraken_german_newspapers_gpt
☁️   99.52  	Kraken_german_newspapers_default_deep
☁️   99.48  	Kraken_german_newspapers_sdg
☁️   99.11  	Tesseract_german_newspapers
☁️   98.75  	Kraken_digi_tue
☁️   97.95  	Tesseract_frak2021
πŸ“ Top models for /home/jkamlah/Documents/projects/OCR-D/Models/Evaluation/german_print/hkb_1931-01-03_0011 
πŸ₯‡   100.00  	Kraken_german_newspapers_sdg
πŸ₯ˆ   100.00  	Kraken_german_newspapers_default_deep
πŸ₯‰   100.00  	Kraken_german_newspapers_default
☁️   99.80  	Kraken_german_newspapers_gpt
☁️   99.79  	Kraken_german_newspapers_htr+
☁️   99.52  	Kraken_digi_tue
☁️   99.41  	Tesseract_german_newspapers
☁️   99.12  	Tesseract_frak2021
πŸ“ Top models for /home/jkamlah/Documents/projects/OCR-D/Models/Evaluation/german_print/hkb_1937-11-21_0026 
πŸ₯‡   99.77  	Kraken_german_newspapers_sdg
πŸ₯ˆ   99.77  	Kraken_german_newspapers_default
πŸ₯‰   99.77  	Kraken_german_newspapers_gpt
☁️   99.73  	Kraken_german_newspapers_default_deep
☁️   99.68  	Kraken_german_newspapers_htr+
☁️   99.48  	Tesseract_german_newspapers
☁️   99.25  	Kraken_digi_tue
☁️   99.01  	Tesseract_frak2021
πŸ“ Top models for /home/jkamlah/Documents/projects/OCR-D/Models/Evaluation/german_print/hkb_1945-01-01_0022 
πŸ₯‡   99.01  	Kraken_german_newspapers_default_deep
πŸ₯ˆ   99.00  	Kraken_german_newspapers_default
πŸ₯‰   98.96  	Kraken_german_newspapers_htr+
☁️   98.80  	Kraken_german_newspapers_gpt
☁️   98.79  	Kraken_german_newspapers_sdg
☁️   97.63  	Kraken_digi_tue
☁️   96.20  	Tesseract_frak2021
☁️   96.20  	Tesseract_german_newspapers

🎽 Top models over all
πŸ₯‡   99.61  	Kraken_german_newspapers_default
πŸ₯ˆ   99.60  	Kraken_german_newspapers_default_deep
πŸ₯‰   99.55  	Kraken_german_newspapers_htr+
☁️   99.54  	Kraken_german_newspapers_sdg
☁️   99.53  	Kraken_german_newspapers_gpt
☁️   98.75  	Tesseract_german_newspapers
☁️   98.69  	Kraken_digi_tue
☁️   98.15  	Tesseract_frak2021
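The overall scores match the unweighted mean of the six per-page scores; for example, for Kraken_german_newspapers_default:

```python
# Per-page character accuracies of Kraken_german_newspapers_default
# from the six evaluation pages above.
pages = [99.57, 99.77, 99.55, 100.00, 99.77, 99.00]
overall = sum(pages) / len(pages)
print(round(overall, 2))  # 99.61, matching the "Top models over all" entry
```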