# Synthesizing Artificial Character Images

Since the training data obtained by extracting character images from manual crops is limited in size – partly due to the lack of further annotations, partly due to the effort of creating the crops and finding the corresponding section in the text ground truth – additional training data is created artificially. The idea is to pre-train a classifier on large amounts of artificial training data, randomly augmented on the fly, before fine-tuning it on real-life character images.

The process to synthesize character images whose distribution is as similar as possible to that of the real-life data is as follows (a code sketch of the whole pipeline follows the list):

  1. Extract png images of a predefined set of glyphs from a publicly available traditional Chinese Song font.
  2. Add random noise (peppering).
  3. Use morphological opening and then closing: the opening enlarges the noise pixels and grows them towards nearby black pixels (other noise or the actual character), and the closing then removes isolated, useless noise again.
  4. Use erosion to thicken the lines (the characters are dark on a light background, so eroding the white background thickens the dark strokes).
  5. Emphasize vertical lines while blurring and staining the remaining parts:
    1. Extract vertical elements of a certain minimum length using dilation with a vertical kernel.
    2. Separately apply the following:
      1. Further erode and blur the image.
      2. Generate random patches (see here for the algorithm).
      3. Add the patches to the image.
    3. Join the result and the previously extracted vertical lines back together using a bitwise AND (dark pixels from either image are kept).
  6. Blur the image once more. Optionally, the brightness can be randomly increased or decreased beforehand (not done in the example image above). Afterwards, linearly rescale the pixel values to cover the whole range from 0 to 255, just like in the real-life images.
  7. Apply a randomized elastic transformation (hardly visible in the example above).
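
To make these steps concrete, the following is a minimal sketch of the pipeline in Python, assuming OpenCV (`cv2`), NumPy and SciPy. All kernel sizes, iteration counts and value ranges are illustrative guesses rather than the parameters actually used in the project, and `random_patches` is only a hypothetical stand-in for the patch algorithm linked above.

```python
import cv2
import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates

rng = np.random.default_rng()

def pepper(img, p=0.02):
    # Step 2: set a random fraction p of pixels to black.
    out = img.copy()
    out[rng.random(img.shape) < p] = 0
    return out

def open_close(img, iterations=1):
    # Step 3: OpenCV treats white as foreground, so opening (erode, then
    # dilate) first grows the dark noise towards nearby black pixels, and
    # closing (dilate, then erode) removes isolated dark specks again.
    k = np.ones((3, 3), np.uint8)
    out = cv2.morphologyEx(img, cv2.MORPH_OPEN, k, iterations=iterations)
    return cv2.morphologyEx(out, cv2.MORPH_CLOSE, k, iterations=iterations)

def thicken(img):
    # Step 4: eroding the white background thickens the dark strokes.
    return cv2.erode(img, np.ones((2, 2), np.uint8))

def vertical_lines(img, length=15):
    # Step 5.1: after dilation with a tall vertical kernel, a pixel stays
    # dark only if its whole vertical neighbourhood is dark, so only
    # vertical elements of at least `length` pixels survive.
    return cv2.dilate(img, np.ones((length, 1), np.uint8))

def random_patches(shape, n=3):
    # Step 5.2.2: hypothetical stand-in for the patch algorithm linked in
    # the wiki, here simply a few random dark ellipses on a white canvas.
    canvas = np.full(shape, 255, np.uint8)
    h, w = shape
    for _ in range(n):
        center = (int(rng.integers(w)), int(rng.integers(h)))
        axes = (int(rng.integers(2, 8)), int(rng.integers(2, 8)))
        cv2.ellipse(canvas, center, axes, float(rng.uniform(0, 180)),
                    0, 360, int(rng.integers(0, 200)), -1)
    return canvas

def elastic(img, alpha=8.0, sigma=3.0):
    # Step 7: smooth random displacement fields (Simard-style elastic
    # transformation) applied with bilinear interpolation.
    h, w = img.shape
    dx = gaussian_filter(rng.uniform(-1, 1, (h, w)), sigma) * alpha
    dy = gaussian_filter(rng.uniform(-1, 1, (h, w)), sigma) * alpha
    y, x = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    return map_coordinates(img, [y + dy, x + dx], order=1, mode="nearest")

def synthesize(glyph):
    # glyph: grayscale uint8 image, dark character on white background.
    img = open_close(pepper(glyph), iterations=int(rng.integers(1, 3)))
    img = thicken(img)
    vert = vertical_lines(img)                                # step 5.1
    soft = cv2.GaussianBlur(thicken(img), (3, 3), 0)          # step 5.2.1
    soft = np.minimum(soft, random_patches(img.shape))        # steps 5.2.2/3
    img = cv2.bitwise_and(soft, vert)                         # step 5.3
    k = int(rng.choice([3, 5]))
    img = cv2.GaussianBlur(img, (k, k), 0)                    # step 6: blur
    shift = int(rng.integers(-30, 31))                        # brightness
    img = np.clip(img.astype(np.int16) + shift, 0, 255).astype(np.uint8)
    img = cv2.normalize(img, None, 0, 255, cv2.NORM_MINMAX)   # rescale
    return elastic(img)                                       # step 7
```

Keeping the unblurred vertical lines aside and merging them back in with a bitwise AND is what lets the dominant vertical strokes stay crisp while the remaining parts are blurred and stained.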

Adding padding to the real-life images to make them square causes the characters to be off-centered (see here), and also introduces some size inconsistency after they are resized to the consistent input size required by the CNN. In order to imitate this, randomized padding and resizing are applied to the synthetic character images as well.
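
A sketch of this randomized padding and resizing, again assuming OpenCV; the maximum padding of 8 pixels and the target size of 64×64 are assumptions, not the project's actual values.

```python
import cv2
import numpy as np

rng = np.random.default_rng()

def random_pad_resize(img, size=64, max_pad=8):
    # Pad each side by an independent random amount of white pixels,
    # off-centering the character like the padded real-life crops...
    top, bottom, left, right = (int(v) for v in rng.integers(0, max_pad + 1, 4))
    padded = cv2.copyMakeBorder(img, top, bottom, left, right,
                                cv2.BORDER_CONSTANT, value=255)
    # ...then resize to the fixed input size required by the CNN, which also
    # reproduces the slight size inconsistency of the resized real crops.
    return cv2.resize(padded, (size, size), interpolation=cv2.INTER_AREA)
```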

Finally, by randomizing (1) the number of iterations for the morphological operations in step 3, (2) the kernel size for blurring in step 6 and (3) the value for increasing/decreasing the brightness in step 6, we obtain the final output. In the comparison figure, the upper half shows 91 random samples created from a single png image of the character 當 using the algorithm described above; the lower half shows the 91 real training images of the same character for comparison.
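
Using the hypothetical `synthesize` and `random_pad_resize` sketches above, producing such a batch from a single glyph image might look like this (the file path is made up for illustration):

```python
import cv2

# Load the glyph extracted in step 1 (hypothetical path).
glyph = cv2.imread("glyphs/dang.png", cv2.IMREAD_GRAYSCALE)

# Every call re-samples the iteration count, blur kernel, brightness shift
# and padding, so repeated calls yield 91 distinct training images.
samples = [random_pad_resize(synthesize(glyph)) for _ in range(91)]
```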
