Column separator module - AdrianC2000/InvoiceScannerApp GitHub Wiki

The purpose of the Column separator module is to create a list of Column objects from a table image.

ColumnsSeparator

The class receives a previously extracted table image.
The image is then processed:
1. The table image is converted into a binary image
2. Using the ImageRotator class, the skew angle of the binary image is calculated and corrected
Using the ContoursDefiner class:
1. The table's contours are calculated
2. The positions of the table's contours are recalculated: the position of each line is set to the mean of its coordinate values, so the contours are even and straight
Then a list of Column objects is prepared and returned:
1. Column -> list of Position objects (the cells in a single column)
2. Position -> starting_x, starting_y, ending_x, ending_y
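
The returned structure can be pictured with a minimal sketch like the one below. Only the four Position field names come from the description above; the `cells` attribute name, the use of dataclasses, and the sample coordinates are assumptions made for illustration, not the project's actual definitions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Position:
    """Bounding box of one cell, in pixel coordinates."""
    starting_x: int
    starting_y: int
    ending_x: int
    ending_y: int


@dataclass
class Column:
    """One table column: its cells, ordered from top to bottom."""
    cells: List[Position] = field(default_factory=list)  # attribute name is assumed


# Hypothetical column with two cells, 100 px wide and 30 px high each.
column = Column(cells=[
    Position(0, 0, 100, 30),
    Position(0, 30, 100, 60),
])
```

In this sketch, the list returned by ColumnsSeparator would simply be a `List[Column]`, one entry per table column.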

Example with images

Table image

2 Extracted table

Binary image and skew fixed

3 Binary table

Original table's contours

4 Original contours

Fixed table's contours

5 Fixed contours

Extracted list of columns on the image

6 Table with bounding boxes

ImageRotator

ImageRotator receives a table image and fixes its skew angle: scanned invoices are often not perfectly straight, so this class corrects the rotation. The procedure is as follows:

1. Separate the horizontal lines of the table using ContoursDefiner
2. Get the first horizontal line (the line above the header row)
3. Find the first and last points of that line (the x and y coordinates of its leftmost and rightmost points)
4. Calculate the angle from those two points and rotate the whole table by that angle

Example with image

ContoursDefiner

ContoursDefiner is a class that extracts contours from the table image. The contours are returned as an ndarray. The class also removes the redundant part of the table: in the current approach only the rectangular table content is processed, because the part below it is usually a table summary, which is not processed at present.

Using the cv2 library, horizontal and vertical lines are extracted:

contours, _ = cv2.findContours(table_contours_image, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)

Besides calculating the contours, the ContoursDefiner class also calculates fixed contours, so that every cell is a rectangle. The horizontal and vertical lines are separated, and then:

For horizontal lines, the mean y coordinate is calculated
For vertical lines, the mean x coordinate is calculated
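
The averaging step can be illustrated with a short sketch. The helper names and the `(starting_x, starting_y, ending_x, ending_y)` return order are assumptions chosen to mirror the Position fields, and the input is assumed to be a single line's contour in the `(N, 1, 2)` layout that `cv2.findContours` produces.

```python
import numpy as np


def straighten_horizontal(contour):
    """Replace a roughly horizontal contour by a level line at its mean y."""
    points = np.asarray(contour).reshape(-1, 2)  # (N, 1, 2) -> (N, 2)
    y = int(round(points[:, 1].mean()))
    return int(points[:, 0].min()), y, int(points[:, 0].max()), y


def straighten_vertical(contour):
    """Replace a roughly vertical contour by an upright line at its mean x."""
    points = np.asarray(contour).reshape(-1, 2)
    x = int(round(points[:, 0].mean()))
    return x, int(points[:, 1].min()), x, int(points[:, 1].max())


# A slightly wobbly horizontal line with y values 10, 12 and 11.
wobbly = np.array([[[0, 10]], [[50, 12]], [[100, 11]]], dtype=np.int32)
line = straighten_horizontal(wobbly)  # (0, 11, 100, 11)
```

Because every horizontal line ends up with a single y value and every vertical line with a single x value, the cells they enclose are exact rectangles.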

Example with image

Extracted contours:

4 Original contours

Fixed contours (with redundant table's part removed):

5 Fixed contours