Manual - PaulBreugnot/TheMaterialParser GitHub Wiki

This guide assumes that you have already installed and run TheMaterialParser, and that you can access it from your web browser.

It details the steps required to extract data from your datasheets, along with a few extra features such as access to the database of extracted materials.

Contents

  1. Home page
  2. Create datasheet categories
  3. Upload datasheets
  4. Process datasheets
  5. Materials database

Home page

When you start TheMaterialParser for the first time, the home page should look like this: (screenshot: VoidHomePage)

Create datasheet categories

Go to the Categories tab. You should see the following page: (screenshot: VoidCategories)

To create a datasheet category, choose a logo (optional), enter a category name, then click on Add Category. Here is an example of what you might obtain: (screenshot: ExampleCategories)

Upload datasheets

You can now go back to the Datasheets tab and select one of the categories you created. Then click on Select file to choose the datasheets to upload (you can choose multiple files), and finally click on Upload. You should obtain something like this: (screenshot: ExampleDatasheets)

Process datasheets

To process datasheets, select them using the checkboxes, then click on Process selection. You should be redirected to this page: (screenshot: InitProcessExample)

You can now start drawing selections around composition tables directly on the PDF views. (screenshot: SelectionExample)

Selection concepts

Number of selections

You can draw any number of selections on any number of datasheets; pick the datasheet to work on from the list on the left. On the server side, every selection is applied to every datasheet in an attempt to extract compositions.

As a result, you do not need to draw selections on every datasheet, provided that the datasheets you are processing share a similar layout.
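
As a rough illustration of this behaviour, the server-side loop can be pictured as trying every selection on every datasheet. This is only a sketch: `Selection`, `process` and the `try_extract` callback are hypothetical names, not TheMaterialParser's actual code, and stopping at the first selection that yields valid data is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Composition = Dict[str, float]


@dataclass(frozen=True)
class Selection:
    """A rectangle drawn on a PDF page (hypothetical structure)."""
    page: int
    top: float
    left: float
    bottom: float
    right: float


def process(
    datasheets: List[str],
    selections: List[Selection],
    try_extract: Callable[[str, Selection], Optional[Composition]],
) -> Dict[str, Optional[Composition]]:
    """Try every selection on every datasheet.

    try_extract(datasheet, selection) is an assumed callback that returns
    a composition, or None when no valid data was found.
    """
    results: Dict[str, Optional[Composition]] = {}
    for sheet in datasheets:
        results[sheet] = None
        for selection in selections:
            composition = try_extract(sheet, selection)
            if composition is not None:
                results[sheet] = composition
                break  # assumption: keep the first selection that works
    return results
```

A datasheet whose entry is still `None` after the loop corresponds to one that needs an additional selection.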

Table or not table?

Notice that in the previous example, the selected structure does not look exactly like a table. However, Tabula is still able to parse such a structure into a usable .csv.

To find out whether a structure can be processed, you can try parsing it with the original Tabula app. The only requirement is that component names and values are parsed as distinct cells. TheMaterialParser will then automatically handle data cleaning and parsing.
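
If you export a .csv from the Tabula app, a quick and very rough way to check this requirement is to verify that the file contains both purely numeric cells and text cells. The sketch below uses only the Python standard library; the helper names and the sample data are illustrative assumptions, and the check assumes plain numeric values (not ranges such as "0.10-0.20").

```python
import csv
import io


def is_number(cell):
    """True if the cell holds a plain number such as '0.45' or '0,45'."""
    try:
        float(cell.strip().replace(",", "."))
        return True
    except ValueError:
        return False


def has_distinct_cells(csv_text):
    """True if the table holds both numeric cells and text cells."""
    cells = [
        cell
        for row in csv.reader(io.StringIO(csv_text))
        for cell in row
        if cell.strip()
    ]
    has_values = any(is_number(c) for c in cells)
    has_names = any(not is_number(c) for c in cells)
    return has_values and has_names


good = "C,0.45\nMn,0.70\n"   # names and values in separate cells
bad = "C 0.45\nMn 0.70\n"    # name and value merged into a single cell
print(has_distinct_cells(good), has_distinct_cells(bad))  # prints: True False
```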

Table orientation

To help the algorithm parse your table, you should specify whether the selected table is horizontal or vertical and, for a horizontal table, whether it has headers. To better understand this concept, have a look at the following examples:

  • Horizontal table without headers:

(screenshot: HorizontalTableNoHeaders)

  • Horizontal table with headers:

(screenshot: HorizontalTableWithHeaders)

  • Vertical table:

(screenshot: VerticalTable)
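
To make the three cases more concrete, here is a minimal sketch of how parsed cells could be turned into a composition. It rests on an assumed reading of the options: a vertical table has one component per row (name, value); a horizontal table has the names in one row and the values in the next; "with headers" means each of those rows starts with a label cell that must be skipped. `parse_table` is a hypothetical helper, not TheMaterialParser's actual algorithm.

```python
def parse_table(rows, orientation, has_headers=False):
    """Turn parsed cells into {component: value} (illustrative only)."""
    if orientation == "vertical":
        # Assumed layout: each row is [name, value].
        pairs = [(row[0], row[1]) for row in rows if len(row) >= 2]
    elif orientation == "horizontal":
        # Assumed layout: first row holds names, second row holds values.
        names, values = rows[0], rows[1]
        if has_headers:
            # Assumed meaning of "headers": a leading label cell per row.
            names, values = names[1:], values[1:]
        pairs = list(zip(names, values))
    else:
        raise ValueError(f"unknown orientation: {orientation}")
    return {name.strip(): float(value) for name, value in pairs}


vertical = [["C", "0.45"], ["Mn", "0.70"]]
horizontal = [["C", "Mn"], ["0.45", "0.70"]]
with_headers = [["Element", "C", "Mn"], ["%", "0.45", "0.70"]]

print(parse_table(vertical, "vertical"))
print(parse_table(horizontal, "horizontal"))
print(parse_table(with_headers, "horizontal", has_headers=True))
```

Under these assumptions, all three calls produce the same composition, which is why telling the application the orientation matters: the same cells are read in a different order.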

Manage selections

Your selections are displayed at the top of the view. To modify a selection, delete it and draw a new one. You can also highlight a selection on its original datasheet by clicking on View selection.

Launch process

Once you have drawn one or more selections, click on Extract Data! to send them to the server, which will try to parse material compositions. Results are displayed in real time next to each datasheet name, indicating whether valid data has been found. (screenshot: FirstIteration)

Where warning signs appear, you can draw new selections and click on Extract Data! again: (screenshot: SecondIteration)

Extracted Data

To view the current results, click on Extracted data at the top of the page. (screenshot: ExtractedData)

On this page, you can ignore materials whose data does not look right. Finally, two operations are available:

  • download .csv: downloads the extracted data, excluding ignored materials, as a .csv file.
  • save to database: saves the extracted data, excluding ignored materials, to the sqlite3 database so that you can access it later in the Materials tab.
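
The two operations can be pictured with the short sketch below, which drops ignored materials and then writes the rest to a .csv and to an sqlite3 database. The data layout, the column names and the `composition` table are made up for illustration; the application's real .csv format and database schema may differ.

```python
import csv
import io
import sqlite3

# Hypothetical extracted data; "ignored" mirrors the ignore option of the page.
materials = [
    {"name": "Steel A", "ignored": False, "composition": {"C": 0.45, "Mn": 0.70}},
    {"name": "Steel B", "ignored": True, "composition": {"C": 9.99}},
]
kept = [m for m in materials if not m["ignored"]]
rows = [(m["name"], c, v) for m in kept for c, v in m["composition"].items()]

# CSV export: one line per (material, component) pair.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["material", "component", "value"])
writer.writerows(rows)

# Database save, using an in-memory database and a made-up table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE composition (material TEXT, component TEXT, value REAL)"
)
conn.executemany("INSERT INTO composition VALUES (?, ?, ?)", rows)
conn.commit()
saved = conn.execute("SELECT COUNT(*) FROM composition").fetchone()[0]
print(buffer.getvalue())
print(saved, "rows saved")
```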

You now know how to upload your datasheets and parse material compositions from them!

Materials database

If you have saved some results to the database, go back to the TheMaterialParser home page and open the Materials tab. (screenshot: MaterialsExample)

By default, all available materials are displayed. You can then apply filters using the control pane on the left:

  • Name : search materials by name
  • Categories : select materials that come from datasheets belonging to the specified categories
  • Components : select materials that contain at least all the specified components.
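
The way these filters combine can be sketched as follows. The record layout and the `filter_materials` helper are hypothetical, and case-insensitive substring matching on the name is an assumption; the point is mainly the Components rule, where a material is kept only if it contains every requested component.

```python
def filter_materials(materials, name=None, categories=None, components=None):
    """Keep the materials that pass every filter that was provided."""
    selected = []
    for material in materials:
        if name and name.lower() not in material["name"].lower():
            continue  # assumed: case-insensitive substring match
        if categories and material["category"] not in categories:
            continue
        if components and not set(components) <= set(material["composition"]):
            continue  # must contain at least all requested components
        selected.append(material)
    return selected


materials = [
    {"name": "Steel A", "category": "Steels", "composition": {"C": 0.45, "Mn": 0.70}},
    {"name": "Steel B", "category": "Steels", "composition": {"C": 0.20}},
    {"name": "Alloy X", "category": "Aluminium", "composition": {"Si": 7.0, "Mn": 0.1}},
]

with_c_and_mn = filter_materials(materials, components=["C", "Mn"])
print([m["name"] for m in with_c_and_mn])  # prints: ['Steel A']
```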

Finally, you can download the selected data using the Download .csv button. You can also delete some materials from this tab.