Manual - PaulBreugnot/TheMaterialParser GitHub Wiki
This guide assumes that you have already installed and run TheMaterialParser, and that you can access it from your web browser.
It will detail the steps required to extract data from your datasheets, and a few extra functionalities such as access to the extracted materials database.
Home page
When you start TheMaterialParser for the first time, this is what the home page should look like:
Create datasheet categories
Go to the Categories tab. You should see the following page:
To create a datasheet category, choose a logo (optional) and enter a category name, then click on Add Category.
Here is an example of what you might obtain:
Upload datasheets
You can now go back to the Datasheets tab and select one of the previously created categories.
Then click on Select file to choose the datasheets to upload (you can select multiple files), and finally click on Upload. You should obtain something like this:
Process datasheets
To process datasheets, select them using the select boxes, then click on Process selection. You should be redirected to this page:
You can now start drawing selections around composition tables directly on the PDF views.
Selection concepts
Number of selections
You can draw any number of selections on any number of datasheets, choosing each datasheet from the list on the left. On the server side, every selection is applied to every datasheet in an attempt to extract compositions.
Consequently, you do not need to draw selections on every datasheet, provided that the datasheets you are processing share a similar layout.
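The behaviour described above can be pictured as a nested loop. The following is only a minimal Python sketch, not TheMaterialParser's actual server code: the `Selection` fields, the `extract` callback and the rule that the first selection yielding data wins are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Composition = Dict[str, float]


@dataclass(frozen=True)
class Selection:
    """Hypothetical rectangle drawn on one page of a datasheet."""
    page: int
    top: float
    left: float
    bottom: float
    right: float


def apply_selections(
    datasheets: List[str],
    selections: List[Selection],
    extract: Callable[[str, Selection], Optional[Composition]],
) -> Dict[str, Optional[Composition]]:
    """Try every selection on every datasheet.

    `extract` is a stand-in for the real parsing step. Assumption: the
    first selection that yields a non-empty composition is kept.
    """
    results: Dict[str, Optional[Composition]] = {}
    for sheet in datasheets:
        results[sheet] = None  # stays None if no selection matches
        for selection in selections:
            composition = extract(sheet, selection)
            if composition:
                results[sheet] = composition
                break
    return results
```

In this sketch, a datasheet whose result stays `None` is one for which none of the existing selections matched, so it would need a selection of its own.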
Table or not table?
Notice that in the previous example, the selected structure does not exactly look like a table. However, Tabula is still able to parse such a structure into a usable .csv.
To find out whether a structure can be processed, you can try parsing it with the original Tabula app. The only requirement is that component names and values are parsed as distinct cells. TheMaterialParser will then automatically handle data cleaning and parsing.
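If you export a selection from the Tabula app as a .csv, the "distinct cells" requirement can be checked roughly with a few lines of Python. This is a heuristic sketch under stated assumptions, not part of TheMaterialParser: the function names and the number pattern are hypothetical, and a cell is flagged only when it mixes numeric and non-numeric tokens (for example `C 0.25`).

```python
import csv
import io
import re

# Assumed value format: an integer or decimal, optionally followed by "%".
NUMBER_RE = re.compile(r"^\d+(?:[.,]\d+)?%?$")


def cell_is_mixed(cell: str) -> bool:
    """Return True if a cell holds both a name-like and a numeric token."""
    tokens = [t for t in cell.split() if t != "%"]
    numeric = [t for t in tokens if NUMBER_RE.match(t)]
    return 0 < len(numeric) < len(tokens)


def names_and_values_are_distinct(csv_text: str) -> bool:
    """Return True if no cell of the CSV mixes a name with a value."""
    for row in csv.reader(io.StringIO(csv_text)):
        if any(cell_is_mixed(cell) for cell in row):
            return False
    return True


# Names and values in separate cells: acceptable.
print(names_and_values_are_distinct("C,Mn,Si\n0.25,1.2,0.4\n"))
# Name and value fused in one cell: the selection should be redrawn.
print(names_and_values_are_distinct("C 0.25,Mn 1.2\n"))
```

A component name that itself contains digits, such as `Fe2O3`, is a single token and is therefore not flagged by this check.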
Table orientation
To help the algorithm parse your table, you should specify whether the selected table is horizontal or vertical and, for a horizontal table, whether it has headers. To better understand this concept, have a look at the following examples:
- Horizontal table without headers:
- Horizontal table with headers:
- Vertical table:
Manage selections
Your selections are displayed at the top of the view. To modify a selection, delete it and draw a new one. You can also highlight a selection on its original datasheet by clicking on View selection.
Launch process
Once you have drawn one or more selections, click on Extract Data! to send them to the server, which will try to parse material compositions. Results are displayed in real time next to each datasheet name, indicating whether valid data has been found.
Where warning signs appear, you can draw new selections and click on Extract Data! again:
Extracted Data
To view current results, click on Extracted data at the top of the page.
On this page, you can ignore materials whose data does not look right.
Finally, two operations are available:
- download .csv: downloads the extracted data, except ignored materials, as a .csv file.
- save to database: saves the extracted data, except ignored materials, to the sqlite3 database so that you can access it later in the Materials tab.
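To make the "except ignored" rule concrete, here is a small Python sketch of a .csv export that skips ignored materials. The record layout (`name`, `composition`, `ignored`) and the column order are assumptions chosen for the example; the file actually produced by TheMaterialParser may be organised differently.

```python
import csv
import io
from typing import Dict, List


def export_csv(materials: List[Dict]) -> str:
    """Serialise non-ignored materials, one row per material.

    Hypothetical layout: a "name" column followed by one column per
    component, sorted alphabetically. Missing components are left empty.
    """
    kept = [m for m in materials if not m.get("ignored", False)]
    components = sorted({c for m in kept for c in m["composition"]})

    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["name"] + components)
    for m in kept:
        row = [m["name"]] + [m["composition"].get(c, "") for c in components]
        writer.writerow(row)
    return buffer.getvalue()


# Made-up sample data: the second material is marked as ignored.
sample = [
    {"name": "Steel A", "composition": {"C": 0.2, "Mn": 1.1}},
    {"name": "Steel B", "composition": {"C": 0.4, "Cr": 1.0}, "ignored": True},
]
print(export_csv(sample))
```

Because ignored materials are filtered out before the header is built, a component that appears only in an ignored material (here `Cr`) does not produce a column.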
You now know how to upload your datasheets and parse material compositions from them!
Materials database
If you have saved some results to the database, go back to the TheMaterialParser home page and open the Materials tab.
By default, all available materials are displayed. You can then apply filters using the control pane on the left:
- Name: search materials by name
- Categories: select materials that come from datasheets belonging to the specified categories
- Components: select materials that contain at least all the specified components.
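The three filters can be read as conditions that a material must satisfy together. The Python sketch below illustrates that reading on made-up records. The field names, the case-insensitive substring match for the name and the way the filters are combined are assumptions. Only the "at least all the specified components" rule comes directly from the description above, and it is expressed here as a subset test.

```python
from typing import Dict, Iterable, List, Optional


def filter_materials(
    materials: List[Dict],
    name: Optional[str] = None,
    categories: Optional[Iterable[str]] = None,
    components: Optional[Iterable[str]] = None,
) -> List[Dict]:
    """Return the materials matching every filter that is set."""
    selected = []
    for m in materials:
        # Assumed: name search is a case-insensitive substring match.
        if name and name.lower() not in m["name"].lower():
            continue
        if categories and m["category"] not in set(categories):
            continue
        # The material must contain at least all requested components.
        if components and not set(components) <= set(m["composition"]):
            continue
        selected.append(m)
    return selected


# Hypothetical records, for illustration only.
materials = [
    {"name": "Steel A", "category": "Steels",
     "composition": {"C": 0.2, "Mn": 1.1, "Si": 0.3}},
    {"name": "Steel B", "category": "Steels",
     "composition": {"C": 0.4, "Cr": 1.0}},
    {"name": "Alloy X", "category": "Aluminium",
     "composition": {"Al": 95.0, "Mn": 1.0}},
]

for m in filter_materials(materials, components=["C", "Mn"]):
    print(m["name"])
```

With this sample, asking for `C` and `Mn` keeps only "Steel A": "Steel B" lacks `Mn` and "Alloy X" lacks `C`, while the extra `Si` in "Steel A" does not exclude it.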
Finally, you can download the selected data using the Download .csv button. You can also delete materials from this tab.