Requirements - EPDF-Extractor/indu-doc-transformer GitHub Wiki

Requirements Document

These user requirements outline the core functionalities the system must provide from the perspective of the end-user. They will serve as the foundation for detailed design and development.

Definitions

  • External Processor (EP) - any external program that uses the system. (source email 14.07, 5)
  • Level - same as Aspect, same as device type. Has a type and can be assigned a Prefix.
  • Prefix - same as Separator. One of: + ++ - = == / : &
  • Tag - ( Prefix [ Name ] )* - unique for an object. Can be recomposed by Level priority.
    • Prefix [ Name ] = Aspect
  • Object - also XTarget ("object" is too confusing). Has a Tag and an ID (GUID format). Can have attributes.
  • Connection - has Object tgt1 and Object tgt2. Can go through a Cable. Can have attributes.
  • Cable - an Object that has multiple Wires, which can go from a src object to a tgt object. Can have attributes.
  • Wire or Link - connection between two connection points. Can have attributes.
  • Connection Point - same as Pin. Objects are not connected directly but through connection points at src and dst.
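
The Tag grammar above, ( Prefix [ Name ] )*, can be illustrated with a small tokenizer. This is only a sketch, not the project's parser: the function name `split_tag` and the sample tag are hypothetical, and it assumes that names never contain separator characters.

```python
import re

# Separators from the Prefix definition. Longer ones come first so that
# "==" is tried before "=" and "++" before "+".
SEPARATORS = ["==", "++", "=", "+", "-", "/", ":", "&"]

# One tag part: a separator followed by an optional name that contains no
# separator characters (an assumption made for this sketch).
_PART = re.compile(
    "(" + "|".join(re.escape(s) for s in SEPARATORS) + r")([^=+\-/:&]*)"
)


def split_tag(tag):
    """Split a tag into (prefix, name) pairs following ( Prefix [ Name ] )*."""
    parts = []
    pos = 0
    while pos < len(tag):
        match = _PART.match(tag, pos)
        if match is None:
            raise ValueError(f"no separator at position {pos} in {tag!r}")
        parts.append((match.group(1), match.group(2)))
        pos = match.end()
    return parts


# Hypothetical tag, not taken from a real EPLAN export.
print(split_tag("=A1+B2-K3:4"))
```

Each returned pair corresponds to one Aspect in the sense of the definitions above. A name may be empty, because the grammar marks it as optional.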

Priority is given to data that supports identifying devices and their connectivity relevant for virtual commissioning.

User Stories

  1. The core purpose of the software is to translate structured information from the EPLAN PDF export into a corresponding XML format. Therefore, the base workflow remains as follows: Open PDF → Parse PDF → Produce and Save XML

  2. A preview of the extracted data would be very helpful, ideally showing:
    • The object hierarchy (similar to the "Betriebsmittelbaum" structure in the PDF)
    • Page references where the objects were found
    • Eventually, all additional information discovered per object

More user stories are in a raw state under the wiki's meetings section.

Priority notation

As discussed with the customer, certain requirements have higher priority than others.

The following notation is used:

  • must - the functionality must appear in the final product. Out of all requirements, these are done first. Among requirements of equal priority, one is selected when the iteration plan is discussed at the regular meetings.
  • should - the functionality is greatly desired, but it might not appear in the final product.
  • can - the functionality is desired, but is planned as an extension.

User Requirements list

  • UF1 The system must provide the following workflow for the External Processor:

    • UF1.1 The system must provide feature "Open input document"
    • UF1.2 The system must provide feature "Parse input document"
    • UF1.3 The system must provide feature "Preview output document"
    • UF1.4 The system can provide feature "Manipulate output"
    • UF1.5 The system must provide feature "Produce output document"
    • (source email 14.07, 2)
  • UF2 The system must accept PDF documents as input (UF1.1) for User.

    • (source: project task presentation, pg. 4)
  • UF3 The system must extract all objects found in the PDF for the User (comment: the object list is similar to the "Betriebsmittelliste" = "Device tag list" structure in the PDF).

    • (source email 14.07, 2)
    • UF3.1 The system must extract from structured data (=tables) (source email 14.07, 5)
    • UF3.2 The system can extract from unstructured data (=any other place where object is referenced by tag)
  • UF4 The system must extract the following information for each object for the External Processor:

    • UF4.1 Tag (source: email 14.07, XML)
    • UF4.2 Name (source: email 14.07, XML)
    • UF4.3 ID as guid (source: email 14.07, XML)
    • UF4.4 Prefix (source: email 14.07, XML)
    • UF4.5 Device type (sensor, actuator, ...) (source: email 14.07, 5)
    • UF4.6 Page references where the object was found
    • UF4.7 Other existing attributes
    • (source: email 14.07, 5; XML; example PDF)
  • UF5 The system should extract the following information for each connection for the External Processor:

    • UF5.1 Connection endpoints (From/To, "von/bis"):
      • UF5.1.1 Target Designations (Object from, Object to)
      • UF5.1.2 Connection points (pin from, pin to)
      • UF5.1.3 Terminal Designation (Object this)
      • UF5.1.4 Direction
      • (source: project task presentation, pg. 4)
    • UF5.2 Conductor(s)
    • UF5.3 Cable name
    • UF5.4 Cable type
    • UF5.6 Diameter
    • UF5.7 Length
    • UF5.8 Other existing attributes
    • (focus on the "Kabelübersicht")
    • (source: email 14.07, 3)
  • UF18 The system must allow searching the following extracted information:

    • UF18.1 Tag
    • UF18.2 Attribute name
    • UF18.3 Attribute value
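
The search requirement UF18 can be pictured as a filter over the extracted objects. This is a minimal sketch under stated assumptions: objects are plain dictionaries with a `tag` string and an `attributes` mapping, the function name `search_objects` is hypothetical, and case-insensitive substring matching is an assumption rather than documented behavior.

```python
def search_objects(objects, tag=None, attr_name=None, attr_value=None):
    """Return the objects that match every given criterion (None = ignore)."""

    def contains(haystack, needle):
        # Case-insensitive substring match; an assumption of this sketch.
        return needle.lower() in str(haystack).lower()

    results = []
    for obj in objects:
        attrs = obj.get("attributes", {})
        if tag is not None and not contains(obj.get("tag", ""), tag):
            continue
        if attr_name is not None and not any(contains(k, attr_name) for k in attrs):
            continue
        if attr_value is not None and not any(
            contains(v, attr_value) for v in attrs.values()
        ):
            continue
        results.append(obj)
    return results


# Hypothetical sample data.
objects = [
    {"tag": "=A1+B2-K3", "attributes": {"Type": "sensor"}},
    {"tag": "=A1+B2-M1", "attributes": {"Type": "actuator", "Length": "5 m"}},
]
print(search_objects(objects, attr_value="sens"))
```

Criteria are combined with a logical AND, so passing both `tag` and `attr_value` narrows the result.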
  • UF6 Before export, the system must provide a preview of extracted data for the user (source email 14.07, 2)

    • UF6.1 Preview must show the object hierarchy (similar to the "Betriebsmittelbaum" structure in the PDF)
    • UF6.2 Page references where the objects were found with the rectangular outline on the page (precision up to a table row)
    • UF6.3 All information discovered per object
    • UF6.4 tree text preview
    • UF6.5 visual foldable tree preview
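
The tree text preview of UF6.4 can be sketched by nesting tag parts into a dictionary and printing one indented line per node. The function names and the sample paths are hypothetical, and sorting siblings alphabetically is an assumption of this sketch.

```python
def build_tree(paths):
    """Nest each path (a sequence of tag parts) into a dict of dicts."""
    tree = {}
    for path in paths:
        node = tree
        for part in path:
            node = node.setdefault(part, {})
    return tree


def render_tree(tree, indent=0):
    """Return one line per node, indented by two spaces per hierarchy level."""
    lines = []
    for name in sorted(tree):
        lines.append("  " * indent + name)
        lines.extend(render_tree(tree[name], indent + 1))
    return lines


# Hypothetical object tags, already split into their aspects.
paths = [["=A1", "+B2", "-K3"], ["=A1", "+B2", "-M1"], ["=A1", "+C1"]]
print("\n".join(render_tree(build_tree(paths))))
```

Objects that share leading aspects end up under the same parent node, which is the property a foldable tree view (UF6.5) would also rely on.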
  • UF7 The system must use separators for level detection.

    • UF7.1 A separator must consist of one symbol or a combination of symbols.
    • UF7.2 A separator must be unique.
    • UF7.3 A separator must correspond to a level with a priority.
    • UF7.4 The system must parse separators from object tags.
  • UF17 The system must have levels.

    • UF17.1 A level must have a type (e.g., plant, component). (source: project task presentation, pg. 7)
    • UF17.2 A level must have a priority.
  • UF16 The system must allow configuration of levels (source: project task presentation, pg. 7)

    • UF16.1 The system must allow changing the priority of the levels
    • UF16.2 The system must allow changing separators of the levels
    • UF16.3 The system can allow changing visibility of aspect in the export view
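
A level configuration that satisfies UF7.1, UF7.2 and UF17 could be validated as sketched below. The JSON layout, the key names `levels`, `type` and `separator`, and the level type `location` are assumptions for illustration; the actual settings file format is not shown on this page. Priority is taken from the list position.

```python
import json

# Hypothetical settings content; the key names are assumptions.
EXAMPLE_SETTINGS = """
{
  "levels": [
    {"type": "plant", "separator": "="},
    {"type": "location", "separator": "+"},
    {"type": "component", "separator": "-"}
  ]
}
"""


def load_levels(settings_text):
    """Return {separator: (priority, type)}; priority is the list position."""
    levels = {}
    for priority, level in enumerate(json.loads(settings_text)["levels"]):
        separator = level["separator"]
        if not separator:
            raise ValueError("a separator needs at least one symbol (UF7.1)")
        if separator in levels:
            raise ValueError(f"separator {separator!r} is not unique (UF7.2)")
        levels[separator] = (priority, level["type"])
    return levels


print(load_levels(EXAMPLE_SETTINGS))
```

With this layout, changing a priority (UF16.1) means reordering the list, and changing a separator (UF16.2) means editing one string.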
  • UF8 The system can allow enhancement of object data:

    • UF8.1 Add attributes to objects.
    • UF8.2 Define relations between objects.
  • UF9 The system should generate export:

    • UF9.1 JSON export.
    • UF9.2 XML export.
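
The two export formats of UF9 can be sketched with the Python standard library. The element and key names (`objects`, `object`, `attribute`) are hypothetical; the XML structure referenced in the source email is not reproduced on this page, so this is not the customer's schema.

```python
import json
import xml.etree.ElementTree as ET


def export_json(objects):
    """Serialize the object list as a JSON document."""
    return json.dumps({"objects": objects}, indent=2, sort_keys=True)


def export_xml(objects):
    """Serialize the object list as XML; element names are assumptions."""
    root = ET.Element("objects")
    for obj in objects:
        node = ET.SubElement(root, "object", {"tag": obj["tag"], "id": obj["id"]})
        for name, value in obj.get("attributes", {}).items():
            ET.SubElement(node, "attribute", {"name": name}).text = str(value)
    return ET.tostring(root, encoding="unicode")


# Hypothetical sample object.
sample = [
    {
        "tag": "=A1-K3",
        "id": "00000000-0000-0000-0000-000000000001",
        "attributes": {"Type": "sensor"},
    }
]
print(export_json(sample))
print(export_xml(sample))
```

Both functions read the same in-memory list, so a further format only needs its own serializer.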
  • UF10 The system should have the following internal structure:

    • UF10.1 A list of objects.
    • UF10.2 A list of connections.
    • UF10.3 An object can have any number of attributes.
    • UF10.4 Levels configuration.
    • UF10.5 Pages configuration.
    • UF10.6 A list of aspects.
    • UF10.7 A list of connection points / pins.
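
Part of the internal structure in UF10 can be sketched with dataclasses that follow the definitions at the top of this page. All class and field names are hypothetical and do not describe the project's actual database schema. Levels, pages and aspects are left out for brevity.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Object:
    tag: str
    guid: str
    attributes: dict = field(default_factory=dict)  # UF10.3: any number


@dataclass
class Pin:
    owner: Object  # the object this connection point belongs to
    name: str


@dataclass
class Wire:
    src: Pin
    dst: Pin
    attributes: dict = field(default_factory=dict)


@dataclass
class Cable(Object):
    # A cable is itself an object and bundles several wires.
    wires: list = field(default_factory=list)


@dataclass
class Connection:
    src: Object
    dst: Object
    through: Optional[Cable] = None  # a connection can go through a cable
    attributes: dict = field(default_factory=dict)


# Hypothetical usage with made-up tags and GUIDs.
sensor = Object(tag="=A1-B1", guid="guid-1")
plc = Object(tag="=A1-K1", guid="guid-2")
cable = Cable(tag="=A1-W1", guid="guid-3")
cable.wires.append(Wire(src=Pin(sensor, "1"), dst=Pin(plc, "X1:3")))
link = Connection(src=sensor, dst=plc, through=cable)
print(link.through.tag, len(cable.wires))
```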
  • UF20 The system must create guids persistent through extractions.

    • UF20.1 - for objects
    • UF20.2 - for connections
    • UF20.3 - for aspects
    • UF20.4 - for connection points / pins
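
One way to obtain GUIDs that stay the same across extractions (UF20) is a name-based UUID derived from a stable key such as the object tag. Whether the project does this is not stated here; the namespace URL and the function name `stable_guid` are hypothetical.

```python
import uuid

# Hypothetical project namespace; any fixed UUID works as long as it never changes.
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.invalid/indu-doc")


def stable_guid(kind, key):
    """Derive the same GUID for the same (kind, key) pair on every run."""
    return uuid.uuid5(NAMESPACE, f"{kind}:{key}")


# The same tag yields the same GUID on every extraction run.
print(stable_guid("object", "=A1+B2-K3"))
```

Including the kind (object, connection, aspect, pin) in the hashed name keeps an object and an aspect with the same text from sharing a GUID. A renamed tag would produce a new GUID under this scheme.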
  • UF12 The system must be capable of working with tables in different languages.

  • UF21 The system should provide error handling mechanism

  • UC1 The system must store parsed data in a structured internal format.

  • UC2 The system must have persistent parsed data storage (database).

  • UC3 The system must be under MIT License

  • UC4 The system must allow customization of PDF processing behavior (rationale: the system is going to be reused with PDFs of potentially different structure; source: stakeholder X)

  • UC5 The system must be available for non-human users.

  • UC6 The system must be written in Python.

  • UC7 The system should be deployed publicly.

  • UC8 The system's source code must be written in English.

  • UC9 The system's source code must be available publicly.

  • UC10 The EPLAN PDFs used as input for the system must not be made public in any form (including processing by LLMs).

  • UF14 The system can extract objects and their information from vectorized images (SVG).

  • UF15 The system can extract objects and their information from rasterized images (JPG).

  • UF19 The system must provide interfaces:

    • UF19.1 The system must provide GUI interface.
    • UF19.2 The system must provide CLI interface.
  • UF11 The system should provide AutomationML export.

    • UF11.1 The system should provide ECAD tree in AutomationML export.
    • UF11.2 The system should provide aspect trees in AutomationML export.
    • UF11.3 The system can provide APC tree in AutomationML export.
    • UF11.5 The system should provide the following attributes for objects and aspects in AutomationML export:
      • UF11.5.1 Id (unique per AML)
      • UF11.5.2 diamondId (unique per object appearance in the AML)
      • UF11.5.3 BMK = joined full aspect string (including empty aspects higher in the hierarchy)
  • UF13 The system can provide a chat-like interface.

Test of the system against requirements (27.10)

| Requirement | Implemented: Fully (✅) / Partially (🟡) / Not (❌) | Additional Info |
| --- | --- | --- |
| UF1 | ✅ | |
| UF1.1 | ✅ | See picture 1 |
| UF1.2 | ✅ | See picture 1 |
| UF1.3 | ✅ | See picture 1 |
| UF1.4 | ❌ | Was deprioritized in favor of the APC tree |
| UF1.5 | ✅ | See picture 1 |
| UF2 | ✅ | See picture 1 |
| UF3 | ✅ | |
| UF3.1 | ✅ | The system detects the specified tables and extracts all information from them |
| UF3.2 | ❌ | The expected effort was too large; deprioritized |
| UF4.* | ✅ | See picture 2 |
| UF4.6 | ✅ | See picture 3 |
| UF5.* | ✅ | See picture 4 |
| UF18 | ✅ | See picture 5 |
| UF6 | ✅ | |
| UF6.1 | ✅ | See picture 6 |
| UF6.2 | ✅ | See picture 3 |
| UF6.3 | ✅ | See picture 2 |
| UF6.5 | ✅ | See picture 6 |
| UF7.*, UF17.* | ✅ | Separators and levels are retrieved from the config; their position determines their priority |
| UF16.* | ✅ | See picture 7. For the CLI, the JSON settings file must be edited directly |
| UF8.* | ❌ | Was deprioritized in favor of the APC tree |
| UF9.* | 🟡 | The focus was on AML export. Other exports can be added through the plugin mechanism |
| UF10.* | ✅ | See here |
| UF20.* | ✅ | See here |
| UF12 | ✅ | Performed through page setup |
| UF21 | ✅ | Provided on a per-page basis. See picture 8 |
| UC1 | ✅ | Stored as a db file. For the structure, see here |
| UC2 | ✅ | Stored as a db file using a relational db |
| UC3 | ✅ | Ensured that dependencies comply with MIT |
| UC4 | ✅ | Implemented a plugin mechanism |
| UC5 | ✅ | The system is available through the CLI |
| UC6 | ✅ | The system is indeed written in Python |
| UC7 | ✅ | Deployment is available here |
| UC8 | ✅ | The code is indeed written in English |
| UF14, UF15 | 🟡 | Was deprioritized. A plugin mechanism was implemented to support extension |
| UF19.* | ✅ | The system provides CLI and GUI interfaces |
| UF11 | ✅ | A simplified object diagram of the AML tree can be located here |
| UF11.3 | ❌ | After a talk with experts, it was stated that this is impossible with the provided inputs |
| UF13 | ❌ | Did not fit into the effort distribution |

Picture 1

Picture 2

Picture 3

Picture 4

Picture 5

Picture 6

Picture 7

Picture 8