Knowledge Graph Pattern - OSLC/lifecycle-integration-patterns GitHub Wiki

PatternID: 100.004

Name: Knowledge Graph Pattern

Category: Extract-Transform-Load Scenario

Creation Date: June 5, 2018

Creators: Axel Reichwein

Description:

When designing complex systems, engineering organizations use many different software applications to describe different aspects of a system, such as requirements, test cases, architecture models, simulation models, 3D geometric models, etc. As each software application uses its own data formats and APIs, engineering organizations end up with engineering data in many different formats. This data heterogeneity prevents organizations from efficiently analyzing all their data as a whole.

Engineering organizations currently have silos of data: one for requirements, one for software, one for simulation models and simulation results, one for 3D models, etc. Data integration solutions exist but are limited to specific engineering disciplines, such as Application Lifecycle Management (ALM), Product Lifecycle Management (PLM), and Simulation and Process Data Management (SPDM).

Many engineering activities involve cross-cutting concerns (requirements traceability, reuse, change management, project management, risk analysis, trade-off studies). According to David Meza, Head of Knowledge Management at NASA (https://www.youtube.com/watch?v=QEBVoultYJg), the combination of data silos and the need to address cross-cutting concerns leads to

  • engineers having to look at 13 different sources to find the information they need
  • organizations spending 30% of their R&D effort redoing work that has already been done
  • 54% of decisions being made with inconsistent, incomplete, or inadequate information

Engineering organizations would therefore like to treat all their data as a whole and run queries against all of it. Queries at a global level can help organizations to:

  • check what data is related to a change order or a change request
  • perform traceability studies to see how a requirement has been verified
  • perform impact analysis to identify elements which would be impacted by a change
  • verify if design rules are satisfied (e.g. correct usage of units, compatibility of interfaces)
  • generate automated reports containing accurate and complete information

As engineering data can be structured in many different ways (e.g. tabular, relational, document-based, object-oriented), a common abstract data structure is needed which can represent all the specific data structures without information loss. The graph is such a generic data structure. Due to its flexible nature, many tech organizations expose their data in the form of graphs; examples include the Google Knowledge Graph, the Facebook Graph API, the LinkedIn Economic Graph, and Microsoft Graph. When data in the form of a graph is queryable, it is often called a knowledge graph.
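To make the "no information loss" point concrete, here is a minimal sketch, in plain Python, of how both tabular and document-shaped tool data map onto the same (subject, predicate, object) triple structure that RDF uses. All identifiers and field names (`req-001`, `tc-042`, etc.) are hypothetical.

```python
# Tabular data: a row from a requirements spreadsheet (hypothetical fields).
requirement_row = {"id": "req-001", "title": "Max payload 500 kg", "status": "approved"}

# Document data: a test case from a test-management tool (hypothetical fields).
test_case_doc = {"id": "tc-042", "verifies": "req-001"}

# Both map onto the same triple structure without information loss:
triples = set()
for record in (requirement_row, test_case_doc):
    for key, value in record.items():
        if key != "id":
            triples.add((record["id"], key, value))

print(sorted(triples))
```

Because every record, whatever its original shape, becomes a set of triples, data from heterogeneous tools can be merged into one graph simply by taking the union of their triple sets.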

Engineering organizations need a knowledge graph to address cross-cutting concerns more efficiently, as these are at the core of most engineering activities. This is especially important for organizations designing safety-critical systems, such as those in the aerospace and autonomous vehicle industries, in which system failure can be fatal. The process of retrieving data from local data silos, converting it into a graph format such as RDF, and loading it into a knowledge graph is called ETL (Extract-Transform-Load). The same ETL process is also used to populate data warehouses and data lakes. The ETL effort is relatively simple when data is available in a common format like CSV. It can be huge, or impossible to realize, when data is available in many different formats, as is the case with engineering data. Implementing the data extraction and transformation solutions is the most time-consuming activity in setting up a knowledge graph.

The only hope for reducing the ETL effort with engineering data is standardization. Fortunately, OSLC provides a standard for exposing data in a graph format, namely RDF (Resource Description Framework). RDF is a widely adopted graph data format supported by graph database vendors such as Amazon Neptune. RDF is also suitable for semantic reasoning (e.g. automatic consistency checking, classification) based on first-order logic. Many knowledge graphs support RDF and its query language SPARQL.
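SPARQL answers questions like the traceability query above ("how has a requirement been verified?") by matching triple patterns against the graph. The sketch below emulates a two-pattern SPARQL query in plain Python over a toy triple set; in a real knowledge graph, the SPARQL shown in the comment would be run against an RDF store. All resource names and properties are hypothetical.

```python
# Toy triple set (subject, predicate, object); names are hypothetical.
triples = {
    ("req-001", "verifiedBy", "tc-042"),
    ("req-002", "verifiedBy", "tc-043"),
    ("tc-042", "status", "passed"),
    ("tc-043", "status", "failed"),
}

# Roughly equivalent SPARQL against an RDF store (prefixes omitted):
#   SELECT ?req WHERE {
#       ?req :verifiedBy ?tc .
#       ?tc  :status     "passed" .
#   }
def verified_requirements(triples):
    """Requirements whose verifying test case has passed."""
    return sorted(
        req
        for (req, p1, tc) in triples if p1 == "verifiedBy"
        for (s2, p2, o2) in triples
        if s2 == tc and p2 == "status" and o2 == "passed"
    )

print(verified_requirements(triples))  # → ['req-001']
```

The key point is that the query joins data that may have originated in two different tools (a requirements tool and a test-management tool), which is only possible once both are in the same graph.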

An OSLC API reads in data in a specific format, converts it into RDF, and exposes it through a REST API. OSLC APIs have been developed, among others, for software applications covering requirements, test cases, change management, simulation models, and architecture models. At least 30 OSLC APIs for 30 different engineering data formats are available commercially or as open source. This is obviously just a first step, as there are over 500 different data formats used in engineering. Nevertheless, these 30 existing OSLC APIs already perform the data extraction and transformation effort (the first two parts of an ETL workflow) for 30 different data formats. By relying on OSLC APIs, engineering organizations no longer have to implement their own data extraction and transformation solutions; instead, they can use the data exposed by OSLC APIs to populate a knowledge graph with heterogeneous data.
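The division of labor can be sketched as follows: the Extract and Transform steps are performed inside each OSLC API, leaving only the Load step (merging the per-tool RDF into one graph) to the organization. The endpoint URLs and the `fetch_rdf` stub below are hypothetical; a real client would issue an HTTP GET with an `Accept` header for an RDF serialization (e.g. `text/turtle`) and parse the response with an RDF library.

```python
# Hypothetical OSLC endpoint URLs for two engineering tools.
OSLC_ENDPOINTS = [
    "https://rm.example.com/oslc/requirements",  # requirements tool
    "https://qm.example.com/oslc/testcases",     # test-management tool
]

def fetch_rdf(endpoint):
    """Stand-in for an HTTP GET returning parsed RDF triples.
    The OSLC API behind the endpoint has already done Extract + Transform."""
    canned = {
        OSLC_ENDPOINTS[0]: {("req-001", "dcterms:title", "Max payload 500 kg")},
        OSLC_ENDPOINTS[1]: {("tc-042", "oslc_qm:validatesRequirement", "req-001")},
    }
    return canned[endpoint]

def load_knowledge_graph(endpoints):
    """Load step: merge the per-tool graphs into one queryable graph."""
    graph = set()
    for endpoint in endpoints:
        graph |= fetch_rdf(endpoint)
    return graph

graph = load_knowledge_graph(OSLC_ENDPOINTS)
print(len(graph))  # → 2
```

Because every OSLC API emits the same triple-shaped output, the Load step is a plain union, regardless of how many tools or native formats are behind the endpoints.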

Furthermore, OSLC APIs support important concepts for syncing the knowledge graph with local data repositories. OSLC APIs expose changes to data sets (e.g. via TRS, the Tracked Resource Set protocol), metadata, and local versions of data in a standard way. The TRS protocol, for example, enables the knowledge graph to be updated incrementally based on changes in local repositories.
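The incremental update driven by a TRS change log can be sketched as follows. The event type names (Creation, Modification, Deletion) and ordered processing follow TRS; the event dicts and the `refetch` stub are simplified, hypothetical stand-ins for the actual RDF change-log resources a TRS provider exposes.

```python
def refetch(uri):
    """Stand-in for re-fetching the current RDF of a changed resource."""
    return {"uri": uri, "triples": {(uri, "dcterms:modified", "2018-06-05")}}

def apply_change_log(graph_index, events):
    """Apply TRS-style change events, oldest first, to a local resource index."""
    for event in sorted(events, key=lambda e: e["order"]):
        if event["type"] in ("Creation", "Modification"):
            graph_index[event["uri"]] = refetch(event["uri"])
        elif event["type"] == "Deletion":
            graph_index.pop(event["uri"], None)
    return graph_index

index = {"https://rm.example.com/req-001": {"uri": "https://rm.example.com/req-001",
                                            "triples": set()}}
events = [
    {"order": 2, "type": "Deletion", "uri": "https://rm.example.com/req-001"},
    {"order": 1, "type": "Creation", "uri": "https://rm.example.com/req-002"},
]
index = apply_change_log(index, events)
print(sorted(index))  # → ['https://rm.example.com/req-002']
```

The design point is that only changed resources are re-fetched, so the knowledge graph stays in sync without repeating the full ETL run after every change in a local repository.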

The value of a knowledge graph in an engineering organization depends on the amount of engineering data the knowledge graph can consume. With increasing adoption of OSLC, more data formats will be available in RDF and more engineering data will be consumable by the knowledge graph, leading to a more informative knowledge graph.

Stakeholder Roles: Engineers, Project Managers, Business Analysts