dataset telemark 120 - statnett/Talk2PowerSystem GitHub Wiki

Task: https://github.com/statnett/Talk2PowerSystem_PM/issues/56

  • Statnett to describe queries that can be answered only by Telemark-120
  • Graphwise to analyse and describe (variety of data and volumetrics)
  • Post issues that might need to be fixed, and give some estimates.
  • Statnett/Graphwise to check whether resources correlate between Nordic44 and Telemark-120, because cross-dataset questions may be the most interesting

Intro

Telemark-120 (formerly CIM4NoUtility and DIGIN10 Grunnprofil) is another dataset of the Norwegian (NO) grid. It includes extra kinds of data:

  • Aviation obstacles
  • Geospatial data
  • Timeseries data ("schedules")

TODO: the main problem is where to obtain ontologies for all the novel data in this dataset

Scope

"Map of the territory", taken from FileNameStandard.adoc and complemented with the actual files found (see RDF Files to Load).

"Roles"

  • HV1: High Voltage
  • MV1: Medium Voltage
  • LV1: Low Voltage
  • M1: Manufacture

Usual profiles (also found in ENTSO-E Conformance and Nordic44):

  • DL: Diagram Layout
  • EQ: Equipment Core
  • GL: Geographical Location (but includes GeoJSON)
  • OP: Equipment Operation
  • SSH: Steady State Hypothesis
  • SV: State Variables
  • TP: Topology

Assets (added recently to Nordic44)

  • AC: Asset Catalog (as per manufacturer)
  • AS: Asset information (as per owner)

Less usual profiles

  • CU: Customer
  • SC: Equipment Short Circuit
  • OR: Object Registry
  • AO: Aviation Obstacle
  • Boundary Models:
    • HV1-MV1_BM: High Voltage - Medium Voltage
    • MV1-LV1_BM: Medium Voltage - Low Voltage

Reference Data

  • BaseVoltage_RD: Base Voltage
  • GeographicalRegion_RD: Geographical Region
  • MeasurementValueSource_RD: Measurement Value Source
  • ReadingQualityType_RD: Reading Quality Type
  • ReadingType_RD: Reading Type
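
The naming scheme above can be made concrete with a small parser. This is only a sketch: the pattern `Telemark-120[-<scope>]_<profile>.<ext>` is my assumption, inferred from the file names listed on this page rather than taken from FileNameStandard.adoc, and the function name `parse_name` and its result keys are hypothetical.

```python
import re

# Assumed pattern (inferred from file names on this page, not from the standard):
#   Telemark-120[-<scope>]_<profile>.<ext>
# <scope> is one or two roles (LV1, HV1-MV1) or a reference-data name.
NAME_RE = re.compile(
    r"^Telemark-120(?:-(?P<scope>[A-Za-z0-9-]+))?"
    r"_(?P<profile>[A-Z]+)\.(?P<ext>jsonld|xml)$"
)
ROLES = {"HV1", "MV1", "LV1", "M1"}


def parse_name(filename):
    """Split a file name into roles, reference-data name, profile and extension."""
    m = NAME_RE.match(filename)
    if m is None:
        return None  # not a Telemark-120 model file name
    scope = m.group("scope") or ""
    parts = scope.split("-") if scope else []
    is_roles = bool(parts) and all(p in ROLES for p in parts)
    return {
        "roles": parts if is_roles else [],
        "reference_data": scope if (scope and not is_roles) else None,
        "profile": m.group("profile"),
        "ext": m.group("ext"),
    }


print(parse_name("Telemark-120-HV1-MV1_BM.jsonld"))
```

A boundary model yields two roles, a reference-data file yields its name in `reference_data`, and `Telemark-120_AO.jsonld` yields no role at all.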

Folders and Files

Clone the repo

git clone [email protected]:3lbits/CIM4NoUtility.git
cd CIM4NoUtility

Show all folders and files (without .git folders)

ls -1RF .

Count files

ls -1RF .|grep -c '\*$'
263

Count folders

ls -1RF .|grep -c '/$'
33
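
Note that `ls -F` appends `*` only to executable files, so the file count above depends on the permission bits that the platform reports. As a cross-check, here is a minimal sketch that counts files and folders independently of permissions. The function name `count_tree` is hypothetical, and unlike `ls` it also counts hidden files other than `.git`.

```python
import os


def count_tree(root):
    """Return (n_files, n_dirs) under root, ignoring any .git folders."""
    n_files = 0
    n_dirs = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune .git in place so os.walk does not descend into it
        dirnames[:] = [d for d in dirnames if d != ".git"]
        n_dirs += len(dirnames)
        n_files += len(filenames)
    return n_files, n_dirs


if __name__ == "__main__":
    print(count_tree("."))
```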

List folders

find . -type d
./Code Scripts/Neo4J
./Code Scripts/Python
./Code Scripts/Python/cim-convert-tool
./Code Scripts/Python/cim-convert-tool/archive
./Code Scripts/Python/cim-convert-tool/archive/20230413
./Code Scripts/Python/cim-convert-tool/_config
./Code Scripts/Python/cim-convert-tool/_config/Archive
./Code Scripts/Python/cim-convert-tool/_config/Archive/Previous
./Code Scripts/Python/cim-convert-tool/_data
./Code Scripts/Python/cim-convert-tool/_data/CIMXML
./Code Scripts/Python/cim-convert-tool/_data/CIMXML_Output
./Code Scripts/Python/cim-convert-tool/_data/JSON-LD
./Telemark-120
./Telemark-120/Asset
./Telemark-120/Asset/CIMJSON-LD
./Telemark-120/Asset/CIMJSON-LD/AviationObstacle
./Telemark-120/Asset/CIMJSON-LD/AviationObstacle/jsonSchemas
./Telemark-120/Asset/CIMJSON-LD/AviationObstacle/jsonSchemas/SubSchemas
./Telemark-120/Asset/CIMXML
./Telemark-120/diagrams
./Telemark-120/diagrams/images
./Telemark-120/diagrams/png
./Telemark-120/diagrams/svg
./Telemark-120/docs
./Telemark-120/Grid
./Telemark-120/Grid/CIMJSON-LD
./Telemark-120/Grid/CIMXML
./Telemark-120/Schedule
./Telemark-120/Schedule/2022
./Telemark-120/Schedule/2022/03
./Telemark-120/Schedule/2022/03/30
./Telemark-120/Validation

Docs

Important docs that I list in subjective order of importance.

Modeling

File manipulation

Diagrams

JSON-LD Representation

In addition to CIMXML, this dataset includes JSON-LD. For inspection, a file can be converted to TriG with Apache Jena riot:

cd CIM4NoUtility/Telemark-120/Asset/CIMJSON-LD
riot.bat --formatted trig Telemark-120-LV1_AS.jsonld > Telemark-120-LV1_AS.trig

JSON-LD Pros

  • Uses the latest CIM namespaces (TODO: check the nc-no extensions)
PREFIX cim:     <https://cim.ucaiug.io/ns#>
PREFIX eu:      <https://cim.ucaiug.io/ns/eu#>
  • Uses urn:uuid: for resources, so there is no trouble with under-defined or divergent URLs
  • Uses named graphs for models
  • Uses the latest dcat representation of model metadata.
    • (Doesn't use differential models and the dcat-cim extensions)
  • Uses strict naming conventions for name and description, and has mRID:
<urn:uuid:fc57ebb1-8059-4dd7-9811-2129b912dd45>
  rdf:type                         cim:Terminal;
  cim:IdentifiedObject.name        "TELEMA2 04 T15";
  cim:IdentifiedObject.description "Telemarkstien2 400 Volt Terminal 15";
  cim:IdentifiedObject.mRID        "fc57ebb1-8059-4dd7-9811-2129b912dd45";
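
The `urn:uuid:` and mRID conventions above can be checked mechanically. A minimal sketch, assuming that the subject IRI is always `urn:uuid:` followed by the mRID (which holds for the Terminal shown, but which I have not verified for every resource); the function name `mrid_matches_subject` is hypothetical.

```python
import uuid


def mrid_matches_subject(subject_iri, mrid):
    """True if subject_iri is urn:uuid:<mRID> and mRID is a well-formed UUID."""
    prefix = "urn:uuid:"
    if not subject_iri.startswith(prefix):
        return False
    try:
        parsed = uuid.UUID(mrid)
    except ValueError:
        return False  # mRID is not a UUID at all
    # str(UUID) is the lowercase hyphenated form, so compare case-insensitively
    return str(parsed) == subject_iri[len(prefix):].lower()


print(mrid_matches_subject(
    "urn:uuid:fc57ebb1-8059-4dd7-9811-2129b912dd45",
    "fc57ebb1-8059-4dd7-9811-2129b912dd45",
))
```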

JSON-LD Cons

  • There are still a number of open json issues (see all json issues)
  • The lexical form '2022-10-28T13:37:00Z' is not valid for datatype xsd:date (it is an xsd:dateTime value). Reported as "JSON-LD dct:issued: wrong datatype" #361
  • Dataset metadata (model header) is in the default graph. This differs from the decision in Inst4CIM-KG and Nordic44
  • Uses plain strings for some numeric values, e.g.:
    "cim:Asset.purchasePrice": "500.00",
  • Uses JSON numbers for other numeric values, e.g.:
    "cim:EndDevice.timeZoneOffset": 1.0,

    "cim:Conductor.length":        50.0,
    "cim:ACLineSegment.bch":        0.0,
    "cim:ACLineSegment.gch":        0.0,
    "cim:ACLineSegment.r":          0.015999999945951,
    "cim:ACLineSegment.x":          0.003769911221053,

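Because JSON numbers don't have a strict separation into integer, float and double, the conversion to RDF produces some surprises: values without a fractional part become integers, and the rest become doubles in scientific notation.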

 cim:EndDevice.timeZoneOffset 1; # integer

 cim:Conductor.length        50; # integer
 cim:ACLineSegment.bch        0; # integer
 cim:ACLineSegment.gch        0; # integer
 cim:ACLineSegment.r          1.5999999945951E-2; # double, scientific notation
 cim:ACLineSegment.x          3.769911221053E-3;  # double, scientific notation

But these numbers are comparable and orderable, so that is fine.
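
The surprises above follow from how JSON-LD maps native JSON numbers to RDF literals. As I understand the JSON-LD 1.1 round-tripping rule, a number with no fractional part (and magnitude below 10^21) becomes xsd:integer, and anything else becomes xsd:double in canonical form. Here is a minimal sketch of that rule for finite numbers; the function name `jsonld_number_to_literal` is hypothetical and this is not the code that riot runs.

```python
import json


def jsonld_number_to_literal(value):
    """Map a finite JSON number to (lexical form, datatype), JSON-LD style."""
    if isinstance(value, bool):
        raise TypeError("booleans are not numbers")
    # No fractional part and not huge: serialized as xsd:integer
    if float(value) == int(value) and abs(value) < 1e21:
        return str(int(value)), "xsd:integer"
    # Otherwise xsd:double: format with 15 fraction digits, then drop
    # trailing zeros in the mantissa and padding in the exponent
    mantissa, exponent = ("%1.15E" % value).split("E")
    mantissa = mantissa.rstrip("0")
    if mantissa.endswith("."):
        mantissa += "0"
    return "%sE%d" % (mantissa, int(exponent)), "xsd:double"


doc = json.loads('{"cim:Conductor.length": 50.0, "cim:ACLineSegment.r": 0.015999999945951}')
for key, number in doc.items():
    print(key, jsonld_number_to_literal(number))
```

The plain-string values such as "500.00" are not affected by this rule: without a datatype in the context they stay strings, which is a separate problem for numeric comparison.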

Round-tripping

The _data folder holds test data for checking whether the round trip CIMXML -> JSON-LD -> CIMXML_Output gives correct results. I diffed a few files and found these problems:

  • JSONLD to XML swaps fields scenarioTime and description #358
  • Omitted metadata fields dcat:keyword, dct:conformsTo, md:Model.profile
  • Omitted data fields: eu:IdentifiedObject.shortName, eu:BoundaryPoint.isDirectCurrent, eu:BoundaryPoint.isExcludedFromAreaInterchange
    • And probably other eu, nc properties since _config is hard-coded for CIM17
  • Numbers like 50 are converted to 50.0 (but that's very minor)

XML vs JSON-LD Files

I want to make sure that:

  • All XML files are also available as JSON-LD
  • All round-trip test files (DIGIN10-30) are also available in the official distribution (Telemark-120)
find . -iname 'Telemark-120-*.jsonld' -printf "%f\n" | perl -pe 's{Telemark-120-(.*)\.jsonld}{$1}' | sort > files-Telemark-jsonld.txt
find . -iname 'Telemark-120-*.xml'    -printf "%f\n" | perl -pe 's{Telemark-120-(.*)\.xml}{$1}'    | sort > files-Telemark-xml.txt
find . -iname 'DIGIN10-30-*.jsonld'   -printf "%f\n" | perl -pe 's{DIGIN10-30-(.*)\.jsonld}{$1}'   | sort > files-DIGIN-jsonld.txt
find . -iname 'DIGIN10-30-*.xml'      -printf "%f\n" | perl -pe 's{DIGIN10-30-(.*)\.xml}{$1}'      | sort > files-DIGIN-xml.txt

A comparison of the file lists shows that this is indeed the case. The only differences are:

  • files-DIGIN-xml.txt has the files twice (in folders CIMXML vs CIMXML_Output)
  • Telemark-120_AO.jsonld is not included in the test files
  • The test files are older
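
The comparison of the four lists can also be scripted, which makes missing and duplicated names explicit. A minimal sketch; the helper names `stems` and `compare` are hypothetical, and the paths in the example are illustrative rather than copied from the repo.

```python
from collections import Counter
from pathlib import PurePosixPath


def stems(paths, prefix):
    """Count the name parts that follow prefix, ignoring folder and extension."""
    out = Counter()
    for p in paths:
        name = PurePosixPath(p).name
        if name.startswith(prefix):
            out[PurePosixPath(name).stem[len(prefix):]] += 1
    return out


def compare(left, right):
    """Report names missing on either side and names that occur more than once."""
    return {
        "only_left": sorted(set(left) - set(right)),
        "only_right": sorted(set(right) - set(left)),
        "dup_left": sorted(k for k, n in left.items() if n > 1),
        "dup_right": sorted(k for k, n in right.items() if n > 1),
    }


telemark = stems(
    ["./Grid/CIMJSON-LD/Telemark-120-LV1_EQ.jsonld",
     "./Asset/CIMJSON-LD/AviationObstacle/Telemark-120_AO.jsonld"],
    "Telemark-120",
)
digin = stems(
    ["./_data/CIMXML/DIGIN10-30-LV1_EQ.xml",
     "./_data/CIMXML_Output/DIGIN10-30-LV1_EQ.xml"],
    "DIGIN10-30",
)
print(compare(telemark, digin))
```

With these illustrative inputs the report shows `_AO` only on the Telemark side and `-LV1_EQ` duplicated on the DIGIN side, which is the same kind of difference as listed above.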

RDF Files to Load

Given the checking we did in the previous section, the files to load are:

find . -iname 'Telemark-120*.jsonld'
./Telemark-120/Asset/CIMJSON-LD/AviationObstacle/Telemark-120_AO.jsonld
./Telemark-120/Asset/CIMJSON-LD/Telemark-120-LV1_AS.jsonld
./Telemark-120/Asset/CIMJSON-LD/Telemark-120-LV1_CU.jsonld
./Telemark-120/Asset/CIMJSON-LD/Telemark-120-M1_AC.jsonld
./Telemark-120/Asset/CIMJSON-LD/Telemark-120-MV1_AS.jsonld
./Telemark-120/Asset/CIMJSON-LD/Telemark-120-MV1_CU.jsonld
./Telemark-120/Asset/CIMJSON-LD/Telemark-120-ReadingQualityType_RD.jsonld
./Telemark-120/Asset/CIMJSON-LD/Telemark-120-ReadingType_RD.jsonld
./Telemark-120/Grid/CIMJSON-LD/Telemark-120-BaseVoltage_RD.jsonld
./Telemark-120/Grid/CIMJSON-LD/Telemark-120-GeographicalRegion_RD.jsonld
./Telemark-120/Grid/CIMJSON-LD/Telemark-120-HV1-MV1_BM.jsonld
./Telemark-120/Grid/CIMJSON-LD/Telemark-120-LV1_DL.jsonld
./Telemark-120/Grid/CIMJSON-LD/Telemark-120-LV1_EQ.jsonld
./Telemark-120/Grid/CIMJSON-LD/Telemark-120-LV1_GL.jsonld
./Telemark-120/Grid/CIMJSON-LD/Telemark-120-LV1_OP.jsonld
./Telemark-120/Grid/CIMJSON-LD/Telemark-120-LV1_OR.jsonld
./Telemark-120/Grid/CIMJSON-LD/Telemark-120-LV1_SC.jsonld
./Telemark-120/Grid/CIMJSON-LD/Telemark-120-LV1_SSH.jsonld
./Telemark-120/Grid/CIMJSON-LD/Telemark-120-MeasurementValueSource_RD.jsonld
./Telemark-120/Grid/CIMJSON-LD/Telemark-120-MV1-LV1_BM.jsonld
./Telemark-120/Grid/CIMJSON-LD/Telemark-120-MV1-LV1_SV.jsonld
./Telemark-120/Grid/CIMJSON-LD/Telemark-120-MV1-LV1_TP.jsonld
./Telemark-120/Grid/CIMJSON-LD/Telemark-120-MV1_DL.jsonld
./Telemark-120/Grid/CIMJSON-LD/Telemark-120-MV1_EQ.jsonld
./Telemark-120/Grid/CIMJSON-LD/Telemark-120-MV1_GL.jsonld
./Telemark-120/Grid/CIMJSON-LD/Telemark-120-MV1_OP.jsonld
./Telemark-120/Grid/CIMJSON-LD/Telemark-120-MV1_SC.jsonld
./Telemark-120/Grid/CIMJSON-LD/Telemark-120-MV1_SSH.jsonld

We zip them up to Telemark-120-jsonld.zip as follows:

find . -iname 'Telemark-120*.jsonld' -exec zip -j Telemark-120-jsonld.zip {} \;

We use the -j option to junk (don't record) directory names. This flattens the archive and makes it easier to peruse. Of course, this is a matter of preference.
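
One caveat of flattening: two files with the same base name in different folders would map to the same archive entry. The file list above has no such clash, but a scripted variant can guard against it. A minimal sketch; the function name `zip_flat` is hypothetical.

```python
import os
import zipfile


def zip_flat(paths, archive):
    """Write files to archive under their base names; refuse duplicate names."""
    seen = {}
    for p in paths:
        name = os.path.basename(p)
        if name in seen:
            raise ValueError("%s and %s would collide as %s" % (seen[name], p, name))
        seen[name] = p
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, p in seen.items():
            # arcname drops the folder part, like zip -j
            zf.write(p, arcname=name)
    return sorted(seen)
```

The collision check runs before the archive is opened, so a clash leaves no half-written zip behind.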

Non-RDF Files

Telemark-120 includes some useful data that unfortunately is not in RDF.

Time-series Data

The time-series data ("schedules") is in the folder Telemark-120/Schedule/2022/03/30.

GeoJson Data

This includes geospatial data, but not yet as GeoSPARQL (rather, as GeoJSON).
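
If we decide to bring this data into RDF, one option is to serialize each GeoJSON geometry as WKT for use in a geo:asWKT literal. A minimal sketch covering only Point, LineString and Polygon; the function name `geojson_to_wkt` is hypothetical and the coordinates are illustrative, not taken from the dataset. I assume the GeoJSON uses longitude/latitude order, which is what both GeoJSON and the default CRS of WKT literals in GeoSPARQL expect.

```python
def geojson_to_wkt(geometry):
    """Serialize a GeoJSON Point, LineString or Polygon geometry as WKT text."""
    kind = geometry["type"]
    coords = geometry["coordinates"]

    def pos(c):
        # One position: coordinates separated by spaces
        return " ".join(str(v) for v in c)

    def ring(r):
        # A sequence of positions: comma-separated, in parentheses
        return "(" + ", ".join(pos(c) for c in r) + ")"

    if kind == "Point":
        return "POINT (" + pos(coords) + ")"
    if kind == "LineString":
        return "LINESTRING " + ring(coords)
    if kind == "Polygon":
        return "POLYGON (" + ", ".join(ring(r) for r in coords) + ")"
    raise ValueError("unsupported geometry type: " + kind)


print(geojson_to_wkt({"type": "Point", "coordinates": [9.25, 59.5]}))
```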

Data Statistics

Prefixes

The following prefixes are used in JSON-LD (except nc-no), but we put them in a separate file prefixes.ttl so they are also ingested as repository namespaces.

PREFIX cim:     <https://cim.ucaiug.io/ns#>
PREFIX eu:      <https://cim.ucaiug.io/ns/eu#>
PREFIX nc-no:   <https://cim4.eu/ns/nc-no#>
PREFIX dcat:    <http://www.w3.org/ns/dcat#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX geo:     <http://www.opengis.net/ont/geosparql#>
PREFIX prov:    <http://www.w3.org/ns/prov#>

Note: I posted enhancement request GDB-12329 "ingest JSON-LD prefixes as repo namespaces".
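
Until that is available, the prefix lines can be derived from a JSON-LD file instead of being typed by hand. A minimal sketch, assuming the `@context` is embedded in the file and maps prefixes to namespace IRIs as plain strings (I have not checked that every file is shaped this way); the function name `context_to_prefixes` is hypothetical.

```python
import json


def context_to_prefixes(doc):
    """Return sorted 'PREFIX p: <iri>' lines for simple prefix entries in @context."""
    ctx = doc.get("@context", {})
    contexts = ctx if isinstance(ctx, list) else [ctx]
    lines = []
    for c in contexts:
        if not isinstance(c, dict):
            continue  # remote context URLs are not resolved here
        for term, iri in c.items():
            if term.startswith("@") or not isinstance(iri, str):
                continue  # skip keywords and expanded term definitions
            if iri.endswith(("#", "/", ":")):
                lines.append("PREFIX %s: <%s>" % (term, iri))
    return sorted(lines)


doc = json.loads('{"@context": {"cim": "https://cim.ucaiug.io/ns#", "dcat": "http://www.w3.org/ns/dcat#"}}')
print("\n".join(context_to_prefixes(doc)))
```

Prefixes that never appear in a context, such as nc-no here, still have to be added to prefixes.ttl manually.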

Classes

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
select ?x (count(*) as ?c) {
    [] a ?x
    filter(!strstarts(str(?x),str(rdf:)))
    filter(!strstarts(str(?x),str(rdfs:)))
    filter(!strstarts(str(?x),str(owl:)))
} group by ?x order by ?x

87 classes:

  • Models: 26 dcat:Dataset with 21 dcterms:PeriodOfTime (temporal coverage)
  • GeoSPARQL: 4 geo:Feature with 3 geo:Geometry about AviationObstacles.
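
As a rough cross-check of these counts without a triplestore, the `@type` values can be tallied directly in a JSON-LD file. This is only a sketch: it assumes compacted JSON-LD where every node carries `@type`, it counts per file rather than per repository, and it can differ from the SPARQL result if a node is described more than once. The function name `count_types` is hypothetical.

```python
from collections import Counter


def count_types(node, counts=None):
    """Recursively tally @type values of node objects in parsed JSON-LD."""
    counts = Counter() if counts is None else counts
    if isinstance(node, dict):
        # Value objects ({"@value": ...}) carry a datatype in @type, not a class
        if "@value" not in node:
            types = node.get("@type", [])
            for t in ([types] if isinstance(types, str) else types):
                counts[t] += 1
        for key, value in node.items():
            if key != "@context":  # term definitions are not instance data
                count_types(value, counts)
    elif isinstance(node, list):
        for item in node:
            count_types(item, counts)
    return counts


sample = {"@graph": [
    {"@id": "urn:uuid:fc57ebb1-8059-4dd7-9811-2129b912dd45", "@type": "cim:Terminal"},
]}
print(count_types(sample))
```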

Correlation to Nordic44

Svein: Yes, there is a correlation between mRIDs in Nordic44 and Telemark-120. I have already put out a model that integrates Nordic44 and Telemark-120.
