dblp KG Tutorial - dblp/kg GitHub Wiki
A guided tour of the dblp KG
With this tutorial we would like to give dblp KG beginners a "guided tour" to help them find out what kind of data can be found in the knowledge graph. We encourage you to run the examples below, modify them to your liking on our live query server and try to develop your own queries. We are convinced that the best way to learn how to query a new knowledge graph is to get right into it and play with it.
Be bold! Even if your query is too demanding for our server to compute a result, you can't really break anything.
SPARQL basics
We cannot provide a general tutorial on the SPARQL query language here, but we refer the reader to the following resources to get started:
- https://cambridgesemantics.com/blog/semantic-university/learn-sparql/sparql-by-example/
- https://www.wikidata.org/wiki/Wikidata:SPARQL_tutorial
- https://jena.apache.org/tutorials/sparql.html
Publications
Publications in the dblp KG are realized as entities of class dblp:Publication. These entities carry a number of bibliographic metadata items like dblp:title, dblp:numberOfCreators, dblp:pagination, dblp:yearOfPublication, dblp:bibtexType, or a URL pointing to the paper on the web via dblp:documentPage. Whenever known, publications are also linked with persistently and uniquely identifying IRIs like dblp:doi, dblp:isbn, and dblp:wikidata. For example, if you know the DOI of a paper, you can obtain the statements about it using the following query:
PREFIX dblp: <https://dblp.org/rdf/schema#>
SELECT ?publ ?pred ?object WHERE {
?publ dblp:doi <https://doi.org/10.1007/978-3-540-76298-0_52> .
?publ ?pred ?object .
}
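If you are only interested in a few of the metadata items listed above, you can ask for them by name instead of retrieving every statement. The following is a minimal sketch that uses only the properties mentioned in this section; we assume here that not every record carries every item, so all items except the title are wrapped in OPTIONAL:

```sparql
PREFIX dblp: <https://dblp.org/rdf/schema#>
SELECT ?publ ?title ?year ?pages ?creators ?page WHERE {
  ?publ dblp:doi <https://doi.org/10.1007/978-3-540-76298-0_52> .
  ?publ dblp:title ?title .
  # Assumption: the remaining items may be missing for some records.
  OPTIONAL { ?publ dblp:yearOfPublication ?year . }
  OPTIONAL { ?publ dblp:pagination ?pages . }
  OPTIONAL { ?publ dblp:numberOfCreators ?creators . }
  OPTIONAL { ?publ dblp:documentPage ?page . }
}
```

A publication with several values for one of these properties will show up as several result rows.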
⚠ An important note on DOIs in the dblp KG: dblp is not a DOI registry. That is, we do not have universal knowledge of all DOIs registered at crossref.org, datacite.org, or any other registry. Instead, we only have knowledge of the DOIs that have been communicated as part of the metadata packages from the publishers or otherwise derived from the web. Statements like ?publ dblp:doi ?doi_iri exist only for DOIs that are explicitly stored in the dblp dataset. In the absence of a DOI in dblp, you may of course always use the internal dblp key IRI to identify a publication in dblp:
SELECT ?publ ?pred ?object WHERE {
BIND( <https://dblp.org/rec/conf/semweb/AuerBKLCI07> as ?publ )
?publ ?pred ?object .
}
Each publication is also redundantly an entity of a sub-class of dblp:Publication, according to the type classification scheme used on the dblp website. These sub-classes can be used when a more fine-grained selection of publications is required:
PREFIX dblp: <https://dblp.org/rdf/schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT ?type (COUNT(?type) AS ?count) WHERE {
?subject rdf:type dblp:Publication .
?subject rdf:type ?type .
FILTER( ?type != dblp:Publication )
}
GROUP BY ?type
ORDER BY DESC(?count)
Creators
The authors and editors of publications in the dblp KG are realized as entities of class dblp:Creator. Creator entities carry metadata such as all alias names via dblp:creatorName or point to an academic homepage via dblp:homepage. Whenever known, creators are linked with persistently and uniquely identifying IRIs like dblp:orcid and dblp:wikidata. For example, if you know the ORCID of a creator, you can obtain the statements about that creator using the following query:
PREFIX dblp: <https://dblp.org/rdf/schema#>
SELECT ?pers ?pred ?object WHERE {
?pers dblp:orcid <https://orcid.org/0000-0003-2367-0237> .
?pers ?pred ?object .
}
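To retrieve just the names and links described above rather than all statements, the properties can be requested explicitly. This is a small sketch built only from the properties named in this section; we assume that a homepage or Wikidata link may be missing for a given creator, hence the OPTIONAL clauses:

```sparql
PREFIX dblp: <https://dblp.org/rdf/schema#>
SELECT ?pers ?name ?homepage ?wikidata WHERE {
  ?pers dblp:orcid <https://orcid.org/0000-0003-2367-0237> .
  # One result row per known name of the creator.
  ?pers dblp:creatorName ?name .
  OPTIONAL { ?pers dblp:homepage ?homepage . }
  OPTIONAL { ?pers dblp:wikidata ?wikidata . }
}
```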
⚠ An important note on ORCIDs in the dblp KG: dblp is not an ORCID registry. That is, we do not have universal knowledge of all ORCIDs registered at orcid.org. Instead, for persons in dblp, we only have knowledge of the ORCIDs that have been explicitly researched, verified and manually annotated to the dblp:Creator entity by the dblp editorial team. Hence, statements like ?pers dblp:orcid ?orcid_iri exist only for ORCIDs that are explicitly stored in that person's dblp person record. In the absence of a personalized ORCID entry, you may of course always use the internal dblp PID IRI to identify a creator in dblp:
SELECT ?pers ?pred ?object WHERE {
BIND( <https://dblp.org/pid/b/ChristianBizer> as ?pers )
?pers ?pred ?object .
}
Each creator is also redundantly an entity of a sub-class of dblp:Creator. Most of the creator entities in dblp which model normal (human) authors and editors are entities of sub-class dblp:Person. If it is known that a given creator stands for a group or a consortium, we use sub-class dblp:Group.
Creator entities that are rather unique to dblp's modelling are creators of sub-class dblp:AmbiguousCreator. While the dblp team works continuously to identify and disambiguate the "true authors" behind the plain character strings given in bibliographic metadata, this work often leaves a fair number of disambiguation cases unresolved, as the information at hand does not allow for a reliable decision. In this case, we assign an entity of sub-class dblp:AmbiguousCreator. For all intents and purposes, such entities are used and referenced just like normal unambiguous creator entities, and they are linked using creator predicates in the usual way. However, when retrieved in complex queries, their ambiguous nature should be understood and results should be handled accordingly.
You can get a statistic of the different creator types used in the dblp KG with the following query:
PREFIX dblp: <https://dblp.org/rdf/schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT ?type (COUNT(?type) AS ?count) WHERE {
?subject rdf:type dblp:Creator .
?subject rdf:type ?type .
FILTER( ?type != dblp:Creator )
}
GROUP BY ?type
ORDER BY DESC(?count)
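The same type information can be attached to the creators of a single paper, which makes ambiguous entries easy to spot before they distort an analysis. The sketch below borrows the dblp:createdBy property and the rdfs:label of a creator, both of which are used later in this tutorial:

```sparql
PREFIX dblp: <https://dblp.org/rdf/schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?author ?name ?type WHERE {
  ?publ dblp:doi <https://doi.org/10.1007/978-3-540-76298-0_52> .
  ?publ dblp:createdBy ?author .
  ?author rdfs:label ?name .
  # Keep only the specific sub-class, e.g. dblp:Person or dblp:AmbiguousCreator.
  ?author rdf:type ?type .
  FILTER( ?type != dblp:Creator )
}
```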
Authorship relations
The modelling of the authorship relation between publications and their creators is the backbone of the dblp KG and it is conveniently provided by the dblp:createdBy property. Using the DOI of a paper, you can retrieve its authors using the following query:
PREFIX dblp: <https://dblp.org/rdf/schema#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?author ?name WHERE {
?publ dblp:doi <https://doi.org/10.1007/978-3-540-76298-0_52> .
?publ dblp:createdBy ?author .
?author rdfs:label ?name .
}
Conversely, you can retrieve all publications created by a given creator, identified by, say, their ORCID:
PREFIX dblp: <https://dblp.org/rdf/schema#>
SELECT ?publ ?title ?year WHERE {
?pers dblp:orcid <https://orcid.org/0000-0003-2367-0237> .
?publ dblp:createdBy ?pers .
?publ dblp:title ?title .
?publ dblp:yearOfPublication ?year .
}
ORDER BY DESC(?year)
In dblp, creators may be linked to a publication either in their role as an author, or in their role as an editor. The dblp KG redundantly provides two additional sub-properties of dblp:createdBy in order to distinguish between those two roles, if desired, namely dblp:authoredBy and dblp:editedBy:
PREFIX dblp: <https://dblp.org/rdf/schema#>
SELECT ?publ ?rel ?pers WHERE {
?pers dblp:orcid <https://orcid.org/0000-0003-2367-0237> .
VALUES ?rel { dblp:createdBy dblp:authoredBy dblp:editedBy }
?publ ?rel ?pers .
}
ORDER BY ?publ ?rel
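Because the authorship relation links both ways, it can also be used to find collaborators. As a rough sketch, the following query joins dblp:authoredBy with itself to list the co-authors of a given person together with the number of jointly authored publications; the LIMIT of 20 is an arbitrary choice for illustration:

```sparql
PREFIX dblp: <https://dblp.org/rdf/schema#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?coauthor ?name (COUNT(DISTINCT ?publ) AS ?count) WHERE {
  ?pers dblp:orcid <https://orcid.org/0000-0003-2367-0237> .
  ?publ dblp:authoredBy ?pers .
  ?publ dblp:authoredBy ?coauthor .
  ?coauthor rdfs:label ?name .
  # Exclude the person themselves from their own co-author list.
  FILTER( ?coauthor != ?pers )
}
GROUP BY ?coauthor ?name
ORDER BY DESC(?count)
LIMIT 20
```

Keep in mind that co-authors of sub-class dblp:AmbiguousCreator may stand for more than one real person.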
Signatures
The dblp:createdBy property and its sub-properties are intended for convenient use if only the simple authorship relation between publications and creators is relevant. However, in many cases, further metadata about that authorship relation might be required. To this end, the dblp KG contains dblp:Signature entities to provide more context to this otherwise simple link. Signatures are linked to publications using the dblp:hasSignature property. To distinguish between the roles of an editor and an author, the two signature sub-classes dblp:AuthorSignature and dblp:EditorSignature are used:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dblp: <https://dblp.org/rdf/schema#>
SELECT ?publ ?pers ?type WHERE {
?pers dblp:orcid <https://orcid.org/0000-0003-2367-0237> .
?sig dblp:signatureCreator ?pers .
?sig dblp:signaturePublication ?publ .
?sig rdf:type ?type .
}
ORDER BY ?publ ?type
Besides linking publications and their creators, signature entities may link to an ORCID IRI that has been stated in the publication's metadata using dblp:signatureOrcid, and they provide the relative position of a publication's creator in the complete creator list using dblp:signatureOrdinal:
PREFIX dblp: <https://dblp.org/rdf/schema#>
SELECT ?nr ?name ?orcid WHERE {
?publ dblp:doi <https://doi.org/10.1007/978-3-540-76298-0_52> .
?publ dblp:hasSignature ?sig .
?sig dblp:signatureOrdinal ?nr .
?sig dblp:signatureDblpName ?name .
OPTIONAL { ?sig dblp:signatureOrcid ?orcid . }
}
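Signatures can also be approached from the creator's side. As an illustrative sketch, the following query lists, for each publication of a person, the position of that person in the creator list next to the total number of creators. It combines the signature properties shown above with dblp:numberOfCreators from the Publications section; we assume the latter may be absent for some records and therefore make it optional:

```sparql
PREFIX dblp: <https://dblp.org/rdf/schema#>
SELECT ?publ ?nr ?total WHERE {
  ?pers dblp:orcid <https://orcid.org/0000-0003-2367-0237> .
  ?sig dblp:signatureCreator ?pers .
  ?sig dblp:signaturePublication ?publ .
  ?sig dblp:signatureOrdinal ?nr .
  # Assumption: the creator count may be missing, so it is optional.
  OPTIONAL { ?publ dblp:numberOfCreators ?total . }
}
ORDER BY ?publ
```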
In future iterations of the dblp KG, we aim to provide additional context via the signature entities, such as the affiliation information provided in the publication.
Citations
We do not collect and store citation information ourselves within the dblp computer science bibliography. Instead, we make use of the open citation data released by the marvelous folks at OpenCitations. Please be aware that while OpenCitations may be the best open collection of citation data available, it is still incomplete and numerous citation links may not be openly available. However, the corpus is making steady progress to match the data provided by commercial services.
The OpenCitations data model uses cito:Citation entities from the Citation Typing Ontology (CiTO) to represent a citation link. Each citation entity links both a citing and a cited document using cito:hasCitingEntity and cito:hasCitedEntity. Those documents are identified by their OpenCitations Meta Identifier (OMID). A citation may also optionally provide a date for the citation (cito:hasCitationCreationDate) and the time span that has passed between the publication of the cited document and its citation (cito:hasCitationTimeSpan). For example, the following query retrieves the data for a single citation link from a known citation IRI:
PREFIX cito: <http://purl.org/spar/cito/>
SELECT ?citation ?citing_omid ?cited_omid ?citation_date ?citation_timespan WHERE {
BIND(<https://w3id.org/oc/index/ci/06503267503-06703780559> as ?citation)
?citation cito:hasCitingEntity ?citing_omid .
?citation cito:hasCitedEntity ?cited_omid .
OPTIONAL { ?citation cito:hasCitationCreationDate ?citation_date . }
OPTIONAL { ?citation cito:hasCitationTimeSpan ?citation_timespan . }
}
To allow for combined queries of dblp metadata and OpenCitations citation data, we link dblp publications to their OMID. You can obtain a full mapping from our SPARQL query service by means of the dblp:omid property:
PREFIX dblp: <https://dblp.org/rdf/schema#>
PREFIX schema: <https://schema.org/>
SELECT * WHERE {
?dblp dblp:omid ?omid .
OPTIONAL { ?omid schema:url ?url . }
}
Please note that not all citing or cited papers (identified by an OMID) necessarily need to be indexed in dblp, as dblp remains solely focused on core computer science publications. Whenever possible, however, we try to specify an outgoing URL to the document using the schema:url property.
Using the linkage provided by OMID, you may perform citation analyses using our SPARQL query service. For example, if you know the DOI of a paper, you can request all papers citing it using the following query:
PREFIX dblp: <https://dblp.org/rdf/schema#>
PREFIX cito: <http://purl.org/spar/cito/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX schema: <https://schema.org/>
SELECT ?citing_omid ?citing_publ ?citing_label ?citing_url WHERE {
?this_publ dblp:doi <https://doi.org/10.1007/978-3-540-76298-0_52> .
?this_publ dblp:omid ?this_omid .
?cite cito:hasCitedEntity ?this_omid .
?cite cito:hasCitingEntity ?citing_omid .
OPTIONAL {
?citing_publ dblp:omid ?citing_omid .
?citing_publ rdfs:label ?citing_label .
}
OPTIONAL { ?citing_omid schema:url ?citing_url . }
}
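The opposite direction works the same way: swapping the roles of cito:hasCitingEntity and cito:hasCitedEntity yields the references of a paper instead of its citations. The following sketch mirrors the query above; as before, cited documents that are not indexed in dblp will appear with their OMID only:

```sparql
PREFIX dblp: <https://dblp.org/rdf/schema#>
PREFIX cito: <http://purl.org/spar/cito/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?cited_omid ?cited_publ ?cited_label WHERE {
  ?this_publ dblp:doi <https://doi.org/10.1007/978-3-540-76298-0_52> .
  ?this_publ dblp:omid ?this_omid .
  # This paper is now the citing entity.
  ?cite cito:hasCitingEntity ?this_omid .
  ?cite cito:hasCitedEntity ?cited_omid .
  OPTIONAL {
    ?cited_publ dblp:omid ?cited_omid .
    ?cited_publ rdfs:label ?cited_label .
  }
}
```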
Streams
In dblp, we use the term "stream" to refer to any journal, conference series, book series, or repository that acts as a regular source for publications. Such streams are modeled in the dblp KG as entities of class dblp:Stream. Streams carry metadata like alias and former titles using dblp:streamTitle, or point to a stream's homepage via dblp:webpage. Stream entities are linked with persistently and uniquely identifying IRIs like dblp:issn and dblp:wikidata. For example, if you know the Wikidata QID of a stream, you can obtain its statements using the following query:
PREFIX dblp: <https://dblp.org/rdf/schema#>
SELECT ?stream ?pred ?object WHERE {
?stream dblp:wikidata <http://www.wikidata.org/entity/Q6053150> .
?stream ?pred ?object .
}
Each stream is also redundantly an entity of a sub-class of dblp:Stream. Specifically, dblp:Journal models periodically published journals, dblp:Conference models conference or workshop series, and dblp:Series models series of published volumes like monograph series and proceedings series. Only very recently, we expanded the dblp data model to also include a fourth sub-class, dblp:Repository, for sources of research data and artifacts. These sub-classes can be used when a selection of, say, "only journals" or "only conferences" is desired:
PREFIX dblp: <https://dblp.org/rdf/schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT ?type (COUNT(?type) AS ?count) WHERE {
?subject rdf:type dblp:Stream .
?subject rdf:type ?type .
FILTER( ?type != dblp:Stream )
}
GROUP BY ?type
ORDER BY DESC(?count)
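To actually select "only journals", the sub-class can be used directly as the type of the stream. The following is a minimal sketch; since dblp:streamTitle also carries alias and former titles, a journal may appear in more than one row, and the LIMIT of 25 is only there to keep the example result small:

```sparql
PREFIX dblp: <https://dblp.org/rdf/schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT ?stream ?title WHERE {
  # Replace dblp:Journal with dblp:Conference, dblp:Series, or dblp:Repository as needed.
  ?stream rdf:type dblp:Journal .
  ?stream dblp:streamTitle ?title .
}
ORDER BY ?title
LIMIT 25
```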
Publications are linked to the streams they appeared in using the property dblp:publishedInStream. A single publication might be linked to multiple streams in that way. For example, a conference paper might be linked to both the stream of its conference event series as well as the stream of the book series that publishes the conference proceedings. As another example, a paper might be published in a joint proceedings volume of two or more conference series and, hence, be linked to two conference streams, as the following query shows:
PREFIX dblp: <https://dblp.org/rdf/schema#>
SELECT ?publ ?stream WHERE {
?publ dblp:doi <https://doi.org/10.1007/978-3-540-76298-0_52> .
?publ dblp:publishedInStream ?stream .
}
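The same property can be followed from the stream to its publications. As a sketch, the following query counts the publications per year for the stream identified by the Wikidata IRI used earlier in this section, combining dblp:publishedInStream with dblp:yearOfPublication:

```sparql
PREFIX dblp: <https://dblp.org/rdf/schema#>
SELECT ?year (COUNT(DISTINCT ?publ) AS ?count) WHERE {
  ?stream dblp:wikidata <http://www.wikidata.org/entity/Q6053150> .
  ?publ dblp:publishedInStream ?stream .
  ?publ dblp:yearOfPublication ?year .
}
GROUP BY ?year
ORDER BY ?year
```

Publications without a recorded year are not counted by this query.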
Stream relations
We aim to model relationships between streams using the dblp:relatedStream property. For example, the following query lists all streams related to a given stream:
PREFIX dblp: <https://dblp.org/rdf/schema#>
SELECT ?stream ?other_stream WHERE {
?stream dblp:wikidata <http://www.wikidata.org/entity/Q6053150> .
?stream dblp:relatedStream ?other_stream .
}
Again, we redundantly use sub-properties of dblp:relatedStream to model hierarchical relations (dblp:subStream and dblp:superStream) in cases of streams that take place or are published as part of another stream, and temporal relations (dblp:predecessorStream and dblp:successorStream) in cases where streams merge with or are replaced by another stream:
PREFIX dblp: <https://dblp.org/rdf/schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT ?relation (COUNT(?relation) AS ?count) WHERE {
VALUES ?relation { dblp:subStream dblp:superStream dblp:predecessorStream dblp:successorStream } .
?subject rdf:type dblp:Stream .
?subject ?relation ?object .
}
GROUP BY ?relation
ORDER BY DESC(?count)
Note that if the relation between two streams is unspecified, then only the dblp:relatedStream property is used.
Identifiers
We link the entities in dblp to a wide array of external identifiers. While, for convenience, a small selection of central, persistent IDs (like dblp:doi, dblp:orcid, or dblp:wikidata) have their own dedicated property in the dblp KG, we reuse the more flexible datacite:Identifier entities from the DataCite Ontology in order to model arbitrary external identifiers like Google Scholar user IDs, Twitter handles, or GND identifiers. We currently list more than 30 identifier schemes in dblp:
PREFIX datacite: <http://purl.org/spar/datacite/>
SELECT ?scheme (COUNT(DISTINCT ?id) as ?count) WHERE {
?item datacite:hasIdentifier ?id .
?id datacite:usesIdentifierScheme ?scheme .
FILTER (?scheme != datacite:dblp-record && ?scheme != datacite:dblp)
}
GROUP BY ?scheme
ORDER BY DESC(?count)
Please be aware that, unlike the convenience properties like dblp:doi or dblp:orcid, which directly link to IRIs representing their ID, the datacite:Identifier entities link to string literals that specify the identifier:
PREFIX dblp: <https://dblp.org/rdf/schema#>
PREFIX datacite: <http://purl.org/spar/datacite/>
PREFIX litre: <http://purl.org/spar/literal/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT ?id_type ?id_scheme ?id_value WHERE {
?pers dblp:orcid <https://orcid.org/0000-0003-2367-0237> .
?pers datacite:hasIdentifier ?id .
?id rdf:type ?id_type .
?id datacite:usesIdentifierScheme ?id_scheme .
?id litre:hasLiteralValue ?id_value .
}