Metadata API - serlo/documentation GitHub Wiki

Metadata API

The Metadata API can be accessed via our GraphQL API.

How to access our GraphQL API

The Serlo GraphQL API is a powerful tool that allows you to fetch data about our educational resources. It is accessible via a POST request to https://api.serlo.org/graphql. The body of the request should contain a GraphQL query as described in the official GraphQL documentation.

To test and refine your queries, you can use our interactive GraphQL playground at https://api.serlo.org/___graphql. This tool provides a user-friendly interface for building queries, exploring the schema, and viewing the returned data.

Understanding the Request Payload and Pagination

The MetadataQuery endpoint is part of the GraphQL API and lets you fetch metadata about our educational resources and about Serlo itself. It provides a few fields for querying data: resources, publisher, and version (the current version of the API).

  query {
    metadata {
      resources(first: 10, instance: "de", modifiedAfter: "2023-05-17T00:00:00Z") {
        nodes
        pageInfo {
          hasNextPage
          endCursor
        }
      }
      publisher
      version
    }
  }

The most important one is resources which serves the actual metadata of our educational content. It is structured as follows:

type Resources {
  first: Int!
  after: String
  instance: Instance
  modifiedAfter: String
}

Here's a detailed explanation of each field:

  • first: This is the number of records you want to fetch in one request. The maximum value for this field is 1000 at the time of writing. This field is crucial for implementing pagination in your queries. By adjusting the first parameter, you can control the number of records fetched in each request.

  • after: This is an optional field. If provided, the API returns records with a higher id than the one specified by the after property. This value is also known as a cursor; in conjunction with first, it lets you implement so-called cursor-based pagination. For example, to fetch the first 10 records, you would set first to 10. To fetch the next 10 records, you would leave first unchanged and set after to the cursor of the last record you fetched (see the endCursor parameter of the pageInfo property).

  • instance: This is also an optional field. If provided, the API will return records for the specified instance. The instance field is of type Instance as defined in our GraphQL schema. Specify one of "de" | "en" | "es" | "ta" | "hi" | "fr".

  • modifiedAfter: This is another optional field. If provided, the API will return records that were modified after the specified date and time. The string must be in the ISO 8601 datetime format YYYY-MM-DDTHH:MM:SSZ.
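The arguments described above can be assembled programmatically. The following is a minimal Python sketch and not part of the Serlo codebase: the helper name build_variables and its validation rules are assumptions made for illustration, based on the limits documented above (a maximum of 1000 for first, the listed instances, and UTC timestamps in the format YYYY-MM-DDTHH:MM:SSZ).

```python
from datetime import datetime, timezone
from typing import Optional

# Instances listed in this documentation at the time of writing.
INSTANCES = {"de", "en", "es", "ta", "hi", "fr"}
MAX_FIRST = 1000  # documented maximum at the time of writing; may change


def build_variables(
    first: int,
    after: Optional[str] = None,
    instance: Optional[str] = None,
    modified_after: Optional[datetime] = None,
) -> dict:
    """Build the variables for metadata.resources (hypothetical helper)."""
    if not 1 <= first <= MAX_FIRST:
        raise ValueError(f"first must be between 1 and {MAX_FIRST}")
    if instance is not None and instance not in INSTANCES:
        raise ValueError(f"unknown instance: {instance!r}")

    variables = {"first": first}
    if after is not None:
        variables["after"] = after
    if instance is not None:
        variables["instance"] = instance
    if modified_after is not None:
        if modified_after.tzinfo is None:
            raise ValueError("modified_after must be timezone-aware")
        # Convert to UTC and format as YYYY-MM-DDTHH:MM:SSZ.
        utc = modified_after.astimezone(timezone.utc)
        variables["modifiedAfter"] = utc.strftime("%Y-%m-%dT%H:%M:%SZ")
    return variables
```

Optional arguments that are not set are omitted from the result, so the dictionary can be passed directly as the "variables" part of the request body.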

The GraphQL API returns a pageInfo object that helps you navigate through the data. It has the following interface:

type HasNextPageInfo {
  hasNextPage: Boolean!
  endCursor: String
}

This object tells you whether there are more records to fetch and provides a cursor to the last fetched record (endCursor). You can use this cursor as the after parameter in your next query to continue fetching records from where you left off. This way, you can efficiently paginate through large sets of data while respecting the limit of first and without overloading the server or the client.
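The pagination loop described above could be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not an official client: the names post_graphql and iter_resources are hypothetical, the standard library is used instead of a third-party HTTP package, and the transport function is passed in as a parameter so the loop can be exercised without network access.

```python
import json
import urllib.request
from typing import Callable, Iterator, Optional

API_URL = "https://api.serlo.org/graphql"

QUERY = """
query($first: Int, $after: String) {
  metadata {
    resources(first: $first, after: $after) {
      nodes
      pageInfo { hasNextPage endCursor }
    }
  }
}
"""


def post_graphql(query: str, variables: dict) -> dict:
    """Send one GraphQL request and return the decoded JSON body."""
    request = urllib.request.Request(
        API_URL,
        data=json.dumps({"query": query, "variables": variables}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def iter_resources(
    first: int = 100,
    post: Callable[[str, dict], dict] = post_graphql,
) -> Iterator[dict]:
    """Yield every record, following endCursor until hasNextPage is false."""
    after: Optional[str] = None
    while True:
        variables = {"first": first}
        if after is not None:
            variables["after"] = after
        body = post(QUERY, variables)
        if body.get("errors"):
            raise RuntimeError(body["errors"])
        page = body["data"]["metadata"]["resources"]
        yield from page["nodes"]
        info = page["pageInfo"]
        # Stop when the API reports no further page (or gives no cursor).
        if not info["hasNextPage"] or info["endCursor"] is None:
            break
        after = info["endCursor"]
```

Because iter_resources is a generator, records can be processed one page at a time, for example with `for record in iter_resources(first=500): ...`, without holding the full result set in memory.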

Querying the API

You can query the API using a GraphQL query. Here's an example of how you can do it using curl:

curl --location 'https://api.serlo.org/graphql' \
--header 'Content-Type: application/json' \
--data '{                                                           '\
'    "query": "query($first: Int, $after: String,                   '\
'                    $instance: Instance, $modifiedAfter: String) { '\
'            metadata {                                             '\
'                resources(first: $first, after: $after,             '\
'                         instance: $instance,                      '\
'                         modifiedAfter: $modifiedAfter) {          '\
'                    nodes                                          '\
'                    pageInfo {                                     '\
'                       hasNextPage                                 '\
'                       endCursor                                   '\
'                    }                                              '\
'                }                                                  '\
'            }                                                      '\
'        }",                                                        '\
'    "variables": {                                                 '\
'        "first": 10,                                               '\
'        "after": "NjI4Mw==",                                       '\
'        "instance": "de",                                          '\
'        "modifiedAfter": "2023-05-17T00:00:00Z"                    '\
'    }                                                              '\
'}'

In this example, we're fetching the first 10 records after the cursor NjI4Mw== for the de instance that were modified after 2023-05-17T00:00:00Z.

Querying in Node.js

Note that the following code example requires a Node.js version of 18 or greater to have the fetch function available in the global scope without third party packages. If you're on a lower Node.js version, consider using node-fetch or similar libraries. Here's how you can query the API:

const gql = require("graphql-tag");
const { print } = require("graphql/language/printer");

const query = gql`
  query ($first: Int, $instance: Instance, $modifiedAfter: String) {
    metadata {
      resources(
        first: $first
        instance: $instance
        modifiedAfter: $modifiedAfter
      ) {
        nodes
      }
      publisher
      version
    }
  }
`;

const variables = {
  first: 10,
  instance: "de",
  modifiedAfter: "2023-05-17T00:00:00Z",
};

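// Top-level await is not available in CommonJS modules (which this example
// is, because it uses require), so the request is wrapped in an async function.
async function main() {
  const response = await fetch("https://api.serlo.org/graphql", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      // print() turns the parsed query document back into a string
      query: print(query),
      variables,
    }),
  });
  const metadata = await response.json();
  console.log(metadata);
}

main().catch(console.error);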

Querying in Python

The following shows an example of what a request in Python could look like:

import requests

payload = {
    "first": 10,
}

response = requests.post(
    "https://api.serlo.org/graphql",
    headers={"Content-Type": "application/json"},
    json={
        "query": """
        query($first: Int) {
            metadata {
                resources(first: $first) {
                    nodes
                }
            }
        }
    """,
        "variables": {
            "first": payload["first"],
        },
    },
)

print(response.json())

Tips for API consumers

  • Start with a small value for first to test your query and then gradually increase it.
  • Use the after field to paginate through the records.
  • Use the instance and modifiedAfter fields to filter the records.
  • Ensure you don't have duplicated records. Use the provided id for each record to identify the resources.
  • Store the date of your last query and have a cron job run every few weeks with modifiedAfter set to that date. That way, you'll only get entities that have changed or were added since your last request.
  • Always check the API response for errors. If there's an error, the API will return an errors field in the response.
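Two of the tips above, checking for errors and avoiding duplicates, can be illustrated with a short Python sketch. The function names extract_nodes and upsert are hypothetical, and the in-memory dictionary stands in for whatever data store you actually use; treat it as one possible approach rather than a recommended implementation.

```python
def extract_nodes(body: dict) -> list:
    """Return the records from a response, raising if the API reported errors."""
    if body.get("errors"):
        # Each GraphQL error is expected to be an object with a "message" field.
        messages = "; ".join(str(e.get("message", e)) for e in body["errors"])
        raise RuntimeError(f"GraphQL request failed: {messages}")
    return body["data"]["metadata"]["resources"]["nodes"]


def upsert(store: dict, nodes: list) -> int:
    """Insert or overwrite records keyed by their id; return how many were new."""
    added = 0
    for node in nodes:
        if node["id"] not in store:
            added += 1
        # Keying by id means a record seen twice replaces itself
        # instead of being stored as a duplicate.
        store[node["id"]] = node
    return added
```

Calling `upsert(store, extract_nodes(response.json()))` after each request keeps exactly one copy of every resource, even if the same record shows up in more than one response.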

Metadata format and response

As we wanted to base the Metadata API on a good standard that can be widely adopted by the education community, we helped draft the "Allgemeines Metadatenprofil für Bildungsressourcen" AMB standard. It translates to "general metadata profile for educational resources" and is based on schema.org and JSON-LD. In summary, it is a metadata standard for learning resources and designed to provide a structured, universal way to describe and categorize learning resources, making them easier to find, parse and use. The following is a brief summary and description of each property in the AMB standard and schema.org that we are returning.

@context This property provides the context for interpreting the JSON-LD document. It includes the language, vocabulary, and definitions for the type and id properties.

id This property is a unique identifier for the learning resource. It is a URL that points to the resource.

type This property specifies the type of the learning resource. It can be an array of types. Its vocabulary is defined by LearningResource and the classes of CreativeWork (https://schema.org/CreativeWork) on schema.org.

creator This property describes the creator(s) of the learning resource. Each creator is an object with properties for id, name, type, and affiliation. The affiliation is another object that describes the organization the creator is affiliated with (containing id, type, and name).

dateCreated This property specifies the date the learning resource was created.

dateModified This property specifies the date the learning resource was last modified.

headline This schema.org property provides a headline/title for the learning resource.

identifier This schema.org property provides an additional identifier for the learning resource. It is an object with properties for type, propertyID, and value.

isAccessibleForFree This property indicates whether the learning resource is accessible for free. For Serlo content, this will always be true!

isFamilyFriendly This property indicates whether the learning resource is family-friendly. For Serlo content, this will also always be true.

inLanguage This property specifies the language(s) of the learning resource as an array.

learningResourceType This property describes the type of learning resource. It is an array of objects, each with an id property that points to a definition of the resource type. The vocabulary for it is defined in the OpenEduHub resource type.

license This property describes the license under which the learning resource is distributed.

mainEntityOfPage This property contains a description of our metadata, with information about the publisher of the metadata (Serlo Education e.V.) and when it was created.

maintainer This property describes the maintainer of the learning resource. In our API, it always has the same structure and content as the affiliation of the creator field seen above.

name This property provides a name for the learning resource.

isPartOf This property describes the larger resource(s) that the learning resource is part of. It is an array of objects, each with an id property that points to the larger resource.

publisher This property describes the publisher(s) of the learning resource. Each publisher is an object with properties for id, type, and name. In our API, it always has the same structure and content as the affiliation of the creator field seen above.

version This property provides a version identifier for the learning resource.

Sample response

The following shows a complete example of what the response to a query for an article on Addition could look like.

{
  "@context": [
    "https://w3id.org/kim/amb/context.jsonld",
    {
      "@language": "de",
      "@vocab": "http://schema.org/",
      "type": "@type",
      "id": "@id"
    }
  ],
  "id": "https://serlo.org/1495",
  "type": [ "LearningResource", "Article" ],
  "creator": [
    {
      "id": "https://serlo.org/324",
      "name": "122d486a",
      "type": "Person",
      "affiliation": {
        "id": "https://serlo.org/#organization",
        "type": "Organization",
        "name": "Serlo Education e.V."
      }
    },
    {
      "id": "https://serlo.org/15491",
      "name": "125f4a84",
      "type": "Person",
      "affiliation": {
        "id": "https://serlo.org/#organization",
        "type": "Organization",
        "name": "Serlo Education e.V."
      }
    },
    {
      "id": "https://serlo.org/22573",
      "name": "12600e93",
      "type": "Person",
      "affiliation": {
        "id": "https://serlo.org/#organization",
        "type": "Organization",
        "name": "Serlo Education e.V."
      }
    },
    {
      "id": "https://serlo.org/1",
      "name": "admin",
      "type": "Person",
      "affiliation": {
        "id": "https://serlo.org/#organization",
        "type": "Organization",
        "name": "Serlo Education e.V."
      }
    },
    {
      "id": "https://serlo.org/6",
      "name": "12297c72",
      "type": "Person",
      "affiliation": {
        "id": "https://serlo.org/#organization",
        "type": "Organization",
        "name": "Serlo Education e.V."
      }
    },
    {
      "id": "https://serlo.org/677",
      "name": "124902c9",
      "type": "Person",
      "affiliation": {
        "id": "https://serlo.org/#organization",
        "type": "Organization",
        "name": "Serlo Education e.V."
      }
    },
    {
      "id": "https://serlo.org/15473",
      "name": "125f3e12",
      "type": "Person",
      "affiliation": {
        "id": "https://serlo.org/#organization",
        "type": "Organization",
        "name": "Serlo Education e.V."
      }
    },
    {
      "id": "https://serlo.org/15478",
      "name": "125f467c",
      "type": "Person",
      "affiliation": {
        "id": "https://serlo.org/#organization",
        "type": "Organization",
        "name": "Serlo Education e.V."
      }
    },
    {
      "id": "https://serlo.org/27689",
      "name": "1268a3e2",
      "type": "Person",
      "affiliation": {
        "id": "https://serlo.org/#organization",
        "type": "Organization",
        "name": "Serlo Education e.V."
      }
    }
  ],
  "dateCreated": "2014-03-01T20:36:44+00:00",
  "dateModified": "2014-10-31T15:56:50+00:00",
  "headline": "Addition",
  "identifier": {
    "type": "PropertyValue",
    "propertyID": "UUID",
    "value": 1495
  },
  "isAccessibleForFree": true,
  "isFamilyFriendly": true,
  "inLanguage": [ "de" ],
  "learningResourceType": [
    { "id": "http://w3id.org/openeduhub/vocabs/learningResourceType/text" },
    { "id": "http://w3id.org/openeduhub/vocabs/learningResourceType/worksheet" },
    { "id": "http://w3id.org/openeduhub/vocabs/learningResourceType/course" },
    { "id": "http://w3id.org/openeduhub/vocabs/learningResourceType/web_page" },
    { "id": "http://w3id.org/openeduhub/vocabs/learningResourceType/wiki" }
  ],
  "license": { "id": "https://creativecommons.org/licenses/by-sa/4.0/" },
  "mainEntityOfPage": [{
    "id": "https://serlo.org/metadata-api",
    "provider": {
       "id": "https://serlo.org/#organization",
       "type": "Organization",
       "name": "Serlo Education e.V."
    }
  }],
  "maintainer": {
    "id": "https://serlo.org/#organization",
    "type": "Organization",
    "name": "Serlo Education e.V."
  },
  "name": "Addition",
  "isPartOf": [
    { "id": "https://serlo.org/1292" },
    { "id": "https://serlo.org/16072" },
    { "id": "https://serlo.org/16174" },
    { "id": "https://serlo.org/33119" },
    { "id": "https://serlo.org/34743" },
    { "id": "https://serlo.org/34744" }
  ],
  "publisher": [
    {
      "id": "https://serlo.org/#organization",
      "type": "Organization",
      "name": "Serlo Education e.V."
    }
  ],
  "version": { "id": "https://serlo.org/32614" }
}
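To show how a record like the one above might be consumed, here is a small Python sketch that flattens the most commonly needed fields. The function name summarize and the choice of output keys are assumptions for illustration. Which properties are always present is not specified here, so the sketch reads most of them defensively with .get.

```python
def summarize(record: dict) -> dict:
    """Flatten selected fields of one metadata record (hypothetical helper)."""
    return {
        "url": record["id"],
        "uuid": record.get("identifier", {}).get("value"),
        "title": record.get("name") or record.get("headline"),
        "types": record.get("type", []),
        "languages": record.get("inLanguage", []),
        "license": record.get("license", {}).get("id"),
        "creators": [creator["name"] for creator in record.get("creator", [])],
        # Keep only the last path segment of each vocabulary URL, e.g. "text".
        "resource_types": [
            entry["id"].rstrip("/").rsplit("/", 1)[-1]
            for entry in record.get("learningResourceType", [])
        ],
        "parents": [parent["id"] for parent in record.get("isPartOf", [])],
    }
```

Applied to the sample response, this would yield a title of "Addition", the language list ["de"], and resource types such as "text" and "worksheet".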

License: CC BY-SA 4.0

The metadata API uses the CC BY-SA 4.0 license. Note that the content itself has another license (which is also CC-BY-SA in most cases) which can be accessed by the license property. This is a human-readable summary of the license:

You are free to

  • Share — copy and redistribute the metadata in any medium or format
  • Adapt — remix, transform, and build upon the metadata for any purpose, even commercially.

The licensor cannot revoke these freedoms as long as you follow the license terms.

Under the following terms

  • Attribution — You must give appropriate credit to Serlo Education e.V., provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
  • ShareAlike — If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original.
  • No additional restrictions — You may not apply legal terms or technological measures that legally restrict others from doing anything the license permits.

For API consumers, this means that you can use and adapt the data from the API for any purpose, including commercial purposes, as long as you provide appropriate credit and distribute your contributions under the same license. You also cannot apply any additional legal or technological restrictions that would prevent others from doing anything the license permits.

Publisher API Documentation

Overview

The Publisher API is a part of the Metadata API that provides information about the publisher of the content. You can use it to retrieve information about Serlo.

Querying the Publisher API

To query the Publisher API endpoint, use the publisher field like this:

query {
  metadata {
    publisher
  }
}

This query will return an object with metadata about us. You can see the example response below.

Response

The response from the Publisher API looks like the following:

{
  "@context": [
    "https://w3id.org/kim/lrmi-profile/draft/context.jsonld",
    { "@language": "de" }
  ],
  "id": "https://serlo.org/",
  "type": ["EducationalOrganization", "NGO"],
  "name": "Serlo Education e.V.",
  "alternateName": "Serlo",
  "url": "https://de.serlo.org/",
  "description": "Serlo.org bietet einfache Erklärungen, Kurse, Lernvideos, Übungen und Musterlösungen mit denen Schüler*innen und Studierende nach ihrem eigenen Bedarf und in ihrem eigenen Tempo lernen können. Die Lernplattform ist komplett kostenlos und werbefrei.",
  "image": "https://assets.serlo.org/5ce4082185f5d_5df93b32a2e2cb8a0363e2e2ab3ce4f79d444d11.jpg",
  "logo": "https://de.serlo.org/_assets/img/serlo-logo.svg",
  "address": {
    "type": "PostalAddress",
    "streetAddress": "Daiserstraße 15 (RGB)",
    "postalCode": "81371",
    "addressLocality": "München",
    "addressRegion": "Bayern",
    "addressCountry": "Germany"
  },
  "email": "[email protected]"
}

Handling Deleted Data

When data is deleted from our API, it can impact the state of your local data store or application. To ensure your application data remains consistent with the API, it's crucial to handle these deletions appropriately.

Refetching and Reconciliation

If data is deleted from the API, the current recommended approach is to refetch all data and reconcile it with your local database. This process involves comparing the newly fetched data with the data in your database and making necessary updates.

Here's a general outline of the steps you'd have to perform:

  1. Fetch All Data: Make a request to the API to fetch all available data. This data represents the current state of the API after the deletions. You can use a transformation to just keep the entity ids in memory and discard all the other data we are serving.

  2. Compare with Local Data: Compare the fetched data with the data in your local database. For each item in your database, check if it exists in the fetched data.

  3. Handle Deletions: If an item in your database does not exist in the fetched data, it means that the item has been deleted in the API. You should then delete this item from your database to keep it in sync with our API.

  4. Update Database: After all deletions have been handled, don't forget to update your database with the newly fetched data or make an extra request utilizing the modifiedAfter parameter. This will ensure that your database has parity with the current state of our API.
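The four steps above amount to a set comparison on entity ids. The following Python sketch illustrates the idea with an in-memory dictionary standing in for your local database; the function name reconcile and the dictionary-based store are assumptions for illustration, so adapt the delete and update calls to your own storage layer.

```python
from typing import Dict, Iterable, Set, Tuple


def reconcile(
    local: Dict[str, dict],
    remote_records: Iterable[dict],
) -> Tuple[Set[str], Set[str]]:
    """Bring `local` in line with a full fetch; return (deleted_ids, added_ids)."""
    # Step 1: index the freshly fetched records by id.
    remote = {record["id"]: record for record in remote_records}

    # Step 2: compare the two sets of ids.
    deleted = set(local) - set(remote)
    added = set(remote) - set(local)

    # Step 3: remove everything that no longer exists in the API.
    for record_id in deleted:
        del local[record_id]

    # Step 4: insert new records and overwrite existing ones.
    local.update(remote)
    return deleted, added
```

If memory is a concern, the comparison in steps 2 and 3 only needs the ids, so the full records can be discarded while fetching and the updates in step 4 can be obtained with a separate modifiedAfter request.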

Future outlook

We are evaluating other alternatives to this somewhat cumbersome process of handling entity deletions. We are considering having a distinct API to fetch individual entities based on their id, or an easy way to list recently deleted entities.

Please get in contact with us and let us know if this is something of interest to you and your team. You can find the email to reach us on the metadata website.

Changelog

Changelog 1.0.0

Breaking Changes

Our Metadata API has undergone significant changes from the old version to version 1.0.0. Here's a summary of these changes:

Changes to the GraphQL payload

The property entities was renamed to resources and is the primary way to fetch metadata from our GraphQL API.

Changes to the @context Property

  • The previous context "https://w3id.org/kim/lrmi-profile/draft/context.jsonld" has been replaced by "https://w3id.org/kim/amb/context.jsonld".
  • Alongside the existing "@language": "de", three additional attributes have been introduced: "@vocab": "http://schema.org/", "type": "@type", and "id": "@id" (as shown in the @context of the sample response above).

Changes to Entity Descriptions

  • The description property is no longer universally available. From now on, it will only be present in entities where a description exists.

Introduction of the creator Property

  • A new creator property has been added, representing an array of objects. Each object in this array corresponds to a different author and includes an id, name, type, and affiliation. The affiliation object always refers to the Serlo organization, represented as follows:

{
  "id": "https://serlo.org/organization",
  "type": "Organization",
  "name": "Serlo Education e.V."
}

Changes to the learningResourceType Property

  • The learningResourceType property, previously a string, is now an array of objects. Each object has an id property. The value of this property maps to a vocabulary term defined in the AMB standard, or more precisely here.

Changes to the maintainer and publisher Properties

  • The maintainer and publisher properties have been expanded from simple strings to objects with id, type, and name fields. Both now link to the Serlo organization.

Changes to the version Property

  • The version property has transitioned from a simple string to an object containing an id field. This change supports serving and versioning the most current revision of an entity.

Other Changes

Introduction of the about Property

  • The new about property is an array of subjects the resource belongs to.

Introduction of the isPartOf Property

  • The new isPartOf property is an array of objects. Each object includes an id property, which is a URL pointing to the taxonomy of the entity.

Introduction of the mainEntityOfPage Property

  • A new mainEntityOfPage property has been added, which includes an array of objects. Each object in the array contains an id and a provider property. The id links to "https://serlo.org/metadata", while the provider links to the Serlo organization.

Changelog 1.1.0

Changelog 1.2.0

  • Add property image to all resources pointing to a thumbnail for the learning resource (based on the subject).
  • Fix a bug that for some CC-BY-SA resources the URL to the original author was returned as the license URL. Now in those cases it correctly returns a URL pointing to the CC license and the URL of the original author was added in the list of the creators.
  • Links in learningResourceType to the deprecated vocabulary https://vocabs.openeduhub.de/w3id.org/openeduhub/vocabs/learningResourceType/index.html have been deleted.

Changelog 1.3.0

  • We have added filters that exclude some content from our metadata API, such as pages from our documentation or articles under construction.

Changelog 2.0.0

Breaking Changes

  • We deleted the edges property in the return type of resources.
  • We renamed EntityMetadataConnection to ResourceMetadataConnection (this is the return type of resources).