Meta Metadata Language for Wrappers - ecologylab/BigSemantics GitHub Wiki
One unique contribution of the Meta-Metadata language is its integration of data model definition, information extraction, presentation, and subsequent semantic actions. Another, for web semantics, is its polymorphic type system and support for inheritance.
Polymorphism in the metadata type system makes it possible to integrate new wrappers into existing systems seamlessly. The repository and the type system help us and interested developers curate wrappers, forming an infrastructural basis for applications that work with heterogeneous and interconnected metadata on the web.
Informative research papers include the 2010 CIKM paper and the 2014 EICS paper.
- Wrappers define the data models for different kinds of metadata, such as books or electronic products. Data models are strongly typed, consist of field declarations, and may be nested, cross-linked, or recursive. Defined data structures (i.e., types) can be reused through inheritance, as in object-oriented programming languages.
See Data Fields for specific information on the types of fields available.
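As a rough analogy only, the typed, nested, and inheritable data models described above can be pictured with Python dataclasses. This is not meta-metadata syntax; the type names (`Metadata`, `Document`, `Book`, `Author`), their fields, and the example URLs are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Metadata:
    """Hypothetical base type: every metadata object has a location (URL)."""
    location: str


@dataclass
class Author(Metadata):
    name: str = ""


@dataclass
class Document(Metadata):
    title: str = ""
    # Recursive, cross-linked field: a document may reference other documents.
    references: List["Document"] = field(default_factory=list)


@dataclass
class Book(Document):
    # Book inherits location, title, and references from Document.
    # Nested field: a list of Author objects.
    authors: List[Author] = field(default_factory=list)
    isbn: Optional[str] = None


# Illustrative values only.
author = Author(location="https://example.org/authors/1", name="Example Author")
book = Book(
    location="https://example.org/books/1",
    title="An Example Book",
    authors=[author],
)
# Polymorphism: a Book can appear wherever a Document is expected.
survey = Document(
    location="https://example.org/docs/1",
    title="An Example Survey",
    references=[book],
)
```

Because `Book` is a subtype of `Document`, code written against `Document` (here, the `references` list) keeps working when a more specific type is introduced later, which is the property the polymorphic type system is meant to provide.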
- Before we can extract metadata, we must first know which meta-metadata wrapper should be used. Selectors allow us to select a meta-metadata wrapper based on the target document's URL.
- For each metadata field, extraction rules can be specified on the corresponding field in the wrapper, to define where to find the relevant information in the source web page, and how to transform it. BigSemantics supports XPath and regular expressions for finding and transforming information.
- Wrapper authors can specify semantic actions, which are performed on the extracted metadata. BigSemantics supports semantic actions such as normalizing the input URL, branching on a condition, looping, and "bridge functions" that connect to applications and execute user-defined tasks.
- Presentation semantics are high-level directives or CSS styles that guide the presentation of extracted metadata. For example, MICE uses presentation semantics to change fonts, re-order or hide fields, and create navigation links when displaying metadata.
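The flow described in the list above (select a wrapper by URL, apply per-field extraction rules, then run semantic actions) can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the BigSemantics API or wrapper syntax: the `Wrapper` and `FieldRule` classes, the regex-based URL selector, the `save_if_priced` bridge function, and the example page and URLs are all hypothetical. The standard library's `xml.etree.ElementTree` supports only a subset of XPath and needs well-formed markup, which is enough for this illustration.

```python
import re
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class FieldRule:
    """Hypothetical extraction rule: an XPath plus an optional regex transform."""
    xpath: str
    pattern: Optional[str] = None   # regex applied to the extracted text
    replacement: str = ""           # replacement for each regex match


@dataclass
class Wrapper:
    """Hypothetical wrapper: selector, field rules, and semantic actions."""
    name: str
    url_pattern: str                # selector: regex matched against the URL
    fields: Dict[str, FieldRule] = field(default_factory=dict)
    actions: List[Callable[[dict], None]] = field(default_factory=list)


def select_wrapper(url: str, wrappers: List[Wrapper]) -> Optional[Wrapper]:
    """Return the first wrapper whose selector matches the URL, if any."""
    for wrapper in wrappers:
        if re.match(wrapper.url_pattern, url):
            return wrapper
    return None


def extract(wrapper: Wrapper, page_source: str) -> dict:
    """Apply each field's rule to the page, then run the semantic actions."""
    root = ET.fromstring(page_source)
    metadata = {}
    for name, rule in wrapper.fields.items():
        node = root.find(rule.xpath)
        if node is None:
            continue  # field not present on this page
        text = "".join(node.itertext()).strip()
        if rule.pattern:
            text = re.sub(rule.pattern, rule.replacement, text)
        metadata[name] = text
    for action in wrapper.actions:
        action(metadata)
    return metadata


collected = []  # stands in for application state


def save_if_priced(metadata: dict) -> None:
    # Branch on a condition, then hand the result to the application,
    # loosely mirroring a "bridge function".
    if "price" in metadata:
        collected.append(metadata)


book_wrapper = Wrapper(
    name="book",
    url_pattern=r"https://books\.example\.org/item/\d+",
    fields={
        "title": FieldRule(xpath=".//h1[@class='title']"),
        # Strip everything except digits and the decimal point.
        "price": FieldRule(xpath=".//span[@id='price']", pattern=r"[^\d.]"),
    },
    actions=[save_if_priced],
)

PAGE = """<html><body>
<h1 class="title">  Dune </h1>
<span id="price">Price: $9.99</span>
</body></html>"""

url = "https://books.example.org/item/42"
wrapper = select_wrapper(url, [book_wrapper])
metadata = extract(wrapper, PAGE)
print(metadata)
```

Keeping the selector, the field rules, and the actions together on one wrapper object mirrors the integration the page describes: a single declaration determines which documents a wrapper applies to, how each field is found and cleaned, and what happens to the result.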