notes on tika reply to metadata - krickert/search-api GitHub Wiki
```java
import org.apache.tika.metadata.Metadata;
import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.databind.ObjectMapper;

import java.util.Arrays;
import java.util.Map;
import java.util.stream.Collectors;

public class MetadataToJsonConverter {
    private static final ObjectMapper MAPPER = new ObjectMapper();

    // Builds a name -> value map; Metadata.get returns only the first value of a multi-valued name.
    public static String convertMetadataToJson(Metadata metadata) {
        Map<String, String> map = Arrays.stream(metadata.names())
                .collect(Collectors.toMap(name -> name, metadata::get));
        try {
            return MAPPER.writeValueAsString(map);
        } catch (JsonProcessingException e) {
            e.printStackTrace();
            return "{}";
        }
    }

    public static void main(String[] args) {
        var metadata = new Metadata();
        metadata.set("Author", "John Doe");
        metadata.set("Title", "Sample Document");
        System.out.println("Metadata in JSON: " + convertMetadataToJson(metadata));
    }
}
```
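One limitation of the map-based version: `Metadata.get(name)` returns only the first value stored under a name, so multi-valued entries (for example, several keywords) are silently truncated. The sketch below is one way to keep every value, assuming it is acceptable for each JSON field to become an array of strings. It uses `Metadata.getValues`, sorts keys with a `TreeMap` so the output is stable, and rethrows serialization failures instead of returning `"{}"`. The class name `MultiValueMetadataJson` and the sample `Keyword` entries are illustrative only.

```java
import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.tika.metadata.Metadata;

import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper class; not part of Tika or Jackson.
public class MultiValueMetadataJson {
    private static final ObjectMapper MAPPER = new ObjectMapper();

    /** Maps every metadata name to all of its values, with keys sorted for stable output. */
    public static String toJson(Metadata metadata) {
        Map<String, List<String>> map = new TreeMap<>();
        for (String name : metadata.names()) {
            // getValues returns every value stored under the name, in the order they were added.
            map.put(name, Arrays.asList(metadata.getValues(name)));
        }
        try {
            return MAPPER.writeValueAsString(map);
        } catch (JsonProcessingException e) {
            // Rethrow rather than hiding the failure behind an empty "{}".
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Metadata metadata = new Metadata();
        metadata.set("Author", "John Doe");
        metadata.add("Keyword", "tika");
        metadata.add("Keyword", "jackson");
        // Prints {"Author":["John Doe"],"Keyword":["tika","jackson"]}
        System.out.println(toJson(metadata));
    }
}
```

Wrapping every value in an array keeps the JSON shape uniform, at the cost of turning single values such as `Author` into one-element arrays; a consumer that prefers scalars could unwrap arrays of length one.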
We can improve this:
Below is a fuller example. It defines a simple `Document` bean with two members, a `String` for the body and a Tika `Metadata` object, and then provides two functions: one that converts the `Document` to a Jackson JSON object (an `ObjectNode`), and a helper that returns that JSON as a `String`.
```java
import org.apache.tika.metadata.Metadata;
import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.node.ObjectNode;

import java.util.Arrays;

public class DocumentJsonConverter {
    private static final ObjectMapper MAPPER = new ObjectMapper();

    /**
     * Converts a Document into a Jackson ObjectNode.
     * The resulting JSON object has two keys: "body" and "metadata".
     */
    public static ObjectNode convertDocumentToJson(Document doc) {
        ObjectNode root = MAPPER.createObjectNode();
        root.put("body", doc.getBody());
        ObjectNode metaNode = MAPPER.createObjectNode();
        Metadata metadata = doc.getMetadata();
        // Convert each metadata entry into a JSON field.
        // Metadata.get returns only the first value of a multi-valued name.
        Arrays.stream(metadata.names())
                .forEach(name -> metaNode.put(name, metadata.get(name)));
        root.set("metadata", metaNode);
        return root;
    }

    /**
     * Helper method that converts a Document into a JSON string.
     */
    public static String documentToJsonString(Document doc) {
        try {
            ObjectNode json = convertDocumentToJson(doc);
            return MAPPER.writeValueAsString(json);
        } catch (JsonProcessingException e) {
            e.printStackTrace();
            return "{}";
        }
    }

    // Example usage.
    public static void main(String[] args) {
        Metadata metadata = new Metadata();
        metadata.set("Author", "John Doe");
        metadata.set("Title", "Sample Document");
        Document doc = new Document("This is the document body.", metadata);

        // Convert to JSON object and print it.
        ObjectNode jsonObject = convertDocumentToJson(doc);
        System.out.println("JSON Object: " + jsonObject);

        // Convert to JSON string and print it.
        String jsonString = documentToJsonString(doc);
        System.out.println("JSON String: " + jsonString);
    }
}

/**
 * A simple bean representing a document with a text body and Tika metadata.
 */
class Document {
    private String body;
    private Metadata metadata;

    public Document(String body, Metadata metadata) {
        this.body = body;
        this.metadata = metadata;
    }

    public String getBody() {
        return body;
    }

    public Metadata getMetadata() {
        return metadata;
    }

    public void setBody(String body) {
        this.body = body;
    }

    public void setMetadata(Metadata metadata) {
        this.metadata = metadata;
    }
}
```
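If the JSON later needs to be turned back into Tika metadata, the tree model can be walked in the other direction. The following is a minimal sketch, assuming the `"metadata"` object holds either plain strings (as `convertDocumentToJson` produces) or arrays of strings. `MetadataFromJson` and `metadataFromJson` are hypothetical names, not part of Tika or Jackson.

```java
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.tika.metadata.Metadata;

import java.util.Iterator;
import java.util.Map;

// Hypothetical helper class; assumes a flat JSON object of string or string-array fields.
public class MetadataFromJson {
    private static final ObjectMapper MAPPER = new ObjectMapper();

    /** Rebuilds Tika Metadata from a flat JSON object such as the "metadata" node. */
    public static Metadata metadataFromJson(JsonNode metaNode) {
        Metadata metadata = new Metadata();
        Iterator<Map.Entry<String, JsonNode>> fields = metaNode.fields();
        while (fields.hasNext()) {
            Map.Entry<String, JsonNode> field = fields.next();
            String name = field.getKey();
            JsonNode value = field.getValue();
            if (value.isArray()) {
                // Each array element becomes one value of a multi-valued entry.
                for (JsonNode element : value) {
                    metadata.add(name, element.asText());
                }
            } else if (!value.isNull()) {
                // Other values are stored via asText(); this sketch assumes they are scalars. Nulls are skipped.
                metadata.set(name, value.asText());
            }
        }
        return metadata;
    }

    public static void main(String[] args) throws Exception {
        JsonNode root = MAPPER.readTree(
                "{\"body\":\"This is the document body.\","
                        + "\"metadata\":{\"Author\":\"John Doe\",\"Title\":\"Sample Document\"}}");
        Metadata metadata = metadataFromJson(root.get("metadata"));
        System.out.println(metadata.get("Author")); // John Doe
        System.out.println(metadata.get("Title"));  // Sample Document
    }
}
```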
Explanation
- Document Bean: The `Document` class has two members, a `String body` and a `Metadata metadata` (from Apache Tika). This simulates your input document object.
- JSON Conversion: The `convertDocumentToJson` method creates an `ObjectNode` with two keys: `"body"` is set to the document’s text, and `"metadata"` is another `ObjectNode` in which each metadata entry is added as a field keyed by its metadata name. Each field holds the value returned by `Metadata.get`, which is only the first value for a multi-valued name.
- Helper Function: The `documentToJsonString` method simply uses Jackson’s `writeValueAsString` to turn the JSON object into its string representation.
- Usage: The `main` method demonstrates creating a sample document and then printing both the JSON object and its string form.
This setup uses Jackson’s `ObjectMapper` and its tree model (`ObjectNode`) to build the JSON structure explicitly, so the output shape is controlled field by field rather than derived automatically from the bean’s getters.
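Alternatively, Jackson’s data binding can do the traversal if it is told how to write a `Metadata` object. A hedged sketch, assuming Jackson 2.x: a custom `JsonSerializer<Metadata>` registered through a `SimpleModule` lets `writeValueAsString` serialize a getter-based bean directly. `MetadataSerializerExample`, `MetadataSerializer`, and `Doc` (a stand-in for the `Document` bean) are hypothetical names; like the tree-model code, the serializer writes only the first value per name.

```java
import com.fasterxml.jackson.core.JsonGenerator;
import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.databind.JsonSerializer;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.SerializerProvider;
import com.fasterxml.jackson.databind.module.SimpleModule;
import org.apache.tika.metadata.Metadata;

import java.io.IOException;
import java.io.UncheckedIOException;

public class MetadataSerializerExample {

    /** Writes Metadata as a flat JSON object: one string field per name (first value only). */
    static class MetadataSerializer extends JsonSerializer<Metadata> {
        @Override
        public void serialize(Metadata metadata, JsonGenerator gen, SerializerProvider serializers)
                throws IOException {
            gen.writeStartObject();
            for (String name : metadata.names()) {
                gen.writeStringField(name, metadata.get(name));
            }
            gen.writeEndObject();
        }
    }

    /** Hypothetical bean mirroring Document; Jackson discovers "body" and "metadata" via getters. */
    public static class Doc {
        private final String body;
        private final Metadata metadata;

        public Doc(String body, Metadata metadata) {
            this.body = body;
            this.metadata = metadata;
        }

        public String getBody() {
            return body;
        }

        public Metadata getMetadata() {
            return metadata;
        }
    }

    // Every Metadata value written through this mapper uses MetadataSerializer.
    private static final ObjectMapper MAPPER = new ObjectMapper()
            .registerModule(new SimpleModule().addSerializer(Metadata.class, new MetadataSerializer()));

    public static String toJson(Doc doc) {
        try {
            return MAPPER.writeValueAsString(doc);
        } catch (JsonProcessingException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Metadata metadata = new Metadata();
        metadata.set("Author", "John Doe");
        System.out.println(toJson(new Doc("This is the document body.", metadata)));
    }
}
```

The trade-off is where the mapping logic lives: the tree-model version builds each document by hand, while the registered serializer applies to every `Metadata` that passes through that `ObjectMapper`.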