notes on tika reply to metadata - krickert/search-api GitHub Wiki

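import org.apache.tika.metadata.Metadata;
import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.util.Arrays;
import java.util.Map;
import java.util.stream.Collectors;

public class MetadataToJsonConverter {
    // One shared mapper is enough; ObjectMapper is thread-safe once configured.
    private static final ObjectMapper MAPPER = new ObjectMapper();

    /**
     * Serializes Tika metadata as a flat JSON object.
     * Metadata.get(name) returns only the first value of a multi-valued entry.
     */
    public static String convertMetadataToJson(Metadata metadata) {
        Map<String, String> map = Arrays.stream(metadata.names())
                .collect(Collectors.toMap(name -> name, metadata::get));
        try {
            return MAPPER.writeValueAsString(map);
        } catch (JsonProcessingException e) {
            // Fall back to an empty object; the failure is only visible on stderr.
            e.printStackTrace();
            return "{}";
        }
    }

    public static void main(String[] args) {
        var metadata = new Metadata();
        metadata.set("Author", "John Doe");
        metadata.set("Title", "Sample Document");
        System.out.println("Metadata in JSON: " + convertMetadataToJson(metadata));
    }
}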

We can improve on this converter by serializing the document body together with its metadata:

Below is a full example that defines a simple Document bean with two members, a String for the body and a Tika Metadata object, and then provides two functions: one converts the Document to a Jackson JSON object (an ObjectNode), and a helper returns that JSON as a String.

import org.apache.tika.metadata.Metadata;
import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.node.ObjectNode;

public class DocumentJsonConverter {
    private static final ObjectMapper MAPPER = new ObjectMapper();

    /**
     * Converts a Document into a Jackson ObjectNode.
     * The resulting JSON object has two keys: "body" and "metadata".
     */
    public static ObjectNode convertDocumentToJson(Document doc) {
        ObjectNode root = MAPPER.createObjectNode();
        root.put("body", doc.getBody());
        ObjectNode metaNode = MAPPER.createObjectNode();
        Metadata metadata = doc.getMetadata();
        // Convert each metadata entry into a JSON field (first value only).
        for (String name : metadata.names()) {
            metaNode.put(name, metadata.get(name));
        }
        root.set("metadata", metaNode);
        return root;
    }

    /**
     * Helper method that converts a Document into a JSON string.
     * Returns "{}" if serialization fails.
     */
    public static String documentToJsonString(Document doc) {
        try {
            ObjectNode json = convertDocumentToJson(doc);
            return MAPPER.writeValueAsString(json);
        } catch (JsonProcessingException e) {
            e.printStackTrace();
            return "{}";
        }
    }

    // Example usage.
    public static void main(String[] args) {
        Metadata metadata = new Metadata();
        metadata.set("Author", "John Doe");
        metadata.set("Title", "Sample Document");

        Document doc = new Document("This is the document body.", metadata);
        
        // Convert to JSON object and print it.
        ObjectNode jsonObject = convertDocumentToJson(doc);
        System.out.println("JSON Object: " + jsonObject);
        
        // Convert to JSON string and print it.
        String jsonString = documentToJsonString(doc);
        System.out.println("JSON String: " + jsonString);
    }
}

/**
 * A simple bean representing a document with a text body and Tika metadata.
 */
class Document {
    private String body;
    private Metadata metadata;

    public Document(String body, Metadata metadata) {
        this.body = body;
        this.metadata = metadata;
    }

    public String getBody() {
        return body;
    }

    public Metadata getMetadata() {
        return metadata;
    }

    public void setBody(String body) {
        this.body = body;
    }

    public void setMetadata(Metadata metadata) {
        this.metadata = metadata;
    }
}

Explanation

  • Document Bean:
    The Document class has two members: a String body and a Metadata metadata (from Apache Tika). This simulates your input document object.

  • JSON Conversion:
    The convertDocumentToJson method creates an ObjectNode with two keys:

    • "body" is set to the document’s text.
    • "metadata" is another ObjectNode in which each metadata name becomes a field holding that name’s first value.

  • Helper Function:
    The documentToJsonString method simply uses Jackson’s writeValueAsString to turn the JSON object into its string representation. If serialization fails, it prints the stack trace and returns "{}".

  • Usage:
    The main method demonstrates creating a sample document and then printing both the JSON object and its string form.
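
Both converters read each entry with Metadata.get(name), which returns only the first value when a name carries several values (for example, multiple authors). The following is a minimal sketch of one way to keep every value. To stay runnable with only the JDK, it takes the names array and a lookup function instead of a Metadata instance; with Tika on the classpath, the call would presumably be toSortedMap(metadata.names(), metadata::getValues). The class name MetadataFlattener, the method toSortedMap, and the sample data are assumptions made for illustration and are not part of the converters above.

```java
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.function.Function;

public class MetadataFlattener {
    /**
     * Builds a key-sorted map from metadata names to values.
     * Single-valued names map to a String, multi-valued names to a List of Strings.
     * (Hypothetical helper: names and valuesOf stand in for Metadata.names()
     * and Metadata.getValues(String).)
     */
    public static SortedMap<String, Object> toSortedMap(String[] names,
                                                        Function<String, String[]> valuesOf) {
        SortedMap<String, Object> result = new TreeMap<>();
        for (String name : names) {
            String[] values = valuesOf.apply(name);
            if (values == null || values.length == 0) {
                continue; // Skip names that have no values.
            }
            // List.of(E...) accepts an array and yields a List of its elements.
            Object value = values.length == 1 ? values[0] : List.of(values);
            result.put(name, value);
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample data standing in for a parsed document's metadata.
        Map<String, String[]> sample = Map.of(
                "Title", new String[] {"Sample Document"},
                "Author", new String[] {"John Doe", "Jane Roe"});
        String[] names = sample.keySet().toArray(new String[0]);
        // Prints: {Author=[John Doe, Jane Roe], Title=Sample Document}
        System.out.println(toSortedMap(names, sample::get));
    }
}
```

The resulting map can be handed to MAPPER.writeValueAsString or MAPPER.valueToTree. Because it is a TreeMap, the key order stays the same from run to run, which a HashMap-backed map does not guarantee.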

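This setup uses Jackson’s tree model (ObjectMapper together with ObjectNode) to build the JSON explicitly, so the output structure is fixed by convertDocumentToJson rather than by bean introspection of Document or Metadata.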
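
One design point worth noting: both helpers catch the serialization exception, print the stack trace, and return "{}", so a caller cannot tell a failed conversion from an empty document. A possible alternative, sketched below under assumptions, is to rethrow as an unchecked exception. Jackson’s JsonProcessingException extends IOException, so java.io.UncheckedIOException is a natural wrapper. The StrictJson class and the JsonWriter interface are hypothetical names introduced here so the sketch compiles with the JDK alone; with Jackson available, the call would presumably look like StrictJson.toJsonOrThrow(json, MAPPER::writeValueAsString).

```java
import java.io.IOException;
import java.io.UncheckedIOException;

public class StrictJson {
    /**
     * Hypothetical stand-in for a serializer call such as
     * ObjectMapper.writeValueAsString, which can throw an IOException subtype.
     */
    @FunctionalInterface
    public interface JsonWriter<T> {
        String write(T value) throws IOException;
    }

    /**
     * Serializes the value, converting a checked IOException into an
     * UncheckedIOException instead of returning a placeholder such as "{}".
     */
    public static <T> String toJsonOrThrow(T value, JsonWriter<T> writer) {
        try {
            return writer.write(value);
        } catch (IOException e) {
            throw new UncheckedIOException("Failed to serialize value to JSON", e);
        }
    }

    public static void main(String[] args) {
        // A trivial writer that quotes its input, used only to exercise the helper.
        System.out.println(toJsonOrThrow("body", v -> "\"" + v + "\""));
    }
}
```

Whether to fail loudly or fall back to "{}" depends on the caller; the fallback is convenient in a demo, while an exception tends to be easier to diagnose in an indexing pipeline.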