Notes on protobuf conversion technique - krickert/search-api GitHub Wiki

Below is an example of Java code that:

  1. Uses Jackson to parse a JSON object into its tree model (JsonNode), whose node types (object, array, value) drive the conversion.
  2. Recursively flattens the JSON structure using a custom delimiter convention (__) so that nested maps become keys like "key__childKey__grandChildKey".
  3. When it encounters an array (including a “map of a list of maps”), it processes each element: object elements are flattened into their own maps, and the results are collected into a List.
  4. Populates a SolrInputDocument (SolrJ's mutable document type, used when indexing) with the flattened key/value pairs.

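The code handles primitive values, nested maps, arrays of primitives, and arrays of objects. An array nested directly inside another array is not flattened; it is stored as its JSON string form.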

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.solr.common.SolrInputDocument;

import java.util.*;

public class JsonToSolrDocument {

    /**
     * Recursively flattens a JSON tree into a map using the specified delimiter convention.
     *
     * @param prefix  The current compound key prefix (empty for the root)
     * @param node    The current JSON node being processed
     * @param flatMap The map where flattened key/value pairs are accumulated
     */
    public static void flattenJson(String prefix, JsonNode node, Map<String, Object> flatMap) {
        if (node.isObject()) {
            // Process each field in the object node.
            Iterator<Map.Entry<String, JsonNode>> fields = node.fields();
            while (fields.hasNext()) {
                Map.Entry<String, JsonNode> entry = fields.next();
                // Build compound key using __ as a delimiter.
                String newPrefix = prefix.isEmpty() ? entry.getKey() : prefix + "__" + entry.getKey();
                flattenJson(newPrefix, entry.getValue(), flatMap);
            }
        } else if (node.isArray()) {
            // When encountering an array, iterate over its elements.
            List<Object> list = new ArrayList<>();
            for (JsonNode element : node) {
                if (element.isObject()) {
                    // If element is an object, flatten it into a map.
                    Map<String, Object> subMap = new HashMap<>();
                    flattenJson("", element, subMap);
                    list.add(subMap);
                } else if (element.isArray()) {
                    // For nested arrays, you could choose to either recursively flatten or store as a string.
                    list.add(element.toString());
                } else if (element.isValueNode() && !element.isNull()) {
                    // For primitive values, add their text representation; JSON nulls are skipped.
                    list.add(element.asText());
                }
            }
            // Store the list under the current prefix.
            flatMap.put(prefix, list);
        } else if (node.isValueNode() && !node.isNull()) {
            // For a primitive value, store its text representation. JSON nulls are skipped
            // so that the literal string "null" is not indexed.
            flatMap.put(prefix, node.asText());
        }
    }

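    // ObjectMapper is thread-safe once configured, so one shared instance is enough.
    private static final ObjectMapper MAPPER = new ObjectMapper();

    /**
     * Converts a JSON string into a SolrInputDocument by flattening its structure.
     *
     * @param jsonString the JSON content as a string; the root must be a JSON object
     * @return a SolrInputDocument populated with the flattened key/value pairs
     * @throws Exception if parsing fails or the root is not a JSON object
     */
    public static SolrInputDocument convertJsonToSolrDocument(String jsonString) throws Exception {
        JsonNode root = MAPPER.readTree(jsonString);
        if (root == null || !root.isObject()) {
            // A root array or primitive would otherwise be stored under an empty field name.
            throw new IllegalArgumentException("Expected a JSON object at the root");
        }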
        Map<String, Object> flatMap = new HashMap<>();
        flattenJson("", root, flatMap);

        SolrInputDocument doc = new SolrInputDocument();
        for (Map.Entry<String, Object> entry : flatMap.entrySet()) {
            doc.addField(entry.getKey(), entry.getValue());
        }
        return doc;
    }

    public static void main(String[] args) throws Exception {
        // Example JSON with nested objects and an array of objects
        String json = "{" +
                      "  \"id\": \"123\"," +
                      "  \"info\": {" +
                      "    \"name\": \"test\"," +
                      "    \"values\": [" +
                      "      {\"a\": 1, \"b\": 2}," +
                      "      {\"a\": 3, \"b\": 4}" +
                      "    ]" +
                      "  }," +
                      "  \"tags\": [\"alpha\", \"beta\"]" +
                      "}";
        
        SolrInputDocument solrDoc = convertJsonToSolrDocument(json);
        System.out.println("Flattened Solr Document: " + solrDoc);
    }
}

How It Works

  • Jackson Parsing & Type Inference:
    The ObjectMapper reads the JSON string into a JsonNode tree whose node types (object, array, string, number, boolean, null) mirror the JSON data types.

  • Flattening Logic:

    • If the node is an object, the method iterates over its fields and concatenates keys using "__".
    • If the node is an array, it builds a list. For arrays containing objects (maps), each element is flattened into its own map.
    • Primitive nodes (text, numbers, booleans) are added as their string representations via asText(), so numeric and boolean types are not preserved in the flattened map.
  • Solr Document Population:
    The flattened map is iterated, and each key/value pair is added to a SolrInputDocument via its addField() method.
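
The three flattening rules can be tried without Jackson or Solr on the classpath. The following is a minimal sketch under that assumption: FlattenSketch, flatten, and sample are hypothetical names, and the code walks plain java.util Maps and Lists as a stand-in for the JsonNode tree. It needs Java 9 or later for List.of and Map.of.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, dependency-free stand-in for flattenJson: it walks plain
// java.util Maps and Lists instead of Jackson JsonNode objects.
class FlattenSketch {

    static void flatten(String prefix, Object node, Map<String, Object> flatMap) {
        if (node instanceof Map) {
            // Object rule: join parent and child keys with "__".
            for (Map.Entry<?, ?> entry : ((Map<?, ?>) node).entrySet()) {
                String key = String.valueOf(entry.getKey());
                String newPrefix = prefix.isEmpty() ? key : prefix + "__" + key;
                flatten(newPrefix, entry.getValue(), flatMap);
            }
        } else if (node instanceof List) {
            // Array rule: build a list; object elements become their own flat maps.
            List<Object> list = new ArrayList<>();
            for (Object element : (List<?>) node) {
                if (element instanceof Map) {
                    Map<String, Object> subMap = new LinkedHashMap<>();
                    flatten("", element, subMap);
                    list.add(subMap);
                } else {
                    // Primitives (and nested lists) are stored as strings.
                    list.add(String.valueOf(element));
                }
            }
            flatMap.put(prefix, list);
        } else {
            // Primitive rule: store the string representation.
            flatMap.put(prefix, String.valueOf(node));
        }
    }

    // Builds the same structure as the sample JSON in main() above and flattens it.
    static Map<String, Object> sample() {
        Map<String, Object> info = new LinkedHashMap<>();
        info.put("name", "test");
        info.put("values", List.of(Map.of("a", 1, "b", 2), Map.of("a", 3, "b", 4)));

        Map<String, Object> root = new LinkedHashMap<>();
        root.put("id", "123");
        root.put("info", info);
        root.put("tags", List.of("alpha", "beta"));

        Map<String, Object> flat = new LinkedHashMap<>();
        flatten("", root, flat);
        return flat;
    }

    public static void main(String[] args) {
        System.out.println(sample());
    }
}
```

For the sample input this yields four keys: id, info__name, info__values (a list of two flattened maps), and tags (a list of two strings).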

This approach lets you take arbitrarily nested JSON (including maps that contain lists of maps) and convert it into a flat document suitable for Solr indexing.
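
A list of maps is still a nested value, and Solr fields normally hold scalars or collections of scalars, so such a list may not index as intended. One possible follow-up, which is not part of the code above, is to pivot each list of flattened maps into one multi-valued entry per child key. The sketch below assumes that approach; PivotSketch and pivot are hypothetical names, and the input is the kind of list that flattenJson stores under info__values.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class PivotSketch {

    // Hypothetical helper: turns a list of flattened maps into one
    // multi-valued entry per child key, e.g. info__values__a -> [1, 3].
    static Map<String, List<Object>> pivot(String prefix, List<Map<String, Object>> rows) {
        Map<String, List<Object>> columns = new LinkedHashMap<>();
        for (Map<String, Object> row : rows) {
            for (Map.Entry<String, Object> entry : row.entrySet()) {
                columns.computeIfAbsent(prefix + "__" + entry.getKey(), k -> new ArrayList<>())
                       .add(entry.getValue());
            }
        }
        return columns;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> rows = List.of(
                Map.<String, Object>of("a", "1", "b", "2"),
                Map.<String, Object>of("a", "3", "b", "4"));
        System.out.println(pivot("info__values", rows));
    }
}
```

The trade-off is that pivoting drops the grouping: once the values sit in separate multi-valued fields, and especially when some elements lack a key, it is no longer possible to tell which values came from the same array element.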

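The second example starts from a Protobuf message rather than a JSON string. It prints the message as JSON with JsonFormat, parses that JSON into a Map with Gson, and converts each list of maps into a list of JSON strings before adding it to the SolrInputDocument.

import com.google.gson.Gson;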
import com.google.gson.JsonElement;
import com.google.gson.JsonObject;
import com.google.gson.JsonParser;
import com.google.protobuf.util.JsonFormat;
import org.apache.solr.common.SolrInputDocument;
import java.util.*;

public class ProtoToSolrInputDocument {
    private static final Gson GSON = new Gson();

    public static SolrInputDocument convertProtobufToSolrDoc(com.example.MyProtoMessage protoMessage) throws Exception {
        // Convert Protobuf to JSON string
        String jsonString = JsonFormat.printer()
                .omittingInsignificantWhitespace() // Remove extra formatting
                // lowerCamelCase field names are the default. To keep the .proto names,
                // call preservingProtoFieldNames(), which takes no argument.
                .print(protoMessage);

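        // Convert JSON string to Map. Gson deserializes JSON numbers into Double
        // when the target value type is Object.
        JsonElement jsonElement = JsonParser.parseString(jsonString);
        JsonObject jsonObject = jsonElement.getAsJsonObject();
        java.lang.reflect.Type mapType =
                new com.google.gson.reflect.TypeToken<Map<String, Object>>() {}.getType();
        Map<String, Object> jsonMap = GSON.fromJson(jsonObject, mapType);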

        // Create SolrInputDocument
        SolrInputDocument solrDoc = new SolrInputDocument();

        // Process fields and add to SolrInputDocument
        for (Map.Entry<String, Object> entry : jsonMap.entrySet()) {
            String key = entry.getKey();
            Object value = entry.getValue();

            // Convert List<Map<String, Object>> into List<String> (Solr-compatible)
            if (value instanceof List) {
                List<?> list = (List<?>) value;
                if (!list.isEmpty() && list.get(0) instanceof Map) {
                    List<String> jsonList = new ArrayList<>();
                    for (Object mapObj : list) {
                        jsonList.add(GSON.toJson(mapObj)); // Convert each map to a JSON string
                    }
                    solrDoc.addField(key, jsonList);
                } else {
                    solrDoc.addField(key, list);
                }
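            } else if (value instanceof Map) {
                // A nested message or proto map field arrives as a Map. Store it as a JSON
                // string too, because Solr interprets a Map field value as an atomic update.
                solrDoc.addField(key, GSON.toJson(value));
            } else {
                solrDoc.addField(key, value);
            }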
        }

        return solrDoc;
    }

    public static void main(String[] args) throws Exception {
        // Example Protobuf message
        com.example.MyProtoMessage protoMessage = com.example.MyProtoMessage.newBuilder()
                .setId("123")
                .setName("Test Document")
                .addDetails(com.example.MyProtoMessage.Detail.newBuilder()
                        .putAttributes("key1", "value1")
                        .putAttributes("key2", "value2"))
                .addDetails(com.example.MyProtoMessage.Detail.newBuilder()
                        .putAttributes("keyA", "valueA"))
                .build();

        // Convert and print SolrInputDocument
        SolrInputDocument solrDoc = convertProtobufToSolrDoc(protoMessage);
        System.out.println(solrDoc);
    }
}
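
To see what the list handling in convertProtobufToSolrDoc produces without Protobuf or Gson on the classpath, here is a minimal sketch of the same branch. StringifySketch, toJson, quote, and solrValue are hypothetical names; toJson is only a small stand-in for GSON.toJson that escapes quotes and backslashes but not control characters, so it is an illustration rather than a replacement.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

class StringifySketch {

    // Hypothetical, minimal stand-in for GSON.toJson: handles Map, List,
    // String, Number, Boolean and null.
    static String toJson(Object value) {
        if (value instanceof Map) {
            StringJoiner joiner = new StringJoiner(",", "{", "}");
            for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                joiner.add(quote(String.valueOf(e.getKey())) + ":" + toJson(e.getValue()));
            }
            return joiner.toString();
        }
        if (value instanceof List) {
            StringJoiner joiner = new StringJoiner(",", "[", "]");
            for (Object element : (List<?>) value) {
                joiner.add(toJson(element));
            }
            return joiner.toString();
        }
        if (value instanceof String) {
            return quote((String) value);
        }
        return String.valueOf(value); // numbers, booleans, null
    }

    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Mirrors the loop body: a list of maps becomes a list of JSON strings,
    // every other value is returned unchanged.
    static Object solrValue(Object value) {
        if (value instanceof List) {
            List<?> list = (List<?>) value;
            if (!list.isEmpty() && list.get(0) instanceof Map) {
                List<String> jsonList = new ArrayList<>();
                for (Object mapObj : list) {
                    jsonList.add(toJson(mapObj));
                }
                return jsonList;
            }
        }
        return value;
    }

    public static void main(String[] args) {
        Map<String, Object> attributes = new LinkedHashMap<>();
        attributes.put("key1", "value1");
        attributes.put("key2", "value2");
        Map<String, Object> detail = new LinkedHashMap<>();
        detail.put("attributes", attributes);
        System.out.println(solrValue(List.of(detail)));
    }
}
```

Like the original loop, the check only inspects the first list element, so a list that mixes maps and scalars is treated according to whatever comes first.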