Notes on protobuf conversion technique - krickert/search-api GitHub Wiki
Below is an example of Java code that:
- Uses Jackson to parse a JSON object and infer data types via its tree model (JsonNode).
- Recursively flattens the JSON structure using a custom delimiter convention (`__`) so that nested maps become keys like `"key__childKey__grandChildKey"`.
- When it encounters an array (including a “map of a list of maps”), it flattens each element: if the array elements are objects, each gets flattened to a map, and then the flattened objects are collected into a List.
- Populates a SolrInputDocument (which is a writeable version of a Solr document) with the flattened key/value pairs.
The code handles primitive values, nested maps, arrays of primitives, and arrays of objects.

    import com.fasterxml.jackson.databind.JsonNode;
    import com.fasterxml.jackson.databind.ObjectMapper;
    import org.apache.solr.common.SolrInputDocument;

    import java.util.*;

    public class JsonToSolrDocument {

        /**
         * Recursively flattens a JSON tree into a map using the specified delimiter convention.
         *
         * @param prefix  The current compound key prefix (empty for the root)
         * @param node    The current JSON node being processed
         * @param flatMap The map where flattened key/value pairs are accumulated
         */
        public static void flattenJson(String prefix, JsonNode node, Map<String, Object> flatMap) {
            if (node.isObject()) {
                // Process each field in the object node.
                Iterator<Map.Entry<String, JsonNode>> fields = node.fields();
                while (fields.hasNext()) {
                    Map.Entry<String, JsonNode> entry = fields.next();
                    // Build compound key using __ as a delimiter.
                    String newPrefix = prefix.isEmpty() ? entry.getKey() : prefix + "__" + entry.getKey();
                    flattenJson(newPrefix, entry.getValue(), flatMap);
                }
            } else if (node.isArray()) {
                // When encountering an array, iterate over its elements.
                List<Object> list = new ArrayList<>();
                for (JsonNode element : node) {
                    if (element.isObject()) {
                        // If element is an object, flatten it into a map.
                        Map<String, Object> subMap = new HashMap<>();
                        flattenJson("", element, subMap);
                        list.add(subMap);
                    } else if (element.isArray()) {
                        // For nested arrays, you could choose to either recursively flatten or store as a string.
                        list.add(element.toString());
                    } else if (element.isValueNode()) {
                        // For primitive values, simply add their text representation.
                        list.add(element.asText());
                    }
                }
                // Store the list under the current prefix.
                flatMap.put(prefix, list);
            } else if (node.isValueNode()) {
                // For a primitive value, just put the value into the map.
                flatMap.put(prefix, node.asText());
            }
        }

        /**
         * Converts a JSON string into a SolrInputDocument by flattening its structure.
         *
         * @param jsonString the JSON content as a string
         * @return a SolrInputDocument populated with the flattened key/value pairs
         * @throws Exception if parsing fails
         */
        public static SolrInputDocument convertJsonToSolrDocument(String jsonString) throws Exception {
            ObjectMapper mapper = new ObjectMapper();
            JsonNode root = mapper.readTree(jsonString);
            Map<String, Object> flatMap = new HashMap<>();
            flattenJson("", root, flatMap);

            SolrInputDocument doc = new SolrInputDocument();
            for (Map.Entry<String, Object> entry : flatMap.entrySet()) {
                doc.addField(entry.getKey(), entry.getValue());
            }
            return doc;
        }

        public static void main(String[] args) throws Exception {
            // Example JSON with nested objects and an array of objects
            String json = "{" +
                    " \"id\": \"123\"," +
                    " \"info\": {" +
                    " \"name\": \"test\"," +
                    " \"values\": [" +
                    " {\"a\": 1, \"b\": 2}," +
                    " {\"a\": 3, \"b\": 4}" +
                    " ]" +
                    " }," +
                    " \"tags\": [\"alpha\", \"beta\"]" +
                    "}";
            SolrInputDocument solrDoc = convertJsonToSolrDocument(json);
            System.out.println("Flattened Solr Document: " + solrDoc);
        }
    }

- Jackson Parsing & Type Inference:
  The `ObjectMapper` reads the JSON string into a `JsonNode` tree, which naturally reflects the underlying data types.
- Flattening Logic:
  - If the node is an object, the method iterates over its fields and concatenates keys using `"__"`.
  - If the node is an array, it builds a list. For arrays containing objects (maps), each element is flattened into its own map.
  - Primitive nodes (text, numbers, booleans) are added as string representations.
- Solr Document Population:
  The flattened map is iterated, and each key/value pair is added to a `SolrInputDocument` via its `addField()` method.
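This approach takes arbitrarily nested JSON (including maps that contain lists of maps) and converts it into a document keyed by flat field names. Note that a list of objects is still stored as a list of maps, and Solr generally interprets a map-valued field as an atomic-update instruction, so such lists usually need to be serialized (for example, to JSON strings) or modeled as child documents before indexing.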
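To make the key convention concrete, here is a minimal sketch of the same flattening rules applied to plain `java.util` maps and lists instead of Jackson's `JsonNode`, so it runs without any dependencies. The class `FlattenSketch` and its helpers `obj` and `sample` are hypothetical names introduced only for this illustration; it is not the listing above, just a stand-in that follows the same three branches (object, array, primitive).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, dependency-free illustration of the "__" flattening convention. */
class FlattenSketch {

    static final String DELIMITER = "__";

    /** Flattens nested maps into compound keys; lists stay lists, primitives become strings. */
    static void flatten(String prefix, Object node, Map<String, Object> out) {
        if (node instanceof Map) {
            for (Map.Entry<?, ?> entry : ((Map<?, ?>) node).entrySet()) {
                String key = prefix.isEmpty()
                        ? String.valueOf(entry.getKey())
                        : prefix + DELIMITER + entry.getKey();
                flatten(key, entry.getValue(), out);
            }
        } else if (node instanceof List) {
            List<Object> items = new ArrayList<>();
            for (Object element : (List<?>) node) {
                if (element instanceof Map) {
                    // Each object element is flattened into its own map.
                    Map<String, Object> sub = new LinkedHashMap<>();
                    flatten("", element, sub);
                    items.add(sub);
                } else {
                    items.add(String.valueOf(element));
                }
            }
            out.put(prefix, items);
        } else {
            out.put(prefix, String.valueOf(node));
        }
    }

    /** Small helper (assumption): builds an insertion-ordered map from key/value pairs. */
    static Map<String, Object> obj(Object... keysAndValues) {
        Map<String, Object> map = new LinkedHashMap<>();
        for (int i = 0; i < keysAndValues.length; i += 2) {
            map.put((String) keysAndValues[i], keysAndValues[i + 1]);
        }
        return map;
    }

    /** Mirrors the sample JSON used in the main method of the listing. */
    static Map<String, Object> sample() {
        return obj(
                "id", "123",
                "info", obj(
                        "name", "test",
                        "values", List.of(obj("a", 1, "b", 2), obj("a", 3, "b", 4))),
                "tags", List.of("alpha", "beta"));
    }

    public static void main(String[] args) {
        Map<String, Object> flat = new LinkedHashMap<>();
        flatten("", sample(), flat);
        // Prints:
        // id -> 123
        // info__name -> test
        // info__values -> [{a=1, b=2}, {a=3, b=4}]
        // tags -> [alpha, beta]
        flat.forEach((key, value) -> System.out.println(key + " -> " + value));
    }
}
```

Only nested maps contribute to the compound key: `info` and `name` become `info__name`, while the objects inside `info__values` keep their own short keys (`a`, `b`) because each array element is flattened with an empty prefix.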
The second example starts from a Protobuf message rather than raw JSON. It prints the message to JSON with `JsonFormat`, parses the result into a map with Gson, and serializes each element of a list of maps to a JSON string so that the field holds plain strings. `com.example.MyProtoMessage` is a placeholder for a generated message class.

    import com.google.gson.Gson;
    import com.google.gson.JsonElement;
    import com.google.gson.JsonObject;
    import com.google.gson.JsonParser;
    import com.google.gson.reflect.TypeToken;
    import com.google.protobuf.util.JsonFormat;
    import org.apache.solr.common.SolrInputDocument;

    import java.lang.reflect.Type;
    import java.util.*;

    public class ProtoToSolrInputDocument {

        private static final Gson GSON = new Gson();

        // Typed target for Gson so the result is a Map<String, Object> without an unchecked raw Map.
        private static final Type MAP_TYPE = new TypeToken<Map<String, Object>>() {}.getType();

        public static SolrInputDocument convertProtobufToSolrDoc(com.example.MyProtoMessage protoMessage) throws Exception {
            // Convert Protobuf to JSON string.
            // The printer emits lowerCamelCase field names by default; calling
            // preservingProtoFieldNames() (which takes no arguments) would keep the .proto names instead.
            String jsonString = JsonFormat.printer()
                    .omittingInsignificantWhitespace() // Remove extra formatting
                    .print(protoMessage);

            // Convert JSON string to Map (Gson represents JSON numbers as Double here)
            JsonElement jsonElement = JsonParser.parseString(jsonString);
            JsonObject jsonObject = jsonElement.getAsJsonObject();
            Map<String, Object> jsonMap = GSON.fromJson(jsonObject, MAP_TYPE);

            // Create SolrInputDocument
            SolrInputDocument solrDoc = new SolrInputDocument();

            // Process fields and add to SolrInputDocument
            for (Map.Entry<String, Object> entry : jsonMap.entrySet()) {
                String key = entry.getKey();
                Object value = entry.getValue();

                // Convert List<Map<String, Object>> into List<String> (Solr-compatible)
                if (value instanceof List) {
                    List<?> list = (List<?>) value;
                    if (!list.isEmpty() && list.get(0) instanceof Map) {
                        List<String> jsonList = new ArrayList<>();
                        for (Object mapObj : list) {
                            jsonList.add(GSON.toJson(mapObj)); // Convert each map to a JSON string
                        }
                        solrDoc.addField(key, jsonList);
                    } else {
                        solrDoc.addField(key, list);
                    }
                } else {
                    solrDoc.addField(key, value);
                }
            }
            return solrDoc;
        }

        public static void main(String[] args) throws Exception {
            // Example Protobuf message
            com.example.MyProtoMessage protoMessage = com.example.MyProtoMessage.newBuilder()
                    .setId("123")
                    .setName("Test Document")
                    .addDetails(com.example.MyProtoMessage.Detail.newBuilder()
                            .putAttributes("key1", "value1")
                            .putAttributes("key2", "value2"))
                    .addDetails(com.example.MyProtoMessage.Detail.newBuilder()
                            .putAttributes("keyA", "valueA"))
                    .build();

            // Convert and print SolrInputDocument
            SolrInputDocument solrDoc = convertProtobufToSolrDoc(protoMessage);
            System.out.println(solrDoc);
        }
    }
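The list-handling branch is the part of this conversion that decides what Solr actually receives, so a small dependency-free sketch may help. It assumes a hypothetical class `ListFieldSketch` whose `toJson` method is only a stand-in for `GSON.toJson` (it handles nested maps and treats every other value as a string), and whose `toSolrValue` method returns the value that would be handed to `addField`. Like the listing, it inspects only the first list element to decide whether the list holds maps.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical sketch of the list-of-maps branch, without Gson, Protobuf, or Solr. */
class ListFieldSketch {

    /** Stand-in for GSON.toJson: nested maps become JSON objects, everything else a JSON string. */
    static String toJson(Object value) {
        if (value instanceof Map) {
            StringJoiner joiner = new StringJoiner(",", "{", "}");
            for (Map.Entry<?, ?> entry : ((Map<?, ?>) value).entrySet()) {
                joiner.add(quote(String.valueOf(entry.getKey())) + ":" + toJson(entry.getValue()));
            }
            return joiner.toString();
        }
        return quote(String.valueOf(value));
    }

    /** Escapes only backslashes and double quotes; a real serializer covers more cases. */
    static String quote(String text) {
        return "\"" + text.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /** Returns the value that would be passed to addField for one entry of the parsed map. */
    static Object toSolrValue(Object value) {
        if (value instanceof List) {
            List<?> list = (List<?>) value;
            if (!list.isEmpty() && list.get(0) instanceof Map) {
                // A list of maps becomes a list of JSON strings, one per element.
                List<String> jsonList = new ArrayList<>();
                for (Object item : list) {
                    jsonList.add(toJson(item));
                }
                return jsonList;
            }
        }
        // Scalars and lists of primitives pass through unchanged.
        return value;
    }

    public static void main(String[] args) {
        Map<String, Object> attributes = new LinkedHashMap<>();
        attributes.put("key1", "value1");
        attributes.put("key2", "value2");
        Map<String, Object> detail = new LinkedHashMap<>();
        detail.put("attributes", attributes);

        // Prints: [{"attributes":{"key1":"value1","key2":"value2"}}]
        System.out.println(toSolrValue(List.of(detail)));
        // Prints: [alpha, beta]
        System.out.println(toSolrValue(List.of("alpha", "beta")));
    }
}
```

Storing each element as a JSON string keeps the field multi-valued and string-typed, at the cost that the inner attributes are no longer individually searchable unless they are parsed again at query or index time.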