offline import into opendj - plembo/onemoretech GitHub Wiki

Offline import of bulk data into OpenDJ

August 30, 2016

Offline import is the fastest way to populate an OpenDJ LDAP directory.

The tool to use for this is import-ldif.

Here's a real-world example of how to use the tool, along with the heretofore unrevealed, top-secret options required to actually make it work with real data. Thanks to my colleague S. Soogur for sharing this with me.

Be sure to stop the OpenDJ directory service before running this command. This is an offline import, after all!

```shell
./import-ldif \
  --includeBranch dc=example,dc=com \
  --backendID userRoot \
  --ldifFile ./prod.ldif \
  --rejectFile ./prod-rejects.ldif \
  --skipFile ./prod-skips.ldif \
  --replaceExisting \
  --skipSchemaValidation
```
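The stop, import, and restart steps can be strung together in a small wrapper. This is a minimal sketch, not the author's script. The function name offline_import is hypothetical. It assumes stop-ds, import-ldif, and start-ds (the scripts in the OpenDJ bin directory) are on PATH, as described in note 1 below. It reuses the options from the command above and derives the reject and skip file names from the LDIF file name.

```shell
#!/usr/bin/env bash
# Sketch: stop OpenDJ, run the offline import from the post, then restart.
# Assumes stop-ds, import-ldif and start-ds are on PATH (see note 1) and
# that the caller is the directory system owner.

offline_import() {   # hypothetical helper name
  if [ $# -lt 1 ]; then
    echo "usage: offline_import FILE.ldif" >&2
    return 2
  fi
  local ldif="$1"
  local base="${ldif%.ldif}"   # ./prod.ldif -> ./prod

  # An offline import needs the server down; bail out if stop-ds reports failure.
  stop-ds || return 1

  import-ldif \
    --includeBranch dc=example,dc=com \
    --backendID userRoot \
    --ldifFile "$ldif" \
    --rejectFile "${base}-rejects.ldif" \
    --skipFile "${base}-skips.ldif" \
    --replaceExisting \
    --skipSchemaValidation
  local rc=$?

  if [ "$rc" -ne 0 ]; then
    # Leave the server stopped so the failure can be investigated first.
    echo "import-ldif failed (exit $rc); server left stopped" >&2
    return "$rc"
  fi

  # Import succeeded: bring the directory back online.
  start-ds
}
```

Call it as offline_import ./prod.ldif while logged in as the directory owner. The wrapper deliberately leaves the server stopped if import-ldif fails, on the theory that a half-finished import is better inspected before clients reconnect. Once you've checked the reject file, run start-ds by hand.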

Some notes:

  1. Be sure to execute the command as the directory system owner (for example, the user "opendj", who owns all the directory application files), and make sure that user's environment is set correctly. I usually set $JAVA_HOME (the path to the JDK or JRE used by the directory server) and $DSHOME (the directory installation path), and then set something like "PATH=$JAVA_HOME/bin:$DSHOME/bin:$PATH".

  2. It's a really good idea to record any rejected and skipped entries (with --rejectFile and --skipFile) when you do one of these imports, especially when working with thousands (or tens of thousands) of entries. Believe me, you'll be grateful later when you have to track one of them down.

  3. Using the --replaceExisting option ensures that the import will overwrite any entries that already exist in the directory. Omitting it will cause the tool to throw errors when it finds pre-existing entries.

  4. The --skipSchemaValidation switch is necessary even if you have schema checking disabled in your directory configuration. Why? Because your directory isn't running when you execute an offline import! Like death and taxes, schema violations are inevitable: in the real world of directory administration, there is no such thing as a directory database free of schema violations.
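For note 1, a minimal sketch of the directory owner's environment (for example in that user's shell profile) might look like the following. Both install paths are hypothetical placeholders; only the PATH ordering comes from the note itself. The checks at the end only print warnings, and the expected owner name "opendj" is the example user from note 1.

```shell
#!/usr/bin/env bash
# Environment for the directory owner, e.g. in ~/.bash_profile of "opendj".
# Both install paths below are hypothetical; point them at your own JDK/JRE
# and OpenDJ installation.
export JAVA_HOME=/usr/lib/jvm/java      # JDK or JRE used by the directory server
export DSHOME=/opt/opendj               # directory installation path
export PATH="$JAVA_HOME/bin:$DSHOME/bin:$PATH"

# Sanity checks: warn rather than exit, so this is safe to source.
if [ "$(id -un)" != "opendj" ]; then
  echo "warning: running as $(id -un), not the directory owner opendj" >&2
fi
if ! command -v import-ldif >/dev/null 2>&1; then
  echo "warning: import-ldif not found on PATH (expected in $DSHOME/bin)" >&2
fi
```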
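For note 2, once the import finishes, a quick count of what landed in the reject and skip files tells you whether you need to go digging. This sketch assumes each entry in those files starts with a dn: line (plain, or base64-encoded as dn::), as in any LDIF file. The helper name count_ldif_entries is hypothetical, and the file names match the command above.

```shell
#!/usr/bin/env bash
# Summarize the reject and skip files written by import-ldif by counting
# entries, i.e. lines starting with "dn:" (this also matches "dn::").

count_ldif_entries() {   # hypothetical helper name
  local file="$1"
  if [ ! -s "$file" ]; then
    echo 0   # missing or empty file: nothing rejected or skipped
    return 0
  fi
  # grep -c prints 0 but exits 1 when nothing matches; don't treat that as an error.
  grep -c '^dn:' "$file" || true
}

for f in ./prod-rejects.ldif ./prod-skips.ldif; do
  echo "$f: $(count_ldif_entries "$f") entries"
done
```

To see which entries were rejected, grep '^dn:' ./prod-rejects.ldif lists their DNs, which you can then look up in the source LDIF.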