20070719 when only direct db access will do - plembo/onemoretech GitHub Wiki

title: When only direct db access will do link: https://onemoretech.wordpress.com/2007/07/19/when-only-direct-db-access-will-do/ author: lembobro description: post_id: 672 created: 2007/07/19 00:26:00 created_gmt: 2007/07/19 00:26:00 comment_status: open post_name: when-only-direct-db-access-will-do status: publish post_type: post

When only direct db access will do

Over the years I’ve had a few “close calls” where a corrupted directory db was brought back from the abyss. Had one of those the last couple of days, when one of our newly installed Oracle Internet Directory (OID) instances started misbehaving after I screwed up the acl (access control list) while trying to use the aci (access control instruction) “wizard” in oidadmin to create a new set of permissions.

In seven years of working with LDAP directories. I’ve come to view the GUI administration consoles that ship with commercial products with deep suspicion. Ditto for the “automatic” configuration routines in almost all 3rd party apps that claim to be “LDAP compliant”. In this case I should have known better, but the whole purpose of the work I’ve been doing the last few months is to learn as much as I can about the new application stack so I can provide useful input on how we go forward.

Anyway, the problem we (Oracle Support and I) were faced with was a cn=Users container that could no longer be searched, even by cn=orcladmin. Now “cn=orcladmin” is the root account for OID, just like “cn=Directory Manager” for the Netscape family of directories (Netscape, iPlanet, Sun, RedHat), or OpenLDAP’s “cn=Manager”. Of all the login accounts on the directory, this is the one that is supposed to not be subject to access controls. In our case, every time we tried to search the container over LDAP (using ldapsearch) the directory process would die and get restarted by Oracle’s monitor process.

After going through a series of diagnostic procedures (stopping and starting different pieces of the stack and then reading the logs), it was finally determined that the way to deal with this was to use Oracle’s “bulk” utilities to access the underlying database and restore it to health.

We first issued an “opmnctl shutdown” to kill all running processes, and then used ldifwrite to dump the entire cn=Users container directly from the datatabase to an LDIF file. Then we used bulkdelete to remove the container, and all it’s subentries, from the database. I then edited the LDIF file to remove the corrupt aci from the container base. Finally, bulkload was used to re-load the corrected LDIF back into the database. During this process the data was reindexed and subjected to rigorous checking.

A “opmnctl startall” brought the directory and it’s associated processes back on line, and after a few simple tests using ldapsearch and ldapmodify we were satisfied all was well again.

Here’s the syntax for the commands used during this recovery (during each operation you will be prompted for the root directory user (cn=orcladmin) password, which should be the same as that for the ODS database user):

[In these examples my directory root is “dc=mycompany,dc=com”, “testinf” is the $ORACLE_SID for the infrastructure database where directory data is stored and “mydata.ldif” is my LDIF data file. Where “” appears at the end of a line, it is a “continuation character” indicating that the line should not be broken by a carriage return]

Dumping the container data from the database
$ORACLE_HOME/ldap/bin/ldifwrite connect=testinf basedn="cn=Users, dc=mycompany,dc=com" ldiffile=/tmp/mydata.ldif>

Removing the container from the database`

$ORACLE_HOME/ldap/bin/bulkdelete connect=testinf basedn="cn=Users,
dc=mycompany,dc=com"`

Reloading the LDIF data directly into the database $ORACLE_HOME/ldap/bin/bulkload connect=testinf check="TRUE" generate="TRUE" append="TRUE" file=/tmp/mydata.ldif

Lest it appear that Oracle is alone in making such lifesaving “direct to database” methods available, Sun/RedHat and OpenLDAP all have similar tools used to directly manipulated their underlying BerkeleyDB hash table managers (BerkeleyDB is now, ironically, owned by Oracle).

For OpenLDAP these are slapcat for dumping data from the db, and slapadd to load data directly into the db. With OpenLDAP there is no way to directly delete just a portion of the directory tree. You’ve got to dump and reload the whole thing (making whatever changes you need to the LDIF file in between).

Sun/RedHat have a set of command line scripts, db2ldif and ldif2db that you to dump and reload data from the db, but still require you to in essence reload everything if you want to clean it up. For practical purposes you’d want to do this on a Netscape family directory anyway, to make sure all indexes are regenerated correctly (not always a given with their architecture).