
title: Finding duplicate entries in an LDAP directory
link: https://onemoretech.wordpress.com/2008/10/09/finding-duplicate-entries-in-an-ldap-directory/
author: lembobro
description:
post_id: 447
created: 2008/10/09 19:48:40
created_gmt: 2008/10/09 19:48:40
comment_status: open
post_name: finding-duplicate-entries-in-an-ldap-directory
status: publish
post_type: post

Finding duplicate entries in an LDAP directory

To check for duplicate entries in a directory server that you can’t guarantee is free of them, you first need something that readily distinguishes one entry from another. In my case, as in many others, that is the value of the LDAP uid attribute.

Being able to ferret out duplicates in your directory is very useful when the advantages of a hierarchical data store aren’t appreciated by your developers or by your Single Sign-On infrastructure (most SSO products assume that the user name or ID is unique).

This is why you should insist on deploying an attribute uniqueness constraint like the uid uniqueness preoperation plugin (known in OpenLDAP as the Attribute Uniqueness Overlay) in all your directory environments. What rules to apply in coming up with a properly unique uid is a discussion for another day.

For now, here’s my code. In this case my criterion is an 8-character uid value which, if it contains at least one letter, should be more than sufficient for all but the largest companies over a long period of time. Not a solution for Google or Yahoo, but fine for the rest of us.
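
As a rough sanity check on the "more than sufficient" claim, here is a back-of-the-envelope sketch. It assumes uids are drawn only from the letters a-z and the digits 0-9 and are compared case-insensitively; that alphabet and the `uid_ok` helper are assumptions made for illustration, not rules taken from the script.

```perl
use strict;
use warnings;

# Assumption: 36 possible symbols per position (a-z and 0-9, case folded).
my $length   = 8;
my $all      = 36 ** $length;   # every 8-character alphanumeric string
my $no_alpha = 10 ** $length;   # strings made of digits only
my $usable   = $all - $no_alpha;

# Hypothetical helper: true when a candidate uid is exactly 8 alphanumeric
# characters and contains at least one letter.
sub uid_ok {
    my ($uid) = @_;
    return 0 unless $uid =~ /\A[a-z0-9]{8}\z/i;
    return $uid =~ /[a-z]/i ? 1 : 0;
}

printf "%.0f usable uids\n", $usable;
```

Under those assumptions the space holds roughly 2.8 trillion values, so collisions in a real directory are far more likely to come from provisioning mistakes than from running out of names.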

The structure of this script assumes an external config file with variables defined like:

$ldapHost = "ldap.example.com";

And ending with a

1;

on the last line.
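
Put together, the config file might look something like the sketch below. The variable names match the ones the script declares with `our`; the bind DN and password are placeholder values, not anything from a real environment.

```perl
# $HOME/etc/ldap.conf: loaded with require, so it must be valid Perl.
# All values here are placeholders for illustration.
$ldapHost = "ldap.example.com";
$ldapUsr  = "cn=reader,dc=example,dc=com";   # hypothetical bind DN
$ldapPass = "changeme";                      # hypothetical password

1;   # require needs the file to return a true value
```

Since the file holds a bind password in clear text, it is worth keeping it readable only by the account that runs the script.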

It also shells out to the Unix [uniq](http://www.linuxmanpages.com/man1/uniq.1.php) command (see the cited man page for details on the switches used). There are probably ways to do this in pure Perl, but I was in too much of a hurry to research them.

As usual, a “\” at the end of a line means the line continues on the next one; any such break is there only to fit the code inside the margins of this page.

#!/usr/bin/perl
# Find duplicate uids on LDAP
use strict;
use Net::LDAP;
use Net::LDAP::Entry;
use Net::LDAP::LDIF;
use File::Sort qw(sort_file);
use utf8;
#
our($ldapHost,$ldapUsr,$ldapPass);
my $HOME = $ENV{'HOME'};
my $APPS = "$HOME/bin";
require "$HOME/etc/ldap.conf";
#
my $ldifdata = "$HOME/data/testdata.ldif";
my $csvdata = "$HOME/data/testdata.csv";
my $dupfile = "$HOME/data/testduprept.csv";
my $basedn = "dc=example,dc=com";
my $attrs = "uid cn";
my $query = "(objectclass=inetorgperson)";
#
get_ldap();
make_rept();
check_sort();
#
sub get_ldap {
#
 # Bind with the credentials from ldap.conf and dump the matching entries as LDIF
 system("$APPS/ldapsearch -LLL -x -h $ldapHost -D \"$ldapUsr\" "
      . "-w $ldapPass -b \"$basedn\" -s sub \"$query\" $attrs >$ldifdata");
#
}
#
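sub make_rept {
#
 my $ldif = Net::LDAP::LDIF->new($ldifdata, 'r') or die $!;
 open(my $fh, '>', $csvdata) or die "Cannot write $csvdata: $!";
#
 print $fh "UID,LDAPDN\n";
#
 while(not $ldif->eof() ) {
#
     my $entry = $ldif->read_entry();
     next unless defined $entry;   # skip anything that did not parse
     my $dn = $entry->dn;
     my $uid = $entry->get_value('uid');
     next unless defined $uid;     # entries without a uid cannot be compared
     print $fh "$uid,$dn\n";
 #
 }
#
 close $fh;
 $ldif->done;
#
}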
#
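sub check_sort {
#
 # uniq only compares adjacent lines, so sort the report first
 my $sorted = "$csvdata.sorted";
 sort_file($csvdata, $sorted);
 # -w 8 compares only the first 8 characters (the uid); -d prints one line per repeated uid
 system("/usr/bin/uniq -w 8 -d $sorted $dupfile");
#
}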
#
 __END__;
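
For anyone who would rather avoid the shell-out to uniq, the duplicate check can be sketched in pure Perl with a hash keyed on uid. This is only an illustration, not the script's method: `find_duplicate_uids` and the sample entries are hypothetical, and the case folding assumes the directory matches uid case-insensitively, as the standard schema does.

```perl
use strict;
use warnings;

# Group DNs by uid and keep only the uids that occur more than once.
# Takes a list of [uid, dn] pairs; returns a hash ref of uid => [dn, ...].
sub find_duplicate_uids {
    my @pairs = @_;
    my %dns_for;
    for my $pair (@pairs) {
        my ($uid, $dn) = @$pair;
        next unless defined $uid && length $uid;
        # Fold case so that "JSmith01" and "jsmith01" count as the same uid
        push @{ $dns_for{ lc $uid } }, $dn;
    }
    my %dups;
    for my $uid (keys %dns_for) {
        $dups{$uid} = $dns_for{$uid} if @{ $dns_for{$uid} } > 1;
    }
    return \%dups;
}

# Hypothetical sample data standing in for entries read from the LDIF file
my @pairs = (
    [ 'jsmith01', 'uid=jsmith01,ou=people,dc=example,dc=com' ],
    [ 'bjones02', 'uid=bjones02,ou=people,dc=example,dc=com' ],
    [ 'JSmith01', 'uid=JSmith01,ou=contractors,dc=example,dc=com' ],
);

my $dups = find_duplicate_uids(@pairs);
for my $uid (sort keys %$dups) {
    print "$uid: ", join(' | ', @{ $dups->{$uid} }), "\n";
}
```

In the script, the pairs could be collected inside the `make_rept` loop instead of being written to a CSV file. Unlike `uniq -w 8`, this compares the whole uid and does not need sorted input.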

There. Doesn’t that feel better now?