
title: Splitting an LDIF
link: https://onemoretech.wordpress.com/2009/07/08/splitting-an-ldif/
author: lembobro
description:
post_id: 288
created: 2009/07/08 18:51:59
created_gmt: 2009/07/08 18:51:59
comment_status: open
post_name: splitting-an-ldif
status: publish
post_type: post

Splitting an LDIF

Not as difficult as splitting an atom, but then most of us really don’t have any use for thermonuclear weapons. We have people to take care of that for us.

As usual, being able to do this trick was born of necessity. I had a really big (6,000 entry) LDIF file that I needed to break into smaller pieces so the changes could get applied slowly, over time, to an Active Directory environment.

What I settled on was something that broke out a new file every 200 entries.

I also got to use Perl’s modulus operator for the first time. Very cool.

Not bad for a liberal arts major.

Here’s the code:

#!/usr/bin/perl -w
# ldifsplit.pl Splits up an LDIF file by a set number of entries and writes
# them to separate, smaller files.
# Example below will split every 200 entries.
# Created 07/08/2009 by P Lembo

use strict;
use Net::LDAP;
use Net::LDAP::Entry;
use Custom::Net::LDAP::LDIF;

my $HOME = $ENV{'HOME'};
my $inldif = "$HOME/ad.ldif";
my $outldif = "$HOME/ad.0";
# Create (or truncate) the first output file.
open FH, ">$outldif" or die $!;
close FH;
my $ldif = Custom::Net::LDAP::LDIF->new($inldif, 'r') or die $!;
my $count =0;
my $fileno =1;
my $divisor =200;

while (not $ldif->eof() ) {
	my $entry = $ldif->read_entry();
	$count++;
	if ($count % $divisor ==0) {
		# Every $divisor-th entry starts a new output file.
		my $filename = "ad" . "." . $fileno;
		my $filepath = $HOME;
		$outldif = $filepath . "/" . $filename;
		print $outldif, "\n";
		open FH, ">$outldif" or die $!;
		print FH $ldif->current_lines($entry), "\n";
		close FH;
		$fileno++;
	}
	else {
		# Otherwise append to the current output file.
		open FH, ">>$outldif" or die $!;
		print FH $ldif->current_lines($entry), "\n";
		close FH;
	}
}
print "Total entries: $count\n";
$ldif->done;
__END__

The heart of the matter here is the “if ($count % $divisor ==0)” condition, where “$divisor” is the modulus (% is the modulus operator) against which the current item count (in this case the number of LDAP entries read from the LDIF so far, recorded in $count) is tested. The modulus in this example is 200, so the condition is true only when $count is an exact multiple of 200.
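To see the condition on its own, here is a minimal sketch with no LDAP involved. The upper bound of 1000 is just an illustrative number, not something from the script above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $divisor = 200;    # same value as in the script above

# Collect the entry numbers for which the remainder is zero.
# The range 1 .. 1000 is an arbitrary illustration.
my @rollover = grep { $_ % $divisor == 0 } 1 .. 1000;

print "New file starts at entry: @rollover\n";
# prints "New file starts at entry: 200 400 600 800 1000"
```

Every other entry number leaves a non-zero remainder and falls through to the “else” block.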

A separate counter, $fileno, is kept for creating new file names. It gets incremented after each new file is created.

Basically, the script begins by using the code in the “else” block to write into the original file (here, “ad.0”). Once it reaches entry 200 (the first one divisible by the modulus), it creates a new file (“ad.1”) and starts writing to it.

This new file name is used from that point forward until the next entry number divisible by 200, in this case, entry 400, whereupon a new file is created using the previously incremented $fileno value.
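The rollover described above can be simulated without touching any files. This is only a sketch of the counting logic: it assumes the 6,000 entries mentioned earlier, and the %entries_in hash is a made-up name used to tally how many entries would land in each file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $divisor = 200;
my $total   = 6000;      # entry count mentioned in the post
my $fileno  = 1;
my $current = 'ad.0';    # the file the script starts with
my %entries_in;          # hypothetical tally: file name => entries written

for my $count (1 .. $total) {
    if ($count % $divisor == 0) {
        # Same rule as the script: a multiple of $divisor starts a new file.
        $current = "ad.$fileno";
        $fileno++;
    }
    $entries_in{$current}++;
}

printf "files created: %d\n", scalar keys %entries_in;
print "$_: $entries_in{$_}\n" for qw(ad.0 ad.1 ad.30);
```

Run this way, the tally shows a side effect of testing the count before writing: “ad.0” ends up with 199 entries rather than 200, and entry 6,000 lands alone in “ad.30”. That was fine for my purposes, but worth knowing if the exact chunk size matters.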

Note that it took a little trial and error to get the file naming right. The method used here, breaking out the file path and name into separate elements, helped me keep everything straight. I’m sure there’s a more efficient way to do it.
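One possible alternative, not what the script above does, is to let the core File::Spec module join the directory and file name. The chunk_path helper below is a hypothetical name invented for this sketch.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;

# chunk_path is a hypothetical helper, not part of the original script.
# It builds "<dir>/<base>.<n>" and lets File::Spec handle the separator.
sub chunk_path {
    my ($dir, $base, $n) = @_;
    return File::Spec->catfile($dir, "$base.$n");
}

# Fall back to the current directory if HOME is not set.
my $home = defined $ENV{HOME} ? $ENV{HOME} : '.';
print chunk_path($home, 'ad', $_), "\n" for 0 .. 2;
```

This keeps the path and name as separate elements, as in the original approach, while avoiding any doubled or missing slashes when the directory string already ends in one.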