20130206 capitalization of names - plembo/onemoretech GitHub Wiki

title: Capitalization of names
link: https://onemoretech.wordpress.com/2013/02/06/capitalization-of-names/
author: lembobro
description:
post_id: 4245
created: 2013/02/06 17:55:10
created_gmt: 2013/02/06 21:55:10
comment_status: closed
post_name: capitalization-of-names
status: publish
post_type: post

Capitalization of names

Or rather, the reformatting of name data so it's properly capitalized. Our directory is full of name data that was formatted during the Dark Ages of computing, IN ALL CAPS. It would be nice to get it into human-friendly mixed case. But we all know these things are never as simple as they seem. If you can follow my Perl code you'll see where I've got to go with this:

#!/usr/bin/perl 
use strict;
use Text::Capitalize;

my @names = ("LAURA D'AMICO", "Mike Harvey", "PHIL LEMBO",
              "Dan (IT) Smith");

foreach my $name (@names) {

    if ($name =~ /'/) {

        my $mixed = capitalize_title("$name", PRESERVE_WHITESPACE =>1);
        my ($pre, $post) = split("'", $mixed);
        $pre = ucfirst($pre);
        $post = ucfirst($post);
        $mixed = $pre . "'" . $post;
        print $mixed, "\n";
    }
    else {
        my $mixed =  capitalize_title("$name", PRESERVE_WHITESPACE =>1 );
        print $mixed, "\n";
    }
}

Results:

Laura D'Amico
Mike Harvey
Phil Lembo
Dan (It) Smith

There are two problems in my data here. The first is names with apostrophes. The second is the people who have exercised the right to differentiate themselves by modifying their own name data to include symbols and titles of various kinds. It's that last one that I really need to work on some more: as the results show, "Dan (IT) Smith" comes out as "Dan (It) Smith".

The one good thing about the "capitalize_title" function in Text::Capitalize (apart from the fact that it actually works!) is that it will consistently capitalize each "word" in a "sentence", so multiple words in a name value don't faze it the way they do "ucfirst", which only uppercases the first character of the whole string. Building the module from source is not a trivial task: it requires some dependencies that score pretty low for build success themselves (see test results).
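For anyone who can't get the module built, here is a minimal sketch of the same idea using only core Perl. It is not how Text::Capitalize works internally, and the "mixed_case_name" helper is a made-up name for illustration. It rests on two assumptions of mine: a parenthesized token such as "(IT)" was typed in on purpose and should pass through untouched, and names contain only ASCII letters.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of Text::Capitalize): mixed-case a name
# one word at a time using only core Perl.
sub mixed_case_name {
    my ($name) = @_;
    my @out;

    # A limit of -1 keeps empty fields, so runs of spaces survive the join.
    foreach my $word (split / /, $name, -1) {

        # Assumption: a parenthesized token like "(IT)" is deliberate,
        # so leave it exactly as entered.
        if ($word =~ /^\(.*\)$/) {
            push @out, $word;
            next;
        }

        $word = lc $word;

        # Uppercase the first letter of the word and any letter that
        # directly follows an apostrophe (d'amico becomes D'Amico).
        $word =~ s/(^|')([a-z])/$1\u$2/g;
        push @out, $word;
    }

    return join ' ', @out;
}

foreach my $name ("LAURA D'AMICO", "Mike Harvey", "PHIL LEMBO",
                  "Dan (IT) Smith") {
    print mixed_case_name($name), "\n";
}
```

With the four sample names this should print "Laura D'Amico", "Mike Harvey", "Phil Lembo" and "Dan (IT) Smith". It is deliberately naive: a possessive like "O'NEIL'S" would come out as "O'Neil'S", and prefixes such as "Mc" get no special treatment, so treat it as a starting point rather than a replacement for the module.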

Copyright 2004-2019 Phil Lembo