20100319 regex for stripping all non ascii characters - plembo/onemoretech GitHub Wiki

title: Regex for stripping all non-ASCII characters
link: https://onemoretech.wordpress.com/2010/03/19/regex-for-stripping-all-non-ascii-characters/
author: lembobro
description:
post_id: 175
created: 2010/03/19 14:16:33
created_gmt: 2010/03/19 14:16:33
comment_status: open
post_name: regex-for-stripping-all-non-ascii-characters
status: publish
post_type: post

Regex for stripping all non-ASCII characters

NOTE: If you want to filter telephone number data, take a look at this newer post.

Stripping non-ASCII characters is important to me because I see a lot of creative formatting in directory entries, especially in telephone number fields, and it comes back to haunt me when I export data for reports. Of course it would be a lot better if the Netscape family of directory servers strictly enforced the schema for phone number values, but unfortunately that train has left the station. So here it is, a Perl regex to strip any non-ASCII characters from an attribute value. I'll use a cell phone number as an example.

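```perl
my $mobile = $entry->get_value('mobile');   # $entry: an LDAP entry object
for ($mobile) {
    next unless defined;                    # attribute may be absent
    # Replace anything that is not printable ASCII, LF or CR with a space
    s/[^\x20-\x7E\x0A\x0D]/ /g;
}
```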
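To see what the character class does outside of an LDAP session, here is a minimal self-contained sketch. The helper name `strip_non_ascii` and the sample phone value are hypothetical, made up for illustration. It also shows one assumption worth checking in your own data: the regex works per character, so if a value arrives as undecoded UTF-8 bytes, every byte of a multi-byte character is replaced separately.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper using the same character class: every character
# outside printable ASCII (0x20-0x7E), LF (0x0A) and CR (0x0D) becomes a space.
sub strip_non_ascii {
    my ($value) = @_;
    return undef unless defined $value;
    (my $clean = $value) =~ s/[^\x20-\x7E\x0A\x0D]/ /g;
    return $clean;
}

# Made-up sample: a non-breaking space (U+00A0) and an en dash (U+2013)
my $decoded = "+1\x{A0}(732)\x{2013}555-0100";
print strip_non_ascii($decoded), "\n";      # +1 (732) 555-0100

# The same value as raw UTF-8 bytes: each byte of a multi-byte
# character is replaced on its own, so extra spaces appear.
my $bytes = encode('UTF-8', $decoded);
print strip_non_ascii($bytes), "\n";        # +1  (732)   555-0100
```

If you want exactly one space per offending character, decode the value (for example with `Encode::decode('UTF-8', ...)`) before applying the substitution.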

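Note that in this example I replace any of the offending characters with a space. The character class keeps printable ASCII (0x20 through 0x7E) plus line feed (0x0A) and carriage return (0x0D); everything else, including tabs and other control characters, gets replaced. Thanks to the indomitable Mark Smith for this one.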
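Replacing with a space is a choice, not a requirement. The sketch below shows two alternatives I'd consider, under the assumption that you want either compact values or tidy single-spaced ones. Both helper names are hypothetical. The first deletes the characters with `tr///` using the `c` (complement) and `d` (delete) flags; the second keeps the space replacement but squeezes runs of spaces and trims the ends.

```perl
use strict;
use warnings;

# Hypothetical variant 1: delete every character outside printable ASCII,
# LF and CR instead of replacing it.
sub delete_non_ascii {
    my ($value) = @_;
    return undef unless defined $value;
    (my $clean = $value) =~ tr/\x20-\x7E\x0A\x0D//cd;
    return $clean;
}

# Hypothetical variant 2: replace with a space as in the original regex,
# then collapse runs of spaces and trim spaces at either end.
sub replace_and_squeeze {
    my ($value) = @_;
    return undef unless defined $value;
    (my $clean = $value) =~ s/[^\x20-\x7E\x0A\x0D]/ /g;
    $clean =~ s/ {2,}/ /g;
    $clean =~ s/\A +| +\z//g;
    return $clean;
}

my $sample = "+1\x{A0}(732)\x{2013}555-0100";   # made-up value
print delete_non_ascii($sample), "\n";                                 # +1(732)555-0100
print replace_and_squeeze(" \x{A0}555\x{2013}\x{2013}0100\x{A0}"), "\n";  # 555 0100
```

Replacing with a space preserves token boundaries (`732` and `555` stay separate when an en dash sat between them), while deletion gives compact values that may run digits together.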

Copyright 2004-2019 Phil Lembo