title: Using a Regex to Filter a String Value
link: https://onemoretech.wordpress.com/2007/08/20/using-a-regex-to-filter-a-string-value/
author: lembobro
description:
post_id: 660
created: 2007/08/20 20:57:00
created_gmt: 2007/08/20 20:57:00
comment_status: open
post_name: using-a-regex-to-filter-a-string-value
status: publish
post_type: post
Using a Regex to Filter a String Value
Recently I had to clean up a user ID value. The problem was that whitespace, tabs and all kinds of other non-visible stuff were getting into the data as a result of poor validation in the application used for data entry. This was preventing us from reading the “letter followed by 6 digits” ID (e.g. “Z123456”) as required. I used a regular expression match to get the job done. Here’s my code (the variable $uid has already received the raw value):
`
for ($uid) {
    # Keep only the "letter + 6 digits" ID; leave $uid alone if nothing matches
    $uid = $1 if m/([A-Z]\d{6})/i;
}
`
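Notice the use of the `[A-Z]` character class for the leading letter and the `\d` metacharacter for digits, along with the `{n}` quantifier (here `{6}`) fixing the number of digits. The `/i` modifier tells the regex engine to make a case-insensitive match. The `for` loop simply aliases `$uid` to `$_`, which is what the bare `m//` operates on.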
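As a minimal sketch of the same match wrapped in a reusable sub: the name `extract_uid` and the sample inputs below are illustrative assumptions, not part of the original script. Capturing in list context means a failed match yields `undef` instead of whatever an earlier match left in `$1`.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first "letter + 6 digits" ID found in
# the raw value, or undef if there is none.
sub extract_uid {
    my ($raw) = @_;
    return unless defined $raw;
    # In list context a successful match returns its captured groups;
    # a failed match returns an empty list, leaving $uid undefined.
    my ($uid) = $raw =~ m/([A-Z]\d{6})/i;
    return $uid;
}

my $clean   = extract_uid("  \tZ123456 \r\n");   # "Z123456"
my $missing = extract_uid("no id here");          # undef
```

Note that the pattern is not anchored, so it picks the first letter-plus-six-digits run it finds anywhere in the string. Whether that is strict enough depends on how dirty the data is.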
To filter out all non-digit characters in a less elegant way, you can use a simple search and replace operation, like this:
`
for ($string) {
    # Delete every non-digit character
    s/\D//g;
}
`
Which will result in `$string` containing only digits. This is the method I now use for filtering out formatting characters (as well as stray whitespace, tabs and other such annoyances) from telephone numbers. In Perl, the `\D` metacharacter matches any non-digit character, while its little brother, `\d`, matches only a digit character. The `/g` modifier tells the regex engine to do a “global” substitution: instead of stopping after the first match, it keeps going to the end of the string until it has replaced every matching character.
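To see what the `/g` modifier changes, here is a small sketch comparing the substitution with and without it. The phone number is a made-up sample value, and the variable names are only for illustration.

```perl
use strict;
use warnings;

my $raw = "(555) 123-4567\t";   # made-up sample value

# Without /g only the first non-digit, the "(", is removed.
(my $first_only = $raw) =~ s/\D//;

# With /g every non-digit in the string is removed.
(my $digits = $raw) =~ s/\D//g;

print "$digits\n";   # 5551234567
```

The `(my $copy = $raw) =~ s/...//` idiom copies the value first, so the raw input stays available for logging or comparison.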
Additional Note: Just today I found I needed a way to efficiently filter out all but alphanumeric characters from an attribute. This was easy, thanks to Perl’s `\W` metacharacter, which matches any non-word character. Here’s how I used it:
`
for ($uid) {
    # Delete every non-word character
    s/\W//g;
}
`
The result is that anything other than a word character gets nuked, which for plain ASCII data means anything but [A-Za-z], [0-9] or the underscore. Which is A Good Thing ™ when you’re talking about user ID values.
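One detail worth checking against your own data: `\W` treats the underscore as a word character, so it survives the substitution. The sketch below uses a made-up value and illustrative variable names to compare `\W` with an explicit negated character class, which is an alternative if underscores should go too.

```perl
use strict;
use warnings;

my $raw = " z_123456\t";   # made-up value containing an underscore

# \W leaves letters, digits and the underscore in place.
(my $word_only = $raw) =~ s/\W//g;              # "z_123456"

# A negated class keeps strictly ASCII letters and digits.
(my $alnum_only = $raw) =~ s/[^A-Za-z0-9]//g;   # "z123456"
```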
Copyright 2004-2019 Phil Lembo