Regex Notes - edorlando07/datasciencecoursera GitHub Wiki
1234
\d+
\d+ matches a run of one or more digits. The Regex code is a match.
1234 567890
^\d+$
The ^ indicates start of line. The $ indicates end of line. The Regex code is not a match: the whole line must consist only of digits, and the space breaks that.
1234567890
^\d+$
The Regex code is a match.
ABCD1234567890
^\d+$
The ^ indicates start of line. The $ indicates end of line. The Regex code is not a match: the whole line must consist only of digits, and the leading letters break that.
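The three anchored examples above can be checked with a minimal sketch, assuming Python's re module (the notes do not name an engine; variable names here are illustrative only):

```python
import re

# ^ and $ force the digits to span the whole line.
pattern = re.compile(r"^\d+$")

samples = ["1234 567890", "1234567890", "ABCD1234567890"]
anchored = {text: bool(pattern.search(text)) for text in samples}

# Without anchors, \d+ is satisfied by a run of digits anywhere in the text.
unanchored = re.search(r"\d+", "ABCD1234567890").group()

print(anchored)    # only "1234567890" maps to True
print(unanchored)  # 1234567890
```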
1234
\b\d+\b
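\b matches at a word boundary: the position between a word character (letter, digit or underscore) and a non-word character, or the start or end of the text. The Regex code is a match.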
1234 567890
\b\d+\b
\b indicates the start or the end of a word boundary. The Regex code is a match for 1234 and for 567890, independently.
ABCD1234
\b\d+\b
\b indicates the start or the end of a word boundary. The Regex code is not a match: letters and digits are both word characters, so there is no boundary between D and 1.
NY Postal Codes are 10001, 10002, 10003, 10004
\b\d+\b
\b indicates the start or the end of a word boundary. The Regex code is a match for 10001, 10002, 10003, 10004, independently.
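A short sketch of the four word-boundary examples, again assuming Python's re module and using re.findall to list every match (variable names are illustrative):

```python
import re

word_digits = re.compile(r"\b\d+\b")

single = word_digits.findall("1234")
pair = word_digits.findall("1234 567890")
glued = word_digits.findall("ABCD1234")  # no boundary between D and 1
postal = word_digits.findall("NY Postal Codes are 10001, 10002, 10003, 10004")

print(single)  # ['1234']
print(pair)    # ['1234', '567890']
print(glued)   # []
print(postal)  # ['10001', '10002', '10003', '10004']
```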
Timestamp=20160502
\d{8}
{8} repeats the preceding token exactly 8 times, so \d{8} matches exactly 8 digits. The Regex code is a match for 20160502.
Timestamp=20160502
\d{4}\d{2}\d{2}
{n} repeats the preceding token exactly n times. The Regex code is a match for 20160502.
Timestamp=20160502
(\d{4})(\d{2})(\d{2})
{n} repeats the preceding token exactly n times.
() defines a capture group; each group captures its own part of the match.
The Regex code is a match for 20160502, captured as the groups 2016, 05 and 02.
Timestamp=20160502
(?P<year>\d{4})(?P<month>\d{2})(?P<day>\d{2})
{n} repeats the preceding token exactly n times.
() defines a capture group; each group captures its own part of the match.
(?P<name>...) names a group, so the three groups are identified as year, month and day.
The Regex code is a match for 20160502.
2016 = year, 05 = month, 02 = day.
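The (?P<name>...) syntax is the form Python's re module uses, so the named-group example can be sketched there directly (variable names are illustrative):

```python
import re

stamp = "Timestamp=20160502"
match = re.search(r"(?P<year>\d{4})(?P<month>\d{2})(?P<day>\d{2})", stamp)

whole = match.group(0)      # the full matched text
numbered = match.groups()   # groups by position
named = match.groupdict()   # groups by name

print(whole)     # 20160502
print(numbered)  # ('2016', '05', '02')
print(named)     # {'year': '2016', 'month': '05', 'day': '02'}
```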
Widget Unit cost: 12,000.56 dollars
Taxes: 234.00 dollars
Total: 12,234.56 dollars
(?P<value>\d+(,\d{3})*(\.\d{2})?)\s+dollar(s)?
value for 1st line is 12,000.56
value for 2nd line is 234.00
value for 3rd line is 12,234.56
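A sketch of pulling the named value out of each line, assuming Python's re module. re.finditer is used here rather than re.findall because the pattern contains several groups and findall would return a tuple per match:

```python
import re

invoice = (
    "Widget Unit cost: 12,000.56 dollars\n"
    "Taxes: 234.00 dollars\n"
    "Total: 12,234.56 dollars"
)

amount = re.compile(r"(?P<value>\d+(,\d{3})*(\.\d{2})?)\s+dollar(s)?")

# Keep only the named group from each match object.
values = [m.group("value") for m in amount.finditer(invoice)]

print(values)  # ['12,000.56', '234.00', '12,234.56']
```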
Here is the list...1.soccer 2. tennis 3.basketball 4. cricket
\d+\.\s*
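\s* matches zero or more whitespace characters, so the pattern matches a number, a literal dot and any spaces that follow it.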
The split would provide the following:
Here is the list...
soccer
tennis
basketball
cricket
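The split above can be reproduced with re.split, assuming Python's re module. The raw pieces keep the space that precedes the next number, so this sketch also strips them (variable names are illustrative):

```python
import re

text = "Here is the list...1.soccer 2. tennis 3.basketball 4. cricket"

# Each "number, dot, optional spaces" run is treated as a separator.
pieces = re.split(r"\d+\.\s*", text)
cleaned = [piece.strip() for piece in pieces]

print(pieces)   # some items still end with a space
print(cleaned)  # ['Here is the list...', 'soccer', 'tennis', 'basketball', 'cricket']
```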
this is a test
(?i)A
The (?i) will allow the Regex to find all As regardless of case
this is the biggest test
b|i|g
[big]
Both these codes will find all Bs, Is, and Gs, independently.
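The last two examples can be sketched as follows, assuming Python's re module: the inline (?i) flag, then a check that the alternation and the character class return the same matches (variable names are illustrative):

```python
import re

# (?i) makes the uppercase A in the pattern match a lowercase a in the text.
any_case = re.findall(r"(?i)A", "this is a test")

text = "this is the biggest test"
by_alternation = re.findall(r"b|i|g", text)
by_class = re.findall(r"[big]", text)

print(any_case)        # ['a']
print(by_alternation)  # ['i', 'i', 'b', 'i', 'g', 'g']
print(by_class)        # same list as above
```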
this ^ is a big test
[^aeiou ]
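The caret (^) must be the first character inside the brackets to negate the class. Here the pattern matches every character that is not a vowel or a space, including the literal ^ in the text.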
this is a definitive test
[a-d]
x-ray 3 won't for this test
[a-dx-z0-3]
x-ray 3 won't for this test
[^a-dx-z0-3]
Use ^ as negator
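A sketch of the negated and ranged classes above, assuming Python's re module. A class and its negation split the text between them, which the last line checks (variable names are illustrative):

```python
import re

# Everything that is not a vowel or a space, including the literal ^.
consonants = re.findall(r"[^aeiou ]", "this ^ is a big test")

sample = "x-ray 3 won't for this test"
in_class = re.findall(r"[a-dx-z0-3]", sample)
not_in_class = re.findall(r"[^a-dx-z0-3]", sample)

print(consonants)  # ['t', 'h', 's', '^', 's', 'b', 'g', 't', 's', 't']
print(in_class)    # ['x', 'a', 'y', '3']
print(len(in_class) + len(not_in_class) == len(sample))  # True
```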
this. is. a. test
\.
[.]
this is a test
\t
catalog of log
\blog\b
matches only the word log
apple grows on apple trees
^apple
apple 1 grows on apple trees
apple 2 grows on apple trees
(?m)^apple
(?m) turns on multi-line mode, so ^ matches at the start of every line instead of only at the start of the text.
apple grows on apple
apple$
apple grows on apple
apple grows on apple
(?m)apple$
apple grows on aPple
apple grows on appLE
(?mi)apple$
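The effect of the (?m) and (?mi) flags can be seen in a small sketch, assuming Python's re module and sample text joined with newlines (variable names are illustrative):

```python
import re

starts = "apple 1 grows on apple trees\napple 2 grows on apple trees"

# Default mode: ^ matches only at the very start of the text.
default_mode = re.findall(r"^apple", starts)
# Multi-line mode: ^ matches at the start of every line.
multiline = re.findall(r"(?m)^apple", starts)

endings = "apple grows on aPple\napple grows on appLE"
# m and i combined: $ matches at each line end, and case is ignored.
line_ends = re.findall(r"(?mi)apple$", endings)

print(default_mode)  # ['apple']
print(multiline)     # ['apple', 'apple']
print(line_ends)     # ['aPple', 'appLE']
```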
ABCD 123456789
[0-9]
OR
\d
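Shorthand for a decimal digit. In Unicode-aware engines (for example Python 3 string patterns) \d also matches decimal digits from other scripts, so it is broader than [0-9].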
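The difference between \d and [0-9] only shows up on non-ASCII digits. A sketch assuming Python 3 string patterns; the Arabic-Indic digits are an assumed test input that is not in the original notes:

```python
import re

# "123" followed by the Arabic-Indic digits four, five, six (assumed sample).
mixed = "ABCD 123 \u0664\u0665\u0666"

unicode_digits = re.findall(r"\d", mixed)
ascii_range = re.findall(r"[0-9]", mixed)
# re.ASCII restricts \d to 0-9 again.
ascii_flag = re.findall(r"\d", mixed, flags=re.ASCII)

print(len(unicode_digits))  # 6
print(ascii_range)          # ['1', '2', '3']
print(ascii_flag)           # ['1', '2', '3']
```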
ABCD 123456789
\D
ABCD 123456789
\w
ABCD 123456789
\W
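The notes list \D, \w and \W without stating what they match. As a hedged sketch in Python's re module: \D is any non-digit, \w is any word character (letter, digit or underscore) and \W is any non-word character (variable names are illustrative):

```python
import re

text = "ABCD 123456789"

non_digits = re.findall(r"\D", text)
word_chars = re.findall(r"\w", text)
non_word = re.findall(r"\W", text)

print(non_digits)       # ['A', 'B', 'C', 'D', ' ']
print(len(word_chars))  # 13 (four letters plus nine digits)
print(non_word)         # [' ']
```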
One tab _ )
Two Three
\s
a
a2
a345
678
[a-z][0-9]
Finds a2 and a3
a
a2
a345
678
[a-z][0-9]*
Finds a, a2 and a345
a
a2
a345
678
[a-z][0-9]+
Finds a2 and a345
a
a2
a345
678
[a-z][0-9]?
Finds a, a2 and a3 (from a345). The ? allows zero or one digit, so at most one digit is taken.
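The four quantifier examples can be compared side by side. A sketch assuming Python's re module, with the sample lines joined by newlines (variable names are illustrative):

```python
import re

text = "a\na2\na345\n678"

exactly_one = re.findall(r"[a-z][0-9]", text)    # letter + one digit
zero_or_more = re.findall(r"[a-z][0-9]*", text)  # letter + any number of digits
one_or_more = re.findall(r"[a-z][0-9]+", text)   # letter + at least one digit
zero_or_one = re.findall(r"[a-z][0-9]?", text)   # letter + at most one digit

print(exactly_one)   # ['a2', 'a3']
print(zero_or_more)  # ['a', 'a2', 'a345']
print(one_or_more)   # ['a2', 'a345']
print(zero_or_one)   # ['a', 'a2', 'a3']
```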
a
a2
a345
678
ab1
ab234
[a-z]{2}[0-9]?
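Finds ab1 and ab2 (from ab234). {2} requires exactly two letters and [0-9]? takes at most one digit, so the lines with a single letter do not match.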
a
a2
a345
678
ab1
ab234
abc56
abcd678
abcde789
[a-z]{2,4}[0-9]?
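Finds ab1, ab2 (from ab234), abc5 (from abc56), abcd6 (from abcd678) and abcd (from abcde789). {2,4} takes two to four letters and [0-9]? takes at most one digit.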
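A sketch of the {2} and {2,4} examples, assuming Python's re module and the sample lines joined with newlines. The last pattern, which swaps ? for *, is an added variant (not in the original notes) showing how to take the whole digit run:

```python
import re

short_list = "\n".join(["a", "a2", "a345", "678", "ab1", "ab234"])
long_list = "\n".join([short_list, "abc56", "abcd678", "abcde789"])

exactly_two = re.findall(r"[a-z]{2}[0-9]?", short_list)
two_to_four = re.findall(r"[a-z]{2,4}[0-9]?", long_list)
# Added variant: * instead of ? keeps every trailing digit.
all_digits = re.findall(r"[a-z]{2,4}[0-9]*", long_list)

print(exactly_two)  # ['ab1', 'ab2']
print(two_to_four)  # ['ab1', 'ab2', 'abc5', 'abcd6', 'abcd']
print(all_digits)   # ['ab1', 'ab234', 'abc56', 'abcd678', 'abcd']
```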
123
abcd
1234
\b(\d{3}|[a-z]{4})\b
Matches a whole word that is exactly 3 digits or exactly 4 lowercase letters: 123 and abcd match, 1234 does not.
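A sketch of the grouped alternation, assuming Python's re module. With a single capture group, re.findall returns the group's text for each match (variable names are illustrative):

```python
import re

text = "123\nabcd\n1234"

# \b on both sides stops 1234 from matching as "123" plus a leftover digit.
words = re.findall(r"\b(\d{3}|[a-z]{4})\b", text)

print(words)  # ['123', 'abcd']
```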