”Regular expressions”

”Regular expressions” (from here on ”RE” for short)…
That’s a strange expression :-) . But what is it really about?

RE are used to find and extract text or data: validating a web form, checking an Excel cell, looking something up in a database or scanning through a text list. The technology behind RE is really fascinating.

Back in the mid-80s, a very long time ago, I happened to need to search for text on a cassette tape :-) … So I invented a simple sort of RE.

I had no idea at the time that an established way of doing this already existed. Today the syntax is mostly standardized across different languages.

Let’s look at an example: a bank account shows the following amounts:

-32432
4543
3346
-626
9463
-56.99
25 -Sum
1
-36

Maybe you want to only see the negative numbers in the list. The RE for this might be:

”-”

which catches, or ”greps” :-x (I say ”grep” because it’s a very common command in the Linux/macOS terminal…)

-32432
-626
-56.99
25 -Sum
-36
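For readers without a terminal at hand, the same filtering can be sketched in Python. This is only an illustration: the helper name grep is made up here, and the list assumes that ”-36” belongs to the account (it appears in the filtered output) and that the ”Sum” line contains a plain hyphen.

```python
import re

# Amounts from the example. Assumptions: "-36" is part of the list and the
# "Sum" line uses a plain ASCII hyphen, so that "-" can find it.
lines = ["-32432", "4543", "3346", "-626", "9463", "-56.99", "25 -Sum", "1", "-36"]

def grep(pattern, lines):
    """Return the lines in which the pattern is found anywhere (grep-like)."""
    regex = re.compile(pattern)
    return [line for line in lines if regex.search(line)]

for line in grep("-", lines):
    print(line)
```

re.search looks for the pattern anywhere in the line, which is why the ”Sum” line is caught as well.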

But that ”Sum” line looks a little ”off”. We might want to filter it out with

RE = ”^-” :

-32432
-626
-56.99
-36

The ”^” tells the computer that we only want lines with ”-” at the very beginning.
We could extend this to show only whole numbers, skipping -56.99, with

RE = ”^-[0-9]+$” :

-32432
-626
-36

”^-[0-9]+$” …!? Let’s build it up piece by piece:

1. ”^” anchors the match at the very beginning of the line.
2. ”^-” catches all lines starting with ”-”.
3. ”^-[0-9]” requires the ”-” to be followed by one of the digits ”0”-”9”.
4. The ”+” sign means at least one of the preceding item, i.e. ”[0-9]+” is one or more digits, so we get

RE = ”^-[0-9]+” .

But we are not done there. This would still catch ”-56.99”, because nothing says what may come after the digits, and we only want whole numbers, with no ”.” decimal point.

5. End of the line is marked by the ”$” character, so nothing may follow the digits.

So we get

RE = ”^-[0-9]+$” :

-32432
-626
-36
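The build-up can be checked by running each intermediate pattern against the list. A small Python sketch, with the same assumptions about the list as in the example (”-36” included, plain hyphen before ”Sum”); the helper name matching is made up for this illustration.

```python
import re

# Assumed account lines ("-36" included, ASCII hyphen before "Sum").
lines = ["-32432", "4543", "3346", "-626", "9463", "-56.99", "25 -Sum", "1", "-36"]

# The pattern, built up one piece at a time.
steps = ["^-", "^-[0-9]", "^-[0-9]+", "^-[0-9]+$"]

def matching(pattern, lines):
    """Return the lines where the pattern is found (re.search, like grep)."""
    return [line for line in lines if re.search(pattern, line)]

for pattern in steps:
    print(f"{pattern:12} {matching(pattern, lines)}")
```

Only the last step drops ”-56.99”: until ”$” is added, the pattern is satisfied as soon as it has seen ”-5” and does not care what follows.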

Sample list (these may not work in every environment, because RE engines are not 100% compatible with each other):

Positive Integers: ^\d+$

Negative Integers: ^-\d+$

Integer: ^-?\d+$

Positive Number: ^\d*\.?\d+$

Negative Number: ^-\d*\.?\d+$

Positive Number or Negative Number: ^-?\d*\.?\d+$

Phone number: ^\+?[\d\s]{3,}$

Phone with code: ^\+?[\d\s]+\(?[\d\s]{10,}$

Year 1900-2099: ^(19|20)\d{2}$

Date (dd mm yyyy, d/m/yyyy, etc.)

(In some ”dialects” of RE you can use ”\d” instead of ”[0-9]” which means less typing…)
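A few of the samples can be tried out in Python, which is one of the dialects that understands ”\d”. This is just a sketch: the dictionary keys and the helper name check are made up here, and other engines may behave slightly differently.

```python
import re

# Three patterns from the sample list, written as raw strings (r"...") so
# that the backslash in "\d" reaches the regex engine unchanged.
samples = {
    "integer": r"^-?\d+$",
    "number": r"^-?\d*\.?\d+$",
    "year": r"^(19|20)\d{2}$",
}

def check(name, text):
    """Return True if the named sample pattern matches the text."""
    return re.search(samples[name], text) is not None

print(check("integer", "-626"))    # True
print(check("integer", "-56.99"))  # False, the decimal point is not allowed
print(check("number", "-56.99"))   # True
print(check("year", "1985"))       # True
print(check("year", "2100"))       # False, outside 1900-2099
```

One dialect difference worth knowing: on ordinary Python text strings ”\d” also accepts digits from other scripts unless the re.ASCII flag is given, while ”[0-9]” only ever means the ten digits ”0”-”9”.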

Site for trying out regular expressions for yourself:

Try regular expressions

Have fun!
