RegEx: The Right Way | Tutorial 1

A Regular Expression, or RegEx, is a sequence of characters that defines a search pattern. Regex is everywhere these days, and you can use it to extract information from text files, log files, dictionaries, spreadsheets and web pages. Every major programming language has support for Regular Expressions. Just as importantly, command-line tools such as grep, awk and sed use regex to find and replace matches.

Regular Expressions can help you save a lot of time. Instead of writing complex string-matching code that spans multiple lines, you can often get the job done with a single regex.
Let's look at a simple scenario where you might want to use regex. Say you have a string and you want to check whether it is a website URL. Here are a few conditions that a URL should satisfy:

It should start with http:// or https://
It may or may not have www.
It should contain a .com, .org or something similar
It can contain letters, digits, underscores, etc.
It might even have a port number at the end, as in http://google.com:80/

Checking all of these individually with hand-written string parsing can be a really challenging task in any programming language. With RegEx, the same thing can be achieved much more easily and with far less code.
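To give a feel for what that looks like, here is a minimal sketch in Python that encodes only the conditions listed above. The pattern and the helper name looks_like_url are made up for this illustration, and this is not a complete URL validator (real URLs have many more rules).

```python
import re

# Hypothetical pattern for illustration only: it covers just the
# conditions from the list above, not every valid URL.
URL_PATTERN = re.compile(
    r"""
    https?://               # must start with http:// or https://
    (?:www\.)?              # may or may not have www.
    [A-Za-z0-9_-]+          # letters, digits, underscores, hyphens
    (?:\.[A-Za-z0-9_-]+)*   # optional further dot-separated parts
    \.[A-Za-z]{2,}          # an ending such as .com or .org
    (?::\d+)?               # optional port number, e.g. :80
    (?:/\S*)?               # optional path after the host
    """,
    re.VERBOSE,
)


def looks_like_url(text):
    """Return True if the whole string matches the sketch pattern."""
    return URL_PATTERN.fullmatch(text) is not None


print(looks_like_url("http://google.com:80/"))
print(looks_like_url("google.com"))
```

The first call passes every condition, while the second fails because the string does not start with http:// or https://. Don't worry about the individual symbols yet; they are what the rest of this series is about.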

That sounds like fun, doesn't it? Well, let's get started.

If you have used a Linux shell/terminal before, you have probably already used something very close to Regular Expressions. Shells such as Bash have some basic pattern matching capabilities built into them. So, let's look at an example.

I am currently inside a folder with some files in it.

$ ls
file.csv       picture.jpg    README_en.txt  touch2.txt  video.mp4
HelloWorld.rb  program1.java  touch1.txt     touch2.vim

If I want to see only the files with the extension .txt, I can do this:

$ ls *.txt
README_en.txt  touch1.txt  touch2.txt

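This can be thought of as regex in its most basic form: we give a search pattern and we see the output that matches this pattern. The '*' here is called a wildcard character, and it matches any sequence of characters, including none at all. (Strictly speaking, shell patterns like this are called globs, and their syntax differs from real regex syntax, but the idea is the same.)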
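If you want to try the same idea outside the shell, here is a small sketch in Python using the standard fnmatch module, which implements shell-style wildcards. The file list is copied from the ls output above, and the variable names are just for illustration.

```python
import fnmatch

# File names copied from the `ls` output shown earlier.
files = [
    "file.csv", "picture.jpg", "README_en.txt", "touch2.txt", "video.mp4",
    "HelloWorld.rb", "program1.java", "touch1.txt", "touch2.vim",
]

# '*' matches any run of characters, so "*.txt" keeps names ending in .txt.
txt_files = sorted(fnmatch.filter(files, "*.txt"))
print(txt_files)  # ['README_en.txt', 'touch1.txt', 'touch2.txt']
```

The result contains the same three names that ls *.txt printed.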

This time, let's say I want to search for a txt file whose name is 'touch' followed by something (in this folder we have touch1.txt and touch2.txt), and I don't remember the exact number following 'touch'. To search for that, I can use the ? operator, which gives me this:

$ ls touch?.txt
touch1.txt  touch2.txt

Now, '?' is also a wildcard character, but it matches exactly one character. So, if there were a file called touchA.txt or touch%.txt, it would be matched too, but touchAB.txt would not.
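Here is a quick way to check that behaviour, again as a sketch in Python. The candidate names touchA.txt, touch%.txt, touchAB.txt and touch.txt are hypothetical files used only for this test. The second half shows a regex that does the same job: in real regex syntax it is '.' that means "any single character", and a literal dot has to be written as '\.'.

```python
import fnmatch
import re

# Two real names from the folder plus some hypothetical ones.
candidates = [
    "touch1.txt", "touch2.txt", "touchA.txt",
    "touch%.txt", "touchAB.txt", "touch.txt",
]

# Shell-style wildcard: '?' stands for exactly one character.
glob_matches = [
    name for name in candidates if fnmatch.fnmatchcase(name, "touch?.txt")
]

# Regex version: '.' is any single character, '\.' is a literal dot.
pattern = re.compile(r"touch.\.txt")
regex_matches = [name for name in candidates if pattern.fullmatch(name)]

print(glob_matches)   # ['touch1.txt', 'touch2.txt', 'touchA.txt', 'touch%.txt']
print(regex_matches)  # same four names
```

Both versions skip touchAB.txt (two characters after 'touch') and touch.txt (none).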

So, this is how you can narrow down your search results using search patterns. Programs like grep, egrep and sed can do a lot more than just this, and we will use them in the upcoming tutorials.

That's everything for this tutorial. So, stay tuned for the upcoming ones.
Happy RegExing!

Durga Swaroop Perla
