A Regular Expression, or RegEx, is a sequence of characters that defines a search pattern. Regex is everywhere these days: you can use it to extract information from text files, log files, dictionaries, spreadsheets and web pages. Every major programming language supports regular expressions, and classic command-line tools such as grep, awk and sed use regex to search text (sed and awk can also replace what they match).
Regular expressions can save you a lot of time. Instead of writing string-matching code that spans many lines, you can often express the same search as a single pattern.
Let's look at a simple scenario where you might want to use regex. Say you have a string and want to check whether it is a website URL. Here are a few conditions that a URL should satisfy:
- Should start with http:// or https://
- May or may not have www.
- Should end its domain with .com, .org or something similar
- Can contain letters, digits, underscores, etc.
- Might even have a port number after the domain, as in http://google.com:80/
Checking each of these conditions separately with ordinary string-parsing code in any programming language quickly becomes tedious and error-prone. With regex, the whole check fits into a single pattern.
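To make that concrete, here is a minimal sketch of one possible pattern for the conditions above, checked with grep -E (extended regex, the same thing egrep does) in Bash. The pattern itself and the is_url helper name are assumptions for illustration, not a complete URL validator: it ignores things like IP addresses, user names and internationalized domains.

```shell
#!/usr/bin/env bash
# Illustrative pattern (an assumption, not a full URL validator):
#   https?://               "http" or "https", then "://"
#   (www\.)?                optional "www."
#   [A-Za-z0-9_-]+(...)*    domain labels: letters, digits, underscores, hyphens
#   \.[A-Za-z]{2,}          an ending such as .com or .org
#   (:[0-9]+)?              optional port, e.g. :80
#   (/.*)?                  optional path
url_regex='^https?://(www\.)?[A-Za-z0-9_-]+(\.[A-Za-z0-9_-]+)*\.[A-Za-z]{2,}(:[0-9]+)?(/.*)?$'

# is_url: hypothetical helper; exit status 0 if $1 matches url_regex
is_url() {
  printf '%s\n' "$1" | grep -Eq "$url_regex"
}

for s in 'http://google.com:80/' 'https://www.example.org' 'google.com' 'ftp://example.com'; do
  if is_url "$s"; then
    echo "$s -> URL"
  else
    echo "$s -> not a URL"
  fi
done
# Prints:
# http://google.com:80/ -> URL
# https://www.example.org -> URL
# google.com -> not a URL
# ftp://example.com -> not a URL
```

The ^ and $ anchors force the whole string to match, so a URL buried inside other text is rejected rather than accepted.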
That sounds like fun, doesn't it? Well, let's get started.
If you have used a Linux shell or terminal before, you have probably already used something close to regular expressions. Bash has some basic pattern-matching capabilities (wildcards, also called globs) built in. Let's look at an example.
I am currently inside a folder with some files in it.
$ ls
file.csv picture.jpg README_en.txt touch2.txt video.mp4
HelloWorld.rb program1.java touch1.txt touch2.vim
If I want to see only the files with the .txt extension, I can do this:
$ ls *.txt
README_en.txt touch1.txt touch2.txt
This can be thought of as regex-style pattern matching in its most basic form: we give a search pattern and see only the names that match it. The '*' here is called a wildcard character, and it matches any sequence of characters, including an empty one.
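Strictly speaking, shell wildcards are a simpler pattern language than regex, and the same symbols mean different things in each. A quick way to see this is to recreate the folder above in a throwaway directory and write the *.txt filter both ways. The scratch directory and the empty placeholder files are assumptions made just for this demo.

```shell
#!/usr/bin/env bash
# Recreate the example folder in a scratch directory (empty placeholder files,
# names taken from the listing above). The directory is removed on exit.
demo_dir=$(mktemp -d)
trap 'rm -rf "$demo_dir"' EXIT
cd "$demo_dir" || exit 1
touch file.csv picture.jpg README_en.txt touch2.txt video.mp4 \
      HelloWorld.rb program1.java touch1.txt touch2.vim

# Glob: '*' stands for any run of characters (including none).
ls *.txt

# The same filter as a real regular expression. In regex, '.' matches any one
# character and '*' means "zero or more of the previous item", so '.*' plays
# the role of the glob '*'. '\.' is a literal dot, '^' and '$' anchor the
# start and end of each name.
ls | grep -E '^.*\.txt$'
```

Both commands should list the same three files: README_en.txt, touch1.txt and touch2.txt. Because grep matches anywhere in a line unless you anchor it, the shorter pattern '\.txt$' would work just as well here.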
This time, let's say I want to find a .txt file whose name is 'touch' followed by one more character (in this folder we have touch1.txt and touch2.txt), but I don't remember the exact number after 'touch'. For that, I can use the ? wildcard, which gives me this:
$ ls touch?.txt
touch1.txt touch2.txt
The '?' is also a wildcard character, but it matches exactly one character. So files called touchA.txt or touch%.txt would be matched too, but touchAB.txt would not.
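To check that claim, here is a minimal sketch that creates touchA.txt, touch%.txt and touchAB.txt (plus touch1.txt and touch2.txt) in a scratch directory and tries the ? wildcard along with its regex counterpart. The extra file names are made up purely for the test.

```shell
#!/usr/bin/env bash
# Scratch directory with the two files from the example plus three made-up
# names to probe what '?' accepts. The directory is removed on exit.
demo_dir=$(mktemp -d)
trap 'rm -rf "$demo_dir"' EXIT
cd "$demo_dir" || exit 1
touch touch1.txt touch2.txt touchA.txt 'touch%.txt' touchAB.txt

# Glob: '?' stands for exactly one character.
ls touch?.txt

# Regex counterpart: '.' matches exactly one character, '\.' is a literal dot.
ls | grep -E '^touch.\.txt$'
```

Both commands should list four files (touch1.txt, touch2.txt, touchA.txt and touch%.txt, in an order that depends on your locale) and leave out touchAB.txt, because "AB" is two characters.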
This is how search patterns can narrow down your results. Programs like grep, egrep and sed can do a lot more than this, and we will use them in the upcoming tutorials.
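As a small preview of what grep and sed add on top of wildcards, here is a sketch that filters names with grep -E and then uses sed's s/PATTERN/REPLACEMENT/ command to find and replace with a regex. The txt_to_md helper and the .txt to .md swap are hypothetical, chosen only to illustrate a replacement; nothing is renamed on disk.

```shell
#!/usr/bin/env bash
# txt_to_md: hypothetical helper that reads names on stdin and rewrites a
# trailing ".txt" as ".md". Only the printed text changes; no files are renamed.
txt_to_md() {
  # '\.' is a literal dot and '$' anchors the match to the end of the line.
  sed -E 's/\.txt$/.md/'
}

# Keep only the .txt names with grep, then rewrite them with sed.
printf '%s\n' README_en.txt touch1.txt touch2.vim | grep -E '\.txt$' | txt_to_md
# Prints:
# README_en.md
# touch1.md
```

Thanks to the $ anchor, only an extension at the very end is replaced, so a name like notes.txt.bak would be left alone.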
That's everything for this tutorial, so stay tuned for the upcoming ones.
Happy RegExing!