Wednesday, 20 January 2016

Pattern Search using Regular Expressions in Python

In daily life we run into lots of pattern search problems, like finding all the mobile numbers or emails in a given web page or file.

Writing manual code for that is inefficient and also very messy. Regular expressions are a very popular technique for pattern search (many compilers and interpreters rely on them), and they are easy to implement and efficient.

I suggest you write code for extracting all emails from a webpage without using regular expressions and test it.
Emails found on a webpage may follow any of several patterns.

Then run it on any big webpage, and you will realize that your hours of hard work and hundreds of lines of code produce an inefficient mess.

Now let's learn a little bit about regular expressions in Python. I hope you are familiar with IPython Notebook. If not, take a look at it for a few minutes; it's very simple.

IPython Notebook: Pattern Search. Follow this link to see it working step by step. You can download it and run it on your local PC for a better experience. If you face any problem with that link, follow this one: [Pattern_Search [Github_link]]

The three main steps are:

  1. Import the re module
  2. Write a regular expression for your pattern
  3. Search

The link above provides enough information to understand the basics.
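
As a rough illustration of those three steps, here is a minimal sketch. The pattern (a plain 10-digit mobile number) and the sample text are my own assumptions for the demo, not taken from the notebook, so adjust them to the number format you actually need.

```python
# Step 1: import the re module
import re

# Step 2: write a regular expression for your pattern.
# Assumption: a "mobile number" here is exactly 10 digits standing alone.
mobile_pattern = re.compile(r"\b\d{10}\b")

# Made-up sample text for the demo
text = "Call 9876543210 or 9123456789, office ext 1234."

# Step 3: search
first = mobile_pattern.search(text)         # first match object, or None
all_numbers = mobile_pattern.findall(text)  # list of every match

if first:
    print("First match:", first.group())
print("All matches:", all_numbers)
```

search() stops at the first hit, while findall() collects every non-overlapping match, which is usually what you want when harvesting data from a page.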

Now let's move on to our email extractor regular expression. Email_harvester: open this link to see the IPython notes and a demonstration.
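
To give a feel for what such an extractor can look like, here is a small sketch. The pattern below is a simplified one I am assuming for illustration, not necessarily the one used in the Email_harvester notebook, and it does not cover every valid email address. The function name and sample HTML are also hypothetical.

```python
import re

# Simplified, assumed email pattern: local part, "@", domain, dot, TLD.
# Real-world addresses can be more varied than this.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(text):
    """Return the email-like strings found in text, without duplicates,
    in order of first appearance."""
    found = []
    for match in EMAIL_PATTERN.findall(text):
        if match not in found:
            found.append(match)
    return found


# Made-up HTML snippet standing in for a downloaded webpage
sample_html = (
    '<p>Contact <a href="mailto:alice@example.com">alice@example.com</a> '
    "or bob.smith@mail.example.org.</p>"
)

emails = extract_emails(sample_html)
print(emails)
```

The same function works on any string, so you can pass it the contents of a file or the HTML of a page you have already downloaded.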

Note: For more detailed information, see the book Automate the Boring Stuff.

You can use this technique for harvesting any information from the web or from any file. Here are a few examples or models of email harvesters: Email_Collection
