How To Use Regular Expressions With Python - Part 2 - Python Regex Tutorial

Hello friends,

This is the second part of our tutorial on how to use regular expressions with Python.

For the first part, click here.

In this tutorial, we will discuss how to use regular expressions for simple tasks.

Here, I am using Python 2.7 on Ubuntu.



So, let's start. 
In [3]:
#
# -- Useful Reference Syntax --------------
#
# Some special characters are:
#
# abc        Letters
# 123        Digits
# \d         Any digit
# \D         Any non-digit character
# .          Any character (except newline)
# \.         Period
# [abc]      Only a, b, or c
# [^abc]     Not a, b, nor c
# [a-z]      Characters a to z
# [0-9]      Numbers 0 to 9
# \w         Any alphanumeric character (or underscore)
# \W         Any non-alphanumeric character
# {m}        m repetitions
# {m,n}      m to n repetitions
# *          Zero or more repetitions
# +          One or more repetitions
# ?          Optional character
# \s         Any whitespace
# \S         Any non-whitespace character
# ^...$      Starts and ends
# (...)      Capture group
# (a(bc))    Capture sub-group
# (.*)       Capture all
# (abc|def)  Matches abc or def

import re   # Python module for regular expressions

# Example String  
text="""My Name Is Suraj Singh Bisht 
But you can call me ssb. and 
my email address is surajsinghbisht054@gmail.com 
"""

# So, let's start.
#
# Example 1
#
# First, we will search for the keyword "suraj" in the example string.
#
# For this, we will use the findall function:
#
# re.findall(pattern, string)
#

pattern='suraj'  # Pattern

re.findall(pattern, text) 
Out[3]:
['suraj']
In [18]:
# So, we got ['suraj'] as output. As you can see, this match is lowercase,
# which means it comes from the email address in the example string (surajsinghbisht054@gmail.com).
# If we want to find both the uppercase and the lowercase spelling, we can try other patterns.
# So, let's try another example.
# Example 2

pattern='suraj|Suraj' # Here "|" means 'or' 

re.findall(pattern, text)
Out[18]:
['Suraj', 'suraj']
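Listing both spellings works, but the re module also offers a case-insensitive flag. The sketch below assumes a copy of the example string (stored in a variable named sample, only so that the block runs on its own) and passes re.IGNORECASE to findall. Keep in mind that this flag would also accept spellings such as SURAJ, which 'suraj|Suraj' would not.

```python
import re

# Copy of the example string so this block runs on its own.
sample = """My Name Is Suraj Singh Bisht
But you can call me ssb. and
my email address is surajsinghbisht054@gmail.com
"""

# re.IGNORECASE makes letters in the pattern match regardless of case.
matches = re.findall('suraj', sample, re.IGNORECASE)
print(matches)  # ['Suraj', 'suraj']
```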
In [19]:
# Now, our output is ['Suraj', 'suraj'].
# Let's try some other examples.
# Example 3.

pattern='S.' # Here '.' means any character except a newline

re.findall(pattern,text)
Out[19]:
['Su', 'Si']
In [20]:
# I am trying my best to explain every topic clearly with examples,
# so keep reading.

# Example 4.

pattern='s.{3}' # Here {m} means m repetitions, so re tries to match exactly 3 characters (of any kind) after "s".

re.findall(pattern,text)
# So, our result is:
Out[20]:
['s Su', 'sht ', 'ssb.', 'ss i', 's su', 'sing', 'sht0']
In [21]:
#
# Example 5.

pattern='s.{1,5}' # Here {m,n} means m to n repetitions, so re tries
# to match between 1 and 5 characters (of any kind) after "s".

re.findall(pattern,text)
# So, our result is:
Out[21]:
['s Sura', 'sht ', 'ssb. a', 'ss is ', 'surajs', 'sht054']
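The matches above have different lengths because {1,5} takes as many characters as it can, up to five, while "." never crosses a newline. Here is a small sketch of that behaviour, using a made-up two-line string (not part of the original example text):

```python
import re

# Hypothetical two-line string, used only for this illustration.
sample = "sab\nsabcdefgh"

# Line 1: only "ab" follows "s" before the newline, so the match is "sab".
# Line 2: more than five characters follow "s", so exactly five are taken.
matches = re.findall('s.{1,5}', sample)
print(matches)  # ['sab', 'sabcde']
```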
In [22]:
#
# Example 6.

pattern='s.*' # Here * means zero or more repetitions and "." means any character except a newline.

re.findall(pattern,text)
# So, our result is:
Out[22]:
['s Suraj Singh Bisht ', 'ssb. and ', 'ss is surajsinghbisht054@gmail.com ']
In [23]:
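#
# Example 7.

pattern='s.+' # Here + means one or more repetitions. The result is the same as in Example 6
# because every "s" that starts a match here is followed by at least one character on the same line.

re.findall(pattern,text)
# So, our result is: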
Out[23]:
['s Suraj Singh Bisht ', 'ssb. and ', 'ss is surajsinghbisht054@gmail.com ']
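Examples 6 and 7 return the same list, so the difference between * and + is easy to miss. The sketch below uses a made-up string (an assumption for illustration only) in which an "s" sits at the very end of a line: 's.*' still matches that lone "s", while 's.+' skips it.

```python
import re

# Hypothetical string: the first line ends with "s", so nothing follows it
# on that line.
sample = "yes\nsee"

star_matches = re.findall('s.*', sample)  # zero or more characters after "s"
plus_matches = re.findall('s.+', sample)  # at least one character after "s"

print(star_matches)  # ['s', 'see']
print(plus_matches)  # ['see']
```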
In [25]:
#
# Example 8.

pattern='(s\S+)' # Here ( ) captures a group and \S means any non-whitespace character

re.findall(pattern,text)
# So, our result is:
Out[25]:
['sht', 'ssb.', 'ss', 'surajsinghbisht054@gmail.com']
In [28]:
# I think you now have the idea of how to use regular expressions.
# So, let's try some practical examples.
# Here, we will search for both "suraj" and "Suraj" with one pattern.
# Example 9.

pattern='((s|S)uraj)' # Here I am combining (a(bc)), capture sub-group, with (abc|def), which matches abc or def.

re.findall(pattern,text)
# So, our result is:
Out[28]:
[('Suraj', 'S'), ('suraj', 's')]
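findall returns tuples here because the pattern contains two capture groups. If only the whole word is wanted, two possible alternatives are a character class such as [sS] (from the reference list) or a non-capturing group (?:...), which is not in the reference list above and is shown here only as an aside. A minimal sketch, assuming a shortened copy of the example string:

```python
import re

# Shortened copy of the example string so this block runs on its own.
sample = "My Name Is Suraj Singh Bisht, email surajsinghbisht054@gmail.com"

# A character class matches either "s" or "S" without creating a group.
with_class = re.findall('[sS]uraj', sample)

# (?:...) groups the alternatives but does not capture them.
with_noncapturing = re.findall('(?:s|S)uraj', sample)

print(with_class)         # ['Suraj', 'suraj']
print(with_noncapturing)  # ['Suraj', 'suraj']
```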
In [30]:
# Let's try some other examples too.
# Here, we will try to find the email address in the example string.
#
# Example 10.

pattern="\S*@\S*" # Here \S means any non-whitespace character and * means zero or more repetitions.

re.findall(pattern,text)
# So, our result is:
Out[30]:
['surajsinghbisht054@gmail.com']
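Because * allows zero repetitions, the pattern \S*@\S* also accepts an "@" that stands alone. The sketch below uses a made-up sentence (an assumption for illustration) to compare it with \S+@\S+, which needs at least one character on each side. Neither pattern is a full email validator.

```python
import re

# Hypothetical sentence containing a lone "@" and one email address.
sample = "mail me @ home or at suraj@singh.com"

loose = re.findall(r'\S*@\S*', sample)   # zero or more characters around "@"
strict = re.findall(r'\S+@\S+', sample)  # one or more characters around "@"

print(loose)   # ['@', 'suraj@singh.com']
print(strict)  # ['suraj@singh.com']
```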
In [39]:
#
# Let's take a new example text
text="""
abcdefghijklmnopqrstuvwxyz ABCDEFGHIJKLMNOPQRSTUVWXYZ
mobile no.  100-1000-152-86
email       suraj@singh.com
blog        bitforestinfo.blogspot.com"""

# Example 11.
# In this example, we will try to search for the mobile no.

pattern = "\d\d\d-\d\d\d\d-\d\d\d-\d\d" # Here \d means any digit

re.findall(pattern,text)
# So, our result is:
Out[39]:
['100-1000-152-86']
In [34]:
# Let's simplify this pattern from
# "\d\d\d-\d\d\d\d-\d\d\d-\d\d" to a shorter form:
#
# \d\d\d-   = \d{3}-
# \d\d\d\d- = \d{4}-
# \d\d\d-   = \d{3}-
# \d\d      = \d{2}

# Example 12.
# In this example, we will search for the mobile no. again, but with a different pattern expression.

pattern = "\d{3}-\d{4}-\d{3}-\d{2}" # Here \d means any digit and {m} means m repetitions

re.findall(pattern,text)
# So, our result is:
Out[34]:
['100-1000-152-86']
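A side note on writing patterns that contain backslashes: prefixing the string with r (a raw string) keeps every backslash exactly as typed, and newer Python 3 versions warn about sequences such as "\d" in ordinary strings. The sketch below repeats the mobile no. search with a raw string, assuming a one-line copy of the example text.

```python
import re

# One-line copy of the relevant part of the example text.
sample = "mobile no.  100-1000-152-86"

# The r prefix tells Python not to interpret backslashes itself,
# so the regex engine receives \d unchanged.
pattern = r"\d{3}-\d{4}-\d{3}-\d{2}"

matches = re.findall(pattern, sample)
print(matches)  # ['100-1000-152-86']
```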
In [41]:
# Let's try to search for both the email and the phone no. together in one pattern.
# Example 13.

pattern="(\S*@\S*)|(\d{3}-\d{4}-\d{3}-\d{2})" # Here (\S*@\S*) is for the email, "|" means or,
# and (\d{3}-\d{4}-\d{3}-\d{2}) is for the phone no.
# With two groups, findall returns one tuple per match; the group that did not match is ''.

re.findall(pattern,text)
# So, our result is:
Out[41]:
[('', '100-1000-152-86'), ('suraj@singh.com', '')]
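The tuples with empty strings appear because findall reports every group for every match. If a flat list of the matched text is preferred, one option is re.finditer, which yields match objects whose group(0) is the whole match. A minimal sketch, assuming a shortened copy of the example text:

```python
import re

# Shortened copy of the example text so this block runs on its own.
sample = """
mobile no.  100-1000-152-86
email       suraj@singh.com
"""

pattern = r"(\S*@\S*)|(\d{3}-\d{4}-\d{3}-\d{2})"

# group(0) is the full match, regardless of which alternative matched.
flat = [m.group(0) for m in re.finditer(pattern, sample)]
print(flat)  # ['100-1000-152-86', 'suraj@singh.com']
```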
In [53]:
# Let's try to capture ABCDEFGHIJKLMNOPQRSTUVWXYZ and bitforestinfo.blogspot.com together.
# Example 14.

pattern="([a-z]*.blogspot.com)|([A-Z]+)" # Here ([a-z]*.blogspot.com) is for the blog address, "|" means or,
# and ([A-Z]+) is for the run of uppercase letters.
# Note that the unescaped "." matches any character, not only a period.

re.findall(pattern,text)
# So, our result is:
Out[53]:
[('', 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'), ('bitforestinfo.blogspot.com', '')]
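The blog address is found above, but the "." in the pattern is unescaped, so it stands for any character. The sketch below uses a made-up string (an assumption for illustration) to compare the unescaped form with the escaped form \. from the reference list.

```python
import re

# Hypothetical string: one real blog address and one look-alike
# that contains no periods at all.
sample = "bitforestinfo.blogspot.com fooXblogspotYcom"

loose = re.findall(r"[a-z]*.blogspot.com", sample)     # "." = any character
strict = re.findall(r"[a-z]*\.blogspot\.com", sample)  # "\." = a literal period

print(loose)   # ['bitforestinfo.blogspot.com', 'fooXblogspotYcom']
print(strict)  # ['bitforestinfo.blogspot.com']
```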
In [58]:
# Now, let's try something new.
# Example 15.
# Here, I am trying to find website addresses in the example string.

text="""
www.examplesite.com
ww.examplesite.org
www.examplesite.in
www.examplesite.it
www.examplesite.cu
www.examplesite.st
"""

pattern = "w{2,3}.[a-z]*.{2,5}" # 2 or 3 "w", any character, lowercase letters, then 2 to 5 of any character

re.findall(pattern,text)
# So, our result is:
Out[58]:
['www.examplesite.com',
 'ww.examplesite.org',
 'www.examplesite.in',
 'www.examplesite.it',
 'www.examplesite.cu',
 'www.examplesite.st']
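The ^...$ entry from the reference list has not been used so far. As a sketch, a stricter variant of this pattern could escape the periods and anchor each match to a whole line. The re.MULTILINE flag makes ^ and $ work per line, and the third line of the sample below is a made-up look-alike added only for this illustration.

```python
import re

# Two addresses from the example plus one hypothetical look-alike line.
sample = """
www.examplesite.com
ww.examplesite.org
wwwXexamplesiteYcom
"""

# The original pattern: every "." matches any character.
loose = re.findall(r"w{2,3}.[a-z]*.{2,5}", sample)

# Escaped periods, letters only in the ending, anchored to a full line.
strict = re.findall(r"^w{2,3}\.[a-z]+\.[a-z]{2,5}$", sample, re.MULTILINE)

print(loose)   # all three lines match
print(strict)  # ['www.examplesite.com', 'ww.examplesite.org']
```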
In [ ]:
# I think this is enough for today.


In our next tutorial, we will discuss some simple and some
advanced regular expression techniques.

For the third part, click here.

Thanks for reading!

For any suggestions or help,
please comment below.
By S.S.B
Email: surajsinghbisht054@gmail.com
