How To Use Regular Expression With Python - Part 3 - python regex tutorial

Hello friends,

This is the third part of my tutorial on how to use regular expressions with Python.

For the second part, click here.

In this tutorial, I will show you how to use regular expressions for advanced text processing.

If you are new, please read the posts of this series in order, because regular expressions are easier to understand step by step.

Here, I am using Python 2.7 on Ubuntu.

part3 slides
In [107]:
#
# -- Useful Reference Syntax --------------
#
# Some special characters are:
#
# abc  Letters
# 123    Digits
# \d      Any Digit
# \D      Any Non-digit character
# .       Any Character
# \.      Period
# [abc]   Only a, b, or c
# [^abc]  Not a, b, nor c
# [a-z]   Characters a to z
# [0-9]   Numbers 0 to 9
# \w      Any Alphanumeric character
# \W      Any Non-alphanumeric character
# {m}     m Repetitions
# {m,n}   m to n Repetitions
# *       Zero or more repetitions
# +       One or more repetitions
# ?       Optional character
# \s      Any Whitespace
# \S      Any Non-whitespace character
# ^...$     Starts and ends
# (...)     Capture Group
# (a(bc)) Capture Sub-group
# (.*)    Capture all
# (abc|def)   Matches abc or def

import re   # Python Module For Regular Expression

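# The backslash right after the opening quotes keeps the string from starting
# with a newline, so match() can find 'suraj' at position 0.
example_string="""\
suraj singh bisht
SURAJ SINGH BISHT
surajsinghbisht054@gmail.com
bitforestinfo.blogspot.com
yashwantsinghbisht054@gmail.com
0124-100-125-2563
124-586-9875
This is an example text
"""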
# Meta Characters

# \ Backslash
# ^ Caret
# $ Dollar Sign
# . Dot
# ? Question Mark
# * Asterisk
# | Pipe
# + Plus Sign
# ( Opening Parenthesis
# ) Closing Parenthesis
# [ Opening Square Bracket
# { The Opening Curly Brace

# You may be wondering what these metacharacters are used for.
# If you think back, you will find that we have already used
# many characters from this list in previous examples.
# For more information, see my previous tutorials about regular expressions.
#
# So, let's start this tutorial.
#
In [108]:
#
# In this example we will use some more useful functions.
# So, keep reading.
#
# With this technique, we first compile the pattern. re.compile() turns the pattern string
# into a reusable compiled pattern object, as shown in the example below.

pattern = re.compile('suraj') # Compile the pattern into a pattern object

print pattern.match(example_string) # match() tries to match at the start of the string
<_sre.SRE_Match object at 0x7f583c51b098>
In [17]:
# Here the output is an _sre.SRE_Match object (the memory address changes from run to run).
# It means we found a match. If the output is None, no match was found.
#
# Example 1.
#
pattern = re.compile('suraj|SURAJ') # Compiling

k=pattern.match(example_string) 

# k.start() returns the starting index of the matched text and
# k.end() returns the ending index (one past the last matched character)
print k.start(), k.end() 

# k.span() returns both the starting and ending index together as a tuple
print k.span()
0 5
(0, 5)
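To make the None case concrete, here is a minimal sketch of how match() and search() differ and how to check the result before using it. The sample text and the variable names are my own assumptions for illustration, not part of the examples above. I wrote print in function-call form so the snippet should run on both Python 2.7 and Python 3.

```python
import re

text = "hello suraj, welcome"   # hypothetical sample text
pattern = re.compile(r'suraj')

# match() only succeeds if the pattern matches at the very start of the string
at_start = pattern.match(text)

# search() scans the whole string and returns the first match anywhere
anywhere = pattern.search(text)

print(at_start)  # None, because the text starts with "hello"

if anywhere is not None:
    # span() returns the same two numbers as (start(), end())
    print(anywhere.span())
    print(text[anywhere.start():anywhere.end()])
```

Checking for None first avoids an AttributeError when nothing matched.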
In [150]:
#
# Example 2.
# 
example_string="""surajsinghbisht054@gmail.com bitforestinfo.blogspot.com yashwantsinghbisht054@gmail.com 0124-100-125-2563 124-586-9875"""

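pattern = re.compile(r'(\w+).(\d\d\d\d-\d\d\d-\d\d\d)') # the unescaped . matches any single character (here, the space)

result=pattern.search(example_string)

print result.group(0) # The whole match
print result.group(1) # First capture group: (\w+)
print result.group(2) # Second capture group: (\d\d\d\d-\d\d\d-\d\d\d)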
com 0124-100-125
com
0124-100-125
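The group numbers can also be read all at once. The sketch below is only an illustration on a shortened sample string that I made up; it uses \d{4} and \d{3} from the reference table instead of repeating \d, and shows groups() next to group().

```python
import re

text = "gmail.com 0124-100-125-2563"   # shortened, hypothetical sample
pattern = re.compile(r'(\w+).(\d{4}-\d{3}-\d{3})')

result = pattern.search(text)

whole = result.group(0)            # the whole match
parts = result.groups()            # all capture groups as a tuple
word, number = result.group(1, 2)  # several groups in one call

print(whole)
print(parts)
```

Here groups() is handy when you want every captured piece without calling group() once per number.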
In [22]:
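#
# Example 3.
#
# Note: this cell was run with the pattern ('suraj|SURAJ') and the example_string
# from Example 1, before Example 2 redefined both.
pattern.findall(example_string) # findall() was already explained in the previous tutorial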
Out[22]:
['suraj', 'SURAJ', 'suraj']
In [151]:
# let's Try some other examples
#
# Example 4.
# Here, I am Searching for Email addresses

example_string="""
suraj singh bisht
SURAJ SINGH BISHT
surajsinghbisht054@gmail.com
bitforestinfo.blogspot.com
yashwantsinghbisht054@gmail.com
0124-100-125-2563
124-586-9875
This is an example text
"""

pattern = re.compile(r'([a-zA-Z0-9]+@[a-z]+\.[a-z]+)') # Compiling; \. matches a literal dot

pattern.findall(example_string)
Out[151]:
['surajsinghbisht054@gmail.com', 'yashwantsinghbisht054@gmail.com']
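In a regular expression an unescaped . matches any character, so an email pattern usually wants \. for the literal dot. The following sketch compares the two forms on a made-up string (the addresses and the names loose and strict are assumptions for illustration only).

```python
import re

# Hypothetical text: the second "address" has an underscore where the dot should be
text = "user@example.com user@example_com"

loose = re.compile(r'[a-zA-Z0-9]+@[a-z]+.[a-z]+')    # . matches any character
strict = re.compile(r'[a-zA-Z0-9]+@[a-z]+\.[a-z]+')  # \. matches only a real dot

loose_result = loose.findall(text)
strict_result = strict.findall(text)

print(loose_result)   # both strings are accepted
print(strict_result)  # only the one with a real dot is accepted
```

Both patterns are still very simple; real email addresses allow more characters than these patterns cover.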
In [41]:
# We got our result.
# Now, let's try to get the result by name.
# This means we will give a name to the pattern group.
#
# Example 5.
#
# (?P<group_name>pattern)
pattern = re.compile(r'(?P<email>[a-zA-Z0-9]+@[a-z]+\.[a-z]+)') # Compiling

result=pattern.search(example_string)

result.groupdict()
Out[41]:
{'email': 'surajsinghbisht054@gmail.com'}
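search() stops at the first match, so groupdict() above shows only one address. As a rough sketch, finditer() can be combined with named groups to collect every match. The group names user and domain and the shortened sample text are my own choices for this illustration.

```python
import re

text = """
surajsinghbisht054@gmail.com
bitforestinfo.blogspot.com
yashwantsinghbisht054@gmail.com
"""

# Two named groups: one for the part before @ and one for the part after it
pattern = re.compile(r'(?P<user>[a-zA-Z0-9]+)@(?P<domain>[a-z]+\.[a-z]+)')

# finditer() yields one match object per match, in order of appearance
found = [m.groupdict() for m in pattern.finditer(text)]

for m in pattern.finditer(text):
    print("%s at %s" % (m.group('user'), m.group('domain')))
```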
In [44]:
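# Here, I am trying to search for a phone number.
# Example 6.
#
# Note: search() returns the first place where the pattern fits, which here is
# inside the longer number 0124-100-125-2563.
pattern = re.compile(r'(?P<phone>\d{3}-\d{3}-\d{4})') # Compiling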

result=pattern.search(example_string)

result.groupdict()
Out[44]:
{'phone': '100-125-2563'}
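The result above comes from the middle of the longer number 0124-100-125-2563. If you only want numbers that fill a whole line, one possible approach is to anchor the pattern with ^ and $ and add the re.MULTILINE flag. This is just a sketch under that assumption, using a two-line sample and variable names I picked for illustration.

```python
import re

text = "0124-100-125-2563\n124-586-9875"   # two sample lines

plain = re.compile(r'\d{3}-\d{3}-\d{4}')
# With re.MULTILINE, ^ and $ match at the start and end of every line
whole_line = re.compile(r'^\d{3}-\d{3}-\d{4}$', re.MULTILINE)

plain_result = plain.findall(text)
whole_line_result = whole_line.findall(text)

print(plain_result)       # also finds a match inside the longer number
print(whole_line_result)  # only the line that is exactly ddd-ddd-dddd
```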
In [91]:
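# Now, we will try to search for an email and a phone number together.
# Example 7.
#
# The middle group (\W.*\W.*\W) skips over the text between the email and the phone number.
pattern = re.compile(r'(?P<email>([a-zA-Z0-9]+@[a-z]+\.[a-z]+))(\W.*\W.*\W)(?P<phone>\d{3}-\d{3}-\d{4})') # Compiling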

result=pattern.search(example_string)

if result:
    print result.groupdict()
{'phone': '124-586-9875', 'email': 'yashwantsinghbisht054@gmail.com'}
In [123]:
# Let's Try Some Simple Examples 
#
# Example 8.
#
patterns=[r'[a-zA-Z0-9]+@[a-z]+\.[a-z]+',    # For Email
         r'\d{3}-\d{3}-\d{4}',              # For Phone
         ]

for pattern in patterns:
    #print pattern,example_string
    result=re.search(pattern, example_string)
    if result:
        #print result.start(),result.end()   # uncomment to also print the match offsets
        print example_string[result.start():result.end()]
surajsinghbisht054@gmail.com
100-125-2563
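When there are several patterns, giving each one a label makes the results easier to read. Here is a small sketch that keeps the patterns in a dictionary and stores all matches per label. The dictionary keys, the variable names, and the short sample text are assumptions for this illustration.

```python
import re

text = "surajsinghbisht054@gmail.com\n124-586-9875\n"   # short sample text

# Hypothetical labels mapped to the patterns they search for
patterns = {
    'email': r'[a-zA-Z0-9]+@[a-z]+\.[a-z]+',
    'phone': r'\d{3}-\d{3}-\d{4}',
}

results = {}
for name in sorted(patterns):          # sorted() keeps the print order stable
    results[name] = re.findall(patterns[name], text)
    print("%s: %s" % (name, results[name]))
```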
In [152]:
# In this example,
# we split a string into smaller parts.
# The given pattern marks the points where the string is broken.
# Let's try this too.
# Example 9.
#
re.split('sur.{2}', example_string) # Split at every 'sur' followed by any two characters (here, 'suraj'); uppercase SURAJ is not matched
Out[152]:
['\n',
 ' singh bisht\nSURAJ SINGH BISHT\n',
 'singhbisht054@gmail.com\nbitforestinfo.blogspot.com\nyashwantsinghbisht054@gmail.com\n0124-100-125-2563\n124-586-9875\nThis is an example text\n']
In [154]:
# one more example
#
# Example 10.
#
example_string ="The is My Cat and This is my dog but this is my horse"
#
# Let's split this line at the words "and" and "but"
re.split('and|but', example_string)
Out[154]:
['The is My Cat ', ' This is my dog ', ' this is my horse']
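re.split() has two more options that are useful here: a capture group in the pattern keeps the separators in the result, and maxsplit limits the number of splits. The sketch below shows both on a sentence similar to the one above. The sentence and the variable names are my own, and I put spaces around the words in the pattern so the pieces come back without extra spaces.

```python
import re

text = "This is my cat and this is my dog but this is my horse"

# Plain split: the separators are dropped
pieces = re.split(r' and | but ', text)

# With a capture group, the matched separators are kept in the list
with_separators = re.split(r' (and|but) ', text)

# maxsplit=1 stops after the first split
first_only = re.split(r' and | but ', text, maxsplit=1)

print(pieces)
print(with_separators)
print(first_only)
```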
In [185]:
# Now, let's move to the next level and discuss some special cases.
#
# Python provides some flags that modify how a pattern is matched.
#
# Flags
# re.S, re.DOTALL 
# re.I, re.IGNORECASE
# re.L, re.LOCALE
# re.M, re.MULTILINE
# re.U, re.UNICODE
# re.X, re.VERBOSE

# For More Info Visit Here: https://docs.python.org/2/library/re.html#module-contents
#
# Example 11.
#
example_string="""
suraj singh bisht
SURAJ SINGH BISHT
surajsinghbisht054@gmail.com
bitforestinfo.blogspot.com
yashwantsinghbisht054@gmail.com
0124-100-125-2563
124-586-9875
This is an example text
"""

print re.findall('SURAJ', example_string, re.IGNORECASE)
# As you can see in the output, my input pattern is in uppercase, but the output contains both lowercase and uppercase matches.
['suraj', 'SURAJ', 'suraj']
In [195]:
#
# Example 12.
#
print re.findall('BITF(.*)com', example_string, re.IGNORECASE| re.DOTALL)
#
# Normally the . character matches any character except a newline, but because of the re.DOTALL flag
# the . in this example also matches newlines. Because .* is greedy, it runs up to the last "com".
['orestinfo.blogspot.com\nyashwantsinghbisht054@gmail.']
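Two other flags from the list, re.MULTILINE and re.VERBOSE, can be sketched in the same way. This is only an illustration on a short sample string of my own. re.MULTILINE makes ^ match at the start of every line, and re.VERBOSE lets you spread a pattern over several lines with comments.

```python
import re

text = "suraj singh\nSURAJ SINGH\nsurajsinghbisht054@gmail.com\n"   # short sample text

# Without re.MULTILINE, ^ matches only at the start of the whole string
start_only = re.findall(r'^suraj', text, re.IGNORECASE)

# With re.MULTILINE, ^ matches at the start of each line
every_line = re.findall(r'^suraj', text, re.IGNORECASE | re.MULTILINE)

# With re.VERBOSE, whitespace and # comments inside the pattern are ignored
email = re.compile(r"""
    [a-zA-Z0-9]+   # user name
    @              # the @ sign
    [a-z]+         # domain name
    \.             # a literal dot
    [a-z]+         # top-level domain
""", re.VERBOSE)
emails = email.findall(text)

print(start_only)
print(every_line)
print(emails)
```

Flags are combined with the | operator, the same way as in Example 12.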

In our next tutorial, we will discuss some other regular expression techniques.

For the fourth part, click here.
Thanks for reading!

For any suggestions or help, please comment below.
By S.S.B
Email : surajsinghbisht054@gmail.com
