How To Use Regular Expressions With Python - Part 4 - Python Regex Tutorial

Hello friends,




This is the fourth part of our tutorial on how to use regular expressions with Python.

For the third part, click here.

In this tutorial, I will show you how to use regular expressions for advanced text processing.

If you are new, please read this series in order, because regular expressions are easier to understand when the parts are read in sequence.

I am using Python 2.7 on Ubuntu.


part4 slides
In [22]:
#
# -- Useful Reference Syntax --------------
#
# Some special characters are:
#
# abc         Letters
# 123         Digits
# \d          Any digit
# \D          Any non-digit character
# .           Any character (except a newline, unless re.DOTALL is set)
# \.          A literal period
# [abc]       Only a, b, or c
# [^abc]      Not a, b, nor c
# [a-z]       Characters a to z
# [0-9]       Digits 0 to 9
# \w          Any alphanumeric character or underscore
# \W          Any character that is not alphanumeric or underscore
# {m}         m repetitions
# {m,n}       m to n repetitions
# *           Zero or more repetitions
# +           One or more repetitions
# ?           Optional character
# \s          Any whitespace
# \S          Any non-whitespace character
# ^...$       Starts and ends
# (...)       Capture group
# (a(bc))     Capture sub-group
# (.*)        Capture all
# (abc|def)   Matches abc or def

import re   # Python Module For Regular Expression

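# Each entry sits on its own line. A trailing backslash at the end of every
# line would join all of them into one long string.
example_string="""\
suraj singh bisht
SURAJ SINGH BISHT
surajsinghbisht054@gmail.com
bitforestinfo.blogspot.com
yashwantsinghbisht054@gmail.com
0124-100-125-2563
124-586-9875
This is an example text
"""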
# Meta Characters

# \ Backslash
# ^ Caret
# $ Dollar Sign
# . Dot
# ? Question Mark
# * Asterisk
# | Pipe
# + Plus Sign
# ( Opening Parenthesis
# ) Closing Parenthesis
# [ Opening Square Bracket
# { Opening Curly Brace

# You may be wondering what these meta characters are used for.
# If you think back, you will notice that we have already used
# many characters from this list in the previous examples.
# For more information, check my previous tutorials about regular expressions.
#
# So, let's start this tutorial.
#
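Because every character in the meta character list above has a special meaning, it has to be escaped when you want to match it literally. Here is a minimal sketch of that idea using `re.escape` from the standard library. The variable names and the look-alike string are made up for illustration, and the code is written to run on both Python 2.7 and Python 3.

```python
import re

# Hypothetical inputs, chosen only for illustration
literal = 'bitforestinfo.blogspot.com'
lookalike = 'bitforestinfoXblogspotYcom'

# Unescaped: each dot is a meta character and matches any character
unescaped = re.compile(literal)

# Escaped: re.escape() puts a backslash before the dots,
# so they only match a real period
escaped = re.compile(re.escape(literal))

unescaped_matches_lookalike = unescaped.search(lookalike) is not None
escaped_matches_lookalike = escaped.search(lookalike) is not None
escaped_matches_literal = escaped.search(literal) is not None

print(unescaped_matches_lookalike)  # True
print(escaped_matches_lookalike)    # False
print(escaped_matches_literal)      # True
```

Escaping by hand with a backslash (for example `\.`) gives the same result; `re.escape` is just convenient when the literal text comes from a variable.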
In [18]:
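# Unicode Flag Example
#
# Example 1.
# Note: re.UNICODE changes what \w, \W, \b, \B, \d, \D, \s and \S match.
# The dot used in this pattern matches any character except a newline
# with or without the flag.
#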
string=u'सुरज सिनगह बिसहत'

pattern_unicode = ur'.+'

pattern = re.compile(pattern_unicode, re.UNICODE) # Using re.UNICODE Flag

result=pattern.search(string)
print result.group()
सुरज सिनगह बिसहत
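The effect of `re.UNICODE` is easier to see with `\w` than with a dot. The following is a small sketch under my own assumptions: the sample text is made up, and it is written with `\u` escapes so the source stays plain ASCII and runs on both Python 2.7 and Python 3.

```python
import re

# Hypothetical sample text containing two accented letters
text = u'caf\u00e9 na\u00efve'

# With re.UNICODE, \w follows the Unicode character database,
# so the accented letters count as word characters
words = re.findall(r'\w+', text, re.UNICODE)

word_count = len(words)  # 2: each word is matched in one piece
```

On Python 2.7, dropping the flag makes `\w` ASCII-only, so each word would be cut at the accented letter. On Python 3, string patterns are Unicode-aware by default, so the flag is redundant there.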
In [42]:
# re.VERBOSE Flag Example
# Example 2.
#
pattern_string=r"""\d+  # One or more digits
\W+           # One or more non-word characters (here, the hyphen)
\d+           # One or more digits
                   """
pattern=re.compile(pattern_string,re.VERBOSE)

result=pattern.search(example_string)


if result:
    print result.group(0)
0124-100
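To make it clear what `re.VERBOSE` changes, here is a short sketch that compares a compact pattern with its verbose form. The names `compact`, `verbose` and `spaced` are mine, and the code runs on both Python 2.7 and Python 3.

```python
import re

# The same pattern written twice: once compact, once spread out with comments
compact = re.compile(r'\d+\W+\d+')
verbose = re.compile(r"""
    \d+      # one or more digits
    \W+      # one or more non-word characters
    \d+      # one or more digits
""", re.VERBOSE)

sample = '0124-100-125-2563'  # one line taken from example_string
compact_result = compact.search(sample).group()
verbose_result = verbose.search(sample).group()
print(compact_result == verbose_result)  # True

# In verbose mode plain spaces are ignored, so a literal space
# has to be written inside a character class (or escaped)
spaced = re.compile(r'\d+ [ ] \d+', re.VERBOSE)
spaced_result = spaced.search('10 20').group()
print(spaced_result)  # 10 20
```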
In [67]:
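# Looking Behind and Ahead
# (?=pattern)   Look ahead: matches only if pattern follows the current position
# (?<=pattern)  Look behind: matches only if pattern precedes the current position
# Example 3.
# Look behind: match the 4 characters that come right after 'abc'

pattern=re.compile(r'(?<=abc).{4}')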

pattern.findall('abc-efg-hij-klm')
Out[67]:
['-efg']
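A more practical use of look behind is picking out the text that follows a marker without including the marker itself. This is only a sketch with names I made up (`emails`, `after_at`), using the two addresses from the example text. It runs on both Python 2.7 and Python 3.

```python
import re

# The two email addresses from example_string, joined by a space
emails = 'surajsinghbisht054@gmail.com yashwantsinghbisht054@gmail.com'

# Match word characters only where the previous character is '@';
# the '@' itself is checked but not included in the result
after_at = re.findall(r'(?<=@)\w+', emails)
print(after_at)  # ['gmail', 'gmail']
```

Keep in mind that Python's `re` module only accepts a look-behind pattern of fixed width, so something like `(?<=a+)` is rejected.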
In [75]:
# Look ahead
# (?=pattern)
# Example 4.
# Match the 10 characters that come right before 'klm'
pattern=re.compile(r'.{10}(?=klm)')

pattern.findall('abc-efg-hij-klm')
Out[75]:
['c-efg-hij-']
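Look ahead works the same way in the other direction: the text after the match is checked but not consumed. Here is a small sketch with hypothetical names (`emails`, `before_at`, `masked`) that runs on both Python 2.7 and Python 3.

```python
import re

# The two email addresses from example_string, joined by a space
emails = 'surajsinghbisht054@gmail.com yashwantsinghbisht054@gmail.com'

# Match word characters only where the next character is '@'
before_at = re.findall(r'\w+(?=@)', emails)
print(before_at)  # ['surajsinghbisht054', 'yashwantsinghbisht054']

# Because the '@' is not part of the match, it survives a replacement
masked = re.sub(r'\w+(?=@)', 'user', emails)
print(masked)  # user@gmail.com user@gmail.com
```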
In [77]:
# re.sub works like the string replace method, but
# with the power of regular expressions
# Example 5.
#
# re.sub(pattern, repl, string, count=0, flags=0)
#
pattern=re.compile('abc')

pattern.sub('xyz','abc-efg-hij-klm')
Out[77]:
'xyz-efg-hij-klm'
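The `count` argument shown in the signature above limits how many matches are replaced. A minimal sketch, with variable names of my own choosing, that runs on both Python 2.7 and Python 3:

```python
import re

pattern = re.compile('-')

# Default count=0 replaces every match
all_replaced = pattern.sub(' ', 'abc-efg-hij-klm')
print(all_replaced)  # abc efg hij klm

# count=2 replaces only the first two matches, left to right
first_two = pattern.sub(' ', 'abc-efg-hij-klm', count=2)
print(first_two)  # abc efg hij-klm
```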
In [79]:
#
# Example 6.
# Replace any two word characters followed by 'c'
pattern=re.compile(r'(\w\wc)')

pattern.sub('xyz','abc-efg-hij-klm')
Out[79]:
'xyz-efg-hij-klm'
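The capture group in Example 6 is not actually used by the replacement. As a sketch of what groups make possible, the replacement string can refer back to them with `\1`, `\2` and so on. The phone pattern below is borrowed from the example text, while the output format is just my own choice. It runs on both Python 2.7 and Python 3.

```python
import re

# Three capture groups, one for each block of digits
pattern = re.compile(r'(\d{3})-(\d{3})-(\d{4})')

# \1, \2 and \3 in the replacement insert the text captured by each group
reformatted = pattern.sub(r'(\1) \2-\3', '124-586-9875')
print(reformatted)  # (124) 586-9875
```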
In [88]:
#
# Example 7.
#
string=u'सुरज 1 सिनगह 2 बिसहत'

pattern_unicode = ur'\d'

pattern = re.compile(pattern_unicode, re.UNICODE) # Using re.UNICODE Flag

print "Splitting : ", pattern.split(string)
print "Replacing : ", pattern.sub('detected',string)
Splitting :  [u'\u0938\u0941\u0930\u091c ', u' \u0938\u093f\u0928\u0917\u0939 ', u' \u092c\u093f\u0938\u0939\u0924']
Replacing :  सुरज detected सिनगह detected बिसहत
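Two more details of `split` are worth knowing: a capture group keeps the separators in the result, and a second argument limits the number of splits. This sketch uses an ASCII version of the sentence above so the output is easy to read. The variable names are mine, and it runs on both Python 2.7 and Python 3.

```python
import re

text = 'suraj 1 singh 2 bisht'
pattern = re.compile(r'\d')

# Plain split: the digits are dropped
parts = pattern.split(text)
print(parts)  # ['suraj ', ' singh ', ' bisht']

# With a capture group, the matched digits are kept in the list
parts_with_digits = re.split(r'(\d)', text)
print(parts_with_digits)  # ['suraj ', '1', ' singh ', '2', ' bisht']

# The second argument (maxsplit) stops after one split
limited = pattern.split(text, 1)
print(limited)  # ['suraj ', ' singh 2 bisht']
```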
In [105]:
#
# Example 8.
#
example_string="""
suraj singh bisht
SURAJ SINGH BISHT
surajsinghbisht054@gmail.com
bitforestinfo.blogspot.com
yashwantsinghbisht054@gmail.com
0124-100-125-2563
124-586-9875
This is an example text
"""
patterns=r"""
(?P<email>([a-zA-Z0-9]+@[a-z]+\.[a-z]+)) # Email address (the dot is escaped to match a literal period)
(\W.*\W.*\W)                             # Non-word separators and any text between them
(?P<phone>\d{3}-\d{3}-\d{4})             # Phone number
"""

print re.search(patterns, example_string, re.VERBOSE|re.MULTILINE).group()
yashwantsinghbisht054@gmail.com
0124-100-125-2563
124-586-9875
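Example 8 defines the named groups `email` and `phone` but only prints the whole match. Here is a simplified sketch of how the named groups can be read back. The shortened text and the `\s+` separator are my own assumptions, not the pattern used above, and the code runs on both Python 2.7 and Python 3.

```python
import re

# Hypothetical two-line input, shortened from example_string
contact_text = """
yashwantsinghbisht054@gmail.com
124-586-9875
"""

pattern = re.compile(r"""
    (?P<email>[a-zA-Z0-9]+@[a-z]+\.[a-z]+)   # email address
    \s+                                      # whitespace between the two lines
    (?P<phone>\d{3}-\d{3}-\d{4})             # phone number
""", re.VERBOSE)

match = pattern.search(contact_text)

# Named groups can be fetched one by one or all at once as a dict
email = match.group('email')
phone = match.group('phone')
fields = match.groupdict()

print(email)  # yashwantsinghbisht054@gmail.com
print(phone)  # 124-586-9875
```

Named groups make the code that reads the match easier to follow than numbered groups, because the numbers shift whenever a group is added to the pattern.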
In [107]:
#
# Done
#
# So, this tutorial series is now complete.
# 



For the first part, click here.

Thanks for reading!

For any suggestions or help,

please comment below.
by S.S.B 
Email : surajsinghbisht054@gmail.com
