In this tutorial, I will show you how to use regular expressions for advanced text processing.
If you are new to regular expressions, please read this tutorial series in order, because each part builds on the previous one and the topic is easier to understand that way.
The examples here use Python 2.7 on Ubuntu.
part4 slides
In [22]:
#
# -- Useful Reference Syntax --------------
#
# Some special characters are:
#
# abc        Letters
# 123        Digits
# \d         Any Digit
# \D         Any Non-digit character
# .          Any Character
# \.         Period
# [abc]      Only a, b, or c
# [^abc]     Not a, b, nor c
# [a-z]      Characters a to z
# [0-9]      Numbers 0 to 9
# \w         Any Alphanumeric character
# \W         Any Non-alphanumeric character
# {m}        m Repetitions
# {m,n}      m to n Repetitions
# *          Zero or more repetitions
# +          One or more repetitions
# ?          Optional character
# \s         Any Whitespace
# \S         Any Non-whitespace character
# ^...$      Starts and ends
# (...)      Capture Group
# (a(bc))    Capture Sub-group
# (.*)       Capture all
# (abc|def)  Matches abc or def
import re  # Python module for regular expressions
example_string = """\
suraj singh bisht\
SURAJ SINGH BISHT\
surajsinghbisht054@gmail.com\
www.bitforestinfo.com\
yashwantsinghbisht054@gmail.com\
0124-100-125-2563\
124-586-9875\
This is an example text\
"""
# Meta Characters
# You may be wondering what meta characters are used for.
# If you think back, you will find that we have already used many of the
# characters from this list in the previous examples.
# For more information, see my previous tutorials about regular expressions.
#
# So, let's start this tutorial.
#
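Before moving on, here is a small sketch that tries a few entries from the reference list on a short sample. The sample string and the variable names are my own assumptions for illustration, and the code is written so that it should run on both Python 2.7 and Python 3.

```python
import re

# Hypothetical sample text, shortened from the tutorial's example string.
sample = "suraj singh bisht\n124-586-9875\nwww.bitforestinfo.com"

# {m}: exact repetition counts for \d, joined by literal hyphens.
phones = re.findall(r"\d{3}-\d{3}-\d{4}", sample)

# \. matches a literal period; [a-z]+ matches one or more lowercase letters.
domains = re.findall(r"[a-z]+\.[a-z]+\.[a-z]+", sample)

# (abc|def) style alternation: match either word.
names = re.findall(r"(singh|bisht)", sample)

print(phones)   # ['124-586-9875']
print(domains)  # ['www.bitforestinfo.com']
print(names)    # ['singh', 'bisht']
```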
In [18]:
# Unicode Flag Example
#
# Example 1.
#
string = u'सुरज सिनगह बिसहत'
pattern_unicode = ur'.+'
pattern = re.compile(pattern_unicode, re.UNICODE)  # Using re.UNICODE flag
result = pattern.search(string)
print result.group()
सुरज सिनगह बिसहत
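One detail worth checking here: as far as I know, re.UNICODE changes the meaning of \w, \W, \b, \B, \d, \D, \s and \S, but not of the dot, and on Python 3 it is already the default for text patterns. The sketch below compares both cases. The sample string is an assumption (the first word of Example 1 plus two Devanagari digits), written with escape codes so the code should run on Python 2.7 and Python 3 without an encoding declaration.

```python
import re

# Assumed sample: the first word of Example 1 followed by the
# Devanagari digits for 1 and 2, written with unicode escape codes.
text = u"\u0938\u0941\u0930\u091c \u0967\u0968"

# "." matches any character except a newline, with or without the flag.
dot_with_flag = re.compile(u".+", re.UNICODE).search(text).group()
dot_without_flag = re.compile(u".+").search(text).group()

# With re.UNICODE, \d also matches digits outside the ASCII range 0-9.
unicode_digits = re.compile(u"\\d", re.UNICODE).findall(text)
# An explicit [0-9] class only ever matches ASCII digits.
ascii_digits = re.compile(u"[0-9]").findall(text)

print(dot_with_flag == text)      # True
print(dot_without_flag == text)   # True
print(len(unicode_digits))        # 2
print(len(ascii_digits))          # 0
```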
In [42]:
# re.VERBOSE Flag Example
# Example 2.
#
pattern_string = r"""\d+   # leading digits
\W+                        # one or more non-word characters (here, the hyphen)
\d+                        # the digits that follow
"""
pattern = re.compile(pattern_string, re.VERBOSE)
result = pattern.search(example_string)
if result:
    print result.group(0)
0124-100
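To see what re.VERBOSE actually changes, the following sketch writes the same pattern twice, once compact and once verbose, and checks that both find the same text. The sample sentence and the variable names are assumptions for illustration. It also shows one thing to watch for: in verbose mode plain spaces in the pattern are ignored, so a literal space has to be written explicitly, for example as [ ].

```python
import re

# The same pattern written twice: once compact, once with re.VERBOSE.
compact = re.compile(r"\d+\W+\d+")
verbose = re.compile(r"""
    \d+    # one or more digits
    \W+    # one or more non-word characters
    \d+    # one or more digits
""", re.VERBOSE)

sample = "call 0124-100-125-2563 today"  # hypothetical sample text
compact_match = compact.search(sample).group(0)
verbose_match = verbose.search(sample).group(0)

# Whitespace is ignored in verbose mode, except inside a character class.
spaced = re.compile(r"\d+ [ ] \d+", re.VERBOSE)
spaced_match = spaced.search("row 12 34").group(0)

print(compact_match)  # 0124-100
print(verbose_match)  # 0124-100
print(spaced_match)   # 12 34
```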
In [67]:
# Looking Behind and Ahead
# (?=pattern)   match ahead
# (?<=abc)      match behind
# Example 3.
#
# match behind
pattern=re.compile('(?<=abc).{4}')
pattern.findall('abc-efg-hij-klm')
Out[67]:
['-efg']
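A slightly more practical use of match-behind is pulling out numbers that follow a marker without including the marker itself. The sketch below assumes a made-up price string. It also checks a limitation of Python's re module that I am fairly sure about: the pattern inside (?<=...) must have a fixed width.

```python
import re

text = "price: $42, tax: $7, qty: 3"  # hypothetical sample text

# (?<=\$) asserts that a dollar sign sits just before the digits,
# but the dollar sign itself is not part of the match.
amounts = re.findall(r"(?<=\$)\d+", text)
print(amounts)  # ['42', '7']

# A variable-length lookbehind such as (?<=a+) is rejected at compile time.
try:
    re.compile(r"(?<=a+)b")
    fixed_width_only = False
except re.error:
    fixed_width_only = True
print(fixed_width_only)  # True
```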
In [75]:
# match ahead
# (?=pattern)
# Example 4.
#
pattern = re.compile('.{10}(?=klm)')
pattern.findall('abc-efg-hij-klm')
Out[75]:
['c-efg-hij-']
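Match-ahead works the same way in the other direction: the text inside (?=...) must follow the match, but it is not consumed. A minimal sketch with an assumed sample string, where only the numbers followed by a percent sign are picked up and the percent sign survives a substitution:

```python
import re

text = "discount: 15%, stock: 20, growth: 8%"  # hypothetical sample text

# \d+(?=%) matches digits only when a percent sign comes right after them.
percents = re.findall(r"\d+(?=%)", text)
print(percents)  # ['15', '8']

# Because the lookahead is not consumed, the % stays in place after sub().
masked = re.sub(r"\d+(?=%)", "N", text)
print(masked)  # discount: N%, stock: 20, growth: N%
```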
In [77]:
# re.sub works like a replace function, but
# with the power of regular expressions
# Example 5.
#
# re.sub(pattern, repl, string, count=0, flags=0)
#
pattern = re.compile('abc')
pattern.sub('xyz','abc-efg-hij-klm')
Out[77]:
'xyz-efg-hij-klm'
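The count argument from the signature above limits how many matches are replaced, and subn() additionally reports how many replacements were made. A short sketch on the same string, with variable names chosen only for illustration:

```python
import re

text = "abc-efg-hij-klm"
hyphen = re.compile("-")

# count=0 (the default) replaces every match.
all_replaced = hyphen.sub("+", text)
# count=2 stops after the first two matches.
first_two = hyphen.sub("+", text, count=2)
# subn() returns the new string together with the number of replacements.
new_text, replaced = hyphen.subn("+", text)

print(all_replaced)  # abc+efg+hij+klm
print(first_two)     # abc+efg+hij-klm
print(replaced)      # 3
```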
In [79]:
#
# Example 6.
#
pattern = re.compile('(\w\wc)')
pattern.sub('xyz','abc-efg-hij-klm')
Out[79]:
'xyz-efg-hij-klm'
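Example 6 wraps the pattern in a capture group but replaces the whole match with fixed text. Capture groups become more useful when the replacement refers back to them. The sketch below, using the same input string and my own variable names, reorders captured characters with \1, \2, \3 and then passes a function as the replacement.

```python
import re

text = "abc-efg-hij-klm"

# \1, \2 and \3 in the replacement refer to the three captured characters,
# so each three-letter block is written back in reverse order.
swapped = re.sub(r"(\w)(\w)(\w)", r"\3\2\1", text)

# repl may also be a function that receives the match object
# and returns the replacement text.
upper = re.sub(r"\w+", lambda m: m.group(0).upper(), text)

print(swapped)  # cba-gfe-jih-mlk
print(upper)    # ABC-EFG-HIJ-KLM
```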
In [88]:
#
# Example 7.
#
string = u'सुरज 1 सिनगह 2 बिसहत'
pattern_unicode = ur'\d'
pattern = re.compile(pattern_unicode, re.UNICODE)  # Using re.UNICODE flag
#
# Example 8.
#
example_string = """
suraj singh bisht
SURAJ SINGH BISHT
surajsinghbisht054@gmail.com
www.bitforestinfo.com
yashwantsinghbisht054@gmail.com
0124-100-125-2563
124-586-9875
This is an example text
"""
patterns = """
(?P<email>([a-zA-Z0-9]+@[a-z]+.[a-z]+))   # For Email
(\W.*\W.*\W)                               # For Whitespace
(?P<phone>\d{3}-\d{3}-\d{4})               # For Phone
"""
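As a side sketch of how named groups such as (?P<email>...) and (?P<phone>...) can be read back, the code below uses a simplified pattern of my own, not the one from Example 8: the two named groups are joined with | instead of a separator group, and the dot in the email part is escaped as \. so that it only matches a literal period. The shortened sample text is also an assumption.

```python
import re

# Hypothetical sample, shortened from the tutorial's example string.
sample = """
surajsinghbisht054@gmail.com
www.bitforestinfo.com
124-586-9875
"""

# Simplified pattern (an assumption): either an email or a phone number.
combined = re.compile(r"""
    (?P<email>[a-zA-Z0-9]+@[a-z]+\.[a-z]+)   # email address
  | (?P<phone>\d{3}-\d{3}-\d{4})             # phone number
""", re.VERBOSE)

# m.lastgroup is the name of the group that matched; m.group(name)
# returns the text captured by that group.
found = [(m.lastgroup, m.group(m.lastgroup)) for m in combined.finditer(sample)]

for kind, value in found:
    print(kind + ": " + value)
```

Reading groups by name keeps the code stable if more groups are added to the pattern later, since numbered groups would shift.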