How to use the Python urlparse module?

Namaste Friends,



Today, in this tutorial, I am going to write about what the urlparse module is and how to use it.

So, let's start with the basics.

Q 1. What is the urlparse module?
Ans. This module provides a standard way to break a Uniform Resource Locator (URL) string into its components. In simple words, it lets us split a URL into components, pick out any specific component, combine the components back into a URL string, and convert a relative URL to an absolute URL given a base URL.

Q 2. What are the benefits of the urlparse module?
Ans. With this module we can do many things with a URL string, such as extracting the host name, port, path, or any other component. It also helps with converting a relative URL to an absolute URL given a base URL, joining URLs, and splitting URLs.

Now, let's start our practical example tutorial.



Input : [1]  


#
# ==================================================
#          PYTHON urlparse MODULE TUTORIAL
# ==================================================
# 
# author   : suraj singh bisht
# contact  : SSB
#            surajsinghbisht054@gmail.com
#            https://bitforestinfo.blogspot.com
#            
# Here, For This Tutorial
#
# I am using
#
# Operating System : Ubuntu 16.04
# Python Version   : python 2.7.12
# Editor           : ipython notebook
#

Output : [1]  

  

Input : [2]  

#
# import module
#
import urlparse

Output : [2]  

  

Input : [3]  

#
# Example 1.
#
# In this example, you will see how the urlparse.urlparse()
# function splits a URL into its components.
#

site_url= "https://www.bitforestinfo.com:80/about.html?user=example#maincontent"

# urlparse.urlparse object
url = urlparse.urlparse(site_url)


# Different Components Of url

print "Original   : ", url.geturl()
print "Url Scheme : ", url.scheme
print "Netloc     : ", url.netloc
print "Path       : ", url.path
print "Params     : ", url.params
print "Query      : ", url.query
print "Fragment   : ", url.fragment
print "Port       : ", url.port
print "UserName   : ", url.username
print "Password   : ", url.password
print "HostName   : ", url.hostname

Output : [3]  

Original   :  https://www.bitforestinfo.com:80/about.html?user=example#maincontent
Url Scheme :  https
Netloc     :  www.bitforestinfo.com:80
Path       :  /about.html
Params     :  
Query      :  user=example
Fragment   :  maincontent
Port       :  80
UserName   :  None
Password   :  None
HostName   :  www.bitforestinfo.com
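
In the output above, UserName and Password are None because the example URL carries no login details. As a small sketch, here is a made-up URL that fills those fields. The user name `suraj`, the password `secret`, the mixed-case host, and port 8080 are assumptions for illustration only. The import is wrapped in try/except so the snippet also runs on Python 3, where this module lives at urllib.parse.

```python
try:
    from urllib.parse import urlparse   # Python 3 location
except ImportError:
    from urlparse import urlparse       # Python 2, as used in this tutorial

# Hypothetical URL with login details and a mixed-case host name
login_url = "https://suraj:secret@WWW.BitForestInfo.com:8080/about.html?user=example"

parts = urlparse(login_url)

print("Netloc   : {}".format(parts.netloc))    # kept exactly as written in the URL
print("UserName : {}".format(parts.username))
print("Password : {}".format(parts.password))
print("HostName : {}".format(parts.hostname))  # host name is lower-cased
print("Port     : {}".format(parts.port))      # an integer, not a string
```

Note that netloc still contains the login details and the original letter case, while hostname gives only the lower-case host name.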

Input : [4]  

#
#                    URL FORMAT DESCRIPTION (According To Urlparse)
#
#
# Example Url : https://www.bitforestinfo.com:80/about.html?user=example#maincontent
# 
# Now, let me break this example URL into its parts.
# (hostname and port together form the netloc component; params is empty here.)
#
#  https : // www.bitforestinfo.com : 80 / about.html? user=example # maincontent
# |_____|    |_____________________| |__| |_________| |____________| |__________|
#    |                   |             |       |             |            |  
#  scheme           hostname          port    path         query       fragment
#
# -------------------------------------------------------------------------
# |  Attribute |  Index  |      Value            |  Value if not present  |
# -------------------------------------------------------------------------
# |  scheme    |   0     |URL scheme specifier   | scheme parameter       |
# |  netloc    |   1     |Network location part  | empty string           |
# |  path      |   2     |Hierarchical path      | empty string           |
# |  params    |   3     |Params, last path elem.| empty string           |
# |  query     |   4     |Query component        | empty string           |
# |  fragment  |   5     |Fragment identifier    | empty string           |
# |  username  |         |User name              | None                   |
# |  password  |         |Password               | None                   |
# |  hostname  |         |Host name (lower case) | None                   |
# |  port      |         |Port number as integer | None                   |
# -------------------------------------------------------------------------
#

Output : [4]  
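
To make the "Value if not present" column more concrete, here is a minimal sketch using two shortened forms of the example URL (both shortened forms are my own assumptions, not part of the original example). It also shows that the indexed components can be read either by index or by attribute name. The try/except import is only there so the snippet also runs on Python 3.

```python
try:
    from urllib.parse import urlparse   # Python 3 location
except ImportError:
    from urlparse import urlparse       # Python 2, as used in this tutorial

# No scheme and no leading "//": the whole string is treated as the path
bare = urlparse("www.bitforestinfo.com/about.html")

# A leading "//" marks the netloc; the missing scheme is taken
# from the scheme parameter
relative = urlparse("//www.bitforestinfo.com/about.html", scheme="https")

print("bare.netloc     : {!r}".format(bare.netloc))     # empty string
print("bare.path       : {!r}".format(bare.path))
print("bare.hostname   : {!r}".format(bare.hostname))   # None
print("bare.port       : {!r}".format(bare.port))       # None
print("relative.scheme : {!r}".format(relative.scheme))
print("relative[1]     : {!r}".format(relative[1]))     # same as relative.netloc
print("relative[2]     : {!r}".format(relative[2]))     # same as relative.path
```

So a URL written without "//" in front of the host name does not give a netloc at all, which is easy to miss when parsing user input.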

  

Input : [5]  

#
# Example 2.
#
# In this example, you will see how urlparse.urlsplit() splits the
# given URL into components. This function is similar to the
# previous urlparse() function, but it does not split the params
# from the URL, so the result has 5 components instead of 6.
#


# urlparse.urlsplit object
url=urlparse.urlsplit(site_url)


# Components
print "Original   : ", url.geturl()
print "Url Scheme : ", url.scheme
print "Netloc     : ", url.netloc
print "Path       : ", url.path
print "Query      : ", url.query
print "Fragment   : ", url.fragment
print "Port       : ", url.port
print "UserName   : ", url.username
print "Password   : ", url.password
print "HostName   : ", url.hostname

Output : [5]  

Original   :  https://www.bitforestinfo.com:80/about.html?user=example#maincontent
Url Scheme :  https
Netloc     :  www.bitforestinfo.com:80
Path       :  /about.html
Query      :  user=example
Fragment   :  maincontent
Port       :  80
UserName   :  None
Password   :  None
HostName   :  www.bitforestinfo.com
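
The example URL has no params, so both functions print the same values here. As a rough sketch of the difference, the snippet below adds a `;type=a` part to the path (this part is my own assumption for illustration) and parses the URL with both functions. The try/except import is only there so the snippet also runs on Python 3.

```python
try:
    from urllib.parse import urlparse, urlsplit   # Python 3 location
except ImportError:
    from urlparse import urlparse, urlsplit       # Python 2, as used in this tutorial

# Hypothetical URL with params (";type=a") on the last path element
params_url = "https://www.bitforestinfo.com/about.html;type=a?user=example#maincontent"

parsed = urlparse(params_url)   # 6 components, params separated
split = urlsplit(params_url)    # 5 components, params stay in the path

print("urlparse path   : {}".format(parsed.path))
print("urlparse params : {}".format(parsed.params))
print("urlsplit path   : {}".format(split.path))
print("Component count : {} vs {}".format(len(parsed), len(split)))
```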

Input : [6]  

#
# We can also iterate over the urlsplit result, because it behaves like a tuple
#
for i in url:
    print i
    
    

Output : [6]  

https
www.bitforestinfo.com:80
/about.html
user=example
maincontent

Input : [7]  

# 
# Example 3.
#
# In this example, you will see how we can use
# urlparse.urlsplit() and urlparse.urlunsplit()
# to edit a URL in a simple way.
# In simple words, first we split the URL into components
# using urlparse.urlsplit(), and then we join the components
# back together, with some editing, using urlparse.urlunsplit().
#

# Example Url
url = "https://www.bitforestinfo.com:80/about.html?user=example#maincontent"

# urlparse.urlsplit object
url_split = urlparse.urlsplit(url)

# extract the values into separate variables
(scheme,netloc, path ,query, frag) = url_split 

# print urlparse.urlsplit values with index
print "\n Print Attributes With Index\n\n Index  | Attribute\n-------------------------"
for a,b in enumerate(url_split):
    print " {}      | {} ".format(a,b)
    
    
# join all URL Components Using urlparse.urlunsplit() function
url_unsplit=urlparse.urlunsplit((scheme,
                                  netloc,
                                  '/project.html',
                                  query,
                                  frag
                                 ))

print "\n\n[+] Printing Unsplit Url With Some Editing in Url\n\nUrl : ",

# print urlunsplit
print url_unsplit

Output : [7]  

 Print Attributes With Index

 Index  | Attribute
-------------------------
 0      | https 
 1      | www.bitforestinfo.com:80 
 2      | /about.html 
 3      | user=example 
 4      | maincontent 


[+] Printing Unsplit Url With Some Editing in Url

Url :  https://www.bitforestinfo.com:80/project.html?user=example#maincontent
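
As an alternative sketch of the same edit: the urlsplit result is a named tuple, so the standard named-tuple method `_replace()` can swap one component and `geturl()` can join the result back. This is not the approach used in the example above, only another way to reach the same URL. The try/except import is only there so the snippet also runs on Python 3.

```python
try:
    from urllib.parse import urlsplit   # Python 3 location
except ImportError:
    from urlparse import urlsplit       # Python 2, as used in this tutorial

url = "https://www.bitforestinfo.com:80/about.html?user=example#maincontent"

url_split = urlsplit(url)

# _replace() returns a new result with only the named field changed
edited = url_split._replace(path="/project.html")

# geturl() joins the components back into a URL string
edited_url = edited.geturl()
print("Url : {}".format(edited_url))
```

The original url_split is left unchanged, since tuples cannot be modified in place.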

Input : [8]  

# 
# Example 4.
#
# In this example, you will see how to use the urlparse.urlparse() and
# urlparse.urlunparse() functions.
# These functions work like urlparse.urlsplit() and urlparse.urlunsplit()
# from the previous example, except that they use 6 components
# (params is included at index 3).
# 

# Example Url
url = "https://www.bitforestinfo.com:80/about.html?user=example#maincontent"

# urlparse.urlparse object
url_parse = urlparse.urlparse(url)

print "\n Print Attributes With Index\n\n Index  | Attribute\n-------------------------"
for a,b in enumerate(url_parse):
    print " {}      | {} ".format(a,b)
    
    
    
url_unparsed=urlparse.urlunparse((url_parse[0],
                                url_parse[1],
                                "/home.html",
                                url_parse[3],
                                "user=suraj",
                                "MainImage",
                               ))

print "\n\n[+] Printing Unparsed Url With Some Editing in Url\n\nUrl : ",
print url_unparsed

Output : [8]  

 Print Attributes With Index

 Index  | Attribute
-------------------------
 0      | https 
 1      | www.bitforestinfo.com:80 
 2      | /about.html 
 3      |  
 4      | user=example 
 5      | maincontent 


[+] Printing Unparsed Url With Some Editing in Url

Url :  https://www.bitforestinfo.com:80/home.html?user=suraj#MainImage
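
Two small checks may help when mixing the functions from Examples 3 and 4. This is a minimal sketch, assuming the same example URL: first, unparsing an unedited urlparse result gives back the original URL; second, urlunparse() expects exactly 6 items, so passing it the 5-item urlsplit result fails with a ValueError. The try/except import is only there so the snippet also runs on Python 3.

```python
try:
    from urllib.parse import urlparse, urlsplit, urlunparse   # Python 3 location
except ImportError:
    from urlparse import urlparse, urlsplit, urlunparse       # Python 2, as used in this tutorial

url = "https://www.bitforestinfo.com:80/about.html?user=example#maincontent"

# Round trip: parse and unparse without any editing
round_trip = urlunparse(urlparse(url))
print("Round trip matches : {}".format(round_trip == url))

# urlunparse() needs 6 items, so the 5-item urlsplit() result is rejected
try:
    urlunparse(urlsplit(url))
    rejected = False
except ValueError:
    rejected = True
print("5-item tuple rejected : {}".format(rejected))
```

In short, pair urlsplit() with urlunsplit() and urlparse() with urlunparse().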

Input : [9]  

#
# Example 5.
#
# In this example, you will see how to use the urlparse.urljoin() function.
# This function lets us combine a base URL with another URL
# (usually a relative one) into a single absolute URL.
#
# Url 1
url_1 = "https://www.bitforestinfo.com/python/projects.html"

# Url 2
url_2 = "/about.html"

# Url 3
url_3 = "http://www.examplesite.com"

# Merge Url 1 and Url 2
print urlparse.urljoin(url_1, url_2)

# Merge Url 3 and Url 2
print urlparse.urljoin(url_3, url_2)

# Merge Url 1 and Url 3
print urlparse.urljoin(url_1, url_3)

Output : [9]  

https://www.bitforestinfo.com/about.html
http://www.examplesite.com/about.html
http://www.examplesite.com
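
The example above joins a path that starts with "/" and a full URL. As a further sketch, here are a few other relative forms joined against the same base URL. The relative strings (`about.html`, `../about.html`, `?page=2`, `#top`) are my own assumptions for illustration. The try/except import is only there so the snippet also runs on Python 3.

```python
try:
    from urllib.parse import urljoin   # Python 3 location
except ImportError:
    from urlparse import urljoin       # Python 2, as used in this tutorial

base = "https://www.bitforestinfo.com/python/projects.html"

# File name only: replaces the last path element of the base
sibling = urljoin(base, "about.html")

# "../" moves one directory up before adding the file name
parent = urljoin(base, "../about.html")

# Query only: keeps the base path and sets the query
with_query = urljoin(base, "?page=2")

# Fragment only: keeps the base path and sets the fragment
with_fragment = urljoin(base, "#top")

for result in (sibling, parent, with_query, with_fragment):
    print(result)
```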

Input : [10]  

# This Tutorial Ends Here,
#
# For Reference:
# 
#              Python Official Documentation
#
#            https://docs.python.org/2/library/urlparse.html
#
#

Output : [10]  

  


Written By:
                  SSB
