How To Extract HTML Links Using Python And The HTMLParser Module | python web scraping | python example - part 3

Posted by Suraj Singh on January 22, 2017 · 6 mins read
Hello readers,


Today's post is the third part of my web scraping tutorial series. In this tutorial, I am going to show you how to create a simple HTML link extractor using the HTMLParser module. For the second part, click here. In this script, I am using urllib2 to download the HTML data and then HTMLParser to extract the links.
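Before the full script, here is a minimal sketch (my own illustration; demo_parser is just a made-up name) of how HTMLParser works: it calls handle_starttag for every opening tag and passes the attributes as a list of (name, value) tuples, which is why the script below reads attr[0] and attr[1].

from HTMLParser import HTMLParser

# Minimal demonstration: handle_starttag receives the tag name and its
# attributes as a list of (name, value) tuples.
class demo_parser(HTMLParser):
    def handle_starttag(self, tag, attrs):
        print tag, attrs        # e.g. a [('href', 'https://example.com')]

demo_parser().feed('<a href="https://example.com">Example</a>')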

Also, if you are a new visitor, don't forget to check our blog index.

So, let's start:


#!/usr/bin/python
# ---------------- READ ME ---------------------------------------------
# This Script Is Created Only For Practice And Educational Purposes Only
# This Script Is Created For https://www.bitforestinfo.com
# This Script Is Written By
__author__='''

######################################################
By
######################################################

Suraj Singh


surajsinghbisht054@gmail.com
https://www.bitforestinfo.com/


######################################################
'''
# Import Modules
import urllib2
import sys
from HTMLParser import HTMLParser


# For More Info: https://docs.python.org/2/library/htmlparser.html
class link_extractor(HTMLParser):
    # Called for every opening tag; attrs is a list of (name, value) tuples
    def handle_starttag(self, tag, attrs):
        for attr in attrs:
            if 'href' in attr[0]:
                print attr[1]


if len(sys.argv) == 1:
    print "[*] Please Provide A URL:\n Usage: python link_hp.py http://www.examplesite.com\n"
    sys.exit(0)

# Retrieve Html Data From Url
def get_html(url):
    try:
        page = urllib2.urlopen(url).read()
    except Exception as e:
        print "[Error Found] ", e
        page = None
    return page

html_data = get_html(sys.argv[1])

if html_data:
    parser = link_extractor()   # Creating Handler
    parser.feed(html_data)      # Feeding Data

To download the raw script, click here.
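A quick note for readers on Python 3: the script above targets Python 2, which is where the urllib2 and HTMLParser modules live. On Python 3 the same idea works with urllib.request and html.parser. The following is a rough, self-contained sketch of that port (my adaptation, not part of the original script):

#!/usr/bin/python3
# Rough Python 3 port of the link extractor above:
# urllib2 -> urllib.request, HTMLParser -> html.parser
import sys
import urllib.request
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    def handle_starttag(self, tag, attrs):
        # attrs is still a list of (name, value) tuples
        for name, value in attrs:
            if name == 'href':
                print(value)


if len(sys.argv) == 1:
    print("[*] Please Provide A URL:\n Usage: python3 link_hp.py http://www.examplesite.com\n")
    sys.exit(0)

try:
    # urlopen() returns bytes in Python 3, so decode before feeding the parser
    html_data = urllib.request.urlopen(sys.argv[1]).read().decode('utf-8', 'replace')
except Exception as e:
    print("[Error Found]", e)
    sys.exit(1)

parser = LinkExtractor()    # Creating Handler
parser.feed(html_data)      # Feeding Data

You would run it the same way, for example: python3 link_hp.py http://www.examplesite.com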

In our next tutorial, we will learn how to create a complete website crawler, and we will also write a Python script that downloads all the images from any webpage.
Believe me, this journey is going to be very interesting, because in future tutorials you will see even more interesting scripts and solutions.
For more updates, visit our blog regularly.
Subscribe to our blog, follow us, and share it.
For any kind of suggestion or help, contact me:
Suraj
surajsinghbisht054@gmail.com