How To Extract HTML Links Using The Python Beautiful Soup Module | Python Web Scraping | Python Example - Part 1

Hello Friends,

Today is the first part of my web scraping tutorials. In this tutorial, I am going to show you how to create a simple HTML link extractor using the Beautiful Soup Python module.
If you are a new visitor, don't forget to check our blog index.



                     
In this script, we will first download the full HTML content of a website.

Then, we will simply extract all the HTML links using the Python Beautiful Soup module.

Today's tutorial example is very easy to understand.

So, let's start:

1. Website Link Extractor Written In Python 


#!/usr/bin/python
# ---------------- READ ME ---------------------------------------------
# This Script Is Created For Practice And Educational Purposes Only
# This Script Is Created For https://bitforestinfo.blogspot.in
# This Script is Written By
__author__='''

######################################################
                By S.S.B Group                          
######################################################

    Suraj Singh
    Admin
    S.S.B Group
    surajsinghbisht054@gmail.com
    https://bitforestinfo.blogspot.in/

    Note: We Feel Proud To Be Indian
######################################################
'''
# Import Modules
import sys
import bs4
try:
    from urllib2 import urlopen          # Python 2
except ImportError:
    from urllib.request import urlopen   # Python 3

if len(sys.argv) == 1:
    print("[*] Please Provide Domain Name:\n Usage: python link_bs4.py www.examplesite.com\n")
    sys.exit(1)

def parse_url(url):
    # urlopen needs a scheme, so add one if only a domain name was given
    if '://' not in url:
        url = 'http://' + url
    try:
        html = urlopen(url).read()  # Reading Html Codes
    except Exception as e:
        print("[Error] %s" % e)
        sys.exit(1)
    parse = bs4.BeautifulSoup(html, 'html.parser')  # Feed Data To bs4
    for i in parse.find_all('a'):    # Searching For link Tag
        if 'href' in i.attrs:        # Searching For Href key
            print(i.attrs['href'])
    return

parse_url(sys.argv[1])
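
If you want to try the link extraction step on its own, without downloading anything, here is a minimal sketch. It assumes the bs4 package is installed. The names SAMPLE_HTML and extract_links are made up for this example and are not part of the script above.

```python
# Minimal sketch: the link-extraction step on its own, no network needed.
# SAMPLE_HTML and extract_links are hypothetical names used only here.
from bs4 import BeautifulSoup

SAMPLE_HTML = """
<html><body>
  <a href="https://example.com/">Home</a>
  <a href="/about.html">About</a>
  <a name="top">Anchor without href</a>
</body></html>
"""

def extract_links(html):
    """Return the href value of every <a> tag that has one."""
    soup = BeautifulSoup(html, "html.parser")  # parser bundled with Python
    links = []
    for tag in soup.find_all("a"):
        href = tag.get("href")  # None when the tag has no href attribute
        if href is not None:
            links.append(href)
    return links

if __name__ == "__main__":
    for link in extract_links(SAMPLE_HTML):
        print(link)
```

Note that the href values come back exactly as they are written in the page, so a relative link such as /about.html stays relative.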



In our next tutorial, we will learn how to create the script given above without Beautiful Soup, using only Python, urllib2 and re.
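
As a rough preview of that idea (only a sketch, not the script from the next part), the same links can be pulled out of an HTML string with the re module alone. The pattern HREF_PATTERN and the function extract_links_re are my own assumptions for illustration. A regular expression like this is fragile and can be fooled by unusual HTML, for example unquoted attribute values.

```python
# Rough sketch: extracting href values with the re module only.
# HREF_PATTERN and extract_links_re are hypothetical names for illustration.
import re

# Matches an <a ...> tag and captures a single- or double-quoted href value.
HREF_PATTERN = re.compile(
    r'<a\b[^>]*?\bhref\s*=\s*["\']([^"\']*)["\']',
    re.IGNORECASE,
)

def extract_links_re(html):
    """Return every quoted href value found inside an <a> tag."""
    return HREF_PATTERN.findall(html)

if __name__ == "__main__":
    sample = (
        '<a href="https://example.com/">Home</a> '
        "<A class='x' HREF='/about.html'>About</A> "
        '<a name="top">No href here</a>'
    )
    for link in extract_links_re(sample):
        print(link)
```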

For more updates, visit our blog regularly.

For any suggestions or help, contact me:
S.S.B
surajsinghbisht054@gmail.com

or post a comment 
