How to Use the Python Beautiful Soup Module - Complete Beautiful Soup Tutorial - Part 2 | Python Web Scraping | Python Example

Namaste friends,



This is the second part of our complete Beautiful Soup tutorial. In this part, I am going to show you some practical examples, because practical examples let me explain things in a better way.

So, let's start. But first, if you are a new visitor to our blog, I suggest you check our index, because there you can find many interesting things written in Python. For the first part, click here.



part2 slides
In [1]:
#
# Author:
#       SSB
#       surajsinghbisht054@gmail.com
#       https://bitforestinfo.blogspot.com
#
# Here, I am Using
# 1. Python 2.7         (Python Version)
# 2. BeautifulSoup 4    (bs4 Version)
# 3. Ipython Notebook   (Code Editor)
# 4. Ubuntu             (Operating System)
#
# So, Let's Start With Practical Examples
In [4]:
# Example 1.
# 
# Here, We Are Trying To Extract All Links from Webpage
# So, let's start
#
# Import Modules
import bs4
import urllib2, sys

if len(sys.argv)==1:
    print "[*] Please Provide Domain Name:\n Usage: python link_bs4.py www.examplesite.com\n"
    sys.exit(0)

def parse_url(url):
    try:
        html=urllib2.urlopen(url).read() # Reading Html Codes
    except Exception as e:
        print "[Error] ",e
        sys.exit(1)  # Exit with a non-zero status because the download failed
    parse=bs4.BeautifulSoup(html,'html.parser')   # Feed Data To bs4

    for i in parse.find_all('a'):  # Searching For link Tag
        if i.has_attr('href'):     # Keep only tags that have an href attribute
            link=i['href']
            print link
    return

parse_url("https://bitforestinfo.blogspot.com") # Enter Your Site Address
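If you want to try the same idea without a network connection, here is a minimal sketch that runs on an inline HTML string. It is only an illustration, not the script above: the sample page, the helper name `extract_links`, and the base address `https://example.com/blog/` are all assumptions made up for this example. It also resolves relative links with `urljoin`, which the original script does not do.

```python
from bs4 import BeautifulSoup

try:
    from urllib.parse import urljoin  # Python 3
except ImportError:
    from urlparse import urljoin      # Python 2.7, as used in this post

# Hypothetical sample page, used so the sketch runs without a network connection.
SAMPLE_HTML = """
<html><body>
  <a href="/index.html">Index</a>
  <a href="https://www.python.org/">Python</a>
  <a name="top">Anchor without href</a>
  <a href="part1.html">Part 1</a>
</body></html>
"""

def extract_links(html, base_url):
    """Return every href found in <a> tags, resolved against base_url."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    # href=True keeps only <a> tags that actually carry an href attribute.
    for tag in soup.find_all("a", href=True):
        links.append(urljoin(base_url, tag["href"]))
    return links

# The base URL is a placeholder (assumption), not a real site from this post.
links = extract_links(SAMPLE_HTML, "https://example.com/blog/")
for link in links:
    print(link)
```

Returning a list instead of printing inside the loop makes it easier to filter or save the links later.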
In [1]:
# Example 2.
# Here, In This Example 
# We will try to scrape data from the who.is website
# so, let's start
#
# Import module
from bs4 import BeautifulSoup
import urllib2, sys

# Who.is Url
url="http://who.is/whois/"

# Website Name
website="www.stackoverflow.com"

# Please Wait Message
print "[*] Please Wait.... Connecting To Who.is Server.."

# Download And Read Html Data
htmldata=urllib2.urlopen(url+website).read()

class_name="rawWhois"  # Class Name For Extraction

# BeautifulSoup Constructor
try: # Check If lxml is installed
    import lxml
    # if installed then, use this
    parse=BeautifulSoup(htmldata,'lxml')
    print "[*] Using lxml Module For Fast Extraction"
except ImportError:
    # if lxml not installed then try this
    parse=BeautifulSoup(htmldata, "html.parser")
    print "[*] Using built-in Html Parser [Slow Extraction. Please Wait ....]"

    
try:
    container=parse.find_all("div",{'class':class_name}) # Extracting Class
    sections=container[1:]                              # Remove First Value
    for section in sections:                            # iter all values
        extract=section.find_all('div')                 # Search for div tag
        heading=extract[0].text                         # Extract Text
        print '\n[ ',heading,' ]'                       # Heading
        for i in extract[1].find_all('div'):            # Find All div Tag
            fortab='\t|'                                # Indent prefix for printed values
            for j in i.find_all('div'):
                fortab=fortab+'----'
                line=j.text.replace('\n', ' ')
                print fortab,'>', line
except Exception as e:
    print "[ Error ] ", e
    print "[ Last Update : 1 Jan 2017 ]"
    print "[ Script Is Not Updated ]"
    print "[ Sorry! ]"
[*] Please Wait.... Connecting To Who.is Server..
[*] Using lxml Module For Fast Extraction

[  Registrant Contact Information:  ]
 |---- > Name
 |-------- > Sysadmin Team
 |---- > Organization
 |-------- > Stack Exchange, Inc.
 |---- > Address
 |-------- > 110 William St , Floor 28
 |---- > City
 |-------- > New York
 |---- > State / Province
 |-------- > NY
 |---- > Postal Code
 |-------- > 10038
 |---- > Country
 |-------- > US
 |---- > Phone
 |-------- > +1.2122328280
 |---- > Email
 |-------- > 

[  Administrative Contact Information:  ]
 |---- > Name
 |-------- > Sysadmin Team
 |---- > Organization
 |-------- > Stack Exchange, Inc.
 |---- > Address
 |-------- > 110 William St , Floor 28
 |---- > City
 |-------- > New York
 |---- > State / Province
 |-------- > NY
 |---- > Postal Code
 |-------- > 10038
 |---- > Country
 |-------- > US
 |---- > Phone
 |-------- > +1.2122328280
 |---- > Email
 |-------- > 

[  Technical Contact Information:  ]
 |---- > Name
 |-------- > Sysadmin Team
 |---- > Organization
 |-------- > Stack Exchange, Inc.
 |---- > Address
 |-------- > 110 William St , Floor 28
 |---- > City
 |-------- > New York
 |---- > State / Province
 |-------- > NY
 |---- > Postal Code
 |-------- > 10038
 |---- > Country
 |-------- > US
 |---- > Phone
 |-------- > +1.2122328280
 |---- > Email
 |-------- > 
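The script above prints the values as it walks the nested div tags. If you would rather keep them for later use, a rough sketch is to collect them into a dictionary. Please note the assumptions: the markup in `SAMPLE_HTML` is hypothetical and only mimics the nesting the script expects (the real who.is page may look different and may have changed), and `parse_sections` is a helper name invented for this example.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that mimics the nesting the script above expects;
# the real who.is page may differ.
SAMPLE_HTML = """
<div class="rawWhois">raw text block</div>
<div class="rawWhois">
  <div>Registrant Contact Information:</div>
  <div>
    <div><div>Name</div><div>Sysadmin Team</div></div>
    <div><div>City</div><div>New York</div></div>
  </div>
</div>
"""

def parse_sections(html, class_name="rawWhois"):
    """Map each section heading to a dict of its field/value pairs."""
    soup = BeautifulSoup(html, "html.parser")
    result = {}
    # Skip the first matching container, as the script above does.
    for section in soup.find_all("div", class_=class_name)[1:]:
        # recursive=False returns only the direct child divs.
        children = section.find_all("div", recursive=False)
        if len(children) < 2:
            continue  # Not the heading + body layout assumed here
        heading = children[0].get_text(strip=True)
        fields = {}
        for row in children[1].find_all("div", recursive=False):
            cells = row.find_all("div", recursive=False)
            if len(cells) == 2:  # Expect one label div and one value div
                fields[cells[0].get_text(strip=True)] = cells[1].get_text(strip=True)
        result[heading] = fields
    return result

sections = parse_sections(SAMPLE_HTML)
print(sections)
```

Using `recursive=False` visits each level only once, so the label and the value of a row stay paired instead of being printed one after another.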



Done!
I think this is enough to show how to use BeautifulSoup.

Have a nice day.

Thanks For Reading.

For more updates, visit our blog regularly, subscribe to our blog, follow us, and share it.
For any type of suggestion, help, or question,
contact me:
S.S.B
surajsinghbisht054@gmail.com
or comment below.
