Extract who.is Raw Data Using Python and the BeautifulSoup Module

Hello friends,

Today, I am going to show you how we can create a simple Python script for scraping the who.is site using the BeautifulSoup module.


In this Python script:

First, we take a domain name from the user.

Then we append the provided domain name to the who.is lookup URL.

After that, we open the who.is page with Python's urllib2 module and download all of its HTML content.
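For example, the download step looks roughly like this (a minimal sketch using the same calls as the full script below):

# Minimal sketch of the download step (Python 2 / urllib2, same calls as the full script below)
import sys
import urllib2

url = "http://who.is/whois/"
website = sys.argv[1]                      # e.g. www.examplesite.com
htmldata = urllib2.urlopen(url + website).read()
print "[*] Downloaded %d bytes of HTML" % len(htmldata)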


After downloading the HTML, we use the BeautifulSoup module to extract the WHOIS data from it.
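The extraction step boils down to finding the page's 'rawWhois' sections, roughly like this (a sketch that assumes htmldata already holds the downloaded HTML and that who.is still uses that class name):

# Minimal sketch of the extraction step; assumes 'htmldata' already holds the page HTML
# and that who.is still wraps each WHOIS section in a div with the class 'rawWhois'.
from bs4 import BeautifulSoup

parse = BeautifulSoup(htmldata, "html.parser")
for section in parse.findAll("div", {"class": "rawWhois"}):
    divs = section.findAll("div")
    if divs:
        print divs[0].text        # the section's heading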


Here, I am sharing my Python code. I have tried my best to keep the example code easy to understand.

1. Whois Data Extractor Written In Python 


#!/usr/bin/python
# ---------------- READ ME ---------------------------------------------
# This Script Is Created For Practice And Educational Purposes Only
# This Script Is Created For https://bitforestinfo.blogspot.in
# This Script is Written By
__author__='''

######################################################
                By S.S.B Group                          
######################################################

    Suraj Singh
    Admin
    S.S.B Group
    surajsinghbisht054@gmail.com
    https://bitforestinfo.blogspot.in/

    Note: We Feel Proud To Be Indian
######################################################
'''
from bs4 import BeautifulSoup
import urllib2, sys

url="http://who.is/whois/"

if len(sys.argv)==1:
    print "[*] Please Provide Domain Name:\n Usage: python whois.py www.examplesite.com\n"
    sys.exit(0)
website=sys.argv[1]
print "[*] Please Wait.... Connecting To Who.is Server.."
htmldata=urllib2.urlopen(url+website).read()    # download the who.is lookup page
class_name="rawWhois"  # Class For Extraction

try:
    import lxml
    parse=BeautifulSoup(htmldata,'lxml')
    print "[*] Using lxml Module For Fast Extraction"
except:
    parse=BeautifulSoup(htmldata, "html.parser")
    print "[*] Using built-in Html Parser [Slow Extraction. Please Wait ....]"

try:
    # Find every 'rawWhois' div (one per WHOIS section) and skip the first match
    container=parse.findAll("div",{'class':class_name})
    sections=container[1:]
    for section in sections:
        extract=section.findAll('div')
        heading=extract[0].text                 # section heading
        print '\n[ ',heading,' ]'
        # Walk the nested divs and print each field, adding '----' per nesting level
        for i in extract[1].findAll('div'):
            fortab='\t|'
            for j in i.findAll('div'):
                fortab=fortab+'----'
                line=j.text.replace('\n', ' ')
                print fortab,'>', line
except Exception as e:
    print "[ Error ] ", e
    print "[ Last Update : 1 Jan 2017 ]"
    print "[ Script Is Not Updated ]"
    print "[ Sorry! ]"
     
    


Usage: python whois.py www.examplesite.com
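If you are using Python 3, the urllib2 module no longer exists. A rough, hypothetical adaptation of the same approach (not the original script, and who.is may have changed its markup since 2017) would use urllib.request instead:

#!/usr/bin/python3
# Hypothetical Python 3 adaptation of the same approach (not the original script)
import sys
import urllib.request
from bs4 import BeautifulSoup

if len(sys.argv) == 1:
    print("Usage: python3 whois.py www.examplesite.com")
    sys.exit(0)

htmldata = urllib.request.urlopen("http://who.is/whois/" + sys.argv[1]).read()
parse = BeautifulSoup(htmldata, "html.parser")

# Same extraction logic: each 'rawWhois' div is one section of the WHOIS record
for section in parse.findAll("div", {"class": "rawWhois"})[1:]:
    divs = section.findAll("div")
    if not divs:
        continue
    print("\n[", divs[0].text.strip(), "]")
    if len(divs) > 1:
        for row in divs[1].findAll("div"):
            print("\t|---->", row.text.replace("\n", " ").strip())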



Done!

Thanks for your support.
Please feel free to leave a comment if our article has helped you.
Written By:
S.S.B
surajsinghbisht054@gmail.com
