Extract who.is Raw Data Using Python and the BeautifulSoup Module

Posted by Suraj Singh on January 14, 2017 · 7 mins read
Hello readers,

Today, I am going to show you how to create a simple Python script for scraping the who.is site using the BeautifulSoup module.


This Python script works as follows.

First, we take a domain name from the user.

Then, we append that domain name to the who.is lookup URL.

Next, we open the who.is page using Python's urllib2 module and download all of its HTML content.

After downloading the HTML, we use Python's BeautifulSoup module to extract the WHOIS data from it.
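
The first three steps can be sketched on their own as follows. This is a minimal Python 3 sketch, so it uses urllib.parse and urllib.request in place of urllib2. The helper names build_whois_url and fetch_html are hypothetical and are not part of the script in this post, and percent-encoding the domain with quote is an added safeguard assumed here for illustration.

```python
from urllib.parse import quote
from urllib.request import urlopen

BASE_URL = "http://who.is/whois/"  # same base URL the script in this post uses


def build_whois_url(domain, base_url=BASE_URL):
    """Return the lookup URL for a domain (hypothetical helper).

    Surrounding whitespace is stripped and any character that is not
    URL-safe is percent-encoded before it is appended to the base URL.
    """
    return base_url + quote(domain.strip(), safe="")


def fetch_html(url, timeout=10):
    """Download a page and return its raw bytes (hypothetical helper)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()
```

For example, fetch_html(build_whois_url("www.examplesite.com")) would return the raw HTML bytes of the lookup page, provided the site is reachable.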


Here, I am sharing my Python code. I have tried my best to keep the example easy to understand.

1. Whois Data Extractor Written In Python 


#!/usr/bin/python
# ---------------- READ ME ---------------------------------------------
# This script is created for practice and educational purposes only.
# This script is created for https://www.bitforestinfo.com
# Note: written for Python 2 (urllib2 module and print statements).
# This script is written by
__author__='''

######################################################
By
######################################################

Suraj Singh


surajsinghbisht054@gmail.com
https://www.bitforestinfo.com/


######################################################
'''
from bs4 import BeautifulSoup
import urllib2, sys

url="http://who.is/whois/"

# Require a domain name as the first command-line argument
if len(sys.argv)==1:
    print "[*] Please Provide Domain Name:\n Usages: python whois.py www.examplesite.com\n"
    sys.exit(0)
website=sys.argv[1]
print "[*] Please Wait.... Connecting To Who.is Server.."

# Download the HTML of the lookup page for the given domain
htmldata=urllib2.urlopen(url+website).read()
class_name="rawWhois" # Class For Extraction

# Prefer the faster lxml parser; fall back to the built-in parser
try:
    import lxml
    parse=BeautifulSoup(htmldata,'lxml')
    print "[*] Using lxml Module For Fast Extraction"
except ImportError:
    parse=BeautifulSoup(htmldata, "html.parser")
    print "[*] Using built-in Html Parser [Slow Extraction. Please Wait ....]"

try:
    # Collect every <div> carrying the target class and skip the first one
    container=parse.findAll("div",{'class':class_name})
    sections=container[1:]
    for section in sections:
        extract=section.findAll('div')
        heading=extract[0].text          # first nested <div> is the heading
        print '\n[ ',heading,' ]'
        # Second nested <div> holds the rows; print them as an indented tree
        for i in extract[1].findAll('div'):
            fortab='\t|'
            for j in i.findAll('div'):
                fortab=fortab+'----'
                line=j.text.replace('\n', ' ')
                print fortab,'>', line
except Exception as e:
    # Most likely cause: the page layout changed since the script was written
    print "[ Error ] ", e
    print "[ Last Update : 1 Jan 2017 ]"
    print "[ Script Is Not Updated ]"
    print "[ Sorry! ]"




Usage: python whois.py www.examplesite.com
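
To see the extraction step in isolation, here is a minimal Python 3 sketch that runs BeautifulSoup on an inline HTML sample instead of a live page. The sample markup and the helper name extract_raw_whois are hypothetical; only the rawWhois class name comes from the script above, and the real who.is page layout may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, used only to illustrate the lookup by class name
SAMPLE_HTML = """
<div class="rawWhois">
  <div>Registrar Info</div>
  <div>Name: Example Registrar</div>
</div>
<div class="other">ignored</div>
"""


def extract_raw_whois(html, class_name="rawWhois"):
    """Return one list of text lines per <div> that has the given class."""
    soup = BeautifulSoup(html, "html.parser")
    blocks = []
    for div in soup.find_all("div", class_=class_name):
        # stripped_strings yields each text fragment with whitespace trimmed
        blocks.append(list(div.stripped_strings))
    return blocks


for block in extract_raw_whois(SAMPLE_HTML):
    for line in block:
        print(line)
```

Testing the parsing logic against a saved or hand-written HTML sample like this makes it easier to adjust the script when the site changes its layout.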



Done!

Thanks for your support.
Please feel free to leave a comment if this article has helped you.
Suraj
surajsinghbisht054@gmail.com