Hello readers,
Today I am going to show you how to create a simple Python script for scraping the Who.is site using the BeautifulSoup module.
In this Python script, we will:
1. Take a domain name from the user.
2. Append the provided domain name to the who.is lookup URL.
3. Open that URL with Python's urllib2 module and download the page's HTML content.
4. Use the BeautifulSoup module to extract the whois data from the downloaded HTML.
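The URL-building step above can be sketched on its own. Note that the full script below targets Python 2 (urllib2); in Python 3 the same functionality lives in urllib.request and urllib.parse. A minimal Python 3 sketch of steps 2 and 3 (the helper name build_whois_url is my own, not part of the script):

```python
from urllib.parse import quote

WHOIS_URL = "http://who.is/whois/"

def build_whois_url(domain):
    """Percent-encode the domain and append it to the who.is lookup URL."""
    return WHOIS_URL + quote(domain)

# Downloading the page would then be:
#   import urllib.request
#   htmldata = urllib.request.urlopen(build_whois_url("example.com")).read()

print(build_whois_url("example.com"))  # http://who.is/whois/example.com
```

Percent-encoding the domain keeps the request valid even when the input contains characters that are not allowed raw in a URL.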
Here I am sharing my Python code. I have tried my best to keep the example easy to understand.
1. Whois Data Extractor Written In Python
#!/usr/bin/python
# ---------------- READ ME ---------------------------------------------
# This script is created for practice and educational purposes only.
# This script is created for https://www.bitforestinfo.com
# This script is written by
__author__ = '''
######################################################
                        By
######################################################
Suraj Singh
surajsinghbisht054@gmail.com
https://www.bitforestinfo.com/
######################################################
'''
from bs4 import BeautifulSoup
import urllib2, sys

url = "http://who.is/whois/"

if len(sys.argv) == 1:
    print "[*] Please Provide Domain Name:\n Usage: python whois.py www.examplesite.com\n"
    sys.exit(0)

website = sys.argv[1]
print "[*] Please Wait.... Connecting To Who.is Server.."
htmldata = urllib2.urlopen(url + website).read()
class_name = "rawWhois"  # CSS class that wraps each whois record section

# Prefer the faster lxml parser if it is installed,
# otherwise fall back to Python's built-in HTML parser.
try:
    import lxml
    parse = BeautifulSoup(htmldata, 'lxml')
    print "[*] Using lxml Module For Fast Extraction"
except ImportError:
    parse = BeautifulSoup(htmldata, "html.parser")
    print "[*] Using Built-in HTML Parser [Slow Extraction. Please Wait ....]"

try:
    # Each whois record section sits in a <div class="rawWhois">;
    # skip the first one (the page header) and walk the rest.
    container = parse.findAll("div", {'class': class_name})
    sections = container[1:]
    for section in sections:
        extract = section.findAll('div')
        heading = extract[0].text
        print '\n[ ', heading, ' ]'
        for i in extract[1].findAll('div'):
            fortab = '\t|'
            for j in i.findAll('div'):
                fortab = fortab + '----'  # indent one level per nested div
                line = j.text.replace('\n', ' ')
                print fortab, '>', line
except Exception as e:
    print "[ Error ] ", e
    print "[ Last Update : 1 Jan 2017 ]"
    print "[ Script Is Not Updated ]"
    print "[ Sorry! ]"
Usage: python whois.py www.examplesite.com
Done! Please feel free to leave a comment if this article has helped you.
Suraj
surajsinghbisht054@gmail.com