How To Create Google Search Scraper Using Python | python web scraping | python example - part 6

Hello Friends,


                           This is the sixth part of our web scraping tutorial series, and in this tutorial, I am going to show you how to create a Google search result scraper using Python. If you are a new visitor, first check our index, or For fifth Part Click Here






So, Let's Talk About Today's Topic.

Friends, 

                  Here, we will try to create a Python script that can return Google search results in list form.
So, in today's topic, we will learn how to create a urllib2.build_opener that can handle:

1. HTTP redirects
2. Headers (like User-Agent)
3. Cookies

and so on. A short sketch of this opener setup is given right after this list.
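Here is a minimal sketch of that opener setup, assuming Python 2 (the full script below uses the same urllib2 and cookielib modules), stripped down to just the three handler pieces:

# Minimal opener sketch (Python 2)
import urllib2
import cookielib

# A cookie jar stores any cookies the server sets between requests
cj = cookielib.CookieJar()

# Build an opener that follows redirects and keeps cookies
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj),
                              urllib2.HTTPRedirectHandler())

# Attach a browser-like User-Agent header to every request made by this opener
opener.addheaders = [('User-Agent', 'Mozilla/5.0')]

The full script below follows exactly this pattern.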

And friends, to make this script easier to understand and more instructive, I am only using built-in modules,
because with these built-in modules we can understand the internal functions more easily and more deeply.

So, Let's Start.

Here, I am sharing my demo code, but if you want a better example, you can modify this code yourself or download the script from my GitHub repository (link given at the end of the code).

1. google_search.py



# Import Modules
import urllib2
import urllib
import cookielib
import re

# Google Search Url
google = 'https://www.google.com/search?'

# Search Query
Query = "BitForestInfo"

# Set User Agent (so the request looks like it comes from a normal browser)
header = [('User-Agent','Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Firefox/31.0 Iceweasel/31.8.0')]

# Create Cookie Handler
cj = cookielib.CookieJar()

# Create Url Opener
url_opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj),  # Connect Cookie Jar
                                  urllib2.HTTPRedirectHandler())    # Handle Address Redirects

# Connect Header With Opener
url_opener.addheaders = header

# Encode Query With Url
query = google + urllib.urlencode({'q': Query})

# Now Open Google Search Page
html = url_opener.open(query)

# Collect Html Code
codes = html.read()

# Compile Pattern To Find Result Headings
pattern = re.compile('<h3(.*?)</h3')

# List For Collecting Results
collect_result = []

# Find Matches (each match is one search result heading)
for i in pattern.findall(codes):
    result = re.search('href=.(.*)..(onmousedown).+(>)([^><]+)(<)', i)
    if result:  # skip headings that don't match the link pattern
        result = result.groups()
        collect_result.append((result[0], result[3]))
        print result[0], result[3]


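If you want the results back as a Python list instead of just printing them, here is one possible way to wrap the same logic into a reusable function. This is only an illustrative reorganization of the code above (the function name google_search is my own, not part of the downloadable script), and it reuses the google url and url_opener already defined:

# A hypothetical helper that returns (url, title) pairs as a list
def google_search(search_term):
    results = []
    query = google + urllib.urlencode({'q': search_term})
    codes = url_opener.open(query).read()
    for i in re.findall('<h3(.*?)</h3', codes):
        match = re.search('href=.(.*)..(onmousedown).+(>)([^><]+)(<)', i)
        if match:
            groups = match.groups()
            results.append((groups[0], groups[3]))
    return results

# Example usage
for link, title in google_search("BitForestInfo"):
    print link, title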
Warning: I am creating this tutorial only for practice and educational purposes. I will not take any responsibility for any illegal activities.

To download the raw script, Click Here



For more updates, visit us regularly
and subscribe to our blog.

Follow us and share it.
For any type of suggestion or help,
contact me:
S.S.B
surajsinghbisht054@gmail.com
