How To Download All Images From An HTML Page Using Python's re And urllib2 Modules | Python Web Scraping | Python Example - Part 4

Hello Friends,

Today's post is the fourth part of my web scraping tutorials. In this tutorial, I am going to show you how to create a simple webpage image downloading script using Python's re and urllib2 modules. For the third part, click here. In this script, I use urllib2 to download the HTML data, re to extract the image source links, and then urllib2 again to download the images to the local drive.
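urllib2 exists only in Python 2. In Python 3, its urlopen function lives in urllib.request. Here is a minimal sketch of the download step for Python 3 readers. The name fetch_bytes and the 10-second timeout are my own illustrative choices, not part of the script in this post.

```python
# Python 3 sketch: urllib2.urlopen became urllib.request.urlopen.
import urllib.request


def fetch_bytes(url, timeout=10):
    """Return the raw bytes found at `url`, or None if the request fails.

    `fetch_bytes` and the default timeout are hypothetical choices made
    for this example only.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read()
    except (OSError, ValueError) as error:
        # URLError and socket timeouts are OSError subclasses; ValueError is
        # raised for a URL without a scheme, such as "www.examplesite.com".
        print("[Error Found]", error)
        return None
```

The same function can fetch both the HTML page and each image, because both are just bytes behind a URL.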





If you are a new visitor, don't forget to check our blog index.

So, let's start with some basic knowledge for today's tutorial:


Here is how the script works:

First, it downloads the HTML data of the website you provide.

Then, it uses a Python regular expression to extract the usable image links.

After collecting all the image links, it uses another function to download each linked file into the current folder.
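The extraction step can be sketched on a small HTML string, without touching the network. This is only an illustration: the pattern, the name extract_image_links, and the sample HTML are my own assumptions, and a regular expression is not a full HTML parser, so unusual markup may slip through.

```python
import re

# Capture the value of src="..." or src='...' inside an <img> tag.
IMG_SRC = re.compile(r'<img\b[^>]*?\bsrc\s*=\s*["\']([^"\']+)["\']', re.IGNORECASE)


def extract_image_links(html):
    """Return the absolute http(s) image URLs found in `html`, in order."""
    return [src for src in IMG_SRC.findall(html)
            if src.startswith(("http://", "https://"))]


# Hypothetical sample page: two absolute links and one relative link.
sample = (
    '<p><img src="http://example.com/a.png" alt="a">'
    "<IMG class='x' SRC='https://example.com/b.jpg'/>"
    '<img src="/relative/c.gif"></p>'
)
print(extract_image_links(sample))
# → ['http://example.com/a.png', 'https://example.com/b.jpg']
```

The relative link is skipped here because it cannot be downloaded without knowing the address of the page it came from.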

Very simple! Don't worry.

Here, I am sharing my code.

1. Python Webpage Image Downloader



#!/usr/bin/python
# ---------------- READ ME ---------------------------------------------
# This Script is Created Only For Practise And Educational Purpose Only
# This Script Is Created For https://bitforestinfo.blogspot.in
# This Script is Written By
__author__='''

######################################################
                By S.S.B Group                          
######################################################

    Suraj Singh
    Admin
    S.S.B Group
    surajsinghbisht054@gmail.com
    https://bitforestinfo.blogspot.in/

    Note: We Feel Proud To Be Indian
######################################################
'''

# Import Module (urllib2 Exists Only In Python 2)
import urllib2
import sys
import re
import os

if len(sys.argv)==1:
    # urllib2 Needs The Scheme (http://) In The URL
    print "[*] Please Provide A Page URL:\n Usage: python img_webpage.py http://www.examplesite.com\n"
    sys.exit(0)

# Retrieve Html Data From Url
def get_html(url):
    try:
        page = urllib2.urlopen(url).read()
    except Exception as e:
        print "[Error Found] ",e
        page=None
    return page

html_data=get_html(sys.argv[1])

# Verifying Html Data (Exit With An Error Code If Nothing Was Downloaded)
if not html_data:
    sys.exit(1)

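# Regular Expression: Capture The Quoted src Value Of Every <img> Tag
pattern = re.compile(r'<img\b[^>]*?\bsrc\s*=\s*["\']([^"\']+)["\']', re.I)
image_link=[]
for link in pattern.findall(html_data):
    # Keep Only Absolute Links; Relative Paths Need The Page URL To Resolve
    if link.startswith('http'):
        image_link.append(link)

# Downloading Image
def image_download(link):
    # Use The Last Path Segment As File Name, Without Any ?query Part
    name=os.path.basename(link.split('?')[0]) or 'image'
    data=urllib2.urlopen(link).read()
    img=open(name,'wb')
    img.write(data)
    img.close()

for i in image_link:
    print i
    try:
        image_download(i)
    except Exception as e:
        # One Broken Link Should Not Stop The Remaining Downloads
        print "[Error Found] ",e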
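The script above only keeps image links that already start with http, so images referenced by a relative path are skipped. One possible way to handle them is sketched below, assuming Python 3, where urljoin and urlsplit live in urllib.parse (in Python 2 they are in the urlparse module). The names absolute_link and local_filename and the example URLs are hypothetical.

```python
import os
from urllib.parse import urljoin, urlsplit


def absolute_link(page_url, src):
    """Resolve an <img> src value against the URL of the page it came from."""
    return urljoin(page_url, src)


def local_filename(link, default="image"):
    """Derive a file name from the URL path, ignoring any query string."""
    name = os.path.basename(urlsplit(link).path)
    return name or default


# Hypothetical page address used only for this example.
page = "http://example.com/blog/post.html"
print(absolute_link(page, "img/a.png"))      # → http://example.com/blog/img/a.png
print(absolute_link(page, "/static/b.jpg"))  # → http://example.com/static/b.jpg
print(local_filename("http://example.com/img/c.png?size=large"))  # → c.png
```

Links that are already absolute pass through urljoin unchanged, so the same call can be applied to every src value.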

As you can see, my example code is very easy to understand.

If you want to download the latest examples, click here.

In our next tutorial, we will learn how to create a complete website crawler.
Believe me, this journey is going to be very interesting, because in future tutorials
you will see even more interesting scripts and solutions.

For more updates, visit our blog regularly,
subscribe to it, follow us, and share it.
For any type of suggestion or help,
contact me:
S.S.B
surajsinghbisht054@gmail.com
