How To Scrape HTML Forms Using the Python Mechanize Module (Complete Mechanize Tutorial) | Python Web Scraping | Python Example - Part 13

Hello Friends,

This is the 13th part of our web scraping tutorials. In this tutorial, I am going to show you how to use the Python mechanize module. In other words, today's tutorial is about how to deal with HTML forms such as login forms, details forms, etc.

Today's tutorial is going to be very juicy and very interesting, because I am going to show you how to build web scrapers on your own. That means you don't need to depend on other people to build web scrapers for you.

Now, let's talk about today's topic.

In today's topic, we will cover:

1. Some stuff from previous tutorials
2. Form handling
3. Session handling
4. Automation
5. Proxy handling
 etc.

So, don't skip any line or any content.
Read carefully and try to understand these examples, because I tried my best to make them easy to understand and easy to remember.
For future updates, follow us.
But first, if you are a new visitor, check our index or read Part 12.

So, let's start.

mechanize_manual slides
In [32]:
# I collected this material from many sites around the internet, plus some personal experience.
# Here, I am using Ubuntu
# with Python 2.7,
# IPython notebook,
# and the latest version of mechanize.
# Installation
# To install mechanize,
# open a terminal
# and type:
#       $ python -m pip install mechanize
# So, let's start
import mechanize
In [7]:
# For deeper knowledge
#   visit here:
# Let's start our tutorial
# Create a cookie jar
cj = mechanize.CookieJar()

# Or you can also use
# import cookielib
# cj = cookielib.LWPCookieJar()

# Create the browser object
br = mechanize.Browser()

# Connect the cookie jar
br.set_cookiejar(cj)

# Always set a User-Agent, because it helps your bot look like a regular browser.
# Set User-Agent
br.addheaders = [('User-Agent', 'Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Firefox/31.0 Iceweasel/31.8.0')]

# Some more User-Agents. You can use any one from this list
more_user_agents = [
    'Mozilla/5.0 (Amiga; U; AmigaOS 1.3; en; rv: Gecko/20081204 SeaMonkey/1.1.14',
    'Mozilla/5.0 (AmigaOS; U; AmigaOS 1.3; en-US; rv: Gecko/20090303 SeaMonkey/1.1.15',
    'Mozilla/5.0 (AmigaOS; U; AmigaOS 1.3; en; rv: Gecko/20081204 SeaMonkey/1.1.14',
    'Mozilla/5.0 (Android 2.2; Windows; U; Windows NT 6.1; en-US',
    'AppleWebKit/533.19.4 (KHTML, like Gecko) Version/5.0.3 Safari/533.19.4',
    'Mozilla/5.0 (BeOS; U; BeOS BeBox; fr; rv:1.9) Gecko/2008052906 BonEcho/2.0',
]
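
If you don't want to send the same User-Agent on every run, one option is to pick one at random. The snippet below is only a sketch: the `USER_AGENTS` list and the `random_headers` helper are names made up for this example, and it needs nothing but the standard `random` module, so plug in whichever strings you like from the list above.

```python
import random

# Two of the User-Agent strings shown above; add as many as you like.
USER_AGENTS = [
    "Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Firefox/31.0 Iceweasel/31.8.0",
    "Mozilla/5.0 (BeOS; U; BeOS BeBox; fr; rv:1.9) Gecko/2008052906 BonEcho/2.0",
]


def random_headers(agents=USER_AGENTS):
    """Return a header list in the shape that br.addheaders expects."""
    return [("User-Agent", random.choice(agents))]


headers = random_headers()
print(headers)
```

With a browser object named `br`, you would then write `br.addheaders = random_headers()` before opening a page.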

#  ------------------------------------------------------------------
#  ============[ Some Useful Browser options ]=======================
#  ------------------------------------------------------------------

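# The True/False values below are typical choices; tune them for your site.

# Set whether to treat HTML http-equiv headers like HTTP headers.
br.set_handle_equiv(True)

# Handle gzip transfer encoding.
br.set_handle_gzip(True)

# Set whether to handle HTTP 30x redirections.
br.set_handle_redirect(True)

# Set whether to add Referer header to each request.
br.set_handle_referer(True)

# Set whether to observe rules from robots.txt.
br.set_handle_robots(False)

# Set whether to handle HTTP Refresh headers.
br.set_handle_refresh(True, max_time=1)

# Work With Written Data

# Open any website. Here I open my own blog, hehe.
response = br.open("")   # put the site URL between the quotes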
/usr/local/lib/python2.7/dist-packages/ipykernel/ UserWarning: gzip transfer encoding is experimental!
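
Session handling mostly comes down to the cookie jar. If you use the `LWPCookieJar` mentioned above instead of the in-memory `CookieJar`, the cookies can be written to a file and loaded again on the next run. Here is a small sketch using only the standard library. The cookie name, value, domain and file name are invented for the demo, and `make_cookie` is just a helper for this example (in real use the server sets the cookies for you).

```python
import os
import tempfile

try:
    import cookielib                    # Python 2
except ImportError:
    import http.cookiejar as cookielib  # Python 3


def make_cookie(name, value, domain):
    """Build a simple session cookie by hand (demo only)."""
    return cookielib.Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=True, domain_initial_dot=False,
        path="/", path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


path = os.path.join(tempfile.mkdtemp(), "cookies.lwp")

# First run: fill the jar and save it to disk.
jar = cookielib.LWPCookieJar(path)
jar.set_cookie(make_cookie("session", "abc123", "example.com"))
jar.save(ignore_discard=True)   # session cookies are skipped without this flag

# Next run: load the saved cookies into a fresh jar.
restored = cookielib.LWPCookieJar(path)
restored.load(ignore_discard=True)
names = [cookie.name for cookie in restored]
print(names)
```

A jar loaded like this can be connected with `br.set_cookiejar(restored)` before opening any page, the same way `cj` is connected above.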
In [2]:
# Get the HTML page title
print br.title()

# Get the current URL
print response.geturl()

# Get the HTML source
print response.read()

# Or try this
print br.response().read()
In [3]:
# Show the response headers
print response.info()

# Or directly
print br.response().info()
In [18]:
# Let's try some other things.
# Here, I will try to search for something related to Python... hmm, anything.
# But first, check how many forms are available.
# Try this:
for available_form in br.forms():
    # Form
    print available_form
    # Form attributes (helpful in selecting a form)
    print available_form.attrs

# Let's check the output
<GET application/x-www-form-urlencoded
  <SubmitControl(<None>=Search) (readonly)>>
{'action': '', 'class': 'gsc-search-box'}
<POST application/x-www-form-urlencoded
  <SubmitControl(<None>=Submit) (readonly)>
  <HiddenControl(uri=BitForest) (readonly)>
  <HiddenControl(loc=en_US) (readonly)>>
{'action': '', 'onsubmit': '"", "popupwindow", "scrollbars=yes,width=550,height=520"); return true', 'method': 'post', 'target': 'popupwindow'}
<POST application/x-www-form-urlencoded
  <SubmitControl(<None>=Submit) (readonly)>
  <HiddenControl(uri=BitForest) (readonly)>
  <HiddenControl(loc=en_US) (readonly)>>
{'action': '', 'onsubmit': '"", "popupwindow", "scrollbars=yes,width=550,height=520"); return true', 'method': 'post', 'target': 'popupwindow'}
<contact-form GET application/x-www-form-urlencoded
{'name': 'contact-form'}
In [19]:
# Select the first form 
br.select_form(nr=0)   # Easy Method
In [27]:
# wait.. wait 
# More Examples For Form selection
# br.select_form("form1")         # only works when form has a name
# br.form = list(br.forms())[0]   # use when form is unnamed
In [26]:
# Methods For Finding Form Controls
for control in br.form.controls:
    print control # Control Name
    print control.attrs # Control attributes

# Let's Check The Output
{'autocomplete': 'off', 'name': 'q', 'title': 'search', 'type': 'text', 'class': 'gsc-input', 'value': '', 'size': '10'}
<SubmitControl(<None>=Search) (readonly)>
{'type': 'submit', 'class': 'gsc-search-button', 'value': 'Search', 'title': 'search'}
In [30]:
# Let's search
br.form['q'] = 'python'   # value for the selected input

# Click the submit button
br.submit()

# or

# br.submit(name='Button_Name', label='button_label')

#print br.title()
In [ ]:
# You can also do this with controls.
# Here, controls means things like radio buttons, list boxes, and many more.

# Find a control directly
control = br.form.find_control("control_name")

# Check if it's a SelectControl
if control.type == "select":
    print control.attrs

# Assign a value
br[control.name] = ["Item_Name"]

# or try this

control.value = ["Any_Value_here"]

# Check the value
print control

# Check if it's a TextControl
if control.type == "text":
    control.value = "enter your text here or value"

# Some more configuration, one by one
control.readonly = False
control.disabled = True

# Or all at once
for control in br.form.controls:
    if control.type == "submit":
        control.disabled = True

# Click the submit button
br.submit()

print br.title()
In [ ]:
# Hooo,
# wait, there are more features too,
# so let's go through them quickly.
# Downloading files
Downloaded_file = br.retrieve('Enter_File_downloading_url_address_here')[0]
Open_Downloaded_file = open(Downloaded_file)
# or
print Downloaded_file
In [ ]:
# If you need to click on a link by its text, try this.
# But for this, first search for that text
br.find_link(text='Weekend codes')

# And then click the link
req = br.click_link(text='Weekend codes')

# Open the requested link
br.open(req)

# Already explained above
print br.response().read()
print br.geturl()

# Go back
br.back()

# Or
# you can also try this
import re

word = None

for link in br.links():
    linkMatch = re.compile('GitHub').search(link.url)

    if linkMatch:
        word = br.follow_link(link)

# word stays None when no link matched
if word is not None:
    content = word.get_data()  # Get inner content
    print content
In [ ]:
# If you want to use a proxy, then:
# Proxy
# br.set_proxies({"http": ""})
# Proxy password
# br.add_proxy_password("joe", "password")
# Proxy and user/password
# br.set_proxies({"http": ""})
For more updates, visit our blog regularly, subscribe, follow us, and share this post.
For any type of suggestion, help, or question, contact me or comment below.
