urlgrabber(1) - manual page
NAME
urlgrabber - a high-level cross-protocol url-grabber.
SYNOPSIS
urlgrabber [OPTIONS] URL [FILE]
DESCRIPTION
urlgrabber is a binary program and python module for fetching files. It is designed to be used in programs that need common (but not necessarily simple) url-fetching features.

OPTIONS
- --help, -h: print a help page listing the available options of the binary program.
- copy_local: ignored except for file:// urls, in which case it specifies whether urlgrab should still make a copy of the file, or simply point to the existing copy.
- throttle: if it's an int, it's the bytes/second throttle limit. If it's a float, it is first multiplied by bandwidth. If throttle == 0, throttling is disabled. If None, the module-level default (which can be set with set_throttle) is used.
- bandwidth: the nominal max bandwidth in bytes/second. If throttle is a float and bandwidth == 0, throttling is disabled. If None, the module-level default (which can be set with set_bandwidth) is used.
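The interplay of throttle and bandwidth described above can be sketched as a small helper. This is a hypothetical illustration of the documented rules, not code from the urlgrabber module:

```python
def effective_throttle(throttle, bandwidth, default=None):
    """Compute the effective bytes/second limit (0 means disabled).

    Sketch of the documented semantics: an int throttle is an absolute
    limit; a float is a fraction of bandwidth; 0 disables throttling;
    None falls back to a module-level default.
    """
    if throttle is None:
        throttle = default           # None -> use the module-level default
    if throttle is None or throttle == 0:
        return 0                     # throttling disabled
    if isinstance(throttle, float):
        # float throttle is multiplied by bandwidth; bandwidth == 0 disables
        return 0 if bandwidth == 0 else throttle * bandwidth
    return throttle                  # int throttle: absolute bytes/second
```

For example, a throttle of 0.5 with a bandwidth of 100000 yields an effective limit of 50000 bytes/second, while an int throttle ignores bandwidth entirely.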
- range: a tuple of the form (first_byte, last_byte) describing a byte range to retrieve. Either or both of the values may be None. If first_byte is None, byte offset 0 is assumed. If last_byte is None, the last byte available is assumed. Note that both first_byte and last_byte values are inclusive, so a range of (10, 11) would return the 10th and 11th bytes of the resource.
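As an illustration of the inclusive byte-range semantics, here is a hypothetical helper (not part of urlgrabber) that maps such a tuple onto an HTTP Range header value, which uses the same inclusive convention:

```python
def range_header(first_byte, last_byte):
    """Map an inclusive (first_byte, last_byte) tuple to an HTTP Range value.

    (10, 11) -> "bytes=10-11", i.e. the 10th and 11th bytes, matching the
    inclusive semantics described above. None for first_byte means offset 0;
    None for last_byte means "to the end of the resource".
    """
    first = 0 if first_byte is None else first_byte
    last = "" if last_byte is None else last_byte
    return f"bytes={first}-{last}"
```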
- user_agent: the user-agent string to send if the url is HTTP.
- retry: the number of times to retry the grab before bailing. If this is zero, it will retry forever. This was intentional... really, it was :). If this value is not supplied, or is None, no retrying occurs.
- retrycodes: a sequence of error codes (values of e.errno) for which it should retry. See the documentation on URLGrabError for more details. retrycodes defaults to (-1, 2, 4, 5, 6, 7) if not specified explicitly.
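The retry and retrycodes behavior described above can be sketched as a loop. The fetch callable and the helper itself are hypothetical illustrations of the documented semantics, not urlgrabber's actual implementation:

```python
def fetch_with_retry(fetch, retry=None, retrycodes=(-1, 2, 4, 5, 6, 7)):
    """Call fetch() with the documented retry semantics (sketch).

    - retry is None: no retrying at all, errors propagate immediately.
    - retry == 0: retry forever (intentional, per the docs).
    - retry == N: give up after N attempts.
    Only errors whose errno is in retrycodes are retried.
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            return fetch()
        except Exception as e:
            if retry is None:
                raise                      # retrying disabled
            if getattr(e, "errno", None) not in retrycodes:
                raise                      # not a retryable error code
            if retry != 0 and attempts >= retry:
                raise                      # attempts exhausted
```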
MODULE USE EXAMPLES
In its simplest form, urlgrabber can be a replacement for urllib2's urlopen, or even python's file if you're just reading:
    from urlgrabber import urlopen
    fo = urlopen(url)
    data = fo.read()
    fo.close()
Other simple forms grab a local copy of a file, or read its contents into a string:

    from urlgrabber import urlgrab, urlread
    local_filename = urlgrab(url)  # grab a local copy of the file
    data = urlread(url)            # just read the data into a string
Modifying the module-level default grabber has drawbacks:

* it's a little ugly to modify the default grabber because you have to reach into the module to do it
* you could run into conflicts if different parts of the code modify the default grabber and therefore expect different behavior
Instead, it is often cleaner to create your own grabber instance:

    from urlgrabber.grabber import URLGrabber
    g = URLGrabber()
    data = g.urlread(url)
Default options can be set when the instance is created:

    from urlgrabber.grabber import URLGrabber
    g = URLGrabber(reget='simple')
    local_filename = g.urlgrab(url)
Instance defaults can be overridden on a per-call basis with keyword arguments:

    from urlgrabber.grabber import URLGrabber
    g = URLGrabber(reget='simple')
    local_filename = g.urlgrab(url, filename=None, reget=None)
AUTHORS
Written by:
Michael D. Stenner <firstname.lastname@example.org>
Ryan Tomayko <email@example.com>
This manual page was written by Kevin Coyner <firstname.lastname@example.org> for the Debian system (but may be used by others). It borrows heavily from the documentation included in the urlgrabber module. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU General Public License, Version 2 or any later version published by the Free Software Foundation.
RESOURCES
Main web site: http://linux.duke.edu/projects/urlgrabber/