Here, we present a number of examples on how to use
urlgrabber. I (Michael) feel strongly that software should
work in its most basic form right out of the box with little
knowledge or expertise. Then, if you need to do more exotic
things, you should be able to do that, too, although perhaps
with a little reading. With that in mind, let's consider how you might use
urlgrabber at first, and then how you might take advantage of some of its
features as needed.
Basic use
In its simplest form, urlgrabber can be a replacement for
urllib2's urlopen, or even Python's file if you're just reading.
from urlgrabber import urlopen
fo = urlopen(url)
data = fo.read()
fo.close()
Here, the url can be http, https, ftp, or file. It's also pretty
smart, so if you just give it something like /tmp/foo or
C:\bar, it will figure it out.
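To make the idea of "figuring out" a bare path concrete, here is a minimal sketch that uses only the standard library. It is not urlgrabber's actual logic; the helper name to_url, the KNOWN_SCHEMES set, and the drive-letter pattern are assumptions made for illustration, and only absolute paths are handled.

```python
import re
from urllib.parse import quote, urlparse

# Schemes mentioned in the text; anything else is treated as a local path.
KNOWN_SCHEMES = {"http", "https", "ftp", "file"}

# A Windows drive path such as C:\bar or C:/bar (illustrative pattern).
_DRIVE_PATH = re.compile(r"^[A-Za-z]:[\\/]")


def to_url(target):
    """Hypothetical helper: return a URL for a URL or an absolute local path."""
    if _DRIVE_PATH.match(target):
        # Check this first: urlparse would read the drive letter as a scheme.
        return "file:///" + quote(target.replace("\\", "/"), safe="/:")
    if urlparse(target).scheme in KNOWN_SCHEMES:
        return target  # already a URL, leave it alone
    return "file://" + quote(target)  # assume an absolute POSIX path


posix_url = to_url("/tmp/foo")
windows_url = to_url("C:\\bar")
web_url = to_url("http://example.com/x")
```

The drive-letter check comes before the scheme check because a path like C:\bar otherwise parses as a URL with the scheme "c".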
For even more fun, you can also do:
from urlgrabber import urlgrab, urlread
local_filename = urlgrab(url) # grab a local copy of the file
data = urlread(url) # just read the data into a string
Now, like urllib2, what's really happening here is that you're using a
module-level object (called a grabber) that kind of serves as a
default. That's just fine, but you might want to get your own private
version for a couple of reasons:
It's a little ugly to modify the default grabber, because you have to reach into the module to do it.
You could run into conflicts if different parts of the code modify the default grabber and therefore expect different behavior.
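The conflict described in the second reason can be sketched with a stand-in class. Nothing here is urlgrabber code: Fetcher, default_fetcher, the two functions, and the "timeout" key are hypothetical names that only mimic a shared module-level default.

```python
class Fetcher:
    """Hypothetical stand-in for a grabber: it only stores options."""

    def __init__(self, **options):
        self.options = dict(options)


# A shared, module-level default, analogous to the default grabber.
default_fetcher = Fetcher(timeout=30)


def downloader_setup():
    # One part of the program tightens the shared default for its own needs.
    default_fetcher.options["timeout"] = 5


def reporter_timeout():
    # Another part still assumes the original value of 30.
    return default_fetcher.options["timeout"]


downloader_setup()
shared_timeout = reporter_timeout()  # now 5: the change leaked across components

# With private instances, each component's settings stay independent.
downloader_fetcher = Fetcher(timeout=5)
reporter_fetcher = Fetcher(timeout=30)
```

Because each component owns its instance in the second half, changing one never affects the other.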
Therefore, you're probably better off making your own. This also
gives you lots of flexibility for later, as you'll see.
from urlgrabber.grabber import URLGrabber
g = URLGrabber()
data = g.urlread(url)
This is nice because you can specify options when you create the
grabber. For example, let's turn on simple reget mode so that if we
have part of a file, we only need to fetch the rest.
from urlgrabber.grabber import URLGrabber
g = URLGrabber(reget='simple')
local_filename = g.urlgrab(url)
The available options are listed in the module
documentation, and can usually be specified as a default at the
grabber-level or as options to the method.
from urlgrabber.grabber import URLGrabber
g = URLGrabber(reget='simple')
# options passed to the method take the place of the grabber's defaults
local_filename = g.urlgrab(url, filename=None, reget=None)
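The relationship between grabber-level defaults and method-level options can be pictured as a simple merge. The sketch below is an illustration under that assumption, not urlgrabber's implementation; OptionHolder and effective_options are hypothetical names.

```python
class OptionHolder:
    """Hypothetical stand-in for a grabber that keeps default options."""

    def __init__(self, **defaults):
        self.defaults = dict(defaults)

    def effective_options(self, **overrides):
        # Start from a copy of the defaults, then apply per-call options.
        merged = dict(self.defaults)
        merged.update(overrides)
        return merged


g = OptionHolder(reget="simple")
inherited = g.effective_options()             # the default applies
overridden = g.effective_options(reget=None)  # the per-call value wins
```

Merging into a copy means a per-call option never alters the stored defaults, so later calls still see reget="simple".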