Open a network object denoted by a URL for reading. If the URL does not have a scheme identifier, or if it has file: as its scheme identifier, this opens a local file (without universal newlines); otherwise it opens a socket to a server somewhere on the network. If the connection cannot be made, the IOError exception is raised. If all went well, a file-like object is returned. This supports the following methods: read(), readline(), readlines(), fileno(), close(), info(), getcode() and geturl(). It also has proper support for the iterator protocol. One caveat: the read() method, if the size argument is omitted or negative, may not read until the end of the data stream; there is no good way to determine that the entire stream from a socket has been read in the general case.
Except for the info(), getcode() and geturl() methods, these methods have the same interface as for file objects — see section File Objects in this manual. (It is not a built-in file object, however, so it can’t be used at those few places where a true built-in file object is required.)
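The file-like interface described above can be tried without a network connection by pointing urlopen() at a local file. The following is a minimal sketch, not a definitive recipe: the temporary file and its contents are made up for illustration, and the try/except import is an assumption added so that the same lines also run on Python 3, where the function lives in urllib.request.

```python
import os
import tempfile

try:
    from urllib import urlopen, pathname2url          # Python 2: the module described here
except ImportError:
    from urllib.request import urlopen, pathname2url  # assumption: Python 3 fallback

# Hypothetical sample data standing in for a remote resource.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as tmp:
    tmp.write(b"first line\nsecond line\nthird line\n")

# Build a file: URL from the local path.
url = "file:" + pathname2url(path)

handle = urlopen(url)
try:
    first = handle.readline()               # same interface as a file object
    remaining = [line for line in handle]   # iterator protocol
finally:
    handle.close()
    os.remove(path)
```

The data comes back as byte strings, and the object is closed explicitly because it is not a true built-in file object.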
The info() method returns an instance of the class mimetools.Message containing meta-information associated with the URL. When the method is HTTP, these headers are those returned by the server at the head of the retrieved HTML page (including Content-Length and Content-Type). When the method is FTP, a Content-Length header will be present if (as is now usual) the server passed back a file length in response to the FTP retrieval request. A Content-Type header will be present if the MIME type can be guessed. When the method is local-file, returned headers will include a Date representing the file’s last-modified time, a Content-Length giving file size, and a Content-Type containing a guess at the file’s type. See also the description of the mimetools module.
The geturl() method returns the real URL of the page. In some cases, the HTTP server redirects a client to another URL. The urlopen() function handles this transparently, but in some cases the caller needs to know which URL the client was redirected to. The geturl() method can be used to get at this redirected URL.
The getcode() method returns the HTTP status code that was sent with the response, or None if the URL is not an HTTP URL.
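As a rough illustration of the three extra methods, the sketch below calls info(), geturl() and getcode() on a local-file URL, so it exercises only the local-file case; the HTTP and FTP behaviour described above is not shown because it needs a live server. The temporary file is hypothetical, and the Python 3 import fallback is an assumption made so the example runs on current interpreters.

```python
import os
import tempfile

try:
    from urllib import urlopen, pathname2url          # Python 2: the module described here
except ImportError:
    from urllib.request import urlopen, pathname2url  # assumption: Python 3 fallback

# Hypothetical five-byte file used as the target.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as tmp:
    tmp.write(b"hello")

handle = urlopen("file:" + pathname2url(path))
try:
    headers = handle.info()                          # meta-information for the URL
    content_length = headers.get("Content-Length")   # file size, as a string
    real_url = handle.geturl()                       # URL that was actually opened
    status = handle.getcode()                        # None: this is not an HTTP URL
finally:
    handle.close()
    os.remove(path)
```

Header lookups are case-insensitive, and header values are strings, so the length has to be converted with int() before it is used as a number.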
If the url uses the http: scheme identifier, the optional data argument may be given to specify a POST request (normally the request type is GET). The data argument must be in standard application/x-www-form-urlencoded format; see the urlencode() function below.
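A possible way to build the data argument is sketched below. The field names and the post_form() helper are hypothetical, and the helper is only defined, not called, since calling it needs a reachable server. The import fallback and the conversion to bytes are assumptions added for Python 3, where the request body must be a byte string.

```python
try:
    from urllib import urlencode, urlopen      # Python 2: the module described here
except ImportError:
    from urllib.parse import urlencode         # assumption: Python 3 fallback
    from urllib.request import urlopen

# A list of pairs keeps the field order predictable.
fields = [("spam", 1), ("eggs", 2), ("bacon", 0)]
data = urlencode(fields)   # application/x-www-form-urlencoded text


def post_form(url, form_fields):
    """Hypothetical helper: send form_fields to url as a POST request."""
    body = urlencode(form_fields)
    if not isinstance(body, bytes):
        body = body.encode("ascii")   # Python 3 wants a byte string here
    handle = urlopen(url, body)       # passing data switches GET to POST
    try:
        return handle.read()
    finally:
        handle.close()
```

Leaving out the second argument to urlopen() sends an ordinary GET request instead.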
The urlopen() function works transparently with proxies which do not require authentication. In a Unix or Windows environment, set the http_proxy or ftp_proxy environment variable to a URL that identifies the proxy server before starting the Python interpreter. For example (the '%' is the command prompt):
% http_proxy="http://www.someproxy.com:3128"
% export http_proxy
% python
...
The no_proxy environment variable can be used to specify hosts which shouldn’t be reached via proxy; if set, it should be a comma-separated list of hostname suffixes, optionally with :port appended, for example cern.ch,ncsa.uiuc.edu,some.host:8080.
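To see how these variables are picked up, the sketch below sets them inside the running process and then asks the module what it found. Setting them in-process is done here only to keep the example self-contained. getproxies() and proxy_bypass() are helpers that exist in the module but are not described in the text above, so treat their use here as an assumption about how the lookup can be inspected; the import fallback is likewise an assumption for Python 3.

```python
import os

try:
    from urllib import getproxies, proxy_bypass           # Python 2: the module described here
except ImportError:
    from urllib.request import getproxies, proxy_bypass   # assumption: Python 3 fallback

# Same values as in the surrounding text; they affect this process only.
os.environ["http_proxy"] = "http://www.someproxy.com:3128"
os.environ["no_proxy"] = "cern.ch,ncsa.uiuc.edu,some.host:8080"

# Mapping of scheme name to proxy URL, as read from the environment.
proxies = getproxies()
http_proxy = proxies["http"]

# Hosts whose names end in a no_proxy suffix are reached directly.
bypass_cern = bool(proxy_bypass("www.cern.ch"))
bypass_other = bool(proxy_bypass("www.python.org"))
```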
In a Windows environment, if no proxy environment variables are set, proxy settings are obtained from the registry’s Internet Settings section.
In a Mac OS X environment, urlopen() will retrieve proxy information from the OS X System Configuration Framework, which can be managed with the Network System Preferences panel.
Alternatively, the optional proxies argument may be used to explicitly specify proxies. It must be a dictionary mapping scheme names to proxy URLs, where an empty dictionary causes no proxies to be used, and None (the default value) causes environmental proxy settings to be used as discussed above. For example:
import urllib
# Use http://www.someproxy.com:3128 for http proxying
proxies = {'http': 'http://www.someproxy.com:3128'}
filehandle = urllib.urlopen(some_url, proxies=proxies)
# Don't use any proxies
filehandle = urllib.urlopen(some_url, proxies={})
# Use proxies from environment - both versions are equivalent
filehandle = urllib.urlopen(some_url, proxies=None)
filehandle = urllib.urlopen(some_url)
Proxies which require authentication for use are not currently supported; this is considered an implementation limitation.
The context parameter may be set to an ssl.SSLContext instance to configure the SSL settings that are used if urlopen() makes an HTTPS connection.
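One way to obtain a suitable context is ssl.create_default_context(), as in the sketch below. The fetch_https() helper is hypothetical and is only defined, not called, because calling it needs network access. The import fallback is an assumption for Python 3, where urlopen() accepts the same context keyword.

```python
import ssl

try:
    from urllib import urlopen            # Python 2.7.9 or later
except ImportError:
    from urllib.request import urlopen    # assumption: Python 3 fallback

# Default context: certificates are verified and host names are checked.
context = ssl.create_default_context()


def fetch_https(url, ssl_context=context):
    """Hypothetical helper: read an https: URL using the given SSL settings."""
    handle = urlopen(url, context=ssl_context)
    try:
        return handle.read()
    finally:
        handle.close()
```

Passing the context by keyword keeps the call independent of the positions of the data and proxies arguments.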
Changed in version 2.3: Added the proxies support.
Changed in version 2.6: Added getcode() to returned object and support for the no_proxy environment variable.
Changed in version 2.7.9: The context parameter was added.
Deprecated since version 2.6: The urlopen() function has been removed in Python 3 in favor of urllib2.urlopen().