Using the Twisted Web Client

Overview

This document describes how to use the HTTP client included in Twisted Web. After reading it, you should be able to make HTTP and HTTPS requests using Twisted Web. You will be able to specify the request method, headers, and body and you will be able to retrieve the response code, headers, and body.

A number of higher-level features are also explained, including proxying, automatic content encoding negotiation, and cookie handling.

Prerequisites

This document assumes that you are familiar with Deferreds and Failures, and producers and consumers. It also assumes you are familiar with the basic concepts of HTTP, such as requests and responses, methods, headers, and message bodies. The HTTPS section of this document also assumes you are somewhat familiar with SSL and have read about using SSL in Twisted.

The Agent

Issuing Requests

The twisted.web.client.Agent class is the entrypoint into the client API. Requests are issued using the request method, which takes as parameters a request method, a request URI, the request headers, and an object which can produce the request body (if there is to be one). The agent is responsible for connection setup. Because of this, it requires a reactor as an argument to its initializer. An example of creating an agent and issuing a request using it might look like this:

request.py

from twisted.internet import reactor
from twisted.web.client import Agent
from twisted.web.http_headers import Headers

agent = Agent(reactor)

d = agent.request(
    'GET',
    'http://example.com/',
    Headers({'User-Agent': ['Twisted Web Client Example']}),
    None)

def cbResponse(ignored):
    print 'Response received'
d.addCallback(cbResponse)

def cbShutdown(ignored):
    reactor.stop()
d.addBoth(cbShutdown)

reactor.run()

As may be obvious, this issues a new GET request for / to the web server on example.com. Agent is responsible for resolving the hostname into an IP address and connecting to it on port 80 (for HTTP URIs), port 443 (for HTTPS URIs), or on the port number specified in the URI itself. It is also responsible for cleaning up the connection afterwards. This code sends a request which includes one custom header, User-Agent. The last argument passed to Agent.request is None, though, so the request has no body.
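As a rough illustration of the port rule just described, the following sketch derives the port from a URI using only the standard library. It is not how Agent is implemented; effective_port and DEFAULT_PORTS are hypothetical names introduced for this example, and the code is written in Python 3 syntax.

```python
from urllib.parse import urlsplit

# Default ports for the two schemes discussed above.
DEFAULT_PORTS = {'http': 80, 'https': 443}


def effective_port(uri):
    """Return the port a client would connect to for the given URI.

    Hypothetical helper for illustration only; Agent does this internally.
    """
    parts = urlsplit(uri)
    if parts.port is not None:
        # An explicit port in the URI always wins.
        return parts.port
    # Otherwise fall back to the default for the scheme.
    return DEFAULT_PORTS[parts.scheme]


print(effective_port('http://example.com/'))
print(effective_port('https://example.com/'))
print(effective_port('http://example.com:8080/'))
```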

Sending a request which does include a body requires passing an object providing twisted.web.iweb.IBodyProducer to Agent.request. This interface extends the more general IPushProducer by adding a new length attribute and adding several constraints to the way the producer and consumer interact.

  • The length attribute must be a non-negative integer or the constant twisted.web.iweb.UNKNOWN_LENGTH. If the length is known, it will be used to specify the value for the Content-Length header in the request. If the length is unknown the attribute should be set to UNKNOWN_LENGTH. Since more servers support Content-Length, if a length can be provided it should be.
  • An additional method is required on IBodyProducer implementations: startProducing. This method is used to associate a consumer with the producer. It should return a Deferred which fires when all data has been produced.
  • IBodyProducer implementations should never call the consumer’s unregisterProducer method. Instead, when a producer has produced all of the data it is going to produce, it should only fire the Deferred returned by startProducing.

For additional details about the requirements of IBodyProducer implementations, see the API documentation.

Here’s a simple IBodyProducer implementation which writes an in-memory string to the consumer:

stringprod.py

from zope.interface import implements

from twisted.internet.defer import succeed
from twisted.web.iweb import IBodyProducer

class StringProducer(object):
    implements(IBodyProducer)

    def __init__(self, body):
        self.body = body
        self.length = len(body)

    def startProducing(self, consumer):
        consumer.write(self.body)
        return succeed(None)

    def pauseProducing(self):
        pass

    def stopProducing(self):
        pass

This producer can be used to issue a request with a body:

sendbody.py

from twisted.internet import reactor
from twisted.web.client import Agent
from twisted.web.http_headers import Headers

from stringprod import StringProducer

agent = Agent(reactor)
body = StringProducer("hello, world")
d = agent.request(
    'POST',
    'http://example.com/',
    Headers({'User-Agent': ['Twisted Web Client Example'],
             'Content-Type': ['text/x-greeting']}),
    body)

def cbResponse(ignored):
    print 'Response received'
d.addCallback(cbResponse)

def cbShutdown(ignored):
    reactor.stop()
d.addBoth(cbShutdown)

reactor.run()

If you want to upload a file or you just have some data in a string, you don’t have to copy StringProducer though. Instead, you can use FileBodyProducer. This IBodyProducer implementation works with any file-like object (so use it with a StringIO if your upload data is already in memory as a string); the idea is the same as StringProducer from the previous example, but with a little extra code to only send data as fast as the server will take it.

filesendbody.py

from StringIO import StringIO

from twisted.internet import reactor
from twisted.web.client import Agent
from twisted.web.http_headers import Headers

from twisted.web.client import FileBodyProducer

agent = Agent(reactor)
body = FileBodyProducer(StringIO("hello, world"))
d = agent.request(
    'POST',
    'http://example.com/',
    Headers({'User-Agent': ['Twisted Web Client Example'],
             'Content-Type': ['text/x-greeting']}),
    body)

def cbResponse(ignored):
    print 'Response received'
d.addCallback(cbResponse)

def cbShutdown(ignored):
    reactor.stop()
d.addBoth(cbShutdown)

reactor.run()

FileBodyProducer closes the file when it no longer needs it.

If the connection or the request takes too much time, you can cancel the Deferred returned by the Agent.request method. This will abort the connection, and the Deferred will errback with CancelledError.

Receiving Responses

So far, the examples have demonstrated how to issue a request. However, they have ignored the response, except for showing that it is a Deferred which seems to fire when the response has been received. Next we’ll cover what that response is and how to interpret it.

Agent.request, as with most Deferred-returning APIs, can return a Deferred which fires with a Failure. If the request fails somehow, this will be reflected with a failure. This may be due to a problem looking up the host IP address, or it may be because the HTTP server is not accepting connections, or it may be because of a problem parsing the response, or any other problem which arises which prevents the response from being received. It does not include responses with an error status.

If the request succeeds, though, the Deferred will fire with a Response. This happens as soon as all the response headers have been received. It happens before any of the response body, if there is one, is processed. The Response object has several attributes giving the response information: its code, version, phrase, and headers, as well as the length of the body to expect. In addition to these, the Response also contains a reference to the request that it is a response to; one particularly useful attribute on the request is absoluteURI: the absolute URI to which the request was made. The Response object has a method which makes the response body available: deliverBody. Using the attributes of the response object and this method, here’s an example which displays part of the response to a request:

response.py

from pprint import pformat

from twisted.internet import reactor
from twisted.internet.defer import Deferred
from twisted.internet.protocol import Protocol
from twisted.web.client import Agent
from twisted.web.http_headers import Headers

class BeginningPrinter(Protocol):
    def __init__(self, finished):
        self.finished = finished
        self.remaining = 1024 * 10

    def dataReceived(self, bytes):
        if self.remaining:
            display = bytes[:self.remaining]
            print 'Some data received:'
            print display
            self.remaining -= len(display)

    def connectionLost(self, reason):
        print 'Finished receiving body:', reason.getErrorMessage()
        self.finished.callback(None)

agent = Agent(reactor)
d = agent.request(
    'GET',
    'http://example.com/',
    Headers({'User-Agent': ['Twisted Web Client Example']}),
    None)

def cbRequest(response):
    print 'Response version:', response.version
    print 'Response code:', response.code
    print 'Response phrase:', response.phrase
    print 'Response headers:'
    print pformat(list(response.headers.getAllRawHeaders()))
    finished = Deferred()
    response.deliverBody(BeginningPrinter(finished))
    return finished
d.addCallback(cbRequest)

def cbShutdown(ignored):
    reactor.stop()
d.addBoth(cbShutdown)

reactor.run()

The BeginningPrinter protocol in this example is passed to Response.deliverBody and the response body is then delivered to its dataReceived method as it arrives. When the body has been completely delivered, the protocol’s connectionLost method is called. It is important to inspect the Failure passed to connectionLost. If the response body has been completely received, the failure will wrap a twisted.web.client.ResponseDone exception. This indicates that it is known that all data has been received. It is also possible for the failure to wrap a twisted.web.http.PotentialDataLoss exception: this indicates that the server framed the response such that there is no way to know when the entire response body has been received. Only HTTP/1.0 servers should behave this way. Finally, it is possible for the exception to be of another type, indicating guaranteed data loss for some reason (a lost connection, a memory error, etc).

Just as protocols associated with a TCP connection are given a transport, so will be a protocol passed to deliverBody. Since it makes no sense to write more data to the connection at this stage of the request, though, the transport only provides IPushProducer. This allows the protocol to control the flow of the response data: a call to the transport’s pauseProducing method will pause delivery; a later call to resumeProducing will resume it. If it is decided that the rest of the response body is not desired, stopProducing can be used to stop delivery permanently; after this, the protocol’s connectionLost method will be called.

An important thing to keep in mind is that the body will only be read from the connection after Response.deliverBody is called. This also means that the connection will remain open until this is done (and the body read). So, in general, any response with a body must have that body read using deliverBody. If the application is not interested in the body, it should issue a HEAD request or use a protocol which immediately calls stopProducing on its transport.

If the body of the response isn’t going to be consumed incrementally, then readBody can be used to get the body as a byte-string. This function returns a Deferred that fires with the body after the request has been completed.

responseBody.py

from sys import argv
from pprint import pformat

from twisted.internet.task import react
from twisted.web.client import Agent, readBody
from twisted.web.http_headers import Headers


def cbRequest(response):
    print 'Response version:', response.version
    print 'Response code:', response.code
    print 'Response phrase:', response.phrase
    print 'Response headers:'
    print pformat(list(response.headers.getAllRawHeaders()))
    d = readBody(response)
    d.addCallback(cbBody)
    return d

def cbBody(body):
    print 'Response body:'
    print body

def main(reactor, url=b"http://example.com/"):
    agent = Agent(reactor)
    d = agent.request(
        'GET', url,
        Headers({'User-Agent': ['Twisted Web Client Example']}),
        None)
    d.addCallback(cbRequest)
    return d

react(main, argv[1:])
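Because a response with an error status is delivered as an ordinary Response rather than as a Failure, applications that care about the status have to check response.code themselves. The sketch below shows one possible callback for doing so. HTTPStatusError and check_status are hypothetical names, not part of Twisted, and the stand-in response objects exist only so the example runs without a network connection.

```python
from types import SimpleNamespace


class HTTPStatusError(Exception):
    """Hypothetical exception for responses with a 4xx or 5xx status."""

    def __init__(self, response):
        Exception.__init__(self, 'HTTP status %d' % (response.code,))
        self.code = response.code
        # Keep the response so an errback can still read or discard the body.
        self.response = response


def check_status(response):
    """Callback that turns an error status into a failure."""
    if response.code >= 400:
        raise HTTPStatusError(response)
    return response


# Stand-ins for Response objects; only the code attribute is used here.
ok = SimpleNamespace(code=200)
missing = SimpleNamespace(code=404)

print(check_status(ok) is ok)
try:
    check_status(missing)
except HTTPStatusError as e:
    print('Request failed:', e)
```

Such a callback would be added with d.addCallback(check_status) before the callback that reads the body. Raising here skips that later callback, so the errback remains responsible for reading or stopping the body of e.response, as described above.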

HTTP over SSL

Everything you’ve read so far applies whether the scheme of the request URI is HTTP or HTTPS. However, to control the SSL negotiation performed when an HTTPS URI is requested, there’s one extra object to pay attention to: the SSL context factory.

Agent’s constructor takes an optional second argument, a context factory. This is an object like the context factory described in Using SSL in Twisted but has one small difference. The getContext method of this factory accepts the address from the URL being requested. This allows it to return a context object which verifies that the server’s certificate matches the URL being requested.

Here’s an example which shows how to use Agent to request an HTTPS URL with no certificate verification.

from twisted.python.log import err
from twisted.web.client import Agent
from twisted.internet import reactor
from twisted.internet.ssl import ClientContextFactory

class WebClientContextFactory(ClientContextFactory):
    def getContext(self, hostname, port):
        return ClientContextFactory.getContext(self)

def display(response):
    print "Received response"
    print response

def main():
    contextFactory = WebClientContextFactory()
    agent = Agent(reactor, contextFactory)
    d = agent.request("GET", "https://example.com/")
    d.addCallbacks(display, err)
    d.addCallback(lambda ignored: reactor.stop())
    reactor.run()

if __name__ == "__main__":
    main()

The important point to notice here is that getContext now accepts two arguments, a hostname and a port number. These two arguments, a str and an int, give the address to which a connection is being established to request an HTTPS URL. Because an agent might make multiple requests over a single connection, getContext may not be called once for each request. A second or later request for a URL with the same hostname as a previous request may re-use an existing connection, and therefore will re-use the previously returned context object.

To configure SSL options or enable certificate verification or hostname checking, provide a context factory which creates suitably configured context objects.

HTTP Persistent Connection

HTTP persistent connections use the same TCP connection to send and receive multiple HTTP requests/responses. This reduces latency and TCP connection establishment overhead.

The constructor of twisted.web.client.Agent takes an optional parameter pool, which should be an instance of HTTPConnectionPool, which will be used to manage the connections. If the pool is created with the parameter persistent set to True (the default), it will not close connections when the request is done, and instead hold them in its cache to be re-used.

Here’s an example which sends requests over a persistent connection:

from twisted.internet import reactor
from twisted.internet.defer import Deferred, DeferredList
from twisted.internet.protocol import Protocol
from twisted.web.client import Agent, HTTPConnectionPool

class IgnoreBody(Protocol):
    def __init__(self, deferred):
        self.deferred = deferred

    def dataReceived(self, bytes):
        pass

    def connectionLost(self, reason):
        self.deferred.callback(None)


def cbRequest(response):
    print 'Response code:', response.code
    finished = Deferred()
    response.deliverBody(IgnoreBody(finished))
    return finished

pool = HTTPConnectionPool(reactor)
agent = Agent(reactor, pool=pool)

def requestGet(url):
    d = agent.request('GET', url)
    d.addCallback(cbRequest)
    return d

# Two requests to the same host:
d = requestGet('http://localhost:8080/foo').addCallback(
    lambda ign: requestGet("http://localhost:8080/bar"))
def cbShutdown(ignored):
    reactor.stop()
d.addBoth(cbShutdown)

reactor.run()

Here, the two requests are to the same host, one after the other. In most cases, the same connection will be used for the second request, instead of two different connections when using a non-persistent pool.

Multiple Connections to the Same Server

twisted.web.client.HTTPConnectionPool instances have an attribute called maxPersistentPerHost which limits the number of cached persistent connections to the same server. The default value is 2. This is effective only when the persistent option is True. You can change the value like below:

from twisted.internet import reactor
from twisted.web.client import HTTPConnectionPool

pool = HTTPConnectionPool(reactor, persistent=True)
pool.maxPersistentPerHost = 1

With the default value of 2, the pool keeps around two connections to the same host at most. Eventually the cached persistent connections will be closed, by default after 240 seconds; you can change this timeout value with the cachedConnectionTimeout attribute of the pool. To force all connections to close use the closeCachedConnections method.

Automatic Retries

If a request fails without getting a response, and the request is something that hopefully can be retried without having any side-effects (e.g. a request with method GET), it will be retried automatically when sending a request over a previously-cached persistent connection. You can disable this behavior by setting retryAutomatically to False. Note that each request will only be retried once.

Following redirects

By itself, Agent doesn’t follow HTTP redirects (responses with 301, 302, 303, 307 status codes and a location header field). You need to use the twisted.web.client.RedirectAgent class to do so. It implements a rather strict behavior of the RFC, meaning it will redirect 301 and 302 as 307, only on GET and HEAD requests.

The following example shows how to have a redirect-enabled agent.

from twisted.python.log import err
from twisted.web.client import Agent, RedirectAgent
from twisted.internet import reactor

def display(response):
    print "Received response"
    print response

def main():
    agent = RedirectAgent(Agent(reactor))
    d = agent.request("GET", "http://example.com/")
    d.addCallbacks(display, err)
    d.addCallback(lambda ignored: reactor.stop())
    reactor.run()

if __name__ == "__main__":
    main()

In contrast, twisted.web.client.BrowserLikeRedirectAgent implements more lenient behaviour that closely emulates what web browsers do; in other words 301 and 302 POST redirects are treated like 303, meaning the method is changed to GET before making the redirect request.
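The difference between the two agents can be summarised as a small decision function. This is a simplified model of the rules as this section describes them, not code taken from Twisted. next_method is a hypothetical name. Two details are assumptions rather than statements from the text above: that a 303 is always followed with GET, and that a 307 is handled the same way by both agents.

```python
def next_method(status, method, browser_like=False):
    """Return the method for the follow-up request, or None if the
    redirect is not followed.

    Hypothetical model of the redirect rules; not Twisted's implementation.
    """
    safe = ('GET', 'HEAD')
    if status == 303:
        # Assumption: "See Other" is always followed with GET.
        return 'GET'
    if status in (301, 302):
        if browser_like:
            # Treated like 303: the follow-up request uses GET.
            return 'GET'
        # Strict behaviour: treated like 307.
        return method if method in safe else None
    if status == 307:
        # Assumption: both agents follow a 307 only for GET and HEAD.
        return method if method in safe else None
    # Not a redirect status.
    return None


print(next_method(302, 'POST'))
print(next_method(302, 'POST', browser_like=True))
```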

As mentioned previously, Response contains a reference to both the request that it is a response to, and the previously received response, accessible by previousResponse. In most cases there will not be a previous response, but in the case of RedirectAgent the response history can be obtained by following the previous responses from response to response.

Using an HTTP proxy

To be able to use HTTP proxies with an agent, you can use the twisted.web.client.ProxyAgent class. It supports the same interface as Agent, but takes the endpoint of the proxy as initializer argument.

Here’s an example demonstrating the use of an HTTP proxy running on localhost:8000.

from twisted.python.log import err
from twisted.web.client import ProxyAgent
from twisted.internet import reactor
from twisted.internet.endpoints import TCP4ClientEndpoint

def display(response):
    print "Received response"
    print response

def main():
    endpoint = TCP4ClientEndpoint(reactor, "localhost", 8000)
    agent = ProxyAgent(endpoint)
    d = agent.request("GET", "https://example.com/")
    d.addCallbacks(display, err)
    d.addCallback(lambda ignored: reactor.stop())
    reactor.run()

if __name__ == "__main__":
    main()

Please refer to the endpoints documentation for more information about how they work and the twisted.internet.endpoints API documentation to learn what other kinds of endpoints exist.

Handling HTTP cookies

An existing agent instance can be wrapped with twisted.web.client.CookieAgent to automatically store, send and track HTTP cookies. A CookieJar instance, from the Python standard library module cookielib, is used to store the cookie information. An example of using CookieAgent to perform a request and display the collected cookies might look like this:

cookies.py

from cookielib import CookieJar

from twisted.internet import reactor
from twisted.python import log
from twisted.web.client import Agent, CookieAgent

def displayCookies(response, cookieJar):
    print 'Received response'
    print response
    print 'Cookies:', len(cookieJar)
    for cookie in cookieJar:
        print cookie

def main():
    cookieJar = CookieJar()
    agent = CookieAgent(Agent(reactor), cookieJar)

    d = agent.request('GET', 'http://www.google.com/')
    d.addCallback(displayCookies, cookieJar)
    d.addErrback(log.err)
    d.addCallback(lambda ignored: reactor.stop())
    reactor.run()

if __name__ == "__main__":
    main()

Automatic Content Encoding Negotiation

twisted.web.client.ContentDecoderAgent adds support for sending Accept-Encoding request headers and interpreting Content-Encoding response headers. These headers allow the server to encode the response body somehow, typically with some compression scheme to save on transfer costs. ContentDecoderAgent provides this functionality as a wrapper around an existing agent instance. Together with one or more decoder objects (such as twisted.web.client.GzipDecoder), this wrapper automatically negotiates an encoding to use and decodes the response body accordingly. To application code using such an agent, there is no visible difference in the data delivered.

gzipdecoder.py

from twisted.python import log
from twisted.internet import reactor
from twisted.internet.defer import Deferred
from twisted.internet.protocol import Protocol
from twisted.web.client import Agent, ContentDecoderAgent, GzipDecoder

class BeginningPrinter(Protocol):
    def __init__(self, finished):
        self.finished = finished
        self.remaining = 1024 * 10


    def dataReceived(self, bytes):
        if self.remaining:
            display = bytes[:self.remaining]
            print 'Some data received:'
            print display
            self.remaining -= len(display)


    def connectionLost(self, reason):
        print 'Finished receiving body:', reason.type, reason.value
        self.finished.callback(None)



def printBody(response):
    finished = Deferred()
    response.deliverBody(BeginningPrinter(finished))
    return finished


def main():
    agent = ContentDecoderAgent(Agent(reactor), [('gzip', GzipDecoder)])

    d = agent.request('GET', 'http://www.yahoo.com/')
    d.addCallback(printBody)
    d.addErrback(log.err)
    d.addCallback(lambda ignored: reactor.stop())
    reactor.run()

if __name__ == "__main__":
    main()

Implementing support for new content encodings is as simple as writing a new class like GzipDecoder that can decode a response using the new encoding. As there are not many content encodings in widespread use, gzip is the only encoding supported by Twisted itself.
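At the heart of any such decoder is incremental decompression of the body as chunks arrive. The following standard-library sketch shows the idea for gzip. It is an illustration only and makes no claim about how GzipDecoder is implemented; decode_gzip_chunks is a hypothetical name, and the code is written for Python 3.

```python
import gzip
import zlib


def decode_gzip_chunks(chunks):
    """Yield decompressed data for an iterable of gzip-compressed chunks.

    Hypothetical helper for illustration; not part of Twisted.
    """
    # Adding 16 to the window size tells zlib to expect a gzip header.
    decompressor = zlib.decompressobj(16 + zlib.MAX_WBITS)
    for chunk in chunks:
        data = decompressor.decompress(chunk)
        if data:
            yield data
    # Flush whatever the decompressor is still holding.
    tail = decompressor.flush()
    if tail:
        yield tail


# Simulate a compressed body arriving seven bytes at a time.
original = b'hello, world\n' * 100
compressed = gzip.compress(original)
pieces = [compressed[i:i + 7] for i in range(0, len(compressed), 7)]
decoded = b''.join(decode_gzip_chunks(pieces))
print(decoded == original)
```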

Conclusion

You should now understand the basics of the Twisted Web HTTP client. In particular, you should understand:

  • How to issue requests with arbitrary methods, headers, and bodies.
  • How to access the response version, code, phrase, headers, and body.
  • How to store, send, and track cookies.
  • How to control the streaming of the response body.
  • How to enable the HTTP persistent connection, and control the number of connections.


https://twistedmatrix.com/documents/current/web/howto/client.html
