在使用请求模块时,是否有办法打印原始HTTP请求?

我不仅仅需要标题,我还需要请求行、标题和内容打印输出。是否有可能看到最终从HTTP请求构造什么?


当前回答

注意:这个答案已经过时了。请求的新版本支持直接获取请求内容,如AntonioHerraizS的回答文档。

从请求中获取请求的真实原始内容是不可能的,因为它只处理更高级别的对象,比如头和方法类型。Requests使用urllib3发送请求,但是urllib3也不处理原始数据——它使用httplib。下面是一个典型的请求堆栈跟踪:

-> r= requests.get("http://google.com")
  /usr/local/lib/python2.7/dist-packages/requests/api.py(55)get()
-> return request('get', url, **kwargs)
  /usr/local/lib/python2.7/dist-packages/requests/api.py(44)request()
-> return session.request(method=method, url=url, **kwargs)
  /usr/local/lib/python2.7/dist-packages/requests/sessions.py(382)request()
-> resp = self.send(prep, **send_kwargs)
  /usr/local/lib/python2.7/dist-packages/requests/sessions.py(485)send()
-> r = adapter.send(request, **kwargs)
  /usr/local/lib/python2.7/dist-packages/requests/adapters.py(324)send()
-> timeout=timeout
  /usr/local/lib/python2.7/dist-packages/requests/packages/urllib3/connectionpool.py(478)urlopen()
-> body=body, headers=headers)
  /usr/local/lib/python2.7/dist-packages/requests/packages/urllib3/connectionpool.py(285)_make_request()
-> conn.request(method, url, **httplib_request_kw)
  /usr/lib/python2.7/httplib.py(958)request()
-> self._send_request(method, url, body, headers)

在httplib机制内部,我们可以看到HTTPConnection。_send_request间接使用HTTPConnection。_send_output,它最终创建原始请求和主体(如果存在的话),并使用HTTPConnection。分别发送。发送最终到达套接字。

由于没有钩子来做你想做的事情,作为最后的手段,你可以monkey patch httplib来获取内容。这是一个脆弱的解决方案,如果更改了httplib,您可能需要对其进行调整。如果你打算使用这个解决方案来分发软件,你可能会考虑打包httplib而不是使用系统的,这很简单,因为它是一个纯python模块。

唉,话不多说,解决方案:

import requests
import httplib

def patch_send():
    old_send= httplib.HTTPConnection.send
    def new_send( self, data ):
        print data
        return old_send(self, data) #return is not necessary, but never hurts, in case the library is changed
    httplib.HTTPConnection.send= new_send

patch_send()
requests.get("http://www.python.org")

它产生输出:

GET / HTTP/1.1
Host: www.python.org
Accept-Encoding: gzip, deflate, compress
Accept: */*
User-Agent: python-requests/2.1.0 CPython/2.7.3 Linux/3.2.0-23-generic-pae

其他回答

更好的想法是使用requests_toolbelt库,它可以将请求和响应都转储为字符串,以便打印到控制台。它处理了上面的解决方案不能很好处理的文件和编码的所有棘手情况。

其实很简单:

import requests
from requests_toolbelt.utils import dump

resp = requests.get('https://httpbin.org/redirect/5')
data = dump.dump_all(resp)
print(data.decode('utf-8'))

来源:https://toolbelt.readthedocs.org/en/latest/dumputils.html

你可以通过输入:

pip install requests_toolbelt

注意:这个答案已经过时了。请求的新版本支持直接获取请求内容,如AntonioHerraizS的回答文档。

从请求中获取请求的真实原始内容是不可能的,因为它只处理更高级别的对象,比如头和方法类型。Requests使用urllib3发送请求,但是urllib3也不处理原始数据——它使用httplib。下面是一个典型的请求堆栈跟踪:

-> r= requests.get("http://google.com")
  /usr/local/lib/python2.7/dist-packages/requests/api.py(55)get()
-> return request('get', url, **kwargs)
  /usr/local/lib/python2.7/dist-packages/requests/api.py(44)request()
-> return session.request(method=method, url=url, **kwargs)
  /usr/local/lib/python2.7/dist-packages/requests/sessions.py(382)request()
-> resp = self.send(prep, **send_kwargs)
  /usr/local/lib/python2.7/dist-packages/requests/sessions.py(485)send()
-> r = adapter.send(request, **kwargs)
  /usr/local/lib/python2.7/dist-packages/requests/adapters.py(324)send()
-> timeout=timeout
  /usr/local/lib/python2.7/dist-packages/requests/packages/urllib3/connectionpool.py(478)urlopen()
-> body=body, headers=headers)
  /usr/local/lib/python2.7/dist-packages/requests/packages/urllib3/connectionpool.py(285)_make_request()
-> conn.request(method, url, **httplib_request_kw)
  /usr/lib/python2.7/httplib.py(958)request()
-> self._send_request(method, url, body, headers)

在httplib机制内部,我们可以看到HTTPConnection。_send_request间接使用HTTPConnection。_send_output,它最终创建原始请求和主体(如果存在的话),并使用HTTPConnection。分别发送。发送最终到达套接字。

由于没有钩子来做你想做的事情,作为最后的手段,你可以monkey patch httplib来获取内容。这是一个脆弱的解决方案,如果更改了httplib,您可能需要对其进行调整。如果你打算使用这个解决方案来分发软件,你可能会考虑打包httplib而不是使用系统的,这很简单,因为它是一个纯python模块。

唉,话不多说,解决方案:

import requests
import httplib

def patch_send():
    old_send= httplib.HTTPConnection.send
    def new_send( self, data ):
        print data
        return old_send(self, data) #return is not necessary, but never hurts, in case the library is changed
    httplib.HTTPConnection.send= new_send

patch_send()
requests.get("http://www.python.org")

它产生输出:

GET / HTTP/1.1
Host: www.python.org
Accept-Encoding: gzip, deflate, compress
Accept: */*
User-Agent: python-requests/2.1.0 CPython/2.7.3 Linux/3.2.0-23-generic-pae

请求支持所谓的事件钩子(从2.23开始,实际上只有响应钩子)。钩子可以在请求上使用,打印完整的请求-响应对的数据,包括有效的URL,标题和主体,例如:

import textwrap
import requests

def print_roundtrip(response, *args, **kwargs):
    format_headers = lambda d: '\n'.join(f'{k}: {v}' for k, v in d.items())
    print(textwrap.dedent('''
        ---------------- request ----------------
        {req.method} {req.url}
        {reqhdrs}

        {req.body}
        ---------------- response ----------------
        {res.status_code} {res.reason} {res.url}
        {reshdrs}

        {res.text}
    ''').format(
        req=response.request, 
        res=response, 
        reqhdrs=format_headers(response.request.headers), 
        reshdrs=format_headers(response.headers), 
    ))

requests.get('https://httpbin.org/', hooks={'response': print_roundtrip})

运行它输出:

---------------- request ----------------
GET https://httpbin.org/
User-Agent: python-requests/2.23.0
Accept-Encoding: gzip, deflate
Accept: */*
Connection: keep-alive

None
---------------- response ----------------
200 OK https://httpbin.org/
Date: Thu, 14 May 2020 17:16:13 GMT
Content-Type: text/html; charset=utf-8
Content-Length: 9593
Connection: keep-alive
Server: gunicorn/19.9.0
Access-Control-Allow-Origin: *
Access-Control-Allow-Credentials: true

<!DOCTYPE html>
<html lang="en">
...
</html>

如果响应是二进制的,您可能希望将res.text更改为res.content。

import requests

response = requests.post('http://httpbin.org/post', data={'key1': 'value1'})
print(response.request.url)
print(response.request.body)
print(response.request.headers)

响应对象有一个.request属性,它是被发送的PreparedRequest对象。

test_print.py内容:

import logging
import pytest
import requests
from requests_toolbelt.utils import dump


def print_raw_http(response):
    data = dump.dump_all(response, request_prefix=b'', response_prefix=b'')
    return '\n' * 2 + data.decode('utf-8')

@pytest.fixture
def logger():
    log = logging.getLogger()
    log.addHandler(logging.StreamHandler())
    log.setLevel(logging.DEBUG)
    return log

def test_print_response(logger):
    session = requests.Session()
    response = session.get('http://127.0.0.1:5000/')
    assert response.status_code == 300, logger.warning(print_raw_http(response))

hello.py内容:

from flask import Flask
app = Flask(__name__)

@app.route('/')
def hello_world():
    return 'Hello, World!'

Run:

 $ python -m flask hello.py
 $ python -m pytest test_print.py

Stdout:

------------------------------ Captured log call ------------------------------
DEBUG    urllib3.connectionpool:connectionpool.py:225 Starting new HTTP connection (1): 127.0.0.1:5000
DEBUG    urllib3.connectionpool:connectionpool.py:437 http://127.0.0.1:5000 "GET / HTTP/1.1" 200 13
WARNING  root:test_print_raw_response.py:25 

GET / HTTP/1.1
Host: 127.0.0.1:5000
User-Agent: python-requests/2.23.0
Accept-Encoding: gzip, deflate
Accept: */*
Connection: keep-alive


HTTP/1.0 200 OK
Content-Type: text/html; charset=utf-8
Content-Length: 13
Server: Werkzeug/1.0.1 Python/3.6.8
Date: Thu, 24 Sep 2020 21:00:54 GMT

Hello, World!