
[Notes] Python modules/libraries for parsing HTTP packets

Category: Python    Author: crifan

Take a typical HTTP exchange, for example the one captured with Firebug while visiting http://www.songtaste.com/music/:

HTTP request headers:

GET /music/ HTTP/1.1
Host: www.songtaste.com
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:17.0) Gecko/20100101 Firefox/17.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: zh-cn,zh;q=0.8,en-us;q=0.5,en;q=0.3
Accept-Encoding: gzip, deflate
Connection: keep-alive
Cookie: valid=1; valid=1; __utma=148846773.1834243366.1349920257.1355474419.1355710389.96; __utmz=148846773.1354861662.90.30.utmcsr=songtaste.com|utmccn=(referral)|utmcmd=referral|utmcct=/album/597900; pgv_pvi=6270354432; bdshare_firstime=1353551079520; CookName=crifan; CookID=351979; CookPwd=6dd0c41612d2532ec3647e8e72fcb997; CookIcon=351979.gif; CookDmID=33897180; PHPSESSID=063cb9fa1dba368386f0f69842c8f3d7; __utmb=148846773.1.10.1355710389; __utmc=148846773

HTTP response headers:

HTTP/1.1 200 OK
Via: 1.1 SC-SZ-06
Connection: Keep-Alive
Proxy-Connection: Keep-Alive
Content-Length: 18574
Age: 0
Date: Mon, 17 Dec 2012 02:13:15 GMT
Content-Type: text/html
Accept-Ranges: bytes
X-Powered-By: PHP/5.2.13
Pragma: no-cache
srvtag: CAIYUN-SR023
X-Cache: MISS via VCache YQ146
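The response block above can already be parsed with the Python 3 standard library alone. The following is a minimal sketch, not taken from any of the libraries listed below. `http.client.HTTPResponse` normally reads from a socket, but in CPython its constructor only calls the socket's `makefile()` method, so a small stand-in object can feed it the raw bytes. This relies on an implementation detail. `FakeSocket`, `parse_response` and `RAW_RESPONSE` are hypothetical names chosen for illustration. `begin()` parses the status line and headers and does not read the body.

```python
import http.client
import io

# The response headers captured above, as raw bytes with CRLF line endings.
RAW_RESPONSE = (
    b"HTTP/1.1 200 OK\r\n"
    b"Via: 1.1 SC-SZ-06\r\n"
    b"Connection: Keep-Alive\r\n"
    b"Proxy-Connection: Keep-Alive\r\n"
    b"Content-Length: 18574\r\n"
    b"Age: 0\r\n"
    b"Date: Mon, 17 Dec 2012 02:13:15 GMT\r\n"
    b"Content-Type: text/html\r\n"
    b"Accept-Ranges: bytes\r\n"
    b"X-Powered-By: PHP/5.2.13\r\n"
    b"Pragma: no-cache\r\n"
    b"srvtag: CAIYUN-SR023\r\n"
    b"X-Cache: MISS via VCache YQ146\r\n"
    b"\r\n"
)


class FakeSocket:
    """Hypothetical stand-in for a socket: HTTPResponse only calls makefile()."""

    def __init__(self, data: bytes):
        self._file = io.BytesIO(data)

    def makefile(self, *args, **kwargs):
        # Return an in-memory binary file instead of a socket file.
        return self._file


def parse_response(raw: bytes) -> http.client.HTTPResponse:
    """Parse the status line and headers of a raw HTTP response."""
    response = http.client.HTTPResponse(FakeSocket(raw))
    response.begin()  # reads status line + headers; the body is left unread
    return response


if __name__ == "__main__":
    resp = parse_response(RAW_RESPONSE)
    print(resp.status, resp.reason)           # 200 OK
    print(resp.getheader("Content-Length"))   # 18574
    for name, value in resp.getheaders():
        print(f"{name}: {value}")
```

Header lookups through `getheader()` are case-insensitive, so `resp.getheader("content-type")` and `resp.getheader("Content-Type")` return the same value.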

 

For raw HTTP packets like these, which Python libraries or modules can parse them?

A quick search turned up the following Python libraries and modules for parsing HTTP packets:

httpheader Python module

http://deron.meranda.us/python/httpheader/

Source download:

httpheader.py

Usage demo:

Using HTTP range requests

WebOb

http://webob.org/

Download page:

http://pypi.python.org/pypi/WebOb/1.2.3

Using BaseHTTPRequestHandler and StringIO from the standard library

Reference:

Parse raw HTTP Headers

Demo code (Python 2) using the standard library's BaseHTTPRequestHandler and StringIO:

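# Python 2 (these modules became http.server and io in Python 3)
from BaseHTTPServer import BaseHTTPRequestHandler
from StringIO import StringIO

class HTTPRequest(BaseHTTPRequestHandler):
    def __init__(self, request_text):
        # Skip the socket-based constructor; read the raw request from memory
        self.rfile = StringIO(request_text)
        self.raw_requestline = self.rfile.readline()
        self.error_code = self.error_message = None
        # Fills in command, path, request_version and headers
        self.parse_request()

    def send_error(self, code, message=None):
        # Record the error instead of writing an error response to a socket
        self.error_code = code
        self.error_message = message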

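Usage:

# Using this new class is really easy!
# request_text holds the raw request (it matches the expected output below):
request_text = (
    'GET /who/ken/trust.html HTTP/1.1\r\n'
    'Host: cm.bell-labs.com\r\n'
    'Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.3\r\n'
    'Accept: text/html;q=0.9,text/plain\r\n'
    '\r\n'
)

request = HTTPRequest(request_text)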

print request.error_code       # None  (check this first)
print request.command          # "GET"
print request.path             # "/who/ken/trust.html"
print request.request_version  # "HTTP/1.1"
print len(request.headers)     # 3
print request.headers.keys()   # ['accept-charset', 'host', 'accept']
print request.headers['host']  # "cm.bell-labs.com"

# Parsing can result in an error code and message

request = HTTPRequest('GET\r\nHeader: Value\r\n\r\n')

print request.error_code     # 400
print request.error_message  # "Bad request syntax ('GET')"
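The demo above is Python 2 only. A minimal Python 3 sketch of the same technique follows. It assumes the raw request is available as bytes, because Python 3's `parse_request()` decodes the request line from bytes and reads the headers from a binary stream, so `io.BytesIO` replaces `StringIO`. `send_error()` also takes an optional `explain` argument, matching Python 3's signature, since `parse_request()` passes three arguments for some header errors. The sample is the songtaste.com request captured at the top, with the Cookie line left out for brevity. `RAW_REQUEST` is just an illustrative name.

```python
from http.server import BaseHTTPRequestHandler
from io import BytesIO


class HTTPRequest(BaseHTTPRequestHandler):
    """Parse a raw HTTP request held in memory (no socket, no server)."""

    def __init__(self, request_bytes: bytes):
        # Bypass the socket-based constructor and feed the parser from memory.
        self.rfile = BytesIO(request_bytes)
        self.raw_requestline = self.rfile.readline()
        self.error_code = self.error_message = None
        # Fills in command, path, request_version and headers.
        self.parse_request()

    def send_error(self, code, message=None, explain=None):
        # Record the error instead of writing an error response.
        self.error_code = code
        self.error_message = message


# The request captured above, minus the Cookie header.
RAW_REQUEST = (
    b"GET /music/ HTTP/1.1\r\n"
    b"Host: www.songtaste.com\r\n"
    b"User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:17.0) Gecko/20100101 Firefox/17.0\r\n"
    b"Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8\r\n"
    b"Accept-Language: zh-cn,zh;q=0.8,en-us;q=0.5,en;q=0.3\r\n"
    b"Accept-Encoding: gzip, deflate\r\n"
    b"Connection: keep-alive\r\n"
    b"\r\n"
)

if __name__ == "__main__":
    request = HTTPRequest(RAW_REQUEST)
    print(request.error_code)        # None
    print(request.command)           # GET
    print(request.path)              # /music/
    print(request.request_version)   # HTTP/1.1
    print(request.headers["host"])   # www.songtaste.com

    bad = HTTPRequest(b"GET\r\nHeader: Value\r\n\r\n")
    print(int(bad.error_code))       # 400
    print(bad.error_message)         # Bad request syntax ('GET')
```

Unlike the Python 2 output, `request.headers` here is an `http.client.HTTPMessage`, so `keys()` returns header names in their original case (`'Host'`, `'User-Agent'`, ...). Lookups such as `request.headers['host']` are still case-insensitive. `error_code` is an `http.HTTPStatus` member, which compares equal to the plain integer 400.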

When reposting, please credit: 在路上 (On the Road) » [Notes] Python modules/libraries for parsing HTTP packets
