Consider a typical HTTP packet, for example
the one Firebug captures when visiting
http://www.songtaste.com/music/
which looks like this.
HTTP request headers:
GET /music/ HTTP/1.1
Host: www.songtaste.com
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:17.0) Gecko/20100101 Firefox/17.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: zh-cn,zh;q=0.8,en-us;q=0.5,en;q=0.3
Accept-Encoding: gzip, deflate
Connection: keep-alive
Cookie: valid=1; valid=1; __utma=148846773.1834243366.1349920257.1355474419.1355710389.96; __utmz=148846773.1354861662.90.30.utmcsr=songtaste.com|utmccn=(referral)|utmcmd=referral|utmcct=/album/597900; pgv_pvi=6270354432; bdshare_firstime=1353551079520; CookName=crifan; CookID=351979; CookPwd=6dd0c41612d2532ec3647e8e72fcb997; CookIcon=351979.gif; CookDmID=33897180; PHPSESSID=063cb9fa1dba368386f0f69842c8f3d7; __utmb=148846773.1.10.1355710389; __utmc=148846773
HTTP response headers:
HTTP/1.1 200 OK
Via: 1.1 SC-SZ-06
Connection: Keep-Alive
Proxy-Connection: Keep-Alive
Content-Length: 18574
Age: 0
Date: Mon, 17 Dec 2012 02:13:15 GMT
Content-Type: text/html
Accept-Ranges: bytes
X-Powered-By: PHP/5.2.13
Pragma: no-cache
srvtag: CAIYUN-SR023
X-Cache: MISS via VCache YQ146
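To make concrete what "parsing" a header block like the response above involves, here is a minimal Python 3 sketch that uses only the standard library's email.parser. This is an assumption for illustration and not one of the approaches the post surveys; the function name parse_response_head is hypothetical, and the sample text is a shortened copy of the captured response.

```python
from email.parser import Parser


def parse_response_head(raw_text):
    """Split a raw HTTP response head into status line parts and headers.

    Hypothetical helper for illustration; it does not handle bodies,
    chunked encoding, or malformed status lines.
    """
    # Normalise line endings, then peel off the status line.
    text = raw_text.replace("\r\n", "\n")
    status_line, _, header_text = text.partition("\n")
    version, status_code, reason = status_line.split(" ", 2)
    # HTTP headers share the RFC 822 "Name: value" layout, so the
    # e-mail header parser can read them.
    headers = Parser().parsestr(header_text, headersonly=True)
    return version, int(status_code), reason, headers


sample = (
    "HTTP/1.1 200 OK\r\n"
    "Via: 1.1 SC-SZ-06\r\n"
    "Content-Length: 18574\r\n"
    "Content-Type: text/html\r\n"
    "\r\n"
)
version, code, reason, headers = parse_response_head(sample)
print(version, code, reason)
print(headers["Content-Type"])
```

Header lookup on the returned message object is case-insensitive, so headers["content-type"] works as well.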
Which libraries or modules can be used to parse raw HTTP packets like these in Python?
A search turned up the following Python libraries and modules for parsing HTTP packets:
httpheader Python module
http://deron.meranda.us/python/httpheader/
Source download:
Usage demo:
WebOb
Download page:
http://pypi.python.org/pypi/WebOb/1.2.3
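The standard library's BaseHTTPRequestHandler and StringIO
Reference:
Requests can also be parsed with the standard library's BaseHTTPRequestHandler and StringIO (Python 2 module names). Demo code: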
from BaseHTTPServer import BaseHTTPRequestHandler
from StringIO import StringIO

class HTTPRequest(BaseHTTPRequestHandler):
    def __init__(self, request_text):
        self.rfile = StringIO(request_text)
        self.raw_requestline = self.rfile.readline()
        self.error_code = self.error_message = None
        self.parse_request()

    def send_error(self, code, message):
        self.error_code = code
        self.error_message = message
Usage:
# Using this new class is really easy!
request = HTTPRequest(request_text)

print request.error_code       # None (check this first)
print request.command          # "GET"
print request.path             # "/who/ken/trust.html"
print request.request_version  # "HTTP/1.1"
print len(request.headers)     # 3
print request.headers.keys()   # ['accept-charset', 'host', 'accept']
print request.headers['host']  # "cm.bell-labs.com"
and
# Parsing can result in an error code and message
request = HTTPRequest('GET\r\nHeader: Value\r\n\r\n')

print request.error_code     # 400
print request.error_message  # "Bad request syntax ('GET')"
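The demo code above is written for Python 2. A rough Python 3 equivalent might look like the sketch below, assuming http.server replaces BaseHTTPServer and io.BytesIO replaces StringIO, since the handler reads bytes in Python 3. The sample request is a shortened copy of the songtaste.com capture; this is an adaptation for illustration, not code from the referenced source.

```python
from http.server import BaseHTTPRequestHandler
from io import BytesIO


class HTTPRequest(BaseHTTPRequestHandler):
    """Parse a raw HTTP request held in memory (no socket involved)."""

    def __init__(self, request_bytes):
        # Deliberately skip the base-class constructor, which expects a socket.
        self.rfile = BytesIO(request_bytes)
        self.raw_requestline = self.rfile.readline()
        self.error_code = self.error_message = None
        self.parse_request()

    def send_error(self, code, message=None, explain=None):
        # Record the error instead of writing a response to a client.
        self.error_code = code
        self.error_message = message


raw = (
    b"GET /music/ HTTP/1.1\r\n"
    b"Host: www.songtaste.com\r\n"
    b"Accept-Encoding: gzip, deflate\r\n"
    b"Connection: keep-alive\r\n"
    b"\r\n"
)
request = HTTPRequest(raw)
print(request.error_code)        # check this first; None means no error
print(request.command)
print(request.path)
print(request.request_version)
print(request.headers["Host"])   # header lookup is case-insensitive
```

As in the Python 2 version, a malformed request leaves the details in error_code and error_message rather than raising an exception.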
Please credit the source when reposting: 在路上 » [Compilation] Modules/libraries for parsing HTTP packets in Python