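- 9.6.3.2.1. getUrlRespHtml usage example: specifying the Referer
- 9.6.3.2.2. getUrlRespHtml usage example: disabling automatic redirects
- 9.6.3.2.3. getUrlRespHtml usage example: setting Accept manually
- 9.6.3.2.4. getUrlRespHtml usage example: not keeping the connection alive
- 9.6.3.2.5. getUrlRespHtml usage example: setting Accept-Language
- 9.6.3.2.6. getUrlRespHtml usage example: adding a specific User-Agent header
- 9.6.3.2.7. getUrlRespHtml usage example: setting ContentType
- 9.6.3.2.8. getUrlRespHtml usage example: setting other specific headers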
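When scraping web pages or simulating a login, you often need to specify extra headers to achieve a particular goal.
For example, you may need to add the matching Referer so that the site's page logic is reproduced correctly and the expected content is returned: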
    string tmpRespHtml = "";
    Dictionary<string, string> headerDict;

    //(1)to get cookies
    string pageRankMainUrl = "http://pagerank.webmasterhome.cn/";
    tmpRespHtml = getUrlRespHtml(pageRankMainUrl);

    //(2)ask page rank
    string firstBaseUrl = "http://pagerank.webmasterhome.cn/?domain=";
    //http://pagerank.webmasterhome.cn/?domain=answers.yahoo.com
    string firstWholeUrl = firstBaseUrl + noHttpPreDomainUrl;
    headerDict = new Dictionary<string, string>();
    headerDict.Add("referer", pageRankMainUrl);
    tmpRespHtml = getUrlRespHtml(firstWholeUrl, headerDict: headerDict);
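The two-step flow above (first fetch the main page to collect cookies, then request the real URL with a Referer) can be sketched directly against HttpWebRequest. This is only an illustration, not the actual getUrlRespHtml implementation: the class name RefererSketch and the method BuildRequest are hypothetical, and the sketch assumes cookies are carried between requests through a shared CookieContainer.

```csharp
using System;
using System.Net;

// Hypothetical helper for illustration; not part of getUrlRespHtml.
public static class RefererSketch
{
    // Builds a GET request that shares cookies with earlier requests and
    // optionally carries a Referer. Nothing is sent until GetResponse() is called.
    public static HttpWebRequest BuildRequest(string url, CookieContainer cookies, string referer = null)
    {
        HttpWebRequest req = (HttpWebRequest)WebRequest.Create(url);
        // Using the same container for every request lets cookies set by
        // the first response be sent back with the second request.
        req.CookieContainer = cookies;
        if (referer != null)
        {
            // HttpWebRequest exposes Referer as a dedicated property.
            req.Referer = referer;
        }
        return req;
    }
}
```

A caller would create one CookieContainer, pass it to BuildRequest for the main page, read that response, and then pass the same container together with the main page URL as the referer when building the second request.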
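As described in Section 9.6.2.2, "Automatic redirects are allowed by default", automatic redirects are enabled by default. To disable them, set the option through the header dictionary: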
    Dictionary<string, string> headerDict = new Dictionary<string, string>();
    headerDict.Add("AllowAutoRedirect", "false");
    string respHtml = getUrlRespHtml(yourUrl, headerDict: headerDict);
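Note that AllowAutoRedirect is not a real HTTP header; on HttpWebRequest it is a boolean property. A minimal sketch of how such a string option could be applied to a request is shown here. The names RedirectSketch and ApplyAllowAutoRedirect are hypothetical, and the real getUrlRespHtml may handle the value differently.

```csharp
using System;
using System.Net;

// Hypothetical helper for illustration; not part of getUrlRespHtml.
public static class RedirectSketch
{
    // Applies an "AllowAutoRedirect" string option ("true"/"false") to a request.
    // Returns false and leaves the request untouched if the value is not a boolean.
    public static bool ApplyAllowAutoRedirect(HttpWebRequest req, string value)
    {
        bool allow;
        if (!bool.TryParse(value, out allow))
        {
            return false;
        }
        req.AllowAutoRedirect = allow;
        return true;
    }
}
```

With automatic redirects disabled, a 3xx response is handed back to the caller instead of being followed, and the redirect target can be read from the Location header of the response.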
The default Accept here is "*/*". To specify a different type, set it manually through the headers:
    Dictionary<string, string> headerDict = new Dictionary<string, string>();
    headerDict.Add("Accept", "text/html");
    string respHtml = getUrlRespHtml(yourUrl, headerDict: headerDict);
For other possible Accept values, see the official specification (RFC 2616): 14.1 Accept
KeepAlive is true by default here. If you do not want to keep the connection alive, disable it through the headers:
    Dictionary<string, string> headerDict = new Dictionary<string, string>();
    headerDict.Add("Keep-Alive", "false");
    string respHtml = getUrlRespHtml(yourUrl, headerDict: headerDict);
No Accept-Language is specified by default here. If you need one, set it through the headers:
    Dictionary<string, string> headerDict = new Dictionary<string, string>();
    headerDict.Add("Accept-Language", "en-US"); //"zh-CN"
    string respHtml = getUrlRespHtml(yourUrl, headerDict: headerDict);
For other possible Accept-Language values, see the official specification (RFC 2616): 14.4 Accept-Language
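As described in Section 9.6.2.1, "An IE8 User-Agent is specified internally by default", my getUrlRespHtml adds an IE8 User-Agent by default.
If needed, you can replace it with another one, for example the Firefox User-Agent: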
    //Mozilla Firefox
    const string constUserAgent_Firefox = "Mozilla/5.0 (Windows; U; Windows NT 6.1; rv:1.9.2.6) Gecko/20100625 Firefox/3.6.6";

    Dictionary<string, string> headerDict = new Dictionary<string, string>();
    headerDict.Add("User-Agent", constUserAgent_Firefox);
    string respHtml = getUrlRespHtml(yourUrl, headerDict: headerDict);
User-Agent strings for the various browsers are easy to find online. You can also use the values from my code:
    //IE7
    const string constUserAgent_IE7_x64 = "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; WOW64; Trident/5.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; InfoPath.3; .NET4.0C; .NET4.0E)";
    //IE8
    const string constUserAgent_IE8_x64 = "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; InfoPath.3; .NET4.0C; .NET4.0E)";
    //IE9
    const string constUserAgent_IE9_x64 = "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)"; // x64
    const string constUserAgent_IE9_x86 = "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)"; // x86
    //Chrome
    const string constUserAgent_Chrome = "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/533.4 (KHTML, like Gecko) Chrome/5.0.375.99 Safari/533.4";
    //Mozilla Firefox
    const string constUserAgent_Firefox = "Mozilla/5.0 (Windows; U; Windows NT 6.1; rv:1.9.2.6) Gecko/20100625 Firefox/3.6.6";
By default, no ContentType is specified for GET requests, while POST requests use "application/x-www-form-urlencoded".
If you have special requirements and need a different ContentType, set it through the headers:
    Dictionary<string, string> headerDict = new Dictionary<string, string>();
    headerDict.Add("Content-Type", "text/plain");
    string respHtml = getUrlRespHtml(yourUrl, headerDict: headerDict);
For other possible Content-Type values, see the official specification (RFC 2616): 14.17 Content-Type
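Often you also need to set other, non-standard header fields. These can be passed through the header dictionary as well.
For example, these are the headers used earlier when working on InsertSkydriveFiles: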
    string createFolerUrl = "https://skydrive.live.com/API/2/AddFolder?lct=1";

    Dictionary<string, string> headerDict = new Dictionary<string, string>();
    headerDict.Add("Accept", "application/json");
    headerDict.Add("Referer", constSkydriveUrl);
    headerDict.Add("Canary", gCanary);
    headerDict.Add("Appid", gAppid);
    headerDict.Add("X-Requested-With", "XMLHttpRequest");
    headerDict.Add("Cache-Control", "no-cache");

    string postDataStr = genCreateFolderPostData(folderName, parentId, cid);
    respJson = getUrlRespHtml(createFolerUrl, headerDict: headerDict, postDataStr: postDataStr);
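The dictionary above mixes headers that HttpWebRequest exposes as dedicated properties (Accept, Referer) with custom ones (Canary, Appid, X-Requested-With). The sketch below shows one way such a dictionary could be split, using WebHeaderCollection.IsRestricted to detect the headers that need a property. The names CustomHeaderSketch and ApplyCustomHeaders are hypothetical, and this is an assumption about a possible approach, not the code inside getUrlRespHtml.

```csharp
using System;
using System.Collections.Generic;
using System.Net;

// Hypothetical helper for illustration; not part of getUrlRespHtml.
public static class CustomHeaderSketch
{
    // Copies non-restricted headers onto the request and returns the names
    // it skipped because they are restricted. The caller is expected to set
    // those through the matching property (req.Accept, req.Referer, ...).
    public static List<string> ApplyCustomHeaders(HttpWebRequest req, IDictionary<string, string> headers)
    {
        List<string> skipped = new List<string>();
        foreach (KeyValuePair<string, string> pair in headers)
        {
            if (WebHeaderCollection.IsRestricted(pair.Key))
            {
                skipped.Add(pair.Key);
            }
            else
            {
                req.Headers[pair.Key] = pair.Value;
            }
        }
        return skipped;
    }
}
```

Returning the skipped names instead of silently dropping them lets the caller decide how to handle each restricted header.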