9.6.3.2. getUrlRespHtml usage example: passing in various headers

When scraping web pages or simulating a login, you often need to specify extra headers to achieve a particular goal.

9.6.3.2.1. getUrlRespHtml usage example: specifying the Referer

For example, add the appropriate Referer so that the site's page logic is reproduced and the expected content is returned:


            string tmpRespHtml = "";
            Dictionary<string, string> headerDict;
            // (1) request the main page first so that the cookies are obtained
            string pageRankMainUrl = "http://pagerank.webmasterhome.cn/";
            tmpRespHtml = getUrlRespHtml(pageRankMainUrl);
            // (2) query the page rank
            string firstBaseUrl = "http://pagerank.webmasterhome.cn/?domain=";
            // noHttpPreDomainUrl is the domain without the "http://" prefix, which gives for example:
            // http://pagerank.webmasterhome.cn/?domain=answers.yahoo.com
            string firstWholeUrl = firstBaseUrl + noHttpPreDomainUrl;
            headerDict = new Dictionary<string, string>();
            headerDict.Add("referer", pageRankMainUrl);
            tmpRespHtml = getUrlRespHtml(firstWholeUrl, headerDict: headerDict);

                
[Note] The Referer header name is case-insensitive

From the implementation code:


                    string lowecaseHeader = header.ToLower();
                    // the following branches let the caller override the default header settings
                    if (lowecaseHeader == "referer")
                    {
                        req.Referer = headerValue;
                    }

                    

As the code shows, "referer" can just as well be written in the usual capitalized form "Referer".
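
The comparison above can be tried in isolation. The following is only a minimal sketch: IsRefererHeader is a hypothetical helper written for this illustration, not a function of the library. It also shows that a different spelling such as "Referrer" does not match; judging from the final else branch shown in section 9.6.3.2.8, such a name would be passed on as an ordinary custom header.

```csharp
using System;

static class RefererNameDemo
{
    // Hypothetical helper: repeats the lower-case comparison from the snippet above.
    public static bool IsRefererHeader(string header)
    {
        return header.ToLower() == "referer";
    }

    static void Main()
    {
        // The first three names match; "Referrer" (double r) does not.
        foreach (string name in new[] { "referer", "Referer", "REFERER", "Referrer" })
        {
            Console.WriteLine("{0} -> {1}", name, IsRefererHeader(name));
        }
    }
}
```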

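9.6.3.2.2. getUrlRespHtml usage example: disabling automatic redirection

As described in Section 9.6.2.2, "Automatic redirection is allowed by default", automatic redirection is enabled by default. To disable it, set it through a header: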


    Dictionary<string, string> headerDict = new Dictionary<string, string>();
    headerDict.Add("AllowAutoRedirect", "false");
    string respHtml = getUrlRespHtml(yourUrl, headerDict: headerDict);

                
[Note] The AllowAutoRedirect header name can be written in several ways

From the implementation code:


                    else if (
                            (lowecaseHeader == "allow-autoredirect") ||
                            (lowecaseHeader == "allowautoredirect") ||
                            (lowecaseHeader == "allow autoredirect")
                            )
                    {
                        bool isAllow = false;
                        if (bool.TryParse(headerValue, out isAllow))
                        {
                            req.AllowAutoRedirect = isAllow;
                        }
                    }

                    

As the code shows, "AllowAutoRedirect" can also be written in other forms, such as "allowautoredirect", "allow-autoredirect", "Allow-Autoredirect", "allow autoredirect", and "Allow Autoredirect".
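
The value is parsed with bool.TryParse, so only "true" and "false" (in any letter case) take effect; a value such as "0" fails to parse and the default setting stays unchanged. The sketch below illustrates this under stated assumptions: TryReadAllowAutoRedirect is a hypothetical helper that combines the name check and the value check from the snippet above, and is not part of the library.

```csharp
using System;

static class AutoRedirectDemo
{
    // Hypothetical helper: returns true only when the name is one of the three
    // accepted spellings AND the value parses as a boolean.
    public static bool TryReadAllowAutoRedirect(string header, string headerValue, out bool isAllow)
    {
        isAllow = false;
        string lower = header.ToLower();
        bool nameMatches = lower == "allow-autoredirect"
                        || lower == "allowautoredirect"
                        || lower == "allow autoredirect";
        return nameMatches && bool.TryParse(headerValue, out isAllow);
    }

    static void Main()
    {
        string[][] samples =
        {
            new[] { "AllowAutoRedirect", "false" },   // accepted name, valid value
            new[] { "Allow-AutoRedirect", "False" },  // value case does not matter
            new[] { "allow autoredirect", "0" },      // "0" is not a valid boolean string
            new[] { "allow_autoredirect", "false" },  // underscore spelling is not matched
        };
        foreach (string[] s in samples)
        {
            bool isAllow;
            bool applied = TryReadAllowAutoRedirect(s[0], s[1], out isAllow);
            Console.WriteLine("{0} = {1} -> applied: {2}, value: {3}", s[0], s[1], applied, isAllow);
        }
    }
}
```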

9.6.3.2.3. getUrlRespHtml usage example: setting Accept manually

The default Accept value here is "*/*". To request a different type, set it manually through a header:


    Dictionary<string, string> headerDict = new Dictionary<string, string>();
    headerDict.Add("Accept", "text/html");
    string respHtml = getUrlRespHtml(yourUrl, headerDict: headerDict);

                

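For more possible Accept values, see the official specification: RFC 2616, section 14.1 Accept

[Note] The Accept header name is case-insensitive

From the implementation code: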


                    else if (lowecaseHeader == "accept")
                    {
                        req.Accept = headerValue;
                    }

                    

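As the code shows, "Accept" can also be written in other forms, such as "accept".

9.6.3.2.4. getUrlRespHtml usage example: not keeping the connection alive

KeepAlive is true by default. If you do not want to keep the connection alive, disable it through a header: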


    Dictionary<string, string> headerDict = new Dictionary<string, string>();
    headerDict.Add("Keep-Alive", "false");
    string respHtml = getUrlRespHtml(yourUrl, headerDict: headerDict);

                
[Note] The KeepAlive header name can be written in several ways

From the implementation code:


                    else if (
                            (lowecaseHeader == "keep-alive") ||
                            (lowecaseHeader == "keepalive") ||
                            (lowecaseHeader == "keep alive")
                            )
                    {
                        bool isKeepAlive = false;
                        if (bool.TryParse(headerValue, out isKeepAlive))
                        {
                            req.KeepAlive = isKeepAlive;
                        }
                    }

                    

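As the code shows, "Keep-Alive" can also be written in other forms, such as "keep-alive", "keepalive", "KeepAlive", "keep alive", and "Keep Alive".

9.6.3.2.5. getUrlRespHtml usage example: setting Accept-Language

No Accept-Language is specified by default. If you need one, set it through a header: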


    Dictionary<string, string> headerDict = new Dictionary<string, string>();
    headerDict.Add("Accept-Language", "en-US"); // or, for example, "zh-CN"
    string respHtml = getUrlRespHtml(yourUrl, headerDict: headerDict);

                

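For more possible Accept-Language values, see the official specification: RFC 2616, section 14.4 Accept-Language

[Note] The Accept-Language header name can be written in several ways

From the implementation code: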


                    else if (
                            (lowecaseHeader == "accept-language") ||
                            (lowecaseHeader == "acceptlanguage") ||
                            (lowecaseHeader == "accept language")
                            )
                    {
                        req.Headers["Accept-Language"] = headerValue;
                    }

                    

As the code shows, "Accept-Language" can also be written in other forms, such as "accept-language", "acceptlanguage", "AcceptLanguage", "accept language", and "Accept Language".
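
The header value itself may list several languages with quality weights, for example "en-US,zh-CN;q=0.8". The following is a minimal sketch of building such a value before adding it to headerDict; BuildAcceptLanguage is a hypothetical helper introduced only for this illustration and is not part of the library.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

static class AcceptLanguageDemo
{
    // Hypothetical helper: joins language tags into one header value and
    // appends ";q=..." only when the weight is below 1.
    public static string BuildAcceptLanguage(IEnumerable<KeyValuePair<string, double>> weightedTags)
    {
        List<string> parts = new List<string>();
        foreach (KeyValuePair<string, double> item in weightedTags)
        {
            string part = item.Key;
            if (item.Value < 1.0)
            {
                // Invariant culture keeps the decimal separator a dot on every system.
                part += ";q=" + item.Value.ToString("0.###", CultureInfo.InvariantCulture);
            }
            parts.Add(part);
        }
        return string.Join(",", parts.ToArray());
    }

    static void Main()
    {
        List<KeyValuePair<string, double>> tags = new List<KeyValuePair<string, double>>
        {
            new KeyValuePair<string, double>("en-US", 1.0),
            new KeyValuePair<string, double>("zh-CN", 0.8),
        };
        // The result can then be passed as: headerDict.Add("Accept-Language", value);
        Console.WriteLine(BuildAcceptLanguage(tags));
    }
}
```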

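9.6.3.2.6. getUrlRespHtml usage example: setting a specific User-Agent header

As described in Section 9.6.2.1, "The IE8 User-Agent is specified internally by default", getUrlRespHtml adds the IE8 User-Agent by default.

If necessary, you can replace it with another one, for example the Firefox User-Agent: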


//Mozilla Firefox
const string constUserAgent_Firefox = "Mozilla/5.0 (Windows; U; Windows NT 6.1; rv:1.9.2.6) Gecko/20100625 Firefox/3.6.6";
Dictionary<string, string> headerDict = new Dictionary<string, string>();
headerDict.Add("User-Agent", constUserAgent_Firefox);
string respHtml = getUrlRespHtml(yourUrl, headerDict: headerDict);

                

User-Agent strings for the various browsers are easy to find online. You can also use the values from my code:


    //IE7
    const string constUserAgent_IE7_x64 = "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; WOW64; Trident/5.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; InfoPath.3; .NET4.0C; .NET4.0E)";
    //IE8
    const string constUserAgent_IE8_x64 = "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; InfoPath.3; .NET4.0C; .NET4.0E)";
    //IE9
    const string constUserAgent_IE9_x64 = "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)"; // x64
    const string constUserAgent_IE9_x86 = "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)"; // x86
    //Chrome
    const string constUserAgent_Chrome = "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/533.4 (KHTML, like Gecko) Chrome/5.0.375.99 Safari/533.4";
    //Mozilla Firefox
    const string constUserAgent_Firefox = "Mozilla/5.0 (Windows; U; Windows NT 6.1; rv:1.9.2.6) Gecko/20100625 Firefox/3.6.6";

                
[Note] The User-Agent header name can be written in several ways

From the implementation code:


                    else if (
                            (lowecaseHeader == "user-agent") ||
                            (lowecaseHeader == "useragent") ||
                            (lowecaseHeader == "user agent")
                            )
                    {
                        req.UserAgent = headerValue;
                    }

                    

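As the code shows, "User-Agent" can also be written in other forms, such as "user-agent", "user agent", "User Agent", "UserAgent", and "useragent".

9.6.3.2.7. getUrlRespHtml usage example: setting ContentType

By default, no ContentType is specified for GET requests, while for POST requests it is already set to "application/x-www-form-urlencoded".

If you have other special requirements and need to set ContentType, do so through a header: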


    Dictionary<string, string> headerDict = new Dictionary<string, string>();
    headerDict.Add("Content-Type", "text/plain");
    string respHtml = getUrlRespHtml(yourUrl, headerDict: headerDict);

                

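For more possible Content-Type values, see the official specification: RFC 2616, section 14.17 Content-Type

[Note] The Content-Type header name can be written in several ways

From the implementation code: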


                    else if (
                            (lowecaseHeader == "content-type") ||
                            (lowecaseHeader == "contenttype") ||
                            (lowecaseHeader == "content type")
                            )
                    {
                        req.ContentType = headerValue;
                    }

                    

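As the code shows, "Content-Type" can also be written in other forms, such as "content-type", "contenttype", "ContentType", "content type", and "Content Type".

9.6.3.2.8. getUrlRespHtml usage example: setting other specific headers

Quite often you need to set other, non-standard headers. These can be passed through the header dictionary as well.

For example, this is what was used earlier when working on InsertSkydriveFiles: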


    // constSkydriveUrl, gCanary, gAppid and respJson are assumed to be defined elsewhere
    string createFolderUrl = "https://skydrive.live.com/API/2/AddFolder?lct=1";
    
    Dictionary<string, string> headerDict = new Dictionary<string, string>();
    headerDict.Add("Accept", "application/json");
    headerDict.Add("Referer", constSkydriveUrl);
    headerDict.Add("Canary", gCanary);
    headerDict.Add("Appid", gAppid);
    headerDict.Add("X-Requested-With", "XMLHttpRequest");
    headerDict.Add("Cache-Control", "no-cache");

    string postDataStr = genCreateFolderPostData(folderName, parentId, cid);

    respJson = getUrlRespHtml(createFolderUrl, headerDict: headerDict, postDataStr: postDataStr);

                
[Note] Specifying other particular headers

From the implementation code:


                    else
                    {
                        req.Headers[header] = headerValue;
                    }

                    

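As the code shows, there is no restriction on which other special headers you specify, but you need to know exactly which header you are setting and what it is for.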
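
Taken together, the branches shown in this section amount to a mapping from the accepted header spellings to the request property that gets set, with every other name falling through to req.Headers. The sketch below restates that mapping as a lookup table so it can be tried without sending a request. It is an illustration only, not the library's implementation: HeaderDispatchDemo and TargetFor are hypothetical names, and the targets are returned as plain strings instead of being applied to a real request.

```csharp
using System;
using System.Collections.Generic;

static class HeaderDispatchDemo
{
    // Accepted spellings (lower-cased) and the request member each one sets,
    // collected from the code snippets in this section.
    static readonly Dictionary<string, string> KnownTargets = new Dictionary<string, string>
    {
        { "referer", "req.Referer" },
        { "allow-autoredirect", "req.AllowAutoRedirect" },
        { "allowautoredirect", "req.AllowAutoRedirect" },
        { "allow autoredirect", "req.AllowAutoRedirect" },
        { "accept", "req.Accept" },
        { "keep-alive", "req.KeepAlive" },
        { "keepalive", "req.KeepAlive" },
        { "keep alive", "req.KeepAlive" },
        { "accept-language", "req.Headers[\"Accept-Language\"]" },
        { "acceptlanguage", "req.Headers[\"Accept-Language\"]" },
        { "accept language", "req.Headers[\"Accept-Language\"]" },
        { "user-agent", "req.UserAgent" },
        { "useragent", "req.UserAgent" },
        { "user agent", "req.UserAgent" },
        { "content-type", "req.ContentType" },
        { "contenttype", "req.ContentType" },
        { "content type", "req.ContentType" },
    };

    // Returns a description of where the given header name would end up.
    public static string TargetFor(string header)
    {
        string target;
        if (KnownTargets.TryGetValue(header.ToLower(), out target))
        {
            return target;
        }
        // Anything else is passed through under the exact name the caller gave.
        return "req.Headers[\"" + header + "\"]";
    }

    static void Main()
    {
        foreach (string name in new[] { "Referer", "User Agent", "ACCEPT", "Canary", "X-Requested-With" })
        {
            Console.WriteLine("{0} -> {1}", name, TargetFor(name));
        }
    }
}
```

A table like this keeps all accepted spellings in one place, which makes it easy to check whether a particular spelling is handled specially or simply forwarded as a custom header.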