版本:v1.0
摘要
本文主要介绍了我,crifan,的C#库:crifanLib.cs的功能和用法。
2013-08-20
修订历史 | ||
---|---|---|
修订 1.0 | 2013-08-20 | crl |
|
版权 © 2013 Crifan, http://crifan.com
目录
范例清单
目录
之前在折腾WLW (Windows Live Writer) Plugin–InsertSkydriveFiles的过程中,先后遇到很多个问题,然后基本上也都自己解决了。对应的也写了相应的代码和函数。
后来又折腾了很多其他C#方面的东西,比如:
downloadSonstasteMusic(下载Songtaste歌曲)
前前后后,就把其中比较常用或通用的功能,整理提取出来,放到一个单独的文件中,即crifanLib.cs
此文就是专门针对每个函数,进行详细的解释其用法和给出示例。
该文件,之前以帖子的方式发布到这里的:crifan的C#函数库:crifanLib.cs
后来,就放到Google Code上去了,即:
所有的,完整的crifanLib.cs
的内容,都是:
其中,当前,截止到2013-08-20,crifanLib.cs
的最新版本是:
如果你在使用这些函数的遇到说某某函数,类等找不到,那很可能是没有包含对应的此处的引用。
那么则请自行参考crifanLib.cs
中的using部分,添加对应的引用。
经过后来的版本升级,此时的crifanLib.cs
中,已经包含了很多宏定义。
这些宏定义,主要用于,打开,关闭,某些库函数的,以便实现:
当你不想要使用某些函数,以及其会依赖到相关的库,的时候,则可以直接注释掉对应的宏,以实现此目的。
举例,比如,你此处,不想用.NET是3.5或更高的版本,也不想要使用JSON相关的函数,则可以在crifanLib.cs
中,把JSON的宏注释掉,即:
//#define USE_JSON
如此,就不会使用到JSON相关的函数了:此刻的效果,主要是:
#if USE_JSON using System.Web.Script.Serialization; // json lib, need: .NET 3.5+ #endif
此处,就把crifanLib.cs
目前所有依赖的库,即所有的using,都贴出来,供需要的人,自己添加自己所需要的:
//comment out following macros if not use them #define USE_GETURLRESPONSE_BW //for getUrlResponse use backgroundworker version //#define USE_HTML_PARSER_SGML //need SgmlReaderDll.dll //#define USE_HTML_PARSER_HTMLAGILITYPACK //need HtmlAgilityPack.dll //#define USE_DATAGRIDVIEW //#define USE_JSON using System; using System.Collections.Generic; using System.Text.RegularExpressions; using System.Web; // for server using System.Net; // for client using System.IO; using System.Runtime.Serialization.Formatters.Binary; using System.Text; using System.Drawing; using System.Windows.Forms; using System.Reflection; using System.Diagnostics; using System.ComponentModel; using System.Globalization; #if USE_JSON using System.Web.Script.Serialization; // json lib, need: .NET 3.5+ #endif #if USE_HTML_PARSER_SGML using Sgml; using System.Xml; #endif #if USE_HTML_PARSER_HTMLAGILITYPACK using HtmlAgilityPack; #endif #if USE_DATAGRIDVIEW using Excel = Microsoft.Office.Interop.Excel; using Microsoft.Office.Interop.Excel; #endif
如上所述,crifanLib.cs
中包含了一些宏,用于控制一些相关的功能,是否使用。
此处,就对于这些宏,进行详细的解释:
默认关闭此宏。
其背景是:
原先的getUrlResponse,是用于获得URL的响应,属于耗时操作,其在C#中使用时,一般都是出于默认的UI进程中。
导致结果是:当调用到getUrlResponse(以及相关的getUrlRespHtml等)函数时,UI失去响应,导致用户体验很不好。
所以后来又实现了一个BackgroundWorker版本的getUrlResponse
使得,当调用getUrlResponse,UI也可以得到响应了。
所以,如果你想要用BackgroundWorker版本的getUrlResponse,就可以打开此宏:
#define USE_GETURLRESPONSE_BW //for getUrlResponse use backgroundworker version
如果没此需求,就关闭此宏:
//#define USE_GETURLRESPONSE_BW //for getUrlResponse use backgroundworker version
之前解析HTML,最初用的就是这个sgml库:SgmlReaderDll.dll
但是明显不是很好用。
后来发现了另外一个库:HtmlAgilityPack.dll,后,发现比较好用,就更多的时候,都用HtmlAgilityPack.dll这个库了。
所以,推荐做法:
当涉及到HTMl解析的时候,推荐用HtmlAgilityPack,不太推荐用Sgml
所以,一般设置为:
//#define USE_HTML_PARSER_SGML //need SgmlReaderDll.dll #define USE_HTML_PARSER_HTMLAGILITYPACK //need HtmlAgilityPack.dll
即可。
当然,如果你两个库都使用,也是可以的。
DataGridView是表格控件。
之前的很多折腾:
整理出了这些函数:
所以,当你需要的时候,可以打开此宏:
#define USE_DATAGRIDVIEW
去使用相关函数。
此处,顺便也把对应的,全局变量,初始化代码,私有函数等等,贴出来,供参考:
public struct pairItem { public string key; public string value; }; private Dictionary<string, DateTime> calcTimeList; const char replacedChar = '_'; string[] cookieFieldArr = { "expires", "domain", "secure", "path", "httponly", "version" }; //IE7 const string constUserAgent_IE7_x64 = "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; WOW64; Trident/5.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; InfoPath.3; .NET4.0C; .NET4.0E)"; //IE8 const string constUserAgent_IE8_x64 = "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; InfoPath.3; .NET4.0C; .NET4.0E"; //IE9 const string constUserAgent_IE9_x64 = "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)"; // x64 const string constUserAgent_IE9_x86 = "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)"; // x86 //Chrome const string constUserAgent_Chrome = "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/533.4 (KHTML, like Gecko) Chrome/5.0.375.99 Safari/533.4"; //Mozilla Firefox const string constUserAgent_Firefox = "Mozilla/5.0 (Windows; U; Windows NT 6.1; rv:1.9.2.6) Gecko/20100625 Firefox/3.6.6"; private string gUserAgent; private WebProxy gProxy = null; //detault values: //getUrlResponse private const Dictionary<string, string> defHeaderDict = null; private const Dictionary<string, string> defPostDict = null; private const int defTimeout = 30 * 1000; private const string defPostDataStr = null; private const int defReadWriteTimeout = 30 * 1000; //getUrlRespHtml private const string defCharset = null; //getUrlRespHtml_multiTry private const int defMaxTryNum = 5; private const int defRetryFailSleepTime = 100; //sleep time in ms when retry fail for getUrlRespHtml List<string> cookieFieldList = new List<string>(); CookieCollection curCookies = null; //private long totalLength = 0; //private long currentLength = 0; #if USE_GETURLRESPONSE_BW //indicate background worker complete or not bool bNotCompleted_resp = true; //store response of http request private HttpWebResponse gCurResp = null; #endif private BackgroundWorker gBgwDownload; //indicate download complete or not bool bNotCompleted_download = true; //store current read out data len private int gRealReadoutLen = 0; Action<int> gFuncUpdateProgress = null; public crifanLib() { //!!! for load embedded dll: (1) register resovle handler AppDomain.CurrentDomain.AssemblyResolve += new ResolveEventHandler(CurrentDomain_AssemblyResolve); //http related gUserAgent = constUserAgent_IE8_x64; //set max enough to avoid http request is used out -> avoid dead while get response System.Net.ServicePointManager.DefaultConnectionLimit = 200; curCookies = new CookieCollection(); // init const cookie keys foreach (string key in cookieFieldArr) { cookieFieldList.Add(key); } //init for calc time calcTimeList = new Dictionary<string, DateTime>(); #if USE_GETURLRESPONSE_BW gBgwDownload = new BackgroundWorker(); #endif //debug //gProxy = new WebProxy("127.0.0.1", 8087); } /*------------------------Private Functions------------------------------*/ //!!! for load embedded dll: (2) implement this handler System.Reflection.Assembly CurrentDomain_AssemblyResolve(object sender, ResolveEventArgs args) { string dllName = args.Name.Contains(",") ? args.Name.Substring(0, args.Name.IndexOf(',')) : args.Name.Replace(".dll", ""); dllName = dllName.Replace(".", "_"); if (dllName.EndsWith("_resources")) return null; System.Resources.ResourceManager rm = new System.Resources.ResourceManager(GetType().Namespace + ".Properties.Resources", System.Reflection.Assembly.GetExecutingAssembly()); byte[] bytes = (byte[])rm.GetObject(dllName); return System.Reflection.Assembly.Load(bytes); } // replace the replacedChar back to original ',' private string _recoverExpireField(Match foundPprocessedExpire) { string recovedStr = ""; recovedStr = foundPprocessedExpire.Value.Replace(replacedChar, ','); return recovedStr; } //replace ',' with replacedChar private string _processExpireField(Match foundExpire) { string replacedComma = ""; replacedComma = foundExpire.Value.ToString().Replace(',', replacedChar); return replacedComma; } //replace "0A" (in \x0A) into '\n' private string _replaceEscapeSequenceToChar(Match foundEscapeSequence) { char[] hexValues = new char[2]; //string hexChars = foundEscapeSequence.Value.ToString(); string matchedEscape = foundEscapeSequence.ToString(); hexValues[0] = matchedEscape[2]; hexValues[1] = matchedEscape[3]; string hexValueString = new string(hexValues); int convertedInt = int.Parse(hexValueString, NumberStyles.HexNumber, NumberFormatInfo.InvariantInfo); char hexChar = Convert.ToChar(convertedInt); string hexStr = hexChar.ToString(); return hexStr; } //check whether need add/retain this cookie // not add for: // ck is null or ck name is null // domain is null and curDomain is not set // expired and retainExpiredCookie==false private bool needAddThisCookie(Cookie ck, string curDomain) { bool needAdd = false; if ((ck == null) || (ck.Name == "")) { needAdd = false; } else { if (ck.Domain != "") { needAdd = true; } else// ck.Domain == "" { if (curDomain != "") { ck.Domain = curDomain; needAdd = true; } else // curDomain == "" { // not set current domain, omit this // should not add empty domain cookie, for this will lead execute CookieContainer.Add() fail !!! needAdd = false; } } } return needAdd; } //quote the input dict values //note: the return result for first para no '&' private string _quoteParas(Dictionary<string, string> paras, bool spaceToPercent20 = true) { string quotedParas = ""; bool isFirst = true; string val = ""; foreach (string para in paras.Keys) { if (paras.TryGetValue(para, out val)) { string encodedVal = ""; if (spaceToPercent20) { //encodedVal = HttpUtility.UrlPathEncode(val); //encodedVal = Uri.EscapeDataString(val); //encodedVal = Uri.EscapeUriString(val); encodedVal = HttpUtility.UrlEncode(val).Replace("+", "%20"); } else { encodedVal = HttpUtility.UrlEncode(val); //space to + } if (isFirst) { isFirst = false; quotedParas += para + "=" + encodedVal; } else { quotedParas += "&" + para + "=" + encodedVal; } } else { break; } } return quotedParas; } /* get url's response * */ private HttpWebResponse _getUrlResponse(string url, Dictionary<string, string> headerDict = defHeaderDict, Dictionary<string, string> postDict = defPostDict, int timeout = defTimeout, string postDataStr = defPostDataStr, int readWriteTimeout = defReadWriteTimeout) { //CookieCollection parsedCookies; HttpWebResponse resp = null; HttpWebRequest req = (HttpWebRequest)WebRequest.Create(url); req.AllowAutoRedirect = true; req.Accept = "*/*"; //req.ContentType = "text/plain"; //const string gAcceptLanguage = "en-US"; // zh-CN/en-US //req.Headers["Accept-Language"] = gAcceptLanguage; req.KeepAlive = true; req.UserAgent = gUserAgent; req.Headers["Accept-Encoding"] = "gzip, deflate"; //req.AutomaticDecompression = DecompressionMethods.GZip; req.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate; req.Proxy = gProxy; if (timeout > 0) { req.Timeout = timeout; } if (readWriteTimeout > 0) { //default ReadWriteTimeout is 300000=300 seconds = 5 minutes !!! //too long, so here change to 300000 = 30 seconds //for support TimeOut for later StreamReader's ReadToEnd req.ReadWriteTimeout = readWriteTimeout; } if (curCookies != null) { req.CookieContainer = new CookieContainer(); req.CookieContainer.PerDomainCapacity = 40; // following will exceed max default 20 cookie per domain req.CookieContainer.Add(curCookies); } if ((headerDict != null) && (headerDict.Count > 0)) { foreach (string header in headerDict.Keys) { string headerValue = ""; if (headerDict.TryGetValue(header, out headerValue)) { string lowecaseHeader = header.ToLower(); // following are allow the caller overwrite the default header setting if (lowecaseHeader == "referer") { req.Referer = headerValue; } else if ( (lowecaseHeader == "allow-autoredirect") || (lowecaseHeader == "allowautoredirect") || (lowecaseHeader == "allow autoredirect") ) { bool isAllow = false; if (bool.TryParse(headerValue, out isAllow)) { req.AllowAutoRedirect = isAllow; } } else if (lowecaseHeader == "accept") { req.Accept = headerValue; } else if ( (lowecaseHeader == "keep-alive") || (lowecaseHeader == "keepalive") || (lowecaseHeader == "keep alive") ) { bool isKeepAlive = false; if (bool.TryParse(headerValue, out isKeepAlive)) { req.KeepAlive = isKeepAlive; } } else if ( (lowecaseHeader == "accept-language") || (lowecaseHeader == "acceptlanguage") || (lowecaseHeader == "accept language") ) { req.Headers["Accept-Language"] = headerValue; } else if ( (lowecaseHeader == "user-agent") || (lowecaseHeader == "useragent") || (lowecaseHeader == "user agent") ) { req.UserAgent = headerValue; } else if ( (lowecaseHeader == "content-type") || (lowecaseHeader == "contenttype") || (lowecaseHeader == "content type") ) { req.ContentType = headerValue; } else { req.Headers[header] = headerValue; } } else { break; } } } if (((postDict != null) && (postDict.Count > 0)) || (!string.IsNullOrEmpty(postDataStr))) { req.Method = "POST"; if (req.ContentType == null) { req.ContentType = "application/x-www-form-urlencoded"; } if ((postDict != null) && (postDict.Count > 0)) { postDataStr = _quoteParas(postDict); } //byte[] postBytes = Encoding.GetEncoding("utf-8").GetBytes(postData); byte[] postBytes = Encoding.UTF8.GetBytes(postDataStr); req.ContentLength = postBytes.Length; try { Stream postDataStream = req.GetRequestStream(); postDataStream.Write(postBytes, 0, postBytes.Length); postDataStream.Close(); } catch (WebException webEx) { //for prev has set ReadWriteTimeout //so here also may timeout if (webEx.Status == WebExceptionStatus.Timeout) { req = null; } } } else { req.Method = "GET"; } if (req != null) { //may timeout, has fixed in: //http://www.crifan.com/fixed_problem_sometime_httpwebrequest_getresponse_timeout/ try { resp = (HttpWebResponse)req.GetResponse(); updateLocalCookies(resp.Cookies, ref curCookies); } catch (WebException webEx) { if (webEx.Status == WebExceptionStatus.Timeout) { resp = null; } } } return resp; } #if USE_GETURLRESPONSE_BW private void getUrlResponse_bw(string url, Dictionary<string, string> headerDict = defHeaderDict, Dictionary<string, string> postDict = defPostDict, int timeout = defTimeout, string postDataStr = defPostDataStr, int readWriteTimeout = defReadWriteTimeout) { // Create a background thread BackgroundWorker bgwGetUrlResp = new BackgroundWorker(); bgwGetUrlResp.DoWork += new DoWorkEventHandler(bgwGetUrlResp_DoWork); bgwGetUrlResp.RunWorkerCompleted += new RunWorkerCompletedEventHandler( bgwGetUrlResp_RunWorkerCompleted ); //init bNotCompleted_resp = true; // run in another thread object paraObj = new object[] { url, headerDict, postDict, timeout, postDataStr, readWriteTimeout }; bgwGetUrlResp.RunWorkerAsync(paraObj); } private void bgwGetUrlResp_DoWork(object sender, DoWorkEventArgs e) { object[] paraObj = (object[])e.Argument; string url = (string)paraObj[0]; Dictionary<string, string> headerDict = (Dictionary<string, string>)paraObj[1]; Dictionary<string, string> postDict = (Dictionary<string, string>)paraObj[2]; int timeout = (int)paraObj[3]; string postDataStr = (string)paraObj[4]; int readWriteTimeout = (int)paraObj[5]; e.Result = _getUrlResponse(url, headerDict, postDict, timeout, postDataStr, readWriteTimeout); } //void m_bgWorker_ProgressChanged(object sender, ProgressChangedEventArgs e) //{ // bRespNotCompleted = true; //} private void bgwGetUrlResp_RunWorkerCompleted(object sender, RunWorkerCompletedEventArgs e) { // The background process is complete. We need to inspect // our response to see if an error occurred, a cancel was // requested or if we completed successfully. // Check to see if an error occurred in the // background process. if (e.Error != null) { //MessageBox.Show(e.Error.Message); return; } // Check to see if the background process was cancelled. if (e.Cancelled) { //MessageBox.Show("Cancelled ..."); } else { bNotCompleted_resp = false; // Everything completed normally. // process the response using e.Result //MessageBox.Show("Completed..."); gCurResp = (HttpWebResponse)e.Result; } } #endif private void getUrlRespStreamBytes_bw(ref Byte[] respBytesBuf, string url, Dictionary<string, string> headerDict, Dictionary<string, string> postDict, int timeout, Action<int> funcUpdateProgress) { // Create a background thread gBgwDownload = new BackgroundWorker(); gBgwDownload.DoWork += bgwDownload_DoWork; gBgwDownload.RunWorkerCompleted += bgwDownload_RunWorkerCompleted; gBgwDownload.WorkerReportsProgress = true; gBgwDownload.ProgressChanged += bgwDownload_ProgressChanged; //init bNotCompleted_download = true; gFuncUpdateProgress = funcUpdateProgress; // run in another thread object paraObj = new object[] {respBytesBuf, url, headerDict, postDict, timeout}; gBgwDownload.RunWorkerAsync(paraObj); } private void bgwDownload_ProgressChanged(object sender, ProgressChangedEventArgs e) { if (gFuncUpdateProgress != null) { // This function fires on the UI thread so it's safe to edit // the UI control directly, no funny business with Control.Invoke. // Update the progressBar with the integer supplied to us from the // ReportProgress() function. Note, e.UserState is a "tag" property // that can be used to send other information from the // BackgroundThread to the UI thread. gFuncUpdateProgress(e.ProgressPercentage); } } private void bgwDownload_DoWork(object sender, DoWorkEventArgs e) { // // The sender is the BackgroundWorker object we need it to // // report progress and check for cancellation. // BackgroundWorker gBgwDownload = sender as BackgroundWorker; object[] paraObj = (object[])e.Argument; Byte[] respBytesBuf = (Byte[])paraObj[0]; string url = (string)paraObj[1]; Dictionary<string, string> headerDict = (Dictionary<string, string>)paraObj[2]; Dictionary<string, string> postDict = (Dictionary<string, string>)paraObj[3]; int timeout = (int)paraObj[4]; //e.Result = _getUrlRespStreamBytes(ref respBytesBuf, url, headerDict, postDict, timeout); int curReadoutLen; int realReadoutLen = 0; int curBufPos = 0; long totalLength = 0; long currentLength = 0; try { //HttpWebResponse resp = getUrlResponse(url, headerDict, postDict, timeout); HttpWebResponse resp = getUrlResponse(url, headerDict, postDict); long expectReadoutLen = resp.ContentLength; totalLength = expectReadoutLen; currentLength = 0; Stream binStream = resp.GetResponseStream(); //int streamDataLen = (int)binStream.Length; // erro: not support seek operation do { //let up layer update its UI, otherwise up layer UI will no response during this func exec time //now has make this function to call by backgroundworker, so not need this to update UI //System.Windows.Forms.Application.DoEvents(); // here download logic is: // once request, return some data // request multiple time, until no more data curReadoutLen = binStream.Read(respBytesBuf, curBufPos, (int)expectReadoutLen); if (curReadoutLen > 0) { curBufPos += curReadoutLen; currentLength = curBufPos; expectReadoutLen = expectReadoutLen - curReadoutLen; realReadoutLen += curReadoutLen; int currentPercent = (int)((currentLength * 100) / totalLength); if (currentPercent < 0) { currentPercent = 0; } if (currentPercent > 100) { currentPercent = 100; } gBgwDownload.ReportProgress(currentPercent); } } while (curReadoutLen > 0); } catch (Exception ex) { string errorMessage = ex.Message; realReadoutLen = -1; } //return realReadoutLen; e.Result = realReadoutLen; //gBgwDownload.ReportProgress(100); } private void bgwDownload_RunWorkerCompleted(object sender, RunWorkerCompletedEventArgs e) { // The background process is complete. We need to inspect // our response to see if an error occurred, a cancel was // requested or if we completed successfully. // Check to see if an error occurred in the // background process. if (e.Error != null) { //MessageBox.Show(e.Error.Message); return; } // Check to see if the background process was cancelled. if (e.Cancelled) { //MessageBox.Show("Cancelled ..."); } else { bNotCompleted_download = false; // Everything completed normally. // process the response using e.Result //MessageBox.Show("Completed..."); gRealReadoutLen = (int)e.Result; } }
/* * [Function] * find root TreeNode of current TreeNode * [Input] * some TreeNode * * [Output] * root TreeNode of input TreeNode * * [Note] */ public TreeNode findRootTreeNode(TreeNode curTreeNode) { TreeNode rootTreeNode = curTreeNode.Parent; if (rootTreeNode == null) { //root parent is null rootTreeNode = curTreeNode; } else { //child parent is not null while (rootTreeNode.Parent != null) { rootTreeNode = rootTreeNode.Parent; } } return rootTreeNode; }
例 1.1. findRootTreeNode的使用范例
//get input TreeNode's BrowseNode's SearchIndex private string getSearchIndex(TreeNode curTreeNode) { string strSearchIndex = ""; //find the root node TreeNode rootTreeNode = crl.findRootTreeNode(curTreeNode);
/* * [Function] * un highlight tree node * [Input] * some TreeNode * * [Output] * restore color to background color * * [Note] */ public Color unHighlightNode(TreeView trvValue, TreeNode treeNode) { Color oldColor = trvValue.BackColor; if (treeNode != null) { oldColor = treeNode.BackColor; treeNode.BackColor = trvValue.BackColor; treeNode.ForeColor = Color.Black; } return oldColor; }
例 1.2. unHighlightNode的使用范例
else if (e.ClickedItem == tsmiRemoveFromSelection) { if (curSelTreeNodeList.Contains(curSelTreeNode)) { //remove selection curSelTreeNodeList.Remove(curSelTreeNode); //unhightlight node crl.unHighlightNode(trvCategoryTree, curSelTreeNode); } }
/* * [Function] * highlight tree node * [Input] * some TreeNode * * [Output] * set color to highlighted color * * [Note] */ public Color highlightNode(TreeView trvValue, TreeNode someNode) { Color oldColor = trvValue.BackColor; //"{Name=Window, ARGB=(255, 255, 255, 255)}" if (someNode != null) { oldColor = someNode.BackColor; //"{Name=0, ARGB=(0, 0, 0, 0)}" // HTML #3399FF -> RGB(51,153,255) //"{Name=MenuHighlight, ARGB=(255, 51, 153, 255)}" someNode.BackColor = SystemColors.MenuHighlight; //node.BackColor = nodeHlBackColor; //node.ForeColor = Color.FromArgb(255, 255, 255); someNode.ForeColor = Color.White; } return oldColor; }
例 1.3. highlightNode的使用范例
if (e.ClickedItem == tsmiAddToSelection) { if (!curSelTreeNodeList.Contains(curSelTreeNode)) { // add to selection curSelTreeNodeList.Add(curSelTreeNode); //hightlight node crl.highlightNode(trvCategoryTree, curSelTreeNode); } }
目录
public float ounceToKiloGram(float ounce) { float kiloGram = ounce * 0.028349523125F; return kiloGram; }
例 2.1. ounceToKiloGram的使用范例
float kiloGram = -1.0F; string weightNumberStr = ""; //type1: //http://www.amazon.com/Kindle-Fire-HD/dp/B0083PWAPW/ref=lp_1055398_1_1?ie=UTF8&qid=1369487181&sr=1-1 //<td style="font-weight: bold;text-align:left; font-size: 12px; border-bottom: 1px solid #e2e2e2;" align="right">Weight</td><td style="font-size:12px;">13.9 ounces (395 grams)</td> //http://www.amazon.com/Kindle-Paperwhite-Touch-light/dp/B007OZNZG0/ref=lp_1055398_1_2?ie=UTF8&qid=1369487181&sr=1-2 //<td style="font-weight: bold;text-align:left; font-size: 12px; border-bottom: 1px solid #e2e2e2;" align="right">Weight</td><td style="font-size:12px;">7.5 ounces (213 grams)</td> if (!calculatedKiloGram) { if (crl.extractSingleStr(@"Weight</td><td style=""[^<>]+?"">([\.\d]+) ounces", productHtml, out weightNumberStr)) { float onces = float.Parse(weightNumberStr); kiloGram = crl.ounceToKiloGram(onces);
public float kiloGramToOunce(float kiloGram) { float ounce = kiloGram * 35.27396194958F; return ounce; }
public float poundToKiloGram(float pound) { float kiloGram = pound * 0.45359237F; return kiloGram; }
例 2.3. poundToKiloGram 的使用范例
else if (unitType.Equals("pounds")) { float pound = float.Parse(weightNumberStr); kiloGram = crl.poundToKiloGram(pound); }
public float kiloGramToPound(float kiloGram) { float pound = kiloGram * 0.45359237F; return pound; }
public float inchToCm(float inch) { float cm = inch * 2.54F; return cm; }
例 2.5. inchToCm 的使用范例
dimensionInch.length = float.Parse(lengthInchStr); dimensionInch.width = float.Parse(widthInchStr); dimensionInch.height = float.Parse(heightInchStr); dimensionCm.length = crl.inchToCm(dimensionInch.length); dimensionCm.width = crl.inchToCm(dimensionInch.width); dimensionCm.height = crl.inchToCm(dimensionInch.height);
//equivalent of Math.Random() in Javascript //get a 17 bit double value x, 0 < x < 1, eg:0.68637410117610087 public double mathRandom() { Random rdm = new Random(); double betweenZeroToOne17Bit = rdm.NextDouble(); return betweenZeroToOne17Bit; }
目录
此处是和时间(Time,DateTime等)有关的函数
使用前,先做最开始的初始化:
private Dictionary<string, DateTime> calcTimeList; //init for calc time calcTimeList = new Dictionary<string, DateTime>();
每次使用之前,使用:
// init for calculate time span public void elapsedTimeSpanInit(string keyName) { calcTimeList.Add(keyName, DateTime.Now); }
然后就可以获得对应的时间消耗了:
// got calculated time span public double getElapsedTimeSpan(string keyName) { double milliSec = 0.0; if (calcTimeList.ContainsKey(keyName)) { DateTime startTime = calcTimeList[keyName]; DateTime endTime = DateTime.Now; milliSec = (endTime - startTime).TotalMilliseconds; } return milliSec; }
//refer: http://bytes.com/topic/c-sharp/answers/713458-c-function-equivalent-javascript-gettime-function //get current time in milli-second-since-epoch(1970/01/01) public double getCurTimeInMillisec() { DateTime st = new DateTime(1970, 1, 1); TimeSpan t = (DateTime.Now - st); return t.TotalMilliseconds; // milli seconds since epoch }
例 4.2. getCurTimeInMillisec 的使用范例
double curMilliSecDouble = crl.getCurTimeInMillisec(); //1343392590725.6758
// parse the milli second to local DateTime value public DateTime milliSecToDateTime(double milliSecSinceEpoch) { DateTime st = new DateTime(1970, 1, 1, 0, 0, 0); st = st.AddMilliseconds(milliSecSinceEpoch); return st; }
例 4.3. milliSecToDateTime 的使用范例
double doubleVal = 0.0; if (Double.TryParse(dateValue, out doubleVal)) { // try whether is double/int64 milliSecSinceEpoch parsedDatetime = milliSecToDateTime(doubleVal); parseOK = true; }
//parse xxx in "new Date(xxx)" of javascript to C# DateTime //input example: //new Date(1329198041411.84) / new Date(1329440307389.9) / new Date(1329440307483) public bool parseJsNewDate(string newDateStr, out DateTime parsedDatetime) { bool parseOK = false; parsedDatetime = new DateTime(); if ((newDateStr != "") && (newDateStr.Trim() != "")) { string dateValue = ""; if (extractSingleStr(@".*new\sDate\((.+?)\).*", newDateStr, out dateValue)) { double doubleVal = 0.0; if (Double.TryParse(dateValue, out doubleVal)) { // try whether is double/int64 milliSecSinceEpoch parsedDatetime = milliSecToDateTime(doubleVal); parseOK = true; } else if (DateTime.TryParse(dateValue, out parsedDatetime)) { // try normal DateTime string //refer: http://www.w3schools.com/js/js_obj_date.asp //October 13, 1975 11:13:00 //79,5,24 / 79,5,24,11,33,0 //1329198041411.3344 / 1329198041411.84 / 1329198041411 parseOK = true; } } } return parseOK; }
例 4.4. parseJsNewDate 的使用范例
DateTime expireTime; if (parseJsNewDate(expire, out expireTime)) { parsedCk.Expires = expireTime; }
目录
此处是和字符串(string等)有关的函数
//input: [4] Valid: B0009IQZFM //output: ============================ [4] Valid: B0009IQZFM ============================= public string formatString(string strToFormat, char cPaddingChar = '*', int iTotalWidth = 80) { //auto added space strToFormat = " " + strToFormat + " "; //" [4] Valid: B0009IQZFM " //1. padding left int iPaddingLen = (iTotalWidth - strToFormat.Length)/2; int iLefTotalLen = iPaddingLen + strToFormat.Length; string strLefPadded = strToFormat.PadLeft(iLefTotalLen, cPaddingChar); //"============================ [4] Valid: B0009IQZFM " //2. padding right string strFormatted = strLefPadded.PadRight(iTotalWidth, cPaddingChar); //"============================ [4] Valid: B0009IQZFM =============================" return strFormatted; }
例 5.1. formatstring 的使用范例
string strFullCategoryName = String.Format("FullCategoryName={0}", curFullCategoryName); string strFormattedFullCategoryName = crl.formatString(strFullCategoryName, '=');
//init the string array to empty public string[] emptyStringArray(string[] strArr) { if (strArr != null) { for (int idx = 0; idx < strArr.Length; idx++) { strArr[idx] = String.Empty; //strArr[idx] = ""; } } return strArr; }
例 5.2. emptyStringArray 的使用范例
//5 bullet //public string[] bulletArr; // total 5 (or more, but only record 5) productInfo.bulletArr = new string[5]; crl.emptyStringArray(productInfo.bulletArr);
// encode "!" to "%21" public string encodeExclamationMark(string inputStr) { return inputStr.Replace("!", "%21"); }
例 5.3. encodeExclamationMark 的使用范例
getItemsUrl += "id=" + encodeExclamationMark(folderId).ToLower();
// encode "%21" to "!" public string decodeExclamationMark(string inputStr) { return inputStr.Replace("%21", "!"); }
//using Regex to extract single string value // caller should make sure the string to extract is Groups[1] == include single () !!! public bool extractSingleStr(string pattern, string extractFrom, out string extractedStr) { bool extractOK = false; Regex rx = new Regex(pattern); Match found = rx.Match(extractFrom); if (found.Success) { extractOK = true; extractedStr = found.Groups[1].ToString(); } else { extractOK = false; extractedStr = ""; } return extractOK; }
例 5.5. extractSingleStr 的使用范例
string resPreloadUrl = ""; //var srf_uPreload = 'https://skydrive.live.com/handlers/resourcespreload.mvc?view=Folders.All&id;=250206&mkt;=EN-US'; string resPreloadP = @"var\ssrf_uPreload\s=\s'(.+?)';"; extractSingleStr(resPreloadP, html, out resPreloadUrl);
//quote the input dict values //note: the return result for first para no '&' public string quoteParas(Dictionary<string, string> paras, bool spaceToPercent20 = true) { string quotedParas = ""; bool isFirst = true; string val = ""; foreach (string para in paras.Keys) { if (paras.TryGetValue(para, out val)) { string encodedVal = ""; if (spaceToPercent20) { //encodedVal = HttpUtility.UrlPathEncode(val); //encodedVal = Uri.EscapeDataString(val); //encodedVal = Uri.EscapeUriString(val); encodedVal = HttpUtility.UrlEncode(val).Replace("+", "%20"); } else { encodedVal = HttpUtility.UrlEncode(val); //space to + } if (isFirst) { isFirst = false; quotedParas += para + "=" + encodedVal; } else { quotedParas += "&" + para + "=" + encodedVal; } } else { break; } } return quotedParas; }
例 5.6. quoteParas 的使用范例
Dictionary<string, string> postDataDict = genPostsrfPostDict(html, login, passwd, isKeepLogin); postData += quoteParas(postDataDict);
//remove invalid char in path and filename public string removeInvChrInPath(string origFileOrPathStr) { string validFileOrPathStr = origFileOrPathStr; //filter out invalid title and artist char //char[] invalidChars = { '\\', '/', ':', '*', '?', '<', '>', '|', '\b' }; char[] invalidChars = Path.GetInvalidPathChars(); char[] invalidCharsInName = Path.GetInvalidFileNameChars(); foreach (char chr in invalidChars) { validFileOrPathStr = validFileOrPathStr.Replace(chr.ToString(), ""); } foreach (char chr in invalidCharsInName) { validFileOrPathStr = validFileOrPathStr.Replace(chr.ToString(), ""); } return validFileOrPathStr; }
例 5.7. removeInvChrInPath 的使用范例
string mid_tit; if (crl.extractSingleStr(@"<p\s+?class=""mid_tit"">(?<mid_tit>.+?)<p>", respHtml, out mid_tit)) { albumInfo.name = crl.removeInvChrInPath(mid_tit); } string h1user; if (crl.extractSingleStr(@"<h1\s+?class=""h1user"">(?<h1user>.+?)</h1>", respHtml, out h1user)) { albumInfo.author = crl.removeInvChrInPath(h1user); }
//convert \xXX into corresponding char //eg: \x0A -> '\n' public string filterEscapeSequence(string esacapeSequenceStr) { string filteredStr = Regex.Replace(esacapeSequenceStr, @"\\x\w{2}", new MatchEvaluator(_replaceEscapeSequenceToChar)); return filteredStr; }
//extract filename from url //eg: //http://g-ecx.images-amazon.com/images/G/01/kindle/dp/2012/KC/KC-slate-01-lg._V401028090_.jpg //KC-slate-01-lg._V401028090_.jpg //file:///C:/Users/CLi/AppData/Local/Temp/WindowsLiveWriter-1737927945/supfilesC19F10/now-the-service-status-is-active_thu%5B1%5D.png //now-the-service-status-is-active_thu%5B1%5D.png public string extractFilenameFromUrl(string fullUrl) { string filename = ""; string[] slashList = fullUrl.Split('/'); filename = slashList[slashList.Length - 1]; return filename; }
例 5.9. extractFilenameFromUrl 的使用范例
string imageUrl = imageUrlList[idx]; gLogger.Info(String.Format("[{0}]={1}", idx, imageUrl)); string picFilename = crl.extractFilenameFromUrl(imageUrl);
此处是和数组(Array)有关的函数
//given a string array 'origStrArr', get a sub string array from 'startIdx', length is 'len' public string[] getSubStrArr(string[] origStrArr, int startIdx, int len) { string[] subStrArr = new string[] { }; if ((origStrArr != null) && (origStrArr.Length > 0) && (len > 0)) { List<string> strList = new List<string>(); int endPos = startIdx + len; if (endPos > origStrArr.Length) { endPos = origStrArr.Length; } for (int i = startIdx; i < endPos; i++) { //refer: http://zhidao.baidu.com/question/296384408.html strList.Add(origStrArr[i]); } subStrArr = new string[len]; strList.CopyTo(subStrArr); } return subStrArr; }
例 6.1. getSubStrArr 的使用范例
string[] fieldExpressions = getSubStrArr(expressions, 1, expressions.Length - 1);
目录
//extrat the Host from input url //example: from https://skydrive.live.com/, extracted Host is "skydrive.live.com" public string extractHost(string url) { string domain = ""; if ((url != "") && (url.Contains("/"))) { string[] splited = url.Split('/'); domain = splited[2]; } return domain; }
//extrat the domain from input url //example: from https://skydrive.live.com/, extracted domain is ".live.com" public string extractDomain(string url) { string host = ""; string domain = ""; host = extractHost(url); if (host.Contains(".")) { domain = host.Substring(host.IndexOf('.')); } return domain; }
例 7.2. extractDomain 的使用范例
private string gCurDomain; //update latest cookies gCurDomain = commLib.extractDomain(getItemsUrl);
//extrat the domain url from original url //from //http://answers.yahoo.com/question/index?qid=20130323071141AA8PffP //get //http://answers.yahoo.com public string getDomainUrl(string url) { string domainUrl = ""; Regex urlRx = new Regex(@"((https)|(http)|(ftp))://[\w\-\.]+"); Match foundUrl = urlRx.Match(url); if (foundUrl.Success) { //int slashIndex = foundUrl.Index + foundUrl.Length; domainUrl = url.Substring(0, foundUrl.Length); } else { domainUrl = ""; } return domainUrl; }
//add recognized cookie field: expires/domain/path/secure/httponly/version, into cookie public bool addFieldToCookie(ref Cookie ck, pairItem pairInfo) { bool added = false; if (pairInfo.key != "") { string lowerKey = pairInfo.key.ToLower(); switch (lowerKey) { case "expires": DateTime expireDatetime; if (DateTime.TryParse(pairInfo.value, out expireDatetime)) { // note: here coverted to local time: GMT +8 ck.Expires = expireDatetime; //update expired filed if (DateTime.Now.Ticks > ck.Expires.Ticks) { ck.Expired = true; } added = true; } break; case "domain": ck.Domain = pairInfo.value; added = true; break; case "secure": ck.Secure = true; added = true; break; case "path": ck.Path = pairInfo.value; added = true; break; case "httponly": ck.HttpOnly = true; added = true; break; case "version": int versionValue; if (int.TryParse(pairInfo.value, out versionValue)) { ck.Version = versionValue; added = true; } break; default: break; } } return added; }//addFieldToCookie
例 7.4. addFieldToCookie 的使用范例
public bool parseSingleCookie(string cookieStr, ref Cookie ck) { bool parsedOk = true; //Cookie ck = new Cookie(); //string[] expressions = cookieStr.Split(";".ToCharArray(),StringSplitOptions.RemoveEmptyEntries); //refer: http://msdn.microsoft.com/en-us/library/b873y76a.aspx string[] expressions = cookieStr.Split(new char[] { ';' }, StringSplitOptions.RemoveEmptyEntries); //get cookie name and value pairItem pair = new pairItem(); if (parseCookieNameValue(expressions[0], out pair)) { ck.Name = pair.key; ck.Value = pair.value; string[] fieldExpressions = getSubStrArr(expressions, 1, expressions.Length - 1); foreach (string eachExpression in fieldExpressions) { //parse key and value if (parseCookieField(eachExpression, out pair)) { // add to cookie field if possible addFieldToCookie(ref ck, pair); }
public bool isValidCookieField(string cookieKey) { return cookieFieldList.Contains(cookieKey.ToLower()); }
例 7.5. isValidCookieField 的使用范例
pair.key = ckFieldExpr.Substring(0, equalPos); pair.key = pair.key.Trim(); if (isValidCookieField(pair.key)) { // only process while is valid cookie field pair.value = ckFieldExpr.Substring(equalPos + 1); pair.value = pair.value.Trim(); parsedOK = true; }
//cookie field example: //WLSRDAuth=FAAaARQL3KgEDBNbW84gMYrDN0fBab7xkQNmAAAEgAAACN7OQIVEO14E2ADnX8vEiz8fTuV7bRXem4Yeg/DI6wTk5vXZbi2SEOHjt%2BbfDJMZGybHQm4NADcA9Qj/tBZOJ/ASo5d9w3c1bTlU1jKzcm2wecJ5JMJvdmTCj4J0oy1oyxbMPzTc0iVhmDoyClU1dgaaVQ15oF6LTQZBrA0EXdBxq6Mu%2BUgYYB9DJDkSM/yFBXb2bXRTRgNJ1lruDtyWe%2Bm21bzKWS/zFtTQEE56bIvn5ITesFu4U8XaFkCP/FYLiHj6gpHW2j0t%2BvvxWUKt3jAnWY1Tt6sXhuSx6CFVDH4EYEEUALuqyxbQo2ugNwDkP9V5O%2B5FAyCf; path=/; domain=.livefilestore.com; HttpOnly;, //WLSRDSecAuth=FAAaARQL3KgEDBNbW84gMYrDN0fBab7xkQNmAAAEgAAACJFcaqD2IuX42ACdjP23wgEz1qyyxDz0kC15HBQRXH6KrXszRGFjDyUmrC91Zz%2BgXPFhyTzOCgQNBVfvpfCPtSccxJHDIxy47Hq8Cr6RGUeXSpipLSIFHumjX5%2BvcJWkqxDEczrmBsdGnUcbz4zZ8kP2ELwAKSvUteey9iHytzZ5Ko12G72%2Bbk3BXYdnNJi8Nccr0we97N78V0bfehKnUoDI%2BK310KIZq9J35DgfNdkl12oYX5LMIBzdiTLwN1%2Bx9DgsYmmgxPbcuZPe/7y7dlb00jNNd8p/rKtG4KLLT4w3EZkUAOcUwGF746qfzngDlOvXWVvZjGzA; path=/; domain=.livefilestore.com; HttpOnly; secure;, //RPSShare=1; path=/;, //ANON=A=DE389D4D076BF47BCAE4DC05FFFFFFFF&E=c44&W=1; path=/; domain=.livefilestore.com;, //NAP=V=1.9&E=bea&C=VTwb1vAsVjCeLWrDuow-jCNgP5eS75JWWvYVe3tRppviqKixCvjqgw&W=1; path=/; domain=.livefilestore.com;, //RPSMaybe=; path=/; domain=.livefilestore.com; expires=Thu, 30-Oct-1980 16:00:00 GMT; //check whether the cookie name is valid or not public bool isValidCookieName(string ckName) { bool isValid = true; if (ckName == null) { isValid = false; } else { string invalidP = @"\W+"; Regex rx = new Regex(invalidP); Match foundInvalid = rx.Match(ckName); if (foundInvalid.Success) { isValid = false; } } return isValid; }
例 7.6. isValidCookieName 的使用范例
name = foundSetck.Groups[1].ToString(); value = foundSetck.Groups[2].ToString(); domain = foundSetck.Groups[3].ToString(); path = foundSetck.Groups[4].ToString(); expire = foundSetck.Groups[5].ToString(); secure = foundSetck.Groups[6].ToString(); // must: name valid and domain is not null if (isValidCookieName(name) && (domain != "")) { parseOK = true; parsedCk.Name = name; parsedCk.Value = value; parsedCk.Domain = domain; parsedCk.Path = path;
// parse the cookie name and value public bool parseCookieNameValue(string ckNameValueExpr, out pairItem pair) { bool parsedOK = false; if (ckNameValueExpr == "") { pair.key = ""; pair.value = ""; parsedOK = false; } else { ckNameValueExpr = ckNameValueExpr.Trim(); int equalPos = ckNameValueExpr.IndexOf('='); if (equalPos > 0) // is valid expression { pair.key = ckNameValueExpr.Substring(0, equalPos); pair.key = pair.key.Trim(); if (isValidCookieName(pair.key)) { // only process while is valid cookie field pair.value = ckNameValueExpr.Substring(equalPos + 1); pair.value = pair.value.Trim(); parsedOK = true; } else { pair.key = ""; pair.value = ""; parsedOK = false; } } else { pair.key = ""; pair.value = ""; parsedOK = false; } } return parsedOK; }
例 7.7. parseCookieNameValue 的使用范例
//string[] expressions = cookieStr.Split(";".ToCharArray(),StringSplitOptions.RemoveEmptyEntries); //refer: http://msdn.microsoft.com/en-us/library/b873y76a.aspx string[] expressions = cookieStr.Split(new char[] { ';' }, StringSplitOptions.RemoveEmptyEntries); //get cookie name and value pairItem pair = new pairItem(); if (parseCookieNameValue(expressions[0], out pair)) {
// parse cookie field expression public bool parseCookieField(string ckFieldExpr, out pairItem pair) { bool parsedOK = false; if (ckFieldExpr == "") { pair.key = ""; pair.value = ""; parsedOK = false; } else { ckFieldExpr = ckFieldExpr.Trim(); //some specials: secure/httponly if (ckFieldExpr.ToLower() == "httponly") { pair.key = "httponly"; //pair.value = ""; pair.value = "true"; parsedOK = true; } else if (ckFieldExpr.ToLower() == "secure") { pair.key = "secure"; //pair.value = ""; pair.value = "true"; parsedOK = true; } else // normal cookie field { int equalPos = ckFieldExpr.IndexOf('='); if (equalPos > 0) // is valid expression { pair.key = ckFieldExpr.Substring(0, equalPos); pair.key = pair.key.Trim(); if (isValidCookieField(pair.key)) { // only process while is valid cookie field pair.value = ckFieldExpr.Substring(equalPos + 1); pair.value = pair.value.Trim(); parsedOK = true; } else { pair.key = ""; pair.value = ""; parsedOK = false; } } else { pair.key = ""; pair.value = ""; parsedOK = false; } } } return parsedOK; }//parseCookieField
例 7.8. parseCookieField 的使用范例
foreach (string eachExpression in fieldExpressions) { //parse key and value if (parseCookieField(eachExpression, out pair)) { // add to cookie field if possible addFieldToCookie(ref ck, pair); } else { // if any field fail, consider it is a abnormal cookie string, so quit with false parsedOk = false; break; } }
//parse single cookie string to a cookie //example: //MSPShared=1; expires=Wed, 30-Dec-2037 16:00:00 GMT;domain=login.live.com;path=/;HTTPOnly= ;version=1 //PPAuth=CkLXJYvPpNs3w!fIwMOFcraoSIAVYX3K!CdvZwQNwg3Y7gv74iqm9MqReX8XkJqtCFeMA6GYCWMb9m7CoIw!ID5gx3pOt8sOx1U5qQPv6ceuyiJYwmS86IW*l3BEaiyVCqFvju9BMll7!FHQeQholDsi0xqzCHuW!Qm2mrEtQPCv!qF3Sh9tZDjKcDZDI9iMByXc6R*J!JG4eCEUHIvEaxTQtftb4oc5uGpM!YyWT!r5jXIRyxqzsCULtWz4lsWHKzwrNlBRbF!A7ZXqXygCT8ek6luk7rarwLLJ!qaq2BvS; domain=login.live.com;secure= ;path=/;HTTPOnly= ;version=1 public bool parseSingleCookie(string cookieStr, ref Cookie ck) { bool parsedOk = true; //Cookie ck = new Cookie(); //string[] expressions = cookieStr.Split(";".ToCharArray(),StringSplitOptions.RemoveEmptyEntries); //refer: http://msdn.microsoft.com/en-us/library/b873y76a.aspx string[] expressions = cookieStr.Split(new char[] { ';' }, StringSplitOptions.RemoveEmptyEntries); //get cookie name and value pairItem pair = new pairItem(); if (parseCookieNameValue(expressions[0], out pair)) { ck.Name = pair.key; ck.Value = pair.value; string[] fieldExpressions = getSubStrArr(expressions, 1, expressions.Length - 1); foreach (string eachExpression in fieldExpressions) { //parse key and value if (parseCookieField(eachExpression, out pair)) { // add to cookie field if possible addFieldToCookie(ref ck, pair); } else { // if any field fail, consider it is a abnormal cookie string, so quit with false parsedOk = false; break; } } } else { parsedOk = false; } return parsedOk; }//parseSingleCookie
例 7.9. parseSingleCookie 的使用范例
Cookie ck = new Cookie(); // recover it back string recoveredCookieStr = Regex.Replace(cookieStr, @"xpires=\w{3}" + replacedChar + @"\s\d{2}-\w{3}-\d{4}", new MatchEvaluator(_recoverExpireField)); if (parseSingleCookie(recoveredCookieStr, ref ck)) { if (needAddThisCookie(ck, curDomain)) { parsedCookies.Add(ck); } }
// parse the Set-Cookie string (in http response header) to cookies // Note: auto omit to parse the abnormal cookie string // normal example for 'setCookieStr': // MSPOK= ; expires=Thu, 30-Oct-1980 16:00:00 GMT;domain=login.live.com;path=/;HTTPOnly= ;version=1,PPAuth=Cuyf3Vp2wolkjba!TOr*0v22UMYz36ReuiwxZZBc8umHJYPlRe4qupywVFFcIpbJyvYZ5ZDLBwV4zRM1UCjXC4tUwNuKvh21iz6gQb0Tu5K7Z62!TYGfowB9VQpGA8esZ7iCRucC7d5LiP3ZAv*j4Z3MOecaJwmPHx7!wDFdAMuQUZURhHuZWJiLzHP1j8ppchB2LExnlHO6IGAdZo1f0qzSWsZ2hq*yYP6sdy*FdTTKo336Q1B0i5q8jUg1Yv6c2FoBiNxhZSzxpuU0WrNHqSytutP2k4!wNc6eSnFDeouX; domain=login.live.com;secure= ;path=/;HTTPOnly= ;version=1,PPLState=1; domain=.live.com;path=/;version=1,MSPShared=1; expires=Wed, 30-Dec-2037 16:00:00 GMT;domain=login.live.com;path=/;HTTPOnly= ;version=1,MSPPre= ;domain=login.live.com;path=/;Expires=Thu, 30-Oct-1980 16:00:00 GMT,MSPCID= ; HTTPOnly= ; domain=login.live.com;path=/;Expires=Thu, 30-Oct-1980 16:00:00 GMT,RPSTAuth=EwDoARAnAAAUWkziSC7RbDJKS1VkhugDegv7L0eAAOfCAY2+pKwbV5zUlu3XmBbgrQ8EdakmdSqK9OIKfMzAbnU8fuwwEi+FKtdGSuz/FpCYutqiHWdftd0YF21US7+1bPxuLJ0MO+wVXB8GtjLKZaA0xCXlU5u01r+DOsxSVM777DmplaUc0Q4O1+Pi9gX9cyzQLAgRKmC/QtlbVNKDA2YAAAhIwqiXOVR/DDgBocoO/n0u48RFGh79X2Q+gO4Fl5GMc9Vtpa7SUJjZCCfoaitOmcxhEjlVmR/2ppdfJx3Ykek9OFzFd+ijtn7K629yrVFt3O9q5L0lWoxfDh5/daLK7lqJGKxn1KvOew0SHlOqxuuhYRW57ezFyicxkxSI3aLxYFiqHSu9pq+TlITqiflyfcAcw4MWpvHxm9on8Y1dM2R4X3sxuwrLQBpvNsG4oIaldTYIhMEnKhmxrP6ZswxzteNqIRvMEKsxiksBzQDDK/Cnm6QYBZNsPawc6aAedZioeYwaV3Z/i3tNrAUwYTqLXve8oG6ZNXL6WLT/irKq1EMilK6Cw8lT3G13WYdk/U9a6YZPJC8LdqR0vAHYpsu/xRF39/On+xDNPE4keIThJBptweOeWQfsMDwvgrYnMBKAMjpLZwE=; domain=.live.com;path=/;HTTPOnly= ;version=1,RPSTAuthTime=1328679636; domain=login.live.com;path=/;HTTPOnly= ;version=1,MSPAuth=2OlAAMHXtDIFOtpaK1afG2n*AAxdfCnCBlJFn*gCF8gLnCa1YgXEfyVh2m9nZuF*M7npEwb4a7Erpb*!nH5G285k7AswJOrsr*gY29AVAbsiz2UscjIGHkXiKrTvIzkV2M; domain=.live.com;path=/;HTTPOnly= ;version=1,MSPProf=23ci9sti6DZRrkDXfTt1b3lHhMdheWIcTZU2zdJS9!zCloHzMKwX30MfEAcCyOjVt*5WeFSK3l2ZahtEaK7HPFMm3INMs3r!JxI8odP9PYRHivop5ryohtMYzWZzj3gVVurcEr5Bg6eJJws7rXOggo3cR4FuKLtXwz*FVX0VWuB5*aJhRkCT1GZn*L5Pxzsm9X; domain=.live.com;path=/;HTTPOnly= ;version=1,MSNPPAuth=CiGSMoUOx4gej8yQkdFBvN!gvffvAhCPeWydcrAbcg!O2lrhVb4gruWSX5NZCBPsyrtZKmHLhRLTUUIxxPA7LIhqW5TCV*YcInlG2f5hBzwzHt!PORYbg79nCkvw65LKG399gRGtJ4wvXdNlhHNldkBK1jVXD4PoqO1Xzdcpv4sj68U6!oGrNK5KgRSMXXpLJmCeehUcsRW1NmInqQXpyanjykpYOcZy0vq!6PIxkj3gMaAvm!1vO58gXM9HX9dA0GloNmCDnRv4qWDV2XKqEKp!A7jiIMWTmHup1DZ!*YCtDX3nUVQ1zAYSMjHmmbMDxRJECz!1XEwm070w16Y40TzuKAJVugo!pyF!V2OaCsLjZ9tdGxGwEQRyi0oWc*Z7M0FBn8Fz0Dh4DhCzl1NnGun9kOYjK5itrF1Wh17sT!62ipv1vI8omeu0cVRww2Kv!qM*LFgwGlPOnNHj3*VulQOuaoliN4MUUxTA4owDubYZoKAwF*yp7Mg3zq5Ds2!l9Q$$; domain=.live.com;path=/;HTTPOnly= ;version=1,MH=MSFT; domain=.live.com;path=/;version=1,MHW=; expires=Thu, 30-Oct-1980 16:00:00 GMT;domain=.live.com;path=/;version=1,MHList=; expires=Thu, 30-Oct-1980 16:00:00 GMT;domain=.live.com;path=/;version=1,NAP=V=1.9&E=bea&C=zfjCKKBD0TqjZlWGgRTp__NiK08Lme_0XFaiKPaWJ0HDuMi2uCXafQ&W=1;domain=.live.com;path=/,ANON=A=DE389D4D076BF47BCAE4DC05FFFFFFFF&E=c44&W=1;domain=.live.com;path=/,MSPVis=$9;domain=login.live.com;path=/,pres=; expires=Thu, 30-Oct-1980 16:00:00 GMT;domain=.live.com;path=/;version=1,LOpt=0; domain=login.live.com;path=/;version=1,WLSSC=EgBnAQMAAAAEgAAACoAASfCD+8dUptvK4kvFO0gS3mVG28SPT3Jo9Pz2k65r9c9KrN4ISvidiEhxXaPLCSpkfa6fxH3FbdP9UmWAa9KnzKFJu/lQNkZC3rzzMcVUMjbLUpSVVyscJHcfSXmpGGgZK4ZCxPqXaIl9EZ0xWackE4k5zWugX7GR5m/RzakyVIzWAFwA1gD9vwYA7Vazl9QKMk/UCjJPECcAAAoQoAAAFwBjcmlmYW4yMDAzQGhvdG1haWwuY29tAE8AABZjcmlmYW4yMDAzQGhvdG1haWwuY29tAAAACUNOAAYyMTM1OTIAAAZlCAQCAAB3F21AAARDAAR0aWFuAAR3YW5nBMgAAUkAAAAAAAAAAAAAAaOKNpqLi/UAANQKMk/Uf0RPAAAAAAAAAAAAAAAADgA1OC4yNDAuMjM2LjE5AAUAAAAAAAAAAAAAAAABBAABAAABAAABAAAAAAAAAAA=; domain=.live.com;secure= ;path=/;HTTPOnly= ;version=1,MSPSoftVis=@72198325083833620@:@; domain=login.live.com;path=/;version=1 // here now support parse the un-correct Set-Cookie: // MSPRequ=/;Version=1;version<=1328770452&id=250915&co=1; path=/;version=1,MSPVis=$9; Version=1;version=1$250915;domain=login.live.com;path=/,MSPSoftVis=@72198325083833620@:@; domain=login.live.com;path=/;version=1,MSPBack=1328770312; domain=login.live.com;path=/;version=1 public CookieCollection parseSetCookie(string setCookieStr, string curDomain) { CookieCollection parsedCookies = new CookieCollection(); // process for expires and Expires field, for it contains ',' //refer: http://www.yaosansi.com/post/682.html // may contains expires or Expires, so following use xpires string commaReplaced = Regex.Replace(setCookieStr, @"xpires=\w{3},\s\d{2}-\w{3}-\d{4}", new MatchEvaluator(_processExpireField)); string[] cookieStrArr = commaReplaced.Split(','); foreach (string cookieStr in cookieStrArr) { Cookie ck = new Cookie(); // recover it back string recoveredCookieStr = Regex.Replace(cookieStr, @"xpires=\w{3}" + replacedChar + @"\s\d{2}-\w{3}-\d{4}", new MatchEvaluator(_recoverExpireField)); if (parseSingleCookie(recoveredCookieStr, ref ck)) { if (needAddThisCookie(ck, curDomain)) { parsedCookies.Add(ck); } } } return parsedCookies; }//parseSetCookie
函数所输入的setCookieStr的值,是类似这种的:
MSPOK= ; expires=Thu, 30-Oct-1980 16:00:00 GMT;domain=login.live.com;path=/;HTTPOnly= ;version=1,PPAuth=Cuyf3Vp2wolkjba!TOr*0v22UMYz36ReuiwxZZBc8umHJYPlRe4qupywVFFcIpbJyvYZ5ZDLBwV4zRM1UCjXC4tUwNuKvh21iz6gQb0Tu5K7Z62!TYGfowB9VQpGA8esZ7iCRucC7d5LiP3ZAv*j4Z3MOecaJwmPHx7!wDFdAMuQUZURhHuZWJiLzHP1j8ppchB2LExnlHO6IGAdZo1f0qzSWsZ2hq*yYP6sdy*FdTTKo336Q1B0i5q8jUg1Yv6c2FoBiNxhZSzxpuU0WrNHqSytutP2k4!wNc6eSnFDeouX; domain=login.live.com;secure= ;path=/;HTTPOnly= ;version=1,PPLState=1; domain=.live.com;path=/;version=1,MSPShared=1; expires=Wed, 30-Dec-2037 16:00:00 GMT;domain=login.live.com;path=/;HTTPOnly= ;version=1,MSPPre= ;domain=login.live.com;path=/;Expires=Thu, 30-Oct-1980 16:00:00 GMT,MSPCID= ; HTTPOnly= ; domain=login.live.com;path=/;Expires=Thu, 30-Oct-1980 16:00:00 GMT,RPSTAuth=EwDoARAnAAAUWkziSC7RbDJKS1VkhugDegv7L0eAAOfCAY2+pKwbV5zUlu3XmBbgrQ8EdakmdSqK9OIKfMzAbnU8fuwwEi+FKtdGSuz/FpCYutqiHWdftd0YF21US7+1bPxuLJ0MO+wVXB8GtjLKZaA0xCXlU5u01r+DOsxSVM777DmplaUc0Q4O1+Pi9gX9cyzQLAgRKmC/QtlbVNKDA2YAAAhIwqiXOVR/DDgBocoO/n0u48RFGh79X2Q+gO4Fl5GMc9Vtpa7SUJjZCCfoaitOmcxhEjlVmR/2ppdfJx3Ykek9OFzFd+ijtn7K629yrVFt3O9q5L0lWoxfDh5/daLK7lqJGKxn1KvOew0SHlOqxuuhYRW57ezFyicxkxSI3aLxYFiqHSu9pq+TlITqiflyfcAcw4MWpvHxm9on8Y1dM2R4X3sxuwrLQBpvNsG4oIaldTYIhMEnKhmxrP6ZswxzteNqIRvMEKsxiksBzQDDK/Cnm6QYBZNsPawc6aAedZioeYwaV3Z/i3tNrAUwYTqLXve8oG6ZNXL6WLT/irKq1EMilK6Cw8lT3G13WYdk/U9a6YZPJC8LdqR0vAHYpsu/xRF39/On+xDNPE4keIThJBptweOeWQfsMDwvgrYnMBKAMjpLZwE=; domain=.live.com;path=/;HTTPOnly= ;version=1,RPSTAuthTime=1328679636; domain=login.live.com;path=/;HTTPOnly= ;version=1,MSPAuth=2OlAAMHXtDIFOtpaK1afG2n*AAxdfCnCBlJFn*gCF8gLnCa1YgXEfyVh2m9nZuF*M7npEwb4a7Erpb*!nH5G285k7AswJOrsr*gY29AVAbsiz2UscjIGHkXiKrTvIzkV2M; domain=.live.com;path=/;HTTPOnly= ;version=1,MSPProf=23ci9sti6DZRrkDXfTt1b3lHhMdheWIcTZU2zdJS9!zCloHzMKwX30MfEAcCyOjVt*5WeFSK3l2ZahtEaK7HPFMm3INMs3r!JxI8odP9PYRHivop5ryohtMYzWZzj3gVVurcEr5Bg6eJJws7rXOggo3cR4FuKLtXwz*FVX0VWuB5*aJhRkCT1GZn*L5Pxzsm9X; domain=.live.com;path=/;HTTPOnly= ;version=1,MSNPPAuth=CiGSMoUOx4gej8yQkdFBvN!gvffvAhCPeWydcrAbcg!O2lrhVb4gruWSX5NZCBPsyrtZKmHLhRLTUUIxxPA7LIhqW5TCV*YcInlG2f5hBzwzHt!PORYbg79nCkvw65LKG399gRGtJ4wvXdNlhHNldkBK1jVXD4PoqO1Xzdcpv4sj68U6!oGrNK5KgRSMXXpLJmCeehUcsRW1NmInqQXpyanjykpYOcZy0vq!6PIxkj3gMaAvm!1vO58gXM9HX9dA0GloNmCDnRv4qWDV2XKqEKp!A7jiIMWTmHup1DZ!*YCtDX3nUVQ1zAYSMjHmmbMDxRJECz!1XEwm070w16Y40TzuKAJVugo!pyF!V2OaCsLjZ9tdGxGwEQRyi0oWc*Z7M0FBn8Fz0Dh4DhCzl1NnGun9kOYjK5itrF1Wh17sT!62ipv1vI8omeu0cVRww2Kv!qM*LFgwGlPOnNHj3*VulQOuaoliN4MUUxTA4owDubYZoKAwF*yp7Mg3zq5Ds2!l9Q$$; domain=.live.com;path=/;HTTPOnly= ;version=1,MH=MSFT; domain=.live.com;path=/;version=1,MHW=; expires=Thu, 30-Oct-1980 16:00:00 GMT;domain=.live.com;path=/;version=1,MHList=; expires=Thu, 30-Oct-1980 16:00:00 GMT;domain=.live.com;path=/;version=1,NAP=V=1.9&E=bea&C=zfjCKKBD0TqjZlWGgRTp__NiK08Lme_0XFaiKPaWJ0HDuMi2uCXafQ&W=1;domain=.live.com;path=/,ANON=A=DE389D4D076BF47BCAE4DC05FFFFFFFF&E=c44&W=1;domain=.live.com;path=/,MSPVis=$9;domain=login.live.com;path=/,pres=; expires=Thu, 30-Oct-1980 16:00:00 GMT;domain=.live.com;path=/;version=1,LOpt=0; domain=login.live.com;path=/;version=1,WLSSC=EgBnAQMAAAAEgAAACoAASfCD+8dUptvK4kvFO0gS3mVG28SPT3Jo9Pz2k65r9c9KrN4ISvidiEhxXaPLCSpkfa6fxH3FbdP9UmWAa9KnzKFJu/lQNkZC3rzzMcVUMjbLUpSVVyscJHcfSXmpGGgZK4ZCxPqXaIl9EZ0xWackE4k5zWugX7GR5m/RzakyVIzWAFwA1gD9vwYA7Vazl9QKMk/UCjJPECcAAAoQoAAAFwBjcmlmYW4yMDAzQGhvdG1haWwuY29tAE8AABZjcmlmYW4yMDAzQGhvdG1haWwuY29tAAAACUNOAAYyMTM1OTIAAAZlCAQCAAB3F21AAARDAAR0aWFuAAR3YW5nBMgAAUkAAAAAAAAAAAAAAaOKNpqLi/UAANQKMk/Uf0RPAAAAAAAAAAAAAAAADgA1OC4yNDAuMjM2LjE5AAUAAAAAAAAAAAAAAAABBAABAAABAAABAAAAAAAAAAA=; domain=.live.com;secure= ;path=/;HTTPOnly= ;version=1,MSPSoftVis=@72198325083833620@:@; domain=login.live.com;path=/;version=1
此处同时支持解析那些“非正常”的Set-Cookie:
MSPRequ=/;Version=1;version<=1328770452&id=250915&co=1; path=/;version=1,MSPVis=$9; Version=1;version=1$250915;domain=login.live.com;path=/,MSPSoftVis=@72198325083833620@:@; domain=login.live.com;path=/;version=1,MSPBack=1328770312; domain=login.live.com;path=/;version=1
例 7.10. parseSetCookie 的使用范例
resp = (HttpWebResponse)req.GetResponse(); //update latest cookies gCurDomain = commLib.extractDomain(getItemsUrl); CookieCollection parsedCookies = commLib.parseSetCookie(resp.Headers["Set-Cookie"], gCurDomain); commLib.updateLocalCookies(parsedCookies, ref skydriveCookies);
另外一个例子:
resp = (HttpWebResponse)req.GetResponse(); // here resp.Cookies may be uncorrect, so parse the returned Set-Cookie to get real cookies parsedCookies = commLib.parseSetCookie(resp.Headers["Set-Cookie"], gCurDomain); commLib.updateLocalCookies(parsedCookies, ref skydriveCookies);
【已解决】又发现一个C#中解析Set-Cookie的一个bug:无故地添加cookie的path域中的例子:
HttpWebResponse addNk1Response = crl.getUrlResponse(addNk1Url, headerDict: headerDict, postDict: postDict);//<script>location.href='/add/'</script> String curDomain = crl.extractHost(addPhpUrl);//new.guguyu.com CookieCollection parsedCookies = crl.parseSetCookie(addNk1Response.Headers["Set-Cookie"], curDomain); CookieCollection curCookies = crl.getCurCookies(); crl.updateLocalCookies(parsedCookies, ref curCookies); crl.setCurCookies(curCookies);
为了更加方便使用,又添加了一个重载函数:
// parse Set-Cookie string part into cookies // leave current domain to empty, means omit the parsed cookie, which is not set its domain value public CookieCollection parseSetCookie(string setCookieStr) { return parseSetCookie(setCookieStr, ""); }
所以上述调用此函数时,也可以不指定对应的domain:
resp = (HttpWebResponse)req.GetResponse(); //update latest cookies CookieCollection parsedCookies = commLib.parseSetCookie(resp.Headers["Set-Cookie"]); commLib.updateLocalCookies(parsedCookies, ref skydriveCookies);
当然,此时要注意,domain为空的cookie,一般来说,在后续的http的请求中,往往都是由于domain不匹配,而变成无效的cookie。
所以此处你需要知道自己在干什么,搞清楚了,再去使用此不指定domain的版本的parseSetCookie。
//parse Javascript string "$Cookie.setCookie(XXX);" to a cookie // input example: //$Cookie.setCookie('wla42','cHJveHktYmF5LnB2dC1jb250YWN0cy5tc24uY29tfGJ5MioxLDlBOEI4QkY1MDFBMzhBMzYsMSwwLDA=','live.com','/',new Date(1328842189083.44),1); //$Cookie.setCookie('wla42','YnkyKjEsOUE4QjhCRjUwMUEzOEEzNiwwLCww','live.com','/',new Date(1329198041411.84),1); //$Cookie.setCookie('wla42', 'YnkyKjEsOUE4QjhCRjUwMUEzOEEzNiwwLCww', 'live.com', '/', new Date(1329440307389.9), 1); //$Cookie.setCookie('wla42', 'cHJveHktYmF5LnB2dC1jb250YWN0cy5tc24uY29tfGJ5MioxLDlBOEI4QkY1MDFBMzhBMzYsMSwwLDA=', 'live.com', '/', new Date(1329440307483.5), 1); //$Cookie.setCookie('wls', 'A|eyJV-t:a*nS', '.live.com', '/', null, 1); //$Cookie.setCookie('MSNPPAuth','','.live.com','/',new Date(1327971507311.9),1); public bool parseJsSetCookie(string singleSetCookieStr, out Cookie parsedCk) { bool parseOK = false; parsedCk = new Cookie(); string name = ""; string value = ""; string domain = ""; string path = ""; string expire = ""; string secure = ""; // 1=name 2=value 3=domain 4=path 5=expire 6=secure string setckP = @"\$Cookie\.setCookie\('(\w+)',\s*'(.*?)',\s*'([\w\.]+)',\s*'(.+?)',\s*(.+?),\s*(\d?)\);"; Regex setckRx = new Regex(setckP); Match foundSetck = setckRx.Match(singleSetCookieStr); if (foundSetck.Success) { name = foundSetck.Groups[1].ToString(); value = foundSetck.Groups[2].ToString(); domain = foundSetck.Groups[3].ToString(); path = foundSetck.Groups[4].ToString(); expire = foundSetck.Groups[5].ToString(); secure = foundSetck.Groups[6].ToString(); // must: name valid and domain is not null if (isValidCookieName(name) && (domain != "")) { parseOK = true; parsedCk.Name = name; parsedCk.Value = value; parsedCk.Domain = domain; parsedCk.Path = path; // note, here even parse expire field fail //do not consider it must fail to parse the whole cookie if (expire.Trim() == "null") { // do nothing } else { DateTime expireTime; if (parseJsNewDate(expire, out expireTime)) { parsedCk.Expires = expireTime; } } if (secure == "1") { parsedCk.Secure = true; } else { parsedCk.Secure = false; } }//if (isValidCookieName(name) && (domain != "")) }//foundSetck.Success return parseOK; }
//check whether a cookie is expired //if expired property is set, then just return it value //if not set, check whether is a session cookie, if is, then not expired //if expires is set, check its real time is expired or not public bool isCookieExpired(Cookie ck) { bool isExpired = false; if ((ck != null) && (ck.Name != "")) { if (ck.Expired) { isExpired = true; } else { DateTime initExpiresValue = (new Cookie()).Expires; DateTime expires = ck.Expires; if (expires.Equals(initExpiresValue)) { // expires is not set, means this is session cookie, so here no expire } else { // has set expire value if (DateTime.Now.Ticks > expires.Ticks) { isExpired = true; } } } } else { isExpired = true; } return isExpired; }
例 7.12. isCookieExpired 的使用范例
//extract cookies for upload file cookiesForUploadFile = new CookieCollection(); foreach (Cookie ck in skydriveCookies) { if ((ck.Domain == constDomainLiveCom) && (!commLib.isCookieExpired(ck))) { Cookie ckToAdd = new Cookie(ck.Name, ck.Value, ck.Path, ck.Domain); ckToAdd.HttpOnly = ck.HttpOnly; ckToAdd.Expires = ck.Expires; ckToAdd.Secure = ck.Secure; ckToAdd.Version = ck.Version; cookiesForUploadFile.Add(ckToAdd); } } //!!! if not seperatly set new domain value, then will overwirtten the original domain of cookie in skydriveCookies foreach (Cookie ckNew in cookiesForUploadFile) { ckNew.Domain = constDomainUsersStorageLive; }
//add a single cookie to cookies, if already exist, update its value public void addCookieToCookies(Cookie toAdd, ref CookieCollection cookies, bool overwriteDomain) { bool found = false; if (cookies.Count > 0) { foreach (Cookie originalCookie in cookies) { if (originalCookie.Name == toAdd.Name) { // !!! for different domain, cookie is not same, // so should not set the cookie value here while their domains is not same // only if it explictly need overwrite domain if ((originalCookie.Domain == toAdd.Domain) || ((originalCookie.Domain != toAdd.Domain) && overwriteDomain)) { //here can not force convert CookieCollection to HttpCookieCollection, //then use .remove to remove this cookie then add // so no good way to copy all field value originalCookie.Value = toAdd.Value; originalCookie.Domain = toAdd.Domain; originalCookie.Expires = toAdd.Expires; originalCookie.Version = toAdd.Version; originalCookie.Path = toAdd.Path; //following fields seems should not change //originalCookie.HttpOnly = toAdd.HttpOnly; //originalCookie.Secure = toAdd.Secure; found = true; break; } } } } if (!found) { if (toAdd.Domain != "") { // if add the null domain, will lead to follow req.CookieContainer.Add(cookies) failed !!! cookies.Add(toAdd); } } }//addCookieToCookies //add singel cookie to cookies, default no overwrite domain public void addCookieToCookies(Cookie toAdd, ref CookieCollection cookies) { addCookieToCookies(toAdd, ref cookies, false); }
例 7.13. addCookieToCookies 的使用范例
//ref CookieCollection localCookies foreach (Cookie newCookie in cookiesToUpdate) { if (isContainCookie(newCookie, omitUpdateCookies)) { // need omit process this } else { addCookieToCookies(newCookie, ref localCookies); } }
//check whether the cookies contains the ckToCheck cookie //support: //ckTocheck is Cookie/string //cookies is Cookie/string/CookieCollection/string[] public bool isContainCookie(object ckToCheck, object cookies) { bool isContain = false; if ((ckToCheck != null) && (cookies != null)) { string ckName = ""; Type type = ckToCheck.GetType(); //string typeStr = ckType.ToString(); //if (ckType.FullName == "System.string") if (type.Name.ToLower() == "string") { ckName = (string)ckToCheck; } else if (type.Name == "Cookie") { ckName = ((Cookie)ckToCheck).Name; } if (ckName != "") { type = cookies.GetType(); // is single Cookie if (type.Name == "Cookie") { if (ckName == ((Cookie)cookies).Name) { isContain = true; } } // is CookieCollection else if (type.Name == "CookieCollection") { foreach (Cookie ck in (CookieCollection)cookies) { if (ckName == ck.Name) { isContain = true; break; } } } // is single cookie name string else if (type.Name.ToLower() == "string") { if (ckName == (string)cookies) { isContain = true; } } // is cookie name string[] else if (type.Name.ToLower() == "string[]") { foreach (string name in ((string[])cookies)) { if (ckName == name) { isContain = true; break; } } } } } return isContain; }//isContainCookie
例 7.14. isContainCookie 的使用范例
foreach (Cookie newCookie in cookiesToUpdate) { if (isContainCookie(newCookie, omitUpdateCookies)) { // need omit process this } else { addCookieToCookies(newCookie, ref localCookies); } }
主要用于管理本地Cookie。
比如提交某http请求后,返回一些cookie,然后加入到本地Cookies数组变量中,用于后续使用。
// update cookiesToUpdate to localCookies // if omitUpdateCookies designated, then omit cookies of omitUpdateCookies in cookiesToUpdate public void updateLocalCookies(CookieCollection cookiesToUpdate, ref CookieCollection localCookies, object omitUpdateCookies) { if (cookiesToUpdate.Count > 0) { if (localCookies == null) { localCookies = cookiesToUpdate; } else { foreach (Cookie newCookie in cookiesToUpdate) { if (isContainCookie(newCookie, omitUpdateCookies)) { // need omit process this } else { addCookieToCookies(newCookie, ref localCookies); } } } } }//updateLocalCookies //update cookiesToUpdate to localCookies public void updateLocalCookies(CookieCollection cookiesToUpdate, ref CookieCollection localCookies) { updateLocalCookies(cookiesToUpdate, ref localCookies, null); }
例 7.15. updateLocalCookies 的使用范例
resp = (HttpWebResponse)req.GetResponse(); updateLocalCookies(resp.Cookies, ref curCookies);
// given a cookie name ckName, get its value from CookieCollection cookies public bool getCookieVal(string ckName, ref CookieCollection cookies, out string ckVal) { //string ckVal = ""; ckVal = ""; bool gotValue = false; foreach (Cookie ck in cookies) { if (ck.Name == ckName) { gotValue = true; ckVal = ck.Value; break; } } return gotValue; }
// serialize an object to string public bool serializeObjToStr(Object obj, out string serializedStr) { bool serializeOk = false; serializedStr = ""; try { MemoryStream memoryStream = new MemoryStream(); BinaryFormatter binaryFormatter = new BinaryFormatter(); binaryFormatter.Serialize(memoryStream, obj); serializedStr = System.Convert.ToBase64String(memoryStream.ToArray()); serializeOk = true; } catch { serializeOk = false; } return serializeOk; }
例 8.1. serializeObjToStr 的使用范例
[Serializable] public struct loginInfo_t { public bool valid; public string username; public string cid; public string appid; public string bitProtocol; public string canary; public CookieCollection cookies; public DateTime createdTime; // record the login info(cookie) create time public DateTime lastUpldateTime;// last update the login info(cookie)'s time }; private bool updateLoginInfo(skydrive.loginInfo_t loginInfo) { bool updateOk = false; string serializedStr = ""; loginInfo.lastUpldateTime = DateTime.Now; if (skydrive.commLib.serializeObjToStr(loginInfo, out serializedStr)) { Settings.Default.loginInfoStr = serializedStr; Settings.Default.Save(); updateOk = true; }
// deserialize the string to an object public bool deserializeStrToObj(string serializedStr, out object deserializedObj) { bool deserializeOk = false; deserializedObj = null; try { byte[] restoredBytes = System.Convert.FromBase64String(serializedStr); MemoryStream restoredMemoryStream = new MemoryStream(restoredBytes); BinaryFormatter binaryFormatter = new BinaryFormatter(); deserializedObj = binaryFormatter.Deserialize(restoredMemoryStream); deserializeOk = true; } catch { deserializeOk = false; } return deserializeOk; }
例 8.2. deserializeStrToObj 的使用范例
//restore login info object deserializedObj = null; if (skydrive.commLib.deserializeStrToObj(Settings.Default.loginInfoStr, out deserializedObj)) { loginInfo = (skydrive.loginInfo_t)deserializedObj;
目录
此处是和网络(Http等)有关的函数
/* set proxy * Note: * 1. current only support http proxy * 2. current only support single proxy */ public void setProxy(string proxyIp, int proxyPort) { gProxy = new WebProxy(proxyIp, proxyPort); }
然后后续的(去用getUrlRespHtml等等)去访问网络,就会自动使用该代理了。
/* * Note: currently support auto handle cookies * currently only support single caller -> multiple caller of these functions will cause cookies accumulated * you can clear previous cookies to avoid unexpected result by call clearCurCookies */ public void clearCurCookies() { if (curCookies != null) { curCookies = null; curCookies = new CookieCollection(); } }
例 9.2. clearCurCookies 的使用范例
//http://www.crifan.com/example_of_how_to_use_ie9_f12_to_capture_the_real_music_mp3_address_of_some_songtaste_musc/ // here must clear previous cookies // otherwise access html with previous cookies will get fault html: //信息提示: 对不起,该用户不存在! 3 秒钟以后系统将自动跳转! crl.clearCurCookies(); string respHtml = ""; respHtml = crl.getUrlRespHtml(songInfo.url, stHtmlCharset);
另外InsertSkydriveFiles中的一个例子:
private void clearGolobalValues() { //gCurDomain = ""; skydriveCookies = null; commLib.clearCurCookies();
/* get current cookies */ public CookieCollection getCurCookies() { return curCookies; }
例 9.3. getCurCookies 的使用范例
string primeRespHtml = getSkydriveRespHtmlLogin(ref resp); skydriveCookies = getCurCookies();
另外【已解决】又发现一个C#中解析Set-Cookie的一个bug:无故地添加cookie的path域中的一个例子:
crl = new crifanLib(); HttpWebResponse addNk1Response = crl.getUrlResponse(addNk1Url, headerDict: headerDict, postDict: postDict);//<script>location.href='/add/'</script> String curDomain = crl.extractHost(addPhpUrl);//new.guguyu.com CookieCollection parsedCookies = crl.parseSetCookie(addNk1Response.Headers["Set-Cookie"], curDomain); CookieCollection curCookies = crl.getCurCookies(); crl.updateLocalCookies(parsedCookies, ref curCookies); crl.setCurCookies(curCookies);
主要用于,重置当前的cookie,设置为所需的状态。
/* set current cookies */ public void setCurCookies(CookieCollection cookies) { curCookies = cookies; }
例 9.4. setCurCookies 的使用范例
skydriveCookies = new CookieCollection(); skydriveCookies = loginInfo.cookies; setCurCookies(skydriveCookies);
另外【已解决】又发现一个C#中解析Set-Cookie的一个bug:无故地添加cookie的path域中的一个例子:
crl = new crifanLib(); HttpWebResponse addNk1Response = crl.getUrlResponse(addNk1Url, headerDict: headerDict, postDict: postDict);//<script>location.href='/add/'</script> String curDomain = crl.extractHost(addPhpUrl);//new.guguyu.com CookieCollection parsedCookies = crl.parseSetCookie(addNk1Response.Headers["Set-Cookie"], curDomain); CookieCollection curCookies = crl.getCurCookies(); crl.updateLocalCookies(parsedCookies, ref curCookies); crl.setCurCookies(curCookies);
/* get url's response * */ public HttpWebResponse getUrlResponse(string url, Dictionary<string, string> headerDict = defHeaderDict, Dictionary<string, string> postDict = defPostDict, int timeout = defTimeout, string postDataStr = defPostDataStr, int readWriteTimeout = defReadWriteTimeout) { #if USE_GETURLRESPONSE_BW //BackgroundWorker Version getUrlResponse HttpWebResponse localCurResp = null; getUrlResponse_bw(url, headerDict, postDict, timeout, postDataStr, readWriteTimeout); while (bNotCompleted_resp) { System.Windows.Forms.Application.DoEvents(); } localCurResp = gCurResp; //clear gCurResp = null; return localCurResp; #else //non-BackgroundWorker Version getUrlResponse return _getUrlResponse(url, headerDict, postDict, timeout, postDataStr);; #endif }
从上面的代码中可以看出,此处的getUrlResponse内部的实现,是依赖于是否设置宏USE_GETURLRESPONSE_BW,而去调用对应的BackgroundWorker版本的,还是非BackgroundWorker版本的_getUrlResponse
此处,getUrlResponse,是用来返回HttpWebResponse的,且支持N多参数。
下面就对于getUrlResponse的各个参数,进行详细解释一下:
headerDict的意思是,header的dict,即用于存放对应的header信息
默认的headerDict的值为defHeaderDict
defHeaderDict值是null:
private const Dictionary<string, string> defHeaderDict = null;
作用是,当不指定对应的header信息时,默认为空
常见用法中,一般也不需要指定此headerDict
当然,有时候,需要用到一些header,比如其中最最常见的referer等等。
postDict即POST的dict,用于存放post数据。
默认的postDict的值为defPostDict
defPostDict值是null:
private const Dictionary<string, string> defPostDict = null;
一般的GET时,无需指定此参数。
只有当是POST时,才可能会用到此postDict。
timeout用于指定网络超时的最大允许时间,单位是毫秒ms。
默认的timeout的值为defTimeout
defTimeout值是30000毫秒==30秒:
private const int defTimeout = 30 * 1000;
注意,此timeout,是针对于http网络发送请求后,得到服务器的响应之前,这段时间,是否超时,即和GetResponse和GetRequestStream有关。
一般来说,也不需要设置此timeout,即无需改变对应的默认超时时间。
当然,如果有需要,可以根据你自己的情况修改为更合适的值。
postDataStr是用来传递,特殊的POST的数据是以回车为分隔符的那些POST数据的。
postDataStr的默认值为defPostDataStr
defPostDataStr值也是null:
private const string defPostDataStr = null;
需要注意的是,如果是GET,很明显无需关系此参数,而如果是POST,正常情况下,也只需要去设置对应的postDict参数即可,对应的内部处理POST数据,都是以'&'为分隔符的。
但是,有些特殊的POST,POST的数据是以回车为分隔符的,比如之前折腾【记录】给BlogsToWordPress添加支持导出网易的心情随笔时遇到这种特殊情况,此时,才需要你用到此去设置postDataStr
readWriteTimeout指的是,针对于获得了response后,用SteamReader去read或write时,对应的超时时间。单位是毫秒ms。
readWriteTimeout的默认值是defReadWriteTimeout
defReadWriteTimeout值是30000毫秒==30秒:
private const int defReadWriteTimeout = 30 * 1000;
注意,参考微软官网的解释:HttpWebRequest.ReadWriteTimeout 属性 其默认的ReadWriteTimeout是300秒=5分钟,太长了。
所以,此处才把默认时间改短一些的,否则,5分钟的超时时间,太长了。
此参数,是经过多次折腾后,才搞明白的,详见:【已解决】C#中在GetResponseStream得到的Stream后,通过StreamReader去ReadLine或ReadToEnd会无限期挂掉 + 给StreamReader添加Timeout支持
getUrlResponse参数太多,但是其实也是自己一点点,从无到有,加进去的,以适应各种应用需求。
此处,就来通过例子来说明,如何使用此getUrlResponse函数。
其实,此处的getUrlResponse,在绝大多数的时候,都是被,我的另外一个函数:getUrlRespHtml,所调用的。
即,getUrlRespHtml,调用,getUrlResponse,获得对应的HttpWebResponse,然后后续再处理,得到返回的html的。
所以,用起来,一般都是这样的:
例 9.5. getUrlResponse 的使用范例:被getUrlRespHtml调用
// valid charset:"GB18030"/"UTF-8", invliad:"UTF8" public string getUrlRespHtml(string url, Dictionary<string, string> headerDict = defHeaderDict, string charset = defCharset, Dictionary<string, string> postDict = defPostDict, int timeout = defTimeout, string postDataStr = defPostDataStr, int readWriteTimeout = defReadWriteTimeout) { string respHtml = ""; HttpWebResponse resp = getUrlResponse(url, headerDict, postDict, timeout, postDataStr, readWriteTimeout);
关于此种用法,更详细的代码和解释,参见下面要介绍的:第 9.6 节 “获得Url地址返回的网页内容:getUrlRespHtml”
getUrlResponse的相对次要的用法是:当有时候,不仅仅需要html,而且也要关心和处理HttpWebResponse时,此时,才会考虑直接调用getUrlResponse(而不是去调用getUrlRespHtml)
而直接使用getUrlResponse的话,相对简单的用法就是,只传入对应的url即可:
例 9.6. getUrlResponse 的使用范例:只传入url
const string constSkydriveUrl = "https://skydrive.live.com/"; HttpWebResponse resp = getUrlResponse(constSkydriveUrl);
// valid charset:"GB18030"/"UTF-8", invliad:"UTF8" public string getUrlRespHtml(string url, Dictionary<string, string> headerDict = defHeaderDict, string charset = defCharset, Dictionary<string, string> postDict = defPostDict, int timeout = defTimeout, string postDataStr = defPostDataStr, int readWriteTimeout = defReadWriteTimeout) { string respHtml = ""; HttpWebResponse resp = getUrlResponse(url, headerDict, postDict, timeout, postDataStr, readWriteTimeout); //long realRespLen = resp.ContentLength; if (resp != null) { StreamReader sr; Stream respStream = resp.GetResponseStream(); if (!string.IsNullOrEmpty(charset)) { Encoding htmlEncoding = Encoding.GetEncoding(charset); sr = new StreamReader(respStream, htmlEncoding); } else { sr = new StreamReader(respStream); } try { respHtml = sr.ReadToEnd(); //while (!sr.EndOfStream) //{ // respHtml = respHtml + sr.ReadLine(); //} //string curLine = ""; //while ((curLine = sr.ReadLine()) != null) //{ // respHtml = respHtml + curLine; //} ////http://msdn.microsoft.com/zh-cn/library/system.io.streamreader.peek.aspx //while (sr.Peek() > -1) //while not error or not reach end of stream //{ // respHtml = respHtml + sr.ReadLine(); //} //respStream.Close(); //sr.Close(); //resp.Close(); } catch (Exception ex) { //【未解决】C#中StreamReader中遇到异常:未处理ObjectDisposedException,无法访问已关闭的流 //http://www.crifan.com/csharp_streamreader_unhandled_exception_objectdisposedexception_cannot_access_closed_stream //System.ObjectDisposedException respHtml = ""; } finally { if (respStream != null) { respStream.Close(); } if (sr != null) { sr.Close(); } if (resp != null) { resp.Close(); } } } return respHtml; }
很明显可以看出,此处的getUrlRespHtml的很多参数,和前面介绍的第 9.5 节 “获得Url地址的响应:getUrlResponse”非常类似。
此处,针对于getUrlRespHtml的参数,也要再解释一下:
其他参数,包括url,headerDict,postDict,timeout,postDataStr,readWriteTimeout,都和getUrlResponse的参数含义相同。所以不再赘述。
另外还有参数,需要解释一下:
charset表示返回的网页内容,用何种字符编码去解码。
charset默认值是defCharset
defCharset的值是:
private const string defCharset = null;
此处,之所以defCharset的值,不是我们所常见的GBK,UTF-8等等,是因为此处是为了支持,当不设置charset时,就不去尝试用某种编码去解码通过StreamReader所读取出来的内容。
这样的就可以获得,原始的,返回的html,可以供有需要的人,后期进行自己的处理,比如自己去解码等等。
getUrlRespHtml内部,已经实现了足够多的,相对比较复杂的功能,对此,需要详细解释一下:
getUrlRespHtml内部调用getUrlResponse,内部已经加上了对应的User-Agent了。
当然默认使用的是IE8的User-Agent,相关部分的代码是:
//IE7 const string constUserAgent_IE7_x64 = "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; WOW64; Trident/5.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; InfoPath.3; .NET4.0C; .NET4.0E)"; //IE8 const string constUserAgent_IE8_x64 = "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; InfoPath.3; .NET4.0C; .NET4.0E"; //IE9 const string constUserAgent_IE9_x64 = "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)"; // x64 const string constUserAgent_IE9_x86 = "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)"; // x86 //Chrome const string constUserAgent_Chrome = "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/533.4 (KHTML, like Gecko) Chrome/5.0.375.99 Safari/533.4"; //Mozilla Firefox const string constUserAgent_Firefox = "Mozilla/5.0 (Windows; U; Windows NT 6.1; rv:1.9.2.6) Gecko/20100625 Firefox/3.6.6"; private string gUserAgent; gUserAgent = constUserAgent_IE8_x64; req.UserAgent = gUserAgent;
所以,不会出现,被服务器当做普通的机器人或蜘蛛爬虫的情况。
内部相关代码:
req.AllowAutoRedirect = true;
默认是启用了自动跳转的。
如果想要禁止自动跳转,可以去给headerDict中加上对应的"AllowAutoRedirect"为"false"的参数
更多使用实例,详见后续的例子。
内部相关代码:
req.Headers["Accept-Encoding"] = "gzip, deflate"; //req.AutomaticDecompression = DecompressionMethods.GZip; req.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;
即前面所解释的参数:第 9.5.1.4 节 “getUrlResponse的参数:timeout”,指的是网络方面的超时,和GetResponse和GetRequestStream有关
内部相关部分的代码是:
if (timeout > 0) { req.Timeout = timeout; }
即前面所解释的参数:第 9.5.1.6 节 “getUrlResponse的参数:readWriteTimeout”,指的是StreamReader或StreamWriter的读写超时,和readLine之类的有关。
内部相关部分的代码是:
if (readWriteTimeout > 0) { //default ReadWriteTimeout is 300000=300 seconds = 5 minutes !!! //too long, so here change to 300000 = 30 seconds //for support TimeOut for later StreamReader's ReadToEnd req.ReadWriteTimeout = readWriteTimeout; }
此处已经支持,getUrlRespHtml内部,自动处理cookie。
内部相关部分的代码是:
CookieCollection curCookies = null; curCookies = new CookieCollection(); if (curCookies != null) { req.CookieContainer = new CookieContainer(); req.CookieContainer.PerDomainCapacity = 40; // following will exceed max default 20 cookie per domain req.CookieContainer.Add(curCookies); } resp = (HttpWebResponse)req.GetResponse(); updateLocalCookies(resp.Cookies, ref curCookies);
其中,注意到,设置了最大支持40个cookie,是因为,之前折腾InsertSkydriveFiles期间,遇到相对极端的情况:cookie超过默认的20多个,一个CookieContainer都装不下了,所以才改为40个,以便支持如此多的cookie。
getUrlRespHtml的参数够多,用法,也有很多种。
此处,就来通过例子来说明,如何使用此getUrlResponse函数。
getUrlRespHtml最常用,也是最简单的用法,就是:直接传入url,然后获得返回的html
代码如下:
例 9.7. getUrlRespHtml用法示例:只传入url而获得html
string mainJsUrl = "http://image.songtaste.com/inc/main.js"; string respHtmlMainJs = getUrlRespHtml(mainJsUrl);
其中,getUrlRespHtml内部,会自动帮你处理各种细节,比如cookie,header中的User-Agent等等内容,而你就直接可以得到对应返回的html了。
很多时候,在折腾抓取网页和模拟登陆时,都会遇到,需要额外再指定一些header,用于实现一些特定的目的。
比如,添加对应的Referer,以便成功模拟网页逻辑,获得所需返回的内容的:
string tmpRespHtml = ""; Dictionary<string, string> headerDict; //(1)to get cookies string pageRankMainUrl = "http://pagerank.webmasterhome.cn/"; tmpRespHtml = getUrlRespHtml(pageRankMainUrl); //(2)ask page rank string firstBaseUrl = "http://pagerank.webmasterhome.cn/?domain="; //http://pagerank.webmasterhome.cn/?domain=answers.yahoo.com string firstWholeUrl = firstBaseUrl + noHttpPreDomainUrl; headerDict = new Dictionary<string, string>(); headerDict.Add("referer", pageRankMainUrl); tmpRespHtml = getUrlRespHtml(firstWholeUrl, headerDict: headerDict);
如第 9.6.2.2 节 “默认是允许自动跳转的”所述,默认是启用了自动跳转的,想要禁止自动跳转,可以通过header去设置:
Dictionary<string, string> headerDict = new Dictionary<string, string>(); headerDict.Add("AllowAutoRedirect", "false"); string respHtml = getUrlRespHtml(yourUrl, headerDict: headerDict);
此处默认的Accept是"*/*",如果想要指定不同的类型,可以手动通过header去设置:
Dictionary<string, string> headerDict = new Dictionary<string, string>(); headerDict.Add("Accept", "text/html"); string respHtml = getUrlRespHtml(yourUrl, headerDict: headerDict);
关于Accept更多可能的取值,自己参考官网的解释:14.1 Accept
此处默认的KeepAlive是true的,如果不想继续保持连接,则可以通过header去禁止:
Dictionary<string, string> headerDict = new Dictionary<string, string>(); headerDict.Add("Keep-Alive", "false"); string respHtml = getUrlRespHtml(yourUrl, headerDict: headerDict);
此处默认没有指定Accept-Language,有需要的话,可以去通过header设置:
Dictionary<string, string> headerDict = new Dictionary<string, string>(); headerDict.Add("Accept-Language", "en-US"); //"zh-CN" string respHtml = getUrlRespHtml(yourUrl, headerDict: headerDict);
关于Accept-Language更多可能的取值,自己参考官网的解释:14.4 Accept-Language
如第 9.6.2.1 节 “内部已默认指定了IE8的User-Agent”所述,我此处的getUrlRespHtml,默认添加的User-Agent是IE8的。
如果有需要,你可以自己换成别的,比如Firefox的User-Agent:
//Mozilla Firefox const string constUserAgent_Firefox = "Mozilla/5.0 (Windows; U; Windows NT 6.1; rv:1.9.2.6) Gecko/20100625 Firefox/3.6.6"; Dictionary<string, string> headerDict = new Dictionary<string, string>(); headerDict.Add("User-Agent", constUserAgent_Firefox); string respHtml = getUrlRespHtml(yourUrl, headerDict: headerDict);
其中,关于各种浏览器的User-Agent,你可以自己去网络上找到。也可以参考我代码中的值:
//IE7 const string constUserAgent_IE7_x64 = "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; WOW64; Trident/5.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; InfoPath.3; .NET4.0C; .NET4.0E)"; //IE8 const string constUserAgent_IE8_x64 = "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; InfoPath.3; .NET4.0C; .NET4.0E"; //IE9 const string constUserAgent_IE9_x64 = "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)"; // x64 const string constUserAgent_IE9_x86 = "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)"; // x86 //Chrome const string constUserAgent_Chrome = "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/533.4 (KHTML, like Gecko) Chrome/5.0.375.99 Safari/533.4"; //Mozilla Firefox const string constUserAgent_Firefox = "Mozilla/5.0 (Windows; U; Windows NT 6.1; rv:1.9.2.6) Gecko/20100625 Firefox/3.6.6";
此处默认情况下,对于GET,没有指定ContentType,对于POST,已经指定了"application/x-www-form-urlencoded"。
如果你有别的特殊需求,需要设置ContentType的话,可以去通过header设置:
Dictionary<string, string> headerDict = new Dictionary<string, string>(); headerDict.Add("Content-Type", "text/plain"); string respHtml = getUrlRespHtml(yourUrl, headerDict: headerDict);
关于Content-Type更多可能的取值,自己参考官网的解释:14.17 Content-Type
在很多时候,都需要设置,某些其他的,非标准的,header信息,则也可以去通过header设置。
比如,之前折腾InsertSkydriveFiles时所用到的:
string createFolerUrl = "https://skydrive.live.com/API/2/AddFolder?lct=1"; Dictionary<string, string> headerDict = new Dictionary<string, string>(); headerDict.Add("Accept", "application/json"); headerDict.Add("Referer", constSkydriveUrl); headerDict.Add("Canary", gCanary); headerDict.Add("Appid", gAppid); headerDict.Add("X-Requested-With", "XMLHttpRequest"); headerDict.Add("Cache-Control", "no-cache"); string postDataStr = genCreateFolderPostData(folderName, parentId, cid); respJson = getUrlRespHtml(createFolerUrl, headerDict:headerDict, postDataStr:postDataStr);
有时候,已经网页是某种编码的,所以为了正确解析返回的html,需要指定对应的字符编码charset:
string songtasteUserUrl = "http://www.songtaste.com/user/351979/"; string songtasteHtmlCharset = "GB18030"; string respHtmlUnicode = getUrlRespHtml(songtasteUserUrl, charset:songtasteHtmlCharset);
即可返回对应的,已经解码后的,Unicode字符串了。
如果你觉得默认的网络超时时间30秒不合适,可以自己另外指定,比如:
int timeoutInMilliSec = 10 * 1000; string respHtml = getUrlRespHtml(someUrl, timeout:timeoutInMilliSec);
如果你觉得默认的Stream的读写超时时间30秒不合适,可以自己另外指定,比如:
int streamRdWrTimeout = 20 * 1000; string respHtml = getUrlRespHtml(someUrl, readWriteTimeout:streamRdWrTimeout);
在模拟登陆时,往往会用到POST,会传递对应的POST数据
此处,主要有两种方式传递POST数据:
一般都是通过postDict传递数据进去
然后内部通过quoteParas转换为对应的post data,是以"&"为分隔符的。
个别情况下,特殊的情况下,会用到此postDataStr
其传递的post数据,是以换行为分隔符的。此时需要,不设置postDict(默认为null),然后设置对应的postDataStr即可。
下面,针对两种情况,都给出对应的多个示例来说明如何使用:
比如,之前折腾:第 9.11 节 “查找获得域名的Page Rank:getDomainPageRank”时所用到的:
//Method 1: use http://www.pagerankme.com/ queryUrl = "http://www.pagerankme.com/"; postDict = new Dictionary<string, string>(); postDict.Add("url", domainUrl); respHtml = getUrlRespHtml(queryUrl, postDict: postDict);
比如,之前折腾:DownloadSongtasteMusic时所用到的:
const string stHtmlCharset = "GB18030"; Dictionary<string, string> headerDict = new Dictionary<string, string>(); headerDict.Add("x-requested-with", "XMLHttpRequest"); // when click play // access http://songtaste.com/time.php, post data: //str=5bf271ccad05f95186be764f725e9aaf07e0c7791a89123a9addb2a239179e64c91834c698a9c5d82f1ced3fe51ffc51&sid=3015123&t=0 Dictionary<string, string> postDict = new Dictionary<string, string>(); postDict.Add("str", str); postDict.Add("sid", sid); postDict.Add("t", "0"); string getRealAddrUrl = "http://songtaste.com/time.php"; songInfo.realAddr = crl.getUrlRespHtml(getRealAddrUrl, headerDict:headerDict, postDict:postDict, charset:stHtmlCharset);
比如,之前折腾:【未解决】通过百度API上传单个文件出现403的错误时所遇到的就是,post数据是以换行符非分隔符的,所以就要去直接设置对应的postDataStr:
string[] token = respTokenJson.Split(','); string tokenStr = token[2].Split(':')[1].Trim('"'); byte[] fileBytes = null; string filename = "fileForUpload2.txt"; string fullFilePath = @"d:\" + filename; using (FileStream fs = new FileStream(fullFilePath, FileMode.Open)) { fileBytes = new byte[fs.Length]; fs.Read(fileBytes, 0, fileBytes.Length); } StringBuilder buffer = new StringBuilder(); char[] fileCh = new char[fileBytes.Length]; for (int i = 0; i < fileBytes.Length; i++) fileCh[i] = (char)fileBytes[i]; buffer.Append(fileCh); //postDict = new Dictionary<string, string>(); //postDict.Add("file", buffer.ToString()); string postDataStr = buffer.ToString(); string uploadSingleFileUrl = "https://pcs.baidu.com/rest/2.0/pcs/file?"; Dictionary<string, string> queryParaDict = new Dictionary<string, string>(); queryParaDict.Add("method", "upload"); queryParaDict.Add("access_token", tokenStr); queryParaDict.Add("path", "/apps/测试应用/" + filename); uploadSingleFileUrl += crifanLib.quoteParas(queryParaDict); curCookies = crifanLib.getCurCookies(); newCookies = new CookieCollection(); foreach (Cookie ck in curCookies) { if (ck.Name == "BAIDUID" || ck.Name == "BDUSS") { ck.Domain = "pcs.baidu.com"; } newCookies.Add(ck); } crifanLib.setCurCookies(newCookies); string boundaryValue = "----WebKitFormBoundaryS0JIa4uHF7yHd8xJ"; string boundaryExpression = "boundary=" + boundaryValue; headerDict = new Dictionary<string, string>(); headerDict.Add("Pragma", "no-cache"); headerDict.Add("Content-Type", "multipart/form-data;" + " " + boundaryExpression); postDataStr = boundaryValue + "\r\n" + "Content-Disposition: form-data; name=\"file\"" + "\r\n" + postDataStr + "\r\n" + boundaryValue; //string str = crifanLib.getUrlRespHtml( // string.Format(@"https://pcs.baidu.com/rest/2.0/pcs/file?method=upload&path=%2Fapps%2F%E6%B5%8B%E8%AF%95%E5%BA%94%E7%94%A8%2F78.jpg&access_token={0}", tokenStr), // headerDict, postDict); string respJson = crifanLib.getUrlRespHtml(uploadSingleFileUrl, headerDict:headerDict, postDataStr: postDataStr);
比如,之前折腾:【记录】给BlogsToWordPress添加支持导出网易的心情随笔时所遇到的就是,post数据是以换行符非分隔符的,所以就要去直接设置对应的postDataStr:
string postDataStr = "callCount=1" + "\r\n" + "scriptSessionId=${scriptSessionId}187" + "\r\n" + "c0-scriptName=BlogBeanNew" + "\r\n" + "c0-methodName=getBlogs" + "\r\n" + "c0-id=0" + "\r\n" + "c0-param0=" + "number:" + userId + "\r\n" + "c0-param1=" + "number:" + startBlogIdx + "\r\n" + "c0-param2=" + "number:" + onceGetNum; //http://api.blog.163.com/ni_chen/dwr/call/plaincall/BlogBeanNew.getBlogs.dwr string getBlogsDwrMainUrl = blogApi163 + "/" + blogUser + "/" + "dwr/call/plaincall/BlogBeanNew.getBlogs.dwr"; Dictionary<string, string> headerDict = new Dictionary<string, string>(); headerDict = new Dictionary<string, string>(); //Referer http://api.blog.163.com/crossdomain.html?t=20100205 headerDict.Add("Referer", "http://api.blog.163.com/crossdomain.html?t=20100205"); headerDict.Add("Content-Type", "text/plain"); string blogsRespHtml = getUrlRespHtml(getBlogsDwrMainUrl, headerDict:headerDict, postDataStr:postDataStr);
默认的getUrlRespHtml只允许一次,即当出错时,就返回空字符串了,就不再继续了。
此处的getUrlRespHtml_multiTry,是带多次尝试的版本。
其完整代码是:
public string getUrlRespHtml_multiTry (string url, Dictionary<string, string> headerDict = defHeaderDict, string charset = defCharset, Dictionary<string, string> postDict = defPostDict, int timeout = defTimeout, string postDataStr = defPostDataStr, int readWriteTimeout = defReadWriteTimeout, int maxTryNum = defMaxTryNum, int retryFailSleepTime = defRetryFailSleepTime) { string respHtml = ""; for (int tryIdx = 0; tryIdx < maxTryNum; tryIdx++) { respHtml = getUrlRespHtml(url, headerDict, charset, postDict, timeout, postDataStr, readWriteTimeout); if (!string.IsNullOrEmpty(respHtml)) { break; } else { //something wrong //maybe network is not stable //so wait some time, then re-do it System.Threading.Thread.Sleep(retryFailSleepTime); } } return respHtml; }
很明显可以看出,此处的getUrlRespHtml_multiTry的很多参数,和前面介绍的第 9.6 节 “获得Url地址返回的网页内容:getUrlRespHtml”非常类似。
此处,还有另外两个参数,需要解释一下:
maxTryNum表示最大(当出错时)重试次数。
maxTryNum默认值是defMaxTryNum
defMaxTryNum的值是5:
private const int defMaxTryNum = 5;
当你需要,在出错时,重试更多次,则可以修改此参数。
retryFailSleepTime表示在每次出错之后,sleep的时间。
retryFailSleepTime默认值是defRetryFailSleepTime
defRetryFailSleepTime的值是100毫秒:
private const int defRetryFailSleepTime = 100; //sleep time in ms when retry fail for getUrlRespHtml
此处,是为了,尽量适应网络不稳定等异常情况,在出错后,sleep一段时间重试,以希望实现,网络不稳定的时候,经过多次尝试,且每次错误后会sleep,达到增大网络访问成功的机会。
例 9.8. getUrlRespHtml_multiTry 的使用范例
//respHtml = crl.getUrlRespHtml(viewHtmlUrl); respHtml = crl.getUrlRespHtml_multiTry(viewHtmlUrl);
public int getUrlRespStreamBytes(ref Byte[] respBytesBuf, string url, Dictionary<string, string> headerDict, Dictionary<string, string> postDict, int timeout, Action<int> funcUpdateProgress) { int realReadoutLen = 0; getUrlRespStreamBytes_bw(ref respBytesBuf, url, headerDict, postDict, timeout, funcUpdateProgress); while (bNotCompleted_download) { System.Windows.Forms.Application.DoEvents(); } realReadoutLen = gRealReadoutLen; //clear gRealReadoutLen = 0; return realReadoutLen; }
例 9.9. getUrlRespStreamBytes 的使用范例
public bool downloadStMusicFile(string musicRealAddr, string fullnameToStore, out string errStr, Action<int> funcUpdateProgress) { bool downloadOk = false; errStr = "未知错误!"; if (musicRealAddr == null || musicRealAddr == "" || fullnameToStore == null || fullnameToStore == "") { errStr = "Songtaste歌曲真实的地址无效!"; return downloadOk; } Dictionary<string, string> headerDict = new Dictionary<string, string>(); //headerDict.Add("Referer", "http://songtaste.com/music/"); headerDict.Add("Referer", "http://songtaste.com/"); //const int maxMusicFileLen = 100 * 1024 * 1024; // 100M const int maxMusicFileLen = 300 * 1024 * 1024; // 300M Byte[] binDataBuf = new Byte[maxMusicFileLen]; int respDataLen = crl.getUrlRespStreamBytes(ref binDataBuf, musicRealAddr, headerDict, null, 0, funcUpdateProgress); if (respDataLen < 0) { errStr = "无法读取歌曲数据!"; return downloadOk; }
//----------------------------------------------------------------------------- //translate strToTranslate from fromLanguage to toLanguage //return the translated string //return empty string if error //some frequently used language abbrv: //Chinese Simplified: zh-CN //Chinese Traditional: zh-TW //English: en //German: de //Japanese: ja //Korean: ko //French: fr //more can be found at: //http://code.google.com/intl/ru/apis/language/translate/v2/using_rest.html#language-params public string translateString(string strToTranslate, string fromLanguage, string toLanguage) { string translatedStr = ""; string transRetHtml = ""; ////following refer: http://python.u85.us/viewnews-335.html //string googleTranslateUrl = "http://translate.google.cn/translate_t"; //Dictionary<string, string> postDict = new Dictionary<string, string>(); //postDict.Add("hl", "zh-CN"); //postDict.Add("ie", "UTF-8"); //postDict.Add("text", strToTranslate); //postDict.Add("langpair", fromLanguage + "|" + toLanguage); //const string googleTransHtmlCharset = "UTF-8"; //string transRetHtml = getUrlRespHtml(googleTranslateUrl, charset:googleTransHtmlCharset, postDict:postDict); ////http://translate.google.cn/#zh-CN/en/%E4%BB%96%E4%BB%AC%E6%98%AF%E8%BF%99%E6%A0%B7%E8%AF%B4%E7%9A%84 //string googleTransBaseUrl = "http://translate.google.cn/#"; //strToTranslate = "他们是这样说的"; //string encodedStr = HttpUtility.UrlEncode(strToTranslate); //string googleTransUrl = googleTransBaseUrl + fromLanguage + "/" + toLanguage + "/" + encodedStr; //string transRetHtml = getUrlRespHtml(googleTransUrl); //http://translate.google.cn/translate_a/t?client=t&text=%E4%BB%96%E4%BB%AC%E6%98%AF%E8%BF%99%E6%A0%B7%E8%AF%B4%E7%9A%84&hl=zh-CN&sl=zh-CN&tl=en&ie=UTF-8&oe=UTF-8&multires=1&ssel=0&tsel=0&sc=1 //strToTranslate = "他们是这样说的"; string encodedStr = HttpUtility.UrlEncode(strToTranslate); string googleTransBaseUrl = "http://translate.google.cn/translate_a/t?"; string googleTransUrl = googleTransBaseUrl; googleTransUrl += "&client=" + "t"; googleTransUrl += "&text=" + encodedStr; googleTransUrl += "&hl=" + "zh-CN"; googleTransUrl += "&sl=" + fromLanguage;// source language googleTransUrl += "&tl=" + toLanguage; // to language googleTransUrl += "&ie=" + "UTF-8"; // input encode googleTransUrl += "&oe=" + "UTF-8"; // output encode try { transRetHtml = getUrlRespHtml_multiTry(googleTransUrl); //[[["They say","他们是这样说的","","Tāmen shì zhèyàng shuō de"]],,"zh-CN",,[["They",[5],0,0,1000,0,1,0],["say",[6],1,0,1000,1,2,0]],[["他们 是",5,[["They",1000,0,0],["they are",0,0,0],["they were",0,0,0],["that they are",0,0,0],["they are the",0,0,0]],[[0,3]],"他们是这样说的"],["这样 说",6,[["say",1000,1,0],["said",0,1,0],["say so",0,1,0],["says",0,1,0],["say this",0,1,0]],[[3,6]],""]],,,[["zh-CN"]],1] if (extractSingleStr(@"\[\[\[""(.+?)"","".+?"",", transRetHtml, out translatedStr)) { //extrac out:They say } } catch { // if pass some special string, such as "彭德怀", then will occur 500 error // here tmp not process the error, just omit it here } return translatedStr; }
例 9.10. translateString 的使用范例
string strToTranslate = "他们是这样说的"; string translatedStr = translateString(strToTranslate, "zh-CN", "en");
public string transZhcnToEn(string strToTranslate) { return translateString(strToTranslate, "zh-CN", "en"); }
例 9.11. transzhcntoen 的使用范例
string strToTranslate = "他们是这样说的"; string translatedEnglishStr = transZhcnToEn(strToTranslate);
//get page rank for some domain url //para: http://answers.yahoo.com //return: 7 public int getDomainPageRank(string domainUrl) { int pageRank = 0; string queryUrl = ""; string respHtml = ""; Dictionary<string, string> postDict = new Dictionary<string,string>(); string rankStr = ""; bool prevMethodFail = true; if ((pageRank == 0) && prevMethodFail) { //Method 1: use http://www.pagerankme.com/ queryUrl = "http://www.pagerankme.com/"; postDict = new Dictionary<string, string>(); postDict.Add("url", domainUrl); respHtml = getUrlRespHtml_multiTry(queryUrl, postDict: postDict); //<a href="http://www.pagerankme.com" target="_blank" style="text-decoration:none;color:#000000;">PageRank 7</a> rankStr = ""; if (extractSingleStr(@"<a href=""http://www\.pagerankme\.com"" target=""_blank"" style="".+?"">PageRank (\d+)</a>", respHtml, out rankStr)) { pageRank = Int32.Parse(rankStr); prevMethodFail = false; } else { prevMethodFail = true; } } if ((pageRank == 0) && prevMethodFail) { //Method 2: use http://moonsy.com/pagerank_checker/ //(1) http://moonsy.com/pagerank_checker/ queryUrl = "http://moonsy.com/pagerank_checker/"; postDict = new Dictionary<string, string>(); postDict.Add("domain", domainUrl); postDict.Add("Submit", "CHECK"); respHtml = getUrlRespHtml_multiTry(queryUrl, postDict: postDict); //<h3>Your Page Rank: 7/10 rankStr = ""; if (extractSingleStr(@"<h3>Your Page Rank.+?(\d+)/10", respHtml, out rankStr)) { pageRank = Int32.Parse(rankStr); prevMethodFail = false; } else { prevMethodFail = true; } } if ((pageRank == 0) && prevMethodFail) { //Method 3: use http://pagerank.webmasterhome.cn/ string noHttpPreDomainUrl = Regex.Replace(domainUrl, "((https)|(http)|(ftp))://", ""); //http://pagerank.webmasterhome.cn/prLoading.asp?domain=answers.yahoo.com string tmpRespHtml = ""; Dictionary<string, string> headerDict; //(1)to get cookies string pageRankMainUrl = "http://pagerank.webmasterhome.cn/"; tmpRespHtml = getUrlRespHtml_multiTry(pageRankMainUrl); //(2)ask page rank string firstBaseUrl = "http://pagerank.webmasterhome.cn/?domain="; //http://pagerank.webmasterhome.cn/?domain=answers.yahoo.com string firstWholeUrl = firstBaseUrl + noHttpPreDomainUrl; headerDict = new Dictionary<string, string>(); headerDict.Add("referer", pageRankMainUrl); tmpRespHtml = getUrlRespHtml_multiTry(firstWholeUrl, headerDict: headerDict); string baseUrl = "http://pagerank.webmasterhome.cn/prLoading.asp?domain="; //http://pagerank.webmasterhome.cn/prLoading.asp?domain=answers.yahoo.com queryUrl = baseUrl + noHttpPreDomainUrl; headerDict = new Dictionary<string, string>(); headerDict.Add("referer", firstWholeUrl); respHtml = getUrlRespHtml_multiTry(queryUrl, headerDict: headerDict); //'<img src=\"http://primg.webmasterhome.cn/pr7.gif\" style=\"width:40px;height:5px;border:0px;\" alt=PageRank align=absmiddle> (7/10)' rankStr = ""; if (extractSingleStr(@"\((\d+)/10\)", respHtml, out rankStr)) { pageRank = Int32.Parse(rankStr); prevMethodFail = false; } else { prevMethodFail = true; } } //TODO: //Google PR (PageRank) Checker //http://www.searchbliss.com/seo-tools/google-pagerank-checker.php //tmp is "We're sorry, the Google PR check is currently being repaired." //future: if Ok, mayby can use it return pageRank; }
例 9.12. getDomainPageRank 的使用范例
public struct searchItemInfo { public string title; public string googleUrl; // with google appendix public string originalUrl; public string description; //add domain url and rank public string domainUrl; public int pageRank; public int alexaRank; }; singleItemInfo.domainUrl = crifanLib.getDomainUrl(singleItemInfo.originalUrl); singleItemInfo.pageRank = crifanLib.getDomainPageRank(singleItemInfo.domainUrl); singleItemInfo.alexaRank = crifanLib.getDomainAlexaRank(singleItemInfo.domainUrl);
//get alexa rank for some domain url //para: http://answers.yahoo.com //return: 4 public int getDomainAlexaRank(string domainUrl) { int alexaRank = 0; string queryUrl = ""; string respHtml = ""; Dictionary<string, string> postDict = new Dictionary<string, string>(); string alexaRankStr = ""; bool prevMethodFail = true; //string noHttpPreDomainUrl = Regex.Replace(domainUrl, "((https)|(http)|(ftp))://", ""); if ((alexaRank == 0) && prevMethodFail) { //Method 1: use http://www.searchbliss.com/rank.asp string mainUrl = "http://www.searchbliss.com/rank.asp"; respHtml = getUrlRespHtml_multiTry(mainUrl); //<input type="hidden" name="RAC" value="EIS"> string accessCode = ""; if (extractSingleStr(@"<input\s+type=""hidden""\s+name=""RAC""\s+value=""([A-Z]+)"">", respHtml, out accessCode)) { queryUrl = "http://www.searchbliss.com/rank.asp"; //AC EIS //RAC EIS //rank http://hubpages.com postDict = new Dictionary<string, string>(); //postDict.Add("domain", noHttpPreDomainUrl); postDict.Add("AC", accessCode); postDict.Add("RAC", accessCode); postDict.Add("rank", domainUrl); respHtml = getUrlRespHtml_multiTry(queryUrl, postDict: postDict); //<a href="http://www.alexa.com/data/details/main/http://hubpages.com" target="_blank">444</a> if (extractSingleStr(@"<a\s+href=""http://www\.alexa\.com/data/details/main/.+?""\s+target=""_blank"">(\d+)</a>", respHtml, out alexaRankStr)) { //alexaRank = Int32.Parse(alexaRankStr); if (Int32.TryParse(alexaRankStr, out alexaRank)) { prevMethodFail = false; } else { prevMethodFail = true; } prevMethodFail = false; } else { prevMethodFail = true; } } else { prevMethodFail = true; } } #if USE_HTML_PARSER_HTMLAGILITYPACK if ((alexaRank == 0) && prevMethodFail) { //Method 2: use http://www.alexa.com/ string tmpUrl = "http://www.alexa.com"; //to get cookies string tmpRespHtml = getUrlRespHtml_multiTry(tmpUrl); //then do work queryUrl = "http://www.alexa.com/search"; //http://www.alexa.com/search?q=crifan.com&r=home_home&p=bigtop queryUrl += "?q=" + domainUrl; queryUrl += "&r=" + "home_home"; queryUrl += "&p=" + "bigtop"; respHtml = getUrlRespHtml_multiTry(queryUrl); HtmlAgilityPack.HtmlDocument htmlDoc = htmlToHtmlDoc(respHtml); HtmlNode rootHtmlNode = htmlDoc.DocumentNode; //<span> //<img class="align-top" src="/images/icons/globe-sm.gif" /> //<span class="traffic-stat-label">Alexa Traffic Rank:</span> //<a href="/siteinfo/yahoo.com#trafficstats"> //4</a> //</span> //<span class="traffic-stat-label">Alexa Traffic Rank:</span> //<a href="/siteinfo/crifan.com#trafficstats"> //170,557</a> //</span> //HtmlNode trafficHtmlNode = rootHtmlNode.SelectSingleNode("//span/span[@class='traffic-stat-label']/a[@href]"); //HtmlNode trafficHtmlNode = rootHtmlNode.SelectSingleNode("//span/span[@class='traffic-stat-label']/a]"); //HtmlNodeCollection trafficHtmlNodes = rootHtmlNode.SelectNodes("//span/span[@class='traffic-stat-label']"); HtmlNode trafficHtmlNode = rootHtmlNode.SelectSingleNode("//span/span[@class='traffic-stat-label']"); if ((trafficHtmlNode != null) && (trafficHtmlNode.InnerText.StartsWith("Alexa Traffic Rank:"))) { HtmlNode parentHtmlNode = trafficHtmlNode.ParentNode; HtmlNode aHrefNode = parentHtmlNode.SelectSingleNode(".//a[@href]"); string tracfficNumberStr = aHrefNode.InnerText; alexaRankStr = tracfficNumberStr.Trim().Replace(",", ""); //speical: //"No Data" //alexaRank = Int32.Parse(alexaRankStr); if(Int32.TryParse(alexaRankStr, out alexaRank)) { prevMethodFail = false; } else { prevMethodFail = true; } } else { prevMethodFail = true; } } #endif if ((alexaRank == 0) && prevMethodFail) { //Method 3: use http://moonsy.com/alexa_rank/ //(1) http://moonsy.com/alexa_rank/ queryUrl = "http://moonsy.com/alexa_rank/"; postDict = new Dictionary<string, string>(); //postDict.Add("domain", noHttpPreDomainUrl); postDict.Add("domain", domainUrl); postDict.Add("Submit", "CHECK"); respHtml = getUrlRespHtml_multiTry(queryUrl, postDict: postDict); //<h2>Alexa Rank of <b>ANSWERS.YAHOO.COM</b> is : <b>4</b></h2> alexaRankStr = ""; if (extractSingleStr(@"<h2>Alexa Rank of.+?is.+?(\d+).+?</h2>", respHtml, out alexaRankStr)) { //alexaRank = Int32.Parse(alexaRankStr); if (Int32.TryParse(alexaRankStr, out alexaRank)) { prevMethodFail = false; } else { prevMethodFail = true; } prevMethodFail = false; } else { prevMethodFail = true; } } //TODO: //maybe future can use: //http://www.dakola.com/tools/alexa/ return alexaRank; }
例 9.13. getDomainAlexaRank 的使用范例
public struct searchItemInfo { public string title; public string googleUrl; // with google appendix public string originalUrl; public string description; //add domain url and rank public string domainUrl; public int pageRank; public int alexaRank; }; singleItemInfo.domainUrl = crifanLib.getDomainUrl(singleItemInfo.originalUrl); singleItemInfo.pageRank = crifanLib.getDomainPageRank(singleItemInfo.domainUrl); singleItemInfo.alexaRank = crifanLib.getDomainAlexaRank(singleItemInfo.domainUrl);
目录
调用对应的FolderBrowserDialog控件,得到用户所选的(保存文件的)路径
public string getSaveFolder(FolderBrowserDialog fbdSave) { string saveFolderPath = ""; //string saveFolderPath = System.Environment.CurrentDirectory; //fbdSaveFolder.SelectedPath = System.Environment.CurrentDirectory; DialogResult saveFolderResult = fbdSave.ShowDialog(); if (saveFolderResult == System.Windows.Forms.DialogResult.OK) { saveFolderPath = fbdSave.SelectedPath; } else if (saveFolderResult == System.Windows.Forms.DialogResult.Cancel) { saveFolderPath = ""; } return saveFolderPath; }
例 10.1. getSaveFolder 的使用范例
//private System.Windows.Forms.FolderBrowserDialog fbdSaveFolder; string saveFolderPath = getSaveFolder(fbdSaveFolder);
//save binary bytes into file public bool saveBytesToFile(string fileToSave, ref Byte[] bytes, int dataLen, out string errStr) { bool saveOk = false; errStr = "未知错误!"; try { int bufStartPos = 0; int bytesToWrite = dataLen; FileStream fs; fs = File.Create(fileToSave, bytesToWrite); fs.Write(bytes, bufStartPos, bytesToWrite); fs.Close(); saveOk = true; } catch (Exception ex) { errStr = ex.Message; } return saveOk; }
例 10.2. saveBytesToFile 的使用范例
public bool downloadStMusicFile(string musicRealAddr, string fullnameToStore, out string errStr, Action<int> funcUpdateProgress) { bool downloadOk = false; errStr = "未知错误!"; if (musicRealAddr == null || musicRealAddr == "" || fullnameToStore == null || fullnameToStore == "") { errStr = "Songtaste歌曲真实的地址无效!"; return downloadOk; } Dictionary<string, string> headerDict = new Dictionary<string, string>(); //headerDict.Add("Referer", "http://songtaste.com/music/"); headerDict.Add("Referer", "http://songtaste.com/"); //const int maxMusicFileLen = 100 * 1024 * 1024; // 100M const int maxMusicFileLen = 300 * 1024 * 1024; // 300M Byte[] binDataBuf = new Byte[maxMusicFileLen]; int respDataLen = crl.getUrlRespStreamBytes(ref binDataBuf, musicRealAddr, headerDict, null, 0, funcUpdateProgress); if (respDataLen < 0) { errStr = "无法读取歌曲数据!"; return downloadOk; } if (crl.saveBytesToFile(fullnameToStore, ref binDataBuf, respDataLen, out errStr)) { downloadOk = true; }
//download file from url //makesure destination folder exist before call this function //input para example: //http://g-ecx.images-amazon.com/images/G/01/kindle/dp/2012/KC/KC-slate-01-lg._V401028090_.jpg //download\B007OZNZG0\KC-slate-01-lg._V401028090_.jpg public bool downloadFile(string fileUrl, string fullnameToStore, out string errStr, Action<int> funcUpdateProgress) { bool downloadOk = false; errStr = "未知错误!"; if ((fileUrl == null) || (fileUrl == "")) { errStr = "URL地址为空!"; return downloadOk; } if ((fullnameToStore == null) || (fullnameToStore == "")) { errStr = "文件保存路径为空!"; return downloadOk; } //const int maxFileLen = 100 * 1024 * 1024; // 100M const int maxFileLen = 300 * 1024 * 1024; // 300M const int lessMaxFileLen = 100 * 1024 * 1024; // 100M Byte[] binDataBuf; try { binDataBuf = new Byte[maxFileLen]; } catch (Exception ex) { //if no enough memory, then try alloc less binDataBuf = new Byte[lessMaxFileLen]; } int respDataLen = getUrlRespStreamBytes(ref binDataBuf, fileUrl, null, null, 0, funcUpdateProgress); if (respDataLen < 0) { errStr = "无法下载文件数据!"; return downloadOk; } if (saveBytesToFile(fullnameToStore, ref binDataBuf, respDataLen, out errStr)) { downloadOk = true; } return downloadOk; }
例 10.3. downloadFile 的使用范例
public void updateProgress(int percentage) { //pgbDownload.Value = percentage; } public void downloadPictures(string productUrl, string respHtml, out string[] picFullnameList) { //...... string[] imageUrlList = amazonLib.extractProductImageList(respHtml); gLogger.Info("Extracted image url list:"); if (imageUrlList != null) { picFullnameList = new string[imageUrlList.Length]; for (int idx = 0; idx < imageUrlList.Length; idx++) { string imageUrl = imageUrlList[idx]; gLogger.Info(String.Format("[{0}]={1}", idx, imageUrl)); string picFilename = crl.extractFilenameFromUrl(imageUrl); string picFullFilename = Path.Combine(picFolderFullPath, picFilename); string errorStr = ""; gLogger.Info(String.Format("Downloading {0} to {1}", imageUrl, picFullFilename)); crl.downloadFile(imageUrl, picFullFilename, out errorStr, updateProgress);
//open folder and select file public void openFolderAndSelectFile(string fullFilename) { System.Diagnostics.Process.Start("Explorer.exe", "/select," + fullFilename); }
例 10.4. openFolderAndSelectFile 的使用范例
string outputFilename = txbExpAlertFilename.Text + ".xls"; string fullFilename = Path.Combine(saveFolderPath, outputFilename); //...... crifanLib.openFolderAndSelectFile(fullFilename);
//open file/url/... public void openFileDirectly(string fullFilename) { System.Diagnostics.Process.Start(fullFilename); }
例 10.5. openFileDirectly 的使用范例
private void btnOpenOutputFolder_Click(object sender, EventArgs e) { if (Directory.Exists(txbOutputFolder.Text)) { crl.openFileDirectly(txbOutputFolder.Text); } }
目录
// get current taskbar size(width, height), support 4 mode: taskbar bottom/right/up/left public Size getCurTaskbarSize() { int width = 0, height = 0; if ((Screen.PrimaryScreen.Bounds.Width == Screen.PrimaryScreen.WorkingArea.Width) && (Screen.PrimaryScreen.WorkingArea.Y == 0)) { //taskbar bottom width = Screen.PrimaryScreen.WorkingArea.Width; height = Screen.PrimaryScreen.Bounds.Height - Screen.PrimaryScreen.WorkingArea.Height; } else if ((Screen.PrimaryScreen.Bounds.Height == Screen.PrimaryScreen.WorkingArea.Height) && (Screen.PrimaryScreen.WorkingArea.X == 0)) { //taskbar right width = Screen.PrimaryScreen.Bounds.Width - Screen.PrimaryScreen.WorkingArea.Width; height = Screen.PrimaryScreen.WorkingArea.Height; } else if ((Screen.PrimaryScreen.Bounds.Width == Screen.PrimaryScreen.WorkingArea.Width) && (Screen.PrimaryScreen.WorkingArea.Y > 0)) { //taskbar up width = Screen.PrimaryScreen.WorkingArea.Width; //height = Screen.PrimaryScreen.WorkingArea.Y; height = Screen.PrimaryScreen.Bounds.Height - Screen.PrimaryScreen.WorkingArea.Height; } else if ((Screen.PrimaryScreen.Bounds.Height == Screen.PrimaryScreen.WorkingArea.Height) && (Screen.PrimaryScreen.WorkingArea.X > 0)) { //taskbar left width = Screen.PrimaryScreen.Bounds.Width - Screen.PrimaryScreen.WorkingArea.Width; height = Screen.PrimaryScreen.WorkingArea.Height; } return new Size(width, height); }
// get current taskbar position(X, Y), support 4 mode: taskbar bottom/right/up/left public System.Drawing.Point getCurTaskbarLocation() { int xPos = 0, yPos = 0; if ((Screen.PrimaryScreen.Bounds.Width == Screen.PrimaryScreen.WorkingArea.Width) && (Screen.PrimaryScreen.WorkingArea.Y == 0)) { //taskbar bottom xPos = 0; yPos = Screen.PrimaryScreen.WorkingArea.Height; } else if ((Screen.PrimaryScreen.Bounds.Height == Screen.PrimaryScreen.WorkingArea.Height) && (Screen.PrimaryScreen.WorkingArea.X == 0)) { //taskbar right xPos = Screen.PrimaryScreen.WorkingArea.Width; yPos = 0; } else if ((Screen.PrimaryScreen.Bounds.Width == Screen.PrimaryScreen.WorkingArea.Width) && (Screen.PrimaryScreen.WorkingArea.Y > 0)) { //taskbar up xPos = 0; yPos = 0; } else if ((Screen.PrimaryScreen.Bounds.Height == Screen.PrimaryScreen.WorkingArea.Height) && (Screen.PrimaryScreen.WorkingArea.X > 0)) { //taskbar left xPos = 0; yPos = 0; } return new System.Drawing.Point(xPos, yPos); }
// get current right bottom corner position(X, Y), support 4 mode: taskbar bottom/right/up/left public System.Drawing.Point getCornerLocation(Size windowSize) { int xPos = 0, yPos = 0; if ((Screen.PrimaryScreen.Bounds.Width == Screen.PrimaryScreen.WorkingArea.Width) && (Screen.PrimaryScreen.WorkingArea.Y == 0)) { //taskbar bottom xPos = Screen.PrimaryScreen.WorkingArea.Width - windowSize.Width; yPos = Screen.PrimaryScreen.WorkingArea.Height - windowSize.Height; } else if ((Screen.PrimaryScreen.Bounds.Height == Screen.PrimaryScreen.WorkingArea.Height) && (Screen.PrimaryScreen.WorkingArea.X == 0)) { //taskbar right xPos = Screen.PrimaryScreen.WorkingArea.Width - windowSize.Width; yPos = Screen.PrimaryScreen.WorkingArea.Height - windowSize.Height; } else if ((Screen.PrimaryScreen.Bounds.Width == Screen.PrimaryScreen.WorkingArea.Width) && (Screen.PrimaryScreen.WorkingArea.Y > 0)) { //taskbar up xPos = Screen.PrimaryScreen.WorkingArea.Width - windowSize.Width; yPos = Screen.PrimaryScreen.WorkingArea.Y; } else if ((Screen.PrimaryScreen.Bounds.Height == Screen.PrimaryScreen.WorkingArea.Height) && (Screen.PrimaryScreen.WorkingArea.X > 0)) { //taskbar left xPos = Screen.PrimaryScreen.WorkingArea.X; yPos = Screen.PrimaryScreen.WorkingArea.Height - windowSize.Height; } return new System.Drawing.Point(xPos, yPos); }
public string getCurVerStr() { string curVerStr = ""; Assembly asm = Assembly.GetExecutingAssembly(); FileVersionInfo fvi = FileVersionInfo.GetVersionInfo(asm.Location); curVerStr = String.Format("{0}.{1}", fvi.ProductMajorPart, fvi.ProductMinorPart); return curVerStr; }
目录
#if USE_HTML_PARSER_SGML //convert html to XML document public XmlDocument htmlToXmlDoc(string html) { // setup SgmlReader SgmlReader sgmlReader = new SgmlReader(); sgmlReader.DocType = "HTML"; sgmlReader.WhitespaceHandling = WhitespaceHandling.All; sgmlReader.CaseFolding = Sgml.CaseFolding.ToLower; string decodedHtml = HttpUtility.HtmlDecode(html); sgmlReader.InputStream = new StringReader(decodedHtml); // create document XmlDocument xmlDoc = new XmlDocument(); xmlDoc.PreserveWhitespace = true; xmlDoc.XmlResolver = null; xmlDoc.Load(sgmlReader); return xmlDoc; } #endif
例 13.1. htmlToXmlDoc 的使用范例
//(1) with xmlns string withXmlnsUrl = "http://fiverr.com/gigs/search?utf8=%E2%9C%93&query=seo&x=15&y=13&page=2"; string withXmlnsHtml = getUrlRespHtml(withXmlnsUrl); XmlDocument xmlDocWithNs = htmlToXmlDoc(withXmlnsHtml);
另外,贴出,完整的示例代码:
//example code for html parse void _demoHtmlParse() { #if USE_HTML_PARSER_SGML //Method 1: use htmlToXmlDoc //(1) with xmlns string withXmlnsUrl = "http://fiverr.com/gigs/search?utf8=%E2%9C%93&query=seo&x=15&y=13&page=2"; string withXmlnsHtml = getUrlRespHtml(withXmlnsUrl); XmlDocument xmlDocWithNs = htmlToXmlDoc(withXmlnsHtml); //<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> //<html xmlns:og="http://ogp.me/ns#" xmlns:fb="http://www.facebook.com/2008/fbml" xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en" > // <head> // ... XmlNamespaceManager m = new XmlNamespaceManager(xmlDocWithNs.NameTable); m.AddNamespace("w3org", "http://www.w3.org/1999/xhtml"); XmlNode titleNode = xmlDocWithNs.SelectSingleNode("//w3org:h1[@itemprop='name']", m); string title = titleNode.InnerText; //(2) without xmlns string withoutXmlnsUrl = "http://www.amazon.com/gp/new-releases/appliances/ref=zg_bsnr_nav_0"; //<!DOCTYPE html> //<html> //<head> //... string withoutXmlnsHtml = getUrlRespHtml(withoutXmlnsUrl); XmlDocument xmlDocNoNs = htmlToXmlDoc(withoutXmlnsHtml); XmlNodeList pageNodeList = xmlDocNoNs.SelectNodes("//ol[@class='zg_pagination']/li[@class]"); #endif //common part //how to use Attributes //XmlNodeList pageNodeList = xmlDoc.SelectNodes("//ol[@class='zg_pagination']/li[@class]"); //if (pageNodeList != null) //{ // for (int pageIdx = 1; pageIdx < pageNodeList.Count; pageIdx++) // { // XmlNode curPageNode = pageNodeList[pageIdx]; // //<li class="zg_page " id="zg_page2"><a page="2" ajaxUrl="http://www.amazon.com/gp/new-releases/appliances/ref=zg_bsnr_appliances_pg_2/191-0874592-3518518?ie=UTF8&pg=2&ajax=1" href="http://www.amazon.com/gp/new-releases/appliances/ref=zg_bsnr_appliances_pg_2/191-0874592-3518518?ie=UTF8&pg=2">21-40</a></li> // XmlNode ajaxUrlNode = curPageNode.SelectSingleNode(".//a[@href]"); // string pageUrl = ajaxUrlNode.Attributes["href"].Value; // } //} #if USE_HTML_PARSER_HTMLAGILITYPACK //Method 2: use htmlToHtmlDoc string testUrlWithXmlns = "http://sd.csdn.net/"; string respHtml = getUrlRespHtml(testUrlWithXmlns); //<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> //<html xmlns="http://www.w3.org/1999/xhtml"> //<head> HtmlAgilityPack.HtmlDocument htmlDoc = htmlToHtmlDoc(respHtml); //<div class="tabcontent" id="sc1"> // <ul> // <li><a href="http://www.csdn.net/article/tag/%E4%BA%A7%E5%93%81" target="_blank">产品</a></li> // ...... // <li><a href="http://www.csdn.net/article/tag/%E8%AE%BE%E8%AE%A1" target="_blank">设计</a></li> // </ul> //</div> //... //<div class="tabcontent" id="sc4"> // <ul> // ... // <li><a href="http://www.csdn.net/article/tag/%E6%95%B0%E6%8D%AE%E5%BA%93" target="_blank">数据库</a></li> // </ul> //</div> //here, no need to take care the html xmlns //is better than SGMLReader HtmlNode rootHtmlNode = htmlDoc.DocumentNode; HtmlNodeCollection htmlNodes = rootHtmlNode.SelectNodes("//div[@class='tabcontent']"); foreach (HtmlNode link in htmlNodes) { HtmlAttribute att = link.Attributes["id"]; string idHref = att.Value; }
public HtmlAgilityPack.HtmlDocument htmlToHtmlDoc(string html) { HtmlAgilityPack.HtmlDocument htmlDoc = new HtmlAgilityPack.HtmlDocument(); //http://www.crifan.com/htmlagilitypack_html_tag_form_option_no_child_via_sibling_get_innertext/ //make some html tag: form/option, has child HtmlNode.ElementsFlags.Remove("form"); HtmlNode.ElementsFlags.Remove("option"); htmlDoc.LoadHtml(html); return htmlDoc; }
例 13.2. htmlToHtmlDoc 的使用范例
//Method 2: use htmlToHtmlDoc string testUrlWithXmlns = "http://sd.csdn.net/"; string respHtml = getUrlRespHtml(testUrlWithXmlns); //<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> //<html xmlns="http://www.w3.org/1999/xhtml"> //<head> HtmlAgilityPack.HtmlDocument htmlDoc = htmlToHtmlDoc(respHtml);
注意,使用此函数之前,需要开启对应的宏USE_HTML_PARSER_HTMLAGILITYPACK,以及添加对应的dll库HtmlAgilityPack.dll的引用。
//remove sub node from current html node //eg: //"script" //for //<script type="text/javascript"> public HtmlNode removeSubHtmlNode(HtmlNode curHtmlNode, string subNodeToRemove) { HtmlNode afterRemoved = curHtmlNode; ////method 1: fail ////foreach (var subNode in afterRemoved.Descendants(subNodeToRemove)) //foreach (HtmlNode subNode in afterRemoved.Descendants(subNodeToRemove)) //{ // //An unhandled exception of type 'System.InvalidOperationException' occurred in mscorlib.dll // //Additional information: Collection was modified; enumeration operation may not execute. // //afterRemoved.RemoveChild(subNode); // //curHtmlNode.RemoveChild(subNode); // subNode.Remove(); //} //method 2: OK HtmlNodeCollection foundAllSub = curHtmlNode.SelectNodes(subNodeToRemove); if ((foundAllSub != null) && (foundAllSub.Count > 0)) { foreach (HtmlNode subNode in foundAllSub) { curHtmlNode.RemoveChild(subNode); } } return afterRemoved; }
例 13.3. removeSubHtmlNode 的使用范例
HtmlNode curBulletNode = allBulletNodeList[idx]; HtmlNode noJsNode = crl.removeSubHtmlNode(curBulletNode, "script"); HtmlNode noStyleNode = crl.removeSubHtmlNode(curBulletNode, "style"); string bulletStr = noStyleNode.InnerText;
/* * [Function] * remove html tag, retain html content * [Input] * html, with tag * * [Output] * pure content, no html tag * * [Note] */ public string htmlRemoveTag(string html) { string filteredHtml = ""; if (!string.IsNullOrEmpty(html)) { HtmlAgilityPack.HtmlDocument htmlDoc = new HtmlAgilityPack.HtmlDocument(); htmlDoc.LoadHtml(html); if (htmlDoc == null) { return ""; } // 1. remove all comments //(1)get all comment nodes using XPATH HtmlNodeCollection commentNodeList = htmlDoc.DocumentNode.SelectNodes("//comment()"); if (commentNodeList != null) { foreach (HtmlNode comment in commentNodeList) { //(2) remove comment node itself comment.ParentNode.RemoveChild(comment); } } //2. get all content foreach (var node in htmlDoc.DocumentNode.ChildNodes) { filteredHtml += node.InnerText; } } return filteredHtml; }
例 13.4. htmlRemoveTag 的使用范例
HtmlAgilityPack.HtmlDocument htmlDoc = crl.htmlToHtmlDoc(googleSearchRespHtml); HtmlNodeCollection liNodeList = htmlDoc.DocumentNode.SelectNodes("//li[@class='g']"); foreach (HtmlNode liNode in liNodeList) { HtmlNode h3ANode = liNode.SelectSingleNode(".//h3[@class='r']/a"); if (h3ANode != null) { googleSearchResultItem singleResultItem = new googleSearchResultItem(); //string titleHtml = h3ANode.InnerHtml; //"Amritanandamayi Math to <em>sponsor charity</em> events - Times Of India" string titleHtml = h3ANode.InnerText; //"Amritanandamayi Math to sponsor charity events - Times Of India" string filteredTitle = crl.htmlRemoveTag(titleHtml);
public yourClassname() { //!!! for load embedded dll: (1) register resovle handler AppDomain.CurrentDomain.AssemblyResolve += new ResolveEventHandler(CurrentDomain_AssemblyResolve); InitializeComponent(); ... } //!!! for load embedded dll: (2) implement this handler System.Reflection.Assembly CurrentDomain_AssemblyResolve(object sender, ResolveEventArgs args) { string dllName = args.Name.Contains(",") ? args.Name.Substring(0, args.Name.IndexOf(',')) : args.Name.Replace(".dll", ""); dllName = dllName.Replace(".", "_"); if (dllName.EndsWith("_resources")) return null; System.Resources.ResourceManager rm = new System.Resources.ResourceManager(GetType().Namespace + ".Properties.Resources", System.Reflection.Assembly.GetExecutingAssembly()); byte[] bytes = (byte[])rm.GetObject(dllName); return System.Reflection.Assembly.Load(bytes); }
例 14.1. 集成DLL到exe中 的使用范例
public crifanLib() { //!!! for load embedded dll: (1) register resovle handler AppDomain.CurrentDomain.AssemblyResolve += new ResolveEventHandler(CurrentDomain_AssemblyResolve); //...... } //!!! for load embedded dll: (2) implement this handler System.Reflection.Assembly CurrentDomain_AssemblyResolve(object sender, ResolveEventArgs args) { string dllName = args.Name.Contains(",") ? args.Name.Substring(0, args.Name.IndexOf(',')) : args.Name.Replace(".dll", ""); dllName = dllName.Replace(".", "_"); if (dllName.EndsWith("_resources")) return null; System.Resources.ResourceManager rm = new System.Resources.ResourceManager(GetType().Namespace + ".Properties.Resources", System.Reflection.Assembly.GetExecutingAssembly()); byte[] bytes = (byte[])rm.GetObject(dllName); return System.Reflection.Assembly.Load(bytes); }
关于如何把DLL集成到exe中,详见:【已解决】C#中集成DLL库到自己的exe程序中
目录
public void dgvClearContent(DataGridView dgvValue) { dgvValue.Rows.Clear(); }
//draw the row index public void dgvDrawHeaderNum(DataGridView dgvValue) { for (int index = 0; (index <= (dgvValue.Rows.Count - 1)); index++) { int number = index + 1; dgvValue.Rows[index].HeaderCell.Value = String.Format("{0}", number); } }
//release object public void releaseObject(object obj) { try { System.Runtime.InteropServices.Marshal.ReleaseComObject(obj); obj = null; } catch (Exception ex) { obj = null; //MessageBox.Show("Exception Occured while releasing object " + ex.ToString()); } finally { GC.Collect(); } }
例 15.3. releaseObject 的使用范例
xlWorkBook.Close(true, misValue, misValue); xlApp.Quit(); releaseObject(xlWorkSheet); releaseObject(xlWorkBook); releaseObject(xlApp);
public void dgvExportToExcel( DataGridView dgvValue, string excelFullFilename, bool isAutoFit = true, bool isHeaderBold = true, List<int> omitRowIdxList = null, List<int> omitColumnIdxList = null, List<int> useTagColumnIdxList = null) { Excel.Application xlApp = new Excel.Application(); Excel.Workbook xlWorkBook; Excel.Worksheet xlWorkSheet; object misValue = System.Reflection.Missing.Value; xlApp = new Excel.ApplicationClass(); xlWorkBook = xlApp.Workbooks.Add(misValue); xlWorkSheet = (Excel.Worksheet)xlWorkBook.Worksheets.get_Item(1); int rowIdx = 0, realRowIdx = 0; int columnIdx = 0, realColumnIdx = 0; const int excelRowHeader = 1; const int excelColumnHeader = 1; //save header for (columnIdx = 0, realColumnIdx = 0; columnIdx <= dgvValue.ColumnCount - 1; columnIdx++) { if ((omitColumnIdxList != null) && omitColumnIdxList.Contains(columnIdx)) { //omit this column } else { //excelRowHeader and excelColumnHeader -> jump over the excel buildin row and column xlWorkSheet.Cells[0 + excelRowHeader, realColumnIdx + excelColumnHeader] = dgvValue.Columns[columnIdx].HeaderText; realColumnIdx++; } } const int excelTitleRow = 1; //save cells for (rowIdx = 0, realRowIdx= 0; rowIdx <= dgvValue.RowCount - 1; rowIdx++) { if ((omitRowIdxList != null) && omitRowIdxList.Contains(rowIdx)) { //omit this row } else { for (columnIdx = 0, realColumnIdx = 0; columnIdx <= dgvValue.ColumnCount - 1; columnIdx++) { if ((omitColumnIdxList != null) && omitColumnIdxList.Contains(columnIdx)) { //omit this column } else { //note here use [columnIdx, rowIdx], not [rowIdx, columnIdx] DataGridViewCell curCell = dgvValue[columnIdx, rowIdx]; if ((useTagColumnIdxList != null) && useTagColumnIdxList.Contains(columnIdx)) { xlWorkSheet.Cells[(realRowIdx + excelTitleRow) + excelRowHeader, realColumnIdx + excelColumnHeader] = curCell.Tag; } else { xlWorkSheet.Cells[(realRowIdx + excelTitleRow) + excelRowHeader, realColumnIdx + excelColumnHeader] = curCell.Value; } realColumnIdx++; } } realRowIdx++; } } //formatting //(1) header to bold if (isHeaderBold) { Range headerRow = xlWorkSheet.get_Range("1:1", System.Type.Missing); headerRow.Font.Bold = true; } //(2) auto adjust column width (according to content) if (isAutoFit) { Range allColumn = xlWorkSheet.Columns; allColumn.AutoFit(); } //output xlWorkBook.SaveAs( excelFullFilename, XlFileFormat.xlWorkbookNormal, misValue, misValue, misValue, misValue, XlSaveAsAccessMode.xlExclusive, XlSaveConflictResolution.xlLocalSessionChanges, misValue, misValue, misValue, misValue); xlWorkBook.Close(true, misValue, misValue); xlApp.Quit(); releaseObject(xlWorkSheet); releaseObject(xlWorkBook); releaseObject(xlApp); }
例 15.4. dgvExportToExcel 的使用范例
string outputFilename = txbExpAlertFilename.Text + ".xls"; string fullFilename = Path.Combine(saveFolderPath, outputFilename); List<int> omitColumnIdxList = new List<int>(); //omit the last column: View page omitColumnIdxList.Add(dgvSearchedAlerts.ColumnCount - 1); crifanLib.dgvExportToExcel(dgvSearchedAlerts, fullFilename, omitColumnIdxList: omitColumnIdxList);
public void dgvExportToCsv(DataGridView dgvValue, string csvFullFilename, string delimiter = ",", List<int> omitRowIdxList = null, List<int> omitColumnIdxList = null, List<int> useTagColumnIdxList = null) { StreamWriter csvStreamWriter = new StreamWriter(csvFullFilename, false, System.Text.Encoding.UTF8); int rowIdx = 0, realRowIdx = 0; int columnIdx = 0, realColumnIdx = 0; //output header data string headerRowStr = ""; for (columnIdx = 0, realColumnIdx = 0; columnIdx <= dgvValue.ColumnCount - 1; columnIdx++) { if ((omitColumnIdxList != null) && omitColumnIdxList.Contains(columnIdx)) { //omit this column } else { headerRowStr += dgvValue.Columns[columnIdx].HeaderText + delimiter; realColumnIdx++; } } csvStreamWriter.WriteLine(headerRowStr); //output rows data for (rowIdx = 0, realRowIdx = 0; rowIdx <= dgvValue.RowCount - 1; rowIdx++) { if ((omitRowIdxList != null) && omitRowIdxList.Contains(rowIdx)) { //omit this row } else { string eachRowStr = ""; for (columnIdx = 0, realColumnIdx = 0; columnIdx <= dgvValue.ColumnCount - 1; columnIdx++) { if ((omitColumnIdxList != null) && omitColumnIdxList.Contains(columnIdx)) { //omit this column } else { DataGridViewCell curCell = dgvValue[columnIdx, rowIdx];//dgvValue.Rows[rowIdx].Cells[columnIdx] if ((useTagColumnIdxList != null) && useTagColumnIdxList.Contains(columnIdx)) { eachRowStr += curCell.Tag + delimiter; } else { eachRowStr += curCell.Value + delimiter; } realColumnIdx++; } } csvStreamWriter.WriteLine(eachRowStr); realRowIdx++; } } csvStreamWriter.Close(); }
例 15.5. dgvExportToCsv 的使用范例
string outputFilename = txbExpAlertFilename.Text + ".csv"; string fullFilename = Path.Combine(saveFolderPath, outputFilename); List<int> omitColumnIdxList = new List<int>(); //omit the last column: View page omitColumnIdxList.Add(dgvSearchedAlerts.ColumnCount - 1); crifanLib.dgvExportToCsv(dgvSearchedAlerts, fullFilename, omitColumnIdxList: omitColumnIdxList);
#if USE_JSON /* * [Function] * convert json string into dictionary object * [Input] * json string * [Output] * object, internally is dictionary * [Note] * 1.you should know the internal structure of the dictionary * then converted to specific type of yours */ public Object jsonToDict(string jsonStr) { JavaScriptSerializer jsonSerializer = new JavaScriptSerializer() { MaxJsonLength = int.MaxValue }; Object dictObj = jsonSerializer.DeserializeObject(jsonStr); return dictObj; } #endif
例 16.1. jsonToDict 的使用范例
string kibMasJson = ""; string colorImagesJson = ""; if (crl.extractSingleStr(@"window\.kibMAs\s*=\s*(\[.+?\])\s*;\s*window\.kibConfig\s*=", productHtml, out kibMasJson, RegexOptions.Singleline)) { //2. json to dict Object[] dictList = (Object[])crl.jsonToDict(kibMasJson); //3. get ["preplayImages"]["L"] imageUrlList = new string[dictList.Length]; crl.emptyStringArray(imageUrlList); for (int idx = 0; idx < dictList.Length; idx++) { Dictionary<string, Object> eachImgDict = (Dictionary<string, Object>)dictList[idx]; Object imgUrlObj = null; if (eachImgDict.ContainsKey("preplayImages")) { eachImgDict.TryGetValue("preplayImages", out imgUrlObj); } else if (eachImgDict.ContainsKey("imageUrls")) { eachImgDict.TryGetValue("imageUrls", out imgUrlObj); } if (imgUrlObj != null) { //"L" : "http://g-ecx.images-amazon.com/images/G/01/kindle/dp/2012/KC/KC-slate-01-lg._V401028090_.jpg", //"S" : "http://g-ecx.images-amazon.com/images/G/01/kindle/dp/2012/KC/KC-slate-01-sm._V401028090_.jpg" //"L" : "http://g-ecx.images-amazon.com/images/G/01/kindle/dp/2012/KC/KC-slate-03-lg._V400694812_.jpg", //"S" : "http://g-ecx.images-amazon.com/images/G/01/kindle/dp/2012/KC/KC-slate-03-sm._V400694812_.jpg", //"rich": { // src: "http://g-ecx.images-amazon.com/images/G/01/misc/untranslatable-image-id.jpg", // width: null, // height: null //} //Type curType = imgUrlObj.GetType(); Dictionary<string, Object> imgUrlDict = (Dictionary<string, Object>)imgUrlObj; Object largeImgUrObj = ""; if (imgUrlDict.TryGetValue("L", out largeImgUrObj)) { //[0] "http://g-ecx.images-amazon.com/images/G/01/kindle/dp/2012/KT/KT-slate-01-lg._V395919237_.jpg" //[1] "http://g-ecx.images-amazon.com/images/G/01/kindle/dp/2012/KT/KT-slate-02-lg._V389394532_.jpg" //[2] "http://g-ecx.images-amazon.com/images/G/01/kindle/dp/2012/KT/KT-slate-03-lg._V389394535_.jpg" //[3] "http://g-ecx.images-amazon.com/images/G/01//kindle/dp/2012/KT/KT-slate-04-lg.jpg" //[4] "http://g-ecx.images-amazon.com/images/G/01/kindle/dp/2012/KT/KT-slate-05-lg._V389394532_.jpg" imageUrlList[idx] = largeImgUrObj.ToString(); } else { //something wrong //not get all pic } } else { //something wrong } } }