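【Background】
In C#, I need to use WebBrowser to simulate a browser visiting web pages,
and to capture the user's clicks so that some actions can be run afterwards.
Specifically, the goals here are to:
open a Google search;
get the current HTML;
detect when the user clicks a page number in the Google search results,
then fetch the updated HTML, parse it, and extract the URLs in it.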
【Solution process】
1. Create a new WebBrowser control.
2. Then figure out how to use WebBrowser.
Try setting the corresponding Uri directly:
//http://www.google.com.hk/search?q=weight%20loss+%22Sponsor%20Charity%22
wbsChaseFootprint.Url = new Uri(strEncodedFullFootprintUrl);
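With that alone, the control loads and displays the web page.
3. Next, get the HTML of the current page, extract the URLs in it, and display them.
First I need to work out how to tell that the page has finished loading, because the HTML can only be read after that.
It turns out that WebBrowser has built-in support for this through its DocumentCompleted event.
So I double-clicked the event to add the handler code: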
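The post does not show how strEncodedFullFootprintUrl is built. As a rough sketch only, a URL of the same shape as the one in the comment could be assembled as below. FootprintUrlBuilder, Build, and its two parameters are hypothetical names introduced for illustration; they are not part of the original project.

```csharp
using System;

// Hypothetical helper, not part of the original project.
public static class FootprintUrlBuilder
{
    // Search host taken from the example URL in the post.
    private const string SearchBaseUrl = "http://www.google.com.hk/search?q=";

    // Joins a keyword and a quoted footprint phrase into one search URL.
    public static string Build(string keyword, string footprint)
    {
        // Uri.EscapeDataString percent-encodes spaces as %20 and double quotes as %22.
        return SearchBaseUrl
            + Uri.EscapeDataString(keyword)
            + "+"
            + Uri.EscapeDataString(footprint);
    }
}
```

Under these assumptions, the assignment would become something like wbsChaseFootprint.Url = new Uri(FootprintUrlBuilder.Build("weight loss", "\"Sponsor Charity\""));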
private void wbsChaseFootprint_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
}
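One thing to watch for: DocumentCompleted can be raised more than once per navigation, for example once for each frame in the page. The sketch below shows one common way to react only to the top-level document. It is not what the original code does. FootprintForm and the hard-coded URL are assumptions made for illustration, and the URL comparison may need adjusting for pages that redirect.

```csharp
using System;
using System.Windows.Forms;

// Minimal form hosting a WebBrowser; FootprintForm is an illustrative name.
public class FootprintForm : Form
{
    private readonly WebBrowser wbsChaseFootprint = new WebBrowser { Dock = DockStyle.Fill };

    public FootprintForm()
    {
        Controls.Add(wbsChaseFootprint);
        wbsChaseFootprint.DocumentCompleted += wbsChaseFootprint_DocumentCompleted;
        wbsChaseFootprint.Url = new Uri("http://www.google.com.hk/search?q=weight%20loss+%22Sponsor%20Charity%22");
    }

    private void wbsChaseFootprint_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
    {
        WebBrowser browser = (WebBrowser)sender;

        // Skip events raised for sub-frames: their URL differs from the top-level one.
        if (e.Url != browser.Url)
        {
            return;
        }

        // Skip events raised while other parts of the page are still loading.
        if (browser.ReadyState != WebBrowserReadyState.Complete)
        {
            return;
        }

        // The top-level page has loaded; show its title in the window caption.
        Text = browser.DocumentTitle;
    }

    [STAThread]
    private static void Main()
    {
        Application.EnableVisualStyles();
        Application.Run(new FootprintForm());
    }
}
```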
4. Next, figure out how to get the HTML content.
Reference:
Getting the HTML source through the WebBrowser control in C#
Trying it:
string curHtml = wbsChaseFootprint.DocumentText;
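This returns the HTML source.
5. However, all I really want is to extract the URLs of the Google search results. Then I read:
How to get rendered html (processed by Javascript) in WebBrowser control?
which mentions GetElementById and GetElementsByTagName, so I planned to borrow them and see whether I could get the relevant HTML tags and then the URLs inside them,
hoping to avoid complicated HTML parsing.
I was going to try this, but after reading the relevant explanations in the official documentation I gave up on the idea: these methods are not as convenient as the XPath I used before, which can locate the required node in the HTML directly.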
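To illustrate why the built-in DOM feels clumsier than XPath, here is a rough sketch of collecting result links with GetElementsByTagName. It is not code from the original project. DomLinkExtractor and ExtractResultUrls are hypothetical names. The sketch also assumes that each result link sits directly inside an h3 element with class "r", and that the class has to be read through the DOM property name "className", which may depend on the IE rendering mode in use.

```csharp
using System;
using System.Collections.Generic;
using System.Windows.Forms;

// Hypothetical helper, for illustration only.
public static class DomLinkExtractor
{
    public static List<string> ExtractResultUrls(WebBrowser browser)
    {
        List<string> urls = new List<string>();
        HtmlDocument doc = browser.Document;
        if (doc == null)
        {
            return urls; // no document loaded yet
        }

        // No XPath here: fetch every <a>, then inspect each parent by hand.
        foreach (HtmlElement link in doc.GetElementsByTagName("a"))
        {
            HtmlElement parent = link.Parent;

            // Assumption: result links sit directly inside <h3 class="r">.
            bool isResultTitle = parent != null
                && string.Equals(parent.TagName, "H3", StringComparison.OrdinalIgnoreCase)
                && parent.GetAttribute("className") == "r";

            if (isResultTitle)
            {
                string href = link.GetAttribute("href");
                if (!string.IsNullOrEmpty(href))
                {
                    urls.Add(href);
                }
            }
        }

        return urls;
    }
}
```

The same selection is a single XPath expression, .//h3[@class='r']/a, which is the main reason for preferring an XPath-capable parser here.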
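6. As for whether the HtmlDocument obtained from WebBrowser supports XPath, I consulted:
Navigation and WebBrowser control
which says it does not.
So I might as well process the raw HTML obtained from DocumentText myself.
7. In the end I went back to HtmlAgilityPack, which I had used before.
The relevant code is as follows: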
private void wbsChaseFootprint_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    string curHtml = wbsChaseFootprint.DocumentText;
    //System.Windows.Forms.HtmlDocument htmlDoc = wbsChaseFootprint.Document;

    List<crifanLibGoogle.googleSearchResultItem> resultItemList = google.extractGoogleSearchResult("", curHtml);
    if ((resultItemList != null) && (resultItemList.Count > 0))
    {
        txbOutput.Text = "";
        foreach (crifanLibGoogle.googleSearchResultItem singleResultItem in resultItemList)
        {
            txbOutput.Text += singleResultItem.Url + Environment.NewLine;
        }
    }
}
and
using System.Collections.Generic; // needed for List<T>
using HtmlAgilityPack;

public class crifanLibGoogle
{
    public crifanLib crl;

    public struct googleSearchResultItem
    {
        public string Title { get; set; }
        public string Url { get; set; }
        //TODO: add Description
    }

    ......

    /*
     * [Function]
     * extract google search result item from google search url or its html
     * [Input]
     * url:
     * http://www.google.com.hk/search?q=weight%20loss+%22Sponsor%20Charity%22
     * or its html
     * [Output]
     * search result item
     * [Note]
     */
    public List<googleSearchResultItem> extractGoogleSearchResult(string googleSearchUrl = "", string googleSearchRespHtml = "")
    {
        List<googleSearchResultItem> resultItemList = new List<googleSearchResultItem>();

        //if not give html, get it
        if (string.IsNullOrEmpty(googleSearchRespHtml))
        {
            googleSearchRespHtml = crl.getUrlRespHtml_multiTry(googleSearchUrl);
        }

        if (!string.IsNullOrEmpty(googleSearchRespHtml))
        {
            //<li class="g">
            //  <div data-hveid="42" class="rc">
            //    <span style="float:left"></span>
            //    <h3 class="r">
            //      <a href="http://articles.timesofindia.indiatimes.com/2012-09-22/kochi/34021062_1_kidney-transplants-fireworks-factory-birthday-celebrations" onmousedown="return rwt(this,'','','','1','AFQjCNEML6Pgh2cKhjyy19S1Rj2zt91iAg','','0CCsQFjAA','','',event)" target="_blank">
            //        Amritanandamayi Math to <em>sponsor charity</em> events - Times Of India
            //      </a>
            //    </h3>
            //    <div class="s">
            //      <div><div class="f kv" style="white-space:nowrap"><cite class="bc">articles.timesofindia.indiatimes.com › <a href="http://articles.timesofindia.indiatimes.com/" onmousedown="return rwt(this,'','','','1','AFQjCNHYQDP9zOXmqE2BLyiniRDD4oZS4g','','0CC0Q6QUoADAA','','',event)" target="_blank">Collections</a> › <a href="http://articles.timesofindia.indiatimes.com/keyword/kannur" onmousedown="return rwt(this,'','','','1','AFQjCNFOec2KvR8ZCCt8sV5S5EZBpJ1l8g','','0CC4Q6QUoATAA','','',event)" target="_blank">Kannur</a></cite> - <a href="http://translate.google.com.hk/translate?hl=zh-CN&sl=en&u=http://articles.timesofindia.indiatimes.com/2012-09-22/kochi/34021062_1_kidney-transplants-fireworks-factory-birthday-celebrations&prev=/search%3Fq%3Dweight%2Bloss%2B%2522Sponsor%2BCharity%2522%26newwindow%3D1%26safe%3Dstrict" onmousedown="return rwt(this,'','','','1','AFQjCNEiP3vOES7Rpw3v20GEzkxb_WL5DA','','0CDAQ7gEwAA','','',event)" target="_blank" class="fl">翻译此页</a></div><div class="f slp"></div><span class="st"><span class="f">2012年9月22日 – </span>Amritanandamayi Math to <em>sponsor charity</em> events. TNN Sep 22, 2012, <b>...</b> 10 Tips for guaranteed <em>weight loss</em> · How to lose weight without dieting <b>...</b></span>
            //      </div>
            //    </div>
            //</div>
            //</li>
            //<li class="g">
            //  <div data-hveid="50" class="rc">
            //    <span style="float:left"></span>
            //    <h3 class="r">
            //      <a href="http://www.gobookee.net/non-profit-charity-golf-sponsor-letter/" onmousedown="return rwt(this,'','','','2','AFQjCNGACDpc3rYcQ7xyLWeso2O8Uh_dzQ','','0CDMQFjAB','','',event)" target="_blank">
            //        Non profit charity golf sponsor letter - free eBooks download
            //      </a>
            //    </h3>
            //    <div class="s"><div><div class="f kv" style="white-space:nowrap"><cite>www.gobookee.net/non-profit-charity-golf-sponsor-letter/</cite><div class="action-menu ab_ctl"><a class="clickable-dropdown-arrow ab_button" id="am-b1" href="#" data-ved="0CDQQ7B0wAQ" aria-label="结果详情" jsaction="ab.tdd; keydown:ab.hbke; keypress:ab.mskpe" role="button" aria-haspopup="true" aria-expanded="false"><span class="mn-dwn-arw"></span></a><div data-ved="0CDUQqR8wAQ" class="action-menu-panel ab_dropdown" jsaction="keydown:ab.hdke; mouseover:ab.hdhne; mouseout:ab.hdhue" role="menu" tabindex="-1"><ul><li class="action-menu-item ab_dropdownitem" role="menuitem"><a href="http://webcache.googleusercontent.com/search?q=cache:700J2efn4woJ:www.gobookee.net/non-profit-charity-golf-sponsor-letter/+weight+loss+%22Sponsor+Charity%22&cd=2&hl=zh-CN&ct=clnk&gl=cn" onmousedown="return rwt(this,'','','','2','AFQjCNH4JkH1_ORT0Gq3Gi-_UsKhuGy4PA','','0CDYQIDAB','','',event)" target="_blank" class="fl">网页快照</a></li></ul></div></div><a href="http://translate.google.com.hk/translate?hl=zh-CN&sl=en&u=http://www.gobookee.net/non-profit-charity-golf-sponsor-letter/&prev=/search%3Fq%3Dweight%2Bloss%2B%2522Sponsor%2BCharity%2522%26newwindow%3D1%26safe%3Dstrict" onmousedown="return rwt(this,'','','','2','AFQjCNFgq5X686zRjTuhe8rQ11RoE7VNEw','','0CDgQ7gEwAQ','','',event)" target="_blank" class="fl">翻译此页</a></div><div class="f slp"></div><span class="st">GOLF TOURNAMENT <em>SPONSOR. ... charity</em> golf tournament to help raise funds for our programs and teams ... non-profit org. so all donations/sponsorships are <b>...</b></span></div></div>
            //  </div>
            //</li>

            HtmlAgilityPack.HtmlDocument htmlDoc = crl.htmlToHtmlDoc(googleSearchRespHtml);
            HtmlNodeCollection liNodeList = htmlDoc.DocumentNode.SelectNodes("//li[@class='g']");

            // SelectNodes returns null (not an empty collection) when nothing matches,
            // so guard against it before iterating
            if (liNodeList != null)
            {
                foreach (HtmlNode liNode in liNodeList)
                {
                    HtmlNode h3ANode = liNode.SelectSingleNode(".//h3[@class='r']/a");
                    if (h3ANode != null)
                    {
                        googleSearchResultItem singleResultItem = new googleSearchResultItem();

                        //string titleHtml = h3ANode.InnerHtml; //"Amritanandamayi Math to <em>sponsor charity</em> events - Times Of India"
                        string titleHtml = h3ANode.InnerText; //"Amritanandamayi Math to sponsor charity events - Times Of India"
                        string filteredTitle = crl.htmlRemoveTag(titleHtml);

                        string url = h3ANode.Attributes["href"].Value; //"http://articles.timesofindia.indiatimes.com/2012-09-22/kochi/34021062_1_kidney-transplants-fireworks-factory-birthday-celebrations"

                        //store info
                        singleResultItem.Title = filteredTitle;
                        singleResultItem.Url = url;

                        resultItemList.Add(singleResultItem);
                    }
                }
            }
        }

        return resultItemList;
    }
}
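【Summary】
The C# WebBrowser control is fairly convenient and easy to get started with.
However, its built-in DOM is awkward for parsing HTML, so I had to use HtmlAgilityPack separately to extract what I needed.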
Please credit the source when reposting: 在路上 » 【Record】Using WebBrowser in C# to browse Google pages