【Notes】Using WebBrowser in C# to browse Google pages

【Background】

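In C#, I needed to use the WebBrowser control to simulate a browser visiting web pages.

It also had to capture the user's click events and then perform some actions.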

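Here, the requirements were:

Access Google search.

Get the current HTML.

After the user clicks to a given page of the Google search results, capture that event, get the latest HTML, parse it, and extract the URLs in it.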

 

【Solution Process】

1. Create a new WebBrowser:

(Screenshot: dragging a new WebBrowser control onto the form)

2. Then figure out how to use the WebBrowser.

I tried setting the corresponding Uri directly:

            //http://www.google.com.hk/search?q=weight%20loss+%22Sponsor%20Charity%22
            wbsChaseFootprint.Url = new Uri(strEncodedFullFootprintUrl);

With just that, the control already browses the web page:

(Screenshot: the WebBrowser already displaying the website)
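The line above assumes strEncodedFullFootprintUrl is already percent-encoded; the post does not show how it is built. Below is a minimal sketch of one way to build it, assuming the raw query is escaped with the BCL's Uri.EscapeDataString and reusing the www.google.com.hk host from the comment. The class name GoogleSearchUrl is hypothetical.

```csharp
using System;

public static class GoogleSearchUrl
{
    // Host and path taken from the example URL in the post's comment.
    private const string SearchBase = "http://www.google.com.hk/search?q=";

    // Percent-encode the raw query text so that spaces and double quotes
    // are safe to put into the URL.
    public static string Build(string rawQuery)
    {
        if (rawQuery == null)
        {
            throw new ArgumentNullException(nameof(rawQuery));
        }
        return SearchBase + Uri.EscapeDataString(rawQuery);
    }
}
```

Hypothetical usage: `wbsChaseFootprint.Url = new Uri(GoogleSearchUrl.Build("weight loss \"Sponsor Charity\""));`. EscapeDataString encodes spaces as %20 rather than the + seen between terms in the comment's URL; in a query string both are normally decoded to a space.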

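3. Next, get the HTML of the current page, extract the URLs in it, and display them.

First I had to work out how to tell that the page has finished loading, because only then can the HTML be retrieved.

For this I referred to:

C# WebBrowser: getting the HTML source of the selected part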

and found that WebBrowser has built-in support for a DocumentCompleted event:

(Screenshot: the WebBrowser's built-in DocumentCompleted event)

So I double-clicked it to generate the handler:

private void wbsChaseFootprint_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{

}
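One caveat worth handling in this handler: DocumentCompleted can fire more than once for a single navigation when the page contains frames or iframes, once per frame. A common guard is to process only the event whose e.Url equals the browser's own Url. The sketch below pulls that comparison into a small testable helper; DocumentCompletedGuard and IsTopLevelDocument are hypothetical names.

```csharp
using System;

public static class DocumentCompletedGuard
{
    // Returns true only for the DocumentCompleted event of the top-level page:
    // for a frame or iframe, e.Url is the frame's address, which differs from
    // the WebBrowser's own Url.
    public static bool IsTopLevelDocument(Uri completedUrl, Uri browserUrl)
    {
        if (completedUrl == null || browserUrl == null)
        {
            return false;
        }
        return completedUrl.Equals(browserUrl);
    }
}
```

Inside the handler it would be used as `if (!DocumentCompletedGuard.IsTopLevelDocument(e.Url, wbsChaseFootprint.Url)) return;`. Checking `wbsChaseFootprint.ReadyState == WebBrowserReadyState.Complete` is another commonly used check.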

4. Next, figure out how to get the HTML content.

Reference:

Getting the HTML source through the WebBrowser control in C#

Trying it out:

string curHtml = wbsChaseFootprint.DocumentText;

This returns the page's HTML code.

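5. However, I only wanted to extract the URLs of Google's search results. The answers to:

How to get rendered html (processed by Javascript) in WebBrowser control?

mention GetElementById and GetElementsByTagName, so I planned to borrow them and see whether they could return the contents of the relevant HTML tags, and from those the URLs.

The hope was to avoid complex HTML parsing.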

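I was about to try it, but then I read the explanations in the official documentation:

HtmlDocument Members

for:

GetElementById

GetElementFromPoint

GetElementsByTagName

and gave up on the idea, because none of them is as convenient as the XPath I had used before, which can locate the desired node in the HTML directly.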

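6. As for whether the HtmlDocument obtained from WebBrowser supports XPath, I referred to:

Navigation and WebBrowser control

which says it does not.

So I decided to process the raw HTML obtained from DocumentText separately instead.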

7. So I went back to HtmlAgilityPack, which I had used before.

The relevant code is as follows:

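        // StringBuilder needs: using System.Text;
        private void wbsChaseFootprint_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
        {
            // raw HTML of the page that has just finished loading
            string curHtml = wbsChaseFootprint.DocumentText;
            //System.Windows.Forms.HtmlDocument htmlDoc = wbsChaseFootprint.Document;

            // google is a crifanLibGoogle instance held by this form; an empty url means "parse curHtml"
            List<crifanLibGoogle.googleSearchResultItem> resultItemList = google.extractGoogleSearchResult("", curHtml);
            if ((resultItemList != null) && (resultItemList.Count > 0))
            {
                // build the whole text first, then assign it to the TextBox once
                StringBuilder urlText = new StringBuilder();
                foreach (crifanLibGoogle.googleSearchResultItem singleResultItem in resultItemList)
                {
                    urlText.AppendLine(singleResultItem.Url);
                }
                txbOutput.Text = urlText.ToString();
            }
        }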
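The background also asks to detect which results page the user clicked. Because DocumentCompleted fires again after each navigation, e.Url in the handler above tells which page was loaded. A sketch, assuming Google marks later pages with a zero-based start offset in the query string (start=10 for page 2) and shows 10 results per page; GoogleResultPage and GetPageNumber are hypothetical names.

```csharp
using System;

public static class GoogleResultPage
{
    // Assumption: later result pages carry a zero-based "start" offset
    // (start=10 is page 2, start=20 is page 3). No "start" means page 1.
    public static int GetPageNumber(Uri searchUrl, int resultsPerPage = 10)
    {
        if (searchUrl == null)
        {
            throw new ArgumentNullException(nameof(searchUrl));
        }
        if (resultsPerPage <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(resultsPerPage));
        }

        // Uri.Query includes the leading '?', e.g. "?q=abc&start=10"
        string query = searchUrl.Query.TrimStart('?');
        foreach (string pair in query.Split('&'))
        {
            int eq = pair.IndexOf('=');
            if (eq <= 0)
            {
                continue;
            }
            string name = pair.Substring(0, eq);
            string value = pair.Substring(eq + 1);
            if (name == "start" && int.TryParse(value, out int start) && start >= 0)
            {
                return start / resultsPerPage + 1;
            }
        }
        return 1;
    }
}
```

In the handler this could be called as `int pageNo = GoogleResultPage.GetPageNumber(e.Url);` before extracting the results.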

using System.Collections.Generic;
using HtmlAgilityPack;

public class crifanLibGoogle
{
    public crifanLib crl;

    public struct googleSearchResultItem
    {
        public string Title { get; set; }
        public string Url { get; set; }

        //TODO: add Description
    }

    ......
    
    /*
     * [Function]
     * extract google search result items from a google search url or from its html
     * [Input]
     * url:
     * http://www.google.com.hk/search?q=weight%20loss+%22Sponsor%20Charity%22
     * or its html
     * [Output]
     * search result item
     * [Note]
     */
    public List<googleSearchResultItem> extractGoogleSearchResult(string googleSearchUrl = "", string googleSearchRespHtml = "")
    {
        List<googleSearchResultItem> resultItemList = new List<googleSearchResultItem>();

        //if not give html, get it
        if (string.IsNullOrEmpty(googleSearchRespHtml))
        {
            googleSearchRespHtml = crl.getUrlRespHtml_multiTry(googleSearchUrl);
        }

        if (!string.IsNullOrEmpty(googleSearchRespHtml))
        {
            //<li class="g">
            //    <div data-hveid="42" class="rc">
            //    <span style="float:left"></span>
            //    <h3 class="r">
            //        <a href="http://articles.timesofindia.indiatimes.com/2012-09-22/kochi/34021062_1_kidney-transplants-fireworks-factory-birthday-celebrations" onmousedown="return rwt(this,'','','','1','AFQjCNEML6Pgh2cKhjyy19S1Rj2zt91iAg','','0CCsQFjAA','','',event)" target="_blank">
            //            Amritanandamayi Math to <em>sponsor charity</em> events - Times Of India
            //        </a>
            //    </h3>

            //    <div class="s">
            //        <div><div class="f kv" style="white-space:nowrap"><cite class="bc">articles.timesofindia.indiatimes.com &rsaquo; <a href="http://articles.timesofindia.indiatimes.com/" onmousedown="return rwt(this,'','','','1','AFQjCNHYQDP9zOXmqE2BLyiniRDD4oZS4g','','0CC0Q6QUoADAA','','',event)" target="_blank">Collections</a> &rsaquo; <a href="http://articles.timesofindia.indiatimes.com/keyword/kannur" onmousedown="return rwt(this,'','','','1','AFQjCNFOec2KvR8ZCCt8sV5S5EZBpJ1l8g','','0CC4Q6QUoATAA','','',event)" target="_blank">Kannur</a></cite> - <a href="http://translate.google.com.hk/translate?hl=zh-CN&amp;sl=en&amp;u=http://articles.timesofindia.indiatimes.com/2012-09-22/kochi/34021062_1_kidney-transplants-fireworks-factory-birthday-celebrations&amp;prev=/search%3Fq%3Dweight%2Bloss%2B%2522Sponsor%2BCharity%2522%26newwindow%3D1%26safe%3Dstrict" onmousedown="return rwt(this,'','','','1','AFQjCNEiP3vOES7Rpw3v20GEzkxb_WL5DA','','0CDAQ7gEwAA','','',event)" target="_blank" class="fl">翻译此页</a></div><div class="f slp"></div><span class="st"><span class="f">2012年9月22日 &ndash; </span>Amritanandamayi Math to <em>sponsor charity</em> events. TNN Sep 22, 2012, <b>...</b> 10 Tips for guaranteed <em>weight loss</em> &middot; How to lose weight without dieting&nbsp;<b>...</b></span>
            //        </div>
            //    </div>
            //</div>
            //</li>

            //<li class="g">
            //    <div data-hveid="50" class="rc">
            //        <span style="float:left"></span>
            //        <h3 class="r">
            //            <a href="http://www.gobookee.net/non-profit-charity-golf-sponsor-letter/" onmousedown="return rwt(this,'','','','2','AFQjCNGACDpc3rYcQ7xyLWeso2O8Uh_dzQ','','0CDMQFjAB','','',event)" target="_blank">
            //                Non profit charity golf sponsor letter - free eBooks download
            //            </a>
            //        </h3>
            //        <div class="s"><div><div class="f kv" style="white-space:nowrap"><cite>www.gobookee.net/non-profit-charity-golf-sponsor-letter/</cite>‎<div class="action-menu ab_ctl"><a class="clickable-dropdown-arrow ab_button" id="am-b1" href="#" data-ved="0CDQQ7B0wAQ" aria-label="结果详情" jsaction="ab.tdd; keydown:ab.hbke; keypress:ab.mskpe" role="button" aria-haspopup="true" aria-expanded="false"><span class="mn-dwn-arw"></span></a><div data-ved="0CDUQqR8wAQ" class="action-menu-panel ab_dropdown" jsaction="keydown:ab.hdke; mouseover:ab.hdhne; mouseout:ab.hdhue" role="menu" tabindex="-1"><ul><li class="action-menu-item ab_dropdownitem" role="menuitem"><a href="http://webcache.googleusercontent.com/search?q=cache:700J2efn4woJ:www.gobookee.net/non-profit-charity-golf-sponsor-letter/+weight+loss+%22Sponsor+Charity%22&amp;cd=2&amp;hl=zh-CN&amp;ct=clnk&amp;gl=cn" onmousedown="return rwt(this,'','','','2','AFQjCNH4JkH1_ORT0Gq3Gi-_UsKhuGy4PA','','0CDYQIDAB','','',event)" target="_blank" class="fl">网页快照</a></li></ul></div></div><a href="http://translate.google.com.hk/translate?hl=zh-CN&amp;sl=en&amp;u=http://www.gobookee.net/non-profit-charity-golf-sponsor-letter/&amp;prev=/search%3Fq%3Dweight%2Bloss%2B%2522Sponsor%2BCharity%2522%26newwindow%3D1%26safe%3Dstrict" onmousedown="return rwt(this,'','','','2','AFQjCNFgq5X686zRjTuhe8rQ11RoE7VNEw','','0CDgQ7gEwAQ','','',event)" target="_blank" class="fl">翻译此页</a></div><div class="f slp"></div><span class="st">GOLF TOURNAMENT <em>SPONSOR. ... charity</em> golf tournament to help raise funds for our programs and teams ... non-profit org. so all donations/sponsorships are&nbsp;<b>...</b></span></div></div>
            //    </div>
            //</li>

            HtmlAgilityPack.HtmlDocument htmlDoc = crl.htmlToHtmlDoc(googleSearchRespHtml);
            // SelectNodes returns null (not an empty collection) when nothing matches
            HtmlNodeCollection liNodeList = htmlDoc.DocumentNode.SelectNodes("//li[@class='g']");
            if (liNodeList != null)
            {
                foreach (HtmlNode liNode in liNodeList)
                {
                    // title link of one search result: <h3 class="r"><a href="...">
                    HtmlNode h3ANode = liNode.SelectSingleNode(".//h3[@class='r']/a");
                    if (h3ANode == null)
                    {
                        continue; // not a normal result item, skip it
                    }

                    googleSearchResultItem singleResultItem = new googleSearchResultItem();

                    //string titleHtml = h3ANode.InnerHtml; //"Amritanandamayi Math to <em>sponsor charity</em> events - Times Of India"
                    string titleHtml = h3ANode.InnerText; //"Amritanandamayi Math to sponsor charity events - Times Of India"
                    string filteredTitle = crl.htmlRemoveTag(titleHtml);

                    // GetAttributeValue returns the default ("") instead of throwing when href is missing
                    string url = h3ANode.GetAttributeValue("href", ""); //"http://articles.timesofindia.indiatimes.com/2012-09-22/kochi/34021062_1_kidney-transplants-fireworks-factory-birthday-celebrations"

                    //store info
                    singleResultItem.Title = filteredTitle;
                    singleResultItem.Url = url;

                    resultItemList.Add(singleResultItem);
                }
            }
        }

        return resultItemList;
    }
}
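The crifanLib helpers used above (htmlToHtmlDoc, htmlRemoveTag, getUrlRespHtml_multiTry) are not shown in this post. As an illustration only, here is a hypothetical stand-in for htmlRemoveTag that strips tags with a regular expression and then decodes entities with System.Net.WebUtility.HtmlDecode; it is not crifanLib's actual implementation. As far as I know, HtmlAgilityPack's InnerText leaves entities such as &amp; undecoded, so the decode step is what matters most for the title text here (HtmlAgilityPack's own HtmlEntity.DeEntitize is an alternative for that step).

```csharp
using System.Net;
using System.Text.RegularExpressions;

public static class HtmlText
{
    // Matches one tag such as <em>, </em> or <b class="x">.
    private static readonly Regex TagPattern = new Regex("<[^>]+>");

    // Hypothetical stand-in for crifanLib's htmlRemoveTag: strip tags,
    // decode entities (&amp; -> &), trim surrounding whitespace.
    public static string RemoveTags(string html)
    {
        if (string.IsNullOrEmpty(html))
        {
            return string.Empty;
        }
        string withoutTags = TagPattern.Replace(html, string.Empty);
        return WebUtility.HtmlDecode(withoutTags).Trim();
    }
}
```

A regex like this is adequate for short title fragments such as the ones above, but it is not a general HTML parser; that is still HtmlAgilityPack's job.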

 

【Summary】

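C#'s WebBrowser is fairly easy to use and quick to pick up.

However, its built-in DOM is inconvenient for parsing the HTML, so I used HtmlAgilityPack separately to extract exactly what I wanted.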


