13.2. Converting HTML to an HtmlAgilityPack HtmlDocument: htmlToHtmlDoc


    public HtmlAgilityPack.HtmlDocument htmlToHtmlDoc(string html)
    {
        HtmlAgilityPack.HtmlDocument htmlDoc = new HtmlAgilityPack.HtmlDocument();

        //http://www.crifan.com/htmlagilitypack_html_tag_form_option_no_child_via_sibling_get_innertext/
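        //remove the default flags for form/option so that these tags are parsed with child nodes
        //note: ElementsFlags is static, so this affects every HtmlDocument loaded afterwards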
        HtmlNode.ElementsFlags.Remove("form");
        HtmlNode.ElementsFlags.Remove("option");

        htmlDoc.LoadHtml(html);

        return htmlDoc;
    }
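
The effect of the two ElementsFlags.Remove calls can be checked with a small standalone program. The following is only a sketch: the class name and the sample HTML are made up for illustration, and the exact default flags may differ between HtmlAgilityPack versions, so no particular output is claimed here.

```csharp
using System;
using HtmlAgilityPack;

// Hypothetical demo class, not part of the library described here.
class ElementsFlagsDemo
{
    static void Main()
    {
        // Illustrative input only.
        string html = "<select><option value=\"1\">One</option></select>";

        // ElementsFlags is a static dictionary, so removing an entry changes
        // how every HtmlDocument loaded afterwards in this process is parsed.
        HtmlNode.ElementsFlags.Remove("option");

        HtmlDocument doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectSingleNode returns null when nothing matches.
        HtmlNode option = doc.DocumentNode.SelectSingleNode("//option");
        if (option != null)
        {
            // With the flag removed, the text is expected to be a child of <option>.
            Console.WriteLine("InnerText: [" + option.InnerText + "]");
            Console.WriteLine("Child nodes: " + option.ChildNodes.Count);
        }
    }
}
```

Commenting out the Remove call and running the program again shows whether the installed HtmlAgilityPack version still needs this workaround.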

    

Example 13.2. Usage example of htmlToHtmlDoc


        //Method 2: use htmlToHtmlDoc
        string testUrlWithXmlns = "http://sd.csdn.net/";
        string respHtml = getUrlRespHtml(testUrlWithXmlns);

        //the returned HTML starts with a DOCTYPE and an html tag that carries an xmlns attribute:
        //<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
        //<html xmlns="http://www.w3.org/1999/xhtml">
        //<head>
        HtmlAgilityPack.HtmlDocument htmlDoc = htmlToHtmlDoc(respHtml);
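
Once the HtmlDocument is available, it is typically queried with XPath. The sketch below assumes a document built from an inline string instead of a downloaded page; the class name, the method name PrintLinks and the sample HTML are hypothetical and only serve the illustration.

```csharp
using System;
using HtmlAgilityPack;

// Hypothetical helper, not part of the library described here.
class LinkLister
{
    // Prints the href and text of every <a> element that has an href attribute.
    static void PrintLinks(HtmlDocument htmlDoc)
    {
        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection links = htmlDoc.DocumentNode.SelectNodes("//a[@href]");
        if (links == null)
        {
            Console.WriteLine("no links found");
            return;
        }

        foreach (HtmlNode link in links)
        {
            string href = link.GetAttributeValue("href", "");
            Console.WriteLine(href + " -> " + link.InnerText.Trim());
        }
    }

    static void Main()
    {
        // Illustrative input; in the example above this would be respHtml.
        string html = "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body>"
                    + "<a href=\"/news\">News</a><a href=\"/blog\">Blog</a>"
                    + "</body></html>";

        HtmlDocument htmlDoc = new HtmlDocument();
        htmlDoc.LoadHtml(html);

        PrintLinks(htmlDoc);
    }
}
```

The null check matters because iterating over the result of SelectNodes directly would throw a NullReferenceException on pages without any matching node.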

        

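Note: before using this function, enable the corresponding macro (conditional compilation symbol) USE_HTML_PARSER_HTMLAGILITYPACK and add a reference to the corresponding library, HtmlAgilityPack.dll.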
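
One way to enable the symbol is a #define at the top of the source file. The following is a minimal sketch, assuming the symbol guards the HtmlAgilityPack-dependent code via #if; the class name is hypothetical, and the symbol can alternatively be set in the project's conditional compilation symbols instead of in the file.

```csharp
// A #define must come before any other code in the file, including using directives.
#define USE_HTML_PARSER_HTMLAGILITYPACK

using System;

#if USE_HTML_PARSER_HTMLAGILITYPACK
// Compiles only when the project references HtmlAgilityPack.dll.
using HtmlAgilityPack;
#endif

// Hypothetical demo class, not part of the library described here.
class MacroDemo
{
    static void Main()
    {
#if USE_HTML_PARSER_HTMLAGILITYPACK
        HtmlDocument htmlDoc = new HtmlDocument();
        htmlDoc.LoadHtml("<p>hello</p>");
        Console.WriteLine("HtmlAgilityPack support is compiled in");
#else
        Console.WriteLine("HtmlAgilityPack support is disabled");
#endif
    }
}
```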