
[Solved] C#: strip HTML tags and remove comments at the same time

C# crifan 4711 views 0 comments

[Problem]

In C#, I want to strip the tags from an HTML string and remove the HTML comments as well.

 

[Solution process]

1. Reference:

How can I strip HTML tags from a string in ASP.NET?

Try this:

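public string htmlRemoveTag(string html)
{
    // LoadHtml throws on null input, so check the argument rather than the document
    if (string.IsNullOrEmpty(html))
    {
        return "";
    }

    HtmlAgilityPack.HtmlDocument htmlDoc = new HtmlAgilityPack.HtmlDocument();
    htmlDoc.LoadHtml(html);

    // concatenate the text of every top-level node; the tags themselves are dropped
    System.Text.StringBuilder filteredHtml = new System.Text.StringBuilder();
    foreach (var node in htmlDoc.DocumentNode.ChildNodes)
    {
        filteredHtml.Append(node.InnerText);
    }

    return filteredHtml.ToString();
}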

With this, all the tags are removed.

But the HTML comments, as in this output:

<!——- A+ Content Begins Here ——->  <!——- BRAND LOGO ——->      <!——- TITLE ——->  Frigidaire Mini Air Conditioner  <!——- GENERAL DESCRIPTION ——->     Frigidaire’s FRA052XT7 5,000 BTU 115-Volt Window-Mounted Mini-Compact Air Conditioner is perfect for rooms up to 150 square feet.  It quickly cools a room on hot days and quie。。。。。。。。

are still there.

2. Next, remove the comments.

Reference:

Removing HTML Comments

Then use:

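public string htmlRemoveTag(string html)
{
    // LoadHtml throws on null input, so check the argument rather than the document
    if (string.IsNullOrEmpty(html))
    {
        return "";
    }

    HtmlAgilityPack.HtmlDocument htmlDoc = new HtmlAgilityPack.HtmlDocument();
    htmlDoc.LoadHtml(html);

    // 1. remove all comments
    // (1) get all comment nodes using XPath; SelectNodes returns null when nothing matches
    var comments = htmlDoc.DocumentNode.SelectNodes("//comment()");
    if (comments != null)
    {
        foreach (HtmlAgilityPack.HtmlNode comment in comments)
        {
            // (2) remove the comment node itself
            comment.ParentNode.RemoveChild(comment);
        }
    }

    // 2. get all remaining text content
    System.Text.StringBuilder filteredHtml = new System.Text.StringBuilder();
    foreach (var node in htmlDoc.DocumentNode.ChildNodes)
    {
        filteredHtml.Append(node.InnerText);
    }

    return filteredHtml.ToString();
}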

That does the job. The result is the HTML content with no tags and no comments:

”          Frigidaire Mini Air Conditioner       Frigidaire’s FRA052XT7 5,000 BTU 115-Volt Window-Mounted Mini-Compact Air Conditioner is perfect for rooms up to 150 square feet.  It quickly cools a room on hot days and quiet operation keeps you cool without keeping you awake. This unit features mechanical rotary  controls and top, full-width, 2-way air direction control. The antimicrobial mesh filter with side, slide-out access cleans the air  removing harmful bacteria. Low voltage start-up conserves energy and saves you money 。。。。。。。。。。。。。。

 

[Summary]

To strip the tags from HTML without keeping the comments, you can use:

using System.Text;
using HtmlAgilityPack;

public string htmlRemoveTag(string html)
{
    // LoadHtml throws on null input, so check the argument rather than the document
    if (string.IsNullOrEmpty(html))
    {
        return "";
    }

    HtmlAgilityPack.HtmlDocument htmlDoc = new HtmlAgilityPack.HtmlDocument();
    htmlDoc.LoadHtml(html);

    // 1. remove all comments
    // (1) get all comment nodes using XPath; SelectNodes returns null when nothing matches
    HtmlNodeCollection comments = htmlDoc.DocumentNode.SelectNodes("//comment()");
    if (comments != null)
    {
        foreach (HtmlNode comment in comments)
        {
            // (2) remove the comment node itself
            comment.ParentNode.RemoveChild(comment);
        }
    }

    // 2. get all remaining text content
    StringBuilder filteredHtml = new StringBuilder();
    foreach (var node in htmlDoc.DocumentNode.ChildNodes)
    {
        filteredHtml.Append(node.InnerText);
    }

    return filteredHtml.ToString();
}

When reposting, please credit: 在路上 » [Solved] C#: strip HTML tags and remove comments at the same time
