最新消息:20210816 当前crifan.com域名已被污染,为防止失联,请关注(页面右下角的)公众号

【未解决】php中用html解析库去解析处理印象笔记的html源码

PHP crifan 1234浏览 0评论
折腾:
【未解决】php中用正则过滤html中code中多余span标签
期间,印象笔记帖子内容中html有嵌套,所以用之前正则不是很好写,所以考虑用html的lib去解析和处理。
php html lib
PHP library for HTML tag generation – Stack Overflow
paquettg/php-html-parser: An HTML DOM parser. It allows you to manipulate HTML. Find tags on an HTML page with selectors just like jQuery.
Masterminds/html5-php: An HTML5 parser and serializer for PHP.
PHP Simple HTML DOM Parser
【未解决】用php的html库php-html-parser去解析处理印象笔记html源码
再去试试另外的库:
Masterminds/html5-php: An HTML5 parser and serializer for PHP.
{
  "require" : {
    "masterminds/html5": "^2.0"
  }
}
安装:
➜  html5-php composer install
Loading composer repositories with package information
Updating dependencies (including require-dev)
Package operations: 1 install, 0 updates, 0 removals
  - Installing masterminds/html5 (2.7.0): Downloading (100%)         
Writing lock file
Generating autoload files
但是具体如何使用:
https://github.com/Masterminds/html5-php
没解释
去看看:
Parsing and Serializing HTML5 in PHP | html5-php
Basic Usage · Masterminds/html5-php Wiki
Processor Instructions · Masterminds/html5-php Wiki
用代码:
<?php
// Assuming you installed from Composer:
require "vendor/autoload.php";
use Masterminds\HTML5;


$originEvernoteHtml = 'xxx';
// $originEvernoteHtml = "<div>" . $originEvernoteHtml . "</div>";
$originEvernoteHtml = "<html><head><title>parse evernote html</title></head><body>" . $originEvernoteHtml . "</body></html>";


// Parse the document. $dom is a DOMDocument.
$html5 = new HTML5();
$dom = $html5->loadHTML($originEvernoteHtml);
// Render it as HTML5:
$htmlStr = $html5->saveHTML($dom);
print $htmlStr;


// $codeBlockHtml = $dom->find('div')[0];
// echo("codeBlockHtml=".$codeBlockHtml);
// error_log($codeBlockHtml);


?>
是可以输出html:
但是却不支持解析
那去换别的库:
【未解决】用php的html解析库simplehtmldom解析印象笔记帖子的html源码

转载请注明:在路上 » 【未解决】php中用html解析库去解析处理印象笔记的html源码

发表我的评论
取消评论

表情

Hi,您需要填写昵称和邮箱!

  • 昵称 (必填)
  • 邮箱 (必填)
  • 网址
82 queries in 0.190 seconds, using 22.09MB memory