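While working on:
[Unsolved] Filtering redundant span tags inside code blocks in HTML with regex in PHP
I found that the HTML in Evernote (印象笔记) note content is nested, which makes the earlier regex approach hard to write. So I decided to parse and process it with an HTML library instead.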
php html lib
[Unsolved] Parsing and processing Evernote HTML source with the PHP HTML library php-html-parser
Next, try another library:
```json
{
  "require": {
    "masterminds/html5": "^2.0"
  }
}
```
Install:
```bash
➜ html5-php composer install
Loading composer repositories with package information
Updating dependencies (including require-dev)
Package operations: 1 install, 0 updates, 0 removals
  - Installing masterminds/html5 (2.7.0): Downloading (100%)
Writing lock file
Generating autoload files
```
![](https://www.crifan.com/files/pic/uploads/2021/03/95f9879a3619481096b2513fbbd40b52.jpg)
But it doesn't explain how to actually use it.
So I went to take a look:
Using this code:
```php
<?php
// Assuming you installed from Composer:
require "vendor/autoload.php";

use Masterminds\HTML5;

$originEvernoteHtml = 'xxx';
// $originEvernoteHtml = "<div>" . $originEvernoteHtml . "</div>";
$originEvernoteHtml = "<html><head><title>parse evernote html</title></head><body>" . $originEvernoteHtml . "</body></html>";

// Parse the document. $dom is a DOMDocument.
$html5 = new HTML5();
$dom = $html5->loadHTML($originEvernoteHtml);

// Render it as HTML5:
$htmlStr = $html5->saveHTML($dom);
print $htmlStr;

// $codeBlockHtml = $dom->find('div')[0];
// echo("codeBlockHtml=".$codeBlockHtml);
// error_log($codeBlockHtml);
?>
```
It does output the HTML:
![](https://www.crifan.com/files/pic/uploads/2021/03/bbde3f1a1cf9462e901895a33696f4e9.jpg)
![](https://www.crifan.com/files/pic/uploads/2021/03/2f4f9ea4e542445a8d0a9263b4f104f5.jpg)
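But it doesn't support the kind of parsing I need: there is no `find()`-style element lookup like the one in the commented-out code.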
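Since `Masterminds\HTML5::loadHTML()` returns a plain `DOMDocument` (as the comment in the code above says), the tree can probably still be queried and edited with PHP's built-in DOM classes; what is missing is only a jQuery-style `find()`. Below is a minimal sketch, not tested against real Evernote HTML, that uses `DOMXPath` to unwrap redundant `<span>` tags inside `<code>`, which was the original goal. Assumptions: the helper name `unwrapSpansInCode` and the sample markup are made up for illustration, and the built-in `DOMDocument::loadHTML()` stands in for masterminds/html5 so the snippet runs without Composer. Also, masterminds/html5 is understood to put elements in the XHTML namespace by default (its `disable_html_ns` option is meant to turn this off), so a plain `//code//span` XPath query may need that option or a registered namespace; verify this against the library's README.

```php
<?php
// Sketch only: edit a DOMDocument with PHP's built-in DOM API.
// Assumption: built-in DOMDocument::loadHTML() is used instead of
// Masterminds\HTML5::loadHTML() so this runs without Composer.

/**
 * Hypothetical helper (not part of any library): replace every <span>
 * that sits inside a <code> element with its own child nodes, so the
 * text is kept but the span tag is dropped.
 * Returns the number of spans removed.
 */
function unwrapSpansInCode(DOMDocument $dom): int
{
    $xpath = new DOMXPath($dom);
    // Copy the matches into an array before changing the tree.
    $spans = iterator_to_array($xpath->query('//code//span'));

    $count = 0;
    foreach ($spans as $span) {
        $parent = $span->parentNode;
        // Move each child out of the span, placing it just before the span.
        while ($span->firstChild !== null) {
            $parent->insertBefore($span->firstChild, $span);
        }
        // The span is now empty, so remove it.
        $parent->removeChild($span);
        $count++;
    }
    return $count;
}

// Usage with a made-up, Evernote-like fragment.
$html = '<div><code><span style="color:red">echo</span>(<span><span>$x</span></span>);</code></div>';
$dom = new DOMDocument();
$dom->loadHTML($html);

$removed = unwrapSpansInCode($dom);
$code = $dom->getElementsByTagName('code')->item(0);
echo $removed, "\n";
echo $dom->saveHTML($code), "\n";
```

Collecting the matches into an array first avoids iterating a node list while nodes are being moved. Nested spans also work: when an outer span is unwrapped, the inner span is moved rather than copied, so it is still in the document when its own turn comes.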
So switch to yet another library:
[Unsolved] Parsing the HTML source of Evernote notes with the PHP HTML parsing library simplehtmldom
When reposting, please credit: 在路上 » [Unsolved] Parsing and processing Evernote HTML source with HTML parsing libraries in PHP