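Working on:
[Unsolved] Using regex in PHP to strip redundant span tags inside code blocks in HTML
Along the way, the HTML in the Evernote post content turned out to be nested, which made the earlier regex approach hard to write. So I decided to parse and process it with an HTML library instead.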
php html lib
[Unsolved] Using the PHP HTML library php-html-parser to parse and process Evernote HTML source
Next, try another library, masterminds/html5:
{
    "require": {
        "masterminds/html5": "^2.0"
    }
}
Install:
➜ html5-php composer install
Loading composer repositories with package information
Updating dependencies (including require-dev)
Package operations: 1 install, 0 updates, 0 removals
  - Installing masterminds/html5 (2.7.0): Downloading (100%)
Writing lock file
Generating autoload files
But as for how exactly to use it:
there is no explanation.
Taking a look:
Using this code:
<?php
// Assuming you installed from Composer:
require "vendor/autoload.php";

use Masterminds\HTML5;

$originEvernoteHtml = 'xxx';
// $originEvernoteHtml = "<div>" . $originEvernoteHtml . "</div>";
$originEvernoteHtml = "<html><head><title>parse evernote html</title></head><body>" . $originEvernoteHtml . "</body></html>";

// Parse the document. $dom is a DOMDocument.
$html5 = new HTML5();
$dom = $html5->loadHTML($originEvernoteHtml);

// Render it as HTML5:
$htmlStr = $html5->saveHTML($dom);
print $htmlStr;

// $codeBlockHtml = $dom->find('div')[0];
// echo("codeBlockHtml=".$codeBlockHtml);
// error_log($codeBlockHtml);
?>
This does output the HTML:
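But it does not support parsing in the way I tried: the returned $dom is a plain DOMDocument, which has no find() method for selecting elements.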
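As a side note, and not something tried in this post: since `$dom` is a DOMDocument, PHP's standard DOM methods such as `getElementsByTagName` should still be usable on it. The following is only a minimal sketch under that assumption. `unwrapSpansInCode` is a hypothetical helper, the sample markup is made up, and the demo uses PHP's built-in `DOMDocument::loadHTML` so that it runs without Composer. Whether it copes with real Evernote HTML is untested.

```php
<?php
// Hypothetical helper (not from the original post): unwrap every <span>
// nested inside a <code> element, keeping the span's children in place.
function unwrapSpansInCode(DOMDocument $dom): void
{
    foreach ($dom->getElementsByTagName('code') as $code) {
        // Snapshot the live node list before mutating the tree.
        $spans = iterator_to_array($code->getElementsByTagName('span'), false);
        foreach ($spans as $span) {
            // Move the span's children in front of it, then drop the empty span.
            while ($span->firstChild !== null) {
                $span->parentNode->insertBefore($span->firstChild, $span);
            }
            $span->parentNode->removeChild($span);
        }
    }
}

// Made-up sample markup, standing in for the Evernote HTML.
$sampleHtml = '<html><body>'
    . '<div><code><span style="color:red">echo</span> <span><span>1</span></span>;</code></div>'
    . '<p><span>keep</span></p>'
    . '</body></html>';

$dom = new DOMDocument();
$dom->loadHTML($sampleHtml);

unwrapSpansInCode($dom);

// Serialize only the <body> subtree for inspection.
$result = $dom->saveHTML($dom->getElementsByTagName('body')->item(0));
print $result;
```

Spans outside `<code>` are left alone. With masterminds/html5, the `$dom` returned by `$html5->loadHTML(...)` could presumably be passed to the same helper, since the function only relies on the DOMDocument interface.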
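So, on to yet another library:
[Unsolved] Using the PHP HTML parsing library simplehtmldom to parse the HTML source of Evernote posts
Please credit the source when reposting: 在路上 » [Unsolved] Using an HTML parsing library in PHP to parse and process Evernote HTML source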