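While working on:
[Unresolved] Using an HTML parsing library in PHP to parse Evernote HTML source
Along the way, first try the library with the most stars,
to see whether it can handle the HTML source of Evernote notes.
After downloading the code,
see how to use it: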
<?php use PHPHtmlParser\Dom;
It fails immediately:
Fatal error: Uncaught Error: Class 'PHPHtmlParser\Dom' not found in /Users/crifan/dev/dev_root/crifan.com/mamp/test/useHtmlLibParseEvernoteHtml/useHtmlLibParseEvernoteHtml.php:12 Stack trace: #0 {main} thrown in /Users/crifan/dev/dev_root/crifan.com/mamp/test/useHtmlLibParseEvernoteHtml/useHtmlLibParseEvernoteHtml.php on line 12
So it seems the code cannot simply be copied in and run?
The problem becomes:
how to directly reference the source of a PHP library that sits in a folder.
Searched for:
php use Fatal error Uncaught Error Class not found in
Then tried requiring the file explicitly:
<?php require_once 'PHPHtmlParser/Dom.php'; use PHPHtmlParser\Dom;
Result:
Fatal error: Uncaught Error: Class 'PHPHtmlParser\Dom\AbstractNode' not found in /Users/crifan/dev/dev_root/crifan.com/mamp/test/useHtmlLibParseEvernoteHtml/PHPHtmlParser/Dom.php:139 Stack trace: #0 /Users/crifan/dev/dev_root/crifan.com/mamp/test/useHtmlLibParseEvernoteHtml/useHtmlLibParseEvernoteHtml.php(15): PHPHtmlParser\Dom->load('<div><br /></di...') #1 {main} thrown in /Users/crifan/dev/dev_root/crifan.com/mamp/test/useHtmlLibParseEvernoteHtml/PHPHtmlParser/Dom.php on line 139
Dom.php itself now loads, but the classes it references internally are still not found.
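The error above arises because requiring one file does not load the other namespaced classes that file refers to. A minimal sketch of a folder-based autoloader follows, assuming the library's source folder sits next to the script and that its files follow the namespace-to-path convention (which the Dom.php and Dom\AbstractNode paths in the error suggest). The helper name registerFolderAutoloader is hypothetical, not part of PHPHtmlParser, and the library may still need other packages that only Composer would pull in.

```php
<?php
// Hypothetical helper (not part of PHPHtmlParser): maps a namespace prefix
// to a folder and loads class files on demand, PSR-4 style.
function registerFolderAutoloader(string $prefix, string $baseDir): void
{
    $prefix  = trim($prefix, '\\') . '\\';
    $baseDir = rtrim($baseDir, '/\\') . DIRECTORY_SEPARATOR;

    spl_autoload_register(function (string $class) use ($prefix, $baseDir): void {
        // Ignore classes outside the registered namespace.
        if (strncmp($class, $prefix, strlen($prefix)) !== 0) {
            return;
        }
        // e.g. PHPHtmlParser\Dom\AbstractNode -> <baseDir>/Dom/AbstractNode.php
        $relative = substr($class, strlen($prefix));
        $file = $baseDir . str_replace('\\', DIRECTORY_SEPARATOR, $relative) . '.php';
        if (is_file($file)) {
            require_once $file;
        }
    });
}

// Assumed layout: the library's source folder sits next to this script.
registerFolderAutoloader('PHPHtmlParser', __DIR__ . '/PHPHtmlParser');
```

With this registered, a plain use PHPHtmlParser\Dom; followed by new Dom would trigger the closure for each class the library touches. Composer's vendor/autoload.php does the same job and also resolves dependencies, which is why it is usually the less fragile route.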
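Forget it, just install the library properly instead.
The problem becomes:
[Solved] Installing Composer on Mac and using it to install the PHP library php-html-parser
With Composer's autoloader in place the class loads. Note, however, that single-stepping line by line only works when debugging in
Listen for XDebug
mode.
But for the Evernote HTML: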
<?php require "vendor/autoload.php"; use PHPHtmlParser\Dom; $originEvernoteHtml = '<div><br /></div><div>此处包含要测试的内容,包括code代码:</div><div style="box-sizing: border-box; padding: 8px; font-family: Monaco, Menlo, Consolas, "Courier New", monospace; font-size: 12px; color: rgb(51, 51, 51); border-top-left-radius: 4px; border-top-right-radius: 4px; border-bottom-right-radius: 4px; border-bottom-left-radius: 4px; background-color: rgb(251, 250, 248); border: 1px solid rgba(0, 0, 0, 0.14902);-en-codeblock:true;"><div><span style="font-size: 12px; font-family: Monaco;">some code include</span></div><div><span style="font-size: 12px; font-family: Monaco;">little <</span></div><div><span style="font-size: 12px; font-family: Monaco;">greater ></span></div><div><span style="font-size: 12px; font-family: Monaco;">at &</span></div><div><span style="font-size: 12px; font-family: Monaco;">和其他字符</span></div></div><div>希望同步后,不要:</div><div>有多余的code</div><div>html字符不要被转义</div><div><br /></div><div>另外再去看看,之前出bug的代码</div><div>好像是中间包含多个空行?的代码</div><div style="box-sizing: border-box; padding: 8px; font-family: Monaco, Menlo, Consolas, "Courier New", monospace; font-size: 12px; color: rgb(51, 51, 51); border-top-left-radius: 4px; border-top-right-radius: 4px; border-bottom-right-radius: 4px; border-bottom-left-radius: 4px; background-color: rgb(251, 250, 248); border: 1px solid rgba(0, 0, 0, 0.14902);-en-codeblock:true;"><div># Author: Crifan Li</div><div># Function: Batch make for all gitbooks</div><div># Version: 20190716</div><div>#</div><div># [Note]</div><div># 1. 
this makefile should be located in</div><div># /Users/crifan/dev/dev_root/gitbook/gitbook_src_root/common</div><div><div><br /></div><div><br /></div></div><div><div>SUB_BOOKS=$(shell ls ../books)</div><div><br /></div></div><div><div>BOOKS_SRC_ROOT=$(shell cd ../books && pwd)</div><div><br /></div></div><div><div><br /></div><div><br /></div></div><div># Batch make for all gitbooks</div><div><div>help debug_dir init sync_content clean_all website pdf epub mobi all upload commit deploy:</div><div><br /></div></div><div> @echo "Current path="`pwd`;</div><div> @echo "LS_OUTPUT="$(SUB_BOOKS);</div><div> @echo "BOOKS_SRC_ROOT="$(BOOKS_SRC_ROOT);</div><div><div> @for each_item in $(SUB_BOOKS); \</div><div><br /></div></div><div><div> do \</div><div><br /></div></div><div><div> if [ -d $(BOOKS_SRC_ROOT)/$$each_item ]; then \</div><div><br /></div></div><div><div> cd $(BOOKS_SRC_ROOT)/$$each_item; \</div><div><br /></div></div><div><div> echo `pwd`; \</div><div><br /></div></div><div><div> if [ -f Makefile ]; then \</div><div><br /></div></div><div><div> make $@ || exit "$$?"; \</div><div><br /></div></div><div><div> fi; \</div><div><br /></div></div><div><div> cd ..; \</div><div><br /></div></div><div><div> fi; \</div><div><br /></div></div><div> done;</div></div><div>看看效果</div><div><br /></div>'; $dom = new Dom; $dom->load($originEvernoteHtml); $codeBlockHtml = $dom->find('div')[0]; echo("codeBlockHtml=".$codeBlockHtml); error_log($codeBlockHtml); ?>
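parsing fails:
Exception has occurred. PHPHtmlParser\Exceptions\ChildNotFoundException: Child '135' next not found in this node.
This suggests that the PHPHtmlParser library
does not tolerate this input very well.
Continuing execution just produces more exceptions of various kinds.
So, give up on this library for now.
Then it occurred to me: maybe the HTML source here is not valid?
Because it lacks an outermost div?
And that is why parsing fails?
So try adding one: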
$originEvernoteHtml = "<div>" . $originEvernoteHtml . "</div>";
Debug again to check the result: the problem remains.
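The suspicion that the fragment itself is malformed can also be checked independently of PHPHtmlParser. The sketch below uses PHP's built-in DOMDocument with libxml's internal error buffer to list whatever problems libxml reports for a fragment. The helper name collectHtmlParseErrors and the sample fragment are hypothetical, and what gets reported depends on the installed libxml2 version, so this is only a diagnostic aid, not a verdict on why PHPHtmlParser throws.

```php
<?php
// Hypothetical helper: collect the problems libxml reports for an HTML fragment.
function collectHtmlParseErrors(string $html): array
{
    // Buffer parse errors instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();

    $doc = new DOMDocument();
    // The XML encoding hint is a common workaround so UTF-8 text is not misread.
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);

    $messages = [];
    foreach (libxml_get_errors() as $error) {
        $messages[] = sprintf(
            'line %d, col %d: %s',
            $error->line,
            $error->column,
            trim($error->message)
        );
    }

    // Restore the previous error-handling state.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $messages;
}

// Sample fragment (an assumption for illustration): a bare "<" inside text.
$sampleHtml = '<div><span style="font-family: Monaco;">little <</span></div>';
foreach (collectHtmlParseErrors($sampleHtml) as $message) {
    error_log($message);
}
```

Passing the real $originEvernoteHtml instead of the sample would show whether unescaped characters or unbalanced tags are being flagged at all.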
Next, following a reference,
wrap the fragment in a complete HTML document:
$originEvernoteHtml = "<html><head><title>parse evernote html</title></head><body>" . $originEvernoteHtml . "</body></html>";
The problem remains:
Exception has occurred. PHPHtmlParser\Exceptions\ChildNotFoundException: Child '1' next not found in this node.
Try 'strict' => false:
$dom->setOptions([
    'strict' => false, // Set a global option to disable strict html parsing.
]);
Check whether it still throws: the problem remains.
Also try loadStr:
// $dom->load($originEvernoteHtml);
$dom->loadStr($originEvernoteHtml, []);
$html = $dom->outerHtml;
The problem remains.
So the only option is to give up on this library.