[Unresolved] Parsing Evernote HTML source with the PHP HTML library php-html-parser

PHP crifan
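Background:
[Unresolved] Using an HTML parsing library in PHP to parse Evernote HTML source
Along the way, I first tried the library with the most stars:
php-html-parser
to see whether it can handle the HTML source exported from Evernote.
After downloading the code,
I looked at how to use it: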
<?php
use PHPHtmlParser\Dom;
This fails immediately:
Fatal error: Uncaught Error: Class 'PHPHtmlParser\Dom' not found in /Users/crifan/dev/dev_root/crifan.com/mamp/test/useHtmlLibParseEvernoteHtml/useHtmlLibParseEvernoteHtml.php:12
Stack trace:
#0 {main}
  thrown in /Users/crifan/dev/dev_root/crifan.com/mamp/test/useHtmlLibParseEvernoteHtml/useHtmlLibParseEvernoteHtml.php on line 12
So copying the source in and running it is not enough? (A use statement only imports a name; it does not load any file.)
The question becomes:
how to directly reference the source of a PHP library that sits in a folder
Search terms:
php use Fatal error Uncaught Error Class not found in
oop – Why am I getting PHP Fatal error: Uncaught Error: Class ‘MyClass’ not found? – Stack Overflow
orm – PHP Fatal error: Uncaught Error: Class not found – Stack Overflow
<?php
require_once 'PHPHtmlParser/Dom.php';
use PHPHtmlParser\Dom;
Result:
Fatal error: Uncaught Error: Class 'PHPHtmlParser\Dom\AbstractNode' not found in /Users/crifan/dev/dev_root/crifan.com/mamp/test/useHtmlLibParseEvernoteHtml/PHPHtmlParser/Dom.php:139
Stack trace:
#0 /Users/crifan/dev/dev_root/crifan.com/mamp/test/useHtmlLibParseEvernoteHtml/useHtmlLibParseEvernoteHtml.php(15): PHPHtmlParser\Dom->load('<div><br /></di...')
#1 {main}
  thrown in /Users/crifan/dev/dev_root/crifan.com/mamp/test/useHtmlLibParseEvernoteHtml/PHPHtmlParser/Dom.php on line 139
The Dom class itself now loads, but the library's internal references still fail: only Dom.php was included, and nothing loads the other classes it depends on.
PHP Uncaught Error: Class not found using composer autoload – Stack Overflow
php – How to fix “Class ” not found” error – DEV Community 👩‍💻👨‍💻
PHP:Fatal error: Class ‘COM’ not found in … 的处理办法 – 蒗若晨曦 – CSDN博客
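The two "Class not found" errors above come from the same cause: each class lives in its own file, and nothing tells PHP where to find it. Below is a minimal sketch of a PSR-4 style autoloader built on spl_autoload_register. The helper name registerPsr4Autoloader is hypothetical, and the mapping from PHPHtmlParser\Dom\AbstractNode to PHPHtmlParser/Dom/AbstractNode.php is an assumption about how the copied source folder is laid out. The library may also depend on other packages, which this sketch would not cover; composer's generated autoloader handles that case.

```php
<?php
/**
 * Hypothetical helper: registers a minimal PSR-4 style autoloader that maps
 * classes under $prefix to files under $baseDir.
 */
function registerPsr4Autoloader(string $prefix, string $baseDir): void
{
    $prefix  = trim($prefix, '\\') . '\\';
    $baseDir = rtrim($baseDir, '/\\') . DIRECTORY_SEPARATOR;

    spl_autoload_register(function (string $class) use ($prefix, $baseDir): void {
        // Ignore classes outside the registered namespace prefix.
        if (strncmp($class, $prefix, strlen($prefix)) !== 0) {
            return;
        }
        // Foo\Bar\Baz under prefix Foo\ becomes <baseDir>/Bar/Baz.php
        $relative = substr($class, strlen($prefix));
        $file = $baseDir . str_replace('\\', DIRECTORY_SEPARATOR, $relative) . '.php';
        if (is_file($file)) {
            require_once $file;
        }
    });
}

// Assumption: the library source was copied to ./PHPHtmlParser next to this script.
registerPsr4Autoloader('PHPHtmlParser', __DIR__ . '/PHPHtmlParser');
```

With an autoloader registered, the explicit require_once of Dom.php is no longer needed; classes are loaded the first time they are referenced.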
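Forget it, better to just go the standard route.
The question becomes:
[Solved] Installing and using composer on Mac to install the PHP library php-html-parser
Note, however, that single-stepping line by line only works when debugging in
Listen for XDebug
mode.
But with the Evernote HTML: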
<?php
require "vendor/autoload.php";
use PHPHtmlParser\Dom;

$originEvernoteHtml = '<div><br /></div><div>此处包含要测试的内容,包括code代码:</div><div style="box-sizing: border-box; padding: 8px; font-family: Monaco, Menlo, Consolas, &quot;Courier New&quot;, monospace; font-size: 12px; color: rgb(51, 51, 51); border-top-left-radius: 4px; border-top-right-radius: 4px; border-bottom-right-radius: 4px; border-bottom-left-radius: 4px; background-color: rgb(251, 250, 248); border: 1px solid rgba(0, 0, 0, 0.14902);-en-codeblock:true;"><div><span style="font-size: 12px; font-family: Monaco;">some code include</span></div><div><span style="font-size: 12px; font-family: Monaco;">little &lt;</span></div><div><span style="font-size: 12px; font-family: Monaco;">greater &gt;</span></div><div><span style="font-size: 12px; font-family: Monaco;">at &amp;</span></div><div><span style="font-size: 12px; font-family: Monaco;">和其他字符</span></div></div><div>希望同步后,不要:</div><div>有多余的code</div><div>html字符不要被转义</div><div><br /></div><div>另外再去看看,之前出bug的代码</div><div>好像是中间包含多个空行?的代码</div><div style="box-sizing: border-box; padding: 8px; font-family: Monaco, Menlo, Consolas, &quot;Courier New&quot;, monospace; font-size: 12px; color: rgb(51, 51, 51); border-top-left-radius: 4px; border-top-right-radius: 4px; border-bottom-right-radius: 4px; border-bottom-left-radius: 4px; background-color: rgb(251, 250, 248); border: 1px solid rgba(0, 0, 0, 0.14902);-en-codeblock:true;"><div># Author: Crifan Li</div><div># Function: Batch make for all gitbooks</div><div># Version: 20190716</div><div>#</div><div># [Note]</div><div># 1. 
this makefile should be located in</div><div># /Users/crifan/dev/dev_root/gitbook/gitbook_src_root/common</div><div><div><br /></div><div><br /></div></div><div><div>SUB_BOOKS=$(shell ls ../books)</div><div><br /></div></div><div><div>BOOKS_SRC_ROOT=$(shell cd ../books &amp;&amp; pwd)</div><div><br /></div></div><div><div><br /></div><div><br /></div></div><div># Batch make for all gitbooks</div><div><div>help debug_dir init sync_content clean_all website pdf epub mobi all upload commit deploy:</div><div><br /></div></div><div>  @echo "Current path="`pwd`;</div><div>  @echo "LS_OUTPUT="$(SUB_BOOKS);</div><div>  @echo "BOOKS_SRC_ROOT="$(BOOKS_SRC_ROOT);</div><div><div>  @for each_item in $(SUB_BOOKS); \</div><div><br /></div></div><div><div>  do \</div><div><br /></div></div><div><div>    if [ -d $(BOOKS_SRC_ROOT)/$$each_item ]; then \</div><div><br /></div></div><div><div>      cd $(BOOKS_SRC_ROOT)/$$each_item; \</div><div><br /></div></div><div><div>      echo `pwd`; \</div><div><br /></div></div><div><div>      if [ -f Makefile ]; then \</div><div><br /></div></div><div><div>        make [email protected] || exit "$$?"; \</div><div><br /></div></div><div><div>      fi; \</div><div><br /></div></div><div><div>      cd ..; \</div><div><br /></div></div><div><div>    fi; \</div><div><br /></div></div><div>  done;</div></div><div>看看效果</div><div><br /></div>';

$dom = new Dom;

$dom->load($originEvernoteHtml);

$codeBlockHtml = $dom->find('div')[0];
echo("codeBlockHtml=".$codeBlockHtml);
error_log($codeBlockHtml);

?>
parsing fails:
An exception is thrown.

PHPHtmlParser\Exceptions\ChildNotFoundException: Child '135' next not found in this node.
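-> This suggests that the PHPHtmlParser library
does not tolerate this input very well.
Continuing to run just produces more exceptions.
So, for now, I gave up on this library.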
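To see each failure without the script stopping at the first uncaught exception, the parse call can be wrapped in a small helper. This is only a sketch: tryParse is a hypothetical name, and the callback here is a stand-in that simulates a failure rather than calling the real library.

```php
<?php
/**
 * Hypothetical helper: runs $parse($html) and reports a failure
 * instead of letting the exception abort the script.
 */
function tryParse(callable $parse, string $html): array
{
    try {
        return ['ok' => true, 'result' => $parse($html), 'error' => null];
    } catch (\Throwable $e) {
        // Keep the exception class and message so attempts can be compared.
        return ['ok' => false, 'result' => null, 'error' => get_class($e) . ': ' . $e->getMessage()];
    }
}

// Stand-in parser for demonstration only; with the real library the callback
// would wrap the same load() and find() calls used above.
$attempt = tryParse(function (string $html) {
    throw new \RuntimeException('simulated parse failure');
}, '<div><br /></div>');

error_log($attempt['ok'] ? 'parsed' : $attempt['error']);
```

Returning an array rather than rethrowing makes it easy to run several variants of the input one after another and compare which ones fail.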
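But then it occurred to me: maybe the HTML source here is not valid?
Because it lacks an outermost div?
And that is why parsing fails?
So let's add one and try: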
$originEvernoteHtml = "<div>" . $originEvernoteHtml . "</div>";
Debugging again shows the same problem.
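The suspicion that the fragment itself is malformed can also be checked independently of the parser. The sketch below is a rough tag-balance check in plain PHP, not a real HTML validator: the helper name findUnbalancedTags is hypothetical, the list of void elements is deliberately short, and comments, scripts, and attribute values containing ">" are not handled.

```php
<?php
/**
 * Hypothetical helper: returns a list of tag-balance problems found in $html.
 * An empty array means every opening tag had a matching close.
 */
function findUnbalancedTags(string $html): array
{
    // Elements that never take a closing tag (partial list, enough for a sketch).
    $void = ['br', 'hr', 'img', 'input', 'meta', 'link'];
    $stack = [];
    $problems = [];

    // Capture: optional leading slash, tag name, optional trailing slash.
    preg_match_all('#<(/?)([a-zA-Z][a-zA-Z0-9]*)\b[^>]*?(/?)>#', $html, $tags, PREG_SET_ORDER);

    foreach ($tags as [, $closing, $name, $selfClosing]) {
        $name = strtolower($name);
        if ($selfClosing === '/' || in_array($name, $void, true)) {
            continue; // <br />, <img ...> and similar need no close
        }
        if ($closing === '') {
            $stack[] = $name;               // opening tag: remember it
        } elseif (array_pop($stack) !== $name) {
            $problems[] = "unexpected </$name>";
        }
    }
    foreach ($stack as $name) {
        $problems[] = "unclosed <$name>";   // never closed
    }
    return $problems;
}
```

Calling findUnbalancedTags($originEvernoteHtml) and logging the result would show whether unbalanced div tags are a plausible cause; an empty result points back at the library rather than the input.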
Next, following the example in:
Masterminds/html5-php: An HTML5 parser and serializer for PHP.
I wrapped the fragment in a full HTML document:
$originEvernoteHtml = "<html><head><title>parse evernote html</title></head><body>" . $originEvernoteHtml . "</body></html>";
The problem remains:
An exception is thrown.
PHPHtmlParser\Exceptions\ChildNotFoundException: Child '1' next not found in this node.
paquettg/php-html-parser – Packagist
Next, try 'strict' => false:
$dom->setOptions([
    'strict' => false, // Set a global option to disable strict html parsing.
]);
to see whether the error still occurs. The problem remains.
Also try loadStr:
// $dom->load($originEvernoteHtml);
$dom->loadStr($originEvernoteHtml, []);
$html = $dom->outerHtml;
The problem remains.
So the only option left is to abandon this library.
