
[Solved] WordPress Importer fails to recognize the imported author when importing an XML file

Category: WordPress, author: crifan


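[Background]

After exporting a WXR-format XML file with a Python script, I imported it into WordPress with WordPress Importer. After I selected the file, the second step showed that the author information could not be recognized. In the import that followed, several of the few dozen posts failed and were not imported.

The strange part is that the author entries in this XML file are identical to those in several other XML files, and those other files import without any problem.

So something specific to this XML file must be preventing the import.

[Solution process]

1. The posts that failed to import were: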

Failed to import 文章 “USB基础知识概论 v0.5.pdf”
Failed to import 文章 “【转】各种嵌入式软硬件公司的简介”
Failed to import 文章 “【转】UNIX/LINUX 平台可执行文件格式分析”
Failed to import 文章 “键盘Keyboard中的扫描码Scan Code 通码Make code 断码Break Code”
Failed to import 文章 “【转】字符编码笔记:ASCII,Unicode和UTF-8”
Failed to import 文章 “字符编码标准及存储交换标准”
Failed to import 文章 “【转】怎样花两年时间去面试一个人”
Failed to import 文章 “【to learn】关于实时自动化相关学习知识的网站:Real Time Automation”
Failed to import 文章 “【已解决】运行wp-admin/install.php去安装wordpress,出错:您的 PHP 似乎没有安装运行 WordPress 所必需的 MySQL 扩展。”
Failed to import 文章 “记录 wordpress折腾过程”
Failed to import 文章 “【已解决】python中文字符乱码(GB2312,GBK,GB18030相关的问题)”
Failed to import 文章 “【已解决】Python中,将一个字符串eval或ast.literal_eval变成字典后,unicode的字符变成了\x格式”
Failed to import 文章 “【已解决】wordpress中,url地址包含的中文,虽然已经过urllib.quote解析过了,但是却还是访问出错:403 Forbidden”
Failed to import 文章 “【整理】wordpress中的代码/语法高亮插件:WP SyntaxHighlighter”
Failed to import 文章 “【已解决】想要通过分析网易博客的html源码,以得知网易是如何获得一个帖子的评论的”

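Looking at these posts, I found that most of them are very long. My first guess was that WordPress might not accept overly long content in a single post, the way many blog services such as Baidu Space and NetEase Blog refuse to publish an entry once it exceeds a certain length.

To test this, I cut most of the content out of these posts in the XML so that each one was very short, then imported again. It still failed.

The failure messages were: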

Import WordPress
Failed to import author . Their posts will be attributed to the current user.
and:

Import WordPress
Failed to import author . Their posts will be attributed to the current user.
Failed to import “”: Invalid post type

All done. Have fun!

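2. After a few more attempts, I copied each of those posts out on its own, added the corresponding WXR header so that the result was a valid WXR file, and imported each one separately. That seemed to work, which suggests the posts themselves are valid. It was still unclear why the author could not be recognized in blog_163_[againinput4]_20120102_1812-2.xml and why that import kept failing.

3. I searched online for cases where WordPress Importer cannot recognize the author, and it seems nobody else had run into this problem.

However, in the WordPress forum for wordpress-importer: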

http://wordpress.org/tags/wordpress-importer?forum_id=10

I found this thread:

[Plugin: WordPress Importer] wordpress importer problem: all posts’ author become admin

http://wordpress.org/support/topic/plugin-wordpress-importer-wordpress-importer-problem-all-posts-author-become-admin

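In it, one of the replies suggests enabling the debug feature of WordPress Importer: on line 17 of

wp-content/plugins/wordpress-importer/wordpress-importer.php

change

define( 'IMPORT_DEBUG', false );

to:
define( 'IMPORT_DEBUG', true );

Then restart Apache, log out of WordPress and log back in.

Importing again now shows detailed information about the import process: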

88:1615 CData section not finished
【记录】DocBook开发过程 – 2   <p></p> <hr

88:1615 PCDATA invalid Char value 16

88:1624 Entity ‘nbsp’ not defined

88:1630 Entity ‘nbsp’ not defined

88:1643 PCDATA invalid Char value 16

88:1672 PCDATA invalid Char value 16

88:1725 PCDATA invalid Char value 5

88:1886 PCDATA invalid Char value 3

88:1887 PCDATA invalid Char value 4

88:2092 PCDATA invalid Char value 4

88:2102 Entity ‘nbsp’ not defined

88:2108 Entity ‘nbsp’ not defined

88:2114 Entity ‘nbsp’ not defined

88:2126 PCDATA invalid Char value 3

88:2156 PCDATA invalid Char value 16

88:2175 Entity ‘nbsp’ not defined

88:2176 PCDATA invalid Char value 4

88:2229 PCDATA invalid Char value 5

88:2235 PCDATA invalid Char value 17

88:2287 PCDATA invalid Char value 25

88:2311 PCDATA invalid Char value 3

88:2312 PCDATA invalid Char value 4

88:2603 PCDATA invalid Char value 3

88:2605 PCDATA invalid Char value 16

88:2613 PCDATA invalid Char value 5

88:2658 PCDATA invalid Char value 6

88:2666 PCDATA invalid Char value 25

88:2686 PCDATA invalid Char value 3

88:2687 PCDATA invalid Char value 4

88:2696 PCDATA invalid Char value 24

88:2714 Entity ‘nbsp’ not defined

88:2720 Entity ‘nbsp’ not defined

88:2726 Entity ‘nbsp’ not defined

88:2732 Entity ‘nbsp’ not defined

88:2738 Entity ‘nbsp’ not defined

88:2744 Entity ‘nbsp’ not defined

88:2750 Entity ‘nbsp’ not defined

88:3071 PCDATA invalid Char value 4

88:3085 PCDATA invalid Char value 3

88:3086 PCDATA invalid Char value 4

88:3212 PCDATA invalid Char value 16

88:3306 PCDATA invalid Char value 24

88:3388 PCDATA invalid Char value 3

88:3389 PCDATA invalid Char value 4

88:3395 Opening and ending tag mismatch: encoded line 88 and p

88:3401 Opening and ending tag mismatch: item line 81 and pre

88:4590 Entity ‘nbsp’ not defined

88:4596 Entity ‘nbsp’ not defined

88:4602 Entity ‘nbsp’ not defined

88:4608 Entity ‘nbsp’ not defined

88:4614 Entity ‘nbsp’ not defined

88:4620 Entity ‘nbsp’ not defined

88:4626 Entity ‘nbsp’ not defined

88:4686 Entity ‘nbsp’ not defined

88:4692 Entity ‘nbsp’ not defined

88:4698 Entity ‘nbsp’ not defined

88:4704 Entity ‘nbsp’ not defined

88:4710 Entity ‘nbsp’ not defined

88:4716 Entity ‘nbsp’ not defined

88:4722 Entity ‘nbsp’ not defined

88:4795 Entity ‘nbsp’ not defined

88:4801 Entity ‘nbsp’ not defined

88:4807 Entity ‘nbsp’ not defined

88:4813 Entity ‘nbsp’ not defined

88:4819 Entity ‘nbsp’ not defined

88:4825 Entity ‘nbsp’ not defined

88:4831 Entity ‘nbsp’ not defined

88:4895 Entity ‘nbsp’ not defined

88:4901 Entity ‘nbsp’ not defined

88:4907 Entity ‘nbsp’ not defined

88:4913 Entity ‘nbsp’ not defined

88:4919 Entity ‘nbsp’ not defined

88:4925 Entity ‘nbsp’ not defined

88:4931 Entity ‘nbsp’ not defined

88:4993 Entity ‘nbsp’ not defined

88:4999 Entity ‘nbsp’ not defined

88:5005 Entity ‘nbsp’ not defined

88:5011 Entity ‘nbsp’ not defined

88:5017 Entity ‘nbsp’ not defined

88:5023 Entity ‘nbsp’ not defined

88:5029 Entity ‘nbsp’ not defined

88:5102 Entity ‘nbsp’ not defined

88:5108 Entity ‘nbsp’ not defined

88:5114 Entity ‘nbsp’ not defined

88:5120 Entity ‘nbsp’ not defined

88:5126 Entity ‘nbsp’ not defined

88:5132 Entity ‘nbsp’ not defined

88:5138 Entity ‘nbsp’ not defined

88:5267 Entity ‘nbsp’ not defined

88:6279 Entity ‘nbsp’ not defined

88:7389 Entity ‘nbsp’ not defined

88:8733 Entity ‘nbsp’ not defined

88:10637 Entity ‘nbsp’ not defined

88:10699 Entity ‘nbsp’ not defined

88:10769 Entity ‘nbsp’ not defined

88:10835 Entity ‘nbsp’ not defined

88:10901 Entity ‘nbsp’ not defined

88:10949 Entity ‘nbsp’ not defined

88:10978 Entity ‘nbsp’ not defined

88:11071 Entity ‘nbsp’ not defined

88:11077 Entity ‘nbsp’ not defined

88:11089 Entity ‘nbsp’ not defined

88:11205 Entity ‘nbsp’ not defined

88:11282 Entity ‘nbsp’ not defined

88:11364 Entity ‘nbsp’ not defined

88:11888 Entity ‘nbsp’ not defined

88:12237 Entity ‘nbsp’ not defined

88:12708 Entity ‘nbsp’ not defined

88:12777 Entity ‘nbsp’ not defined

88:13282 Entity ‘nbsp’ not defined

88:13288 Entity ‘nbsp’ not defined

88:13294 Entity ‘nbsp’ not defined

88:13300 Entity ‘nbsp’ not defined

88:13306 Entity ‘nbsp’ not defined

88:13312 Entity ‘nbsp’ not defined

88:13318 Entity ‘nbsp’ not defined

88:13324 Entity ‘nbsp’ not defined

88:13330 Entity ‘nbsp’ not defined

88:13336 Entity ‘nbsp’ not defined

88:13424 Entity ‘nbsp’ not defined

88:13512 Entity ‘nbsp’ not defined

88:13518 Entity ‘nbsp’ not defined

88:13524 Entity ‘nbsp’ not defined

88:13530 Entity ‘nbsp’ not defined

88:13536 Entity ‘nbsp’ not defined

88:13542 Entity ‘nbsp’ not defined

88:13548 Entity ‘nbsp’ not defined

88:13554 Entity ‘nbsp’ not defined

88:13560 Entity ‘nbsp’ not defined

88:13566 Entity ‘nbsp’ not defined

88:13572 Entity ‘nbsp’ not defined

88:13578 Entity ‘nbsp’ not defined

88:13584 Entity ‘nbsp’ not defined

88:13590 Entity ‘nbsp’ not defined

88:13596 Entity ‘nbsp’ not defined

88:13602 Entity ‘nbsp’ not defined

88:13608 Entity ‘nbsp’ not defined

88:13614 Entity ‘nbsp’ not defined

88:15521 Sequence ‘]]>’ not allowed in content

88:15521 internal error
88:15521 Extra content at the end of the document

There was an error when reading this WXR file
Details are shown above. The importer will now try again with a different parser…

 

Assign Authors
To make it easier for you to edit and save the imported content, you may want to reassign the author of the imported item to an existing user of this site. For example, you may want to import all the entries as admins entries.

If a new user is created by WordPress, a new password will be randomly generated and the new user’s role will be set as subscriber. Manually changing the new user’s details will be necessary.

Import author: ()
or create new user with login name:
or assign posts to an existing user: – Select – admin crifan crifan2
Import Attachments

Download and import file attachments

As the messages show, the post “【记录】DocBook开发过程 – 2” contains some characters that the parser cannot accept.

So I went to look at what that post actually contains.

In the original NetEase post:

http://againinput4.blog.163.com/blog/static/172799491201110111145259/

the content is:

[Image: the post content as shown on the NetEase blog]

The corresponding content in the XML is:

[Image: the same content in the XML file]

The XML contains ASCII control characters such as DLE = 16 and EOT = 4, which is exactly what WordPress Importer reported above as "PCDATA invalid Char value 16" (column 2156) and "PCDATA invalid Char value 4" (column 2176).

After I deleted these ASCII control characters from the XML and imported again, the author was correctly recognized as crifan and all posts imported normally.
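Removing the characters by hand gets tedious for a large export, so the cleanup can also be scripted. The following is a minimal sketch, assuming Python 3 and a UTF-8 encoded file; the function names are made up for illustration and are not part of the original export script. It drops every ASCII control character below 0x20 except tab, line feed and carriage return.

```python
import re

# ASCII control characters that XML 1.0 forbids: everything below 0x20
# except tab (0x09), line feed (0x0A) and carriage return (0x0D).
INVALID_XML_CONTROLS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def strip_invalid_controls(text):
    """Return text with the forbidden ASCII control characters removed."""
    return INVALID_XML_CONTROLS.sub("", text)


def clean_wxr_file(src_path, dst_path, encoding="utf-8"):
    """Write a cleaned copy of src_path to dst_path.

    Returns the number of characters that were dropped.
    newline="" keeps the original line endings untouched.
    """
    with open(src_path, encoding=encoding, newline="") as f:
        original = f.read()
    cleaned = strip_invalid_controls(original)
    with open(dst_path, "w", encoding=encoding, newline="") as f:
        f.write(cleaned)
    return len(original) - len(cleaned)
```

Writing to a separate output file keeps the original export around for comparison. Applying `strip_invalid_controls` to each post body inside the export script, before it is written into the WXR, would avoid the problem at the source.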

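[Summary]

If WordPress Importer reports errors such as an unrecognized author while importing a WXR-format XML file, go to line 17 of

wp-content/plugins/wordpress-importer/wordpress-importer.php

and change

define( 'IMPORT_DEBUG', false );

to:
define( 'IMPORT_DEBUG', true );

This turns on the debug output, so you can see in detail what happens during the import, find the actual cause, and then fix the right thing.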
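A WXR file can also be checked for well-formedness before it is uploaded at all. The sketch below assumes Python 3 and uses the standard-library expat parser, which is not the parser WordPress Importer uses, so its messages and column numbers will differ from the importer's, and it stops at the first error instead of listing all of them. The function name is hypothetical.

```python
import xml.parsers.expat


def first_xml_error(path):
    """Return (line, column, message) for the first well-formedness error.

    Returns None if the whole file parses cleanly.
    """
    parser = xml.parsers.expat.ParserCreate()
    try:
        with open(path, "rb") as f:
            parser.ParseFile(f)
    except xml.parsers.expat.ExpatError as err:
        message = xml.parsers.expat.ErrorString(err.code)
        return err.lineno, err.offset, message
    return None
```

If this returns a position, opening the file at that line is usually enough to spot a stray control character or an undefined entity.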

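[Off topic]

I have to say that the people who wrote WordPress, or at least WordPress Importer, really know what they are doing.

The debug support is done so well that when something goes wrong, turning on debug makes it easy to find out what the problem is.

The other thing worth saying is how pleasant the open-source world is. With the source code available, any problem can in principle be solved. If you cannot solve it yourself, that only means you do not know enough yet, and someone who knows more can help you.

So it all comes down to the classic line:

Got a problem? Read the fucking code!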

[Tip]

If you are not familiar with the ASCII control characters, see:

ASCII字符集中的功能/控制字符 (Function/control characters in the ASCII character set)

http://bbs.chinaunix.net/thread-3608423-1-1.html


[Postscript 2012-01-08]

Later I tested which ASCII control characters WordPress Importer actually accepts.

I inserted every ASCII control character into a post as a test:

[Image: the test post containing all ASCII control characters]
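A test body of this kind can be generated with a few lines of Python. This is only a sketch: the layout (hex code, abbreviation, then the raw character on each line) is a guess loosely modelled on the fragments that show up in the importer output that follows, and the function name and its parameter are invented for illustration.

```python
# Standard abbreviations for the ASCII control characters 0x00-0x1F.
CONTROL_NAMES = (
    "NUL SOH STX ETX EOT ENQ ACK BEL BS HT LF VT FF CR SO SI "
    "DLE DC1 DC2 DC3 DC4 NAK SYN ETB CAN EM SUB ESC FS GS RS US"
).split()


def build_control_char_test(include_nul=True):
    """Build post content with one entry per character in 0x00-0x20 plus 0x7F."""
    codes = list(range(0x21)) + [0x7F]
    names = CONTROL_NAMES + ["SP", "DEL"]
    lines = ["All ASCII control char test"]
    for code, name in zip(codes, names):
        if code == 0 and not include_nul:
            continue  # allows a second run without the NUL character
        # Assumed layout: two hex digits, a space, the name, the raw character.
        lines.append("{:02X} {}{}".format(code, name, chr(code)))
    return "\n".join(lines)
```

Calling `build_control_char_test(include_nul=False)` gives the variant for a second run with the NUL character left out.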

 

Importing it gave the following.

With all the ASCII control characters tested, 0x00-0x20 plus 0x7F, the result was:
89:7 Char 0x0 out of allowed range

89:7 CData section not finished
All ASCII control char test
00 N

89:7 Premature end of data in tag encoded line 88

89:7 Premature end of data in tag item line 81

89:7 Premature end of data in tag channel line 31

89:7 Premature end of data in tag rss line 23

There was an error when reading this WXR file
Details are shown above. The importer will now try again with a different parser…

After removing 0x00 = NUL, the result was:
89:7 CData section not finished
All ASCII control char test
01 S

89:7 PCDATA invalid Char value 1

90:7 PCDATA invalid Char value 2

91:7 PCDATA invalid Char value 3

92:7 PCDATA invalid Char value 4

93:7 PCDATA invalid Char value 5

94:7 PCDATA invalid Char value 6

95:7 PCDATA invalid Char value 7

96:6 PCDATA invalid Char value 8

100:6 PCDATA invalid Char value 11

101:6 PCDATA invalid Char value 12

103:6 PCDATA invalid Char value 14

104:6 PCDATA invalid Char value 15

105:7 PCDATA invalid Char value 16

106:7 PCDATA invalid Char value 17

107:7 PCDATA invalid Char value 18

108:7 PCDATA invalid Char value 19

109:7 PCDATA invalid Char value 20

110:7 PCDATA invalid Char value 21

111:7 PCDATA invalid Char value 22

112:7 PCDATA invalid Char value 23

113:7 PCDATA invalid Char value 24

114:6 PCDATA invalid Char value 25

115:7 PCDATA invalid Char value 26

116:7 PCDATA invalid Char value 27

117:6 PCDATA invalid Char value 28

118:6 PCDATA invalid Char value 29

119:6 PCDATA invalid Char value 30

120:6 PCDATA invalid Char value 31

123:1 Sequence ‘]]>’ not allowed in content
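Reading the rejected values off a long debug log by eye is error-prone, so they can be collected with a short script. This sketch assumes the log has been saved as plain text in the "line:column message" form shown above; the function name is hypothetical.

```python
import re

# Matches debug lines such as "88:2156 PCDATA invalid Char value 16".
INVALID_CHAR_LINE = re.compile(
    r"^\s*(\d+):(\d+) PCDATA invalid Char value (\d+)\s*$"
)


def rejected_char_values(log_text):
    """Return the sorted, de-duplicated character values the parser rejected."""
    values = set()
    for line in log_text.splitlines():
        match = INVALID_CHAR_LINE.match(line)
        if match:
            values.add(int(match.group(3)))
    return sorted(values)
```

Applied to the second log above, this should return every value from 1 to 31 except 9, 10 and 13.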

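[Conclusion]

This confirms that, among the characters tested, WordPress Importer (and so, it seems, the WordPress blog system itself) only accepts:
9 = \t = tab
10 = \n = LF = line feed
13 = \r = CR = carriage return
32 = space

It does not accept:
the remaining characters in 0x00-0x1F. 0x7F = DEL does not appear among the errors in the log above, which is consistent with XML 1.0 allowing that character.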
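The test result can be compared with what the XML 1.0 specification itself permits. The sketch below encodes the `Char` production from that specification as a Python function (the function name is mine) and applies it to the same set of characters that was tested.

```python
def is_xml10_char(code):
    """True if the code point matches the Char production of XML 1.0."""
    return (
        code in (0x09, 0x0A, 0x0D)
        or 0x20 <= code <= 0xD7FF
        or 0xE000 <= code <= 0xFFFD
        or 0x10000 <= code <= 0x10FFFF
    )


# The characters covered by the test: 0x00-0x20 plus 0x7F.
tested = list(range(0x21)) + [0x7F]
allowed = [c for c in tested if is_xml10_char(c)]
print(allowed)  # [9, 10, 13, 32, 127]
```

By this rule 0x7F is legal XML, which fits the fact that the importer log shows no error for it, while every other control character in the test is rejected.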
