While working on:
[Record] Using Python's Scrapy to crawl the subtitles of Humf on Youtube
I was using this code:
<code>class YoutubesubtitleSpider(scrapy.Spider):

    def jsonToStr(jsonDict, indent=2):
        return json.dumps(jsonDict, indent=indent, ensure_ascii=False)

        # call site, inside parseLoadVideoResp according to the traceback:
        # errorJsonStr = json.dumps(decodedLinksDict, indent=2, ensure_ascii=False)
        errorJsonStr = self.jsonToStr(decodedLinksDict)
</code>
It failed with:
<code>2018-03-07 14:58:51 [scrapy.core.scraper] ERROR: Spider error processing <POST http://www.yousubtitles.com/loadvideo/YEH3-7UVvqQ> (referer: None)
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/site-packages/scrapy/utils/defer.py", line 102, in iter_errback
    yield next(it)
  File "/usr/local/lib/python2.7/site-packages/scrapy/spidermiddlewares/offsite.py", line 29, in process_spider_output
    for x in result:
  File "/usr/local/lib/python2.7/site-packages/scrapy/spidermiddlewares/referer.py", line 339, in <genexpr>
    return (_set_referer(r) for r in result or ())
  File "/usr/local/lib/python2.7/site-packages/scrapy/spidermiddlewares/urllength.py", line 37, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "/usr/local/lib/python2.7/site-packages/scrapy/spidermiddlewares/depth.py", line 58, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "/Users/crifan/dev/dev_root/company/naturling/projects/scrapy/youtubeSubtitle/youtubeSubtitle/spiders/YoutubeSubtitle.py", line 139, in parseLoadVideoResp
    errorJsonStr = self.jsonToStr(decodedLinksDict)
  File "/Users/crifan/dev/dev_root/company/naturling/projects/scrapy/youtubeSubtitle/youtubeSubtitle/spiders/YoutubeSubtitle.py", line 35, in jsonToStr
    return json.dumps(jsonDict, indent=indent, ensure_ascii=False)
  File "/usr/local/Cellar/python/2.7.13/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/__init__.py", line 251, in dumps
    sort_keys=sort_keys, **kw).encode(obj)
  File "/usr/local/Cellar/python/2.7.13/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/encoder.py", line 209, in encode
    chunks = list(chunks)
  File "/usr/local/Cellar/python/2.7.13/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/encoder.py", line 442, in _iterencode
    o = _default(o)
  File "/usr/local/Cellar/python/2.7.13/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/encoder.py", line 184, in default
    raise TypeError(repr(o) + " is not JSON serializable")
TypeError: <YoutubesubtitleSpider 'YoutubeSubtitle' at 0x106102890> is not JSON serializable
</code>
I assumed an empty dict was causing the error, so I changed the function to:
<code>def jsonToStr(jsonDict, indent=2):
    if jsonDict:
        return json.dumps(jsonDict, indent=indent, ensure_ascii=False)
    else:
        return "{}"
</code>
The problem persisted.
So I went back to check what the value actually was when the error occurred, and looked at:
python – How to JSON serialize sets? – Stack Overflow
python – set object is not JSON serializable – Stack Overflow
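The values at the time of the error were:
{u'error': 1}
{}
These are plain dicts,
so serializing them should not fail.
So why did the value change once it was passed into the function?
I changed the call again to: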
<code>self.logger.info("type(decodedLinksDict)=%s", type(decodedLinksDict))
# errorJsonStr = json.dumps(decodedLinksDict, indent=2, ensure_ascii=False)
# errorJsonStr = self.jsonToStr(decodedLinksDict)
errorJsonStr = self.jsonToStr(jsonDict=decodedLinksDict)
</code>
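Then I debugged again:
I think I found the problem: a method defined in a class receives the instance as its first argument, so its first parameter must be self.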
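As a side note, here is a minimal sketch (not the original spider) of why the keyword-argument call above tends to expose the problem. With self missing, the instance is already bound to jsonDict positionally, so passing jsonDict= again should raise a TypeError about multiple values. The class name Demo and the variable keyword_error are made up for illustration.

```python
import json


class Demo(object):  # hypothetical stand-in for the spider class
    # Deliberately missing `self`, as in the buggy version
    def jsonToStr(jsonDict, indent=2):
        return json.dumps(jsonDict, indent=indent, ensure_ascii=False)


demo = Demo()
keyword_error = None
try:
    # The instance is bound to jsonDict first; the keyword then collides with it
    demo.jsonToStr(jsonDict={"error": 1})
except TypeError as err:
    keyword_error = str(err)

print(keyword_error)
```

The exact wording of the message differs between Python versions, so the sketch only captures it and does not rely on a specific string.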
So I changed the definition to:
<code>def jsonToStr(self, jsonDict, indent=2): </code>
Then I tried again,
and it seemed to work:
python dict raise TypeError(repr(o) + " is not JSON serializable")
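[Summary]
In Python, the first parameter of an instance method receives the instance itself and is conventionally named self. Because I had written: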
<code>def jsonToStr(jsonDict, indent=2): </code>
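the argument that was passed in, which should have been:
jsonDict={u'error': 1}
actually became:
jsonDict = the class instance = <YoutubesubtitleSpider 'YoutubeSubtitle' at 0x102c708d0>
(the dict itself was bound to indent instead), so the subsequent json.dumps could not convert it to a string.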
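The argument shift described here can be reproduced without Scrapy. The following is a small sketch under my own assumptions: Demo and show_args are hypothetical names, and the method simply returns what it received so the binding is visible.

```python
import json


class Demo(object):  # hypothetical stand-in for the spider class
    # Deliberately missing `self`: the instance fills the first parameter
    def show_args(jsonDict, indent=2):
        return jsonDict, indent


demo = Demo()
first, second = demo.show_args({"error": 1})

# The instance ended up in jsonDict, and the dict slid over into indent
print(first is demo)
print(second)

# Handing the instance to json.dumps fails, as in the traceback
dumps_error = None
try:
    json.dumps(first, ensure_ascii=False)
except TypeError as err:
    dumps_error = str(err)
print(dumps_error)
```

This also suggests why the empty-dict check did not help: the value reaching the check was the instance, not the dict.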
The solution is
to change the definition to:
<code>def jsonToStr(self, jsonDict, indent=2): </code>
After that, arguments are passed to jsonToStr correctly, and this call runs normally:
<code>json.dumps(jsonDict, indent=indent, ensure_ascii=False) </code>
and returns the JSON string.
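For completeness, a minimal runnable sketch of the fixed method. Since the helper never uses the instance, declaring it as a @staticmethod is another option; that variant is my own suggestion, not something the original spider used. Demo and jsonToStrStatic are hypothetical names.

```python
import json


class Demo(object):  # hypothetical stand-in for the spider class
    # Fixed version: `self` takes the instance, jsonDict gets the real argument
    def jsonToStr(self, jsonDict, indent=2):
        return json.dumps(jsonDict, indent=indent, ensure_ascii=False)

    # Alternative: no instance is passed at all, so no `self` is needed
    @staticmethod
    def jsonToStrStatic(jsonDict, indent=2):
        return json.dumps(jsonDict, indent=indent, ensure_ascii=False)


demo = Demo()
result = demo.jsonToStr({"error": 1})
static_result = demo.jsonToStrStatic({"error": 1})
print(result)
```

Both calls receive the dict in jsonDict, so they should produce the same string.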
Please credit the source when reposting: 在路上 » [Solved] Python json dumps error: raise TypeError(repr(o) + " is not JSON serializable")