
[Solved] pyspider error: TypeError __init__() got an unexpected keyword argument resultdb

pyspider crifan

While working on:

[Solved] Saving data to MySQL in PySpider

I fixed the earlier error, and then a different one appeared:

<code>➜  AutocarData pyspider -c config.json result_worker
Traceback (most recent call last):
  File "/Users/crifan/.local/share/virtualenvs/AutocarData-xI-iqIq4/bin/pyspider", line 11, in &lt;module&gt;
    sys.exit(main())
  File "/Users/crifan/.local/share/virtualenvs/AutocarData-xI-iqIq4/lib/python3.6/site-packages/pyspider/run.py", line 754, in main
    cli()
  File "/Users/crifan/.local/share/virtualenvs/AutocarData-xI-iqIq4/lib/python3.6/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/Users/crifan/.local/share/virtualenvs/AutocarData-xI-iqIq4/lib/python3.6/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/Users/crifan/.local/share/virtualenvs/AutocarData-xI-iqIq4/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/crifan/.local/share/virtualenvs/AutocarData-xI-iqIq4/lib/python3.6/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/crifan/.local/share/virtualenvs/AutocarData-xI-iqIq4/lib/python3.6/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/Users/crifan/.local/share/virtualenvs/AutocarData-xI-iqIq4/lib/python3.6/site-packages/click/decorators.py", line 17, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/Users/crifan/.local/share/virtualenvs/AutocarData-xI-iqIq4/lib/python3.6/site-packages/pyspider/run.py", line 299, in result_worker
    result_worker = ResultWorker(resultdb=g.resultdb, inqueue=g.processor2result)
TypeError: __init__() got an unexpected keyword argument 'resultdb'
</code>

TypeError: __init__() got an unexpected keyword argument 'resultdb'

Error 500 in webUI result view when use mongodb as result db · Issue #251 · binux/pyspider

PySpider: a powerful web crawler system written by a Chinese developer, with a powerful WebUI – Python development – comments | CTOLib

This seemed strange.

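So I simply dropped the +resultdb suffix (and likewise +taskdb and +projectdb) from the URL schemes, changing the config to: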

<code>{
  "taskdb":     "mysql://root:[email protected]:3306/AutohomeTaskdb",
  "projectdb":  "mysql://root:[email protected]:3306/AutohomeProjectdb",
  "resultdb":   "mysql://root:[email protected]:3306/AutohomeResultdb",
  "result_worker":{
      "result_cls": "AutohomeResultWorker.AutohomeResultWorker"
   }
}
</code>

Result:

<code>➜  AutocarData pyspider -c config.json result_worker
Traceback (most recent call last):
  File "/Users/crifan/.local/share/virtualenvs/AutocarData-xI-iqIq4/bin/pyspider", line 11, in &lt;module&gt;
    sys.exit(main())
  File "/Users/crifan/.local/share/virtualenvs/AutocarData-xI-iqIq4/lib/python3.6/site-packages/pyspider/run.py", line 754, in main
    cli()
  File "/Users/crifan/.local/share/virtualenvs/AutocarData-xI-iqIq4/lib/python3.6/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/Users/crifan/.local/share/virtualenvs/AutocarData-xI-iqIq4/lib/python3.6/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/Users/crifan/.local/share/virtualenvs/AutocarData-xI-iqIq4/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/crifan/.local/share/virtualenvs/AutocarData-xI-iqIq4/lib/python3.6/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/crifan/.local/share/virtualenvs/AutocarData-xI-iqIq4/lib/python3.6/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/Users/crifan/.local/share/virtualenvs/AutocarData-xI-iqIq4/lib/python3.6/site-packages/click/decorators.py", line 17, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/Users/crifan/.local/share/virtualenvs/AutocarData-xI-iqIq4/lib/python3.6/site-packages/pyspider/run.py", line 299, in result_worker
    result_worker = ResultWorker(resultdb=g.resultdb, inqueue=g.processor2result)
  File "/Users/crifan/.local/share/virtualenvs/AutocarData-xI-iqIq4/lib/python3.6/site-packages/pyspider/libs/utils.py", line 355, in __getattr__
    return ret.__get__(self, ObjectDict)
  File "/Users/crifan/.local/share/virtualenvs/AutocarData-xI-iqIq4/lib/python3.6/site-packages/pyspider/libs/utils.py", line 342, in __get__
    return self.getter()
  File "/Users/crifan/.local/share/virtualenvs/AutocarData-xI-iqIq4/lib/python3.6/site-packages/pyspider/run.py", line 43, in &lt;lambda&gt;
    return utils.Get(lambda: connect_database(value))
  File "/Users/crifan/.local/share/virtualenvs/AutocarData-xI-iqIq4/lib/python3.6/site-packages/pyspider/database/__init__.py", line 44, in connect_database
    db = _connect_database(url)
  File "/Users/crifan/.local/share/virtualenvs/AutocarData-xI-iqIq4/lib/python3.6/site-packages/pyspider/database/__init__.py", line 54, in _connect_database
    raise Exception('wrong scheme format: %s' % parsed.scheme)
Exception: wrong scheme format: mysql
</code>

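Could this be an argument-parsing problem caused by the ConfigParser.py that I added earlier, in:

[Solved] pyspider result_worker error: ModuleNotFoundError No module named mysql

Let me try taking it out of the way by renaming the file: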

<code>➜  AutocarData mv /Users/crifan/.local/share/virtualenvs/AutocarData-xI-iqIq4/lib/python3.6/ConfigParser.py /Users/crifan/.local/share/virtualenvs/AutocarData-xI-iqIq4/lib/python3.6/ConfigParser.py_backup
➜  AutocarData ll /Users/crifan/.local/share/virtualenvs/AutocarData-xI-iqIq4/lib/python3.6/ConfigParse*
-rw-r--r--  1 crifan  staff    52K  5  5 22:31 /Users/crifan/.local/share/virtualenvs/AutocarData-xI-iqIq4/lib/python3.6/ConfigParser.py_backup
</code>

Still the same:

<code>    raise Exception('wrong scheme format: %s' % parsed.scheme)
Exception: wrong scheme format: mysql
</code>
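
The message hints that the URL scheme is expected to have two parts joined by a plus sign, such as mysql+resultdb. The sketch below is a simplified stand-in for that kind of check, inferred from the error message and the traceback rather than copied from pyspider. The helper name split_scheme and the example URLs (including the credentials) are made up for illustration.

```python
from urllib.parse import urlparse


def split_scheme(url):
    """Split an 'engine+dbtype://...' URL scheme into (engine, dbtype).

    Hypothetical helper: it only imitates the check suggested by the
    'wrong scheme format' message, not pyspider's real implementation.
    """
    scheme = urlparse(url).scheme
    parts = scheme.split("+")
    if len(parts) == 1:
        # A bare 'mysql' scheme does not say which database role it is for.
        raise ValueError("wrong scheme format: %s" % scheme)
    return parts[0], parts[-1]


# Placeholder credentials and host, for illustration only.
good_url = "mysql+resultdb://user:secret@127.0.0.1:3306/AutohomeResultdb"
bad_url = "mysql://user:secret@127.0.0.1:3306/AutohomeResultdb"

print(split_scheme(good_url))
try:
    split_scheme(bad_url)
except ValueError as err:
    print(err)
```

Under that assumption, the bare mysql:// form is rejected before any connection is attempted, which matches the traceback ending inside _connect_database.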

So I changed the schemes back:

<code>{
  "taskdb":     "mysql+taskdb://root:[email protected]:3306/AutohomeTaskdb",
  "projectdb":  "mysql+projectdb://root:[email protected]:3306/AutohomeProjectdb",
  "resultdb":   "mysql+resultdb://root:[email protected]:3306/AutohomeResultdb",
  "result_worker":{
      "result_cls": "AutohomeResultWorker.AutohomeResultWorker"
   }
}
</code>

Result:

Back to the earlier error:

<code>➜  AutocarData pyspider -c config.json result_worker
Traceback (most recent call last):
  File "/Users/crifan/.local/share/virtualenvs/AutocarData-xI-iqIq4/bin/pyspider", line 11, in &lt;module&gt;
    sys.exit(main())
  File "/Users/crifan/.local/share/virtualenvs/AutocarData-xI-iqIq4/lib/python3.6/site-packages/pyspider/run.py", line 754, in main
    cli()
  File "/Users/crifan/.local/share/virtualenvs/AutocarData-xI-iqIq4/lib/python3.6/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/Users/crifan/.local/share/virtualenvs/AutocarData-xI-iqIq4/lib/python3.6/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/Users/crifan/.local/share/virtualenvs/AutocarData-xI-iqIq4/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/crifan/.local/share/virtualenvs/AutocarData-xI-iqIq4/lib/python3.6/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/crifan/.local/share/virtualenvs/AutocarData-xI-iqIq4/lib/python3.6/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/Users/crifan/.local/share/virtualenvs/AutocarData-xI-iqIq4/lib/python3.6/site-packages/click/decorators.py", line 17, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/Users/crifan/.local/share/virtualenvs/AutocarData-xI-iqIq4/lib/python3.6/site-packages/pyspider/run.py", line 299, in result_worker
    result_worker = ResultWorker(resultdb=g.resultdb, inqueue=g.processor2result)
TypeError: __init__() got an unexpected keyword argument 'resultdb'
</code>

TypeError __init__() got an unexpected keyword argument resultdb

Error 500 in webUI result view when use mongodb as result db · Issue #251 · binux/pyspider

pyspider/setup.py at master · binux/pyspider

It looks like:

maybe this resultdb argument should not be passed at all?

Then I dropped the result_worker subcommand and ran:

<code>➜  AutocarData pyspider -c config.json
phantomjs fetcher running on port 25555
Process Process-2:
Traceback (most recent call last):
  File "/usr/local/Cellar/python/3.6.4_4/Frameworks/Python.framework/Versions/3.6/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/local/Cellar/python/3.6.4_4/Frameworks/Python.framework/Versions/3.6/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/crifan/.local/share/virtualenvs/AutocarData-xI-iqIq4/lib/python3.6/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/Users/crifan/.local/share/virtualenvs/AutocarData-xI-iqIq4/lib/python3.6/site-packages/click/decorators.py", line 17, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/Users/crifan/.local/share/virtualenvs/AutocarData-xI-iqIq4/lib/python3.6/site-packages/pyspider/run.py", line 299, in result_worker
    result_worker = ResultWorker(resultdb=g.resultdb, inqueue=g.processor2result)
TypeError: __init__() got an unexpected keyword argument 'resultdb'
[I 180508 20:38:19 tornado_fetcher:638] fetcher starting...
[I 180508 20:38:19 processor:211] processor starting...
[I 180508 20:38:19 scheduler:647] scheduler starting...
[I 180508 20:38:19 scheduler:586] in 5m: new:0,success:0,retry:0,failed:0
[I 180508 20:38:19 scheduler:782] scheduler.xmlrpc listening on 127.0.0.1:23333
[I 180508 20:38:20 app:76] webui running on 0.0.0.0:5000
</code>

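pyspider now starts up (although, as the log shows, the result_worker process still raises the same TypeError).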

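But I could not tell whether the MySQL resultdb was actually being used internally.

I also saw that the projects had all disappeared.

Could it be that this config entry: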

<code>"projectdb":  "mysql+projectdb://root:[email protected]:3306/AutohomeProjectdb",
</code>

has taken effect?

After all, the projectdb here is empty.

So let me remove the projectdb (and taskdb) settings and run again:

<code>{
  "resultdb":   "mysql+resultdb://root:[email protected]:3306/AutohomeResultdb",
  "result_worker":{
      "result_cls": "AutohomeResultWorker.AutohomeResultWorker"
   }
}
</code>

Result:

Note that the log still shows:

<code>Process Process-2:
Traceback (most recent call last):
  File "/usr/local/Cellar/python/3.6.4_4/Frameworks/Python.framework/Versions/3.6/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/local/Cellar/python/3.6.4_4/Frameworks/Python.framework/Versions/3.6/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/crifan/.local/share/virtualenvs/AutocarData-xI-iqIq4/lib/python3.6/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/Users/crifan/.local/share/virtualenvs/AutocarData-xI-iqIq4/lib/python3.6/site-packages/click/decorators.py", line 17, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/Users/crifan/.local/share/virtualenvs/AutocarData-xI-iqIq4/lib/python3.6/site-packages/pyspider/run.py", line 299, in result_worker
    result_worker = ResultWorker(resultdb=g.resultdb, inqueue=g.processor2result)
TypeError: __init__() got an unexpected keyword argument 'resultdb'
</code>

The projects did reappear.

Next, I looked at the source code:

pyspider/run.py at master · binux/pyspider

<code>@cli.command()

@click.option('--result-cls', default='pyspider.result.ResultWorker', callback=load_cls,
              help='ResultWorker class to be used.')
@click.pass_context
def result_worker(ctx, result_cls, get_object=False):
    """
    Run result worker.
    """
    g = ctx.obj
    ResultWorker = load_cls(None, None, result_cls)

    result_worker = ResultWorker(resultdb=g.resultdb, inqueue=g.processor2result)

    g.instances.append(result_worker)
    if g.get('testing_mode') or get_object:
        return result_worker

    result_worker.run()
</code>
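
The result_cls value is a dotted path such as AutohomeResultWorker.AutohomeResultWorker, which gets resolved to a class and is then called with the resultdb and inqueue keyword arguments. A rough sketch of that kind of lookup, assuming a plain importlib approach; the helper name load_by_path is hypothetical and this is not pyspider's own load_cls:

```python
import importlib


def load_by_path(dotted_path):
    """Resolve 'package.module.ClassName' to the object it names."""
    module_name, _, attr_name = dotted_path.rpartition(".")
    if not module_name:
        raise ValueError("expected 'module.ClassName', got %r" % dotted_path)
    module = importlib.import_module(module_name)
    return getattr(module, attr_name)


# Demonstrated with a standard-library class so the snippet runs anywhere.
cls = load_by_path("collections.OrderedDict")
instance = cls(a=1)
print(cls.__name__, dict(instance))
```

The point for this problem is that whatever class the path names is the one that receives resultdb and inqueue, so a custom class has to accept them.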

Maybe the problem is in how my own class inherits from ResultWorker?

I checked the source:

and indeed it was.

So I changed my class to:

<code>class AutohomeResultWorker(ResultWorker):
    mysqldb = None

    def __init__(self, resultdb, inqueue):
        """init mysql db"""
        print("AutohomeResultWorker init: resultdb=%, inqueue=%s" % (resultdb, inqueue))
        super.__init__(resultdb, inqueue)

        if self.mysqldb is None:
            self.mysqldb = MysqlDb()
            print("self.mysqldb=%s" % self.mysqldb)
</code>

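Result: this version still fails. Writing super.__init__(...) without the parentheses after super is not a valid way to call the parent initializer, and the % after resultdb= in the print format string is missing its s.

So in the meantime I first worked through:

[Solved] How to override __init__ to customize initialization when inheriting from a parent class in Python

Then, with this code: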

<code>import pymysql
import pymysql.cursors
from pyspider.result import ResultWorker

class AutohomeResultWorker(ResultWorker):
    # mysqldb = None

    def __init__(self, resultdb, inqueue):
        """init mysql db"""
        print("AutohomeResultWorker init")
        print("resultdb=%s, inqueue=%s" % (resultdb, inqueue))
        ResultWorker.__init__(self, resultdb, inqueue)

        # print("self.mysqldb=%s" % (self.mysqldb))
        # if self.mysqldb is None:
        self.mysqldb = MysqlDb()
        print("self.mysqldb=%s" % self.mysqldb)

    def on_result(self, task, result):
        """override pyspider on_result to save data into mysql"""
        # assert task['taskid']
        # assert task['project']
        # assert task['url']
        # assert result
        print("on_result: result=%s" % result)
        insertOk = self.mysqldb.insert(result)
        print("insertOk=%s" % insertOk)

class MysqlDb:
...
</code>

is that all it takes?

At least it now runs normally, without errors:

<code>➜  AutocarData pyspider -c config.json all
phantomjs fetcher running on port 25555
AutohomeResultWorker init
resultdb=&lt;pyspider.database.mysql.resultdb.ResultDB object at 0x1025b3c18&gt;, inqueue=&lt;pyspider.libs.multiprocessing_queue.MultiProcessingQueue object at 0x1025b3a20&gt;
connect mysql ok, self.connection= &lt;pymysql.connections.Connection object at 0x102763d30&gt;
Connect mysql return True
self.mysqldb=&lt;AutohomeResultWorker.MysqlDb object at 0x102763cc0&gt;
[I 180508 21:13:12 result_worker:49] result_worker starting...
[I 180508 21:13:12 tornado_fetcher:638] fetcher starting...
[I 180508 21:13:12 processor:211] processor starting...
[I 180508 21:13:12 scheduler:647] scheduler starting...
[I 180508 21:13:12 scheduler:126] project autohomeBrandData updated, status:TODO, paused:False, 0 tasks
[I 180508 21:13:12 scheduler:782] scheduler.xmlrpc listening on 127.0.0.1:23333
[I 180508 21:13:12 scheduler:586] in 5m: new:0,success:0,retry:0,failed:0
[I 180508 21:13:12 app:76] webui running on 0.0.0.0:5000
</code>

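Next I will keep debugging to see whether results eventually get through to the resultdb stage, that is, whether on_result in this AutohomeResultWorker is actually executed.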

[Summary]

The error here:

    result_worker = ResultWorker(resultdb=g.resultdb, inqueue=g.processor2result)

TypeError: __init__() got an unexpected keyword argument 'resultdb'

was caused by:

the __init__ of AutohomeResultWorker, my subclass of ResultWorker, being written incorrectly.

I had written:

<code>def __init__(self):
</code>
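
The failure can be reproduced without pyspider. In this minimal sketch, BaseWorker and BrokenWorker are stand-in names (not pyspider classes): the subclass overrides __init__ with no extra parameters, while the caller still passes resultdb and inqueue by keyword.

```python
class BaseWorker(object):
    """Stand-in for a base class whose caller passes two keyword arguments."""

    def __init__(self, resultdb, inqueue):
        self.resultdb = resultdb
        self.inqueue = inqueue


class BrokenWorker(BaseWorker):
    """Overrides __init__ but forgets the parameters the caller supplies."""

    def __init__(self):
        self.mysqldb = None


try:
    # Same call shape as in run.py: both arguments are passed by keyword.
    BrokenWorker(resultdb="fake-db", inqueue="fake-queue")
except TypeError as err:
    message = str(err)
    print(message)
```

The exact wording of the message varies slightly between Python versions, but it names the first keyword the overriding __init__ does not accept.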

Later I consulted the source of:

pyspider/result/result_worker.py

which reads:

<code>class ResultWorker(object):

    def __init__(self, resultdb, inqueue):
        self.resultdb = resultdb
        self.inqueue = inqueue
        self._quit = False
</code>

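It takes two parameters besides self: resultdb and inqueue.

So the __init__ of my subclass needs to accept those parameters as well.

Then, through:

[Solved] How to override __init__ to customize initialization when inheriting from a parent class in Python

I worked out how to call the parent class's __init__,

and changed the code to the correct form: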

<code>class AutohomeResultWorker(ResultWorker):

    def __init__(self, resultdb, inqueue):
        """init mysql db"""
        print("AutohomeResultWorker init")
        print("resultdb=%s, inqueue=%s" % (resultdb, inqueue))
        ResultWorker.__init__(self, resultdb, inqueue)

        # print("self.mysqldb=%s" % (self.mysqldb))
        # if self.mysqldb is None:
        self.mysqldb = MysqlDb()
        print("self.mysqldb=%s" % self.mysqldb)
</code>
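
Calling ResultWorker.__init__(self, ...) explicitly works. On Python 3 the same thing is commonly written with super(). A small sketch with stand-in classes; BaseWorker, FixedWorker and the extra attribute are illustrative names only:

```python
class BaseWorker(object):
    def __init__(self, resultdb, inqueue):
        self.resultdb = resultdb
        self.inqueue = inqueue


class FixedWorker(BaseWorker):
    def __init__(self, resultdb, inqueue):
        # Note the parentheses: super() returns a proxy that delegates
        # attribute lookups to the parent class.
        super().__init__(resultdb, inqueue)
        # Subclass-specific state is set up after the parent is initialised.
        self.extra = "ready"


worker = FixedWorker(resultdb="fake-db", inqueue="fake-queue")
print(worker.resultdb, worker.inqueue, worker.extra)
```

Either form is fine here; the important part is that the subclass accepts the same parameters the caller passes and forwards them to the parent.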

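Lessons learned:

  • Read the error message carefully and start from it: find the cause and the clues it contains, then trace them back to the root of the problem. Only then can it be fixed.

Analysis:

Always read the displayed error message carefully:

TypeError: __init__() got an unexpected keyword argument 'resultdb'

It means:

__init__ received a keyword argument, resultdb, that it did not expect.

And the line of code that raised the error is:

    result_worker = ResultWorker(resultdb=g.resultdb, inqueue=g.processor2result)

-> So had I read the error message carefully at the time, I should have realized:

the problem lay with the ResultWorker class being instantiated, which here is my own AutohomeResultWorker loaded via result_cls.

-> In the end I found that the __init__ of my own subclass of ResultWorker was written incorrectly,

-> and only after changing it to the correct initializer was the problem solved.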
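
One way to catch this kind of mismatch up front is to compare the keyword names a caller will pass against the constructor's signature. A sketch using the standard inspect module; accepts_keywords and the two demo classes are hypothetical names:

```python
import inspect


def accepts_keywords(cls, *names):
    """Return True if cls(...) can be called with the given keyword names."""
    signature = inspect.signature(cls)
    try:
        # bind() applies the normal argument-matching rules without calling cls.
        signature.bind(**{name: None for name in names})
    except TypeError:
        return False
    return True


class NoArgs(object):
    def __init__(self):
        pass


class TwoArgs(object):
    def __init__(self, resultdb, inqueue):
        pass


print(accepts_keywords(NoArgs, "resultdb", "inqueue"))
print(accepts_keywords(TwoArgs, "resultdb", "inqueue"))
```

A check like this could be run against a custom result worker class before starting the crawler, so the mismatch shows up without a full traceback.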

Please credit when reposting: 在路上 » [Solved] pyspider error: TypeError __init__() got an unexpected keyword argument resultdb
