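Tinkering:
Along the way, I needed to figure out how, in PySpider, to:
send a POST request that carries form data in application/x-www-form-urlencoded format
Search: pyspider post with url-encoded
python – How to fix the encoding error when using self.crawl in pyspider to POST a username and password to the server? – SegmentFault 思否
self.crawl – pyspider Chinese documentation – pyspider中文网
pyspider sample code 6: passing parameters – microman – 博客园
Then the code: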
<code>
    @config(age=10 * 24 * 60 * 60)
    def index_page(self, response):
        # <ul class="list-user list-user-1" id="list-user-1">
        for each in response.doc('ul[id^="list-user"] li a[href^="http"]').items():
            self.crawl(each.attr.href, callback=self.detail_page)

        maxPageNum = 10
        for curPageIdx in range(maxPageNum):
            curPageNum = curPageIdx + 1  # pages are 1-based
            print("curPageNum=%s" % curPageNum)
            getShowsUrl = "http://xxx/index.php?m=home&c=match_new&a=get_shows"
            headerDict = {
                "Content-Type": "application/x-www-form-urlencoded"
            }
            dataDict = {
                "counter": curPageNum,
                "order": 1,
                "match_type": 2,
                "match_name": "",
                "act_id": 3
            }
            self.crawl(
                getShowsUrl,
                method="POST",
                headers=headerDict,
                data=dataDict,
                cookies=response.cookies,
                callback=self.parseGetShowsCallback
            )

    def parseGetShowsCallback(self, response):
        print("parseGetShowsCallback: self=%s, response=%s" % (self, response))
</code>
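To make the term "application/x-www-form-urlencoded" concrete, here is a minimal standard-library sketch of what such a request body looks like for the fields in dataDict. The helper names build_payload and encode_form are hypothetical, introduced only for illustration; the sketch shows the encoding itself and makes no claim about how PySpider serializes the dict internally.

```python
from urllib.parse import urlencode


def build_payload(page_num):
    """Form fields for one page; mirrors dataDict from the handler above."""
    return {
        "counter": page_num,
        "order": 1,
        "match_type": 2,
        "match_name": "",
        "act_id": 3,
    }


def encode_form(fields):
    """Serialize a dict into an application/x-www-form-urlencoded string."""
    return urlencode(fields)


if __name__ == "__main__":
    # Print the encoded body for the first three pages.
    for page_num in range(1, 4):
        print(encode_form(build_payload(page_num)))
```

Each key/value pair becomes key=value, pairs are joined with &, and the empty match_name is sent as a key with an empty value.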
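With this, a response does come back:
But what I actually want is the corresponding JSON.
So I searched further:
Search: pyspider response json
pyspider sample code 2: parsing JSON data – microman – 博客园
Response – pyspider Chinese documentation – pyspider中文网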
<code>
    def parseGetShowsCallback(self, response):
        print("parseGetShowsCallback: self=%s, response=%s" % (self, response))
        respJson = response.json  # response body parsed as JSON
        print("respJson=%s" % (respJson))
</code>
This returns the JSON we need:
<code>parseGetShowsCallback: self=<x.x.x.Handler object at 0x10def6ac8>, response=<Response [200]> respJson={'status': 1, 'data': [{'id': '3293', 'uid': '878964', 'show_id': '104728193', 'course_id': '43716', 'supports': '107', 'rewards': '0', 'shares': '2', 'scores': '65.00', 'status': '1', 'match_type': '2', 'create_time': '1513346405', 'act_id': '3', 'child_type': '1', 'show_score': '100', 'head_img': 'https://x.x.x/avatar_2018-06-02_1527913844_9082951.jpeg', 'cover_img': 'https://x.x.x/2017-02-23/58ae9dec28283.jpg', 'name': '徐欣蕊', 'href': '/index.php?m=home&c=match_new&a=video&show_id=104728193'}, {'id': '489', 'uid': '5697525', 'show_id': '103129621', 'course_id': '17734', 'supports': '104', 'rewards': '0', 'shares': '2', 'scores': '63.20', 'status': '1', 'match_type': '2', 'create_time': '1512737780', 'act_id': '3', 'child_type': '1', 'show_score': '0', 'head_img': 'https://x.x.x/2018-06-23/5b2de55693ad9.jpg', 'cover_img': 'https://x.x.x/2018-06-04/5b14e22b8850a.jpg', 'name': '唐昕玥', 'href': '/index.php?m=home&c=match_new&a=video&show_id=103129621'}, {'id': '9', 'uid': '3977349', 'show_id': '103000234', 'course_id': '41758', 'supports': '94', 'rewards': '0', 'shares': '2', 'scores': '57.20', 'status': '1', 'match_type': '2', 'create_time': '1512685717', 'act_id': '3', 'child_type': '1', 'show_score': '95', 'head_img': 'https://x.x.x/2017-09-11/59b6363a9e099.jpg', 'cover_img': 'https://x.x.x/2017-03-15/58c8abf7eafb6.jpg', 'name': '梁多', 'href': '/index.php?m=home&c=match_new&a=video&show_id=103000234'}, {'id': '460', 'uid': '5697525', 'show_id': '103122827', 'course_id': '41758', 'supports': '93', 'rewards': '0', 'shares': '2', 'scores': '56.60', 'status': '1', 'match_type': '2', 'create_time': '1512737139', 'act_id': '3', 'child_type': '1', 'show_score': '78', 'head_img': 'https://x.x.x/2018-06-23/5b2de55693ad9.jpg', 'cover_img': 'https://x.x.x/2017-03-15/58c8abf7eafb6.jpg', 'name': '唐昕玥', 'href': '/index.php?m=home&c=match_new&a=video&show_id=103122827'}, 
{'id': '4096', 'uid': '3896494', 'show_id': '105000309', 'course_id': '49023', 'supports': '77', 'rewards': '0', 'shares': '1', 'scores': '46.60', 'status': '1', 'match_type': '2', 'create_time': '1513434346', 'act_id': '3', 'child_type': '1', 'show_score': '0', 'head_img': 'http://q.qlogo.cn/qqapp/1104670989/DFC726007737AE2674A65E5BD4FFC3F5/100', 'cover_img': 'https://x.x.x/2017-11-01/59f9793b2bd2c.jpg', 'name': '彭怡', 'href': '/index.php?m=home&c=match_new&a=video&show_id=105000309'}, {'id': '1194', 'uid': '4837277', 'show_id': '103429330', 'course_id': '41758', 'supports': '71', 'rewards': '0', 'shares': '0', 'scores': '42.60', 'status': '1', 'match_type': '2', 'create_time': '1512828159', 'act_id': '3', 'child_type': '1', 'show_score': '95', 'head_img': 'https://x.x.x/2017-10-20/59e9fe8d49cd7.jpg', 'cover_img': 'https://x.x.x/2017-03-15/58c8abf7eafb6.jpg', 'name': '朱思颖', 'href': '/index.php?m=home&c=match_new&a=video&show_id=103429330'}, {'id': '27', 'uid': '1035103', 'show_id': '103008839', 'course_id': '46923', 'supports': '70', 'rewards': '0', 'shares': '1', 'scores': '42.40', 'status': '1', 'match_type': '2', 'create_time': '1512698148', 'act_id': '3', 'child_type': '1', 'show_score': '92', 'head_img': 'https://x.x.x/2016-05-22/5741045425b1f.jpg', 'cover_img': 'https://x.x.x/2017-06-13/14973432415241.jpg', 'name': '王陆睿祺', 'href': '/index.php?m=home&c=match_new&a=video&show_id=103008839'}, {'id': '4570', 'uid': '248179', 'show_id': '105265776', 'course_id': '43716', 'supports': '66', 'rewards': '0', 'shares': '0', 'scores': '39.60', 'status': '1', 'match_type': '2', 'create_time': '1513519204', 'act_id': '3', 'child_type': '1', 'show_score': '0', 'head_img': 'https://x.x.x/2018-06-18/5b27b7b196047.jpg', 'cover_img': 'https://x.x.x/2017-02-23/58ae9dec28283.jpg', 'name': '介里', 'href': '/index.php?m=home&c=match_new&a=video&show_id=105265776'}, {'id': '161', 'uid': '874998', 'show_id': '103036066', 'course_id': '43716', 'supports': '53', 'rewards': '0', 
'shares': '1', 'scores': '32.20', 'status': '1', 'match_type': '2', 'create_time': '1512724329', 'act_id': '3', 'child_type': '1', 'show_score': '0', 'head_img': 'https://x.x.x/2018-07-05/5b3e254832248.jpg', 'cover_img': 'https://x.x.x/2017-02-23/58ae9dec28283.jpg', 'name': '尤薇然', 'href': '/index.php?m=home&c=match_new&a=video&show_id=103036066'}, {'id': '2872', 'uid': '3901045', 'show_id': '104542553', 'course_id': '43713', 'supports': '49', 'rewards': '0', 'shares': '1', 'scores': '29.80', 'status': '1', 'match_type': '2', 'create_time': '1513260014', 'act_id': '3', 'child_type': '1', 'show_score': '94', 'head_img': 'https://x.x.x/2017-10-23/59eda1c991187.jpg', 'cover_img': 'https://x.x.x/2017-02-23/58ae9e49a1353.jpg', 'name': '肖乐遥', 'href': '/index.php?m=home&c=match_new&a=video&show_id=104542553'}]} </code>
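Once the JSON is available as a dict, a natural next step is to walk the data list and turn each relative href into an absolute URL. The following is only a sketch: extract_shows is a hypothetical helper, BASE_URL reuses the anonymized http://xxx host from the code above, and treating status == 1 as "success" is an assumption based solely on the output shown.

```python
from urllib.parse import urljoin

# Anonymized request URL from the post; "xxx" stands in for the real host.
BASE_URL = "http://xxx/index.php?m=home&c=match_new&a=get_shows"


def extract_shows(resp_json, base_url=BASE_URL):
    """Return (name, show_id, absolute_url) tuples from the parsed JSON."""
    # Assumption: status == 1 signals success, as in the sample output.
    if resp_json.get("status") != 1:
        return []
    shows = []
    for item in resp_json.get("data", []):
        absolute_url = urljoin(base_url, item["href"])
        shows.append((item["name"], item["show_id"], absolute_url))
    return shows


if __name__ == "__main__":
    # One entry copied from the output above, trimmed to the fields used here.
    sample = {
        "status": 1,
        "data": [
            {
                "show_id": "104728193",
                "name": "徐欣蕊",
                "href": "/index.php?m=home&c=match_new&a=video&show_id=104728193",
            }
        ],
    }
    for name, show_id, url in extract_shows(sample):
        print(name, show_id, url)
```

Inside parseGetShowsCallback, the resulting URLs could then be handed to self.crawl with a detail callback, in the same way index_page does for the links it finds.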
[Summary]
Here, in PySpider, the following code:
<code>
    @config(age=10 * 24 * 60 * 60)
    def index_page(self, response):
        # <ul class="list-user list-user-1" id="list-user-1">
        for each in response.doc('ul[id^="list-user"] li a[href^="http"]').items():
            self.crawl(each.attr.href, callback=self.detail_page)

        maxPageNum = 10
        for curPageIdx in range(maxPageNum):
            curPageNum = curPageIdx + 1  # pages are 1-based
            print("curPageNum=%s" % curPageNum)
            getShowsUrl = "http://xxx/index.php?m=home&c=match_new&a=get_shows"
            headerDict = {
                "Content-Type": "application/x-www-form-urlencoded"
            }
            dataDict = {
                "counter": curPageNum,
                "order": 1,
                "match_type": 2,
                "match_name": "",
                "act_id": 3
            }
            self.crawl(
                getShowsUrl,
                method="POST",
                headers=headerDict,
                data=dataDict,
                cookies=response.cookies,
                callback=self.parseGetShowsCallback
            )

    def parseGetShowsCallback(self, response):
        print("parseGetShowsCallback: self=%s, response=%s" % (self, response))
        respJson = response.json  # response body parsed as JSON
        print("respJson=%s" % (respJson))
</code>
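accomplishes the following:
Sending a POST request
    method="POST"
Passing the header
    "Content-Type": "application/x-www-form-urlencoded"
Passing the data
    a dict containing the corresponding keys and values
Passing the cookies along as well
    cookies=response.cookies
Getting the returned JSON
    use response.json in the callback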
Please credit the source when reposting: 在路上 » [Solved] How to send a POST request in PySpider and pass form data parameters in application/x-www-form-urlencoded format