[Unresolved] Simulating the queryAllPageByEbookId.do call from mp.codeup.cn to get the JSON data back

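Background:

[Unresolved] Scraping the English textbook e-book resources on mp.codeup.cn

While working on that, I am now trying to write code that simulates what mp.codeup.cn does.

The request to simulate is: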

Request URL: https://biz.bookln.cn/ebookpageservices/queryAllPageByEbookId.do
Request Method: POST
Status Code: 200
Remote Address: 1xxx.210:443
Referrer Policy: no-referrer-when-downgrade

Request Headers:
:authority: biz.bookln.cn
:method: POST
:path: /ebookpageservices/queryAllPageByEbookId.do
content-type: application/x-www-form-urlencoded
accept: application/json, text/javascript, */*; q=0.01
content-length: 118


Form Data:

view source:
ebookId=52365&_timestamp=1583157835&_nonce=491fd5fc-b046-4bd7-870b-ccae94ccc23b&_sign=47CBFDFACD3E0A0746E2391C7F78AD00

parsed:
ebookId: 52365
_timestamp: 1583157835
_nonce: 491fd5fc-b046-4bd7-870b-ccae94ccc23b
_sign: 47CBFDFACD3E0A0746E2391C7F78AD00

At a glance, _timestamp, _nonce, and _sign are likely to be the slightly troublesome parts.
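As a quick check on the capture, the raw body shown under "view source" is simply the parsed fields form-encoded in order. A minimal sketch using only the standard library (the variable names are mine, not taken from the site's code):

```python
from urllib.parse import parse_qsl, urlencode

# Raw body exactly as captured under "view source"
raw_body = (
    "ebookId=52365&_timestamp=1583157835"
    "&_nonce=491fd5fc-b046-4bd7-870b-ccae94ccc23b"
    "&_sign=47CBFDFACD3E0A0746E2391C7F78AD00"
)

# Decode the body into its key/value fields (order is preserved)
fields = dict(parse_qsl(raw_body))

# Re-encoding the same fields reproduces the identical body
rebuilt = urlencode(fields)

print(fields)
print(rebuilt == raw_body)
print(len(raw_body))  # 118, the same as the captured content-length
```

So nothing beyond these four fields is hidden in the body; the difficulty is only in producing valid values for the last three.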

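That said, if these 2 books were all I needed, there would be no need to simulate anything: the saved JSON could be used directly.

To support more books, though, it is still worth trying to simulate the request.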

Quickstart - Requests 2.18.1 documentation (Chinese edition)

It looks like I need to figure out how requests sends a POST whose data is application/x-www-form-urlencoded.

Search: requests application/x-www-form-urlencoded

Sending a POST request with Content-Type application/x-www-form-urlencoded in Python – 梦雨情殇 – 博客园 (cnblogs)

Four common ways to submit POST data | JerryQu 的小站

Quickstart — Requests 0.8.2 documentation

>>> payload = {'key1': 'value1', 'key2': 'value2'}
>>> r = requests.post("http://httpbin.org/post", data=payload)
>>> print r.content
{
  "origin": "179.13.100.4",
  "files": {},
  "form": {
    "key2": "value2",
    "key1": "value1"
  },
  "url": "http://httpbin.org/post",
  "args": {},
  "headers": {
    "Content-Length": "23",
    "Accept-Encoding": "identity, deflate, compress, gzip",
    "Accept": "*/*",
    "User-Agent": "python-requests/0.8.0",
    "Host": "127.0.0.1:7077",
    "Content-Type": "application/x-www-form-urlencoded"
  },
  "data": ""
}

So when you POST directly with a dict as data, the default is:

"Content-Type": "application/x-www-form-urlencoded"

To send a JSON string instead:

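import json
import requests

url = 'https://api.github.com/some/endpoint'
payload = {'some': 'data'}

# Passing a str sends it as the raw body; requests does not form-encode it
r = requests.post(url, data=json.dumps(payload))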
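To make the difference between the two cases concrete, here is a small standard-library sketch of what each body looks like on the wire. It does not call requests itself; it only mirrors the encoding requests applies when data is a dict versus when you hand it a pre-serialized JSON string.

```python
import json
from urllib.parse import urlencode

payload = {'key1': 'value1', 'key2': 'value2'}

# data=payload (a dict): the body is form-encoded key=value pairs
form_body = urlencode(payload)

# data=json.dumps(payload) (a str): the body is the JSON text itself
json_body = json.dumps(payload)

print(form_body)
print(json_body)
print(len(form_body))
```

The form-encoded body is 23 bytes long, which matches the Content-Length of 23 in the httpbin response quoted earlier. When a JSON string is sent via data=, requests does not set a Content-Type for you; either set it in headers yourself or use the json= parameter of requests.post, which serializes the dict and sets application/json.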

As for my code:

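import copy
import requests

# gBookIdList and gHeaders are assumed to be defined elsewhere in the script
for eachBookId in gBookIdList:
    getAllPageUrl = "https://biz.bookln.cn/ebookpageservices/queryAllPageByEbookId.do"
    # copy the shared headers so per-request changes do not leak into gHeaders
    curHeaders = copy.deepcopy(gHeaders)
    curHeaders["Content-Type"] = "application/x-www-form-urlencoded"
    postDict = {
      "ebookId": eachBookId
    }
    resp = requests.post(getAllPageUrl, headers=curHeaders, data=postDict)
    print("resp=%s" % resp)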

Debug it first and see what happens.

'{"msg":"服务器繁忙中,请稍后重试!","success":false}\n'

Clearly the parameters are wrong: the msg says the server is busy and to retry later, and success is false.
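The body is JSON with a success flag, so the script can check it instead of eyeballing the text. A minimal sketch using the response text captured in the debugger (the helper name parse_reply is hypothetical):

```python
import json

def parse_reply(resp_text):
    """Return (ok, msg) parsed from the endpoint's JSON reply."""
    data = json.loads(resp_text)
    return bool(data.get("success")), data.get("msg", "")

# Response text exactly as seen in the debugger
resp_text = '{"msg":"服务器繁忙中,请稍后重试!","success":false}\n'

ok, msg = parse_reply(resp_text)
print(ok, msg)
```

In the real loop the same helper would be given resp.text, and a False result would mean skipping or retrying that book.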

I added a few other headers, though they probably make no difference:

    curHeaders["Accept"] = "application/json, text/javascript, */*; q=0.01"
    curHeaders["origin"] = "http://mp.codeup.cn"
    curHeaders["referer"] = "http://mp.codeup.cn/book/sample2.htm?id=%s" % eachBookId
    curHeaders["sec-fetch-dest"] = "empty"
    curHeaders["sec-fetch-mode"] = "cors"
    curHeaders["sec-fetch-site"] = "cross-site"

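Result:

Same problem.

So it looks like I have to work out how to implement the sign:

[Unresolved] Analyzing the logic of the core parameters _timestamp, _nonce, and _sign on mp.codeup.cn

The JS source code has already been obtained there.

For now I can't be bothered to port it to Python.

I will port it to Python when the need arises.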
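Until the JS is ported, the shape of the three parameters can at least be sketched from the captured values. This is only an outline built on assumptions: _timestamp looks like Unix time in seconds, _nonce has the format of a random (version 4) UUID, and _sign is 32 uppercase hex characters, which is the shape of an MD5 digest, but the actual signing inputs and algorithm are not established here. The names build_params, sign_func, and dummy_sign are hypothetical.

```python
import time
import uuid

def build_params(ebook_id, sign_func):
    """Assemble the form fields; sign_func is a placeholder for the real
    signing logic, which still has to be ported from the site's JS."""
    params = {
        "ebookId": str(ebook_id),
        # assumption: Unix time in seconds, like the captured 1583157835
        "_timestamp": str(int(time.time())),
        # assumption: random UUID, same format as the captured nonce
        "_nonce": str(uuid.uuid4()),
    }
    params["_sign"] = sign_func(params)
    return params

def dummy_sign(params):
    # NOT the real algorithm: returns a fixed 32-character placeholder
    return "0" * 32

params = build_params(52365, dummy_sign)
print(params)
```

Once a real signing function exists, the resulting dict could be passed as data= to requests.post in place of the ebookId-only dict used earlier.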

When reposting, please credit: 在路上 » [Unresolved] Simulating the queryAllPageByEbookId.do call from mp.codeup.cn to get the JSON data back
