[Solved] Analyzing how the tch.ityxb.com page fetches its e-book images

Category: Images, by crifan
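While working on:
[Unsolved] Crawling the e-book 《java 入门》 (Introduction to Java) from tch.ityxb.com
I stopped to analyze how the page loads its images.
Starting from the request for one of the most recently loaded pages: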
curl 'https://vip.ow365.cn/PW/GetPage?f=YXR0YWNobWVudC1jZW50ZXIuYm94dWVndS5jb20uODBcMThmNWJiOTZhM2I4NGM3NzllZDJhNTY4MzM3ZWFkNjAucGRm&img=[email protected]JxoLsV0Yrw4=&isMobile=false&[email protected]==&dk=0&ver=2&sn=4' \
  -H 'authority: vip.ow365.cn' \
  -H 'pragma: no-cache' \
  -H 'cache-control: no-cache' \
  -H 'user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.122 Safari/537.36' \
  -H 'accept: */*' \
  -H 'sec-fetch-site: same-origin' \
  -H 'sec-fetch-mode: cors' \
  -H 'sec-fetch-dest: empty' \
  -H 'referer: https://vip.ow365.cn/?i=11311&ssl=1&[email protected]_0miBI1NfVmqplNkx4osgxUapgos7zntvq_BluwUV5DjSGRhs[email protected]@GhMGaxrje5AeipdhF4tvw==' \
  -H 'accept-language: zh-CN,zh;q=0.9,en;q=0.8,la;q=0.7' \
  --compressed
Response:
{"NextPage":"IDcMbrrMGOWvOQVTWydwR6WWz0UVpg2zB9VFJh7jsnp5byBCqeJ6jhxNrqd3jM0cBxUcGpfwfWY=","PageCount":427,"ErrorMsg":"","PageIndex":5,"PageWidth":880,"Width":880,"Height":1237}
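The response is plain JSON, so the fields that matter can be read directly. A minimal Python sketch, using the response body above as a literal; the variable names are illustrative only and not part of the site's API:

```python
import json

# Response body copied from the GetPage call above (sn=4).
raw = '{"NextPage":"IDcMbrrMGOWvOQVTWydwR6WWz0UVpg2zB9VFJh7jsnp5byBCqeJ6jhxNrqd3jM0cBxUcGpfwfWY=","PageCount":427,"ErrorMsg":"","PageIndex":5,"PageWidth":880,"Width":880,"Height":1237}'

page = json.loads(raw)

next_token = page["NextPage"]      # opaque token for the following request
page_index = page["PageIndex"]     # 1-based index of the page just described
page_count = page["PageCount"]     # total pages in the e-book
has_error = bool(page["ErrorMsg"])  # empty string means no error

print(f"page {page_index}/{page_count}, error={has_error}")
print(f"next token: {next_token}")
```

Note that sn=4 comes back with PageIndex 5, so sn appears to be a 0-based request counter while PageIndex is 1-based.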
Looking further back, at the earlier requests:
curl 'https://vip.ow365.cn/PW/GetPage?f=YXR0YWNobWVudC1jZW50ZXIuYm94dWVndS5jb20uODBcMThmNWJiOTZhM2I4NGM3NzllZDJhNTY4MzM3ZWFkNjAucGRm&img=&isMobile=false&[email protected]==&dk=0&ver=2&sn=0' \
  -H 'authority: vip.ow365.cn' \
  -H 'pragma: no-cache' \
  -H 'cache-control: no-cache' \
  -H 'user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.122 Safari/537.36' \
  -H 'accept: */*' \
  -H 'sec-fetch-site: same-origin' \
  -H 'sec-fetch-mode: cors' \
  -H 'sec-fetch-dest: empty' \
  -H 'referer: https://vip.ow365.cn/?i=11311&ssl=1&[email protected]_0miBI1NfVmqplNkx4osgxUapgos7zntvq_BluwUV5DjSGRhs[email protected]@GhMGaxrje5AeipdhF4tvw==' \
  -H 'accept-language: zh-CN,zh;q=0.9,en;q=0.8,la;q=0.7' \
  --compressed
Request headers:
:authority: vip.ow365.cn
:method: GET
:path: /PW/GetPage?f=YXR0YWNobWVudC1jZW50ZXIuYm94dWVndS5jb20uODBcMThmNWJiOTZhM2I4NGM3NzllZDJhNTY4MzM3ZWFkNjAucGRm&img=&isMobile=false&[email protected]==&dk=0&ver=2&sn=0
:scheme: https
accept: */*
accept-encoding: gzip, deflate, br
accept-language: zh-CN,zh;q=0.9,en;q=0.8,la;q=0.7
cache-control: no-cache
pragma: no-cache
referer: https://vip.ow365.cn/?i=11311&ssl=1&[email protected]_0miBI1NfVmqplNkx4osgxUapgos7zntvq_BluwUV5DjSGRhs[email protected]@GhMGaxrje5AeipdhF4tvw==
sec-fetch-dest: empty
sec-fetch-mode: cors
sec-fetch-site: same-origin
user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.122 Safari/537.36
Searching the captured traffic for the f value:
YXR0YWNobWVudC1jZW50ZXIuYm94dWVndS5jb20uODBcMThmNWJiOTZhM2I4NGM3NzllZDJhNTY4MzM3ZWFkNjAucGRm
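As an aside, this value looks like standard Base64. Decoding it gives a string that starts with attachment-center.boxuegu.com and ends in .pdf, which suggests (this interpretation is an assumption) that f simply identifies the source PDF on the server side. A quick check in Python:

```python
import base64

# The f value from the GetPage requests.
f_value = "YXR0YWNobWVudC1jZW50ZXIuYm94dWVndS5jb20uODBcMThmNWJiOTZhM2I4NGM3NzllZDJhNTY4MzM3ZWFkNjAucGRm"

# Standard Base64 decode; the payload is plain ASCII text.
decoded = base64.b64decode(f_value).decode("utf-8")
print(decoded)
```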
Found it in the response to this request:
curl 'https://vip.ow365.cn/?i=11311&ssl=1&[email protected]_0miBI1NfVmqplNkx4osgxUapgos7zntvq_BluwUV5DjSGRhs[email protected]@GhMGaxrje5AeipdhF4tvw==' \
  -H 'authority: vip.ow365.cn' \
  -H 'pragma: no-cache' \
  -H 'cache-control: no-cache' \
  -H 'upgrade-insecure-requests: 1' \
  -H 'user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.122 Safari/537.36' \
  -H 'accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9' \
  -H 'sec-fetch-site: cross-site' \
  -H 'sec-fetch-mode: navigate' \
  -H 'sec-fetch-dest: iframe' \
  -H 'referer: http://tch.ityxb.com/ebook?eurl=https%3A%2F%2Fvip.ow365.cn%2F%3Fi%3D11311%26ssl%3D1%26furl%3D0As6WW%40zSHIfqZy_0miBI1NfVmqplNkx4osgxUapgos7zntvq_BluwUV5DjSGRhsHRFJwyGpvHi9%0AcjUTIGzm3WHgnjJ2lFd1wVPaQXBaorIzE0K0J_OXwbwK6qlOrtb%40%40GhMGaxrje5AeipdhF4tvw%3D%3D' \
  -H 'accept-language: zh-CN,zh;q=0.9,en;q=0.8,la;q=0.7' \
  --compressed
Response:
...
    <!--[if lt IE 9]><input id="isIE8" type="hidden" autocomplete="off" /><![endif]-->
    <div name="parms">
        <input type="hidden" id="Url" value="YXR0YWNobWVudC1jZW50ZXIuYm94dWVndS5jb20uODBcMThmNWJiOTZhM2I4NGM3NzllZDJhNTY4MzM3ZWFkNjAucGRm" autocomplete="off" />
        <input type="hidden" id="IsMobi" value="false" autocomplete="off" />
        <input type="hidden" id="Dk" value="0" autocomplete="off" />
        <input type="hidden" id="Ver" value="2" autocomplete="off" />
        <input type="hidden" id="VID" value="@ouvAGlwulktavhIGppyKg==" autocomplete="off" />
        <input type="hidden" id="ViewPath" value="../img" autocomplete="off" />
        <input type="hidden" id="Tp" autocomplete="off" />
    </div>
...
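The hidden inputs above carry the values that later show up in the GetPage query string (Url becomes f, and so on). A minimal sketch of pulling them out with Python's standard html.parser; the class and function names are made up for illustration, and the sample is a shortened copy of the HTML above:

```python
from html.parser import HTMLParser


class HiddenInputParser(HTMLParser):
    """Collect id -> value for every <input type="hidden"> element."""

    def __init__(self):
        super().__init__()
        self.params = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        if attr.get("type") == "hidden" and "id" in attr:
            # Inputs without a value attribute (e.g. Tp) map to "".
            self.params[attr["id"]] = attr.get("value") or ""


def extract_params(html):
    """Return a dict of hidden-input ids to values found in html."""
    parser = HiddenInputParser()
    parser.feed(html)
    parser.close()
    return parser.params


sample = """
<div name="parms">
    <input type="hidden" id="Url" value="YXR0YWNobWVudC1jZW50ZXIuYm94dWVndS5jb20uODBcMThmNWJiOTZhM2I4NGM3NzllZDJhNTY4MzM3ZWFkNjAucGRm" autocomplete="off" />
    <input type="hidden" id="VID" value="@ouvAGlwulktavhIGppyKg==" autocomplete="off" />
    <input type="hidden" id="Tp" autocomplete="off" />
</div>
"""

params = extract_params(sample)
print(params)
```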
Next, test the request in Postman.
First try:
https://vip.ow365.cn/PW/GetPage?f=YXR0YWNobWVudC1jZW50ZXIuYm94dWVndS5jb20uODBcMThmNWJiOTZhM2I4NGM3NzllZDJhNTY4MzM3ZWFkNjAucGRm&img=&isMobile=false&[email protected]==&dk=0&ver=2&sn=0
This returns the page info.
The first idea was to replace f with the returned NextPage value:
IDcMbrrMGOWvOQVTWydwR6WWz0UVpg2zB9VFJh7jsnp5byBCqeJ6jribHO0GQGIZ1exJW4aembE=
But first, list the requests for the first few pages.
For sn 0 to 2:
https://vip.ow365.cn/PW/GetPage?f=YXR0YWNobWVudC1jZW50ZXIuYm94dWVndS5jb20uODBcMThmNWJiOTZhM2I4NGM3NzllZDJhNTY4MzM3ZWFkNjAucGRm&img=&isMobile=false&[email protected]==&dk=0&ver=2&sn=0

https://vip.ow365.cn/PW/GetPage?f=YXR0YWNobWVudC1jZW50ZXIuYm94dWVndS5jb20uODBcMThmNWJiOTZhM2I4NGM3NzllZDJhNTY4MzM3ZWFkNjAucGRm&img=IDcMbrrMGOWvOQVTWydwR6WWz0UVpg2zB[email protected]ouvAGlwulktavhIGppyKg==&dk=0&ver=2&sn=1

https://vip.ow365.cn/PW/GetPage?f=YXR0YWNobWVudC1jZW50ZXIuYm94dWVndS5jb20uODBcMThmNWJiOTZhM2I4NGM3NzllZDJhNTY4MzM3ZWFkNjAucGRm&img=IDcMbrrMGOWvOQVTWydwR6WWz0UVpg2zB9VFJh7jsnp5byBCqeJ6jmsfp2y28J9E9JreoGwZvNk=&isMobile=false&[email protected]==&dk=0&ver=2&sn=2
In detail:
curl 'https://vip.ow365.cn/PW/GetPage?f=YXR0YWNobWVudC1jZW50ZXIuYm94dWVndS5jb20uODBcMThmNWJiOTZhM2I4NGM3NzllZDJhNTY4MzM3ZWFkNjAucGRm&img=&isMobile=false&[email protected]==&dk=0&ver=2&sn=0' \
  -H 'authority: vip.ow365.cn' \
  -H 'pragma: no-cache' \
  -H 'cache-control: no-cache' \
  -H 'user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.122 Safari/537.36' \
  -H 'accept: */*' \
  -H 'sec-fetch-site: same-origin' \
  -H 'sec-fetch-mode: cors' \
  -H 'sec-fetch-dest: empty' \
  -H 'referer: https://vip.ow365.cn/?i=11311&ssl=1&[email protected]_0miBI1NfVmqplNkx4osgxUapgos7zntvq_BluwUV5DjSGRhs[email protected]@GhMGaxrje5AeipdhF4tvw==' \
  -H 'accept-language: zh-CN,zh;q=0.9,en;q=0.8,la;q=0.7' \
  --compressed
Response:
{"NextPage":"IDcMbrrMGOWvOQVTWydwR6WWz0UVpg2zB9VFJh7jsnp5byBCqeJ6jribHO0GQGIZ1exJW4aembE=","PageCount":427,"ErrorMsg":"","PageIndex":1,"PageWidth":880,"Width":880,"Height":1237}
And:
curl 'https://vip.ow365.cn/PW/GetPage?f=YXR0YWNobWVudC1jZW50ZXIuYm94dWVndS5jb20uODBcMThmNWJiOTZhM2I4NGM3NzllZDJhNTY4MzM3ZWFkNjAucGRm&img=IDcMbrrMGOWvOQVTWydwR6WWz0UVpg2zB9VFJh7jsnp5byBCqeJ6jribHO0GQGIZ1exJW4aembE=&isMobile=false&[email protected]==&dk=0&ver=2&sn=1' \
  -H 'authority: vip.ow365.cn' \
  -H 'pragma: no-cache' \
  -H 'cache-control: no-cache' \
  -H 'user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.122 Safari/537.36' \
  -H 'accept: */*' \
  -H 'sec-fetch-site: same-origin' \
  -H 'sec-fetch-mode: cors' \
  -H 'sec-fetch-dest: empty' \
  -H 'referer: https://vip.ow365.cn/?i=11311&ssl=1&[email protected]_0miBI1NfVmqplNkx4osgxUapgos7zntvq_BluwUV5DjSGRhs[email protected]@GhMGaxrje5AeipdhF4tvw==' \
  -H 'accept-language: zh-CN,zh;q=0.9,en;q=0.8,la;q=0.7' \
  --compressed
Response:
{"NextPage":"IDcMbrrMGOWvOQVTWydwR6WWz0UVpg2zB9VFJh7jsnp5byBCqeJ6jmsfp2y28J9E9JreoGwZvNk=","PageCount":427,"ErrorMsg":"","PageIndex":2,"PageWidth":880,"Width":880,"Height":1237}
And:
curl 'https://vip.ow365.cn/PW/GetPage?f=YXR0YWNobWVudC1jZW50ZXIuYm94dWVndS5jb20uODBcMThmNWJiOTZhM2I4NGM3NzllZDJhNTY4MzM3ZWFkNjAucGRm&img=IDcMbrrMGOWvOQVTWydwR6WWz0UVpg2zB9VFJh7jsnp5byBCqeJ6jmsfp2y28J9E9JreoGwZvNk=&isMobile=false&[email protected]==&dk=0&ver=2&sn=2' \
  -H 'authority: vip.ow365.cn' \
  -H 'pragma: no-cache' \
  -H 'cache-control: no-cache' \
  -H 'user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.122 Safari/537.36' \
  -H 'accept: */*' \
  -H 'sec-fetch-site: same-origin' \
  -H 'sec-fetch-mode: cors' \
  -H 'sec-fetch-dest: empty' \
  -H 'referer: https://vip.ow365.cn/?i=11311&ssl=1&[email protected]_0miBI1NfVmqplNkx4osgxUapgos7zntvq_BluwUV5DjSGRhs[email protected]@GhMGaxrje5AeipdhF4tvw==' \
  -H 'accept-language: zh-CN,zh;q=0.9,en;q=0.8,la;q=0.7' \
  --compressed
Response:
{"NextPage":"IDcMbrrMGOWvOQVTWydwR6WWz0UVpg2zB9VFJh7jsnp5byBCqeJ6jqmsr0fjU0ZYHjFOM5f54TA=","PageCount":427,"ErrorMsg":"","PageIndex":3,"PageWidth":880,"Width":880,"Height":1237}
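After studying these, the pattern turns out to be a chain, each link depending on the previous one:
Request a page; the NextPage value in its response is used as the img parameter of the next request, which in turn returns the token for the page after that.
To confirm, in Postman I put the value returned by the first request into img:
That also returned the next page's value.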
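The chain can be sketched in Python roughly as follows. This is a sketch under assumptions, not a tested crawler: the vid parameter name is a guess (the captured URLs are garbled at that spot; the page's hidden input is called VID), the stop condition PageIndex >= PageCount is assumed, and the helper names are invented. The fetcher is passed in so the loop can be tried without touching the network.

```python
import json
import urllib.parse
import urllib.request

BASE = "https://vip.ow365.cn"


def build_get_page_url(f, img, vid, sn):
    """Build a GetPage URL; img is "" for the very first request."""
    query = urllib.parse.urlencode({
        "f": f,
        "img": img,
        "isMobile": "false",
        "vid": vid,  # assumed parameter name (garbled in the captures)
        "dk": 0,
        "ver": 2,
        "sn": sn,
    })
    return f"{BASE}/PW/GetPage?{query}"


def http_get_json(url):
    """Default fetcher: plain GET returning parsed JSON. Headers may need tuning."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


def walk_pages(f, vid, fetch_json=http_get_json, max_pages=None):
    """Yield (page_index, img_token) by following NextPage step by step."""
    img, sn = "", 0
    while max_pages is None or sn < max_pages:
        page = fetch_json(build_get_page_url(f, img, vid, sn))
        if page.get("ErrorMsg") or not page.get("NextPage"):
            break
        # The returned token is both this page's image token and the
        # img parameter for the next GetPage request.
        img = page["NextPage"]
        yield page["PageIndex"], img
        # Assumed stop condition: the last page reports PageIndex == PageCount.
        if page["PageIndex"] >= page["PageCount"]:
            break
        sn += 1
```

Calling `walk_pages(f, vid, max_pages=3)` with the Url and VID values from the viewer page should then reproduce the three requests listed above, if the assumptions hold.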
Next, see how the page image itself is fetched.
The corresponding requests:
First page:
curl 'https://vip.ow365.cn/img?img=IDcMbrrMGOWvOQVTWydwR6WWz0UVpg2zB9VFJh7jsnp5byBCqeJ6jribHO0GQGIZ1exJW4aembE=&tp=' \
  -H 'authority: vip.ow365.cn' \
  -H 'pragma: no-cache' \
  -H 'cache-control: no-cache' \
  -H 'user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.122 Safari/537.36' \
  -H 'accept: image/webp,image/apng,image/*,*/*;q=0.8' \
  -H 'sec-fetch-site: same-origin' \
  -H 'sec-fetch-mode: no-cors' \
  -H 'sec-fetch-dest: image' \
  -H 'referer: https://vip.ow365.cn/?i=11311&ssl=1&[email protected]_0miBI1NfVmqplNkx4osgxUapgos7zntvq_BluwUV5DjSGRhs[email protected]@GhMGaxrje5AeipdhF4tvw==' \
  -H 'accept-language: zh-CN,zh;q=0.9,en;q=0.8,la;q=0.7' \
  --compressed
And:
Second page:
curl 'https://vip.ow365.cn/img?img=IDcMbrrMGOWvOQVTWydwR6WWz0UVpg2zB9VFJh7jsnp5byBCqeJ6jmsfp2y28J9E9JreoGwZvNk=&tp=' \
  -H 'authority: vip.ow365.cn' \
  -H 'pragma: no-cache' \
  -H 'cache-control: no-cache' \
  -H 'user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.122 Safari/537.36' \
  -H 'accept: image/webp,image/apng,image/*,*/*;q=0.8' \
  -H 'sec-fetch-site: same-origin' \
  -H 'sec-fetch-mode: no-cors' \
  -H 'sec-fetch-dest: image' \
  -H 'referer: https://vip.ow365.cn/?i=11311&ssl=1&[email protected]_0miBI1NfVmqplNkx4osgxUapgos7zntvq_BluwUV5DjSGRhs[email protected]@GhMGaxrje5AeipdhF4tvw==' \
  -H 'accept-language: zh-CN,zh;q=0.9,en;q=0.8,la;q=0.7' \
  --compressed
So with the token for each page, its image can be fetched.
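For completeness, a small Python sketch of the image request. The helper names and the output path are hypothetical, the image format returned by the server is not known from the captures, and whether the referer header is actually required has not been verified:

```python
import urllib.parse
import urllib.request

BASE = "https://vip.ow365.cn"


def build_img_url(img_token, tp=""):
    """Build the image URL for one page from its img token."""
    query = urllib.parse.urlencode({"img": img_token, "tp": tp})
    return f"{BASE}/img?{query}"


def download_page_image(img_token, out_path, referer=None):
    """Fetch one page image and write the raw bytes to out_path."""
    headers = {"Accept": "image/webp,image/apng,image/*,*/*;q=0.8"}
    if referer:
        # The browser sent the viewer page URL as referer; whether the
        # server insists on it is untested here.
        headers["Referer"] = referer
    request = urllib.request.Request(build_img_url(img_token), headers=headers)
    with urllib.request.urlopen(request, timeout=30) as resp:
        data = resp.read()
    with open(out_path, "wb") as fh:
        fh.write(data)
    return len(data)
```

Each token yielded while walking the NextPage chain can be passed to `download_page_image` together with a file name of your choosing.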

Please credit the source when reposting: 在路上 » [Solved] Analyzing how the tch.ityxb.com page fetches its e-book images