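While working on:
[Unsolved] Scraping the e-book 《java 入门》 from tch.ityxb.com
I went to analyze the network requests.
This is the request logic behind the most recently loaded pages: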
curl 'https://vip.ow365.cn/PW/GetPage?f=YXR0YWNobWVudC1jZW50ZXIuYm94dWVndS5jb20uODBcMThmNWJiOTZhM2I4NGM3NzllZDJhNTY4MzM3ZWFkNjAucGRm&img=IDcMbrrMGOWvOQVTWydwR6WWz0UVpg2zB9VFJh7jsnp5byBCqeJ6jlk6LGzj1ho@JxoLsV0Yrw4=&isMobile=false&vid=@ouvAGlwulktavhIGppyKg==&dk=0&ver=2&sn=4' \
  -H 'authority: vip.ow365.cn' \
  -H 'pragma: no-cache' \
  -H 'cache-control: no-cache' \
  -H 'user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.122 Safari/537.36' \
  -H 'accept: */*' \
  -H 'sec-fetch-site: same-origin' \
  -H 'sec-fetch-mode: cors' \
  -H 'sec-fetch-dest: empty' \
  -H 'referer: https://vip.ow365.cn/?i=11311&ssl=1&furl=0As6WW@zSHIfqZy_0miBI1NfVmqplNkx4osgxUapgos7zntvq_BluwUV5DjSGRhsHRFJwyGpvHi9cjUTIGzm3WHgnjJ2lFd1wVPaQXBaorIzE0K0J_OXwbwK6qlOrtb@@GhMGaxrje5AeipdhF4tvw==' \
  -H 'accept-language: zh-CN,zh;q=0.9,en;q=0.8,la;q=0.7' \
  --compressed
Response:
{"NextPage":"IDcMbrrMGOWvOQVTWydwR6WWz0UVpg2zB9VFJh7jsnp5byBCqeJ6jhxNrqd3jM0cBxUcGpfwfWY=","PageCount":427,"ErrorMsg":"","PageIndex":5,"PageWidth":880,"Width":880,"Height":1237}
Then keep looking backwards through the earlier requests:
curl 'https://vip.ow365.cn/PW/GetPage?f=YXR0YWNobWVudC1jZW50ZXIuYm94dWVndS5jb20uODBcMThmNWJiOTZhM2I4NGM3NzllZDJhNTY4MzM3ZWFkNjAucGRm&img=&isMobile=false&vid=@ouvAGlwulktavhIGppyKg==&dk=0&ver=2&sn=0' \
  -H 'authority: vip.ow365.cn' \
  -H 'pragma: no-cache' \
  -H 'cache-control: no-cache' \
  -H 'user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.122 Safari/537.36' \
  -H 'accept: */*' \
  -H 'sec-fetch-site: same-origin' \
  -H 'sec-fetch-mode: cors' \
  -H 'sec-fetch-dest: empty' \
  -H 'referer: https://vip.ow365.cn/?i=11311&ssl=1&furl=0As6WW@zSHIfqZy_0miBI1NfVmqplNkx4osgxUapgos7zntvq_BluwUV5DjSGRhsHRFJwyGpvHi9cjUTIGzm3WHgnjJ2lFd1wVPaQXBaorIzE0K0J_OXwbwK6qlOrtb@@GhMGaxrje5AeipdhF4tvw==' \
  -H 'accept-language: zh-CN,zh;q=0.9,en;q=0.8,la;q=0.7' \
  --compressed
Request headers:
:authority: vip.ow365.cn
:method: GET
:path: /PW/GetPage?f=YXR0YWNobWVudC1jZW50ZXIuYm94dWVndS5jb20uODBcMThmNWJiOTZhM2I4NGM3NzllZDJhNTY4MzM3ZWFkNjAucGRm&img=&isMobile=false&vid=@ouvAGlwulktavhIGppyKg==&dk=0&ver=2&sn=0
:scheme: https
accept: */*
accept-encoding: gzip, deflate, br
accept-language: zh-CN,zh;q=0.9,en;q=0.8,la;q=0.7
cache-control: no-cache
pragma: no-cache
referer: https://vip.ow365.cn/?i=11311&ssl=1&furl=0As6WW@zSHIfqZy_0miBI1NfVmqplNkx4osgxUapgos7zntvq_BluwUV5DjSGRhsHRFJwyGpvHi9cjUTIGzm3WHgnjJ2lFd1wVPaQXBaorIzE0K0J_OXwbwK6qlOrtb@@GhMGaxrje5AeipdhF4tvw==
sec-fetch-dest: empty
sec-fetch-mode: cors
sec-fetch-site: same-origin
user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.122 Safari/537.36
To see where the f value comes from, search the captured traffic for:
YXR0YWNobWVudC1jZW50ZXIuYm94dWVndS5jb20uODBcMThmNWJiOTZhM2I4NGM3NzllZDJhNTY4MzM3ZWFkNjAucGRm
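As a side observation that is not part of the original notes: the f value has the shape of standard Base64, so it may be worth decoding it to see what it carries. The sketch below only decodes the string copied from the requests above; it appears to yield a host-like prefix followed by a .pdf file name, but how the server actually interprets it is an assumption.

```python
import base64

# Value of the `f` query parameter, copied verbatim from the captured requests.
F_VALUE = (
    "YXR0YWNobWVudC1jZW50ZXIuYm94dWVndS5jb20uODBcMThmNWJiOTZhM2I4"
    "NGM3NzllZDJhNTY4MzM3ZWFkNjAucGRm"
)


def decode_f(value: str) -> str:
    """Base64-decode the `f` parameter into readable text."""
    return base64.b64decode(value).decode("utf-8")


if __name__ == "__main__":
    # Print the decoded locator so it can be inspected by eye.
    print(decode_f(F_VALUE))
```

The img and vid tokens do not look like plain Base64 (they contain `@`), so this trick is not assumed to apply to them.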
It turns up in this request:
curl 'https://vip.ow365.cn/?i=11311&ssl=1&furl=0As6WW@zSHIfqZy_0miBI1NfVmqplNkx4osgxUapgos7zntvq_BluwUV5DjSGRhsHRFJwyGpvHi9cjUTIGzm3WHgnjJ2lFd1wVPaQXBaorIzE0K0J_OXwbwK6qlOrtb@@GhMGaxrje5AeipdhF4tvw==' \
  -H 'authority: vip.ow365.cn' \
  -H 'pragma: no-cache' \
  -H 'cache-control: no-cache' \
  -H 'upgrade-insecure-requests: 1' \
  -H 'user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.122 Safari/537.36' \
  -H 'accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9' \
  -H 'sec-fetch-site: cross-site' \
  -H 'sec-fetch-mode: navigate' \
  -H 'sec-fetch-dest: iframe' \
  -H 'referer: http://tch.ityxb.com/ebook?eurl=https%3A%2F%2Fvip.ow365.cn%2F%3Fi%3D11311%26ssl%3D1%26furl%3D0As6WW%40zSHIfqZy_0miBI1NfVmqplNkx4osgxUapgos7zntvq_BluwUV5DjSGRhsHRFJwyGpvHi9%0AcjUTIGzm3WHgnjJ2lFd1wVPaQXBaorIzE0K0J_OXwbwK6qlOrtb%40%40GhMGaxrje5AeipdhF4tvw%3D%3D' \
  -H 'accept-language: zh-CN,zh;q=0.9,en;q=0.8,la;q=0.7' \
  --compressed
Response (excerpt):
...
<!--[if lt IE 9]><input id="isIE8" type="hidden" autocomplete="off" /><![endif]-->
<div name="parms">
  <input type="hidden" id="Url" value="YXR0YWNobWVudC1jZW50ZXIuYm94dWVndS5jb20uODBcMThmNWJiOTZhM2I4NGM3NzllZDJhNTY4MzM3ZWFkNjAucGRm" autocomplete="off" />
  <input type="hidden" id="IsMobi" value="false" autocomplete="off" />
  <input type="hidden" id="Dk" value="0" autocomplete="off" />
  <input type="hidden" id="Ver" value="2" autocomplete="off" />
  <input type="hidden" id="VID" value="@ouvAGlwulktavhIGppyKg==" autocomplete="off" />
  <input type="hidden" id="ViewPath" value="../img" autocomplete="off" />
  <input type="hidden" id="Tp" autocomplete="off" />
</div>
...
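The hidden inputs in this page line up with the GetPage query parameters (Url with f, VID with vid, Dk with dk, Ver with ver), so a scraper could read them from the HTML instead of hard-coding them. A minimal sketch using only the standard library follows; the class and function names (`HiddenInputParser`, `extract_params`) are made up for illustration, and the sample HTML is a shortened copy of the response above.

```python
from html.parser import HTMLParser


class HiddenInputParser(HTMLParser):
    """Collect id -> value for every <input type="hidden"> that has an id."""

    def __init__(self):
        super().__init__()
        self.values = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <input ... /> also arrive here by default.
        attr = dict(attrs)
        if tag == "input" and attr.get("type") == "hidden" and "id" in attr:
            # A missing value attribute (as on "Tp") becomes an empty string.
            self.values[attr["id"]] = attr.get("value") or ""


def extract_params(html: str) -> dict:
    parser = HiddenInputParser()
    parser.feed(html)
    return parser.values


SAMPLE_HTML = """
<div name="parms">
  <input type="hidden" id="Url" value="YXR0YWNobWVudC1jZW50ZXIuYm94dWVndS5jb20uODBcMThmNWJiOTZhM2I4NGM3NzllZDJhNTY4MzM3ZWFkNjAucGRm" autocomplete="off" />
  <input type="hidden" id="Dk" value="0" autocomplete="off" />
  <input type="hidden" id="Ver" value="2" autocomplete="off" />
  <input type="hidden" id="VID" value="@ouvAGlwulktavhIGppyKg==" autocomplete="off" />
  <input type="hidden" id="Tp" autocomplete="off" />
</div>
"""

if __name__ == "__main__":
    for key, value in extract_params(SAMPLE_HTML).items():
        print(key, "=", value)
```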
Next, test the requests in Postman.
Start with:
https://vip.ow365.cn/PW/GetPage?f=YXR0YWNobWVudC1jZW50ZXIuYm94dWVndS5jb20uODBcMThmNWJiOTZhM2I4NGM3NzllZDJhNTY4MzM3ZWFkNjAucGRm&img=&isMobile=false&vid=@ouvAGlwulktavhIGppyKg==&dk=0&ver=2&sn=0
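This returns data.
Then put the returned NextPage value into the next request (as it turns out below, it goes into the img parameter, not f):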
IDcMbrrMGOWvOQVTWydwR6WWz0UVpg2zB9VFJh7jsnp5byBCqeJ6jribHO0GQGIZ1exJW4aembE=
But first, list the requests for the first few pages.
For sn=0 through sn=2:
https://vip.ow365.cn/PW/GetPage?f=YXR0YWNobWVudC1jZW50ZXIuYm94dWVndS5jb20uODBcMThmNWJiOTZhM2I4NGM3NzllZDJhNTY4MzM3ZWFkNjAucGRm&img=&isMobile=false&vid=@ouvAGlwulktavhIGppyKg==&dk=0&ver=2&sn=0
https://vip.ow365.cn/PW/GetPage?f=YXR0YWNobWVudC1jZW50ZXIuYm94dWVndS5jb20uODBcMThmNWJiOTZhM2I4NGM3NzllZDJhNTY4MzM3ZWFkNjAucGRm&img=IDcMbrrMGOWvOQVTWydwR6WWz0UVpg2zB9VFJh7jsnp5byBCqeJ6jribHO0GQGIZ1exJW4aembE=&isMobile=false&vid=@ouvAGlwulktavhIGppyKg==&dk=0&ver=2&sn=1
https://vip.ow365.cn/PW/GetPage?f=YXR0YWNobWVudC1jZW50ZXIuYm94dWVndS5jb20uODBcMThmNWJiOTZhM2I4NGM3NzllZDJhNTY4MzM3ZWFkNjAucGRm&img=IDcMbrrMGOWvOQVTWydwR6WWz0UVpg2zB9VFJh7jsnp5byBCqeJ6jmsfp2y28J9E9JreoGwZvNk=&isMobile=false&vid=@ouvAGlwulktavhIGppyKg==&dk=0&ver=2&sn=2
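The three URLs differ only in img and sn, so they can be generated from a small helper. This is a sketch rather than anything taken from the site's own JavaScript: the function name `build_get_page_url` is hypothetical, and keeping `@` and `=` unescaped is a choice made only to reproduce the captured URLs exactly (whether the server also accepts percent-encoded values has not been checked).

```python
from urllib.parse import quote

GET_PAGE = "https://vip.ow365.cn/PW/GetPage"

# Values copied from the captured requests.
F = ("YXR0YWNobWVudC1jZW50ZXIuYm94dWVndS5jb20uODBcMThmNWJiOTZhM2I4"
     "NGM3NzllZDJhNTY4MzM3ZWFkNjAucGRm")
VID = "@ouvAGlwulktavhIGppyKg=="


def build_get_page_url(f, vid, sn, img="", is_mobile="false", dk="0", ver="2"):
    """Assemble a GetPage URL with parameters in the same order as the captures."""
    params = [
        ("f", f), ("img", img), ("isMobile", is_mobile),
        ("vid", vid), ("dk", dk), ("ver", ver), ("sn", str(sn)),
    ]
    # Leave '@' and '=' literal so the result matches the browser's URLs.
    query = "&".join(key + "=" + quote(value, safe="@=") for key, value in params)
    return GET_PAGE + "?" + query


if __name__ == "__main__":
    print(build_get_page_url(F, VID, 0))
```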
Details of each request:
curl 'https://vip.ow365.cn/PW/GetPage?f=YXR0YWNobWVudC1jZW50ZXIuYm94dWVndS5jb20uODBcMThmNWJiOTZhM2I4NGM3NzllZDJhNTY4MzM3ZWFkNjAucGRm&img=&isMobile=false&vid=@ouvAGlwulktavhIGppyKg==&dk=0&ver=2&sn=0' \
  -H 'authority: vip.ow365.cn' \
  -H 'pragma: no-cache' \
  -H 'cache-control: no-cache' \
  -H 'user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.122 Safari/537.36' \
  -H 'accept: */*' \
  -H 'sec-fetch-site: same-origin' \
  -H 'sec-fetch-mode: cors' \
  -H 'sec-fetch-dest: empty' \
  -H 'referer: https://vip.ow365.cn/?i=11311&ssl=1&furl=0As6WW@zSHIfqZy_0miBI1NfVmqplNkx4osgxUapgos7zntvq_BluwUV5DjSGRhsHRFJwyGpvHi9cjUTIGzm3WHgnjJ2lFd1wVPaQXBaorIzE0K0J_OXwbwK6qlOrtb@@GhMGaxrje5AeipdhF4tvw==' \
  -H 'accept-language: zh-CN,zh;q=0.9,en;q=0.8,la;q=0.7' \
  --compressed
Response:
{"NextPage":"IDcMbrrMGOWvOQVTWydwR6WWz0UVpg2zB9VFJh7jsnp5byBCqeJ6jribHO0GQGIZ1exJW4aembE=","PageCount":427,"ErrorMsg":"","PageIndex":1,"PageWidth":880,"Width":880,"Height":1237}
And:
curl 'https://vip.ow365.cn/PW/GetPage?f=YXR0YWNobWVudC1jZW50ZXIuYm94dWVndS5jb20uODBcMThmNWJiOTZhM2I4NGM3NzllZDJhNTY4MzM3ZWFkNjAucGRm&img=IDcMbrrMGOWvOQVTWydwR6WWz0UVpg2zB9VFJh7jsnp5byBCqeJ6jribHO0GQGIZ1exJW4aembE=&isMobile=false&vid=@ouvAGlwulktavhIGppyKg==&dk=0&ver=2&sn=1' \
  -H 'authority: vip.ow365.cn' \
  -H 'pragma: no-cache' \
  -H 'cache-control: no-cache' \
  -H 'user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.122 Safari/537.36' \
  -H 'accept: */*' \
  -H 'sec-fetch-site: same-origin' \
  -H 'sec-fetch-mode: cors' \
  -H 'sec-fetch-dest: empty' \
  -H 'referer: https://vip.ow365.cn/?i=11311&ssl=1&furl=0As6WW@zSHIfqZy_0miBI1NfVmqplNkx4osgxUapgos7zntvq_BluwUV5DjSGRhsHRFJwyGpvHi9cjUTIGzm3WHgnjJ2lFd1wVPaQXBaorIzE0K0J_OXwbwK6qlOrtb@@GhMGaxrje5AeipdhF4tvw==' \
  -H 'accept-language: zh-CN,zh;q=0.9,en;q=0.8,la;q=0.7' \
  --compressed
Response:
{"NextPage":"IDcMbrrMGOWvOQVTWydwR6WWz0UVpg2zB9VFJh7jsnp5byBCqeJ6jmsfp2y28J9E9JreoGwZvNk=","PageCount":427,"ErrorMsg":"","PageIndex":2,"PageWidth":880,"Width":880,"Height":1237}
And:
curl 'https://vip.ow365.cn/PW/GetPage?f=YXR0YWNobWVudC1jZW50ZXIuYm94dWVndS5jb20uODBcMThmNWJiOTZhM2I4NGM3NzllZDJhNTY4MzM3ZWFkNjAucGRm&img=IDcMbrrMGOWvOQVTWydwR6WWz0UVpg2zB9VFJh7jsnp5byBCqeJ6jmsfp2y28J9E9JreoGwZvNk=&isMobile=false&vid=@ouvAGlwulktavhIGppyKg==&dk=0&ver=2&sn=2' \
  -H 'authority: vip.ow365.cn' \
  -H 'pragma: no-cache' \
  -H 'cache-control: no-cache' \
  -H 'user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.122 Safari/537.36' \
  -H 'accept: */*' \
  -H 'sec-fetch-site: same-origin' \
  -H 'sec-fetch-mode: cors' \
  -H 'sec-fetch-dest: empty' \
  -H 'referer: https://vip.ow365.cn/?i=11311&ssl=1&furl=0As6WW@zSHIfqZy_0miBI1NfVmqplNkx4osgxUapgos7zntvq_BluwUV5DjSGRhsHRFJwyGpvHi9cjUTIGzm3WHgnjJ2lFd1wVPaQXBaorIzE0K0J_OXwbwK6qlOrtb@@GhMGaxrje5AeipdhF4tvw==' \
  -H 'accept-language: zh-CN,zh;q=0.9,en;q=0.8,la;q=0.7' \
  --compressed
Response:
{"NextPage":"IDcMbrrMGOWvOQVTWydwR6WWz0UVpg2zB9VFJh7jsnp5byBCqeJ6jqmsr0fjU0ZYHjFOM5f54TA=","PageCount":427,"ErrorMsg":"","PageIndex":3,"PageWidth":880,"Width":880,"Height":1237}
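After studying these, the pattern is a chain in which each link depends on the previous one:
the NextPage value returned by the request for one page becomes the img parameter of the next request, which in turn returns the token for the following page. The sn parameter increases by 1 each time, and the first request (sn=0) sends an empty img.
Later, in Postman, I filled the value returned by the first request into img,
and that request did return the value for the next page.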
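The chain can be sketched as a generator that feeds each NextPage back in as img. Several things here are assumptions rather than observed behaviour: the names `iter_page_tokens` and `fetch_json` are invented for illustration, stopping when PageIndex reaches PageCount is a guess (the last page's response was not captured), and it has not been checked which of the browser's headers, if any, the server requires. The fetch function is injectable so the loop can be exercised without network access.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

GET_PAGE = "https://vip.ow365.cn/PW/GetPage"


def get_page_url(f, vid, img, sn):
    """GetPage URL in the same parameter order as the captured requests."""
    return (GET_PAGE + "?f=" + quote(f, safe="@=") + "&img=" + quote(img, safe="@=")
            + "&isMobile=false&vid=" + quote(vid, safe="@=")
            + "&dk=0&ver=2&sn=" + str(sn))


def fetch_json(url):
    """Default fetcher: plain GET, response body parsed as JSON."""
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


def iter_page_tokens(f, vid, fetch=fetch_json, max_pages=None):
    """Yield (PageIndex, NextPage) pairs by following the NextPage -> img chain."""
    img, sn = "", 0  # the first request sends an empty img and sn=0
    while max_pages is None or sn < max_pages:
        data = fetch(get_page_url(f, vid, img, sn))
        if data.get("ErrorMsg"):
            raise RuntimeError(data["ErrorMsg"])
        yield data["PageIndex"], data["NextPage"]
        # Assumed stop condition: the index has reached the reported page count.
        if data["PageIndex"] >= data["PageCount"]:
            break
        img, sn = data["NextPage"], sn + 1
```

Calling `list(iter_page_tokens(f, vid, max_pages=3))` with the f and vid values above would issue the same three requests as listed, assuming the server answers a bare client the way it answers the browser.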
Next, look at how the page image itself is fetched.
The corresponding requests:
Page 1:
curl 'https://vip.ow365.cn/img?img=IDcMbrrMGOWvOQVTWydwR6WWz0UVpg2zB9VFJh7jsnp5byBCqeJ6jribHO0GQGIZ1exJW4aembE=&tp=' \
  -H 'authority: vip.ow365.cn' \
  -H 'pragma: no-cache' \
  -H 'cache-control: no-cache' \
  -H 'user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.122 Safari/537.36' \
  -H 'accept: image/webp,image/apng,image/*,*/*;q=0.8' \
  -H 'sec-fetch-site: same-origin' \
  -H 'sec-fetch-mode: no-cors' \
  -H 'sec-fetch-dest: image' \
  -H 'referer: https://vip.ow365.cn/?i=11311&ssl=1&furl=0As6WW@zSHIfqZy_0miBI1NfVmqplNkx4osgxUapgos7zntvq_BluwUV5DjSGRhsHRFJwyGpvHi9cjUTIGzm3WHgnjJ2lFd1wVPaQXBaorIzE0K0J_OXwbwK6qlOrtb@@GhMGaxrje5AeipdhF4tvw==' \
  -H 'accept-language: zh-CN,zh;q=0.9,en;q=0.8,la;q=0.7' \
  --compressed
And:
Page 2:
curl 'https://vip.ow365.cn/img?img=IDcMbrrMGOWvOQVTWydwR6WWz0UVpg2zB9VFJh7jsnp5byBCqeJ6jmsfp2y28J9E9JreoGwZvNk=&tp=' \
  -H 'authority: vip.ow365.cn' \
  -H 'pragma: no-cache' \
  -H 'cache-control: no-cache' \
  -H 'user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.122 Safari/537.36' \
  -H 'accept: image/webp,image/apng,image/*,*/*;q=0.8' \
  -H 'sec-fetch-site: same-origin' \
  -H 'sec-fetch-mode: no-cors' \
  -H 'sec-fetch-dest: image' \
  -H 'referer: https://vip.ow365.cn/?i=11311&ssl=1&furl=0As6WW@zSHIfqZy_0miBI1NfVmqplNkx4osgxUapgos7zntvq_BluwUV5DjSGRhsHRFJwyGpvHi9cjUTIGzm3WHgnjJ2lFd1wVPaQXBaorIzE0K0J_OXwbwK6qlOrtb@@GhMGaxrje5AeipdhF4tvw==' \
  -H 'accept-language: zh-CN,zh;q=0.9,en;q=0.8,la;q=0.7' \
  --compressed
So the image for each page can be fetched by passing that page's NextPage token as the img parameter of /img.
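A rough sketch of the download step, under stated assumptions: `build_img_url`, `fetch_bytes` and `save_page` are hypothetical helper names, the image format was not inspected (hence the neutral `.img` extension), and it is untested whether the server needs the Referer or other browser headers.

```python
from pathlib import Path
from urllib.parse import quote
from urllib.request import Request, urlopen

IMG_ENDPOINT = "https://vip.ow365.cn/img"


def build_img_url(token, tp=""):
    """Image URL for one page token; '@' and '=' stay literal as in the captures."""
    return (IMG_ENDPOINT + "?img=" + quote(token, safe="@=")
            + "&tp=" + quote(tp, safe="@="))


def fetch_bytes(url):
    """Default fetcher: plain GET returning the raw response body."""
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=30) as response:
        return response.read()


def save_page(token, page_index, out_dir, fetch=fetch_bytes):
    """Download one page image and write it to out_dir/page_NNNN.img."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / ("page_%04d.img" % page_index)
    path.write_bytes(fetch(build_img_url(token)))
    return path
```

Combined with a loop over the NextPage tokens, `save_page(token, page_index, "pages")` would write one file per page; a short pause between requests would be a sensible courtesy to the server.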