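While tinkering with:
[Unresolved] Scraping the e-book 《java 入门》 (Introduction to Java) from tch.ityxb.com
I stopped to analyze how the online reader requests its pages.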

Worked out the request logic for a few of the pages:

    -H 'authority: vip.ow365.cn' \
    -H 'pragma: no-cache' \
    -H 'cache-control: no-cache' \
    -H 'user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.122 Safari/537.36' \
    -H 'accept: */*' \
    -H 'sec-fetch-site: same-origin' \
    -H 'sec-fetch-mode: cors' \
    -H 'sec-fetch-dest: empty' \
    -H 'accept-language: zh-CN,zh;q=0.9,en;q=0.8,la;q=0.7' \
    --compressed
Response:
    {"NextPage": "IDcMbrrMGOWvOQVTWydwR6WWz0UVpg2zB9VFJh7jsnp5byBCqeJ6jhxNrqd3jM0cBxUcGpfwfWY=", "PageCount": 427, "ErrorMsg": "", "PageIndex": 5, "PageWidth": 880, "Width": 880, "Height": 1237}
Then kept tracing back through the earlier requests:

    -H 'authority: vip.ow365.cn' \
    -H 'pragma: no-cache' \
    -H 'cache-control: no-cache' \
    -H 'user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.122 Safari/537.36' \
    -H 'accept: */*' \
    -H 'sec-fetch-site: same-origin' \
    -H 'sec-fetch-mode: cors' \
    -H 'sec-fetch-dest: empty' \
    -H 'accept-language: zh-CN,zh;q=0.9,en;q=0.8,la;q=0.7' \
    --compressed
Request headers:
    :authority: vip.ow365.cn
    :method: GET
    :path: /PW/GetPage?f=YXR0YWNobWVudC1jZW50ZXIuYm94dWVndS5jb20uODBcMThmNWJiOTZhM2I4NGM3NzllZDJhNTY4MzM3ZWFkNjAucGRm&img=&isMobile=false&vid=@ouvAGlwulktavhIGppyKg==&dk=0&ver=2&sn=0
    :scheme: https
    accept: */*
    accept-encoding: gzip, deflate, br
    accept-language: zh-CN,zh;q=0.9,en;q=0.8,la;q=0.7
    cache-control: no-cache
    pragma: no-cache
    referer: https://vip.ow365.cn/?i=11311&ssl=1&furl=0As6WW@zSHIfqZy_0miBI1NfVmqplNkx4osgxUapgos7zntvq_BluwUV5DjSGRhsHRFJwyGpvHi9cjUTIGzm3WHgnjJ2lFd1wVPaQXBaorIzE0K0J_OXwbwK6qlOrtb@@GhMGaxrje5AeipdhF4tvw==
    sec-fetch-dest: empty
    sec-fetch-mode: cors
    sec-fetch-site: same-origin
    user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.122 Safari/537.36
Searched the captured traffic for the f value:
    YXR0YWNobWVudC1jZW50ZXIuYm94dWVndS5jb20uODBcMThmNWJiOTZhM2I4NGM3NzllZDJhNTY4MzM3ZWFkNjAucGRm
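As a side note, the f value has the shape of standard base64. The sketch below decodes it, assuming it is plain base64 over ASCII text (the name `decode_f` is only illustrative). The result reads like a host, a port, and a PDF file name, which suggests f identifies the source document rather than a page.

```python
import base64

# The f value copied from the GetPage request above.
F_PARAM = (
    "YXR0YWNobWVudC1jZW50ZXIuYm94dWVndS5jb20uODBcMThmNWJiOTZhM2I4"
    "NGM3NzllZDJhNTY4MzM3ZWFkNjAucGRm"
)


def decode_f(value: str) -> str:
    """Decode the f query parameter, assuming plain base64 of ASCII text."""
    return base64.b64decode(value).decode("ascii")


print(decode_f(F_PARAM))
```

Nothing in the capture confirms how the server interprets this string, so treat the decoded form as a hint only.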
Found it in this request:

    -H 'authority: vip.ow365.cn' \
    -H 'pragma: no-cache' \
    -H 'cache-control: no-cache' \
    -H 'upgrade-insecure-requests: 1' \
    -H 'user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.122 Safari/537.36' \
    -H 'accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9' \
    -H 'sec-fetch-site: cross-site' \
    -H 'sec-fetch-mode: navigate' \
    -H 'sec-fetch-dest: iframe' \
    -H 'accept-language: zh-CN,zh;q=0.9,en;q=0.8,la;q=0.7' \
    --compressed
Response:
    ...
    <!--[if lt IE 9]><input id="isIE8" type="hidden" autocomplete="off" /><![endif]-->
    <div name="parms">
    <input type="hidden" id="Url" value="YXR0YWNobWVudC1jZW50ZXIuYm94dWVndS5jb20uODBcMThmNWJiOTZhM2I4NGM3NzllZDJhNTY4MzM3ZWFkNjAucGRm" autocomplete="off" />
    <input type="hidden" id="IsMobi" value="false" autocomplete="off" />
    <input type="hidden" id="Dk" value="0" autocomplete="off" />
    <input type="hidden" id="Ver" value="2" autocomplete="off" />
    <input type="hidden" id="VID" value="@ouvAGlwulktavhIGppyKg==" autocomplete="off" />
    <input type="hidden" id="ViewPath" value="../img" autocomplete="off" />
    <input type="hidden" id="Tp" autocomplete="off" />
    </div>
    ...
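The hidden inputs in this page appear to carry the values that later show up as GetPage query parameters (Url as f, VID as vid, and so on). A minimal sketch of collecting them with the standard library follows; the class and function names are hypothetical, and the sample HTML is just the fragment shown above.

```python
from html.parser import HTMLParser


class HiddenInputParser(HTMLParser):
    """Collect id -> value for every <input type="hidden"> in a page."""

    def __init__(self):
        super().__init__()
        self.params = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing <input ... /> tags are also routed here by default.
        if tag != "input":
            return
        attr = dict(attrs)
        if attr.get("type") == "hidden" and "id" in attr:
            # Inputs without a value attribute (e.g. Tp) map to "".
            self.params[attr["id"]] = attr.get("value") or ""


def extract_params(html: str) -> dict:
    parser = HiddenInputParser()
    parser.feed(html)
    return parser.params


SAMPLE = """
<div name="parms">
<input type="hidden" id="Url" value="YXR0YWNobWVudC1jZW50ZXIuYm94dWVndS5jb20uODBcMThmNWJiOTZhM2I4NGM3NzllZDJhNTY4MzM3ZWFkNjAucGRm" autocomplete="off" />
<input type="hidden" id="IsMobi" value="false" autocomplete="off" />
<input type="hidden" id="Dk" value="0" autocomplete="off" />
<input type="hidden" id="Ver" value="2" autocomplete="off" />
<input type="hidden" id="VID" value="@ouvAGlwulktavhIGppyKg==" autocomplete="off" />
<input type="hidden" id="ViewPath" value="../img" autocomplete="off" />
<input type="hidden" id="Tp" autocomplete="off" />
</div>
"""

params = extract_params(SAMPLE)
print(params["Url"], params["VID"])
```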
Next, test this in Postman.
First try:
    https://vip.ow365.cn/PW/GetPage?f=YXR0YWNobWVudC1jZW50ZXIuYm94dWVndS5jb20uODBcMThmNWJiOTZhM2I4NGM3NzllZDJhNTY4MzM3ZWFkNjAucGRm&img=&isMobile=false&vid=@ouvAGlwulktavhIGppyKg==&dk=0&ver=2&sn=0
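For scripting the same request outside Postman, the URL can be assembled from its parts. This is a sketch only: `build_get_page_url` is a hypothetical helper, the parameter order simply mirrors the captured URL, and keeping `@` and `=` unescaped is an assumption made to reproduce that URL exactly.

```python
from urllib.parse import urlencode

BASE_URL = "https://vip.ow365.cn/PW/GetPage"


def build_get_page_url(f, vid, img="", sn=0, is_mobile="false", dk="0", ver="2"):
    """Build a GetPage URL with the same parameter order as the capture."""
    query = urlencode(
        {
            "f": f,
            "img": img,
            "isMobile": is_mobile,
            "vid": vid,
            "dk": dk,
            "ver": ver,
            "sn": sn,
        },
        safe="@=",  # assumption: leave '@' and '=' literal, as in the capture
    )
    return f"{BASE_URL}?{query}"


url = build_get_page_url(
    f="YXR0YWNobWVudC1jZW50ZXIuYm94dWVndS5jb20uODBcMThmNWJiOTZhM2I4NGM3NzllZDJhNTY4MzM3ZWFkNjAucGRm",
    vid="@ouvAGlwulktavhIGppyKg==",
)
print(url)
```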

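The request returns data.
Next, put the returned NextPage value into the img parameter (f stays unchanged):
    IDcMbrrMGOWvOQVTWydwR6WWz0UVpg2zB9VFJh7jsnp5byBCqeJ6jribHO0GQGIZ1exJW4aembE=
Before that, though, list the requests for the first few pages.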

Requests for sn=0 through sn=2:
    https://vip.ow365.cn/PW/GetPage?f=YXR0YWNobWVudC1jZW50ZXIuYm94dWVndS5jb20uODBcMThmNWJiOTZhM2I4NGM3NzllZDJhNTY4MzM3ZWFkNjAucGRm&img=&isMobile=false&vid=@ouvAGlwulktavhIGppyKg==&dk=0&ver=2&sn=0

    https://vip.ow365.cn/PW/GetPage?f=YXR0YWNobWVudC1jZW50ZXIuYm94dWVndS5jb20uODBcMThmNWJiOTZhM2I4NGM3NzllZDJhNTY4MzM3ZWFkNjAucGRm&img=IDcMbrrMGOWvOQVTWydwR6WWz0UVpg2zB9VFJh7jsnp5byBCqeJ6jribHO0GQGIZ1exJW4aembE=&isMobile=false&vid=@ouvAGlwulktavhIGppyKg==&dk=0&ver=2&sn=1

    https://vip.ow365.cn/PW/GetPage?f=YXR0YWNobWVudC1jZW50ZXIuYm94dWVndS5jb20uODBcMThmNWJiOTZhM2I4NGM3NzllZDJhNTY4MzM3ZWFkNjAucGRm&img=IDcMbrrMGOWvOQVTWydwR6WWz0UVpg2zB9VFJh7jsnp5byBCqeJ6jmsfp2y28J9E9JreoGwZvNk=&isMobile=false&vid=@ouvAGlwulktavhIGppyKg==&dk=0&ver=2&sn=2
Details:
    -H 'authority: vip.ow365.cn' \
    -H 'pragma: no-cache' \
    -H 'cache-control: no-cache' \
    -H 'user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.122 Safari/537.36' \
    -H 'accept: */*' \
    -H 'sec-fetch-site: same-origin' \
    -H 'sec-fetch-mode: cors' \
    -H 'sec-fetch-dest: empty' \
    -H 'accept-language: zh-CN,zh;q=0.9,en;q=0.8,la;q=0.7' \
    --compressed
Response:
    {"NextPage": "IDcMbrrMGOWvOQVTWydwR6WWz0UVpg2zB9VFJh7jsnp5byBCqeJ6jribHO0GQGIZ1exJW4aembE=", "PageCount": 427, "ErrorMsg": "", "PageIndex": 1, "PageWidth": 880, "Width": 880, "Height": 1237}
and:
    -H 'authority: vip.ow365.cn' \
    -H 'pragma: no-cache' \
    -H 'cache-control: no-cache' \
    -H 'user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.122 Safari/537.36' \
    -H 'accept: */*' \
    -H 'sec-fetch-site: same-origin' \
    -H 'sec-fetch-mode: cors' \
    -H 'sec-fetch-dest: empty' \
    -H 'accept-language: zh-CN,zh;q=0.9,en;q=0.8,la;q=0.7' \
    --compressed
Response:
    {"NextPage": "IDcMbrrMGOWvOQVTWydwR6WWz0UVpg2zB9VFJh7jsnp5byBCqeJ6jmsfp2y28J9E9JreoGwZvNk=", "PageCount": 427, "ErrorMsg": "", "PageIndex": 2, "PageWidth": 880, "Width": 880, "Height": 1237}
and:
    -H 'authority: vip.ow365.cn' \
    -H 'pragma: no-cache' \
    -H 'cache-control: no-cache' \
    -H 'user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.122 Safari/537.36' \
    -H 'accept: */*' \
    -H 'sec-fetch-site: same-origin' \
    -H 'sec-fetch-mode: cors' \
    -H 'sec-fetch-dest: empty' \
    -H 'accept-language: zh-CN,zh;q=0.9,en;q=0.8,la;q=0.7' \
    --compressed
Response:
    {"NextPage": "IDcMbrrMGOWvOQVTWydwR6WWz0UVpg2zB9VFJh7jsnp5byBCqeJ6jqmsr0fjU0ZYHjFOM5f54TA=", "PageCount": 427, "ErrorMsg": "", "PageIndex": 3, "PageWidth": 880, "Width": 880, "Height": 1237}
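To work with these responses in a script, it may help to parse them into a small typed record. The sketch below assumes every response has the same fields as the three shown above and that a non-empty ErrorMsg signals failure; `PageInfo` and `parse_page_info` are hypothetical names.

```python
import json
from dataclasses import dataclass


@dataclass
class PageInfo:
    next_page: str   # token for the following request
    page_count: int  # total pages reported by the server
    page_index: int  # index of the page this response describes
    width: int
    height: int


def parse_page_info(body: str) -> PageInfo:
    """Parse one GetPage JSON body; raise if the server reports an error."""
    data = json.loads(body)
    if data.get("ErrorMsg"):
        # Assumption: a non-empty ErrorMsg means the request failed.
        raise ValueError(f"GetPage error: {data['ErrorMsg']}")
    return PageInfo(
        next_page=data["NextPage"],
        page_count=data["PageCount"],
        page_index=data["PageIndex"],
        width=data["Width"],
        height=data["Height"],
    )


SAMPLE_BODY = (
    '{"NextPage": "IDcMbrrMGOWvOQVTWydwR6WWz0UVpg2zB9VFJh7jsnp5byBCqeJ6jribHO0GQGIZ1exJW4aembE=", '
    '"PageCount": 427, "ErrorMsg": "", "PageIndex": 1, '
    '"PageWidth": 880, "Width": 880, "Height": 1237}'
)

info = parse_page_info(SAMPLE_BODY)
print(info.page_index, info.page_count, info.next_page)
```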
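After studying these requests, the pattern turned out to be:

Each request is chained to the previous one:
the NextPage value returned for one page is passed as the img parameter of the next request, which in turn returns the token for the page after that.
Later, in Postman, I put the value returned for the first page into img:

and that request also returned the next page's value.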
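The chaining described above can be sketched as a small generator. Several things here are assumptions rather than confirmed behavior: that sn simply increments by one per request (as in the three captured URLs), that the walk should stop once PageIndex reaches PageCount, and that a non-empty ErrorMsg means failure. The `fetch` argument is a hypothetical callable that takes a URL and returns the response body, so the loop itself does no network I/O.

```python
import json
from typing import Callable, Iterator
from urllib.parse import urlencode

BASE_URL = "https://vip.ow365.cn/PW/GetPage"


def iter_pages(fetch: Callable[[str], str], f: str, vid: str,
               max_pages: int) -> Iterator[dict]:
    """Yield one GetPage response per page, feeding NextPage back as img."""
    img = ""  # the first request sends an empty img
    for sn in range(max_pages):
        query = urlencode(
            {"f": f, "img": img, "isMobile": "false", "vid": vid,
             "dk": "0", "ver": "2", "sn": sn},
            safe="@=",
        )
        info = json.loads(fetch(f"{BASE_URL}?{query}"))
        if info.get("ErrorMsg"):
            break  # assumption: stop on any server-reported error
        yield info
        img = info["NextPage"]  # token for the next request
        if info["PageIndex"] >= info["PageCount"]:
            break  # assumption: last page reached
```

A real `fetch` could wrap any HTTP client and would probably need to send headers similar to the captured ones; whether the server requires the referer or other headers was not checked here.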
Next, look at how the page images themselves are fetched:

The corresponding requests:
First page:
    -H 'authority: vip.ow365.cn' \
    -H 'pragma: no-cache' \
    -H 'cache-control: no-cache' \
    -H 'user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.122 Safari/537.36' \
    -H 'accept: image/webp,image/apng,image/*,*/*;q=0.8' \
    -H 'sec-fetch-site: same-origin' \
    -H 'sec-fetch-mode: no-cors' \
    -H 'sec-fetch-dest: image' \
    -H 'accept-language: zh-CN,zh;q=0.9,en;q=0.8,la;q=0.7' \
    --compressed
and:
Second page:
    -H 'authority: vip.ow365.cn' \
    -H 'pragma: no-cache' \
    -H 'cache-control: no-cache' \
    -H 'user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.122 Safari/537.36' \
    -H 'accept: image/webp,image/apng,image/*,*/*;q=0.8' \
    -H 'sec-fetch-site: same-origin' \
    -H 'sec-fetch-mode: no-cors' \
    -H 'sec-fetch-dest: image' \
    -H 'accept-language: zh-CN,zh;q=0.9,en;q=0.8,la;q=0.7' \
    --compressed
So the page images can be fetched this way.