【Background】
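Most of my earlier write-ups on emulating login only covered logging in with a username and password.
The approach there was: first analyze the logic with a tool (see 【Tutorial】Step by step: how to use a tool (IE9's F12) to analyze the internal logic of emulating login to a website (Baidu home page)), then implement that logic in code, which has already been done for that case.
What I had hardly dealt with is a request that also uploads a file.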
I did try to emulate a file upload once before, but it failed. (I later guessed that the cause was a wrongly set boundary.)
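There was also this comment under a related post:
Recently I needed to use the POST method of C# HttpWebRequest to submit an img picture to an https website, and ran into a lot of problems.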
So I decided to write up how to emulate a file upload.
【Walkthrough】
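1. Take the address mentioned above:
https://www.peuland.com/captcha/captcha_demo.htm
Debug it with IE10's F12 tools and pick any file to upload, for example:
E:\Dev_Root\svn_dev_root\website\python\BlogsToWordpress\captcha\captcha.gif
Then upload it. The response is:
{"message":"user error","text":""}
The captured data is as follows: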
(1) Request header:
Key              Value
Request          POST /captcha/captchaimg.php HTTP/1.1
Accept           text/html, application/xhtml+xml, */*
Referer          https://www.peuland.com/captcha/captcha_demo.htm
Accept-Language  zh-CN
User-Agent       Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; WOW64; Trident/6.0)
Content-Type     multipart/form-data; boundary=---------------------------7dd2e12d50c9a
Accept-Encoding  gzip, deflate
Host             www.peuland.com
Content-Length   3720
DNT              1
Connection       Keep-Alive
Cache-Control    no-cache
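To make the captured header concrete, here is a minimal sketch of how the same headers could be set on an HttpWebRequest. The class and method names (UploadRequestSketch, CreateUploadRequest) are hypothetical and exist only for illustration. The request is prepared but never sent. Note that HttpWebRequest is marked obsolete on recent .NET versions, so this assumes the same .NET Framework style as the rest of this post.

```csharp
using System;
using System.Net;

public static class UploadRequestSketch
{
    // Hypothetical helper: prepares (but does not send) a POST request whose
    // headers mirror the ones captured from IE10.
    public static HttpWebRequest CreateUploadRequest(string boundary)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(
            "https://www.peuland.com/captcha/captchaimg.php");
        request.Method = "POST";

        // Restricted headers have to be set through their dedicated properties.
        request.Accept = "text/html, application/xhtml+xml, */*";
        request.Referer = "https://www.peuland.com/captcha/captcha_demo.htm";
        request.UserAgent =
            "Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; WOW64; Trident/6.0)";
        request.ContentType = "multipart/form-data; boundary=" + boundary;
        request.KeepAlive = true;

        // Asks the framework to accept and decompress gzip/deflate responses.
        request.AutomaticDecompression =
            DecompressionMethods.GZip | DecompressionMethods.Deflate;

        // Unrestricted headers can be added to the collection directly.
        request.Headers.Add("Accept-Language", "zh-CN");
        request.Headers.Add("DNT", "1");
        request.Headers.Add("Cache-Control", "no-cache");

        // Host is derived from the URL; Content-Length is only known once
        // the request body has been built.
        return request;
    }
}
```

Only Content-Type is specific to file uploads; the other headers simply copy what the browser sent.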
(2) Request body (post data):
-----------------------------7dd2e12d50c9a
Content-Disposition: form-data; name="user"
test
-----------------------------7dd2e12d50c9a
Content-Disposition: form-data; name="pwd"
test
-----------------------------7dd2e12d50c9a
Content-Disposition: form-data; name="img"; filename="captcha.gif"
Content-Type: image/gif
<binary file data not shown>
---------------------------7dd2e12d50c9a
Content-Disposition: form-data; name="type"
100000
-----------------------------7dd2e12d50c9a
Content-Disposition: form-data; name="button"
鎻愪氦
-----------------------------7dd2e12d50c9a--
(The value of "button" is 提交, i.e. "Submit"; the capture tool shows its UTF-8 bytes decoded as GBK.)
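2. The points in the data above that need explaining:
(1) The Content-Type value in the request header
It is set to: multipart/form-data; boundary=xxx
Here it is:
multipart/form-data; boundary=---------------------------7dd2e12d50c9a
This means that the post data that follows carries multiple values, each in its own part.
Here they are:
the user name: user
the password: pwd
the data type: type
and so on.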
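As a small illustration of the Content-Type rule, the sketch below builds the header value from a boundary string. Both helper names are hypothetical. NewBoundary assumes the common approach of appending a hex timestamp to a run of hyphens, which produces values shaped like the one IE sent. Any string that is unlikely to appear inside the body should work as a boundary.

```csharp
using System;

public static class ContentTypeSketch
{
    // Hypothetical helper: the Content-Type value for a given boundary.
    public static string BuildContentType(string boundary)
    {
        return "multipart/form-data; boundary=" + boundary;
    }

    // Hypothetical helper: a fresh boundary in the same shape as the
    // captured one (a run of hyphens followed by a hex string).
    public static string NewBoundary()
    {
        return "---------------------------" + DateTime.Now.Ticks.ToString("x");
    }
}
```

Reusing the captured boundary verbatim, as the code later in this post does, is also fine as long as that string does not occur in the uploaded data.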
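(2) The delimiter in the post data is the boundary value with two hyphens in front
In the post data, every part is separated by:
-----------------------------7dd2e12d50c9a
This value is the boundary from the request header above:
---------------------------7dd2e12d50c9a
with two hyphens:
--
prepended. That is:
delimiter between the parts of the post data:
-----------------------------7dd2e12d50c9a
=
-- + boundary value from the request header
=
-- + ---------------------------7dd2e12d50c9a
In addition, the last line of the post data:
-----------------------------7dd2e12d50c9a--
has two more hyphens appended. That is:
closing delimiter of the post data:
-----------------------------7dd2e12d50c9a--
=
-- + boundary value from the request header + --
=
-- + ---------------------------7dd2e12d50c9a + --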
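The two rules above can be written down as a short C# sketch. The names PartDelimiter and ClosingDelimiter are hypothetical and only serve to make the string arithmetic explicit.

```csharp
using System;

public static class BoundarySketch
{
    // Delimiter that starts every part: "--" + boundary.
    public static string PartDelimiter(string boundary)
    {
        return "--" + boundary;
    }

    // Delimiter that ends the whole body: "--" + boundary + "--".
    public static string ClosingDelimiter(string boundary)
    {
        return "--" + boundary + "--";
    }
}
```

For example, passing the captured boundary ---------------------------7dd2e12d50c9a to PartDelimiter gives the same string with two extra leading hyphens, which is the separator line seen in the post data.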
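3. With that, the code to emulate this process can be written.
I started from my earlier C# code and also referred to someone else's code:
Upload files with HTTPWebrequest (multipart/form-data)
The final code is: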
/*
 * [File]
 * frmEmulateUploadFile.cs
 *
 * [Function]
 * emulate upload file using C# HTTPWebrequest code
 * 【Tutorial】Emulating login analysis: how to emulate uploading a file, involving Content-Disposition, multipart/form-data, boundary
 * https://www.crifan.com/emulate_login_example_for_analysis_upload_file_multipart_form_data_content_disposition
 *
 * [Version]
 * 2013-10-06
 *
 * [Author]
 * Crifan Li
 *
 * [Contact]
 * https://www.crifan.com/contact_me/
 *
 */
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Text;
using System.Windows.Forms;
using System.IO;

namespace EmulateUploadFile
{
    public partial class frmEmulateUploadFile : Form
    {
        crifanLib crl;

        public frmEmulateUploadFile()
        {
            crl = new crifanLib();
            InitializeComponent();
        }

        // Describes the single file part of the form.
        public struct FileParameter
        {
            public string fileKeyInForm;   // value of name="..." in the form
            public string filename;        // value of filename="..."
            public string fileContentType; // e.g. image/gif
            public string fileContentStr;  // file content, as a string
        }

        // Builds the whole multipart/form-data body:
        // one part per key/value pair, then the file part, then the closing delimiter.
        public string generateMultiPartFormData(string boundaryStr, Dictionary<string, string> postFormDict, FileParameter fileToUpload)
        {
            //IE10 captured:
            //(1)request header:
            //Key               Value
            //Request           POST /captcha/captchaimg.php HTTP/1.1
            //Accept            text/html, application/xhtml+xml, */*
            //Referer           https://www.peuland.com/captcha/captcha_demo.htm
            //Accept-Language   zh-CN
            //User-Agent        Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; WOW64; Trident/6.0)
            //Content-Type      multipart/form-data; boundary=---------------------------7dd2e12d50c9a
            //Accept-Encoding   gzip, deflate
            //Host              www.peuland.com
            //Content-Length    3720
            //DNT               1
            //Connection        Keep-Alive
            //Cache-Control     no-cache

            //(2)post data:
            // -----------------------------7dd2e12d50c9a
            // Content-Disposition: form-data; name="user"
            // test
            // -----------------------------7dd2e12d50c9a
            // Content-Disposition: form-data; name="pwd"
            // test
            // -----------------------------7dd2e12d50c9a
            // Content-Disposition: form-data; name="img"; filename="captcha.gif"
            // Content-Type: image/gif
            // <binary file data not shown>
            // ---------------------------7dd2e12d50c9a
            // Content-Disposition: form-data; name="type"
            // 100000
            // -----------------------------7dd2e12d50c9a
            // Content-Disposition: form-data; name="button"
            // 鎻愪氦
            // -----------------------------7dd2e12d50c9a--

            string multiPartFormDataStr = "";

            // One ordinary key/value part; every line ends with CRLF.
            string singlePartTemplate =
                "--{0}\r\n" +
                "Content-Disposition: form-data; name=\"{1}\"\r\n" +
                "\r\n" +
                "{2}" + "\r\n";

            // The file part. Fixed: no stray ';' after filename, and a CRLF
            // after the content so the closing delimiter starts on its own line.
            string fileParaTemplate =
                "--{0}\r\n" +
                "Content-Disposition: form-data; name=\"{1}\"; filename=\"{2}\"\r\n" +
                "Content-Type: {3}\r\n" +
                "\r\n" +
                "{4}" + "\r\n";
            string fileParaStr = String.Format(fileParaTemplate,
                boundaryStr,
                fileToUpload.fileKeyInForm,
                fileToUpload.filename ?? fileToUpload.fileKeyInForm,
                fileToUpload.fileContentType ?? "application/octet-stream",
                fileToUpload.fileContentStr);

            // Closing delimiter; the previous part already ended with CRLF.
            string tailTemplate = "--{0}--\r\n";
            string tailStr = String.Format(tailTemplate, boundaryStr);

            //1. post form data: key and value
            if ((null != postFormDict) && (postFormDict.Count > 0))
            {
                foreach (string postKey in postFormDict.Keys)
                {
                    string postValue = postFormDict[postKey];
                    string singlePartStr = String.Format(singlePartTemplate, boundaryStr, postKey, postValue);
                    multiPartFormDataStr += singlePartStr;
                }
            }

            //2. file parameters
            multiPartFormDataStr += fileParaStr;

            //3. add tail in the end
            multiPartFormDataStr += tailStr;

            return multiPartFormDataStr;
        }

        public void demoFileUpload()
        {
            //access main url
            string mainUrl = "https://www.peuland.com/captcha/captcha_demo.htm";
            string respHtml = crl.getUrlRespHtml(mainUrl);

            //emulate upload file
            string fileUploadUrl = "https://www.peuland.com/captcha/captchaimg.php";
            string boundaryValue = "---------------------------7dd2e12d50c9a";
            string contentTypeValueTemplate = "multipart/form-data; boundary={0}";
            string contentTypeValue = String.Format(contentTypeValueTemplate, boundaryValue);

            Dictionary<string, string> headerDict = new Dictionary<string, string>();
            headerDict.Add("Content-Type", contentTypeValue);

            Dictionary<string, string> postDict = new Dictionary<string, string>();
            postDict.Add("user", "test");
            postDict.Add("pwd", "test");
            postDict.Add("type", "100000");
            postDict.Add("button", "提交");

            //string fileFullpath = "E:/Dev_Root/svn_dev_root/website/python/BlogsToWordpress/captcha/captcha.gif";
            string fileFullpath = @"E:\Dev_Root\svn_dev_root\website\python\BlogsToWordpress\captcha\captcha.gif";
            // NOTE: ReadAllText decodes the file as text, which can alter the bytes
            // of a binary file such as a GIF; building the body as bytes is safer
            // when the server really has to parse the uploaded file.
            string fileContentStr = File.ReadAllText(fileFullpath);

            FileParameter fileToUpload = new FileParameter();
            fileToUpload.fileKeyInForm = "img";
            fileToUpload.filename = "captcha.gif";
            fileToUpload.fileContentType = "image/gif";
            fileToUpload.fileContentStr = fileContentStr;

            string postDataStr = generateMultiPartFormData(boundaryValue, postDict, fileToUpload);
            string respJson = crl.getUrlRespHtml(fileUploadUrl, headerDict: headerDict, postDataStr: postDataStr);
            //return:
            //"{\"message\":\"user error\",\"text\":\"\"}"
            //same as what we saw in the web browser

            //Console.WriteLine(String.Format("after emulate upload file {0}, returned json={1}", fileFullpath, respJson));
            MessageBox.Show(String.Format("after emulate upload file {0}, returned json={1}", fileFullpath, respJson));
        }

        private void frmEmulateUploadFile_Load(object sender, EventArgs e)
        {
            demoFileUpload();
        }
    }
}
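Because an uploaded file is usually binary, one alternative is to assemble the body as bytes rather than as a string. The following is a minimal sketch of that idea, not part of crifanLib: MultipartBytesSketch, BuildBody and WriteText are hypothetical names, and the text parts are assumed to be UTF-8 encoded.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class MultipartBytesSketch
{
    // Hypothetical helper: builds the multipart/form-data body as raw bytes,
    // so the file content is copied untouched instead of being decoded as text.
    public static byte[] BuildBody(
        string boundary,
        IDictionary<string, string> fields,
        string fileKey, string fileName, string fileContentType, byte[] fileBytes)
    {
        Encoding utf8 = new UTF8Encoding(false); // UTF-8 without BOM (assumption)
        using (MemoryStream body = new MemoryStream())
        {
            // 1. ordinary key/value parts
            foreach (KeyValuePair<string, string> field in fields)
            {
                WriteText(body, utf8,
                    "--" + boundary + "\r\n" +
                    "Content-Disposition: form-data; name=\"" + field.Key + "\"\r\n" +
                    "\r\n" +
                    field.Value + "\r\n");
            }

            // 2. file part: headers, blank line, then the raw bytes
            WriteText(body, utf8,
                "--" + boundary + "\r\n" +
                "Content-Disposition: form-data; name=\"" + fileKey +
                "\"; filename=\"" + fileName + "\"\r\n" +
                "Content-Type: " + fileContentType + "\r\n" +
                "\r\n");
            body.Write(fileBytes, 0, fileBytes.Length);

            // 3. CRLF after the file content, then the closing delimiter
            WriteText(body, utf8, "\r\n--" + boundary + "--\r\n");

            return body.ToArray();
        }
    }

    private static void WriteText(Stream stream, Encoding encoding, string text)
    {
        byte[] bytes = encoding.GetBytes(text);
        stream.Write(bytes, 0, bytes.Length);
    }
}
```

The resulting array could then be written to the stream returned by HttpWebRequest.GetRequestStream(), after setting ContentType to the matching multipart/form-data value and ContentLength to the array length. The file bytes themselves would come from File.ReadAllBytes.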
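Running it returns the same JSON as in the browser:
{"message":"user error","text":""}
Note:
crifanLib.cs, used in the code above, is my own helper library and is documented separately.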
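【Summary】
In short, it is still the same principle:
use a tool to capture the website's internal logic,
then emulate that logic in code.
For more, see:
Explaining in detail the principles and implementation of crawling websites, emulating login, and crawling dynamic pages (Python, C#, etc.)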
When reposting, please credit: 在路上 » 【Tutorial】Emulating login: how to analyze and emulate a file upload in code