【Tutorial】Emulating login: how to analyze a file upload and emulate it in code

Crawl_EmulateLogin crifan

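【Background】

My earlier posts on emulating login mostly dealt with logging in with a username and password. As described in:

How to emulate logging in to a website

the approach is to first analyze the logic with a tool:

【Tutorial】Step by step: using a tool (the F12 developer tools in IE9) to analyze the internal logic of emulating login to a website (the Baidu home page)

and then implement that logic in code. In the cases implemented so far, I had rarely dealt with uploading a file as part of the request.

As for emulating a file upload, I did try once before:

【Unsolved】403 error when uploading a single file through the Baidu API

but that attempt failed. (I later guessed that the cause may have been a wrongly set boundary.)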

 

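There is also this comment on the post:

【Solved】How to access an https website/URL in C#

which reads:

Recently I needed to use the POST method of C# HttpWebRequest to submit an image to an https website, and ran into many problems.
1. I read your article. With C# HttpWebRequest, some sites need a certificate and some do not. As long as the program reports no error and the data comes back normally, the site does not need a client certificate to be loaded manually.
2. The POST data, however, has never worked. The site answers "username error", yet the username must be correct, because I tested it on the site's login page and it succeeded. The form submitted by POST contains a username, a password, and an input=type field for the image to upload. The submission always fails. The demo URL is https://www.peuland.com/captcha/captcha_demo.htm; any pointers would be appreciated.

So I decided to write up how to emulate a file upload.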

 

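【Process】

1. For the address above:

https://www.peuland.com/captcha/captcha_demo.htm

I debugged it with the F12 developer tools in IE10 and picked an arbitrary file:

E:\Dev_Root\svn_dev_root\website\python\BlogsToWordpress\captcha\captcha.gif

After uploading it, the response was:

{"message":"user error","text":""}

The captured data was as follows: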

(1) Request header:

Key              Value
Request          POST /captcha/captchaimg.php HTTP/1.1
Accept           text/html, application/xhtml+xml, */*
Referer          https://www.peuland.com/captcha/captcha_demo.htm
Accept-Language  zh-CN
User-Agent       Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; WOW64; Trident/6.0)
Content-Type     multipart/form-data; boundary=---------------------------7dd2e12d50c9a
Accept-Encoding  gzip, deflate
Host             www.peuland.com
Content-Length   3720
DNT              1
Connection       Keep-Alive
Cache-Control    no-cache

(Screenshot: the captured request header, with Content-Type set to multipart/form-data)

(2) Request body (POST data):

-----------------------------7dd2e12d50c9a
Content-Disposition: form-data; name="user"

test
-----------------------------7dd2e12d50c9a
Content-Disposition: form-data; name="pwd"

test
-----------------------------7dd2e12d50c9a
Content-Disposition: form-data; name="img"; filename="captcha.gif"
Content-Type: image/gif

<binary file data not shown>
---------------------------7dd2e12d50c9a
Content-Disposition: form-data; name="type"

100000
-----------------------------7dd2e12d50c9a
Content-Disposition: form-data; name="button"

鎻愪氦
-----------------------------7dd2e12d50c9a--

The value of the "button" field, 鎻愪氦, is how the capture tool displays the UTF-8 bytes of 提交 ("Submit") when they are read with the wrong character encoding.

(Screenshot: the captured POST data, made up of Content-Disposition parts)

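2. The points in the data above that deserve explanation:

(1) The Content-Type value in the request header

It is set to: multipart/form-data; boundary=xxx

Here it is:

multipart/form-data; boundary=---------------------------7dd2e12d50c9a

This indicates that the POST data that follows carries multiple values, each sent as its own part.

Here these are:

the username, user

the password, pwd

the data type, type

and so on.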

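(2) The delimiter in the POST data is the boundary value with two hyphens in front

In the POST data, each part is separated from the next by:

-----------------------------7dd2e12d50c9a

This is the boundary value from the request header above:

---------------------------7dd2e12d50c9a

with two hyphens added in front. That is:

delimiter between the parts of the POST data
= -----------------------------7dd2e12d50c9a
= -- + boundary value from the request header
= -- + ---------------------------7dd2e12d50c9a

In addition, the last line of the POST data:

-----------------------------7dd2e12d50c9a--

has two more hyphens appended at the end. That is:

final delimiter of the POST data
= -----------------------------7dd2e12d50c9a--
= -- + boundary value from the request header + --
= -- + ---------------------------7dd2e12d50c9a + --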
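The relationship between the header value and the two kinds of delimiter can be sketched in a few lines of C#. This is only an illustration: the class and method names are made up here and are not part of crifanLib or of the code later in this post.

```csharp
using System;

public static class BoundarySketch
{
    // Content-Type header value that announces a multi-part body.
    public static string BuildContentType(string boundary)
    {
        return "multipart/form-data; boundary=" + boundary;
    }

    // Line written in front of every part: "--" + boundary.
    public static string PartDelimiter(string boundary)
    {
        return "--" + boundary;
    }

    // Line written once after the last part: "--" + boundary + "--".
    public static string ClosingDelimiter(string boundary)
    {
        return "--" + boundary + "--";
    }
}
```

Called with the boundary captured from the request header, PartDelimiter gives the line that starts each part of the POST data, and ClosingDelimiter gives its last line.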

 

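3. With that, the code to emulate this process can be written.

I referred to my earlier C# code from:

【Unsolved】403 error when uploading a single file through the Baidu API

and also to other people's code:

Upload files with HTTPWebrequest (multipart/form-data)

Multipart Form Post in C#

The final code is as follows: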

/*
 * [File]
 * frmEmulateUploadFile.cs
 *
 * [Function]
 * emulate upload file using C# HTTPWebrequest code
 * 【Tutorial】Emulating login: analyzing how to emulate a file upload, involving Content-Disposition, multipart/form-data and boundary
 *
 * [Version]
 * 2013-10-06
 *
 * [Author]
 * Crifan Li
 *
 * [Contact]
 *
 */
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Text;
using System.Windows.Forms;
using System.IO;
 
namespace EmulateUploadFile
{
    public partial class frmEmulateUploadFile : Form
    {
        crifanLib crl;
        public frmEmulateUploadFile()
        {
            crl = new crifanLib();
            InitializeComponent();
        }
 
        public struct FileParameter
        {
            public string fileKeyInForm;
            public string filename;
            public string fileContentType;
            public string fileContentStr;
        }
 
        public string generateMultiPartFormData(string boundaryStr, Dictionary<string, string> postFormDict, FileParameter fileToUpload)
        {
            //IE10 captured:
 
            //(1)request header:
            //Key   Value
            //Request   POST /captcha/captchaimg.php HTTP/1.1
            //Accept    text/html, application/xhtml+xml, */*
            //Referer   https://www.peuland.com/captcha/captcha_demo.htm
            //Accept-Language   zh-CN
            //User-Agent    Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; WOW64; Trident/6.0)
            //Content-Type  multipart/form-data; boundary=---------------------------7dd2e12d50c9a
            //Accept-Encoding   gzip, deflate
            //Host  www.peuland.com
            //Content-Length    3720
            //DNT   1
            //Connection    Keep-Alive
            //Cache-Control no-cache
 
            //(2)post data:
            // -----------------------------7dd2e12d50c9a
            // Content-Disposition: form-data; name="user"
 
            // test
            // -----------------------------7dd2e12d50c9a
            // Content-Disposition: form-data; name="pwd"
 
            // test
            // -----------------------------7dd2e12d50c9a
            // Content-Disposition: form-data; name="img"; filename="captcha.gif"
            // Content-Type: image/gif
 
            // <binary file data not shown>
            // ---------------------------7dd2e12d50c9a
            // Content-Disposition: form-data; name="type"
 
            // 100000
            // -----------------------------7dd2e12d50c9a
            // Content-Disposition: form-data; name="button"
 
            // 鎻愪氦
            // -----------------------------7dd2e12d50c9a--
 
            string multiPartFormDataStr = "";
            string singlePartTemplate =
                      "--{0}\r\n"
                    + "Content-Disposition: form-data; name=\"{1}\"\r\n"
                    + "\r\n"
                    + "{2}"
                    + "\r\n" /* auto add CRLF for each line */;
            string singlePartStr = "";

            //file part: headers, an empty line, the file content, then CRLF
            //(the CRLF is needed in front of the next boundary line)
            string fileParaTemplate =
                "--{0}\r\n"
                + "Content-Disposition: form-data; name=\"{1}\"; filename=\"{2}\"\r\n"
                + "Content-Type: {3}\r\n\r\n"
                + "{4}"
                + "\r\n";
            string fileParaStr = String.Format(fileParaTemplate,
                    boundaryStr,
                    fileToUpload.fileKeyInForm,
                    fileToUpload.filename ?? fileToUpload.fileKeyInForm,
                    fileToUpload.fileContentType ?? "application/octet-stream",
                    fileToUpload.fileContentStr);

            //closing delimiter; every part above already ends with CRLF
            string tailTemplate = "--{0}--\r\n";
            string tailStr = String.Format(tailTemplate, boundaryStr);
 
            //1. post form data: key and value
            if ((null != postFormDict) && (postFormDict.Count > 0))
            {
                foreach (string postKey in postFormDict.Keys)
                {
                    string postValue = postFormDict[postKey];
                    singlePartStr = String.Format(singlePartTemplate, boundaryStr, postKey, postValue);
                    multiPartFormDataStr += singlePartStr;
                }
            }
 
            //2. file parameters
            multiPartFormDataStr += fileParaStr;
 
            //3. add tail in the end
            multiPartFormDataStr += tailStr;
 
            return multiPartFormDataStr;
        }
 
        public void demoFileUpload()
        {
            //access main url
            string mainUrl = "https://www.peuland.com/captcha/captcha_demo.htm";
            string respHtml = crl.getUrlRespHtml(mainUrl);
 
            //emulate upload file
            string fileUploadUrl = "https://www.peuland.com/captcha/captchaimg.php";
            string boundaryValue = "---------------------------7dd2e12d50c9a";
            string contentTypeValueTemplate = "multipart/form-data; boundary={0}";
            string contentTypeValue = String.Format(contentTypeValueTemplate, boundaryValue);
 
            Dictionary<string, string> headerDict = new Dictionary<string, string>();
            headerDict.Add("Content-Type", contentTypeValue);
 
            Dictionary<string, string> postDict = new Dictionary<string, string>();
            postDict.Add("user", "test");
            postDict.Add("pwd", "test");
            postDict.Add("type", "100000");
            postDict.Add("button", "提交");
 
            //string fileFullpath = "E:/Dev_Root/svn_dev_root/website/python/BlogsToWordpress/captcha/captcha.gif";
            string fileFullpath = @"E:\Dev_Root\svn_dev_root\website\python\BlogsToWordpress\captcha\captcha.gif";
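            //NOTE: ReadAllText decodes the file as text, which does not preserve
            //the raw bytes of a binary file such as a GIF; it is enough here to
            //reproduce the server's reply, but a real upload should send bytes
            string fileContentStr = File.ReadAllText(fileFullpath);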
 
            FileParameter fileToUpload = new FileParameter();
            fileToUpload.fileKeyInForm = "img";
            fileToUpload.filename = "captcha.gif";
            fileToUpload.fileContentType = "image/gif";
            fileToUpload.fileContentStr = fileContentStr;
            string postDataStr = generateMultiPartFormData(boundaryValue, postDict, fileToUpload);
 
            string respJson = crl.getUrlRespHtml(fileUploadUrl, headerDict: headerDict, postDataStr: postDataStr);
            //returns:
            //"{\"message\":\"user error\",\"text\":\"\"}"
            //the same as what we saw in the web browser
            //Console.WriteLine(String.Format("after emulate upload file {0}, returned json={1}", fileFullpath, respJson));
            MessageBox.Show(String.Format("after emulate upload file {0}, returned json={1}", fileFullpath, respJson));
        }
 
        private void frmEmulateUploadFile_Load(object sender, EventArgs e)
        {
            demoFileUpload();
        }
    }
}

The result of running it:

(Screenshot: the JSON returned after the emulated file upload)
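The listing above passes the file content through a string, which is enough to reproduce the server's reply but would not keep a binary file intact byte for byte. Below is a minimal sketch of the same body layout built as bytes instead. MultipartBytesSketch and BuildBody are hypothetical names, not part of the original code or of crifanLib, and the text portions are assumed to be encoded as UTF-8.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class MultipartBytesSketch
{
    // Builds a multipart/form-data body as raw bytes, so the file content
    // is copied through unchanged instead of being decoded as text.
    public static byte[] BuildBody(
        string boundary,
        IDictionary<string, string> fields,
        string fileKey,
        string fileName,
        string fileContentType,
        byte[] fileBytes)
    {
        using (MemoryStream body = new MemoryStream())
        {
            // Ordinary form fields: one part per key/value pair.
            foreach (KeyValuePair<string, string> field in fields)
            {
                WriteText(body, "--" + boundary + "\r\n"
                    + "Content-Disposition: form-data; name=\"" + field.Key + "\"\r\n"
                    + "\r\n"
                    + field.Value + "\r\n");
            }

            // File part: text headers, the untouched bytes, then CRLF.
            WriteText(body, "--" + boundary + "\r\n"
                + "Content-Disposition: form-data; name=\"" + fileKey
                + "\"; filename=\"" + fileName + "\"\r\n"
                + "Content-Type: " + fileContentType + "\r\n"
                + "\r\n");
            body.Write(fileBytes, 0, fileBytes.Length);
            WriteText(body, "\r\n");

            // Closing delimiter: "--" + boundary + "--".
            WriteText(body, "--" + boundary + "--\r\n");
            return body.ToArray();
        }
    }

    // Encodes text as UTF-8 (assumption) and appends it to the stream.
    private static void WriteText(Stream stream, string text)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(text);
        stream.Write(bytes, 0, bytes.Length);
    }
}
```

The file bytes would come from File.ReadAllBytes rather than File.ReadAllText, and the resulting array is what gets written to the request stream, so its length is also the Content-Length.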

 

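Note:

For the crifanLib.cs used above, see:

A detailed look at crifan's C# library: crifanLib.cs

in particular the section:

Getting the page content returned by a URL: getUrlRespHtml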
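If crifanLib is not at hand, the same POST can be sent with HttpWebRequest directly. The sketch below is one way to do it under stated assumptions: MultipartPostSketch, PrepareRequest and Send are hypothetical names, the body is assumed to be already built as a byte array, and the response is assumed to be UTF-8 text. It does not show what getUrlRespHtml does internally.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

public static class MultipartPostSketch
{
    // Prepares (but does not send) a POST request whose Content-Type
    // header carries the multipart boundary.
    public static HttpWebRequest PrepareRequest(string url, string boundary)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "POST";
        request.ContentType = "multipart/form-data; boundary=" + boundary;
        return request;
    }

    // Writes the body bytes to the request and returns the response text.
    public static string Send(HttpWebRequest request, byte[] body)
    {
        request.ContentLength = body.Length;
        using (Stream requestStream = request.GetRequestStream())
        {
            requestStream.Write(body, 0, body.Length);
        }

        using (WebResponse response = request.GetResponse())
        using (StreamReader reader = new StreamReader(response.GetResponseStream(), Encoding.UTF8))
        {
            // Assumption: the server answers with UTF-8 text such as JSON.
            return reader.ReadToEnd();
        }
    }
}
```

The boundary passed to PrepareRequest must be the same string that was used to build the body. Recent .NET versions mark WebRequest as obsolete in favour of HttpClient, so this sketch targets the .NET Framework setup used in this post.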

 

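【Summary】

In short, the same advice applies as always:

use a tool to capture the logic the website executes internally,

then emulate that logic in code.

For more, see:

Explaining the principles and implementation of crawling websites, emulating login and crawling dynamic pages (Python, C#, etc.)

When reposting, please credit: 在路上 » 【Tutorial】Emulating login: how to analyze a file upload and emulate it in code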
