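【Background】
I previously wrote a tutorial analyzing the logic behind emulating a login to Baidu:
【Tutorial】A step-by-step guide to using a tool (IE9's F12) to analyze the internal logic of emulating a website login (the Baidu home page)
I then implemented it in several languages:
Python:
【Tutorial】Emulating website login: Python version (includes two complete, runnable versions of the code)
C#:
【Tutorial】Emulating website login: C# version (includes two complete, runnable versions of the code)
Java:
And now:
I know next to nothing about the Go language, but I have learned that it also has an HTTP library. So I plan to
start from scratch and, step by step, implement the Baidu login emulation while learning Go itself.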
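【Process】
1. First, learn the Go language itself.
2. Then get the basics of development working:
【Record】Basic Go development: implementing Hello World and finding a suitable development environment and tools
3. I switched to a different machine, also running x64 Win7, downloaded and installed Go again, and retried the plain Hello World.
A few points worth noting here:
(1) After the automatic installation, the installer had already added the corresponding path:
D:\tmp\dev_install_root\Go\bin
to the current PATH;
(2) The go/bin directory contains three tools:
- go.exe
- godoc.exe
- gofmt.exe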
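4. Continue learning how to write Go code.
5. Having worked out how to write Go code, the next step was to consult the official documentation and learn how to write HTTP code.
6. Read up on Go's naming conventions.
7. Next, work out how to implement basic web page fetching.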
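The basic fetch in step 7 can be sketched roughly as follows. This is a minimal sketch rather than the code from the original notes: `fetchHTML` and `demoFetch` are hypothetical names, and a local `httptest` server stands in for the real site so that the example runs without network access. It uses `io.ReadAll` (Go 1.16+), whereas the final program in this post uses the older `ioutil.ReadAll`.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// fetchHTML performs a GET request and returns the response body as a string.
func fetchHTML(client *http.Client, strUrl string) (string, error) {
	resp, err := client.Get(strUrl)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	return string(body), nil
}

// demoFetch starts a throwaway local server (a stand-in for a real site)
// and fetches its only page.
func demoFetch() (string, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "<html><title>demo</title></html>")
	}))
	defer srv.Close()
	return fetchHTML(srv.Client(), srv.URL)
}

func main() {
	html, err := demoFetch()
	if err != nil {
		fmt.Println("fetch failed:", err)
		return
	}
	fmt.Println(html)
}
```

To fetch a real page, pass `http.DefaultClient` and the page URL to `fetchHTML` instead of the test server's client and address.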
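8. However, everything fetched this way was only printed to the cmd console, which is inconvenient for keeping records and reviewing them during later development.
So I wanted to log the content to a file.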
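Logging to a file can be sketched with the standard library's `log` package. Note that this swaps in the standard `log` package for illustration; the final program in this post uses log4go instead. The names `newFileLogger` and `demoLogToFile`, the file name `demo.log` and the logged message are all made up for this sketch.

```go
package main

import (
	"fmt"
	"log"
	"os"
	"path/filepath"
)

// newFileLogger creates (or truncates) logPath and returns a logger that
// writes to it. The caller is responsible for closing the returned file.
func newFileLogger(logPath string, flags int) (*log.Logger, *os.File, error) {
	f, err := os.OpenFile(logPath, os.O_CREATE|os.O_WRONLY|os.O_TRUNC, 0o644)
	if err != nil {
		return nil, nil, err
	}
	return log.New(f, "[INFO] ", flags), f, nil
}

// demoLogToFile logs one line to a file in a temporary directory and
// returns what ended up in that file.
func demoLogToFile() (string, error) {
	dir, err := os.MkdirTemp("", "logdemo")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)

	logPath := filepath.Join(dir, "demo.log")
	// flags=0 keeps the output reproducible; log.LstdFlags would add a timestamp.
	logger, f, err := newFileLogger(logPath, 0)
	if err != nil {
		return "", err
	}
	logger.Printf("respHtml length=%d", 1234)
	if err := f.Close(); err != nil {
		return "", err
	}

	data, err := os.ReadFile(logPath)
	if err != nil {
		return "", err
	}
	return string(data), nil
}

func main() {
	content, err := demoLogToFile()
	if err != nil {
		fmt.Println("logging failed:", err)
		return
	}
	fmt.Print(content)
}
```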
9. A file encoding problem then appeared:
【Problem】Go code fails to run: # command-line-arguments .\EmulateLoginBaidu.go:86: illegal UTF-8 sequence
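Go source files must be valid UTF-8, so this error usually means the file was saved in some other encoding. As a small illustration, assuming the file had been saved as GBK (an assumption, since the post does not say which encoding was involved), the same Chinese text is valid as UTF-8 bytes but not as GBK bytes. `isValidUTF8` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// isValidUTF8 reports whether data is well-formed UTF-8, which is what the
// Go compiler requires of every source file.
func isValidUTF8(data []byte) bool {
	return utf8.Valid(data)
}

func main() {
	utf8Bytes := []byte("中文")                  // string literals in a Go source file are UTF-8
	gbkBytes := []byte{0xD6, 0xD0, 0xCE, 0xC4} // the same two characters encoded as GBK

	fmt.Println(isValidUTF8(utf8Bytes)) // true
	fmt.Println(isValidUTF8(gbkBytes))  // false
}
```

Re-saving the source file as UTF-8 in the editor is the usual fix.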
10. Next came the error "cannot use body (type []byte) as type string in assignment":
【Solved】Assigning the body returned by http directly to a string in Go code fails: cannot use body (type []byte) as type string in assignment
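The error arises because Go never converts between `[]byte` and `string` implicitly; an explicit conversion is needed, which is what the final program in this post does with `string(body)`. A minimal sketch, with `bodyToString` as a hypothetical helper name and a made-up body:

```go
package main

import "fmt"

// bodyToString converts a response body read as []byte into a string.
// The conversion copies the bytes; it does not validate or change the encoding.
func bodyToString(body []byte) string {
	return string(body)
}

func main() {
	body := []byte("<html>hi</html>")
	// var respHtml string = body // compile error: cannot use body (type []byte) as type string
	respHtml := bodyToString(body)
	fmt.Println(respHtml)
}
```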
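11. At this point, the following already works:
fetching the HTML of the Baidu home page and writing it to the log file.
12. Next, work out how to get the cookies returned in the HTTP response.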
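Cookies sent by the server arrive as `Set-Cookie` headers, and `http.Response.Cookies()` parses them. The sketch below builds a response by hand so that it runs without a network; in a real program the response would come from `httpClient.Do(...)`. The cookie values and the `example.com` domains are made up, and `demoResponseCookies` is a hypothetical name.

```go
package main

import (
	"fmt"
	"net/http"
)

// demoResponseCookies builds a response carrying two Set-Cookie headers
// (made-up values, example.com domains) and returns the parsed cookies.
func demoResponseCookies() []*http.Cookie {
	header := http.Header{}
	header.Add("Set-Cookie", "BDUSS=abc123; path=/; domain=example.com; httponly")
	header.Add("Set-Cookie", "PTOKEN=def456; path=/; domain=passport.example.com; httponly")

	resp := &http.Response{Header: header}
	return resp.Cookies()
}

func main() {
	for _, ck := range demoResponseCookies() {
		fmt.Printf("Name=%s Value=%s Domain=%s Path=%s HttpOnly=%t\n",
			ck.Name, ck.Value, ck.Domain, ck.Path, ck.HttpOnly)
	}
}
```

Unlike cookies read back from a cookie jar, cookies parsed from the response keep attributes such as Domain, Path and HttpOnly.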
13. The next task is to extract the content I need from the returned HTML.
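One way to do this extraction is a regular expression with a capture group. The sketch below uses the `login_token` line quoted in a comment of the final program in this post as sample input; `extractLoginToken` is a hypothetical helper name, and the pattern is a simplified variant of the one used there.

```go
package main

import (
	"fmt"
	"regexp"
)

// loginTokenRe captures the value between the single quotes after login_token=.
var loginTokenRe = regexp.MustCompile(`login_token='(\w+)'`)

// extractLoginToken returns the captured token and whether it was found.
func extractLoginToken(html string) (string, bool) {
	m := loginTokenRe.FindStringSubmatch(html)
	if m == nil {
		return "", false
	}
	return m[1], true // m[0] is the whole match, m[1] the first group
}

func main() {
	html := "bdPass.api.params.login_token='278623fc5463aa25b0189ddd34165592';"
	token, ok := extractLoginToken(html)
	fmt.Println(token, ok)
}
```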
14. Next, work out Go's dictionary-type variables (maps).
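As a small sketch of how a map can serve as a dictionary here, the example below tracks which expected cookie names have been seen, similar in spirit to the `cookieNameDict` check in the final program. `markFound` and `allTrue` are hypothetical helper names.

```go
package main

import "fmt"

// markFound sets found[name] = true for every name that is already a key
// in found; names that are not keys are ignored.
func markFound(found map[string]bool, names []string) {
	for _, name := range names {
		if _, wanted := found[name]; wanted {
			found[name] = true
		}
	}
}

// allTrue reports whether every value in the map is true.
func allTrue(found map[string]bool) bool {
	for _, ok := range found {
		if !ok {
			return false
		}
	}
	return true
}

func main() {
	found := map[string]bool{"BDUSS": false, "PTOKEN": false, "STOKEN": false}
	markFound(found, []string{"BAIDUID", "BDUSS", "PTOKEN"})
	fmt.Println(found["BDUSS"], found["STOKEN"], allTrue(found)) // true false false
}
```

Iteration order over a Go map is not specified, so code like `allTrue` should not depend on it.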
15. Then work out how to read input from the console.
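The final program reads the username and password with `fmt.Scanln`. The sketch below uses the closely related `fmt.Fscanln` on an in-memory reader, so that it can run unattended; passing `os.Stdin` instead would read from the console. `readToken` and `demoReadCredentials` are hypothetical names, and "alice" and "secret" are made-up inputs.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// readToken reads one space-free token terminated by a newline from r.
func readToken(r io.Reader) (string, error) {
	var s string
	_, err := fmt.Fscanln(r, &s)
	return s, err
}

// demoReadCredentials reads two lines from an in-memory reader that stands
// in for os.Stdin.
func demoReadCredentials() (string, string, error) {
	input := strings.NewReader("alice\nsecret\n")
	username, err := readToken(input)
	if err != nil {
		return "", "", err
	}
	password, err := readToken(input)
	if err != nil {
		return "", "", err
	}
	return username, password, nil
}

func main() {
	username, password, err := demoReadCredentials()
	if err != nil {
		fmt.Println("read failed:", err)
		return
	}
	fmt.Println(username, password)
}
```

Because `Fscanln` splits on whitespace, an input line containing spaces yields an error with a single target variable; reading whole lines with `bufio.Scanner` is an alternative in that case.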
16. Then work out how to send an HTTP POST:
【Record】Implementing an HTTP POST in Go and passing the corresponding post data
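A POST with form data can be sketched as below: the key/value pairs are URL-encoded with `url.Values` and sent as the request body with the form content type. This mirrors the approach in the final program but is only a sketch; `encodePostData` and `newPostRequest` are hypothetical names, and `http://example.com/login` is a placeholder URL. The request is only built and inspected, not sent.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
	"strings"
)

// encodePostData URL-encodes the key/value pairs; keys come out sorted.
func encodePostData(postDict map[string]string) string {
	values := url.Values{}
	for k, v := range postDict {
		values.Set(k, v)
	}
	return values.Encode()
}

// newPostRequest builds (but does not send) a form POST request.
func newPostRequest(strUrl string, postDict map[string]string) (*http.Request, error) {
	body := strings.NewReader(encodePostData(postDict))
	req, err := http.NewRequest("POST", strUrl, body)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/x-www-form-urlencoded")
	return req, nil
}

func main() {
	postDict := map[string]string{"charset": "utf-8", "username": "a b"}
	fmt.Println(encodePostData(postDict)) // charset=utf-8&username=a+b

	req, err := newPostRequest("http://example.com/login", postDict)
	if err != nil {
		fmt.Println("build request failed:", err)
		return
	}
	fmt.Println(req.Method, req.Header.Get("Content-Type"))
}
```

Sending the request is then a matter of calling `Do(req)` on an `http.Client`.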
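17. With POST working and the post data passed along, the emulated login now succeeds and the corresponding cookies are returned.
So the next step is to check each of those cookies.
Note that the Set-Cookie in httpResp.Header returned at this moment is:
(formatted)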
    BDUSS=G1LNG5uLTNYWkU2bzA2SGxCZHZ2Rm5ocnN-MEhFem5uQkZrdkJFVmplUmpBV1ZTQVFBQUFBJCQAAAAAAAAAAAEAAAB-OUgCYWdhaW5pbnB1dAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAGN0PVJjdD1SM; expires=Wed, 08-Dec-2021 10:26:43 GMT; path=/; domain=baidu.com; httponly
    PTOKEN=deleted; expires=Fri, 21-Sep-2012 10:26:42 GMT; path=/; domain=baidu.com; httponly
    PTOKEN=0f1e0187b042630a47c4eea8e0e96a2f; expires=Wed, 08-Dec-2021 10:26:43 GMT; path=/; domain=passport.baidu.com; httponly
    STOKEN=8d6ce0cbc7f689a8cd647b8beb5872e3; expires=Wed, 08-Dec-2021 10:26:43 GMT; path=/; domain=passport.baidu.com; httponly
    SAVEUSERID=deleted; expires=Fri, 21-Sep-2012 10:26:42 GMT; path=/; domain=passport.baidu.com; httponly
    USERNAMETYPE=1; expires=Wed, 08-Dec-2021 10:26:43 GMT; path=/; domain=passport.baidu.com; httponly
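From this, for the cookies:
(1) PTOKEN, for:
domain=baidu.com
has been deleted;
while for passport.baidu.com, PTOKEN still exists;
(2) the other cookies:
STOKEN, SAVEUSERID and USERNAMETYPE all have the domain:
passport.baidu.com
rather than, as I had assumed:
baidu.com
(SAVEUSERID itself is set to "deleted", with an expiry date in the past);
(3) the domain of BDUSS is indeed baidu.com
Given this, with the earlier code: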
    gCurCookies = gCurCookieJar.Cookies(httpReq.URL);
I expected to get only
BDUSS
or only:
STOKEN, SAVEUSERID, USERNAMETYPE
Fortunately, though, the cookies printed here by:
dbgPrintCurCookies
are all present:
    [2013/09/21 18:50:30 ] [INFO] (main.dbgPrintCurCookies:199) cookieNum=7
    [2013/09/21 18:50:30 ] [INFO] (main.dbgPrintCurCookies:203) ------ Cookie [0]------
    [2013/09/21 18:50:30 ] [INFO] (main.dbgPrintCurCookies:204) Name =H_PS_PSSID
    [2013/09/21 18:50:30 ] [INFO] (main.dbgPrintCurCookies:205) Value =3359_1455_2976_2981_3090
    [2013/09/21 18:50:30 ] [INFO] (main.dbgPrintCurCookies:206) Path =
    [2013/09/21 18:50:30 ] [INFO] (main.dbgPrintCurCookies:207) Domain =
    [2013/09/21 18:50:30 ] [INFO] (main.dbgPrintCurCookies:208) Expires =0001-01-01 00:00:00 +0000 UTC
    [2013/09/21 18:50:30 ] [INFO] (main.dbgPrintCurCookies:209) RawExpires =
    [2013/09/21 18:50:30 ] [INFO] (main.dbgPrintCurCookies:210) MaxAge =0
    [2013/09/21 18:50:30 ] [INFO] (main.dbgPrintCurCookies:211) Secure =false
    [2013/09/21 18:50:30 ] [INFO] (main.dbgPrintCurCookies:212) HttpOnly =false
    [2013/09/21 18:50:30 ] [INFO] (main.dbgPrintCurCookies:213) Raw =
    [2013/09/21 18:50:30 ] [INFO] (main.dbgPrintCurCookies:214) Unparsed =[]
    [2013/09/21 18:50:30 ] [INFO] (main.dbgPrintCurCookies:203) ------ Cookie [1]------
    [2013/09/21 18:50:30 ] [INFO] (main.dbgPrintCurCookies:204) Name =BAIDUID
    [2013/09/21 18:50:30 ] [INFO] (main.dbgPrintCurCookies:205) Value =74F5614706B58BFCCCB3923C8ABD3E61:FG=1
    [2013/09/21 18:50:30 ] [INFO] (main.dbgPrintCurCookies:206) Path =
    [2013/09/21 18:50:30 ] [INFO] (main.dbgPrintCurCookies:207) Domain =
    [2013/09/21 18:50:30 ] [INFO] (main.dbgPrintCurCookies:208) Expires =0001-01-01 00:00:00 +0000 UTC
    [2013/09/21 18:50:30 ] [INFO] (main.dbgPrintCurCookies:209) RawExpires =
    [2013/09/21 18:50:30 ] [INFO] (main.dbgPrintCurCookies:210) MaxAge =0
    [2013/09/21 18:50:30 ] [INFO] (main.dbgPrintCurCookies:211) Secure =false
    [2013/09/21 18:50:30 ] [INFO] (main.dbgPrintCurCookies:212) HttpOnly =false
    [2013/09/21 18:50:30 ] [INFO] (main.dbgPrintCurCookies:213) Raw =
    [2013/09/21 18:50:30 ] [INFO] (main.dbgPrintCurCookies:214) Unparsed =[]
    [2013/09/21 18:50:30 ] [INFO] (main.dbgPrintCurCookies:203) ------ Cookie [2]------
    [2013/09/21 18:50:30 ] [INFO] (main.dbgPrintCurCookies:204) Name =HOSUPPORT
    [2013/09/21 18:50:30 ] [INFO] (main.dbgPrintCurCookies:205) Value =1
    [2013/09/21 18:50:30 ] [INFO] (main.dbgPrintCurCookies:206) Path =
    [2013/09/21 18:50:30 ] [INFO] (main.dbgPrintCurCookies:207) Domain =
    [2013/09/21 18:50:30 ] [INFO] (main.dbgPrintCurCookies:208) Expires =0001-01-01 00:00:00 +0000 UTC
    [2013/09/21 18:50:30 ] [INFO] (main.dbgPrintCurCookies:209) RawExpires =
    [2013/09/21 18:50:30 ] [INFO] (main.dbgPrintCurCookies:210) MaxAge =0
    [2013/09/21 18:50:30 ] [INFO] (main.dbgPrintCurCookies:211) Secure =false
    [2013/09/21 18:50:30 ] [INFO] (main.dbgPrintCurCookies:212) HttpOnly =false
    [2013/09/21 18:50:30 ] [INFO] (main.dbgPrintCurCookies:213) Raw =
    [2013/09/21 18:50:30 ] [INFO] (main.dbgPrintCurCookies:214) Unparsed =[]
    [2013/09/21 18:50:30 ] [INFO] (main.dbgPrintCurCookies:203) ------ Cookie [3]------
    [2013/09/21 18:50:30 ] [INFO] (main.dbgPrintCurCookies:204) Name =BDUSS
    [2013/09/21 18:50:30 ] [INFO] (main.dbgPrintCurCookies:205) Value =W95bX41ZTlhNkFKQkpQcGd5Y1ZUOENiYzJ2TkpvakJaZVBXSS10WXh1THVCbVZTQVFBQUFBJCQAAAAAAAAAAAEAAAB-OUgCYWdhaW5pbnB1dAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAO55PVLueT1SO
    [2013/09/21 18:50:30 ] [INFO] (main.dbgPrintCurCookies:206) Path =
    [2013/09/21 18:50:30 ] [INFO] (main.dbgPrintCurCookies:207) Domain =
    [2013/09/21 18:50:30 ] [INFO] (main.dbgPrintCurCookies:208) Expires =0001-01-01 00:00:00 +0000 UTC
    [2013/09/21 18:50:30 ] [INFO] (main.dbgPrintCurCookies:209) RawExpires =
    [2013/09/21 18:50:30 ] [INFO] (main.dbgPrintCurCookies:210) MaxAge =0
    [2013/09/21 18:50:30 ] [INFO] (main.dbgPrintCurCookies:211) Secure =false
    [2013/09/21 18:50:30 ] [INFO] (main.dbgPrintCurCookies:212) HttpOnly =false
    [2013/09/21 18:50:30 ] [INFO] (main.dbgPrintCurCookies:213) Raw =
    [2013/09/21 18:50:30 ] [INFO] (main.dbgPrintCurCookies:214) Unparsed =[]
    [2013/09/21 18:50:30 ] [INFO] (main.dbgPrintCurCookies:203) ------ Cookie [4]------
    [2013/09/21 18:50:30 ] [INFO] (main.dbgPrintCurCookies:204) Name =PTOKEN
    [2013/09/21 18:50:30 ] [INFO] (main.dbgPrintCurCookies:205) Value =2e67f3d7d5c52118bf4d222ab87ac9a4
    [2013/09/21 18:50:30 ] [INFO] (main.dbgPrintCurCookies:206) Path =
    [2013/09/21 18:50:30 ] [INFO] (main.dbgPrintCurCookies:207) Domain =
    [2013/09/21 18:50:30 ] [INFO] (main.dbgPrintCurCookies:208) Expires =0001-01-01 00:00:00 +0000 UTC
    [2013/09/21 18:50:30 ] [INFO] (main.dbgPrintCurCookies:209) RawExpires =
    [2013/09/21 18:50:30 ] [INFO] (main.dbgPrintCurCookies:210) MaxAge =0
    [2013/09/21 18:50:30 ] [INFO] (main.dbgPrintCurCookies:211) Secure =false
    [2013/09/21 18:50:30 ] [INFO] (main.dbgPrintCurCookies:212) HttpOnly =false
    [2013/09/21 18:50:30 ] [INFO] (main.dbgPrintCurCookies:213) Raw =
    [2013/09/21 18:50:30 ] [INFO] (main.dbgPrintCurCookies:214) Unparsed =[]
    [2013/09/21 18:50:30 ] [INFO] (main.dbgPrintCurCookies:203) ------ Cookie [5]------
    [2013/09/21 18:50:30 ] [INFO] (main.dbgPrintCurCookies:204) Name =STOKEN
    [2013/09/21 18:50:30 ] [INFO] (main.dbgPrintCurCookies:205) Value =63a3b62efbd83a00c095c624ca4dfdfc
    [2013/09/21 18:50:30 ] [INFO] (main.dbgPrintCurCookies:206) Path =
    [2013/09/21 18:50:30 ] [INFO] (main.dbgPrintCurCookies:207) Domain =
    [2013/09/21 18:50:30 ] [INFO] (main.dbgPrintCurCookies:208) Expires =0001-01-01 00:00:00 +0000 UTC
    [2013/09/21 18:50:30 ] [INFO] (main.dbgPrintCurCookies:209) RawExpires =
    [2013/09/21 18:50:30 ] [INFO] (main.dbgPrintCurCookies:210) MaxAge =0
    [2013/09/21 18:50:30 ] [INFO] (main.dbgPrintCurCookies:211) Secure =false
    [2013/09/21 18:50:30 ] [INFO] (main.dbgPrintCurCookies:212) HttpOnly =false
    [2013/09/21 18:50:30 ] [INFO] (main.dbgPrintCurCookies:213) Raw =
    [2013/09/21 18:50:30 ] [INFO] (main.dbgPrintCurCookies:214) Unparsed =[]
    [2013/09/21 18:50:30 ] [INFO] (main.dbgPrintCurCookies:203) ------ Cookie [6]------
    [2013/09/21 18:50:30 ] [INFO] (main.dbgPrintCurCookies:204) Name =USERNAMETYPE
    [2013/09/21 18:50:30 ] [INFO] (main.dbgPrintCurCookies:205) Value =1
    [2013/09/21 18:50:30 ] [INFO] (main.dbgPrintCurCookies:206) Path =
    [2013/09/21 18:50:30 ] [INFO] (main.dbgPrintCurCookies:207) Domain =
    [2013/09/21 18:50:30 ] [INFO] (main.dbgPrintCurCookies:208) Expires =0001-01-01 00:00:00 +0000 UTC
    [2013/09/21 18:50:30 ] [INFO] (main.dbgPrintCurCookies:209) RawExpires =
    [2013/09/21 18:50:30 ] [INFO] (main.dbgPrintCurCookies:210) MaxAge =0
    [2013/09/21 18:50:30 ] [INFO] (main.dbgPrintCurCookies:211) Secure =false
    [2013/09/21 18:50:30 ] [INFO] (main.dbgPrintCurCookies:212) HttpOnly =false
    [2013/09/21 18:50:30 ] [INFO] (main.dbgPrintCurCookies:213) Raw =
    [2013/09/21 18:50:30 ] [INFO] (main.dbgPrintCurCookies:214) Unparsed =[]
So from here on, I can simply check whether a cookie exists by its name.
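A likely explanation for why all the cookies were present, sketched with the standard `net/http/cookiejar`: a jar returns every stored cookie whose domain matches the request host, so a cookie stored for a parent domain is also returned for a URL on a subdomain. `Jar.Cookies` also fills in only the Name and Value of each cookie it returns, which is consistent with the empty Path and Domain fields in the log above. The hosts `login.example.com` and `www.example.com`, the cookie names and values, and the function names are all made up for this sketch; whether Baidu's login URL behaves the same way is an assumption.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/cookiejar"
	"net/url"
)

// cookieNamesFor returns the names of the cookies the jar would send to rawURL.
func cookieNamesFor(jar http.CookieJar, rawURL string) ([]string, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return nil, err
	}
	var names []string
	for _, c := range jar.Cookies(u) {
		names = append(names, c.Name)
	}
	return names, nil
}

// demoJar stores one cookie for the parent domain and one for the login
// subdomain, then asks the jar which cookies each host would receive.
func demoJar() ([]string, []string, error) {
	jar, err := cookiejar.New(nil)
	if err != nil {
		return nil, nil, err
	}
	loginURL, err := url.Parse("https://login.example.com/")
	if err != nil {
		return nil, nil, err
	}
	jar.SetCookies(loginURL, []*http.Cookie{
		{Name: "SESSION", Value: "s1", Path: "/", Domain: "example.com"},
		{Name: "TOKEN", Value: "t1", Path: "/", Domain: "login.example.com"},
	})

	loginNames, err := cookieNamesFor(jar, "https://login.example.com/")
	if err != nil {
		return nil, nil, err
	}
	wwwNames, err := cookieNamesFor(jar, "https://www.example.com/")
	if err != nil {
		return nil, nil, err
	}
	return loginNames, wwwNames, nil
}

func main() {
	loginNames, wwwNames, err := demoJar()
	if err != nil {
		fmt.Println("jar demo failed:", err)
		return
	}
	fmt.Println("login.example.com gets:", loginNames) // both cookies
	fmt.Println("www.example.com gets:", wwwNames)     // only the parent-domain cookie
}
```

Since the jar does not hand back the Domain attribute, matching on the cookie name, as the post does, is the practical check.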
18. Finally, the emulated Baidu login succeeded.
The code used is:
    /*
     * [File]
     * EmulateLoginBaidu.go
     *
     * [Function]
     * 【Record】Emulating Baidu login in Go
     *
     * [Version]
     * 2013-09-21
     *
     * [Contact]
     */
    package main

    import (
        "fmt"
        //"builtin"
        //"log"
        "os"
        "path"
        "runtime"
        "strings"
        "time"
        //"io"
        "io/ioutil"
        "net/http"
        "net/http/cookiejar"
        "net/url"
        //"sync"
        "regexp"
        //"bufio"
        "bytes"
    )

    //import l4g "log4go.googlecode.com/hg"
    //import l4g "code.google.com/p/log4go"
    import "code.google.com/p/log4go"

    /***************************************************************************************************
        Global Variables
    ***************************************************************************************************/
    var gCurCookies []*http.Cookie
    var gCurCookieJar *cookiejar.Jar
    var gLogger log4go.Logger

    /***************************************************************************************************
        Functions
    ***************************************************************************************************/
    //do init before all others
    func initAll() {
        gCurCookies = nil
        //var err error;
        gCurCookieJar, _ = cookiejar.New(nil)
        gLogger = nil

        initLogger()
        initCrifanLib()
    }

    //de-init for all
    func deinitAll() {
        gCurCookies = nil
        // close the logger only if it exists (the original condition was inverted)
        if nil != gLogger {
            gLogger.Close()
            //os.Stdout.Sync() //try manually flush, but can not fix log4go's flush bug

            gLogger = nil
        }
    }

    //do some init for crifanLib
    func initCrifanLib() {
        gLogger.Debug("init for crifanLib")
        gCurCookies = nil
        return
    }

    //init for logger
    func initLogger() {
        var filenameOnly string = GetCurFilename()
        var logFilename string = filenameOnly + ".log"

        //gLogger = log4go.NewLogger()
        //gLogger = make(log4go.Logger)

        //for console
        //gLogger.AddFilter("stdout", log4go.INFO, log4go.NewConsoleLogWriter())
        gLogger = log4go.NewDefaultLogger(log4go.INFO)

        //for log file
        if _, err := os.Stat(logFilename); err == nil {
            //fmt.Printf("found old log file %s, now remove it\n", logFilename)
            os.Remove(logFilename)
        }
        //gLogger.AddFilter("logfile", log4go.FINEST, log4go.NewFileLogWriter(logFilename, true))
        //gLogger.AddFilter("logfile", log4go.FINEST, log4go.NewFileLogWriter(logFilename, false))
        gLogger.AddFilter("log", log4go.FINEST, log4go.NewFileLogWriter(logFilename, false))
        gLogger.Debug("Current time is : %s", time.Now().Format("15:04:05 MST 2006/01/02"))

        return
    }

    // GetCurFilename
    // Get current file name, without suffix
    func GetCurFilename() string {
        _, fullFilename, _, _ := runtime.Caller(0)
        //fmt.Println(fullFilename)
        var filenameWithSuffix string
        filenameWithSuffix = path.Base(fullFilename)
        //fmt.Println("filenameWithSuffix=", filenameWithSuffix)
        var fileSuffix string
        fileSuffix = path.Ext(filenameWithSuffix)
        //fmt.Println("fileSuffix=", fileSuffix)

        var filenameOnly string
        filenameOnly = strings.TrimSuffix(filenameWithSuffix, fileSuffix)
        //fmt.Println("filenameOnly=", filenameOnly)

        return filenameOnly
    }

    //get url response html
    func getUrlRespHtml(strUrl string, postDict map[string]string) string {
        gLogger.Debug("in getUrlRespHtml, strUrl=%s", strUrl)
        gLogger.Debug("postDict=%v", postDict)

        var respHtml string = ""

        httpClient := &http.Client{
            //Transport:nil,
            //CheckRedirect: nil,
            Jar: gCurCookieJar,
        }

        var httpReq *http.Request
        var newReqErr error
        if nil == postDict {
            gLogger.Debug("is GET")
            httpReq, newReqErr = http.NewRequest("GET", strUrl, nil)
            // ...
            //httpReq.Header.Add("If-None-Match", `W/"wyzzy"`)
        } else {
            //【Record】Implementing an HTTP POST in Go and passing the corresponding post data
            gLogger.Debug("is POST")
            postValues := url.Values{}
            for postKey, postValue := range postDict {
                postValues.Set(postKey, postValue)
            }
            gLogger.Debug("postValues=%s", postValues)
            postDataStr := postValues.Encode()
            gLogger.Debug("postDataStr=%s", postDataStr)
            postDataBytes := []byte(postDataStr)
            gLogger.Debug("postDataBytes=%s", postDataBytes)
            postBytesReader := bytes.NewReader(postDataBytes)
            httpReq, newReqErr = http.NewRequest("POST", strUrl, postBytesReader)
            //httpReq.Header.Set("Content-Type", "application/x-www-form-urlencoded; param=value")
            if newReqErr == nil {
                httpReq.Header.Add("Content-Type", "application/x-www-form-urlencoded")
            }
        }
        if newReqErr != nil {
            gLogger.Warn("create request for strUrl=%s got error=%s\n", strUrl, newReqErr.Error())
            return respHtml
        }

        httpResp, err := httpClient.Do(httpReq)
        // ...

        //httpResp, err := http.Get(strUrl)
        //gLogger.Info("http.Get done")
        if err != nil {
            gLogger.Warn("http get strUrl=%s response error=%s\n", strUrl, err.Error())
            // httpResp is nil when Do fails, so stop here
            return respHtml
        }
        gLogger.Debug("httpResp.Header=%s", httpResp.Header)
        gLogger.Debug("httpResp.Status=%s", httpResp.Status)

        defer httpResp.Body.Close()
        // gLogger.Info("defer httpResp.Body.Close done")

        body, errReadAll := ioutil.ReadAll(httpResp.Body)
        //gLogger.Info("ioutil.ReadAll done")
        if errReadAll != nil {
            gLogger.Warn("get response for strUrl=%s got error=%s\n", strUrl, errReadAll.Error())
        }
        //gLogger.Debug("body=%s\n", body)

        //gCurCookies = httpResp.Cookies()
        //gCurCookieJar = httpClient.Jar;
        gCurCookies = gCurCookieJar.Cookies(httpReq.URL)
        //gLogger.Info("httpResp.Cookies done")

        //respHtml = "just for test log ok or not"
        respHtml = string(body)
        //gLogger.Info("httpResp body []byte to string done")

        return respHtml
    }

    func dbgPrintCurCookies() {
        var cookieNum int = len(gCurCookies)
        gLogger.Debug("cookieNum=%d", cookieNum)
        for i := 0; i < cookieNum; i++ {
            var curCk *http.Cookie = gCurCookies[i]
            //gLogger.Debug("curCk.Raw=%s", curCk.Raw)
            gLogger.Debug("------ Cookie [%d]------", i)
            gLogger.Debug("Name\t\t=%s", curCk.Name)
            gLogger.Debug("Value\t=%s", curCk.Value)
            gLogger.Debug("Path\t\t=%s", curCk.Path)
            gLogger.Debug("Domain\t=%s", curCk.Domain)
            gLogger.Debug("Expires\t=%s", curCk.Expires)
            gLogger.Debug("RawExpires\t=%s", curCk.RawExpires)
            gLogger.Debug("MaxAge\t=%d", curCk.MaxAge)
            gLogger.Debug("Secure\t=%t", curCk.Secure)
            gLogger.Debug("HttpOnly\t=%t", curCk.HttpOnly)
            gLogger.Debug("Raw\t\t=%s", curCk.Raw)
            gLogger.Debug("Unparsed\t=%s", curCk.Unparsed)
        }
    }

    func main() {
        initAll()

        gLogger.Info("============ About this program ============")
        gLogger.Info("Function: this program demonstrates how to emulate logging in to Baidu using Go code")
        gLogger.Info("Notes:")
        gLogger.Info("1. Some Baidu accounts get the following message when logging in:")
        gLogger.Info("The system has detected that your account may have been stolen and is at risk. Please change your password as soon as possible.")
        gLogger.Info("In that case this program cannot log in; change the password as prompted and then it will work.")

        // NOTE: the definitions of baiduMainUrl, getapiUrl, staticPageUrl and
        // baiduMainLoginUrl do not appear in this listing; they must be defined
        // before the program will compile.

        //step1: access baidu url to get cookie BAIDUID
        gLogger.Info("====== Step 1: get the BAIDUID cookie ======")
        gLogger.Debug("baiduMainUrl=%s", baiduMainUrl)
        respHtml := getUrlRespHtml(baiduMainUrl, nil)
        gLogger.Debug("respHtml=%s", respHtml)
        dbgPrintCurCookies()

        //check cookie
        var bGotCookieBaiduid = false
        //var cookieNameListToCheck []string = ["BAIDUID"]
        //toCheckCookieNameList := [1]string{"BAIDUID"}
        toCheckCookieNameList := []string{"BAIDUID"}
        toCheckCookieNum := len(toCheckCookieNameList)
        gLogger.Debug("toCheckCookieNum=%d", toCheckCookieNum)
        curCookieNum := len(gCurCookies)
        gLogger.Debug("curCookieNum=%d", curCookieNum)
        for i := 0; i < toCheckCookieNum; i++ {
            toCheckCkName := toCheckCookieNameList[i]
            gLogger.Debug("[%d]toCheckCkName=%s", i, toCheckCkName)
            for j := 0; j < curCookieNum; j++ {
                curCookie := gCurCookies[j]
                if strings.EqualFold(toCheckCkName, curCookie.Name) {
                    bGotCookieBaiduid = true
                    break
                }
            }
        }

        if bGotCookieBaiduid {
            gLogger.Info("Found cookie BAIDUID")
        } else {
            gLogger.Info("Not found cookie BAIDUID")
        }

        //step2: request the getapi page and extract login_token
        gLogger.Info("====== Step 2: extract login_token ======")
        bExtractTokenValueOK := false
        strLoginToken := ""
        var getApiRespHtml string
        if bGotCookieBaiduid {
            getApiRespHtml = getUrlRespHtml(getapiUrl, nil)
            gLogger.Debug("getApiRespHtml=%s", getApiRespHtml)
            dbgPrintCurCookies()

            //bdPass.api.params.login_token='278623fc5463aa25b0189ddd34165592';
            //use regex to extract login_token
            //【Record】Finding a value with a regular expression in Go
            loginTokenP, _ := regexp.Compile(`bdPass\.api\.params\.login_token='(?P<loginToken>\w+)';`)
            //loginToken := loginTokenP.FindString(getApiRespHtml);
            //loginToken := loginTokenP.FindSubmatch(getApiRespHtml);
            foundLoginToken := loginTokenP.FindStringSubmatch(getApiRespHtml)
            gLogger.Debug("foundLoginToken=%s", foundLoginToken)
            if nil != foundLoginToken {
                //FindStringSubmatch returns groups by position, so the named group is read by index here
                strLoginToken = foundLoginToken[1]
                gLogger.Info("found bdPass.api.params.login_token=%s", strLoginToken)
                bExtractTokenValueOK = true
            } else {
                gLogger.Warn(" not found login_token from html=%s", getApiRespHtml)
            }
        }

        //step3: login with post data, then verify returned cookies
        bLoginBaiduOk := false
        if bGotCookieBaiduid && bExtractTokenValueOK {
            gLogger.Info("====== Step 3: log in to Baidu and verify the returned cookies ======")

            postDict := map[string]string{}
            //postDict["ppui_logintime"] = ""
            postDict["charset"] = "utf-8"
            //postDict["codestring"] = ""
            postDict["token"] = strLoginToken
            postDict["isPhone"] = "false"
            postDict["index"] = "0"
            //postDict["u"] = ""
            //postDict["safeflg"] = "0"
            postDict["staticpage"] = staticPageUrl
            postDict["loginType"] = "1"
            postDict["tpl"] = "mn"
            postDict["callback"] = "parent.bdPass.api.login._postCallback"

            //【Solved】Getting a string typed at the console in Go
            strBaiduUsername := ""
            strBaiduPassword := ""
            gLogger.Info("Please input:")
            gLogger.Info("Baidu Username:")
            _, err1 := fmt.Scanln(&strBaiduUsername)
            if nil == err1 {
                gLogger.Debug("strBaiduUsername=%s", strBaiduUsername)
            }
            gLogger.Info("Baidu Password:")
            _, err2 := fmt.Scanln(&strBaiduPassword)
            if nil == err2 {
                gLogger.Debug("strBaiduPassword=%s", strBaiduPassword)
            }

            postDict["username"] = strBaiduUsername
            postDict["password"] = strBaiduPassword
            postDict["verifycode"] = ""
            postDict["mem_pass"] = "on"
            gLogger.Debug("postDict=%v", postDict)

            loginBaiduRespHtml := getUrlRespHtml(baiduMainLoginUrl, postDict)
            gLogger.Debug("loginBaiduRespHtml=%s", loginBaiduRespHtml)
            dbgPrintCurCookies()

            //check resp cookies exist or not
            cookieNameDict := map[string]bool{
                "BDUSS":  false,
                "PTOKEN": false,
                "STOKEN": false,
                //"SAVEUSERID": false, //be deleted
            }
            for cookieName := range cookieNameDict {
                for _, singleCookie := range gCurCookies {
                    //if(strings.EqualFold(cookieName, singleCookie.Name)){
                    if cookieName == singleCookie.Name {
                        cookieNameDict[cookieName] = true
                        gLogger.Debug("Found cookie %s", cookieName)
                    }
                }
            }
            gLogger.Debug("After check resp cookie, cookieNameDict=%v", cookieNameDict)

            bAllCookiesFound := true
            for _, bIsExist := range cookieNameDict {
                bAllCookiesFound = bAllCookiesFound && bIsExist
            }
            bLoginBaiduOk = bAllCookiesFound
            if bLoginBaiduOk {
                gLogger.Info("Successfully emulated login to the Baidu home page!")
            } else {
                gLogger.Info("Failed to emulate login to the Baidu home page!")
                gLogger.Info("The returned HTML source is: " + loginBaiduRespHtml)
            }
        }

        deinitAll()

        //【workaround】log4go has a bug when writing output in Go: only part of the messages are written, or none at all
        time.Sleep(100 * time.Millisecond)
    }
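The result looks like this:
【Summary】
Starting from nothing and after a great deal of trouble, I finally implemented the emulated Baidu login in Go.
Further optimization will follow when I find the time, including at least:
【Record】After successfully emulating the Baidu login in Go, collecting the related functions into my own Go library: crifanLib.go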
When reposting, please credit: 在路上 » 【Record】Emulating Baidu login in Go