【Record】Handling HTTP cookies in Go

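【Background】

While working on:

【Record】Simulating a Baidu login in Go

I had already managed to:

Fetch the HTML of the Baidu home page and write it to a log file.

Next, I wanted to work out:

How to work with cookies and get the current cookies.

How to make sure cookies are passed along in the next HTTP request, so that cookie handling stays automatic.

【Process】

1. In the official documentation I found a cookie-related function: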

func SetCookie

I also noticed that

type Client

mentions cookies:

type Client struct {
        // Transport specifies the mechanism by which individual
        // HTTP requests are made.
        // If nil, DefaultTransport is used.
        Transport RoundTripper

        // CheckRedirect specifies the policy for handling redirects.
        // If CheckRedirect is not nil, the client calls it before
        // following an HTTP redirect. The arguments req and via are
        // the upcoming request and the requests made already, oldest
        // first. If CheckRedirect returns an error, the Client's Get
        // method returns both the previous Response and
        // CheckRedirect's error (wrapped in a url.Error) instead of
        // issuing the Request req.
        //
        // If CheckRedirect is nil, the Client uses its default policy,
        // which is to stop after 10 consecutive requests.
        CheckRedirect func(req *Request, via []*Request) error

        // Jar specifies the cookie jar.
        // If Jar is nil, cookies are not sent in requests and ignored
        // in responses.
        Jar CookieJar
}

So I went on to look at CookieJar:

type CookieJar

Its description says:

The net/http/cookiejar package provides a CookieJar implementation.

So the corresponding package needs to be imported:

import (
    "fmt"
    "log"
    "os"
    "runtime"
    "path"
    "strings"
    "io/ioutil"
    "net/http"
    "net/http/cookiejar"
)

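2. Reference:

Go HTTP Post and use Cookies

I tried its code, but found it too cumbersome and awkward to use.

Still, the idea is useful: try using Client myself.

3. Then I looked at the cookie-related items in the official documentation: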

http://golang.org/pkg/net/http/

 

func (*Request) AddCookie
func (r *Request) AddCookie(c *Cookie)

AddCookie adds a cookie to the request. Per RFC 6265 section 5.4, AddCookie does not attach more than one Cookie header field. That means all cookies, if any, are written into the same line, separated by semicolon.

func (*Request) Cookie
func (r *Request) Cookie(name string) (*Cookie, error)

Cookie returns the named cookie provided in the request or ErrNoCookie if not found.

func (*Request) Cookies
func (r *Request) Cookies() []*Cookie

Cookies parses and returns the HTTP cookies sent with the request.
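
The three Request methods quoted above can be tried without any network access. The following is only a minimal sketch: the URL, the cookie names and the helper buildRequest are made up for illustration and are not part of the original program.

```go
package main

import (
	"fmt"
	"net/http"
)

// buildRequest creates a GET request and attaches two cookies to it.
// The URL and the cookie names are placeholders (assumptions).
func buildRequest() (*http.Request, error) {
	req, err := http.NewRequest("GET", "http://example.com/", nil)
	if err != nil {
		return nil, err
	}
	req.AddCookie(&http.Cookie{Name: "session", Value: "abc123"})
	req.AddCookie(&http.Cookie{Name: "lang", Value: "zh"})
	return req, nil
}

func main() {
	req, err := buildRequest()
	if err != nil {
		fmt.Println("new request error:", err)
		return
	}
	// Both cookies are written into a single Cookie header.
	fmt.Println("Cookie header:", req.Header.Get("Cookie"))

	// Cookie looks up one cookie by name.
	if c, err := req.Cookie("session"); err == nil {
		fmt.Println("session =", c.Value)
	}
	// A name that is not present yields http.ErrNoCookie.
	if _, err := req.Cookie("missing"); err == http.ErrNoCookie {
		fmt.Println("missing: no such cookie")
	}
	// Cookies returns every cookie attached to the request.
	fmt.Println("count =", len(req.Cookies()))
}
```

Note that these methods only touch a single request; nothing is remembered between requests.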

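So it seemed that cookies (that is, the contents of the CookieJar) are recorded and sent through the Request.

So I decided to see whether this Request could be put to use.

I also noticed: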

func (*Response) Cookies
func (r *Response) Cookies() []*Cookie

Cookies parses and returns the cookies set in the Set-Cookie headers.

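Clearly, Response can parse the Set-Cookie headers for us and hand back the corresponding cookies.

So the plan was to get hold of the current Response and see whether the cookies could be read from it,

and then save them into the CookieJar.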

4. The earlier code was:

    resp, err := http.Get(url)
    if err != nil {
        fmt.Printf("http get response error=%s\n", err)
        return // resp is nil on error, so resp.Body must not be touched
    }
    defer resp.Body.Close()
    body, err := ioutil.ReadAll(resp.Body)

So first, check whether what Get returns includes the Response I need.

And I found:

func Get
func Get(url string) (resp *Response, err error)

Get issues a GET to the specified URL. If the response is one of the following redirect codes, Get follows the redirect, up to a maximum of 10 redirects:

301 (Moved Permanently)
302 (Found)
303 (See Other)
307 (Temporary Redirect)

An error is returned if there were too many redirects or if there was an HTTP protocol error. A non-2xx response doesn’t cause an error.

When err is nil, resp always contains a non-nil resp.Body. Caller should close resp.Body when done reading from it.

Get is a wrapper around DefaultClient.Get.

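Clearly, the resp obtained here is the Response.

So the next thing was to see whether the Response's Cookies could be read.

First I had to figure out:

【Solved】Initializing list variables such as []Cookie in Go

5. Then I continued writing code and testing.

This produced another error, "cannot use resp.Cookies (type func() []*http.Cookie) as type []http.Cookie in assignment":

【Solved】Getting a Response's Cookies in Go fails with: cannot use resp.Cookies (type func() []*http.Cookie) as type []http.Cookie in assignment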

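6. Along the way, global variables came up:

【Solved】Global variables in Go

7. With global variables sorted out, next was the for loop:

【Solved】The for loop in Go

8. Next, to fully understand all the fields of a cookie, print them in a loop and see what the current cookies contain.

The fields of each cookie are listed in: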

type Cookie

type Cookie struct {
        Name       string
        Value      string
        Path       string
        Domain     string
        Expires    time.Time
        RawExpires string

        // MaxAge=0 means no 'Max-Age' attribute specified.
        // MaxAge<0 means delete cookie now, equivalently 'Max-Age: 0'
        // MaxAge>0 means Max-Age attribute present and given in seconds
        MaxAge   int
        Secure   bool
        HttpOnly bool
        Raw      string
        Unparsed []string // Raw text of unparsed attribute-value pairs
}

In the end I used the following code:

func printCurCookies() {
    var cookieNum int = len(gCurCookies);
    gLogger.Info("cookieNum=%d", cookieNum)
    for i := 0; i < cookieNum; i++ {
        var curCk *http.Cookie = gCurCookies[i];
        //gLogger.Info("curCk.Raw=%s", curCk.Raw)
        gLogger.Info("------ Cookie [%d]------", i)
        gLogger.Info("Name\t=%s", curCk.Name)
        gLogger.Info("Value\t=%s", curCk.Value)
        gLogger.Info("Path\t=%s", curCk.Path)
        gLogger.Info("Domain\t=%s", curCk.Domain)
        gLogger.Info("Expires\t=%s", curCk.Expires)
        gLogger.Info("RawExpires=%s", curCk.RawExpires)
        gLogger.Info("MaxAge\t=%d", curCk.MaxAge)
        gLogger.Info("Secure\t=%t", curCk.Secure)
        gLogger.Info("HttpOnly=%t", curCk.HttpOnly)
        gLogger.Info("Raw\t=%s", curCk.Raw)
        gLogger.Info("Unparsed=%s", curCk.Unparsed)
    }
}

The output was:

[2013/09/20 18:39:40 ] [INFO] (main.printCurCookies:137) cookieNum=3
[2013/09/20 18:39:40 ] [INFO] (main.printCurCookies:141) ------ Cookie [0]------
[2013/09/20 18:39:40 ] [INFO] (main.printCurCookies:142) Name	=BDSVRTM
[2013/09/20 18:39:40 ] [INFO] (main.printCurCookies:143) Value	=3
[2013/09/20 18:39:40 ] [INFO] (main.printCurCookies:144) Path	=/
[2013/09/20 18:39:40 ] [INFO] (main.printCurCookies:145) Domain	=
[2013/09/20 18:39:40 ] [INFO] (main.printCurCookies:146) Expires	=0001-01-01 00:00:00 +0000 UTC
[2013/09/20 18:39:40 ] [INFO] (main.printCurCookies:147) RawExpires=
[2013/09/20 18:39:40 ] [INFO] (main.printCurCookies:148) MaxAge	=0
[2013/09/20 18:39:40 ] [INFO] (main.printCurCookies:149) Secure	=false
[2013/09/20 18:39:40 ] [INFO] (main.printCurCookies:150) HttpOnly=false
[2013/09/20 18:39:40 ] [INFO] (main.printCurCookies:151) Raw	=BDSVRTM=3; path=/
[2013/09/20 18:39:40 ] [INFO] (main.printCurCookies:152) Unparsed=[]
[2013/09/20 18:39:40 ] [INFO] (main.printCurCookies:141) ------ Cookie [1]------
[2013/09/20 18:39:40 ] [INFO] (main.printCurCookies:142) Name	=H_PS_PSSID
[2013/09/20 18:39:40 ] [INFO] (main.printCurCookies:143) Value	=3415_1431_2975_2981
[2013/09/20 18:39:40 ] [INFO] (main.printCurCookies:144) Path	=/
[2013/09/20 18:39:40 ] [INFO] (main.printCurCookies:145) Domain	=.baidu.com
[2013/09/20 18:39:40 ] [INFO] (main.printCurCookies:146) Expires	=0001-01-01 00:00:00 +0000 UTC
[2013/09/20 18:39:40 ] [INFO] (main.printCurCookies:147) RawExpires=
[2013/09/20 18:39:40 ] [INFO] (main.printCurCookies:148) MaxAge	=0
[2013/09/20 18:39:40 ] [INFO] (main.printCurCookies:149) Secure	=false
[2013/09/20 18:39:40 ] [INFO] (main.printCurCookies:150) HttpOnly=false
[2013/09/20 18:39:40 ] [INFO] (main.printCurCookies:151) Raw	=H_PS_PSSID=3415_1431_2975_2981; path=/; domain=.baidu.com
[2013/09/20 18:39:40 ] [INFO] (main.printCurCookies:152) Unparsed=[]
[2013/09/20 18:39:40 ] [INFO] (main.printCurCookies:141) ------ Cookie [2]------
[2013/09/20 18:39:40 ] [INFO] (main.printCurCookies:142) Name	=BAIDUID
[2013/09/20 18:39:40 ] [INFO] (main.printCurCookies:143) Value	=AF99372EE54C9816618EED94475DDD26:FG=1
[2013/09/20 18:39:40 ] [INFO] (main.printCurCookies:144) Path	=/
[2013/09/20 18:39:40 ] [INFO] (main.printCurCookies:145) Domain	=.baidu.com
[2013/09/20 18:39:40 ] [INFO] (main.printCurCookies:146) Expires	=0001-01-01 00:00:00 +0000 UTC

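9. For the log output, see:

【Solved】Writing log messages to both a file and the console (command line) in Go

10. At this point the cookie information can be printed to the log file for easy inspection.

What remains is to figure out how to add the cookies before sending an HTTP GET or POST.

11. Along the way, I first sorted out string arrays:

【Solved】String arrays in Go

12. I also ran into a problem:

【Workaround】log4go bug in Go: only part of the output appears, or none at all

13. Finally, log4go could be used with confidence to record log messages.

Next, back to handling cookies.

References:

authenticated http client requests from golang

and the earlier:

Go HTTP Post and use Cookies

which I planned to use as a basis for the implementation.

Only then did I gradually come to understand the explanation in the official documentation at

http://golang.org/pkg/net/http/#CookieJar

which reads: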

A CookieJar manages storage and use of cookies in HTTP requests.

Implementations of CookieJar must be safe for concurrent use by multiple goroutines.

The net/http/cookiejar package provides a CookieJar implementation.

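This means:

The net/http/cookiejar package already implements a CookieJar that manages cookies for us.

In other words, the code in the posts above, such as:

type myjar struct
SetCookies
Cookies

does not need to be written by hand. Understanding how to use

net/http/cookiejar

is enough.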

14. So, following the examples, I looked at how to use

net/http/cookiejar

to manage cookies.

In the end, it came down to the following core code:

package main

import (
    "io/ioutil"
    "net/http"
    "net/http/cookiejar"
)

/***************************************************************************************************
    Global Variables
***************************************************************************************************/
var gCurCookies []*http.Cookie;
var gCurCookieJar *cookiejar.Jar;

/***************************************************************************************************
    Functions
***************************************************************************************************/
//do init before all others
func initAll(){
    gCurCookies = nil
    //var err error;
    gCurCookieJar,_ = cookiejar.New(nil)
    
}

//get url response html
func getUrlRespHtml(url string) string{
    gLogger.Info("getUrlRespHtml, url=%s", url)

    // the shared jar lets the client send stored cookies and save new ones
    httpClient := &http.Client{
        CheckRedirect: nil,
        Jar:gCurCookieJar,
    }

    httpReq, err := http.NewRequest("GET", url, nil)
    if err != nil {
        gLogger.Warn("create request for url=%s error=%s", url, err.Error())
        return ""
    }
    //httpReq.Header.Add("If-None-Match", `W/"wyzzy"`)
    httpResp, err := httpClient.Do(httpReq)
    if err != nil {
        // httpResp is nil on error, so return before touching it
        gLogger.Warn("http get url=%s response error=%s", url, err.Error())
        return ""
    }
    defer httpResp.Body.Close()
    gLogger.Info("httpResp.Header=%s", httpResp.Header)
    gLogger.Debug("httpResp.Status=%s", httpResp.Status)

    body, errReadAll := ioutil.ReadAll(httpResp.Body)
    if errReadAll != nil {
        gLogger.Warn("get response for url=%s got error=%s", url, errReadAll.Error())
    }
    //gLogger.Debug("body=%s\n", body)

    //gCurCookies = httpResp.Cookies()
    // cookies the jar would send to this URL; only Name and Value are filled in
    gCurCookies = gCurCookieJar.Cookies(httpReq.URL)

    return string(body)
}

func dbgPrintCurCookies() {
    var cookieNum int = len(gCurCookies);
    gLogger.Info("cookieNum=%d", cookieNum)
    for i := 0; i < cookieNum; i++ {
        var curCk *http.Cookie = gCurCookies[i];
        //gLogger.Info("curCk.Raw=%s", curCk.Raw)
        gLogger.Info("------ Cookie [%d]------", i)
        gLogger.Info("Name\t=%s", curCk.Name)
        gLogger.Info("Value\t=%s", curCk.Value)
        gLogger.Info("Path\t=%s", curCk.Path)
        gLogger.Info("Domain\t=%s", curCk.Domain)
        gLogger.Info("Expires\t=%s", curCk.Expires)
        gLogger.Info("RawExpires=%s", curCk.RawExpires)
        gLogger.Info("MaxAge\t=%d", curCk.MaxAge)
        gLogger.Info("Secure\t=%t", curCk.Secure)
        gLogger.Info("HttpOnly=%t", curCk.HttpOnly)
        gLogger.Info("Raw\t=%s", curCk.Raw)
        gLogger.Info("Unparsed=%s", curCk.Unparsed)
    }
}

func main() {
    initAll()

    //step1: access baidu url to get cookie BAIDUID
    gLogger.Info("====== 步骤1:获得BAIDUID的Cookie ======")
    var baiduMainUrl string = "http://www.baidu.com/";
    gLogger.Info("baiduMainUrl=%s", baiduMainUrl)
    respHtml := getUrlRespHtml(baiduMainUrl)
    gLogger.Debug("respHtml=%s", respHtml)
    dbgPrintCurCookies()
    
    //check cookie (elided here; presumably this is where bBaiduidCookieExist gets set)
    //...

    //step2: login, pass paras, extract resp cookie
    gLogger.Info("====== 步骤2:提取login_token ======");
    if bBaiduidCookieExist{
        //https://passport.baidu.com/v2/api/?getapi&class=login&tpl=mn&tangram=true
        var getapiUrl string = "https://passport.baidu.com/v2/api/?getapi&class=login&tpl=mn&tangram=true";
        var getApiRespHtml string = getUrlRespHtml(getapiUrl);
        gLogger.Debug("getApiRespHtml=%s", getApiRespHtml);
        dbgPrintCurCookies()
    }
}

With this, cookie management works.

The corresponding output is:

......
[2013/09/21 00:37:33 ] [INFO] (main.main:205) ====== 步骤1:获得BAIDUID的Cookie ======
[2013/09/21 00:37:33 ] [INFO] (main.main:207) baiduMainUrl=http://www.baidu.com/
[2013/09/21 00:37:33 ] [INFO] (main.getUrlRespHtml:125) getUrlRespHtml, url=http://www.baidu.com/
......
[2013/09/21 00:37:33 ] [INFO] (main.dbgPrintCurCookies:175) cookieNum=3
[2013/09/21 00:37:33 ] [INFO] (main.dbgPrintCurCookies:179) ------ Cookie [0]------
[2013/09/21 00:37:33 ] [INFO] (main.dbgPrintCurCookies:180) Name	=BDSVRTM
[2013/09/21 00:37:33 ] [INFO] (main.dbgPrintCurCookies:181) Value	=2
[2013/09/21 00:37:33 ] [INFO] (main.dbgPrintCurCookies:182) Path	=
[2013/09/21 00:37:33 ] [INFO] (main.dbgPrintCurCookies:183) Domain	=
[2013/09/21 00:37:33 ] [INFO] (main.dbgPrintCurCookies:184) Expires	=0001-01-01 00:00:00 +0000 UTC
[2013/09/21 00:37:33 ] [INFO] (main.dbgPrintCurCookies:185) RawExpires=
[2013/09/21 00:37:33 ] [INFO] (main.dbgPrintCurCookies:186) MaxAge	=0
[2013/09/21 00:37:33 ] [INFO] (main.dbgPrintCurCookies:187) Secure	=false
[2013/09/21 00:37:33 ] [INFO] (main.dbgPrintCurCookies:188) HttpOnly=false
[2013/09/21 00:37:33 ] [INFO] (main.dbgPrintCurCookies:189) Raw	=
[2013/09/21 00:37:33 ] [INFO] (main.dbgPrintCurCookies:190) Unparsed=[]
[2013/09/21 00:37:33 ] [INFO] (main.dbgPrintCurCookies:179) ------ Cookie [1]------
[2013/09/21 00:37:33 ] [INFO] (main.dbgPrintCurCookies:180) Name	=H_PS_PSSID
[2013/09/21 00:37:33 ] [INFO] (main.dbgPrintCurCookies:181) Value	=3359_3380_1462_2976_2981_3090
[2013/09/21 00:37:33 ] [INFO] (main.dbgPrintCurCookies:182) Path	=
[2013/09/21 00:37:33 ] [INFO] (main.dbgPrintCurCookies:183) Domain	=
[2013/09/21 00:37:33 ] [INFO] (main.dbgPrintCurCookies:184) Expires	=0001-01-01 00:00:00 +0000 UTC
[2013/09/21 00:37:33 ] [INFO] (main.dbgPrintCurCookies:185) RawExpires=
[2013/09/21 00:37:33 ] [INFO] (main.dbgPrintCurCookies:186) MaxAge	=0
[2013/09/21 00:37:33 ] [INFO] (main.dbgPrintCurCookies:187) Secure	=false
[2013/09/21 00:37:33 ] [INFO] (main.dbgPrintCurCookies:188) HttpOnly=false
[2013/09/21 00:37:33 ] [INFO] (main.dbgPrintCurCookies:189) Raw	=
[2013/09/21 00:37:33 ] [INFO] (main.dbgPrintCurCookies:190) Unparsed=[]
[2013/09/21 00:37:33 ] [INFO] (main.dbgPrintCurCookies:179) ------ Cookie [2]------
[2013/09/21 00:37:33 ] [INFO] (main.dbgPrintCurCookies:180) Name	=BAIDUID
[2013/09/21 00:37:33 ] [INFO] (main.dbgPrintCurCookies:181) Value	=AB215C36BAF3BD955D8781590D5A9E86:FG=1
[2013/09/21 00:37:33 ] [INFO] (main.dbgPrintCurCookies:182) Path	=
[2013/09/21 00:37:33 ] [INFO] (main.dbgPrintCurCookies:183) Domain	=
[2013/09/21 00:37:33 ] [INFO] (main.dbgPrintCurCookies:184) Expires	=0001-01-01 00:00:00 +0000 UTC
[2013/09/21 00:37:33 ] [INFO] (main.dbgPrintCurCookies:185) RawExpires=
[2013/09/21 00:37:33 ] [INFO] (main.dbgPrintCurCookies:186) MaxAge	=0
[2013/09/21 00:37:33 ] [INFO] (main.dbgPrintCurCookies:187) Secure	=false
[2013/09/21 00:37:33 ] [INFO] (main.dbgPrintCurCookies:188) HttpOnly=false
[2013/09/21 00:37:33 ] [INFO] (main.dbgPrintCurCookies:189) Raw	=
[2013/09/21 00:37:33 ] [INFO] (main.dbgPrintCurCookies:190) Unparsed=[]
......
[2013/09/21 00:37:33 ] [INFO] (main.main:240) ====== 步骤2:提取login_token ======
[2013/09/21 00:37:33 ] [INFO] (main.getUrlRespHtml:125) getUrlRespHtml, url=https://passport.baidu.com/v2/api/?getapi&class=login&tpl=mn&tangram=true
[2013/09/21 00:37:33 ] [INFO] (main.getUrlRespHtml:148) httpResp.Header=map[Server:[] Date:[Fri, 20 Sep 2013 16:37:27 GMT] Content-Type:[application/json; charset=utf-8] Connection:[keep-alive] Set-Cookie:[HOSUPPORT=1; expires=Tue, 07-Dec-2021 16:37:27 GMT; path=/; domain=passport.baidu.com; httponly]]
[2013/09/21 00:37:33 ] [DEBG] (main.getUrlRespHtml:149) httpResp.Status=200 OK
[2013/09/21 00:37:33 ] [DEBG] (main.main:245) getApiRespHtml=
var bdPass=bdPass||{};
bdPass.api=bdPass.api||{};
bdPass.api.params=bdPass.api.params||{};
bdPass.api.params.login_token='278623fc5463aa25b0189ddd34165592';
	bdPass.api.params.login_tpl='mn';

			document.write('<script type="text/javascript" charset="UTF-8" src="https://passport.baidu.com/js/v2ApiUsedTangramFunctions.js?v=20130916"></script>');
		document.write('<script type="text/javascript" charset="UTF-8" src="https://passport.baidu.com/js/pass_api_login.js?v=20130916"></script>');

[2013/09/21 00:37:33 ] [INFO] (main.dbgPrintCurCookies:175) cookieNum=3
[2013/09/21 00:37:33 ] [INFO] (main.dbgPrintCurCookies:179) ------ Cookie [0]------
[2013/09/21 00:37:33 ] [INFO] (main.dbgPrintCurCookies:180) Name	=H_PS_PSSID
[2013/09/21 00:37:33 ] [INFO] (main.dbgPrintCurCookies:181) Value	=3359_3380_1462_2976_2981_3090
[2013/09/21 00:37:33 ] [INFO] (main.dbgPrintCurCookies:182) Path	=
[2013/09/21 00:37:33 ] [INFO] (main.dbgPrintCurCookies:183) Domain	=
[2013/09/21 00:37:33 ] [INFO] (main.dbgPrintCurCookies:184) Expires	=0001-01-01 00:00:00 +0000 UTC
[2013/09/21 00:37:33 ] [INFO] (main.dbgPrintCurCookies:185) RawExpires=
[2013/09/21 00:37:33 ] [INFO] (main.dbgPrintCurCookies:186) MaxAge	=0
[2013/09/21 00:37:33 ] [INFO] (main.dbgPrintCurCookies:187) Secure	=false
[2013/09/21 00:37:33 ] [INFO] (main.dbgPrintCurCookies:188) HttpOnly=false
[2013/09/21 00:37:33 ] [INFO] (main.dbgPrintCurCookies:189) Raw	=
[2013/09/21 00:37:33 ] [INFO] (main.dbgPrintCurCookies:190) Unparsed=[]
[2013/09/21 00:37:33 ] [INFO] (main.dbgPrintCurCookies:179) ------ Cookie [1]------
[2013/09/21 00:37:33 ] [INFO] (main.dbgPrintCurCookies:180) Name	=BAIDUID
[2013/09/21 00:37:33 ] [INFO] (main.dbgPrintCurCookies:181) Value	=AB215C36BAF3BD955D8781590D5A9E86:FG=1
[2013/09/21 00:37:33 ] [INFO] (main.dbgPrintCurCookies:182) Path	=
[2013/09/21 00:37:33 ] [INFO] (main.dbgPrintCurCookies:183) Domain	=
[2013/09/21 00:37:33 ] [INFO] (main.dbgPrintCurCookies:184) Expires	=0001-01-01 00:00:00 +0000 UTC
[2013/09/21 00:37:33 ] [INFO] (main.dbgPrintCurCookies:185) RawExpires=
[2013/09/21 00:37:33 ] [INFO] (main.dbgPrintCurCookies:186) MaxAge	=0
[2013/09/21 00:37:33 ] [INFO] (main.dbgPrintCurCookies:187) Secure	=false
[2013/09/21 00:37:33 ] [INFO] (main.dbgPrintCurCookies:188) HttpOnly=false
[2013/09/21 00:37:33 ] [INFO] (main.dbgPrintCurCookies:189) Raw	=
[2013/09/21 00:37:33 ] [INFO] (main.dbgPrintCurCookies:190) Unparsed=[]
[2013/09/21 00:37:33 ] [INFO] (main.dbgPrintCurCookies:179) ------ Cookie [2]------
[2013/09/21 00:37:33 ] [INFO] (main.dbgPrintCurCookies:180) Name	=HOSUPPORT
[2013/09/21 00:37:33 ] [INFO] (main.dbgPrintCurCookies:181) Value	=1
[2013/09/21 00:37:33 ] [INFO] (main.dbgPrintCurCookies:182) Path	=
[2013/09/21 00:37:33 ] [INFO] (main.dbgPrintCurCookies:183) Domain	=
[2013/09/21 00:37:33 ] [INFO] (main.dbgPrintCurCookies:184) Expires	=0001-01-01 00:00:00 +0000 UTC
[2013/09/21 00:37:33 ] [INFO] (main.dbgPrintCurCookies:185) RawExpires=
[2013/09/21 00:37:33 ] [INFO] (main.dbgPrintCurCookies:186) MaxAge	=0
[2013/09/21 00:37:33 ] [INFO] (main.dbgPrintCurCookies:187) Secure	=false
[2013/09/21 00:37:33 ] [INFO] (main.dbgPrintCurCookies:188) HttpOnly=false
[2013/09/21 00:37:33 ] [INFO] (main.dbgPrintCurCookies:189) Raw	=
[2013/09/21 00:37:33 ] [INFO] (main.dbgPrintCurCookies:190) Unparsed=[]

From this we can see:

The first request, to

http://www.baidu.com/

returns three cookies: BDSVRTM, H_PS_PSSID, BAIDUID

These cookies can then be passed into the second request, to

https://passport.baidu.com/v2/api/?getapi&class=login&tpl=mn&tangram=true

which returns a single cookie, HOSUPPORT (visible in httpResp.Header as

Set-Cookie:[HOSUPPORT=1

).

After the CookieJar has processed it, the current set is three cookies:

H_PS_PSSID, BAIDUID, HOSUPPORT

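Notes:

BDSVRTM is no longer in the list after the second request, and at the time I did not fully understand why. Was it treated as expired, or did the domain not match? The debug output does not help: every cookie shows

Expires    =0001-01-01 00:00:00 +0000 UTC

and an empty domain:

Domain    =

The empty fields are expected: the Cookies method of cookiejar.Jar returns cookies with only Name and Value filled in, so Path, Domain, Expires and Raw are always blank in this output.

The most likely reason for the missing cookie is domain matching. In the earlier output the raw header was BDSVRTM=3; path=/ with no domain attribute, which makes it a host-only cookie for www.baidu.com. It is probably still in the jar, but it is not returned for a passport.baidu.com URL. H_PS_PSSID and BAIDUID carry domain=.baidu.com, so they apply to both hosts.

Either way, this gives working cookie management.

How to control these cookies, and how the jar handles them internally, is left for later study.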
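
One way to check whether domain matching can make a cookie "disappear" is to feed a jar two cookies by hand and ask it what it would send to different hosts. This is only a sketch: www.example.com and passport.example.com stand in for the Baidu hosts, and the cookie names and the helper cookieNamesFor are made up.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/cookiejar"
	"net/url"
)

// cookieNamesFor stores two cookies as if they were set by
// www.example.com, then returns the names the jar would send to target.
func cookieNamesFor(target string) ([]string, error) {
	jar, err := cookiejar.New(nil)
	if err != nil {
		return nil, err
	}
	origin, err := url.Parse("http://www.example.com/")
	if err != nil {
		return nil, err
	}
	jar.SetCookies(origin, []*http.Cookie{
		// No Domain attribute: host-only, valid for www.example.com only.
		{Name: "HOSTONLY", Value: "3", Path: "/"},
		// Domain attribute: valid for example.com and its subdomains.
		{Name: "SHARED", Value: "x", Path: "/", Domain: ".example.com"},
	})

	dest, err := url.Parse(target)
	if err != nil {
		return nil, err
	}
	var names []string
	for _, c := range jar.Cookies(dest) {
		names = append(names, c.Name)
	}
	return names, nil
}

func main() {
	for _, target := range []string{
		"http://www.example.com/",
		"https://passport.example.com/v2/api/",
	} {
		names, err := cookieNamesFor(target)
		fmt.Println(target, "->", names, err)
	}
}
```

The host-only cookie is returned for the host that set it and left out for the sibling host, while the cookie with a Domain attribute is returned for both. Nothing is deleted from the jar in either case.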

It may be worth reading the cookiejar.go source when there is time:

https://gist.github.com/andelf/2216371

 

【Summary】

For now, the core code shown in step 14 above (a shared cookiejar.Jar assigned to the Jar field of http.Client) provides basic cookie management.


