[Tutorial] Emulating Website Login, Python Version (with two complete, runnable versions of the code)

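Step	URL	Method	Data sent	Value to obtain/extract from the response
1	http://www.baidu.com/	GET	None	The BAIDUID cookie from the returned cookies
2	https://passport.baidu.com/v2/api/?getapi&class=login&tpl=mn&tangram=true	GET	The BAIDUID cookie	The token value, extracted from the returned HTML
3	https://passport.baidu.com/v2/api/?login	POST	A set of POST fields, in which the token value is the one extracted in step 2	Verify that the returned cookies include BDUSS, PTOKEN, STOKEN and SAVEUSERID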
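
Step 2 of the table can be tried offline. The following is a minimal sketch, assuming Python 3 (the full listings in this post target Python 2). The helper name extract_login_token is made up for illustration, and the sample line is the one quoted in the code comments of this post; the format of Baidu's real response may well have changed since 2012.

```python
import re

# Pattern for the line that carried the token in the 2012 getapi response, e.g.
# bdPass.api.params.login_token='5ab690978812b0e7fbbe1bfc267b90b3';
TOKEN_PATTERN = re.compile(r"bdPass\.api\.params\.login_token='(?P<tokenVal>\w+)';")


def extract_login_token(html):
    """Return the token string, or None when the pattern is absent (hypothetical helper)."""
    found = TOKEN_PATTERN.search(html)
    return found.group("tokenVal") if found else None


sample = "bdPass.api.params.login_token='5ab690978812b0e7fbbe1bfc267b90b3';"
print(extract_login_token(sample))
print(extract_login_token("<html>no token here</html>"))
```

Returning None instead of raising keeps the caller's "token not found" branch explicit, which mirrors the if/else structure used in the full listings.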

With that flow worked out, we can finally write the Python code that demonstrates an emulated login to the Baidu home page.

[Version 1: complete Python code for emulating login to the Baidu home page, compact version]

This is the relatively compact version:

#!/usr/bin/python
# -*- coding: utf-8 -*-
"""
Function:   Used to demonstrate how to use Python code to emulate login baidu main page: http://www.baidu.com/
Note:       Before trying to understand the following code, please first read the related articles:
            (1) [Summary] The logic/flow of fetching web pages, parsing page content and emulating website login, and things to note
            https://www.crifan.com/summary_about_flow_process_of_fetch_webpage_simulate_login_website_and_some_notice/
            (2) [Tutorial] Step by step: using a tool (IE9's F12) to analyze the internal logic of emulating login to a website (Baidu home page)
            https://www.crifan.com/use_ie9_f12_to_analysis_the_internal_logical_process_of_login_baidu_main_page_website/
            (3) [Tutorial] Emulating website login, Python version
            https://www.crifan.com/emulate_login_website_using_python
Version:    2012-11-06
Author:     Crifan
"""

import re;
import cookielib;
import urllib;
import urllib2;
import optparse;

#------------------------------------------------------------------------------
# check whether every cookie name in cookieNameList exists in cookieJar
def checkAllCookiesExist(cookieNameList, cookieJar) :
    cookiesDict = {};
    for eachCookieName in cookieNameList :
        cookiesDict[eachCookieName] = False;
    
    allCookieFound = True;
    for cookie in cookieJar :
        if(cookie.name in cookiesDict) :
            cookiesDict[cookie.name] = True;
    
    for eachCookie in cookiesDict.keys() :
        if(not cookiesDict[eachCookie]) :
            allCookieFound = False;
            break;

    return allCookieFound;

#------------------------------------------------------------------------------
# just for print delimiter
def printDelimiter():
    print '-'*80;

#------------------------------------------------------------------------------
# main function to emulate login baidu
def emulateLoginBaidu():
    print "Function: Used to demostrate how to use Python code to emulate login baidu main page: http://www.baidu.com/";
    print "Usage: emulate_login_baidu_python.py -u yourBaiduUsername -p yourBaiduPassword";
    printDelimiter();

    # parse input parameters
    parser = optparse.OptionParser();
    parser.add_option("-u","--username",action="store",type="string",default='',dest="username",help="Your Baidu Username");
    parser.add_option("-p","--password",action="store",type="string",default='',dest="password",help="Your Baidu password");
    (options, args) = parser.parse_args();
    # copy the parsed options into local variables for later use
    # (explicit assignment instead of exec() over every attribute of options)
    username = options.username;
    password = options.password;

    printDelimiter();
    print "[preparation] using cookieJar & HTTPCookieProcessor to automatically handle cookies";
    cj = cookielib.CookieJar();
    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj));
    urllib2.install_opener(opener);

    printDelimiter();
    print "[step1] to get cookie BAIDUID";
    baiduMainUrl = "http://www.baidu.com/";
    resp = urllib2.urlopen(baiduMainUrl);
    #respInfo = resp.info();
    #print "respInfo=",respInfo;
    for index, cookie in enumerate(cj):
        print '[',index, ']',cookie;

    printDelimiter();
    print "[step2] to get token value";
    getapiUrl = "https://passport.baidu.com/v2/api/?getapi&class=login&tpl=mn&tangram=true";
    getapiResp = urllib2.urlopen(getapiUrl);
    #print "getapiResp=",getapiResp;
    getapiRespHtml = getapiResp.read();
    #print "getapiRespHtml=",getapiRespHtml;
    #bdPass.api.params.login_token='5ab690978812b0e7fbbe1bfc267b90b3';
    foundTokenVal = re.search(r"bdPass\.api\.params\.login_token='(?P<tokenVal>\w+)';", getapiRespHtml);
    if(foundTokenVal):
        tokenVal = foundTokenVal.group("tokenVal");
        print "tokenVal=",tokenVal;

        printDelimiter();
        print "[step3] emulate login baidu";
        staticpage = "http://www.baidu.com/cache/user/html/jump.html";
        baiduMainLoginUrl = "https://passport.baidu.com/v2/api/?login";
        postDict = {
            #'ppui_logintime': "",
            'charset'       : "utf-8",
            #'codestring'    : "",
            'token'         : tokenVal, #de3dbf1e8596642fa2ddf2921cd6257f
            'isPhone'       : "false",
            'index'         : "0",
            #'u'             : "",
            #'safeflg'       : "0",
            'staticpage'    : staticpage, #http%3A%2F%2Fwww.baidu.com%2Fcache%2Fuser%2Fhtml%2Fjump.html
            'loginType'     : "1",
            'tpl'           : "mn",
            'callback'      : "parent.bdPass.api.login._postCallback",
            'username'      : username,
            'password'      : password,
            #'verifycode'    : "",
            'mem_pass'      : "on",
        };
        postData = urllib.urlencode(postDict);
        # here will automatically encode values of parameters
        # such as:
        # encode http://www.baidu.com/cache/user/html/jump.html into http%3A%2F%2Fwww.baidu.com%2Fcache%2Fuser%2Fhtml%2Fjump.html
        #print "postData=",postData;
        req = urllib2.Request(baiduMainLoginUrl, postData);
        # in most cases, the Content-Type of a POST request is application/x-www-form-urlencoded
        req.add_header('Content-Type', "application/x-www-form-urlencoded");
        resp = urllib2.urlopen(req);
        #for index, cookie in enumerate(cj):
        #    print '[',index, ']',cookie;
        cookiesToCheck = ['BDUSS', 'PTOKEN', 'STOKEN', 'SAVEUSERID'];
        loginBaiduOK = checkAllCookiesExist(cookiesToCheck, cj);
        if(loginBaiduOK):
            print "+++ Emulate login baidu is OK, ^_^";
        else:
            print "--- Failed to emulate login baidu !"
    else:
        print "Fail to extract token value from html=",getapiRespHtml;

if __name__=="__main__":
    emulateLoginBaidu();
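
The POST step in the listing above relies on urlencode to percent-encode each field. As a minimal sketch of just that step, assuming Python 3 (where urllib.urlencode and urllib2.Request live in urllib.parse and urllib.request), the request object can be built and inspected without sending anything. The helper name build_login_request and the placeholder token value are illustrative only.

```python
import urllib.parse
import urllib.request


def build_login_request(url, post_fields):
    """Build (but do not send) a form-encoded POST request (hypothetical helper)."""
    # urlencode percent-encodes every value, so the result is plain ASCII
    body = urllib.parse.urlencode(post_fields).encode("ascii")
    req = urllib.request.Request(url, data=body)
    req.add_header("Content-Type", "application/x-www-form-urlencoded")
    return req


req = build_login_request(
    "https://passport.baidu.com/v2/api/?login",
    {
        "token": "placeholder_token",  # placeholder, not a real token
        "staticpage": "http://www.baidu.com/cache/user/html/jump.html",
    },
)
print(req.get_method())
print(req.data.decode("ascii"))
```

Because a body is attached, the request method becomes POST, and the staticpage value appears in the body in its percent-encoded form, as described in the comments of the listing.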

[Version 2: complete Python code for emulating login to the Baidu home page, crifanLib.py version]

This is another version, which uses my own Python library, crifanLib.py:

#!/usr/bin/python
# -*- coding: utf-8 -*-
"""
Function:   Used to demonstrate how to use Python code to emulate login baidu main page: http://www.baidu.com/
            Use the functions from crifanLib.py
Note:       Before trying to understand the following code, please first read the related articles:
            (1) [Summary] The logic/flow of fetching web pages, parsing page content and emulating website login, and things to note
            https://www.crifan.com/summary_about_flow_process_of_fetch_webpage_simulate_login_website_and_some_notice/
            (2) [Tutorial] Step by step: using a tool (IE9's F12) to analyze the internal logic of emulating login to a website (Baidu home page)
            https://www.crifan.com/use_ie9_f12_to_analysis_the_internal_logical_process_of_login_baidu_main_page_website/
            (3) [Tutorial] Emulating website login, Python version
            https://www.crifan.com/emulate_login_website_using_python
Version:    2012-11-07
Author:     Crifan
Contact:    admin (at) crifan.com
"""

import re;
import cookielib;
import urllib;
import urllib2;
import optparse;

#===============================================================================
# following are some functions, extracted from my python library: crifanLib.py
# for the whole crifanLib.py:
# online browser: http://code.google.com/p/crifanlib/source/browse/trunk/python/crifanLib.py
# download      : http://code.google.com/p/crifanlib/downloads/list
#===============================================================================

import zlib;

gConst = {
    'constUserAgent' : 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; WOW64; Trident/5.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; InfoPath.3; .NET4.0C; .NET4.0E)',
    #'constUserAgent' : "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:15.0) Gecko/20100101 Firefox/15.0.1",
}

################################################################################
# Network: urllib/urllib2/http
################################################################################

#------------------------------------------------------------------------------
# get response from url
# note: if you have already installed a cookiejar, it is used automatically here
# when the request is sent via urllib2.Request
def getUrlResponse(url, postDict={}, headerDict={}, timeout=0, useGzip=False) :
    # make sure url is a str, not unicode, otherwise urllib2.urlopen will raise an error
    url = str(url);

    if (postDict) :
        postData = urllib.urlencode(postDict);
        req = urllib2.Request(url, postData);
        req.add_header('Content-Type', "application/x-www-form-urlencoded");
    else :
        req = urllib2.Request(url);

    # default headers; customized headers are added after them (see below),
    # so there is no need to add the customized headers here as well
    defHeaderDict = {
        'User-Agent'    : gConst['constUserAgent'],
        'Cache-Control' : 'no-cache',
        'Accept'        : '*/*',
        'Connection'    : 'Keep-Alive',
    };

    # add default headers firstly
    for eachDefHd in defHeaderDict.keys() :
        #print "add default header: %s=%s"%(eachDefHd,defHeaderDict[eachDefHd]);
        req.add_header(eachDefHd, defHeaderDict[eachDefHd]);

    if(useGzip) :
        #print "use gzip for",url;
        req.add_header('Accept-Encoding', 'gzip, deflate');

    # add customized header later -> allow overwrite default header 
    if(headerDict) :
        #print "added header:",headerDict;
        for key in headerDict.keys() :
            req.add_header(key, headerDict[key]);

    if(timeout > 0) :
        # set timeout value if necessary
        resp = urllib2.urlopen(req, timeout=timeout);
    else :
        resp = urllib2.urlopen(req);
    
    return resp;

#------------------------------------------------------------------------------
# get response html==body from url
#def getUrlRespHtml(url, postDict={}, headerDict={}, timeout=0, useGzip=False) :
def getUrlRespHtml(url, postDict={}, headerDict={}, timeout=0, useGzip=True) :
    resp = getUrlResponse(url, postDict, headerDict, timeout, useGzip);
    respHtml = resp.read();
    if(useGzip) :
        #print "---before unzip, len(respHtml)=",len(respHtml);
        respInfo = resp.info();
        
        # Server: nginx/1.0.8
        # Date: Sun, 08 Apr 2012 12:30:35 GMT
        # Content-Type: text/html
        # Transfer-Encoding: chunked
        # Connection: close
        # Vary: Accept-Encoding
        # ...
        # Content-Encoding: gzip
        
        # sometimes the request asks for gzip,deflate, but the returned html is not actually gzipped
        # -> the response info does not include the above "Content-Encoding: gzip"
        # eg: http://blog.sina.com.cn/s/comment_730793bf010144j7_3.html
        # -> so only decompress here when the data is indeed gzipped
        if( ("Content-Encoding" in respInfo) and (respInfo['Content-Encoding'] == "gzip")) :
            respHtml = zlib.decompress(respHtml, 16+zlib.MAX_WBITS);
            #print "+++ after unzip, len(respHtml)=",len(respHtml);

    return respHtml;

################################################################################
# Cookies
################################################################################

#------------------------------------------------------------------------------
# check whether every cookie name in cookieNameList exists in cookieJar
def checkAllCookiesExist(cookieNameList, cookieJar) :
    cookiesDict = {};
    for eachCookieName in cookieNameList :
        cookiesDict[eachCookieName] = False;
    
    allCookieFound = True;
    for cookie in cookieJar :
        if(cookie.name in cookiesDict) :
            cookiesDict[cookie.name] = True;
    
    for eachCookie in cookiesDict.keys() :
        if(not cookiesDict[eachCookie]) :
            allCookieFound = False;
            break;

    return allCookieFound;

#===============================================================================

#------------------------------------------------------------------------------
# just for print delimiter
def printDelimiter():
    print '-'*80;

#------------------------------------------------------------------------------
# main function to emulate login baidu
def emulateLoginBaidu():
    print "Function: Used to demostrate how to use Python code to emulate login baidu main page: http://www.baidu.com/";
    print "Usage: emulate_login_baidu_python.py -u yourBaiduUsername -p yourBaiduPassword";
    printDelimiter();

    # parse input parameters
    parser = optparse.OptionParser();
    parser.add_option("-u","--username",action="store",type="string",default='',dest="username",help="Your Baidu Username");
    parser.add_option("-p","--password",action="store",type="string",default='',dest="password",help="Your Baidu password");
    (options, args) = parser.parse_args();
    # copy the parsed options into local variables for later use
    # (explicit assignment instead of exec() over every attribute of options)
    username = options.username;
    password = options.password;

    printDelimiter();
    print "[preparation] using cookieJar & HTTPCookieProcessor to automatically handle cookies";
    cj = cookielib.CookieJar();
    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj));
    urllib2.install_opener(opener);

    printDelimiter();
    print "[step1] to get cookie BAIDUID";
    baiduMainUrl = "http://www.baidu.com/";
    resp = getUrlResponse(baiduMainUrl);
    # here you should see: BAIDUID
    for index, cookie in enumerate(cj):
        print '[',index, ']',cookie;

    printDelimiter();
    print "[step2] to get token value";
    getapiUrl = "https://passport.baidu.com/v2/api/?getapi&class=login&tpl=mn&tangram=true";
    getapiRespHtml = getUrlRespHtml(getapiUrl);
    #bdPass.api.params.login_token='5ab690978812b0e7fbbe1bfc267b90b3';
    foundTokenVal = re.search(r"bdPass\.api\.params\.login_token='(?P<tokenVal>\w+)';", getapiRespHtml);
    if(foundTokenVal):
        tokenVal = foundTokenVal.group("tokenVal");
        print "tokenVal=",tokenVal;

        printDelimiter();
        print "[step3] emulate login baidu";
        staticpage = "http://www.baidu.com/cache/user/html/jump.html";
        baiduMainLoginUrl = "https://passport.baidu.com/v2/api/?login";
        postDict = {
            #'ppui_logintime': "",
            'charset'       : "utf-8",
            #'codestring'    : "",
            'token'         : tokenVal, #de3dbf1e8596642fa2ddf2921cd6257f
            'isPhone'       : "false",
            'index'         : "0",
            #'u'             : "",
            #'safeflg'       : "0",
            'staticpage'    : staticpage, #http%3A%2F%2Fwww.baidu.com%2Fcache%2Fuser%2Fhtml%2Fjump.html
            'loginType'     : "1",
            'tpl'           : "mn",
            'callback'      : "parent.bdPass.api.login._postCallback",
            'username'      : username,
            'password'      : password,
            #'verifycode'    : "",
            'mem_pass'      : "on",
        };
        loginRespHtml = getUrlRespHtml(baiduMainLoginUrl, postDict);
        cookiesToCheck = ['BDUSS', 'PTOKEN', 'STOKEN', 'SAVEUSERID'];
        loginBaiduOK = checkAllCookiesExist(cookiesToCheck, cj);
        if(loginBaiduOK):
            print "+++ Emulate login baidu is OK, ^_^";
        else:
            print "--- Failed to emulate login baidu !"
    else:
        print "Fail to extract token value from html=",getapiRespHtml;

if __name__=="__main__":
    emulateLoginBaidu();

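The purpose of this version is to let later users call the network-related functions without worrying about their internal details.

The functions can also be reused in later projects.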
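
One of the internal details hidden by getUrlRespHtml is the gzip handling, and it can be tried offline. This is a minimal sketch assuming Python 3; the helper name decode_body is hypothetical, and a plain dict stands in for the response headers. The wbits value 16 + zlib.MAX_WBITS tells zlib to expect a gzip header and trailer.

```python
import gzip
import zlib


def decode_body(raw, headers):
    """Decompress raw only when the headers declare gzip (hypothetical helper)."""
    if headers.get("Content-Encoding") == "gzip":
        return zlib.decompress(raw, 16 + zlib.MAX_WBITS)
    # the server ignored Accept-Encoding and sent plain data: return it untouched
    return raw


page = b"<html>hello</html>"
gzipped = gzip.compress(page)
print(decode_body(gzipped, {"Content-Encoding": "gzip"}) == page)
print(decode_body(page, {}) == page)
```

Checking the response header rather than the request flag matters because, as the listing's comments note, a server may return uncompressed data even when gzip was requested.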

Note: about crifanLib.py:

Browse online: crifanLib.py

Download: crifanLib_2012-11-07.7z

Both versions of the code above produce the same output:

D:\tmp\tmp_dev_root\python\emulate_login_baidu_python>emulate_login_baidu_python.py -u crifan -p xxxxxx
Function: Used to demostrate how to use Python code to emulate login baidu main page: http://www.baidu.com/
Usage: emulate_login_baidu_python.py -u yourBaiduUsername -p yourBaiduPassword
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
[preparation] using cookieJar & HTTPCookieProcessor to automatically handle cookies
--------------------------------------------------------------------------------
[step1] to get cookie BAIDUID
[ 0 ] <Cookie BAIDUID=8D85C6528FDF7B5F49C746A18524495B:FG=1 for .baidu.com/>
--------------------------------------------------------------------------------
[step2] to get token value
tokenVal= 4d3f004bbe3e6f0cfa435abd38dd9fec
--------------------------------------------------------------------------------
[step3] emulate login baidu
+++ Emulate login baidu is OK, ^_^

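[Summary]

Overall, analyzing the login process of a website, and the internal logic it involves, is actually much harder than writing the code.

And understanding the overall logic of the login process matters far more than the detailed, tool-based analysis.

When I worked through all of this myself, I suffered from the lack of a complete tutorial. That is why this series of posts now exists: it explains the whole process from start to finish, from concepts, to logic, to analysis, to implementation.

After reading all of them, you should have a reasonable understanding of this topic.

What remains is hands-on practice, that is, working through it yourself.

I hope the concepts, logic, methods and code above are useful to you.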


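Hello, how can I log in to this site: https://www.ss-fast.com/ucenter/#?act=free_plan PS: I want to get past the firewall with Shadowsocks, using the command: sslocal -s tky.jp.v0.ss-fast.com -p 873 -k f6YVx3 -b 127.0.0.1 -l 1080 However, the Shadowsocks password changes every 15 minutes. How can I use Python to fetch the password and re-run the command above every 15 minutes? "f6YVx3" is the password that changes every 15 minutes.

侧耳倾听 (2016-06-11)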

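Such a garbage program; I feel sorry for you. Apart from misleading learners, it is of no use.

百度安全管家 (2015-10-02)

So why don't we see you posting something that isn't garbage? You haven't posted even garbage. All you do is run your mouth.
垃圾管家 (2015-11-19)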

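Bless you, kind soul.

WangSir (2015-02-02)

Your write-up is really detailed. For beginners it is excellent guidance. Big thumbs up!

yeddatian (2014-12-25)

How do I scrape data after the emulated login?

liuyanping (2014-10-14)

That can't be explained in one sentence. Read my full tutorial series and you will understand: Detailed explanation of the principles and implementation of fetching websites, emulating login, and scraping dynamic web pages (Python, C#, etc.)
crifan (2014-10-20)

Hello, I also want to emulate login to Baidu now, but I found that the POST data contains an additional rsakey parameter, so it seems RSA encryption is now used. The password parameter is also a string of gibberish. How should this be handled?

sugar (2014-10-07)

I don't know either, because I haven't spent the effort to look into it.
crifan (2014-10-11)

You are formidable; I admire you.

patrick jane (2014-09-04)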

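Hello, this material is very good, but is there something wrong with the function that checks whether the login succeeded? It reports failure, yet when I re-fetch the source of the Baidu home page after logging in, I can find my account information, so the login should have succeeded.

WCXONTHEWAY (2014-01-09)

Thanks for the tutorial; it is really well written. I was never clear on the details of emulated login, and reading your article made everything click.

baicai (2013-09-19)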

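Hi, a question: I tried emulating login to Baidu, but I cannot get the SAVEUSERID cookie. What could be the reason?

Sty_Wolf (2013-08-19)

Later, while writing other tutorials ([Tutorial] Emulating Baidu login, Java version, and [Record] Emulating Baidu login in Go), I ran into a similar problem: with automatic cookie management, the returned SAVEUSERID is already marked as deleted, so it gets removed from the cookie store, which makes it look as if SAVEUSERID cannot be obtained. So in practice you can ignore SAVEUSERID here and check only the others: BDUSS, PTOKEN and STOKEN.
crifan (2013-09-22)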
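
The relaxed check described in that reply could look like the following. It is a minimal sketch assuming Python 3 (http.cookiejar is the Python 3 name of cookielib); the helpers make_cookie and missing_cookies are hypothetical, and the cookies are built by hand with dummy values so the check can be exercised without contacting Baidu.

```python
from http.cookiejar import Cookie, CookieJar


def make_cookie(name, value, domain=".baidu.com"):
    """Build a session cookie by hand, for offline testing only (hypothetical helper)."""
    return Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=True,
        domain_initial_dot=domain.startswith("."),
        path="/", path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


def missing_cookies(required_names, cookie_jar):
    """Return the set of required cookie names that are absent from the jar."""
    present = {cookie.name for cookie in cookie_jar}
    return set(required_names) - present


jar = CookieJar()
for name in ("BDUSS", "PTOKEN", "STOKEN"):
    jar.set_cookie(make_cookie(name, "dummy"))

# the relaxed list passes; the original list still reports SAVEUSERID as missing
print(missing_cookies(["BDUSS", "PTOKEN", "STOKEN"], jar))
print(missing_cookies(["BDUSS", "PTOKEN", "STOKEN", "SAVEUSERID"], jar))
```

Returning the missing names, rather than a bare True/False, makes it easier to see which cookie caused a reported failure.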

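A very detailed walkthrough. One more question for the author: can the cookies obtained from the login be saved, so that the next time the program starts it is already logged in (without storing the username and password directly)?

well (2013-05-14)

See: [Summary] Cookie handling in Python: handling cookies automatically, saving them to a cookie file, and loading cookies from a file
crifan (2013-05-16)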
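
For readers without access to that article, one way to persist cookies with the standard library is a file-backed cookie jar. The following is a minimal sketch assuming Python 3 and LWPCookieJar; the referenced article may use a different approach. The helpers make_cookie, save_cookies and load_cookies, the file name, and the dummy cookie value are all illustrative.

```python
import os
import tempfile
from http.cookiejar import Cookie, LWPCookieJar


def make_cookie(name, value, domain=".baidu.com"):
    """Build a session cookie by hand, for offline testing only (hypothetical helper)."""
    return Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=True,
        domain_initial_dot=domain.startswith("."),
        path="/", path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


def save_cookies(cookie_jar, path):
    # ignore_discard=True also writes session cookies, which are skipped by default
    cookie_jar.save(path, ignore_discard=True, ignore_expires=True)


def load_cookies(path):
    jar = LWPCookieJar()
    jar.load(path, ignore_discard=True, ignore_expires=True)
    return jar


cookie_path = os.path.join(tempfile.mkdtemp(), "baidu_cookies.txt")
jar = LWPCookieJar()
jar.set_cookie(make_cookie("BDUSS", "dummy_value"))
save_cookies(jar, cookie_path)

restored = load_cookies(cookie_path)
print(sorted(cookie.name for cookie in restored))
```

In a real program the restored jar would be passed to HTTPCookieProcessor in place of a fresh CookieJar. The saved file is equivalent to a logged-in session, so it should be protected like a password.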

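If the Baidu login hits a captcha, does it simply fail?

lijf (2013-04-15)

It should. At least the current tutorial does not handle captchas. Handling captchas is relatively complex, so it is not covered for now.
crifan (2013-09-22)

I finally found one that works. Probably because Baidu's login was redesigned, none of the older tutorials I found work any more. Your write-up is very detailed, but even the compact version is long... I'll study it.

小媛 (2013-02-17)

I also tried writing a similar Python script that logs in to down.51cto.com, for the purpose of the daily check-in. For the login part, I tried logging in with a wrong password and could not get the token from the cookies, while with the correct password I could; does that prove the login succeeded? I then use a function to POST to the check-in URL. My analysis shows that the POST data is none, but I don't know where the value of t in the URL comes from.
def getfreecredits():
    url = 'http://down.51cto.com/download.php?do=getfreecredits&t=0.7784573023846759'
    req = urllib2.Request(url);
    req.add_header('Content-Type', "application/x-www-form-urlencoded");
    resp = urllib2.urlopen(req);

four5 (2012-12-10)

I have been looking for a Python emulated-login tutorial for a long time and never found one as detailed as yours. Thanks. I'll go and try writing a check-in tool for Baidu Tieba.

four5 (2012-12-05)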
