【Solved】Analyzing the HTML source of a NetEase blog to find out how NetEase fetches the comments of a post

Crawl_EmulateLogin crifan

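【Disclaimer】

This post is shared purely for technical research purposes.

Any illegal use of this method by others has nothing to do with this post or with me.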

 

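【Purpose】

I want to analyze the HTML source of a NetEase (163) blog page to find out which URL is used to fetch the comments of a post.

The aim of this analysis is to later use Python to crawl the blog's posts and their comments.

There are few resources online for this kind of analysis, so I am writing it up here as a reference for those who come after.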

【Analysis process】

1. Take this NetEase blog post as an example:

http://againinput4.blog.163.com/blog/static/172799491201010159650483/

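Right-click the page and open its source in Notepad++. Skimming through it shows that the HTML source contains no comment data directly.

I had also seen that 163-blog-mover.py mentions that NetEase fetches a post's comments via AJAX.

This looks similar to Baidu Space: the comments of a post are fetched afterwards by calling some URL.

So the task is to analyze the JavaScript source the page uses, and find where this URL request is sent and what the URL is.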

2. Two important JavaScript files can be found here:

http://b1.bst.126.net/newpage/r/j/pc.js?v=6827128336

http://b1.bst.126.net/newpage/r/j/m/m-3/pm.js?v=1789902274

One is pc.js and the other is pm.js. Download them and start analyzing bit by bit.
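As a rough sketch of how the script URLs could be collected automatically rather than by reading the page source by eye, the snippet below lists the `src` of every `<script>` tag using Python's standard `html.parser`. The class and function names and the sample HTML are made up for illustration; only the two script URLs come from this post.

```python
from html.parser import HTMLParser


class ScriptSrcCollector(HTMLParser):
    """Collect the src attribute of every <script> tag that has one."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            src = dict(attrs).get("src")
            if src:  # inline scripts have no src and are skipped
                self.sources.append(src)


def list_script_sources(html):
    collector = ScriptSrcCollector()
    collector.feed(html)
    return collector.sources


# Hypothetical page fragment; a real page would be downloaded first.
sample_html = """
<html><head>
<script type="text/javascript" src="http://b1.bst.126.net/newpage/r/j/pc.js?v=6827128336"></script>
<script>var inline = 1;</script>
<script src="http://b1.bst.126.net/newpage/r/j/m/m-3/pm.js?v=1789902274"></script>
</head></html>
"""

for src in list_script_sources(sample_html):
    print(src)
```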

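3. The JS source as downloaded is too messy to read.

At first I did not think to look for tools online, and formatted the JS source by hand, inserting line breaks one at a time. It was exhausting and far too slow.

Later I found an online tool that can format JavaScript: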

http://www.gosu.pl/decoder/

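After formatting both files with it, the analysis becomes much easier.

4. Although the NetEase JS source is now formatted, it has been minified, so variables and functions all have one- or two-letter names, and there are a great many functions. That makes it very hard to understand what the original functions do, and therefore hard to find where the URL request is sent.

Note the comment-count text shown in the post being analyzed:

评论(1)

So searching the original HTML source for "评论(" does locate the corresponding part.

But that only leads to the variable _spaniCommentCount. The corresponding assignment can then be found in pm.js: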

    dr.ub = function (cQ)
    {
        if (!!E.aq('$_spanCommentCount')) {
            E.aq('$_spanCommentCount').innerText = cQ || this.bq.commentCount;
        }
        if (!!E.aq('$_spaniCommentCount')) {
            E.aq('$_spaniCommentCount').innerText = cQ || this.bq.commentCount;
        }
    };

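Its value is determined by cQ, which is the parameter of the function ub.

But ub is called from many places, so it is very hard to tell which caller is the relevant one.

5. Next I searched the JS source for comments, comment, cmt and so on, but only found the related variables commentCount and mainCommentCount, and still not the place where the URL is generated.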
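A keyword search like the one in step 5 can be scripted so that each hit comes with some surrounding code. The following is only a minimal sketch: `find_with_context` is a made-up helper, and the `js` string is a shortened stand-in for the real pm.js content.

```python
def find_with_context(source, keyword, width=30):
    """Return (offset, snippet) for each case-sensitive occurrence of keyword."""
    hits = []
    start = source.find(keyword)
    while start != -1:
        lo = max(0, start - width)
        hi = min(len(source), start + len(keyword) + width)
        hits.append((start, source[lo:hi]))
        start = source.find(keyword, start + 1)
    return hits


# Shortened stand-in for the minified source.
js = "dr.ub=function(cQ){E.aq('$_spanCommentCount').innerText=cQ||this.bq.commentCount;};"

for keyword in ("comments", "commentCount", "CommentCount"):
    hits = find_with_context(js, keyword)
    print(keyword, len(hits), [snippet for _, snippet in hits])
```

Because the search is case-sensitive, `commentCount` and `CommentCount` are reported separately, which matters in minified code where names differ only by case.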

6. Then I noticed that commentCount, mainCommentCount and similar variables are all fields of a variable I, i.e. I.mainCommentCount and I.commentCount. So I went looking for whatever assigns I, and found:

    ek.bR = function (bv, H)
    {
        H = H || {};
        this.I = H.data;
        this.aS(bv, H);
    };

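It is H.data.

7. I then tried to find where H comes from, but found no useful assignment. All in all, minified code is still hard to read.

8. Later, by chance, I came across this piece of code: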

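    fK.dW = function (H, bS)
    {
        J.bi(location.dwr, 'BlogBeanNew', 'getComments', H.ckey, H.limit, H.offset, bS);
    };

Clearly this is where the call that fetches the comment data is made.

But searching for dW turned up no caller. There is only dw, and since JavaScript is case-sensitive, dW and dw are not the same variable.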
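To avoid confusing dW with dw (or with longer names that merely contain those letters), the search can be restricted to whole identifiers and kept case-sensitive. This is a sketch only: `find_identifier` is a hypothetical helper, and the sample string is the single function quoted above with whitespace removed.

```python
import re


def find_identifier(source, name):
    """Offsets where name occurs as a whole identifier, matched case-sensitively."""
    # JavaScript identifiers may contain letters, digits, _ and $.
    pattern = re.compile(r"(?<![\w$])" + re.escape(name) + r"(?![\w$])")
    return [m.start() for m in pattern.finditer(source)]


js = ("fK.dW=function(H,bS){J.bi(location.dwr,'BlogBeanNew','getComments',"
      "H.ckey,H.limit,H.offset,bS);};")

print(find_identifier(js, "dW"))   # the definition itself
print(find_identifier(js, "dw"))   # no whole-identifier match; "dwr" does not count
print(find_identifier(js, "dwr"))
```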

9. Noticing the 'BlogBeanNew' in the code above, I searched for it on a whim

and found a related post:

"NetEase Blog Analysis: You Know What I Mean" (weixue108, Sina Blog)

http://blog.sina.com.cn/s/blog_40e4b5660100urlt.html

Similar content can also be found here:

http://blog.163.com/umaster@126/blog/static/140543847201159104910296/

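But those posts only describe using getBlogs to fetch blog posts, whereas I want the comments of a post. Although "getBlogs" can be replaced with 'getComments', the other parameters are still not the ones I need. So I still have to work out the complete URL myself.

10. While tinkering around, I also happened to try IE9 and discovered that it ships with a tool:

Tools -> F12 Developer Tools, which lets you view a page's HTML, CSS and JavaScript source and even debug it. But after a long time setting breakpoints and stepping through code, apart from finding that the HTML syntax highlighting is quite good, I did not discover much.

In the end, though, I found that you can also click Network, then open a page and capture the URL requests sent while the page runs. That too reveals the address I need: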

http://api.blog.163.com/againinput4/dwr/call/plaincall/BlogBeanNew.getComments.dwr

But entering this address in the address bar gives:

    //#DWR-REPLY
    if (window.dwr) dwr.engine._remoteHandleBatchException({ name:'org.directwebremoting.extend.ServerException', message:'The specified call count is not a number' });
    else if (window.parent.dwr) window.parent.dwr.engine._remoteHandleBatchException({ name:'org.directwebremoting.extend.ServerException', message:'The specified call count is not a number' });

This is clearly not the comment data I want.
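The error message suggests the server was expecting a call count that a bare address-bar request does not supply. When scripting such requests, it can help to detect this kind of error reply before trying to read comment data out of it. The following is a minimal sketch based only on the reply text shown above; `dwr_error` is a made-up helper name.

```python
import re


def dwr_error(reply):
    """Return (name, message) if the reply reports a batch exception, else None."""
    match = re.search(
        r"_remoteHandleBatchException\(\{\s*name:'([^']*)',\s*message:'([^']*)'",
        reply,
    )
    return (match.group(1), match.group(2)) if match else None


reply = (
    "//#DWR-REPLY\n"
    "if (window.dwr) dwr.engine._remoteHandleBatchException({ "
    "name:'org.directwebremoting.extend.ServerException', "
    "message:'The specified call count is not a number' });"
)

print(dwr_error(reply))
```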

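11. The post "NetEase Blog Analysis: You Know What I Mean" (weixue108, Sina Blog) mentions the "Google developer tools", so I went to see what they actually are and found these introductions: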

http://code.google.com/intl/zh-CN/chrome/devtools/docs/overview.html

https://support.google.com/chrome/bin/answer.py?hl=zh-Hans&topic=29302&answer=184054&rd=2

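They turn out to be built into Chrome, so I downloaded Chrome and tried them.

After some tinkering, I found that they also offer URL filtering and analysis:

Tools -> Developer Tools (Ctrl+Shift+I), click Network, then enter the page you want to analyze in the address bar: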

http://againinput4.blog.163.com/blog/static/172799491201010159650483/

It then lists all the URLs requested by the page, among them:

BlogBeanNew.getComments.dwr

http://api.blog.163.com/againinput4/dwr/call/plaincall/

Click it to select it, and you can view the details, including Headers, Preview, Response, Cookies, Timing and so on.

Headers contains the URL parameters I need here:

    Request Payload
    callCount=1
    scriptSessionId=${scriptSessionId}187
    c0-scriptName=BlogBeanNew
    c0-methodName=getComments
    c0-id=0
    c0-param0=string:fks_094067082083086070082083080095085081083068093095082074085
    c0-param1=number:1
    c0-param2=number:0

The corresponding data in Response is:

    //#DWR-INSERT
    //#DWR-REPLY
    var s0={};var s1=[];s0['abstract']="不好用!";s0.blogId="fks_094067082083086070082083080095085081083068093095082074085";s0.blogPermalink="blog/static/172799491201010159650483";s0.blogTitle="诺基亚/Nokia E5 手机拍照声音的相关解释和去除办法";s0.blogUserId=172799491;s0.blogUserName="againinput4";s0.circleId=0;s0.circleName=null;s0.circleUrlName=null;s0.content="<P>不好用!</P>";s0.id="fks_081068080082082064085095081095085081083068093095082074085";s0.ip="210.32.143.7";s0.ipName="浙江 杭州";s0.lastUpdateTime=1321254579933;s0.mainComId="-1";s0.moveFrom=null;s0.popup=false;s0.publishTime=1321254579950;s0.publishTimeStr="15:09:39";s0.publisherAvatar=0;s0.publisherAvatarUrl="http://img.bimg.126.net/photo/hmZoNQaqzZALvVp0rE7faA==/0.jpg";s0.publisherEmail="";s0.publisherId=197436315;s0.publisherName="ch_yuan";s0.publisherNickname="coraline";s0.publisherUrl=null;s0.replyComId="-1";s0.replyToUserId=172799491;s0.replyToUserName="againinput4";s0.replyToUserNick="crifan";s0.shortPublishDateStr="2011-11-14";s0.spam=0;s0.subComments=s1;s0.synchMiniBlog=false;s0.valid=0;

    dwr.engine._remoteHandleCallback('728048','0',[s0]);

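At first I did not notice what the u-prefixed data in it was, so I decoded it in Python (Python 2 syntax):

    >>> print u"\u4E0D\u597D\u7528\uFF01".encode('gb18030')
    不好用!

It turned out to be exactly the comment data of the current blog post that I had been searching so hard for.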
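For readers on Python 3, the same check might look like the sketch below. It assumes the raw reply carries the text as `\uXXXX` escapes, using the four code points this post names for the comment (4E0D, 597D, 7528, FF01).

```python
# The escaped text as it would appear in the raw reply (assumed form).
raw = r"\u4E0D\u597D\u7528\uFF01"

# Turn the backslash escapes into the characters they stand for.
decoded = raw.encode("ascii").decode("unicode_escape")

print(decoded)
print([hex(ord(ch)) for ch in decoded])
```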

As for the request payload above, refer to the method described in

"NetEase Blog Analysis: You Know What I Mean" (weixue108, Sina Blog)

http://blog.sina.com.cn/s/blog_40e4b5660100urlt.html

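and assemble the pieces yourself to get the final URL I need here.

The comment information returned by this URL is the Unicode data in s0['abstract'] above, together with the other related fields.

With that, everything is clear. Done.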
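As a sketch of how such a payload could be assembled in Python, the helper below rebuilds the captured field list from its parts. `build_dwr_body` and its parameters are hypothetical names, and joining the fields with newlines is an assumption about how the payload is laid out; only the field names and values come from the capture in this post.

```python
def build_dwr_body(script, method, params, batch_id,
                   session_id="${scriptSessionId}187"):
    """Assemble key=value lines for a single-call request (call index c0)."""
    fields = [
        ("callCount", "1"),
        ("scriptSessionId", session_id),
        ("c0-scriptName", script),
        ("c0-methodName", method),
        ("c0-id", "0"),
    ]
    for index, value in enumerate(params):
        # Values are prefixed with their type, as in the captured payload.
        kind = "string" if isinstance(value, str) else "number"
        fields.append(("c0-param%d" % index, "%s:%s" % (kind, value)))
    fields.append(("batchId", str(batch_id)))
    return "\n".join("%s=%s" % pair for pair in fields)


body = build_dwr_body(
    "BlogBeanNew",
    "getComments",
    ["fks_094067082083086070082083080095085081083068093095082074085", 1, 0],
    728048,
)
print(body)
```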

 

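【Summary: how the NetEase blog produces the comment data of the current post】

To fetch the comment data of the current 163 blog post, the JS source referenced from the HTML source: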

http://b1.bst.126.net/newpage/r/j/m/m-3/pm.js?v=1789902274

calls

    J.bi(location.dwr, 'BlogBeanNew', 'getComments', H.ckey, H.limit, H.offset, bS);

The resulting request is as follows. The URL is:

http://api.blog.163.com/againinput4/dwr/call/plaincall/BlogBeanNew.getComments.dwr

The parameters are:

callCount=1
scriptSessionId=${scriptSessionId}187
c0-scriptName=BlogBeanNew
c0-methodName=getComments
c0-id=0
c0-param0=string:fks_094067082083086070082083080095085081083068093095082074085
c0-param1=number:1
c0-param2=number:0
batchId=728048

Put together, this gives:

http://api.blog.163.com/againinput4/dwr/call/plaincall/BlogBeanNew.getComments.dwr?&callCount=1&scriptSessionId=${scriptSessionId}187&c0-scriptName=BlogBeanNew&c0-methodName=getComments&c0-id=0&c0-param0=string:fks_094067082083086070082083080095085081083068093095082074085&c0-param1=number:1&c0-param2=number:0&batchId=728048
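The same URL can be produced with Python's standard `urllib.parse.urlencode` instead of concatenating strings by hand. This is a sketch under one assumption: that the characters `:`, `$`, `{` and `}` should stay unescaped as they are in the URL above, which is why they are passed via `safe`. The result differs from the hand-built URL only in lacking the `&` immediately after `?`.

```python
from urllib.parse import urlencode

BASE = "http://api.blog.163.com/againinput4/dwr/call/plaincall/BlogBeanNew.getComments.dwr"

params = [
    ("callCount", "1"),
    ("scriptSessionId", "${scriptSessionId}187"),
    ("c0-scriptName", "BlogBeanNew"),
    ("c0-methodName", "getComments"),
    ("c0-id", "0"),
    ("c0-param0", "string:fks_094067082083086070082083080095085081083068093095082074085"),
    ("c0-param1", "number:1"),
    ("c0-param2", "number:0"),
    ("batchId", "728048"),
]

# safe= keeps ':', '$', '{' and '}' literal instead of percent-encoding them.
url = BASE + "?" + urlencode(params, safe=":${}")
print(url)
```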

Opening this URL returns:

    //#DWR-INSERT
    //#DWR-REPLY
    var s0={};var s1=[];s0['abstract']="不好用!";s0.blogId="fks_094067082083086070082083080095085081083068093095082074085";s0.blogPermalink="blog/static/172799491201010159650483";s0.blogTitle="诺基亚/Nokia E5 手机拍照声音的相关解释和去除办法";s0.blogUserId=172799491;s0.blogUserName="againinput4";s0.circleId=0;s0.circleName=null;s0.circleUrlName=null;s0.content="<P>不好用!</P>";s0.id="fks_081068080082082064085095081095085081083068093095082074085";s0.ip="210.32.143.7";s0.ipName="浙江 杭州";s0.lastUpdateTime=1321254579933;s0.mainComId="-1";s0.moveFrom=null;s0.popup=false;s0.publishTime=1321254579950;s0.publishTimeStr="15:09:39";s0.publisherAvatar=0;s0.publisherAvatarUrl="http://img.bimg.126.net/photo/hmZoNQaqzZALvVp0rE7faA==/0.jpg";s0.publisherEmail="";s0.publisherId=197436315;s0.publisherName="ch_yuan";s0.publisherNickname="coraline";s0.publisherUrl=null;s0.replyComId="-1";s0.replyToUserId=172799491;s0.replyToUserName="againinput4";s0.replyToUserNick="crifan";s0.shortPublishDateStr="2011-11-14";s0.spam=0;s0.subComments=s1;s0.synchMiniBlog=false;s0.valid=0;

    dwr.engine._remoteHandleCallback('728048','0',[s0]);

The comment text is the Unicode-encoded string in s0['abstract'].
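Since the reply is JavaScript rather than JSON, one way to get at fields such as s0['abstract'] from Python is to pull out the `s0.key=value;` assignments with a regular expression. The sketch below assumes a reply shaped like the one above, with a single comment object named s0; `parse_comment` is a made-up helper, and replies containing several comments (s1, s2, ...) would need more work.

```python
import json
import re

# Matches s0.key=value; and s0['key']=value; where value is a string or a bare token.
ASSIGNMENT = re.compile(
    r"""\bs0(?:\.(\w+)|\['(\w+)'\])=("(?:[^"\\]|\\.)*"|[^;]*);"""
)


def parse_comment(reply):
    comment = {}
    for dotted, bracketed, raw in ASSIGNMENT.findall(reply):
        key = dotted or bracketed
        try:
            # Handles quoted strings (including \uXXXX), numbers, null, true, false.
            comment[key] = json.loads(raw)
        except ValueError:
            # Not a literal, e.g. a reference to another variable such as s1.
            comment[key] = raw
    return comment


# Shortened sample in the same shape as the reply above.
reply = (
    "//#DWR-INSERT\n//#DWR-REPLY\n"
    "var s0={};var s1=[];"
    "s0['abstract']=\"\\u4E0D\\u597D\\u7528\\uFF01\";"
    "s0.blogUserName=\"againinput4\";s0.circleName=null;"
    "s0.publisherId=197436315;s0.popup=false;s0.subComments=s1;\n"
    "dwr.engine._remoteHandleCallback('728048','0',[s0]);"
)

comment = parse_comment(reply)
print(comment["abstract"], comment["publisherId"])
```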

 

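Note: NetEase automatically converts raw \uXXXX Unicode escapes into the characters they represent when displaying them. So the u-prefixed 4E0D, 597D, 7528 and FF01 are shown as the Chinese characters "不好用!". I mention this so that other readers of this post are not confused.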
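On fetching more than one comment: the J.bi call passes H.ckey, H.limit and H.offset in that order, and the captured request carries three parameters c0-param0 to c0-param2, so it seems plausible that c0-param1 is the limit and c0-param2 the offset. That mapping is an unverified assumption. The batchId, by contrast, is echoed back in the reply's _remoteHandleCallback('728048', ...) call, which hints that it identifies the request rather than selecting a page. Under those assumptions, paging parameters might be generated as in this sketch (`page_params` is a hypothetical helper):

```python
def page_params(ckey, page_size, total):
    """Yield (c0-param0, c0-param1, c0-param2) triples covering `total` comments.

    Assumes c0-param1 is the limit and c0-param2 is the offset (unverified).
    """
    for offset in range(0, total, page_size):
        limit = min(page_size, total - offset)
        yield ("string:%s" % ckey, "number:%d" % limit, "number:%d" % offset)


ckey = "fks_094067082083086070082083080095085081083068093095082074085"
for triple in page_params(ckey, page_size=10, total=25):
    print(triple)
```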

 

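【Lessons learned】

1. As long as you have the source code, in theory no problem is unsolvable.

2. On top of having the source, you need a clear head, clear and broad thinking, the relevant basic knowledge and perseverance. With those, the problem gets solved in the end.

3. On background knowledge: reaching a result here also owes something to prior experience. Having previously analyzed how Baidu Space posts fetch their comments gave me the foundation to analyze how the more complex 163 blog fetches its comment data.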

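Latest reader comments (1)

  1. How do you turn pages? In the source only batchId differs. Is paging done through it? If so, it follows no regular pattern, so how can it be extracted?
    jeff, 2 years ago (2016-08-01)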