【Record】Adding support to BlogsToWordPress for exporting NetEase mood notes (心情随笔)

【Background】

Until now, BlogsToWordpress did not support NetEase's mood notes (心情随笔).

This post records the work of adding that feature.

【Solution process】

1. Running:

BlogsToWordpress.py -s http://blog.163.com/ni_chen

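unexpectedly failed to find even the URL of the first post.

2. So I debugged the NetEase blog with Firebug and found that the request that fetches the post information had changed from a GET to a POST.

So the old GET code,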

getBlogUrl = genGetBlogsUrl(userId, startBlogIdx, onceGetNum);
logging.info("getBlogUrl=%s", getBlogUrl);
# get blogs
blogsResp = crifanLib.getUrlRespHtml(getBlogUrl);
  
  
#------------------------------------------------------------------------------
# generate get blogs URL
def genGetBlogsUrl(userId, startBlogIdx, onceGetNum):
    getBlogsUrl = '';
  
    try :
        # http://api.blog.163.com/againinput4/dwr/call/plaincall/BlogBeanNew.getBlogs.dwr
        # callCount=1
        # scriptSessionId=${scriptSessionId}187
        # c0-scriptName=BlogBeanNew
        # c0-methodName=getBlogs
        # c0-id=0
        # c0-param0=number:172799491
        # c0-param1=number:0
        # c0-param2=number:20
        # batchId=955290
  
        paraDict = {
            'callCount'     :   '1',
            'scriptSessionId':  '${scriptSessionId}187',
            'c0-scriptName' :   'BlogBeanNew',
            'c0-methodName' :   'getBlogs',
            'c0-id'         :   '0',
            'c0-param0'     :   '',
            'c0-param1'     :   '',
            'c0-param2'     :   '',
            'batchId'       :   '1',
        };
        paraDict['c0-param0'] = "number:" + str(userId);
        paraDict['c0-param1'] = "number:" + str(startBlogIdx);
        paraDict['c0-param2'] = "number:" + str(onceGetNum);
          
        mainUrl = gConst['blogApi163'] + '/' + gVal['blogUser'] + '/' + 'dwr/call/plaincall/BlogBeanNew.getBlogs.dwr';
        getBlogsUrl = crifanLib.genFullUrl(mainUrl, paraDict);
  
        logging.debug("Generated get blogs url %s", getBlogsUrl);
    except :
        logging.debug("Can not generate get blog url.");
  
    return getBlogsUrl;

needs to be replaced with new POST code.

3. While debugging I found:

The post data is not the usual &-separated form; the fields are separated by line breaks:

callCount=1
scriptSessionId=${scriptSessionId}187
c0-scriptName=BlogBeanNew
c0-methodName=getBlogs
c0-id=0
c0-param0=number:186541395
c0-param1=number:0
c0-param2=number:1
batchId=123756

So the format is rather unusual.
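A minimal sketch of building such a line-separated body, assuming Python 3 and only the standard language features. The helper name `build_dwr_body` is hypothetical and is not part of crifanLib. The `\r\n` delimiter matches the `postDataDelimiter` used in the final code of this post, and the pairs are kept in a list so that the field order is explicit.

```python
def build_dwr_body(pairs, delimiter="\r\n"):
    """Join key=value pairs with a line delimiter instead of '&'."""
    return delimiter.join("%s=%s" % (key, value) for key, value in pairs)


# Field values copied from the request captured with Firebug above.
pairs = [
    ("callCount", "1"),
    ("scriptSessionId", "${scriptSessionId}187"),
    ("c0-scriptName", "BlogBeanNew"),
    ("c0-methodName", "getBlogs"),
    ("c0-id", "0"),
    ("c0-param0", "number:186541395"),
    ("c0-param1", "number:0"),
    ("c0-param2", "number:1"),
    ("batchId", "123756"),
]

body = build_dwr_body(pairs)
print(body)
```

Note that the values are not URL-encoded here, since the captured request shows them as plain text.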

While working on this, I also ran into an error:

LINE 959 : INFO     getBlogsDwrMainUrl=http://api.blog.163.com/ni_chen/dwr/call/plaincall/BlogBeanNew.getBlogs.dwr
LINE 971 : INFO     postData=c0-id=0
batchId=1
c0-param1=number:0
scriptSessionId=${scriptSessionId}187
c0-param2=number:400
c0-methodName=getBlogs
c0-param0=number:186541395
c0-scriptName=BlogBeanNew
callCount=1
LINE 973 : INFO     req=<urllib2.Request instance at 0x0000000002F45C88>
LINE 1011 : INFO     resp=<addinfourl at 49572616L whose fp = <socket._fileobject object at 0x0000000002F2B408>>
LINE 1016 : INFO     gVal['cj']=<_LWPCookieJar.LWPCookieJar[<Cookie USERTRACK=221.224.111.74.1366017130803085 for .163.com/>, <Cookie NTESBLOGSI=C625520A2DCC43BB15C722EA6984CD64.app-64-8010 for .blog.163.com/>]>
LINE 1013 : INFO     blogsDwrRespHtml=//#DWR-REPLY
if (window.dwr) dwr.engine._remoteHandleBatchException({ name:'org.directwebremoting.extend.ServerException', message:'The specified call count is not a number' });
else if (window.parent.dwr) window.parent.dwr.engine._remoteHandleBatchException({ name:'org.directwebremoting.extend.ServerException', message:'The specified call count is not a number' });

Reference:

The "The specified call count is not a number" error when calling a DWR application with httpclient

Following that, I changed the default:

req.add_header('Content-Type', "application/x-www-form-urlencoded");

to:

req.add_header('Content-Type', "text/plain");

Only then did I finally get the content returned by DWR:

//#DWR-INSERT

//#DWR-REPLY

var s0={};var s1={};var s2={};var s3=[];s0.abstractSysGen=1;s0.accessCount=20;s0.allowComment=-100;s0.allowView=-100;s0.blogAbstract="<div><br></div><div><div style=\"line-height: 22px;\" ><font size=\"4\" style=\"line-height: 28px;\" ><br style=\"line-height: 28px;\" ></font></div><div style=\"line-height: 22px;\" ><font size=\"4\" style=\"line-height: 28px;\" ><br style=\"line-height: 28px;\" ></font></div></div><font size=\"4\" >\u535A\u5BA2\u5DF2\u7ECF\u5168\u90E8\u642C\u8D70\u4E86\uFF0C\u8BF7\u79FB\u9A7E\u81F3\uFF1A</font><a target=\"_blank\" rel=\"nofollow\" href=\"http://nichen.info/blogs\" >http://nichen.info</a><br><div><font size=\"4\" ><br></font></div><div><font size=\"4\" ><br></font></div>";s0.blogAttachments=null;s0.blogCount=s1;s0.blogExt=s2;s0.circleCount=0;s0.circleIdList=s3;s0.circleIds=null;s0.classId="fks_084064092094082071087087087095085094087070080087085074081";s0.className="\u968F\u7B14";s0.commentCount=0;s0.comments=null;s0.content="<div><br></div><div><div style=\"line-height: 22px;\" ><font size=\"4\" style=\"line-height: 28px;\" ><br style=\"line-height: 28px;\" ></font></div><div style=\"line-height: 22px;\" ><font size=\"4\" style=\"line-height: 28px;\" ><br style=\"line-height: 28px;\" ></font></div></div><font size=\"4\" >\u535A\u5BA2\u5DF2\u7ECF\u5168\u90E8\u642C\u8D70\u4E86\uFF0C\u8BF7\u79FB\u9A7E\u81F3\uFF1A</font><a target=\"_blank\" rel=\"nofollow\" href=\"http://nichen.info/blogs\" >http://nichen.info</a><br><div><font size=\"4\" ><br></font></div><div><font size=\"4\" 
><br></font></div>";s0.contentPlainText=null;s0.id="fks_087065081087083065081085085071072087089069081082087064093083";s0.ip="147.46.115.126";s0.isBlogAbstractComplete=false;s0.isPublished=1;s0.keyName="ID";s0.keyWordCheckedState=0;s0.lastAccessCountUpdateTime=1365142180844;s0.matchedKeyWord=false;s0.modifyTime=1365142180643;s0.moveFrom="NONE";s0.permaSerial="18654139520132782258253";s0.permalink="blog/static/18654139520132782258253";s0.photoIds=null;s0.photoStoreTypes=null;s0.publishTime=1362615778253;s0.publishTimeStr="8:22:58";s0.publisherId=0;s0.publisherNickname=null;s0.publisherUsername=null;s0.rank=5;s0.recomBlogHome=false;s0.ref=false;s0.shortPublishDateStr="2013-3-7";s0.synchMiniBlog=-1;s0.tag="";s0.title="\u642C\u8D70\u5566";s0.trackbackCount=0;s0.trackbackUrl="blog/18654139520132782258253.track";s0.userId=186541395;s0.userName="ni_chen";s0.userNickname="Neysa";s0.valid=0;s0.zipContent=null;

s1.accessCount=20;s1.blogId=1251225334;s1.commentCount=0;s1.mainCommentCount=0;s1.permaSerial="18654139520132782258253";s1.recommendCount=0;s1.trackbackCount=0;s1.userId=186541395;

s2.blogId=1251225334;s2.doubanResourceInfo=null;s2.miniBlogCard=0;s2.userId=186541395;s2.voteId=0;

dwr.engine._remoteHandleCallback('1','0',[s0]);

From here, this content can be parsed as usual.
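As a rough illustration of what such parsing might look like, here is a minimal sketch, assuming Python 3 and the standard library only. It is not the parser used in BlogsToWordpress. The helper names `is_dwr_error` and `get_dwr_string_field` are hypothetical. The sketch pulls one quoted string field out of the reply with a regular expression and decodes the `\uXXXX` escapes by treating the value as a JSON string, which works for the escapes seen in the reply above but may fail on JavaScript escapes that JSON does not allow.

```python
import json
import re


def is_dwr_error(reply):
    """True if the reply is a DWR batch exception, like the one logged earlier."""
    return "_remoteHandleBatchException" in reply


def get_dwr_string_field(reply, var_name, field):
    """Return the decoded value of var_name.field="..." or None if absent."""
    pattern = r'\b%s\.%s="((?:[^"\\]|\\.)*)"' % (re.escape(var_name), re.escape(field))
    match = re.search(pattern, reply)
    if match is None:
        return None
    # Assumption: the JavaScript string literal is also a valid JSON string.
    return json.loads('"' + match.group(1) + '"')


# A shortened fragment of the reply shown above.
sample = 's0.tag="";s0.title="\\u642C\\u8D70\\u5566";s0.userName="ni_chen";'

title = get_dwr_string_field(sample, "s0", "title")
user_name = get_dwr_string_field(sample, "s0", "userName")
print(title, user_name)
```

Non-string fields such as `s0.userId=186541395` are not covered by this pattern and would need a separate expression.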

4. Finally, the POST code below successfully gets the content:

blogsDwrRespHtml = getBlogsDwrRespHtml(userId, startBlogIdx, onceGetNum);
logging.debug("blogsDwrRespHtml=%s", blogsDwrRespHtml);
  
def getBlogsDwrRespHtml(userId, startBlogIdx, onceGetNum):
    # getBlogUrl = genGetBlogsUrl(userId, startBlogIdx, onceGetNum);
    # logging.info("getBlogUrl=%s", getBlogUrl);
    # # get blogs
    # blogsRespHtml = crifanLib.getUrlRespHtml(getBlogUrl);
      
    #change GET to POST
  
    #http://api.blog.163.com/ni_chen/dwr/call/plaincall/BlogBeanNew.getBlogs.dwr
    #callCount=1
    #scriptSessionId=${scriptSessionId}187
    #c0-scriptName=BlogBeanNew
    #c0-methodName=getBlogs
    #c0-id=0
    #c0-param0=number:186541395
    #c0-param1=number:0
    #c0-param2=number:1
    #batchId=494302
      
    # http://api.blog.163.com/againinput4/dwr/call/plaincall/BlogBeanNew.getBlogs.dwr
    # callCount=1
    # scriptSessionId=${scriptSessionId}187
    # c0-scriptName=BlogBeanNew
    # c0-methodName=getBlogs
    # c0-id=0
    # c0-param0=number:172799491
    # c0-param1=number:0
    # c0-param2=number:20
    # batchId=955290
  
    postDict = {
        'callCount'     :   '1',
        'scriptSessionId':  '${scriptSessionId}187',
        'c0-scriptName' :   'BlogBeanNew',
        'c0-methodName' :   'getBlogs',
        'c0-id'         :   '0',
        'c0-param0'     :   '',
        'c0-param1'     :   '',
        'c0-param2'     :   '',
        'batchId'       :   '1',
    };
    postDict['c0-param0'] = "number:" + str(userId);
    postDict['c0-param1'] = "number:" + str(startBlogIdx);
    postDict['c0-param2'] = "number:" + str(onceGetNum);
    #http://api.blog.163.com/ni_chen/dwr/call/plaincall/BlogBeanNew.getBlogs.dwr
    getBlogsDwrMainUrl = gConst['blogApi163'] + '/' + gVal['blogUser'] + '/' + 'dwr/call/plaincall/BlogBeanNew.getBlogs.dwr';
    logging.debug("getBlogsDwrMainUrl=%s", getBlogsDwrMainUrl);
          
    #Referer    http://api.blog.163.com/crossdomain.html?t=20100205
    headerDict = {
        'Referer'       :   'http://api.blog.163.com/crossdomain.html?t=20100205',
        'Content-Type'  :   "text/plain",
    };
    blogsRespHtml = crifanLib.getUrlRespHtml(getBlogsDwrMainUrl, postDict=postDict, headerDict=headerDict, postDataDelimiter='\r\n');
  
    logging.debug("blogsRespHtml=%s", blogsRespHtml);
          
    return blogsRespHtml;
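For readers without crifanLib, the same request could be sketched with the Python 3 standard library roughly as follows. This is an assumption-laden illustration, not the code used above: `make_dwr_request` is a hypothetical helper, the body is shortened to two fields, and the request is only constructed, not sent, since the endpoint may no longer respond.

```python
import urllib.request


def make_dwr_request(url, body, referer):
    """Build a POST request whose body is sent as text/plain."""
    headers = {
        "Content-Type": "text/plain",  # not application/x-www-form-urlencoded
        "Referer": referer,
    }
    # Passing data makes urllib issue a POST instead of a GET.
    return urllib.request.Request(url, data=body.encode("utf-8"), headers=headers)


req = make_dwr_request(
    "http://api.blog.163.com/ni_chen/dwr/call/plaincall/BlogBeanNew.getBlogs.dwr",
    "callCount=1\r\nbatchId=1",  # shortened body, for illustration only
    "http://api.blog.163.com/crossdomain.html?t=20100205",
)
print(req.get_method())
```

Sending it would be a call to `urllib.request.urlopen(req)`, followed by reading and decoding the response.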

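5. Adding the NetEase mood notes export itself is left for later.

【Postscript 2013-09-22】

1. Later, fetching the mood notes was implemented:

【Tutorial】Using the recent-reader information in NetEase blog posts as an example, a step-by-step guide to scraping content from dynamic web pages

and:

【Record】Parsing the DWR-REPLY data returned by the NetEase 163 blog mood notes FeelingCard with Python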

