【已解决】PySpider中如何下载mp4视频文件到本地

折腾：

期间，此处已经抓取到mp4视频地址了：

showVideoCallback: response.url=http://xx.xx?m=home&c=match_new&a=video&show_id=1xxx1

curShowInfoDict={‘act_id’: ‘3’, ‘child_type’: ‘1’, ‘course_id’: ‘xx’, ‘cover_img’: ‘https://xx.xx/2018-06-04/5b14e22b8850a.jpg’, ‘create_time’: ‘1512737780’, ‘head_img’: ‘https://xx.xx/2018-06-23/5b2de55693ad9.jpg’, ‘href’: ‘/index.php?m=home&c=match_new&a=video&show_id=1xxx1’, ‘id’: ‘489’, ‘match_type’: ‘2’, ‘name’: ‘唐xx’, ‘rewards’: ‘0’, ‘scores’: ‘63.20’, ‘shares’: ‘2’, ‘show_id’: ‘1xx1’, ‘show_score’: ‘0’, ‘status’: ‘1’, ‘supports’: ‘104’, ‘uid’: ‘5xx’}

title=学学xx

{‘name’: ‘唐xx’,

‘scores’: ‘63.20’,

‘shares’: ‘2’,

‘show_id’: ‘1xxx1’,

‘supports’: ‘104’,

‘title’: ‘学学xx’,

‘url’: ‘http://xx.xx?m=home&c=match_new&a=video&show_id=1xxx1’,

‘videoUrl’: ‘https://xx.xx/2017-12-08/id15xxxuxx.mp4’}

然后就需要去搞清楚：

PySpider中如何下载文件

目前能想到的是：

难道要用requests第三方库去直接下载和保存文件？

pyspider 下载文件

pyspider示例代码七：自动登陆并获得PDF文件下载地址 – microman – 博客园

pyspider – pysipder下载文件超时 – SegmentFault 思否

用的是：urllib2.urlopen去下载文件的

[python]使用pyspider下载meizitu的图片 – 简书

结果 – pyspider中文文档 – pyspider中文网

[python]使用pyspider下载meizitu的图片 – 简书

倒是可以考虑借用PySpider的自带response，把content保存到本地文件中去的。

常见问题 · pyspider文档 · 看云

pyspider—爬取下载图片 – silianpan – 博客园

【总结】

PySpider中用：

import re
import os
import codecs
import json
OutputFullPath = "/Users/crifan/dev/dev_root/xxx/output"
    def downloadVideoCallback(self, response):
        itemUrl = response.url
        print("downloadVideoCallback: itemUrl=%s,response=%s" % (itemUrl, response))
        if not os.path.exists(OutputFullPath):
            os.makedirs(OutputFullPath)
            print("Ok to create folder %s" % OutputFullPath)
        itemInfoDict = response.save
        filename = "%s-%s-%s" % (
            itemInfoDict["show_id"],
            itemInfoDict["name"],
            itemInfoDict["title"])
        print("filename=%s" % filename)
        jsonFilename = filename + ".json"
        videoSuffix = itemUrl.split(".")[-1]
        videoFileName = filename + "." + videoSuffix
        print("jsonFilename=%s,videoSuffix=%s,videoFileName=%s" % (jsonFilename, videoSuffix, videoFileName))
        # http://xx.xx/index.php?m=home&c=match_new&a=video&show_id=xxx4
        # "梁x"
        # "57.20"
        # "2"
        # "1xx34"
        # "94"
        # "跟xx心"
        # "http://xx.x.x/index.php?m=home&c=match_new&a=video&show_id=xx4"
        # "https://xx.x.x./2017-12-08/15126857208283977349.mp4"
        jsonFilePath = os.path.join(OutputFullPath, jsonFilename)
        print("jsonFilePath=%s" % jsonFilePath)
        self.saveJsonToFile(jsonFilePath, itemInfoDict)
        videoBinData = response.content
        videoFilePath = os.path.join(OutputFullPath, videoFileName)
        self.saveDataToFile(videoFilePath, videoBinData)
    def saveDataToFile(self, fullFilename, binaryData):
        with open(fullFilename, ‘wb’) as fp:
            fp.write(binaryData)
            fp.close()
            print("Complete save file %s" % fullFilename)
    def saveJsonToFile(self, fullFilename, jsonValue):
        with codecs.open(fullFilename, ‘w’, encoding="utf-8") as jsonFp:
            json.dump(jsonValue, jsonFp, indent=2, ensure_ascii=False)
            print("Complete save json %s" % fullFilename)

实现了保存文件到对应目录：

【后记20190411】

后来整理出下载二进制文件的函数：

def saveDataToFile(fullFilename, binaryData):
    """save binary data info file"""
    with open(fullFilename, 'wb') as fp:
        fp.write(binaryData)
        fp.close()
        print("Complete save file %s" % fullFilename)

class Handler(BaseHandler):

    def downloadFileCallback(self, response):
        fileInfo = response.save
        print("fileInfo=%s" % fileInfo)

        binData = response.content
        fileFullPath = os.path.join(fileInfo["saveFolder"], fileInfo["filename"])
        print("fileFullPath=%s" % fileFullPath)
        saveDataToFile(fileFullPath, binData)

    def downloadFile(self, fileInfo):
        urlToDownload = fileInfo["fileUrl"]
        print("urlToDownload=%s" % urlToDownload)
        self.crawl(urlToDownload,
            callback=self.downloadFileCallback,
            save=fileInfo)

调用举例：

      # download audio file
      #   "path": "Audio/1808/20180911222516379.mp3",
      audioFileUrlTail = singleAudioDict["path"]
      print("audioFileUrlTail=%s" % audioFileUrlTail)
      if audioFileUrlTail:
          audioFileInfo = {
              "fileUrl": gResourcesRoot + "/" + audioFileUrlTail,
              "filename": ("Aduios_%s_" % audioId) + audioFileUrlTail.replace("/", "_"),
              "saveFolder": curSingleAudioFolder,
          }
          self.downloadFile(audioFileInfo)

转载请注明：在路上 » 【已解决】PySpider中如何下载mp4视频文件到本地

Post Views: 2,446

与本文相关的文章