最新消息:20181230 VPS服务器已从Linode换到腾讯云香港,主题仍用朋友推荐的大前端D8

【已解决】PySpider中如何下载mp4视频文件到本地

pyspider crifan 125浏览 0评论

折腾:

【已解决】使用PySpider去爬取某网站中的视频

期间,此处已经抓取到mp4视频地址了:

showVideoCallback: response.url=http://xx.xx?m=home&c=match_new&a=video&show_id=1xxx1

curShowInfoDict={‘act_id’: ‘3’, ‘child_type’: ‘1’, ‘course_id’: ‘xx’, ‘cover_img’: ‘https://xx.xx/2018-06-04/5b14e22b8850a.jpg’, ‘create_time’: ‘1512737780’, ‘head_img’: ‘https://xx.xx/2018-06-23/5b2de55693ad9.jpg’, ‘href’: ‘/index.php?m=home&c=match_new&a=video&show_id=1xxx1’, ‘id’: ‘489’, ‘match_type’: ‘2’, ‘name’: ‘唐xx’, ‘rewards’: ‘0’, ‘scores’: ‘63.20’, ‘shares’: ‘2’, ‘show_id’: ‘1xx1’, ‘show_score’: ‘0’, ‘status’: ‘1’, ‘supports’: ‘104’, ‘uid’: ‘5xx’}

title=学学xx

{‘name’: ‘唐xx’,

‘scores’: ‘63.20’,

‘shares’: ‘2’,

‘show_id’: ‘1xxx1’,

‘supports’: ‘104’,

‘title’: ‘学学xx’,

‘url’: ‘http://xx.xx?m=home&c=match_new&a=video&show_id=1xxx1’,

‘videoUrl’: ‘https://xx.xx/2017-12-08/id15xxxuxx.mp4’}

然后就需要去搞清楚:

PySpider中如何下载文件

目前能想到的是:

难道要用requests第三方库去直接下载和保存文件?

pyspider 下载文件

pyspider示例代码七:自动登陆并获得PDF文件下载地址 – microman – 博客园

pyspider – pysipder下载文件超时 – SegmentFault 思否

用的是:urllib2.urlopen去下载文件的

[python]使用pyspider下载meizitu的图片 – 简书

结果 – pyspider中文文档 – pyspider中文网

[python]使用pyspider下载meizitu的图片 – 简书

倒是可以考虑借用PySpider的自带response,把content保存到本地文件中去的。

常见问题 · pyspider文档 · 看云

pyspider—爬取下载图片 – silianpan – 博客园

【总结】

PySpider中用:

import re

import os

import codecs

import json

OutputFullPath = "/Users/crifan/dev/dev_root/xxx/output"

    def downloadVideoCallback(self, response):

        itemUrl = response.url

        print("downloadVideoCallback: itemUrl=%s,response=%s" % (itemUrl, response))

        if not os.path.exists(OutputFullPath):

            os.makedirs(OutputFullPath)

            print("Ok to create folder %s" % OutputFullPath)

        itemInfoDict = response.save

        filename = "%s-%s-%s" % (

            itemInfoDict["show_id"],

            itemInfoDict["name"],

            itemInfoDict["title"])

        print("filename=%s" % filename)

        jsonFilename = filename + ".json"

        videoSuffix = itemUrl.split(".")[-1]

        videoFileName = filename + "." + videoSuffix

        print("jsonFilename=%s,videoSuffix=%s,videoFileName=%s" % (jsonFilename, videoSuffix, videoFileName))

        # http://xx.xx/index.php?m=home&c=match_new&a=video&show_id=xxx4

        # "梁x"

        # "57.20"

        # "2"

        # "1xx34"

        # "94"

        # "跟xx心"

        # "http://xx.x.x/index.php?m=home&c=match_new&a=video&show_id=xx4"

        # "https://xx.x.x./2017-12-08/15126857208283977349.mp4"

        jsonFilePath = os.path.join(OutputFullPath, jsonFilename)

        print("jsonFilePath=%s" % jsonFilePath)

        self.saveJsonToFile(jsonFilePath, itemInfoDict)

        videoBinData = response.content

        videoFilePath = os.path.join(OutputFullPath, videoFileName)

        self.saveDataToFile(videoFilePath, videoBinData)

    def saveDataToFile(self, fullFilename, binaryData):

        with open(fullFilename, ‘wb’) as fp:

            fp.write(binaryData)

            fp.close()

            print("Complete save file %s" % fullFilename)

    def saveJsonToFile(self, fullFilename, jsonValue):

        with codecs.open(fullFilename, ‘w’, encoding="utf-8") as jsonFp:

            json.dump(jsonValue, jsonFp, indent=2, ensure_ascii=False)

            print("Complete save json %s" % fullFilename)

实现了保存文件到对应目录:

转载请注明:在路上 » 【已解决】PySpider中如何下载mp4视频文件到本地

发表我的评论
取消评论

表情

Hi,您需要填写昵称和邮箱!

  • 昵称 (必填)
  • 邮箱 (必填)
  • 网址
61 queries in 0.095 seconds, using 9.95MB memory