
[Solved] Autohome car model and series data: handling models whose basic-parameter field count differs

Author: crifan
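While debugging the change from:
[Solved] Autohome car model and series data: dropping JS to speed up crawling of model parameter configs
the run failed with: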
list index out of range
mIndex=16
in getItemFirstValue
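The cause is obvious:
the index is past the end of the list.
I went to the page that triggers the error to debug it,
and found the cause.
For
an earlier gasoline car:
【汉兰达 2018款 2.0T 两驱精英版 5座 国VI参数配置表】价格单_丰田_汽车之家
the page has a certain group of fields,
whereas for this one:
【九龙A6 2018款 2.0T豪华型参数配置表】价格单_九龙_汽车之家
those fields are missing.
So:
I need to work out the difference between the two:
when are those fields present,
and when are they absent?
Could it be that:
https://car.autohome.com.cn/config/spec/1006466.html
which is:
车身结构:客车 (body structure: bus)
lacks them, while an ordinary car, for example
车身结构:5门5座SUV (body structure: 5-door 5-seat SUV)
has them? The fields all look acceleration-related.
Buses presumably do not care about these figures, so they are left out?
So the field indexes I had assumed were fixed for gasoline cars: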
            if carEnergyType == "汽油":
                # https://car.autohome.com.cn/config/spec/43593.html
                # https://car.autohome.com.cn/config/spec/41572.html

                # self.processGasolineCar(valueContent, carModelDict)

                # https://car.autohome.com.cn/config/spec/1006466.html

                gasolineCarKeyIdxMapDict = {
                    "carModelEnvStandard" : 3,
                    "carModelReleaseTime" : 4,
                    "carModelMaxPower" : 5,
                    "carModelMaxTorque" : 6,
                    "carModelEngine" : 7,
                    "carModelGearBox" : 8,
                    "carModelSize" : 9,
                    "carModelBodyStructure" : 10,
                    "carModelMaxSpeed" : 11,
                    "carModelOfficialSpeedupTime" : 12,
                    "carModelActualTestSpeedupTime" : 13,
                    "carModelActualTestBrakeDistance" : 14,
                    "carModelMiitCompositeFuelConsumption" : 15,
                    "carModelActualFuelConsumption" : 16,
                }
                wholeWarrantyIdx = 17
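now have to be adjusted dynamically.
Of course, the ideal solution would be to match on the field names directly.
But the field names here are obfuscated with JS and CSS (parts of the text are replaced by empty <span> placeholders),
so they cannot be matched directly.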
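To see why name matching fails, here is a minimal sketch that replaces each empty placeholder span with a marker and prints what plain text is left. The helper name visible_name and the "[?]" marker are made up for illustration; the sample names are copied from the JSON dumps in this post.

```python
import re

# Matches the empty placeholder spans seen in the field names, e.g.
# <span class='hs_kw8_configpl'></span>
PLACEHOLDER = re.compile(r"<span class='hs_kw\d+_\w+'></span>")


def visible_name(raw_name, marker="[?]"):
    """Replace each placeholder span with a marker to show the remaining plain text."""
    return PLACEHOLDER.sub(marker, raw_name)


# Field names as they appear in the config JSON (hidden text is unknown here).
print(visible_name("整车<span class='hs_kw36_configpl'></span>"))
print(visible_name("<span class='hs_kw8_configpl'></span><span class='hs_kw2_configpl'></span>(N·m)"))
```

The class suffix (configpl, configrI, ...) differs from page to page, so the placeholder class names themselves are not a stable key either.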
Then I noticed a detail:
the ids appear to be consistent.
Pure electric:
debug/奥迪Q2L_etron_纯电智享型_39893_notRunJs_config.json
{
        "id": 1186,
        "name": "<span class='hs_kw8_configpl'></span><span class='hs_kw2_configpl'></span>(N·m)",
        "pnid": "1_-1",
        "valueitems": [{
          "specid": 39893,
          "value": "290"
        }, {
          "specid": 42875,
          "value": "290"
        }]
      }, 
。。。
{
        "id": 1246,
        "name": "最高车速(km/h)",
        "pnid": "1_-1",
        "valueitems": [{
          "specid": 39893,
          "value": "150"
        }, {
          "specid": 42875,
          "value": "150"
        }]
      }
。。。
      }, {
        "id": 1255,
        "name": "整车<span class='hs_kw36_configpl'></span>",
        "pnid": "1_-1",
        "valueitems": [{
          "specid": 39893,
          "value": "三<span class='hs_kw7_configpl'></span>10<span class='hs_kw1_configpl'></span>公里"
        }, {
          "specid": 42875,
          "value": "三<span class='hs_kw7_configpl'></span>10<span class='hs_kw1_configpl'></span>公里"
        }]
      }]
and:
Gasoline car
debug/九龙A6_1006466.json
{
  "id": 1186,
  "name": "<span class='hs_kw14_configrI'></span><span class='hs_kw4_configrI'></span>(N·m)",
  "pnid": "1_-1",
  "valueitems": [{
    "specid": 1006466,
    "value": "290"
  }, {
    "specid": 1006465,
    "value": "260"
  }, {
    "specid": 1006467,
    "value": "330"
  }]
},
。。。
}, {
  "id": 1246,
  "name": "最高车速(km/h)",
  "pnid": "1_-1",
  "valueitems": [{
    "specid": 1006466,
    "value": "-"
  }, {
    "specid": 1006465,
    "value": "-"
  }, {
    "specid": 1006467,
    "value": "-"
  }]
}, {
  "id": 1251,
  "name": "工信部<span class='hs_kw41_configrI'></span><span class='hs_kw35_configrI'></span>(L/100km)",
  "pnid": "1_-1",
  "valueitems": [{
    "specid": 1006466,
    "value": "-"
  }, {
    "specid": 1006465,
    "value": "-"
  }, {
    "specid": 1006467,
    "value": "-"
  }]
}, {
  "id": 1255,
  "name": "整车<span class='hs_kw48_configrI'></span>",
  "pnid": "1_-1",
  "valueitems": [{
    "specid": 1006466,
    "value": "-"
  }, {
    "specid": 1006465,
    "value": "-"
  }, {
    "specid": 1006467,
    "value": "-"
  }]
}]
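->
  • 1186: 最大扭矩(N·m) (max torque)
  • 1251: 工信部综合油耗(L/100km) (MIIT combined fuel consumption)
and so on.
-> So the wanted fields can be matched directly by the id in each field.
The logic becomes simpler,
and there is no need to check the energy type at all.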
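A minimal sketch of the id-based matching idea. The table PARAM_ID_TO_KEY and the helper extract_by_id are illustrative names, not the final code; the ids and sample values are taken from the JSON dumps above, and the sample list is shortened.

```python
# Illustrative subset of an id -> output-key table.
PARAM_ID_TO_KEY = {
    1186: "carModelMaxTorque",
    1246: "carModelMaxSpeed",
    1251: "carModelMiitCompositeFuelConsumption",
    1255: "carModelWholeWarranty",
}


def extract_by_id(param_items, spec_id):
    """Return {key: value} for one spec, matching fields by numeric id only."""
    result = {}
    for item in param_items:
        key = PARAM_ID_TO_KEY.get(item["id"])
        if key is None:
            continue  # unknown id or id == 0: not handled in this sketch
        for value_item in item["valueitems"]:
            if value_item["specid"] == spec_id:
                result[key] = value_item["value"]
                break
    return result


sample_items = [
    {"id": 1186, "name": "<span class='hs_kw14_configrI'></span><span class='hs_kw4_configrI'></span>(N·m)",
     "pnid": "1_-1",
     "valueitems": [{"specid": 1006466, "value": "290"}, {"specid": 1006465, "value": "260"}]},
    {"id": 1246, "name": "最高车速(km/h)", "pnid": "1_-1",
     "valueitems": [{"specid": 1006466, "value": "-"}, {"specid": 1006465, "value": "-"}]},
]

print(extract_by_id(sample_items, 1006465))
```

Because the lookup never uses a list position, a page with fewer fields simply yields fewer keys instead of raising an index error.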
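However, some fields seem to have an id of 0,
so that needs to be sorted out.
https://car.autohome.com.cn/config/spec/41572.html
I went through this page to list which id belongs to which field: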
  "id": 1149,
  "name": "能源类型",
and so on.
Some fields do have id 0:
  "id": 0,
  "name": "上市<span class='hs_kw51_configvR'></span>",
The pattern will be worked out later.

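So far, for
https://car.autohome.com.cn/config/spec/41572.html
only
  "id": 0,
  "name": "上市<span class='hs_kw51_configvR'></span>", # 上市时间 (release time)
has id 0; every other basic-parameter field has a non-zero id, which makes things easy.
I should check a few more pages to see whether the same holds.
Gasoline car
Then I realised that the HTML I had debugged earlier already contains a full set of field definitions with ids: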
debug/奥迪A3_configSpec_43593.html
        var keyLink = [{
            "id": 1339,
            "link": "https://car.autohome.com.cn/baike/detail_8_26_1339.html",
            "name": "<span class='hs_kw27_baikefn'></span><span class='hs_kw85_baikefn'></span>/<span class='hs_kw9_baikefn'></span>"
        }, {
            "id": 1340,
            "link": "https://car.autohome.com.cn/baike/detail_8_27_1340.html",
            "name": "尾门玻璃<span class='hs_kw63_baikefn'></span>开启"
        }, {
            "id": 1341,
            "link": "https://car.autohome.com.cn/baike/detail_8_30_1341.html",
            "name": "<span class='hs_kw76_baikefn'></span>数量"
        }, {
            "id": 1342,
            "link": "https://car.autohome.com.cn/baike/detail_8_31_1342.html",
            "name": "<span class='hs_kw12_baikefn'></span>大灯雨雾模式"
        }, {
            "id": 1343,
            "link": "https://car.autohome.com.cn/baike/detail_8_30_1343.html",
            "name": "车载CD/DVD"
        }, {
。。。
        }, {
            "id": 1234,
            "link": "https://car.autohome.com.cn/baike/detail_7_21_1234.html",
            "name": "<span class='hs_kw12_baikefn'></span>电动机<span class='hs_kw7_baikefn'></span><span class='hs_kw35_baikefn'></span>(kW)"
        }, {
            "id": 1242,
            "link": "https://car.autohome.com.cn/baike/detail_8_31_1242.html",
            "name": "车<span class='hs_kw12_baikefn'></span>雾灯"
        }, {
            "id": 1245,
            "link": "https://car.autohome.com.cn/baike/detail_7_18_1245.html",
            "name": "变速箱"
        }, {
            "id": 1246,
            "link": "https://car.autohome.com.cn/baike/detail_7_18_1246.html",
            "name": "最高车速(km/h)"
        }, {
            "id": 1250,
            "link": "https://car.autohome.com.cn/baike/detail_7_18_1250.html",
            "name": "官方0-100km/h加速(s)"
        }, {
            "id": 1251,
            "link": "https://car.autohome.com.cn/baike/detail_7_18_1251.html",
            "name": "工信部<span class='hs_kw22_baikefn'></span><span class='hs_kw17_baikefn'></span>(L/100km)"
        }, {
            "id": 1252,
            "link": "https://car.autohome.com.cn/baike/detail_7_18_1252.html",
            "name": "<span class='hs_kw68_baikefn'></span>0-100km/h加速(s)"
        }, {
            "id": 1253,
            "link": "https://car.autohome.com.cn/baike/detail_7_18_1253.html",
            "name": "<span class='hs_kw68_baikefn'></span>100-0km/h制动(m)"
        }, {
            "id": 1254,
            "link": "https://car.autohome.com.cn/baike/detail_7_18_1254.html",
            "name": "<span class='hs_kw68_baikefn'></span><span class='hs_kw17_baikefn'></span>(L/100km)"
        }, {
            "id": 1255,
            "link": "https://car.autohome.com.cn/baike/detail_7_18_1255.html",
            "name": "整车<span class='hs_kw77_baikefn'></span>"
        }, {
            "id": 1256,
            "link": "https://car.autohome.com.cn/baike/detail_7_19_1256.html",
            "name": "<span class='hs_kw68_baikefn'></span><span class='hs_kw2_baikefn'></span>(mm)"
        }, {
。。。
        }, {
            "id": 1290,
            "link": "https://car.autohome.com.cn/baike/detail_7_21_1290.html",
            "name": "百公里耗<span class='hs_kw56_baikefn'></span>(kWh/100km)"
        }, {
            "id": 1291,
            "link": "https://car.autohome.com.cn/baike/detail_7_21_1291.html",
            "name": "工信部纯电续航里程(km)"
        }, {
。。。
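However, for defining the specific fields this is less useful than I had hoped;
the fields still have to be studied and defined up front.
Searching for:
"id": 0,
shows that not many entries have id 0, only about 9.
The other several dozen all have an id.
Among the fields needed above, the special one is
上市时间 (release time),
whose id is currently 0.
Its entry looks like: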
                    }, {
                        "id": 0,
                        "name": "上市<span class='hs_kw61_configHa'></span>",
                        "pnid": "1_-1",
                        "valueitems": [{
                            "specid": 43593,
                            "value": "2020.04"
                        }, {
                            "specid": 42418,
                            "value": "2019.10"
                        }, {
。。。
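-> It can be identified by checking that:
  • the name starts with 上市 (or ends with 时间)
    • no other field elsewhere starts with 上市
      • so there is no ambiguity and the rule is usable
  • the value is in YYYY.MM format
So this is good enough for now.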
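The release-time rule can be sketched as a small predicate. The function name is_release_time_item is made up for illustration, and treating a single well-formed value as sufficient is an assumption (some specs might carry "-" instead of a date).

```python
import re

YEAR_MONTH = re.compile(r"\d{4}\.\d{2}")


def is_release_time_item(item):
    """Heuristic from the notes: id 0, name starts with 上市, value looks like YYYY.MM."""
    if item["id"] != 0 or not item["name"].startswith("上市"):
        return False
    values = [v["value"] for v in item.get("valueitems", [])]
    # Assumption: one well-formed value is enough to accept the item.
    return any(YEAR_MONTH.fullmatch(value) for value in values)


release_item = {
    "id": 0,
    "name": "上市<span class='hs_kw61_configHa'></span>",
    "pnid": "1_-1",
    "valueitems": [{"specid": 43593, "value": "2020.04"}, {"specid": 42418, "value": "2019.10"}],
}
# Another id-0 field; its value here is a placeholder, not real data.
other_item = {"id": 0, "name": "电动机(Ps)", "pnid": "1_-1",
              "valueitems": [{"specid": 39893, "value": "-"}]}

print(is_release_time_item(release_item), is_release_time_item(other_item))
```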
Next, fill in the fields for the other kinds of cars.
But after adding the electric-car fields:
            # electric car parameters
            {
                "id": 1291,
                "name": "工信部纯电续航里程(km)",
                "key": "carModelMiitEnduranceMileagePureElectric",
            }, {
                "id": 1292,
                # "name": "<span class='hs_kw39_configpl'></span><span class='hs_kw40_configpl'></span>(小时)",
                "name": "快充时间(小时)",
                "key": "carModelQuickCharge",
            }, {
                "id": 0,
                # "name": "<span class='hs_kw10_configpl'></span><span class='hs_kw40_configpl'></span>(小时)",
                "name": "慢充时间(小时)",
                "key": "carModelSlowCharge",
            }, {
                "id": 0,
                "name": "快充电量百分比",
                "key": "carModelQuickChargePercent",
            }, {
                "id": 0,
                "name": "电动机(Ps)",
                "key": "carModelHorsePowerElectric",
            }, {
                "id": 0,
                # "name": "<span class='hs_kw22_configpl'></span>续航里程(km)",
                "name": "实测续航里程(km)",
                "key": "carModelActualTestEnduranceMileage",
            }, {
                "id": 0,
                # "name": "<span class='hs_kw22_configpl'></span><span class='hs_kw39_configpl'></span><span class='hs_kw40_configpl'></span>(小时)",
                "name": "实测快充时间(小时)",
                "key": "carModelActualTestQuickCharge",
            }, {
                "id": 0,
                # "name": "<span class='hs_kw22_configpl'></span><span class='hs_kw10_configpl'></span><span class='hs_kw40_configpl'></span>(小时)",
                "name": "实测慢充时间(小时)",
                "key": "carModelActualTestSlowCharge",
            }
a problem showed up:
several fields have id 0,
and the raw name alone cannot tell them apart.
In particular:
                "id": 0,
                # "name": "<span class='hs_kw10_configpl'></span><span class='hs_kw40_configpl'></span>(小时)",
                "name": "慢充时间(小时)",

                "id": 0,
                # "name": "<span class='hs_kw22_configpl'></span><span class='hs_kw39_configpl'></span><span class='hs_kw40_configpl'></span>(小时)",
                "name": "实测快充时间(小时)",

                "id": 0,
                # "name": "<span class='hs_kw22_configpl'></span><span class='hs_kw10_configpl'></span><span class='hs_kw40_configpl'></span>(小时)",
                "name": "实测慢充时间(小时)",
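All three are charging times (慢充 = slow charge, 快充 = quick charge, 实测 = measured) and cannot be told apart by name at all.
If they really cannot be distinguished, then the last 2 fields:
  • 实测快充时间(小时)
  • 实测慢充时间(小时)
can simply be skipped,
because apart from:
https://car.autohome.com.cn/config/spec/42875.html
another pure electric car,
【丰田C-HR EV参数配置表】_丰田_丰田C-HR EV配置_价格单_汽车之家
also has these fields empty.
But for:
  • 慢充时间(小时)
there is always a value,
so it is better to crawl it.
If nothing else works, position can be used:
the item right before
慢充时间
is always
快充时间
->
and
快充时间
does have an id: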
            }, {
                "id": 1292,
                # "name": "<span class='hs_kw39_configpl'></span><span class='hs_kw40_configpl'></span>(小时)",
                "name": "快充时间(小时)",
                "key": "carModelQuickCharge",
            }, {
                "id": 0,
                # "name": "<span class='hs_kw10_configpl'></span><span class='hs_kw40_configpl'></span>(小时)",
                "name": "慢充时间(小时)",
                "key": "carModelSlowCharge",
            },
So first find the index of 快充时间,
add 1,
and if the item there has:
id=0
a name ending in (小时)
then it must be
慢充时间.
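A minimal sketch of this position rule, assuming the item list keeps page order. The function name is_slow_charge is illustrative; id 1292 and the sample names come from the dumps above.

```python
QUICK_CHARGE_ID = 1292  # 快充时间(小时) has a real id


def is_slow_charge(idx, items):
    """True if items[idx] has id 0, ends with (小时), and directly follows 快充时间."""
    if idx < 1:
        return False  # no previous item to look at
    cur, prev = items[idx], items[idx - 1]
    return (
        prev["id"] == QUICK_CHARGE_ID
        and cur["id"] == 0
        and cur["name"].endswith("(小时)")
    )


items = [
    {"id": 1292, "name": "<span class='hs_kw39_configpl'></span><span class='hs_kw40_configpl'></span>(小时)"},
    {"id": 0, "name": "<span class='hs_kw10_configpl'></span><span class='hs_kw40_configpl'></span>(小时)"},
    {"id": 0, "name": "<span class='hs_kw39_configMh'></span><span class='hs_kw11_configMh'></span>百分比"},
]

print([is_slow_charge(i, items) for i in range(len(items))])
```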
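Then it occurred to me that for:
  • 实测快充时间(小时)
  • 实测慢充时间(小时)
position can be used as well:
the two are always the 2 items right after
  • 实测续航里程(km)
https://car.autohome.com.cn/config/series/5270.html#pvareaid=3454437
So position works here too.
That is:
find
  • 实测续航里程(km)
and take the 2 items after it
(as long as that does not run past the end of the list).
If both of those items have:
id=0
a name ending in (小时)
then they are, in order:
  • 实测快充时间(小时)
  • 实测慢充时间(小时)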
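The same idea as a sketch: look 1 or 2 items back for 实测续航里程(km) and classify the current item accordingly. The function name classify_actual_test_charge is illustrative, and the sketch assumes the raw name of 实测续航里程 ends with a placeholder span followed by 续航里程(km), as in the dumps.

```python
import re

# 实测续航里程(km) appears as "<span ...></span>续航里程(km)" with id 0.
MILEAGE_PATTERN = re.compile(r"</span>续航里程\(km\)$")


def classify_actual_test_charge(idx, items):
    """Return the output key for items[idx], or None if the position rule does not apply."""
    cur = items[idx]
    if cur["id"] != 0 or not cur["name"].endswith("(小时)"):
        return None
    candidates = ((1, "carModelActualTestQuickCharge"), (2, "carModelActualTestSlowCharge"))
    for offset, key in candidates:
        anchor_idx = idx - offset
        if anchor_idx < 0:
            continue  # would run past the start of the list
        anchor = items[anchor_idx]
        if anchor["id"] == 0 and MILEAGE_PATTERN.search(anchor["name"]):
            return key
    return None


items = [
    {"id": 0, "name": "<span class='hs_kw22_configpl'></span>续航里程(km)"},
    {"id": 0, "name": "<span class='hs_kw22_configpl'></span><span class='hs_kw39_configpl'></span><span class='hs_kw40_configpl'></span>(小时)"},
    {"id": 0, "name": "<span class='hs_kw22_configpl'></span><span class='hs_kw10_configpl'></span><span class='hs_kw40_configpl'></span>(小时)"},
]

print([classify_actual_test_charge(i, items) for i in range(len(items))])
```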
With that settled, I wrote the code.
The current code (it needs copy, json and re imported at module level):
    @catch_status_code_error
    def carConfigSpecCallback(self, response):
        print("in carConfigSpecCallback")
        curCarModelDict = response.save
        print("curCarModelDict=%s" % curCarModelDict)
        carModelDict = copy.deepcopy(curCarModelDict)


        configSpecHtml = response.text
        # print("configSpecHtml=%s" % configSpecHtml)
        # print("")


        # # for debug
        # return


        # # config json item index - spec table html item index = 2
        # ItemIndexDiff = 2


        # isUseSpecTableHtml = True
        # isUseConfigJson = False
        # valueContent = None
        # energyTypeIdx = 2


        # # Method 1: after run js, extract item value from spec table html
        # """
        # <table class="tbcs" id="tab_0" style="width: 932px;">
        #     <tbody>
        #         <tr>
        #             <th class="cstitle" show="1" pid="tab_0" id="nav_meto_0" colspan="5">
        #             <h3><span>基本参数</span></h3>
        #             </th>
        #         </tr>
        #         <tr data-pnid="1_-1" id="tr_0">
        # """
        # tbodyDoc = response.doc("table[id='tab_0'] tbody")
        # print("tbodyDoc=%s" % tbodyDoc)
        # valueContent = tbodyDoc
        # isUseSpecTableHtml = True
        # isUseConfigJson = False
        # energyTypeIdx = 2


        # Method 2: not run js, extract item value from config json
        # get value from config json
        # var config = {"message" ...... "returncode":"0","taskid":"8be676a3-e023-4fa9-826d-09cd42a1810c","time":"2020-08-27 20:56:17"};
        foundConfigJson = re.search(r"var\s*config\s*=\s*(?P<configJson>\{[^;]+\});", configSpecHtml)
        print("foundConfigJson=%s" % foundConfigJson)
        if foundConfigJson:
            configJson = foundConfigJson.group("configJson")
            print("configJson=%s" % configJson)
            # configDict = json.loads(configJson, encoding="utf-8")
            configDict = json.loads(configJson)
            print("configDict=%s" % configDict)


            # if "result" in configDict:
            configResultDict = configDict["result"]
            print("configResultDict=%s" % configResultDict)
            # if "paramtypeitems" in configResultDict:
            paramTypeItemDictList = configResultDict["paramtypeitems"]
            print("paramTypeItemDictList=%s" % paramTypeItemDictList)
            # paramTypeItemNum = len(paramTypeItemDictList)
            # print("paramTypeItemNum=%s" % paramTypeItemNum)
            basicParamDict = paramTypeItemDictList[0]
            print("basicParamDict=%s" % basicParamDict)
            basicItemDictList = basicParamDict["paramitems"]
            print("basicItemDictList=%s" % basicItemDictList)
            # print("type(basicItemDictList)=%s" % type(basicItemDictList))
            # basicItemNum = len(basicItemDictList)
            # print("basicItemNum=%s" % basicItemNum)


            # valueContent = basicItemDictList
            # isUseSpecTableHtml = False
            # isUseConfigJson = True


            # process each basic parameter
            basicItemDictLen = len(basicItemDictList)
            print("basicItemDictLen=%s" % basicItemDictLen)
            for curIdx, eachItemDict in enumerate(basicItemDictList):
                print("[%d] eachItemDict=%s" % (curIdx, eachItemDict))
                curItemId = eachItemDict["id"]
                print("curItemId=%s" % curItemId)
                curItemName = eachItemDict["name"]
                print("curItemName=%s" % curItemName)
                curItemFirstValue = self.extractValueItemsValue(eachItemDict)
                print("curItemFirstValue=%s" % curItemFirstValue)


                curIdNameKeyMapDict = None
                if curItemId != 0:
                    curIdNameKeyMapDict = self.findMappingDict(curItemId)
                else:
                    # id = 0
                    foundSpan = re.search("<span", curItemName)
                    print("foundSpan=%s" % foundSpan)
                    isSpecialName = bool(foundSpan)
                    print("isSpecialName=%s" % isSpecialName)
                    if isSpecialName:
                        # id=0 and contain '<span' special name
                        foundSuffixHour = re.search(r"</span>\(小时\)$", curItemName)
                        print("foundSuffixHour=%s" % foundSuffixHour)
                        isSpecialSuffixHour = bool(foundSuffixHour)
                        print("isSpecialSuffixHour=%s" % isSpecialSuffixHour)
                        if isSpecialSuffixHour:
                            prevIsQuickCharge = self.isPrevItemIsQuickCharge(curIdx, basicItemDictList)
                            print("prevIsQuickCharge=%s" % prevIsQuickCharge)
                            if prevIsQuickCharge:
                                # current item must be 慢充时间(小时), i.e. slow charge time (hours)
                                curIdNameKeyMapDict = {
                                    "id": 0,
                                    # "name": "<span class='hs_kw10_configpl'></span><span class='hs_kw40_configpl'></span>(小时)",
                                    "name": "慢充时间(小时)",
                                    "namePattern": r"</span>\(小时\)$",
                                    "key": "carModelSlowCharge",
                                }
                            
                            if not curIdNameKeyMapDict:
                                prevIsActualTestEnduranceMileage = self.isPrevItemIsActualTestEnduranceMileage(curIdx, basicItemDictList)
                                print("prevIsActualTestEnduranceMileage=%s" % prevIsActualTestEnduranceMileage)
                                if prevIsActualTestEnduranceMileage:
                                    # current item must be 实测快充时间(小时), i.e. measured quick charge time (hours)
                                    curIdNameKeyMapDict = {
                                        "id": 0,
                                        # "name": "<span class='hs_kw22_configpl'></span><span class='hs_kw39_configpl'></span><span class='hs_kw40_configpl'></span>(小时)",
                                        "name": "实测快充时间(小时)",
                                        "namePattern": r"</span>\(小时\)$",
                                        "key": "carModelActualTestQuickCharge",
                                    }


                            if not curIdNameKeyMapDict:
                                prevPrevIsActualTestEnduranceMileage = self.isPrevPrevItemIsActualTestEnduranceMileage(curIdx, basicItemDictList)
                                print("prevPrevIsActualTestEnduranceMileage=%s" % prevPrevIsActualTestEnduranceMileage)
                                if prevPrevIsActualTestEnduranceMileage:
                                    # current item must be 实测慢充时间(小时), i.e. measured slow charge time (hours)
                                    curIdNameKeyMapDict = {
                                        "id": 0,
                                        # "name": "<span class='hs_kw22_configpl'></span><span class='hs_kw10_configpl'></span><span class='hs_kw40_configpl'></span>(小时)",
                                        "name": "实测慢充时间(小时)",
                                        "namePattern": r"</span>\(小时\)$",
                                        "key": "carModelActualTestSlowCharge",
                                    }
                        else:
                            curIdNameKeyMapDict = self.findMappingDict(0, curItemName)
                    else:
                        curIdNameKeyMapDict = self.findMappingDict(0, curItemName)


                print("curIdNameKeyMapDict=%s" % curIdNameKeyMapDict)
                if curIdNameKeyMapDict:
                    curItemKey = curIdNameKeyMapDict["key"]
                    print("curItemKey=%s" % curItemKey)
                    if curItemKey == "carModelWholeWarranty":
                        print("process special carModelWholeWarranty")
                        # 整车质保 (whole-vehicle warranty)
                        # 三<span class='hs_kw5_configJS'></span>10<span class='hs_kw0_configJS'></span>公里
                        print("curItemFirstValue=%s" % curItemFirstValue)
                        curItemFirstValue = self.extractWholeWarranty(curItemFirstValue)
                        print("curItemFirstValue=%s" % curItemFirstValue)


                    carModelDict[curItemKey] = curItemFirstValue
                    print("+++ added %s=%s" % (curItemKey, curItemFirstValue))


            print("after extract all item value: carModelDict=%s" % carModelDict)
            self.saveSingleResult(carModelDict)
        else:
            self.saveSingleResult(carModelDict)


        # if isUseConfigJson:
        #     energyTypeIdx += ItemIndexDiff


        # if valueContent:
        #     self.processDiffEneryTypeCar(carModelDict, valueContent, energyTypeIdx, isUseConfigJson, ItemIndexDiff)
        # else:
        #     self.saveSingleResult(carModelDict)


    def isPrevItemIsQuickCharge(self, curIdx, itemDictList):
        print("in isPrevItemIsQuickCharge")
        print("curIdx=%s" % curIdx)


        prevIsQuickCharge = False


        if curIdx > 0:
            prevIdx = curIdx - 1
            print("prevIdx=%s" % prevIdx)
            prevItemDict = itemDictList[prevIdx]
            print("prevItemDict=%s" % prevItemDict)
            prevItemId = prevItemDict["id"]
            print("prevItemId=%s" % prevItemId)
            prevItemName = prevItemDict["name"]
            print("prevItemName=%s" % prevItemName)
            """
                "id": 1292,
                # "name": "<span class='hs_kw39_configpl'></span><span class='hs_kw40_configpl'></span>(小时)",
                "name": "快充时间(小时)",
            """
            QuickChargeItemId = 1292
            if prevItemId == QuickChargeItemId:
                prevIsQuickCharge = True


        print("prevIsQuickCharge=%s" % prevIsQuickCharge)
        return prevIsQuickCharge


    def checkIsActualTestEnduranceMileage(self, prevSomeNum, curIdx, itemDictList):
        print("in checkIsActualTestEnduranceMileage")
        print("prevSomeNum=%s, curIdx=%s" % (prevSomeNum, curIdx))


        isActualTestEnduranceMileage = False


        minAllowIdx = prevSomeNum - 1


        if curIdx > minAllowIdx:
            prevSomeIdx = curIdx - prevSomeNum
            print("prevSomeIdx=%s" % prevSomeIdx)
            prevSomeItemDict = itemDictList[prevSomeIdx]
            print("prevSomeItemDict=%s" % prevSomeItemDict)
            prevSomeItemId = prevSomeItemDict["id"]
            print("prevSomeItemId=%s" % prevSomeItemId)
            prevSomeItemName = prevSomeItemDict["name"]
            print("prevSomeItemName=%s" % prevSomeItemName)


            if prevSomeItemId == 0:
                r"""
                    "id": 0,
                    # "name": "<span class='hs_kw22_configpl'></span>续航里程(km)",
                    "name": "实测续航里程(km)",
                    "namePattern": "</span>续航里程\(km\)$",
                    "key": "carModelActualTestEnduranceMileage",
                """
                foundActualTestEnduranceMileage = re.search(r"</span>续航里程\(km\)$", prevSomeItemName)
                print("foundActualTestEnduranceMileage=%s" % foundActualTestEnduranceMileage)
                if foundActualTestEnduranceMileage:
                    isActualTestEnduranceMileage = True


        print("isActualTestEnduranceMileage=%s" % isActualTestEnduranceMileage)
        return isActualTestEnduranceMileage


    def isPrevItemIsActualTestEnduranceMileage(self, curIdx, itemDictList):
        print("in isPrevItemIsActualTestEnduranceMileage")
        print("curIdx=%s" % curIdx)
        return self.checkIsActualTestEnduranceMileage(1, curIdx, itemDictList)


    def isPrevPrevItemIsActualTestEnduranceMileage(self, curIdx, itemDictList):
        print("in isPrevPrevItemIsActualTestEnduranceMileage")
        print("curIdx=%s" % curIdx)
        return self.checkIsActualTestEnduranceMileage(2, curIdx, itemDictList)


    def findMappingDict(self, itemId=0, itemName=""):
        foundMapDict = None


        paramIdNameKeyMapDict = [
            # gasoline car parameters
            # https://car.autohome.com.cn/config/spec/41572.html
            # https://car.autohome.com.cn/config/spec/1006465.html
            {
                "id": 1149,
                "name": "能源类型",
                "key": "carEnergyType",
            }, {
                "id": 1311,
                "name": "环保标准",
                "key": "carModelEnvStandard",
            }, {
                "id": 0,
                # "name": "上市<span class='hs_kw51_configvR'></span>", # 上市时间
                "name": "上市时间",
                "namePattern": "^上市",
                "key": "carModelReleaseTime",
            }, {
                "id": 1185,
                # "name": "<span class='hs_kw40_configvR'></span><span class='hs_kw15_configvR'></span>(kW)",
                "name": "最大功率(kW)",
                "key": "carModelMaxPower",
            }, {
                "id": 1186,
                # "name": "<span class='hs_kw40_configvR'></span><span class='hs_kw61_configvR'></span>(N·m)",
                "name": "最大扭矩(N·m)",
                "key": "carModelMaxTorque",
            }, {
                "id": 1150,
                "name": "发动机",
                "key": "carModelEngine",
            }, {
                "id": 1245,
                "name": "变速箱",
                "key": "carModelGearBox",
            }, {
                "id": 1148,
                "name": "长*宽*高(mm)",
                "key": "carModelSize",
            }, {
                "id": 1147,
                "name": "车身结构",
                "key": "carModelBodyStructure",
            }, {
                "id": 1246,
                "name": "最高车速(km/h)",
                "key": "carModelMaxSpeed",
            }, {
                "id": 1250,
                "name": "官方0-100km/h加速(s)",
                "key": "carModelOfficialSpeedupTime",
            }, {
                "id": 1252,
                # "name": "<span class='hs_kw26_configvR'></span>0-100km/h加速(s)",
                "name": "实测0-100km/h加速(s)",
                "key": "carModelActualTestSpeedupTime",
            }, {
                "id": 1253,
                # "name": "<span class='hs_kw26_configvR'></span>100-0km/h制动(m)",
                "name": "实测100-0km/h制动(m)",
                "key": "carModelActualTestBrakeDistance",
            }, {
                "id": 1251,
                # "name": "工信部<span class='hs_kw10_configvR'></span><span class='hs_kw43_configvR'></span>(L/100km)",
                "name": "工信部综合油耗(L/100km)",
                "key": "carModelMiitCompositeFuelConsumption",
            }, {
                "id": 1254,
                # "name": "<span class='hs_kw26_configvR'></span><span class='hs_kw43_configvR'></span>(L/100km)",
                "name": "实测油耗(L/100km)",
                "key": "carModelActualFuelConsumption",
            }, {
                "id": 1255,
                # "name": "整车<span class='hs_kw73_configvR'></span>",
                "name": "整车质保",
                "key": "carModelWholeWarranty",
            },


            # electric car parameters
            # https://car.autohome.com.cn/config/spec/39893.html
            # https://car.autohome.com.cn/config/spec/42875.html
            {
                "id": 1291,
                "name": "工信部纯电续航里程(km)",
                "key": "carModelMiitEnduranceMileagePureElectric",
            }, {
                "id": 1292,
                # "name": "<span class='hs_kw39_configpl'></span><span class='hs_kw40_configpl'></span>(小时)",
                "name": "快充时间(小时)",
                "key": "carModelQuickCharge",
            # }, {
            #     "id": 0,
            #     # "name": "<span class='hs_kw10_configpl'></span><span class='hs_kw40_configpl'></span>(小时)",
            #     "name": "慢充时间(小时)",
            #     "namePattern": "</span>\(小时\)$",
            #     "key": "carModelSlowCharge",
            }, {
                "id": 0,
                # https://car.autohome.com.cn/config/spec/39893.html
                # {'id': 0, 'name': "<span class='hs_kw39_configMh'></span><span class='hs_kw11_configMh'></span>百分比", 'pnid': '1_-1', 'valueitems': [{'specid': 39893, 'value': '80'}, {'specid': 42875, 'value': '80'}]}
                "name": "快充电量百分比",
                "namePattern": "</span>百分比$",
                "key": "carModelQuickChargePercent",
            }, {
                "id": 0,
                "name": "电动机(Ps)",
                "key": "carModelHorsePowerElectric",
            }, {
                "id": 0,
                # "name": "<span class='hs_kw22_configpl'></span>续航里程(km)",
                "name": "实测续航里程(km)",
                "namePattern": r"</span>续航里程\(km\)$",
                "key": "carModelActualTestEnduranceMileage",
            # }, {
            #     "id": 0,
            #     # "name": "<span class='hs_kw22_configpl'></span><span class='hs_kw39_configpl'></span><span class='hs_kw40_configpl'></span>(小时)",
            #     "name": "实测快充时间(小时)",
            #     "namePattern": "</span>\(小时\)$",
            #     "key": "carModelActualTestQuickCharge",
            # }, {
            #     "id": 0,
            #     # "name": "<span class='hs_kw22_configpl'></span><span class='hs_kw10_configpl'></span><span class='hs_kw40_configpl'></span>(小时)",
            #     "name": "实测慢充时间(小时)",
            #     "namePattern": "</span>\(小时\)$",
            #     "key": "carModelActualTestSlowCharge",
            }
        ]


        isItemZero = itemId == 0
        print("isItemZero=%s" % isItemZero)


        foundSpan = re.search("<span", itemName)
        print("foundSpan=%s" % foundSpan)
        isSpecialName = bool(foundSpan)
        print("isSpecialName=%s" % isSpecialName)
        isNotSpecialName = not isSpecialName
        print("isNotSpecialName=%s" % isNotSpecialName)


        if not isItemZero:
            for eachMapDict in paramIdNameKeyMapDict:
                eachItemId = eachMapDict["id"]
                if eachItemId == itemId:
                    foundMapDict = eachMapDict
                    break


        if not foundMapDict:
            if itemName and isNotSpecialName:
                for eachMapDict in paramIdNameKeyMapDict:
                    eachItemName = eachMapDict["name"]
                    if eachItemName == itemName:
                        foundMapDict = eachMapDict
                        break


        if not foundMapDict:
            if (isItemZero and isSpecialName):
                for eachMapDict in paramIdNameKeyMapDict:
                    if "namePattern" in eachMapDict:
                        eachItemNamePattern = eachMapDict["namePattern"]
                        print("eachItemNamePattern=%s" % eachItemNamePattern)
                        foundMatchName = re.search(eachItemNamePattern, itemName)
                        print("foundMatchName=%s" % foundMatchName)
                        if foundMatchName:
                            foundMapDict = eachMapDict
                            break
        print("foundMapDict=%s from id=%s, name=%s" % (foundMapDict, itemId, itemName))
        return foundMapDict
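The `var config = {...};` extraction step used in carConfigSpecCallback can be tried in isolation. This is a minimal sketch: the helper name extract_basic_items and the sample HTML are made up for illustration, and the regular expression is the one from the callback, so it shares the same limitation (it stops matching correctly if the JSON itself contains a semicolon).

```python
import json
import re

CONFIG_PATTERN = re.compile(r"var\s*config\s*=\s*(?P<configJson>\{[^;]+\});")


def extract_basic_items(html):
    """Return the first group's paramitems list from the embedded config JSON, or []."""
    found = CONFIG_PATTERN.search(html)
    if not found:
        return []
    config = json.loads(found.group("configJson"))
    param_type_items = config.get("result", {}).get("paramtypeitems", [])
    if not param_type_items:
        return []
    # The first group is assumed to be the basic parameters (基本参数).
    return param_type_items[0].get("paramitems", [])


# Hypothetical, heavily shortened page body.
sample_html = """<script>
var config = {"returncode": "0", "result": {"paramtypeitems": [{"name": "基本参数", "paramitems": [{"id": 1149, "name": "能源类型", "valueitems": [{"specid": 41572, "value": "汽油"}]}]}]}};
</script>"""

for item in extract_basic_items(sample_html):
    print(item["id"], item["name"], item["valueitems"][0]["value"])
```

Using .get with defaults means a page without the expected structure yields an empty list rather than a KeyError, which keeps the caller's "save what we have" path simple.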
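The data crawled so far has no errors.
In the data I noticed that,
besides the energy types (能源类型) already known:
  • 汽油 (gasoline)
  • 纯电动 (pure electric)
  • 插电式混合动力 (plug-in hybrid)
  • 油电混合 (gasoline-electric hybrid)
there are also:
  • 柴油 (diesel)
  • 汽油+48V轻混系统 (gasoline + 48V mild hybrid)
  • 增程式 (range extender)
as well as:
[Unsolved] Autohome car model and series data: models whose energy type is blank
I also looked at the few special ones:
  • 汽油+48V轻混系统
  • 增程式
They only appear on a few 东风风光 models, for example:
https://www.autohome.com.cn/spec/41459/#pvareaid=3454492
so they can be ignored.
There also seems to be one more problem:
[Unsolved] Autohome car model and series data: carBrandId is empty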
