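Regarding:
[Solved] Autohome car model/series data: optimization, removing JS to speed up crawling the model parameter configs
While debugging it, the run failed with an error:
list index out of range mIndex=16 in getItemFirstValue
Clearly:
the list index exceeded the number of items in the list.
Found the page that triggers the error and debugged it.
Found the cause.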
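For:
the earlier gasoline cars,
the fields above exist;
but for this one:
they are missing:
So:
I need to find the difference between the two:
when the fields above are present,
and when they are not.
Could it be that:
cars with:
车身结构:客车 (body structure: bus)
don't have these, and only ordinary cars, such as
车身结构:5门5座SUV (body structure: 5-door 5-seat SUV),
have them? These all look like acceleration-type fields,
and ordinary buses don't care about such parameters, so maybe they simply aren't listed?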
So the fields that I previously assumed were fixed for gasoline cars:
if carEnergyType == "汽油":
    # https://car.autohome.com.cn/config/spec/43593.html
    # https://car.autohome.com.cn/config/spec/41572.html
    # self.processGasolineCar(valueContent, carModelDict)
    # https://car.autohome.com.cn/config/spec/1006466.html
    gasolineCarKeyIdxMapDict = {
        "carModelEnvStandard" : 3,
        "carModelReleaseTime" : 4,
        "carModelMaxPower" : 5,
        "carModelMaxTorque" : 6,
        "carModelEngine" : 7,
        "carModelGearBox" : 8,
        "carModelSize" : 9,
        "carModelBodyStructure" : 10,
        "carModelMaxSpeed" : 11,
        "carModelOfficialSpeedupTime" : 12,
        "carModelActualTestSpeedupTime" : 13,
        "carModelActualTestBrakeDistance" : 14,
        "carModelMiitCompositeFuelConsumption" : 15,
        "carModelActualFuelConsumption" : 16,
    }
    wholeWarrantyIdx = 17
now need to be adjusted dynamically somehow.
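Of course, the ideal approach would be to match directly on the field name.
But the field names here are specially processed with JS and CSS,
so they can't be matched directly.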
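To see concretely why direct name matching fails, here is a minimal Python sketch (the helper `visible_text` is hypothetical) that strips the empty `<span class='hs_kw...'>` placeholders from names copied from the config JSON excerpts in this post. What remains is only the plain part of each name; the hidden characters are assumed to be filled in at render time by the page's JS/CSS, so they never appear in the raw JSON.

```python
import re

# Matches one empty placeholder such as <span class='hs_kw8_configpl'></span>.
# The class suffix (configpl, configrI, ...) differs between pages, so it is not hard-coded.
PLACEHOLDER_SPAN_RE = re.compile(r"<span class='[^']*'></span>")


def visible_text(raw_name: str) -> str:
    """Return the part of a field name that is readable without running JS/CSS."""
    return PLACEHOLDER_SPAN_RE.sub("", raw_name)


# Raw names taken from the config JSON excerpts in this post.
raw_names = {
    1186: "<span class='hs_kw8_configpl'></span><span class='hs_kw2_configpl'></span>(N·m)",
    1251: "工信部<span class='hs_kw41_configrI'></span><span class='hs_kw35_configrI'></span>(L/100km)",
    1246: "最高车速(km/h)",
}

if __name__ == "__main__":
    for item_id, raw in raw_names.items():
        # For 1186 only "(N·m)" survives: the words 最大扭矩 are not in the data at all.
        print(item_id, visible_text(raw))
```

Only names without placeholders (like 最高车速(km/h)) come through intact, which is why some other key is needed.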
Then I suddenly noticed a detail:
the ids seem to be consistent (across cars).
Pure electric:
debug/奥迪Q2L_etron_纯电智享型_39893_notRunJs_config.json
{
    "id": 1186,
    "name": "<span class='hs_kw8_configpl'></span><span class='hs_kw2_configpl'></span>(N·m)",
    "pnid": "1_-1",
    "valueitems": [{ "specid": 39893, "value": "290" }, { "specid": 42875, "value": "290" }]
},
...
{
    "id": 1246,
    "name": "最高车速(km/h)",
    "pnid": "1_-1",
    "valueitems": [{ "specid": 39893, "value": "150" }, { "specid": 42875, "value": "150" }]
},
...
{
    "id": 1255,
    "name": "整车<span class='hs_kw36_configpl'></span>",
    "pnid": "1_-1",
    "valueitems": [{ "specid": 39893, "value": "三<span class='hs_kw7_configpl'></span>10<span class='hs_kw1_configpl'></span>公里" }, { "specid": 42875, "value": "三<span class='hs_kw7_configpl'></span>10<span class='hs_kw1_configpl'></span>公里" }]
}]
and:
gasoline car:
debug/九龙A6_1006466.json
{
    "id": 1186,
    "name": "<span class='hs_kw14_configrI'></span><span class='hs_kw4_configrI'></span>(N·m)",
    "pnid": "1_-1",
    "valueitems": [{ "specid": 1006466, "value": "290" }, { "specid": 1006465, "value": "260" }, { "specid": 1006467, "value": "330" }]
},
...
{
    "id": 1246,
    "name": "最高车速(km/h)",
    "pnid": "1_-1",
    "valueitems": [{ "specid": 1006466, "value": "-" }, { "specid": 1006465, "value": "-" }, { "specid": 1006467, "value": "-" }]
}, {
    "id": 1251,
    "name": "工信部<span class='hs_kw41_configrI'></span><span class='hs_kw35_configrI'></span>(L/100km)",
    "pnid": "1_-1",
    "valueitems": [{ "specid": 1006466, "value": "-" }, { "specid": 1006465, "value": "-" }, { "specid": 1006467, "value": "-" }]
}, {
    "id": 1255,
    "name": "整车<span class='hs_kw48_configrI'></span>",
    "pnid": "1_-1",
    "valueitems": [{ "specid": 1006466, "value": "-" }, { "specid": 1006465, "value": "-" }, { "specid": 1006467, "value": "-" }]
}]
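->
- 1186: 最大扭矩(N·m) (max torque)
- 1251: 工信部综合油耗(L/100km) (MIIT combined fuel consumption)
etc.
-> So the wanted fields can simply be matched by the item's id.
That is logically simpler,
and there is no longer any need to check which energy type the car is.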
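A minimal sketch of id-based matching, assuming a trimmed-down, hypothetical `ID_TO_KEY` table built from the ids seen in the two excerpts above, and assuming the first entry in `valueitems` is the spec the page is about. The same function handles the electric and the gasoline list even though the items sit at different positions.

```python
# Hypothetical, trimmed-down id -> output-key table; ids are the ones seen in the excerpts above.
ID_TO_KEY = {
    1186: "carModelMaxTorque",
    1246: "carModelMaxSpeed",
    1251: "carModelMiitCompositeFuelConsumption",
    1255: "carModelWholeWarranty",
}


def first_value(item):
    """Value for the first specid in valueitems (assumed to be the current spec)."""
    value_items = item.get("valueitems") or []
    return value_items[0]["value"] if value_items else None


def extract_by_id(param_items):
    """Pick the wanted fields by id, regardless of their position in the list."""
    result = {}
    for item in param_items:
        key = ID_TO_KEY.get(item["id"])
        if key is not None:  # unknown ids (and id 0) are skipped here
            result[key] = first_value(item)
    return result


# Items trimmed from the two excerpts above (obfuscated names shortened to "...").
electric_items = [
    {"id": 1186, "name": "...", "valueitems": [{"specid": 39893, "value": "290"}]},
    {"id": 1246, "name": "最高车速(km/h)", "valueitems": [{"specid": 39893, "value": "150"}]},
]
gasoline_items = [
    {"id": 1186, "name": "...", "valueitems": [{"specid": 1006466, "value": "290"}]},
    {"id": 1246, "name": "最高车速(km/h)", "valueitems": [{"specid": 1006466, "value": "-"}]},
    {"id": 1251, "name": "...", "valueitems": [{"specid": 1006466, "value": "-"}]},
]

if __name__ == "__main__":
    print(extract_by_id(electric_items))
    print(extract_by_id(gasoline_items))
```

A missing field simply stays absent from the result instead of raising `list index out of range`.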
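However, some fields seem to have id 0,
so that needs to be figured out.
Went to compile the mapping between ids and fields:
"id": 1149, "name": "能源类型", (energy type)
etc.
Found an item whose id is 0:
"id": 0, "name": "上市<span class='hs_kw51_configvR'></span>",
Will look for the pattern later.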
For now, on this page,
only
"id": 0, "name": "上市<span class='hs_kw51_configvR'></span>", # 上市时间 (release date)
has id 0; all the other basic-parameter fields have non-zero ids, which makes things easy.
Checked a few more pages to see whether the same pattern holds.
Gasoline car:
But then I noticed that the HTML I debugged earlier already contains field definitions with their full ids:
debug/奥迪A3_configSpec_43593.html
var keyLink = [
    { "id": 1339, "link": "https://car.autohome.com.cn/baike/detail_8_26_1339.html", "name": "<span class='hs_kw27_baikefn'></span><span class='hs_kw85_baikefn'></span>/<span class='hs_kw9_baikefn'></span>" },
    { "id": 1340, "link": "https://car.autohome.com.cn/baike/detail_8_27_1340.html", "name": "尾门玻璃<span class='hs_kw63_baikefn'></span>开启" },
    { "id": 1341, "link": "https://car.autohome.com.cn/baike/detail_8_30_1341.html", "name": "<span class='hs_kw76_baikefn'></span>数量" },
    { "id": 1342, "link": "https://car.autohome.com.cn/baike/detail_8_31_1342.html", "name": "<span class='hs_kw12_baikefn'></span>大灯雨雾模式" },
    { "id": 1343, "link": "https://car.autohome.com.cn/baike/detail_8_30_1343.html", "name": "车载CD/DVD" },
    { ... },
    { "id": 1234, "link": "https://car.autohome.com.cn/baike/detail_7_21_1234.html", "name": "<span class='hs_kw12_baikefn'></span>电动机<span class='hs_kw7_baikefn'></span><span class='hs_kw35_baikefn'></span>(kW)" },
    { "id": 1242, "link": "https://car.autohome.com.cn/baike/detail_8_31_1242.html", "name": "车<span class='hs_kw12_baikefn'></span>雾灯" },
    { "id": 1245, "link": "https://car.autohome.com.cn/baike/detail_7_18_1245.html", "name": "变速箱" },
    { "id": 1246, "link": "https://car.autohome.com.cn/baike/detail_7_18_1246.html", "name": "最高车速(km/h)" },
    { "id": 1250, "link": "https://car.autohome.com.cn/baike/detail_7_18_1250.html", "name": "官方0-100km/h加速(s)" },
    { "id": 1251, "link": "https://car.autohome.com.cn/baike/detail_7_18_1251.html", "name": "工信部<span class='hs_kw22_baikefn'></span><span class='hs_kw17_baikefn'></span>(L/100km)" },
    { "id": 1252, "link": "https://car.autohome.com.cn/baike/detail_7_18_1252.html", "name": "<span class='hs_kw68_baikefn'></span>0-100km/h加速(s)" },
    { "id": 1253, "link": "https://car.autohome.com.cn/baike/detail_7_18_1253.html", "name": "<span class='hs_kw68_baikefn'></span>100-0km/h制动(m)" },
    { "id": 1254, "link": "https://car.autohome.com.cn/baike/detail_7_18_1254.html", "name": "<span class='hs_kw68_baikefn'></span><span class='hs_kw17_baikefn'></span>(L/100km)" },
    { "id": 1255, "link": "https://car.autohome.com.cn/baike/detail_7_18_1255.html", "name": "整车<span class='hs_kw77_baikefn'></span>" },
    { "id": 1256, "link": "https://car.autohome.com.cn/baike/detail_7_19_1256.html", "name": "<span class='hs_kw68_baikefn'></span><span class='hs_kw2_baikefn'></span>(mm)" },
    { ... },
    { "id": 1290, "link": "https://car.autohome.com.cn/baike/detail_7_21_1290.html", "name": "百公里耗<span class='hs_kw56_baikefn'></span>(kWh/100km)" },
    { "id": 1291, "link": "https://car.autohome.com.cn/baike/detail_7_21_1291.html", "name": "工信部纯电续航里程(km)" },
    { ...
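However, for defining the specific fields, this turned out to be less useful than expected:
the fields still have to be researched and defined in advance.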
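For reference, a minimal sketch of pulling that `keyLink` array into an `{id: raw name}` dict (the function name `parse_key_link` is hypothetical). It assumes the array is valid JSON and ends with `];`, as in the excerpt above; as the output shows, many names still carry the placeholder spans, which is exactly why this table alone can't replace hand-defined field names.

```python
import json
import re

# Non-greedy up to the first "];" that closes the array (assumes no "];" inside names or links).
KEY_LINK_RE = re.compile(r"var\s+keyLink\s*=\s*(\[.*?\]);", re.S)


def parse_key_link(html):
    """Return {id: raw name} from the `var keyLink = [...];` array of a spec page."""
    match = KEY_LINK_RE.search(html)
    if not match:
        return {}
    return {entry["id"]: entry["name"] for entry in json.loads(match.group(1))}


# Two entries copied from the excerpt above.
sample_html = (
    'var keyLink = [{ "id": 1245, "link": "https://car.autohome.com.cn/baike/detail_7_18_1245.html", "name": "变速箱" }, '
    '{ "id": 1251, "link": "https://car.autohome.com.cn/baike/detail_7_18_1251.html", '
    '"name": "工信部<span class=\'hs_kw22_baikefn\'></span><span class=\'hs_kw17_baikefn\'></span>(L/100km)" }];'
)

if __name__ == "__main__":
    for item_id, name in parse_key_link(sample_html).items():
        print(item_id, name)
```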
Searched for:
"id": 0,
Only about 9 items have id 0.
The remaining few dozen all have real ids.
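Instead of a text search, the id-0 items can be listed straight from the parsed config. A minimal sketch, assuming the `result` → `paramtypeitems` → `paramitems` layout that the code later in this post reads; `zero_id_items` and the tiny sample are hypothetical.

```python
def zero_id_items(config_dict):
    """Return (group index, item index, raw name) for every item whose id is 0."""
    found = []
    for group_idx, group in enumerate(config_dict["result"]["paramtypeitems"]):
        for item_idx, item in enumerate(group["paramitems"]):
            if item["id"] == 0:
                found.append((group_idx, item_idx, item["name"]))
    return found


# Hypothetical, heavily trimmed config with one id-0 item in the basic-parameter group.
sample_config = {
    "result": {
        "paramtypeitems": [
            {"paramitems": [
                {"id": 1149, "name": "能源类型"},
                {"id": 0, "name": "上市<span class='hs_kw51_configvR'></span>"},
                {"id": 1246, "name": "最高车速(km/h)"},
            ]},
        ]
    }
}

if __name__ == "__main__":
    for group_idx, item_idx, name in zero_id_items(sample_config):
        print(group_idx, item_idx, name)
```

The item index is kept because, as it turns out below, position is what eventually tells some id-0 items apart.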
Among the fields needed above, the special one is
上市时间 (release date),
which currently has id 0.
Its entry looks like:
}, { "id": 0, "name": "上市<span class='hs_kw61_configHa'></span>", "pnid": "1_-1", "valueitems": [{ "specid": 43593, "value": "2020.04" }, { "specid": 42418, "value": "2019.10" }, { ...
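-> It can be identified by:
- the name starting with 上市 (or ending with 时间)
  - I checked elsewhere: no other field name starts with 上市
  - so there is no collision, and this logic works
- the value being in YYYY.MM format
So this is good enough for now.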
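The two checks above can be sketched as follows (names are hypothetical). The code later in this post only uses the name pattern `^上市`; the stricter YYYY.MM value check is optional, and would reject a spec whose release date is shown as `-`.

```python
import re

RELEASE_NAME_RE = re.compile(r"^上市")          # raw name starts with 上市
YEAR_MONTH_RE = re.compile(r"^\d{4}\.\d{2}$")   # value like 2020.04


def is_release_time(item):
    """True if item looks like 上市时间: id 0, name starts with 上市, first value is YYYY.MM."""
    if item["id"] != 0 or not RELEASE_NAME_RE.search(item["name"]):
        return False
    value_items = item.get("valueitems") or []
    return bool(value_items) and bool(YEAR_MONTH_RE.match(value_items[0]["value"]))


# Entry copied from the excerpt above (trimmed to two specs).
release_item = {
    "id": 0,
    "name": "上市<span class='hs_kw61_configHa'></span>",
    "pnid": "1_-1",
    "valueitems": [{"specid": 43593, "value": "2020.04"}, {"specid": 42418, "value": "2019.10"}],
}

if __name__ == "__main__":
    print(is_release_time(release_item))
```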
Next, filled in the fields for the other car types.
But after adding the electric-car fields:
# electric car params
{
    "id": 1291,
    "name": "工信部纯电续航里程(km)",
    "key": "carModelMiitEnduranceMileagePureElectric",
}, {
    "id": 1292,
    # "name": "<span class='hs_kw39_configpl'></span><span class='hs_kw40_configpl'></span>(小时)",
    "name": "快充时间(小时)",
    "key": "carModelQuickCharge",
}, {
    "id": 0,
    # "name": "<span class='hs_kw10_configpl'></span><span class='hs_kw40_configpl'></span>(小时)",
    "name": "慢充时间(小时)",
    "key": "carModelSlowCharge",
}, {
    "id": 0,
    "name": "快充电量百分比",
    "key": "carModelQuickChargePercent",
}, {
    "id": 0,
    "name": "电动机(Ps)",
    "key": "carModelHorsePowerElectric",
}, {
    "id": 0,
    # "name": "<span class='hs_kw22_configpl'></span>续航里程(km)",
    "name": "实测续航里程(km)",
    "key": "carModelActualTestEnduranceMileage",
}, {
    "id": 0,
    # "name": "<span class='hs_kw22_configpl'></span><span class='hs_kw39_configpl'></span><span class='hs_kw40_configpl'></span>(小时)",
    "name": "实测快充时间(小时)",
    "key": "carModelActualTestQuickCharge",
}, {
    "id": 0,
    # "name": "<span class='hs_kw22_configpl'></span><span class='hs_kw10_configpl'></span><span class='hs_kw40_configpl'></span>(小时)",
    "name": "实测慢充时间(小时)",
    "key": "carModelActualTestSlowCharge",
}
Found a problem:
several fields have id 0,
and the name alone can't tell which one is which.
In particular:
"id": 0,
# "name": "<span class='hs_kw10_configpl'></span><span class='hs_kw40_configpl'></span>(小时)",
"name": "慢充时间(小时)",

"id": 0,
# "name": "<span class='hs_kw22_configpl'></span><span class='hs_kw39_configpl'></span><span class='hs_kw40_configpl'></span>(小时)",
"name": "实测快充时间(小时)",

"id": 0,
# "name": "<span class='hs_kw22_configpl'></span><span class='hs_kw10_configpl'></span><span class='hs_kw40_configpl'></span>(小时)",
"name": "实测慢充时间(小时)",
They are all charging times, and the raw names simply can't be told apart.
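If they really can't be distinguished, then for the last 2 fields:
- 实测快充时间(小时) (measured quick-charge time, hours)
- 实测慢充时间(小时) (measured slow-charge time, hours)
just don't crawl them,
since I also saw that, apart from:
the car above,
other pure-electric cars
have these fields empty as well:
However:
- 慢充时间(小时) (slow-charge time, hours)
always has a value,
so it's still best to crawl it.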
If nothing else works, the position can be used:
the item right before
慢充时间 (slow-charge time)
is always
快充时间 (quick-charge time)
->
and
快充时间
does have an id:
}, {
    "id": 1292,
    # "name": "<span class='hs_kw39_configpl'></span><span class='hs_kw40_configpl'></span>(小时)",
    "name": "快充时间(小时)",
    "key": "carModelQuickCharge",
}, {
    "id": 0,
    # "name": "<span class='hs_kw10_configpl'></span><span class='hs_kw40_configpl'></span>(小时)",
    "name": "慢充时间(小时)",
    "key": "carModelSlowCharge",
},
So first find the index of 快充时间,
add 1,
and if that item also has:
id=0
and a name ending with (小时)
then it is definitely
慢充时间.
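A minimal sketch of that positional rule (function and constant names are hypothetical; the id 1292 and the `</span>(小时)` suffix pattern come from the data above). It checks the current item and looks one step back, which is equivalent to "find 快充时间, then add 1".

```python
import re

QUICK_CHARGE_ID = 1292                            # 快充时间(小时)
HOUR_SUFFIX_RE = re.compile(r"</span>\(小时\)$")  # obfuscated name ending in (小时)


def is_slow_charge(idx, items):
    """True if items[idx] is 慢充时间: id 0, name ends with (小时), previous item has id 1292."""
    if idx <= 0 or idx >= len(items):
        return False
    cur, prev = items[idx], items[idx - 1]
    return (
        cur["id"] == 0
        and HOUR_SUFFIX_RE.search(cur["name"]) is not None
        and prev["id"] == QUICK_CHARGE_ID
    )


items = [
    {"id": 1292, "name": "<span class='hs_kw39_configpl'></span><span class='hs_kw40_configpl'></span>(小时)"},
    {"id": 0, "name": "<span class='hs_kw10_configpl'></span><span class='hs_kw40_configpl'></span>(小时)"},
    {"id": 0, "name": "<span class='hs_kw39_configMh'></span><span class='hs_kw11_configMh'></span>百分比"},
]

if __name__ == "__main__":
    print([is_slow_charge(i, items) for i in range(len(items))])
```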
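Then it occurred to me:
for:
- 实测快充时间(小时)
- 实测慢充时间(小时)
the position can also be computed:
these 2 are always the 2 items right after
- 实测续航里程(km) (measured driving range, km)
so position works here too.
That is:
find
- 实测续航里程(km)
and take the next 2 items
(provided they stay within the list bounds);
if both satisfy:
id=0
name ending with (小时)
then they are, respectively:
- 实测快充时间(小时)
- 实测慢充时间(小时)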
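Sketched the same way (hypothetical names), looking backwards from the current item: one step back hitting 实测续航里程 means quick charge, two steps back means slow charge. The `anchor >= 0` check is the list-bounds guard mentioned above.

```python
import re

ACTUAL_RANGE_RE = re.compile(r"</span>续航里程\(km\)$")  # 实测续航里程(km), id 0
HOUR_SUFFIX_RE = re.compile(r"</span>\(小时\)$")


def actual_test_charge_key(idx, items):
    """Return the output key if items[idx] is the 1st or 2nd item after 实测续航里程, else None."""
    cur = items[idx]
    if cur["id"] != 0 or not HOUR_SUFFIX_RE.search(cur["name"]):
        return None
    for offset, key in ((1, "carModelActualTestQuickCharge"), (2, "carModelActualTestSlowCharge")):
        anchor = idx - offset
        if anchor >= 0 and items[anchor]["id"] == 0 and ACTUAL_RANGE_RE.search(items[anchor]["name"]):
            return key
    return None


items = [
    {"id": 1292, "name": "<span class='hs_kw39_configpl'></span><span class='hs_kw40_configpl'></span>(小时)"},
    {"id": 0, "name": "<span class='hs_kw10_configpl'></span><span class='hs_kw40_configpl'></span>(小时)"},
    {"id": 0, "name": "<span class='hs_kw22_configpl'></span>续航里程(km)"},
    {"id": 0, "name": "<span class='hs_kw22_configpl'></span><span class='hs_kw39_configpl'></span><span class='hs_kw40_configpl'></span>(小时)"},
    {"id": 0, "name": "<span class='hs_kw22_configpl'></span><span class='hs_kw10_configpl'></span><span class='hs_kw40_configpl'></span>(小时)"},
]

if __name__ == "__main__":
    for i in range(len(items)):
        print(i, actual_test_charge_key(i, items))
```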
With that, time to write the code.
The current code:
# (methods of the crawler class; the module needs: import copy, json, re)
@catch_status_code_error
def carConfigSpecCallback(self, response):
    print("in carConfigSpecCallback")
    curCarModelDict = response.save
    print("curCarModelDict=%s" % curCarModelDict)
    carModelDict = copy.deepcopy(curCarModelDict)

    configSpecHtml = response.text
    # print("configSpecHtml=%s" % configSpecHtml)
    # print("")

    # # for debug
    # return

    # # config json item index - spec table html item index = 2
    # ItemIndexDiff = 2
    # isUseSpecTableHtml = True
    # isUseConfigJson = False
    # valueContent = None
    # energyTypeIdx = 2

    # # Method 1: after run js, extract item value from spec table html
    # """
    # <table class="tbcs" id="tab_0" style="width: 932px;">
    # <tbody>
    # <tr>
    # <th class="cstitle" show="1" pid="tab_0" id="nav_meto_0" colspan="5">
    # <h3><span>基本参数</span></h3>
    # </th>
    # </tr>
    # <tr data-pnid="1_-1" id="tr_0">
    # """
    # tbodyDoc = response.doc("table[id='tab_0'] tbody")
    # print("tbodyDoc=%s" % tbodyDoc)
    # valueContent = tbodyDoc
    # isUseSpecTableHtml = True
    # isUseConfigJson = False
    # energyTypeIdx = 2

    # Method 2: not run js, extract item value from config json
    # get value from config json
    # var config = {"message" ...... "returncode":"0","taskid":"8be676a3-e023-4fa9-826d-09cd42a1810c","time":"2020-08-27 20:56:17"};
    foundConfigJson = re.search(r"var\s*config\s*=\s*(?P<configJson>\{[^;]+\});", configSpecHtml)
    print("foundConfigJson=%s" % foundConfigJson)
    if foundConfigJson:
        configJson = foundConfigJson.group("configJson")
        print("configJson=%s" % configJson)
        # configDict = json.loads(configJson, encoding="utf-8")
        configDict = json.loads(configJson)
        print("configDict=%s" % configDict)
        # if "result" in configDict:
        configResultDict = configDict["result"]
        print("configResultDict=%s" % configResultDict)
        # if "paramtypeitems" in configResultDict:
        paramTypeItemDictList = configResultDict["paramtypeitems"]
        print("paramTypeItemDictList=%s" % paramTypeItemDictList)
        # paramTypeItemNum = len(paramTypeItemDictList)
        # print("paramTypeItemNum=%s" % paramTypeItemNum)
        basicParamDict = paramTypeItemDictList[0]
        print("basicParamDict=%s" % basicParamDict)
        basicItemDictList = basicParamDict["paramitems"]
        print("basicItemDictList=%s" % basicItemDictList)
        # print("type(basicItemDictList)=%s" % type(basicItemDictList))
        # basicItemNum = len(basicItemDictList)
        # print("basicItemNum=%s" % basicItemNum)
        # valueContent = basicItemDictList
        # isUseSpecTableHtml = False
        # isUseConfigJson = True

        # process each basic parameter
        basicItemDictLen = len(basicItemDictList)
        print("basicItemDictLen=%s" % basicItemDictLen)
        for curIdx, eachItemDict in enumerate(basicItemDictList):
            print("[%d] eachItemDict=%s" % (curIdx, eachItemDict))
            curItemId = eachItemDict["id"]
            print("curItemId=%s" % curItemId)
            curItemName = eachItemDict["name"]
            print("curItemName=%s" % curItemName)
            curItemFirstValue = self.extractValueItemsValue(eachItemDict)
            print("curItemFirstValue=%s" % curItemFirstValue)

            curIdNameKeyMapDict = None
            if curItemId != 0:
                curIdNameKeyMapDict = self.findMappingDict(curItemId)
            else:
                # id = 0
                foundSpan = re.search("<span", curItemName)
                print("foundSpan=%s" % foundSpan)
                isSpecialName = bool(foundSpan)
                print("isSpecialName=%s" % isSpecialName)
                if isSpecialName:
                    # id=0 and contain '<span' special name
                    foundSuffixHour = re.search(r"</span>\(小时\)$", curItemName)
                    print("foundSuffixHour=%s" % foundSuffixHour)
                    isSpecialSuffixHour = bool(foundSuffixHour)
                    print("isSpecialSuffixHour=%s" % isSpecialSuffixHour)
                    if isSpecialSuffixHour:
                        prevIsQuickCharge = self.isPrevItemIsQuickCharge(curIdx, basicItemDictList)
                        print("prevIsQuickCharge=%s" % prevIsQuickCharge)
                        if prevIsQuickCharge:
                            # current item MUST be 慢充时间(小时)
                            curIdNameKeyMapDict = {
                                "id": 0,
                                # "name": "<span class='hs_kw10_configpl'></span><span class='hs_kw40_configpl'></span>(小时)",
                                "name": "慢充时间(小时)",
                                "namePattern": r"</span>\(小时\)$",
                                "key": "carModelSlowCharge",
                            }

                        if not curIdNameKeyMapDict:
                            prevIsActualTestEnduranceMileage = self.isPrevItemIsActualTestEnduranceMileage(curIdx, basicItemDictList)
                            print("prevIsActualTestEnduranceMileage=%s" % prevIsActualTestEnduranceMileage)
                            if prevIsActualTestEnduranceMileage:
                                # current item MUST be 实测快充时间(小时)
                                curIdNameKeyMapDict = {
                                    "id": 0,
                                    # "name": "<span class='hs_kw22_configpl'></span><span class='hs_kw39_configpl'></span><span class='hs_kw40_configpl'></span>(小时)",
                                    "name": "实测快充时间(小时)",
                                    "namePattern": r"</span>\(小时\)$",
                                    "key": "carModelActualTestQuickCharge",
                                }

                        if not curIdNameKeyMapDict:
                            prevPrevIsActualTestEnduranceMileage = self.isPrevPrevItemIsActualTestEnduranceMileage(curIdx, basicItemDictList)
                            print("prevPrevIsActualTestEnduranceMileage=%s" % prevPrevIsActualTestEnduranceMileage)
                            if prevPrevIsActualTestEnduranceMileage:
                                # current item MUST be 实测慢充时间(小时)
                                curIdNameKeyMapDict = {
                                    "id": 0,
                                    # "name": "<span class='hs_kw22_configpl'></span><span class='hs_kw10_configpl'></span><span class='hs_kw40_configpl'></span>(小时)",
                                    "name": "实测慢充时间(小时)",
                                    "namePattern": r"</span>\(小时\)$",
                                    "key": "carModelActualTestSlowCharge",
                                }
                    else:
                        curIdNameKeyMapDict = self.findMappingDict(0, curItemName)
                else:
                    curIdNameKeyMapDict = self.findMappingDict(0, curItemName)

            print("curIdNameKeyMapDict=%s" % curIdNameKeyMapDict)
            if curIdNameKeyMapDict:
                curItemKey = curIdNameKeyMapDict["key"]
                print("curItemKey=%s" % curItemKey)
                if curItemKey == "carModelWholeWarranty":
                    print("process special carModelWholeWarranty")
                    # 整车质保 (whole-vehicle warranty), raw value looks like:
                    # 三<span class='hs_kw5_configJS'></span>10<span class='hs_kw0_configJS'></span>公里
                    print("curItemFirstValue=%s" % curItemFirstValue)
                    curItemFirstValue = self.extractWholeWarranty(curItemFirstValue)
                    print("curItemFirstValue=%s" % curItemFirstValue)

                carModelDict[curItemKey] = curItemFirstValue
                print("+++ added %s=%s" % (curItemKey, curItemFirstValue))

        print("after extract all item value: carModelDict=%s" % carModelDict)
        self.saveSingleResult(carModelDict)
    else:
        self.saveSingleResult(carModelDict)

    # if isUseConfigJson:
    #     energyTypeIdx += ItemIndexDiff

    # if valueContent:
    #     self.processDiffEneryTypeCar(carModelDict, valueContent, energyTypeIdx, isUseConfigJson, ItemIndexDiff)
    # else:
    #     self.saveSingleResult(carModelDict)

def isPrevItemIsQuickCharge(self, curIdx, itemDictList):
    print("in isPrevItemIsQuickCharge")
    print("curIdx=%s" % curIdx)
    prevIsQuickCharge = False
    if curIdx > 0:
        prevIdx = curIdx - 1
        print("prevIdx=%s" % prevIdx)
        prevItemDict = itemDictList[prevIdx]
        print("prevItemDict=%s" % prevItemDict)
        prevItemId = prevItemDict["id"]
        print("prevItemId=%s" % prevItemId)
        prevItemName = prevItemDict["name"]
        print("prevItemName=%s" % prevItemName)
        """
        "id": 1292,
        # "name": "<span class='hs_kw39_configpl'></span><span class='hs_kw40_configpl'></span>(小时)",
        "name": "快充时间(小时)",
        """
        QuickChargeItemId = 1292
        if prevItemId == QuickChargeItemId:
            prevIsQuickCharge = True
    print("prevIsQuickCharge=%s" % prevIsQuickCharge)
    return prevIsQuickCharge

def checkIsActualTestEnduranceMileage(self, prevSomeNum, curIdx, itemDictList):
    print("in checkIsActualTestEnduranceMileage")
    print("prevSomeNum=%s, curIdx=%s" % (prevSomeNum, curIdx))
    isActualTestEnduranceMileage = False
    minAllowIdx = prevSomeNum - 1
    if curIdx > minAllowIdx:
        prevSomeIdx = curIdx - prevSomeNum
        print("prevSomeIdx=%s" % prevSomeIdx)
        prevSomeItemDict = itemDictList[prevSomeIdx]
        print("prevSomeItemDict=%s" % prevSomeItemDict)
        prevSomeItemId = prevSomeItemDict["id"]
        print("prevSomeItemId=%s" % prevSomeItemId)
        prevSomeItemName = prevSomeItemDict["name"]
        print("prevSomeItemName=%s" % prevSomeItemName)
        if prevSomeItemId == 0:
            """
            "id": 0,
            # "name": "<span class='hs_kw22_configpl'></span>续航里程(km)",
            "name": "实测续航里程(km)",
            "namePattern": "</span>续航里程\(km\)$",
            "key": "carModelActualTestEnduranceMileage",
            """
            foundActualTestEnduranceMileage = re.search(r"</span>续航里程\(km\)$", prevSomeItemName)
            print("foundActualTestEnduranceMileage=%s" % foundActualTestEnduranceMileage)
            if foundActualTestEnduranceMileage:
                isActualTestEnduranceMileage = True
    print("isActualTestEnduranceMileage=%s" % isActualTestEnduranceMileage)
    return isActualTestEnduranceMileage

def isPrevItemIsActualTestEnduranceMileage(self, curIdx, itemDictList):
    print("in isPrevItemIsActualTestEnduranceMileage")
    print("curIdx=%s" % curIdx)
    return self.checkIsActualTestEnduranceMileage(1, curIdx, itemDictList)

def isPrevPrevItemIsActualTestEnduranceMileage(self, curIdx, itemDictList):
    print("in isPrevPrevItemIsActualTestEnduranceMileage")
    print("curIdx=%s" % curIdx)
    return self.checkIsActualTestEnduranceMileage(2, curIdx, itemDictList)

def findMappingDict(self, itemId=0, itemName=""):
    foundMapDict = None
    paramIdNameKeyMapDict = [
        # gasoline car params
        # https://car.autohome.com.cn/config/spec/41572.html
        # https://car.autohome.com.cn/config/spec/1006465.html
        {
            "id": 1149,
            "name": "能源类型",
            "key": "carEnergyType",
        }, {
            "id": 1311,
            "name": "环保标准",
            "key": "carModelEnvStandard",
        }, {
            "id": 0,
            # "name": "上市<span class='hs_kw51_configvR'></span>", # 上市时间 (release date)
            "name": "上市时间",
            "namePattern": r"^上市",
            "key": "carModelReleaseTime",
        }, {
            "id": 1185,
            # "name": "<span class='hs_kw40_configvR'></span><span class='hs_kw15_configvR'></span>(kW)",
            "name": "最大功率(kW)",
            "key": "carModelMaxPower",
        }, {
            "id": 1186,
            # "name": "<span class='hs_kw40_configvR'></span><span class='hs_kw61_configvR'></span>(N·m)",
            "name": "最大扭矩(N·m)",
            "key": "carModelMaxTorque",
        }, {
            "id": 1150,
            "name": "发动机",
            "key": "carModelEngine",
        }, {
            "id": 1245,
            "name": "变速箱",
            "key": "carModelGearBox",
        }, {
            "id": 1148,
            "name": "长*宽*高(mm)",
            "key": "carModelSize",
        }, {
            "id": 1147,
            "name": "车身结构",
            "key": "carModelBodyStructure",
        }, {
            "id": 1246,
            "name": "最高车速(km/h)",
            "key": "carModelMaxSpeed",
        }, {
            "id": 1250,
            "name": "官方0-100km/h加速(s)",
            "key": "carModelOfficialSpeedupTime",
        }, {
            "id": 1252,
            # "name": "<span class='hs_kw26_configvR'></span>0-100km/h加速(s)",
            "name": "实测0-100km/h加速(s)",
            "key": "carModelActualTestSpeedupTime",
        }, {
            "id": 1253,
            # "name": "<span class='hs_kw26_configvR'></span>100-0km/h制动(m)",
            "name": "实测100-0km/h制动(m)",
            "key": "carModelActualTestBrakeDistance",
        }, {
            "id": 1251,
            # "name": "工信部<span class='hs_kw10_configvR'></span><span class='hs_kw43_configvR'></span>(L/100km)",
            "name": "工信部综合油耗(L/100km)",
            "key": "carModelMiitCompositeFuelConsumption",
        }, {
            "id": 1254,
            # "name": "<span class='hs_kw26_configvR'></span><span class='hs_kw43_configvR'></span>(L/100km)",
            "name": "实测油耗(L/100km)",
            "key": "carModelActualFuelConsumption",
        }, {
            "id": 1255,
            # "name": "整车<span class='hs_kw73_configvR'></span>",
            "name": "整车质保",
            "key": "carModelWholeWarranty",
        },

        # electric car params
        # https://car.autohome.com.cn/config/spec/39893.html
        # https://car.autohome.com.cn/config/spec/42875.html
        {
            "id": 1291,
            "name": "工信部纯电续航里程(km)",
            "key": "carModelMiitEnduranceMileagePureElectric",
        }, {
            "id": 1292,
            # "name": "<span class='hs_kw39_configpl'></span><span class='hs_kw40_configpl'></span>(小时)",
            "name": "快充时间(小时)",
            "key": "carModelQuickCharge",
        # }, {
        #     "id": 0,
        #     # "name": "<span class='hs_kw10_configpl'></span><span class='hs_kw40_configpl'></span>(小时)",
        #     "name": "慢充时间(小时)",
        #     "namePattern": r"</span>\(小时\)$",
        #     "key": "carModelSlowCharge",
        }, {
            "id": 0,
            # https://car.autohome.com.cn/config/spec/39893.html
            # {'id': 0, 'name': "<span class='hs_kw39_configMh'></span><span class='hs_kw11_configMh'></span>百分比", 'pnid': '1_-1', 'valueitems': [{'specid': 39893, 'value': '80'}, {'specid': 42875, 'value': '80'}]}
            "name": "快充电量百分比",
            "namePattern": r"</span>百分比$",
            "key": "carModelQuickChargePercent",
        }, {
            "id": 0,
            "name": "电动机(Ps)",
            "key": "carModelHorsePowerElectric",
        }, {
            "id": 0,
            # "name": "<span class='hs_kw22_configpl'></span>续航里程(km)",
            "name": "实测续航里程(km)",
            "namePattern": r"</span>续航里程\(km\)$",
            "key": "carModelActualTestEnduranceMileage",
        # }, {
        #     "id": 0,
        #     # "name": "<span class='hs_kw22_configpl'></span><span class='hs_kw39_configpl'></span><span class='hs_kw40_configpl'></span>(小时)",
        #     "name": "实测快充时间(小时)",
        #     "namePattern": r"</span>\(小时\)$",
        #     "key": "carModelActualTestQuickCharge",
        # }, {
        #     "id": 0,
        #     # "name": "<span class='hs_kw22_configpl'></span><span class='hs_kw10_configpl'></span><span class='hs_kw40_configpl'></span>(小时)",
        #     "name": "实测慢充时间(小时)",
        #     "namePattern": r"</span>\(小时\)$",
        #     "key": "carModelActualTestSlowCharge",
        }
    ]

    isItemZero = itemId == 0
    print("isItemZero=%s" % isItemZero)
    foundSpan = re.search("<span", itemName)
    print("foundSpan=%s" % foundSpan)
    isSpecialName = bool(foundSpan)
    print("isSpecialName=%s" % isSpecialName)
    isNotSpecialName = not isSpecialName
    print("isNotSpecialName=%s" % isNotSpecialName)

    # 1. match by non-zero id
    if not isItemZero:
        for eachMapDict in paramIdNameKeyMapDict:
            eachItemId = eachMapDict["id"]
            if eachItemId == itemId:
                foundMapDict = eachMapDict
                break

    # 2. match by plain (non-obfuscated) name
    if not foundMapDict:
        if itemName and isNotSpecialName:
            for eachMapDict in paramIdNameKeyMapDict:
                eachItemName = eachMapDict["name"]
                if eachItemName == itemName:
                    foundMapDict = eachMapDict
                    break

    # 3. id is 0 and name contains <span>: match by namePattern
    if not foundMapDict:
        if (isItemZero and isSpecialName):
            for eachMapDict in paramIdNameKeyMapDict:
                if "namePattern" in eachMapDict:
                    eachItemNamePattern = eachMapDict["namePattern"]
                    print("eachItemNamePattern=%s" % eachItemNamePattern)
                    foundMatchName = re.search(eachItemNamePattern, itemName)
                    print("foundMatchName=%s" % foundMatchName)
                    if foundMatchName:
                        foundMapDict = eachMapDict
                        break

    print("foundMapDict=%s from id=%s, name=%s" % (foundMapDict, itemId, itemName))
    return foundMapDict
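The data from the current run has no errors:
In the data, for 能源类型 (energy type), besides the earlier values:
- 汽油 (gasoline)
- 纯电动 (pure electric)
- 插电式混合动力 (plug-in hybrid)
- 油电混合 (gasoline-electric hybrid)
there are also:
- 柴油 (diesel)
- 汽油+48V轻混系统 (gasoline + 48V mild-hybrid system)
- 增程式 (range extender)
as well as:
[Unsolved] Autohome car model/series data: car models with a blank energy type
I also looked at a few special ones:
- 汽油+48V轻混系统
- 增程式
They only occur on a few 东风风光 (Dongfeng Fengguang) models, for example:
so they can be ignored.
There also seems to be another problem:
[Unsolved] Autohome car model/series data: carBrandId is empty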