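While working on:
[Solved] Autohome (汽车之家) car model and series data: crawling the detailed parameter configuration of each model
I had already largely managed to fetch the parameter configuration data.
But there is one problem:
The parameter configuration page is currently fetched with: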
# https://car.autohome.com.cn/config/spec/43593.html
print("carConfigSpecUrl=%s" % carConfigSpecUrl)
self.crawl(carConfigSpecUrl,
    fetch_type="js",
    callback=self.carConfigSpecCallback,
    save=carModelDict,
)
That is, the page's JavaScript is executed (fetch_type="js"),
so that the subsequent lookup:
"""
<table class="tbcs" id="tab_0" style="width: 932px;">
    <tbody>
        <tr>
            <th class="cstitle" show="1" pid="tab_0" id="nav_meto_0" colspan="5">
                <h3><span>基本参数</span></h3>
            </th>
        </tr>
        <tr data-pnid="1_-1" id="tr_0">
"""
tbodyDoc = response.doc("table[id='tab_0'] tbody")
print("tbodyDoc=%s" % tbodyDoc)
does not come back empty.
Only after the JS has run does the table in the HTML hold the final, merged text values.
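But this brings a problem:
Every page needs an extra JS run, which is slow.
For batch crawling, that is far too slow.
The goal now is to speed this up by dropping the JS execution step.
Let's see whether, in the HTML returned without running JS, these values can be read directly from the JSON data.
I had looked into this before: the config JSON embedded in the page contains the corresponding values.
Let's see how to extract them.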
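As a rough sketch of the idea, assuming the raw HTML contains a statement of the form `var config = {...};` (as the callback later in this post relies on), the JSON can be pulled out with a regular expression and parsed with `json.loads`. `SAMPLE_HTML` and `extract_config_dict` are hypothetical names used only for illustration, and the sample page is a heavily shortened stand-in, not real site output.

```python
import json
import re

# Hypothetical, heavily shortened stand-in for the raw page HTML (no JS executed).
SAMPLE_HTML = """
<script>
var keyLink = {"result": []};
var config = {"message": "", "result": {"paramtypeitems": [{"name": "基本参数", "paramitems": []}]}, "returncode": "0"};
var option = {"result": {}};
</script>
"""


def extract_config_dict(html):
    """Return the dict assigned to `var config = {...};`, or None if it is absent."""
    found = re.search(r"var\s*config\s*=\s*(?P<configJson>\{[^;]+\});", html)
    if not found:
        return None
    return json.loads(found.group("configJson"))


config_dict = extract_config_dict(SAMPLE_HTML)
first_group_name = config_dict["result"]["paramtypeitems"][0]["name"]
print(first_group_name)
```

Note that the `[^;]+` part stops at the first semicolon, so this pattern only works as long as the JSON text itself contains no `;` (an HTML entity such as `&nbsp;` inside a value would break it).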
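However, the series at hand has too many models, which makes the individual values hard to analyze.
So I switched to a series with few models, one with just 2,
and took a look.
After tidying up the config JSON, it looks like this: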
{ "message": "<span class='hs_kw29_configpl'></span>", "result": { "paramtypeitems": [{ "name": "基本参数", "paramitems": [{ "id": 0, "name": "车型<span class='hs_kw33_configpl'></span>", "pnid": "1_-1", "valueitems": [{ "specid": 39893, "value": "<span class='hs_kw9_configpl'></span>Q2L e-tron 2019款 Q2L e-tron 纯电智享型" }, { "specid": 42875, "value": "<span class='hs_kw9_configpl'></span>Q2L e-tron 2019款 Q2L e-tron 纯电智酷型" }] }, { "id": 0, "name": "厂<span class='hs_kw15_configpl'></span><span class='hs_kw0_configpl'></span><span class='hs_kw57_configpl'></span><span class='hs_kw55_configpl'></span>(<span class='hs_kw14_configpl'></span>)", "pnid": "1_-1", "valueitems": [{ "specid": 39893, "value": "23.73<span class='hs_kw1_configpl'></span>" }, { "specid": 42875, "value": "22.68<span class='hs_kw1_configpl'></span>" }] }, { "id": 52, "name": "厂<span class='hs_kw15_configpl'></span>", "pnid": "1_-1", "valueitems": [{ "specid": 39893, "value": "<span class='hs_kw26_configpl'></span>-<span class='hs_kw47_configpl'></span><span class='hs_kw9_configpl'></span>" }, { "specid": 42875, "value": "<span class='hs_kw26_configpl'></span>-<span class='hs_kw47_configpl'></span><span class='hs_kw9_configpl'></span>" }] }, { "id": 53, "name": "级别", "pnid": "1_-1", "valueitems": [{ "specid": 39893, "value": "<span class='hs_kw16_configpl'></span>" }, { "specid": 42875, "value": "<span class='hs_kw16_configpl'></span>" }] }, { "id": 1149, "name": "能源类型", "pnid": "1_-1", "valueitems": [{ "specid": 39893, "value": "纯电动" }, { "specid": 42875, "value": "纯电动" }] }, { "id": 0, "name": "上市<span class='hs_kw40_configpl'></span>", "pnid": "1_-1", "valueitems": [{ "specid": 39893, "value": "2019.11" }, { "specid": 42875, "value": "2019.11" }] }, { "id": 1291, "name": "工信部纯电续航里程(km)", "pnid": "1_-1", "valueitems": [{ "specid": 39893, "value": "265" }, { "specid": 42875, "value": "265" }] }, { "id": 1292, "name": "<span class='hs_kw39_configpl'></span><span class='hs_kw40_configpl'></span>(小时)", "pnid": 
"1_-1", "valueitems": [{ "specid": 39893, "value": "0.6" }, { "specid": 42875, "value": "0.6" }] }, { "id": 0, "name": "<span class='hs_kw10_configpl'></span><span class='hs_kw40_configpl'></span>(小时)", "pnid": "1_-1", "valueitems": [{ "specid": 39893, "value": "17" }, { "specid": 42875, "value": "17" }] }, { "id": 0, "name": "<span class='hs_kw39_configpl'></span><span class='hs_kw11_configpl'></span>百分比", "pnid": "1_-1", "valueitems": [{ "specid": 39893, "value": "80" }, { "specid": 42875, "value": "80" }] }, { "id": 1185, "name": "<span class='hs_kw8_configpl'></span><span class='hs_kw42_configpl'></span>(kW)", "pnid": "1_-1", "valueitems": [{ "specid": 39893, "value": "100" }, { "specid": 42875, "value": "100" }] }, { "id": 1186, "name": "<span class='hs_kw8_configpl'></span><span class='hs_kw2_configpl'></span>(N·m)", "pnid": "1_-1", "valueitems": [{ "specid": 39893, "value": "290" }, { "specid": 42875, "value": "290" }] }, { "id": 0, "name": "电动机(Ps)", "pnid": "1_-1", "valueitems": [{ "specid": 39893, "value": "136" }, { "specid": 42875, "value": "136" }] }, { "id": 1148, "name": "长*宽*高(mm)", "pnid": "1_-1", "valueitems": [{ "specid": 39893, "value": "4237*1785*1548" }, { "specid": 42875, "value": "4237*1785*1548" }] }, { "id": 1147, "name": "车身结构", "pnid": "1_-1", "valueitems": [{ "specid": 39893, "value": "5门5座SUV" }, { "specid": 42875, "value": "5门5座SUV" }] }, { "id": 1246, "name": "最高车速(km/h)", "pnid": "1_-1", "valueitems": [{ "specid": 39893, "value": "150" }, { "specid": 42875, "value": "150" }] }, { "id": 1250, "name": "官方0-100km/h加速(s)", "pnid": "1_-1", "valueitems": [{ "specid": 39893, "value": "-" }, { "specid": 42875, "value": "-" }] }, { "id": 1252, "name": "<span class='hs_kw22_configpl'></span>0-100km/h加速(s)", "pnid": "1_-1", "valueitems": [{ "specid": 39893, "value": "-" }, { "specid": 42875, "value": "-" }] }, { "id": 1253, "name": "<span class='hs_kw22_configpl'></span>100-0km/h制动(m)", "pnid": "1_-1", "valueitems": [{ "specid": 39893, 
"value": "-" }, { "specid": 42875, "value": "-" }] }, { "id": 0, "name": "<span class='hs_kw22_configpl'></span>续航里程(km)", "pnid": "1_-1", "valueitems": [{ "specid": 39893, "value": "-" }, { "specid": 42875, "value": "-" }] }, { "id": 0, "name": "<span class='hs_kw22_configpl'></span><span class='hs_kw39_configpl'></span><span class='hs_kw40_configpl'></span>(小时)", "pnid": "1_-1", "valueitems": [{ "specid": 39893, "value": "-" }, { "specid": 42875, "value": "-" }] }, { "id": 0, "name": "<span class='hs_kw22_configpl'></span><span class='hs_kw10_configpl'></span><span class='hs_kw40_configpl'></span>(小时)", "pnid": "1_-1", "valueitems": [{ "specid": 39893, "value": "-" }, { "specid": 42875, "value": "-" }] }, { "id": 1255, "name": "整车<span class='hs_kw36_configpl'></span>", "pnid": "1_-1", "valueitems": [{ "specid": 39893, "value": "三<span class='hs_kw7_configpl'></span>10<span class='hs_kw1_configpl'></span>公里" }, { "specid": 42875, "value": "三<span class='hs_kw7_configpl'></span>10<span class='hs_kw1_configpl'></span>公里" }] }] }, 。。。
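The content is easy to extract, and it matches what you get after running the JS.
It corresponds to the basic parameters (基本参数) table on the page.
So it is still easy to handle.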
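To make the structure concrete, here is a small sketch that reads two rows copied from the JSON above. The helper names `values_by_spec` and `strip_placeholders` are hypothetical. Stripping the `hs_kw…` spans only removes the placeholders; it does not recover the characters they stand for, which is why the warranty value needs its own handling later.

```python
import re

# Two rows copied from the config JSON shown above.
param_items = [
    {"id": 1149, "name": "能源类型", "pnid": "1_-1",
     "valueitems": [{"specid": 39893, "value": "纯电动"},
                    {"specid": 42875, "value": "纯电动"}]},
    {"id": 1255, "name": "整车<span class='hs_kw36_configpl'></span>", "pnid": "1_-1",
     "valueitems": [
         {"specid": 39893,
          "value": "三<span class='hs_kw7_configpl'></span>10<span class='hs_kw1_configpl'></span>公里"},
         {"specid": 42875,
          "value": "三<span class='hs_kw7_configpl'></span>10<span class='hs_kw1_configpl'></span>公里"}]},
]

# Matches the empty obfuscation spans, e.g. <span class='hs_kw7_configpl'></span>
SPAN_PLACEHOLDER = re.compile(r"<span class='hs_kw\d+_config\w+'></span>")


def values_by_spec(param_item):
    """Map specid -> raw value string for one parameter row."""
    return {v["specid"]: v["value"] for v in param_item["valueitems"]}


def strip_placeholders(text):
    """Drop the placeholder spans; the hidden characters are NOT recovered."""
    return SPAN_PLACEHOLDER.sub("", text)


energy = values_by_spec(param_items[0])
warranty_plain = strip_placeholders(values_by_spec(param_items[1])[42875])
print(energy)
print(warranty_plain)
```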
So take the earlier code, which extracted the first value from the HTML:
def getItemFirstValue(self, rootDoc, trNumber, isRespDoc=False):
    """
        <tr data-pnid="1_-1" id="tr_2">
            <th>
                <div id="1149"><a href="https://car.autohome.com.cn/baike/detail_7_18_1149.html#pvareaid=2042252">能源类型</a>
                </div>
            </th>
            <td style="background:#F0F3F8;">
                <div>纯电动</div>
            </td>

        <tr data-pnid="1_-1" id="tr_3">
            <th>
                <div id="0">上市<span class="hs_kw40_configxv"></span></div>
            </th>
            <td style="background:#F0F3F8;">
                <div>2019.11</div>
            </td>
            <td>
                <div>2019.11</div>
            </td>
            <td>
                <div></div>
            </td>
            <td>
                <div></div>
            </td>
        </tr>
    """
    trQuery = "tr[id='tr_%s']" % trNumber
    # print("trQuery=%s" % trQuery)
    trDoc = rootDoc.find(trQuery)
    # print("trDoc=%s" % trDoc)
    tdDocGenerator = trDoc.items("td")
    # print("tdDocGenerator=%s" % tdDocGenerator)
    tdDocList = list(tdDocGenerator)
    # print("tdDocList=%s" % tdDocList)
    firstTdDoc = tdDocList[0]
    # print("firstTdDoc=%s" % firstTdDoc)
    firstTdDivDoc = firstTdDoc.find("div")
    print("firstTdDivDoc=%s" % firstTdDivDoc)
    if isRespDoc:
        respItem = firstTdDivDoc
    else:
        firstItemValue = firstTdDivDoc.text()
        respItem = firstItemValue
    print("respItem=%s" % respItem)
    return respItem
and change it to extract from the config JSON instead.
That way there is no need to run JS.
The final code is:
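In data terms, the change amounts to indexing into the `paramitems` list instead of querying `tr` rows. The sketch below shows that positional lookup, plus a lookup by the row's `id` field as an alternative that this post does not use. Both helper names are hypothetical, and the `id` variant only works for rows whose `id` is non-zero and unique (many rows in the sample above have `"id": 0`).

```python
def first_value_at(param_items, item_index):
    """Positional lookup: value of the first spec in the row at item_index."""
    return param_items[item_index]["valueitems"][0]["value"]


def first_value_by_id(param_items, item_id):
    """Alternative lookup by the row's "id"; returns None if no such row exists."""
    for item in param_items:
        if item.get("id") == item_id and item.get("valueitems"):
            return item["valueitems"][0]["value"]
    return None


# Rows copied from the config JSON shown earlier in this post.
basic_items = [
    {"id": 0, "name": "上市<span class='hs_kw40_configpl'></span>", "pnid": "1_-1",
     "valueitems": [{"specid": 39893, "value": "2019.11"},
                    {"specid": 42875, "value": "2019.11"}]},
    {"id": 1149, "name": "能源类型", "pnid": "1_-1",
     "valueitems": [{"specid": 39893, "value": "纯电动"},
                    {"specid": 42875, "value": "纯电动"}]},
    {"id": 1148, "name": "长*宽*高(mm)", "pnid": "1_-1",
     "valueitems": [{"specid": 39893, "value": "4237*1785*1548"},
                    {"specid": 42875, "value": "4237*1785*1548"}]},
]

print(first_value_at(basic_items, 0))
print(first_value_by_id(basic_items, 1149))
```

Positional lookup is simple but depends on the row order staying fixed per energy type; the `id` lookup trades that assumption for a dependency on the ids being present.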
def getItemFirstValue(self, inputContent, itemIndex):
    print("in getItemFirstValue")
    # firstItemValue = self.extractTrFirstTdValue(inputContent, itemIndex)
    firstItemValue = self.extractDictListFirstValue(inputContent, itemIndex)
    return firstItemValue

def extractDictListFirstValue(self, paramItemDictList, itemIndex):
    """
        [
            ...,
            {
                "id": 1149,
                "name": "能源类型",
                "pnid": "1_-1",
                "valueitems": [{ "specid": 39893, "value": "纯电动" }, { "specid": 42875, "value": "纯电动" }]
            }
            ...,
            {
                "id": 1292,
                "name": "<span class='hs_kw39_configpl'></span><span class='hs_kw40_configpl'></span>(小时)",
                "pnid": "1_-1",
                "valueitems": [{ "specid": 39893, "value": "0.6" }, { "specid": 42875, "value": "0.6" }]
            },
            ... ,
            {
                "id": 1255,
                "name": "整车<span class='hs_kw36_configpl'></span>",
                "pnid": "1_-1",
                "valueitems": [{
                    "specid": 39893,
                    "value": "三<span class='hs_kw7_configpl'></span>10<span class='hs_kw1_configpl'></span>公里"
                }, {
                    "specid": 42875,
                    "value": "三<span class='hs_kw7_configpl'></span>10<span class='hs_kw1_configpl'></span>公里"
                }]
            }
        ]
    """
    paramItemDict = paramItemDictList[itemIndex]
    print("paramItemDict=%s" % paramItemDict)
    valueItemList = paramItemDict["valueitems"]
    print("valueItemList=%s" % valueItemList)
    firstItemDict = valueItemList[0]
    print("firstItemDict=%s" % firstItemDict)
    # firstItemDict={'specid': 43593, 'value': "<span class='hs_kw57_configxt'></span>-<span class='hs_kw21_configxt'></span><span class='hs_kw24_configxt'></span>"}
    firtItemValue = firstItemDict["value"]
    # firtItemValue=<span class='hs_kw57_configxt'></span>-<span class='hs_kw21_configxt'></span><span class='hs_kw24_configxt'></span>
    print("firtItemValue=%s" % firtItemValue)
    return firtItemValue

# def extractTrFirstTdValue(self, rootDoc, trNumber, isRespDoc=False):
def extractTrFirstTdValue(self, rootDoc, trNumber):
    """
        <tr data-pnid="1_-1" id="tr_2">
            <th>
                <div id="1149"><a href="https://car.autohome.com.cn/baike/detail_7_18_1149.html#pvareaid=2042252">能源类型</a>
                </div>
            </th>
            <td style="background:#F0F3F8;">
                <div>纯电动</div>
            </td>

        <tr data-pnid="1_-1" id="tr_3">
            <th>
                <div id="0">上市<span class="hs_kw40_configxv"></span></div>
            </th>
            <td style="background:#F0F3F8;">
                <div>2019.11</div>
            </td>
            <td>
                <div>2019.11</div>
            </td>
            <td>
                <div></div>
            </td>
            <td>
                <div></div>
            </td>
        </tr>

        <tr data-pnid="1_-1" id="tr_20" style="background: rgb(255, 255, 255);">
            <th>
                <div id="1255"><a href="https://car.autohome.com.cn/baike/detail_7_18_1255.html#pvareaid=2042252">整车<span class="hs_kw36_configaJ"></span></a></div>
            </th>
            <td style="background:#F0F3F8;">
                <div>三<span class="hs_kw7_configaJ"></span>10<span class="hs_kw1_configaJ"></span>公里</div>
            </td>
            <td>
                <div>三<span class="hs_kw7_configaJ"></span>10<span class="hs_kw1_configaJ"></span>公里</div>
            </td>
            <td>
                <div></div>
            </td>
            <td>
                <div></div>
            </td>
        </tr>
    """
    trQuery = "tr[id='tr_%s']" % trNumber
    # print("trQuery=%s" % trQuery)
    trDoc = rootDoc.find(trQuery)
    # print("trDoc=%s" % trDoc)
    tdDocGenerator = trDoc.items("td")
    # print("tdDocGenerator=%s" % tdDocGenerator)
    tdDocList = list(tdDocGenerator)
    # print("tdDocList=%s" % tdDocList)
    firstTdDoc = tdDocList[0]
    # print("firstTdDoc=%s" % firstTdDoc)
    firstTdDivDoc = firstTdDoc.find("div")
    print("firstTdDivDoc=%s" % firstTdDivDoc)
    # if isRespDoc:
    #     respItem = firstTdDivDoc
    # else:
    #     firstItemValue = firstTdDivDoc.text()
    #     respItem = firstItemValue
    # print("respItem=%s" % respItem)
    # return respItem
    respItemHtml = firstTdDivDoc.html()
    print("respItemHtml=%s" % respItemHtml)
    return respItemHtml
The related functions were adjusted as well,
for example the whole-vehicle warranty (整车质保):
# def extractWholeWarranty(self, firstDivDoc):
def extractWholeWarranty(self, firstDivHtml):
    carModelWholeWarranty = ""
    # <div>三<span class="hs_kw7_configxv"></span>10<span class="hs_kw1_configxv"></span>公里</div>
    # print("firstDivDoc=%s" % firstDivDoc)
    # carModelWholeWarranty = firstDivDoc.text() # 三10公里
    # firstDivHtml = firstDivDoc.html()
    print("firstDivHtml=%s" % firstDivHtml)
    # 三<span class="hs_kw7_configCC"></span>10<span class="hs_kw1_configCC"></span>公里
    # carWholeQualityQuarantee = re.sub("[^<>]+(?P<firstSpan><span.+?></span>)[^<>]+(?P<secondSpan><span.+?></span>)[^<>]+", )
    foundYearDistance = re.search("(?P<warrantyYear>[^<>]+)<span.+?></span>(?P<distanceNumber>[^<>]+)<span.+?></span>(?P<distanceUnit>[^<>]+)", firstDivHtml)
    if foundYearDistance:
        warrantyYear = foundYearDistance.group("warrantyYear")
        distanceNumber = foundYearDistance.group("distanceNumber")
        distanceUnit = foundYearDistance.group("distanceUnit")
        carModelWholeWarranty = "%s年或%s万%s" % (warrantyYear, distanceNumber, distanceUnit)
    else:
        # special:
        # https://car.autohome.com.cn/config/spec/46700.html
        # <div>三<span class="hs_kw58_configWh"></span></div>
        # 三<span class="hs_kw58_configOf"></span>
        foundYearNotLimitDistance = re.search("(?P<warrantyYear>[^<>]+)<span.+?></span>", firstDivHtml)
        if foundYearNotLimitDistance:
            warrantyYear = foundYearNotLimitDistance.group("warrantyYear")
            carModelWholeWarranty = "%s年不限公里" % warrantyYear
    print("carModelWholeWarranty=%s" % carModelWholeWarranty)
    return carModelWholeWarranty

def getWholeWarranty(self, inputContent, itemIndex):
    # firstDivDoc = self.getItemFirstValue(inputContent, itemIndex, isRespDoc=True)
    # print("firstDivDoc=%s" % firstDivDoc)
    # carModelWholeWarranty = self.extractWholeWarranty(firstDivDoc)
    firstDivDocHtml = self.getItemFirstValue(inputContent, itemIndex)
    print("firstDivDocHtml=%s" % firstDivDocHtml)
    carModelWholeWarranty = self.extractWholeWarranty(firstDivDocHtml)
    return carModelWholeWarranty
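The warranty parsing above can be tried in isolation. The following is a standalone sketch of the same two regular expressions, outside the spider class; `parse_whole_warranty` is a hypothetical name. It assumes, as the original code does, that the first placeholder span stands for "年或" and the second for "万", and that a value with a single span means unlimited distance.

```python
import re

# Year, placeholder span, distance number, placeholder span, distance unit
YEAR_AND_DISTANCE = re.compile(
    r"(?P<year>[^<>]+)<span.+?></span>(?P<number>[^<>]+)<span.+?></span>(?P<unit>[^<>]+)")
# Year followed by a single placeholder span (no distance limit)
YEAR_ONLY = re.compile(r"(?P<year>[^<>]+)<span.+?></span>")


def parse_whole_warranty(value_html):
    """Rebuild the warranty text from a raw value; returns "" if nothing matches."""
    found = YEAR_AND_DISTANCE.search(value_html)
    if found:
        return "%s年或%s万%s" % (
            found.group("year"), found.group("number"), found.group("unit"))
    found = YEAR_ONLY.search(value_html)
    if found:
        return "%s年不限公里" % found.group("year")
    return ""


limited = parse_whole_warranty(
    "三<span class='hs_kw7_configpl'></span>10<span class='hs_kw1_configpl'></span>公里")
unlimited = parse_whole_warranty('三<span class="hs_kw58_configOf"></span>')
print(limited)    # 三年或10万公里
print(unlimited)  # 三年不限公里
```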
The logic shared across the different energy types was also consolidated. The final result is:
@catch_status_code_error
def carConfigSpecCallback(self, response):
    print("in carConfigSpecCallback")
    curCarModelDict = response.save
    print("curCarModelDict=%s" % curCarModelDict)
    carModelDict = copy.deepcopy(curCarModelDict)

    configSpecHtml = response.text
    # print("configSpecHtml=%s" % configSpecHtml)
    # print("")

    # # for debug
    # return

    # config json item index - spec table html item index = 2
    ItemIndexDiff = 2

    isUseSpecTableHtml = True
    isUseConfigJson = False
    valueContent = None
    energyTypeIdx = 2

    # # Method 1: after run js, extract item value from spec table html
    # """
    # <table class="tbcs" id="tab_0" style="width: 932px;">
    #     <tbody>
    #         <tr>
    #             <th class="cstitle" show="1" pid="tab_0" id="nav_meto_0" colspan="5">
    #                 <h3><span>基本参数</span></h3>
    #             </th>
    #         </tr>
    #         <tr data-pnid="1_-1" id="tr_0">
    # """
    # tbodyDoc = response.doc("table[id='tab_0'] tbody")
    # print("tbodyDoc=%s" % tbodyDoc)
    # valueContent = tbodyDoc
    # isUseSpecTableHtml = True
    # isUseConfigJson = False
    # energyTypeIdx = 2

    # Method 2: not run js, extract item value from config json
    # get value from config json
    # var config = {"message" ...... "returncode":"0","taskid":"8be676a3-e023-4fa9-826d-09cd42a1810c","time":"2020-08-27 20:56:17"};
    # raw string, so that \s and \{ reach the regex engine unchanged
    foundConfigJson = re.search(r"var\s*config\s*=\s*(?P<configJson>\{[^;]+\});", configSpecHtml)
    print("foundConfigJson=%s" % foundConfigJson)
    if foundConfigJson:
        configJson = foundConfigJson.group("configJson")
        print("configJson=%s" % configJson)
        # configDict = json.loads(configJson, encoding="utf-8")
        configDict = json.loads(configJson)
        print("configDict=%s" % configDict)

        # if "result" in configDict:
        configResultDict = configDict["result"]
        print("configResultDict=%s" % configResultDict)
        # if "paramtypeitems" in configResultDict:
        paramTypeItemDictList = configResultDict["paramtypeitems"]
        print("paramTypeItemDictList=%s" % paramTypeItemDictList)
        # paramTypeItemNum = len(paramTypeItemDictList)
        # print("paramTypeItemNum=%s" % paramTypeItemNum)
        basicParamDict = paramTypeItemDictList[0]
        print("basicParamDict=%s" % basicParamDict)
        basicItemDictList = basicParamDict["paramitems"]
        print("basicItemDictList=%s" % basicItemDictList)
        # print("type(basicItemDictList)=%s" % type(basicItemDictList))
        # basicItemNum = len(basicItemDictList)
        # print("basicItemNum=%s" % basicItemNum)
        valueContent = basicItemDictList
        isUseSpecTableHtml = False
        isUseConfigJson = True

    if isUseConfigJson:
        energyTypeIdx += ItemIndexDiff

    if valueContent:
        carEnergyType = self.getItemFirstValue(valueContent, energyTypeIdx) # 纯电动 / 汽油 / 插电式混合动力 / 油电混合
        carModelDict["carEnergyType"] = carEnergyType

        if carEnergyType == "汽油":
            # https://car.autohome.com.cn/config/spec/43593.html
            # https://car.autohome.com.cn/config/spec/41572.html
            # self.processGasolineCar(valueContent, carModelDict)
            gasolineCarKeyIdxMapDict = {
                "carModelEnvStandard" : 3,
                "carModelReleaseTime" : 4,
                "carModelMaxPower" : 5,
                "carModelMaxTorque" : 6,
                "carModelEngine" : 7,
                "carModelGearBox" : 8,
                "carModelSize" : 9,
                "carModelBodyStructure" : 10,
                "carModelMaxSpeed" : 11,
                "carModelOfficialSpeedupTime" : 12,
                "carModelActualTestSpeedupTime" : 13,
                "carModelActualTestBrakeDistance" : 14,
                "carModelMiitCompositeFuelConsumption" : 15,
                "carModelActualFuelConsumption" : 16,
            }
            wholeWarrantyIdx = 17
            if isUseConfigJson:
                for eachKey in gasolineCarKeyIdxMapDict.keys():
                    gasolineCarKeyIdxMapDict[eachKey] += ItemIndexDiff
                wholeWarrantyIdx += ItemIndexDiff
            self.processSingleEneryTypeCar(gasolineCarKeyIdxMapDict, valueContent, wholeWarrantyIdx, carModelDict)
        elif carEnergyType == "纯电动":
            # https://car.autohome.com.cn/config/spec/42875.html
            # self.processPureElectricCar(valueContent, carModelDict)
            pureElectricCarKeyIdxMapDict = {
                "carModelReleaseTime": 3,
                "carModelMiitEnduranceMileagePureElectric": 4,
                "carModelQuickCharge": 5,
                "carModelSlowCharge": 6,
                "carModelQuickChargePercent": 7,
                "carModelMaxPower": 8,
                "carModelMaxTorque": 9,
                "carModelHorsePowerElectric": 10,
                "carModelSize": 11,
                "carModelBodyStructure": 12,
                "carModelMaxSpeed": 13,
                "carModelOfficialSpeedupTime": 14,
                "carModelActualTestSpeedupTime": 15,
                "carModelActualTestBrakeDistance": 16,
                "carModelActualTestEnduranceMileage": 17,
                "carModelActualTestQuickCharge": 18,
                "carModelActualTestSlowCharge": 19,
            }
            wholeWarrantyIdx = 20
            if isUseConfigJson:
                for eachKey in pureElectricCarKeyIdxMapDict.keys():
                    pureElectricCarKeyIdxMapDict[eachKey] += ItemIndexDiff
                wholeWarrantyIdx += ItemIndexDiff
            self.processSingleEneryTypeCar(pureElectricCarKeyIdxMapDict, valueContent, wholeWarrantyIdx, carModelDict)
        elif carEnergyType == "插电式混合动力":
            # https://car.autohome.com.cn/config/series/4460.html
            # self.processPhevCar(valueContent, carModelDict)
            phevCarKeyIdxMapDict = {
                "carModelEnvStandard": 3,
                "carModelReleaseTime": 4,
                "carModelMiitEnduranceMileagePureElectric": 5,
                "carModelQuickCharge": 6,
                "carModelSlowCharge": 7,
                "carModelQuickChargePercent": 8,
                "carModelMaxPower": 9,
                "carModelMaxTorque": 10,
                "carModelEngine": 11,
                "carModelHorsePowerElectric": 12,
                "carModelGearBox": 13,
                "carModelSize": 14,
                "carModelBodyStructure": 15,
                "carModelMaxSpeed": 16,
                "carModelOfficialSpeedupTime": 17,
                "carModelActualTestSpeedupTime": 18,
                "carModelActualTestBrakeDistance": 19,
                "carModelActualTestEnduranceMileage": 20,
                "carModelActualTestQuickCharge": 21,
                "carModelActualTestSlowCharge": 22,
                "carModelMiitCompositeFuelConsumption": 23,
                "carModelActualFuelConsumption": 24,
            }
            wholeWarrantyIdx = 25
            if isUseConfigJson:
                for eachKey in phevCarKeyIdxMapDict.keys():
                    phevCarKeyIdxMapDict[eachKey] += ItemIndexDiff
                wholeWarrantyIdx += ItemIndexDiff
            self.processSingleEneryTypeCar(phevCarKeyIdxMapDict, valueContent, wholeWarrantyIdx, carModelDict)
        elif carEnergyType == "油电混合":
            # https://car.autohome.com.cn/config/spec/35507.html
            # self.processHevCar(valueContent, carModelDict)
            hevCarKeyIdxMapDict = {
                "carModelEnvStandard": 3,
                "carModelReleaseTime": 4,
                "carModelMaxPower": 5,
                "carModelMaxTorque": 6,
                "carModelEngine": 7,
                "carModelHorsePowerElectric": 8,
                "carModelGearBox": 9,
                "carModelSize": 10,
                "carModelBodyStructure": 11,
                "carModelMaxSpeed": 12,
                "carModelOfficialSpeedupTime": 13,
                "carModelActualTestSpeedupTime": 14,
                "carModelActualTestBrakeDistance": 15,
                "carModelMiitCompositeFuelConsumption": 16,
                "carModelActualFuelConsumption": 17,
            }
            wholeWarrantyIdx = 18
            if isUseConfigJson:
                for eachKey in hevCarKeyIdxMapDict.keys():
                    hevCarKeyIdxMapDict[eachKey] += ItemIndexDiff
                wholeWarrantyIdx += ItemIndexDiff
            self.processSingleEneryTypeCar(hevCarKeyIdxMapDict, valueContent, wholeWarrantyIdx, carModelDict)
        else:
            errMsg = "TODO: add support %s!" % carEnergyType
            raise Exception(errMsg)
    else:
        self.saveSingleResult(carModelDict)

def processSingleEneryTypeCar(self, keyIdxMapDict, valueContent, wholeWarrantyIdx, carModelDict):
    for eachItemKey in keyIdxMapDict.keys():
        print("eachItemKey=%s" % eachItemKey)
        eachItemIndex = keyIdxMapDict[eachItemKey]
        print("eachItemIndex=%s" % eachItemIndex)
        eachItemValue = self.getItemFirstValue(valueContent, eachItemIndex)
        print("eachItemValue=%s" % eachItemValue)
        carModelDict[eachItemKey] = eachItemValue

    # 整车质保
    carModelWholeWarranty = self.getWholeWarranty(valueContent, wholeWarrantyIdx) # 三年或10万公里
    print("carModelWholeWarranty=%s" % carModelWholeWarranty)
    carModelDict["carModelWholeWarranty"] = carModelWholeWarranty

    self.saveSingleResult(carModelDict)
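The index offset of 2 exists because the config JSON list carries two leading rows that the HTML table's `tr_N` numbering does not count, as the `ItemIndexDiff` comment in the callback notes. A compact way to express that shift, sketched here with a hypothetical helper `shift_indices`, is to build a new map instead of updating the original one in a loop. The three sample keys are taken from the pure-electric map above.

```python
ITEM_INDEX_DIFF = 2  # config JSON item index minus spec table row index


def shift_indices(key_index_map, offset):
    """Return a new key -> index map with every index moved by offset."""
    return {key: index + offset for key, index in key_index_map.items()}


# Subset of the pure-electric (纯电动) table-row indices from the callback above
table_row_indices = {
    "carModelReleaseTime": 3,
    "carModelMaxPower": 8,
    "carModelSize": 11,
}

json_indices = shift_indices(table_row_indices, ITEM_INDEX_DIFF)
print(json_indices)
```

Checked against the sample JSON earlier in this post, the shifted positions 5, 10 and 13 are the rows for release date (上市…), power in kW, and 长*宽*高(mm). Returning a new dict also leaves the table-row map reusable for the HTML-based method.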
The earlier per-energy-type functions were then all commented out:
# def processGasolineCar(self, valueContent, carModelDict): # # 汽油 # # https://car.autohome.com.cn/config/spec/43593.html # # https://car.autohome.com.cn/config/spec/41572.html # # 环保标准 # carModelEnvStandard = self.getItemFirstValue(valueContent, 3) # 国VI # carModelDict["carModelEnvStandard"] = carModelEnvStandard # # 上市时间 # carModelReleaseTime = self.getItemFirstValue(valueContent, 4) # 2020.04 # carModelDict["carModelReleaseTime"] = carModelReleaseTime # # 最大功率(kW) # carModelMaxPower = self.getItemFirstValue(valueContent, 5) # 110 # carModelDict["carModelMaxPower"] = carModelMaxPower # # 最大扭矩(N·m) # carModelMaxTorque = self.getItemFirstValue(valueContent, 6) # 250 # carModelDict["carModelMaxTorque"] = carModelMaxTorque # # 发动机 # carModelEngine = self.getItemFirstValue(valueContent, 7) # 1.4T 150马力 L4 # carModelDict["carModelEngine"] = carModelEngine # # 变速箱 # carModelGearBox = self.getItemFirstValue(valueContent, 8) # 7挡双离合 # carModelDict["carModelGearBox"] = carModelGearBox # # 长*宽*高(mm) # carModelSize = self.getItemFirstValue(valueContent, 9) # 4312*1785*1426 # carModelDict["carModelSize"] = carModelSize # # 车身结构 # carModelBodyStructure = self.getItemFirstValue(valueContent, 10) # 5门5座两厢车 # carModelDict["carModelBodyStructure"] = carModelBodyStructure # # 最高车速(km/h) # carModelMaxSpeed = self.getItemFirstValue(valueContent, 11) # 200 # carModelDict["carModelMaxSpeed"] = carModelMaxSpeed # # 官方0-100km/h加速(s) # carModelOfficialSpeedupTime = self.getItemFirstValue(valueContent, 12) # 8.4 # carModelDict["carModelOfficialSpeedupTime"] = carModelOfficialSpeedupTime # # 实测0-100km/h加速(s) # carModelActualTestSpeedupTime = self.getItemFirstValue(valueContent, 13) # - # carModelDict["carModelActualTestSpeedupTime"] = carModelActualTestSpeedupTime # # 实测100-0km/h制动(m) # carModelActualTestBrakeDistance = self.getItemFirstValue(valueContent, 14) # - # carModelDict["carModelActualTestBrakeDistance"] = carModelActualTestBrakeDistance # # 工信部综合油耗(L/100km) # 
carModelMiitCompositeFuelConsumption = self.getItemFirstValue(valueContent, 15) # 5.8 # carModelDict["carModelMiitCompositeFuelConsumption"] = carModelMiitCompositeFuelConsumption # # 实测油耗(L/100km) # carModelActualFuelConsumption = self.getItemFirstValue(valueContent, 16) # - # carModelDict["carModelActualFuelConsumption"] = carModelActualFuelConsumption # self.saveSingleResult(carModelDict) # def processPureElectricCar(self, valueContent, carModelDict): # # 纯电动 # # https://car.autohome.com.cn/config/spec/42875.html # # 上市时间 # carModelReleaseTime = self.getItemFirstValue(valueContent, 3) # 2019.11 # carModelDict["carModelReleaseTime"] = carModelReleaseTime # # 工信部纯电续航里程(km) # carModelMiitEnduranceMileagePureElectric = self.getItemFirstValue(valueContent, 4) # 265 # carModelDict["carModelMiitEnduranceMileagePureElectric"] = carModelMiitEnduranceMileagePureElectric # # 快充时间(小时) # carModelQuickCharge = self.getItemFirstValue(valueContent, 5) # 0.6 # carModelDict["carModelQuickCharge"] = carModelQuickCharge # # 慢充时间(小时) # carModelSlowCharge = self.getItemFirstValue(valueContent, 6) # 17 # carModelDict["carModelSlowCharge"] = carModelSlowCharge # # 快充电量百分比 # carModelQuickChargePercent = self.getItemFirstValue(valueContent, 7) # 80 # carModelDict["carModelQuickChargePercent"] = carModelQuickChargePercent # # 最大功率(kW) # carModelMaxPower = self.getItemFirstValue(valueContent, 8) # 100 # carModelDict["carModelMaxPower"] = carModelMaxPower # # 最大扭矩(N·m) # carModelMaxTorque = self.getItemFirstValue(valueContent, 9) # 290 # carModelDict["carModelMaxTorque"] = carModelMaxTorque # # 电动机(Ps) # carModelHorsePowerElectric = self.getItemFirstValue(valueContent, 10) # 136 # carModelDict["carModelHorsePowerElectric"] = carModelHorsePowerElectric # # 长*宽*高(mm) # carModelSize = self.getItemFirstValue(valueContent, 11) # 4237*1785*1548 # carModelDict["carModelSize"] = carModelSize # # 车身结构 # carModelBodyStructure = self.getItemFirstValue(valueContent, 12) # 5门5座SUV # 
carModelDict["carModelBodyStructure"] = carModelBodyStructure # # 最高车速(km/h) # carModelMaxSpeed = self.getItemFirstValue(valueContent, 13) # 150 # carModelDict["carModelMaxSpeed"] = carModelMaxSpeed # # 官方0-100km/h加速(s) # carModelOfficialSpeedupTime = self.getItemFirstValue(valueContent, 14) # - # carModelDict["carModelOfficialSpeedupTime"] = carModelOfficialSpeedupTime # # 实测0-100km/h加速(s) # carModelActualTestSpeedupTime = self.getItemFirstValue(valueContent, 15) # - # carModelDict["carModelActualTestSpeedupTime"] = carModelActualTestSpeedupTime # # 实测100-0km/h制动(m) # carModelActualTestBrakeDistance = self.getItemFirstValue(valueContent, 16) # - # carModelDict["carModelActualTestBrakeDistance"] = carModelActualTestBrakeDistance # # 实测续航里程(km) # carModelActualTestEnduranceMileage = self.getItemFirstValue(valueContent, 17) # - # carModelDict["carModelActualTestEnduranceMileage"] = carModelActualTestEnduranceMileage # # 实测快充时间(小时) # carModelActualTestQuickCharge = self.getItemFirstValue(valueContent, 18) # - # carModelDict["carModelActualTestQuickCharge"] = carModelActualTestQuickCharge # # 实测慢充时间(小时) # carModelActualTestSlowCharge = self.getItemFirstValue(valueContent, 19) # - # carModelDict["carModelActualTestSlowCharge"] = carModelActualTestSlowCharge # # 整车质保 # carModelWholeWarranty = self.getWholeWarranty(valueContent, 20) # 三年或10万公里 # carModelDict["carModelWholeWarranty"] = carModelWholeWarranty # self.saveSingleResult(carModelDict) # def processPhevCar(self, valueContent, carModelDict): # # 插电式混合动力 = PHEV = Plug-in Hybrid Electric vehicle # # https://car.autohome.com.cn/config/series/4460.html # # 环保标准 # carModelEnvStandard = self.getItemFirstValue(valueContent, 3) # 国V # carModelDict["carModelEnvStandard"] = carModelEnvStandard # # 上市时间 # carModelReleaseTime = self.getItemFirstValue(valueContent, 4) # 2018.11 # carModelDict["carModelReleaseTime"] = carModelReleaseTime # # 工信部纯电续航里程(km) # carModelMiitEnduranceMileagePureElectric = 
self.getItemFirstValue(valueContent, 5) # 56 # carModelDict["carModelMiitEnduranceMileagePureElectric"] = carModelMiitEnduranceMileagePureElectric # # 快充时间(小时) # carModelQuickCharge = self.getItemFirstValue(valueContent, 6) # 2.5 # carModelDict["carModelQuickCharge"] = carModelQuickCharge # # 慢充时间(小时) # carModelSlowCharge = self.getItemFirstValue(valueContent, 7) # 10.8 # carModelDict["carModelSlowCharge"] = carModelSlowCharge # # 快充电量百分比 # carModelQuickChargePercent = self.getItemFirstValue(valueContent, 8) # - # carModelDict["carModelQuickChargePercent"] = carModelQuickChargePercent # # 最大功率(kW) # carModelMaxPower = self.getItemFirstValue(valueContent, 9) # 270 # carModelDict["carModelMaxPower"] = carModelMaxPower # # 最大扭矩(N·m) # carModelMaxTorque = self.getItemFirstValue(valueContent, 10) # 700 # carModelDict["carModelMaxTorque"] = carModelMaxTorque # # 发动机 # carModelEngine = self.getItemFirstValue(valueContent, 11) # 2.0T 252马力 L4 # carModelDict["carModelEngine"] = carModelEngine # # 电动机(Ps) # carModelHorsePowerElectric = self.getItemFirstValue(valueContent, 12) # 128 # carModelDict["carModelHorsePowerElectric"] = carModelHorsePowerElectric # # 变速箱 # carModelGearBox = self.getItemFirstValue(valueContent, 13) # 8挡手自一体 # carModelDict["carModelGearBox"] = carModelGearBox # # 长*宽*高(mm) # carModelSize = self.getItemFirstValue(valueContent, 14) # 5071*1968*1716 # carModelDict["carModelSize"] = carModelSize # # 车身结构 # carModelBodyStructure = self.getItemFirstValue(valueContent, 15) # 5门5座SUV # carModelDict["carModelBodyStructure"] = carModelBodyStructure # # 最高车速(km/h) # carModelMaxSpeed = self.getItemFirstValue(valueContent, 16) # 228 # carModelDict["carModelMaxSpeed"] = carModelMaxSpeed # # 官方0-100km/h加速(s) # carModelOfficialSpeedupTime = self.getItemFirstValue(valueContent, 17) # 5.9 # carModelDict["carModelOfficialSpeedupTime"] = carModelOfficialSpeedupTime # # 实测0-100km/h加速(s) # carModelActualTestSpeedupTime = self.getItemFirstValue(valueContent, 18) # - # 
carModelDict["carModelActualTestSpeedupTime"] = carModelActualTestSpeedupTime # # 实测100-0km/h制动(m) # carModelActualTestBrakeDistance = self.getItemFirstValue(valueContent, 19) # - # carModelDict["carModelActualTestBrakeDistance"] = carModelActualTestBrakeDistance # # 实测续航里程(km) # carModelActualTestEnduranceMileage = self.getItemFirstValue(valueContent, 20) # - # carModelDict["carModelActualTestEnduranceMileage"] = carModelActualTestEnduranceMileage # # 实测快充时间(小时) # carModelActualTestQuickCharge = self.getItemFirstValue(valueContent, 21) # - # carModelDict["carModelActualTestQuickCharge"] = carModelActualTestQuickCharge # # 实测慢充时间(小时) # carModelActualTestSlowCharge = self.getItemFirstValue(valueContent, 22) # - # carModelDict["carModelActualTestSlowCharge"] = carModelActualTestSlowCharge # # 工信部综合油耗(L/100km) # carModelMiitCompositeFuelConsumption = self.getItemFirstValue(valueContent, 23) # 2.4 # carModelDict["carModelMiitCompositeFuelConsumption"] = carModelMiitCompositeFuelConsumption # # 实测油耗(L/100km) # carModelActualFuelConsumption = self.getItemFirstValue(valueContent, 24) # - # carModelDict["carModelActualFuelConsumption"] = carModelActualFuelConsumption # # 整车质保 # carModelWholeWarranty = self.getWholeWarranty(valueContent, 25) # 三年或10万公里 # carModelDict["carModelWholeWarranty"] = carModelWholeWarranty # self.saveSingleResult(carModelDict) # def processHevCar(self, valueContent, carModelDict): # # 混合电动汽车=HEV=Hybrid Electric Vehicle # # https://car.autohome.com.cn/config/spec/35507.html # # 环保标准 # carModelEnvStandard = self.getItemFirstValue(valueContent, 3) # 国IV(国V) # carModelDict["carModelEnvStandard"] = carModelEnvStandard # # 上市时间 # carModelReleaseTime = self.getItemFirstValue(valueContent, 4) # 2018.08 # carModelDict["carModelReleaseTime"] = carModelReleaseTime # # 最大功率(kW) # carModelMaxPower = self.getItemFirstValue(valueContent, 5) # 100 # carModelDict["carModelMaxPower"] = carModelMaxPower # # 最大扭矩(N·m) # carModelMaxTorque = 
self.getItemFirstValue(valueContent, 6) # - # carModelDict["carModelMaxTorque"] = carModelMaxTorque # # 发动机 # carModelEngine = self.getItemFirstValue(valueContent, 7) # 1.8L 99马力 L4 # carModelDict["carModelEngine"] = carModelEngine # # 电动机(Ps) # carModelHorsePowerElectric = self.getItemFirstValue(valueContent, 8) # 82 # carModelDict["carModelHorsePowerElectric"] = carModelHorsePowerElectric # # 变速箱 # carModelGearBox = self.getItemFirstValue(valueContent, 9) # E-CVT无级变速 # carModelDict["carModelGearBox"] = carModelGearBox # # 长*宽*高(mm) # carModelSize = self.getItemFirstValue(valueContent, 10) # 4360*1765*1455 # carModelDict["carModelSize"] = carModelSize # # 车身结构 # carModelBodyStructure = self.getItemFirstValue(valueContent, 11) # 5门5座SUV # carModelDict["carModelBodyStructure"] = carModelBodyStructure # # 最高车速(km/h) # carModelMaxSpeed = self.getItemFirstValue(valueContent, 12) # - # carModelDict["carModelMaxSpeed"] = carModelMaxSpeed # # 官方0-100km/h加速(s) # carModelOfficialSpeedupTime = self.getItemFirstValue(valueContent, 13) # - # carModelDict["carModelOfficialSpeedupTime"] = carModelOfficialSpeedupTime # # 实测0-100km/h加速(s) # carModelActualTestSpeedupTime = self.getItemFirstValue(valueContent, 14) # - # carModelDict["carModelActualTestSpeedupTime"] = carModelActualTestSpeedupTime # # 实测100-0km/h制动(m) # carModelActualTestBrakeDistance = self.getItemFirstValue(valueContent, 15) # - # carModelDict["carModelActualTestBrakeDistance"] = carModelActualTestBrakeDistance # # 工信部综合油耗(L/100km) # carModelMiitCompositeFuelConsumption = self.getItemFirstValue(valueContent, 16) # 4.6 # carModelDict["carModelMiitCompositeFuelConsumption"] = carModelMiitCompositeFuelConsumption # # 实测油耗(L/100km) # carModelActualFuelConsumption = self.getItemFirstValue(valueContent, 17) # - # carModelDict["carModelActualFuelConsumption"] = carModelActualFuelConsumption # # 整车质保 # carModelWholeWarranty = self.getWholeWarranty(valueContent, 18) # 六年或15万公里 # carModelDict["carModelWholeWarranty"] = 
carModelWholeWarranty # self.saveSingleResult(carModelDict)
With that, the results can be obtained without running JS:
[
  [
    "autohome_20200827",
    {
      "carBrandId": "91",
      "carBrandLogoUrl": "https://car3.autoimg.cn/cardfs/series/g26/M05/AE/94/100x100_f40_autohomecar__wKgHEVs9tm6ASWlTAAAUz_2mWTY720.png",
      "carBrandName": "红旗",
      "carEnergyType": "汽油",
      "carMerchantName": "一汽红旗",
      "carMerchantUrl": "https://car.autohome.com.cn/price/brand-91-190.html#pvareaid=2042363",
      "carModelActualFuelConsumption": "",
      "carModelActualTestBrakeDistance": "",
      "carModelActualTestEnduranceMileage": "",
      "carModelActualTestQuickCharge": "",
      "carModelActualTestSlowCharge": "",
      "carModelActualTestSpeedupTime": "",
      "carModelBodyStructure": "4门5座三厢车",
      "carModelDataSift2": "",
      "carModelDataSift3": "",
      "carModelDataSift4": "",
      "carModelDriveType": "前置四驱",
      "carModelEngine": "6.0L 408马力 V12",
      "carModelEnvStandard": "未知",
      "carModelGearBox": "6挡手自一体",
      "carModelGroupName": "6.0升 自然吸气 408马力 未知",
      "carModelHorsePowerElectric": "",
      "carModelMaxPower": "300",
      "carModelMaxSpeed": "",
      "carModelMaxTorque": "550",
      "carModelMiitCompositeFuelConsumption": "",
      "carModelMiitEnduranceMileagePureElectric": "",
      "carModelMsrp": "",
      "carModelName": "2014款 6.0L 帜尊版",
      "carModelOfficialSpeedupTime": "",
      "carModelQuickCharge": "",
      "carModelQuickChargePercent": "",
      "carModelReleaseTime": "2014.03",
      "carModelSize": "5555*2018*1578",
      "carModelSlowCharge": "",
      "carModelSpecId": "15822",
      "carModelSpecUrl": "https://www.autohome.com.cn/spec/15822/#pvareaid=3454492",
      "carModelWholeWarranty": "四年或10万公里",
      "carModelYear": "2014款",
      "carSeriesId": "3108",
      "carSeriesLevelId": "6",
      "carSeriesLevelName": "大型车",
      "carSeriesMainImgUrl": "https://car3.autoimg.cn/cardfs/product/g24/M05/13/DB/380x285_0_q87_autohomecar__wKgHH1rdRjyAGXKwAAbriozDnBw527.jpg",
      "carSeriesMaxPrice": "0.00万",
      "carSeriesMinPrice": "0.00万",
      "carSeriesMsrp": "",
      "carSeriesMsrpUrl": "",
      "carSeriesName": "红旗L5",
      "carSeriesUrl": "https://www.autohome.com.cn/3108/#levelsource=000000000_0&pvareaid=101594"
    },
    "https://www.autohome.com.cn/spec/15822/#pvareaid=3454492"
  ]
]
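Postscript:
When reposting, please credit: 在路上 » [Solved] Autohome (汽车之家) car model and series data: speeding up model parameter crawling by removing the JS step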