[Solved] Autohome car model and series data: removing the JS step to speed up crawling of model parameter configurations

Parameters · crifan
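While working on:
[Solved] Autohome car model and series data: crawling each model's detailed parameter configuration
I had basically got fetching the parameter configuration data to work.
But there was one problem:
the parameter configuration page was being loaded via: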
        # https://car.autohome.com.cn/config/spec/43593.html
        print("carConfigSpecUrl=%s" % carConfigSpecUrl)
        self.crawl(carConfigSpecUrl,
            fetch_type="js",
            callback=self.carConfigSpecCallback,
            save=carModelDict,
        )
That is, the page's JS was executed,
so that the following lookup:
        """
        <table class="tbcs" id="tab_0" style="width: 932px;">
            <tbody>
                <tr>
                    <th class="cstitle" show="1" pid="tab_0" id="nav_meto_0" colspan="5">
                    <h3><span>基本参数</span></h3>
                    </th>
                </tr>
                <tr data-pnid="1_-1" id="tr_0">
        """
        tbodyDoc = response.doc("table[id='tab_0'] tbody")
        print("tbodyDoc=%s" % tbodyDoc)
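does not come back empty.
Only after the JS has run do the table cells in the HTML contain the final, merged text.
But this causes a problem:
every page needs an extra JS execution, which is slow.
For batch crawling, that is far too slow.
The goal now is to speed things up by dropping the JS execution step,
and to see whether the values can be read directly from the JSON data embedded in the HTML that comes back without JS.
As found in earlier investigation, the config JSON contains the corresponding values.
The question is how to extract them.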
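A minimal sketch of that idea, assuming the page embeds the data as `var config = {...};` inside a script tag (the form that shows up in the raw HTML later in this post). The helper name `extract_config_dict` and the sample HTML are made up for illustration. `json.JSONDecoder.raw_decode` is used here so that a `;` inside a value does not cut the JSON short:

```python
import json
import re


def extract_config_dict(page_html):
    """Return the object assigned to `var config = ...`, or None if it is absent."""
    found = re.search(r"var\s+config\s*=\s*", page_html)
    if not found:
        return None
    # raw_decode parses exactly one JSON value starting at the given index
    # and ignores whatever follows it (here the trailing ";" and the rest of the page)
    config_dict, _end_index = json.JSONDecoder().raw_decode(page_html, found.end())
    return config_dict


# hypothetical, heavily shortened page source
sample_html = (
    '<script>var config = {"message": "", "result": {"paramtypeitems": ['
    '{"name": "基本参数", "paramitems": []}]}, "returncode": "0"};</script>'
)
config = extract_config_dict(sample_html)
print(config["result"]["paramtypeitems"][0]["name"])
```

If the page has no `var config` assignment, the helper returns None, so the caller can fall back to the JS-rendered table.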
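However, the series used so far has too many models, which makes the individual values hard to analyze,
so I switched to a series with few models, just 2: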
https://car.autohome.com.cn/config/spec/42875.html
and had a look.
After formatting, config looks like this:
{
  "message": "<span class='hs_kw29_configpl'></span>",
  "result": {
    "paramtypeitems": [{
      "name": "基本参数",
      "paramitems": [{
        "id": 0,
        "name": "车型<span class='hs_kw33_configpl'></span>",
        "pnid": "1_-1",
        "valueitems": [{
          "specid": 39893,
          "value": "<span class='hs_kw9_configpl'></span>Q2L e-tron 2019款 Q2L e-tron 纯电智享型"
        }, {
          "specid": 42875,
          "value": "<span class='hs_kw9_configpl'></span>Q2L e-tron 2019款 Q2L e-tron 纯电智酷型"
        }]
      }, {
        "id": 0,
        "name": "厂<span class='hs_kw15_configpl'></span><span class='hs_kw0_configpl'></span><span class='hs_kw57_configpl'></span><span class='hs_kw55_configpl'></span>(<span class='hs_kw14_configpl'></span>)",
        "pnid": "1_-1",
        "valueitems": [{
          "specid": 39893,
          "value": "23.73<span class='hs_kw1_configpl'></span>"
        }, {
          "specid": 42875,
          "value": "22.68<span class='hs_kw1_configpl'></span>"
        }]
      }, {
        "id": 52,
        "name": "厂<span class='hs_kw15_configpl'></span>",
        "pnid": "1_-1",
        "valueitems": [{
          "specid": 39893,
          "value": "<span class='hs_kw26_configpl'></span>-<span class='hs_kw47_configpl'></span><span class='hs_kw9_configpl'></span>"
        }, {
          "specid": 42875,
          "value": "<span class='hs_kw26_configpl'></span>-<span class='hs_kw47_configpl'></span><span class='hs_kw9_configpl'></span>"
        }]
      }, {
        "id": 53,
        "name": "级别",
        "pnid": "1_-1",
        "valueitems": [{
          "specid": 39893,
          "value": "<span class='hs_kw16_configpl'></span>"
        }, {
          "specid": 42875,
          "value": "<span class='hs_kw16_configpl'></span>"
        }]
      }, {
        "id": 1149,
        "name": "能源类型",
        "pnid": "1_-1",
        "valueitems": [{
          "specid": 39893,
          "value": "纯电动"
        }, {
          "specid": 42875,
          "value": "纯电动"
        }]
      }, {
        "id": 0,
        "name": "上市<span class='hs_kw40_configpl'></span>",
        "pnid": "1_-1",
        "valueitems": [{
          "specid": 39893,
          "value": "2019.11"
        }, {
          "specid": 42875,
          "value": "2019.11"
        }]
      }, {
        "id": 1291,
        "name": "工信部纯电续航里程(km)",
        "pnid": "1_-1",
        "valueitems": [{
          "specid": 39893,
          "value": "265"
        }, {
          "specid": 42875,
          "value": "265"
        }]
      }, {
        "id": 1292,
        "name": "<span class='hs_kw39_configpl'></span><span class='hs_kw40_configpl'></span>(小时)",
        "pnid": "1_-1",
        "valueitems": [{
          "specid": 39893,
          "value": "0.6"
        }, {
          "specid": 42875,
          "value": "0.6"
        }]
      }, {
        "id": 0,
        "name": "<span class='hs_kw10_configpl'></span><span class='hs_kw40_configpl'></span>(小时)",
        "pnid": "1_-1",
        "valueitems": [{
          "specid": 39893,
          "value": "17"
        }, {
          "specid": 42875,
          "value": "17"
        }]
      }, {
        "id": 0,
        "name": "<span class='hs_kw39_configpl'></span><span class='hs_kw11_configpl'></span>百分比",
        "pnid": "1_-1",
        "valueitems": [{
          "specid": 39893,
          "value": "80"
        }, {
          "specid": 42875,
          "value": "80"
        }]
      }, {
        "id": 1185,
        "name": "<span class='hs_kw8_configpl'></span><span class='hs_kw42_configpl'></span>(kW)",
        "pnid": "1_-1",
        "valueitems": [{
          "specid": 39893,
          "value": "100"
        }, {
          "specid": 42875,
          "value": "100"
        }]
      }, {
        "id": 1186,
        "name": "<span class='hs_kw8_configpl'></span><span class='hs_kw2_configpl'></span>(N·m)",
        "pnid": "1_-1",
        "valueitems": [{
          "specid": 39893,
          "value": "290"
        }, {
          "specid": 42875,
          "value": "290"
        }]
      }, {
        "id": 0,
        "name": "电动机(Ps)",
        "pnid": "1_-1",
        "valueitems": [{
          "specid": 39893,
          "value": "136"
        }, {
          "specid": 42875,
          "value": "136"
        }]
      }, {
        "id": 1148,
        "name": "长*宽*高(mm)",
        "pnid": "1_-1",
        "valueitems": [{
          "specid": 39893,
          "value": "4237*1785*1548"
        }, {
          "specid": 42875,
          "value": "4237*1785*1548"
        }]
      }, {
        "id": 1147,
        "name": "车身结构",
        "pnid": "1_-1",
        "valueitems": [{
          "specid": 39893,
          "value": "5门5座SUV"
        }, {
          "specid": 42875,
          "value": "5门5座SUV"
        }]
      }, {
        "id": 1246,
        "name": "最高车速(km/h)",
        "pnid": "1_-1",
        "valueitems": [{
          "specid": 39893,
          "value": "150"
        }, {
          "specid": 42875,
          "value": "150"
        }]
      }, {
        "id": 1250,
        "name": "官方0-100km/h加速(s)",
        "pnid": "1_-1",
        "valueitems": [{
          "specid": 39893,
          "value": "-"
        }, {
          "specid": 42875,
          "value": "-"
        }]
      }, {
        "id": 1252,
        "name": "<span class='hs_kw22_configpl'></span>0-100km/h加速(s)",
        "pnid": "1_-1",
        "valueitems": [{
          "specid": 39893,
          "value": "-"
        }, {
          "specid": 42875,
          "value": "-"
        }]
      }, {
        "id": 1253,
        "name": "<span class='hs_kw22_configpl'></span>100-0km/h制动(m)",
        "pnid": "1_-1",
        "valueitems": [{
          "specid": 39893,
          "value": "-"
        }, {
          "specid": 42875,
          "value": "-"
        }]
      }, {
        "id": 0,
        "name": "<span class='hs_kw22_configpl'></span>续航里程(km)",
        "pnid": "1_-1",
        "valueitems": [{
          "specid": 39893,
          "value": "-"
        }, {
          "specid": 42875,
          "value": "-"
        }]
      }, {
        "id": 0,
        "name": "<span class='hs_kw22_configpl'></span><span class='hs_kw39_configpl'></span><span class='hs_kw40_configpl'></span>(小时)",
        "pnid": "1_-1",
        "valueitems": [{
          "specid": 39893,
          "value": "-"
        }, {
          "specid": 42875,
          "value": "-"
        }]
      }, {
        "id": 0,
        "name": "<span class='hs_kw22_configpl'></span><span class='hs_kw10_configpl'></span><span class='hs_kw40_configpl'></span>(小时)",
        "pnid": "1_-1",
        "valueitems": [{
          "specid": 39893,
          "value": "-"
        }, {
          "specid": 42875,
          "value": "-"
        }]
      }, {
        "id": 1255,
        "name": "整车<span class='hs_kw36_configpl'></span>",
        "pnid": "1_-1",
        "valueitems": [{
          "specid": 39893,
          "value": "三<span class='hs_kw7_configpl'></span>10<span class='hs_kw1_configpl'></span>公里"
        }, {
          "specid": 42875,
          "value": "三<span class='hs_kw7_configpl'></span>10<span class='hs_kw1_configpl'></span>公里"
        }]
      }]
    },
...
The contents are easy to extract, and they match the result obtained after running JS.
They correspond to what this page shows:
https://car.autohome.com.cn/config/spec/42875.html
So this is still easy to handle.
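As a rough sketch of how that structure can be walked, the helper below collects one value per parameter row. The name `values_for_spec` and the idea of selecting by `specid` are assumptions made for illustration, not part of the original crawler. They are shown because `valueitems[0]` in the sample above belongs to spec 39893, while the page requested was 42875, so the first entry may not always be the spec being crawled:

```python
def values_for_spec(param_items, spec_id):
    """Return one value per parameter row for the given spec (None if the row lacks it)."""
    values = []
    for param_item in param_items:
        matched = None
        for value_item in param_item["valueitems"]:
            if value_item["specid"] == spec_id:
                matched = value_item["value"]
                break
        values.append(matched)
    return values


# two rows copied from the config JSON above
basic_param_items = [
    {"id": 0, "name": "车型<span class='hs_kw33_configpl'></span>", "pnid": "1_-1", "valueitems": [
        {"specid": 39893, "value": "<span class='hs_kw9_configpl'></span>Q2L e-tron 2019款 Q2L e-tron 纯电智享型"},
        {"specid": 42875, "value": "<span class='hs_kw9_configpl'></span>Q2L e-tron 2019款 Q2L e-tron 纯电智酷型"},
    ]},
    {"id": 1149, "name": "能源类型", "pnid": "1_-1", "valueitems": [
        {"specid": 39893, "value": "纯电动"},
        {"specid": 42875, "value": "纯电动"},
    ]},
]
print(values_for_spec(basic_param_items, 42875))
```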
So the next step is to take the earlier code that extracts the first value from the HTML:
    def getItemFirstValue(self, rootDoc, trNumber, isRespDoc=False):
        """
        <tr data-pnid="1_-1" id="tr_2">
            <th>
                <div id="1149"><a href="https://car.autohome.com.cn/baike/detail_7_18_1149.html#pvareaid=2042252">能源类型</a>
                </div>
            </th>
            <td style="background:#F0F3F8;">
                <div>纯电动</div>
            </td>


        <tr data-pnid="1_-1" id="tr_3">
            <th>
                <div id="0">上市<span class="hs_kw40_configxv"></span></div>
            </th>
            <td style="background:#F0F3F8;">
                <div>2019.11</div>
            </td>
            <td>
                <div>2019.11</div>
            </td>
            <td>
                <div></div>
            </td>
            <td>
                <div></div>
            </td>
        </tr>
        """
        trQuery = "tr[id='tr_%s']" % trNumber
        # print("trQuery=%s" % trQuery)
        trDoc = rootDoc.find(trQuery)
        # print("trDoc=%s" % trDoc)
        tdDocGenerator = trDoc.items("td")
        # print("tdDocGenerator=%s" % tdDocGenerator)
        tdDocList = list(tdDocGenerator)
        # print("tdDocList=%s" % tdDocList)
        firstTdDoc = tdDocList[0]
        # print("firstTdDoc=%s" % firstTdDoc)
        firstTdDivDoc = firstTdDoc.find("div")
        print("firstTdDivDoc=%s" % firstTdDivDoc)
        if isRespDoc:
            respItem = firstTdDivDoc
        else:
            firstItemValue = firstTdDivDoc.text()
            respItem = firstItemValue
        print("respItem=%s" % respItem)
        return respItem
and change it to extract from config instead,
so that there is no need to run JS.
The final code is:

    def getItemFirstValue(self, inputContent, itemIndex):
        print("in getItemFirstValue")
        # firstItemValue = self.extractTrFirstTdValue(inputContent, itemIndex)
        firstItemValue = self.extractDictListFirstValue(inputContent, itemIndex)
        return firstItemValue


    def extractDictListFirstValue(self, paramItemDictList, itemIndex):
        """
        [
            ...,
            {
                "id": 1149,
                "name": "能源类型",
                "pnid": "1_-1",
                "valueitems": [{
                    "specid": 39893,
                    "value": "纯电动"
                }, {
                    "specid": 42875,
                    "value": "纯电动"
                }]
            }
            ...,
            {
                "id": 1292,
                "name": "<span class='hs_kw39_configpl'></span><span class='hs_kw40_configpl'></span>(小时)",
                "pnid": "1_-1",
                "valueitems": [{
                    "specid": 39893,
                    "value": "0.6"
                }, {
                    "specid": 42875,
                    "value": "0.6"
                }]
            },
            ...
            ,
            {
                "id": 1255,
                "name": "整车<span class='hs_kw36_configpl'></span>",
                "pnid": "1_-1",
                "valueitems": [{
                    "specid": 39893,
                    "value": "三<span class='hs_kw7_configpl'></span>10<span class='hs_kw1_configpl'></span>公里"
                }, {
                    "specid": 42875,
                    "value": "三<span class='hs_kw7_configpl'></span>10<span class='hs_kw1_configpl'></span>公里"
                }]
            }
        ]
        """
        paramItemDict = paramItemDictList[itemIndex]
        print("paramItemDict=%s" % paramItemDict)
        valueItemList = paramItemDict["valueitems"]
        print("valueItemList=%s" % valueItemList)
        firstItemDict = valueItemList[0]
        print("firstItemDict=%s" % firstItemDict)
        # firstItemDict={'specid': 43593, 'value': "<span class='hs_kw57_configxt'></span>-<span class='hs_kw21_configxt'></span><span class='hs_kw24_configxt'></span>"}
        firstItemValue = firstItemDict["value"]
        # firstItemValue=<span class='hs_kw57_configxt'></span>-<span class='hs_kw21_configxt'></span><span class='hs_kw24_configxt'></span>
        print("firstItemValue=%s" % firstItemValue)
        return firstItemValue


    # def extractTrFirstTdValue(self, rootDoc, trNumber, isRespDoc=False):
    def extractTrFirstTdValue(self, rootDoc, trNumber):
        """
        <tr data-pnid="1_-1" id="tr_2">
            <th>
                <div id="1149"><a href="https://car.autohome.com.cn/baike/detail_7_18_1149.html#pvareaid=2042252">能源类型</a>
                </div>
            </th>
            <td style="background:#F0F3F8;">
                <div>纯电动</div>
            </td>


        <tr data-pnid="1_-1" id="tr_3">
            <th>
                <div id="0">上市<span class="hs_kw40_configxv"></span></div>
            </th>
            <td style="background:#F0F3F8;">
                <div>2019.11</div>
            </td>
            <td>
                <div>2019.11</div>
            </td>
            <td>
                <div></div>
            </td>
            <td>
                <div></div>
            </td>
        </tr>


        <tr data-pnid="1_-1" id="tr_20" style="background: rgb(255, 255, 255);">
            <th>
                <div id="1255"><a href="https://car.autohome.com.cn/baike/detail_7_18_1255.html#pvareaid=2042252">整车<span
                    class="hs_kw36_configaJ"></span></a></div>
            </th>
            <td style="background:#F0F3F8;">
                <div>三<span class="hs_kw7_configaJ"></span>10<span class="hs_kw1_configaJ"></span>公里</div>
            </td>
            <td>
                <div>三<span class="hs_kw7_configaJ"></span>10<span class="hs_kw1_configaJ"></span>公里</div>
            </td>
            <td>
                <div></div>
            </td>
            <td>
                <div></div>
            </td>
        </tr>
        """
        trQuery = "tr[id='tr_%s']" % trNumber
        # print("trQuery=%s" % trQuery)
        trDoc = rootDoc.find(trQuery)
        # print("trDoc=%s" % trDoc)
        tdDocGenerator = trDoc.items("td")
        # print("tdDocGenerator=%s" % tdDocGenerator)
        tdDocList = list(tdDocGenerator)
        # print("tdDocList=%s" % tdDocList)
        firstTdDoc = tdDocList[0]
        # print("firstTdDoc=%s" % firstTdDoc)
        firstTdDivDoc = firstTdDoc.find("div")
        print("firstTdDivDoc=%s" % firstTdDivDoc)
        # if isRespDoc:
        #     respItem = firstTdDivDoc
        # else:
        #     firstItemValue = firstTdDivDoc.text()
        #     respItem = firstItemValue
        # print("respItem=%s" % respItem)
        # return respItem
        respItemHtml = firstTdDivDoc.html()
        print("respItemHtml=%s" % respItemHtml)
        return respItemHtml
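Values read from the config JSON can still contain empty `hs_kw…` placeholder spans, as in the sample `firstItemDict` above. A small sketch for spotting them follows. The helper name `find_placeholder_classes` is hypothetical, and the class-name pattern is inferred only from the samples in this post:

```python
import re

# matches e.g. <span class='hs_kw7_configpl'></span> with single or double quotes
PLACEHOLDER_PATTERN = re.compile(r"""<span class=['"](hs_kw\d+_config\w+)['"]></span>""")


def find_placeholder_classes(value_html):
    """Return the class names of all placeholder spans found in a value string."""
    return PLACEHOLDER_PATTERN.findall(value_html)


print(find_placeholder_classes("三<span class='hs_kw7_configpl'></span>10<span class='hs_kw1_configpl'></span>公里"))
print(find_placeholder_classes("纯电动"))
```

An empty list means the value is plain text and can be stored as is.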
The related functions were adjusted as well,
for example the whole-vehicle warranty (整车质保):

    # def extractWholeWarranty(self, firstDivDoc):
    def extractWholeWarranty(self, firstDivHtml):
        carModelWholeWarranty = ""
        # <div>三<span class="hs_kw7_configxv"></span>10<span class="hs_kw1_configxv"></span>公里</div>
        # print("firstDivDoc=%s" % firstDivDoc)
        # carModelWholeWarranty = firstDivDoc.text() # 三10公里
        # firstDivHtml = firstDivDoc.html()
        print("firstDivHtml=%s" % firstDivHtml)
        # 三<span class="hs_kw7_configCC"></span>10<span class="hs_kw1_configCC"></span>公里
        # carWholeQualityQuarantee = re.sub("[^<>]+(?P<firstSpan><span.+?></span>)[^<>]+(?P<secondSpan><span.+?></span>)[^<>]+", )
        foundYearDistance = re.search("(?P<warrantyYear>[^<>]+)<span.+?></span>(?P<distanceNumber>[^<>]+)<span.+?></span>(?P<distanceUnit>[^<>]+)", firstDivHtml)
        if foundYearDistance:
            warrantyYear = foundYearDistance.group("warrantyYear")
            distanceNumber = foundYearDistance.group("distanceNumber")
            distanceUnit = foundYearDistance.group("distanceUnit")
            carModelWholeWarranty = "%s年或%s万%s" % (warrantyYear, distanceNumber, distanceUnit)
        else:
            # special:
            # https://car.autohome.com.cn/config/spec/46700.html
            # <div>三<span class="hs_kw58_configWh"></span></div>
            # 三<span class="hs_kw58_configOf"></span>
            foundYearNotLimitDistance = re.search("(?P<warrantyYear>[^<>]+)<span.+?></span>", firstDivHtml)
            if foundYearNotLimitDistance:
                warrantyYear = foundYearNotLimitDistance.group("warrantyYear")
                carModelWholeWarranty = "%s年不限公里" % warrantyYear
        print("carModelWholeWarranty=%s" % carModelWholeWarranty)
        return carModelWholeWarranty


    def getWholeWarranty(self, inputContent, itemIndex):
        # firstDivDoc = self.getItemFirstValue(inputContent, itemIndex, isRespDoc=True)
        # print("firstDivDoc=%s" % firstDivDoc)
        # carModelWholeWarranty = self.extractWholeWarranty(firstDivDoc)
        firstDivDocHtml = self.getItemFirstValue(inputContent, itemIndex)
        print("firstDivDocHtml=%s" % firstDivDocHtml)
        carModelWholeWarranty = self.extractWholeWarranty(firstDivDocHtml)
        return carModelWholeWarranty
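To check the two patterns without crawling, the same logic can be exercised as a standalone function. This is a sketch under the assumption, taken from the code above, that the first placeholder stands for 年或 and the second for 万. `parse_whole_warranty` is a hypothetical name:

```python
import re

SPAN = r"<span.+?></span>"
YEAR_AND_DISTANCE = re.compile(
    r"(?P<warrantyYear>[^<>]+)" + SPAN + r"(?P<distanceNumber>[^<>]+)" + SPAN + r"(?P<distanceUnit>[^<>]+)"
)
YEAR_ONLY = re.compile(r"(?P<warrantyYear>[^<>]+)" + SPAN)


def parse_whole_warranty(value_html):
    """Turn the raw warranty value into readable text, or "" if it matches neither form."""
    found = YEAR_AND_DISTANCE.search(value_html)
    if found:
        return "%s年或%s万%s" % (
            found.group("warrantyYear"),
            found.group("distanceNumber"),
            found.group("distanceUnit"),
        )
    found = YEAR_ONLY.search(value_html)
    if found:
        return "%s年不限公里" % found.group("warrantyYear")
    return ""


print(parse_whole_warranty("三<span class='hs_kw7_configpl'></span>10<span class='hs_kw1_configpl'></span>公里"))
print(parse_whole_warranty('三<span class="hs_kw58_configOf"></span>'))
```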
The parts shared by the different energy types were also merged into common handling. The end result:

    @catch_status_code_error
    def carConfigSpecCallback(self, response):
        print("in carConfigSpecCallback")
        curCarModelDict = response.save
        print("curCarModelDict=%s" % curCarModelDict)
        carModelDict = copy.deepcopy(curCarModelDict)


        configSpecHtml = response.text
        # print("configSpecHtml=%s" % configSpecHtml)
        # print("")


        # # for debug
        # return


        # config json item index - spec table html item index = 2
        ItemIndexDiff = 2


        isUseSpecTableHtml = True
        isUseConfigJson = False
        valueContent = None
        energyTypeIdx = 2


        # # Method 1: after run js, extract item value from spec table html
        # """
        # <table class="tbcs" id="tab_0" style="width: 932px;">
        #     <tbody>
        #         <tr>
        #             <th class="cstitle" show="1" pid="tab_0" id="nav_meto_0" colspan="5">
        #             <h3><span>基本参数</span></h3>
        #             </th>
        #         </tr>
        #         <tr data-pnid="1_-1" id="tr_0">
        # """
        # tbodyDoc = response.doc("table[id='tab_0'] tbody")
        # print("tbodyDoc=%s" % tbodyDoc)
        # valueContent = tbodyDoc
        # isUseSpecTableHtml = True
        # isUseConfigJson = False
        # energyTypeIdx = 2


        # Method 2: not run js, extract item value from config json
        # get value from config json
        # var config = {"message" ...... "returncode":"0","taskid":"8be676a3-e023-4fa9-826d-09cd42a1810c","time":"2020-08-27 20:56:17"};
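        # raw string, so that \s and \{ are passed to the regex engine instead of being treated as string escapes
        foundConfigJson = re.search(r"var\s*config\s*=\s*(?P<configJson>\{[^;]+\});", configSpecHtml)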
        print("foundConfigJson=%s" % foundConfigJson)
        if foundConfigJson:
            configJson = foundConfigJson.group("configJson")
            print("configJson=%s" % configJson)
            # configDict = json.loads(configJson, encoding="utf-8")
            configDict = json.loads(configJson)
            print("configDict=%s" % configDict)


            # if "result" in configDict:
            configResultDict = configDict["result"]
            print("configResultDict=%s" % configResultDict)
            # if "paramtypeitems" in configResultDict:
            paramTypeItemDictList = configResultDict["paramtypeitems"]
            print("paramTypeItemDictList=%s" % paramTypeItemDictList)
            # paramTypeItemNum = len(paramTypeItemDictList)
            # print("paramTypeItemNum=%s" % paramTypeItemNum)
            basicParamDict = paramTypeItemDictList[0]
            print("basicParamDict=%s" % basicParamDict)
            basicItemDictList = basicParamDict["paramitems"]
            print("basicItemDictList=%s" % basicItemDictList)
            # print("type(basicItemDictList)=%s" % type(basicItemDictList))
            # basicItemNum = len(basicItemDictList)
            # print("basicItemNum=%s" % basicItemNum)


            valueContent = basicItemDictList
            isUseSpecTableHtml = False
            isUseConfigJson = True


        if isUseConfigJson:
            energyTypeIdx += ItemIndexDiff


        if valueContent:
            carEnergyType = self.getItemFirstValue(valueContent, energyTypeIdx)
            # 纯电动 / 汽油 / 插电式混合动力 / 油电混合
            carModelDict["carEnergyType"] = carEnergyType


            if carEnergyType == "汽油":
                # https://car.autohome.com.cn/config/spec/43593.html
                # https://car.autohome.com.cn/config/spec/41572.html


                # self.processGasolineCar(valueContent, carModelDict)


                gasolineCarKeyIdxMapDict = {
                    "carModelEnvStandard" : 3,
                    "carModelReleaseTime" : 4,
                    "carModelMaxPower" : 5,
                    "carModelMaxTorque" : 6,
                    "carModelEngine" : 7,
                    "carModelGearBox" : 8,
                    "carModelSize" : 9,
                    "carModelBodyStructure" : 10,
                    "carModelMaxSpeed" : 11,
                    "carModelOfficialSpeedupTime" : 12,
                    "carModelActualTestSpeedupTime" : 13,
                    "carModelActualTestBrakeDistance" : 14,
                    "carModelMiitCompositeFuelConsumption" : 15,
                    "carModelActualFuelConsumption" : 16,
                }
                wholeWarrantyIdx = 17


                if isUseConfigJson:
                    for eachKey in gasolineCarKeyIdxMapDict.keys():
                        gasolineCarKeyIdxMapDict[eachKey] += ItemIndexDiff
                    wholeWarrantyIdx += ItemIndexDiff


                self.processSingleEneryTypeCar(gasolineCarKeyIdxMapDict, valueContent, wholeWarrantyIdx, carModelDict)


            elif carEnergyType == "纯电动":
                # https://car.autohome.com.cn/config/spec/42875.html


                # self.processPureElectricCar(valueContent, carModelDict)


                pureElectricCarKeyIdxMapDict = {
                    "carModelReleaseTime": 3,
                    "carModelMiitEnduranceMileagePureElectric": 4,
                    "carModelQuickCharge": 5,
                    "carModelSlowCharge": 6,
                    "carModelQuickChargePercent": 7,
                    "carModelMaxPower": 8,
                    "carModelMaxTorque": 9,
                    "carModelHorsePowerElectric": 10,
                    "carModelSize": 11,
                    "carModelBodyStructure": 12,
                    "carModelMaxSpeed": 13,
                    "carModelOfficialSpeedupTime": 14,
                    "carModelActualTestSpeedupTime": 15,
                    "carModelActualTestBrakeDistance": 16,
                    "carModelActualTestEnduranceMileage": 17,
                    "carModelActualTestQuickCharge": 18,
                    "carModelActualTestSlowCharge": 19,
                }
                wholeWarrantyIdx = 20


                if isUseConfigJson:
                    for eachKey in pureElectricCarKeyIdxMapDict.keys():
                        pureElectricCarKeyIdxMapDict[eachKey] += ItemIndexDiff
                    wholeWarrantyIdx += ItemIndexDiff


                self.processSingleEneryTypeCar(pureElectricCarKeyIdxMapDict, valueContent, wholeWarrantyIdx, carModelDict)


            elif carEnergyType == "插电式混合动力":
                # https://car.autohome.com.cn/config/series/4460.html


                # self.processPhevCar(valueContent, carModelDict)


                phevCarKeyIdxMapDict = {
                    "carModelEnvStandard": 3,
                    "carModelReleaseTime": 4,
                    "carModelMiitEnduranceMileagePureElectric": 5,
                    "carModelQuickCharge": 6,
                    "carModelSlowCharge": 7,
                    "carModelQuickChargePercent": 8,
                    "carModelMaxPower": 9,
                    "carModelMaxTorque": 10,
                    "carModelEngine": 11,
                    "carModelHorsePowerElectric": 12,
                    "carModelGearBox": 13,
                    "carModelSize": 14,
                    "carModelBodyStructure": 15,
                    "carModelMaxSpeed": 16,
                    "carModelOfficialSpeedupTime": 17,
                    "carModelActualTestSpeedupTime": 18,
                    "carModelActualTestBrakeDistance": 19,
                    "carModelActualTestEnduranceMileage": 20,
                    "carModelActualTestQuickCharge": 21,
                    "carModelActualTestSlowCharge": 22,
                    "carModelMiitCompositeFuelConsumption": 23,
                    "carModelActualFuelConsumption": 24,
                }
                wholeWarrantyIdx = 25


                if isUseConfigJson:
                    for eachKey in phevCarKeyIdxMapDict.keys():
                        phevCarKeyIdxMapDict[eachKey] += ItemIndexDiff
                    wholeWarrantyIdx += ItemIndexDiff


                self.processSingleEneryTypeCar(phevCarKeyIdxMapDict, valueContent, wholeWarrantyIdx, carModelDict)


            elif carEnergyType == "油电混合":
                # https://car.autohome.com.cn/config/spec/35507.html


                # self.processHevCar(valueContent, carModelDict)


                hevCarKeyIdxMapDict = {
                    "carModelEnvStandard": 3,
                    "carModelReleaseTime": 4,
                    "carModelMaxPower": 5,
                    "carModelMaxTorque": 6,
                    "carModelEngine": 7,
                    "carModelHorsePowerElectric": 8,
                    "carModelGearBox": 9,
                    "carModelSize": 10,
                    "carModelBodyStructure": 11,
                    "carModelMaxSpeed": 12,
                    "carModelOfficialSpeedupTime": 13,
                    "carModelActualTestSpeedupTime": 14,
                    "carModelActualTestBrakeDistance": 15,
                    "carModelMiitCompositeFuelConsumption": 16,
                    "carModelActualFuelConsumption": 17,
                }
                wholeWarrantyIdx = 18


                if isUseConfigJson:
                    for eachKey in hevCarKeyIdxMapDict.keys():
                        hevCarKeyIdxMapDict[eachKey] += ItemIndexDiff
                    wholeWarrantyIdx += ItemIndexDiff


                self.processSingleEneryTypeCar(hevCarKeyIdxMapDict, valueContent, wholeWarrantyIdx, carModelDict)
            else:
                errMsg = "TODO: add support %s!" % carEnergyType
                raise Exception(errMsg)
        else:
            self.saveSingleResult(carModelDict)


    def processSingleEneryTypeCar(self, keyIdxMapDict, valueContent, wholeWarrantyIdx, carModelDict):
        for eachItemKey in keyIdxMapDict.keys():
            print("eachItemKey=%s" % eachItemKey)
            eachItemIndex = keyIdxMapDict[eachItemKey]
            print("eachItemIndex=%s" % eachItemIndex)
            eachItemValue = self.getItemFirstValue(valueContent, eachItemIndex)
            print("eachItemValue=%s" % eachItemValue)
            carModelDict[eachItemKey] = eachItemValue


        # 整车质保 (whole-vehicle warranty)
        carModelWholeWarranty = self.getWholeWarranty(valueContent, wholeWarrantyIdx) # 三年或10万公里
        print("carModelWholeWarranty=%s" % carModelWholeWarranty)
        carModelDict["carModelWholeWarranty"] = carModelWholeWarranty


        self.saveSingleResult(carModelDict)
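The four `+= ItemIndexDiff` loops above all do the same thing. One possible way to factor that out is sketched here with a hypothetical helper `shift_indexes`, which returns a new dict instead of modifying the original:

```python
ITEM_INDEX_DIFF = 2  # config JSON item index minus spec table row index


def shift_indexes(key_idx_map, offset):
    """Return a copy of the key -> index map with every index moved by offset."""
    return {key: index + offset for key, index in key_idx_map.items()}


table_indexes = {"carModelReleaseTime": 3, "carModelMiitEnduranceMileagePureElectric": 4}
json_indexes = shift_indexes(table_indexes, ITEM_INDEX_DIFF)
print(json_indexes)
```

In the sample JSON for spec 42875, index 5 is the 上市 row (2019.11) and index 6 is 工信部纯电续航里程 (265), which matches the shift by 2.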
The earlier per-type functions were then all commented out:

    # def processGasolineCar(self, valueContent, carModelDict):
    #     # 汽油 (gasoline)

    #     # https://car.autohome.com.cn/config/spec/43593.html
    #     # https://car.autohome.com.cn/config/spec/41572.html

    #     # 环保标准 (emission standard)
    #     carModelEnvStandard = self.getItemFirstValue(valueContent, 3) # 国VI
    #     carModelDict["carModelEnvStandard"] = carModelEnvStandard

    #     # 上市时间 (release date)
    #     carModelReleaseTime = self.getItemFirstValue(valueContent, 4) # 2020.04
    #     carModelDict["carModelReleaseTime"] = carModelReleaseTime

    #     # 最大功率(kW) (max power)
    #     carModelMaxPower = self.getItemFirstValue(valueContent, 5) # 110
    #     carModelDict["carModelMaxPower"] = carModelMaxPower

    #     # 最大扭矩(N·m) (max torque)
    #     carModelMaxTorque = self.getItemFirstValue(valueContent, 6) # 250
    #     carModelDict["carModelMaxTorque"] = carModelMaxTorque

    #     # 发动机 (engine)
    #     carModelEngine = self.getItemFirstValue(valueContent, 7) # 1.4T 150马力 L4
    #     carModelDict["carModelEngine"] = carModelEngine

    #     # 变速箱 (gearbox)
    #     carModelGearBox = self.getItemFirstValue(valueContent, 8) # 7挡双离合
    #     carModelDict["carModelGearBox"] = carModelGearBox

    #     # 长*宽*高(mm) (length*width*height)
    #     carModelSize = self.getItemFirstValue(valueContent, 9) # 4312*1785*1426
    #     carModelDict["carModelSize"] = carModelSize

    #     # 车身结构 (body structure)
    #     carModelBodyStructure = self.getItemFirstValue(valueContent, 10) # 5门5座两厢车
    #     carModelDict["carModelBodyStructure"] = carModelBodyStructure

    #     # 最高车速(km/h) (top speed)
    #     carModelMaxSpeed = self.getItemFirstValue(valueContent, 11) # 200
    #     carModelDict["carModelMaxSpeed"] = carModelMaxSpeed

    #     # 官方0-100km/h加速(s) (official 0-100 km/h acceleration)
    #     carModelOfficialSpeedupTime = self.getItemFirstValue(valueContent, 12) # 8.4
    #     carModelDict["carModelOfficialSpeedupTime"] = carModelOfficialSpeedupTime

    #     # 实测0-100km/h加速(s) (measured 0-100 km/h acceleration)
    #     carModelActualTestSpeedupTime = self.getItemFirstValue(valueContent, 13) # -
    #     carModelDict["carModelActualTestSpeedupTime"] = carModelActualTestSpeedupTime

    #     # 实测100-0km/h制动(m) (measured 100-0 km/h braking distance)
    #     carModelActualTestBrakeDistance = self.getItemFirstValue(valueContent, 14) # -
    #     carModelDict["carModelActualTestBrakeDistance"] = carModelActualTestBrakeDistance

    #     # 工信部综合油耗(L/100km) (MIIT combined fuel consumption)
    #     carModelMiitCompositeFuelConsumption = self.getItemFirstValue(valueContent, 15) # 5.8
    #     carModelDict["carModelMiitCompositeFuelConsumption"] = carModelMiitCompositeFuelConsumption

    #     # 实测油耗(L/100km) (measured fuel consumption)
    #     carModelActualFuelConsumption = self.getItemFirstValue(valueContent, 16) # -
    #     carModelDict["carModelActualFuelConsumption"] = carModelActualFuelConsumption

    #     self.saveSingleResult(carModelDict)


    # def processPureElectricCar(self, valueContent, carModelDict):
    #     # Pure electric

    #     # https://car.autohome.com.cn/config/spec/42875.html

    #     # Release date
    #     carModelReleaseTime = self.getItemFirstValue(valueContent, 3) # 2019.11
    #     carModelDict["carModelReleaseTime"] = carModelReleaseTime

    #     # MIIT pure-electric range (km)
    #     carModelMiitEnduranceMileagePureElectric = self.getItemFirstValue(valueContent, 4) # 265
    #     carModelDict["carModelMiitEnduranceMileagePureElectric"] = carModelMiitEnduranceMileagePureElectric

    #     # Fast-charge time (hours)
    #     carModelQuickCharge = self.getItemFirstValue(valueContent, 5) # 0.6
    #     carModelDict["carModelQuickCharge"] = carModelQuickCharge

    #     # Slow-charge time (hours)
    #     carModelSlowCharge = self.getItemFirstValue(valueContent, 6) # 17
    #     carModelDict["carModelSlowCharge"] = carModelSlowCharge

    #     # Fast-charge capacity percentage
    #     carModelQuickChargePercent = self.getItemFirstValue(valueContent, 7) # 80
    #     carModelDict["carModelQuickChargePercent"] = carModelQuickChargePercent

    #     # Max power (kW)
    #     carModelMaxPower = self.getItemFirstValue(valueContent, 8) # 100
    #     carModelDict["carModelMaxPower"] = carModelMaxPower

    #     # Max torque (N·m)
    #     carModelMaxTorque = self.getItemFirstValue(valueContent, 9) # 290
    #     carModelDict["carModelMaxTorque"] = carModelMaxTorque

    #     # Electric motor (Ps)
    #     carModelHorsePowerElectric = self.getItemFirstValue(valueContent, 10) # 136
    #     carModelDict["carModelHorsePowerElectric"] = carModelHorsePowerElectric

    #     # Length*width*height (mm)
    #     carModelSize = self.getItemFirstValue(valueContent, 11) # 4237*1785*1548
    #     carModelDict["carModelSize"] = carModelSize

    #     # Body structure
    #     carModelBodyStructure = self.getItemFirstValue(valueContent, 12) # 5门5座SUV
    #     carModelDict["carModelBodyStructure"] = carModelBodyStructure

    #     # Top speed (km/h)
    #     carModelMaxSpeed = self.getItemFirstValue(valueContent, 13) # 150
    #     carModelDict["carModelMaxSpeed"] = carModelMaxSpeed

    #     # Official 0-100 km/h acceleration (s)
    #     carModelOfficialSpeedupTime = self.getItemFirstValue(valueContent, 14) # -
    #     carModelDict["carModelOfficialSpeedupTime"] = carModelOfficialSpeedupTime

    #     # Measured 0-100 km/h acceleration (s)
    #     carModelActualTestSpeedupTime = self.getItemFirstValue(valueContent, 15) # -
    #     carModelDict["carModelActualTestSpeedupTime"] = carModelActualTestSpeedupTime

    #     # Measured 100-0 km/h braking distance (m)
    #     carModelActualTestBrakeDistance = self.getItemFirstValue(valueContent, 16) # -
    #     carModelDict["carModelActualTestBrakeDistance"] = carModelActualTestBrakeDistance

    #     # Measured range (km)
    #     carModelActualTestEnduranceMileage = self.getItemFirstValue(valueContent, 17) # -
    #     carModelDict["carModelActualTestEnduranceMileage"] = carModelActualTestEnduranceMileage

    #     # Measured fast-charge time (hours)
    #     carModelActualTestQuickCharge = self.getItemFirstValue(valueContent, 18) # -
    #     carModelDict["carModelActualTestQuickCharge"] = carModelActualTestQuickCharge

    #     # Measured slow-charge time (hours)
    #     carModelActualTestSlowCharge = self.getItemFirstValue(valueContent, 19) # -
    #     carModelDict["carModelActualTestSlowCharge"] = carModelActualTestSlowCharge

    #     # Whole-vehicle warranty
    #     carModelWholeWarranty = self.getWholeWarranty(valueContent, 20) # 三年或10万公里
    #     carModelDict["carModelWholeWarranty"] = carModelWholeWarranty

    #     self.saveSingleResult(carModelDict)


    # def processPhevCar(self, valueContent, carModelDict):
    #     # Plug-in hybrid = PHEV = Plug-in Hybrid Electric Vehicle

    #     # https://car.autohome.com.cn/config/series/4460.html

    #     # Emission standard
    #     carModelEnvStandard = self.getItemFirstValue(valueContent, 3) # 国V
    #     carModelDict["carModelEnvStandard"] = carModelEnvStandard

    #     # Release date
    #     carModelReleaseTime = self.getItemFirstValue(valueContent, 4) # 2018.11
    #     carModelDict["carModelReleaseTime"] = carModelReleaseTime

    #     # MIIT pure-electric range (km)
    #     carModelMiitEnduranceMileagePureElectric = self.getItemFirstValue(valueContent, 5) # 56
    #     carModelDict["carModelMiitEnduranceMileagePureElectric"] = carModelMiitEnduranceMileagePureElectric

    #     # Fast-charge time (hours)
    #     carModelQuickCharge = self.getItemFirstValue(valueContent, 6) # 2.5
    #     carModelDict["carModelQuickCharge"] = carModelQuickCharge

    #     # Slow-charge time (hours)
    #     carModelSlowCharge = self.getItemFirstValue(valueContent, 7) # 10.8
    #     carModelDict["carModelSlowCharge"] = carModelSlowCharge

    #     # Fast-charge capacity percentage
    #     carModelQuickChargePercent = self.getItemFirstValue(valueContent, 8) # -
    #     carModelDict["carModelQuickChargePercent"] = carModelQuickChargePercent

    #     # Max power (kW)
    #     carModelMaxPower = self.getItemFirstValue(valueContent, 9) # 270
    #     carModelDict["carModelMaxPower"] = carModelMaxPower

    #     # Max torque (N·m)
    #     carModelMaxTorque = self.getItemFirstValue(valueContent, 10) # 700
    #     carModelDict["carModelMaxTorque"] = carModelMaxTorque

    #     # Engine
    #     carModelEngine = self.getItemFirstValue(valueContent, 11) # 2.0T 252马力 L4
    #     carModelDict["carModelEngine"] = carModelEngine

    #     # Electric motor (Ps)
    #     carModelHorsePowerElectric = self.getItemFirstValue(valueContent, 12) # 128
    #     carModelDict["carModelHorsePowerElectric"] = carModelHorsePowerElectric

    #     # Gearbox
    #     carModelGearBox = self.getItemFirstValue(valueContent, 13) # 8挡手自一体
    #     carModelDict["carModelGearBox"] = carModelGearBox

    #     # Length*width*height (mm)
    #     carModelSize = self.getItemFirstValue(valueContent, 14) # 5071*1968*1716
    #     carModelDict["carModelSize"] = carModelSize

    #     # Body structure
    #     carModelBodyStructure = self.getItemFirstValue(valueContent, 15) # 5门5座SUV
    #     carModelDict["carModelBodyStructure"] = carModelBodyStructure

    #     # Top speed (km/h)
    #     carModelMaxSpeed = self.getItemFirstValue(valueContent, 16) # 228
    #     carModelDict["carModelMaxSpeed"] = carModelMaxSpeed

    #     # Official 0-100 km/h acceleration (s)
    #     carModelOfficialSpeedupTime = self.getItemFirstValue(valueContent, 17) # 5.9
    #     carModelDict["carModelOfficialSpeedupTime"] = carModelOfficialSpeedupTime

    #     # Measured 0-100 km/h acceleration (s)
    #     carModelActualTestSpeedupTime = self.getItemFirstValue(valueContent, 18) # -
    #     carModelDict["carModelActualTestSpeedupTime"] = carModelActualTestSpeedupTime

    #     # Measured 100-0 km/h braking distance (m)
    #     carModelActualTestBrakeDistance = self.getItemFirstValue(valueContent, 19) # -
    #     carModelDict["carModelActualTestBrakeDistance"] = carModelActualTestBrakeDistance

    #     # Measured range (km)
    #     carModelActualTestEnduranceMileage = self.getItemFirstValue(valueContent, 20) # -
    #     carModelDict["carModelActualTestEnduranceMileage"] = carModelActualTestEnduranceMileage

    #     # Measured fast-charge time (hours)
    #     carModelActualTestQuickCharge = self.getItemFirstValue(valueContent, 21) # -
    #     carModelDict["carModelActualTestQuickCharge"] = carModelActualTestQuickCharge

    #     # Measured slow-charge time (hours)
    #     carModelActualTestSlowCharge = self.getItemFirstValue(valueContent, 22) # -
    #     carModelDict["carModelActualTestSlowCharge"] = carModelActualTestSlowCharge

    #     # MIIT combined fuel consumption (L/100km)
    #     carModelMiitCompositeFuelConsumption = self.getItemFirstValue(valueContent, 23) # 2.4
    #     carModelDict["carModelMiitCompositeFuelConsumption"] = carModelMiitCompositeFuelConsumption

    #     # Measured fuel consumption (L/100km)
    #     carModelActualFuelConsumption = self.getItemFirstValue(valueContent, 24) # -
    #     carModelDict["carModelActualFuelConsumption"] = carModelActualFuelConsumption

    #     # Whole-vehicle warranty
    #     carModelWholeWarranty = self.getWholeWarranty(valueContent, 25) # 三年或10万公里
    #     carModelDict["carModelWholeWarranty"] = carModelWholeWarranty

    #     self.saveSingleResult(carModelDict)


    # def processHevCar(self, valueContent, carModelDict):
    #     # Hybrid electric vehicle = HEV = Hybrid Electric Vehicle

    #     # https://car.autohome.com.cn/config/spec/35507.html

    #     # Emission standard
    #     carModelEnvStandard = self.getItemFirstValue(valueContent, 3) # 国IV(国V)
    #     carModelDict["carModelEnvStandard"] = carModelEnvStandard

    #     # Release date
    #     carModelReleaseTime = self.getItemFirstValue(valueContent, 4) # 2018.08
    #     carModelDict["carModelReleaseTime"] = carModelReleaseTime

    #     # Max power (kW)
    #     carModelMaxPower = self.getItemFirstValue(valueContent, 5) # 100
    #     carModelDict["carModelMaxPower"] = carModelMaxPower

    #     # Max torque (N·m)
    #     carModelMaxTorque = self.getItemFirstValue(valueContent, 6) # -
    #     carModelDict["carModelMaxTorque"] = carModelMaxTorque

    #     # Engine
    #     carModelEngine = self.getItemFirstValue(valueContent, 7) # 1.8L 99马力 L4
    #     carModelDict["carModelEngine"] = carModelEngine

    #     # Electric motor (Ps)
    #     carModelHorsePowerElectric = self.getItemFirstValue(valueContent, 8) # 82
    #     carModelDict["carModelHorsePowerElectric"] = carModelHorsePowerElectric

    #     # Gearbox
    #     carModelGearBox = self.getItemFirstValue(valueContent, 9) # E-CVT无级变速
    #     carModelDict["carModelGearBox"] = carModelGearBox

    #     # Length*width*height (mm)
    #     carModelSize = self.getItemFirstValue(valueContent, 10) # 4360*1765*1455
    #     carModelDict["carModelSize"] = carModelSize

    #     # Body structure
    #     carModelBodyStructure = self.getItemFirstValue(valueContent, 11) # 5门5座SUV
    #     carModelDict["carModelBodyStructure"] = carModelBodyStructure

    #     # Top speed (km/h)
    #     carModelMaxSpeed = self.getItemFirstValue(valueContent, 12) # -
    #     carModelDict["carModelMaxSpeed"] = carModelMaxSpeed

    #     # Official 0-100 km/h acceleration (s)
    #     carModelOfficialSpeedupTime = self.getItemFirstValue(valueContent, 13) # -
    #     carModelDict["carModelOfficialSpeedupTime"] = carModelOfficialSpeedupTime

    #     # Measured 0-100 km/h acceleration (s)
    #     carModelActualTestSpeedupTime = self.getItemFirstValue(valueContent, 14) # -
    #     carModelDict["carModelActualTestSpeedupTime"] = carModelActualTestSpeedupTime

    #     # Measured 100-0 km/h braking distance (m)
    #     carModelActualTestBrakeDistance = self.getItemFirstValue(valueContent, 15) # -
    #     carModelDict["carModelActualTestBrakeDistance"] = carModelActualTestBrakeDistance

    #     # MIIT combined fuel consumption (L/100km)
    #     carModelMiitCompositeFuelConsumption = self.getItemFirstValue(valueContent, 16) # 4.6
    #     carModelDict["carModelMiitCompositeFuelConsumption"] = carModelMiitCompositeFuelConsumption

    #     # Measured fuel consumption (L/100km)
    #     carModelActualFuelConsumption = self.getItemFirstValue(valueContent, 17) # -
    #     carModelDict["carModelActualFuelConsumption"] = carModelActualFuelConsumption

    #     # Whole-vehicle warranty
    #     carModelWholeWarranty = self.getWholeWarranty(valueContent, 18) # 六年或15万公里
    #     carModelDict["carModelWholeWarranty"] = carModelWholeWarranty

    #     self.saveSingleResult(carModelDict)
With that, the results can be obtained without running any JS:
[
  [
    "autohome_20200827",
    {
      "carBrandId": "91",
      "carBrandLogoUrl": "https://car3.autoimg.cn/cardfs/series/g26/M05/AE/94/100x100_f40_autohomecar__wKgHEVs9tm6ASWlTAAAUz_2mWTY720.png",
      "carBrandName": "红旗",
      "carEnergyType": "汽油",
      "carMerchantName": "一汽红旗",
      "carMerchantUrl": "https://car.autohome.com.cn/price/brand-91-190.html#pvareaid=2042363",
      "carModelActualFuelConsumption": "",
      "carModelActualTestBrakeDistance": "",
      "carModelActualTestEnduranceMileage": "",
      "carModelActualTestQuickCharge": "",
      "carModelActualTestSlowCharge": "",
      "carModelActualTestSpeedupTime": "",
      "carModelBodyStructure": "4门5座三厢车",
      "carModelDataSift2": "",
      "carModelDataSift3": "",
      "carModelDataSift4": "",
      "carModelDriveType": "前置四驱",
      "carModelEngine": "6.0L 408马力 V12",
      "carModelEnvStandard": "未知",
      "carModelGearBox": "6挡手自一体",
      "carModelGroupName": "6.0升 自然吸气 408马力 未知",
      "carModelHorsePowerElectric": "",
      "carModelMaxPower": "300",
      "carModelMaxSpeed": "",
      "carModelMaxTorque": "550",
      "carModelMiitCompositeFuelConsumption": "",
      "carModelMiitEnduranceMileagePureElectric": "",
      "carModelMsrp": "",
      "carModelName": "2014款 6.0L 帜尊版",
      "carModelOfficialSpeedupTime": "",
      "carModelQuickCharge": "",
      "carModelQuickChargePercent": "",
      "carModelReleaseTime": "2014.03",
      "carModelSize": "5555*2018*1578",
      "carModelSlowCharge": "",
      "carModelSpecId": "15822",
      "carModelSpecUrl": "https://www.autohome.com.cn/spec/15822/#pvareaid=3454492",
      "carModelWholeWarranty": "四年或10万公里",
      "carModelYear": "2014款",
      "carSeriesId": "3108",
      "carSeriesLevelId": "6",
      "carSeriesLevelName": "大型车",
      "carSeriesMainImgUrl": "https://car3.autoimg.cn/cardfs/product/g24/M05/13/DB/380x285_0_q87_autohomecar__wKgHH1rdRjyAGXKwAAbriozDnBw527.jpg",
      "carSeriesMaxPrice": "0.00万",
      "carSeriesMinPrice": "0.00万",
      "carSeriesMsrp": "",
      "carSeriesMsrpUrl": "",
      "carSeriesName": "红旗L5",
      "carSeriesUrl": "https://www.autohome.com.cn/3108/#levelsource=000000000_0&pvareaid=101594"
    },
    "https://www.autohome.com.cn/spec/15822/#pvareaid=3454492"
  ]
]
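The commented-out functions above differ mainly in which index each field is read from for a given energy type. As a rough sketch only (not the author's actual code), that mapping could be kept as data and applied by one small helper. The `paramitems` / `valueitems` / `specid` / `value` keys follow the config JSON shown earlier in this post; `get_item_value`, `extract_fields`, `PURE_ELECTRIC_FIELD_INDEX`, `make_item` and all sample values are hypothetical names and made-up placeholders introduced here for illustration.

```python
# Sketch under assumptions: helper names and sample data are hypothetical.

# Index table copied from the commented-out processPureElectricCar
# (only the first few entries are shown).
PURE_ELECTRIC_FIELD_INDEX = {
    "carModelReleaseTime": 3,
    "carModelMiitEnduranceMileagePureElectric": 4,
    "carModelQuickCharge": 5,
}


def get_item_value(param_items, index, spec_id=None):
    """Return the value of the parameter at `index`.

    If `spec_id` is given and present in `valueitems`, that model's value is
    returned; otherwise the first model's value is used. Missing data yields "".
    """
    if not 0 <= index < len(param_items):
        return ""
    value_items = param_items[index].get("valueitems", [])
    if not value_items:
        return ""
    if spec_id is not None:
        for item in value_items:
            if item.get("specid") == spec_id:
                return item.get("value", "")
    return value_items[0].get("value", "")


def extract_fields(param_items, field_index, spec_id=None):
    """Apply a {field name: index} table to one `paramitems` list."""
    return {
        field: get_item_value(param_items, index, spec_id)
        for field, index in field_index.items()
    }


def make_item(name, values):
    """Build one made-up `paramitems` entry from (specid, value) pairs."""
    return {
        "name": name,
        "valueitems": [{"specid": sid, "value": val} for sid, val in values],
    }


# Made-up placeholder data with two models per parameter, mirroring the
# two-spec page used as the example in this post.
sample_param_items = [
    make_item("p%d" % i, [(39893, "a%d" % i), (42875, "b%d" % i)])
    for i in range(6)
]

fields = extract_fields(sample_param_items, PURE_ELECTRIC_FIELD_INDEX, spec_id=42875)
print(fields)
```

Keeping one table per energy type would mean that a change in the site's parameter order only requires updating numbers in a dict, and passing `spec_id` avoids silently reading another model's column when a page lists several models. Whether this fits the real crawler depends on how `valueContent` is actually structured there, which this post does not show in full.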

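When reposting, please credit: 在路上 » [Solved] Autohome car model and series data: removing JS to speed up crawling of model parameter configurations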
