Need to crawl the children's audio resources on cbeebies.com.
scrapy
Chinese tutorial:
English tutorial:
Scrapy Tutorial — Scrapy 1.4.0 documentation
Official site:
Scrapy | A Fast and Powerful Scraping and Web Crawling Framework
First, install and configure Scrapy on the Mac:
➜ scrapy pip install scrapy
Collecting scrapy
Downloading Scrapy-1.4.0-py2.py3-none-any.whl (248kB)
100% |████████████████████████████████| 256kB 96kB/s
Collecting parsel>=1.1 (from scrapy)
Downloading parsel-1.2.0-py2.py3-none-any.whl
Collecting service-identity (from scrapy)
Downloading service_identity-17.0.0-py2.py3-none-any.whl
Collecting lxml (from scrapy)
Downloading lxml-4.1.1-cp27-cp27m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl (8.7MB)
100% |████████████████████████████████| 8.7MB 41kB/s
Collecting cssselect>=0.9 (from scrapy)
Downloading cssselect-1.0.1-py2.py3-none-any.whl
Collecting w3lib>=1.17.0 (from scrapy)
Downloading w3lib-1.18.0-py2.py3-none-any.whl
Collecting queuelib (from scrapy)
Downloading queuelib-1.4.2-py2.py3-none-any.whl
Collecting PyDispatcher>=2.0.5 (from scrapy)
Downloading PyDispatcher-2.0.5.tar.gz
Collecting pyOpenSSL (from scrapy)
Downloading pyOpenSSL-17.5.0-py2.py3-none-any.whl (53kB)
100% |████████████████████████████████| 61kB 32kB/s
Collecting Twisted>=13.1.0 (from scrapy)
Downloading Twisted-17.9.0.tar.bz2 (3.0MB)
100% |████████████████████████████████| 3.0MB 21kB/s
Requirement already satisfied: six>=1.5.2 in /usr/local/lib/python2.7/site-packages (from scrapy)
Collecting pyasn1-modules (from service-identity->scrapy)
Downloading pyasn1_modules-0.2.1-py2.py3-none-any.whl (60kB)
100% |████████████████████████████████| 61kB 37kB/s
Collecting attrs (from service-identity->scrapy)
Downloading attrs-17.3.0-py2.py3-none-any.whl
Collecting pyasn1 (from service-identity->scrapy)
Downloading pyasn1-0.4.2-py2.py3-none-any.whl (71kB)
100% |████████████████████████████████| 71kB 38kB/s
Collecting cryptography>=2.1.4 (from pyOpenSSL->scrapy)
Downloading cryptography-2.1.4-cp27-cp27m-macosx_10_6_intel.whl (1.5MB)
100% |████████████████████████████████| 1.5MB 26kB/s
Collecting zope.interface>=3.6.0 (from Twisted>=13.1.0->scrapy)
Downloading zope.interface-4.4.3.tar.gz (147kB)
100% |████████████████████████████████| 153kB 67kB/s
Collecting constantly>=15.1 (from Twisted>=13.1.0->scrapy)
Downloading constantly-15.1.0-py2.py3-none-any.whl
Collecting incremental>=16.10.1 (from Twisted>=13.1.0->scrapy)
Downloading incremental-17.5.0-py2.py3-none-any.whl
Collecting Automat>=0.3.0 (from Twisted>=13.1.0->scrapy)
Downloading Automat-0.6.0-py2.py3-none-any.whl
Collecting hyperlink>=17.1.1 (from Twisted>=13.1.0->scrapy)
Downloading hyperlink-17.3.1-py2.py3-none-any.whl (73kB)
100% |████████████████████████████████| 81kB 67kB/s
Requirement already satisfied: idna>=2.1 in /usr/local/lib/python2.7/site-packages (from cryptography>=2.1.4->pyOpenSSL->scrapy)
Collecting cffi>=1.7; platform_python_implementation != "PyPy" (from cryptography>=2.1.4->pyOpenSSL->scrapy)
Downloading cffi-1.11.2-cp27-cp27m-macosx_10_6_intel.whl (238kB)
100% |████████████████████████████████| 245kB 43kB/s
Requirement already satisfied: enum34; python_version < "3" in /usr/local/lib/python2.7/site-packages (from cryptography>=2.1.4->pyOpenSSL->scrapy)
Collecting asn1crypto>=0.21.0 (from cryptography>=2.1.4->pyOpenSSL->scrapy)
Downloading asn1crypto-0.24.0-py2.py3-none-any.whl (101kB)
100% |████████████████████████████████| 102kB 42kB/s
Collecting ipaddress; python_version < "3" (from cryptography>=2.1.4->pyOpenSSL->scrapy)
Downloading ipaddress-1.0.19.tar.gz
Requirement already satisfied: setuptools in /usr/local/lib/python2.7/site-packages (from zope.interface>=3.6.0->Twisted>=13.1.0->scrapy)
Collecting pycparser (from cffi>=1.7; platform_python_implementation != "PyPy"->cryptography>=2.1.4->pyOpenSSL->scrapy)
Downloading pycparser-2.18.tar.gz (245kB)
100% |████████████████████████████████| 256kB 45kB/s
Building wheels for collected packages: PyDispatcher, Twisted, zope.interface, ipaddress, pycparser
Running setup.py bdist_wheel for PyDispatcher ... done
Stored in directory: /Users/crifan/Library/Caches/pip/wheels/86/02/a1/5857c77600a28813aaf0f66d4e4568f50c9f133277a4122411
Running setup.py bdist_wheel for Twisted ... done
Stored in directory: /Users/crifan/Library/Caches/pip/wheels/91/c7/95/0bb4d45bc4ed91375013e9b5f211ac3ebf4138d8858f84abbc
Running setup.py bdist_wheel for zope.interface ... done
Stored in directory: /Users/crifan/Library/Caches/pip/wheels/8b/39/98/0fcb72adfb12b2547273b1164d952f093f267e0324d58b6955
Running setup.py bdist_wheel for ipaddress ... done
Stored in directory: /Users/crifan/Library/Caches/pip/wheels/d7/6b/69/666188e8101897abb2e115d408d139a372bdf6bfa7abb5aef5
Running setup.py bdist_wheel for pycparser ... done
Stored in directory: /Users/crifan/Library/Caches/pip/wheels/95/14/9a/5e7b9024459d2a6600aaa64e0ba485325aff7a9ac7489db1b6
Successfully built PyDispatcher Twisted zope.interface ipaddress pycparser
Installing collected packages: cssselect, lxml, w3lib, parsel, pyasn1, pyasn1-modules, attrs, pycparser, cffi, asn1crypto, ipaddress, cryptography, pyOpenSSL, service-identity, queuelib, PyDispatcher, zope.interface, constantly, incremental, Automat, hyperlink, Twisted, scrapy
Successfully installed Automat-0.6.0 PyDispatcher-2.0.5 Twisted-17.9.0 asn1crypto-0.24.0 attrs-17.3.0 cffi-1.11.2 constantly-15.1.0 cryptography-2.1.4 cssselect-1.0.1 hyperlink-17.3.1 incremental-17.5.0 ipaddress-1.0.19 lxml-4.1.1 parsel-1.2.0 pyOpenSSL-17.5.0 pyasn1-0.4.2 pyasn1-modules-0.2.1 pycparser-2.18 queuelib-1.4.2 scrapy-1.4.0 service-identity-17.0.0 w3lib-1.18.0 zope.interface-4.4.3
Read through those docs; it looks like the later steps will involve:
using the interactive shell console to debug the page-extraction selectors
using the media pipeline to download the audio/video files (a settings sketch follows below)
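Scrapy's built-in FilesPipeline is the usual way to do such media downloading. A minimal sketch of the settings involved, assuming the items will carry the standard file_urls/files fields (the storage path here is hypothetical):

# settings.py -- sketch of enabling the built-in file/media pipeline
ITEM_PIPELINES = {
    'scrapy.pipelines.files.FilesPipeline': 1,  # built-in downloader pipeline
}
FILES_STORE = '/Users/crifan/dev/tmp/cbeebies_files'  # hypothetical download dir

With this enabled, any yielded item that exposes a file_urls list gets its files fetched automatically, and the download results are written back into the item's files field.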
Now see which commands are available:
➜ scrapy scrapy --help
Scrapy 1.4.0 - no active project
Usage:
scrapy <command> [options] [args]
Available commands:
bench Run quick benchmark test
fetch Fetch a URL using the Scrapy downloader
genspider Generate new spider using pre-defined templates
runspider Run a self-contained spider (without creating a project)
settings Get settings values
shell Interactive scraping console
startproject Create new project
version Print Scrapy version
view Open URL in browser, as seen by Scrapy
[ more ] More commands available when run from project directory
Use "scrapy <command> -h" to see more info about a command
Then check the specific options of a few other subcommands:
➜ scrapy scrapy startproject -h
Usage
=====
scrapy startproject <project_name> [project_dir]
Create new project
Options
=======
--help, -h            show this help message and exit
Global Options
--------------
--logfile=FILE        log file. if omitted stderr will be used
--loglevel=LEVEL, -L LEVEL
                      log level (default: DEBUG)
--nolog               disable logging completely
--profile=FILE        write python cProfile stats to FILE
--pidfile=FILE        write process ID to FILE
--set=NAME=VALUE, -s NAME=VALUE
                      set/override setting (may be repeated)
--pdb                 enable pdb on failure
And:
➜ scrapy scrapy bench -h
Usage
=====
scrapy bench
Run quick benchmark test
Options
=======
--help, -h            show this help message and exit
Global Options
--------------
--logfile=FILE        log file. if omitted stderr will be used
--loglevel=LEVEL, -L LEVEL
                      log level (default: INFO)
--nolog               disable logging completely
--profile=FILE        write python cProfile stats to FILE
--pidfile=FILE        write process ID to FILE
--set=NAME=VALUE, -s NAME=VALUE
                      set/override setting (may be repeated)
--pdb                 enable pdb on failure
➜ scrapy scrapy fetch -h
Usage
=====
scrapy fetch [options] <url>
Fetch a URL using the Scrapy downloader and print its content to stdout. You
may want to use --nolog to disable logging
Options
=======
--help, -h            show this help message and exit
--spider=SPIDER       use this spider
--headers             print response HTTP headers instead of body
--no-redirect         do not handle HTTP 3xx status codes and print response
                      as-is
Global Options
--------------
--logfile=FILE        log file. if omitted stderr will be used
--loglevel=LEVEL, -L LEVEL
                      log level (default: DEBUG)
--nolog               disable logging completely
--profile=FILE        write python cProfile stats to FILE
--pidfile=FILE        write process ID to FILE
--set=NAME=VALUE, -s NAME=VALUE
                      set/override setting (may be repeated)
--pdb                 enable pdb on failure
➜ scrapy scrapy shell -h
Usage
=====
scrapy shell [url|file]
Interactive console for scraping the given url
Options
=======
--help, -h            show this help message and exit
-c CODE               evaluate the code in the shell, print the result and
                      exit
--spider=SPIDER       use this spider
--no-redirect         do not handle HTTP 3xx status codes and print response
                      as-is
Global Options
--------------
--logfile=FILE        log file. if omitted stderr will be used
--loglevel=LEVEL, -L LEVEL
                      log level (default: DEBUG)
--nolog               disable logging completely
--profile=FILE        write python cProfile stats to FILE
--pidfile=FILE        write process ID to FILE
--set=NAME=VALUE, -s NAME=VALUE
                      set/override setting (may be repeated)
--pdb                 enable pdb on failure
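Even before creating a project, the shell can be tried out against a page to experiment with selectors; a quick sketch of such a session (the URL and selectors are only illustrations):

scrapy shell 'http://global.cbeebies.com/'
# then, inside the interactive console:
response.url                                 # confirm which page was actually fetched
response.css('title::text').extract_first()  # grab the page title
response.css('a::attr(href)').extract()      # list candidate sub-URLs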
Then create the project:
➜ scrapy scrapy startproject cbeebies
New Scrapy project 'cbeebies', using template directory '/usr/local/lib/python2.7/site-packages/scrapy/templates/project', created in:
/Users/crifan/dev/dev_root/company/naturling/projects/scrapy/cbeebies
You can start your first spider with:
cd cbeebies
scrapy genspider example example.com
➜ scrapy pwd
/Users/crifan/dev/dev_root/company/naturling/projects/scrapy
➜ scrapy ll
total 0
drwxr-xr-x 4 crifan staff 128B 12 26 22:50 cbeebies
➜ scrapy cd cbeebies
➜ cbeebies ll
total 8
drwxr-xr-x 8 crifan staff 256B 12 26 22:50 cbeebies
-rw-r--r-- 1 crifan staff 260B 12 26 22:50 scrapy.cfg
➜ cbeebies cd cbeebies
➜ cbeebies ll
total 32
-rw-r--r-- 1 crifan staff 0B 12 26 20:41 __init__.py
-rw-r--r-- 1 crifan staff 287B 12 26 22:50 items.py
-rw-r--r-- 1 crifan staff 1.9K 12 26 22:50 middlewares.py
-rw-r--r-- 1 crifan staff 288B 12 26 22:50 pipelines.py
-rw-r--r-- 1 crifan staff 3.1K 12 26 22:50 settings.py
drwxr-xr-x 3 crifan staff 96B 12 26 20:49 spiders
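Of these files, items.py is where the fields to be scraped get declared. For the audio resources targeted here, a minimal sketch might look like this (all field names are assumptions, except file_urls/files, which are the names Scrapy's FilesPipeline expects):

# cbeebies/items.py (sketch)
import scrapy

class CbeebiesItem(scrapy.Item):
    title = scrapy.Field()      # hypothetical: name of the episode/song
    file_urls = scrapy.Field()  # media URLs for the FilesPipeline to download
    files = scrapy.Field()      # populated by the pipeline after download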
Took a quick look at those. Then, from the project root directory, check which additional commands become available:
➜ cbeebies pwd
/Users/crifan/dev/dev_root/company/naturling/projects/scrapy/cbeebies
➜ cbeebies ll
total 8
drwxr-xr-x 10 crifan staff 320B 12 26 23:07 cbeebies
-rw-r--r-- 1 crifan staff 260B 12 26 22:50 scrapy.cfg
➜ cbeebies scrapy --help
Scrapy 1.4.0 - project: cbeebies
Usage:
scrapy <command> [options] [args]
Available commands:
bench Run quick benchmark test
check Check spider contracts
crawl Run a spider
edit Edit spider
fetch Fetch a URL using the Scrapy downloader
genspider Generate new spider using pre-defined templates
list List available spiders
parse Parse URL (using its spider) and print the results
runspider Run a self-contained spider (without creating a project)
settings Get settings values
shell Interactive scraping console
startproject Create new project
version Print Scrapy version
view Open URL in browser, as seen by Scrapy
Use "scrapy <command> -h" to see more info about a command
Using the lowercase spider name produces an error:
➜ cbeebies scrapy crawl cbeebies
2017-12-26 23:08:39 [scrapy.utils.log] INFO: Scrapy 1.4.0 started (bot: cbeebies)
2017-12-26 23:08:39 [scrapy.utils.log] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'cbeebies.spiders', 'SPIDER_MODULES': ['cbeebies.spiders'], 'ROBOTSTXT_OBEY': True, 'BOT_NAME': 'cbeebies'}
Traceback (most recent call last):
  File "/usr/local/bin/scrapy", line 11, in <module>
    sys.exit(execute())
  File "/usr/local/lib/python2.7/site-packages/scrapy/cmdline.py", line 149, in execute
    _run_print_help(parser, _run_command, cmd, args, opts)
  File "/usr/local/lib/python2.7/site-packages/scrapy/cmdline.py", line 89, in _run_print_help
    func(*a, **kw)
  File "/usr/local/lib/python2.7/site-packages/scrapy/cmdline.py", line 156, in _run_command
    cmd.run(args, opts)
  File "/usr/local/lib/python2.7/site-packages/scrapy/commands/crawl.py", line 57, in run
    self.crawler_process.crawl(spname, **opts.spargs)
  File "/usr/local/lib/python2.7/site-packages/scrapy/crawler.py", line 167, in crawl
    crawler = self.create_crawler(crawler_or_spidercls)
  File "/usr/local/lib/python2.7/site-packages/scrapy/crawler.py", line 195, in create_crawler
    return self._create_crawler(crawler_or_spidercls)
  File "/usr/local/lib/python2.7/site-packages/scrapy/crawler.py", line 199, in _create_crawler
    spidercls = self.spider_loader.load(spidercls)
  File "/usr/local/lib/python2.7/site-packages/scrapy/spiderloader.py", line 71, in load
    raise KeyError("Spider not found: {}".format(spider_name))
KeyError: 'Spider not found: cbeebies'
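The cause: scrapy crawl looks spiders up by their name attribute, not by file name or (lowercased) class name, and this spider's name is evidently 'Cbeebies'. A sketch of what the spider file presumably contains, reconstructed from the crawl log below (the start URLs match the requests in the log; the class name and parse body are assumptions):

# cbeebies/spiders/CbeebiesSpider.py (sketch)
import scrapy

class CbeebiesSpider(scrapy.Spider):
    name = 'Cbeebies'  # this is what `scrapy crawl` matches against
    allowed_domains = ['cbeebies.com']
    start_urls = [
        'http://us.cbeebies.com/watch-and-sing/',
        'http://us.cbeebies.com/shows/',
    ]

    def parse(self, response):
        # the crawl log below shows this being printed for each crawled page
        print('response.url=%s' % response.url)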
Switching to the capitalized name (i.e. the spider's name attribute) works:
➜ cbeebies scrapy crawl Cbeebies
2017-12-26 23:09:00 [scrapy.utils.log] INFO: Scrapy 1.4.0 started (bot: cbeebies)
2017-12-26 23:09:00 [scrapy.utils.log] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'cbeebies.spiders', 'SPIDER_MODULES': ['cbeebies.spiders'], 'ROBOTSTXT_OBEY': True, 'BOT_NAME': 'cbeebies'}
2017-12-26 23:09:00 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.memusage.MemoryUsage',
 'scrapy.extensions.logstats.LogStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.corestats.CoreStats']
2017-12-26 23:09:00 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.robotstxt.RobotsTxtMiddleware',
 'scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2017-12-26 23:09:00 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2017-12-26 23:09:00 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2017-12-26 23:09:00 [scrapy.core.engine] INFO: Spider opened
2017-12-26 23:09:00 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2017-12-26 23:09:00 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6023
2017-12-26 23:09:02 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (301) to <GET http://global.cbeebies.com/> from <GET http://us.cbeebies.com/robots.txt>
2017-12-26 23:09:03 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://global.cbeebies.com/> (referer: None)
2017-12-26 23:09:04 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (301) to <GET http://global.cbeebies.com/> from <GET http://us.cbeebies.com/watch-and-sing/>
2017-12-26 23:09:04 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://global.cbeebies.com/robots.txt> (referer: None)
2017-12-26 23:09:04 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (301) to <GET http://global.cbeebies.com/> from <GET http://us.cbeebies.com/shows/>
2017-12-26 23:09:04 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://global.cbeebies.com/> (referer: None)
response.url=http://global.cbeebies.com/
2017-12-26 23:09:05 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://global.cbeebies.com/> (referer: None)
response.url=http://global.cbeebies.com/
2017-12-26 23:09:05 [scrapy.core.engine] INFO: Closing spider (finished)
2017-12-26 23:09:05 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 1548,
 'downloader/request_count': 7,
 'downloader/request_method_count/GET': 7,
 'downloader/response_bytes': 12888,
 'downloader/response_count': 7,
 'downloader/response_status_count/200': 4,
 'downloader/response_status_count/301': 3,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2017, 12, 26, 15, 9, 5, 194303),
 'log_count/DEBUG': 8,
 'log_count/INFO': 7,
 'memusage/max': 50208768,
 'memusage/startup': 50204672,
 'response_received_count': 4,
 'scheduler/dequeued': 4,
 'scheduler/dequeued/memory': 4,
 'scheduler/enqueued': 4,
 'scheduler/enqueued/memory': 4,
 'start_time': datetime.datetime(2017, 12, 26, 15, 9, 0, 792051)}
2017-12-26 23:09:05 [scrapy.core.engine] INFO: Spider closed (finished)
➜ cbeebies
The corresponding HTML file can then be generated from the crawled response:
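A minimal way to do that inside parse(), following the save-to-file pattern from the official tutorial (the filename scheme is just an illustration):

# inside the spider's parse() method (sketch)
def parse(self, response):
    # e.g. 'global.cbeebies.com.html' for http://global.cbeebies.com/
    filename = '%s.html' % response.url.split('/')[-2]
    with open(filename, 'wb') as f:
        f.write(response.body)  # dump the raw page HTML to disk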
Then went on to try:
[Solved] Setting breakpoints in PyCharm on Mac to live-debug a Scrapy project
At this point, the HTML content of response.body returned by the crawled pages can already be obtained.
Next, figure out how to:
parse out the sub-URLs that need further processing,
and hand those URLs back to Scrapy so it continues crawling them (a sketch of the usual pattern follows)
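The standard way is to yield new Request objects from parse(); a minimal sketch, using a generic anchor selector as a stand-in for the real page structure:

# inside parse(): extract child URLs and schedule them (sketch)
for href in response.css('a::attr(href)').extract():
    # yielding a Request tells Scrapy to queue this URL for further crawling
    yield scrapy.Request(response.urljoin(href), callback=self.parse)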
[Notes] Trying Scrapy shell to extract the sub-URLs from cbeebies.com pages
Please credit when reposting: 在路上 » [Notes] Using Python's Scrapy to crawl cbeebies.com