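I have an existing PySpider project that has been running for a while and has already crawled some data:
The corresponding data is also saved in MongoDB:
What I want now is:
to migrate the whole PySpider environment over to another Mac:
and continue crawling there, resuming from where it left off.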
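The plan I can think of for now:
First, set up MongoDB on the target Mac,
export the MongoDB data on the source Mac, then import it into MongoDB on the target Mac.
Then rebuild the pipenv virtual environment on the target Mac and install the libraries.
Then move PySpider's data directory from the source environment over as a whole.
After that, run PySpider again in the target environment; hopefully it resumes where it stopped.
-> As long as the data stored in the db files under PySpider's data directory only uses relative paths, this should work in theory.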
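The relative-path assumption above can be checked before moving anything. Below is a minimal sketch, assuming the db files in the data directory are SQLite files with a `.db` extension (PySpider's default storage, as far as I know); the helper names `find_text_in_sqlite` and `scan_data_dir` are hypothetical, and it is safer to run this on a copy of the data directory.

```python
import sqlite3
from pathlib import Path


def find_text_in_sqlite(db_file, needle):
    """Return (table, column) pairs where at least one value contains `needle`."""
    hits = []
    conn = sqlite3.connect(db_file)
    try:
        tables = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type='table'")]
        for table in tables:
            # PRAGMA table_info returns one row per column; index 1 is the name.
            columns = [row[1] for row in conn.execute(
                'PRAGMA table_info("{}")'.format(table))]
            for column in columns:
                query = ('SELECT 1 FROM "{t}" '
                         'WHERE instr(CAST("{c}" AS TEXT), ?) > 0 LIMIT 1'
                         ).format(t=table, c=column)
                if conn.execute(query, (needle,)).fetchone():
                    hits.append((table, column))
    finally:
        conn.close()
    return hits


def scan_data_dir(data_dir, needle):
    """Scan every *.db file in `data_dir` (assumed to be SQLite) for `needle`."""
    return {path.name: find_text_in_sqlite(str(path), needle)
            for path in sorted(Path(data_dir).glob("*.db"))}
```

Calling `scan_data_dir("data", "/Users/")` on the source machine would list any table and column that holds a string containing `/Users/`; an empty list for every file suggests no absolute home-directory paths are stored.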
First, take care of this:
[Solved] MongoDB is installed on the Mac but running mongod fails: exception in initAndListen: NonExistentPath: Data directory /data/db not found
Then:
Source computer: export the MongoDB data
Reference:
Run it:
➜ mongodb_migration git:(master) mongodump -d storybook -o .
2018-11-26T11:58:21.944+0800 writing storybook.scholastic to
2018-11-26T11:58:21.946+0800 writing storybook.lexile to
2018-11-26T11:58:21.946+0800 writing storybook.main to
2018-11-26T11:58:23.101+0800 done dumping storybook.lexile (29911 documents)
2018-11-26T11:58:23.353+0800 done dumping storybook.scholastic (51785 documents)
2018-11-26T11:58:24.451+0800 done dumping storybook.main (51785 documents)
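To carry the dump to the other machine it helps to pack the dumped folder into a single file. A minimal sketch using only the standard library follows; the helper name `archive_dump` and the archive base name are assumptions for illustration, not something mongodump produces itself.

```python
import shutil
from pathlib import Path


def archive_dump(dump_dir, archive_base):
    """Zip a mongodump output folder and return the path of the zip file.

    `archive_base` is the target path without the ".zip" suffix.
    """
    dump_path = Path(dump_dir).resolve()
    if not dump_path.is_dir():
        raise NotADirectoryError(str(dump_path))
    # base_dir keeps the top-level folder name (e.g. "storybook/") in the zip,
    # so unpacking on the target machine recreates the same layout.
    return shutil.make_archive(
        archive_base, "zip",
        root_dir=str(dump_path.parent),
        base_dir=dump_path.name,
    )
```

For example, `archive_dump("./storybook", "mongodb_storybook_20181126")` would write `mongodb_storybook_20181126.zip` next to the dump folder.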
Target computer: import the MongoDB data
After copying the data over:
run the import:
macdeMacBook-Pro:mongodb_migration mac$ pwd
/Users/mac/working/dev_root/xxx/projects/crawler_projects/crawler_fablexile_book/debug/mongodb_migration
macdeMacBook-Pro:mongodb_migration mac$ ls -lha
total 102928
drwxr-xr-x 5 mac staff 160B 11 26 13:32 .
drwxr-xr-x 3 mac staff 96B 11 26 13:31 ..
-rw-r--r--@ 1 mac staff 6.0K 11 26 13:32 .DS_Store
-rw-r--r--@ 1 mac staff 50M 11 25 19:59 mongodb_storybook_20181126.zip
drwxr-xr-x@ 8 mac staff 256B 11 25 19:58 storybook
macdeMacBook-Pro:mongodb_migration mac$ mongorestore -d storybook ./storybook
2018-11-26T13:33:25.550-0800 the --db and --collection args should only be used when restoring from a BSON file. Other uses are deprecated and will not exist in the future; use --nsInclude instead
2018-11-26T13:33:25.550-0800 building a list of collections to restore from storybook dir
2018-11-26T13:33:25.552-0800 reading metadata for storybook.main from storybook/main.metadata.json
2018-11-26T13:33:25.553-0800 reading metadata for storybook.scholastic from storybook/scholastic.metadata.json
2018-11-26T13:33:25.553-0800 reading metadata for storybook.lexile from storybook/lexile.metadata.json
2018-11-26T13:33:25.687-0800 restoring storybook.main from storybook/main.bson
2018-11-26T13:33:25.826-0800 restoring storybook.scholastic from storybook/scholastic.bson
2018-11-26T13:33:25.950-0800 restoring storybook.lexile from storybook/lexile.bson
2018-11-26T13:33:27.512-0800 no indexes to restore
2018-11-26T13:33:27.512-0800 finished restoring storybook.lexile (29911 documents)
2018-11-26T13:33:28.407-0800 no indexes to restore
2018-11-26T13:33:28.407-0800 finished restoring storybook.scholastic (51785 documents)
2018-11-26T13:33:28.547-0800 [###################.....] storybook.main 87.5MB/106MB (82.6%)
2018-11-26T13:33:29.134-0800 [########################] storybook.main 106MB/106MB (100.0%)
2018-11-26T13:33:29.134-0800 no indexes to restore
2018-11-26T13:33:29.134-0800 finished restoring storybook.main (51785 documents)
2018-11-26T13:33:29.134-0800 done
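The first line of the restore log warns that `-d` with a directory is deprecated and suggests `--nsInclude`. A minimal sketch of building that form of the command is below; it was not run in this post. The helper name `build_restore_command` is hypothetical, and it assumes the path argument is the parent dump folder (the one containing `storybook/`) rather than the database subfolder.

```python
import shlex


def build_restore_command(db_name, dump_root):
    """Build a mongorestore argument list that filters by namespace.

    Assumption: `dump_root` is the folder that contains "<db_name>/",
    not the "<db_name>/" folder itself.
    """
    return ["mongorestore", "--nsInclude", "{}.*".format(db_name), dump_root]


def to_shell_line(args):
    """Render an argument list as a copy-pasteable shell line."""
    return " ".join(shlex.quote(arg) for arg in args)


print(to_shell_line(build_restore_command("storybook", ".")))
# prints: mongorestore --nsInclude 'storybook.*' .
```

The list form can be passed directly to `subprocess.run(...)`; the quoted string form is only for pasting into a terminal, where the `*` must not be expanded by the shell.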
Then use a GUI tool to check whether the data was imported:
Go to:
Then open the local MongoDB and confirm the data is correct:
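Besides looking at the data in a GUI, the document counts printed by mongodump and mongorestore can be compared directly. Below is a minimal sketch that parses the two logs; the regular expressions are written against the log lines shown in this post and are an assumption for other tool versions, and `compare_counts` is a hypothetical helper name.

```python
import re

# Patterns match lines such as:
#   "done dumping storybook.lexile (29911 documents)"
#   "finished restoring storybook.lexile (29911 documents)"
DUMP_RE = re.compile(r"done dumping (\S+) \((\d+) documents?")
RESTORE_RE = re.compile(r"finished restoring (\S+) \((\d+) documents?")


def parse_counts(log_text, pattern):
    """Map namespace -> document count for every matching log line."""
    return {m.group(1): int(m.group(2)) for m in pattern.finditer(log_text)}


def compare_counts(dump_log, restore_log):
    """Return {namespace: (dumped, restored)} for namespaces that differ.

    A namespace missing from one of the logs is reported with None.
    """
    dumped = parse_counts(dump_log, DUMP_RE)
    restored = parse_counts(restore_log, RESTORE_RE)
    mismatches = {}
    for namespace in sorted(set(dumped) | set(restored)):
        pair = (dumped.get(namespace), restored.get(namespace))
        if pair[0] != pair[1]:
            mismatches[namespace] = pair
    return mismatches
```

An empty dictionary means every collection reported the same number of documents on both sides, as is the case for the three collections in the logs above.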
Then the files were copied into the data directory on the target:
Then rebuild the pipenv environment:
macdeMacBook-Pro:projects_git mac$ cd /Users/mac/working/dev_root/xxx/projects_git/crawler_projects/crawler_fablexile_book
macdeMacBook-Pro:crawler_fablexile_book mac$ pwd
/Users/mac/working/dev_root/xxx/projects_git/crawler_projects/crawler_fablexile_book
macdeMacBook-Pro:crawler_fablexile_book mac$ ls -l
total 56
-rw-r--r-- 1 mac staff 18272 11 26 13:46 FabLexileBook.py
-rw-r--r-- 1 mac staff 276 11 26 13:46 Pipfile
-rw-r--r-- 1 mac staff 3111 11 26 13:46 README.md
drwxr-xr-x 3 mac staff 96 11 26 13:46 tools
macdeMacBook-Pro:crawler_fablexile_book mac$ cd /Users/mac/working/dev_root/xxx/projects_git/crawler_projects/crawler_fablexile_book
macdeMacBook-Pro:crawler_fablexile_book mac$ pipenv install --skip-lock
Creating a virtualenv for this project…
Pipfile: /Users/mac/working/dev_root/xxx/projects_git/crawler_projects/crawler_fablexile_book/Pipfile
Using /Library/Frameworks/Python.framework/Versions/3.6/bin/python3.6 (3.6.7) to create virtualenv…
⠹Running virtualenv with interpreter /Library/Frameworks/Python.framework/Versions/3.6/bin/python3.6
Using base prefix '/Library/Frameworks/Python.framework/Versions/3.6'
New python executable in /Users/mac/.local/share/virtualenvs/crawler_fablexile_book-4ZfM-yMK/bin/python3.6
Also creating executable in /Users/mac/.local/share/virtualenvs/crawler_fablexile_book-4ZfM-yMK/bin/python
Installing setuptools, pip, wheel...done.
Virtualenv location: /Users/mac/.local/share/virtualenvs/crawler_fablexile_book-4ZfM-yMK
Installing dependencies from Pipfile…
🐍 ▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉ 5/5 — 00:01:09
To activate this project's virtualenv, run pipenv shell.
Alternatively, run a command inside the virtualenv with pipenv run.
macdeMacBook-Pro:crawler_fablexile_book mac$ pipenv shell
Launching subshell in virtual environment…
bash-3.2$ . /Users/mac/.local/share/virtualenvs/crawler_fablexile_book-4ZfM-yMK/bin/activate
(crawler_fablexile_book) bash-3.2$ which python
/Users/mac/.local/share/virtualenvs/crawler_fablexile_book-4ZfM-yMK/bin/python
(crawler_fablexile_book) bash-3.2$ python --version
Python 3.6.7
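Once the virtualenv is active, a quick check that the needed libraries are importable can save a failed start. A minimal sketch is below; `missing_modules` is a hypothetical helper, and the module names in the usage note are assumptions, since the Pipfile contents are not shown here.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in this interpreter."""
    # find_spec returns None when a top-level module is not installed;
    # pass only top-level names (no dots) to avoid importing parent packages.
    return [name for name in names if importlib.util.find_spec(name) is None]
```

Running `missing_modules(["pyspider", "pymongo"])` inside `pipenv shell` should return an empty list if both packages were installed from the Pipfile.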
Now it is time to try it:
run pyspider and see whether it can resume:
(crawler_fablexile_book) bash-3.2$ pyspider
[W 181126 13:52:40 run:413] phantomjs not found, continue running without it.
[I 181126 13:52:42 result_worker:49] result_worker starting...
[I 181126 13:52:42 processor:211] processor starting...
^C[I 181126 13:52:42 result_worker:66] result_worker exiting...
[I 181126 13:52:42 processor:229] processor exiting...
It could not find phantomjs, so I went to download and install it.
Referring to my own earlier post:
[Solved] Installing phantomjs on Mac
Install it:
brew tap homebrew/cask
brew cask install phantomjs
Downloading phantomjs took a very long time, though; only after switching to an sg node in ss did the download go through:
xxx-Mac-2013-Late:~ mac$ brew cask install phantomjs
==> Caveats
phantomjs has been officially discontinued upstream. It may stop working correctly (or at all) in recent versions of macOS.
==> Satisfying dependencies
==> Downloading https://bitbucket.org/ariya/phantomjs/downloads/phantomjs-2.1.1-macosx.zip
==> Downloading from https://bbuseruploads.s3.amazonaws.com/fd96ed93-2b32-46a7-9d2b-ecbc0988516a/downloads/8543ae7d-9ac7-43d3-9052-537d63f16d66/phantomjs-2.1.1-
# 1.7%^C
xxx-Mac-2013-Late:~ mac$
xxx-Mac-2013-Late:~ mac$ brew cask install phantomjs
Updating Homebrew...
==> Caveats
phantomjs has been officially discontinued upstream. It may stop working correctly (or at all) in recent versions of macOS.
==> Satisfying dependencies
==> Downloading https://bitbucket.org/ariya/phantomjs/downloads/phantomjs-2.1.1-macosx.zip
==> Downloading from https://bbuseruploads.s3.amazonaws.com/fd96ed93-2b32-46a7-9d2b-ecbc0988516a/downloads/8543ae7d-9ac7-43d3-9052-537d63f16d66/phantomjs-2.1.1-
######################################################################## 100.0%
==> Verifying SHA-256 checksum for Cask 'phantomjs'.
==> Installing Cask phantomjs
==> Creating Caskroom at /usr/local/Caskroom
==> We'll set permissions properly so we won't need sudo in the future.
Password:
Sorry, try again.
Password:
==> Linking Binary 'phantomjs' to '/usr/local/bin/phantomjs'.
🍺 phantomjs was successfully installed!
xxx-Mac-2013-Late:~ mac$ which phantomjs
/usr/local/bin/phantomjs
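The same `which` check can be done from Python before starting the crawler, which is handy on a freshly set up machine. A minimal sketch, assuming the tools of interest are `phantomjs` plus the MongoDB binaries used earlier; `find_tools` is a hypothetical helper name.

```python
import shutil


def find_tools(names):
    """Map each executable name to its full path on PATH, or None if absent."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    # Tool names below are the ones used in this post.
    for name, path in find_tools(["phantomjs", "mongod", "mongorestore"]).items():
        print("{}: {}".format(name, path or "NOT FOUND"))
```

Any entry reported as `NOT FOUND` points at a tool to install before running pyspider.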
Then run:
pyspider
Then set the project status to RUNNING:
and it continues crawling.
So far the download speed looks quite good:
When reposting, please credit: 在路上 » [Solved] Migrating a PySpider project to another computer and resuming the crawl