Problem: some scraped book titles contain garbled characters (mojibake).
Looking into the cause of the garbled text:
the data was already garbled when it was originally scraped.
Search in MongoDB Compass:
{"title": {$regex: "Chicka Chicka .*"}}
What it found:
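Possible fixes:
1. Fix the crawler code and re-crawl.
Pro: fixes every affected record in one batch. Con: changing the code, rerunning the script and re-importing the backend data all take a long time.
2. Fix the individual garbled records found so far directly in the database.
Pro: comparatively quick. Con: each case has to be found and fixed one at a time; there is no batch fix.
I had only run into one occasionally before, so overall such cases are rare.
Conclusion: use option 2 for now to fix the individual cases found so far.
Lance's data hasn't been merged yet; if the problem shows up again after merging, I'll consider a different solution then.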
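So now update the data in MongoDB.
At first I wanted to rewrite each affected document as a whole, then realized:
only the title field of each affected document needs updating.
So the plan is:
first query the documents, then update title by _id.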
Searched for: mongodb update field
Try it on the local database first, then do it on the online database.
> db.main.find({"title": {"$regex": "Chicka Chicka .*"}}).pretty()
Found the document I wanted:
{ "_id" : ObjectId("5bd7beecbfaa44fe2c73e73f"), "url" : "https://www.scholastic.com/teachers/books/chicka-chicka-1-2-3-by-bill-martin-jr/", "title" : "Chicka Chicka 1â¢2â¢3", "description" : "This spectacular follow-up to the bestselling Chicka Chicka Boom Boom is the essential book for any child learning to count.\n\n1 told 2 and 2 told 3 \"I'll race you to the top of the apple tree.\"\n\nOne hundred and one numbers climb the apple tree in this bright, rollicking, joyous book for young children. As the numerals pile up and bumblebees threaten, what's the number that saves the day? (Hint: It rhymes with \"hero.\")\n\nRead and count and play and laugh to learn the surprising answer.", "coverImgUrl" : "https://www.scholastic.com/content5/media/products/72/9780439731072_mres.jpg",
Then make sure it can be found by _id:
> db.main.find({"_id": ObjectId("5bd7beecbfaa44fe2c73e73f")}).pretty()
{ "_id" : ObjectId("5bd7beecbfaa44fe2c73e73f"), "url" : "https://www.scholastic.com/teachers/books/chicka-chicka-1-2-3-by-bill-martin-jr/", "title" : "Chicka Chicka 1â¢2â¢3",
Next, try to update it.
Frustratingly, after the update only title was left:
> db.main.update({"_id": ObjectId("5bd7beecbfaa44fe2c73e73f")}, {"title": "Chicka Chicka 1,2,3"})
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })
> db.main.find({"_id": ObjectId("5bd7beecbfaa44fe2c73e73f")}).pretty()
{ "_id" : ObjectId("5bd7beecbfaa44fe2c73e73f"), "title" : "Chicka Chicka 1,2,3" }
>
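Why did the other fields disappear? When the second argument of update() contains no update operators (no keys starting with $), MongoDB treats it as a replacement document and replaces everything except _id. The plain-JavaScript sketch below only models that behaviour on in-memory objects: replaceDocument and applySet are hypothetical helper names, only top-level fields are handled, and the _id is a string standing in for an ObjectId. It is not how the server implements updates.

```javascript
// Hypothetical in-memory model of the two update styles (top-level fields only).
// It illustrates the semantics; it is not how the MongoDB server works internally.

// Replacement: an update document without $-operators replaces everything except _id.
function replaceDocument(doc, replacement) {
  return { _id: doc._id, ...replacement };
}

// $set: only the listed fields are overwritten; all other fields survive.
function applySet(doc, fields) {
  return { ...doc, ...fields };
}

const original = {
  _id: "5bd7beecbfaa44fe2c73e73f", // stands in for ObjectId("5bd7beecbfaa44fe2c73e73f")
  url: "https://www.scholastic.com/teachers/books/chicka-chicka-1-2-3-by-bill-martin-jr/",
  title: "Chicka Chicka 1â¢2â¢3", // garbled title as shown in the shell
};

const replaced = replaceDocument(original, { title: "Chicka Chicka 1,2,3" });
const patched = applySet(original, { title: "Chicka Chicka 1,2,3" });

console.log(replaced); // only _id and title remain, as in the transcript above
console.log(patched);  // url is kept, title is fixed
```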
Try $set instead.
Searched for: mongodb set vs update
The $set operator replaces the value of a field with the specified value.
That's exactly what I need: update one field and keep all the others.
“Modifies an existing document or documents in a collection. The method can modify specific fields of an existing document or documents or replace an existing document entirely, depending on the update parameter.
By default, the update() method updates a single document.”
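So update() can either modify specific fields or replace the whole document, depending on the update argument: if it contains update operators such as $set, only those fields change; if it is a plain field: value document, it replaces the entire document (except _id), which is what happened above.
"By default, the update() method updates a single document" is about how many documents are affected: only the first matching document is updated unless the multi: true option is passed. It does not mean the whole document is replaced by default.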
Example from the documentation:
db.books.update(
   { _id: 1 },
   {
     $inc: { stock: 5 },
     $set: {
       item: "ABC123",
       "info.publisher": "2222",
       tags: [ "software" ],
       "ratings.1": { by: "xyz", rating: 3 }
     }
   }
)
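The documentation example combines $inc with $set and uses dot notation: "info.publisher" addresses a field inside an embedded document, and "ratings.1" addresses element 1 of an array. Below is a rough in-memory model of those two operators, assuming plain JSON objects; getPath, setPath and applyUpdate are hypothetical helpers and the sample book document is made up for illustration. Real MongoDB handles many more cases (type errors, conflicting paths, padding arrays).

```javascript
// Hypothetical in-memory model of $set / $inc with dot notation (plain JSON objects only).

// Follow a dotted path such as "info.publisher" or "ratings.1";
// numeric segments simply index into arrays.
function getPath(doc, path) {
  return path.split(".").reduce((obj, key) => (obj == null ? undefined : obj[key]), doc);
}

function setPath(doc, path, value) {
  const keys = path.split(".");
  let target = doc;
  for (const key of keys.slice(0, -1)) {
    if (target[key] == null) target[key] = {}; // create missing embedded documents
    target = target[key];
  }
  target[keys[keys.length - 1]] = value;
}

function applyUpdate(doc, update) {
  const result = JSON.parse(JSON.stringify(doc)); // work on a copy, leave the input untouched
  for (const [path, value] of Object.entries(update.$set ?? {})) {
    setPath(result, path, value);
  }
  for (const [path, delta] of Object.entries(update.$inc ?? {})) {
    setPath(result, path, (getPath(result, path) ?? 0) + delta); // a missing field counts as 0
  }
  return result;
}

// Made-up sample document, shaped so the documentation example has something to update.
const book = {
  _id: 1,
  item: "TBD",
  stock: 0,
  info: { publisher: "1111", pages: 430 },
  tags: ["technology", "computer"],
  ratings: [{ by: "ijk", rating: 4 }, { by: "lmn", rating: 5 }],
  reorder: false,
};

const updated = applyUpdate(book, {
  $inc: { stock: 5 },
  $set: {
    item: "ABC123",
    "info.publisher": "2222",
    tags: ["software"],
    "ratings.1": { by: "xyz", rating: 3 },
  },
});

console.log(updated);
```

Note that info.pages, ratings[0] and reorder are untouched: each dotted path only reaches the one value it names.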
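So the right approach is:
call update() (db.books.update in the docs example, db.main.update here)
and put $set in the update document so that only the given field changes.
In newer MongoDB shells (mongosh), update() is deprecated in favor of updateOne()/updateMany(), which take the same filter and $set document.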
【Summary】
In the end, this form works:
> db.main.update({"_id": ObjectId("5bd7beebbfaa44fe2c73e722")}, {$set: {"title": "Chicka Chicka new title"}})
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })
> db.main.find({"_id": ObjectId("5bd7beebbfaa44fe2c73e722")}).pretty()
{ "_id" : ObjectId("5bd7beebbfaa44fe2c73e722"), "url" : "https://www.scholastic.com/teachers/books/chicka-chicka-sticka-sticka-by-bill-martin-jr/", "title" : "Chicka Chicka new title", ...
The update succeeded, and only title changed; all other fields are intact.
So now do the same on the online MongoDB:
[root@xxx-general-01 ~]# mongo storybook --host localhost --port xxx -u storybook -p xxx --authenticationDatabase storybook
MongoDB shell version: 3.2.19
connecting to: localhost:32018/storybook
> show collections
collection
main
scholastic
> db.main.find({"title": {"$regex": "Chicka Chicka .*"}}).pretty()
...
{ "_id" : ObjectId("5bd7beecbfaa44fe2c73e73f"), "url" : "https://www.scholastic.com/teachers/books/chicka-chicka-1-2-3-by-bill-martin-jr/", "title" : "Chicka Chicka 1â¢2â¢3" ...
> db.main.update({"_id": ObjectId("5bd7beecbfaa44fe2c73e73f")}, {$set: {"title": "Chicka Chicka 1,2,3"}})
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })
> db.main.find({"_id": ObjectId("5bd7beecbfaa44fe2c73e73f")}).pretty()
{ "_id" : ObjectId("5bd7beecbfaa44fe2c73e73f"), "url" : "https://www.scholastic.com/teachers/books/chicka-chicka-1-2-3-by-bill-martin-jr/", "title" : "Chicka Chicka 1,2,3", ...
【Postscript】
Next, check whether there are any other garbled titles:
db.main.find({"title": {"$regex": ".*â¢.*"}})
No results.
However:
> db.main.find({"title": {"$regex": ".*â.*"}})
{ "_id" : ObjectId("5bd7bd65bfaa44fe2c738260"), "url" : "https://www.scholastic.com/teachers/books/when-a-line-bends--a-shape-begins-by-rhonda-gowler-greene/", "title" : "When a Line Bendsâ¦ A Shape Begins", "description" ...
This search does find matches.
> db.main.find({"title": {"$regex": ".*â.*"}}).length()
6
6 matches in total.
> db.main.find({"title": {"$regex": ".*¢.*"}})
>
No matches.
The six matches:
> db.main.find({"title": {"$regex": ".*â.*"}})
{ "_id" : ObjectId("5bd7bd65bfaa44fe2c738260"), "url" : "https://www.scholastic.com/teachers/books/when-a-line-bends--a-shape-begins-by-rhonda-gowler-greene/", "title" : "When a Line Bendsâ¦ A Shape Begins", "description" : ...
{ "_id" : ObjectId("5bd7bd9cbfaa44fe2c7393e7"), "url" : "https://www.scholastic.com/teachers/books/reading-response-trifolds-for-40-popular-nonfiction-books-grade/", "title" : "Reading Response Trifolds for 40 Popular Nonfiction Books: Grades 2â3", ...
{ "_id" : ObjectId("5bd7be00bfaa44fe2c73b1f7"), "url" : "https://www.scholastic.com/teachers/books/it-s-all-about-us-especially-me--by-karen-phillips/", "title" : "It's All About Us (â¦Especially Me!)", ...
{ "_id" : ObjectId("5bd7be17bfaa44fe2c73b6c0"), "url" : "https://www.scholastic.com/teachers/books/i-heart-band-by-michelle-schusterman/", "title" : "I â¥ Band!", ...
{ "_id" : ObjectId("5bd7bed5bfaa44fe2c73e0b3"), "url" : "https://www.scholastic.com/teachers/books/hi-lo-passages-to-build-comprehension-grades-56-by-michael-prie/", "title" : "Hi-Lo Passages to Build Comprehension: Grades 5â6", ...
{ "_id" : ObjectId("5bd7bf69bfaa44fe2c740c17"), "url" : "https://www.scholastic.com/teachers/books/50-skill-building-pyramid-puzzles-math-grades-4-6-by-immacula/", "title" : "50 Skill-Building Pyramid Puzzles: Math: Grades 4â6", ...
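All six titles look like text whose UTF-8 bytes were decoded as Latin-1 (ISO-8859-1) during scraping. Under that assumption the damage is reversible: turn each character back into its single Latin-1 byte, then decode the bytes as UTF-8. The Node.js sketch below uses the built-in Buffer; repairLatin1Mojibake is a hypothetical helper name, and the sample strings assume the stored titles contain the invisible C1 control characters that a Latin-1 decode produces (they do not show up in the shell output above). Applied to the Chicka Chicka title it would restore "1•2•3" rather than the "1,2,3" chosen by hand.

```javascript
// Sketch: undo mojibake produced by decoding UTF-8 bytes as Latin-1 (Node.js Buffer API).
function repairLatin1Mojibake(text) {
  // A Latin-1 decode only yields U+0000..U+00FF; anything else is not this kind of damage.
  if (/[^\u0000-\u00ff]/.test(text)) return text;
  // Map every character back to its single Latin-1 byte, then read those bytes as UTF-8.
  const repaired = Buffer.from(text, "latin1").toString("utf8");
  // Invalid UTF-8 decodes to U+FFFD; in that case the text was probably fine, so keep it.
  return repaired.includes("\ufffd") ? text : repaired;
}

// Garbled titles written with explicit escapes, because U+0080..U+009F are invisible.
const samples = [
  "When a Line Bends\u00e2\u0080\u00a6 A Shape Begins",
  "Reading Response Trifolds for 40 Popular Nonfiction Books: Grades 2\u00e2\u0080\u00933",
  "I \u00e2\u0099\u00a5 Band!",
];

for (const title of samples) {
  console.log(repairLatin1Mojibake(title));
}
```

The U+FFFD check is a safety net: a correct title such as "Café" would otherwise be mangled, since its "é" is not a valid UTF-8 byte sequence on its own.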
Update the title of each of the six one by one:
> db.main.update({"_id": ObjectId("5bd7bd65bfaa44fe2c738260")}, {$set: {"title": "When a Line Bends… A Shape Begins"}})
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })
> db.main.update({"_id": ObjectId("5bd7bd9cbfaa44fe2c7393e7")}, {$set: {"title": "Reading Response Trifolds for 40 Popular Nonfiction Books: Grades 2–3"}})
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })
> db.main.update({"_id": ObjectId("5bd7be00bfaa44fe2c73b1f7")}, {$set: {"title": "It's All About Us (…Especially Me!)"}})
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })
> db.main.update({"_id": ObjectId("5bd7be17bfaa44fe2c73b6c0")}, {$set: {"title": "I ♥ Band!"}})
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })
> db.main.update({"_id": ObjectId("5bd7bed5bfaa44fe2c73e0b3")}, {$set: {"title": "Hi-Lo Passages to Build Comprehension: Grades 5–6"}})
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })
> db.main.update({"_id": ObjectId("5bd7bf69bfaa44fe2c740c17")}, {$set: {"title": "50 Skill-Building Pyramid Puzzles: Math: Grades 4–6"}})
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })
Done.
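The six separate update() calls could also be sent as a single bulkWrite() request (db.collection.bulkWrite() exists since MongoDB 3.2, the shell version used here). The sketch below only builds the operations array in plain JavaScript; buildTitleFixOps is a hypothetical name, and the _id values are strings standing in for ObjectId(...), so in the real shell each one would need to be wrapped as ObjectId("...") before calling db.main.bulkWrite(ops).

```javascript
// Plain-JavaScript sketch: turn (_id, fixed title) pairs into bulkWrite() operations.
// The _id values are strings here; in the mongo shell they would be ObjectId("...").
const titleFixes = [
  ["5bd7bd65bfaa44fe2c738260", "When a Line Bends… A Shape Begins"],
  ["5bd7bd9cbfaa44fe2c7393e7", "Reading Response Trifolds for 40 Popular Nonfiction Books: Grades 2–3"],
  ["5bd7be00bfaa44fe2c73b1f7", "It's All About Us (…Especially Me!)"],
  ["5bd7be17bfaa44fe2c73b6c0", "I ♥ Band!"],
  ["5bd7bed5bfaa44fe2c73e0b3", "Hi-Lo Passages to Build Comprehension: Grades 5–6"],
  ["5bd7bf69bfaa44fe2c740c17", "50 Skill-Building Pyramid Puzzles: Math: Grades 4–6"],
];

function buildTitleFixOps(pairs) {
  return pairs.map(([id, title]) => ({
    updateOne: {
      filter: { _id: id },
      update: { $set: { title } }, // $set, so only title changes and other fields are kept
    },
  }));
}

const ops = buildTitleFixOps(titleFixes);
console.log(JSON.stringify(ops[0], null, 2));
```

Batching mainly saves round trips and typing; the per-document semantics are the same as the individual $set updates above.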
In the titles above, the garbled forms were:
… -> â¦
– -> â
♥ -> â¥
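This pattern is consistent with UTF-8 bytes being read as Latin-1. "…", "–", "♥" (and the "•" from the Chicka Chicka title) are all three-byte UTF-8 sequences starting with byte 0xE2, which Latin-1 displays as "â"; the remaining two bytes become either visible characters such as "¢", "¦" and "¥", or invisible C1 control characters (U+0080 to U+009F). That explains why "–" shows up as a bare "â", and why searching for "â" found all six titles. A small Node.js sketch reproducing the mapping; utf8AsLatin1 and showControls are hypothetical helper names.

```javascript
// Sketch: reproduce the garbling by reading a string's UTF-8 bytes as Latin-1.
function utf8AsLatin1(text) {
  return Buffer.from(text, "utf8").toString("latin1");
}

// Make invisible C1 control characters (U+0080..U+009F) visible as \uXXXX escapes.
function showControls(text) {
  return text.replace(/[\u0080-\u009f]/g, (c) => "\\u" + c.charCodeAt(0).toString(16).padStart(4, "0"));
}

// Bullet, ellipsis, en dash and heart: each starts with UTF-8 byte 0xE2, shown as "â".
for (const ch of ["\u2022", "\u2026", "\u2013", "\u2665"]) {
  console.log(ch, "->", showControls(utf8AsLatin1(ch)));
}
```

Running it prints:
• -> â\u0080¢
… -> â\u0080¦
– -> â\u0080\u0093
♥ -> â\u0099¥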
Then search for the other garbled characters:
> db.main.find({"title": {"$regex": ".*¦.*"}})
> db.main.find({"title": {"$regex": ".*¥.*"}})
>
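Searching one character at a time only finds patterns already known. A broader, heuristic check: a UTF-8 lead byte (0xC2 to 0xF4) read as Latin-1 becomes one of "Â" to "ô", and the continuation byte after it (0x80 to 0xBF) becomes a character in U+0080 to U+00BF, so such a pair strongly hints at mojibake. The client-side sketch below is plain JavaScript; looksLikeMojibake is a hypothetical name, and it can give false positives on legitimate text that happens to contain such a pair. In the mongo shell the same test could be applied while iterating a cursor, e.g. with db.main.find().forEach(...).

```javascript
// Heuristic sketch: does a string look like UTF-8 that was decoded as Latin-1?
// A UTF-8 lead byte (0xC2..0xF4) shows up as one of "Â".."ô", and the following
// continuation byte (0x80..0xBF) shows up as a character in U+0080..U+00BF.
const MOJIBAKE_PAIR = /[\u00c2-\u00f4][\u0080-\u00bf]/;

function looksLikeMojibake(text) {
  return MOJIBAKE_PAIR.test(text);
}

const titles = [
  "I \u00e2\u0099\u00a5 Band!", // garbled
  "I \u2665 Band!",             // fixed
  "Caf\u00e9 au lait",          // legitimate Latin-1 letter, not flagged
];
console.log(titles.map(looksLikeMojibake)); // [ true, false, false ]
```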
None left for now.
Separately, when I have time, I'll also do:
【Record】Back up the online dev MongoDB and restore it locally