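While writing a crawler, I found that some image URLs do exist, but what they actually return is a blank 1×1 image:
while other image URLs return normal images:
So the goal is:
to use Python code to detect, from an image's URL, the image's basic parameters, such as its resolution.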
python detect online image properties
Getting image EXIF information, getting image EXIF information online
Let's take a look at PIL
python pil
➜ crawler_fablexile_book git:(master) ✗ pip install pillow
Collecting pillow
  Downloading https://files.pythonhosted.org/packages/2e/f6/e6d56ec19fae0b76932a343355c7a37198fd8e76c26bc2f2779fc85b4684/Pillow-5.3.0-cp27-cp27m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl (3.6MB)
    100% |████████████████████████████████| 3.6MB 2.1MB/s
matplotlib 1.3.1 requires nose, which is not installed.
matplotlib 1.3.1 requires tornado, which is not installed.
pyopenssl 18.0.0 has requirement six>=1.5.2, but you'll have six 1.4.1 which is incompatible.
Installing collected packages: pillow
Could not install packages due to an EnvironmentError: [Errno 13] Permission denied: '/Library/Python/2.7/site-packages/PIL'
Consider using the `--user` option or check the permissions.
You are using pip version 10.0.1, however version 18.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
➜ crawler_fablexile_book git:(master) ✗ pip3 install pillow --user
Collecting pillow
  Cache entry deserialization failed, entry ignored
  Downloading https://files.pythonhosted.org/packages/d1/21/bef2816809fac16754e07ed935469fc65f42ced1a94766de7c804179311d/Pillow-5.3.0-cp36-cp36m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl (3.6MB)
    100% |████████████████████████████████| 3.6MB 725kB/s
Installing collected packages: pillow
Successfully installed pillow-5.3.0
You are using pip version 10.0.1, however version 18.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
➜ crawler_fablexile_book git:(master) ✗ pip3 show pillow
Name: Pillow
Version: 5.3.0
Summary: Python Imaging Library (Fork)
Home-page: http://python-pillow.org
Author: Alex Clark (Fork Author)
Author-email: [email protected]
License: Standard PIL License
Location: /Users/crifan/Library/Python/3.6/lib/python/site-packages
Requires:
Required-by:
You are using pip version 10.0.1, however version 18.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
➜ crawler_fablexile_book git:(master) ✗ which pip3
/usr/local/bin/pip3
Then I went on to debug this code:
from PIL import Image

TestImgPathList = [
    "https://images-na.ssl-images-amazon.com/images/P/141972486X.jpg", # 1x1
    "https://images-na.ssl-images-amazon.com/images/P/0061992275.jpg", # normal
]

def detectImgageInfo():
    for curImgPath in TestImgPathList:
        print("curImgPath=%s" % curImgPath)
        # a URL string is passed directly here, which is what triggers the error below
        img = Image.open(curImgPath)
        print("img=%s" % img)
        imgSize = img.size
        # imgSize is a tuple, so wrap it to format it as a single value
        print("imgSize=%s" % (imgSize,))

if __name__ == "__main__":
    detectImgageInfo()
It fails immediately:
Exception raised: FileNotFoundError
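The error is expected: when `Image.open` is given a string, it treats the string as a local file path, so a URL is looked up on disk and not found. A minimal sketch of the usual workaround is to hand Pillow a file-like object instead, for example the downloaded bytes wrapped in `io.BytesIO`. The helper name `openImgFromBytes` and the in-memory 1×1 GIF used as stand-in data are assumptions made for illustration, so that the sketch runs without network access; they are not part of the original code.

```python
import io

from PIL import Image


def openImgFromBytes(imgBytes):
    """Open an image held in memory by wrapping the bytes in a file-like object."""
    return Image.open(io.BytesIO(imgBytes))


# Stand-in for downloaded data: build a 1x1 GIF in memory (hypothetical test data)
fakeBuffer = io.BytesIO()
Image.new("P", (1, 1)).save(fakeBuffer, format="GIF")
fakeImgBytes = fakeBuffer.getvalue()

img = openImgFromBytes(fakeImgBytes)
print("format=%s, size=%s" % (img.format, img.size))
```

In a real crawler, `fakeImgBytes` would be replaced by the body of the HTTP response for the image URL.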
python pillow load image from url
[Solved] Python Pillow fails to open an image: OSError cannot identify image file _io.BytesIO object at
Then, referring to:
I went on to improve the exception handling
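As a rough sketch of what fuller exception handling might look like, the helper below returns the raw image bytes together with a success flag, in the same `(ok, resultOrErrMsg)` style used in this post. The function name `fetchImgBytes`, the `timeoutSeconds` parameter, and the choice of which exceptions to catch are assumptions for illustration, not the author's final code.

```python
import socket
import urllib.error
import urllib.request


def fetchImgBytes(imgUrl, timeoutSeconds=10):
    """Return (True, image bytes) on success, or (False, error message) on failure."""
    try:
        with urllib.request.urlopen(imgUrl, timeout=timeoutSeconds) as resp:
            return True, resp.read()
    except urllib.error.HTTPError as httpErr:
        # HTTPError is a subclass of URLError, so it has to be caught first
        return False, "HTTP %s: %s" % (httpErr.code, httpErr.reason)
    except urllib.error.URLError as urlErr:
        # DNS failures, refused connections, connect timeouts and similar
        return False, "URL error: %s" % urlErr.reason
    except socket.timeout:
        # a timeout while reading the response body is not wrapped in URLError
        return False, "timed out after %s seconds" % timeoutSeconds
    except ValueError as valErr:
        # raised for malformed input, such as a string without a URL scheme
        return False, "invalid URL: %s" % valErr


# A malformed URL exercises the error path without touching the network
getOk, bytesOrErrMsg = fetchImgBytes("not-a-url")
print("getOk=%s, bytesOrErrMsg=%s" % (getOk, bytesOrErrMsg))
```

Returning bytes instead of the open response means the connection is closed before Pillow sees the data; the bytes can then be wrapped in `io.BytesIO` for `Image.open`.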
[Summary]
In the end, I used the following code:
from PIL import Image
# import requests
# import cStringIO
# import urllib2 as urllib
# import the submodules explicitly: a bare "import urllib" does not guarantee
# that urllib.request and urllib.error are loaded
import urllib.error
import urllib.request

TestImgUrlList = [
    "https://images-na.ssl-images-amazon.com/images/P/141972486X.jpg", # 1x1
    "https://images-na.ssl-images-amazon.com/images/P/0061992275.jpg", # normal
]

# def getImgRawData(imgUrl):
#     respImgStream = requests.get(imgUrl, stream=True)
#     respImgStream.raw.decode_content = True
#     imgRawData = respImgStream.raw
#     print("imgRawData=%s" % imgRawData)
#     return imgRawData

def getImgFile(imgUrl):
    """Open imgUrl; return (True, file-like response) or (False, error message)."""
    print("getImgFile: imgUrl=%s" % imgUrl)
    getOk = False
    imgFileOrErrMsg = ""
    try:
        imgFileObj = urllib.request.urlopen(imgUrl)
        print("imgFileObj=%s" % imgFileObj)
        getOk = True
        imgFileOrErrMsg = imgFileObj
    except urllib.error.URLError as urlErr:
        getOk = False
        imgFileOrErrMsg = str(urlErr.reason)
    return getOk, imgFileOrErrMsg

def detectImgageInfoFromUrl():
    for curImgUrl in TestImgUrlList:
        print("curImgUrl=%s" % curImgUrl)
        # imgRawData = getImgRawData(curImgUrl)
        # img = Image.open(imgRawData)
        getOk, imgFileOrErrMsg = getImgFile(curImgUrl)
        print("getOk=%s, imgFileOrErrMsg=%s" % (getOk, imgFileOrErrMsg))
        if getOk:
            # Image.open accepts a file-like object, here the HTTP response
            imgFile = imgFileOrErrMsg
            print("imgFile=%s" % imgFile)
            img = Image.open(imgFile)
            print("img=%s" % img)
            (imgWidth, imgHeight) = img.size
            print("imgWidth=%s, imgHeight=%s" % (imgWidth, imgHeight))
            # 1x1 / 354x500
        else:
            print("Get image url %s failed: %s" % (curImgUrl, imgFileOrErrMsg))

if __name__ == "__main__":
    detectImgageInfoFromUrl()
Output:
curImgUrl=https://images-na.ssl-images-amazon.com/images/P/141972486X.jpg
getImgFile: imgUrl=https://images-na.ssl-images-amazon.com/images/P/141972486X.jpg
imgFileObj=<http.client.HTTPResponse object at 0x10e4042e8>
getOk=True, imgFileOrErrMsg=<http.client.HTTPResponse object at 0x10e4042e8>
imgFile=<http.client.HTTPResponse object at 0x10e4042e8>
img=<PIL.GifImagePlugin.GifImageFile image mode=P size=1x1 at 0x10E176240>
imgWidth=1, imgHeight=1
curImgUrl=https://images-na.ssl-images-amazon.com/images/P/0061992275.jpg
getImgFile: imgUrl=https://images-na.ssl-images-amazon.com/images/P/0061992275.jpg
imgFileObj=<http.client.HTTPResponse object at 0x10e1aa208>
getOk=True, imgFileOrErrMsg=<http.client.HTTPResponse object at 0x10e1aa208>
imgFile=<http.client.HTTPResponse object at 0x10e1aa208>
img=<PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=354x500 at 0x10E417978>
imgWidth=354, imgHeight=500
This gives the desired result: the placeholder is reported as 1×1 and the normal image as 354×500.
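For the crawler itself, the detected size still has to be turned into a keep-or-skip decision. The sketch below shows one possible way to do that, plus an optional standard-library-only check that reads the width and height straight from GIF or PNG header bytes. This header parsing is a swapped-in alternative to Pillow, not what the code above does, and it deliberately does not handle JPEG. The names `sniffImgSize`, `isPlaceholderSize`, and the `minSide` threshold are hypothetical.

```python
import struct


def sniffImgSize(headBytes):
    """Return (width, height) parsed from GIF or PNG header bytes, else None."""
    # GIF: 6-byte signature, then width and height as little-endian uint16
    if headBytes[:6] in (b"GIF87a", b"GIF89a") and len(headBytes) >= 10:
        return struct.unpack("<HH", headBytes[6:10])
    # PNG: 8-byte signature, 4-byte chunk length, "IHDR", then big-endian uint32 width and height
    if (headBytes[:8] == b"\x89PNG\r\n\x1a\n"
            and headBytes[12:16] == b"IHDR"
            and len(headBytes) >= 24):
        return struct.unpack(">II", headBytes[16:24])
    # other formats (JPEG, for example) are left to Pillow in this sketch
    return None


def isPlaceholderSize(imgSize, minSide=2):
    """Treat an image as a placeholder when either side is below minSide pixels."""
    imgWidth, imgHeight = imgSize
    return imgWidth < minSide or imgHeight < minSide


# Hand-built GIF header bytes describing a 1x1 image (stand-in data)
gifHeader = b"GIF89a" + struct.pack("<HH", 1, 1)
print("size=%s, placeholder=%s" % (sniffImgSize(gifHeader), isPlaceholderSize(sniffImgSize(gifHeader))))
```

`isPlaceholderSize` works equally well on the `img.size` tuple returned by Pillow, so the header parser can be dropped entirely if Pillow is already in use.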
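The properties of the abnormal 1×1 image here:
The properties of the normal image:
Please credit the source when reposting: 在路上 » [Solved] Determining the resolution and other basic information of local and online images in Python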