【背景】
折腾:
【未解决】将不可拷贝复制的PDF中的表格数据导出并转换为xml格式数据
期间,去试试用xpdf,将一个不可拷贝的pdf文件,转换为文本或html。
【折腾过程】
1.参考:
去:
->
->
->
->
http://gd.tuwien.ac.at/publishing/xpdf/
->
然后去运行:
D:\tmp\dev_tools\python\pdf\xpdfbin-win-3.03\xpdfbin-win-3.03\bin64
中的:
pdftotext.exe
结果还是被保护,无法拷贝:
D:\tmp\dev_tools\python\pdf\xpdfbin-win-3.03\xpdfbin-win-3.03\bin64>pdftotext.exe pdftotext version 3.03 Copyright 1996-2011 Glyph & Cog, LLC Usage: pdftotext [options] <PDF-file> [<text-file>] -f <int> : first page to convert -l <int> : last page to convert -layout : maintain original physical layout -fixed <fp> : assume fixed-pitch (or tabular) text -raw : keep strings in content stream order -htmlmeta : generate a simple HTML file, including the meta information -enc <string> : output text encoding name -eol <string> : output end-of-line convention (unix, dos, or mac) -nopgbrk : don't insert page breaks between pages -opw <string> : owner password (for encrypted files) -upw <string> : user password (for encrypted files) -q : don't print any messages or errors -cfg <string> : configuration file to use in place of .xpdfrc -v : print copyright and version info -h : print usage information -help : print usage information --help : print usage information -? : print usage information D:\tmp\dev_tools\python\pdf\xpdfbin-win-3.03\xpdfbin-win-3.03\bin64>pdftotext.exe -htmlmeta D:\tmp\tmp_dev_root\python\answer_question\self\pdf_table_to_xml\pdf\spec183r21.0.pdf hart183.html Permission Error: Copying of text from this document is not allowed.
2.所以去解决上述问题:
但是没解决掉。。。
【总结】
目前还是没法用xpdf去把pdf转换为想要的html。
转载请注明:在路上 » 【记录】尝试用xpdf将不可复制的PDF转换为文本或HTML