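Most PDF files are read-only, so when a document needs to be edited, a common approach is to convert the PDF to a Word file and work on that instead.
The articles I found online about converting PDF to Word with Python all seemed rather complicated, and charts and figures needed special handling on top of that.
This article explains how to convert a PDF to Word in Python. No GUI is involved this time.
Because version conflicts are possible, the versions of the non-standard-library Python packages used during development are listed here.
Python version: 3.6.8
PyMuPDF version: 1.18.17
pdf2docx version: 0.5.1
These packages can be installed with pip.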
pip install PyMuPDF==1.18.17
pip install pdf2docx==0.5.1
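To confirm that the environment matches the pinned versions, a small check like the one below may help. It is only a sketch: the names PINNED, installed_version and report_mismatches are made up for illustration, and it relies on importlib.metadata, which is in the standard library from Python 3.8 onward. On the Python 3.6.8 interpreter used in this article the lookup simply returns None.

```python
# Versions pinned in this article.
PINNED = {"PyMuPDF": "1.18.17", "pdf2docx": "0.5.1"}


def installed_version(name):
    """Return the installed version of a package, or None if it cannot be determined."""
    try:
        from importlib import metadata  # standard library on Python 3.8+
    except ImportError:
        return None
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def report_mismatches(pinned=PINNED):
    """Map each package whose installed version differs to (wanted, found)."""
    mismatches = {}
    for name, wanted in pinned.items():
        found = installed_version(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches
```

Calling report_mismatches() returns an empty dict when both packages are installed at the pinned versions.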
Once the dependencies above are installed, import pdf2docx into the code.
# Importing the Converter class from the pdf2docx module.
from pdf2docx import Converter
Next, write the business function: a new pdfToWord function handles the conversion logic, and it takes only a few lines of code.
def pdfToWord(pdf_file_path=None, word_file_path=None):
    """
    It takes a pdf file path and a word file path as input, and converts the pdf file to a word file.
    :param pdf_file_path: The path to the PDF file you want to convert
    :param word_file_path: The path to the word file that you want to create
    """
    # Creating a Converter object.
    converter_ = Converter(pdf_file_path)
    # The `convert` method takes the path to the word file that you want to create, and the start and end pages of the PDF
    # file that you want to convert.
    converter_.convert(word_file_path, start=0, end=None)
    converter_.close()
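The function above assumes that the input file exists and that the conversion never fails part-way. A slightly more defensive variant might look like the sketch below. The names default_docx_path and convert_pdf_safely are hypothetical and not part of pdf2docx; the only library calls used are the same Converter, convert and close shown above.

```python
from pathlib import Path


def default_docx_path(pdf_path):
    """Return the PDF path with its suffix swapped to .docx (hypothetical helper)."""
    return Path(pdf_path).with_suffix(".docx")


def convert_pdf_safely(pdf_path, docx_path=None, start=0, end=None):
    """Convert a PDF to Word, checking the input first and always closing the converter."""
    pdf_path = Path(pdf_path)
    if not pdf_path.is_file():
        raise FileNotFoundError(f"PDF not found: {pdf_path}")
    docx_path = Path(docx_path) if docx_path else default_docx_path(pdf_path)

    # Imported here so the path helpers still work when pdf2docx is not installed.
    from pdf2docx import Converter

    converter = Converter(str(pdf_path))
    try:
        # start and end are passed through unchanged, as in pdfToWord above.
        converter.convert(str(docx_path), start=start, end=end)
    finally:
        # Release the underlying PDF even if the conversion raises.
        converter.close()
    return docx_path
```

The try/finally block is the main design choice here: it keeps the PDF from staying open when convert raises an exception.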
Finally, call pdfToWord from the main block to carry out the format conversion directly.
# A special variable in Python that evaluates to `True` if the module is being run directly by the Python interpreter, and
# `False` if it has been imported by another module.
if __name__ == '__main__':
    pdfToWord('D:/test-data-work/test_pdf.pdf', 'D:/test-data-work/test_pdf.docx')
# Parsing Page 2: 2/5...Ignore Line "∑" due to overlap
# Ignore Line "∑" due to overlap
# Ignore Line "ç" due to overlap
# Ignore Line "A" due to overlap
# Ignore Line "i =1" due to overlap
# Ignore Line "æ" due to overlap
# Parsing Page 5: 5/5...
# Creating Page 5: 5/5...
# --------------------------------------------------
# Terminated in 3.2503201s.
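The paths in the main block are hard-coded. If the script is meant to be reused on other files, one option is to read the paths from the command line with argparse. This is a sketch only: build_parser, main, and the --start and --end options are illustrative names that do not appear in the original script.

```python
import argparse


def build_parser():
    """Build a parser for the input PDF, the output Word file and an optional page range."""
    parser = argparse.ArgumentParser(description="Convert a PDF file to a Word (.docx) file.")
    parser.add_argument("pdf_file_path", help="path to the PDF file to convert")
    parser.add_argument("word_file_path", help="path to the Word file to create")
    parser.add_argument("--start", type=int, default=0, help="first page to convert (0 is the first page)")
    parser.add_argument("--end", type=int, default=None, help="end page, passed to convert unchanged")
    return parser


def main(argv=None):
    """Parse the arguments and run the conversion with pdf2docx."""
    args = build_parser().parse_args(argv)

    # Imported lazily so that --help works even without pdf2docx installed.
    from pdf2docx import Converter

    converter = Converter(args.pdf_file_path)
    try:
        converter.convert(args.word_file_path, start=args.start, end=args.end)
    finally:
        converter.close()
```

To use it, call main() from the `if __name__ == '__main__':` block in place of the hard-coded pdfToWord call, then run the script with the two paths as arguments.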
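Copyright notice: The article "How can Python convert PDF to Word in the simplest way?" was contributed by a community user, and its views are solely the author's. To repost, contact the author and cite the source: https://it.en369.cn/jiaocheng/1763669284a2952981.html. This site only provides information storage; it does not own the content and bears no related legal liability. If content on this site is found to involve plagiarism, infringement, or violations of laws or regulations, it will be removed immediately once verified.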

