是否有可能,使用Python,合并单独的PDF文件?
假设是这样,我需要进一步扩展它。我希望循环通过目录中的文件夹,并重复此过程。
我可能是得过其实了,但是否可以排除每个pdf文件中包含的一页(我的报告生成总是创建一个额外的空白页)。
是否有可能,使用Python,合并单独的PDF文件?
假设是这样,我需要进一步扩展它。我希望循环通过目录中的文件夹,并重复此过程。
我可能是得过其实了,但是否可以排除每个pdf文件中包含的一页(我的报告生成总是创建一个额外的空白页)。
当前回答
我在linux终端上通过利用subprocess(假设目录中存在one.pdf和two.pdf)使用pdf unite,目的是将它们合并为three.pdf
import subprocess
subprocess.call(['pdfunite one.pdf two.pdf three.pdf'],shell=True)
其他回答
我在linux终端上通过利用subprocess(假设目录中存在one.pdf和two.pdf)使用pdf unite,目的是将它们合并为three.pdf
import subprocess
subprocess.call(['pdfunite one.pdf two.pdf three.pdf'],shell=True)
下面是针对我的特定用例的最常见答案的时间比较:合并5个大单页pdf文件的列表。每个测试我都运行了两次。
(免责声明:我在Flask中运行这个函数,您的里程可能会有所不同)
博士TL;
pdfrw是我测试的3个pdf文件组合库中最快的一个。
PyPDF2
start = time.time()
merger = PdfFileMerger()
for pdf in all_pdf_obj:
merger.append(
os.path.join(
os.getcwd(), pdf.filename # full path
)
)
formatted_name = f'Summary_Invoice_{date.today()}.pdf'
merge_file = os.path.join(os.getcwd(), formatted_name)
merger.write(merge_file)
merger.close()
end = time.time()
print(end - start) #1 66.50084733963013 #2 68.2995400428772
PyMuPDF
start = time.time()
result = fitz.open()
for pdf in all_pdf_obj:
with fitz.open(os.path.join(os.getcwd(), pdf.filename)) as mfile:
result.insertPDF(mfile)
formatted_name = f'Summary_Invoice_{date.today()}.pdf'
result.save(formatted_name)
end = time.time()
print(end - start) #1 2.7166640758514404 #2 1.694727897644043
PDFrw
start = time.time()
result = fitz.open()
writer = PdfWriter()
for pdf in all_pdf_obj:
writer.addpages(PdfReader(os.path.join(os.getcwd(), pdf.filename)).pages)
formatted_name = f'Summary_Invoice_{date.today()}.pdf'
writer.write(formatted_name)
end = time.time()
print(end - start) #1 0.6040127277374268 #2 0.9576816558837891
from PyPDF2 import PdfFileMerger
import webbrowser
import os
dir_path = os.path.dirname(os.path.realpath(__file__))
def list_files(directory, extension):
return (f for f in os.listdir(directory) if f.endswith('.' + extension))
pdfs = list_files(dir_path, "pdf")
merger = PdfFileMerger()
for pdf in pdfs:
merger.append(open(pdf, 'rb'))
with open('result.pdf', 'wb') as fout:
merger.write(fout)
webbrowser.open_new('file://'+ dir_path + '/result.pdf')
Go 回购:https://github.com/mahaguru24/Python_Merge_PDF.git
合并目录下的所有pdf文件
把pdf文件放到目录下。启动程序。你会得到一个合并了所有pdf文件的pdf。
import os
from PyPDF2 import PdfMerger
x = [a for a in os.listdir() if a.endswith(".pdf")]
merger = PdfMerger()
for pdf in x:
merger.append(open(pdf, 'rb'))
with open("result.pdf", "wb") as fout:
merger.write(fout)
今天我该如何编写上面相同的代码呢
from glob import glob
from PyPDF2 import PdfMerger
def pdf_merge():
''' Merges all the pdf files in current directory '''
merger = PdfMerger()
allpdfs = [a for a in glob("*.pdf")]
[merger.append(pdf) for pdf in allpdfs]
with open("Merged_pdfs.pdf", "wb") as new_file:
merger.write(new_file)
if __name__ == "__main__":
pdf_merge()
Giovanni G. PY以一种简单易用的方式(至少对我来说)给出了答案:
import os
from PyPDF2 import PdfFileMerger
def merge_pdfs(export_dir, input_dir, folder):
current_dir = os.path.join(input_dir, folder)
pdfs = os.listdir(current_dir)
merger = PdfFileMerger()
for pdf in pdfs:
merger.append(open(os.path.join(current_dir, pdf), 'rb'))
with open(os.path.join(export_dir, folder + ".pdf"), "wb") as fout:
merger.write(fout)
export_dir = r"E:\Output"
input_dir = r"E:\Input"
folders = os.listdir(input_dir)
[merge_pdfs(export_dir, input_dir, folder) for folder in folders];