是否有可能,使用Python,合并单独的PDF文件?
假设是这样,我需要进一步扩展它。我希望循环通过目录中的文件夹,并重复此过程。
我可能是得过其实了,但是否可以排除每个pdf文件中包含的一页(我的报告生成总是创建一个额外的空白页)。
是否有可能,使用Python,合并单独的PDF文件?
假设是这样,我需要进一步扩展它。我希望循环通过目录中的文件夹,并重复此过程。
我可能是得过其实了,但是否可以排除每个pdf文件中包含的一页(我的报告生成总是创建一个额外的空白页)。
当前回答
它是可能的,使用Python,合并单独的PDF文件?
Yes.
下面的例子将一个文件夹中的所有文件合并为一个新的PDF文件:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from argparse import ArgumentParser
from glob import glob
from pyPdf import PdfFileReader, PdfFileWriter
import os
def merge(path, output_filename):
output = PdfFileWriter()
for pdffile in glob(path + os.sep + '*.pdf'):
if pdffile == output_filename:
continue
print("Parse '%s'" % pdffile)
document = PdfFileReader(open(pdffile, 'rb'))
for i in range(document.getNumPages()):
output.addPage(document.getPage(i))
print("Start writing '%s'" % output_filename)
with open(output_filename, "wb") as f:
output.write(f)
if __name__ == "__main__":
parser = ArgumentParser()
# Add more options if you like
parser.add_argument("-o", "--output",
dest="output_filename",
default="merged.pdf",
help="write merged PDF to FILE",
metavar="FILE")
parser.add_argument("-p", "--path",
dest="path",
default=".",
help="path of source PDF files")
args = parser.parse_args()
merge(args.path, args.output_filename)
其他回答
您可以从PyPDF2模块使用pdffilemerge。
例如,要从路径列表中合并多个PDF文件,可以使用以下函数:
from PyPDF2 import PdfFileMerger
# pass the path of the output final file.pdf and the list of paths
def merge_pdf(out_path: str, extracted_files: list [str]):
merger = PdfFileMerger()
for pdf in extracted_files:
merger.append(pdf)
merger.write(out_path)
merger.close()
merge_pdf('./final.pdf', extracted_files)
这个函数从父文件夹中递归地获取所有文件:
import os
# pass the path of the parent_folder
def fetch_all_files(parent_folder: str):
target_files = []
for path, subdirs, files in os.walk(parent_folder):
for name in files:
target_files.append(os.path.join(path, name))
return target_files
# get a list of all the paths of the pdf
extracted_files = fetch_all_files('./parent_folder')
最后,使用这两个函数进行声明。可以包含多个文档的parent_folder_path,以及用于合并PDF的目的地的output_pdf_path:
# get a list of all the paths of the pdf
parent_folder_path = './parent_folder'
outup_pdf_path = './final.pdf'
extracted_files = fetch_all_files(parent_folder_path)
merge_pdf(outup_pdf_path, extracted_files)
你可以从这里获得完整的代码(来源):如何使用Python合并PDF文档
使用正确的python解释器:
conda activate py_envs
pip install PyPDF2
Python代码:
from PyPDF2 import PdfMerger
#set path files
import os
os.chdir('/ur/path/to/folder/')
cwd = os.path.abspath('')
files = os.listdir(cwd)
def merge_pdf_files():
merger = PdfMerger()
pdf_files = [x for x in files if x.endswith(".pdf")]
[merger.append(pdf) for pdf in pdf_files]
with open("merged_pdf_all.pdf", "wb") as new_file:
merger.write(new_file)
if __name__ == "__main__":
merge_pdf_files()
使用字典以获得更大的灵活性(例如sort, dedup):
import os
from PyPDF2 import PdfFileMerger
# use dict to sort by filepath or filename
file_dict = {}
for subdir, dirs, files in os.walk("<dir>"):
for file in files:
filepath = subdir + os.sep + file
# you can have multiple endswith
if filepath.endswith((".pdf", ".PDF")):
file_dict[file] = filepath
# use strict = False to ignore PdfReadError: Illegal character error
merger = PdfFileMerger(strict=False)
for k, v in file_dict.items():
print(k, v)
merger.append(v)
merger.write("combined_result.pdf")
from PyPDF2 import PdfFileMerger
import webbrowser
import os
dir_path = os.path.dirname(os.path.realpath(__file__))
def list_files(directory, extension):
return (f for f in os.listdir(directory) if f.endswith('.' + extension))
pdfs = list_files(dir_path, "pdf")
merger = PdfFileMerger()
for pdf in pdfs:
merger.append(open(pdf, 'rb'))
with open('result.pdf', 'wb') as fout:
merger.write(fout)
webbrowser.open_new('file://'+ dir_path + '/result.pdf')
Go 回购:https://github.com/mahaguru24/Python_Merge_PDF.git
Giovanni G. PY以一种简单易用的方式(至少对我来说)给出了答案:
import os
from PyPDF2 import PdfFileMerger
def merge_pdfs(export_dir, input_dir, folder):
current_dir = os.path.join(input_dir, folder)
pdfs = os.listdir(current_dir)
merger = PdfFileMerger()
for pdf in pdfs:
merger.append(open(os.path.join(current_dir, pdf), 'rb'))
with open(os.path.join(export_dir, folder + ".pdf"), "wb") as fout:
merger.write(fout)
export_dir = r"E:\Output"
input_dir = r"E:\Input"
folders = os.listdir(input_dir)
[merge_pdfs(export_dir, input_dir, folder) for folder in folders];