我正在尝试使用命令行程序转换将PDF转换为图像(JPEG或PNG)。这是我正在转换的pdf文件之一。
我想让程序去掉多余的空白,并返回足够高质量的图像,以便上标可以轻松读取。
这是我目前最好的尝试。正如你所看到的,修剪工作很好,我只是需要锐化的分辨率相当多。这是我正在使用的命令:
convert -trim 24.pdf -resize 500% -quality 100 -sharpen 0x1.0 24-11.jpg
我试着做了以下有意识的决定:
调整它的大小(对分辨率没有影响)
尽可能提高质量
使用-锐化(我已经尝试了一系列值)
任何建议,请在最终的PNG/JPEG图像的分辨率更高,将非常感谢!
从Pdf中获取图像在iOS Swift最佳解决方案
func imageFromPdf(pdfUrl : URL,atIndex index : Int, closure:@escaping((UIImage)->Void)){
autoreleasepool {
// Instantiate a `CGPDFDocument` from the PDF file's URL.
guard let document = PDFDocument(url: pdfUrl) else { return }
// Get the first page of the PDF document.
guard let page = document.page(at: index) else { return }
// Fetch the page rect for the page we want to render.
let pageRect = page.bounds(for: .mediaBox)
let renderer = UIGraphicsImageRenderer(size: pageRect.size)
let img = renderer.image { ctx in
// Set and fill the background color.
UIColor.white.set()
ctx.fill(CGRect(x: 0, y: 0, width: pageRect.width, height: pageRect.height))
// Translate the context so that we only draw the `cropRect`.
ctx.cgContext.translateBy(x: -pageRect.origin.x, y: pageRect.size.height - pageRect.origin.y)
// Flip the context vertically because the Core Graphics coordinate system starts from the bottom.
ctx.cgContext.scaleBy(x: 1.0, y: -1.0)
// Draw the PDF page.
page.draw(with: .mediaBox, to: ctx.cgContext)
}
closure(img)
}
}
/ /使用
let pdfUrl = URL(fileURLWithPath: "PDF URL")
self.imageFromPdf2(pdfUrl: pdfUrl, atIndex: 0) { imageIS in
}
我使用开源java pdf引擎icepdf。检查办公室演示。
package image2pdf;
import org.icepdf.core.exceptions.PDFException;
import org.icepdf.core.exceptions.PDFSecurityException;
import org.icepdf.core.pobjects.Document;
import org.icepdf.core.pobjects.Page;
import org.icepdf.core.util.GraphicsRenderingHints;
import javax.imageio.ImageIO;
import java.awt.image.BufferedImage;
import java.awt.image.RenderedImage;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
public class pdf2image {
public static void main(String[] args) {
Document document = new Document();
try {
document.setFile("C:\\Users\\Dell\\Desktop\\test.pdf");
} catch (PDFException ex) {
System.out.println("Error parsing PDF document " + ex);
} catch (PDFSecurityException ex) {
System.out.println("Error encryption not supported " + ex);
} catch (FileNotFoundException ex) {
System.out.println("Error file not found " + ex);
} catch (IOException ex) {
System.out.println("Error IOException " + ex);
}
// save page captures to file.
float scale = 1.0f;
float rotation = 0f;
// Paint each pages content to an image and
// write the image to file
for (int i = 0; i < document.getNumberOfPages(); i++) {
try {
BufferedImage image = (BufferedImage) document.getPageImage(
i, GraphicsRenderingHints.PRINT, Page.BOUNDARY_CROPBOX, rotation, scale);
RenderedImage rendImage = image;
try {
System.out.println(" capturing page " + i);
File file = new File("C:\\Users\\Dell\\Desktop\\test_imageCapture1_" + i + ".png");
ImageIO.write(rendImage, "png", file);
} catch (IOException e) {
e.printStackTrace();
}
image.flush();
}catch(Exception e){
e.printStackTrace();
}
}
// clean up resources
document.dispose();
}
}
我也尝试过imagemagick和pdftoppm, pdftoppm和icepdf的分辨率都比imagemagick高。
我真的没有很好的转换成功[更新2020年5月:实际上:它几乎从来没有为我工作],但我有非常好的pdftoppm成功。下面是几个从PDF生成高质量图像的例子:
[Produces ~25 MB-sized files per pg] Output uncompressed .tif file format at 300 DPI into a folder called "images", with files being named pg-1.tif, pg-2.tif, pg-3.tif, etc:
mkdir -p images && pdftoppm -tiff -r 300 mypdf.pdf images/pg
[Produces ~1MB-sized files per pg] Output in .jpg format at 300 DPI:
mkdir -p images && pdftoppm -jpeg -r 300 mypdf.pdf images/pg
[Produces ~2MB-sized files per pg] Output in .jpg format at highest quality (least compression) and still at 300 DPI:
mkdir -p images && pdftoppm -jpeg -jpegopt quality=100 -r 300 mypdf.pdf images/pg
要了解更多解释、选项和示例,请参阅我的完整答案:
https://askubuntu.com/questions/150100/extracting-embedded-images-from-a-pdf/1187844 1187844。
相关:
[如何将PDF转换为可搜索的PDF w/pdf2searchablepdf] https://askubuntu.com/questions/473843/how-to-turn-a-pdf-into-a-text-searchable-pdf/1187881#1187881
交联:
如何将一个PDF转换成JPG与命令行在Linux?
https://unix.stackexchange.com/questions/11835/pdf-to-jpg-without-quality-loss-gscan2pdf/585574#585574
在投票之前请注意,这个解决方案是针对使用图形界面的Gimp的,而不是使用命令行的ImageMagick的,但它作为一个替代方案对我来说效果非常好,这就是为什么我发现有必要在这里分享它。
按照这些简单的步骤从PDF文档中提取任何格式的图像
Download GIMP Image Manipulation Program
Open the Program after installation
Open the PDF document that you want to extract Images
Select only the pages of the PDF document that you would want to extract images from.
N/B: If you need only the cover images, select only the first page.
Click open after selecting the pages that you want to extract images from
Click on File menu when GIMP when the pages open
Select Export as in the File menu
Select your preferred file type by extension (say png) below the dialog box that pops up.
Click on Export to export your image to your desired location.
You can then check your file explorer for the exported image.
这是所有。
我希望这对你们有帮助