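While looking for a good way to convert PDFs to images in Python today, I stumbled on the fact that pdf.js (a general-purpose PDF parsing and rendering library built on web standards, contributed by Mozilla) can also convert a PDF to an image. It does so entirely on the client side, and the code is quite short: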
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale = 1.0, maximum-scale = 1.0, user-scalable=yes">
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.2.4/jquery.min.js"></script>
<script src="pdf.js"></script>
<script src="pdf.worker.js"></script>
<style type="text/css">
#upload-button { width: 150px; display: block; margin: 20px auto; }
#file-to-upload { display: none; }
#pdf-main-container { width: 400px; margin: 20px auto; }
#pdf-loader { display: none; text-align: center; color: #999999; font-size: 13px; line-height: 100px; height: 100px; }
#pdf-contents { display: none; }
#pdf-meta { overflow: hidden; margin: 0 0 20px 0; }
#pdf-buttons { float: left; }
#page-count-container { float: right; }
#pdf-current-page { display: inline; }
#pdf-total-pages { display: inline; }
#pdf-canvas { border: 1px solid rgba(0,0,0,0.2); box-sizing: border-box; }
#page-loader { height: 100px; line-height: 100px; text-align: center; display: none; color: #999999; font-size: 13px; }
#download-image { width: 150px; display: block; margin: 20px auto 0 auto; font-size: 13px; text-align: center; }
</style>
</head>
<body>
<button id="upload-button">Select PDF</button>
<input type="file" id="file-to-upload" accept="application/pdf" />
<div id="pdf-main-container">
  <div id="pdf-loader">Loading document ...</div>
  <div id="pdf-contents">
    <div id="pdf-meta">
      <div id="pdf-buttons">
        <button id="pdf-prev">Previous</button>
        <button id="pdf-next">Next</button>
      </div>
      <div id="page-count-container">Page <div id="pdf-current-page"></div> of <div id="pdf-total-pages"></div></div>
    </div>
    <canvas id="pdf-canvas" width="800"></canvas>
    <div id="page-loader">Loading page ...</div>
    <a id="download-image" href="#">Download PNG</a>
  </div>
</div>
<script>
// Note: this page uses the legacy pdf.js API (global PDFJS, getViewport(scale)).
// Newer pdf.js releases expose pdfjsLib and expect getViewport({ scale: ... }).
var __PDF_DOC,
    __CURRENT_PAGE,
    __TOTAL_PAGES,
    __PAGE_RENDERING_IN_PROGRESS = 0,
    __CANVAS = $('#pdf-canvas').get(0),
    __CANVAS_CTX = __CANVAS.getContext('2d');

function showPDF(pdf_url) {
  $("#pdf-loader").show();

  PDFJS.getDocument({ url: pdf_url }).then(function(pdf_doc) {
    __PDF_DOC = pdf_doc;
    __TOTAL_PAGES = __PDF_DOC.numPages;

    // Hide the pdf loader and show pdf container in HTML
    $("#pdf-loader").hide();
    $("#pdf-contents").show();
    $("#pdf-total-pages").text(__TOTAL_PAGES);

    // Show the first page
    showPage(1);
  }).catch(function(error) {
    // If error re-show the upload button
    $("#pdf-loader").hide();
    $("#upload-button").show();
    alert(error.message);
  });
}

function showPage(page_no) {
  __PAGE_RENDERING_IN_PROGRESS = 1;
  __CURRENT_PAGE = page_no;

  // Disable Prev & Next buttons while page is being loaded
  $("#pdf-next, #pdf-prev").attr('disabled', 'disabled');

  // While page is being rendered hide the canvas and show a loading message
  $("#pdf-canvas").hide();
  $("#page-loader").show();
  $("#download-image").hide();

  // Update current page in HTML
  $("#pdf-current-page").text(page_no);

  // Fetch the page
  __PDF_DOC.getPage(page_no).then(function(page) {
    // The original example fitted the page to the fixed canvas width:
    //   __CANVAS.width / page.getViewport(1).width
    // A fixed scale of 2 is used here instead for a sharper image.
    var scale_required = 2;

    // Get viewport of the page at required scale
    var viewport = page.getViewport(scale_required);

    // Set canvas size; with a fixed scale the width must follow the viewport
    // too, otherwise pages wider than the 800px canvas are cropped
    __CANVAS.width = viewport.width;
    __CANVAS.height = viewport.height;

    var renderContext = {
      canvasContext: __CANVAS_CTX,
      viewport: viewport
    };

    // Render the page contents in the canvas
    page.render(renderContext).then(function() {
      __PAGE_RENDERING_IN_PROGRESS = 0;

      // Re-enable Prev & Next buttons
      $("#pdf-next, #pdf-prev").removeAttr('disabled');

      // Show the canvas and hide the page loader
      $("#pdf-canvas").show();
      $("#page-loader").hide();
      $("#download-image").show();
    });
  });
}

// Upon click this should trigger click on the #file-to-upload file input element
// This is better than showing the not-good-looking file input element
$("#upload-button").on('click', function() {
  $("#file-to-upload").trigger('click');
});

// When user chooses a PDF file
$("#file-to-upload").on('change', function() {
  // Validate whether PDF
  if (['application/pdf'].indexOf($("#file-to-upload").get(0).files[0].type) == -1) {
    alert('Error : Not a PDF');
    return;
  }

  $("#upload-button").hide();

  // Send the object url of the pdf
  showPDF(URL.createObjectURL($("#file-to-upload").get(0).files[0]));
});

// Previous page of the PDF
$("#pdf-prev").on('click', function() {
  if (__CURRENT_PAGE != 1)
    showPage(--__CURRENT_PAGE);
});

// Next page of the PDF
$("#pdf-next").on('click', function() {
  if (__CURRENT_PAGE != __TOTAL_PAGES)
    showPage(++__CURRENT_PAGE);
});

// Download button
$("#download-image").on('click', function() {
  $(this).attr('href', __CANVAS.toDataURL()).attr('download', 'page.png');
});
</script>
</body>
</html>
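Stripped of the UI handling, the conversion in the page above comes down to two calls: render a page into a canvas, then read the canvas back as a PNG. The following is only a minimal sketch of that step and is not part of the original page. `renderPageToPng` is a hypothetical helper name, and the sketch assumes the same legacy pdf.js call shapes used above: `page.getViewport(scale)` taking a number, and a `page.render(...)` result that can be chained with `.then`.

```javascript
// Hypothetical helper (not part of pdf.js): renders one page into a canvas
// and resolves with a PNG data URL.
// Assumption: legacy API as in the page above, i.e. getViewport(scale) takes
// a number and the value returned by render() is thenable.
function renderPageToPng(page, canvas, scale) {
  var viewport = page.getViewport(scale);

  // Size the canvas to the rendered page so that nothing is cropped.
  canvas.width = viewport.width;
  canvas.height = viewport.height;

  var renderContext = {
    canvasContext: canvas.getContext('2d'),
    viewport: viewport
  };

  return page.render(renderContext).then(function () {
    // toDataURL() defaults to PNG; the type is spelled out for clarity.
    return canvas.toDataURL('image/png');
  });
}
```

In the page above, this could be called as `renderPageToPng(page, __CANVAS, 2)` inside the `getPage` callback, with the resolved data URL assigned to the `href` of the download link.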
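The principle is simply to render the PDF onto a canvas and then save the canvas contents as an image. Some people say the generated images look poor, but in fact you only need to adjust this variable as needed: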
var scale_required = __CANVAS.width / page.getViewport(1).width;
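For instance, the default in this example is the canvas width divided by the width of the PDF page. If we change it to
var scale_required = 2;
the page is rendered at a higher resolution (for typical page sizes), and the result looks much better.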
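The relationship between the scale and the size of the exported image can be made explicit. The sketch below assumes that at scale 1 a pdf.js viewport measures one pixel per PDF point (1/72 inch), so the scale for a target resolution is that resolution divided by 72. `scaleForDpi`, `scaleForWidth`, and `outputSize` are hypothetical helper names used for illustration, not pdf.js functions.

```javascript
// Hypothetical helpers (not part of pdf.js) relating the scale to output size.
// Assumption: at scale 1 the viewport is one pixel per PDF point (1/72 inch).
var POINTS_PER_INCH = 72;

// Scale needed to render at a given resolution, e.g. 144 DPI gives 2.
function scaleForDpi(dpi) {
  return dpi / POINTS_PER_INCH;
}

// Scale needed to make the page exactly targetWidth pixels wide.
// This mirrors the example's default:
//   __CANVAS.width / page.getViewport(1).width
function scaleForWidth(targetWidth, pageWidthAtScale1) {
  return targetWidth / pageWidthAtScale1;
}

// Pixel dimensions of the rendered page at a given scale,
// rounded to whole pixels.
function outputSize(pageWidthAtScale1, pageHeightAtScale1, scale) {
  return {
    width: Math.round(pageWidthAtScale1 * scale),
    height: Math.round(pageHeightAtScale1 * scale)
  };
}

// A US Letter page is 612 x 792 points.
var letterAt2 = outputSize(612, 792, 2);   // { width: 1224, height: 1584 }
var fitTo800 = scaleForWidth(800, 612);    // about 1.31
```

Note that a Letter page at scale 2 is 1224 pixels wide, wider than an 800-pixel canvas, so with a fixed scale the canvas width has to be set from the viewport just as the height is.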