Converting Between PDF, TXT, HTML, and Image Formats in PHP

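For the past few days I have been using PHP to build a project that converts between different file types. Here I (Qingyuan) share the methods I used for several file formats; anyone who needs them is welcome to use them as a reference. Thanks!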


1. Converting PDF to JPG - PDF2JPG


This is a very simple way to turn a PDF into an image. Before using it, you must make sure that ImageMagick is installed.
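
Since the conversion depends on ImageMagick being present, it can help to check for it before doing anything else. The following is only a sketch: the helper name `imagemagick_available()` is hypothetical, and it assumes the binary is called `convert` and is on the web server's PATH (newer ImageMagick installations may expose it as `magick` instead).

```php
<?php
// Hypothetical helper: returns true if the given binary responds to "-version"
// with exit code 0 and identifies itself as ImageMagick.
// Assumption: the binary is named "convert" and is on the server's PATH.
function imagemagick_available($binary = 'convert') {
    if (!function_exists('exec')) {
        return false; // exec() may be unavailable on some hosts
    }
    $output = array();
    $status = 1;
    @exec(escapeshellarg($binary) . ' -version 2>&1', $output, $status);
    return $status === 0 && stripos(implode("\n", $output), 'ImageMagick') !== false;
}

echo imagemagick_available() ? "ImageMagick found\n" : "ImageMagick not found\n";
```

Checking the version banner as well as the exit code guards against an unrelated program that happens to share the name `convert`.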


<?php
$pdf_file = './pdf/_folder/example.pdf';
$save_to  = './image_folder/example.jpg'; // make sure Apache has permission to write to this folder (a common problem)

// Run ImageMagick's 'convert' command to turn the PDF into a JPG with the given settings
$cmd = 'convert ' . escapeshellarg($pdf_file) . ' -colorspace RGB -resize 800 ' . escapeshellarg($save_to);
exec($cmd . ' 2>&1', $output, $return_var);

if ($return_var == 0) { // exec converted the PDF to JPG successfully
    print "Conversion OK";
} else {
    // $output is an array of lines, so join it before printing
    print "Conversion failed.<br />" . implode("<br />", $output);
}
?>
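
The snippet above converts the whole document, and for a multi-page PDF ImageMagick will typically write one numbered JPG per page. As a possible variation, the sketch below builds a command for a single page at a chosen resolution. The function name `build_pdf2jpg_command()` and its default values are assumptions for illustration; the `[n]` page selector and the `-density` option are standard ImageMagick features.

```php
<?php
// Hypothetical helper: builds an ImageMagick command line that renders
// one page of a PDF to JPG. $page is zero-based; $density is in DPI.
function build_pdf2jpg_command($pdf_file, $save_to, $page = 0, $density = 150, $width = 800) {
    // "file.pdf[0]" selects the first page; -density has to come before
    // the input file so that it affects how the PDF is rasterized
    $input = $pdf_file . '[' . (int) $page . ']';
    return 'convert -density ' . (int) $density . ' ' . escapeshellarg($input)
         . ' -colorspace RGB -resize ' . (int) $width . ' ' . escapeshellarg($save_to);
}

// Print the command instead of running it, so it can be inspected first
echo build_pdf2jpg_command('./pdf/_folder/example.pdf', './image_folder/page1.jpg'), "\n";
```

The resulting string can be passed to `exec()` in the same way as in the original snippet.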


2. Converting HTML to PDF - html2ps


This is a very practical method that comes in handy in many projects.


<?php
function convert_to_pdf($url, $path_to_pdf) {
    require_once(dirname(__FILE__).'/html2ps/config.inc.php');
    require_once(HTML2PS_DIR.'pipeline.factory.class.php');
    //echo WRITER_TEMPDIR; // debug only: prints the temp directory into the page output
    //error_reporting(E_ALL);
    //ini_set("display_errors","1");
    @set_time_limit(10000);
    parse_config_file(HTML2PS_DIR.'html2ps.config');
    /**
     * Saves the generated PDF to a user-defined output file on the server.
     * Declared only once, so convert_to_pdf() can be called more than once per request.
     */
    if (!class_exists('MyDestinationFile')) {
        class MyDestinationFile extends Destination {
            /**
             * @var string result file name / path
             * @access private
             */
            var $_dest_filename;

            function __construct($dest_filename) {
                $this->_dest_filename = $dest_filename;
            }

            function process($tmp_filename, $content_type) {
                copy($tmp_filename, $this->_dest_filename);
            }
        }
    }
    $media = Media::predefined("A4");
    $media->set_landscape(false);
    $media->set_margins(array(
        'left'   => 5,
        'right'  => 5,
        'top'    => 10,
        'bottom' => 10
    ));
    $media->set_pixels(800);

    $pipeline = PipelineFactory::create_default_pipeline(
        "", // Auto-detect encoding
        ""
    );

    // Override HTML source
    $pipeline->fetchers[] = new FetcherURL;
    $pipeline->data_filters[] = new DataFilterHTML2XHTML;
    $pipeline->parser = new ParserXHTML;
    $pipeline->layout_engine = new LayoutEngineDefault;
    $pipeline->output_driver = new OutputDriverFPDF($media);
    //$filter = new PreTreeFilterHeaderFooter("HEADER", "FOOTER");
    //$pipeline->pre_tree_filters[] = $filter;

    // Override destination to local file
    $pipeline->destination = new MyDestinationFile($path_to_pdf);

    global $g_config;
    $g_config = array(
        'cssmedia'         => 'screen',
        'scalepoints'      => '1',
        'renderimages'     => true,
        'renderlinks'      => true,
        'renderfields'     => true,
        'renderforms'      => false,
        'mode'             => 'html',
        'encoding'         => '',
        'debugbox'         => false,
        'pdfversion'       => '1.4',
        'draw_page_border' => false
    );
    $pipeline->configure($g_config);
    //$pipeline->add_feature('toc', array('location' => 'before'));
    $pipeline->process($url, $media);
}
?>
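
A call to `convert_to_pdf()` might look like the sketch below. The URL and the output path are placeholders, and `ensure_writable_dir_for()` is a hypothetical helper added here only to catch the write-permission problem before the conversion starts. The call itself is guarded so the snippet still runs when the function above has not been loaded.

```php
<?php
// Hypothetical helper: make sure the directory that will hold $path
// exists and can be written to by the PHP process.
function ensure_writable_dir_for($path) {
    $dir = dirname($path);
    if (!is_dir($dir) && !@mkdir($dir, 0755, true)) {
        return false;
    }
    return is_writable($dir);
}

$url    = 'http://www.example.com/';                          // placeholder URL
$target = sys_get_temp_dir() . '/html2ps_demo/example.pdf';   // placeholder path

if (!ensure_writable_dir_for($target)) {
    echo "Cannot write to " . dirname($target) . "\n";
} elseif (function_exists('convert_to_pdf')) {
    convert_to_pdf($url, $target);
    echo file_exists($target) ? "PDF written to $target\n" : "No PDF was produced\n";
} else {
    echo "convert_to_pdf() is not loaded\n";
}
```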


3. Converting HTML to TXT


If you are building a search engine, this code may come in handy.


<?php
    // strip javascript, styles, html tags, normalize entities and spaces
    // based on http://www.php.net/manual/en/function.strip-tags.php#68757
    function html2text($html) {
        $text = $html;
        static $search = array(
            '@<script.+?</script>@usi', // Strip out javascript content
            '@<style.+?</style>@usi',   // Strip style content
            '@<!--.+?-->@us',           // Strip multi-line comments including CDATA
            '@</?[a-z].*?\>@usi',       // Strip out HTML tags
        );
        $text = preg_replace($search, ' ', $text);
        // normalize common entities
        $text = normalizeEntities($text);
        // decode other entities
        $text = html_entity_decode($text, ENT_QUOTES, 'utf-8');
        // normalize possibly repeated newlines, tabs, spaces to spaces
        $text = preg_replace('/\s+/u', ' ', $text);
        $text = trim($text);
        // we must still run htmlentities on anything that comes out!
        // for instance:
        // <<a>script>alert('XSS')//<<a>/script>
        // will become
        // < script>alert('XSS')//< /script>
        // which still contains angle brackets
        return $text;
    }
    // replace encoded and double-encoded entities with the equivalent character
    // also see /app/bookmarkletPopup.js
    function normalizeEntities($text) {
        static $find = array();
        static $repl = array();
        if (!count($find)) {
            // build $find and $repl from the map one time
            // each row: replacement text first, then the literal character,
            // entity names and numeric/hex codes that should map to it
            $map = array(
                array('\'', 'apos', 39, 'x27'), // Apostrophe
                array('\'', '‘', 'lsquo', 8216, 'x2018'), // Open single quote
                array('\'', '’', 'rsquo', 8217, 'x2019'), // Close single quote
                array('"', '“', 'ldquo', 8220, 'x201C'), // Open double quotes
                array('"', '”', 'rdquo', 8221, 'x201D'), // Close double quotes
                array('\'', '‚', 'sbquo', 8218, 'x201A'), // Single low-9 quote
                array('"', '„', 'bdquo', 8222, 'x201E'), // Double low-9 quote
                array('\'', '′', 'prime', 8242, 'x2032'), // Prime/minutes/feet
                array('"', '″', 'Prime', 8243, 'x2033'), // Double prime/seconds/inches
                array(' ', 'nbsp', 160, 'xA0'), // Non-breaking space
                array('-', '‐', 8208, 'x2010'), // Hyphen
                array('-', '–', 'ndash', 8211, 150, 'x2013'), // En dash
                array('--', '—', 'mdash', 8212, 151, 'x2014'), // Em dash
                array(' ', "\xE2\x80\x82", 'ensp', 8194, 'x2002'), // En space (U+2002)
                array(' ', "\xE2\x80\x83", 'emsp', 8195, 'x2003'), // Em space (U+2003)
                array(' ', "\xE2\x80\x89", 'thinsp', 8201, 'x2009'), // Thin space (U+2009)
                array('*', '•', 'bull', 8226, 'x2022'), // Bullet
                array('*', '‣', 8227, 'x2023'), // Triangular bullet
                array('...', '…', 'hellip', 8230, 'x2026'), // Horizontal ellipsis
                array('°', 'deg', 176, 'xB0'), // Degree
                array('EUR', 'euro', 8364, 'x20AC'), // Euro
                array('¥', 'yen', 165, 'xA5'), // Yen
                array('£', 'pound', 163, 'xA3'), // British Pound
                array('©', 'copy', 169, 'xA9'), // Copyright Sign
                array('®', 'reg', 174, 'xAE'), // Registered Sign
                array('(TM)', 'trade', 8482, 'x2122') // TM Sign
            );
            foreach ($map as $e) {
                for ($i = 1; $i < count($e); ++$i) {
                    $code = $e[$i];
                    if (is_int($code)) {
                        // numeric entity
                        $regex = "/&(amp;)?#0*$code;/";
                    }
                    elseif (preg_match('/^.$/u', $code) /* one unicode char */) {
                        // single literal character (quoted so it is never read as a regex operator)
                        $regex = '/' . preg_quote($code, '/') . '/u';
                    }
                    elseif (preg_match('/^x([0-9A-F]{2}){1,2}$/i', $code)) {
                        // hex entity
                        $regex = "/&(amp;)?#x0*" . substr($code, 1) . ";/i";
                    }
                    else {
                        // named entity
                        $regex = "/&(amp;)?$code;/";
                    }
                    $find[] = $regex;
                    $repl[] = $e[0];
                }
            }
        } // end first time build
        return preg_replace($find, $repl, $text);
    }
?>
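
To illustrate the overall flow (strip tags, decode entities, collapse whitespace, then escape again before output), here is a reduced sketch. `html2text_sketch()` is a hypothetical, simplified stand-in for `html2text()` that leaves out the `normalizeEntities()` step so the block can run on its own; the sample HTML is made up for the demonstration.

```php
<?php
// Hypothetical, simplified variant of html2text() for demonstration only:
// it uses the same stripping patterns but skips normalizeEntities().
function html2text_sketch($html) {
    $search = array(
        '@<script.+?</script>@usi', // script blocks
        '@<style.+?</style>@usi',   // style blocks
        '@<!--.+?-->@us',           // comments
        '@</?[a-z].*?>@usi',        // remaining tags
    );
    $text = preg_replace($search, ' ', $html);
    $text = html_entity_decode($text, ENT_QUOTES, 'UTF-8');
    $text = preg_replace('/\s+/u', ' ', $text);
    return trim($text);
}

$html = '<p>Hello <b>world</b> &amp; friends</p><script>alert(1)</script>';
$text = html2text_sketch($html);

// Escape again before echoing into a page, as the comment in html2text() advises
echo htmlspecialchars($text, ENT_QUOTES, 'UTF-8'), "\n"; // prints: Hello world &amp; friends
```

The plain-text result is what you would index or store; the `htmlspecialchars()` call is only needed at the point where the text is written back into HTML.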


That covers the methods for converting between PDF, TXT, HTML, and image formats in PHP. I hope this article is useful to PHP developers. Thanks for reading.
