Remarks about Deprecated PDFlib Functions

 

    Starting with PHP 4.0.5, the PHP extension for PDFlib is officially supported by PDFlib GmbH. This means that all the functions described in the PDFlib Reference Manual are supported by PHP 4 with exactly the same meaning and the same parameters. However, with PDFlib Version 5.0.4 or higher all parameters have to be specified. For compatibility reasons, this binding for PDFlib still supports most of the deprecated functions, but they should be replaced by their new versions. PDFlib GmbH will not support any problems arising from the use of these deprecated functions. The documentation in this section indicates old functions as "Deprecated" and gives the replacement function to be used instead.


    User Contributed Notes 23 notes

    info at tecnick dot com
    7 years ago
    For those of us who do not want to pay for a commercial PDFlib license, I suggest TCPDF:

    http://tcpdf.sf.net

    TCPDF is an open-source PHP class for generating PDF files on the fly without requiring external extensions. It has already been adopted by a large number of PHP projects such as phpMyAdmin, Drupal, Joomla, Xoops, TCExam, etc.

    Starting with version 2.1, TCPDF supports UTF-8 Unicode and bidirectional languages such as Arabic and Hebrew.

    uwe at steinmann dot cx
    11 years ago
    Those looking for a free replacement for PDFlib may consider pslib at http://pslib.sourceforge.net. It produces PostScript, which can easily be turned into PDF with Acrobat Distiller or Ghostscript. The API is very similar, and even the hypertext functions are supported. There is also a PHP extension for pslib in PECL, called ps.

    SID TRIVEDI
    7 years ago
    /*
    Folks, there is an excellent tutorial from Rasmus Lerdorf (the site does not support Internet Explorer) at:

    http://talks.php.net/show/osconpdf/

    In it, the creator of PHP explains text, fonts, images and their attributes with working snippets.

    Another tutorial can be found at:

    www.devshed.com/c/a/PHP/Building-PDF-Documents-with-PHP-5

    The following are the sizes of some common PDF page formats.

    The origin is at the lower left and the basic unit is the DTP point (pt).

    1 pt = 1/72 inch = 0.35277777778 mm

    Some common page sizes

    Format       Width (pt)   Height (pt)
    US-Letter    612          792
    US-Legal     612          1008
    US-Ledger    1224         792
    11x17        792          1224
    A0           2380         3368
    A1           1684         2380
    A2           1190         1684
    A3           842          1190
    A4           595          842
    A5           421          595
    A6           297          421
    B5           501          709

    */
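
    As a rough illustration of the unit arithmetic above, here is a minimal sketch that converts points to inches and millimetres using 1 pt = 1/72 inch. The function names pointsToInches and pointsToMm are made up for this example; they are not part of PDFlib.

    ```php
    <?php
    // Hypothetical helpers (not part of PDFlib): convert DTP points to other units.
    // Assumes 1 pt = 1/72 inch and 1 inch = 25.4 mm, as stated in the note above.
    function pointsToInches(float $points): float
    {
        return $points / 72.0;
    }

    function pointsToMm(float $points): float
    {
        return $points * 25.4 / 72.0;
    }

    // A4 from the table above: 595 x 842 pt.
    printf("A4 is about %.1f x %.1f mm\n", pointsToMm(595), pointsToMm(842));
    ```

    Keeping the conversion in one place makes it easier to check a page size from the table against the paper size you expect.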

    sebastien at malot dot fr
    2 years ago
    Do you have a sample PDF?
    Can you try these classes:

    http://gist.github.com/smalot/6183152
    or
    http://www.zikko.se/resources/extractFromPDF.php

    Can you see your links in the extracted content?
    Best regards

    brendandonhue at comcast dot net
    9 years ago
    Here is a function to test whether a file is a PDF without using any external library. 
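    <?php
    // "%PDF-": the magic bytes at the start of a PDF file.
    define('PDF_MAGIC', "\x25\x50\x44\x46\x2D");

    function is_pdf($filename) {
      // Read only the first strlen(PDF_MAGIC) bytes and compare them with the magic bytes.
      return file_get_contents($filename, false, null, 0, strlen(PDF_MAGIC)) === PDF_MAGIC;
    }
    ?>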
    It's not checking if the whole file is valid, just if the correct header is present at the beginning of the file.

    bolyde at gmail dot com
    6 years ago
    Hi,
    To find the number of pages in a PDF file, I found this:

    <?php
    // Originally a class method; shown here as a plain function so it can run on its own.
    function getNumPagesInPDF(array $arguments = array())
    {
        @list($PDFPath) = $arguments;
        $stream = @fopen($PDFPath, "rb");
        if (!$stream)
            return false;
        $PDFContent = @fread($stream, filesize($PDFPath));
        fclose($stream);
        if (!$PDFContent)
            return false;

        $firstValue = 0;
        $secondValue = 0;
        // First candidate: the number after "/N".
        if (preg_match("/\/N\s+([0-9]+)/", $PDFContent, $matches)) {
            $firstValue = $matches[1];
        }

        // Second candidate: the largest number after any "/Count".
        if (preg_match_all("/\/Count\s+([0-9]+)/s", $PDFContent, $matches))
        {
            $secondValue = max($matches[1]);
        }
        return (($secondValue != 0) ? $secondValue : max($firstValue, $secondValue));
    }
    ?>

    Ken McColl
    7 years ago
    To get this to work on Windows do not use escapeshellcmd()

    From online help:
    Following characters are preceded by a backslash: #&;`|*?~<>^()[]{}$\, \x0A and \xFF. ' and " are escaped only if they are not paired. In Windows, all these characters plus % are replaced by a space instead.

    So you are probably passing duff paths to pdf2text.exe

    Removing escapeshellcmd worked for me. Just make darned sure you are in control of what is being passed through to your system call.
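
    One way to stay in control of what reaches the system call, sketched here under assumptions: quote the file path on its own with escapeshellarg() instead of running escapeshellcmd() over the whole command line. The helper name buildExtractCommand and the example paths are invented for illustration. Test it on your own Windows setup, because escapeshellarg() also replaces some characters (such as %) with spaces there.

    ```php
    <?php
    // Hypothetical helper: build a command line from a trusted program name and a PDF path.
    // Only the path is quoted with escapeshellarg(); $program must come from your own code,
    // never from user input. Nothing is executed here.
    function buildExtractCommand(string $program, string $pdfPath): string
    {
        return $program . ' ' . escapeshellarg($pdfPath);
    }

    // Example values are made up; pass the result to exec() or shell_exec() yourself.
    echo buildExtractCommand('pdf2text.exe', 'C:\\docs\\my report.pdf'), "\n";
    ```

    Building the string separately from running it makes it easy to log the exact command and see whether the path survived quoting.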

    bmironov at jonview dot com
    12 years ago
    RedHat 9 + Apache 2.0 + PHP 4.3.2 + Oracle 9i + PDFlib 5.0.1 (binary distribution)

    It seems to be a working bundle if you do some magic with ./configure:

    RedHat 9:
    kernel-2.4.20-18.9

    Apache 2.0.46:
    ./configure --enable-so --enable-rewrite=shared --enable-status --enable-mpm=prefork

    PHP 4.3.2 (note --without-tsrm-pthreads, which is the important option here):
    ./configure \
    --program-prefix= \
    --prefix=/usr \
    --exec-prefix=/usr \
    --bindir=/usr/bin \
    --sbindir=/usr/sbin \
    --sysconfdir=/etc \
    --datadir=/usr/share \
    --includedir=/usr/include \
    --libdir=/usr/lib \
    --libexecdir=/usr/libexec \
    --localstatedir=/var \
    --sharedstatedir=/usr/com \
    --mandir=/usr/share/man \
    --infodir=/usr/share/info \
    --with-config-file-path=/etc \
    --with-config-file-scan-dir=/etc/php.d \
    --without-tsrm-pthreads \
    --with-zlib \
    --with-gd \
    --enable-gd-native-ttf \
    --with-ttf \
    --without-mysql \
    --with-apxs2filter=/usr/local/apache2/bin/apxs \
    --with-oci8 \
    --enable-sigchild \
    --enable-inline-optimization

    Oracle9i:
    ln -s $ORACLE_HOME/rdbms/public/nzerror.h $ORACLE_HOME/rdbms/demo/nzerror.h

    ln -s $ORACLE_HOME/rdbms/public/nzt.h $ORACLE_HOME/rdbms/demo/nzt.h

    ln -s $ORACLE_HOME/rdbms/public/ociextp.h $ORACLE_HOME/rdbms/demo/ociextp.h

    If you want to use bundled GD-library then:
    1) install following packages: libjpeg, libjpeg-devel, libpng, libpng-devel, freetype, freetype-devel, libtiff, libtiff-devel, zlib, zlib-devel

    2) ln -s /usr/lib/libjpeg.so.62 /usr/lib/libjpeg.so
    ln -s /usr/lib/libpng.so.62 /usr/lib/libpng.so

    It seems to be a working combination, because it does NOT give you:
    1) this error message in Apache's error_log:
    Module compiled with module API=20020429, debug=0, thread-safety=0
    PHP compiled with module API=20020429, debug=0, thread-safety=1

    2) this error message in Apache's error_log:
    [notice] child pid 12345 exit signal Segmentation fault (11)

    3) In addition, MS Internet Explorer can show PDF output from your PHP script via the Acrobat plug-in and does not crash. There are no confusing messages about opening "Adobe Acrobat Control for ActiveX".

    Hope it will save you some time.

    Good luck,
    Boris

    thodge at ipswich dot qld dot gov dot au
    10 years ago
    Yet another addition to the PDF text extraction code last posted by jorromer. That code only seemed to work for PDF 1.2 (Acrobat 3.x) or below. This pdfExtractText function uses regular expressions to cover cases I have found in PDF 1.3 and 1.4 documents. The code also handles closing brackets in the text stream, which were ignored by the previous version. My regular expression skills are somewhat lacking, so improvements may be possible by a more skilled programmer. I'm sure there are still cases that this function will not handle, but I haven't come across any yet...

    <?php

    function pdf2string($sourcefile) {

        $fp = fopen($sourcefile, 'rb');
        $content = fread($fp, filesize($sourcefile));
        fclose($fp);

        $searchstart = 'stream';
        $searchend = 'endstream';
        $pdfText = '';
        $pos = 0;
        $pos2 = 0;
        $startpos = 0;

        while ($pos !== false && $pos2 !== false) {

            $pos = strpos($content, $searchstart, $startpos);
            $pos2 = strpos($content, $searchend, $startpos + 1);

            if ($pos !== false && $pos2 !== false){

                if ($content[$pos] == 0x0d && $content[$pos + 1] == 0x0a) {
                    $pos += 2;
                } else if ($content[$pos] == 0x0a) {
                    $pos++;
                }

                if ($content[$pos2 - 2] == 0x0d && $content[$pos2 - 1] == 0x0a) {
                    $pos2 -= 2;
                } else if ($content[$pos2 - 1] == 0x0a) {
                    $pos2--;
                }

                $textsection = substr(
                    $content,
                    $pos + strlen($searchstart) + 2,
                    $pos2 - $pos - strlen($searchstart) - 1
                );
                $data = @gzuncompress($textsection);
                $pdfText .= pdfExtractText($data);
                $startpos = $pos2 + strlen($searchend) - 1;

            }
        }

        return preg_replace('/(\s)+/', ' ', $pdfText);

    }

    function pdfExtractText($psData){

        if (!is_string($psData)) {
            return '';
        }

        $text = '';

        // Handle brackets in the text stream that could be mistaken for
        // the end of a text field. I'm sure you can do this as part of the
        // regular expression, but my skills aren't good enough yet.
        $psData = str_replace('\)', '##ENDBRACKET##', $psData);
        $psData = str_replace('\]', '##ENDSBRACKET##', $psData);

        preg_match_all(
            '/(T[wdcm*])[\s]*(\[([^\]]*)\]|\(([^\)]*)\))[\s]*Tj/si',
            $psData,
            $matches
        );
        for ($i = 0; $i < sizeof($matches[0]); $i++) {
            if ($matches[3][$i] != '') {
                // Run another match over the contents.
                preg_match_all('/\(([^)]*)\)/si', $matches[3][$i], $subMatches);
                foreach ($subMatches[1] as $subMatch) {
                    $text .= $subMatch;
                }
            } else if ($matches[4][$i] != '') {
                $text .= ($matches[1][$i] == 'Tc' ? ' ' : '') . $matches[4][$i];
            }
        }

        // Translate special characters and put back brackets.
        $trans = array(
            '...'                => '…',
            '\205'               => '…',
            '\221'               => chr(145),
            '\222'               => chr(146),
            '\223'               => chr(147),
            '\224'               => chr(148),
            '\226'               => '-',
            '\267'               => '•',
            '\('                 => '(',
            '\['                 => '[',
            '##ENDBRACKET##'     => ')',
            '##ENDSBRACKET##'    => ']',
            chr(133)             => '-',
            chr(141)             => chr(147),
            chr(142)             => chr(148),
            chr(143)             => chr(145),
            chr(144)             => chr(146),
        );
        $text = strtr($text, $trans);

        return $text;

    }

    ?>

    kangaroo232002 at yahoo dot co dot uk
    7 years ago
    To extend Alex's example earlier, you can use a couple of entries inside the PDF document to get the total number of pages without using any extension. I would have added the whole code, but the site keeps saying "line is too long... yadayada".

    Open the document for reading using fopen("$file", "rb");

    Test roughly the first 1000 bytes against the following regex:
    <?php
    if (preg_match("/\/N\s+([0-9]+)/", $contents, $found)) {
        return $found[1];
    }
    ?>

    If that doesn't return anything, you have to read the rest of the file:

    <?php
    // The pattern is split over two string literals only to keep the lines short.
    preg_match_all(
        "/\/Type\s*\/Pages\s*\/Kids\s+" .
        "\[.*?\]\s*\/Count\s+([0-9]+)/",
        $contents,
        $found
    );
    ?>

    This may return more than one match, so look through them for the highest value, which is the total number of pages in your document.

    jorromer at uchile dot cl -- Krash
    10 years ago
    I recently used mattb's code below to extract text from PDF files. I modified it to extract only the text fields.

    I hope this helps someone.

    Here is the function:

    <?php

      $text = pdf2string("file.pdf");
      echo $text;

      function pdf2string($sourcefile){
        $fp = fopen($sourcefile, 'rb');
        $content = fread($fp, filesize($sourcefile));
        fclose($fp);

        $searchstart = 'stream';
        $searchend = 'endstream';
        $pdfdocument = '';
        $pos = 0;
        $pos2 = 0;
        $startpos = 0;

        while( $pos !== false && $pos2 !== false ){
          $pos = strpos($content, $searchstart, $startpos);
          $pos2 = strpos($content, $searchend, $startpos + 1);

          if ($pos !== false && $pos2 !== false){
            if ($content[$pos]==0x0d && $content[$pos+1]==0x0a) $pos+=2;
            else if ($content[$pos]==0x0a) $pos++;

            if ($content[$pos2-2]==0x0d && $content[$pos2-1]==0x0a) $pos2-=2;
            else if ($content[$pos2-1]==0x0a) $pos2--;

            $textsection = substr($content, $pos + strlen($searchstart) + 2, $pos2 - $pos - strlen($searchstart) - 1);
            $data = @gzuncompress($textsection);
            $data = ExtractText2($data);
            $startpos = $pos2 + strlen($searchend) - 1;

            if ($data === false){
              return -1;}

            $pdfdocument .= $data;}}
       return $pdfdocument;}

    function ExtractText2($postScriptData){
      $text = '';
      // Streams that could not be decompressed arrive here as false: nothing to extract.
      if (!is_string($postScriptData) || $postScriptData === '') return $text;

      $sw = true;
      $textStart = 0;
      $len = strlen($postScriptData);

      while ($sw){
        $ini = strpos($postScriptData, '(', $textStart);
        $end = strpos($postScriptData, ')', $textStart+1);
        // No closing bracket left: stop instead of looping forever.
        if ($end === false) break;
        if (($ini>0) && ($end>$ini)){
          $valtext = strpos($postScriptData,'Tj',$end+1);
          if ($valtext == $end + 2)
            $text .= substr($postScriptData,$ini+1,$end - $ini - 1);}

        $textStart = $end + 1;
        if ($len<=$textStart) $sw=false;

        if (($ini == 0) && ($end == 0)) $sw=false;}

      $trans = array("\\341" => "a","\\351" => "e","\\355" => "i","\\363" => "o","\\223" => "","\\224" => "");
      $text  = strtr($text, $trans);
      return $text;
    }

    ?>

    webadmin at secretscreen dot com
    10 years ago
    I found this info about pdflib scope on a Chinese (I think) site and translated it.  I was trying to do pdf_setfont and kept getting the wrong scope error.  Turns out it has to be in the Page scope.  So pdf_setfont will only work when called between pdf_begin_page and pdf_end_page.

    #########################################
    When a PDFlib API function is called, the error "Can't - IN 'document' scope" occurs.
    PDFlib has a concept of "scope": every PDFlib API function may only be called in certain scopes. This error occurs when a function is called outside the scopes it is allowed in. Use the chart below to check where your API call is placed.

    Path: Started by one of PDF_moveto(), PDF_circle(), PDF_arc(), PDF_arcn(), PDF_rect(); ended by one of PDF_stroke(), PDF_closepath_stroke(), PDF_fill(), PDF_fill_stroke(), PDF_closepath_fill_stroke(), PDF_clip(), PDF_endpath().

    Page: Between PDF_begin_page() and PDF_end_page(), but outside path scope.

    Template: Between PDF_begin_template() and PDF_end_template(), but outside path scope.

    Pattern: Between PDF_begin_pattern() and PDF_end_pattern(), but outside path scope.

    Font: Between PDF_begin_font() and PDF_end_font(), but outside glyph scope.

    Glyph: Between PDF_begin_glyph() and PDF_end_glyph(), but outside path scope.

    Document: Between PDF_open_*() and PDF_close(), but outside page, template and pattern scope.

    Object: Between PDF_new() and PDF_delete(), but not inside any of the other scopes.

    Null: Outside object scope.

    Any: All scopes other than Null.

    ##########################################

    Hope this helps others as much as it helped me!!!
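
    To make the scope rule concrete, here is a small sketch that sets the font only between pdf_begin_page() and pdf_end_page(). It uses the older procedural API seen in these notes, so check your PDFlib manual for which of these calls your version still accepts. The function name demoPageScope, the font, the page size and the text are all just examples.

    ```php
    <?php
    // Sketch only: shows which calls sit in document scope and which in page scope.
    // Returns the PDF data as a string, or null when the PDFlib functions are missing.
    function demoPageScope(): ?string
    {
        $needed = ['pdf_new', 'pdf_open_file', 'pdf_findfont', 'pdf_begin_page',
                   'pdf_setfont', 'pdf_show_xy', 'pdf_end_page', 'pdf_close',
                   'pdf_get_buffer', 'pdf_delete'];
        foreach ($needed as $name) {
            if (!function_exists($name)) {
                return null; // PDFlib extension (or this older API) not available
            }
        }
        $pdf = pdf_new();                                   // object scope
        pdf_open_file($pdf, '');                            // document scope begins
        $font = pdf_findfont($pdf, 'Helvetica', 'host', 0); // done in document scope
        pdf_begin_page($pdf, 595, 842);                     // page scope begins (A4)
        pdf_setfont($pdf, $font, 12);                       // page scope, as the note says
        pdf_show_xy($pdf, 'Hello', 50, 780);
        pdf_end_page($pdf);                                 // back to document scope
        pdf_close($pdf);                                    // back to object scope
        $data = pdf_get_buffer($pdf);
        pdf_delete($pdf);
        return $data;
    }
    ```

    Moving the pdf_setfont() call above pdf_begin_page() should reproduce the wrong-scope error described in the note.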

    chu61 dot tw at gmail dot com
    10 years ago
    How do you find out how many pages a PDF has? I read the PDF specification, version 1.6, and found this:

    PDF uses "page tree nodes" to define the ordering of pages in the document. The tree structure allows PDF applications to quickly open a document containing thousands of pages using only a little memory.

    If a PDF has 63 pages, the page tree node will look like this:

    2 0 obj
    << /Type /Pages
        /Kids [ 4 0 R
                   10 0 R
                 ]
         /Count 63        <---- YES, got it
    >>
    endobj

    P.S. A PDF may have more than one page tree node. The right answer is in the root page tree node. If a node has /Count XX together with a /Parent XXX entry, it is not the root page tree node.

    So you must find the node with /Count XX and without a /Parent entry, and you will get the total number of pages in the PDF.

    This works for everything from %PDF-1.0 to %PDF-1.5.

    Alex from Taipei, Taiwan
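
    The rule Alex describes can be sketched in plain PHP: scan the objects, keep those with /Type /Pages, skip any that have a /Parent entry, and read /Count from what remains. This is only a sketch under assumptions: it expects uncompressed object dictionaries, does not treat nested dictionaries specially, and the function name getRootPageCount is made up.

    ```php
    <?php
    // Hypothetical helper: return /Count of the root page tree node, or null if none is found.
    // Assumes object dictionaries are stored uncompressed in $pdfContent.
    function getRootPageCount(string $pdfContent): ?int
    {
        // Capture the body of every "N G obj ... endobj" block.
        if (!preg_match_all('/\d+\s+\d+\s+obj(.*?)endobj/s', $pdfContent, $objects)) {
            return null;
        }
        foreach ($objects[1] as $body) {
            // Only page tree nodes: /Type /Pages (not /Type /Page).
            if (!preg_match('/\/Type\s*\/Pages\b/', $body)) {
                continue;
            }
            // A node with /Parent is an intermediate node, not the root.
            if (preg_match('/\/Parent\b/', $body)) {
                continue;
            }
            if (preg_match('/\/Count\s+(\d+)/', $body, $m)) {
                return (int) $m[1];
            }
        }
        return null;
    }
    ```

    You would typically call it as getRootPageCount(file_get_contents($path)) and fall back to another method when it returns null.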

    pbierans at lynet dot de
    13 years ago
    Load extension, open a PDF, add a font, modify PDF in memory and send
    it to browser:

    <?php
      // no cache headers:
      header("Expires: Mon, 26 Jul 1997 05:00:00 GMT");
      header("Last-Modified: ".gmdate("D, d M Y H:i:s")." GMT");
      header("Cache-Control: no-store, no-cache, must-revalidate");
      header("Cache-Control: post-check=0, pre-check=0", false);
      header("Pragma: no-cache");

      $ext_name="libpdf_php.so";
        // libpdf_php.so is the PDFLIB for SunOS by "PDFlib GmbH"
        // visit http://www.pdflib.com

      // if the extension is not automatically loaded by Apache
      // dl() will try to load it on demand:
      if (!extension_loaded($ext_name) && !@dl($ext_name))
      {
        ?>
        <table width="100%" border="0"><tr><td align="center">
          <table style="border: solid #f0f0f0 2px;"><tr>
            <td valign="middle" style="padding: 20px; margin: 0px;">
              <p style="font-family: arial; font-size: 12px; ">
              <b>Sorry,</b><br>
              &nbsp;<br>
              A PDF can not be generated right now.<br>
              The administrator has been informed and will fix this as
              soon as possible.<br>
              Please try again later.
            </p>
          </td></tr></table>
        </td></tr></table>
        <?php
        mail('admin@domain.com','Error: PDFLib not found',
             'Called by script:\n  '.$SCRIPT_FILENAME.'?'.$QUERY_STRING,
             "From: warnings@domain.com\n");
        exit;
      } // verify that extension is usable

      // unique serial number:
      srand(microtime()*10000);
      $usnr=gmdate("Ymd-His-").rand(1000,9999).'-';
      $pdf_file=$usnr.'result.pdf';
      $src_file='source.pdf';

      // create pdf object
      $pdf = pdf_new();
      pdf_open_file($pdf);
      pdf_set_parameter($pdf, 'serial',      'if-you-have-one');

      // fonts to embed, they are in the folder of this file:
      pdf_set_parameter($pdf, 'FontAFM',     'TradeGothic=Tg______.afm');
      pdf_set_parameter($pdf, 'FontOutline', 'TradeGothic=Tg______.pfb');
      pdf_set_parameter($pdf, 'FontPFM',     'TradeGothic=Tg______.pfm');

      // load the source file:
      $src_doc   =pdf_open_pdi($pdf,$src_file,'', 0);
      $src_page  =pdf_open_pdi_page($pdf,$src_doc,1,'');
      $src_width =pdf_get_pdi_value($pdf,'width' ,$src_doc,$src_page,0);
      $src_height=pdf_get_pdi_value($pdf,'height',$src_doc,$src_page,0);

      pdf_begin_page($pdf, $src_width, $src_height);
      {
        // place the sourcefile to the background of the actual page:
        pdf_place_pdi_page($pdf,$src_page,0,0,1,1);
        pdf_close_pdi_page($pdf,$src_page);

        // modify the page:
        pdf_set_font($pdf, 'TradeGothic', 8, 'host');
        pdf_show_xy($pdf, 'Now: '.gmdate("Y-m-d H:i:s"),50,50);
      }
      pdf_end_page($pdf);
      pdf_close($pdf);

      // prepare output:
      $pdfdata = pdf_get_buffer($pdf); // to echo the pdf-data
      $pdfsize = strlen($pdfdata);     // IE requires the datasize

      // real datatype headers:
      header('Content-type: application/pdf');
      header('Content-disposition: attachment; filename="'.$pdf_file.'"');
      header('Content-length: '.$pdfsize);
      echo $pdfdata;
      exit; // keep this one so no #13#10 or #32 will be written
    ?>

    sebastien at malot dot fr
    1 year ago
    Hi,

    I am posting this comment here because I always wanted to extract text from PDF files, but never found a way to do it.
    So now I am sharing my treasure.

    I spent a lot of time creating a PHP library to extract text from pages.
    Based on the TCPDF parser class, my library can now handle many cases such as multiple charset encodings, base64 and octal encoding ...

    Project website: http://www.pdfparser.org

    <?php

    // Include Composer autoloader if not already done.
    include 'vendor/autoload.php';

    // Parse pdf file and build necessary objects.
    $parser = new \Smalot\PdfParser\Parser();
    $pdf    = $parser->parseFile('document.pdf');

    // Retrieve all pages from the pdf file.
    $pages  = $pdf->getPages();

    // Loop over each page to extract text.
    foreach ($pages as $page) {
        echo $page->getText();
    }

    ?>

    Don't hesitate to report any bug using the GitHub issue tracker.

    Thanks

    ragnar at deulos dot com
    9 years ago
    After a whole day spent understanding how PDFlib works, I came to the conclusion that drawing with it is hard enough as it is: just to draw a line you may need something like four lines of code. So I wrote my own functions to make life easier and the code easier to understand, modify and draw with. I also made a function that draws a rectangle with rounded corners and can even fill it ;)

    You can get it from http://www.deulos.com/pdf_php.php

    Feel free to make suggestions or whatever you like ;o)

    luc at phpt dot org
    8 years ago
    I am trying to extract the text from PDF files and use it to feed a search engine (intranet tool). I tried several of the "PDF2TXT" functions posted below, but they do not produce the expected result. At the very least, all words need to be separated by spaces (they are then used as keywords), and the "junk" codes removed (for example binary data, pictures...). I started by modifying the interesting function posted by Sven, and here is my current version, which is starting to work quite well (with PDF version 1.2). Sorry for having a quite different style of programming. Luc

    <?php
    // Patch for pdf2txt() posted by Sven Schuberth
    // Add/replace following code (cannot post full program, size limitation)

    // handles the version 1.2
    // New version of handleV2($data), only one line changed
    function handleV2($data){

        $j = 0;
        $a_chunks = array();
        $result_data = '';

        // grab objects and then grab their contents (chunks)
        $a_obj = getDataArray($data,"obj","endobj");

        foreach($a_obj as $obj){

            $a_filter = getDataArray($obj,"<<",">>");

            if (is_array($a_filter)){
                $j++;
                $a_chunks[$j]["filter"] = $a_filter[0];

                $a_data = getDataArray($obj,"stream\r\n","endstream");
                if (is_array($a_data)){
                    $a_chunks[$j]["data"] = substr($a_data[0],
                        strlen("stream\r\n"),
                        strlen($a_data[0])-strlen("stream\r\n")-strlen("endstream"));
                }
            }
        }

        // decode the chunks
        foreach($a_chunks as $chunk){

            // look at each chunk and decide how to decode it - by looking at the contents of the filter
            // (explode() replaces split(), which no longer exists in current PHP)
            $a_filter = explode("/",$chunk["filter"]);

            if (isset($chunk["data"]) && $chunk["data"]!=""){
                // look at the filter to find out which encoding has been used
                // (strpos() replaces the substr() call of the posted code, which could never work)
                if (strpos($chunk["filter"],"FlateDecode")!==false){
                    $data =@ gzuncompress($chunk["data"]);
                    if (trim($data)!=""){
                        // CHANGED HERE, before: $result_data .= ps2txt($data);
                        $result_data .= PS2Text_New($data);
                    } else {

                        //$result_data .= "x";
                    }
                }
            }
        }
        return $result_data;
    }

    // New function - Extract text from PS codes
    function ExtractPSTextElement($SourceString)
    {
    $Result = '';
    $CurStartPos = 0;
    while (($CurStartText = strpos($SourceString, '(', $CurStartPos)) !== FALSE)
        {
        // New text element found
        if ($CurStartText - $CurStartPos > 8) $Spacing = ' ';
        else    {
            $SpacingSize = substr($SourceString, $CurStartPos, $CurStartText - $CurStartPos);
            if ($SpacingSize < -25) $Spacing = ' '; else $Spacing = '';
            }
        $CurStartText++;

        $StartSearchEnd = $CurStartText;
        while (($CurStartPos = strpos($SourceString, ')', $StartSearchEnd)) !== FALSE)
            {
            if (substr($SourceString, $CurStartPos - 1, 1) != '\\') break;
            $StartSearchEnd = $CurStartPos + 1;
            }
        if ($CurStartPos === FALSE) break; // something wrong happened

        // Remove ending '-'
        if (substr($Result, -1, 1) == '-')
            {
            $Spacing = '';
            $Result = substr($Result, 0, -1);
            }

        // Add to result
        $Result .= $Spacing . substr($SourceString, $CurStartText, $CurStartPos - $CurStartText);
        $CurStartPos++;
        }
    // Add line breaks (otherwise, result is one big line...)
    return $Result . "\n";
    }

    // Global table for codes replacement
    $TCodeReplace = array ('\(' => '(', '\)' => ')');

    // New function, replacing old "pd2txt" function
    function PS2Text_New($PS_Data)
    {
    global $TCodeReplace;

    // Catch up some codes
    if (ord($PS_Data[0]) < 10) return '';
    if (substr($PS_Data, 0, 8) == '/CIDInit') return '';

    // Some text inside (...) can be found outside the [...] sets, then ignored
    // => disable the processing of [...] is the easiest solution

    $Result = ExtractPSTextElement($PS_Data);

    // echo "Code=$PS_Data\nRES=$Result\n\n";

    // Remove/translate some codes
    return strtr($Result, $TCodeReplace);
    }

    ?>

    spingary at yahoo dot com
    9 years ago
    I was having trouble streaming inline PDFs using PHP 5.0.2 and Apache 2.0.54.

    This is my code:

    <?php
    header("Pragma: public");
    header("Expires: Mon, 26 Jul 1997 05:00:00 GMT");
    header("Last-Modified: " . gmdate("D, d M Y H:i:s") . " GMT");
    header("Cache-Control: must-revalidate");
    header("Content-type: application/pdf");
    header("Content-Length: ".filesize($file));
    header("Content-disposition: inline; filename=$file");
    header("Accept-Ranges: ".filesize($file)); 
    readfile($file);
    exit();
    ?>
    It would work fine in Mozilla Firefox (1.0.7) but with IE (6.0.2800.1106) it would not bring up the Adobe Reader plugin and instead ask me to save it or open it as a PHP file.

    Oddly enough, I turned off ZLib.compression and it started working.  I guess the compression is confusing IE.  I tried leaving out the content-length header thinking maybe it was unmatched filesize (uncompressed number vs actual received compressed size), but then without it it screws up Firefox too.  

    What I ended up doing was disabling Zlib compression for the PDF output pages using ini_set:

    <?php
    ini_set('zlib.output_compression','Off'); 
    ?>

    Maybe this will help someone. Will post over in the PDF section as well.

    michi (Alt+Q) marel.at
    11 years ago
    <?php
    /* A little helper function to convert millimeters to points */
    function calcToPt($intMillimeter) {
      $intPoints = ($intMillimeter * 72) / 25.4;
      $intPoints = round($intPoints);
      return $intPoints;
    }

    /* For example: Create DIN A4 210x297 mm */
    pdf_begin_page($pdf, calcToPt(210), calcToPt(297)); // 595x842 pt
    ?>

    MAGnUm at magnumhome dot servehttp.com
    9 years ago
    domPDF is also a great PDF creation interface. It basically converts your code to CSS and then builds the PDF from that, with absolute positions and whatnot...

    donatas at spurgius dot com
    11 years ago
    I've been looking for a way to extract plain text from PDF documents (needed to search for text inside 'em). Not being able to find one I wrote the needed functions myself. here you go folks.

    <?php
      function pdf2string ($sourceFile)
      {
        $textArray = array ();
        $objStart = 0;

        $fp = fopen ($sourceFile, 'rb');
        $content = fread ($fp, filesize ($sourceFile));
        fclose ($fp);

        $searchTagStart = chr(13).chr(10).'stream';
        $searchTagStartLenght = strlen ($searchTagStart);

        while ((($objStart = strpos ($content, $searchTagStart, $objStart)) && ($objEnd = strpos ($content, 'endstream', $objStart+1))))
        {
          $data = substr ($content, $objStart + $searchTagStartLenght + 2, $objEnd - ($objStart + $searchTagStartLenght) - 2);
          $data = @gzuncompress ($data);

          if ($data !== FALSE && strpos ($data, 'BT') !== FALSE && strpos ($data, 'ET') !== FALSE)
          {
            $textArray [] = ExtractText ($data);
          }

          $objStart = $objStart < $objEnd ? $objEnd : $objStart + 1;
        }

        return $textArray;
      }

      function ExtractText ($postScriptData)
      {
        // Initialised here so the first strpos() and .= do not work on undefined variables.
        $textStart = 0;
        $plainText = '';

        while ((($textStart = strpos ($postScriptData, '(', $textStart)) && ($textEnd = strpos ($postScriptData, ')', $textStart + 1)) && substr ($postScriptData, $textEnd - 1) != '\\'))
        {
          $plainText .= substr ($postScriptData, $textStart + 1, $textEnd - $textStart - 1);
          if (substr ($postScriptData, $textEnd + 1, 1) == ']') //this adds quite some additional spaces between the words
          {
            $plainText .= ' ';
          }

          $textStart = $textStart < $textEnd ? $textEnd : $textStart + 1;
        }

        return stripslashes ($plainText);
      }
    ?>

    ontwerp AT zonnet.nl
    9 years ago
    I was searching for a lowcost/opensource option for combining static html files [as templates] and dynamic output from perl or php routines etc. And the sooner or later I found out that this was the most stable, 'speedest' and customizeable way to produce usable pdf 's with nice formatting :

    1] Create the HTML page output [Perl-generated HTML, direct HTML output from any application, PHP echo statements, etc.] [sort these HTML files locally]

    2] Convert all HTML [including links to web images, tables, font formatting, etc.] to [E]PS files with the Perl application html2ps [as mentioned below]
    http://user.it.uu.se/~jan/html2ps.html [sort all PS files by their future page position in the PDF]

    3] Use the free ps2pdf/ps2pdfwr Linux application
    http://www.ps2pdf.com/convert/index.htm [uses Ghostscript, the Ghostview libraries, and so on]
    It has great formatting options such as headers, footers, and numbering.
    [sort the PDF files]

    4] Merge all PDF files into one PDF file with pdftk [PDF Toolkit], which also offers optional compression/encryption, background stamps, etc.
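
Steps 2 to 4 above could be driven from PHP roughly as follows. This is only a sketch under stated assumptions: `html2ps`, `ps2pdf`, and `pdftk` are assumed to be installed and on the PATH, and the helper name `buildPdfPipeline` and the file names are hypothetical, not part of the original note. The helper only builds the command lines; running them (for example with `exec()`) is left to the caller.

```php
<?php
// Hypothetical helper: builds the shell commands for the
// HTML -> PS -> PDF -> merged PDF pipeline described above.
// Assumes html2ps, ps2pdf and pdftk are installed and on the PATH.
function buildPdfPipeline(array $htmlFiles, string $mergedPdf): array
{
    $commands = [];
    $pdfFiles = [];

    foreach ($htmlFiles as $html) {
        // Derive the intermediate file names from the HTML file name.
        $base = preg_replace('/\.html?$/i', '', $html);
        $ps   = $base . '.ps';
        $pdf  = $base . '.pdf';

        // Step 2: HTML to PostScript; html2ps writes to stdout by default.
        $commands[] = 'html2ps ' . escapeshellarg($html) . ' > ' . escapeshellarg($ps);
        // Step 3: PostScript to PDF via the Ghostscript ps2pdf wrapper.
        $commands[] = 'ps2pdf ' . escapeshellarg($ps) . ' ' . escapeshellarg($pdf);

        $pdfFiles[] = escapeshellarg($pdf);
    }

    // Step 4: concatenate the individual PDFs in the order given.
    $commands[] = 'pdftk ' . implode(' ', $pdfFiles) . ' cat output ' . escapeshellarg($mergedPdf);

    return $commands;
}

// Example: print the commands for two pages without executing them.
foreach (buildPdfPipeline(['cover.html', 'report.html'], 'final.pdf') as $command) {
    echo $command, PHP_EOL;
}
```

Because the order of the input array decides the page order of the merged file, the "sort" remarks in the steps above correspond to sorting `$htmlFiles` before calling the helper.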

    One might ask why I use different scripts:
    - The Perl/PHP combination works well: in my experience Perl is faster at some tasks, such as conversion to PS files.
    - PS to PDF is quicker than going directly from PHP to PDF [in my experience!].
    - I have total control over every file: whenever I change an HTML file used as a template, I only need an editor or another application for it [online or offline].

    P.S. I had to build an open-source solution for creating simple analysis reports based on things like:
    - a first page [name / title / # / date]
    - some static information [such as an introduction, copyright notices, etc.]
    - some dynamic information [output from PHP database queries] combined
    with HTML tags, images, etc.

    All of this is mixed together [but separated into files for transparency]. The three-step approach [data -> HTML, HTML -> PS, PS -> PDF] also makes each step easier and quicker to program or adjust.

    Correct me if I'm wrong [mail me].

    ing. Valentijn Langendorff
    Design & Technologist
    up
    -8
    bondo2 at bondo2 dot info  ¶
    6 years ago
    <?php

    // Create a new PDFlib object (PDF_new() is the constructor function).
    $pdfFile = PDF_new();

    // An empty file name makes PDFlib build the document in memory.
    PDF_open_file($pdfFile, "");

    // Document info
    pdf_set_info($pdfFile, "Author", "Ahmed Elbshry");
    pdf_set_info($pdfFile, "Creator", "Ahmed Elbshry");
    pdf_set_info($pdfFile, "Title", "PDFlib");
    pdf_set_info($pdfFile, "Subject", "Using PDFlib");

    // Start the page and define the width and height of the document (A4 in points)
    pdf_begin_page($pdfFile, 595, 842);

    // Check whether the Arial font is found, otherwise clean up and exit
    if ($font = PDF_findfont($pdfFile, "Arial", "winansi", 1)) {
        PDF_setfont($pdfFile, $font, 12);
    } else {
        echo ("Font Not Found!");
        PDF_end_page($pdfFile);
        PDF_close($pdfFile);
        PDF_delete($pdfFile);
        exit();
    }

    // Start writing from the point 50,780
    PDF_show_xy($pdfFile, "This Text In Arial Font", 50, 780);
    PDF_end_page($pdfFile);
    PDF_close($pdfFile);

    // Store the PDF document in $pdf
    $pdf = PDF_get_buffer($pdfFile);
    // Get the length of the generated data (not of the PDFlib handle) for the browser
    $pdflen = strlen($pdf);

    // Tell the browser about the PDF document
    header("Content-type: application/pdf");
    header("Content-length: $pdflen");
    header("Content-Disposition: inline; filename=phpMade.pdf");
    // Output the document
    print($pdf);
    // Delete the object
    PDF_delete($pdfFile);
    ?>