Pygments on PHP and WordPress

I've been on a long journey to find a great code highlighter, and I've used so many that I can't remember them all. These are the ones I can remember right now:

  • SyntaxHighlighter
  • Google Prettifier
  • highlighter.js
  • Geshi

Right now I'm using highlighter.js, but it isn't exactly what I want. I want to highlight most of the "words" that matter in a language, such as reserved words, built-in functions, and objects, which this highlighter and most of the others miss. I know it's not an important thing, but it was stuck in my head, until now.

Finally, I've found Pygments, which matches what I've been looking for and is the same highlighter used by GitHub. The only obstacle is that it's a Python-based syntax highlighter and I'm using WordPress, which is built on PHP.

Installation

But hey, we can get over it. There is a solution: first, we need Python installed on the server so we can use Pygments.

We won't go deep into installation, because there are so many OS flavors out there and the steps differ slightly on each one.

Python

First, check whether Python is already installed by typing python on your command line.

If it isn't installed, take a look at the Python Downloads page and download the installer for your OS.

PIP Installer

According to the pip site, there are two ways to install it.

The first, recommended way is to download get-pip.py and run it on your command line:

python get-pip.py

The second way is to use a package manager, running one of the two commands below. As mentioned before, which one applies depends on your server OS.

sudo apt-get install python-pip

Or:

sudo yum install python-pip

NOTE: You can use any installer you prefer, such as easy_install. I used pip for the sake of example and because it's the one used on the Pygments site.

Pygments

To install Pygments, run this command:

pip install Pygments

If you're on a server where your user doesn't have root access, the previous command will fail. In that case, run it with the --user flag to install the module in the user's directory.

pip install --user Pygments

Everything is installed now, so what's left is to work with PHP and some Python code.

PHP + Python

The way it works is that PHP executes a Python script via exec(), passing the language name and the name of a file containing the code to be highlighted.

Python

The first thing we are going to do is create the Python script that converts plain code into highlighted code using Pygments.

So let's go step by step through the script.

First we import all the required modules:


import sys
from pygments import highlight
from pygments.formatters import HtmlFormatter


The sys module provides the argv list, which contains all the arguments passed to the Python script.

highlight from pygments is the main function; together with a lexer, it generates the highlighted code. You'll read a bit more about lexers below.

HtmlFormatter determines how the generated code is formatted, and we are going to use HTML. In case you're wondering, the Pygments site has a list of available formatters.


# Get the code
language = (sys.argv[1]).lower()
filename = sys.argv[2] 
f = open(filename, 'rb')
code = f.read()
f.close()


This block takes the second argument (sys.argv[1]) and converts it to lowercase, so the language name is always lowercase, because "php" !== "PHP". The third argument, sys.argv[2], is the path of the file containing the code, so we open it, read its contents, and close it. The first argument, sys.argv[0], is the Python script's name.
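
The argument handling described above can be tried on its own with only the standard library. This is a minimal sketch, not part of build.py: the helper name read_request is hypothetical, and it assumes exactly two arguments after the script name. The temporary file stands in for the one PHP would create.

```python
import os
import tempfile


def read_request(argv):
    """Return (language, code) from an argv-style list.

    Hypothetical helper for illustration; build.py does this inline.
    argv[0] is the script name, argv[1] the language, argv[2] the file path.
    """
    if len(argv) != 3:
        raise ValueError("expected: script.py <language> <filename>")
    language = argv[1].lower()  # "PHP" and "php" should select the same lexer
    # Read bytes, as build.py does; the with block closes the file for us.
    with open(argv[2], "rb") as f:
        code = f.read()
    return language, code


# Demo with a throwaway file standing in for the one PHP would create.
fd, path = tempfile.mkstemp()
os.write(fd, b'echo "Hello";')
os.close(fd)
language, code = read_request(["build.py", "PHP", path])
os.unlink(path)
print(language, code)
```

Using a with block instead of explicit open/close means the file is closed even if reading fails.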


# Importing Lexers
# PHP
if language == 'php':
  from pygments.lexers import PhpLexer
  lexer = PhpLexer(startinline=True)

# GUESS
elif language == 'guess':
  from pygments.lexers import guess_lexer
  lexer = guess_lexer( code )

# GET BY NAME
else:
  from pygments.lexers import get_lexer_by_name
  lexer = get_lexer_by_name( language )


Now it's time to import the lexer. This block creates a lexer for the language we need to analyze. A lexer analyzes our code and picks out each reserved word, symbol, built-in function, and so forth.

In this case, once the lexer has analyzed the code, the formatter turns it into HTML, wrapping each "word" in an HTML element with a class. By the way, the class names are not descriptive at all: a function name doesn't get the class "function" but a short code such as nf. Anyway, this is not something to worry about right now.

The variable language contains the name of the language we want to highlight. We use lexer = get_lexer_by_name( language ) to get any lexer by its name; the function is self-explanatory. But why check for php and guess first, you may ask? We check for php because if we use get_lexer_by_name('php') and the PHP code lacks the opening <?php tag, the code isn't highlighted well, or at least not as we expect. So we create the PHP lexer ourselves with lexer = PhpLexer(startinline=True); with startinline=True, the opening tag is no longer required. guess is a string we pass from PHP to tell Pygments that we don't know the language, or that it wasn't provided, and that it needs to be guessed.

There is a list of available lexers on their site.

The final step in Python is creating the HTML formatter, performing the highlighting, and outputting the HTML that contains the highlighted code.


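formatter = HtmlFormatter(linenos=False, encoding='utf-8', nowrap=True)
highlighted = highlight(code, lexer, formatter)
# With an encoding set, highlight() returns bytes; write them to the raw
# stream on Python 3, or to sys.stdout itself on Python 2
getattr(sys.stdout, 'buffer', sys.stdout).write(highlighted)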


The formatter gets linenos=False so it doesn't generate line numbers, and nowrap=True so it doesn't wrap the generated code in a div. This is a personal decision; the code will be wrapped using PHP.

Next, highlight receives code, which holds the actual code; lexer, which holds the language lexer; and the formatter we created on the line above, which tells highlight how we want the code formatted.

Finally, the script outputs the highlighted code.

That's about it for Python; this is the script that builds the highlighted code.

Here is the complete file, build.py:


import sys
from pygments import highlight
from pygments.formatters import HtmlFormatter


# If there aren't exactly 2 args, something weird is going on
expecting = 2
if len(sys.argv) != expecting + 1:
  sys.exit(128)

# Get the code
language = (sys.argv[1]).lower()
filename = sys.argv[2] 
f = open(filename, 'rb')
code = f.read()
f.close()


# PHP
if language == 'php':
  from pygments.lexers import PhpLexer
  lexer = PhpLexer(startinline=True)

# GUESS
elif language == 'guess':
  from pygments.lexers import guess_lexer
  lexer = guess_lexer( code )

# GET BY NAME
else:
  from pygments.lexers import get_lexer_by_name
  lexer = get_lexer_by_name( language )
  

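# OUTPUT
formatter = HtmlFormatter(linenos=False, encoding='utf-8', nowrap=True)
highlighted = highlight(code, lexer, formatter)
# With an encoding set, highlight() returns bytes; write them to the raw
# stream on Python 3, or to sys.stdout itself on Python 2
getattr(sys.stdout, 'buffer', sys.stdout).write(highlighted)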


PHP - WordPress

Let's jump to WordPress and create a basic plugin to handle the code that needs to be highlighted.

It doesn't matter if you have never created a WordPress plugin in your life. This plugin is just a file with PHP functions in it, so you'll be fine without plugin development experience, though you do need some knowledge of WordPress development.

Create a folder inside wp-content/plugins named wp-pygments (it can be whatever you want). Copy build.py, the Python script we just created, into it, and create a new PHP file named wp-pygments.php (ideally the same name as the directory).

The code below just lets WordPress know the plugin's name and other information. It goes at the top of wp-pygments.php.


<?php
/*
 * Plugin Name: WP Pygments
 * Plugin URI: http://wellingguzman.com/wp-pygments
 * Description: A brief description of the Plugin.
 * Version: 0.1
 * Author: Welling Guzman
 * Author URI: http://wellingguzman.com
 * License: MEH
*/
?>


Add a filter on the_content to look for <pre> tags. The expected markup is:


<pre class="php">
<code>
$name = "World";
echo "Hello, " . $name;
</code>
</pre>


NOTE: HTML tags need to be encoded; for example, < needs to be &lt; so the parser doesn't get confused and get it all wrong.

The class is the language of the code inside the pre tags. If there is no class, or it is empty, guess is passed to build.py.


add_filter( 'the_content', 'mb_pygments_content_filter' );
function mb_pygments_content_filter( $content )
{
  $content = preg_replace_callback('/

preg_replace_callback executes the mb_pygments_convert_code callback every time the regex pattern matches the content: /<pre(\s?class\="(.*?)")?[^>]?.*?>.*?<code>(.*?).*?/sim. It should match any <pre><code> in a post or page's content.

What about sim? These are three pattern modifier flags. From php.net:


  • s: If this modifier is set, a dot metacharacter in the pattern matches all characters, including newlines.

  • i: If this modifier is set, letters in the pattern match both upper and lower case letters.

  • m: By default, PCRE treats the subject string as consisting of a single "line" of characters (even if it actually contains several newlines). With this modifier set, ^ and $ also match right after and right before each newline.
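
To see what the three modifiers and the capture groups do, here is a minimal sketch using Python's re module rather than PHP's PCRE. The tail of the pattern (the closing </code> and </pre> part) is an assumption: the pattern as printed above stops before it. re.S, re.I and re.M are Python's counterparts of s, i and m.

```python
import re

# Assumed pattern: the opening part follows the article's regex; the closing
# </code>...</pre> part is a guess added so that group 3 captures the code.
PRE_CODE = re.compile(
    r'<pre(\s?class="(.*?)")?[^>]?.*?>.*?<code>(.*?)</code>.*?</pre>',
    re.S | re.I | re.M,  # Python counterparts of PCRE's s, i and m
)

content = '''<p>Intro</p>
<PRE class="php">
<code>
$name = "World";
echo "Hello, " . $name;
</code>
</pre>'''

match = PRE_CODE.search(content)
language = match.group(2)      # the class value
code = match.group(3).strip()  # the text between the <code> tags
print(language)
print(code)
```

Thanks to the i flag the uppercase <PRE> still matches, and thanks to s the dots span the newlines inside the block. When the class attribute is missing, group 2 is None, which corresponds to the plugin falling back to guess.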

This can be done with DOMDocument as well. Replace the preg_replace_callback call with this:


// Collect libxml parse errors instead of emitting warnings
libxml_use_internal_errors(true);

// Get all pre from post content
$dom = new DOMDocument();
$dom->loadHTML($content);
$pres = $dom->getElementsByTagName('pre');

// Copy the live node list to an array, because nodes are replaced inside the loop
foreach (iterator_to_array($pres) as $pre) {
  // getAttribute() returns an empty string when there is no class
  $class = $pre->getAttribute('class');
  $code = $pre->nodeValue;
  
  $args = array(
    2 => $class, // Element at position [2] is the class
    3 => $code // And element at position [3] is the code
  );
  
  // convert the code
  $new_code = mb_pygments_convert_code($args);
  
  // Replace the actual pre with the new one.
  $new_pre = $dom->createDocumentFragment();
  $new_pre->appendXML($new_code);
  $pre->parentNode->replaceChild($new_pre, $pre);
}
// Save the HTML of the new code.
$content = $dom->saveHTML();


The code below is the mb_pygments_convert_code function.


define( 'MB_WPP_BASE', dirname(__FILE__) );
function mb_pygments_convert_code( $matches )
{
  $pygments_build = MB_WPP_BASE . '/build.py';
  $source_code    = isset($matches[3])?$matches[3]:'';
  $class_name     = isset($matches[2])?$matches[2]:'';
  
  // Creates a temporary filename
  $temp_file      = tempnam(sys_get_temp_dir(), 'MB_Pygments_');
  
  // Populate temporary file
  $filehandle = fopen($temp_file, "wb");
  fwrite($filehandle, html_entity_decode($source_code, ENT_COMPAT, 'UTF-8') );
  fclose($filehandle);
  
  // Creates pygments command
  $language   = $class_name?$class_name:'guess';
  // escapeshellarg() quotes each value so post content can't inject shell commands
  $command    = sprintf('python %s %s %s', escapeshellarg($pygments_build), escapeshellarg($language), escapeshellarg($temp_file));

  // Executes the command
  $retVal = -1;
  exec( $command, $output, $retVal );
  unlink($temp_file);
  
  // Returns Source Code
  $format = '<div class="highlight highlight-%s"><pre><code>%s</code></pre></div>';
  
  if ( $retVal == 0 )
    $source_code = implode("\n", $output);
    
  $highlighted_code = sprintf($format, $language, $source_code);
  
  return $highlighted_code;
}


Reviewing the code above:


define( 'MB_WPP_BASE', dirname(__FILE__) );


This defines a constant holding the absolute path of the plugin's directory.


$pygments_build = MB_WPP_BASE . '/build.py';
$source_code    = isset($matches[3])?$matches[3]:'';
$class_name     = isset($matches[2])?$matches[2]:'';


$pygments_build is the full path to the Python script. Every time there is a match, an array called $matches with four elements is passed to the callback. Take this as an example of matched code from a post or page's content:


<pre class="php">
<code>
$name = "World";
echo "Hello, " . $name;
</code>
</pre>


  • The element at position [0] is the whole <pre> match, and its value is:

    <pre class="php">
    <code>
    $name = "World";
    echo "Hello, " . $name;
    </code>
    </pre>

  • The element at position [1] is the class attribute with its value:

    class="php"

  • The element at position [2] is the class attribute's value without its name:

    php

  • The element at position [3] is the code itself, without its pre and code tags:

    $name = "World";
    echo "Hello, " . $name;
      

// Creates a temporary filename
$temp_file = tempnam(sys_get_temp_dir(), 'MB_Pygments_');


This creates a temporary file that will hold the code passed to the Python script. It's a better way to hand over the code: passing the whole thing as a command-line argument would be a mess.


// Populate temporary file
$filehandle = fopen($temp_file, "wb");
fwrite($filehandle, html_entity_decode($source_code, ENT_COMPAT, 'UTF-8') );
fclose($filehandle);


This writes the code to the file, but first we decode all the HTML entities so Pygments can convert them properly.
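
The same decode-then-write step can be sketched in Python with the standard library. Here html.unescape plays the role of html_entity_decode and tempfile.mkstemp the role of tempnam; the helper name write_decoded is hypothetical. Note that html.unescape decodes somewhat more entities than html_entity_decode does with ENT_COMPAT, so this is an approximation rather than an exact equivalent.

```python
import html
import os
import tempfile


def write_decoded(encoded_source, prefix="MB_Pygments_"):
    """Decode HTML entities and write the result to a new temporary file.

    Hypothetical helper; returns the path so a caller can pass it on and
    delete it afterwards.
    """
    decoded = html.unescape(encoded_source)  # "&lt;?php" becomes "<?php"
    fd, path = tempfile.mkstemp(prefix=prefix)
    with os.fdopen(fd, "wb") as handle:
        handle.write(decoded.encode("utf-8"))
    return path


path = write_decoded("if ($a &lt; $b &amp;&amp; $c) { echo &quot;ok&quot;; }")
with open(path, "rb") as handle:
    written = handle.read().decode("utf-8")
os.unlink(path)  # clean up, as the plugin does with unlink()
print(written)
```

Without the decoding step the highlighter would see &lt; as an ampersand followed by letters and a semicolon, not as a less-than operator.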


// Creates pygments command
$language = $class_name?$class_name:'guess';
// escapeshellarg() quotes each value so post content can't inject shell commands
$command  = sprintf('python %s %s %s', escapeshellarg($pygments_build), escapeshellarg($language), escapeshellarg($temp_file));


This builds the Python command to run, which looks like this:


python '/path/to/build.py' 'php' '/path/to/temp.file'



// Executes the command
$retVal = -1;
exec( $command, $output, $retVal );
unlink($temp_file);
  
// Returns Source Code
$format = '<div class="highlight highlight-%s"><pre><code>%s</code></pre></div>';
  
if ( $retVal == 0 )
  $source_code = implode("\n", $output);
    
$highlighted_code = sprintf($format, $language, $source_code);


This executes the command we just built. If it returns 0, everything worked fine in the Python script. exec() fills $output with an array of the lines printed by the Python script, so we join them into one string to use as the source code. If it returns anything else, we stick with the code without highlighting.
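
The run-and-fall-back logic is easy to try outside WordPress. This sketch uses Python's subprocess module in place of PHP's exec(); the function name highlight_or_original is hypothetical, and the demo commands are stand-ins for the real python build.py call, not the call itself.

```python
import subprocess
import sys


def highlight_or_original(command, original):
    """Run command; return its stdout on exit status 0, else the original text.

    Hypothetical helper mirroring the $retVal == 0 check in the plugin.
    """
    result = subprocess.run(command, capture_output=True, text=True)
    if result.returncode == 0:
        return result.stdout.rstrip("\n")
    return original


# Stand-in commands: one succeeds and prints markup, one exits with 128.
ok = [sys.executable, "-c", "print('<span>hi</span>')"]
bad = [sys.executable, "-c", "import sys; sys.exit(128)"]

good_result = highlight_or_original(ok, "hi")
fallback_result = highlight_or_original(bad, "hi")
print(good_result)
print(fallback_result)
```

Passing the command as a list means no shell is involved, so the arguments need no quoting.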

Improving it by Caching

So by now it works fine, but we should save time and processing. Imagine 100 <pre> tags in one piece of content: that would create 100 files and call the Python script 100 times, so let's cache this baby.

Transient API

WordPress provides the ability to store data temporarily in the database with the Transient API.

First, let's add an action to the save_post hook, so every time the post is saved we convert the code and cache it.


add_action( 'save_post', 'mb_pygments_save_post' );
function mb_pygments_save_post( $post_id )
{
  if ( wp_is_post_revision( $post_id ) )
    return;
    
  $content = get_post_field( 'post_content', $post_id );
  
  mb_pygments_content_filter( $content );
}


If it is a revision we don't do anything; otherwise we get the post content and call the Pygments content filter function.

Let's create some functions to handle the cache.


// Cache Functions
// Expiration time (1 month), let's clear cache every month.
define('MB_WPP_EXPIRATION', 60 * 60 * 24 * 30);

// Returns the name of a post's cache transient.
function get_post_cache_transient()
{
  global $post;
  
  $post_id = $post->ID;
  $transient = 'post_' . $post_id . '_content';
  
  return $transient;
}

// This creates a post cache for a month,
// containing the new content with pygments
// and last time the post was updated.
function save_post_cache($content)
{
  global $post;
    
  $expiration = MB_WPP_EXPIRATION;
  $value = array( 'content'=>$content, 'updated'=>$post->post_modified );
  set_transient( get_post_cache_transient(), $value, $expiration );
}

// This returns a post cache
function get_post_cache()
{
  $cached_post = get_transient( get_post_cache_transient() );
  
  return $cached_post;
}

// Check if a post needs to be updated.
function post_cache_needs_update()
{
  global $post;
  
  $cached_post = get_post_cache();
  if ( strtotime($post->post_modified) > strtotime($cached_post['updated']) )
    return TRUE;
      
  return FALSE;
}

// Delete a post cache.
function clear_post_cache()
{ 
  delete_transient( get_post_cache_transient() );
}


At the beginning of mb_pygments_content_filter(), add some lines to check whether there is a cache for the post.


function mb_pygments_content_filter( $content )
{
  if ( FALSE !== ( $cached_post = get_post_cache() ) && !post_cache_needs_update() )
    return $cached_post['content'];

  clear_post_cache();


And at the end of mb_pygments_content_filter(), add a line to save the post cache.


save_post_cache( $content );
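
The cache check boils down to two questions: is there an entry, and is it at least as new as the post? A minimal in-memory sketch of that logic follows. The dictionary stands in for the transient storage, and the names PostCache, save and get are hypothetical, not WordPress APIs.

```python
import time

EXPIRATION = 60 * 60 * 24 * 30  # one month, as in MB_WPP_EXPIRATION


class PostCache:
    """In-memory stand-in for the transient-based post cache (illustrative)."""

    def __init__(self, now=time.time):
        self._entries = {}
        self._now = now  # injectable clock, handy for testing expiry

    def save(self, post_id, content, modified):
        expires = self._now() + EXPIRATION
        self._entries[post_id] = (content, modified, expires)

    def get(self, post_id, modified):
        """Return cached content, or None if missing, expired, or stale."""
        entry = self._entries.get(post_id)
        if entry is None:
            return None
        content, cached_modified, expires = entry
        if self._now() >= expires or modified > cached_modified:
            del self._entries[post_id]  # same effect as clear_post_cache()
            return None
        return content


cache = PostCache()
cache.save(7, "<div>highlighted</div>", modified=100)
fresh = cache.get(7, modified=100)  # post unchanged: cache hit
stale = cache.get(7, modified=200)  # post edited later: cache miss
print(fresh, stale)
```

A miss returns None, which is the cue to run the highlighter again and save the new result.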


Finally, when the plugin is uninstalled we need to remove all the cache entries we created. This is a bit tricky, so we use the $wpdb object to delete them all with a query.


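register_uninstall_hook(__FILE__, 'mb_wp_pygments_uninstall');
function mb_wp_pygments_uninstall() {
  global $wpdb;
  
  // $wpdb->options respects a custom table prefix; each transient also has a _transient_timeout_ row
  $wpdb->query( "DELETE FROM `{$wpdb->options}` WHERE option_name LIKE '_transient_post_%_content' OR option_name LIKE '_transient_timeout_post_%_content'" );
}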

Original article: https://davidwalsh.name/pygments-php-wordpress
