Online Code Highlighting: Source Code 02

Reposted 2004-09-05 18:01:00


Sample Image - highlight.png

New languages supported: JScript, VBScript, C, XML!

Introduction

Have you ever wondered how the CP team highlights the source code in their edited articles? I suppose it is not done by hand, and they must have some clever code to do it.

However, if you look around the forums on the web, you will see that few, if any, have this feature. That is a pity, because colored source code is much easier to read. In fact, it would be great to have source code in forums automatically colored with your favorite coloring scheme.

The last-but-not-least reason for writing this article was to learn regular expressions, JavaScript and DOM in one project.

The source code is written entirely in JScript, so it can be included server-side or client-side in your web pages.

The techniques used are:

  • regular expressions
  • XML DOM
  • XSL transformation
  • CSS style

In this article, I assume that you have little knowledge of regular expressions, DOM and XSLT; I am also a newbie in those 3 topics.

Live Demo

Before starting to explain the "colorizing" process, you can play with the demo below (you need to enable JavaScript). Copy and paste any HTML text or use the "Add ..." to generate built-in examples. The source code must be formatted as follows.

For boxed code, use the pre tag:

<pre lang="...">
source code
</pre>
where ... describes the language: "c" -> C, "cpp" -> C++, "jscript" -> JavaScript, "vbscript" -> VBScript, "xml" -> XML.
For inline code, use the code tag.


Transformation overview

Parsing pipe

All the boxes will be discussed in detail in the next chapter. Here I give a short overview of the process.

First, a language syntax specification file is loaded (Language specification box). This specification is a plain XML file supplied by the user. To speed things up, this document is preprocessed (Preprocessing box).

Suppose for simplicity that we have the source code to colorize (Code box); I will show later how to apply the coloring to a whole HTML page. The parser, using the preprocessed syntax document, builds an XML document representing the parsed code (Parsing box). The parser works by splitting the code into a succession of nodes of different types: keyword, comment, literal, etc.

Finally, an XSLT transformation is applied to the parsed code document to render it as HTML, and a CSS style is applied to give the desired appearance.

Parsing procedure

The philosophy used to build the parser is inspired by the Kate documentation (see [1]).

The code is considered as a succession of contexts. For example, in C++:

  • keyword: if, else, while, etc...
  • preprocessor instruction: #ifdef, ...
  • literals: "..."
  • line comment: // ...
  • block comment: /* ... */
  • and the rest.

For each context, we define rules that have 3 properties:

  1. a regular expression for matching a string
  2. the context of the text matched by the rule: attribute
  3. the context of the text following the rule: context

The rules are ordered by priority. For example, we first look for a /* ... */ block comment, then a // ... line comment, then a string literal, etc.

When a rule is matched using a regular expression, the matched string is assigned the context named by attribute, the current context is updated to the one named by context, and the parsing continues. The diagram shows the possible paths between contexts. As one can see, some rules do not lead to a new context.

Context dynamics

Let me explain this diagram a bit. Consider that we are in the code context. We are going to look for the first match of the code rules: /**/, //, "...", keyword. Moreover, we have to take their priorities into account: a keyword is not really a keyword inside a block comment, so it has a lower priority. This task is easily and naturally done with regular expressions.

Once we find a match, we look for the rule that triggered it (always following the priority of the rules). Therefore, a pathological line like the following is parsed correctly:

// a keyword while in a comment
Here, while is not considered a keyword since it is inside a comment.

Rules available

There are 5 rules currently available:

  1. detect2chars: detects a pattern made of 2 characters.
  2. detectchar: detects a pattern made of 1 character.
  3. linecontinue: detects the end of a line.
  4. keyword: detects a keyword out of a keyword family.
  5. regexp: matches a regular expression.

regexp is by far the most powerful rule of all as all other rules are represented internally by regular expressions.
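
As an illustration of that last point, here is a minimal sketch of how each rule type could be turned into a regular expression source string. The helper names (escapeRegExp, ruleToRegExpSource), the shape of the rule object and the pattern chosen for linecontinue are assumptions made for this example; they are not taken from the article's source code.

```javascript
// Hypothetical helper: escapes characters that are special in a regular expression.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical mapping from a rule description to a regular expression source.
// The rule object shape is an assumption made for this illustration.
function ruleToRegExpSource(rule) {
  switch (rule.type) {
    case "detect2chars":
      // two literal characters, e.g. "/" followed by "*"
      return escapeRegExp(rule.char) + escapeRegExp(rule.char1);
    case "detectchar":
      // one literal character
      return escapeRegExp(rule.char);
    case "linecontinue":
      // assumption: an end of line is a line feed, optionally preceded by a carriage return
      return "\\r?\\n";
    case "keyword":
      // keywords joined with | and delimited by word boundaries
      return "\\b(" + rule.keywords.map(escapeRegExp).join("|") + ")\\b";
    case "regexp":
      // already a regular expression
      return rule.expression;
    default:
      throw new Error("Unknown rule type: " + rule.type);
  }
}

var blockCommentStart = ruleToRegExpSource({ type: "detect2chars", char: "/", char1: "*" });
```

Escaping the literal characters matters here: without it, the * of /* would be read as a quantifier rather than as a character to match.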

Language Specification

From the rules and contexts above, we derive an XML structure as described in the XSD schema below (I don't really understand XSD, but .NET generates this nice diagram...).

Language specification schema. Click on the image to view it full size.

I will briefly discuss the language specification file here. For more details, look at the XSD schema or at the highlight.xml specification file (for C++). Basically, you must define families of keywords, choose contexts and write the rules to pass from one to another.

Nodes

In the Type column, E denotes an element and A an attribute.

Name | Type | Parent Node | Description
highlight | root | none | The root node
needs-build | A (optional) | highlight | "yes" if the file needs preprocessing
save-build | A (optional***) | highlight | "yes" if the file has to be saved after preprocessing
keywordlists | E | highlight | Node containing families of keywords as children
keywordlist | E | keywordlists | A family of keywords
id | A | keywordlist | String identifier
pre | A (optional) | keywordlist | Regular expression to prepend to the keyword
post | A (optional) | keywordlist | Regular expression to append to the keyword
regexp | A (optional*) | keywordlist | Regular expression matching the keyword family. Built by the preprocessor
kw | E | keywordlist | Text or CDATA node containing the keyword
languages | E | highlight | Node containing languages as children
language | E | languages | A language specification
contexts | E | language | A collection of context nodes
default | A | contexts | String identifying the default context
context | E | contexts | A context node containing rules as children
id | A | context | String identifier
attribute | A | context | The name of the node in which the context will be stored
detect2chars** | E | context | Rule to detect a pair of characters (e.g. /*)
char | A | detect2chars | First character of the pattern
char1 | A | detect2chars | Second character of the pattern
detectchar** | E | context | Rule to detect one character (e.g. ")
char | A | detectchar | Character to match
keyword** | E | context | Rule to match a family of keywords
family | A | keyword | Family identifier, must match /highlight/keywordlists/keywordlist[@id]
regexp | E | context | A regular expression to match
expression | A | regexp | The regular expression
Comments:
  • *: this attribute is optional provided that preprocessing takes place. The usual approach is either to always preprocess, or to preprocess once with the "save-build" parameter set to "yes" so that the preprocessed file is saved. Note that if you modify the language syntax, you will have to preprocess again.
  • **: all these elements have two other attributes:
    Name | Type | Parent Node | Description
    attribute | A (optional) | a rule | The name of the node in which the matched string will be stored. If not set or equal to "hidden", no node is created.
    context | A | a rule | The next context.
  • ***: Client-side JavaScript is not allowed to write files. Hence, this option applies only to server-side execution.

Preprocessing

In the preprocessing phase, we build the regular expressions that will be used later on to match the rules. This section makes extensive use of regular expressions. As mentioned before, this is not a tutorial on regular expressions, since I am also a newbie in that topic. A tool that I have found really useful is Expresso (see [3]), a regular expression testing tool.

Keyword families

Building the regular expression of a keyword family is straightforward. You just need to concatenate the keywords together using |:
<keywordlist ...>
    <kw>if</kw>
    <kw>else</kw>
</keywordlist>
will be matched by
\b(if|else)\b

The generated regular expression is added as an attribute to the keywordlist node:

<keywordlist regexp="\b(if|else)\b">
    <kw>if</kw>
    <kw>else</kw>
</keywordlist>

When using function libraries, it is common for the function names to share a prefix, as in OpenGL:

glVertex2f, glPushMatrix(), etc.
You can skip the hassle of rewriting gl in all the kw items by using the attribute pre, which takes a regular expression as its value:
<keywordlist pre="gl" ...>
    <kw>Vertex2f</kw>
    <kw>PushMatrix</kw>
</keywordlist>
will be matched by
\bgl(Vertex2f|PushMatrix)\b
You can also append a regular expression after the keyword using post. Still working on our OpenGL example, some functions end with characters that indicate the type of their parameters:
  • glCoord2f: takes 2 floats,
  • glRaster3f: takes 3 floats,
  • glVertex4v: takes an array of floats of size 4
Using post and regular expression, we can match it easily:
<keywordlist pre="gl" post="[2-4]{1}(f|v){1}" ...>
    <kw>Vertex</kw>
    <kw>Raster</kw>
</keywordlist>
will be matched by
\bgl(Vertex|Raster)[2-4]{1}(f|v){1}\b
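
The construction described in this section can be sketched in a few lines. The function name buildKeywordRegExp and its parameters are hypothetical; the sketch only assumes what the text states, namely that the keywords are joined with |, wrapped between the optional pre and post fragments, and delimited by word boundaries. It also assumes the keywords contain no characters that are special in a regular expression.

```javascript
// Hypothetical helper: builds the regular expression source for a keyword family.
// `pre` and `post` are optional regular expression fragments, as in the
// pre and post attributes of keywordlist.
function buildKeywordRegExp(keywords, pre, post) {
  return "\\b" + (pre || "") + "(" + keywords.join("|") + ")" + (post || "") + "\\b";
}

// Plain family: if, else
var plain = buildKeywordRegExp(["if", "else"]);

// OpenGL-like family with a common prefix and a typed suffix
var gl = buildKeywordRegExp(["Vertex", "Raster"], "gl", "[2-4]{1}(f|v){1}");

// The source strings can then be compiled when needed
var glRegExp = new RegExp(gl);
```

Because of the word boundaries, a keyword that is only part of a longer identifier (for example else inside elsewhere) is not matched.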

String literals

This is a little exercise on regular expressions: how do you match a string literal in C++? Remember that it must support the escaped quote \" and line continuation with \ at the end of a line.

My answer (remember I'm a newbie) is

"(.|\\"|\\\r\n)*?((\\\\)+"|[^\\]{1}")
I tested this expression on the following strings:
"a simple string"
---
"a less \" simple string"
---
"a even less simple string \\"
---
"a double line\
string"
---
"a double line string does not work without
backslash"
---
"Mixing" string "can\"" become "tricky"
---
"Mixing  \" nasty" string is \" even worst"

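For comparison, here is a small sketch that runs a different, widely used pattern for C-style string literals against some of the test strings. It is not the expression given above: it matches an opening quote, then any run of ordinary characters or backslash escapes (including a backslash followed by a line break), then a closing quote. The function name findStringLiterals is hypothetical.

```javascript
// Alternative pattern (not the author's expression):
//   [^"\\\r\n]            any character except quote, backslash and line breaks
//   \\(?:\r\n|[\s\S])     a backslash followed by CR LF or by any single character
var stringLiteral = /"(?:[^"\\\r\n]|\\(?:\r\n|[\s\S]))*"/g;

// Returns every string literal found in the source, in order of appearance.
function findStringLiterals(source) {
  return source.match(stringLiteral) || [];
}

var samples = [
  '"a simple string"',
  '"a less \\" simple string"',
  '"a double line\\\nstring"',
  '"Mixing" string "can\\"" become "tricky"'
];

var results = samples.map(findStringLiterals);
```

With this pattern, a literal that spans two lines without a trailing backslash is not matched, which is the behavior wanted for the "does not work without backslash" test case.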
 

Contexts

The regular expression of a context is also built by concatenating the regular expressions of its rules. The value is added as an attribute to the context node:

<context regexp="(...|...)">
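
A minimal sketch of this concatenation, together with the later step of finding which rule produced a match, could look as follows. The rule list, the rule names and the functions buildContextRegExp and findRule are written for this example only; the article's findRule works on XML nodes, which is not reproduced here.

```javascript
// Hypothetical rules of a "code" context, listed in priority order.
var rules = [
  { name: "blockcomment", source: "/\\*" },
  { name: "linecomment", source: "//" },
  { name: "string", source: "\"" },
  { name: "keyword", source: "\\b(if|else|while)\\b" }
];

// Concatenates the rule expressions with | to obtain the context expression.
function buildContextRegExp(rules) {
  var parts = rules.map(function (rule) {
    return "(?:" + rule.source + ")";
  });
  return new RegExp(parts.join("|"));
}

// Returns the first rule, in priority order, whose expression matches
// the whole matched text, or null if there is none.
function findRule(rules, matchedText) {
  for (var i = 0; i < rules.length; i++) {
    if (new RegExp("^(?:" + rules[i].source + ")$").test(matchedText)) {
      return rules[i];
    }
  }
  return null;
}

var contextRegExp = buildContextRegExp(rules);
var match = contextRegExp.exec("x = 1; // while");
var matchedRule = findRule(rules, match[0]);
```

In the usage lines, the first match is the // of the line comment, because it occurs before the word while in the text; the keyword rule never gets a chance to fire on that line once the context switches to the comment.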

Controlling if preprocessing is necessary

It is possible to skip the preprocessing phase or to save the "preprocessed" language specification file. This is done by setting the following attributes on the root node highlight:
Attribute | Description | Default
need-build | "yes" if the file needs preprocessing | yes
save-build | "yes" if the preprocessed language specification is to be saved to disk | no

Javascript call

The preprocessing phase is done through the javascript method loadAndBuildSyntax:

// language specification file
var sXMLSyntax = "highlight.xml";
// loading is done by loadXML
// preprocessing is done in loadAnd... It returns a DOMDocument
var xmlDoc = loadAndBuildSyntax( loadXML( sXMLSyntax ) );

Parsing

We are going to use the language syntax above to build an XML tree out of the source code. This tree will be made out of successive context nodes.

We can start parsing the string (pseudo-code below):

source = source code;
context = code; // current context
regExp = context.regexp; // regular expression of the current context
while( source.length > 0)
{
Here we follow the procedure:
  1. find the first match of the context rules
  2. store the source before the match
  3. find the rule that was matched
  4. process the rule parameters
    match = regExp.execute( source );
    // check if the rules matched something
    if( !match)
    {
        // no match: create a node with the remaining source and finish.
        addChildNode( context,  // name of the node
            source );           // content of the node
        break;
    }
    else
    {
The source before the match has to be stored in a new node:
        addChildNode( context, source before match);
We now have to find the rule that has matched. This is done by the method findRule, which returns the rule node. The rule is then processed using its attribute and context parameters.
        // getting the rule node
        ruleNode = findRule( match );
        // testing if the matched string has to be stored
        // if yes, adding it
        if (ruleNode.attribute != "hidden")
            addChildNode( ruleNode.attribute, match);
        // getting new context
        context=ruleNode.context;
        // getting new regular expression
        regExp=context.regexp;
        // continuing with the source after the match
        source = source after match;
    }
}
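
The loop above can be sketched as runnable JavaScript. This is a simplified illustration under stated assumptions, not the article's applyRules: the language description is a plain object instead of an XML document, the result is an array of { name, text } nodes instead of an XML tree, and each rule is wrapped in one capturing group so that the matching rule can be identified by its group index (which requires that the rule expressions themselves use only non-capturing groups). All names are hypothetical.

```javascript
// Hypothetical language description: each context has an attribute (node name)
// and a priority-ordered list of rules. A rule has a regular expression source,
// an attribute (node name for the matched text, or "hidden") and the next context.
var contexts = {
  code: {
    attribute: "code",
    rules: [
      { source: "/\\*", attribute: "hidden", context: "blockcomment" },
      { source: "//", attribute: "hidden", context: "linecomment" },
      { source: "\\b(?:int|char|return)\\b", attribute: "keyword", context: "code" }
    ]
  },
  blockcomment: {
    attribute: "blockcomment",
    rules: [{ source: "\\*/", attribute: "hidden", context: "code" }]
  },
  linecomment: {
    attribute: "linecomment",
    rules: [{ source: "\\r?\\n", attribute: "code", context: "code" }]
  }
};

function parse(source, contexts, startContext) {
  var nodes = [];
  var contextId = startContext;
  while (source.length > 0) {
    var context = contexts[contextId];
    // context expression: one capturing group per rule, joined with |
    var regExp = new RegExp(context.rules.map(function (rule) {
      return "(" + rule.source + ")";
    }).join("|"));
    var match = regExp.exec(source);
    if (!match) {
      // no match: the remaining source belongs to the current context
      nodes.push({ name: context.attribute, text: source });
      break;
    }
    // store the source before the match
    if (match.index > 0) {
      nodes.push({ name: context.attribute, text: source.substring(0, match.index) });
    }
    // find the rule that matched: group i + 1 belongs to rule i
    var rule = null;
    for (var i = 0; i < context.rules.length && !rule; i++) {
      if (match[i + 1] !== undefined) rule = context.rules[i];
    }
    // store the matched text unless the rule is hidden
    if (rule.attribute !== "hidden") {
      nodes.push({ name: rule.attribute, text: match[0] });
    }
    // switch context and continue after the match
    contextId = rule.context;
    source = source.substring(match.index + match[0].length);
  }
  return nodes;
}

var nodes = parse("int x; // int y\nreturn x;", contexts, "code");
```

In the usage line, the int inside the comment ends up in the linecomment node rather than in a keyword node, because the // rule switches the context before the keyword rule can see it.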

At the end of this method, we have built an XML tree containing the contexts. For example, consider the classic "Hello world" program below:

int main(int argc, char* argv[])
{
    // my first program
    cout<<"Hello world";
    return -1;
};
This sample is translated into the following XML structure:
<parsedcode lang="cpp" in-box="-1">
  <reservedkeyword>int</reservedkeyword>
  <code> main(</code>
  <reservedkeyword>int</reservedkeyword>
  <code> argc, </code>
  <reservedkeyword>char</reservedkeyword>
  <code> * argv[])
{
</code>
...
Here is the specification of the resulting XML file:
Node Name | Type | Parent Node | Description
parsedcode | root | | Root node of the document
lang | A | parsedcode | Type of language: c, cpp, jscript, etc.
in-box | A | parsedcode | -1 if the code should be enclosed in a pre tag, otherwise in a code tag
code | E | parsedcode | Non-special source code
and others... | E | parsedcode |

Javascript call

The algorithm above is implemented in the applyRules method:

applyRules( languageNode, contextNode, sCode, parsedCodeNode);
where
  • languageNode is the current language node (XMLDOMNode),
  • contextNode is the start context node (XMLDOMNode),
  • sCode is the source code (String),
  • parsedCodeNode is the parent node of the parsed code (XMLDOMNode)

XSLT transformation

Once you have the XML representation of your code, you can basically do whatever you want with it using XSLT transformations.

Header

Every XSL file starts with some declarations and other standard options:

<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output encoding="ISO-8859-1" indent="no" omit-xml-declaration="yes"/>

Since the indentation of the source code has to be preserved, we disable automatic indenting; the XML declaration is also omitted:

<xsl:output encoding="ISO-8859-1" indent="no" omit-xml-declaration="yes"/>

Basic templates

<xsl:template match="cpp-linecomment">
<span class="cpp-comment">//<xsl:value-of select="text()"   disable-output-escaping="yes" /></span>
</xsl:template>

This template applies to the node cpp-linecomment, which corresponds to a single-line comment in C++.
We apply the CSS style to this node by wrapping it in span tags and specifying the CSS class.
Moreover, we do not want character escaping for its content, so we use

<xsl:value-of select="text()"   disable-output-escaping="yes" />

The parsedcode template

It gets a little complicated here. As everybody knows, XSL quickly becomes really complicated once you want to write more advanced stylesheets. Below is the template for parsedcode; it does a simple thing but looks ugly:
It checks the in-box attribute: if the code is boxed, it creates pre tags, otherwise it creates an inline element.

<xsl:template match="parsedcode">
 <xsl:choose>
  <xsl:when test="@in-box[.=0]">
   <xsl:element name="span">
    <xsl:attribute name="class">cpp-inline</xsl:attribute>
    <xsl:attribute name="lang"><xsl:value-of select="@lang"/></xsl:attribute>
    <xsl:apply-templates/>
   </xsl:element>
  </xsl:when>
  <xsl:otherwise>
   <xsl:element name="pre">
    <xsl:attribute name="class">cpp-pre</xsl:attribute>
    <xsl:attribute name="lang"><xsl:value-of select="@lang"/></xsl:attribute>
    <xsl:apply-templates/>
   </xsl:element>
  </xsl:otherwise>
 </xsl:choose>
</xsl:template>
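
To make the logic of this template easier to follow, here is the same decision written as a plain JavaScript function. This is only an illustration of what the stylesheet does, not a replacement used by the article: the input is assumed to be an object with lang, inBox and a list of { name, text } nodes (a hypothetical stand-in for the parsedcode XML), and, unlike the stylesheet, the sketch escapes the text for HTML.

```javascript
// Escapes the characters that are special in HTML text.
function escapeHtml(text) {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Hypothetical renderer mirroring the parsedcode template:
// boxed code goes into <pre class="cpp-pre">, inline code into <span class="cpp-inline">.
function renderParsedCode(parsed) {
  var body = parsed.nodes.map(function (node) {
    if (node.name === "code") {
      // ordinary code is emitted without a span
      return escapeHtml(node.text);
    }
    // any other node gets a span whose CSS class is derived from its name
    return '<span class="' + parsed.lang + "-" + node.name + '">' +
      escapeHtml(node.text) + "</span>";
  }).join("");
  var tag = parsed.inBox ? "pre" : "span";
  var cssClass = parsed.inBox ? "cpp-pre" : "cpp-inline";
  return "<" + tag + ' class="' + cssClass + '" lang="' + parsed.lang + '">' +
    body + "</" + tag + ">";
}

var html = renderParsedCode({
  lang: "cpp",
  inBox: true,
  nodes: [
    { name: "keyword", text: "int" },
    { name: "code", text: " a<b;" }
  ]
});
```

The class naming scheme (language name, a dash, then the node name) is an assumption of this sketch; the stylesheet defines its own class for each node type.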

Javascript call

This is where you have to customize the methods a bit. The rendering is done in the method highlightCode:

highlightCode( sLang, sRootTag, bInBox, sCode)
where

  • sLang is a string identifying the language ("cpp" for C++),
  • sRootTag is the name of the node that will enclose the code. For example, pre for boxed code, code for inline code,
  • bInBox is a boolean set to true if in-box has to be set to true,
  • sCode is the source code,
  • it returns the modified code.

The file names are hardcoded inside the highlightCode method: highlight.xml for the language specification, highlight.xsl for the stylesheet. In the article, the XML syntax is embedded in an xml tag and is simply accessed using its id.

Applying code transformation to an entire HTML page.

So now you are wondering how to apply this transformation to an entire HTML page? Surprisingly, this can be done in... 2 lines! The method String::replace(regExp, replace) replaces the substrings matching the regular expression regExp with replace. The best part of the story is that replace can be a function... So we (almost) just need to pass highlightCode and we are done.

For example, we want to match the code enclosed in pre tags:

// this is javascript
var regExp=/<pre>(.|\n)*?<\/pre>/gim;
// render xml
var sValue =  sValue.replace( regExp,
        function( $0 )
        {
            return highlightCode("cpp", "cpp",$0.substring( 5, $0.length-6 ));
        }
    );

In practice, some checks are made on the language name, and all these computations are hidden in the replaceCode method.
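
The replacement mechanism can be tried in isolation with a stand-in for highlightCode. In the sketch below, highlightCodeStub and replaceCodeSketch are hypothetical names: the stub only wraps and upper-cases the code so that the effect of the callback is visible, and the default language "cpp" for a pre tag without a lang attribute is an assumption of this example.

```javascript
// Stand-in for highlightCode: it does no parsing, it only marks the code
// so that the effect of the replacement can be observed.
function highlightCodeStub(sLang, sCode) {
  return '<pre class="' + sLang + '-pre">' + sCode.toUpperCase() + "</pre>";
}

// Replaces every <pre> or <pre lang="..."> block in an HTML string.
function replaceCodeSketch(sHtml) {
  // group 1: optional language name, group 2: the code itself
  // [\s\S]*? lazily matches any characters, including line breaks
  var regExp = /<pre(?:\s+lang="([^"]*)")?>([\s\S]*?)<\/pre>/gi;
  return sHtml.replace(regExp, function (whole, lang, code) {
    // the callback receives the whole match followed by the captured groups
    return highlightCodeStub(lang || "cpp", code);
  });
}

var page = replaceCodeSketch('<p>intro</p><pre lang="c">int x;</pre>');
```

Using capture groups for the language and the code avoids computing substring offsets by hand, which would otherwise change with the length of the opening tag.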

Using the methods in your web site

ASP pages

To use the highlighting scheme in your ASP web site:
  1. Put the javascript code between script tags in an asp page:
    <script language="javascript" runat="server">
    ...
    </script>
    
  2. include this page where you need it
  3. modify the method processAndHighlightCode to suit your needs
  4. modify the method handleException to redirect the exception to the Response
  5. apply this method to the HTML code you want to modify
  6. update your css style with the corresponding classes.

Demonstration application

The demonstration application is a hack of the CodeProject Article Helper. Type code inside pre or code tags to see the results.

Update History

Date Description
02-20-2002
  • Added demonstration in the article!
  • Added new languages: JScript, VBScript, C, XML
  • Now handling <pre lang="..."> bracketing: you can specify the language of the code.
  • loadAndBuildSyntax takes a DomDocument as parameter. You can call it like this: loadAndBuildSyntax( loadXML( sFileName ))
  • highlightCode takes one more argument: bInBox.
02-17-2002 Minor changes in stylesheet
02-14-2003
  • Added pre, post to the keyword rule
  • The text disappearing in <code> brackets is fixed. The bug was in processAndHighlightArcticle (bad function argument).
02-13-2003 Initial release.
