Txtmark - Java markdown processor
Copyright (C) 2011-2015 René Jeschke rene_jeschke@yahoo.de
See LICENSE.txt for licensing information.
Txtmark is yet another markdown processor for the JVM.
It is easy to use:
String result = txtmark.Processor.process("This is ***TXTMARK***");
It is fast (see below)
... well, it is the fastest markdown processor on the JVM right now.
(This might be outdated, but txtmark is still flippin' fast.)
It does not depend on other libraries, so classpathing txtmark.jar is
sufficient to use Txtmark in your project.
For an in-depth explanation of markdown have a look at the original Markdown Syntax.
Maven repository
Txtmark is available on Maven Central.
Txtmark extensions
To enable Txtmark's extended markdown parsing you can use the $PROFILE$ mechanism:
[$PROFILE$]: extended
This seemed to me the easiest and safest way to enable different behaviours.
Just put this line into your Txtmark file like you would use reference links.
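As an illustration of how such a marker line can be recognized, here is a minimal sketch. It is not Txtmark's actual parser: the class name `ProfileSniffer`, the method `usesExtendedProfile`, and the tolerated whitespace are assumptions made for this example.

```java
import java.util.regex.Pattern;

final class ProfileSniffer {
    // Assumed shape: up to three leading spaces, then "[$PROFILE$]: extended" alone on its line.
    private static final Pattern EXTENDED = Pattern.compile(
            "^ {0,3}\\[\\$PROFILE\\$\\]:[ \\t]*extended[ \\t]*$", Pattern.MULTILINE);

    /** Returns true if the document contains the extended-profile marker line. */
    static boolean usesExtendedProfile(String markdown) {
        return EXTENDED.matcher(markdown).find();
    }
}
```

A caller could run this check on the raw input before deciding which behaviour to expect from the rendered output.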
Behavior changes when using [$PROFILE$]: extended
Lists and code blocks end a paragraph
In normal markdown the following:
This is a paragraph
* and this is not a list
Will produce:
This is a paragraph
* and this is not a list
When using Txtmark extensions this changes to:
This is a paragraph
- and this is not a list
Text anchors
Headlines and list items may receive an ID which
you can refer to using links.
## Headline with ID ## {#headid}
Another headline with ID {#headid2}
------------------------
* List with ID {#listid}
Links: [Foo] (#headid)
This will produce:
Headline with ID
Another headline with ID
- List with ID
Links: Foo
The ID must be the last thing on the first line.
All spaces before {# get removed, so you can't
use an ID and a manual line break in the same line.
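The rule about trailing IDs can be sketched as follows. This is only an illustration under assumptions, not Txtmark's implementation: `AnchorSplitter` and `split` are hypothetical names, and the ID is assumed to contain no whitespace or closing brace.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AnchorSplitter {
    // Text, then any spaces or tabs (dropped), then "{#id}" as the last thing on the line.
    private static final Pattern TRAILING_ID =
            Pattern.compile("(.*?)[ \\t]*\\{#([^}\\s]+)\\}");

    /** Returns {text, id}; id is null when the line carries no trailing "{#...}". */
    static String[] split(String line) {
        Matcher m = TRAILING_ID.matcher(line);
        if (!m.matches()) {
            return new String[] { line, null };
        }
        return new String[] { m.group(1), m.group(2) };
    }
}
```

Because every space in front of `{#` is discarded, the two trailing spaces of a manual line break cannot survive on the same line.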
Auto HTML entities
(C) becomes &copy; - ©
(R) becomes &reg; - ®
(TM) becomes &trade; - ™
-- becomes &ndash; - –
--- becomes &mdash; - —
... becomes &hellip; - …
<< becomes &laquo; - «
>> becomes &raquo; - »
"Hello" becomes &ldquo;Hello&rdquo; - “Hello”
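A minimal sketch of the replacements listed above, using the standard HTML entity names. It is an illustration only: `AutoEntities` is a hypothetical class, and unlike a real processor it does not skip code spans or raw HTML.

```java
final class AutoEntities {
    /** Applies the replacements; "---" is handled before "--" so it is not split up. */
    static String apply(String text) {
        String out = text
                .replace("(C)", "&copy;")
                .replace("(R)", "&reg;")
                .replace("(TM)", "&trade;")
                .replace("---", "&mdash;")
                .replace("--", "&ndash;")
                .replace("...", "&hellip;")
                .replace("<<", "&laquo;")
                .replace(">>", "&raquo;");
        // Pair up straight double quotes: "Hello" -> &ldquo;Hello&rdquo;
        return out.replaceAll("\"([^\"]*)\"", "&ldquo;$1&rdquo;");
    }
}
```

The order of the dash replacements matters: if `--` were replaced first, `---` would turn into an en dash followed by a stray hyphen.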
Underscores (Emphasis)
Underscores in the middle of a word don't result in emphasis.
Con_cat_this
normally produces this (with "cat" emphasized):
Con<em>cat</em>this
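One way to picture the extended rule is an emphasis pattern that refuses underscores glued to letters or digits. The sketch below is an assumption-laden illustration, not Txtmark's code; `UnderscoreEmphasis` is a hypothetical name.

```java
final class UnderscoreEmphasis {
    /** Wraps _text_ in <em> only when neither underscore touches a letter, digit or underscore on the outside. */
    static String apply(String text) {
        return text.replaceAll(
                "(?<![\\p{L}\\p{N}_])_([^_\\s][^_]*?)_(?![\\p{L}\\p{N}_])",
                "<em>$1</em>");
    }
}
```

With this pattern `Con_cat_this` passes through untouched, while a free-standing `_cat_` is still emphasized.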
Superscript
You can use ^ to mark a span as superscript.
2^2^ = 4
turns into
2<sup>2</sup> = 4
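The superscript span can be sketched with a single regular expression. This is an illustration only; `Superscript` is a hypothetical class, and the sketch assumes a span contains no whitespace, which may be stricter than Txtmark itself.

```java
final class Superscript {
    /** Turns ^text^ into <sup>text</sup>; the span may not contain '^' or whitespace (assumption). */
    static String apply(String text) {
        return text.replaceAll("\\^([^^\\s]+)\\^", "<sup>$1</sup>");
    }
}
```

For example, `Superscript.apply("2^2^ = 4")` returns `2<sup>2</sup> = 4`.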
Abbreviations
Abbreviations are defined like reference links, but with a * in place
of the link target. A definition must fit on a single line.
[Git]: * "Fast distributed revision control system"
and used like this
This is [Git]!
which will produce
This is <abbr title="Fast distributed revision control system">Git</abbr>!
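The two steps (collect definitions, then substitute uses) can be sketched like this. It is a simplified illustration, not Txtmark's implementation: `Abbreviations` and `render` are hypothetical names, and the title is inserted without HTML escaping.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class Abbreviations {
    // Definition line, e.g.: [Git]: * "Fast distributed revision control system"
    private static final Pattern DEFINITION =
            Pattern.compile("\\[([^\\]]+)\\]:[ \\t]*\\*[ \\t]*\"(.*)\"[ \\t]*");

    static String render(String markdown) {
        Map<String, String> titles = new LinkedHashMap<>();
        StringBuilder body = new StringBuilder();
        for (String line : markdown.split("\n")) {
            Matcher m = DEFINITION.matcher(line);
            if (m.matches()) {
                titles.put(m.group(1), m.group(2)); // remember the definition, drop the line
            } else {
                body.append(line).append('\n');
            }
        }
        String out = body.toString();
        for (Map.Entry<String, String> e : titles.entrySet()) {
            String abbr = "<abbr title=\"" + e.getValue() + "\">" + e.getKey() + "</abbr>";
            out = out.replace("[" + e.getKey() + "]", abbr);
        }
        return out;
    }
}
```

Collecting all definitions before substituting means a definition may appear after its first use, just like a reference link.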
Fenced code blocks
```
This is code!
```
~~~
Another code block
~~~
~~~
You can also mix flavours
```
Fenced code block delimiter lines start with at least three ` or ~ characters.
It is possible to add meta data to the opening line. Everything following the ` or ~ characters is then considered meta data. These are all valid meta lines:
```python
~ ~ ~ ~ ~java
``` ``` ``` this is even more meta
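To make the delimiter rule concrete, here is a small sketch that extracts the meta data from a delimiter line. It reflects one reading of the examples above, not Txtmark's actual parser: `FenceLine` and `parseMeta` are hypothetical names, and the sketch assumes the fence characters on one line are all the same and may be separated by spaces.

```java
final class FenceLine {
    /**
     * Returns the meta text of a fence delimiter line, or null if the line is not a fence.
     * Assumed rule: at least three '`' or three '~', optionally separated by spaces;
     * whatever follows is meta data.
     */
    static String parseMeta(String line) {
        String s = line.trim();
        if (s.isEmpty()) {
            return null;
        }
        char fence = s.charAt(0);
        if (fence != '`' && fence != '~') {
            return null;
        }
        int count = 0;
        int pos = 0;
        // Consume the leading run of fence characters and spaces, counting only fence characters.
        while (pos < s.length() && (s.charAt(pos) == fence || s.charAt(pos) == ' ')) {
            if (s.charAt(pos) == fence) {
                count++;
            }
            pos++;
        }
        return count >= 3 ? s.substring(pos).trim() : null;
    }
}
```

An empty string as the result means a valid fence without meta data, which a BlockEmitter would treat as a plain code block.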
The meta information that you provide here can be used with a BlockEmitter to include e.g. syntax highlighted code blocks. Here's an example:
    import java.io.IOException;
    import java.util.List;

    import com.github.rjeschke.txtmark.BlockEmitter;

    // Utils and Strings are assumed to be helper classes available in your project.
    public class CodeBlockEmitter implements BlockEmitter
    {
        // Fallback: emit the escaped lines inside a plain <pre> block.
        private static void append(StringBuilder out, List<String> lines)
        {
            out.append("<pre>");
            for (final String l : lines)
            {
                Utils.escapedAdd(out, l);
                out.append('\n');
            }
            out.append("</pre>");
        }

        @Override
        public void emitBlock(StringBuilder out, List<String> lines, String meta)
        {
            if (Strings.isEmpty(meta))
            {
                append(out, lines);
            }
            else
            {
                try
                {
                    // Utils#highlight(...) is not included with txtmark, its sole purpose
                    // is to show what the meta can be used for
                    out.append(Utils.highlight(lines, meta));
                    out.append('\n');
                }
                catch (final IOException e)
                {
                    // Ignore or do something, still, pump out the lines
                    append(out, lines);
                }
            }
        }
    }
You can then set the BlockEmitter in the txtmark Configuration using Configuration.Builder#setCodeBlockEmitter(BlockEmitter emitter).
Markdown conformity
Txtmark passes all tests inside MarkdownTest_1.0_2007-05-09
except for two:
Images.text
Fails because Txtmark doesn't produce empty 'title' image attributes.
(IMHO: Images ... OK)
Literal quotes in titles.text
What the frell ... this test will continue to FAIL.
Sorry, but using unescaped " in a title which should be surrounded
by " is unacceptable for me ;)
Change:
Foo [bar](/url/ "Title with "quotes" inside").
[bar]: /url/ "Title with "quotes" inside"
to:
Foo [bar](/url/ "Title with \"quotes\" inside").
[bar]: /url/ "Title with \"quotes\" inside"
and Txtmark will produce the correct result.
(IMHO: Literal quotes in titles ... OK)
Where Txtmark is not like Markdown
Txtmark does not produce empty title attributes in link and image tags.
Unescaped " in link titles starting with " are not recognized and result
in unexpected behaviour.
Due to a different list parsing approach some things get interpreted differently:
* List
> Quote
will produce when processed with Markdown:
- List
Quote
and this when produced with Txtmark:
- List
Quote
Another one:
* List
====
will produce when processed with Markdown:
* List
and this when produced with Txtmark:
List
List of escapable characters:
\ [ ] ( ) { } #
" ' . < > + - _
! ` ^
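How a backslash escape resolves can be sketched as below. This is an illustration built from the list above, not Txtmark's own code; `Escapes` and `unescape` are hypothetical names.

```java
final class Escapes {
    // The escapable characters listed above.
    private static final String ESCAPABLE = "\\[](){}#\"'.<>+-_!`^";

    /** Drops the backslash in front of an escapable character and keeps every other backslash. */
    static String unescape(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\\' && i + 1 < text.length() && ESCAPABLE.indexOf(text.charAt(i + 1)) >= 0) {
                out.append(text.charAt(++i)); // emit the escaped character literally
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

A backslash in front of any other character, such as `\q`, is left in place by this sketch.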
Performance comparison of markdown processors for the JVM
Remarks: These benchmarks are too old to be of any value. I leave them here as a reference, though.
Excerpt from the original post concerning this benchmark suite:
Most of these tests are of course unrealistic: Who would write a
text where each word is a link? Yet they serve an important use:
It makes it possible for the developer to pinpoint the parts of
the parser where there is most room for improvement. Also, it
explains why certain texts might render much faster in one
Processor than in another.
Benchmark system:
Ubuntu Linux 10.04 32 Bit
Intel(R) Core(TM) 2 Duo T7500 @ 2.2GHz
Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
Java HotSpot(TM) Server VM (build 19.1-b02, mixed mode)
| Test | Actuarius 1st Run (ms) | Actuarius 2nd Run (ms) | PegDown 1st Run (ms) | PegDown 2nd Run (ms) | Knockoff 1st Run (ms) | Knockoff 2nd Run (ms) | Txtmark 1st Run (ms) | Txtmark 2nd Run (ms) |
|---|---|---|---|---|---|---|---|---|
| Plain Paragraphs | 1127 | 577 | 1273 | 1037 | 740 | 400 | 157 | 64 |
| Every Word Emphasized | 1562 | 1001 | 1523 | 1513 | 13982 | 13221 | 54 | 46 |
| Every Word Strong | 1125 | 997 | 1115 | 1114 | 9543 | 9647 | 44 | 41 |
| Every Word Inline Code | 382 | 277 | 1058 | 1052 | 9116 | 9074 | 51 | 39 |
| Every Word a Fast Link | 2257 | 1600 | 537 | 531 | 3980 | 3410 | 109 | 55 |
| Every Word Consisting of Special XML Chars | 4045 | 4270 | 2985 | 3044 | 312 | 377 | 778 | 775 |
| Every Word wrapped in manual HTML tags | 3334 | 2919 | 901 | 896 | 3863 | 3736 | 73 | 62 |
| Every Line with a manual line break | 510 | 588 | 1445 | 1440 | 1527 | 1130 | 56 | 56 |
| Every word with a full link | 452 | 246 | 1045 | 996 | 1884 | 1819 | 86 | 55 |
| Every word with a full image | 268 | 150 | 1140 | 1132 | 1985 | 1908 | 38 | 36 |
| Every word with a reference link | 9847 | 9082 | 18956 | 18719 | 121136 | 115416 | 1525 | 1380 |
| Every block a quote | 445 | 206 | 1312 | 1301 | 478 | 457 | 50 | 45 |
| Every block a codeblock | 70 | 87 | 373 | 376 | 161 | 175 | 60 | 22 |
| Every block a list | 920 | 912 | 1720 | 1725 | 622 | 651 | 55 | 55 |
| All tests together | 3281 | 2885 | 5184 | 5196 | 10130 | 10460 | 206 | 196 |
Benchmarked versions:
Actuarius version: 0.2
PegDown version: 0.8.5.4
Knockoff version: 0.7.3-15
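The table reports a first and a second run for each processor. The original benchmark harness is not part of this document, so the following is only a hypothetical sketch of how two consecutive runs could be timed; `TwoRunTimer` and its `time` method are made-up names, and any processor is passed in as a plain `Function<String, String>`.

```java
import java.util.function.Function;

final class TwoRunTimer {
    /** Times two consecutive runs of the same task and returns {firstRunMs, secondRunMs}. */
    static long[] time(Function<String, String> processor, String input) {
        long[] millis = new long[2];
        for (int run = 0; run < 2; run++) {
            long start = System.nanoTime();
            processor.apply(input);
            millis[run] = (System.nanoTime() - start) / 1_000_000L;
        }
        return millis;
    }
}
```

Measuring the same input twice makes one-time costs such as class loading and JIT warm-up visible, since they typically affect only the first run.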
Mentioned/related projects
Markdown is Copyright (C) 2004 by John Gruber
SmartyPants is Copyright (C) 2003 by John Gruber
Actuarius is Copyright (C) 2010 by Christoph Henkelmann
Knockoff is Copyright (C) 2009-2011 by Tristan Juricek
PegDown is Copyright (C) 2010 by Mathias Doenitz
PHP Markdown & Extra is Copyright (C) 2009 Michel Fortin