Generating a PDF from HTML in Struts2, C#: How to convert HTML to PDF using iTextSharp

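First, HTML and PDF are not related, although they were created around the same time. HTML is intended to convey higher-level information such as paragraphs and tables. There are ways to control how it looks, but it is ultimately up to the browser to draw these higher-level concepts. PDF is intended to convey documents, and a document must "look" the same wherever it is rendered.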

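In an HTML document you might have a paragraph that is 100% wide. Depending on the width of your monitor it might take 2 lines or 10 lines, when you print it it might take 7 lines, and when you look at it on your phone it might take 20 lines. A PDF file, however, must be independent of the rendering device, so regardless of your screen size it must always render exactly the same.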

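Because of the "musts" above, PDF does not support abstract things like "tables" or "paragraphs". PDF supports three basic things: text, lines/shapes, and images. (There are other things such as annotations and movies, but I want to keep it simple here.) In a PDF you don't say "here's a paragraph, browser, do your thing!" Instead you say, "draw this text at this exact X,Y location using this exact font, and don't worry, I've already calculated the width of the text so I know it will all fit on this line". You also don't say "here's a table". Instead you say, "draw this text at this exact location, and then draw a rectangle at this other exact location that I've previously calculated, so I know it will appear to be around the text".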
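To make the "exact X,Y location" idea concrete, here is a minimal sketch that measures a string, draws it at a fixed position, and then strokes a rectangle around it. It assumes iTextSharp 5.x; the page size, coordinates, font, padding, and text are illustrative choices, not values from the original answer.

```csharp
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

using (var ms = new MemoryStream())
using (var doc = new Document(PageSize.LETTER))
using (var writer = PdfWriter.GetInstance(doc, ms))
{
    doc.Open();

    // Assumption: Helvetica, one of the standard PDF fonts, not embedded
    var font = BaseFont.CreateFont(BaseFont.HELVETICA, BaseFont.CP1252, BaseFont.NOT_EMBEDDED);
    const float fontSize = 12f;
    const string text = "Hello, PDF";

    // Measure the text ourselves; PDF will not lay it out for us
    float textWidth = font.GetWidthPoint(text, fontSize);

    // Illustrative position in points, measured from the bottom-left corner of the page
    float x = 72f, y = 700f;

    // DirectContent is the low-level drawing surface of the current page
    PdfContentByte cb = writer.DirectContent;

    // "Draw this text at this exact X,Y location using this exact font"
    cb.BeginText();
    cb.SetFontAndSize(font, fontSize);
    cb.ShowTextAligned(Element.ALIGN_LEFT, text, x, y, 0f);
    cb.EndText();

    // "Then draw a rectangle at this other exact location".
    // The padding is a rough guess that leaves room for descenders.
    const float pad = 4f;
    cb.Rectangle(x - pad, y - pad, textWidth + 2 * pad, fontSize + 2 * pad);
    cb.Stroke();

    doc.Close();
}
```

Nothing here is a "paragraph" or a "table" as far as the PDF is concerned: the box only surrounds the text because both positions were computed from the same measurements.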

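Second, iText and iTextSharp parse HTML and CSS. That's it. ASP.Net, MVC, Razor, Struts, Spring, etc. are all HTML frameworks, but iText/iTextSharp is 100% unaware of them. The same goes for DataGridViews, Repeaters, Templates, Views, etc., which are all framework-specific abstractions. It is your responsibility to get the HTML out of your framework of choice; iText won't help you. If you get an exception saying "The document has no pages", or you think that "iText isn't parsing my HTML", it is almost certain that you don't actually have HTML, you only think you do.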

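Third, the built-in class that has been around for years is HTMLWorker, but it has been replaced by XMLWorker (Java / .Net). Zero work is being done on HTMLWorker, which doesn't support CSS files, has only limited support for the most basic CSS properties, and actually breaks on certain tags. If you do not see an HTML attribute or a CSS property and value in the HTMLWorker source file, it probably isn't supported by HTMLWorker. XMLWorker can sometimes be more complicated, but those complications also make it more extensible.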

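Below is C# code that shows how to parse HTML tags into iText abstractions, which are automatically added to the document you are working on. C# and Java are very similar, so it should be relatively easy to convert. Example #1 uses the built-in HTMLWorker to parse the HTML string. Since only inline styles are supported, class="headline" gets ignored, but everything else should actually work. Example #2 is the same as the first, except that it uses XMLWorker instead. Example #3 also parses the simple CSS example.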

//Namespaces used by the code below
using System;
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

//Create a byte array that will eventually hold our final PDF
Byte[] bytes;

//Boilerplate iTextSharp setup here
//Create a stream that we can write to, in this case a MemoryStream
using (var ms = new MemoryStream()) {

    //Create an iTextSharp Document which is an abstraction of a PDF but **NOT** a PDF
    using (var doc = new Document()) {

        //Create a writer that's bound to our PDF abstraction and our stream
        using (var writer = PdfWriter.GetInstance(doc, ms)) {

            //Open the document for writing
            doc.Open();

            //Our sample HTML and CSS
            var example_html = @"<p>This <em>is </em><span class=""headline"" style=""text-decoration: underline;"">some</span> <strong>sample <em> text</em></strong><span style=""color: red;"">!!!</span></p>";
            var example_css = @".headline{font-size:200%}";

            /**************************************************
             * Example #1                                     *
             *                                                *
             * Use the built-in HTMLWorker to parse the HTML. *
             * Only inline CSS is supported.                  *
             * ************************************************/

            //Create a new HTMLWorker bound to our document
            using (var htmlWorker = new iTextSharp.text.html.simpleparser.HTMLWorker(doc)) {

                //HTMLWorker doesn't read a string directly but instead needs a TextReader (which StringReader subclasses)
                using (var sr = new StringReader(example_html)) {

                    //Parse the HTML
                    htmlWorker.Parse(sr);
                }
            }

            /**************************************************
             * Example #2                                     *
             *                                                *
             * Use the XMLWorker to parse the HTML.           *
             * Only inline CSS and absolutely linked          *
             * CSS is supported                               *
             * ************************************************/

            //XMLWorker also reads from a TextReader and not directly from a string
            using (var srHtml = new StringReader(example_html)) {

                //Parse the HTML
                iTextSharp.tool.xml.XMLWorkerHelper.GetInstance().ParseXHtml(writer, doc, srHtml);
            }

            /**************************************************
             * Example #3                                     *
             *                                                *
             * Use the XMLWorker to parse HTML and CSS        *
             * ************************************************/

            //In order to read CSS as a string we need to switch to a different constructor
            //that takes Streams instead of TextReaders.
            //Below we convert the strings into UTF8 byte array and wrap those in MemoryStreams
            using (var msCss = new MemoryStream(System.Text.Encoding.UTF8.GetBytes(example_css))) {
                using (var msHtml = new MemoryStream(System.Text.Encoding.UTF8.GetBytes(example_html))) {

                    //Parse the HTML
                    iTextSharp.tool.xml.XMLWorkerHelper.GetInstance().ParseXHtml(writer, doc, msHtml, msCss);
                }
            }

            doc.Close();
        }
    }

    //After all of the PDF "stuff" above is done and closed but **before** we
    //close the MemoryStream, grab all of the active bytes from the stream
    bytes = ms.ToArray();
}

//Now we just need to do something with those bytes.
//Here I'm writing them to disk but if you were in ASP.Net you might Response.BinaryWrite() them.
//You could also write the bytes to a database in a varbinary() column (but please don't) or you
//could pass them to another function for further PDF processing.
var testFile = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), "test.pdf");
System.IO.File.WriteAllBytes(testFile, bytes);