I converted .doc to HTML using WordToHtmlConverter, and it worked perfectly.
But when I tried to convert .docx to HTML, I got stuck.
What I tried:
I used the code below to convert .docx to HTML:
    InputStream input = TikaInputStream.get(new File("C:\\Users\\Downloads\\filename.docx"));
    Parser parser = new AutoDetectParser();
    StringWriter sw = new StringWriter();
    SAXTransformerFactory factory = (SAXTransformerFactory) SAXTransformerFactory.newInstance();
    TransformerHandler handler = factory.newTransformerHandler();
    handler.getTransformer().setOutputProperty(OutputKeys.METHOD, "html");
    handler.getTransformer().setOutputProperty(OutputKeys.INDENT, "yes");
    handler.setResult(new StreamResult(sw));
    try {
        Metadata metadata = new Metadata();
        parser.parse(input, handler, metadata, new ParseContext());
        String xml = sw.toString();
        System.out.print("tika : " + xml);
    } finally {
        input.close();
    }
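For reference, here is a minimal sketch of what the TransformerHandler part of that code does on its own, with no Tika involved. The parser reports the document to the handler as SAX events, and the handler only serializes whatever events it receives. The class name HandlerSketch and the method serialize are hypothetical names used for illustration, and the html/body/p document is hand-built rather than taken from a real .docx file.

```java
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.helpers.AttributesImpl;

// Hypothetical class name, for illustration only.
class HandlerSketch {

    // Feeds a tiny html/body/p document to the handler as SAX events
    // and returns the text the handler wrote to the StringWriter.
    static String serialize(String paragraphText) {
        try {
            StringWriter sw = new StringWriter();
            SAXTransformerFactory factory =
                    (SAXTransformerFactory) SAXTransformerFactory.newInstance();
            TransformerHandler handler = factory.newTransformerHandler();
            // Same output settings as in the question's code.
            handler.getTransformer().setOutputProperty(OutputKeys.METHOD, "html");
            handler.getTransformer().setOutputProperty(OutputKeys.INDENT, "yes");
            handler.setResult(new StreamResult(sw));

            AttributesImpl noAttributes = new AttributesImpl();
            char[] text = paragraphText.toCharArray();

            // These calls stand in for the events a parser would send.
            handler.startDocument();
            handler.startElement("", "html", "html", noAttributes);
            handler.startElement("", "body", "body", noAttributes);
            handler.startElement("", "p", "p", noAttributes);
            handler.characters(text, 0, text.length);
            handler.endElement("", "p", "p");
            handler.endElement("", "body", "body");
            handler.endElement("", "html", "html");
            handler.endDocument();
            return sw.toString();
        } catch (Exception e) {
            // Wrap checked SAX/transformer exceptions to keep the sketch short.
            throw new IllegalStateException("serialization failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(serialize("Hello from a hand-built document"));
    }
}
```

If this sketch prints the expected markup, the serializer setup is probably fine, and the difference between the .doc and .docx results more likely comes from the events the parser emits for the .docx file.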
The output I got from the Tika code above is: