I need a workflow like the one below:
// load a page (here, google.com) in the browser window
// the browser is live, meaning users can interact with it
browser.load("http://www.google.com");
// return the HTML of the initially loaded page
String page = browser.getHTML();
// after some time
// user might have navigated to a new page, get HTML again
String newpage = browser.getHTML();
I am surprised by how hard this is to do with Java GUI toolkits such as JavaFX (http://lexandera.com/2009/01/extracting-html-from-a-webview/) and Swing.
Is there some simple way to get this functionality in Java?
Solution
Here is a contrived example using JavaFX that prints the page's HTML (the current DOM, serialized as XML) to System.out. It should not be too complicated to adapt it into a getHtml() method. (I have tested it with JavaFX 8, but it should work with JavaFX 2 too.)
The code prints the HTML content every time a new page finishes loading.
Note: I have borrowed the printDocument code from this answer.
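import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import javafx.application.Application;
import javafx.beans.value.ChangeListener;
import javafx.beans.value.ObservableValue;
import javafx.concurrent.Worker.State;
import javafx.scene.Scene;
import javafx.scene.web.WebEngine;
import javafx.scene.web.WebView;
import javafx.stage.Stage;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

public class TestFX extends Application {

    @Override
    public void start(Stage stage) throws Exception {
        try {
            final WebView webView = new WebView();
            final WebEngine webEngine = webView.getEngine();
            Scene scene = new Scene(webView);
            stage.setScene(scene);
            stage.setWidth(1200);
            stage.setHeight(600);
            stage.show();
            // Runs on the JavaFX Application Thread each time the load state changes.
            webEngine.getLoadWorker().stateProperty().addListener(new ChangeListener<State>() {
                @Override
                public void changed(ObservableValue<? extends State> ov, State oldState, State newState) {
                    // Only dump the DOM once a page has finished loading.
                    if (newState == State.SUCCEEDED) {
                        try {
                            printDocument(webEngine.getDocument(), System.out);
                        } catch (Exception e) {
                            e.printStackTrace();
                        }
                    }
                }
            });
            webEngine.load("http://www.google.com");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    // Serializes a DOM Document as indented XML to the given stream.
    public static void printDocument(Document doc, OutputStream out) throws IOException, TransformerException {
        TransformerFactory tf = TransformerFactory.newInstance();
        Transformer transformer = tf.newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "no");
        transformer.setOutputProperty(OutputKeys.METHOD, "xml");
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "4");
        Writer writer = new OutputStreamWriter(out, "UTF-8");
        transformer.transform(new DOMSource(doc), new StreamResult(writer));
        // Flush (but do not close, since out may be System.out) so buffered text is written.
        writer.flush();
    }

    public static void main(String[] args) {
        launch(args);
    }
}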
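A minimal sketch of the getHtml() adaptation mentioned above, using only the JDK so it runs without JavaFX. documentToString is printDocument writing into a StringWriter instead of an OutputStream; onPageLoaded and getHTML are hypothetical names chosen to mirror the question's workflow and keep the most recent snapshot. In the JavaFX app, the assumption is that you would call onPageLoaded(webEngine.getDocument()) inside the SUCCEEDED branch of the listener. buildPage is a hypothetical stand-in that builds a tiny DOM for the demo.

```java
import java.io.StringWriter;
import java.util.concurrent.atomic.AtomicReference;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class HtmlSnapshot {

    // Latest serialized page: written by the load listener, readable from any thread.
    private final AtomicReference<String> latestHtml = new AtomicReference<>();

    // Same idea as printDocument, but returns the serialized DOM as a String.
    public static String documentToString(Document doc) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        transformer.setOutputProperty(OutputKeys.METHOD, "xml");
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        StringWriter writer = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(writer));
        return writer.toString();
    }

    // Hypothetical hook: call where the listener sees State.SUCCEEDED.
    public void onPageLoaded(Document doc) throws TransformerException {
        latestHtml.set(documentToString(doc));
    }

    // Hypothetical getHTML(): HTML of the most recently loaded page, or null if none yet.
    public String getHTML() {
        return latestHtml.get();
    }

    // Hypothetical stand-in for webEngine.getDocument(): a tiny <html><body>text</body></html> DOM.
    static Document buildPage(String text) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element html = doc.createElement("html");
        Element body = doc.createElement("body");
        body.setTextContent(text);
        html.appendChild(body);
        doc.appendChild(html);
        return doc;
    }

    public static void main(String[] args) throws Exception {
        HtmlSnapshot browser = new HtmlSnapshot();
        browser.onPageLoaded(buildPage("First page"));   // initial load
        String page = browser.getHTML();
        browser.onPageLoaded(buildPage("Second page"));  // user navigated elsewhere
        String newpage = browser.getHTML();
        System.out.println(page.contains("First page"));     // true
        System.out.println(newpage.contains("Second page")); // true
    }
}
```

The AtomicReference is a design choice for this sketch: the load listener runs on the JavaFX Application Thread, while getHTML() might be called from another thread, so the latest snapshot is published safely without locking. The WebEngine itself should still only be accessed from the JavaFX Application Thread.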