Running JS Scripts from Java with HtmlUnit


Published: 2014-10-01
Updated: 2014-11-13
Web: https://fritzthecat-blog.blogspot.com/2014/10/running-js-scripts-from-java-with.html


When studying JavaScript, I got tired of always having to write an HTML page around my script just to be able to experiment with the language features. Normally I used a lot of console.log() or alert() calls to trace the execution (in times when browser debuggers were not yet mature). But the console was invisible by default, could be deactivated in some browsers, or was not supported at all. So I wanted to see results directly on the HTML page, and I started to write logElement.innerHTML = "..." and things like that. But such code depends on the surrounding HTML and thus did not always run.

Finally I decided to create a test project in Eclipse.
I downloaded the "JavaScript Development Tools" from the "Eclipse Web Tools Platform" to get syntax highlighting for my JS scripts.
Using Java 8, I then wrote a Java program that can execute a JS script in the JS engine of the Java runtime ("Nashorn"). Here is the Java source.

import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringReader;
import java.net.MalformedURLException;
import java.net.URL;
import javax.script.Bindings;
import javax.script.ScriptContext;
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;

public class JavaScriptRunner
{
    /**
     * Executes the given JavaScripts, passed either as URLs, file paths, or script text.
     * Allocates a new engine instance for this call.
     */
    public final void execute(String [] jsUrlsOrFilesOrScripts) throws Exception {
        final ScriptEngineManager engineManager = new ScriptEngineManager();
        final ScriptEngine engine = engineManager.getEngineByMimeType("text/javascript");
        createGlobalBindings(engine);

        for (String jsUrlOrFileOrScript : jsUrlsOrFilesOrScripts) {
            Reader reader;
            try {
                final URL url = new URL(jsUrlOrFileOrScript);
                reader = new InputStreamReader(url.openStream());
            }
            catch (MalformedURLException e1) {
                try {
                    reader = new FileReader(jsUrlOrFileOrScript);
                }
                catch (FileNotFoundException e2) {
                    reader = new StringReader(jsUrlOrFileOrScript);
                }
            }

            // try-with-resources closes the stream after evaluation
            try (Reader input = new BufferedReader(reader)) {
                engine.eval(input);
            }
        }

        cleanUp();
    }

    /**
     * Does nothing. Override this to create bindings for a BOM environment.
     */
    protected void addGlobalBindings(Bindings bindings) {
    }

    /**
     * Does nothing. Override this to clean up after JavaScript execution.
     */
    protected void cleanUp() {
    }

    private void createGlobalBindings(ScriptEngine engine) {
        final Bindings bindings = engine.createBindings();
        addGlobalBindings(bindings);
        engine.setBindings(bindings, ScriptContext.GLOBAL_SCOPE);
    }
}

As you can see, I first try to interpret the JS input as a URL. When this fails, I try to load it as a file, and when this also fails, it must be the script text itself.
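This fallback order has a side effect worth knowing: an argument that happens to name an existing file is always read as a file, even when it was meant as script text. The following is a minimal sketch that only makes the decision visible. SourceClassifier, Kind and classify() are names made up for illustration, and File.isFile() merely approximates the FileNotFoundException test of the runner.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URL;

/** Hypothetical helper mirroring the fallback order of JavaScriptRunner.execute(). */
class SourceClassifier
{
    enum Kind { URL, FILE, INLINE_SCRIPT }

    static Kind classify(String jsUrlOrFileOrScript) {
        try {
            new URL(jsUrlOrFileOrScript);   // only parses the text, does not connect
            return Kind.URL;
        }
        catch (MalformedURLException notAnUrl) {
            // assumption: isFile() stands in for "FileReader did not throw"
            if (new File(jsUrlOrFileOrScript).isFile())
                return Kind.FILE;
            return Kind.INLINE_SCRIPT;
        }
    }

    public static void main(String[] args) {
        for (String arg : args)
            System.out.println(classify(arg) + ": " + arg);
    }
}
```

Called with the same arguments as the runner, it prints how each one would be treated.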

The problem here is that the JavaScript statement console.log() does not work, because the engine provides no console object. Oracle recommends using print() instead, but that would not work in a browser page.

Note: Oracle added a new console application to the JDK that can run JS scripts. It is called jjs. It prompts for input when called without arguments. It provides print() instead of console.log().

So it was necessary to imitate the web browser a little bit in Java. Reading up on the Java 8 JS engine on the internet, I found out how to provide a JS execution environment from Java.
I wrote the following class and installed it into the JS execution context (called Bindings).

public class ConsoleMock
{
    public void log(Object text) {
        System.err.println(text);
    }
}

It is installed with the following override:

import javax.script.Bindings;

public class JavaScriptRunnerWithBomMocks extends JavaScriptRunner
{
    public static void main(String [] args) throws Exception {
        new JavaScriptRunnerWithBomMocks().execute(args);
    }

    /**
     * Creates naive BOM mock bindings.
     */
    @Override
    protected void addGlobalBindings(Bindings bindings) {
        bindings.put("console", new ConsoleMock());
    }
}

Now I could load my first JS script and execute it.

function foo() {
    console.log("Hello World");
}

foo();

This worked.
JS was executing, Java was printing "Hello World" to stderr.
Basically, if you do something like window.alert("Hello Browser") in JS, you need to provide a Java class that has a public method alert(String message), and store an instance of this class into the JS engine's execution context under the name "window". That's all. You can then pop up a JDialog in the alert() implementation if you like :-)

Finally there was a script that worked a lot with the DOM (document object model) of the HTML page. I would have liked to run it in that mock environment, just to see whether there were major bugs in it.
I thought I could write further mock implementations like ConsoleMock and imitate the W3C DOM API by delegating to its Java implementation. But this was a big disappointment, because I found out that browsers support only small parts of that API. My scripts wanted to call getElementsByClassName(), but this is not in the W3C API!
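To illustrate the gap: the Document interface in org.w3c.dom offers getElementsByTagName(), but no getElementsByClassName(), so a mock has to supply that function itself. Here is a minimal sketch of how such a mock could look. The class name DocumentMock and its parse() helper are made up for illustration, and it returns a plain List snapshot instead of the live collection a browser would give.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical mock that adds getElementsByClassName() to a W3C Document. */
class DocumentMock
{
    private final Document document;

    DocumentMock(Document document) {
        this.document = document;
    }

    /** Parses well-formed XML or XHTML text; ordinary HTML is often not well-formed. */
    static DocumentMock parse(String xml) {
        try {
            return new DocumentMock(DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml))));
        }
        catch (Exception e) {
            throw new IllegalArgumentException("Not well-formed: " + xml, e);
        }
    }

    /** Part of the W3C API, so this simply delegates. */
    public NodeList getElementsByTagName(String tagName) {
        return document.getElementsByTagName(tagName);
    }

    /** Not part of the W3C API: collects elements that carry all given class names. */
    public List<Element> getElementsByClassName(String classNames) {
        final List<Element> result = new ArrayList<>();
        if (classNames.trim().isEmpty())
            return result;

        final List<String> wanted = Arrays.asList(classNames.trim().split("\\s+"));
        final NodeList all = document.getElementsByTagName("*");   // all elements, document order
        for (int i = 0; i < all.getLength(); i++) {
            final Element element = (Element) all.item(i);
            // getAttribute() returns "" when there is no class attribute
            final String[] present = element.getAttribute("class").trim().split("\\s+");
            if (Arrays.asList(present).containsAll(wanted))
                result.add(element);
        }
        return result;
    }
}
```

As the parse() helper shows, this only works on well-formed XML input.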

I ended up implementing the so-called BOM (browser object model), which consists of JS global variables like window, navigator, location, ...

I installed these mock instances into the script engine like I did for the console. The result was a JavaScript environment that could edit an XML document, but not an HTML page.

Not a realistic test ground for my scripts!


The most interesting part of that work was the interaction between Java and JS. While the ConsoleMock (see above) was an example of a call from JS to Java, here is an example of a call from Java to JS. The subject is the JS call window.setTimeout(callbackFunction, millis).
Here is the JS part:

function testFunction() {
    console.log("I am a callback function to be passed ");
}

window.setTimeout(testFunction, 1000);

And here is the Java side.
This implementation calls the JS function immediately; in the real world there would be a real timeout before that call.

public class WindowMock
{
    public int setTimeout(Runnable function, int millis) {
        function.run();
        return 0;
    }
}

It is installed with:

    @Override
    protected void addGlobalBindings(Bindings bindings) {
        bindings.put("window", new WindowMock());
    }

The JS engine seems to look at the Java parameter types of the method setTimeout() and create adequate parameter objects to pass to the Java side. In this case it finds the Runnable interface and wraps the JS function in an object with a run() method, which the Java side can then call.
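The WindowMock above calls back immediately. A version with a real delay could use a ScheduledExecutorService. The following is only a sketch under assumptions: DelayedWindowMock, clearTimeout() and finish() are names made up for illustration. The class is package-private to keep the example in one file, while a class bound into the engine should presumably be public like the mocks above. Mind that callbacks run on a timer thread here, so the runner would have to call finish(), for example from cleanUp(), before it exits.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical variant of WindowMock that really waits before calling back. */
class DelayedWindowMock
{
    // a single worker thread, so callbacks never run concurrently with each other
    private final ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
    private final Map<Integer, ScheduledFuture<?>> pending = new ConcurrentHashMap<>();
    private final AtomicInteger nextId = new AtomicInteger();

    /** Schedules the callback after the given delay and returns a timer id. */
    public int setTimeout(Runnable function, int millis) {
        final int id = nextId.incrementAndGet();
        pending.put(id, timer.schedule(function, Math.max(0, millis), TimeUnit.MILLISECONDS));
        return id;
    }

    /** Cancels a callback that has not run yet; unknown ids are ignored. */
    public void clearTimeout(int id) {
        final ScheduledFuture<?> future = pending.remove(id);
        if (future != null)
            future.cancel(false);
    }

    /**
     * Waits until all scheduled callbacks have run, then stops the timer thread.
     * Returns false when the wait timed out or was interrupted.
     */
    public boolean finish(long maxWaitMillis) {
        timer.shutdown();   // by default, already scheduled delayed tasks still run
        try {
            return timer.awaitTermination(maxWaitMillis, TimeUnit.MILLISECONDS);
        }
        catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

The id returned by setTimeout() can be passed to clearTimeout(), like in a browser.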


To be able to run my scripts with an HTML object model as well, I looked for a Java implementation that could provide all the JavaScript functions and objects needed. I found HtmlUnit (a so-called "headless browser") and tried to integrate it into my Eclipse test project. For that purpose I simply added the JAR files to a local lib directory, and from there to the Eclipse build path.
The following JARs were sufficient:


commons-codec-1.9.jar
commons-collections-3.2.1.jar
commons-io-2.4.jar
commons-lang3-3.3.2.jar
commons-logging-1.1.3.jar
cssparser-0.9.14.jar
htmlunit-2.15.jar
htmlunit-core-js-2.15.jar
httpclient-4.3.3.jar
httpcore-4.3.2.jar
httpmime-4.3.3.jar
nekohtml-1.9.21.jar
sac-1.3.jar
serializer-2.7.1.jar
xalan-2.7.1.jar
xercesImpl-2.11.0.jar
xml-apis-1.4.01.jar

The integration into the JS engine then looked like this:

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import javax.script.Bindings;
import com.gargoylesoftware.htmlunit.FailingHttpStatusCodeException;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
import com.gargoylesoftware.htmlunit.javascript.host.Window;

public class JavaScriptRunnerWithHtmlUnit extends JavaScriptRunner
{
    public static void main(String [] args) throws Exception {
        final List<String> javaScripts = new ArrayList<String>();
        final List<String> webPages = new ArrayList<String>();
        for (String arg : args) {
            final String argument = arg.toLowerCase();
            if (argument.endsWith(".js"))
                javaScripts.add(arg);
            else if (argument.endsWith(".html") || argument.endsWith(".htm"))
                webPages.add(arg);
            else
                throw new IllegalArgumentException("Unknown extension: "+arg);
        }

        if (javaScripts.isEmpty())
            throw new IllegalStateException("Having no JavaScripts to execute!");

        if (webPages.isEmpty())
            webPages.add("http://www.this-page-intentionally-left-blank.org");

        // every page gets its own runner, all scripts are executed on each page
        final String [] scripts = javaScripts.toArray(new String[javaScripts.size()]);
        for (String webPage : webPages)
            new JavaScriptRunnerWithHtmlUnit(webPage).execute(scripts);
    }


    private final WebClient webClient;
    private final HtmlPage page;

    public JavaScriptRunnerWithHtmlUnit(String webPage) {
        this.webClient = new WebClient(/*BrowserVersion.IE8*/);
        try {
            page = webClient.getPage(webPage);
        }
        catch (FailingHttpStatusCodeException | IOException e) {
            throw new RuntimeException(e);
        }
        System.err.println("==========================================");
        System.err.println("Interpreting web page: "+webPage);
    }

    /**
     * Creates HtmlUnit BOM bindings.
     */
    @Override
    protected void addGlobalBindings(Bindings bindings) {
        assert page != null;

        final Window window = (Window) page.getEnclosingWindow().getScriptObject();
        bindings.put("window", window);
        bindings.put("console", new ConsoleMock()); // window.getConsole() has no log(String) method
        bindings.put("document", window.getDocument());
        bindings.put("navigator", window.getNavigator());
        bindings.put("location", window.getLocation());
        bindings.put("history", window.getHistory());
        bindings.put("screen", window.getScreen());
    }

    @Override
    protected void cleanUp() {
        webClient.closeAllWindows();
    }
}

Voila! My JS scripts ran with any HTML page I gave them!
The tricky part was (Window) page.getEnclosingWindow().getScriptObject(); it took me some time to crawl through the HtmlUnit architecture and find this JS "window" proxy.

The problem with this environment is that you must pass all JS and HTML files on the command line, and the JS files must be in the correct order. This is the consequence of the missing "import" statement in JS.

So much for my way to get a JS playground for experimenting and learning.


If you do not understand my Java classes, here is a short take-away Java snippet to try out the Java 8 JS engine with HtmlUnit. Mind that you need the JARs listed above in your CLASSPATH.

final ScriptEngineManager engineManager = new ScriptEngineManager();
final ScriptEngine engine = engineManager.getEngineByMimeType("text/javascript");
final Bindings bindings = engine.createBindings();
final WebClient webClient = new WebClient();
final HtmlPage page = webClient.getPage("http://www.this-page-intentionally-left-blank.org/");
final Window window = (Window) page.getEnclosingWindow().getScriptObject();
bindings.put("window", window);
bindings.put("document", window.getDocument());
bindings.put("navigator", window.getNavigator());
bindings.put("location", window.getLocation());
bindings.put("history", window.getHistory());
bindings.put("screen", window.getScreen());
engine.setBindings(bindings, ScriptContext.GLOBAL_SCOPE);
final URL url = new URL("http://code.jquery.com/jquery.js");
final Reader reader = new InputStreamReader(url.openStream());
engine.eval(new BufferedReader(reader));

One drawback of this environment is that I cannot debug my scripts in it.

Another one is that I could not load jQuery.js.
It turned out that the Java JS engine (Nashorn) does not work together properly with HtmlUnit, because HtmlUnit is built on a Rhino fork (an older JS engine) and communicates with JavaScript via the Rhino Scriptable interface. For example, the JS function Array.prototype.slice() copies an array when called without further arguments, but with the HtmlUnit nodes some unreported error seems to happen in Nashorn, resulting in an undefined return value, which then makes jQuery fail.

Loading jQuery.js with the HtmlUnit Rhino fork and HtmlUnit as BOM provider finally worked:

final WebClient webClient = new WebClient();
final HtmlPage page = webClient.getPage("http://www.this-page-intentionally-left-blank.org/");
final Window window = (Window) page.getEnclosingWindow().getScriptObject();
final Context context = Context.enter();
final Scriptable scope = context.initStandardObjects(window);
ScriptableObject.putProperty(scope, "console", Context.javaToJS(new ConsoleMock(), scope));
final URL url = new URL("http://code.jquery.com/jquery.js");
final Reader reader = new InputStreamReader(url.openStream());
context.evaluateReader(scope, reader, url.toString(), 1, null);
Context.exit();

But for some reason, the same failed when I used the HtmlUnit JS facility:

final WebClient webClient = new WebClient();
final HtmlPage page = webClient.getPage("http://www.this-page-intentionally-left-blank.org/");
final String source = "http://code.jquery.com/jquery.js";
final String scriptText = ....; // read from source URL
webClient.getJavaScriptEngine().execute(page, scriptText, source, 1);


See also HtmlUnit JavaScript API support test statistics.

You can download Java sources for launching JS from Java here.



Glossary :-)

BOM: Web-browser object model, JS global variables like window, navigator, location, ...
BOM: Byte order mark, optional leading file bytes describing the encoding of the file
POM: Maven (Java build tool) project object model
POMMES: Fried potatoes





ɔ⃝ Fritz Ritzberger, 2014-10-01