This is actually two related bugs, but the combination makes it extremely confusing what's going on. (I can split them if that would make it easier; but since the fix of the first bug makes the second go away, it's probably not necessary).
Tested on version 2.33
The basic problem of the first bug: When a heavy javascript function is triggered by setTimeout(), and a new page is loaded, this function will continue to run on the new page.
Example:
page1.html
:
<!DOCTYPE html>
<html>
<head>
<script src="page1.js"></script>
</head>
<body>
<h1>Page 1</h1>
<p><a href="page2.html">Page 2</a></p>
</body>
</html>
page1.js
:
function loadExtraContent()
{
for (var i = 0; i < 100; i++)
{
var p = document.createElement("p");
p.innerHTML = "new content!";
var body = document.querySelector("body");
if (body)
body.appendChild(p);
}
}
setTimeout(loadExtraContent, 1);
page2.html
:
<!DOCTYPE html>
<html>
<head>
<script src="page2.js"></script>
</head>
<body>
<h1>Page 2</h1>
<p>This is page 2</p>
</body>
</html>
(ignore the js for now, it is only important for the second bug)
Test:
try (WebClient webClient = new WebClient(BrowserVersion.CHROME)) {
// Load page 1. Has a setTimeout(...) function
HtmlPage page = webClient.getPage("file:page1.html");
// Immediately load page 2. Timeout function will be triggered already
page = webClient.getPage("file:page2.html");
// Wait for javascript
webClient.waitForBackgroundJavaScriptStartingBefore(100);
// Fails: return 98 (about) instead of 1
assertEquals(1, page.querySelectorAll("p").size());
}
The "loadExtraContent" method is started upon loading page 1. When we load page 2, it is still running. The if (body)
clause is there, because there is a tiny window in which the new body
is not loaded yet.
It is important that the timeout function is started before the second page is loaded, because loading a new pages causes all queued jobs in the JavaScriptJobManagerImpl
to be cancelled (side effect of WebWindowImpl.setEnclosedPage() -> destroyChildren()
. Also, a javascript job holds a lock on the HtmlPage
, so using WebClient
to just click on the "Page 2" link, would cause that click to be blocked until the execution has ended.
Suggested fix: make WebWindowImpl.setEnclosedPage
synchronized on the current enclosedPage
, or some other mechanism to make it block on the current running HtmlPage.
Basic problem of the second bug: when a script modifies the DOM tree, it prevents other scripts from loading.
Continued example:
page2.js
:
function setHeader()
{
document.querySelector("h1").textContent = 'Loaded by JS';
}
setTimeout(setHeader, 50);
Test:
try (WebClient webClient = new WebClient(BrowserVersion.CHROME)) {
// Load page 1. Has a setTimeout(...) function
HtmlPage page = webClient.getPage("file:page1.html");
// Immediately load page 2. Timeout function will be triggered already
page = webClient.getPage("file:page2.html");
// Wait for javascript
webClient.waitForBackgroundJavaScriptStartingBefore(100);
HtmlHeading1 h1 = page.querySelector("h1");
assertEquals("Loaded by JS", h1.getTextContent());
}
The problem is here that javascript is loaded by HtmlScript.onAllChildrenAddedToPage()
, which runs executeScriptIfNeeded()
, which checks isExecutionNeeded()
. The 4th test here is htmlPage.isParsingHtmlSnippet()
. This one is only triggered by DOM modifications like element.innerHTML = x
. So if, due to the first bug, a background script is still running and modifying the DOM, this can cause some scripts to not load. This gives really confusing error messages, and sometimes even internal HtmlUnit errors (since two processes are modifying the DOM concurrently).
This problem is really hard to trigger. This specific example fails almost never, but on our development environment, it fails often, since we load a lot of scripts which depend on each other.
One way to specifically make this example fail by forcing a specific thread interleaving:
HtmlScript.isExecutionNeeded()
and HtmlPage.registerSnippetParsingEnd()
.isExecutionNeeded
will be triggered for file1.js
. Continue.file2.js
and the first p.innerHTML
. By this point, p
will probably still refer to the old document/page, so continue the javascript thread.p
will refer to the new document/page. Continue the main thread.waitForBackgroundJavascriptStartingBefore(...)
, so remove the HtmlScript
breakpoint and continue the javascript thread.org.junit.ComparisonFailure: expected:<[Loaded by JS]> but was:<[Page 2]>
Suggested fix: Same as the previous one.
Pay now to fund the work behind this issue.
Get updates on progress being made.
Maintainer is rewarded once the issue is completed.
You're funding impactful open source efforts
You want to contribute to this effort
You want to get funding like this too