Possibly the same issue as #551 (or not).
Using HtmlUnit v3.0.0
This is a different website this time, but it also uses Incapsula JavaScript. This time the JavaScript gets past Incapsula but still hangs shortly afterwards. There is also a noticeable increase in CPU activity, but nothing happens even after leaving it for more than an hour. It did not throw an OOM error either.
Using just the code below, with HtmlUnit as the only dependency, the issue is reproducible every time. On very rare occasions it progresses to the "### Typing username" line, and even more rarely to the "### Typing password" line, but it stops completely after that and I have never gotten as far as "### Page 2 loaded". Even if it eventually works, its reliability would be a cause for concern.
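import java.io.IOException;

import org.htmlunit.BrowserVersion;
import org.htmlunit.WebClient;
import org.htmlunit.WebRequest;
import org.htmlunit.WebResponse;
import org.htmlunit.html.HtmlInput;
import org.htmlunit.html.HtmlPage;
import org.htmlunit.html.HtmlTextInput;
import org.htmlunit.util.FalsifyingWebConnection;

// The outer class name is arbitrary; it only makes the snippet compile as one file.
public class Repro {

    public static void main(String[] args) {
        try (WebClient webClient = new WebClient(BrowserVersion.CHROME)) {
            // Registers itself as the client's web connection and logs every request.
            new JavaScriptInterceptor(webClient);

            HtmlPage page1 = webClient.getPage("https://iadvisor.zurich.com.my/Pages/Login.aspx");
            System.out.println("### Page 1 loaded");

            ((HtmlTextInput) page1.getByXPath("//input[contains(@id, 'txtUserId')]").get(0)).type("username");
            System.out.println("### Typing username");

            ((HtmlInput) page1.getByXPath("//input[contains(@id, 'txtPassword')]").get(0)).type("password");
            System.out.println("### Typing password");

            // Never reached in practice: execution hangs before the click returns.
            HtmlPage page2 = ((HtmlInput) page1.getByXPath("//input[contains(@id, 'imgbtnLogin')]").get(0)).click();
            System.out.println("### Page 2 loaded");
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static class JavaScriptInterceptor extends FalsifyingWebConnection {
        public JavaScriptInterceptor(WebClient webClient) throws IllegalArgumentException {
            super(webClient);
        }

        @Override
        public WebResponse getResponse(WebRequest request) throws IOException {
            System.out.println("Request : " + request.getUrl());
            return super.getResponse(request);
        }
    }
}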
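As a stopgap while the hang itself is unresolved, a step that may never return can be bounded with a plain JDK timeout. This is only a sketch: `StepWatchdog` and `runWithTimeout` are hypothetical names, not part of HtmlUnit, and it assumes that giving up on a step after a fixed time is acceptable. It does not fix the underlying problem, and a thread stuck in a busy loop may ignore the interrupt.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper (not an HtmlUnit class): bounds how long one step may run. */
final class StepWatchdog {

    /**
     * Runs the step on a daemon worker thread and waits at most timeoutMillis.
     * Returns the step's result, or throws TimeoutException if it did not finish in time.
     */
    static <T> T runWithTimeout(Callable<T> step, long timeoutMillis)
            throws TimeoutException, ExecutionException, InterruptedException {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "step-watchdog");
            t.setDaemon(true); // a stuck step must not keep the JVM alive
            return t;
        });
        Future<T> future = executor.submit(step);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // best effort: interrupts the worker thread
            throw e;
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for a slow step; in the report above this would be the page load or click.
        try {
            runWithTimeout(() -> {
                Thread.sleep(5_000);
                return "finished";
            }, 200);
        } catch (TimeoutException e) {
            System.out.println("Step did not finish within the timeout");
        }
    }
}
```

In the reproduction above, the page load or the login click would be the step passed in as the `Callable`, so a hang becomes a `TimeoutException` instead of an indefinite wait.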
I have tried skipping Incapsula using the method below, but it still gets stuck at exactly the same place, so I'm half inclined to believe the issue is unrelated to Incapsula.
I have also tried disabling JavaScript altogether. This causes the webpage to not process the login properly, and I get the results below.
The URLs are the same as in the code snippet above. As visible in the debugger, both page1 and page2 return the same webpage, which should not be the case.
Additional notes:
I've managed to bypass a lot of the JavaScript by sending a WebRequest with the correct login information, but further down the line there is a page that manipulates view state, and so far I have had no success reverse engineering the required parameters. It would be great if the JavaScript handling could be improved, as this has become insanely complex at this point.
Footnote: I opened a new issue because I'm not sure whether this is still caused by the Incapsula script. Also, thanks for the Incapsula fix; it got me through one of the other websites I am scraping.
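For the view state part, one approach that sometimes helps with ASP.NET WebForms pages is to copy every hidden input (such as `__VIEWSTATE` and `__EVENTVALIDATION`) from the previous response into the next POST body instead of trying to reconstruct the values. The sketch below uses only the JDK; `HiddenFieldExtractor` is a hypothetical helper, the sample HTML is made up, and it assumes the fields are present in the static HTML with double-quoted attributes. Values that JavaScript rewrites before submitting would not be captured this way.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: collects hidden form fields so they can be posted back unchanged. */
final class HiddenFieldExtractor {

    private static final Pattern INPUT_TAG =
            Pattern.compile("<input\\b[^>]*>", Pattern.CASE_INSENSITIVE);

    /** Returns the double-quoted value of the named attribute, or null if absent. */
    static String attribute(String tag, String name) {
        Pattern p = Pattern.compile(
                "(?<![\\w-])" + Pattern.quote(name) + "\\s*=\\s*\"([^\"]*)\"",
                Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(tag);
        return m.find() ? m.group(1) : null;
    }

    /** Maps name to value for every <input type="hidden"> in document order. */
    static Map<String, String> hiddenFields(String html) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher m = INPUT_TAG.matcher(html);
        while (m.find()) {
            String tag = m.group();
            String name = attribute(tag, "name");
            if (name != null && "hidden".equalsIgnoreCase(attribute(tag, "type"))) {
                String value = attribute(tag, "value");
                // HTML entities are not decoded here; base64 view state rarely contains any.
                fields.put(name, value == null ? "" : value);
            }
        }
        return fields;
    }

    /** Encodes the fields as an application/x-www-form-urlencoded body. */
    static String formEncode(Map<String, String> fields) {
        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (body.length() > 0) {
                body.append('&');
            }
            body.append(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8))
                .append('=')
                .append(URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return body.toString();
    }

    public static void main(String[] args) {
        // Made-up sample markup, only to show the shape of the input.
        String html = "<form><input type=\"hidden\" name=\"__VIEWSTATE\" value=\"/wEPDw+abc=\" />"
                + "<input type=\"text\" name=\"txtUserId\" /></form>";
        System.out.println(formEncode(hiddenFields(html)));
    }
}
```

The resulting map can be extended with the visible form values (user ID, password, the clicked button) before encoding, so the server receives the same fields a browser submit would send.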