I'm trying to load a WeChat official account URL, https://mp.weixin.qq.com/s/PcUNumU5j2T3UD-3FN66lQ, and I get some exceptions and a strange page.
The Java code below seems to add useless HTML tags such as span, and attributes such as "display:none" on div elements, which makes those divs invisible (please check the attachments below).
My code:
public static String getPageXmlByUrl(String url) {
    if (!isUrl(url)) {
        throw new ServerException(ResultEnum.PARAM_ERROR);
    }
    final WebClient webClient = new WebClient(BrowserVersion.FIREFOX);
    // Do not abort on script errors or non-2xx status codes.
    webClient.getOptions().setThrowExceptionOnScriptError(false);
    webClient.getOptions().setThrowExceptionOnFailingStatusCode(false);
    webClient.getOptions().setActiveXNative(false);
    webClient.getOptions().setCssEnabled(false);
    webClient.getOptions().setJavaScriptEnabled(true);
    webClient.getOptions().setDownloadImages(false);
    webClient.getOptions().setWebSocketEnabled(true);
    webClient.setAjaxController(new NicelyResynchronizingAjaxController());
    String pageXml = null;
    try {
        HtmlPage page = webClient.getPage(url);
        // Give background JavaScript up to 2 seconds to finish.
        webClient.waitForBackgroundJavaScript(2000);
        // Serialize inside the try block, while the client is still open,
        // so a failed load cannot cause a NullPointerException afterwards.
        pageXml = page.asXml();
    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        webClient.close();
    }
    // null if the page could not be loaded
    return pageXml;
}
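The snippet above calls an isUrl helper that is not shown here. For anyone trying to reproduce this, a minimal stand-in could look like the sketch below. The class name UrlCheck and the rule of accepting only http/https URLs with a host are assumptions for illustration, not the original implementation.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class UrlCheck {

    // Hypothetical stand-in for the isUrl helper used in the snippet above.
    // Assumption: only absolute http/https URLs with a host are accepted.
    public static boolean isUrl(String url) {
        if (url == null || url.trim().isEmpty()) {
            return false;
        }
        try {
            URI uri = new URI(url.trim());
            String scheme = uri.getScheme();
            return uri.getHost() != null
                    && ("http".equalsIgnoreCase(scheme) || "https".equalsIgnoreCase(scheme));
        } catch (URISyntaxException e) {
            // Strings that cannot be parsed as a URI (e.g. containing spaces) are rejected.
            return false;
        }
    }
}
```

With this stand-in, the WeChat URL from this report passes the check, while null, free text, and non-HTTP schemes are rejected before the page is loaded.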
Exceptions:
Exceptions.txt
Resulting HTML from the test code:
Strange Page.html.txt