I am trying to scrape content (for example, a course's thumbnail picture, price, etc.) from an educational website, Udemy, by requesting a general search URL (given in the code snippet below). The page source contains a div with the class name "ud-app-loader ud-component--search--search", and the courses shown on screen are rendered in sub-divs with class="popper-module--popper--2BpLn".
Code used to get the HTML content from the website:
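import org.htmlunit.BrowserVersion;
import org.htmlunit.NicelyResynchronizingAjaxController;
import org.htmlunit.WebClient;
import org.htmlunit.html.HtmlPage;

public static void getData(String courseName, String sortType) throws Exception {
    // Search URL; note that "sort" is sent twice (a fixed value, then sortType)
    String url = "https://www.udemy.com/courses/search/?lang=en&price=price-paid&q=" + courseName
            + "&ratings=4.5&sort=relevance&sort=" + sortType + "&src=ukw";
    // try-with-resources closes the client and its background threads
    try (WebClient client = new WebClient(BrowserVersion.FIREFOX)) {
        client.getOptions().setJavaScriptEnabled(true);
        client.getOptions().setCssEnabled(true);
        // Log script errors instead of aborting the page load
        client.getOptions().setThrowExceptionOnScriptError(false);
        client.setAjaxController(new NicelyResynchronizingAjaxController());
        HtmlPage page = client.getPage(url);
        // Wait up to 500000 ms for background JavaScript to finish
        client.waitForBackgroundJavaScript(500000);
        System.out.println(page.asXml());
    }
}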
With the above code, the page's JavaScript does not run properly, so the additional markup is never rendered. That markup is visible in the browser's Inspect panel but not in the page source.
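As a side note on the request itself: the code above builds the URL by plain concatenation, so the course name is not URL-encoded and the "sort" parameter is emitted twice. Whether either of these affects what Udemy returns is not established here. The following is only a sketch of building the same query string with one "sort" value and an encoded search term; the class name SearchUrl and its methods are hypothetical and not part of HtmlUnit.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper (not part of HtmlUnit): builds the search URL from
// the same parameters as the original snippet, with a single "sort" value.
final class SearchUrl {

    static String build(String courseName, String sortType) {
        // LinkedHashMap keeps the parameters in insertion order
        Map<String, String> params = new LinkedHashMap<>();
        params.put("lang", "en");
        params.put("price", "price-paid");
        params.put("q", courseName);
        params.put("ratings", "4.5");
        params.put("sort", sortType);
        params.put("src", "ukw");

        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            query.add(encode(e.getKey()) + "=" + encode(e.getValue()));
        }
        return "https://www.udemy.com/courses/search/?" + query;
    }

    // Form-style encoding: spaces become '+', reserved characters become %XX
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException ex) {
            // UTF-8 is always available on a conforming JVM
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        System.out.println(build("python for beginners", "relevance"));
    }
}
```

The resulting string could be passed to client.getPage(...) in place of the concatenated URL.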
I am also getting many EvaluatorException errors at various places. One such exception is shown below:
======= EXCEPTION START ========
Exception class=[org.htmlunit.corejs.javascript.EvaluatorException]
org.htmlunit.ScriptException: An invalid or illegal selector was specified (selector: '[data-css-toggle-id' error: Invalid selectors: [data-css-toggle-id). (script in https://www.udemy.com/courses/search/?lang=en&price=price-paid&q=python&ratings=4.5&sort=relevance&sort=relevance&src=ukw from (557, 62) to (582, 10)#577)
    at org.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run(JavaScriptEngine.java:989)
    at org.htmlunit.corejs.javascript.Context.call(Context.java:590)
    at org.htmlunit.corejs.javascript.ContextFactory.call(ContextFactory.java:484)
    at org.htmlunit.javascript.HtmlUnitContextFactory.callSecured(HtmlUnitContextFactory.java:349)
    at org.htmlunit.javascript.JavaScriptEngine.execute(JavaScriptEngine.java:867)
    at org.htmlunit.javascript.JavaScriptEngine.execute(JavaScriptEngine.java:843)
    at org.htmlunit.javascript.JavaScriptEngine.execute(JavaScriptEngine.java:834)
    at org.htmlunit.html.HtmlPage.executeJavaScript(HtmlPage.java:966)
    at org.htmlunit.html.ScriptElementSupport.executeInlineScriptIfNeeded(ScriptElementSupport.java:380)
    at org.htmlunit.html.ScriptElementSupport.executeScriptIfNeeded(ScriptElementSupport.java:230)
    at org.htmlunit.html.ScriptElementSupport$1.execute(ScriptElementSupport.java:120)
    at org.htmlunit.html.ScriptElementSupport.onAllChildrenAddedToPage(ScriptElementSupport.java:143)
    at org.htmlunit.html.HtmlScript.onAllChildrenAddedToPage(HtmlScript.java:191)
    at org.htmlunit.html.parser.neko.HtmlUnitNekoDOMBuilder.endElement(HtmlUnitNekoDOMBuilder.java:601)
    at org.htmlunit.cyberneko.xerces.parsers.AbstractSAXParser.endElement(AbstractSAXParser.java:412)
    at org.htmlunit.html.parser.neko.HtmlUnitNekoDOMBuilder.endElement(HtmlUnitNekoDOMBuilder.java:548)
    at org.htmlunit.cyberneko.HTMLTagBalancer.callEndElement(HTMLTagBalancer.java:1273)
    at org.htmlunit.cyberneko.HTMLTagBalancer.endElement(HTMLTagBalancer.java:1200)
    at org.htmlunit.cyberneko.filters.DefaultFilter.endElement(DefaultFilter.java:204)
    at org.htmlunit.cyberneko.filters.NamespaceBinder.endElement(NamespaceBinder.java:274)
    at org.htmlunit.cyberneko.HTMLScanner$ContentScanner.scanEndElement(HTMLScanner.java:2969)
    at org.htmlunit.cyberneko.HTMLScanner$ContentScanner.scan(HTMLScanner.java:1953)
    at org.htmlunit.cyberneko.HTMLScanner.scanDocument(HTMLScanner.java:834)
    at org.htmlunit.cyberneko.HTMLConfiguration.parse(HTMLConfiguration.java:346)
    at org.htmlunit.cyberneko.HTMLConfiguration.parse(HTMLConfiguration.java:297)
    at org.htmlunit.cyberneko.xerces.parsers.XMLParser.parse(XMLParser.java:76)
    at org.htmlunit.html.parser.neko.HtmlUnitNekoDOMBuilder.parse(HtmlUnitNekoDOMBuilder.java:838)
    at org.htmlunit.html.parser.neko.HtmlUnitNekoHtmlParser.parse(HtmlUnitNekoHtmlParser.java:203)
    at org.htmlunit.DefaultPageCreator.createHtmlPage(DefaultPageCreator.java:300)
    at org.htmlunit.DefaultPageCreator.createPage(DefaultPageCreator.java:220)
    at org.htmlunit.WebClient.loadWebResponseInto(WebClient.java:672)
    at org.htmlunit.WebClient.loadWebResponseInto(WebClient.java:574)
    at org.htmlunit.WebClient.getPage(WebClient.java:492)
    at org.htmlunit.WebClient.getPage(WebClient.java:399)
    at org.htmlunit.WebClient.getPage(WebClient.java:537)
    at org.htmlunit.WebClient.getPage(WebClient.java:519)
    at org.example.Scraper.getData(Scraper.java:20)
    at org.example.App.main(App.java:16)
Caused by: org.htmlunit.corejs.javascript.EvaluatorException: An invalid or illegal selector was specified (selector: '[data-css-toggle-id' error: Invalid selectors: [data-css-toggle-id). (script in https://www.udemy.com/courses/search/?lang=en&price=price-paid&q=python&ratings=4.5&sort=relevance&sort=relevance&src=ukw from (557, 62) to (582, 10)#577)
    at org.htmlunit.javascript.HtmlUnitContextFactory$HtmlUnitErrorReporter.runtimeError(HtmlUnitContextFactory.java:454)
    at org.htmlunit.corejs.javascript.Context.reportRuntimeError(Context.java:986)
    at org.htmlunit.corejs.javascript.Context.reportRuntimeError(Context.java:1042)
    at org.htmlunit.javascript.host.dom.Document.querySelectorAll(Document.java:1044)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.htmlunit.corejs.javascript.MemberBox.invoke(MemberBox.java:222)
    at org.htmlunit.corejs.javascript.FunctionObject.call(FunctionObject.java:423)
    at org.htmlunit.corejs.javascript.Interpreter.interpretLoop(Interpreter.java:1874)
    at org.htmlunit.corejs.javascript.Interpreter.interpret(Interpreter.java:1051)
    at org.htmlunit.corejs.javascript.InterpretedFunction.call(InterpretedFunction.java:89)
    at org.htmlunit.corejs.javascript.ContextFactory.doTopCall(ContextFactory.java:392)
    at org.htmlunit.javascript.HtmlUnitContextFactory.doTopCall(HtmlUnitContextFactory.java:335)
    at org.htmlunit.corejs.javascript.ScriptRuntime.doTopCall(ScriptRuntime.java:3914)
    at org.htmlunit.corejs.javascript.InterpretedFunction.exec(InterpretedFunction.java:102)
    at org.htmlunit.javascript.JavaScriptEngine$2.doRun(JavaScriptEngine.java:858)
    at org.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run(JavaScriptEngine.java:972)
    ... 37 more
Enclosed exception:
org.htmlunit.corejs.javascript.EvaluatorException: An invalid or illegal selector was specified (selector: '[data-css-toggle-id' error: Invalid selectors: [data-css-toggle-id). (script in https://www.udemy.com/courses/search/?lang=en&price=price-paid&q=python&ratings=4.5&sort=relevance&sort=relevance&src=ukw from (557, 62) to (582, 10)#577)
    at org.htmlunit.javascript.HtmlUnitContextFactory$HtmlUnitErrorReporter.runtimeError(HtmlUnitContextFactory.java:454)
    at org.htmlunit.corejs.javascript.Context.reportRuntimeError(Context.java:986)
    at org.htmlunit.corejs.javascript.Context.reportRuntimeError(Context.java:1042)
    at org.htmlunit.javascript.host.dom.Document.querySelectorAll(Document.java:1044)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.htmlunit.corejs.javascript.MemberBox.invoke(MemberBox.java:222)
    at org.htmlunit.corejs.javascript.FunctionObject.call(FunctionObject.java:423)
    at org.htmlunit.corejs.javascript.Interpreter.interpretLoop(Interpreter.java:1874)
    at script(script in https://www.udemy.com/courses/search/?lang=en&price=price-paid&q=python&ratings=4.5&sort=relevance&sort=relevance&src=ukw from (557, 62) to (582, 10):577)
    at script(script in https://www.udemy.com/courses/search/?lang=en&price=price-paid&q=python&ratings=4.5&sort=relevance&sort=relevance&src=ukw from (557, 62) to (582, 10):576)
    at org.htmlunit.corejs.javascript.Interpreter.interpret(Interpreter.java:1051)
    at org.htmlunit.corejs.javascript.InterpretedFunction.call(InterpretedFunction.java:89)
    at org.htmlunit.corejs.javascript.ContextFactory.doTopCall(ContextFactory.java:392)
    at org.htmlunit.javascript.HtmlUnitContextFactory.doTopCall(HtmlUnitContextFactory.java:335)
    at org.htmlunit.corejs.javascript.ScriptRuntime.doTopCall(ScriptRuntime.java:3914)
    at org.htmlunit.corejs.javascript.InterpretedFunction.exec(InterpretedFunction.java:102)
    at org.htmlunit.javascript.JavaScriptEngine$2.doRun(JavaScriptEngine.java:858)
    at org.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run(JavaScriptEngine.java:972)
    at org.htmlunit.corejs.javascript.Context.call(Context.java:590)
    at org.htmlunit.corejs.javascript.ContextFactory.call(ContextFactory.java:484)
    at org.htmlunit.javascript.HtmlUnitContextFactory.callSecured(HtmlUnitContextFactory.java:349)
    at org.htmlunit.javascript.JavaScriptEngine.execute(JavaScriptEngine.java:867)
    at org.htmlunit.javascript.JavaScriptEngine.execute(JavaScriptEngine.java:843)
    at org.htmlunit.javascript.JavaScriptEngine.execute(JavaScriptEngine.java:834)
    at org.htmlunit.html.HtmlPage.executeJavaScript(HtmlPage.java:966)
    at org.htmlunit.html.ScriptElementSupport.executeInlineScriptIfNeeded(ScriptElementSupport.java:380)
    at org.htmlunit.html.ScriptElementSupport.executeScriptIfNeeded(ScriptElementSupport.java:230)
    at org.htmlunit.html.ScriptElementSupport$1.execute(ScriptElementSupport.java:120)
    at org.htmlunit.html.ScriptElementSupport.onAllChildrenAddedToPage(ScriptElementSupport.java:143)
    at org.htmlunit.html.HtmlScript.onAllChildrenAddedToPage(HtmlScript.java:191)
    at org.htmlunit.html.parser.neko.HtmlUnitNekoDOMBuilder.endElement(HtmlUnitNekoDOMBuilder.java:601)
    at org.htmlunit.cyberneko.xerces.parsers.AbstractSAXParser.endElement(AbstractSAXParser.java:412)
    at org.htmlunit.html.parser.neko.HtmlUnitNekoDOMBuilder.endElement(HtmlUnitNekoDOMBuilder.java:548)
    at org.htmlunit.cyberneko.HTMLTagBalancer.callEndElement(HTMLTagBalancer.java:1273)
    at org.htmlunit.cyberneko.HTMLTagBalancer.endElement(HTMLTagBalancer.java:1200)
    at org.htmlunit.cyberneko.filters.DefaultFilter.endElement(DefaultFilter.java:204)
    at org.htmlunit.cyberneko.filters.NamespaceBinder.endElement(NamespaceBinder.java:274)
    at org.htmlunit.cyberneko.HTMLScanner$ContentScanner.scanEndElement(HTMLScanner.java:2969)
    at org.htmlunit.cyberneko.HTMLScanner$ContentScanner.scan(HTMLScanner.java:1953)
    at org.htmlunit.cyberneko.HTMLScanner.scanDocument(HTMLScanner.java:834)
    at org.htmlunit.cyberneko.HTMLConfiguration.parse(HTMLConfiguration.java:346)
    at org.htmlunit.cyberneko.HTMLConfiguration.parse(HTMLConfiguration.java:297)
    at org.htmlunit.cyberneko.xerces.parsers.XMLParser.parse(XMLParser.java:76)
    at org.htmlunit.html.parser.neko.HtmlUnitNekoDOMBuilder.parse(HtmlUnitNekoDOMBuilder.java:838)
    at org.htmlunit.html.parser.neko.HtmlUnitNekoHtmlParser.parse(HtmlUnitNekoHtmlParser.java:203)
    at org.htmlunit.DefaultPageCreator.createHtmlPage(DefaultPageCreator.java:300)
    at org.htmlunit.DefaultPageCreator.createPage(DefaultPageCreator.java:220)
    at org.htmlunit.WebClient.loadWebResponseInto(WebClient.java:672)
    at org.htmlunit.WebClient.loadWebResponseInto(WebClient.java:574)
    at org.htmlunit.WebClient.getPage(WebClient.java:492)
    at org.htmlunit.WebClient.getPage(WebClient.java:399)
    at org.htmlunit.WebClient.getPage(WebClient.java:537)
    at org.htmlunit.WebClient.getPage(WebClient.java:519)
    at org.example.Scraper.getData(Scraper.java:20)
    at org.example.App.main(App.java:16)
======= EXCEPTION END ========
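The message reports the selector as '[data-css-toggle-id', that is, an attribute selector with an opening bracket and no closing one. Whether the page script really passes that string to querySelectorAll, or whether it is cut short somewhere before it reaches the selector parser, is not established by the trace. The sketch below only illustrates why the string as reported cannot be a complete attribute selector; the class SelectorCheck is a hypothetical helper, not HtmlUnit API, and it checks nothing beyond bracket and quote balance.

```java
// Hypothetical helper (not part of HtmlUnit): a rough well-formedness
// check for the bracket structure of a CSS selector string.
final class SelectorCheck {

    // Returns true if every '[' outside a quoted string has a matching ']'
    // and every quoted string is terminated.
    static boolean bracketsBalanced(String selector) {
        int depth = 0;
        char quote = 0; // 0 means "not inside a quoted string"
        for (int i = 0; i < selector.length(); i++) {
            char c = selector.charAt(i);
            if (quote != 0) {
                if (c == '\\') {
                    i++; // skip the escaped character
                } else if (c == quote) {
                    quote = 0; // closing quote
                }
            } else if (c == '"' || c == '\'') {
                quote = c; // opening quote
            } else if (c == '[') {
                depth++;
            } else if (c == ']') {
                if (--depth < 0) {
                    return false; // ']' without a preceding '['
                }
            }
        }
        return depth == 0 && quote == 0;
    }

    public static void main(String[] args) {
        // The selector exactly as it appears in the exception message
        System.out.println(bracketsBalanced("[data-css-toggle-id"));
        // The same attribute selector with its closing bracket
        System.out.println(bracketsBalanced("[data-css-toggle-id]"));
    }
}
```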
Stack Overflow thread for this question (for complete context): How to extract the HTML elements inside <div data-module-*> from a website source code using HTMLUnit?