I am wondering whether it is possible to get the page source, after JavaScript has been executed, as HTML (the way a browser shows it when I inspect the page in its developer console).
I know that I can convert an HtmlPage to XML like this.
Given the following HTML:
<span id="dynamic"></span>
document.querySelector('#dynamic').innerHTML = "<span>dynamically added</span>";
// kotlin example to parse HTML as string to HtmlPage object
val rendered: HtmlPage = WebClient(BrowserVersion.BEST_SUPPORTED).loadHtmlCodeIntoCurrentWindow(htmlFromAboveAsString)
// convert HtmlPage object to XML as string and print it to console
println(rendered.asXml())
This produces the following output:
<?xml version="1.0" encoding="UTF-8"?>
<span id="dynamic">
dynamically added
document.querySelector('#dynamic').innerHTML = "<span>dynamically added</span>";
But since this is XML (as the asXml() function promises), the string diverges from what a browser shows during DOM inspection.
Because the purpose of asXml() is to produce valid XML, it adds a prolog at the top that declares the XML version and the character encoding (<?xml version="1.0" encoding="UTF-8"?>). It also wraps the content of script tags in a CDATA block so that the script text does not clash with XML markup (in my example, the text contains <span>dynamically added</span>), and it may make further changes.
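To make these two differences concrete, here is a minimal sketch of a post-processing step that removes them from a string returned by asXml(). The function name stripXmlArtifacts is hypothetical and not part of HtmlUnit, and the sketch assumes the CDATA markers appear as the commented lines //<![CDATA[ and //]]>. It does not undo any other changes asXml() may make (indentation, for example), so the result only approximates what a browser shows.

```kotlin
// Hypothetical helper, not part of HtmlUnit: removes the XML prolog and the
// commented CDATA markers from a string produced by asXml().
fun stripXmlArtifacts(xml: String): String {
    // Drop a leading prolog such as <?xml version="1.0" encoding="UTF-8"?>
    val withoutProlog = xml.replaceFirst(Regex("""^\s*<\?xml[^>]*\?>\s*"""), "")
    // Drop the marker lines (assumed format: //<![CDATA[ and //]]>)
    return withoutProlog
        .replace(Regex("""[ \t]*//\s*<!\[CDATA\[[ \t]*\r?\n?"""), "")
        .replace(Regex("""[ \t]*//\s*\]\]>[ \t]*\r?\n?"""), "")
}

fun main() {
    // Hand-written sample input, not real asXml() output
    val sample = """
        <?xml version="1.0" encoding="UTF-8"?>
        <script>
        //<![CDATA[
        document.title = "x";
        //]]>
        </script>
    """.trimIndent()
    println(stripXmlArtifacts(sample))
}
```

String surgery like this is fragile, which is why I would prefer a way to serialize the rendered DOM as HTML directly.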
A real browser, on the other hand, shows the actual HTML after rendering when I look at it in the developer console, like this:
<span id="dynamic"><span>dynamically added</span></span>
document.querySelector('#dynamic').innerHTML = "<span>dynamically added</span>";
Is it possible to get the rendered HTML as a string, instead of HTML that has been converted to XML?