How to Web Scrape an HTML Page After JS Loads

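Sometimes Jsoup alone is not enough: it parses the HTML the server sends but does not execute JavaScript. When we want the final version of the page, after its scripts (redirects, injected content, etc.) have run, we can load it with HtmlUnit and hand the result to Jsoup.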

The difference is between this, where the script has not run:

<div class="dotdbox"> 
 <div style="color: #000000;text-align: center;padding: 3px 2px 0px 2px; font-size: 11px;background-color: #ffffff;"> 
  <script language="JavaScript" src="" type="text/javascript"></script> 
  <p><span style="font-size: 11px;"><a href="/free/dotd.html">Get the Deal of the Day email alert</a></span></p>

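and this, which is what HtmlUnit produces once the scripts have run:

// getPage is a Java generic; naming the page type keeps Scala from inferring Nothing
Jsoup.parse( new WebClient(BrowserVersion.CHROME).getPage[HtmlPage]("")
                        .asXml ).select("div.dotdbox").text

Without the trailing .text, the selected element prints as:
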
<div class="dotdbox"> 
 <div style="color:#000000;text-align:center;padding:3px 2px 0;font-size:11px;background-color:#ffffff;"> 
                       January 19, 2014 
  <br /> 
  <br /> 
  <b> <a href=""> Practical Data Science with R </a> </b> 
  <br /> 
  <br /> Get half off the eBook or pBook 
  <br /> 
  <br /> Enter dotd040614 in the Promotional Code box when you check out 
  <p> <span style="font-size:11px;"> <a href="/free/dotd.html"> Get the Deal of the Day email alert </a> </span> </p> 
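
The one-liner above can be expanded into something a little more defensive. What follows is only a sketch under a few assumptions: it targets HtmlUnit 2.x (the com.gargoylesoftware.htmlunit packages, in a version where WebClient has close()), and the names RenderedScrape, renderedXml, selectText and the 5 second wait are made up for illustration, not taken from the original post.

```scala
import com.gargoylesoftware.htmlunit.{BrowserVersion, WebClient}
import com.gargoylesoftware.htmlunit.html.HtmlPage
import org.jsoup.Jsoup

// Hypothetical helper object; the names here are illustrative only.
object RenderedScrape {

  // Load the page in HtmlUnit, let its scripts run, and return the resulting DOM as XML.
  def renderedXml(url: String, jsWaitMillis: Long = 5000L): String = {
    val client = new WebClient(BrowserVersion.CHROME)
    try {
      // Page scripts often throw; do not abort the whole fetch because of them.
      client.getOptions.setThrowExceptionOnScriptError(false)
      // Styling is not needed for scraping.
      client.getOptions.setCssEnabled(false)
      // Name the page type explicitly so Scala does not infer Nothing.
      val page = client.getPage[HtmlPage](url)
      // Give asynchronous scripts (timers, XHR callbacks) a bounded time to finish.
      client.waitForBackgroundJavaScript(jsWaitMillis)
      page.asXml
    } finally client.close()
  }

  // Parse the rendered markup with Jsoup and return the text of the matching elements.
  def selectText(renderedMarkup: String, cssQuery: String): String =
    Jsoup.parse(renderedMarkup).select(cssQuery).text
}
```

Usage would look like RenderedScrape.selectText(RenderedScrape.renderedXml(url), "div.dotdbox"), where url is the address of the page being scraped. Splitting the fetch from the parse keeps the Jsoup part testable against a saved copy of the markup without hitting the network.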
