How to Web Scrape an HTML Page After the JavaScript Loads

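Sometimes Jsoup is not enough: it parses the HTML the server sends back, but it does not execute JavaScript. When we want the final version of the HTML page, after the JS (redirects etc.) has run, we can load the page with HtmlUnit first and then hand the result to Jsoup.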

It makes the difference between this:

Jsoup.connect("http://www.manning.com").get.select("div.dotdbox")
<div class="dotdbox"> 
 <div style="color: #000000;text-align: center;padding: 3px 2px 0px 2px; font-size: 11px;background-color: #ffffff;"> 
  <script language="JavaScript" src="http://incsrc.manningpublications.com/dotd.js" type="text/javascript"></script> 
  <p><span style="font-size: 11px;"><a href="/free/dotd.html">Get the Deal of the Day email alert</a></span></p>
 </div> 
</div>

and that:

Jsoup.parse( new WebClient(BrowserVersion.CHROME).getPage[HtmlPage]("http://www.manning.com")
                        .asXml ).select("div.dotdbox")
<div class="dotdbox"> 
 <div style="color:#000000;text-align:center;padding:3px 2px 0;font-size:11px;background-color:#ffffff;"> 
  
                       January 19, 2014 
  <br /> 
  <br /> 
  <b> <a href="http://www.manning.com/zumel/"> Practical Data Science with R </a> </b> 
  <br /> 
  <br /> Get half off the eBook or pBook 
  <br /> 
  <br /> Enter dotd040614 in the Promotional Code box when you check out 
  <p> <span style="font-size:11px;"> <a href="/free/dotd.html"> Get the Deal of the Day email alert </a> </span> </p> 
 </div> 
</div>
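
The one-liner above can be unpacked into a small helper. What follows is only a sketch under a few assumptions: it uses the HtmlUnit 2.x package name (`com.gargoylesoftware.htmlunit`; newer 3.x releases use `org.htmlunit` instead), the names `RenderedScrape`, `renderedXml` and `selectFrom` are made up for illustration, and the 5 second wait is an arbitrary guess that may need tuning per site. It turns off CSS processing, keeps going when a page script throws, gives background scripts some time to finish, and closes the client afterwards.

```scala
import com.gargoylesoftware.htmlunit.{BrowserVersion, WebClient}
import com.gargoylesoftware.htmlunit.html.HtmlPage
import org.jsoup.Jsoup
import org.jsoup.select.Elements

// Hypothetical helper object, not part of Jsoup or HtmlUnit.
object RenderedScrape {

  // Parse markup with Jsoup and apply a CSS selector to it.
  def selectFrom(html: String, css: String): Elements =
    Jsoup.parse(html).select(css)

  // Load the page in HtmlUnit, let its scripts run, and return the
  // resulting DOM serialized as XML. waitMillis is an assumed default.
  def renderedXml(url: String, waitMillis: Long = 5000L): String = {
    val client = new WebClient(BrowserVersion.CHROME)
    try {
      val options = client.getOptions
      options.setJavaScriptEnabled(true)
      options.setCssEnabled(false)                  // styling is not needed for scraping
      options.setThrowExceptionOnScriptError(false) // tolerate broken page scripts
      val page = client.getPage[HtmlPage](url)
      client.waitForBackgroundJavaScript(waitMillis) // let pending scripts finish
      page.asXml
    } finally client.close()
  }

  def main(args: Array[String]): Unit =
    println(selectFrom(renderedXml("http://www.manning.com"), "div.dotdbox"))
}
```

Keeping `selectFrom` separate from `renderedXml` means the selector can be tried against a saved copy of the markup without hitting the site again.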