Visitor Design Pattern in Scala

Scala provides built-in support for the Visitor Design Pattern through pattern matching.

The Visitor Design Pattern addresses the problem of never-ending new functionality that would otherwise be implemented by adding new methods throughout the inheritance tree.

With the Visitor pattern, new functionality is decoupled from the inheritance tree and encapsulated in a separate object. That object packs together the implementations for every type in the tree, normally by overloading one method for each of the various types.
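
For comparison, here is a minimal sketch of that classic, overloading-based form. The names AnimalVisitor, accept, TalkVisitor and the wrapping VisitorSketch object are hypothetical, chosen only for illustration:

```scala
object VisitorSketch {
  // One overloaded `visit` per concrete type in the inheritance tree.
  trait AnimalVisitor[R] {
    def visit(dog: Dog): R
    def visit(man: Man): R
  }

  trait Animal {
    // The only method the tree needs in order to support any future operation.
    def accept[R](visitor: AnimalVisitor[R]): R
  }

  class Dog extends Animal {
    // `this` is statically a Dog here, so visit(dog: Dog) is selected.
    def accept[R](visitor: AnimalVisitor[R]): R = visitor.visit(this)
  }

  class Man extends Animal {
    def accept[R](visitor: AnimalVisitor[R]): R = visitor.visit(this)
  }

  // A new operation, added without touching Animal, Dog or Man.
  object TalkVisitor extends AnimalVisitor[String] {
    def visit(dog: Dog): String = "wav wav"
    def visit(man: Man): String = "hi"
  }

  val animals: List[Animal] = List(new Dog, new Man)
  val spoken: List[String] = animals.map(_.accept(TalkVisitor))
}
```

Every new operation is one more visitor object, at the price of an accept method on each class and a fair amount of ceremony.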

In Scala, the built-in pattern matching makes this very easy:

abstract class Animal { def walk: String }

class Dog extends Animal { override def walk = "on 4" }

class Man extends Animal { override def walk = "on 2" }

/**
 *  The Visitor Pattern provides a solution to never-ending new
 *  functionality that would otherwise be implemented with a new
 *  method in the hierarchy tree.
 */
def talk(animal: Animal) = animal match {
  case _: Dog => "wav wav"
  case _: Man => "hi"
}

/**
 *  New functionality is implemented in separate methods that
 *  use pattern matching to provide tailored behaviour per type.
 */
def swim(animal: Animal) = animal match {
  case _: Dog => "on 4"
  case _: Man => "on 4"
}
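
One possible refinement, sketched here under the assumption that the whole hierarchy can live in a single file: declaring the base type sealed lets the compiler warn when a match forgets one of the subtypes. The case classes and the wrapping SealedSketch object are illustrative choices, not part of the code above:

```scala
object SealedSketch {
  // `sealed` limits direct subtypes to this file, so matches on Animal
  // can be checked for exhaustiveness at compile time.
  sealed trait Animal { def walk: String }
  case class Dog() extends Animal { def walk = "on 4" }
  case class Man() extends Animal { def walk = "on 2" }

  // New functionality, still kept outside the hierarchy.
  def talk(animal: Animal): String = animal match {
    case Dog() => "wav wav"
    case Man() => "hi"
  }
}
```

Without sealed, a missing case only shows up at runtime as a MatchError.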

Strategy Pattern in Scala – A Pragmatic Example

Let’s say we want to create a newsletter grouping together all the Daily Deals coming from popular IT Book Publishers.

We can source the Daily Deal information by web-scraping each publisher's website.

We’ll employ the Strategy Pattern because the web-scraping algorithm is distinct for each publisher website and we want to encapsulate it, while keeping the flexibility to introduce new publishers in the future without altering the core context.

Essentially where we want to get to is a form like this:

new WebScrappingContext(Strategy).dailyDeal(url)

In the above form, the Strategy can vary and is interchangeable; it is paired with the publisher URL, which is passed to the Strategy Functor.

Let’s start by defining a helpful Strategy type that takes a URL and produces the Daily Deal:

type WebScrappingStrategy = String => String

Next we’ll create the context that hosts the Strategy Functor and applies it to the URL string:

  case class WebScrap(strategy: WebScrappingStrategy) {
    def dailyDeal(url:String) = strategy( url )
  }

Note that we have used a case class, but we could just as well use a method to achieve the same effect:

def dailyDeal( url:String, webScrappingStrategy: WebScrappingStrategy ) = webScrappingStrategy( url )
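
To see the interchangeability without touching the network, here is a minimal sketch with two hypothetical stand-in strategies (upperCaseStrategy and hostOnlyStrategy are invented for illustration and do no scraping; example.com is a placeholder URL):

```scala
object StrategySketch {
  type WebScrappingStrategy = String => String

  // The context: holds a strategy and applies it to the URL.
  case class WebScrap(strategy: WebScrappingStrategy) {
    def dailyDeal(url: String): String = strategy(url)
  }

  // Hypothetical stand-ins for real scrapers; each is just a String => String.
  val upperCaseStrategy: WebScrappingStrategy = url => url.toUpperCase
  val hostOnlyStrategy: WebScrappingStrategy = url => new java.net.URI(url).getHost

  // Same context, different behaviour, chosen by the strategy passed in.
  val shouted: String = WebScrap(upperCaseStrategy).dailyDeal("http://example.com/deal")
  val host: String = WebScrap(hostOnlyStrategy).dailyDeal("http://example.com/deal")
}
```

Because the strategy is an ordinary function value, swapping it requires no subclassing of the context.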

Then we’ll get busy with web-scraping the publisher websites.

The Manning Daily Deal web-scraper Strategy:

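    // Package names as of HtmlUnit 2.x
    import org.jsoup.Jsoup
    import com.gargoylesoftware.htmlunit.{BrowserVersion, WebClient}
    import com.gargoylesoftware.htmlunit.html.HtmlPage

    def ManningWebScrappingStrategy: WebScrappingStrategy  =
      (url:String) =>
      Jsoup.parse( new WebClient(BrowserVersion.CHROME).getPage( url )
                    .asInstanceOf[HtmlPage].asXml )
        .select("div.dotdbox b").text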

Note that the Manning website uses JavaScript to create the Daily Deal section. Jsoup does not execute JavaScript, so it never sees that section; therefore we use HtmlUnit to load the page first.

One step further, with the call to the Strategy context case class included:

  def ManningDailyDeal = {

    val ManningWebScrappingStrategy: WebScrappingStrategy  =
      (url:String) =>
      Jsoup.parse( new WebClient(BrowserVersion.CHROME).getPage( url )
                    .asInstanceOf[HtmlPage].asXml )
        .select("div.dotdbox b").text

    WebScrap( ManningWebScrappingStrategy ).dailyDeal( "" )
  }

The Strategy Pattern comes into play in the last line of the above method, whose value is returned:

WebScrap( ManningWebScrappingStrategy ).dailyDeal( "" )

The URL is inherent to the Strategy, and ultimately to the Publisher, hence the grouping of both under the ManningDailyDeal method.

Similarly, here are the other Publisher Strategies:

  def OReillyDailyDeal = {

    val OReillyWebScrappingStrategy: WebScrappingStrategy =
      Jsoup.connect(_).get.select("a[href$=DEAL] strong").get(0).text

    WebScrap( OReillyWebScrappingStrategy ).dailyDeal( "" )
  }

  def APressDailyDeal = {

    val APressWebScrappingStrategy: WebScrappingStrategy =
      Jsoup.connect(_).get.select("div.block-dotd").get(0).select("a").text

    WebScrap( APressWebScrappingStrategy ).dailyDeal( "" )
  }

  def SpringerDailyDeal = {

    val SpringerWebScrappingStrategy: WebScrappingStrategy =
      Jsoup.connect(_).get.select("div.block-dotd").get(1).select("a").text

    WebScrap( SpringerWebScrappingStrategy ).dailyDeal( "" )
  }

That’s pretty much it for the Strategy Pattern. To take it one step further, we can pack it all up in a nice Factory Method OO Pattern:

  trait Publisher
  object Manning extends Publisher
  object APress extends Publisher
  object Springer extends Publisher
  object OReilly extends Publisher

  object DailyDeal {
    def apply(publisher: Publisher) = publisher match {
      case Manning => ManningDailyDeal
      case APress => APressDailyDeal
      case Springer => SpringerDailyDeal
      case OReilly => OReillyDailyDeal
    }
  }

So we can make calls like so:
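
Given the apply method above, a call presumably takes the form DailyDeal(Manning). The sketch below is self-contained, so it swaps the real scraping methods for hypothetical placeholder strings and covers only two of the publishers; the FactorySketch wrapper and the placeholder values are assumptions made for illustration:

```scala
object FactorySketch {
  sealed trait Publisher
  case object Manning extends Publisher
  case object OReilly extends Publisher

  // Hypothetical placeholders standing in for the real scraping methods.
  def ManningDailyDeal: String = "Manning deal placeholder"
  def OReillyDailyDeal: String = "O'Reilly deal placeholder"

  // Factory: picks the publisher-specific method behind a single entry point.
  object DailyDeal {
    def apply(publisher: Publisher): String = publisher match {
      case Manning => ManningDailyDeal
      case OReilly => OReillyDailyDeal
    }
  }

  // Calls read like constructor calls thanks to `apply`.
  val manningDeal: String = DailyDeal(Manning)
  val allDeals: List[String] = List[Publisher](Manning, OReilly).map(DailyDeal(_))
}
```

Mapping the factory over a list of publishers is one way the newsletter content could be gathered in a single pass.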


Full code on this GitHub repository.

How to Web Scrape an HTML Page after JS Loads

Sometimes Jsoup is not enough: it parses the HTML as delivered by the server and does not execute JavaScript. When we want the final version of the HTML page, after the JS (redirects etc.) has run, we can use HtmlUnit.

It makes the difference between this:

<div class="dotdbox"> 
 <div style="color: #000000;text-align: center;padding: 3px 2px 0px 2px; font-size: 11px;background-color: #ffffff;"> 
  <script language="JavaScript" src="" type="text/javascript"></script> 
  <p><span style="font-size: 11px;"><a href="/free/dotd.html">Get the Deal of the Day email alert</a></span></p>

and that:

Jsoup.parse( new WebClient(BrowserVersion.CHROME).getPage("")
                        .asInstanceOf[HtmlPage].asXml ).select("div.dotdbox").text
<div class="dotdbox"> 
 <div style="color:#000000;text-align:center;padding:3px 2px 0;font-size:11px;background-color:#ffffff;"> 
                       January 19, 2014 
  <br /> 
  <br /> 
  <b> <a href=""> Practical Data Science with R </a> </b> 
  <br /> 
  <br /> Get half off the eBook or pBook 
  <br /> 
  <br /> Enter dotd040614 in the Promotional Code box when you check out 
  <p> <span style="font-size:11px;"> <a href="/free/dotd.html"> Get the Deal of the Day email alert </a> </span> </p>