ConcurrentHashMap computeIfAbsent method in Java 8

The very nifty method computeIfAbsent has been added to the ConcurrentMap interface in Java 8 as one of its atomic operations. More precisely, it is a default method that provides an alternative to what we used to code ourselves:

if (map.get(key) == null) {
   V newValue = mappingFunction.apply(key);
   if (newValue != null)
      return map.putIfAbsent(key, newValue);
}

but this time providing a function as a second argument.

Most often this method will be used with ConcurrentHashMap, in which case the whole invocation is performed atomically, so the function is applied at most once per key.

In terms of usage the method is handy for situations where we want to maintain a thread-safe cache of expensive one-off computed resources.
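As a minimal sketch of such a cache (the ResourceCache class, the loads counter and the upper-casing "computation" are hypothetical stand-ins, not part of any library), the expensive step can be handed to computeIfAbsent as a method reference:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical cache: the names and the "expensive" computation are illustrative only.
class ResourceCache {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    // Counts how many times the expensive computation actually ran.
    final AtomicInteger loads = new AtomicInteger();

    String get(String key) {
        // The mapping function is only invoked when the key has no value yet.
        return cache.computeIfAbsent(key, this::load);
    }

    private String load(String key) {
        loads.incrementAndGet();
        return key.toUpperCase(); // stand-in for an expensive one-off computation
    }

    public static void main(String[] args) {
        ResourceCache c = new ResourceCache();
        c.get("config");
        c.get("config");
        System.out.println(c.loads.get()); // prints 1
    }
}
```

One caveat worth keeping in mind: the mapping function should be short and must not try to update other mappings of the same map while it runs.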

Here’s another example of holding a key-value pair where value is a thread-safe counter represented by an AtomicInteger:

private final Map<String, AtomicInteger> counters = new ConcurrentHashMap<>();

private void accumulate(String name) {
    counters.computeIfAbsent(name, k -> new AtomicInteger()).incrementAndGet();
}
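To check that no increments are lost under contention, here is a small self-contained sketch. The CounterDemo class, the count helper and the hammer method are assumptions made up for this illustration; only the accumulate idiom comes from the snippet above.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class CounterDemo {
    private final Map<String, AtomicInteger> counters = new ConcurrentHashMap<>();

    void accumulate(String name) {
        counters.computeIfAbsent(name, k -> new AtomicInteger()).incrementAndGet();
    }

    int count(String name) {
        AtomicInteger c = counters.get(name);
        return c == null ? 0 : c.get();
    }

    // Hypothetical helper: runs `threads` workers that each call accumulate(name) `perThread` times.
    static int hammer(String name, int threads, int perThread) {
        CounterDemo demo = new CounterDemo();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                for (int i = 0; i < perThread; i++) demo.accumulate(name);
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return demo.count(name);
    }

    public static void main(String[] args) {
        System.out.println(hammer("hits", 8, 1000)); // prints 8000
    }
}
```

Every thread ends up incrementing the same AtomicInteger, because only one instance is ever stored per key.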

Visitor Design Pattern in Scala

Scala provides built-in support for the Visitor Design Pattern through the use of pattern matching.

The Visitor Design Pattern is trying to address the problem of never-ending new functionality that is otherwise implemented by adding new methods in the inheritance tree.

With the Visitor pattern, new functionality is decoupled from the inheritance tree and encapsulated in a separate object that holds one implementation per type in the tree, normally by overloading a method for each of the various types.
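For comparison, the classic overloading-based form described above might look roughly like this in Java. This is only a sketch: Animal, Dog and Man mirror the hierarchy used in this post, while AnimalVisitor, TalkVisitor and VisitorDemo are hypothetical names introduced for the illustration.

```java
// One overloaded visit method per concrete type in the hierarchy.
interface AnimalVisitor<R> {
    R visit(Dog dog);
    R visit(Man man);
}

abstract class Animal {
    // Each subclass dispatches to the overload matching its own type.
    abstract <R> R accept(AnimalVisitor<R> visitor);
}

class Dog extends Animal {
    @Override <R> R accept(AnimalVisitor<R> visitor) { return visitor.visit(this); }
}

class Man extends Animal {
    @Override <R> R accept(AnimalVisitor<R> visitor) { return visitor.visit(this); }
}

// New functionality lives in its own object, outside the hierarchy.
class TalkVisitor implements AnimalVisitor<String> {
    @Override public String visit(Dog dog) { return "wav wav"; }
    @Override public String visit(Man man) { return "hi"; }
}

class VisitorDemo {
    public static void main(String[] args) {
        Animal a = new Dog();
        System.out.println(a.accept(new TalkVisitor())); // prints "wav wav"
    }
}
```

Adding another behaviour means writing another visitor class; the Animal classes stay untouched.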

In Scala by the use of the built-in pattern matching this becomes very easy:

abstract class Animal { def walk: String }

class Dog extends Animal { override def walk = "on 4" }

class Man extends Animal { override def walk = "on 2" }

/**
 *  Visitor Pattern provides a solution to never-ending new
 *  functionality that would otherwise be implemented with new
 *  method in the hierarchy tree.
 */
def talk(animal: Animal) = animal match {
  case _: Dog => "wav wav"
  case _: Man => "hi"
}

/**
 *  New functionality implemented in separate methods that
 *  uses pattern matching to implement tailored functionality.
 */
def swim(animal: Animal) = animal match {
  case _: Dog => "on 4"
  case _: Man => "on 4"
}

Strategy Pattern in Scala – A Pragmatic Example

Let’s say we want to create a newsletter grouping together all the Daily Deals coming from popular IT Book Publishers.

We can source the Daily Deal information by web-scraping the publisher website.

We’ll employ the Strategy Pattern because we want to encapsulate the web-scraping algorithm, which is distinct for each publisher website, and we want to keep the flexibility of introducing new publishers in the future without altering the core context.

Essentially where we want to get to is a form like this:

new WebScrappingContext(Strategy).dailyDeal(url)

In the above form, the Strategy can vary and be interchangeable, paired with the publisher url that will be applied to the Strategy Functor.

Let’s start by defining a helpful type Strategy that gets a url and produces the Daily Deal:

type WebScrappingStrategy = String => String

Next we’ll create the context that would be hosting the Strategy Functor and would apply the url string on it:

  case class WebScrap(strategy: WebScrappingStrategy) {
    def dailyDeal(url:String) = strategy( url )
  }

Note that we have used a case class but we could as well use a method to achieve the same effect:

def dailyDeal( url:String, webScrappingStrategy: WebScrappingStrategy ) = webScrappingStrategy( url )

Then we’ll get busy with web-scraping the publisher websites.

The Manning Daily Deal web-scraping Strategy:

    def ManningWebScrappingStrategy: WebScrappingStrategy  =
      (url:String) =>
      Jsoup.parse( new WebClient(BrowserVersion.CHROME).getPage( url )
                    .asInstanceOf[HtmlPage].asXml )
        .select("div.dotdbox b").text

Note that the Manning website uses JavaScript to create the Daily Deal section, which Jsoup cannot execute; therefore we use HtmlUnit.

One step further with the call to the Strategy context case class:

  def ManningDailyDeal = {

    val ManningWebScrappingStrategy: WebScrappingStrategy  =
      (url:String) =>
      Jsoup.parse( new WebClient(BrowserVersion.CHROME).getPage( url )
                    .asInstanceOf[HtmlPage].asXml )
        .select("div.dotdbox b").text

    WebScrap( ManningWebScrappingStrategy ).dailyDeal( "http://www.manning.com" )
  }

The Strategy Pattern itself is the last line of the above method, whose value is returned:

WebScrap( ManningWebScrappingStrategy ).dailyDeal( "http://www.manning.com" )

The url is inherent to the Strategy and ultimately to the Publisher, hence the above grouping under the ManningDailyDeal method.

Similarly here are the other Publisher Strategies:

  def OReillyDailyDeal = {

    val OReillyWebScrappingStrategy: WebScrappingStrategy = Jsoup.connect(_).get.select("a[href$=DEAL] strong").get(0).text

    WebScrap( OReillyWebScrappingStrategy ).dailyDeal( "http://oreilly.com" )
  }
  def APressDailyDeal = {

    val APressWebScrappingStrategy: WebScrappingStrategy = Jsoup.connect(_).get.select("div.block-dotd").get(0).select("a")
      .get(0).select("img").attr("alt")

    WebScrap( APressWebScrappingStrategy ).dailyDeal( "http://www.apress.com" )
  }
  def SpringerDailyDeal = {

    val SpringerWebScrappingStrategy: WebScrappingStrategy = Jsoup.connect(_).get.select("div.block-dotd").get(1).select("a")
      .get(0).select("img").attr("alt")

    WebScrap( SpringerWebScrappingStrategy ).dailyDeal( "http://www.apress.com" )
  }

That’s pretty much it for the Strategy Pattern. We can take it one step further by packing it up in a nice Factory Method OO Pattern:

  trait Publisher
  object Manning extends Publisher
  object APress extends Publisher
  object Springer extends Publisher
  object OReilly extends Publisher

  object DailyDeal {
    def apply(publisher: Publisher) = publisher match {
      case Manning => ManningDailyDeal
      case APress => APressDailyDeal
      case Springer => SpringerDailyDeal
      case OReilly => OReillyDailyDeal
    }
  }

So we can make calls like so:

  DailyDeal(Manning)
  DailyDeal(OReilly)
  DailyDeal(APress)
  DailyDeal(Springer)

Full code on this GitHub repository.

How to Web Scrape an Html page after JS loads

Sometimes Jsoup is not enough: when we want the final version of the Html page as it looks after the JS (redirects etc) has run, we can use HtmlUnit.

It makes the difference between this:

Jsoup.connect("http://www.manning.com").get.select("div.dotdbox")
<div class="dotdbox"> 
 <div style="color: #000000;text-align: center;padding: 3px 2px 0px 2px; font-size: 11px;background-color: #ffffff;"> 
  <script language="JavaScript" src="http://incsrc.manningpublications.com/dotd.js" type="text/javascript"></script> 
  <p><span style="font-size: 11px;"><a href="/free/dotd.html">Get the Deal of the Day email alert</a></span></p>
 </div> 
</div>

and that:

Jsoup.parse( new WebClient(BrowserVersion.CHROME).getPage("http://www.manning.com")
                        .asInstanceOf[HtmlPage].asXml ).select("div.dotdbox")
<div class="dotdbox"> 
 <div style="color:#000000;text-align:center;padding:3px 2px 0;font-size:11px;background-color:#ffffff;"> 
  
                       January 19, 2014 
  <br /> 
  <br /> 
  <b> <a href="http://www.manning.com/zumel/"> Practical Data Science with R </a> </b> 
  <br /> 
  <br /> Get half off the eBook or pBook 
  <br /> 
  <br /> Enter dotd040614 in the Promotional Code box when you check out 
  <p> <span style="font-size:11px;"> <a href="/free/dotd.html"> Get the Deal of the Day email alert </a> </span> </p> 
 </div> 
</div>

CentOS/RedHat make port 8080 visible

I am a happy DigitalOcean customer primarily because of the low cost, the SSD drives, the friendly staff and the flexibility with which you can reshape your purchased resources into droplets within the 4 DataCenters (2 in NY and 2 in Amsterdam) supported.

That is, until the need for a UK DataCenter arose, which led me to RackSpace.

On both cloud hosting providers I am running a web service that needs to be accessible on port 8080. The CentOS image provided by DigitalOcean permits everything by default in its iptables settings, but the one provided by RackSpace does not.

When I issue the iptables command I get:


[dimitrisli@lon1 ~]# iptables -L -n --line-numbers
Chain INPUT (policy ACCEPT)
num target prot opt source destination
1 ACCEPT all -- 0.0.0.0/0 0.0.0.0/0 state RELATED,ESTABLISHED
2 ACCEPT icmp -- 0.0.0.0/0 0.0.0.0/0
3 ACCEPT all -- 0.0.0.0/0 0.0.0.0/0
4 ACCEPT tcp -- 0.0.0.0/0 0.0.0.0/0 state NEW tcp dpt:22
5 REJECT all -- 0.0.0.0/0 0.0.0.0/0 reject-with icmp-host-prohibited

Chain FORWARD (policy ACCEPT)
num target prot opt source destination
1 REJECT all -- 0.0.0.0/0 0.0.0.0/0 reject-with icmp-host-prohibited

Chain OUTPUT (policy ACCEPT)
num target prot opt source destination

Simply appending a rule for port 8080 would place it after the final REJECT rule of the INPUT chain, where it would never be reached. The correct command therefore inserts the rule at the position currently held by the REJECT rule (line 5), pushing the REJECT rule one place down:


[dimitrisli@lon1 ~]# iptables -I INPUT 5 -m state --state NEW -m tcp -p tcp --dport 8080 -j ACCEPT -m comment --comment "Jetty Server port"

[dimitrisli@lon1 ~]# service iptables save

that eventually does the trick.
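To see why the position matters, here is a toy model of first-match rule evaluation. It is only a sketch under heavy assumptions: the ChainModel and Rule names are made up, rules match on destination port alone, and it is in no way how netfilter is actually implemented.

```java
import java.util.ArrayList;
import java.util.List;

class ChainModel {
    // A rule matches either every packet (port == null) or one destination port.
    static final class Rule {
        final Integer port;
        final String target;
        Rule(Integer port, String target) { this.port = port; this.target = target; }
        boolean matches(int dport) { return port == null || port == dport; }
    }

    // Returns the target of the first matching rule, falling back to the chain policy.
    static String decide(List<Rule> chain, int dport) {
        for (Rule r : chain) {
            if (r.matches(dport)) return r.target;
        }
        return "ACCEPT"; // chain policy
    }

    // Simplified chain: allow SSH, then reject everything else.
    static List<Rule> baseChain() {
        List<Rule> chain = new ArrayList<>();
        chain.add(new Rule(22, "ACCEPT"));
        chain.add(new Rule(null, "REJECT"));
        return chain;
    }

    public static void main(String[] args) {
        List<Rule> appended = baseChain();
        appended.add(new Rule(8080, "ACCEPT"));    // lands after the catch-all REJECT
        System.out.println(decide(appended, 8080)); // prints REJECT

        List<Rule> inserted = baseChain();
        inserted.add(1, new Rule(8080, "ACCEPT")); // takes the REJECT rule's position
        System.out.println(decide(inserted, 8080)); // prints ACCEPT
    }
}
```

In the model, as on the real server, the catch-all REJECT shadows anything placed after it.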

CentOS/RedHat screen command

Install screen:


yum install screen.x86_64

Create a screen by giving it a name:


screen -S process1

Detach from the current screen:

shortcut: Ctrl + A, then D

Inspect running screens:


screen -ls

There is a screen on:
11174.process1 (Detached)
1 Socket in /var/run/screen/S-root.

Reattach to a running screen by name:

screen -R process1

Homebrew: Install the Typesafe Stack

Old news: New machine -> new setup
Exciting alternatives: Homebrew -> “Typesafe Stack”

brew install scala sbt maven giter8

Homebrew is a package manager that keeps things tidy under the /usr/local/ directory, and it is what we are using here to install Scala and friends. giter8 is a command line tool, similar to Maven archetypes, that generates projects from templates hosted on GitHub and is growing into the de facto way of bootstrapping a Scala-related project.

Installing Homebrew in Mac OS X 10.9 (Mavericks)

Installing Homebrew in previous OS X versions depended on having Xcode pre-installed, which is a substantial dependency (in terms of disk space), especially considering that you might not even be using this IDE, as was always my case.

Not any more.

Apparently the part of Xcode that is needed by Homebrew and other 3rd party tools, usually referred to as the Command Line Tools, has been extracted from Xcode and is available via a simple Terminal command in Mac OS X 10.9:

xcode-select --install

that pops up a pane to confirm auto-download of the tool.

From that point onwards the Homebrew installation can continue with the ruby script, as was always the case. After installation completes, ‘brew doctor’ can assure you that all is good and you can start amassing Cellars for the brewing process.

Tennis Historical Data Retriever in Scala

Here’s a quick and dirty Tennis historical data retriever:

package data.analysis.tennis
 
import scala.io.Source
import java.util.Date
import java.text.SimpleDateFormat
 
 
object TennisDataAnalysis extends App{
 
  def wrapStringInt(stringInt:String) = if(stringInt=="") None else Some(stringInt.toInt)
 
  case class TennisMatch(location:String, tournament:String, date:Date, series:String,
                         surface:String, round:String, bestOf:Int, winner:String, loser:String,
                         W1:Option[Int], L1:Option[Int], W2:Option[Int], L2:Option[Int], W3:Option[Int],
                         L3:Option[Int], W4:Option[Int], L4:Option[Int], W5:Option[Int], L5:Option[Int],
                         Wsets:Option[Int], Lsets:Option[Int], comment:String)
 
 
  val sourceSite = "http://www.tennis-data.co.uk/"
  val years = List(2010,2011,2012,2013)
  val tournaments = List("ausopen","frenchopen","usopen","wimbledon")
 
  val urls = years.map(year => sourceSite+year+"/").flatMap(urlYear => tournaments.map(tours=> urlYear+tours+".csv"))
 
  val data =
    urls.flatMap{urlYearTour =>
      Source.fromURL(urlYearTour).getLines.drop(1).map(_.split(","))
        .map{g => TennisMatch(g(1), g(2), new SimpleDateFormat("dd/MM/yyyy").parse(g(3)), g(4),
                          g(6),g(7),g(8).toInt, g(9), g(10),
                          wrapStringInt(g(15)), wrapStringInt(g(16)), wrapStringInt(g(17)), wrapStringInt(g(18)),
                          wrapStringInt(g(19)), wrapStringInt(g(20)), wrapStringInt(g(21)), wrapStringInt(g(22)),
                          wrapStringInt(g(23)), wrapStringInt(g(24)), wrapStringInt(g(25)), wrapStringInt(g(26)),
                          g(27))}}
 
  data.foreach(println)
 
 
}
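A note on the date parsing: the pattern letters given to SimpleDateFormat are case-sensitive, with MM meaning month and mm meaning minutes. The following sketch shows the difference; the DatePatternDemo class, its monthOf helper and the sample date are hypothetical and only serve the illustration.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;

class DatePatternDemo {
    // Parses `text` with `pattern` and returns the calendar month (1-12) of the result.
    static int monthOf(String pattern, String text) {
        try {
            Date parsed = new SimpleDateFormat(pattern).parse(text);
            Calendar cal = Calendar.getInstance();
            cal.setTime(parsed);
            return cal.get(Calendar.MONTH) + 1; // Calendar months are zero-based
        } catch (ParseException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // MM reads the middle field as the month.
        System.out.println(monthOf("dd/MM/yyyy", "24/06/2013")); // prints 6
        // mm reads it as minutes, so the month silently defaults to January.
        System.out.println(monthOf("dd/mm/yyyy", "24/06/2013")); // prints 1
    }
}
```

No exception is thrown in the second case, which makes this kind of slip easy to miss when the parsed dates are only printed.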

Synology, VirtualBox and iSCSI

This is a tutorial on how to set up your VirtualBox VMs to run off your Synology NAS via the fast iSCSI protocol.

- What is VirtualBox? VirtualBox is a high-performance, free, easy and at the same time versatile virtualisation solution.
- Why VirtualBox? Because it’s a free, easy to set up, cross-platform product that you can install at home/work and that hasn’t let me down yet.
- What is Synology? Synology is a company that produces high quality NAS drives.
- Why Synology? Although not cheap, their NAS solutions are a good marriage between hardware and tailor-made, actively maintained software. You can read further details on my Amazon review.
- What is iSCSI? SCSI is a protocol for fast communication between hard drives and computers. iSCSI is a protocol that allows SCSI commands over the wire (Internet/WAN/LAN).
- Why iSCSI? Don’t take my word for it: if you own an external HD or a NAS, try storing the VMs there and point to them as mounted drives from VirtualBox. The latency will give you an answer.

Now that we got the theory out of the way let’s start with the setup!

First stop, Synology DSM > Start Menu top left > Storage Manager > iSCSI LUN tab

iscsi1

Then we create a new iSCSI LUN. In the wizard that pops up there is only one option available for me since I went for SHR (Synology Hybrid RAID) when I was first setting up my drives. If you’ve gone for one of the vanilla RAID options you’ll be getting two more options to choose from:

iscsi2

Next we give the LUN a descriptive name to remind us what that slice of our NAS will be hosting; in my case it will be the latest CentOS distro for the MacbookPro non-x64 compatible architecture. We also set the size (10GB in our example):

iscsi3

The iSCSI LUN that we are about to create will be associated with an iSCSI Target that our VirtualBox client will be connecting to. We follow the same pattern of giving it a descriptive name, and leave the IQN identifier at its default generated value:

iscsi4

The next screen will be a summary page:

iscsi5

Our newly created iSCSI LUN now shows up on the Storage Manager tab list:

iscsi6

Same goes for the associated iSCSI Target that was auto-generated:

iscsi7

Our job is done in the Synology NAS, let’s head now to the VirtualBox:

iscsi8

iscsi9

Next we have to tell VirtualBox not to rush and attach any drive yet since we’ll need to configure it first (unfortunately we can’t do that currently from the VirtualBox GUI):

iscsi10

VirtualBox draws our attention to the fact that we’ll have to set up the drive eventually:

iscsi11

The VM is now successfully created, but we’re not done just yet:

iscsi12

Next, by right clicking on the VM > settings > storage tab, we note down the name of the SATA controller (in our case it is simply named SATA).

iscsi13

Here comes the hard part of the tutorial. We have to tell VirtualBox that our newly created VM will be using the iSCSI LUN slice created on our NAS. This step really ought to be easy and intuitive via the VirtualBox GUI, but currently the functionality is not there, so we need to rely on a VirtualBox command line tool to set this up. The command is called VBoxManage and is available in the Mac OS X Terminal once VirtualBox is installed. On Windows we’ll have to use the command prompt, navigating first to the bin directory of VirtualBox where the VBoxManage command lives. Here’s the full command with further explanations of all the needed options:

iscsi14

Here’s the command in text (be careful to replace: 1. the name of the VM, 2. the name of the SATA controller, 3. the NAS IP address, 4. the IQN your NAS has set):

VBoxManage storageattach "CentOS-6.4-i386" --storagectl "SATA" --port 0 --device 0 --type hdd --medium iscsi --server 192.168.1.10 --target "iqn.2000-01.com.synology:ZeusData.name" --tport 3260

The command replies back with a unique identifier and a short message of success:

iscsi15

Let’s now restart VirtualBox and take a look at the properties of the VM.

VM > settings > storage tab. Now we see the iSCSI target appearing and also notice how VirtualBox is picking up the size of the LUN set by our Synology configuration to be 10GB:

iscsi16

All is done now, let’s start the VM:

iscsi17

iscsi19

A final thing to mention is that while the VM is running, Synology DSM is capturing on-the-fly the iSCSI target device/status:

iscsi18

In case you want to erase the VM, before deleting the SATA controller and the VM itself from VirtualBox, with VirtualBox closed execute the following command (same as the one previously presented but with the argument ‘none’ for medium):

VBoxManage storageattach "CentOS-6.4-i386" --storagectl "SATA" --port 0 --device 0 --type hdd --medium none --server 192.168.1.10 --target "iqn.2000-01.com.synology:ZeusData.name" --tport 3260

Following this activity the VM, SATA controller in VirtualBox as well as the iSCSI LUN/Target in Synology can be deleted.