Concurrency on the JVM Using Scala with Venkat Subramaniam
This evening, I attended the Denver JUG where Venkat Subramaniam was speaking about Scala. Unfortunately, I arrived halfway through his Programming Scala talk and didn't get a chance to learn as much as I wanted to. What I did see made Scala look very powerful and (possibly) easier to learn than Java. Below are my notes from Venkat's talk.
Concurrency is important these days because we're in a world of multiple processors. When you have multiple threads running at one time, it can become painful. Before Java, you had to learn the API for multi-threading for each different platform. With Java's "Write once, debug everywhere", you only had to learn one API. Unfortunately, it's pretty low level: how to start a thread, manage it, stop it, etc. You also have to remember where to put synchronized in your code.
With Scala, immutability and its Actors make it easy to program concurrent systems. For example, here's code that retrieves stock prices from a web service, one symbol after another:
def getYearEndClosing(symbol : String, year : Int) = {
  val url = "http://ichart.finance.yahoo.com/table.csv?s=" +
    symbol + "&a=11&b=01&c=" + year + "&d=11&e=31&f=" + year + "&g=m"
  val data = io.Source.fromURL(url).mkString
  // row 1 is the first data row of the CSV; column 4 holds the closing price
  val price = data.split("\n")(1).split(",")(4).toDouble
  Thread.sleep(1000); // slow down internet
  (symbol, price)
}
val symbols = List("AAPL", "GOOG", "IBM", "JAVA", "MSFT")
val start = System.nanoTime
// fold over the symbols, keeping the (symbol, price) pair with the highest price
val top = (("", 0.0) /: symbols) { (topStock, symbol) =>
  val (sym, price) = getYearEndClosing(symbol, 2008)
  if (topStock._2 < price) (sym, price) else topStock
}
val end = System.nanoTime
println("Top stock is " + top._1 + " with price " + top._2)
// nanoseconds to seconds
println("Time taken " + (end - start)/1000000000.0)
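The /: in the code above is the operator form of foldLeft: it starts with ("", 0.0) and carries the best pair forward through the list. Here is a minimal sketch of just that step, with no network access. The symbols and prices are made up for illustration, and the helper name higher is my own, not something from the talk.

```scala
// Made-up (symbol, price) pairs, purely illustrative.
val quotes = List(("AAA", 10.0), ("BBB", 30.0), ("CCC", 20.0))

// Keep whichever pair has the higher price.
def higher(best: (String, Double), next: (String, Double)): (String, Double) =
  if (best._2 < next._2) next else best

// Same shape as (("", 0.0) /: quotes)(higher), written with foldLeft.
val top = quotes.foldLeft(("", 0.0))(higher)

println("Top stock is " + top._1 + " with price " + top._2)
// prints "Top stock is BBB with price 30.0"
```

Because the accumulator is a fresh tuple on every step, nothing is mutated along the way.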
To make this concurrent, we create Actors. Actors are nothing but Threads with a built-in message queue. Actors allow spawning separate threads to retrieve each stock price. Instead of doing:
symbols.foreach { symbol => getYearEndClosing(symbol, 2008) }
You can add actors:
val caller = self
symbols.foreach { symbol =>
  actor { caller ! getYearEndClosing(symbol, 2008) }
}
Then remove val (sym, price) = getYearEndClosing(symbol, 2008) and replace it with:
receive {
  case (sym: String, price: Double) =>
    if (topStock._2 < price) (sym, price) else topStock
}
After making this change, the time to execute the code dropped from ~7 seconds to ~2 seconds. Also, since nothing is mutable in this code, you don't have to worry about concurrency issues.
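Putting the fragments above together, the concurrent version might look roughly like this. It's a sketch, not Venkat's exact code: it assumes an older Scala release (2.10 or earlier) where the scala.actors library from the talk is available, and it swaps the real web service call for a hypothetical fakeClosing helper that sleeps briefly and returns a made-up price.

```scala
import scala.actors._
import Actor._

// Hypothetical stand-in for getYearEndClosing: no network call,
// just a short delay and a made-up price based on the symbol length.
def fakeClosing(symbol: String): (String, Double) = {
  Thread.sleep(200)
  (symbol, symbol.length * 10.0)
}

val symbols = List("A", "BBB", "CC")
val caller = self

// One actor per symbol; each sends its result to the caller's mailbox.
symbols.foreach { symbol =>
  actor { caller ! fakeClosing(symbol) }
}

// Receive exactly one message per symbol, keeping the highest price.
val top = (("", 0.0) /: symbols) { (topStock, _) =>
  receive {
    case (sym: String, price: Double) =>
      if (topStock._2 < price) (sym, price) else topStock
  }
}

println("Top stock is " + top._1 + " with price " + top._2)
```

The fold no longer cares which symbol it is "on"; it only uses the list to know how many replies to wait for, since results can arrive in any order.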
With Scala, you aren't boxed in by single inheritance the way you are in Java. Instead, you can use Traits to do mixins. For example:
import scala.actors._
import Actor._
class MyActor extends Actor {
  def act() {
    for(i <- 1 to 3) {
      receive {
        case msg => println("Got " + msg)
      }
    }
  }
}
When extending Actor, you have to call start on an instance (for example, new MyActor().start) to start the Actor. Writing actors this way is not recommended (not sure why, guessing because you have to manually start them).
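To make the start step concrete, here's a small sketch of a class-based actor being created, started, and sent messages. It assumes Scala 2.10 or earlier with scala.actors on the classpath. EchoActor is a hypothetical variation on MyActor that replies to the caller instead of printing, so the results can be collected.

```scala
import scala.actors._
import Actor._

// Hypothetical variation on MyActor: replies to the given actor
// instead of printing each message.
class EchoActor(replyTo: Actor) extends Actor {
  def act(): Unit = {
    for (i <- 1 to 3) {
      receive { case msg => replyTo ! ("Got " + msg) }
    }
  }
}

val echo = new EchoActor(self)
echo.start()  // act() does not run until start is called
List("a", "b", "c").foreach { m => echo ! m }

// Collect the three replies on the calling thread.
val replies = List(1, 2, 3).map { _ => receive { case s: String => s } }
println(replies)
```

Messages sent before start would simply sit in the actor's mailbox until it is started.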
Venkat is now showing an example that counts prime numbers and he's showing us how it pegs the CPU when counting how many exist between 1 and 1 million (78,499). After adding actor and receive logic, he shows how his Activity Monitor shows 185% CPU usage, indicating that both cores are being used.
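I didn't capture Venkat's prime-counting code, so the following is only my own sketch of the idea: split the range into chunks, count each chunk in its own actor, and add up the partial counts as they arrive. It assumes Scala 2.10 or earlier with scala.actors available. The names isPrime, countPrimes and parallelCount are hypothetical, and it uses a small range so it finishes quickly.

```scala
import scala.actors._
import Actor._

// Trial division up to the square root of n.
def isPrime(n: Int): Boolean =
  n > 1 && (2 to java.lang.Math.sqrt(n.toDouble).toInt).forall(n % _ != 0)

// Sequential count of primes in [lo, hi].
def countPrimes(lo: Int, hi: Int): Int = (lo to hi).filter(isPrime).size

// Split 1..upper into `parts` chunks, one actor per chunk.
def parallelCount(upper: Int, parts: Int): Int = {
  val caller = self
  val chunk = upper / parts
  for (i <- 0 until parts) {
    val lo = i * chunk + 1
    val hi = if (i == parts - 1) upper else (i + 1) * chunk
    actor { caller ! countPrimes(lo, hi) }
  }
  // Wait for one partial count per chunk and sum them.
  (0 until parts).foldLeft(0) { (sum, _) =>
    receive { case n: Int => sum + n }
  }
}

println(parallelCount(1000, 4))  // prints 168
```

With two or more chunks on a multi-core machine, each actor can run on its own core, which is the effect the Activity Monitor was showing.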
What happens when one of the threads crashes and burns? The receive will wait forever. Because of this, using receive is a bad idea. It's much better to use receiveWithin(millis) to set a timeout. Then you can catch the timeout in the receiveWithin block using:
case TIMEOUT => println("Uh oh, timed out")
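Here's a minimal sketch of that timeout in context, assuming Scala 2.10 or earlier with scala.actors available. The worker actor below deliberately never replies (my stand-in for a thread that crashed), so the caller falls through to the TIMEOUT case instead of hanging.

```scala
import scala.actors._
import Actor._

val caller = self

// Stand-in for a crashed worker: it finishes without ever replying.
actor {
  // caller ! "done" is intentionally missing
}

// Wait at most 500 ms for a reply.
val outcome = receiveWithin(500) {
  case msg: String => "received " + msg
  case TIMEOUT     => "timed out"
}

println(outcome)  // prints "timed out"
```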
A more efficient way to use actors is using react instead of receive. With react, a thread isn't held while the actor waits: the thread is released, and a thread from the pool executes the block when the message is "reacted" to. One thing to remember with react is any code after the react block will never be executed. Just like receiveWithin(millis), you can use reactWithin(millis) to set a timeout.
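The "code after react never runs" rule is easy to see in a small sketch. This assumes Scala 2.10 or earlier with scala.actors available; the actor name greeter and the message strings are made up for illustration.

```scala
import scala.actors._
import Actor._

val caller = self

val greeter = actor {
  react {
    case name: String => caller ! ("handled " + name)
  }
  // Never reached: react does not return normally, so this send
  // is dead code.
  caller ! "after react"
}

greeter ! "Denver"

// The first reply comes from inside the react block.
val first = receiveWithin(2000) {
  case s: String => s
  case TIMEOUT   => "nothing"
}

// No second reply ever arrives, so this one times out.
val second = receiveWithin(500) {
  case s: String => s
  case TIMEOUT   => "nothing"
}

println(first + ", then " + second)  // prints "handled Denver, then nothing"
```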
The major difference I noticed between receive and react is that Venkat often had to change the method logic to use react. To solve this, you can use loop (or better yet, loopWhile(condition)) to allow accessing the data outside the react block. In conclusion, reactWithin(millis) is best to use, unless you need to execute code after the react block.
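As a rough sketch of how loopWhile and reactWithin fit together (again assuming Scala 2.10 or earlier with scala.actors available), the actor below keeps a running total that lives outside the react block and survives from one message to the next. The actor name summer and the numbers are made up for illustration.

```scala
import scala.actors._
import Actor._

val caller = self

val summer = actor {
  // State declared outside the react block, confined to this actor.
  var total = 0
  var remaining = 3
  loopWhile(remaining > 0) {
    reactWithin(1000) {
      case n: Int =>
        total += n
        remaining -= 1
        if (remaining == 0) caller ! total
      case TIMEOUT =>
        // Give up waiting and report what has been summed so far.
        remaining = 0
        caller ! total
    }
  }
}

List(1, 2, 3).foreach { n => summer ! n }

val sum = receiveWithin(2000) {
  case n: Int  => n
  case TIMEOUT => -1
}

println(sum)  // prints 6
```

The mutable vars are only ever touched by the one actor that owns them, so this stays in the spirit of not sharing mutable state between threads.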
Conclusion
This was a great talk by Venkat. He used TextMate the entire time to author and execute all his Scala examples. Better yet, he never used any sort of presentation. All he had was a "todo" list with topics (that he checked off as he progressed) and a sample.scala file.
Personally, I don't plan on using Scala in the near future, but that's mostly because I'm doing UI development and GWT and JavaScript are my favorite languages for that. On the server-side, I can see how it reduces the amount of Java you need to write (the compiler works for you instead of you working for the compiler). However, my impression is its sweet spot is when you need to easily author an efficient concurrent system.
If you're looking to learn Scala, I've heard Scala by Example (PDF) is a great getting-started resource. From there, I believe Programming in Scala and Venkat's Programming Scala are great books.
The paradigms have been around for decades (literally); this looks like another repackaging. Check out ACE ( http://www.cs.wustl.edu/~schmidt/ACE-overview.html ) for a very similar approach, but its performance would just smoke anything Java.
The approach of minimizing locking / synchronization by using immutable objects is interesting, but I don't think there is a free lunch. I suspect that while the code might be easier to write and think about, in terms of performance you are just trading locking/synchronization for an increased number of memory allocations / object creations.
I guess I don't see how this pattern could be meaningfully applied to a non-trivial example where the worker threads need to actually interact with each other (i.e., inspect/modify their state).
Posted by Alonso on September 18, 2009 at 02:43 PM MDT #