Google's new language, Go, isn't as revolutionary as its designers would have you believe, but it is an interesting language nonetheless. It is a systems language with features that address the multicore trend. The future is parallel computation, and who better to introduce a systems language with first-class concurrency primitives than the company that popularized MapReduce? Speaking of MapReduce, here's a simple implementation I wrote for Go (available on GitHub):
I don't claim that this is an optimized MapReduce implementation. For each item on the input queue, a new channel is created. After all the channels are created, it loops over them and blocks waiting for results in FIFO order. Responses are then placed on the reducer's queue. The interface{}s that are littered throughout the code are there because Go doesn't currently support generics. All types implement the empty interface (interface{}), so it acts like an Object in Java or a void * pointer in C. To get the real value you have to unbox it with a type assertion like so: x.(int). I was a little disappointed that you have to sacrifice type safety in order to be able to write a generic function.
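The approach described above can be sketched roughly as follows. This is a minimal illustration written against current Go, not the listing from the repository: the MapReduce signature, the buffered per-item channels, and the sumOfSquares helper are all assumptions made for the example.

```go
package main

import "fmt"

// MapReduce runs mapper on every input item in its own goroutine, then
// hands the results to reducer in the order the inputs arrived (FIFO).
// The signature is an assumption for this sketch.
func MapReduce(
	mapper func(interface{}) interface{},
	reducer func(<-chan interface{}) interface{},
	input <-chan interface{},
) interface{} {
	// One result channel per input item. A buffer of 1 lets each mapper
	// finish without waiting for its result to be collected.
	var results []chan interface{}
	for item := range input {
		ch := make(chan interface{}, 1)
		results = append(results, ch)
		go func(item interface{}) { ch <- mapper(item) }(item)
	}

	// Feed the reducer's queue from a goroutine, blocking on each
	// result channel in the order the channels were created.
	reduceInput := make(chan interface{})
	go func() {
		defer close(reduceInput)
		for _, ch := range results {
			reduceInput <- <-ch
		}
	}()
	return reducer(reduceInput)
}

// sumOfSquares is a hypothetical caller: it squares each number in the
// map step and adds the squares together in the reduce step.
func sumOfSquares(nums []int) int {
	input := make(chan interface{})
	go func() {
		defer close(input)
		for _, n := range nums {
			input <- n
		}
	}()
	square := func(x interface{}) interface{} {
		n := x.(int) // type assertion recovers the concrete int
		return n * n
	}
	sum := func(in <-chan interface{}) interface{} {
		total := 0
		for v := range in {
			total += v.(int)
		}
		return total
	}
	return MapReduce(square, sum, input).(int)
}

func main() {
	fmt.Println(sumOfSquares([]int{1, 2, 3, 4})) // prints 30
}
```

Every x.(int) here is checked only at run time; if a mapper returned a string by mistake, the program would panic rather than fail to compile, which is exactly the loss of type safety mentioned above.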
Here is a sample program that uses MapReduce to count the words in all the files found under the current directory:
One major idiom that I picked up from Go's standard library is the iteration idiom: a function creates a channel, writes data to it, and then closes it. You can see this in the find_files function. This way, you can iterate over the results of the channel using range, which you can see in the outer for loop in the reducer. Note that you have to populate the channel in a goroutine or else the application will deadlock, since by default writing to a channel blocks until the data you write is consumed. To change this behavior you can create the channel with a buffer like this: make(chan int, 10). Writing to a buffered channel won't block until its buffer is full. Still, if you plan on using the iteration idiom it is best to spawn a goroutine, since you can then populate the channel with as many items as necessary without risking deadlock.
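Here is a small sketch of the iteration idiom on its own. The names countTo and collect are hypothetical stand-ins (they play the roles of a producer like find_files and a consumer like the reducer); the point is only the shape: make the channel, fill it from a goroutine, close it, and range over it.

```go
package main

import "fmt"

// countTo returns a channel that yields 1..n and is then closed.
func countTo(n int) <-chan int {
	ch := make(chan int) // unbuffered: each send waits for a receiver
	go func() {
		defer close(ch) // closing the channel ends the consumer's range loop
		for i := 1; i <= n; i++ {
			ch <- i
		}
	}()
	return ch
}

// collect drains a channel into a slice using range.
func collect(ch <-chan int) []int {
	var out []int
	for v := range ch {
		out = append(out, v)
	}
	return out
}

func main() {
	fmt.Println(collect(countTo(5))) // prints [1 2 3 4 5]

	// A buffered channel accepts sends without a receiver until it is full,
	// so these ten sends complete without a goroutine.
	buf := make(chan int, 10)
	for i := 0; i < 10; i++ {
		buf <- i
	}
	close(buf)
	fmt.Println(len(collect(buf))) // prints 10
}
```

If the loop in main tried to send an eleventh value into buf before anything was reading from it, the send would block forever, which is why the goroutine version is the safer default when the number of items isn't known up front.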
I also noticed that range is inconsistent. If you use it on a string or an array, it returns two values, the first being the position in the sequence. When used with a channel, it returns only the next item. You would expect this to be consistent across all types, but for some reason it's not:
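A short sketch of the difference, written for this post rather than taken from the original listing (describeRange is a hypothetical helper that just records what each loop sees):

```go
package main

import "fmt"

// describeRange records what range yields for a string, a slice,
// and a channel.
func describeRange() []string {
	var out []string

	// String: position (byte offset) and character.
	for i, r := range "go" {
		out = append(out, fmt.Sprintf("string: %d %c", i, r))
	}

	// Slice (arrays behave the same way): index and element.
	for i, v := range []int{10, 20} {
		out = append(out, fmt.Sprintf("slice: %d %d", i, v))
	}

	// Channel: only the received value; there is no position.
	ch := make(chan int, 2)
	ch <- 10
	ch <- 20
	close(ch)
	for v := range ch {
		out = append(out, fmt.Sprintf("channel: %d", v))
	}
	return out
}

func main() {
	for _, line := range describeRange() {
		fmt.Println(line)
	}
}
```

Writing for i, v := range ch does not compile, so the mismatch shows up as soon as you try to treat a channel like a list.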
Go is a neat language. It's the first compiled language that I've used in a long time, and it feels like a scripting language, which is a compliment. It's unfortunate that one has to sacrifice type safety in order to write generic functions. There isn't much that's new in Go, but the syntax and the way that things are put together work well. Languages with concurrency primitives are the future (even though some of them are 20 years old and the theory behind them is 30 years old).