In my last post about my first experiences with Erlang I outlined a module I came up with to implement futures. Futures provide a useful way to call a function without waiting for it to finish, and to obtain the resulting value at some later point. While my implementation worked, some of the comments I received showed it could be implemented far more simply than the couple of dozen lines I had.
Julian Fondren (#ayrnieu) offered the following enhancement:
new(F) ->
    Caller = self(),
    spawn( fun () -> Caller ! { self(), { ok, F() } } end ).
which neatly removes the need for the oneshot/0 function that was the receiver and proxy for the future'd function.
This still left the issue of the return value being the pid of a generic Erlang process. In the comments I discussed how this leaky abstraction bothered me a little. I wasn't hopping up and down about it, but it seemed to lack something compared to an equivalent solution with classes.
In the comments Bob Ippolito offered a neat, functional solution:
fun () -> value(Pid) end.
i.e. to wrap the pid in an anonymous function that returns the result value. Hence, to the caller, the future becomes just another function that can be asked for the result of the future'd function. Quite stylish. It's this sort of functional insight I was looking for when I took up Erlang. I expect to use this trick often now.
The resulting futures module becomes so trivial it's hardly worth calling it a module, yet it neatly and elegantly (to my eye) solves a real problem:
-module(futures).
-export([new/1, value/1]).

new(F) ->
    Caller = self(),
    Pid = spawn( fun () -> Caller ! { self(), { ok, F() } } end ),
    fun () -> value( Pid ) end.

value(Pid) ->
    receive
        { Pid, { ok, Result } } -> Result
    end.
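To see it in use, here's roughly how the module might be exercised from the Erlang shell (the timer:sleep/1 call simply stands in for real work):

```erlang
%% In the Erlang shell, with futures.erl compiled:
1> Square = futures:new(fun () -> timer:sleep(100), 7 * 7 end).
2> %% ... carry on with other work while the spawned process runs ...
2> Square().
49
```

The caller never sees a pid; the future is just a zero-arity fun that blocks, if necessary, until the result message arrives.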
In #erlang we discussed a little how this might be modified to handle cases where heavy concurrency might, for example, overwhelm a file-system with open files. This could be as simple a modification as:
spawn( fun () -> Caller ! { self(), { ok, ( throttle( F ) )() } } end ).
where throttle/1 is some function that ensures that only N processes are active before executing the wrapped function. Another suggestion was to think more in terms of message passing and use an additional supervisor process that keeps N processes alive and passes futures to them for evaluation in turn.
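As a sketch of the first approach, a minimal token-server version could look like the following. The module name throttle and the acquire/release message shapes are my own invention for illustration, not anything from the post or OTP:

```erlang
-module(throttle).
-export([start/1, run/1]).

%% Start a registered token server holding N tokens; at most N
%% callers may be inside run/1 at any one time.
start(N) ->
    register(throttle, spawn(fun () -> loop(N) end)).

loop(Tokens) ->
    receive
        %% Grant a token only while some remain; the guard leaves
        %% acquire requests queued in the mailbox when Tokens =:= 0,
        %% so they are served as releases come in.
        {acquire, From} when Tokens > 0 ->
            From ! proceed,
            loop(Tokens - 1);
        release ->
            loop(Tokens + 1)
    end.

%% Block until a token is free, run F, and always hand the token back.
run(F) ->
    throttle ! {acquire, self()},
    receive proceed -> ok end,
    try F()
    after
        throttle ! release
    end.
</imports>
```

A future would then be spawned as spawn( fun () -> Caller ! { self(), { ok, throttle:run( F ) } } end ), playing the role of the throttle-wrapped call above. Selective receive does the real work here: when no tokens remain, waiting acquire messages simply sit in the server's mailbox until a release arrives.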
Given my duplicate finder might well run into further instances of {error,emfile}, I will probably experiment with one or both approaches to throttling.