[erlang-questions] A bug? - bloom filters - and loadable BIFs

Steve Kirsch steve.kirsch@REDACTED
Wed Apr 29 20:09:06 CEST 2009

Great, that explains the crash.

All your caveats are duly noted. We promise to be careful.

So now can you tell us how to use these functions that are already

Our system is very RAM intensive and I'd love to be able to stay inside
erlang to do this.


-----Original Message-----
From: Mikael Pettersson [mailto:mikpe@REDACTED] 
Sent: Wednesday, April 29, 2009 2:26 AM
To: Joe Armstrong
Cc: Erlang
Subject: Re: [erlang-questions] A bug? - bloom filters - and loadable

Joe Armstrong writes:
 > Hi,
 > I recent got mail from Steve Kirsch (Hello Steve) - asking a number
> of questions about representation of bit vectors for writing Bloom
 > So he sent me a program, which crashes his machine (out of memory)
 > new(N, E) when N > 0, is_float(E), E > 0, E =< 1 ->
 >    {M, K} = calc_least_bits(N, E),
 >    #bloom{m=M, bitmap = <<0:((M+7) div 8 * 8)>>, k=K, n=N}.

Large bitmaps are problematic in Erlang.

You can do chunked representations, which will work but will incur high
overheads on both reads and writes.

Using large binaries as the code above does is fragile and inefficient.
Large binaries are allocated outside of the GCd heaps in a global shared
area and are accessed via small heap-allocated and GCd handles. Frequent
updates will require frequent new allocations and copies in the shared
area. Old versions of these binaries are only deallocated in response to
local (per-process) GCs when their handles die.

So creating a series of large binaries in rapid succession without
intervening GCs is a fairly reliable way of running out of memory.

HiPE needs large bitmaps for its register allocator, so we added our own
bitarray BIFs since all alternatives suck. As much as _I_ like them, I
cannot recommend them for application developers:
- they're only present in HiPE-enabled systems
- they're not documented or official in any way, and may change at any
- the effect of sending a bitarray to another process is unspecified
  (the processes may or may not share the object)
- Erlang is _supposed_ to protect application programmers from the
  of mutable data :-/
erlang-questions mailing list

More information about the erlang-questions mailing list