Billion-triple store

Hakan Mattsson hakan@REDACTED
Tue Apr 25 18:01:20 CEST 2006


On Sun, 23 Apr 2006, Joel Reymont wrote:

JR> How would I store a billion triples with Erlang?
JR> 
JR> I don't necessarily need the full power of RDF as
JR> storing triples in the form of {"Joel", has_a,
JR> daughter} would suffice. I would not mind complying
JR> with RDF of course but it seems that would be an extra
JR> burden due to the necessity of storing everything as
JR> strings, the need to implement tries for those and the
JR> way Erlang stores strings.
JR> 
JR> I'm not sure how to go about storing a billion of such
JR> triples in Mnesia. I suppose I would need to use a
JR> 64-bit machine and a disc_only_copies table.
JR> 
JR> Any suggestions?

Storing a billion records in a single dets file does
not feel so appealing. The repair time of such a file
would probably be loooooong.

You can however obtain a more manageable file size by
using a fragmented Mnesia table. I don't know how many
fragments are optimal, but you can start with 1000
or so and then measure the system characteristics for
different numbers of fragments.
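
A minimal sketch of what such a fragmented table could look like.
The module name triple_frag, the record layout and the choice of a
bag table keyed on the subject are assumptions made for illustration,
not something prescribed above. Pass n_disc_only_copies as the storage
property for the disc-only case (this requires that a disc schema was
created with mnesia:create_schema/1 before Mnesia was started), or
n_ram_copies to experiment without touching the disc.

```erlang
-module(triple_frag).
-export([create/2, add/3, about/1]).

%% Hypothetical record: one triple per object, keyed on the subject.
-record(triple, {subject, predicate, object}).

%% Create the fragmented table on the local node.
%% StorageProp is n_disc_only_copies, n_disc_copies or n_ram_copies.
create(NFragments, StorageProp) ->
    mnesia:create_table(triple,
        [{type, bag},
         {attributes, record_info(fields, triple)},
         {frag_properties,
          [{node_pool, [node()]},
           {n_fragments, NFragments},
           {StorageProp, 1}]}]).

%% All access must go through the mnesia_frag access module,
%% otherwise only the first fragment is used.
add(Subject, Predicate, Object) ->
    Write = fun() ->
                mnesia:write(#triple{subject = Subject,
                                     predicate = Predicate,
                                     object = Object})
            end,
    mnesia:activity(transaction, Write, [], mnesia_frag).

%% Return every triple stored for the given subject.
about(Subject) ->
    Read = fun() -> mnesia:read(triple, Subject) end,
    mnesia:activity(transaction, Read, [], mnesia_frag).
```

With Mnesia started, triple_frag:create(1000, n_disc_only_copies)
followed by triple_frag:add("Joel", has_a, daughter) stores the
example triple from the question. More fragments can be added later
with mnesia:change_table_frag/2, which makes it possible to compare
different fragment counts on the same data.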

If you do not need the ACID properties of Mnesia, you
can also gain some performance by fragmenting the dets
file yourself. It is quite easy to implement a
customized fragmentation logic directly on top of dets.
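
One possible shape for such home-grown fragmentation, as a rough
sketch only. The module name triple_dets, the file naming scheme and
the use of erlang:phash2/2 to pick a fragment are my assumptions here.
There are no transactions, no replication and no rehashing when the
number of fragments changes.

```erlang
-module(triple_dets).
-export([open/2, close/1, add/4, about/2]).

%% Open N dets files under Dir and return the table names as a tuple.
%% The file names (triples_0.dets, triples_1.dets, ...) are hypothetical.
open(Dir, N) ->
    ok = filelib:ensure_dir(filename:join(Dir, "dummy")),
    Open = fun(I) ->
               Name = {triples, I},
               File = filename:join(Dir,
                          "triples_" ++ integer_to_list(I) ++ ".dets"),
               {ok, Name} = dets:open_file(Name,
                                [{file, File}, {type, bag}]),
               Name
           end,
    list_to_tuple([Open(I) || I <- lists:seq(0, N - 1)]).

%% Close every fragment.
close(Frags) ->
    lists:foreach(fun dets:close/1, tuple_to_list(Frags)).

%% Pick the fragment responsible for Key by hashing it.
fragment(Frags, Key) ->
    element(erlang:phash2(Key, tuple_size(Frags)) + 1, Frags).

%% Store one triple in the fragment chosen by its subject.
add(Frags, Subject, Predicate, Object) ->
    dets:insert(fragment(Frags, Subject), {Subject, Predicate, Object}).

%% Return every triple stored for the given subject.
about(Frags, Subject) ->
    dets:lookup(fragment(Frags, Subject), Subject).
```

Since all triples with the same subject hash to the same file, a
lookup by subject touches exactly one fragment. Keep in mind that a
single dets file cannot grow beyond 2 GB, so the number of fragments
has to be large enough to keep each file below that limit.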

/Håkan

