Billion-triple store

Andrae Muys andrae@REDACTED
Tue Apr 25 11:01:33 CEST 2006


On 24/04/2006, at 9:53 PM, Leif Johansson wrote:

> Joel Reymont wrote:
>> Folks,
>>
>> How would I store a billion triples with Erlang?
>>
>> I don't necessarily need the full power of RDF as storing triples in
>> the form of {"Joel", has_a, daughter} would suffice. I would not mind
>> complying with RDF of course but it seems that would be an extra
>> burden due to the necessity of storing everything as strings, the
>> need to implement tries for those and the way Erlang stores strings.
>>
>> I'm not sure how to go about storing a billion such triples in
>> Mnesia. I suppose I would need to use a 64-bit machine and a
>> disc_only_copies table.
>>
>> Any suggestions?
>
> I'd also like an answer to that question. I did some experiments but
> don't understand how to get Mnesia to play nice. I assume you have
> looked at the way triple stores are typically built on an RDBMS? Some
> of the schemes used in things like 3store, Sesame, Kowari (?) might
> be translatable...
>
> I am interested in working on this.

Well, as the lead maintainer of Kowari, I would be very happy to
discuss any requirements you might have, and see if we can help you.

Currently the largest scalability test I am aware of for Kowari was
500 million triples, but those results indicated that we hadn't reached
our limit yet. One of the store layer's designers did some calculations
that indicate we should be able to scale to 1-2 billion without
difficulty, although as one of the primary developers of the query
layer I am aware of some bottlenecks that are likely to interfere
with any queries requiring extremely large intermediate results (~1e6
tuples).

At the same time, there are plans to address these issues and to
break the scalability bottlenecks that are currently preventing us
from reaching 1e10 and 1e11. These include promising prototypes of a
new store design to improve locality and throughput, which should
result in us scaling comfortably to 1e10.

As far as interfacing with Erlang is concerned, we currently support
RMI and SOAP, as well as in-process Java function calls. I am currently
working on XML-RPC support, and I am aware of plans to introduce a
REST interface as well.

Please let me know if there is anything I can do to help.

Andrae

-- 
Andrae Muys
andrae@REDACTED
Principal Kowari Consultant
Netymon Pty Ltd



