[erlang-questions] "actor database" - architectural strategy question
Mon Feb 17 22:22:01 CET 2014
On Mon, Feb 17, 2014 at 9:22 PM, Miles Fidelman
> Joe Armstrong wrote:
>> This sounds interesting. To start with, I think swapping processes to
>> disk is just an optimization.
>> In theory you could just keep everything in RAM forever. I guess
>> processes could keep their state in dictionaries (so you could roll them
>> back) or ets tables (if you didn't want to roll them back).
>> You would need some form of crash recovery so processes should write some
>> state information
>> to disk at suitable points in the program.
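The crash-recovery idea above (state held in a dictionary, written to disk at suitable points) can be sketched roughly as follows. This is a minimal illustration in Python rather than Erlang; the class name `CheckpointedActor`, the JSON file format, and the choice to checkpoint after every message are all assumptions made for the example, not anything proposed in the thread.

```python
import json
import os
import tempfile


class CheckpointedActor:
    """Hypothetical long-lived actor: state lives in a dict, checkpointed to disk."""

    def __init__(self, path):
        self.path = path
        self.state = self._recover()

    def _recover(self):
        # After a crash or restart, resume from the last checkpoint if one exists.
        if os.path.exists(self.path):
            with open(self.path, "r", encoding="utf-8") as f:
                return json.load(f)
        return {}

    def handle(self, key, value):
        # Update in-memory state, then checkpoint (assumed policy: every message).
        self.state[key] = value
        self.checkpoint()

    def checkpoint(self):
        # Write to a temporary file and rename it into place, so a crash
        # mid-write never leaves a half-written checkpoint behind.
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(self.state, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.path)
```

A real system would probably checkpoint less often than once per message; where the "suitable points" fall is exactly the design question raised above.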
> Joe... can you offer any insight into the dynamics of Erlang, when
> running with a large number of processes that have very long persistence?
> Somehow, it strikes me that 100,000 processes with 1MB of state, each
> running for years at a time, have a different dynamic than 100,000
> processes, each representing a short-lived protocol transaction (say a web
> Coupled with a communications paradigm for identifying a group of
> processes and sending each of them the same message (e.g., 5000 people have
> a copy of a book, send all 5000 of them a set of errata; or send a message
> asking 'who has updates for section 3.2?').
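The "identify a group and send each member the same message" pattern might look something like the sketch below. The `GroupRegistry` class and the use of plain lists as mailboxes are hypothetical stand-ins chosen for illustration; nothing here is an existing Erlang or Python API.

```python
from collections import defaultdict


class GroupRegistry:
    """Hypothetical registry mapping a group name to its members' mailboxes."""

    def __init__(self):
        self.groups = defaultdict(list)

    def join(self, group, mailbox):
        # A mailbox here is just a list that messages are appended to.
        self.groups[group].append(mailbox)

    def multicast(self, group, message):
        # Deliver the same message to every member; return how many received it.
        members = self.groups.get(group, [])
        for mailbox in members:
            mailbox.append(message)
        return len(members)


registry = GroupRegistry()
owners = [[] for _ in range(5000)]  # 5000 holders of the same book
for mailbox in owners:
    registry.join("book-owners", mailbox)

delivered = registry.multicast("book-owners", ("errata", "section 3.2"))
```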
> In some sense, the conceptual model is:
> 1. I send you an empty notebook.
> 2. The notebook has an address and a bunch of message handling routines
> 3. I can send a page to the notebook, and the notebook inserts the page.
> 4. You can interact with the notebook - read it, annotate it, edit certain
> sections - if you make updates, the notebook can distribute updates to
> other copies - either through a P2P mechanism or a publish-subscribe
> At a basic level, this maps really well onto the Actor formalism - every
> notebook is an actor, with its own address. Updates, interactions,
> queries, etc. are simply messages.
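As a rough illustration of the notebook-as-actor model, the sketch below gives each notebook an address, a mailbox, and a few message handlers, and forwards edits to other copies. It uses Python threads and queues as a stand-in for Erlang processes; the message tags (`insert_page`, `edit_page`, `read_page`, `stop`) and the `subscribers` list are invented for the example.

```python
import queue
import threading


class Notebook(threading.Thread):
    """Hypothetical notebook actor: private state plus a mailbox."""

    def __init__(self, address):
        super().__init__(daemon=True)
        self.address = address
        self.mailbox = queue.Queue()
        self.pages = {}        # page number -> text
        self.subscribers = []  # other Notebook copies to notify on edits

    def send(self, message):
        # The only way to interact with a notebook is to send it a message.
        self.mailbox.put(message)

    def run(self):
        while True:
            tag, *args = self.mailbox.get()
            if tag == "stop":
                break
            if tag == "insert_page":
                number, text = args
                self.pages[number] = text
            elif tag == "edit_page":
                number, text = args
                self.pages[number] = text
                # Distribute the update to the other copies.
                for other in self.subscribers:
                    other.send(("insert_page", number, text))
            elif tag == "read_page":
                number, reply_to = args
                reply_to.put(self.pages.get(number))
```

Used roughly like this: create two copies, subscribe one to the other, then send `("edit_page", 1, "final")` to the first and a `("read_page", 1, reply_queue)` to either copy to observe the propagated text.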
> Since Erlang is about the only serious implementation of the Actor
> formalism, I'm trying to poke at the edge cases - particularly around
> long-lived actors. And who better to ask than you :-)
> In passing: Early versions of Smalltalk were actor-like, encapsulating
> state, methods, and process - but process kind of got dropped along the
> way. By contrast, it strikes me that Erlang focuses on everything being a
> process, and long-term persistence of state has taken a back seat. I'm
> trying to probe the edge cases. (I guess another way of looking at this is:
> to what extent is Erlang workable for writing systems based around the
> mobile agent paradigm?)
>> What I think is a more serious problem is getting data into the system in
>> the first place.
>> I have done some experiments with document commenting and annotation
>> systems and
>> found it very difficult to convert things like word documents into a form
>> that looks half
>> decent in a user interface.
> Haven't actually thought a lot about that part of the problem. I'm
> thinking of documents that are more form-like in nature, or at least built
> up from smaller components - so it's not so much going from Word to an
> internal format, as much as starting with XML or JSON (or tuples), building
> up structure, and then adding presentation at the final step. XML -> Word
> is a lot easier than the reverse :-)
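The "start from structured data, add presentation last" approach can be sketched as below. The document shape (`title`, `body`, `children` keys), the sample content, and the plain-text renderer are assumptions for illustration only; a real pipeline would target Word, HTML, or PDF at the final step.

```python
import json


def render_text(node, depth=0):
    """Render a nested {"title", "body", "children"} dict as indented text lines."""
    indent = "  " * depth
    lines = [f"{indent}{node['title']}"]
    if node.get("body"):
        lines.append(f"{indent}  {node['body']}")
    for child in node.get("children", []):
        lines.extend(render_text(child, depth + 1))
    return lines


# Structure is built up from small components first...
source = json.loads("""
{"title": "Spec", "children": [
  {"title": "1 Scope", "body": "Applies to all notebooks."},
  {"title": "2 Requirements", "children": [
    {"title": "2.1 Storage", "body": "State survives restarts."}
  ]}
]}
""")

# ...and presentation is applied only at the end.
document = "\n".join(render_text(source))
```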
> On the other hand, I do have a bunch of applications in mind where parsing
> Word and/or PDF would be very helpful - notably stripping requirements out
> of specifications. (I can't tell you how much of my time I spend manually
> cutting and pasting from specifications into spreadsheets - for
> requirements tracking and such.) Again, presentation isn't that much of an
> issue - structural and semantic analysis is. But, while important, that's
> a separate set of problems - and there are some commercial products that do
> a reasonably good job.
>> I want to parse Microsoft Word files and PDF etc. and display them in a
>> format that is
>> recognisable and not too abhorrent to the user. I also want to allow
>> on-screen manipulation of
>> documents (in a browser) - all of this seems to require a mess of
>> Before we can manipulate documents we must parse them and turn them into
>> a format
>> that can be manipulated. I think this is more difficult than the storing
>> and manipulating documents
>> problem. You'd also need support for full-text indexing, foreign language
>> and multiple character sets and so
>> on. Just a load of horrible messy small problems, but a significant
>> barrier to importing large amounts
>> of content into the system.
>> You'd also need some quality control of the documents as they enter the
>> system (to avoid rubbish in rubbish out), also to maintain the integrity of
>> the documents.
> Again, for this problem space, it's more about building up complex
> documents from small pieces, than carving up pre-existing documents. More
> like the combination of an IDE and a distributed CVS - where fully
> "compiled" documents are the final output.
>> If you have any ideas of how to get large volumes of data into the system
>> from proprietary formats
>> (like MS Word) I'd like to hear about it.
> Me too :-) Though, I go looking for such things every once in a while,
> - there are quite a few PDF to XML parsers, but mostly commercial ones
> - there are a few PDF and Word "RFP stripping" products floating around,
> that are smart enough to actually analyze the content of structured
> documents (check out Meridian)
> - later versions of Word export XML, albeit poor XML
> - there are quite a few document analysis packages floating around,
> including ones that start from OCR images - but they generally focus on
> content (lexical analysis) and ignore structure (it's easier to scan a
> document and extract some measure of what it's about - e.g. for indexing
> purposes; it's a lot harder to find something that will extract the outline
> structure of a document)
> In theory, there is no difference between theory and practice.
> In practice, there is. .... Yogi Berra