<html><head><style>body{font-family:Helvetica,Arial;font-size:12px}</style></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;"><div id="bloop_customfont" style="font-family:Helvetica,Arial;font-size:12px; color: rgba(0,0,0,1.0); margin: 0px; line-height: auto;">“Large number of processes with very long persistence”</div><div id="bloop_customfont" style="font-family:Helvetica,Arial;font-size:12px; color: rgba(0,0,0,1.0); margin: 0px; line-height: auto;"><br></div><div id="bloop_customfont" style="font-family:Helvetica,Arial;font-size:12px; color: rgba(0,0,0,1.0); margin: 0px; line-height: auto;">You *will* run into GC issues here, and of all kinds</div><div id="bloop_customfont" style="font-family:Helvetica,Arial;font-size:12px; color: rgba(0,0,0,1.0); margin: 0px; line-height: auto;">   - design artifacts (“hmm, the number of lists that I manipulate increases relentlessly…”)</div><div id="bloop_customfont" style="font-family:Helvetica,Arial;font-size:12px; color: rgba(0,0,0,1.0); margin: 0px; line-height: auto;">   - misunderstanding (“But I passed the binary on, without manipulating it at all!”)</div><div id="bloop_customfont" style="font-family:Helvetica,Arial;font-size:12px; color: rgba(0,0,0,1.0); margin: 0px; line-height: auto;">   - Bugs (Fred has a great writeup on this somewhere)</div><div id="bloop_customfont" style="font-family:Helvetica,Arial;font-size:12px; color: rgba(0,0,0,1.0); margin: 0px; line-height: auto;"> </div><div id="bloop_customfont" style="font-family:Helvetica,Arial;font-size:12px; color: rgba(0,0,0,1.0); margin: 0px; line-height: auto;">Just keep in mind that in the end, you will almost certainly end up doing some form of manual GC activities.  
Again, the Heroku gang can probably provide a whole bunch of pointers on this…</div><div id="bloop_customfont" style="font-family:Helvetica,Arial;font-size:12px; color: rgba(0,0,0,1.0); margin: 0px; line-height: auto;"><br></div><div id="bloop_customfont" style="font-family:Helvetica,Arial;font-size:12px; color: rgba(0,0,0,1.0); margin: 0px; line-height: auto;">cheers</div> <div id="bloop_sign_1392669504517230080" class="bloop_sign"><div style="font-family:helvetica,arial;font-size:13px"><div style="color: rgb(34, 34, 34); line-height: normal; font-family: Helvetica; word-wrap: break-word;"><div style="margin: 0in 0in 0.0001pt;"><font color="#1f497d" face="Calibri, sans-serif"><span style="font-size: 15px;"><b><div style="font-style: italic; margin: 0px; font-family: Calibri;"><b style="color: rgb(17, 85, 204);"><a href="http://www.gravatar.com/avatar/204a87f81a0d9764c1f3364f53e8facf.png" target="_blank" style="color: rgb(17, 85, 204);">Mahesh Paolini-Subramanya</a></b></div><div style="margin: 0px; font-family: Calibri;"><span style="font-weight: normal;">That tall bald Indian guy..</span> <br>

  </div></b></span></font></div></div><div style="color: rgb(34, 34, 34); line-height: normal; font-family: Helvetica; word-wrap: break-word;"><div style="margin: 0in 0in 0.0001pt;"><font color="#1f497d" face="Calibri, sans-serif"><span style="font-size: 15px;"><b><div style="margin: 0px; font-family: Calibri;"><div style="font-family: Helvetica; word-wrap: break-word;"><div style="margin: 0in 0in 0.0001pt; font-size: 12pt; font-family: 'Times New Roman', serif;"><span style="font-size: 11pt; font-family: Calibri, sans-serif;"><div style="margin: 0px; font-family: Calibri; color: rgb(1, 108, 226);"><a href="https://plus.google.com/u/0/108074935470209044442/posts" target="_blank" style="color: rgb(17, 85, 204);">Google+</a><span style="color: rgb(31, 73, 125);">  | <a href="http://dieswaytoofast.blogspot.com/" target="_blank" style="color: rgb(17, 85, 204);"><span style="color: rgb(1, 108, 226);">Blog</span></a></span> <span style="color: rgb(31, 73, 125);">  | <span style="color: rgb(1, 108, 226);"><a href="https://twitter.com/dieswaytoofast" target="_blank" style="color: rgb(17, 85, 204);">Twitter</a></span></span><span style="color: rgb(31, 73, 125);">  | </span><a href="http://www.linkedin.com/in/dieswaytoofast" target="_blank" style="color: rgb(17, 85, 204);">LinkedIn</a></div></span></div></div></div></b></span></font></div></div></div></div> <br><p style="color:#A0A0A8;">On February 17, 2014 at 3:22:22 PM, Miles Fidelman (<a href="mailto://mfidelman@meetinghouse.net">mfidelman@meetinghouse.net</a>) wrote:</p> <blockquote type="cite" class="clean_bq"><span><div><div>Joe Armstrong wrote:

<br>> This sounds interesting. To start with, I think swapping processes to  

> disk is just an optimization.

<br>> In theory you could just keep everything in RAM forever. I guess  

<br>> processes could keep their state in dictionaries (so you could roll  

> them back) or ets tables (if you didn't want to roll them back).

<br>>

<br>> You would need some form of crash recovery so processes should write  

<br>> some state information

> to disk at suitable points in the program.

<br>

<br>Joe...  can you offer any insight into the dynamics of Erlang, when  

<br>running with a large number of processes that have very long persistence?   

<br>Somehow, it strikes me that 100,000 processes with 1MB of state, each  

<br>running for years at a time, have a different dynamic than 100,000  

<br>processes, each representing a short-lived protocol transaction (say a  

web query).

<br>

<br>Coupled with a communications paradigm for identifying a group of  

<br>processes and sending each of them the same message (e.g., 5000 people  

<br>have a copy of a book, send all 5000 of them a set of errata; or send a  

message asking 'who has updates for section 3.2?').

<br>

<br>In some sense, the conceptual model is:

1. I send you an empty notebook.

<br>2. The notebook has an address and a bunch of message handling routines

3. I can send a page to the notebook, and the notebook inserts the page.

<br>4. You can interact with the notebook - read it, annotate it, edit  

<br>certain sections - if you make updates, the notebook can distribute  

<br>updates to other copies - either through a P2P mechanism or a  

publish-subscribe mechanism.

<br>

<br>At a basic level, this maps really well onto the Actor formalism - every  

<br>notebook is an actor, with its own address.  Updates, interactions,  

queries, etc. are simply messages.
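
<br>

<br>A minimal sketch of the four steps above, written in Python only to keep it language-neutral. Everything named here is an assumption made for illustration: the class name Notebook, the tuple message shapes, the handle_* naming, and the use of a plain string as an address. In Erlang each notebook would presumably be a process with a receive loop rather than an object with a queue.

<pre>
```python
from collections import deque


class Notebook:
    """One notebook = one actor: an address, a mailbox, message handlers."""

    def __init__(self, address):
        self.address = address    # hypothetical addressing scheme: a string
        self.pages = {}           # page number -> text
        self.mailbox = deque()    # pending messages, handled one at a time
        self.subscribers = []     # other copies that receive our updates

    def send(self, message):
        """Asynchronous send: only enqueue; handling happens in run()."""
        self.mailbox.append(message)

    def run(self):
        """Drain the mailbox, dispatching each message to its handler."""
        replies = []
        while self.mailbox:
            kind, *args = self.mailbox.popleft()
            reply = getattr(self, "handle_" + kind)(*args)
            if reply is not None:
                replies.append(reply)
        return replies

    def handle_insert_page(self, number, text):
        # Step 3: a page arrives and the notebook inserts it.
        self.pages[number] = text

    def handle_edit_page(self, number, text):
        # Step 4: apply a local edit, then publish it to subscribed copies.
        self.pages[number] = text
        for copy in self.subscribers:
            copy.send(("insert_page", number, text))

    def handle_read(self, number):
        return self.pages.get(number)


# Two copies of the same notebook; edits to `mine` are published to `yours`.
mine = Notebook("notebook-1")
yours = Notebook("notebook-2")
mine.subscribers.append(yours)

mine.send(("insert_page", 1, "draft"))
mine.send(("edit_page", 1, "draft with errata"))
mine.run()

yours.run()                     # processes the published update
yours.send(("read", 1))
print(yours.run())
```
</pre>

<br>The sketch only covers the publish-subscribe variant; a P2P variant would need each copy to know about its peers, and neither addresses persistence of the state.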

<br>

<br>Since Erlang is about the only serious implementation of the Actor  

<br>formalism, I'm trying to poke at the edge cases - particularly around  

<br>long-lived actors.  And who better to ask than you :-)

<br>

<br>In passing: Early versions of Smalltalk were actor-like, encapsulating  

<br>state, methods, and process - but process kind of got dropped along the  

<br>way.  By contrast, it strikes me that Erlang focuses on everything being  

<br>a process, and long-term persistence of state has taken a back seat.   

<br>I'm trying to probe the edge cases. (I guess another way of looking at  

<br>this is: to what extent is Erlang workable for writing systems based  

<br>around the mobile agent paradigm?)

<br>

<br>

<br>

<br>>

<br>> What I think is a more serious problem is getting data into the system  

> in the first place.

<br>> I have done some experiments with document commenting and annotation  

<br>> systems and

<br>> found it very difficult to convert things like word documents into a  

<br>> form that looks half

> decent in a user interface.

<br>

<br>Haven't actually thought a lot about that part of the problem. I'm  

<br>thinking of documents that are more form-like in nature, or at least  

<br>built up from smaller components - so it's not so much going from Word  

<br>to an internal format, as much as starting with XML or JSON (or tuples),  

<br>building up structure, and then adding presentation at the final step.   

<br>XML -> Word is a lot easier than the reverse :-)

<br>

<br>On the other hand, I do have a bunch of applications in mind where  

<br>parsing Word and/or PDF would be very helpful - notably stripping  

<br>requirements out of specifications.  (I can't tell you how much of my  

<br>time I spend manually cutting and pasting from specifications into  

<br>spreadsheets - for requirements tracking and such.)  Again, presentation  

<br>isn't that much of an issue - structural and semantic analysis is.  But,  

<br>while important, that's a separate set of problems - and there are some  

commercial products that do a reasonably good job.

<br>

<br>> I want to parse Microsoft word files and PDF etc. and display them in  

<br>> a format that is

<br>> recognisable and not too abhorrent to the user. I also want to allow  

<br>> on-screen manipulation of

<br>> documents (in a browser) - all of this seems to require a mess of  

> JavaScript (in the browser) and a mess of parsing programs in the server.

<br>>

<br>> Before we can manipulate documents we must parse them and turn them  

<br>> into a format

<br>> that can be manipulated. I think this is more difficult than the  

<br>> storing and manipulating documents

<br>> problem. You'd also need support for full-text indexing, foreign  

<br>> language and multiple character sets and so

<br>> on. Just a load of horrible messy small problems, but a significant  

<br>> barrier to importing large amounts

> of content into the system.

<br>>

<br>> You'd also need some quality control of the documents as they enter  

<br>> the system (to avoid rubbish in rubbish out), also to maintain the  

> integrity of the documents.

<br>

<br>Again, for this problem space, it's more about building up complex  

<br>documents from small pieces, than carving up pre-existing documents.   

<br>More like the combination of an IDE and a distributed CVS - where fully  

"compiled" documents are the final output.

<br>

<br>>

<br>> If you have any ideas of how to get large volumes of data into the  

<br>> system from proprietary formats

> (like MS Word) I'd like to hear about it.

<br>>

<br>

<br>Me too :-)  Though, I go looking for such things every once in a while, and:

<br>- there are quite a few PDF to XML parsers, but mostly commercial ones

<br>- there are a few PDF and Word "RFP stripping" products floating around,  

<br>that are smart enough to actually analyze the content of structured  

<br>documents (check out Meridian)

<br>- later versions of Word export XML, albeit poor XML

<br>- there are quite a few document analysis packages floating around,  

<br>including ones that start from OCR images - but they generally focus on  

<br>content (lexical analysis) and ignore structure (it's easier to scan a  

<br>document and extract some measure of what it's about - e.g. for indexing  

<br>purposes; it's a lot harder to find something that will extract the  

<br>outline structure of a document)

<br>

<br>

<br>Cheers,

<br>

<br>Miles

<br>

<br>

<br>--  

In theory, there is no difference between theory and practice.

<br>In practice, there is.   .... Yogi Berra

<br>


<br></div></div></span></blockquote></body></html>