[erlang-questions] List Question
Andrew McIntyre
andrew@REDACTED
Wed Aug 9 01:40:22 CEST 2017
Hello Richard,
The OO implementation we have is in two parts:
1. Parse into a tree with no knowledge of semantics
2. Use a specific object to provide an interface to the data based on
its type
I am involved in the standards process, and there is a blog post here
for anyone interested:
https://kb.medical-objects.com.au/display/PUB/HL7v2+parsing
Essentially, the plan is to do the same in Erlang:
1. Parse into a tree
2. Create a module that has functions for reading the known values for a
specific segment. These are autogenerated, e.g. an msh.erl that has
an msh:sending_facility() function
3. Use reader functions that allow messages to become more complex but
permit old functions to continue to work
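
A rough sketch of how those three steps might look, not the real
implementation. The module name hl7_sketch, the function names and the
nested-list tree shape are all made up for illustration; a generated
msh.erl would presumably export something like sending_facility/1
itself. It assumes segments are separated by carriage returns and that
the message starts with an MSH segment declaring the delimiters. Escape
sequences are not decoded.

```erlang
-module(hl7_sketch).
-export([parse/1, field/3, sending_facility/1]).

%% Step 1: syntax only. Returns [{SegmentName, Fields}]. A parsed field
%% is a list of repetitions, a repetition is a list of components, and
%% a component is a list of subcomponent binaries.
%% The delimiters are read from the start of the MSH segment.
parse(<<"MSH", FS, Comp, Rep, _Esc, Sub, _/binary>> = Msg) ->
    Delims = {<<FS>>, <<Comp>>, <<Rep>>, <<Sub>>},
    Lines = binary:split(Msg, <<"\r">>, [global, trim_all]),
    [parse_segment(Line, Delims) || Line <- Lines].

parse_segment(Line, {FS, _, _, _} = Delims) ->
    case binary:split(Line, FS, [global]) of
        [<<"MSH">>, Enc | Rest] ->
            %% MSH-1 is the field separator and MSH-2 the encoding
            %% characters; both are kept raw so field numbers line up.
            {<<"MSH">>, [FS, Enc | [parse_field(F, Delims) || F <- Rest]]};
        [Name | Fields] ->
            {Name, [parse_field(F, Delims) || F <- Fields]}
    end.

parse_field(Field, {_, Comp, Rep, Sub}) ->
    [[binary:split(C, Sub, [global])
      || C <- binary:split(R, Comp, [global])]
     || R <- binary:split(Field, Rep, [global])].

%% Step 2: a named accessor of the kind a generator could emit.
%% Sending Facility is MSH-4.
sending_facility(Msg) ->
    field(Msg, <<"MSH">>, 4).

%% Step 3: a reader that returns the first subcomponent of the first
%% component of the first repetition, or undefined if the segment or
%% field is absent. A field that later gains components still answers
%% old callers with its first value.
field(Msg, Name, N) ->
    case lists:keyfind(Name, 1, Msg) of
        {Name, Fields} when length(Fields) >= N ->
            case lists:nth(N, Fields) of
                [[[Value | _] | _] | _] -> Value;
                Raw -> Raw   % MSH-1 and MSH-2 are stored unparsed
            end;
        _ ->
            undefined
    end.
```

With this, a message whose MSH-4 is just "FacilityA" and one whose
MSH-4 is "FacilityA^1.2.3^ISO" both give <<"FacilityA">> from
hl7_sketch:sending_facility/1, which is the sort of backward
compatibility step 3 is after.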
Andrew
Wednesday, August 9, 2017, 9:27:24 AM, you wrote:
RAOK> From what I've been able to glean about the HL7 message
RAOK> format, there are two aspects:
RAOK> * the basic syntax is the multi-level delimited thingy but
RAOK> * there is also *semantics*, predefined message types,
RAOK> an assortment of data types, rules for mapping data
RAOK> types to trees and so on.
RAOK> From a quick look at the Elixir HL7 parser, they have
RAOK> taken steps to handle some (but perhaps not all) of the
RAOK> *semantics* of HL7 and don't just give a tree of strings,
RAOK> but more structured data.
RAOK> Parsing the multi-level delimited syntax is trivial.
RAOK> Dealing with the semantics is not.
RAOK> I think that in figuring out for yourself how to work
RAOK> with HL7 messages in Erlang, the starting point would
RAOK> be
RAOK> - what message types do you want to handle?
RAOK> - what kinds of data occur in them?
RAOK> - how do you want to represent those kinds of
RAOK> data in Erlang?
RAOK> - do you actually want to represent *every* field
RAOK> at all? Some might not be relevant to you.
RAOK> - would you be streaming messages through a
RAOK> system (like some sort of pub/sub queueing
RAOK> middleware), summarising messages, storing
RAOK> them, or what?
RAOK> - what does a type declaration for a message type
RAOK> look like in HL7? Is there some way to automatically
RAOK> derive parsing code from that?
RAOK> What I'm getting at with the last point is that there
RAOK> is ASN.1 support for Erlang. Give it an ASN.1
RAOK> definition, and you get Erlang code out the other end.
RAOK> I am particularly thinking of the PADS project:
RAOK> PADS: Processing Arbitrary Data Streams
RAOK> Kathleen Fisher and Bob Gruber, AT&T Labs
RAOK> Slides in ppt http://homepages.inf.ed.ac.uk/wadler/xmlbinding/
RAOK> Transactional data streams, such as sequences of stock-market
RAOK> buy/sell orders, credit-card purchase records, web server
RAOK> entries, and electronic fund transfer orders, can be mined very
RAOK> profitably. As an example, researchers at AT&T have built
RAOK> customer profiles from streams of call-detail records to significant financial effect.
RAOK> Often such streams are high-volume: AT&T's call-detail stream
RAOK> contains roughly 300 million calls per day requiring
RAOK> approximately 7GBs of storage space. Typically, such stream data
RAOK> arrives ``as is'' in ad hoc formats with poor documentation. In
RAOK> addition, the data frequently contains errors. The appropriate
RAOK> response to such errors is application-specific. Some
RAOK> applications can simply discard unexpected or erroneous values
RAOK> and continue processing. For other applications, however, errors
RAOK> in the data can be the most interesting part of the data.
RAOK> Understanding a new data source and producing a suitable parser
RAOK> are crucial first steps in any use of such data. Unfortunately,
RAOK> writing parsers for this kind of data is a difficult task, both
RAOK> tedious and error-prone. It is complicated by lack of
RAOK> documentation, convoluted encodings designed to save space, the
RAOK> need to handle errors robustly, and the need to produce
RAOK> efficient code to cope with the scale of the stream. Often, the
RAOK> hard-won understanding of the data ends up embedded in parsing
RAOK> code, making long-term maintenance difficult for the original
RAOK> writer and sharing the knowledge with others nearly impossible.
RAOK> The goal of the PADS project is to provide languages and tools
RAOK> for simplifying data processing. We have a preliminary design of
RAOK> a declarative data-description language, PADSL, expressive
RAOK> enough to describe the data feeds we see at AT&T in practice,
RAOK> including ASCII, binary, EBCDIC, Cobol, and mixed data formats.
RAOK> From PADSL we generate a tunable C library with functions for
RAOK> parsing, manipulating, and summarizing the data. In joint work
RAOK> with Mary Fernandez and Ricardo Medel, we are working to
RAOK> integrate PADS and XQuery to support declarative querying of
RAOK> data sources with PADS descriptions.
RAOK> --------------------
RAOK> The PADS project moved from AT&T to
RAOK> http://pads.cs.tufts.edu/doc.html
RAOK> I say this in all seriousness: if I had a need to process
RAOK> GB of HL7 data, I would start by seeing if PADS was adequate
RAOK> to describe it, and if so I'd write an HL7->Erlang data
RAOK> translator in C or ML (as PADS has C and ML versions).
RAOK> If not, I'd see what ideas I could steal from PADS.
RAOK> Using a declarative data language to describe the message types
RAOK> I was interested in would be an up-front cost, but it would
RAOK> hugely simplify later maintenance.
--
Best regards,
Andrew mailto:andrew@REDACTED
sent from a real computer