[erlang-questions] List Question

Joe Armstrong erlang@REDACTED
Tue Aug 8 10:00:59 CEST 2017


Hello,

I'm going to go way off topic here and not answer your specific
question about lists ...

Your last mail had the information I need - you're trying to parse HL7.
I have a few comments.

1) Your original question did not bother to mention  what problem you
were trying to solve -
    You asked about a sub-problem that you encountered when trying to
solve your principle
    problem (principle problem = parse HL7) (sub-problem = differentiate lists)

 2) It's *always* a good idea to ask questions about the principle
problem first !!!!

I didn't know what HL7 was - my immediate thought was
 'I wonder if anybody has written an *proper* HL7 parser in Erlang' - by
proper I mean "has expended a significant amount of thought on writing a parser"

Google is your friend - It told me what HL7 was (I hadn't a clue here
- "never heard of it")
and it turned up a parser in elixir

    https://github.com/jcomellas/ex_hl7

>From the quality of the documentation I assume this is a *proper*
implementation.

Now elixir compiles to .beam files and can be called from Erlang -
which raises another
sub problem "how do I compile the elixir code and call it from Erlang"
and begs the
question "is this effort worthwhile"

Given that a parser for HL7 exists in elixir it might be sensible to
use it "off the shelf"

I have a feeling that elixir folks are good at reusing erlang code -
but that reuse in the
opposite direction is less easy.

The last time I fiddled a bit (yesterday as it happened) - it turned
out to be less than
blindingly obvious how to call other than trivial elixir code from erlang.

I was also wondering about cross-compilation. Has anybody written
something that turns
erlang code into elixir source code or vice. versa.

Cheers

/Joe





On Mon, Aug 7, 2017 at 3:46 PM, Andrew McIntyre
<andrew@REDACTED> wrote:
> Hello Craig,
>
> Thanks for your help.
>
> I am trying to store the data as efficiently as possible. Its HL7
> natively and this is my test:
>
> OBX|17|FT~TEST|8265-1^^LN&SUBCOMP|1&2&3&4|\H\Spot Image 2\N\||||||F
>
> |~^& are delimiters. The hierarchy is only so deep and using lists of
> lists to provide a tree like way to access the data eg Field 3, repeat
> 1 component 2 subcomponent1
>
> Parsed it looks like this:
>
> [["OBX","17",
>   ["FT","TEST"],
>   [["8265-1",[],["LN","SUBCOMP"]]],
>   [[["1","2","3","4"]]],
>   "\\H\\Spot Image 2\\N\\",[],[],[],[],[],"F"]]
>
> As the format evolves over time the hierarchy can be extended, but
> older clients can still read the value they are expecting if they
> follow the rules, like reading the first value in the list when you
> only expect one value to be there.
>
> Currently a typical system might have 12 million of these records so
> want to keep format as small as possible in the erlang format, hence
> reluctant to tag 2 much, but know how to get value of interest. Maybe
> that is my non erlang background showing up? Traversing 4 small lists
> by index should be fast??
>
> I guess I could save strings as binary in the lists then is_binary
> should work?? Is that the case. I gather on 64bit system especially
> binary is more space efficient.
>
> Monday, August 7, 2017, 10:53:11 PM, you wrote:
>
> z> On 2017年08月07日 月曜日 22:29:31 you wrote:
>>> Hello zxq9,
>>>
>>> Thanks, Unfortunately I do not know the value of the string that will
>>> be there. Its an extensible hierarchy that can be several lists deep -
>>> or not. Might need to revise the data structure
>
> z> In this case it can be useful to consider a way of tagging values.
>
> z> Imagine we want to represent a directory tree structure and have a
> z> descent-first traversal function recurse over it while creating the
> z> tree. We have two things that can happen, there is a flat list of
> z> new directories that need to be created, and there is the
> z> possibility that the tree depth extends deeper at each node.
>
> z> The naive version would look like what you have:
>
> z> ["top_dir_1",
> z>  "top_dir_2",
> z>  ["next_level_1",
> z>   "next_level_2"]]
>
> z> This leaves a bit to be desired, not only because of the problem
> z> you have pointed out that makes it difficult to know what is deep
> z> and what is shallow, but also because we don't really have a good
> z> way to represent a full tree (what would be the name of a directory containing other directories?).
>
> z> So consider instead something like this:
>
> z> [{"top_dir_1", []},
> z>  {"top_dir_2", []},
> z>  {"top_dir_3",
> z>   [{"next_level_1", []},
> z>    {"next_level_2", []}]}]
>
> z> Now we have a representation of each directory's name AND its contents.
>
> z> We can traverse this laterally AND in depth without any ambiguity
> z> or need for carrying around a record of where we have been (by
> z> using depth recursion and tail-call recursion):
>
>
> z> make_tree([{Dir, Contents} | Rest]) ->
> z>     ok =
> z>         case filelib:is_dir(Dir) of
> z>             true ->
> z>                 ok;
> z>             false ->
> z>                 ok = log(info, "Creating dir: ~p", [Dir]),
> z>                 file:make_dir(Dir)
> z>         end,
> z>     ok = file:set_cwd(Dir),
> z>     ok = make_tree(Contents),
> z>     ok = file:set_cwd(".."),
> z>     make_tree(Rest);
> make_tree([]) ->>
> z>     ok.
>
>
> z> Not so bad.
>
> z> In your case we could represent things perhaps a bit better by
> z> separating the types and tagging them. Instead of just "FT" and
> z> whatever other string labels you might want, you could either use
> z> atoms (totally unambiguous) or tuples as we have in the example
> z> able (also totally unambiguous). I prefer tuples, though, because they are easier to read.
>
> z> [{value, "foo"},
> z>  {tree,
> z>   [{value, "bar"},
> z>    {value, "foo"}]},
> z>  {value, "baz"}]
>
>
> z> So then we do something like:
>
>
> z> traverse([{value, Value} | Rest]) ->
> z>    ok = do_thing(Value),
> z>    traverse(Rest);
> z> traverse([{tree, Contents} | Rest]) ->
> z>    ok = traverse(Contents),
> z>    traverse(Rest);
> traverse([]) ->>
> z>    ok.
>
>
> z> Anyway, don't be afraid of varying your value types to say exactly
> z> what you mean. If your strings like "FT" only had meaning within
> z> your system consider NOT USING STRINGS, and using atoms instead. That makes it even easier:
>
>
> z> [foo,
> z>  bar,
> z>  [foo,
> z>   bar],
> z>  foo]
>
>
> z> So then we can do:
>
>
> z> traverse([foo | Rest]) ->
> z>     ok = do_foo(),
> z>     traverse(Rest);
> z> traverse([bar | Rest]) ->
> z>     ok = do_bar(),
> z>     traverse(Rest);
> z> traverse([Value | Rest]) when is_list(Value) ->
> z>     ok = traverse(Value),
> z>     traverse(Rest);
> traverse([]) ->>
> z>     ok.
>
>
> z> And of course, you can not use a guard if you want to match on a
> z> list shape in the listy clause there, but that is a minor detail.
> z> The point is to make your data types MEAN SOMETHING REASONABLE
> z> within your system. Use atoms when your values are meaningful only
> z> within your system. Strings are for the birds.
>
> z> -Craig
> z> _______________________________________________
> z> erlang-questions mailing list
> z> erlang-questions@REDACTED
> z> http://erlang.org/mailman/listinfo/erlang-questions
>
>
>
> --
> Best regards,
>  Andrew                             mailto:andrew@REDACTED
>
> sent from a real computer
>
>
> _______________________________________________
> erlang-questions mailing list
> erlang-questions@REDACTED
> http://erlang.org/mailman/listinfo/erlang-questions



More information about the erlang-questions mailing list