[erlang-questions] List Question

Juan Jose Comellas juanjo@REDACTED
Mon Aug 7 17:32:23 CEST 2017


Andrew, if you want to store the data in a format that is as compact as
possible, I'd recommend storing the HL7 message itself as a binary and
parsing on demand. If you want to store the data pre-parsed, then I would
store them as a list of segments where each segment is represented by a
nested tuple. That way you can reference the fields, components, etc., by
their index in an O(1) operation, and you can still easily add or remove
segments from a message.
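Here is a minimal sketch of the tuple-per-segment layout. It assumes a segment has already been parsed into nested lists with binary (not string) leaves. With string leaves, the conversion could not tell a value from a deeper level. The module and function names (hl7_tuple, from_list/1, get/2) are hypothetical and are not taken from ex_hl7 or ehl7.

```erlang
-module(hl7_tuple).
-export([from_list/1, get/2]).

%% Convert a parsed segment (nested lists, binary leaves) into nested
%% tuples so that every level can be indexed with element/2 in O(1).
from_list(Leaf) when is_binary(Leaf) ->
    Leaf;
from_list(List) when is_list(List) ->
    list_to_tuple([from_list(X) || X <- List]).

%% Follow a path of 1-based indices (field, repeat, component, ...).
%% Returns undefined when an index is out of range or the path goes
%% deeper than the stored data.
get(Value, []) ->
    Value;
get(Tuple, [I | Path]) when is_tuple(Tuple), I >= 1, I =< tuple_size(Tuple) ->
    get(element(I, Tuple), Path);
get(_, _) ->
    undefined.
```

A message then stays a plain list of these tuples, so adding or removing a segment is ordinary list manipulation.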

What I'm describing is similar to the intermediate format used by an HL7
parser (https://github.com/jcomellas/ex_hl7) I wrote for Elixir. You could
probably use it as inspiration for what you need. I had also created
another parser in Erlang (https://github.com/jcomellas/ehl7) that maps the
segments to records, but part of it is in C using NIFs.

Let me know if you have any other questions.



On Mon, Aug 7, 2017 at 10:46 AM, Andrew McIntyre <andrew@REDACTED> wrote:

> Hello Craig,
>
> Thanks for your help.
>
> I am trying to store the data as efficiently as possible. It's HL7
> natively, and this is my test:
>
> OBX|17|FT~TEST|8265-1^^LN&SUBCOMP|1&2&3&4|\H\Spot Image 2\N\||||||F
>
> |, ~, ^ and & are the delimiters. The hierarchy is only so deep, and I
> use lists of lists to give tree-like access to the data, e.g. field 3,
> repeat 1, component 2, subcomponent 1.
>
> Parsed it looks like this:
>
> [["OBX","17",
>   ["FT","TEST"],
>   [["8265-1",[],["LN","SUBCOMP"]]],
>   [[["1","2","3","4"]]],
>   "\\H\\Spot Image 2\\N\\",[],[],[],[],[],"F"]]
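Here is a minimal sketch of a parser that yields this shape. It assumes the standard HL7 delimiters (| field, ~ repetition, ^ component, & subcomponent). It ignores escape sequences such as \H\ and \N\, which are kept verbatim, and it does not handle the MSH segment, whose first fields hold the delimiters themselves. It produces binaries rather than strings at the leaves. The collapsing rule, where a level holding a single leaf is stored as that leaf, is inferred from the example above rather than taken from the HL7 standard. The names hl7_parse and segment/1 are hypothetical.

```erlang
-module(hl7_parse).
-export([segment/1]).

%% Split one segment into fields, then each field into repetitions,
%% components and subcomponents.
segment(Bin) when is_binary(Bin) ->
    [parse_level(F, [<<"~">>, <<"^">>, <<"&">>])
     || F <- binary:split(Bin, <<"|">>, [global])].

%% Split on the current delimiter and recurse with the deeper ones.
parse_level(Bin, []) ->
    Bin;
parse_level(Bin, [Delim | Deeper]) ->
    Parts = [parse_level(P, Deeper) || P <- binary:split(Bin, Delim, [global])],
    collapse(Parts).

%% A level holding exactly one leaf is stored as that leaf; anything
%% else stays a list, so the nesting remains visible.
collapse([Leaf]) when is_binary(Leaf) -> Leaf;
collapse(Parts) -> Parts.
```

Because the leaves are binaries, is_binary/1 is enough to tell a value from a deeper level.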
>
> As the format evolves over time the hierarchy can be extended, but
> older clients can still read the value they are expecting if they
> follow the rules, like reading the first value in the list when you
> only expect one value to be there.
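Here is one way that reading rule could look, as a sketch over the binary-leaf form. The helper hl7_path:get/2 is hypothetical. It follows a path of 1-based indices level by level and treats a leaf as a one-element list. As a result, an older reader asking for repeat 1, component 1 of a field still gets the value when that field is stored as a single leaf.

```erlang
-module(hl7_path).
-export([get/2]).

%% Walk a segment (nested lists, binary leaves) by 1-based indices.
get(Value, []) ->
    Value;
%% A leaf behaves like a one-element list: index 1 returns the leaf.
get(Leaf, [1 | Path]) when is_binary(Leaf) ->
    get(Leaf, Path);
get(List, [I | Path]) when is_list(List), I >= 1, I =< length(List) ->
    get(lists:nth(I, List), Path);
%% Index out of range, or a deeper index on a leaf.
get(_, _) ->
    undefined.
```

Each step uses lists:nth/2, whose cost grows linearly with the index. For segments of a dozen or so fields that is cheap. The nested-tuple layout suggested in the reply above makes each step constant-time with element/2.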
>
> Currently a typical system might have 12 million of these records, so I
> want to keep the Erlang representation as small as possible; hence I am
> reluctant to tag too much, but I know how to get at the value of
> interest. Maybe that is my non-Erlang background showing? Traversing 4
> small lists by index should be fast, shouldn't it?
>
> I guess I could store the strings as binaries in the lists, and then
> is_binary/1 should tell a leaf value apart from a nested list. Is that
> the case? I gather that binaries are more space-efficient, especially on
> a 64-bit system.
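Here is a quick way to check the space claim for a single value, assuming a 64-bit VM. The function erts_debug:flat_size/1 is an undocumented debugging function in ERTS that reports a term's size in machine words. A string costs one two-word cons cell per character, while a short binary packs its bytes after a small header. The module name hl7_size is hypothetical.

```erlang
-module(hl7_size).
-export([compare/1]).

%% Return {WordsAsList, WordsAsBinary} for the same text.
%% Assumes Latin-1/ASCII text: list_to_binary/1 raises badarg for
%% code points above 255.
compare(String) when is_list(String) ->
    {erts_debug:flat_size(String),
     erts_debug:flat_size(list_to_binary(String))}.
```

Binaries larger than 64 bytes are stored off-heap as reference-counted binaries. For those, flat_size/1 reports only the small on-heap reference, not the payload, so this comparison is only meaningful for short field values.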
>
> Monday, August 7, 2017, 10:53:11 PM, you wrote:
>
> z> On Monday, August 7, 2017 22:29:31 you wrote:
> >> Hello zxq9,
> >>
> >> Thanks. Unfortunately I do not know the value of the string that will
> >> be there. It's an extensible hierarchy that can be several lists deep,
> >> or not. I might need to revise the data structure.
>
> z> In this case it can be useful to consider a way of tagging values.
>
> z> Imagine we want to represent a directory tree structure and have a
> z> depth-first traversal function recurse over it while creating the
> z> tree. Two things can happen: there is a flat list of new directories
> z> that need to be created, and the tree may extend deeper at each
> z> node.
>
> z> The naive version would look like what you have:
>
> z> ["top_dir_1",
> z>  "top_dir_2",
> z>  ["next_level_1",
> z>   "next_level_2"]]
>
> z> This leaves a bit to be desired, not only because of the problem
> z> you have pointed out that makes it difficult to know what is deep
> z> and what is shallow, but also because we don't really have a good
> z> way to represent a full tree (what would be the name of a directory
> z> containing other directories?).
>
> z> So consider instead something like this:
>
> z> [{"top_dir_1", []},
> z>  {"top_dir_2", []},
> z>  {"top_dir_3",
> z>   [{"next_level_1", []},
> z>    {"next_level_2", []}]}]
>
> z> Now we have a representation of each directory's name AND its contents.
>
> z> We can traverse this laterally AND in depth without any ambiguity
> z> or need for carrying around a record of where we have been (by
> z> using depth recursion and tail-call recursion):
>
>
> z> make_tree([{Dir, Contents} | Rest]) ->
> z>     ok =
> z>         case filelib:is_dir(Dir) of
> z>             true ->
> z>                 ok;
> z>             false ->
> z>                 ok = log(info, "Creating dir: ~p", [Dir]),
> z>                 file:make_dir(Dir)
> z>         end,
> z>     ok = file:set_cwd(Dir),
> z>     ok = make_tree(Contents),
> z>     ok = file:set_cwd(".."),
> z>     make_tree(Rest);
> z> make_tree([]) ->
> z>     ok.
>
>
> z> Not so bad.
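The make_tree/1 above relies on a log/3 helper that is not shown. It also uses file:set_cwd/1, which changes the working directory of the file server and therefore of every process on the node. The variant below is a minimal sketch that passes the parent path down instead and leaves logging out. The names dir_tree and make_tree/2 are illustrative.

```erlang
-module(dir_tree).
-export([make_tree/2]).

%% Create the directories described by [{Name, Contents}] under Parent,
%% building full paths rather than changing the node-wide cwd.
make_tree(_Parent, []) ->
    ok;
make_tree(Parent, [{Dir, Contents} | Rest]) ->
    Path = filename:join(Parent, Dir),
    ok = case filelib:is_dir(Path) of
             true  -> ok;
             false -> file:make_dir(Path)   % ok | {error, Reason}
         end,
    ok = make_tree(Path, Contents),   % depth first
    make_tree(Parent, Rest).          % then the siblings
```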
>
> z> In your case we could represent things perhaps a bit better by
> z> separating the types and tagging them. Instead of just "FT" and
> z> whatever other string labels you might want, you could either use
> z> atoms (totally unambiguous) or tuples as we have in the example
> z> above (also totally unambiguous). I prefer tuples, though, because they
> z> are easier to read.
>
> z> [{value, "foo"},
> z>  {tree,
> z>   [{value, "bar"},
> z>    {value, "foo"}]},
> z>  {value, "baz"}]
>
>
> z> So then we do something like:
>
>
> z> traverse([{value, Value} | Rest]) ->
> z>    ok = do_thing(Value),
> z>    traverse(Rest);
> z> traverse([{tree, Contents} | Rest]) ->
> z>    ok = traverse(Contents),
> z>    traverse(Rest);
> z> traverse([]) ->
> z>    ok.
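Here is a runnable version of this traversal, as a sketch. The message does not define do_thing/1, so this version replaces it with collecting the values in depth-first order. The names tagged_tree and values/1 are hypothetical.

```erlang
-module(tagged_tree).
-export([values/1]).

%% Collect every {value, V} in depth-first order. {tree, Contents}
%% marks a nested level, so strings and lists never need to be told apart.
values(Nodes) ->
    lists:reverse(collect(Nodes, [])).

collect([{value, Value} | Rest], Acc) ->
    collect(Rest, [Value | Acc]);
collect([{tree, Contents} | Rest], Acc) ->
    collect(Rest, collect(Contents, Acc));
collect([], Acc) ->
    Acc.
```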
>
>
> z> Anyway, don't be afraid of varying your value types to say exactly
> z> what you mean. If your strings like "FT" only had meaning within
> z> your system, consider NOT USING STRINGS and using atoms instead. That
> z> makes it even easier:
>
>
> z> [foo,
> z>  bar,
> z>  [foo,
> z>   bar],
> z>  foo]
>
>
> z> So then we can do:
>
>
> z> traverse([foo | Rest]) ->
> z>     ok = do_foo(),
> z>     traverse(Rest);
> z> traverse([bar | Rest]) ->
> z>     ok = do_bar(),
> z>     traverse(Rest);
> z> traverse([Value | Rest]) when is_list(Value) ->
> z>     ok = traverse(Value),
> z>     traverse(Rest);
> z> traverse([]) ->
> z>     ok.
>
>
> z> And of course, you could drop the guard and match on a list shape in
> z> the listy clause instead, but that is a minor detail.
> z> The point is to make your data types MEAN SOMETHING REASONABLE
> z> within your system. Use atoms when your values are meaningful only
> z> within your system. Strings are for the birds.
>
> z> -Craig
> z> _______________________________________________
> z> erlang-questions mailing list
> z> erlang-questions@REDACTED
> z> http://erlang.org/mailman/listinfo/erlang-questions
>
>
>
> --
> Best regards,
>  Andrew                             mailto:andrew@REDACTED
>
> sent from a real computer
>

