[erlang-questions] How can I break this string into a list of strings?

zxq9 <>
Sun Dec 25 06:38:47 CET 2016


On Sunday, 25 December 2016 14:21:00 zxq9 wrote:
> On Saturday, 24 December 2016 16:34:22  wrote:
> > I would like to break it into a list:
> > 
> >     ["<h1>Hello!</h1>", "<h2>How are you?</h2>", "<p>Some text\n and more text.</p>"]
> > 

While it is on my mind... starting from the previously presented toy function, we can use a variety of techniques to get output that is either exactly like the above, or very similar but perhaps more useful:

  5> {ok, Split} = htmler:consume("<h1>Hello!</h1>\n     <h2>How are you?</h2>\n    <p>Some text\n and more text.</p>").
  {ok,[{"h1","Hello!"},
       10,32,32,32,32,32,
       {"h2","How are you?"},
       10,32,32,32,32,
       {"p","Some text\n and more text."}]}
  6> [Contents || {_, Contents} <- Split].
  ["Hello!","How are you?","Some text\n and more text."]

Or even:

  7> ["<" ++ Label ++ ">" ++ Contents ++ "</" ++ Label ++ ">" || {Label, Contents} <- Split].
  ["<h1>Hello!</h1>","<h2>How are you?</h2>","<p>Some text\n and more text.</p>"]

But seriously, why would I still want those tags in there at all?


>   14> io:format("~tp~n", [htmler:consume("<h1>Hello!</h1>\n     <h2>How are <em>you?</em></h2>\n    <p>Some text\n and more text.</p>")]).
>   {ok,[{"h1","Hello!"},
>        10,32,32,32,32,32,
>        {"h2",[72,111,119,32,97,114,101,32,{"em","you?"}]},
>        10,32,32,32,32,
>        {"p","Some text\n and more text."}]}

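This case is more interesting. A simple list comprehension that filters out the elements that are not tuples is not enough here, because the contents of "h2" are themselves a mixed list of characters and a nested tuple -- but an explicit recursive function would do the trick.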
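
A minimal sketch of such a function, assuming the structure is exactly what the shell output above shows: a list whose elements are either character codes or {Label, Contents} tuples, where Contents has the same shape again. The module name htmler_text and the function names texts/1 and text/1 are made up for illustration and are not part of htmler.

```erlang
-module(htmler_text).
-export([texts/1, text/1]).

%% texts/1: one flat string per top-level element. The loose whitespace
%% characters between elements do not match the tuple pattern, so the
%% comprehension silently skips them.
texts(Elements) ->
    [text(Contents) || {_Label, Contents} <- Elements].

%% text/1: walk a contents list, keeping plain characters and descending
%% into nested {Label, Contents} tuples (assumed shape, see above).
text([]) ->
    [];
text([{_Label, Inner} | Rest]) ->
    text(Inner) ++ text(Rest);
text([Char | Rest]) when is_integer(Char) ->
    [Char | text(Rest)].
```

Calling htmler_text:texts(Split) on the result with the nested "em" tag should then give plain strings for all three elements, with the text inside "em" folded into its parent.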

>   3> htmler:consume("<body><h1>Hello</h1><p>foo</p></body>").
>   {ok,[{"body",[{"h1","Hello"},{"p","foo"}]}]}

This last case is closer to what you will actually encounter in real HTML and XML docs. It is very similar to the case above; the real difference is that the "main list" that was returned is not wrapped in a tuple the way everything else is (well, that's not entirely true -- the function actually does return a tuple: {ok, Contents}). Maybe this could be leveraged to write a general pretty printing or interpretation function?
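
One rough way to picture such an interpretation function: since the top-level list and every nested Contents are the same kind of list, a single recursive function can handle both. The sketch below turns the structure back into markup. It assumes the shape shown in the shell sessions above (character codes and {Label, Contents} tuples only, no attributes), and the module name htmler_render and function render/1 are hypothetical, not part of htmler.

```erlang
-module(htmler_render).
-export([render/1]).

%% render/1: rebuild a flat markup string from a contents list.
%% Works on the top-level list and on any nested Contents alike,
%% because both have the same (assumed) shape.
render([]) ->
    [];
render([{Label, Contents} | Rest]) ->
    "<" ++ Label ++ ">" ++ render(Contents) ++ "</" ++ Label ++ ">"
        ++ render(Rest);
render([Char | Rest]) when is_integer(Char) ->
    [Char | render(Rest)].
```

Usage would be something like {ok, Contents} = htmler:consume(String), htmler_render:render(Contents). Swapping the tuple clause for something else (indentation, a different output format) is where the "interpretation" part would go.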

Anyway, blahblah. I think you get the idea.

Time for pumpkin pie! Wee!

-Craig

