[erlang-questions] How can I break this string into a list of strings?

Richard A. O'Keefe ok@REDACTED
Fri Jan 6 04:05:42 CET 2017



On 25/12/16 10:34 AM, lloyd@REDACTED wrote:
> Suppose I have the following string:
>
>     "<h1>Hello!</h1>\n     <h2>How are you?</h2>\n    <p>Some text\n and more text.</p>"
>
> I would like to break it into a list:
>
>     ["<h1>Hello!</h1>", "<h2>How are you?</h2>", "<p>Some text\n and more text.</p>"]
>
> string:token(MyString, "$\n") doesn't work because it would break the paragraph.

So you don't have "a string", you have an XML fragment that happens to
be stored as a string.  My problem in reading this is that I don't have
the faintest idea what you want to happen IN GENERAL.
  - What is to happen if there is a newline character in an attribute?
    "<img src='...' alt='Two gorillas\nOne cop'>"
  - What is to happen if there is a newline between tokens inside a tag?
    "<a href=\n'....'\n>Anchor Text</a\n>"
  - What is to happen if there is a newline inside an element other than
    a <p> element?
    "<h1>Two gorillas, one cop<br>\nSpoof movie of the year</h1>"
  - What is to happen if there AREN'T newlines?
    "<h1>Hello!</h1><h2>How are you?</h2><p>Some text\n etc.</p>"
    White space between block level elements isn't significant, which
    means that you can't in general expect it to be there or to be
    preserved by other tools.
  - What is to happen if the newlines between elements are doubled?
    "<h1>Hello!</h1>\n\n     <h2>How are you?</h2>\n\n    <p>Some text\n and more text.</p>"
  - What is to happen if there is a newline whitespace sequence
    at the end?
    "<h1>Hello!</h1>\n     <h2>How are you?</h2>\n    <p>Some text\n and more text.</p>\n     "

If I needed to do this, I'd look for an XML library in which I could do
    Fragment = xml:parse_fragment(String),
    [xml:unparse_element(Element) || Element <- Fragment]

(At least, that's my GUESS about what you want to achieve.)
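
A minimal sketch of that idea using xmerl, the XML library that ships with
Erlang/OTP.  The module name split_fragment and the dummy <root> wrapper are
assumptions made for illustration (xmerl_scan wants a single root element,
so the fragment is wrapped before parsing).  It only works when the input is
well-formed XML; an unclosed HTML tag such as <br> will make the scan fail.

```erlang
-module(split_fragment).
-export([split/1]).

%% Record definitions for #xmlElement{} and friends.
-include_lib("xmerl/include/xmerl.hrl").

%% Split an XML fragment held in a string into one string per
%% top-level element.
split(String) ->
    %% Assumption: wrap the fragment in a throwaway root element so
    %% that xmerl_scan sees a single well-formed document.
    Wrapped = "<root>" ++ String ++ "</root>",
    {#xmlElement{content = Content}, _Rest} = xmerl_scan:string(Wrapped),
    %% Keep only element nodes; the whitespace between them arrives as
    %% #xmlText{} records and is dropped by the generator pattern.
    [lists:flatten(xmerl:export_element(E, xmerl_xml))
     || E = #xmlElement{} <- Content].
```

With the string from the original question, split_fragment:split/1 should
give the three-element list that was asked for, and it should give the same
kind of result whether the newlines between elements are single, doubled,
trailing, or absent, because they are never used as separators.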

To me, this seems like a textbook example of why Strings Are Wrong
and regular expressions make it incredibly easy to do the wrong thing.
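
To illustrate the point about regular expressions, here is a hypothetical
newline-based splitter (the module name naive_split and the pattern are
assumptions, not anything proposed in the thread).  It splits at a newline
plus optional whitespace when the next character opens a tag.

```erlang
-module(naive_split).
-export([split/1]).

%% Naive approach: treat "newline, optional whitespace, then '<'" as
%% the separator between elements.  The lookahead keeps the '<'.
split(String) ->
    re:split(String, "\\n\\s*(?=<)", [{return, list}]).
```

On the sample string from the question this returns the wanted three
strings, which is what makes it tempting.  On the variant with no newlines
between elements it finds no separator at all and returns the whole input
as a single string.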


