[erlang-questions] How can I break this string into a list of strings?
Richard A. O'Keefe
ok@REDACTED
Fri Jan 6 04:05:42 CET 2017
On 25/12/16 10:34 AM, lloyd@REDACTED wrote:
> Suppose I have the following string:
>
> "<h1>Hello!</h1>\n <h2>How are you?</h2>\n <p>Some text\n and more text.</p>"
>
> I would like to break it into a list:
>
> ["<h1>Hello!</h1>", "<h2>How are you?</h2>", "<p>Some text\n and more text.</p>"]
>
> string:token(MyString, "$\n") doesn't work because it would break the paragraph.
So you don't have "a string", you have an XML fragment that happens to
be stored as a string. My problem in reading this is that I don't have
the faintest idea what you want to happen IN GENERAL.
- What is to happen if there is a newline character in an attribute?
"<img src='...' alt='Two gorillas\One cop'>
- What is to happen if there is a newline between tokens inside a tag?
"<a href=\n'....'\n>Anchor Text</a\n>"
- What is to happen if there is a newline inside an element other than
a <p> element?
"<h1>Two gorillas, one cop<br>\nSpoof movie of the year</h1>"
- What is to happen if there AREN'T newlines?
"<h1>Hello!</h1><h2>How are you?</h2><p>Some text\n etc.</p>"
White space between block level elements isn't significant, which
means that you can't in general expect it to be there or to be
preserved by other tools.
- What is to happen if the newlines between elements are doubled?
"<h1>Hello!</h1>\n\n <h2>How are you?</h2>\n\n <p>Some text\n and more text.</p>"
- What is to happen if there is a newline whitespace sequence
at the end?
"<h1>Hello!</h1>\n <h2>How are you?</h2>\n <p>Some text\n and more text.</p>\n "
If I needed to do this, I'd look for an XML library in which I could do
Fragment = xml:parse_fragment(String),
[xml:unparse_element(Element) || Element <- Fragment]
(At least, that's my GUESS about what you want to achieve.)
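For what it's worth, here is a minimal sketch of that idea using xmerl, the XML library that ships with Erlang/OTP, in place of the hypothetical xml module above. The module name fragment_split and the wrapper element are made up for illustration. It assumes the fragment is well-formed XML, so an unclosed <br> or <img> would make the scan fail, and it simply drops any text (such as whitespace) that sits between the top-level elements. Note also that xmerl_scan creates atoms for element names, so it is not something to point at untrusted input without thought.

```erlang
-module(fragment_split).
-export([split/1]).

-include_lib("xmerl/include/xmerl.hrl").

%% Split a well-formed XML fragment into one string per top-level element.
%% "wrapper" is an invented root element, added only so that the fragment
%% can be scanned as a single document.
split(Fragment) ->
    Wrapped = "<wrapper>" ++ Fragment ++ "</wrapper>",
    {#xmlElement{content = Content}, _Rest} = xmerl_scan:string(Wrapped),
    %% Keep only element nodes; text nodes between elements are skipped.
    [lists:flatten(xmerl:export_element(E, xmerl_xml))
     || E = #xmlElement{} <- Content].
```

Calling fragment_split:split(MyString) on the string from the original question is intended to give one string per top-level element, with the newline inside the <p> element left where it was. The exported text is xmerl's serialisation of each element, so it need not be byte-for-byte identical to the input (attribute quoting, for instance, may change).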
To me, this seems like a textbook example of why Strings Are Wrong
and regular expressions make it incredibly easy to do the wrong thing.