[erlang-questions] Parsing XML with xmerl_xpath/string/2
Michael Santos
michael.santos@REDACTED
Wed Apr 15 22:42:41 CEST 2015
On Wed, Apr 15, 2015 at 04:15:20PM -0400, lloyd@REDACTED wrote:
> Well, pick myself up from one stumble only to trip over the next.
>
> I've successfully downloaded an XML book file from Amazon and extracted title and author code using this function, much of which I borrowed from Dave Thomas:
>
> get_book_data(ISBN) ->
>     {ok, {_Status, _Headers, Body}} = httpc:request(get_book_request(ISBN)),
>     {Xml, _Rest} = xmerl_scan:string(Body),
>     [#xmlText{value=Author}] = xmerl_xpath:string("//Author/text()", Xml),
>     [#xmlText{value=Title}] = xmerl_xpath:string("//Title/text()", Xml),
>     [#xmlText{value=Publisher}] = xmerl_xpath:string("//Manufacturer/text()", Xml),
>     {Author, Title, Publisher}.
>
> That same file has URLs for downloading book cover images. The XML looks like this:
>
> ...
> <LargeImage>
>   <URL>
>     http://ecx.images-amazon.com/images/I/51shI6vQ-SL.jpg
>   </URL>
>   <Height Units="pixels">500</Height>
>   <Width Units="pixels">333</Width>
> </LargeImage>
> ...
>
> Modifying get_book_data/1 to include a xmerl_xpath:string/2 call that looks like this:
>
> [ #xmlText{value=LargeImage}] = xmerl_xpath:string("//LargeImage/text()", Xml),
>
> fails:
>
> ** exception error: no match of right hand side value []
> in function amz_lookup:get_book_cover_images/1 (/home/lloyd/wga/site/src/LitUtils/amz_lookup.erl, line 110)
>
> My hunch is that the string "//LargeImage/text()" is the problem. The xmerl_xpath:string/2 docs give me no comfort. Clearly I don't understand them sufficiently.
>
> I've tried xmerl_xpath:string("//URL/text()"), xmerl_xpath:string("//LargeImage/URL/text()"), etc. etc. etc.--- all to no avail. Google search also comes up empty.
>
> So once more to the well of wisdom. Can some kind soul point out the error of my ways?
Using "//LargeImage/URL/text()" works for me:
1> [#xmlText{value=LargeImage}] = xmerl_xpath:string("//LargeImage/URL/text()", Xml).
[#xmlText{parents = [{'URL',1},{'LargeImage',1}],
          pos = 1,language = [],
          value = "http://ecx.images-amazon.com/images/I/51shI6vQ-SL.jpg",
          type = text}]
You can figure out what is going on by building up the XPath expression
one step at a time in the shell:
1> Str = "<LargeImage><URL>http://ecx.images-amazon.com/images/I/51shI6vQ-SL.jpg</URL><Height Units=\"pixels\">500</Height><Width Units=\"pixels\">333</Width></LargeImage>".
2> {Xml,_} = xmerl_scan:string(Str).
3> rr(xmerl).
4> xmerl_xpath:string("/", Xml).
5> xmerl_xpath:string("/LargeImage", Xml).
6> xmerl_xpath:string("/LargeImage/URL", Xml).
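
For reference, here is a rough sketch of the same lookup as a module rather than a shell session. In XPath, text() selects only text nodes that are direct children of the preceding step, and the children of LargeImage are elements (URL, Height, Width), which is why "//LargeImage/text()" comes back as []. The module name cover_url_sketch and the functions large_image_url/1 and demo/0 are made up for illustration and are not part of the original amz_lookup code. It also assumes OTP 20 or later for string:trim/1, which strips the newlines around the URL when the XML is laid out as in the question.

```erlang
-module(cover_url_sketch).
-export([large_image_url/1, demo/0]).

%% Record definitions (#xmlText{} and friends) shipped with xmerl.
-include_lib("xmerl/include/xmerl.hrl").

%% Hypothetical helper: returns {ok, Url} for the first LargeImage URL,
%% or not_found when the path matches nothing, instead of crashing
%% with a badmatch on [].
large_image_url(XmlString) ->
    {Doc, _Rest} = xmerl_scan:string(XmlString),
    case xmerl_xpath:string("//LargeImage/URL/text()", Doc) of
        [#xmlText{value = Url} | _] -> {ok, string:trim(Url)};
        [] -> not_found
    end.

%% Small self-check using the XML fragment from this thread.
demo() ->
    Xml = "<LargeImage>"
          "<URL>http://ecx.images-amazon.com/images/I/51shI6vQ-SL.jpg</URL>"
          "<Height Units=\"pixels\">500</Height>"
          "<Width Units=\"pixels\">333</Width>"
          "</LargeImage>",
    {Doc, _} = xmerl_scan:string(Xml),
    %% LargeImage has only element children here, so no direct text nodes.
    [] = xmerl_xpath:string("//LargeImage/text()", Doc),
    {ok, "http://ecx.images-amazon.com/images/I/51shI6vQ-SL.jpg"} =
        large_image_url(Xml),
    not_found = large_image_url("<Item><Title>x</Title></Item>"),
    ok.
```

After c(cover_url_sketch)., calling cover_url_sketch:demo(). should return ok if the matches hold. Returning not_found rather than matching on a one-element list is a design choice: a lookup that has no cover image will not take the whole function down.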