[erlang-questions] Parsing XML with xmerl_xpath/string/2

Wed Apr 15 22:42:41 CEST 2015

On Wed, Apr 15, 2015 at 04:15:20PM -0400, lloyd@REDACTED wrote:
> Well, pick myself up from one stumble only to trip over the next.
> 
> I've successfully downloaded an XML book file from Amazon and extracted title and author code using this function, much of which I borrowed from Dave Thomas:
> 
> get_book_data(ISBN) ->
>    {ok, {_Status, _Headers, Body}} = httpc:request(get_book_request(ISBN)),
>    {Xml, _Rest} =  xmerl_scan:string(Body),
>    [ #xmlText{value=Author}] = xmerl_xpath:string("//Author/text()", Xml),
>    [ #xmlText{value=Title}] = xmerl_xpath:string("//Title/text()", Xml),
>    [ #xmlText{value=Publisher}] = xmerl_xpath:string("//Manufacturer/text()", Xml),
>    {Author, Title, Publisher}.
> 
> That same file has URLs for downloading book cover images. The XML looks like this:
> 
> ...
> <LargeImage>
>   <URL>
>       http://ecx.images-amazon.com/images/I/51shI6vQ-SL.jpg
>   </URL>
>       <Height Units="pixels">500</Height>
>       <Width Units="pixels">333</Width>
> </LargeImage>
> ...
> 
> Modifying get_book_data/1 to include a xmerl_xpath:string/2 call that looks like this:
> 
>    [ #xmlText{value=LargeImage}] = xmerl_xpath:string("//LargeImage/text()", Xml),
> 
> fails:
> 
> ** exception error: no match of right hand side value []
>      in function  amz_lookup:get_book_cover_images/1 (/home/lloyd/wga/site/src/LitUtils/amz_lookup.erl, line 110)
> 
> My hunch is that the string "//LargeImage/text()" is the problem. The xmerl_xpath:string/2 docs give me no comfort. Clearly I don't understand them sufficiently.
> 
> I've tried xmerl_xpath:string("//URL/text()"), xmerl_xpath:string("//LargeImage/URL/text()"), etc., all to no avail. Google search also comes up empty.
> 
> So once more to the well of wisdom. Can some kind soul point out the error of my ways?

Using "//LargeImage/URL/text()" works for me (with Xml bound to the scanned
document and the xmerl records loaded in the shell via rr(xmerl)):

1> [ #xmlText{value=LargeImage}] = xmerl_xpath:string("//LargeImage/URL/text()", Xml).
[#xmlText{parents = [{'URL',1},{'LargeImage',1}],
          pos = 1,language = [],
          value = "http://ecx.images-amazon.com/images/I/51shI6vQ-SL.jpg",
          type = text}]
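
The reason the shorter path comes back empty is that text() only selects text
nodes that are direct children of the element in front of it. Here is a
minimal sketch of that, assuming the document has no whitespace between the
tags (as in my test string below). The module name xpath_text_demo and the
function demo/0 are made up for illustration:

```erlang
-module(xpath_text_demo).
-export([demo/0]).
-include_lib("xmerl/include/xmerl.hrl").

%% demo/0 is a hypothetical name, used only for this illustration.
demo() ->
    Str = "<LargeImage>"
          "<URL>http://ecx.images-amazon.com/images/I/51shI6vQ-SL.jpg</URL>"
          "<Height Units=\"pixels\">500</Height>"
          "</LargeImage>",
    {Xml, _Rest} = xmerl_scan:string(Str),
    %% LargeImage has only element children in this string, so no text
    %% node sits directly under it and the result is the empty list.
    [] = xmerl_xpath:string("//LargeImage/text()", Xml),
    %% The text node is a child of URL, one level further down.
    [#xmlText{value = Url}] =
        xmerl_xpath:string("//LargeImage/URL/text()", Xml),
    Url.
```

Calling xpath_text_demo:demo() should return the URL string. If your real
document is indented like the excerpt you posted, "//LargeImage/text()" may
instead return several whitespace-only text nodes, which would also fail to
match a one-element list pattern.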

You can figure out what is going on by building up the XPath expression
one step at a time in the shell and looking at what each step returns:

1> Str = "<LargeImage><URL>http://ecx.images-amazon.com/images/I/51shI6vQ-SL.jpg</URL><Height Units=\"pixels\">500</Height><Width Units=\"pixels\">333</Width></LargeImage>".

2> {Xml,_} = xmerl_scan:string(Str).

3> rr(xmerl).

4> xmerl_xpath:string("/", Xml).

5> xmerl_xpath:string("/LargeImage", Xml).

6> xmerl_xpath:string("/LargeImage/URL", Xml).
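
Once the path is right, it may be worth avoiding the hard pattern match so
that a missing element does not crash the whole lookup. The following is only
a sketch under my own assumptions: first_text/2 is a hypothetical helper (not
part of xmerl), it takes the first match if there are several, and it uses
string:trim/1, which needs a reasonably recent OTP release, to drop the
whitespace that surrounds the URL in the indented XML you posted:

```erlang
-module(xpath_first_text).
-export([first_text/2, demo/0]).
-include_lib("xmerl/include/xmerl.hrl").

%% first_text/2 is a hypothetical helper, not an xmerl function.
%% It returns {ok, Text} for the first text node selected by Path,
%% with leading and trailing whitespace removed, or not_found.
first_text(Path, Xml) ->
    case xmerl_xpath:string(Path, Xml) of
        [#xmlText{value = Value} | _] -> {ok, string:trim(Value)};
        _ -> not_found
    end.

%% demo/0 scans an indented fragment shaped like the one in the question.
demo() ->
    Str = "<LargeImage>\n"
          "  <URL>\n"
          "      http://ecx.images-amazon.com/images/I/51shI6vQ-SL.jpg\n"
          "  </URL>\n"
          "</LargeImage>",
    {Xml, _Rest} = xmerl_scan:string(Str),
    {first_text("/LargeImage/URL/text()", Xml),
     first_text("/LargeImage/Missing/text()", Xml)}.
```

In get_book_data/1 you could then write something like
{ok, LargeImage} = first_text("//LargeImage/URL/text()", Xml) where you want
a crash on a missing value, or use a case expression where a missing cover
image is acceptable.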