Python, XPath: Find all links to images

I'm using lxml in Python to parse some HTML and I want to extract all links to images. The way I do it right now is:

//a[contains(@href,'.jpg') or contains(@href,'.jpeg') or ... (etc)]

There are a couple of problems with this approach:

  • you have to list every possible image extension in every case (both "jpg" and "JPG"), which is not elegant
  • in odd cases, the href may contain .jpg somewhere in the middle rather than at the end of the string
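
To make the two problems concrete, here is a minimal sketch of a check done in Python rather than in XPath. The helper name is_image_href and the extension set are hypothetical, chosen only for illustration; the idea is to select every //a[@href] with lxml and then filter the href values with a function like this one.

```python
import posixpath
from urllib.parse import urlsplit

# Assumed list of extensions; adjust to whatever counts as an image for you.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif"}


def is_image_href(href):
    """Return True if the path part of href ends in a known image extension."""
    # urlsplit drops the query string and fragment, so "a.jpg?x=1" still counts.
    path = urlsplit(href).path
    # splitext only looks at the final extension, and lower() makes
    # the comparison case-insensitive.
    extension = posixpath.splitext(path)[1].lower()
    return extension in IMAGE_EXTENSIONS


hrefs = ["photo.JPG", "/docs/my.jpg.notes.html", "about.html"]
image_hrefs = [h for h in hrefs if is_image_href(h)]
```

This keeps the XPath trivial, but it moves the filtering out of the query, which may or may not be acceptable.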

I tried to use a regular expression instead, but it did not work:

//a[regx:match(@href,'.*\.(?:png|jpg|jpeg)')]

This returned all links every time.
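
For comparison, here is a sketch of a variant that might behave better. It is not a confirmed diagnosis of the expression above. It assumes the prefix is bound to the EXSLT regular-expressions namespace, uses test (which returns a boolean) instead of match (which returns the matched substrings), anchors the pattern with $, and passes the 'i' flag for case-insensitive matching. The sample HTML and the extension list are made up for illustration.

```python
from lxml import html

# Namespace URI of the EXSLT regular-expressions extension that lxml supports.
EXSLT_RE = "http://exslt.org/regular-expressions"

# Made-up sample document for illustration only.
doc = html.fromstring("""
<div>
  <a href="photo.JPG">a</a>
  <a href="/img/logo.png">b</a>
  <a href="/docs/my.jpg.notes.html">c</a>
  <a href="about.html">d</a>
</div>
""")

# '$' anchors the extension to the end of the href; 'i' ignores case.
# The raw string keeps the backslash intact for the regex engine.
links = doc.xpath(
    r"//a[re:test(@href, '\.(?:png|jpe?g|gif)$', 'i')]",
    namespaces={"re": EXSLT_RE},
)
hrefs = [a.get("href") for a in links]
```

Note that an href with a query string after the extension (for example photo.jpg?size=2) would not match this pattern because of the $ anchor.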

Does anyone know the right, elegant way to do this, or what is wrong with my regexp approach?

asked by Nicu Surdu, 1 May 2011 at 21:07