Hi Stefano,

On Sat, Mar 14, 2009 at 09:24:48AM -0000, Stefano Rivera wrote:
> * You should probably use get_soup from utils rather than creating your own soup. It knows about encodings and compression.
I'm not sure I follow your thinking on this one - is the intention
to get rid of the BeautifulSoup dependency?
Personally, I find this code:

from urllib2 import urlopen
from BeautifulSoup import BeautifulSoup

soup = BeautifulSoup(urlopen(url))
title = soup.title.string

easier to read, and simpler than calling get_html_parse_tree to
retrieve an etree (and then iterating over it); also, what would be
the benefit of using get_html_parse_tree just to get back a soup?
All that is needed here is the (1st) title.
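For what it's worth, here is a small self-contained sketch of the etree-style
iteration I mean, using the stdlib ElementTree as a stand-in (I'm assuming
get_html_parse_tree returns an ElementTree-like tree; its actual signature and
return type aren't shown here, so this is illustrative only):

```python
from xml.etree import ElementTree

# A toy document with two <title> elements, to show that the
# .find() search returns only the first match in document order.
html = ("<html><head><title>First</title></head>"
        "<body><title>Second</title></body></html>")

tree = ElementTree.fromstring(html)

# Search the tree for the first <title> and take its text content.
title = tree.find(".//title").text
print(title)  # First
```

It works, but for pulling out a single title it is no clearer than the
soup.title.string one-liner above.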
Cheers,
Jonathan
P.S. Of course, the following is also elegant:
import lxml.html
t = lxml.html.parse(url)
title = t.find(".//title").text