Thursday, February 17, 2011

Extracting paragraphs in Python using lxml

Programmer Question

I would like to extract paragraphs from an HTML page using Python. I used the lxml module, but it doesn't do exactly what I am looking for.



from lxml import html
print(html.parse(url).xpath('//p')[1].text_content())

Here is the First Paragraph.

Here is the second Paragraph.

Paragraph Three."




I should add that different pages have different numbers of paragraphs, so I would like to build a list and put each paragraph into it.
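One possible way to collect every paragraph into a list, whatever the count, is to loop over the result of xpath('//p') instead of indexing a single element. This is only a sketch: the SAMPLE markup and the extract_paragraphs helper are made up for illustration and are not part of the original page or of lxml itself.

```python
from lxml import html

# Hypothetical sample markup standing in for a fetched page.
SAMPLE = """
<html><body>
<p>Here is the First Paragraph.</p>
<p>Here is the second Paragraph.</p>
<p>Paragraph Three.</p>
</body></html>
"""


def extract_paragraphs(tree):
    """Return the text of every <p> element, in document order."""
    paragraphs = []
    for p in tree.xpath('//p'):
        text = p.text_content().strip()
        if text:  # skip paragraphs that contain no text
            paragraphs.append(text)
    return paragraphs


tree = html.fromstring(SAMPLE)
paragraphs = extract_paragraphs(tree)
print(paragraphs)
```

For a real page, the same helper should accept the tree returned by html.parse(url), since that object also provides xpath().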



