Beautiful Soup (HTML parser)
Original author(s) | Leonard Richardson |
---|---|
Stable release | 4.4.1 / September 28, 2015 |
Written in | Python |
Platform | Python |
Type | HTML parser library, Web scraping |
License | Python Software Foundation License (Beautiful Soup 3, an older version); MIT License (Beautiful Soup 4 and later)[1] |
Website | www |
Beautiful Soup is a Python package for parsing HTML and XML documents, including those with malformed markup such as unclosed tags (it is named after "tag soup"). It builds a parse tree from a parsed page that can be used to extract data from the HTML, which is useful for web scraping.[2]
It is available for Python 2.6+ and Python 3.
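As a minimal sketch of the tag-soup handling described above, the snippet below parses a short, hypothetical HTML fragment with unclosed p and b tags using Python's built-in html.parser backend, then queries the resulting tree. The markup and variable names are illustrative assumptions. Exactly how unclosed tags end up nested can differ between the parser backends Beautiful Soup supports (html.parser, lxml, html5lib).

```python
# Sketch: Beautiful Soup building a parse tree from malformed ("tag soup") HTML.
# The markup is a hypothetical example; "html.parser" is Python's built-in parser.
from bs4 import BeautifulSoup

# Neither <p> is closed and the <b> tag is never closed either
broken_html = "<html><body><p>First paragraph<p>Second <b>bold text</body></html>"

soup = BeautifulSoup(broken_html, "html.parser")

# The unclosed tags still become Tag objects that can be searched
paragraphs = soup.find_all("p")  # find_all searches the whole tree recursively
bold = soup.find("b")            # first <b> tag in the document

print(len(paragraphs))
print(bold.get_text())
```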
Code example
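```python
# Anchor extraction from an HTML document (Python 3)
from urllib.request import urlopen  # on Python 2, use urllib2.urlopen instead

from bs4 import BeautifulSoup

webpage = urlopen('https://en.wikipedia.org/wiki/Main_Page')
soup = BeautifulSoup(webpage, 'html.parser')
# Print the href of every <a> tag, or '/' if the tag has no href attribute
for anchor in soup.find_all('a'):
    print(anchor.get('href', '/'))
```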
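Because the anchor-extraction example depends on a live web page, here is a sketch that runs the same extraction on an inline HTML string so it can be tried offline. The sample markup and variable names are hypothetical. It also shows the href=True filter of find_all, which skips anchors that have no href attribute instead of substituting a default value.

```python
# Sketch: anchor extraction from an inline HTML string (hypothetical sample markup),
# so the example runs without network access.
from bs4 import BeautifulSoup

sample_html = """
<ul>
  <li><a href="/wiki/Tag_soup">Tag soup</a></li>
  <li><a href="https://example.com/">Example</a></li>
  <li><a name="top">Anchor without an href</a></li>
</ul>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# Same approach as the web example: fall back to '/' when href is missing
all_hrefs = [anchor.get("href", "/") for anchor in soup.find_all("a")]

# Alternative: href=True keeps only <a> tags that actually have an href attribute
linked_hrefs = [anchor["href"] for anchor in soup.find_all("a", href=True)]

for href in all_hrefs:
    print(href)
print(linked_hrefs)
```

The fallback version reports every anchor, while the filtered version is usually more convenient when only real links are wanted.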
References
- [1] "Beautiful Soup website". Retrieved 18 April 2012. "Beautiful Soup is licensed under the same terms as Python itself."
- [2] "Beautiful Soup website". Retrieved 18 April 2012.
This article is issued from Wikipedia (version of Saturday, April 16, 2016). The text is available under the Creative Commons Attribution/Share Alike license; additional terms may apply for the media files.