A common HTML parsing module: BeautifulSoup
The BeautifulSoup() object
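from bs4 import BeautifulSoup
soup = BeautifulSoup(html_text, features='lxml')  # parse an HTML string; 'lxml' needs the lxml package
soup = BeautifulSoup(open('file.html', 'r', encoding='utf-8'), 'lxml')  # parse an open file object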
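As a minimal sketch of the two constructor calls above, the snippet below parses a small inline document, first from a string and then from a file-like object. The sample markup and the names `html_text`, `file_like`, and `soup_from_file` are made up for illustration. The standard-library `html.parser` backend is swapped in for `lxml` so that the sketch runs without the extra package, and `io.StringIO` stands in for a real file on disk.

```python
import io

from bs4 import BeautifulSoup

# Hypothetical sample markup, not taken from the original notes.
html_text = "<html><head><title>Demo page</title></head><body><p>Hello</p></body></html>"

# "html.parser" ships with Python; the notes use "lxml", which needs the lxml package.
soup = BeautifulSoup(html_text, features="html.parser")

root_name = soup.name            # the BeautifulSoup object itself is named "[document]"
title_text = soup.title.string   # text inside <title>

# BeautifulSoup also accepts any object with a .read() method, such as an open file.
file_like = io.StringIO(html_text)
soup_from_file = BeautifulSoup(file_like, "html.parser")

print(root_name, title_text, soup_from_file.p.string)
```

For a real file, a `with open(...) as f:` block around the constructor call would also close the file afterwards.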
soup.head  # the first <head> tag in the document
soup.head.name  # the tag's name: 'head'
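soup.meta.attrs  # dict of all attributes on the first <meta> tag
soup.meta.attrs['http-equiv']  # one attribute, looked up in that dict
soup.link.attrs['href']
soup.div.attrs['class']  # class is multi-valued, so this is a list
soup.meta['http-equiv']  # shorthand for soup.meta.attrs['http-equiv']
soup.link['href']
soup.div['class']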
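The attribute lookups above can be tried on a small document. This is only a sketch: the markup is a made-up sample, and `html.parser` is used instead of `lxml` to avoid an extra dependency. It also shows `Tag.get()`, which returns `None` for a missing attribute where the bracket form raises `KeyError`.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, not taken from the original notes.
html_text = (
    "<html><head>"
    '<meta http-equiv="Content-Type" content="text/html">'
    '<link rel="stylesheet" href="style.css">'
    "</head><body>"
    '<div class="box main">x</div>'
    "</body></html>"
)
soup = BeautifulSoup(html_text, "html.parser")

meta_attrs = soup.meta.attrs          # all attributes as a dict
http_equiv = soup.meta["http-equiv"]  # same as soup.meta.attrs["http-equiv"]
href = soup.link["href"]
div_classes = soup.div["class"]       # multi-valued attribute, returned as a list
missing = soup.div.get("id")          # no id attribute, so .get() gives None

print(meta_attrs, http_equiv, href, div_classes, missing)
```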
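soup.title.string  # text inside <title>; None if the tag has more than one child
soup.h3.string
soup.head.title.string  # the same <title>, reached through <head>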
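One thing worth checking about `.string` is when it returns `None`. The sketch below uses a made-up document (parsed with `html.parser` rather than `lxml`) in which `<h3>` contains both text and a nested `<b>` tag. `get_text()` is shown as one way to collect the text in that case.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, not taken from the original notes.
html_text = (
    "<html><head><title>Demo page</title></head>"
    "<body><h3>Hello <b>world</b></h3></body></html>"
)
soup = BeautifulSoup(html_text, "html.parser")

title_text = soup.title.string  # single text child
head_text = soup.head.string    # <head> has exactly one child, so .string passes through to <title>
h3_string = soup.h3.string      # two children (text and <b>), so .string is None
h3_text = soup.h3.get_text()    # concatenates all nested text

print(title_text, head_text, h3_string, h3_text)
```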
soup.head.contents  # list of the direct children
soup.head.children  # iterator over the same direct children
for i in soup.body.descendants:  # every nested node, in document order
    print(i)
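The difference between `.contents`, `.children`, and `.descendants` is easiest to see side by side. The following is a small sketch on made-up markup, again with `html.parser` in place of `lxml`. The `isinstance` checks separate tags from text nodes, since `.descendants` yields both.

```python
from bs4 import BeautifulSoup, NavigableString, Tag

# Hypothetical sample markup, not taken from the original notes.
html_text = "<html><body><div><p>A</p></div><p>B</p></body></html>"
soup = BeautifulSoup(html_text, "html.parser")

direct = soup.body.contents                           # list: [<div>...</div>, <p>B</p>]
direct_names = [c.name for c in soup.body.children]   # same nodes, via an iterator

# .descendants walks the whole subtree, including text nodes.
nested_tags = [d.name for d in soup.body.descendants if isinstance(d, Tag)]
nested_text = [str(d) for d in soup.body.descendants if isinstance(d, NavigableString)]

print(len(direct), direct_names, nested_tags, nested_text)
```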
soup.title.parent  # the direct parent tag
soup.title.parents  # generator over all ancestors, innermost first
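A short sketch of walking upward from `<title>`, using made-up markup and `html.parser` instead of `lxml`. The last ancestor yielded by `.parents` is the `BeautifulSoup` object itself, whose name is `"[document]"`.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, not taken from the original notes.
html_text = "<html><head><title>Demo page</title></head><body></body></html>"
soup = BeautifulSoup(html_text, "html.parser")

parent_name = soup.title.parent.name                    # direct parent
ancestor_names = [a.name for a in soup.title.parents]   # every ancestor, innermost first

print(parent_name, ancestor_names)
```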
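soup.p.next_sibling  # the next node at the same level (may be a whitespace text node)
list(soup.p.next_siblings)  # all following siblings; next_siblings (plural) is the generator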
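`next_sibling` returns a single node, while `next_siblings` is a generator over all following nodes. A common surprise is that whitespace between tags counts as a sibling. The sketch below shows this on made-up markup (parsed with `html.parser` rather than `lxml`) and uses `find_next_sibling()` to skip straight to the next tag.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, not taken from the original notes.
# Note the newline between the two <p> tags.
html_text = "<div><p>First</p>\n<p>Second</p></div>"
soup = BeautifulSoup(html_text, "html.parser")

first_p = soup.p
next_node = first_p.next_sibling               # the "\n" text node, not the second <p>
next_tag = first_p.find_next_sibling("p")      # skips text nodes and finds <p>Second</p>
following = list(first_p.next_siblings)        # ["\n", <p>Second</p>]

print(repr(str(next_node)), next_tag, len(following))
```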