How to find elements by class

I'm having trouble using Beautifulsoup to parse HTML elements that have a "class" attribute. The code looks like this:

soup = BeautifulSoup(sdata)
mydivs = soup.findAll('div')
for div in mydivs: 
    if (div["class"] == "stylelistrow"):
        print div

I get an error on that same line after the script finishes:

File "./beautifulcoding.py", line 130, in getlanguage
  if (div["class"] == "stylelistrow"):
File "/usr/local/lib/python2.6/dist-packages/BeautifulSoup.py", line 599, in __getitem__
   return self._getAttrMap()[key]
KeyError: 'class'

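How can I get rid of this error?

Current answer

You can refine your search so that it only finds divs with a given class. The method is spelled find_all in BS4 and findAll in BS3: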

mydivs = soup.find_all("div", {"class": "stylelistrow"})
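
To illustrate why this avoids the KeyError, here is a minimal sketch assuming BS4 and the built-in "html.parser". The sample markup in sdata is made up for the example and is not from the question.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; the second div has no class attribute at all.
sdata = """
<div class="stylelistrow">first row</div>
<div>no class here</div>
<div class="other">different class</div>
"""

soup = BeautifulSoup(sdata, "html.parser")

# Both calls filter by class during the search, so divs without a
# class attribute are simply skipped instead of raising KeyError.
by_dict = soup.find_all("div", {"class": "stylelistrow"})
by_keyword = soup.find_all("div", class_="stylelistrow")

for div in by_keyword:
    print(div.get_text())
```

The keyword is written class_ with a trailing underscore because class is a reserved word in Python.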

2011-02-18 12:04:10

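Other answers

The other answers did not work for me.

In the other answers, findAll is called on the soup object itself, but I needed a way to do a class-name lookup on objects inside specific elements taken from the result of an earlier findAll.

If you are trying to search inside nested HTML elements to get objects by class name, try the following: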

# parse html; assumes `from bs4 import BeautifulSoup as soup` and that
# web_page is an already opened response (e.g. from urllib.request.urlopen)
page_soup = soup(web_page.read(), "html.parser")

# keep only the <li> items matching the class name
all_songs = page_soup.findAll("li", "song_item")

# traverse through all_songs
for song in all_songs:

    # find the span with class 'song_name' inside this one song element
    # and print its text
    print(song.find("span", "song_name").text)

Notes:

I'm not explicitly defining the search to be on the 'class' attribute, as in findAll("li", {"class": "song_item"}), since it's the only attribute I'm searching on: if you pass a plain string as the second argument, it is matched against the class attribute by default.

findAll returns an object of class bs4.element.ResultSet, which is a subclass of list; find returns a single bs4.element.Tag (or None if nothing matches).

find and findAll are methods of Tag, not of ResultSet, so to search inside nested elements, iterate over the ResultSet and call them on each Tag, to any depth.

My BS4 version is 4.9.1 and my Python version is 3.8.1.
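
The nested lookup can be tried without a network request. This is only a sketch: the markup in html_doc is invented, and the class names are borrowed from the snippet in this answer.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; only the class names come from the answer.
html_doc = """
<ul>
  <li class="song_item"><span class="song_name">Song A</span><span class="artist">X</span></li>
  <li class="song_item"><span class="song_name">Song B</span><span class="artist">Y</span></li>
  <li class="advert"><span class="song_name">Not a song</span></li>
</ul>
"""

page_soup = BeautifulSoup(html_doc, "html.parser")

# find_all returns a ResultSet, i.e. a list of Tag objects.
all_songs = page_soup.find_all("li", "song_item")

song_names = []
for song in all_songs:
    # find is called on each Tag and returns a single Tag or None.
    name_tag = song.find("span", "song_name")
    if name_tag is not None:
        song_names.append(name_tag.get_text())

print(song_names)
```

The None check matters because find returns None when an item has no matching span, and calling .text on None would raise AttributeError.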

2020-05-21 13:13:40

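This works for me to access the class attribute (on beautifulsoup 4, contrary to what the documentation says). The KeyError comes from a list being returned, not a dictionary. Note that in BS4 the value of a class attribute is itself a list of class names rather than a string.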

for hit in soup.findAll(name='span'):
    print(hit.contents[1]['class'])
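
A small sketch of what indexing by 'class' gives you in BS4, assuming the built-in "html.parser"; the two spans are made up for the example.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<span class="a b">x</span><span>y</span>', "html.parser")
with_class, without_class = soup.find_all("span")

# In BS4, class is a multi-valued attribute, so the value is a list.
classes = with_class["class"]

# has_attr reports whether the attribute exists without raising.
has_class = without_class.has_attr("class")

# Indexing a tag that lacks the attribute raises KeyError, as in the question.
try:
    without_class["class"]
    raised = False
except KeyError:
    raised = True

print(classes, has_class, raised)
```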

2014-07-29 07:03:36

The following worked for me:

a_tag = soup.find_all("div",class_='full tabpublist')
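
One thing to keep in mind with a value like 'full tabpublist': as far as I know, BS4 treats a class_ string containing a space as the exact value of the class attribute, so the order of the class names matters. A sketch with made-up markup, using a CSS selector via select as an order-independent alternative:

```python
from bs4 import BeautifulSoup

# Hypothetical markup with the two class names in different orders.
html_doc = """
<div class="full tabpublist">one</div>
<div class="tabpublist full">two</div>
<div class="full">three</div>
"""
soup = BeautifulSoup(html_doc, "html.parser")

# Exact string match on the whole class attribute value.
exact = soup.find_all("div", class_="full tabpublist")

# A single class name matches any div that carries it.
single = soup.find_all("div", class_="full")

# CSS selector: divs that carry both classes, in any order.
both = soup.select("div.full.tabpublist")

print([d.get_text() for d in exact])
print([d.get_text() for d in single])
print([d.get_text() for d in both])
```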

2019-07-13 11:36:24

Alternatively we can use lxml, which supports XPath and is very fast!

from lxml import html, etree

attr = html.fromstring(html_text)  # parse the raw HTML string
# XPath expression selecting divs whose class attribute is exactly "stylelistrow"
handles = attr.xpath('//div[@class="stylelistrow"]')

for each in handles:
    print(etree.tostring(each))  # serialize each match (returns bytes)
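
Because @class="stylelistrow" compares the whole attribute value, a div with an extra class will not match. A common XPath 1.0 idiom for matching one class name among several is sketched below; html_text here is invented sample markup.

```python
from lxml import html

# Hypothetical markup: exact class, extra class, and a lookalike name.
html_text = """
<html><body>
<div class="stylelistrow">exact</div>
<div class="stylelistrow odd">extra class</div>
<div class="stylelistrowx">lookalike</div>
</body></html>
"""
tree = html.fromstring(html_text)

# Matches only when the attribute value is exactly "stylelistrow".
exact_only = tree.xpath('//div[@class="stylelistrow"]')

# Pads the normalized class list with spaces, then looks for the name
# as a whole space-delimited token.
token_match = tree.xpath(
    '//div[contains(concat(" ", normalize-space(@class), " "), " stylelistrow ")]'
)

print([e.text_content() for e in exact_only])
print([e.text_content() for e in token_match])
```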

2020-04-18 08:03:38

Try checking whether the div has a class attribute first, like this:

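soup = BeautifulSoup(sdata)
mydivs = soup.findAll('div')
for div in mydivs:
    # div.get returns None instead of raising KeyError when "class" is missing;
    # `"class" in div` would test the tag's children, not its attributes
    if div.get("class") == "stylelistrow":
        print(div)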
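
On BS4 the same guard can be written as a membership test, since the class value there is a list. A minimal sketch, assuming BS4 with "html.parser" and made-up markup in sdata:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one matching div (with an extra class),
# one div without any class, one div with another class.
sdata = '<div class="stylelistrow wide">a</div><div>b</div><div class="x">c</div>'
soup = BeautifulSoup(sdata, "html.parser")

matches = []
for div in soup.find_all("div"):
    # get falls back to an empty list when the attribute is absent,
    # so the membership test never raises KeyError.
    if "stylelistrow" in div.get("class", []):
        matches.append(div)

print([div.get_text() for div in matches])
```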

2011-02-18 12:02:37
