I'm having trouble using BeautifulSoup to parse HTML elements that have a "class" attribute. The code looks like this:

soup = BeautifulSoup(sdata)
mydivs = soup.findAll('div')
for div in mydivs: 
    if (div["class"] == "stylelistrow"):
        print div

I get an error on that same line "after" the script finishes.

File "./beautifulcoding.py", line 130, in getlanguage
  if (div["class"] == "stylelistrow"):
File "/usr/local/lib/python2.6/dist-packages/BeautifulSoup.py", line 599, in __getitem__
   return self._getAttrMap()[key]
KeyError: 'class'

How do I get rid of this error?


Current answer

The following worked for me:

a_tag = soup.find_all("div",class_='full tabpublist')

Other answers

The following should work:

soup.find('span', attrs={'class':'totalcount'})

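Replace 'totalcount' with your class name and 'span' with the tag you are looking for. Also, if the class attribute contains several space-separated names, just pick one of them and use it.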

P.S. This function finds the first element that matches the given criteria. If you want all matching elements, replace 'find' with 'find_all'.
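As a minimal sketch of the two lookups above, the snippet below runs them against a small made-up HTML fragment (the markup and variable names are assumptions for illustration, not taken from the question). It also shows that a tag carrying several space-separated classes is matched by any one of them.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, invented for this example
html = """
<div class="full tabpublist">first</div>
<div class="tabpublist">second</div>
<div>no class here</div>
<span class="totalcount">42</span>
"""
soup = BeautifulSoup(html, "html.parser")

# find() returns the first match (or None if there is none);
# attrs={"class": ...} and class_=... express the same filter
first_span = soup.find("span", attrs={"class": "totalcount"})
same_span = soup.find("span", class_="totalcount")

# find_all() returns every match; one class name also matches multi-class tags
tab_divs = soup.find_all("div", class_="tabpublist")

print(first_span.text)
print([div.text for div in tab_divs])
```

Because a div without any class attribute is simply skipped by these filters, this style of lookup never touches div["class"] directly and so cannot raise the KeyError from the question.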

Try checking whether the div has a class attribute first, like this:

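from bs4 import BeautifulSoup

soup = BeautifulSoup(sdata, "html.parser")
mydivs = soup.find_all("div")
for div in mydivs:
    # '"class" in div' would test the tag's children, not its attributes
    if div.has_attr("class"):
        # bs4 returns the class attribute as a list of class names
        if "stylelistrow" in div["class"]:
            print(div)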
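The same guard can be written more compactly with Tag.get(), which returns a default instead of raising KeyError when the attribute is missing. The sketch below assumes bs4 on Python 3 and uses an invented two-div fragment in place of the question's sdata.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second div has no class attribute at all
html = '<div class="stylelistrow">row</div><div>plain</div>'
soup = BeautifulSoup(html, "html.parser")

matches = []
for div in soup.find_all("div"):
    # div["class"] would raise KeyError on the second div;
    # div.get("class", []) falls back to an empty list instead
    if "stylelistrow" in div.get("class", []):
        matches.append(div)

print(matches)
```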

The other answers did not work for me.

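In the other answers, findAll is called on the soup object itself, but I needed a way to look up by class name inside specific elements taken from the result of an earlier findAll.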

If you are trying to search inside nested HTML elements to get objects by class name, try the following:

from bs4 import BeautifulSoup

# parse html (web_page is assumed to be an already opened response object)
page_soup = BeautifulSoup(web_page.read(), "html.parser")

# select the items matching the class name
all_songs = page_soup.findAll("li", "song_item")

# traverse through all_songs
for song in all_songs:

    # get text out of span element matching class 'song_name'
    # doing a 'find' by class name within a specific song element taken out of 'all_songs' collection
    print(song.find("span", "song_name").text)

Notes:

1. I'm not explicitly naming the 'class' attribute, as in findAll("li", {"class": "song_item"}), because it is the only attribute I'm searching on. When the second argument is a plain string, it is matched against the class attribute by default.
2. findAll returns a bs4.element.ResultSet, which is a subclass of list, while find returns a single bs4.element.Tag (or None if nothing matches). find and findAll are methods of every Tag, so you can call them again on any element taken out of a ResultSet, at any depth of nesting.
3. My BS4 version is 4.9.1 and my Python version is 3.8.1.
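To make the nested lookup reproducible without a network request, here is a rough sketch that swaps the downloaded page for an inline fragment. The markup is invented for illustration; only the class names song_item and song_name come from the answer above.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the downloaded page
html = """
<ul>
  <li class="song_item"><span class="song_name">Song A</span><span class="artist">X</span></li>
  <li class="song_item"><span class="song_name">Song B</span><span class="artist">Y</span></li>
  <li class="other"><span class="song_name">Not a song item</span></li>
</ul>
"""
page_soup = BeautifulSoup(html, "html.parser")

# Outer search: every <li> whose class list includes "song_item"
all_songs = page_soup.find_all("li", "song_item")

# Inner search: find() is called on each individual Tag, not on the soup
song_names = [song.find("span", "song_name").text for song in all_songs]

print(type(all_songs).__name__)
print(song_names)
```

The third li is skipped by the outer search, so its span never reaches the inner one even though it has the song_name class.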

CSS selectors

Single class, first match

soup.select_one('.stylelistrow')

List of matches

soup.select('.stylelistrow')

Compound class (i.e. together with another class)

soup.select_one('.stylelistrow.otherclassname')
soup.select('.stylelistrow.otherclassname')

The spaces in a compound class value, e.g. class="stylelistrow otherclassname", are replaced with ".". You can keep chaining more classes this way.

List of classes (OR: matches whichever of them is present)

soup.select_one('.stylelistrow, .otherclassname')
soup.select('.stylelistrow, .otherclassname')
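The difference between the single, compound and comma-separated selectors is easiest to see side by side. The following sketch uses an invented fragment whose id attributes exist only to label the results.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the ids only serve to identify which tags matched
html = """
<div class="stylelistrow" id="a"></div>
<div class="stylelistrow otherclassname" id="b"></div>
<div class="otherclassname" id="c"></div>
<div class="unrelated" id="d"></div>
"""
soup = BeautifulSoup(html, "html.parser")

# Single class: every tag that has stylelistrow among its classes
single = [tag["id"] for tag in soup.select(".stylelistrow")]

# Compound: the tag must have both classes
both = [tag["id"] for tag in soup.select(".stylelistrow.otherclassname")]

# Selector list: the tag may have either class
either = [tag["id"] for tag in soup.select(".stylelistrow, .otherclassname")]

# select_one() returns only the first match in document order
first = soup.select_one(".stylelistrow")["id"]

print(single, both, either, first)
```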

Class attribute whose value contains a given string, using "stylelistrow" as the example:

Starts with "style":

[class^=style]

Ends with "row":

[class$=row]

Contains "list":

[class*=list]

^, $ and * are operators. Read more here: https://developer.mozilla.org/en-US/docs/Web/CSS/Attribute_selectors
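A short sketch of the three operators follows, run through soup.select() on invented markup. One caveat it illustrates: these operators compare against the whole class attribute string, so a tag with class="big stylelistrow" does not start with "style" even though one of its class names does.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the ids only serve to identify which tags matched
html = """
<p class="stylelistrow" id="a"></p>
<p class="stylebox" id="b"></p>
<p class="headerrow" id="c"></p>
<p class="mylistitem" id="d"></p>
<p class="big stylelistrow" id="e"></p>
"""
soup = BeautifulSoup(html, "html.parser")

# ^= : attribute value starts with "style"
starts = [tag["id"] for tag in soup.select("[class^=style]")]

# $= : attribute value ends with "row"
ends = [tag["id"] for tag in soup.select("[class$=row]")]

# *= : attribute value contains "list" anywhere
contains = [tag["id"] for tag in soup.select("[class*=list]")]

print(starts, ends, contains)
```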

If you want to exclude this class, for example to select the anchor tags that do not have it:

a:not(.stylelistrow)

You can pass simple, compound and complex CSS selector lists inside the :not() pseudo-class. See https://facelessuser.github.io/soupsieve/selectors/pseudo-classes/#:not
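For illustration, the selector above can be passed to soup.select() like any other. This sketch uses three invented anchor tags, one of which has the excluded class.

```python
from bs4 import BeautifulSoup

# Hypothetical markup with three links
html = """
<a class="stylelistrow" href="#1">one</a>
<a href="#2">two</a>
<a class="other" href="#3">three</a>
"""
soup = BeautifulSoup(html, "html.parser")

# Keep every <a> that does not carry the stylelistrow class,
# including anchors that have no class attribute at all
without_class = [a.text for a in soup.select("a:not(.stylelistrow)")]

print(without_class)
```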


bs4 4.7.1+

Specific class whose innerText contains a string

soup.select_one('.stylelistrow:contains("some string")')
soup.select('.stylelistrow:contains("some string")')

N.B.

Soupsieve 2.1.0+ (December 2020)

NEW: In order to avoid conflicts with future CSS specification changes, non-standard pseudo classes will now start with the :-soup- prefix. As a consequence, :contains() will now be known as :-soup-contains(), though for a time the deprecated form of :contains() will still be allowed with a warning that users should migrate over to :-soup-contains().

NEW: Added new non-standard pseudo class :-soup-contains-own() which operates similar to :-soup-contains() except that it only looks at text nodes directly associated with the currently scoped element and not its descendants.
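Assuming Soupsieve 2.1.0 or later is installed, the prefixed forms can be used as in the sketch below. The markup is invented; the third div carries the search text only inside a nested span, which is where the two pseudo-classes differ.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the ids only serve to identify which tags matched
html = """
<div class="stylelistrow" id="a">has some string inside</div>
<div class="stylelistrow" id="b">nothing relevant</div>
<div class="stylelistrow" id="c"><span>some string</span> nested</div>
"""
soup = BeautifulSoup(html, "html.parser")

# :-soup-contains() looks at the element's text including descendants
anywhere = [
    tag["id"] for tag in soup.select('.stylelistrow:-soup-contains("some string")')
]

# :-soup-contains-own() looks only at the element's own text nodes
own_only = [
    tag["id"] for tag in soup.select('.stylelistrow:-soup-contains-own("some string")')
]

print(anywhere, own_only)
```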

Specific class containing a particular descendant element, e.g. an a tag

soup.select_one('.stylelistrow:has(a)')
soup.select('.stylelistrow:has(a)')
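A small sketch of :has() on invented markup follows. It also contrasts the plain form, which accepts a match at any depth, with the :has(> a) form, which requires the a tag to be a direct child.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: only the first div contains a link, nested inside a <p>
html = """
<div class="stylelistrow" id="x"><p><a href="#">link</a></p></div>
<div class="stylelistrow" id="y"><p>no link</p></div>
"""
soup = BeautifulSoup(html, "html.parser")

# :has(a) matches when an <a> exists anywhere below the element
with_link = [tag["id"] for tag in soup.select(".stylelistrow:has(a)")]

# :has(> a) matches only when the <a> is a direct child
direct_link = [tag["id"] for tag in soup.select(".stylelistrow:has(> a)")]

print(with_link, direct_link)
```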

soup.find("form",{"class":"c-login__form"})

Multiple elements:

res=soup.find_all("input")
for each in res:
    print(each)