How to find elements by class

I am having trouble using BeautifulSoup to parse HTML elements with a "class" attribute. The code looks like this:

soup = BeautifulSoup(sdata)
mydivs = soup.findAll('div')
for div in mydivs: 
    if (div["class"] == "stylelistrow"):
        print div

I get an error on that same line after the script finishes:

File "./beautifulcoding.py", line 130, in getlanguage
  if (div["class"] == "stylelistrow"):
File "/usr/local/lib/python2.6/dist-packages/BeautifulSoup.py", line 599, in __getitem__
   return self._getAttrMap()[key]
KeyError: 'class'

How can I get rid of this error?

Current answer

Regarding @Wernight's comment on the top answer about partial matching...

You can match partially with gazpacho:

from gazpacho import Soup

# sdata is the HTML string from the question
soup = Soup(sdata)
my_divs = soup.find("div", {"class": "stylelistrow"}, partial=True)

Both exact and partial class matches will be captured and returned as a list of Soup objects.

2020-10-09 19:40:32

Other answers

A straightforward way would be:

soup = BeautifulSoup(sdata)
# The class name follows the question's 'stylelistrow'
for each_div in soup.findAll('div', {'class': 'stylelistrow'}):
    print(each_div)

Make sure you get the casing of findAll right; it is not findall.
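
To see why the loop in the question raises KeyError, and why a class filter avoids it, here is a minimal sketch. It assumes the newer bs4 package (where findAll is also spelled find_all) and a made-up sdata string; neither comes from the original post.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: the first div has no class attribute at all.
sdata = """
<div>no class here</div>
<div class="stylelistrow">row 1</div>
<div class="stylelistrow button">row 2</div>
"""

soup = BeautifulSoup(sdata, "html.parser")

# Indexing a tag with a missing attribute raises KeyError,
# which is what happens on the first class-less div in the question.
first_div = soup.find("div")
try:
    first_div["class"]
    raised = False
except KeyError:
    raised = True

# .get() returns a default instead of raising. In bs4 the class value
# is a list of class names, so membership is tested with "in".
rows = [div for div in soup.find_all("div")
        if "stylelistrow" in div.get("class", [])]

# Passing the class as a filter never touches divs that lack it.
filtered = soup.find_all("div", {"class": "stylelistrow"})
```

Both rows and filtered should end up holding the two divs that carry stylelistrow, while the class-less div is skipped.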

2013-04-10 07:48:46

CSS selectors

Single class, first match

soup.select_one('.stylelistrow')

List of matches

soup.select('.stylelistrow')

Compound class (i.e. AND another class)

soup.select_one('.stylelistrow.otherclassname')
soup.select('.stylelistrow.otherclassname')

Spaces in a compound class value, e.g. class="stylelistrow otherclassname", are replaced with "." in the selector. You can keep chaining more classes.

List of classes (OR: match whichever is present)

soup.select_one('.stylelistrow, .otherclassname')
soup.select('.stylelistrow, .otherclassname')
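
The selector forms above can be tried side by side. This is only a sketch on invented markup (the html string and the texts helper are assumptions for illustration), using bs4 with the built-in html.parser.

```python
from bs4 import BeautifulSoup

# Hypothetical markup covering each case.
html = """
<div class="stylelistrow">a</div>
<div class="stylelistrow otherclassname">b</div>
<div class="otherclassname">c</div>
<div class="unrelated">d</div>
"""
soup = BeautifulSoup(html, "html.parser")


def texts(tags):
    """Return the text of each matched tag, in document order."""
    return [tag.get_text() for tag in tags]


first = soup.select_one(".stylelistrow")                # first match only
all_rows = soup.select(".stylelistrow")                 # every match
both = soup.select(".stylelistrow.otherclassname")      # AND: carries both classes
either = soup.select(".stylelistrow, .otherclassname")  # OR: carries at least one
```

select_one returns a single tag (or None when nothing matches), whereas select always returns a list.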

Class attribute whose value contains a string, e.g. "stylelistrow":

Starts with "style":

[class^=style]

Ends with "row":

[class$=row]

Contains "list":

[class*=list]

^, $ and * are operators. Read more here: https://developer.mozilla.org/en-US/docs/Web/CSS/Attribute_selectors
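
One detail worth checking is that these operators compare against the whole class attribute value, not each class name separately. A small sketch on made-up markup (the html string and texts helper are assumptions):

```python
from bs4 import BeautifulSoup

# Hypothetical markup: note the different orderings of the class names.
html = """
<div class="stylelistrow">a</div>
<div class="button stylelistrow">b</div>
<div class="stylelistrow button">c</div>
<div class="playlist">d</div>
"""
soup = BeautifulSoup(html, "html.parser")


def texts(tags):
    """Return the text of each matched tag, in document order."""
    return [tag.get_text() for tag in tags]


starts = soup.select("[class^=style]")   # attribute value begins with "style"
ends = soup.select("[class$=row]")       # attribute value ends with "row"
contains = soup.select("[class*=list]")  # attribute value contains "list"
```

Here "button stylelistrow" is not matched by ^=style because the attribute string starts with "button", and "playlist" is matched by *=list even though it is an unrelated class.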

If you want to exclude this class then, taking anchor tags as an example, select the anchor tags that do not have it:

a:not(.stylelistrow)

You can pass simple, compound and complex CSS selector lists inside the :not() pseudo-class. See https://facelessuser.github.io/soupsieve/selectors/pseudo-classes/#:not
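
A short sketch of :not() with a single class and with a selector list, on invented anchors (the html string and class name "other" are assumptions):

```python
from bs4 import BeautifulSoup

# Hypothetical anchors: one with the class, one with another class, one with none.
html = """
<a class="stylelistrow" href="#1">one</a>
<a class="other" href="#2">two</a>
<a href="#3">three</a>
"""
soup = BeautifulSoup(html, "html.parser")

# Anchors that do not carry the stylelistrow class.
without_row = soup.select("a:not(.stylelistrow)")

# A selector list inside :not() excludes anchors matching any listed selector.
without_either = soup.select("a:not(.stylelistrow, .other)")
```

Anchors with no class attribute at all are included in both results, since they match neither excluded selector.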

bs4 4.7.1+

Specific class whose inner text contains a string

soup.select_one('.stylelistrow:contains("some string")')
soup.select('.stylelistrow:contains("some string")')

N.B.

soupsieve 2.1.0+, Dec 2020 onwards

NEW: In order to avoid conflicts with future CSS specification changes, non-standard pseudo classes will now start with the :-soup- prefix. As a consequence, :contains() will now be known as :-soup-contains(), though for a time the deprecated form of :contains() will still be allowed with a warning that users should migrate over to :-soup-contains(). NEW: Added new non-standard pseudo class :-soup-contains-own() which operates similar to :-soup-contains() except that it only looks at text nodes directly associated with the currently scoped element and not its descendants.
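
The difference between the two prefixed pseudo-classes in the note above can be sketched as follows. This assumes soupsieve 2.1 or later is installed alongside bs4; the markup is invented.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: text directly in the div, text nested in a span, no match.
html = """
<div class="stylelistrow">some string here</div>
<div class="stylelistrow"><span>some string nested</span></div>
<div class="stylelistrow">nothing relevant</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Looks at the text of the element and all of its descendants.
any_text = soup.select('.stylelistrow:-soup-contains("some string")')

# Looks only at text nodes that belong directly to the element itself.
own_text = soup.select('.stylelistrow:-soup-contains-own("some string")')
```

The second div is matched only by :-soup-contains(), because its text lives inside the span rather than directly in the div.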

Specific class that contains a certain element, e.g. an a tag

soup.select_one('.stylelistrow:has(a)')
soup.select('.stylelistrow:has(a)')
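
As a quick check of how :has() behaves, the sketch below compares the plain form with the direct-child form :has(> a). The markup is made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: a direct link, a link nested inside a paragraph, no link.
html = """
<div class="stylelistrow"><a href="#">link</a></div>
<div class="stylelistrow"><p><a href="#">nested link</a></p></div>
<div class="stylelistrow">no link</div>
"""
soup = BeautifulSoup(html, "html.parser")

# :has(a) matches when an <a> appears anywhere below the element.
with_link = soup.select(".stylelistrow:has(a)")

# :has(> a) restricts the match to <a> elements that are direct children.
with_direct_link = soup.select(".stylelistrow:has(> a)")
```

So :has(a) also picks up the div whose link sits inside a paragraph, while :has(> a) does not.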

2019-05-23 03:50:27

Use class_= if you want to find element(s) without specifying the HTML tag.

For a single element:

soup.find(class_='my-class-name')

For multiple elements:

soup.find_all(class_='my-class-name')
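
A small sketch of what the two calls return, on invented markup with mixed tag names (the html string and the class name does-not-exist are assumptions):

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the class appears on two different tag types.
html = """
<p class="my-class-name">first</p>
<span class="my-class-name extra">second</span>
<div class="something-else">third</div>
"""
soup = BeautifulSoup(html, "html.parser")

one = soup.find(class_="my-class-name")       # first matching tag of any type
many = soup.find_all(class_="my-class-name")  # list of all matching tags
missing = soup.find(class_="does-not-exist")  # None when nothing matches
```

Because find returns None when there is no match, it is worth checking the result before accessing attributes on it.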

2021-02-16 10:47:43

Single

soup.find("form",{"class":"c-login__form"})

Multiple

res=soup.find_all("input")
for each in res:
    print(each)

2021-06-27 15:17:07

Specific to BeautifulSoup 3:

soup.findAll('div',
             {'class': lambda x: x 
                       and 'stylelistrow' in x.split()
             }
            )

This will find all of these:

<div class="stylelistrow">
<div class="stylelistrow button">
<div class="button stylelistrow">
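
For comparison, in the newer bs4 package the same three elements can be matched without a lambda, because a class filter is compared against each individual class name. A sketch on made-up markup, not part of the original answer:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the three divs above, plus two that should not match.
html = """
<div class="stylelistrow">1</div>
<div class="stylelistrow button">2</div>
<div class="button stylelistrow">3</div>
<div class="stylelistrowx">4</div>
<div>5</div>
"""
soup = BeautifulSoup(html, "html.parser")

# class_ matches any div whose list of classes includes "stylelistrow".
rows = soup.find_all("div", class_="stylelistrow")
matched = [div.get_text() for div in rows]
```

The div with class stylelistrowx and the div without a class are left out, since the comparison is on whole class names.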

2014-12-09 21:48:51
