Can a website detect when you are using Selenium with chromedriver?

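I've been testing Selenium with Chromedriver, and I've noticed that some pages can detect that you're using Selenium even though there is no automation at all. Even when I'm just browsing manually in Chrome through Selenium and Xephyr, I often get a page saying that suspicious activity was detected. I've checked my user agent and my browser fingerprint, and they are exactly identical to those of a normal Chrome browser.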

When I browse these sites in normal Chrome everything works fine, but the moment I use Selenium I'm detected.

In theory, chromedriver and Chrome should look exactly the same to any web server, but somehow they can detect it.

If you want some test code, try this:

from pyvirtualdisplay import Display
from selenium import webdriver

# Start a visible virtual display for the browser to render into.
display = Display(visible=1, size=(1600, 902))
display.start()
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--disable-extensions')
chrome_options.add_argument('--profile-directory=Default')
chrome_options.add_argument('--incognito')
chrome_options.add_argument('--disable-plugins-discovery')
chrome_options.add_argument('--start-maximized')
# Newer Selenium releases take options=; the original passed chrome_options=.
driver = webdriver.Chrome(options=chrome_options)
driver.delete_all_cookies()
driver.set_window_size(800, 800)
driver.set_window_position(0, 0)
print('arguments done')
driver.get('http://stubhub.com')

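If you browse around stubhub you'll get redirected and "blocked" within one or two requests. I've been investigating this, but I can't figure out how they can tell that a user is using Selenium.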

How do they do it?

I installed the Selenium IDE plugin in Firefox, and I got banned when I went to stubhub.com in the normal Firefox browser with only the additional plugin.

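When I use Fiddler to view the HTTP requests being sent back and forth, I notice that the "fake browser's" requests often have "no-cache" in the response header.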

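Results like "Is there a way to detect that I'm in a Selenium Webdriver page from JavaScript?" suggest that there should be no way to detect when you are using a webdriver. But this evidence suggests otherwise.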

The site uploads a fingerprint to their servers, but I checked, and the fingerprint of Selenium is identical to the fingerprint when using Chrome.

This is one of the fingerprint payloads that they send to their servers:

{"appName":"Netscape","platform":"Linuxx86_64","cookies":1,"syslang":"en-US","userlang":"en-US",
"cpu":"","productSub":"20030107","setTimeout":1,"setInterval":1,
"plugins":{"0":"ChromePDFViewer","1":"ShockwaveFlash","2":"WidevineContentDecryptionModule","3":"NativeClient","4":"ChromePDFViewer"},
"mimeTypes":{"0":"application/pdf","1":"ShockwaveFlashapplication/x-shockwave-flash","2":"FutureSplashPlayerapplication/futuresplash","3":"WidevineContentDecryptionModuleapplication/x-ppapi-widevine-cdm","4":"NativeClientExecutableapplication/x-nacl","5":"PortableNativeClientExecutableapplication/x-pnacl","6":"PortableDocumentFormatapplication/x-google-chrome-pdf"},
"screen":{"width":1600,"height":900,"colorDepth":24},
"fonts":{"0":"monospace","1":"DejaVuSerif","2":"Georgia","3":"DejaVuSans","4":"TrebuchetMS","5":"Verdana","6":"AndaleMono","7":"DejaVuSansMono","8":"LiberationMono","9":"NimbusMonoL","10":"CourierNew","11":"Courier"}}

It's identical in Selenium and in Chrome.
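A claim like this is easier to verify by diffing the two payloads key by key than by comparing them by eye. The following is only a minimal sketch: `flatten` and `diff_fingerprints` are hypothetical helper names, and the two sample payloads are made up for illustration, not captured from the site.

```python
import json

_MISSING = "<missing>"  # marker for a key present in only one payload


def flatten(obj, prefix=""):
    """Flatten nested dicts into {"a.b.c": value} pairs for easy comparison."""
    flat = {}
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def diff_fingerprints(raw_a, raw_b):
    """Return {path: (value_a, value_b)} for every key whose value differs."""
    a = flatten(json.loads(raw_a))
    b = flatten(json.loads(raw_b))
    return {
        path: (a.get(path, _MISSING), b.get(path, _MISSING))
        for path in sorted(set(a) | set(b))
        if a.get(path, _MISSING) != b.get(path, _MISSING)
    }


# Made-up payloads, for illustration only.
chrome = '{"appName": "Netscape", "screen": {"width": 1600, "height": 900}}'
selenium = '{"appName": "Netscape", "screen": {"width": 800, "height": 900}}'
print(diff_fingerprints(chrome, selenium))  # {'screen.width': (1600, 800)}
```

An empty result only shows that the values the site chose to collect match; it says nothing about properties the payload does not include.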

VPNs work for a single use, but they get detected after I load the first page. Clearly, some JavaScript is being run to detect Selenium.

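It sounds like they are behind a web application firewall. Take a look at modsecurity and OWASP to see how those work.

In reality, what you are asking is how to do bot detection evasion. That is not what Selenium WebDriver is for. It is for testing your own web application without hitting other web applications. It is possible, but basically you would have to look at what a WAF looks for in its rule set and specifically avoid it with Selenium if you can. Even then, it might still not work, because you don't know what WAF they are using.

You did the right first step, that is, faking the user agent. If that didn't work, then a WAF is in place and you probably need to get more tricky.

Point taken from another answer: make sure your user agent is actually being set correctly first. Maybe have it hit a local web server or sniff the traffic going out.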
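As a rough illustration of the "hit a local web server" idea, the sketch below uses only the Python standard library to record the headers of every request it receives. The names `HeaderRecorder`, `start_recorder` and `fetch` are hypothetical, and `fetch` merely stands in for the browser so the example runs without one.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

captured = []  # one dict of lower-cased request headers per request received


class HeaderRecorder(BaseHTTPRequestHandler):
    def do_GET(self):
        # Record the headers exactly as the client sent them (keys lower-cased).
        captured.append({k.lower(): v for k, v in self.headers.items()})
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the console quiet


def start_recorder():
    """Start the recorder on a free local port in a background thread."""
    server = HTTPServer(("127.0.0.1", 0), HeaderRecorder)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch(url, user_agent):
    """Stand-in for a browser: one GET with the given User-Agent, no proxy."""
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with opener.open(request, timeout=5) as response:
        return response.read()
```

With the server running, pointing the Selenium-driven browser at `http://127.0.0.1:<port>/` (the port is `server.server_address[1]`) and then at the same URL from a normal browser leaves two entries in `captured` that can be compared directly.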

2015-10-23 23:28:23

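Even if you are sending all the right data (e.g. Selenium doesn't show up as an extension, you have a reasonable resolution/bit depth, etc.), there are a number of services and tools that profile visitor behaviour to determine whether the actor is a user or an automated system.

For example, visiting a site and then immediately performing some action by moving the mouse directly to the relevant button, in less than a second, is something a user wouldn't actually do.

It may also be useful, as a debugging tool, to use a site such as https://panopticlick.eff.org/ to check how unique your browser is; it will also help you verify whether any specific parameters indicate that you're running in Selenium.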
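To make that kind of behavioural check concrete, here is a toy heuristic that flags a pointer path that is both very fast and almost perfectly straight. It is purely illustrative: the function names and both thresholds are assumptions made up for this sketch, not how any particular detection service works.

```python
import math


def path_straightness(points):
    """Straight-line distance divided by distance travelled (1.0 = straight)."""
    travelled = sum(math.dist(a, b) for a, b in zip(points, points[1:]))
    if travelled == 0:
        return 1.0
    return math.dist(points[0], points[-1]) / travelled


def looks_automated(points, duration_s, min_duration_s=1.0, max_straightness=0.98):
    """Flag a path that finished too quickly AND barely deviated from a line.

    Both thresholds are arbitrary values chosen for this example.
    """
    return duration_s < min_duration_s and path_straightness(points) > max_straightness


# A jump straight to the button in 0.2 s versus a slower, curved approach.
bot_path = [(0, 0), (50, 50), (100, 100)]
human_path = [(0, 0), (40, 10), (70, 60), (100, 100)]
print(looks_automated(bot_path, 0.2))    # True
print(looks_automated(human_path, 1.4))  # False
```

Real systems presumably combine many more signals, but even this much shows why adding a delay alone may not be enough if the path itself still looks mechanical.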

2015-10-25 22:01:14

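Firefox is said to set window.navigator.webdriver === true when working with a webdriver. That was according to one of the older specs (e.g. on archive.org), but I couldn't find it in the new one except for some very vague wording in the appendices.

A test for it is in the Selenium code in the file fingerprint_test.js, where the comment at the end says "Currently only implemented in firefox", but I wasn't able to identify any code in that direction with some simple greping, neither in the current (41.0.2) Firefox release tree nor in the Chromium tree.

I also found a comment for an older commit regarding fingerprinting in the Firefox driver, b82512999938, from January 2015. That code is still in the Selenium Git master downloaded yesterday, at javascript/firefox-driver/extension/content/server.js, with a comment linking to the slightly differently worded appendix in the current W3C WebDriver spec.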
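Whether a given browser and driver combination actually exposes this flag can be checked from the driver side by asking the page's own JavaScript. A minimal sketch, assuming only that the driver object has Selenium's `execute_script` method; `reports_webdriver` and `FakeDriver` are hypothetical names, and `FakeDriver` exists solely so the example runs without a browser.

```python
def reports_webdriver(driver):
    """Return True if page JavaScript sees navigator.webdriver === true."""
    return bool(driver.execute_script("return navigator.webdriver === true;"))


class FakeDriver:
    """Stand-in for a Selenium driver; returns a canned value for any script."""

    def __init__(self, flag):
        self.flag = flag

    def execute_script(self, script):
        return self.flag


print(reports_webdriver(FakeDriver(True)))   # True
print(reports_webdriver(FakeDriver(False)))  # False
```

With a real session, passing the object returned by `webdriver.Chrome(...)` or `webdriver.Firefox(...)` in place of `FakeDriver` shows what a site's script would read from that property.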

2015-10-27 23:44:32

Write an HTML page with the following code. You will see that in the DOM Selenium applies a webdriver attribute in the outerHTML:

<html>
<head>
  <script type="text/javascript">
  <!--
    function showWindow(){
      javascript:(alert(document.documentElement.outerHTML));
    }
  //-->
  </script>
</head>
<body>
  <form>
    <input type="button" value="Show outerHTML" onclick="showWindow()">
  </form>
</body>
</html>
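Once the outerHTML has been copied out of that alert (or fetched some other way), scanning it for such an attribute can be automated. The sketch below uses the standard-library HTML parser; `WebdriverAttrFinder` and `find_webdriver_attrs` are hypothetical names, and the sample markup is an assumption about what such an attribute might look like, not output captured from a real session.

```python
from html.parser import HTMLParser


class WebdriverAttrFinder(HTMLParser):
    """Collect (tag, attribute) pairs whose attribute name mentions 'webdriver'."""

    def __init__(self):
        super().__init__()
        self.hits = []

    def handle_starttag(self, tag, attrs):
        for name, _value in attrs:
            if "webdriver" in name.lower():
                self.hits.append((tag, name))


def find_webdriver_attrs(html):
    """Return every (tag, attribute) pair in the markup that mentions webdriver."""
    finder = WebdriverAttrFinder()
    finder.feed(html)
    return finder.hits


# Assumed example markup, for illustration only.
sample = '<html webdriver="true"><head></head><body></body></html>'
print(find_webdriver_attrs(sample))  # [('html', 'webdriver')]
```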

2015-10-28 04:10:54

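Try to use Selenium with a specific user profile of Chrome. That way you can use it as a specific user and define anything you want. When doing so, it will run as a "real" user. Look at the Chrome process with a process explorer and you'll see the difference in the tags.

For example: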

import os
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

username = os.getenv("USERNAME")
# Parentheses let the concatenation continue onto the next line.
userProfile = (
    "C:\\Users\\" + username +
    "\\AppData\\Local\\Google\\Chrome\\User Data\\Default"
)

options = webdriver.ChromeOptions()
options.add_argument("user-data-dir={}".format(userProfile))
# Add any tag here you want.
options.add_experimental_option(
    "excludeSwitches",
    """
        ignore-certificate-errors
        safebrowsing-disable-download-protection
        safebrowsing-disable-auto-update
        disable-client-side-phishing-detection
    """.split()
)
# Raw string, so the backslashes in the Windows path are not read as escapes.
chromedriver = r"C:\Python27\chromedriver\chromedriver.exe"
os.environ["webdriver.chrome.driver"] = chromedriver
# Selenium 4 style; the original passed executable_path= and chrome_options=.
browser = webdriver.Chrome(service=Service(chromedriver), options=options)

The list of Google Chrome tags is here.

2015-10-28 16:39:35

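As we've already figured out in the question and the posted answers, there is an anti web-scraping and bot detection service called "Distil Networks" in play here. And, according to the company CEO's interview:

Even though they can create new bots, we figured out a way to identify Selenium as a tool they're using, so we're blocking Selenium no matter how many times they iterate on that bot. We're doing that now with Python and a lot of different technologies. Once we see a pattern emerge from one type of bot, then we work to reverse engineer the technology they use and identify it as malicious.

It will take time and additional challenges to understand exactly how they are detecting Selenium, but here is what we can say for sure at the moment: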

- It's not related to the actions you take with Selenium. Once you navigate to the site, you get immediately detected and banned. I've tried adding artificial random delays between actions and pausing after the page is loaded; nothing helped.
- It's not about the browser fingerprint either. I tried it in multiple browsers, with clean profiles and without, and in incognito modes, but nothing helped.
- Since, according to the hint in the interview, this was "reverse engineering", I suspect this is done with some JavaScript code being executed in the browser, revealing that this is a browser automated via Selenium WebDriver.