I've accepted an answer, but sadly, I believe we're stuck with our original worst case scenario: CAPTCHA everyone on purchase attempts of the crap. Short explanation: caching / web farms make it impossible to track hits, and any workaround (sending a non-cached web-beacon, writing to a unified table, etc.) slows the site down worse than the bots would. There is likely some pricey hardware from Cisco or the like that can help at a high level, but it's hard to justify the cost if CAPTCHA-ing everyone is an alternative. I'll attempt a more full explanation later, as well as cleaning this up for future searchers (though others are welcome to try, as it's community wiki).


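This is about the bag o' crap sales on woot.com. I'm the president of Woot Workshop, the subsidiary of Woot that does the design, writes the product descriptions, podcasts, and blog posts, and moderates the forums. I work with CSS/HTML and am only barely familiar with other technologies. I work closely with the developers and have talked through all of the answers here (and many other ideas we've had).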






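So we're back to scanning IPs, which a) is fairly useless in this age of cloud networking and spambot zombies, and b) catches too many innocents, given the number of businesses that come from a single IP address (not to mention the issues with non-static-IP ISPs and the potential performance hit of trying to track this).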












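1. Your site gets slammed by non-humans, slowing everything down for everyone.
2. The scripters end up 'winning' the product, causing the regulars to feel cheated.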


The user experience sucks for humans, as they have to decipher a CAPTCHA, pick out the cat, or solve a math problem. If the perceived benefit is high enough, and the crowd large enough, some group will find their way around any tweak, leading to an arms race. (This is especially true the simpler the tweak is: a hidden 'comments' form field, re-arranged form elements, mislabeled elements, or hidden 'gotcha' text will each work once and then need to be changed to fight bots targeting this specific form.) Even if the scripters can't 'solve' your tweak, it doesn't prevent them from slamming your front page and then sounding an alarm for the scripter to fill out the order manually. Given they get the advantage from solving [a], they will likely still win [b], since they'll be the first humans reaching the order page. Additionally, 1. still happens, causing server errors and decreased performance for everyone.




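Sell the item to non-scripting humans. Keep the site running at a speed not slowed by bots. Don't hassle the 'normal' users with any tasks to complete to prove they're human.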


What about the NoBot control from the ASP.NET AJAX Control Toolkit?


The method I'm about to describe has two requirements: 1) JavaScript is enforced, and 2) a web browser with a valid browser session (http://msdn.microsoft.com/en-us/library/bb894287.aspx).


Moving along to the problem and the solution. The problem is in two parts. The first is that you cannot block out an individual for "doing bad things". To fix this, you set up a method that takes in the browser's valid session, generates an md5sum + salt + hash (of your own devising), and sends it back to the browser. The browser is then REQUIRED to return that hashed key during every POST / GET. If you never get a valid browser session, you reply with "Please use a valid web browser blah blah blah". All popular browsers have valid browser session IDs.
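To make the session-key idea a bit more concrete, here is a minimal sketch in Python. It swaps the md5sum + salt scheme for an HMAC-SHA256 from the standard library, which is my substitution and not what the answer specifies. The names `SERVER_SECRET`, `issue_token` and `is_valid_request` are made up for illustration.

```python
import hashlib
import hmac
import secrets

# Assumption: a secret that lives only on the server (hypothetical name).
SERVER_SECRET = secrets.token_bytes(32)


def issue_token(session_id: str) -> str:
    """Return a keyed hash of the session ID for the browser to echo back."""
    return hmac.new(
        SERVER_SECRET, session_id.encode("utf-8"), hashlib.sha256
    ).hexdigest()


def is_valid_request(session_id: str, token: str) -> bool:
    """Accept a POST/GET only if it carries the token issued for its session."""
    if not session_id:
        # No browser session at all: reply "Please use a valid web browser".
        return False
    expected = issue_token(session_id)
    # Constant-time comparison on bytes, so odd input cannot raise or leak timing.
    return hmac.compare_digest(expected.encode("utf-8"), token.encode("utf-8"))
```

The server would call `issue_token` when the session is created and `is_valid_request` before handling any later request from that session.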


Now this next part is why it requires JavaScript. On the client you build a simple hash for each character that comes from the keyboard, checked against the value of the text in the textarea. That key comes over to the server as a simple hash and has to be validated. While this method could easily be reverse engineered, it adds one extra hoop that individuals have to jump through before they can submit data. Mind you, this only prevents auto-posting of data, not DoS by constant visits to the web site. If you have access to AJAX, there is also a way to send a salt and hash key across the wire and use JavaScript with it to build the onkeypress "valid token" that gets sent back across the wire. Yes, like I said, it could easily be reverse engineered, but hopefully you see where I am going with this.
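Here is one way the keystroke token could work, sketched in Python for both sides (in practice the client half would be JavaScript in an onkeypress handler). The chained SHA-256 construction and the function names are my assumptions. The sketch also ignores backspaces and pastes, which a real version would have to handle.

```python
import hashlib


def keystroke_token(salt: str, typed_chars: str) -> str:
    """Fold each typed character into a running hash, starting from the salt."""
    digest = hashlib.sha256(salt.encode("utf-8")).hexdigest()
    for ch in typed_chars:
        # Client side: this step would run once per key press.
        digest = hashlib.sha256((digest + ch).encode("utf-8")).hexdigest()
    return digest


def looks_typed(salt: str, submitted_text: str, submitted_token: str) -> bool:
    """Server side: the token must match what typing the text would produce."""
    return keystroke_token(salt, submitted_text) == submitted_token
```

A script that posts the form directly, without simulating the key presses, arrives with no token or the wrong one and gets rejected.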


You see, the goal here is to 1) make the anonymous non-anonymous (even if it's only per session) and 2) develop a method to identify bots vs. normal people by establishing patterns in the way they use your system. You can't say that the latter is impossible, because I have done it before. While my implementations were for tracking video game bots, I would think that those algorithms for identifying a bot vs. a user can be generalized to web site visits. If you reduce the traffic that the bots consume, you reduce the load on your system. Mind you, this still does not prevent DoS attacks, but it does reduce the amount of strain a bot puts on the system.











A few users will be asked to jump through hoops. A few users will be unable to get the special offers.





No matter what, you will have to do some IP-based throttling to thwart the 'bot slamming'. Since it seems important to you to allow unauthenticated (non-logged-in) visitors to get the special offers, you only have IPs to go by initially, and although they're not perfect, they do work against single-IP bots. Botnets are a different beast, but I'll come back to those. For now, we will do some simple throttling to beat rapid-fire single-IP bots. The performance hit is negligible if you run the IP check before all other processing, use a proxy server for the throttling logic, and store the IPs in a memcached lookup-optimized tree structure.
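A rough sketch of the throttling check, assuming a sliding window per IP. The in-memory dict here only stands in for the memcached store mentioned above, and the class name and the limits are placeholders, not recommendations.

```python
import time
from collections import defaultdict, deque
from typing import Optional


class IpThrottle:
    """Allow at most max_hits requests per IP within any window_seconds span."""

    def __init__(self, max_hits: int = 5, window_seconds: float = 10.0):
        self.max_hits = max_hits
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # ip -> timestamps of accepted requests

    def allow(self, ip: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        hits = self._hits[ip]
        # Forget requests that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_hits:
            return False  # rapid-fire client: reject before any other processing
        hits.append(now)
        return True
```

The proxy would call `allow(ip)` first thing on every request and short-circuit with an error page when it returns False.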


With rapid-fire single-IP bots throttled, we still have to address slow single-IP bots, i.e. bots that are specifically tweaked to 'fly under the radar' by spacing requests slightly further apart than the throttling prevents. To instantly render slow single-IP bots useless, simply use the strategy suggested by abelenky: serve 10-minute-old cached pages to all IPs that have been spotted in the last 24 hours (or so). That way, every IP gets one 'chance' per day/hour/week (depending on the period you choose), and there will be no visible annoyance to real users who are just hitting 'reload', except that they don't win the offer. The beauty of this measure is that it also thwarts 'alarm bots', as long as they don't originate from a botnet. (I know you would probably prefer it if real users were allowed to refresh over and over, but there is no way to tell a refresh-spamming human from a request-spamming bot without a CAPTCHA or similar.)
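The 'one chance per period' rule could look something like this. It is only a sketch: the class name, the "fresh"/"stale" return values and the in-memory dict are assumptions of mine, and a real setup would keep the timestamps in a shared cache.

```python
SEEN_TTL_SECONDS = 24 * 60 * 60  # how long an IP counts as "recently seen"


class FrontPageServer:
    def __init__(self, seen_ttl: float = SEEN_TTL_SECONDS):
        self.seen_ttl = seen_ttl
        self._last_fresh = {}  # ip -> time it was last given a fresh page

    def choose_page(self, ip: str, now: float) -> str:
        """Return "fresh" for the first visit in the period, else "stale"."""
        last = self._last_fresh.get(ip)
        if last is not None and now - last < self.seen_ttl:
            return "stale"  # serve the 10-minute-old cached copy
        self._last_fresh[ip] = now
        return "fresh"
```

Each IP gets one fresh page per period. Every other hit within the period gets the old cached copy and never sees the new offer in time.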


You are right that CAPTCHAs hurt the user experience and should be avoided. However, in _one_ situation they can be your best friend: If you've designed a very restrictive system to thwart bots, that - because of its restrictiveness - also catches a number of false positives; then a CAPTCHA served as a last resort will allow those real users who get caught to slip by your throttling (thus avoiding annoying DoS situations). The sweet spot, of course, is when ALL the bots get caught in your net, while extremely few real users get bothered by the CAPTCHA. If you, when serving up the 10-minute-old cached pages, also offer an alternative, optional, CAPTCHA-verified 'front page refresher', then humans who really want to keep refreshing, can still do so without getting the old cached page, but at the cost of having to solve a CAPTCHA for each refresh. That is an annoyance, but an optional one just for the die-hard users, who tend to be more forgiving because they know they're gaming the system to improve their chances, and that improved chances don't come free.


Christopher Mahan had an idea that I rather liked, but I would put a different spin on it. Every time you are preparing a new offer, prepare two other 'offers' as well that no human would pick, like a 12mm wingnut for $20. When the offer appears on the front page, put all three 'offers' in the same picture, with numbers corresponding to each offer. When the user/bot actually goes on to order the item, they will have to pick (with a radio button) which offer they want, and since most bots would merely be guessing, in two out of three cases the bots would be buying worthless junk. Naturally, this doesn't address 'alarm bots', and there is a (slim) chance that someone could build a bot that was able to pick the correct item. However, the risk of accidentally buying junk should make scripters turn away entirely from fully automated bots.
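For what it's worth, the numbering step could be as small as this sketch. The function names are invented, and the picture that shows the numbers to humans is left out entirely.

```python
import random


def make_offer_slots(real_offer, decoys, rng=random):
    """Shuffle the real offer in with the decoys and number the slots from 1."""
    slots = [real_offer] + list(decoys)
    rng.shuffle(slots)
    return {number: item for number, item in enumerate(slots, start=1)}


def item_for_choice(slots, chosen_number):
    """Return whatever the buyer actually picked with the radio button."""
    return slots[chosen_number]
```

With two decoys, a bot that picks a radio button blindly ends up with junk in two out of three orders, since the slot order changes with every offer.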



Okay... I've now spent most of my evening thinking about this, trying different approaches... global delays... cookie-based tokens... queued serving... 'stranger throttling'... And it just doesn't work. It doesn't. I realized the main reason why you hadn't accepted any answer yet was that no one had proposed a way to thwart a distributed/zombie net/botnet attack, so I really wanted to crack it. I believe I cracked the botnet problem for authentication in a different thread, so I had high hopes for your problem as well. But my approach doesn't translate to this. You only have IPs to go by, and a large enough botnet doesn't reveal itself in any analysis based on IP addresses.




Do not attempt to block bots from using your site. Do not go for the immediate fix; play the long game.



This lets you record the rate at which customers are buying stuff.


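For example, have a 3-hour window starting at some obscure time of day (midnight?). Only bots and shut-ins will constantly refresh a page for 3 hours just to get an order in within seconds. Never vary the base time, only the size of the window.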
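Picking the moment inside the window is a one-liner. A sketch, assuming a uniformly random offset from the fixed base time; the function name and parameters are illustrative only.

```python
import random
from datetime import datetime, timedelta


def pick_sale_time(base: datetime, window_hours: float, rng=random) -> datetime:
    """Pick a random moment between base and base + window_hours."""
    offset_seconds = rng.uniform(0, window_hours * 3600)
    return base + timedelta(seconds=offset_seconds)
```

For example, `pick_sale_time(datetime(2009, 1, 1, 0, 0), 3)` returns some moment between midnight and 3 a.m. that day. The base stays fixed and only `window_hours` changes from sale to sale.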








Your order has been placed and is in a queue. Your order has been processed. Your order has been shipped.

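The user thinks they are in a fair queue. Process the queue every hour so that normal users also experience the queue and nothing looks suspicious. Don't process any orders from the bot and shut-in accounts until they've been in the queue for longer than the 'average human order time + x hours'. This effectively reduces the bots' impact on humans.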
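The hourly batch could be selected roughly like this. It is a sketch under assumptions: the `suspected_bot` flag is hypothetical and would come from whatever analysis marks an account as a bot or shut-in, and all times are in seconds since the sale opened.

```python
from dataclasses import dataclass


@dataclass
class Order:
    account: str
    placed_at: float      # seconds since the sale opened
    suspected_bot: bool   # hypothetical flag set by earlier analysis


def ready_orders(queue, now, avg_human_order_time, extra_delay):
    """Return the orders that may be processed in this batch, oldest first."""
    hold = avg_human_order_time + extra_delay
    ready = [
        o for o in queue
        # Suspected bots wait in the queue until the hold time has passed.
        if not o.suspected_bot or now - o.placed_at >= hold
    ]
    return sorted(ready, key=lambda o: o.placed_at)
```

Everyone sees the same "in a queue" status. The only difference is which batch an order lands in.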











Actually, best practice seems to be to use two hidden fields, one with an initial value, and one without. It's the rare bot which can ignore both fields. Check for one field to be blank, and the other to have the initial value. And hide them using CSS, not by making them "hidden" fields:

    .important { display: none; }

    Please don't change the next two fields.

Bots tend to like fields with names like 'address'. The text in the paragraph is for those few rare human beings who have a non-CSS capable browser. If you're not worried about them, you can leave it out.

In the logic for processing the form, you'd do something like:

    if (address2 == "xyzzy" and address3 == "") {
        /* OK to send */
    } else {
        /* probably have a bot */
    }
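The same check as a small runnable sketch in Python, assuming the submitted form arrives as a dict of field names to strings. The field names and the "xyzzy" value come from the snippet above; the function name is made up.

```python
HONEYPOT_INITIAL = "xyzzy"  # initial value rendered into the address2 field


def looks_like_bot(form: dict) -> bool:
    """Flag a submission if either CSS-hidden field was touched."""
    prefilled = form.get("address2")
    # Assumption: a missing address3 is treated the same as a blank one.
    blank = form.get("address3", "")
    return not (prefilled == HONEYPOT_INITIAL and blank == "")
```

A browser with CSS leaves both fields alone, so `looks_like_bot` returns False. A bot that fills in every 'address' field, or clears the prefilled one, trips the check.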