I've accepted an answer, but sadly, I believe we're stuck with our original worst-case scenario: CAPTCHA everyone on purchase attempts of the crap. Short explanation: caching / web farms make it impossible to track hits, and any workaround (sending a non-cached web beacon, writing to a unified table, etc.) slows the site down more than the bots would. There is likely some pricey hardware from Cisco or the like that can help at a high level, but it's hard to justify the cost if CAPTCHA-ing everyone is an alternative. I'll attempt a fuller explanation later, as well as cleaning this up for future searchers (though others are welcome to try, as it's community wiki).


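This is about the crap sales on woot.com. I'm the president of Woot Workshop, the subsidiary of Woot that does the design, writes the product descriptions, podcasts, and blog posts, and moderates the forums. I work with CSS/HTML and am only barely familiar with other technologies. I work closely with the developers and have talked through all of the answers here (and many other ideas we've had).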






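So we're back to scanning IPs, which a) is fairly useless in this age of cloud networking and spambot zombies and b) catches too many innocents, given the number of businesses that come from a single IP address (not to mention the issues with non-static-IP ISPs and the potential performance hit of trying to track this).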












Your site is hammered by non-humans, slowing everyone down. The scripters end up 'winning' the product, causing the regulars to feel cheated.


The user experience sucks for humans, as they have to decipher a CAPTCHA, pick out the cat, or solve a math problem. If the perceived benefit is high enough, and the crowd large enough, some group will find their way around any tweak, leading to an arms race. (This is especially true the simpler the tweak is; a hidden 'comments' form, re-arranged form elements, mislabeled elements, and hidden 'gotcha' text will all work once and then need to be changed to fight bots targeting this specific form.) Even if the scripters can't 'solve' your tweak, it doesn't prevent them from slamming your front page and then sounding an alarm for the scripter to fill out the order manually. Given that they get the advantage from solving [a], they will likely still win [b], since they'll be the first humans reaching the order page. Additionally, 1. still happens, causing server errors and decreased performance for everyone.




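Sell the item to non-scripting humans. Keep the site running at a speed not slowed down by bots. Don't make the 'normal' users complete any tasks to prove they're human.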


Well, nobody knows you're a bot either. There's no programmatic way to tell whether or not there's a human on the other end of the connection without requiring the person to do something. Preventing scripts/bots from doing stuff on the web is the whole reason CAPTCHAs were invented. It's not like this is some new problem that hasn't seen a lot of effort expended on it. If there were a better way to do it, one that didn't involve the hassle to real users that a CAPTCHA does, everyone would be using it already.







The good news is that they only have a limited window of time in which to win the race. And what I don't think they have is an unlimited number of smart people who are on call to reverse engineer your site at the moment you unleash a deal. So if you can make them jump through a specific hoop that is hard for them to figure out, but automatic for your legitimate customers (they won't even know it's there), you can delay their efforts just enough that they get beat by the massive number of real people who are just dying to get your hot deal.

The first step is to make your notion of authentication non-binary, by which I mean that, for any given user, you have a probability assigned to them that they are a real person or a bot. You can use a number of hints to build up this probability, many of which have been discussed already on this thread: suspicious rates of activity, IP addresses, foreign-country geolocation, cookies, etc. My favorite is to just pay attention to the exact version of Windows they are using. More importantly, you can give your long-term customers a clear way to authenticate with strong hints: by engaging with the site, making purchases, contributing to forums, etc. It's not required that you do those things, but if you do then you'll have a slight advantage when it comes time to see special deals.
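To make the non-binary idea concrete, here is a minimal sketch of combining hints into a single bot probability. The hint names and weights are invented for illustration (they are not from any real system); the only real idea is that each hint nudges the log-odds up or down.

```python
import math

# Hypothetical hint weights, expressed as log-odds contributions.
# Positive values make a client look more bot-like, negative ones more human.
HINT_WEIGHTS = {
    "high_request_rate": 2.0,
    "ip_seen_in_abuse": 1.5,
    "no_cookies": 0.8,
    "has_purchase_history": -2.5,
    "forum_contributor": -1.5,
}

def bot_probability(hints, prior=0.5):
    """Combine the observed hints into a probability that the client is a bot."""
    log_odds = math.log(prior / (1 - prior))
    for name in hints:
        # Unknown hints contribute nothing rather than raising an error.
        log_odds += HINT_WEIGHTS.get(name, 0.0)
    return 1 / (1 + math.exp(-log_odds))
```

A long-term customer with `{"has_purchase_history", "forum_contributor"}` ends up well below 0.5, while a cookieless client hitting the page at a high rate ends up well above it. In practice the weights would have to be tuned against real traffic.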

Whenever you are called upon to make an authentication decision, use this probability to make the computer you're talking to do more or less work before you will give them what they want. For example, perhaps some JavaScript on your site requires the client to perform a computationally expensive task in the background, and only when that task completes will you let them know about the special deal. For a regular customer, this can be pretty quick and painless, but for a scammer it means they need a lot more computers to maintain constant coverage (since each computer has to do more work). Then you can use your probability score from above to increase the amount of work they have to do.

To make sure this delay doesn't cause any fairness problems, I'd recommend making it be some kind of encryption task that includes the current time of day from the person's computer. Since the scammer doesn't know what time the deal will start, he can't just make something up, he has to use something close to the real time of day (you can ignore any requests that claim to come in before the deal started). Then you can use these times to adjust the first-come-first-served rule, without the real people ever having to know anything about it.
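A rough sketch of how those claimed times could reorder the queue follows. It assumes the claimed time has already been verified as part of the solved task; the `Submission` type, the `max_skew` tolerance, and the rule rejecting claims later than the arrival time are illustrative assumptions rather than part of the original suggestion.

```python
from dataclasses import dataclass

@dataclass
class Submission:
    user: str
    claimed_time: float   # client clock (seconds), baked into the solved task
    received_time: float  # server clock when the solution arrived

def order_queue(submissions, deal_start, max_skew=2.0):
    """Order users by claimed time, dropping claims that cannot be genuine."""
    valid = [
        s for s in submissions
        # Ignore anything that claims to predate the deal, and (an added
        # assumption) anything claiming a time later than its own arrival.
        if deal_start <= s.claimed_time <= s.received_time + max_skew
    ]
    return [s.user for s in sorted(valid, key=lambda s: s.claimed_time)]
```

With a deal starting at `100.0`, a customer who started at `100.5` but needed a few seconds to finish the work still sorts ahead of a client that started at `101.0` and finished sooner, and a guess of `99.0` is discarded outright.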

The last idea is to change the algorithm required to generate the work whenever you post a new deal (and at random other times). Every time you do that, normal humans will be unaffected, but bots will stop working. They'll have to get a human to get to work on the reverse-engineering, which hopefully will take longer than your deal window. Even better is if you never tell them whether they submitted the right result, so that they don't get any kind of alert that they are doing things wrong. To defeat this solution, they will have to actually automate a real browser (or at least a real JavaScript interpreter), and then you are really jacking up the cost of scamming. Plus, with a real browser, you can use tricks suggested elsewhere in this thread, such as timing the keystrokes of each entry and looking for other suspicious behaviors.

So for anyone who you know you've seen before (a common IP, session, cookie, etc) you have a way to make each request a little more expensive. That means the scammers will want to always present you with your hardest case - a brand-new computer/browser/IP combo that you've never seen before. But by putting some extra work into being able to even know if they have the bot working right, you force them to waste a lot of these precious resources. Although they may really have an infinite number, generating them is not without cost, and again you are driving up the cost part of their ROI equation. Eventually, it'll be more profitable for them to just do what you want :)





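Script kiddies can figure out the AJAX query and automate it, but then it's also very easy to rate-limit requests from the same IP. Since there's no typical way for a standard human user to initiate these requests from a browser, it's evident that high-rate requests to the AJAX URL from the same IP are being initiated by some form of automated system.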
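As an illustration of per-IP rate limiting, here is a small sliding-window limiter. It is a single-process, in-memory sketch; the class name and parameters are made up, and behind a web farm the counters would have to live in shared storage, which is exactly the cost raised at the top of this post.

```python
import time
from collections import defaultdict, deque

class IpRateLimiter:
    """Allow at most max_requests per IP within any window_seconds interval."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window = window_seconds
        self.hits = defaultdict(deque)  # ip -> timestamps of accepted requests

    def allow(self, ip, now=None):
        now = time.monotonic() if now is None else now
        q = self.hits[ip]
        # Forget requests that have aged out of the window.
        while q and now - q[0] >= self.window:
            q.popleft()
        if len(q) >= self.max_requests:
            return False
        q.append(now)
        return True
```

For example, `IpRateLimiter(3, 1.0)` lets three requests per second through from one address and rejects the fourth, while requests from other addresses are unaffected.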







If your main concern is performance degradation, and you're looking at true hammering, then you're actually dealing with a DoS attack, and you should probably try to handle it accordingly. One common approach is to simply drop packets from an IP in the firewall once it exceeds a number of connections per second/minute/etc. For example, the standard Linux firewall, iptables, has a standard match module, 'hashlimit', which can be used to limit connection requests per time unit for each IP address.


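EDIT: As novatrust pointed out, there are still ISPs that don't actually assign individual IPs to their customers, so, effectively, one scripting customer of such an ISP would disable all customers of that ISP.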