Heroku has been added to the list of cloud sites with availability issues. Their write-up makes me nostalgic for my Motorola pager and the IBM AIX systems that always used to wake me up at night.
The primary on-call engineer was paged and quickly escalated the problem to the on-call incident commander who began investigating. The situation very quickly worsened and some nodes became unresponsive, causing applications with DNS records pointing to those IP addresses to become unreachable.
Oh, but wait, this isn’t some old traditional IT environment. This is the cloud. Applications are unreachable? And yet the culprit turned out to be a thoroughly traditional attack: a flood of SYN packets.
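For readers who haven't been paged for one of these since the AIX days: a SYN flood works by filling a listener's backlog of half-open connections with handshakes that never complete. Here is a toy model of that mechanism (the class, names, and backlog size are illustrative, not any real TCP stack):

```python
from collections import deque

class Listener:
    """Toy model of a TCP listen socket's half-open (SYN) backlog."""
    def __init__(self, backlog=128):
        self.backlog = backlog
        self.half_open = deque()   # peers that sent SYN, awaiting final ACK

    def on_syn(self, src):
        if len(self.half_open) >= self.backlog:
            return "dropped"       # backlog full: new SYNs are discarded
        self.half_open.append(src)
        return "syn-ack sent"

    def on_ack(self, src):
        # Handshake completes; the half-open slot is freed.
        self.half_open.remove(src)
        return "established"

listener = Listener(backlog=128)

# The attacker sends spoofed SYNs and never ACKs, so slots are never freed.
for i in range(128):
    listener.on_syn(f"spoofed-{i}")

# A legitimate client now cannot even begin the handshake.
print(listener.on_syn("real-client"))  # dropped
```

Real stacks defend against this with SYN cookies and backlog tuning, but the exhaustion dynamic is the same.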
These attacks affected any customers who had their DNS configured to direct a root domain name (i.e.: mydomain.com, but not www.mydomain.com) at the IP addresses we publish in our documentation. Customers with applications directed via CNAME records at proxy.heroku.com were generally unaffected. Because Heroku.com was also directed at these IP addresses, deploys via git push heroku were similarly affected.
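The distinction Heroku draws comes down to two zone-file patterns. DNS forbids a CNAME at the zone apex, so root domains must be pinned to A records, while subdomains can alias the provider's proxy name. A sketch, using a hypothetical domain and RFC 5737 placeholder addresses rather than Heroku's actual published IPs:

```
; Fragile: apex (root) A records pinned to specific published IPs.
; If those IPs come under attack, mydomain.com is unreachable.
mydomain.com.      300  IN  A      203.0.113.10   ; placeholder IP
mydomain.com.      300  IN  A      203.0.113.11   ; placeholder IP

; More resilient: a subdomain CNAME to the provider's proxy name,
; which the provider can re-point to healthy IPs behind the scenes.
www.mydomain.com.  300  IN  CNAME  proxy.heroku.com.
```

That is why the attack hit apex-record customers but mostly spared the CNAME ones: Heroku could move proxy.heroku.com, but not the A records baked into customers' zones.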
This outage comes just as cloud DDoS mitigation marketing is taking off and encouraging companies to trust their service providers.
VeriSign, which now sells a $35K/year DDoS mitigation service, has just published a scare report claiming that the risk of attack is higher than ever.
Examination of a global sample of websites revealed that when availability problems occur, sites hosting their own DNS (representative of most enterprises today) are much more impacted than those using third-party managed DNS providers.
The CloudFlare blog explains how they would perform traffic analysis on your behalf and then apply rate limits to shorten the impact of DoS attacks.
One of our user’s site was under a denial of service (ddos) attack earlier this week. You can see the surge in traffic (the green line). Then, soon after, you can see the CloudFlare system start to learn from the change in data and start identifying that the new traffic is indeed an attack (red line), rather than a surge in legitimate traffic. As CloudFlare starts to identify the new traffic as an attack, the system starts to block it at our edge nodes around the world.
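The idea CloudFlare describes, learning what normal traffic looks like and then blocking sources that deviate sharply, can be sketched as a simple rate-based detector. This is a hypothetical illustration of the general technique; the function, thresholds, and data are mine, not CloudFlare's actual algorithm:

```python
from collections import Counter

def detect_attackers(windows, multiplier=10):
    """windows: list of Counter({source_ip: request_count}) per time window.
    The earlier windows establish a baseline of requests per source; any
    source exceeding multiplier x that baseline rate is flagged for
    blocking (at CloudFlare's scale, this would happen at edge nodes)."""
    baseline = windows[: len(windows) // 2] or windows[:1]
    total_requests = sum(sum(w.values()) for w in baseline)
    total_sources = sum(len(w) for w in baseline)
    typical = total_requests / max(total_sources, 1)  # avg reqs/source/window

    blocked = set()
    for w in windows:
        for src, reqs in w.items():
            if reqs > multiplier * typical:
                blocked.add(src)   # candidate for blocking at the edge
    return blocked

# Normal traffic: each source makes ~5 requests per window (the green line).
normal = [Counter({f"10.0.0.{i}": 5 for i in range(20)}) for _ in range(4)]
# Attack window: a few sources suddenly make hundreds of requests (red line).
attack = [Counter({"203.0.113.7": 500, "203.0.113.8": 400,
                   **{f"10.0.0.{i}": 5 for i in range(20)}})]

print(sorted(detect_attackers(normal + attack)))
# ['203.0.113.7', '203.0.113.8']
```

The hard part in practice is exactly what the quote hints at: telling an attack apart from a legitimate surge, which a fixed multiplier like this one cannot do on its own.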
I wonder if we should call this the Boa Constrictor defence to DDoS — it looks like a well-fed snake.
There are numerous theoretical advantages to having a service provider respond to an incident on your behalf. The reality, however, may be a very different story. One big question to ask a provider is whether they will “own” the incident, accepting it as their own, once you transfer responsibility. And that raises the further question of whether the incident is in any way a consequence of their architecture or environment.