Making it Right ~ Details and #HootSuite Credit for the #AWS Outage

By Dave Olson | 4 years ago | 35 Comments

Uptime is Everything

For web-centric companies like HootSuite, last Thursday was a worst-case scenario. With major service interruptions to Amazon Web Services, HootSuite was down for approximately 15 hours until our engineers restored service.

In general, we enjoy stellar performance with minimal outages on either HootSuite or Owly (our URL shortener), and we now service over 3 million social networks sending over a million updates per day with almost zero downtime. We have also held strong during very active posting periods, including the Japanese earthquake.

We know how important up-time is for you and truly appreciate the kind words from our users who missed using HootSuite. Further, many of you rely on HootSuite for your business and we take your trust seriously. As such, we’re taking all steps to prevent future mishaps.

What follows are notes about what we’re doing to “make it right” plus a technical breakdown about “what happened.”

Making it Right

Our Terms of Service to users outlines that we’ll provide refunds after a 24-hour outage. While this outage was significantly shorter, we acknowledge users were inconvenienced and we want to make things right.

With this in mind, we are offering a 50 Point credit (value $50) for the HootSuite Social Analytics tool to all users. Redeem your credit by May 13th, 2011 by using coupon code: HOOTREPORT and use the report credit within a month. For Pro and Enterprise customers, we’ll reach out via email with an additional coupon.

Note: After redeeming your coupon code, you’ll see it itemized on your invoice but the total won’t be updated until the billing date.

We are taking steps to increase redundancy of our services and data across multiple geographic regions. This was an unusual outage that is highly unlikely to occur again, but we’ll be even more prepared for future emergencies.

Thanks for your continued support,


Ryan Holmes, CEO HootSuite

And now for technical updates…

What Happened?

It’s important to note that we enjoy a great relationship with Amazon Web Services and HootSuite was able to grow quickly due to their cloud computing offerings. However, technology can fail and, in this case, the cloud zone hosting HootSuite went down (see Amazon Health chart), but we were able to restore service well in advance of the affected zones coming back online.

In brief, HootSuite has backups across multiple availability zones within Amazon’s North Virginia data center. We restored service relatively quickly by rebuilding our infrastructure into a new zone using backups of data.

Whatever happened on Amazon’s end is out of our control but, for your interest, here is a technical post-mortem from HootSuite CTO, Simon Stanlake:

“At approximately 01:00h PDT on Thursday morning (April 21) HootSuite began experiencing issues accessing EBS volumes on several of its AWS hosted instances. Critically, this included our production and slave database servers.

Following Amazon’s recommended best practices, we keep copies of our database across multiple availability zones at their North Virginia data center. Storing across multiple availability zones is meant to keep data always available, since availability zones are engineered to be highly reliable and independent of each other (see Amazon AWS FAQ). However in this case the outage affected EBS volumes on all availability zones, so we were essentially forced to sit and wait while Amazon worked to restore EBS access.

In the late afternoon on Thursday EBS access began returning to 3 out of 4 availability zones in the North Virginia region. However the remaining affected availability zone contained our production database, which was still not responding.

After waiting for several more hours with no sign of improvement we made the decision to cut over to a backup copy of the database in a working zone and re-spun our entire infrastructure there. This brought HootSuite back online at approximately 19:00h PDT but necessitated rolling back any changes made after 22:30h PDT on Tuesday April 19. We hated to make this decision but it turned out to be the best option, as Amazon did not restore service to our production database until 03:57h PDT, Sunday April 24.”
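For readers who want to see the cut-over decision in concrete terms, here is a minimal sketch in Python. It is not HootSuite's actual tooling. The `ZoneBackup` type, the zone labels and the timestamp for the unhealthy zone are hypothetical and only for illustration. The 22:30h backup time and 19:00h restore time come from the post-mortem above. The idea is to ignore zones where EBS is not responding, take the newest backup among the rest, and measure how much data the rollback costs.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class ZoneBackup:
    zone: str               # hypothetical zone label
    ebs_healthy: bool       # is EBS responding in this zone?
    backup_taken: datetime  # time of the newest usable database copy there


def pick_failover(backups: List[ZoneBackup]) -> Optional[ZoneBackup]:
    """Return the healthy zone holding the most recent backup, or None."""
    healthy = [b for b in backups if b.ebs_healthy]
    return max(healthy, key=lambda b: b.backup_taken, default=None)


def rollback_window(backup_taken: datetime, restored_at: datetime) -> timedelta:
    """Changes made inside this span are lost when restoring from the backup."""
    return restored_at - backup_taken


# All times are PDT, kept as naive datetimes for simplicity.
backups = [
    # Illustrative only: a fresher copy that cannot be used because EBS is down.
    ZoneBackup("zone-a", ebs_healthy=False, backup_taken=datetime(2011, 4, 21, 0, 30)),
    # From the post-mortem: backup taken 22:30h on Tuesday April 19.
    ZoneBackup("zone-b", ebs_healthy=True, backup_taken=datetime(2011, 4, 19, 22, 30)),
]

chosen = pick_failover(backups)
window = rollback_window(chosen.backup_taken, datetime(2011, 4, 21, 19, 0))
print(chosen.zone, window)  # zone-b 1 day, 20:30:00
```

The health check comes before the recency check on purpose: a newer copy in an unreachable zone is of no use, which is why the restore accepted a rollback of roughly 44.5 hours.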

Lost in the Tubes

Since we restored from a database backup, there is a period of missing data from about Tuesday 22:37h PDT until about Thursday 19:20h PDT. We are working to resolve these anomalies as accurately and quickly as possible. If you were working with HootSuite during this time, please note the following:

  • Any new users from this period will have their accounts recreated and payment status restored (we will also tidy up duplicates). However, you’ll have to re-add social networks, search streams, draft and scheduled messages etc.
  • [UPDATE] Any existing users who changed their payment status should post a ticket to the Help Desk for assistance restoring their lost changes.
  • Any messages scheduled to send during the outage were not delivered. Any messages that were scheduled during the period mentioned above will need to be re-added.
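If you keep your own record of scheduled messages, the rules above can be sketched as a small check. This is only an illustration and not a HootSuite API: the function name and the return strings are made up, and the constants are the approximate PDT times given in this post.

```python
from datetime import datetime

# Approximate times from this post (PDT, kept as naive datetimes).
LOST_START = datetime(2011, 4, 19, 22, 37)  # data after this was rolled back
OUTAGE_START = datetime(2011, 4, 21, 1, 0)  # EBS issues began
RESTORED = datetime(2011, 4, 21, 19, 20)    # end of the missing-data period


def needs_reentry(created_at, send_at):
    """Return why a scheduled message must be re-added, or None if unaffected."""
    if LOST_START <= created_at <= RESTORED:
        return "created during the lost-data period"
    if OUTAGE_START <= send_at <= RESTORED:
        return "due during the outage and not delivered"
    return None


# A message created Wednesday morning was lost with the rollback,
# even though it was due well after service returned.
print(needs_reentry(datetime(2011, 4, 20, 9, 0), datetime(2011, 4, 25, 9, 0)))
```

The first check takes priority because a message created inside the rolled-back period no longer exists in the restored database, whatever its send time.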

We appreciate your patience and invite you to post a ticket at the Help Desk, if you require assistance with your account.

Keeping you Informed

Finally, this outage provided an opportunity to test our Emergency Messaging Procedure. To keep you informed, we posted updates via multiple Twitter accounts (remember we’re international) using a staging server instance of HootSuite. We also tracked progress on the HootSuite Facebook Page, Help Desk and blog.

Here are a few more articles about the outage:

Thanks to those who noticed our messaging strategy. We’re happy to hear your feedback about how best to keep you informed during unexpected outages.

27 comments
Markus

Thanks, good reaction!

You write about the 50points and 50 more for pro and enterprise.

i have pro, and when i added the coupon i have the 50 there, but not the 100.

Idea or wrong understanding?

best,

markus

Dave Olson

Moved this question to the Help Desk.

John Vasko

We appreciate the gesture for the analytics points but would have preferred a credit to the account for the day's outage as well.

Pam Gilchrist

Thanks big owl for being sensitive to your customers and valuing us. I appreciate the explanation of the issue and taking the extra step to make things right.

You provide a great service that I highly recommend.

Keep Hooting.

Pam

Tracy Sestili

Thank you for going above and beyond in rectifying the situation. I'm an independent consultant and not only rely on HootSuite for my clients but teach HootSuite and sell HootSuite to clients. Thanks to your updates and generous offer I now can continue to promote HootSuite favorably. Thank you!

Jeff Santos

Since the system restoration I have been experiencing problems with the Twitter post scheduling function:

1 - The system keeps alerting me about posts "that could not be sent" although they were actually already sent.

2 - The system has published a post 90 minutes early today and it still appears as "scheduled" on my control panel. Since I post at 15 minute regular intervals for my clients, this is a major upset, as I have to re-schedule the whole day if 1 schedule post fails.

3 - Yesterday night I could only schedule about a dozen posts. After that, the system would simply "swallow" my new posts, meaning they would disappear without even an error message.

4 - Finally, there is a very annoying bug in the scheduled posts stream. After a few entries it simply will not refresh the list, even if click on the "refresh stream" button. Even worse, the "Show More..." link never works. When I click on it, it shows the message "Loading more..." and nothing else happens. So, if I want to check my list of scheduled posts I have to refresh the whole page.

I just hope you solve these issues as quickly as you restored the service after Amazon Cloud went down. For your reference, I am on an Ubuntu Maverick PC running Opera 11.10, Firefox 4.0 and Google Chrome 10.0.648.205. These issues are browser-independent and are also reproducible on a Windows Vista PC. In case you need more details about the reported problems feel free to contact me at the email address supplied with this comment.

Thank you,

JS.

Ken Cook

I haven't been with hootsuite long, but I am very Impressed with the service. It has been well worth the small monthly fee, and would recommend it to anyone. The way this company has handled the outage and aftermath has shown me that Hootsuite is truly a company worth working with for the Long Term.

Thank you for the Transparency.

Dave Olson

Our pleasure Ken and we hope you'll be Hooting for a long time.

Jonathan

Is the 50 dollar credit towards our monthly pro account?

Mike Abasov

Jonathan, no it's for the Social Analytics tools.

Jackie Joyride

When will posting to FaceBook from HootSuite be back online?

Dave Olson

We're working diligently amidst Facebook's API troubles. Please visit the Help Desk for most up-to-date info.

thebananarepublican

i totally agree with what Jon said and ironically, i have not been a pro user for long either. the coupon is much appreciated. can't wait to get the pro email :) it was a rough day thursday, but i flapped my wings a little.

Shalom [+]

thebananarepublican

Josemi

How can we redeem the Social Analytics code?

Dave Olson

Enter in the Coupon Code field when you run your 30-day free trial of Pro.

Emre

Thanks to the hootsuite people

Dave Olson

You are welcome, thanks to our users.

Jon Clayton

I have not been a Hootsuite pro user for very long, but I do enjoy the service and features. I appreciate Hootsuite doing more than is required to please its customers. That is good service as well as smart business. Thanks!

Michelle Long

The link for the HootSuite Social Analytics offered doesn't work.

Dave Olson

We fixed it ;-)

Bob Garrett

Hey Dave

I am having problems accessing my HS Pro account this morning. I am not sure if its the issue - but I applied the 2 outage coupons to my account this morning. However I am now not able to log into my HS account to do anything.

I posted a ticket for your tech folks. I didnt use HS before adding the 2 coupons, so I am not sure if the coupons are the issue.

Thanks Bud