Another Anatomy
April 7, 2008 on 12:23 pm | In Foobars, Hardware, Insider View, New Features by Josh Jones | 122 Comments
Okay, nothing silly this time, I promise…
Some of you may have noticed that we’ve been having a problem that, although maybe not the worst in DreamHost history, is definitely in the top 5.
There has been a DreamHost Status post about it, but it’s been going on so long, there obviously needs to be more said.

The History
The events that conspired to cause this horrible performance for everybody in our “blingy” cluster actually started to take root 19 months ago.
That was when I made this post asking our customers for some suggestions on storage. I made the mistake in that post of mentioning the name of one particular storage vendor who apparently does a search for their name in rss feeds of all kinds of blogs. I won’t mention their name again here, to test if they REALLY read this blog, but they were the one on the list right after “Netapp”.
Anyway, immediately a sales guy from there was hounding me about how great their product was. It would have super-duper reliability, super-duper performance, and super-duper ease-of-management. It was super-duper expensive compared to our current solution (about 3x the price per GB), so in the end I declined.
But, over the next year he kept hounding me and hounding me, and eventually the price came down to something in line with our current costs, so we decided to try one unit for our new cluster, “Blingy”. After we were satisfied with our internal testing, Blingy went live with the new storage solution in December 2007.

Smooth Sailing
At first, everything was fine, performance was great, everybody was hunky and dory. But then, as usage started to go up, the new file system started acting up. Around the same time every night, the system would stop responding to NFS requests for a while, which would immediately break web and mail service for everybody in the entire cluster.. thousands of customers.
Our Bad
Now, it can be a big mistake to put live customers on any new system. But honestly, we’d tested it lots, researched it a ton, and we added people very slowly at first, and it performed great.
Our biggest mistake I believe had nothing to do with what specific vendor or hardware we went with.. it was simply putting so many eggs in one basket!
Even with our Netapps (which are pretty much awesome), there are problems from time-to-time. However, a typical hosting cluster will have a dozen or so Netapps, which means any problems are one twelfth as big.
With Blingy, EVERY customer is on this one “mega” filer, which in theory should make for better performance, reliability, and ease of management. And since we got the clustered solution (in an active-active configuration)… there really is no single point of hardware failure in this thing.
But, as it turns out, there are a lot of non-hardware failures in the world.
Their Bad
Well, the techs at the vendor couldn’t figure out what was causing the NFS freezing, so they recommended we do a major OS upgrade to hopefully fix it.
During this whole time, the fiber channel disks were slowly filling up, and we’d been trying to move large files off to the sata pool (it’s a two-tiered solution, and there’s a feature that automatically moves less-accessed data to lower tiers).. however the thing couldn’t move the data fast enough. It couldn’t finish doing a “move job” in a single day, and every day it’d sort of “crash”, which would screw up the move job, and nothing would get moved.
Also, as the disk kept getting more full, performance kept getting worse, creating a vicious cycle. We ordered some more fiber channel disk shelves at the end of February to grow the main FC volume, since we couldn’t get things off to SATA, and it was supposed to come on March 10th and be installed at the same time as the major OS upgrade.
However, the disks didn’t end up getting installed until March 25th, and at that point it turned out we could NOT grow the FC volume with these disks (well, it was technically possible, but their on-site techs recommended VERY VERY heavily against it.. it would severely impact performance), which was sort of the whole point. So now we had a new FC volume which we still had to migrate users to.

Your Bad
Of course, this whole time, new customers just kept signing up, and being added to Blingy. What were you guys thinking?
By this point we knew this was a bad idea, but we didn’t have a new cluster ready (we’d expected Blingy to grow for another couple of months), and we try to never ever grow old clusters again once they’ve been “shut off” from new signups (because in time they stabilize and have very few problems).
However, the moving people off to the new FC vol, or the original SATA vol, or even the new Netapp we also added to Blingy, just wasn’t happening fast enough. So on April 2nd we bit the bullet and switched Blingy off as the “new customer” cluster and started growing good old “Postal” again. Once we did that, we were finally able to get ahead of the curve and total usage on our first fiber channel volume has been slowly dropping ever since.
We tried at that point to contact the vendor to see if we could just get more drives that WOULD allow us to grow fcvol1, but they said their manufacturers were closed for inventory for a week after the end of the quarter and we couldn’t get anything until Friday, April 11th at the absolute soonest. Later they said they could find us some they could get us by Tuesday, April 7th, and we preliminarily said we’d take them.
This whole time we had a support ticket open with the vendor about the crashes (the OS upgrade didn’t fix it), and finally on April 3rd we received notice that they’d fixed the bug that they believed was causing it! However, the patch still needed to go through their “QA”. Finally, this Sunday April 6th they said it was all ready to be deployed, so last night we did.
What Now
Well, right now, performance is still not great on fcvol1… but mail and web should be pretty much working. One thing we’ve noticed is a website that hasn’t been visited in a long time will have a big lag still upon the first visit.. but then subsequent reloads/visits seem much faster.
At least the total disk usage is coming down now, and hopefully by tomorrow it’ll be below 85% which is supposedly a magic number where performance is fine. We’re going to keep off-loading it until things are great, though. We’ve got plenty of disk space for it, the problem is just it takes so long to move it.
We also I guess will find out tonight if the NFS freezing bug is fixed by this new patch. Hopefully so.

It’s Too Late…
I realize this is probably too little too late for many of you, but I just wanted to sincerely apologize for this whole big Blingy cluster-f*ck. Also, if you’re on Blingy (you can tell from the panel by clicking “account status” and looking at “Your Email Server”), we’d like to offer you a month’s worth of hosting credit.
To get it, all you need to do is contact support from our panel and make the subject of your message “Blingy Account Credit”. That’s all you have to do, and we’ll credit everybody who asks (and is actually on Blingy!) next Monday (April 14th).
Good Reminiscing Friday
March 21, 2008 on 6:09 pm | In Foobars, Insider View, Updates by Josh Jones | 78 Comments
Well, it was a little over two months ago that we had what I think is pretty safe to call the worst disaster in DreamHost history.
In retrospect to me, it’s kind of funny that the worst disaster didn’t turn out to be due to a security breach, a power outage, a loss of data, or actually anything related to our actual hosting service. I guess it shouldn’t be a surprise that people care a lot more about their bank accounts than they do their websites.
I have realized that billing is the one issue where how important we feel it is is completely at odds with how important you guys feel it is.
What I’m trying to say is, we’ve always been ultra-flexible and lax about how people pay, when people pay, or even about giving credits, discounts, or refunds. We figure, whatever, pay us when you’re ready, we’re not sending anybody to collections or ruining anybody’s credit over some measly bandwidth bill.

What we’ve always tried to focus on more (even though it might not seem like it at times!) is our hosting system’s stability, performance, and features.
I guess I’ve always figured that any billing-related error can be easily undone (worst case scenario, it costs us a little money); there is no lasting harm done to the customer. Whereas having a website or email problem could potentially cause permanent damage to somebody’s business or personal life or something?
Well then, let’s go back and see just how little money a worst case scenario actually costs, shall we?
Credits and refunds to cover people’s bank fees: $52,000.
Sigh, if only everybody kept a big cushion of cash in their account! The main damage that can be caused by a billing snafu is for people who get their account overdrawn, and because of that aren’t able to make a critical purchase, or have a check bounce, causing hassles and incurring bank fees. We offered to pay people any amount their bank charged them for going negative, and in the end that total looks like it came to about $52,000.

Accidental refunds: $170,000.
The worst part of this whole process (for us) turned out to be just after the accidental billing, ironically when we were trying to make things right!
If you recall, our system was not actually charging about 75% of the time we thought it did.. and so we refunded thousands of people who were never charged (but, 75% of the refunds didn’t work either). Well, out of all that, and after two months, there are still about 600 accounts who were credited a total of $170,000 in excess of what we charged them that we haven’t been able to get back from them or their bank.
It is slightly annoying when the same guy who complains to the high heavens when he thought he’d been over-charged $9,000 by accident conveniently disappears when we realize that actually, he’s been over-refunded $9,000 by accident.
Extra credit card fees: $82,000.
Another slightly annoying thing is that credit card processors don’t credit you back any fees when you refund a transaction. Overall, the extra credit card processing we did resulted in extra fees of about $350,000! Fortunately, after a whole lot of groveling and explaining the situation (and waiting two months), we finally got all but $82,000 of that back from First Data, American Express, and Discover Card.

Extra support messages: 20,000.
As you may have surmised, people wrote to us about this thing. About 20,000 times… and it would have been tens of thousands more if we hadn’t put up an “emergency block” against new messages for a little while in there.
How much this extra support actually cost (in terms of your wasted time, tech support overtime pay, and other questions taking longer to answer) is hard to say, but normally we only get about 45,000 messages in a whole month!
Accounts canceled: 1000.
It’s also kind of hard to say how many people actually closed their account because of the incident, but in January we did have about 1,000 more accounts closed than average. Assuming each of those accounts would have stayed for maybe another year, that’s another $120,000 down the Intertubes. It’s crazy… from all our power problems back in 2006, we hardly lost any accounts at all.

Goodwill lost: Priceless.
Yeah, it turns out this whole blog post is nothing more than another clichéd MasterCard commercial parody.
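For anyone who wants the numbers above in one place, here is a minimal tally in Python. It only adds up figures already quoted in this post; the dictionary labels are illustrative, and the canceled-accounts line is the post's own rough estimate, not a measured loss.

```python
# Figures quoted in the post, in US dollars.
costs = {
    "bank fee credits and refunds": 52_000,
    "accidental refunds not yet recovered": 170_000,
    "card processing fees not recovered": 82_000,
    "estimated revenue from canceled accounts": 120_000,  # the post's estimate
}

total = sum(costs.values())

# Of roughly $350,000 in extra processing fees, all but $82,000 came back.
fees_recovered = 350_000 - 82_000

# 1,000 extra cancellations at $120,000 implies this much per account per year.
per_account_per_year = 120_000 // 1_000

print(total, fees_recovered, per_account_per_year)
```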
P.S. I guess it’s nice to know, less than two hours away from our biggest data center move ever, that we’ll cause a tiny fraction of the disruption to our customers that one unexpected fat finger did!
P.P.S. Thanks RIM, for scheduling a blackberry outage exactly at the same time. It makes us look better. And, maybe some of our Happy Customers will blame their lack of email tonight on you!
The Final Update
January 17, 2008 on 12:52 pm | In Foobars, Updates by Josh Jones | 427 Comments
Okay, all the people who had still not gotten their refunds was starting to seem a little weird, so after further investigation yesterday, I think we’ve finally got things completely fixed.
It turns out, there was a glitch in our new PayflowPro.pm that resulted in only the first transaction in a single second actually going through! According to Paypal’s site, that PayflowPro.pm should be just a drop-in replacement for the old PFProAPI.pm… and it did seem to be, after changing two lines everything seemed okay.
However, there was one little difference. The new HTTPS interface requires you to pass a unique id for each transaction, and PayflowPro.pm generated that unique id as follows:
my $request_id=substr(time . $data->{TRXTYPE} . $data->{INVNUM},0,32);
The problem was, we never passed in the (optional) “INVNUM” field.. we had an invoice number, but we passed it in as the (also optional) “COMMENT1”. So, our “unique” request_id was pretty much just the current time (plus whether it was a sale or a credit)!
In my testing this didn’t fail, because I didn’t run multiple transactions in the same second. Also, they apparently still return the same old success code we test for when this happens! But when multiple biller services run in parallel on all our controllers, lots of transactions end up happening on the same second.
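The collision is easy to reproduce. Below is a minimal sketch in Python (not the Perl of the real module) that mimics the ID scheme quoted above. The function name, the invoice values, and the use of "S" as the sale transaction type are assumptions made for illustration.

```python
import time


def request_id(now, data):
    """Mimic the quoted scheme: time + TRXTYPE + INVNUM, cut to 32 characters.

    A missing field contributes an empty string, as an undefined value
    would when concatenated in the original Perl.
    """
    return (str(now) + data.get("TRXTYPE", "") + data.get("INVNUM", ""))[:32]


now = int(time.time())  # whole seconds, like Perl's time()

# Two different sales in the same second, invoice passed only as COMMENT1.
a = request_id(now, {"TRXTYPE": "S", "COMMENT1": "invoice-1001"})
b = request_id(now, {"TRXTYPE": "S", "COMMENT1": "invoice-1002"})
collides = a == b

# The same two sales with the invoice passed as INVNUM instead.
c = request_id(now, {"TRXTYPE": "S", "INVNUM": "invoice-1001"})
d = request_id(now, {"TRXTYPE": "S", "INVNUM": "invoice-1002"})
distinct = c != d

print(collides, distinct)
```

With COMMENT1 the two IDs are identical, so only one of the two transactions would be treated as new. With INVNUM the IDs differ even within the same second.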
The Upside
It turns out that of the (actually closer to) $9,600,000 we thought we mistakenly charged, only about 1/4 ever _actually_ hit people’s credit cards. Our system thought we charged them, and they received an email receipt, but that was where it ended. It turns out we actually billed “only” about $2,100,000 incorrectly.
The Downside
This bug still existed until late last night (around 4am).. so when we ran our super-refunder script, the same thing was happening. Only about 1/4 of the refunds successfully went through. This resulted in the following situation:
About 9/16th of our customers: weren’t actually billed OR actually refunded.
About 1/16th of our customers: were billed AND were refunded.
About 3/16th of our customers: were billed BUT WEREN’T refunded.
About 3/16th of our customers: weren’t billed BUT WERE refunded. (of course, nobody wrote in about it!)
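Those sixteenths follow from the 1/4 success rate, if you assume each charge and each refund went through independently with that same chance. A minimal Python sketch of the arithmetic follows; the independence assumption is not stated in the post, and the labels are illustrative.

```python
from fractions import Fraction

p = Fraction(1, 4)  # assumed chance that any single transaction went through
q = 1 - p           # chance that it silently did not

outcomes = {
    "not billed, not refunded": q * q,
    "billed and refunded": p * p,
    "billed, not refunded": p * q,
    "not billed, refunded": q * p,
}

for label, share in outcomes.items():
    print(f"{label}: {share}")
```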
Anyway, last night we fixed the bug (by passing our invoice in as INVNUM) and re-ran another fixer that took an actual log of successful transactions downloaded from our processor and cross-referenced everything with our system. This is what it did:
About 9/16th of our customers: marked their bill and refund as $0 amount.
About 1/16th of our customers: left everything alone.
About 3/16th of our customers: redid the refund.
About 3/16th of our customers: redid the charge.
Double checking now, there were no more of those glitches from before, so everything seems okay.
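The fixer's decision table can be sketched in a few lines of Python. This is only an illustration of the four cases listed above, assuming the processor's log can be reduced to a set of (account, kind) pairs; the account names and data shape are hypothetical, and the real script is not shown in this post.

```python
def fix_action(charge_went_through, refund_went_through):
    """Pick the correction for one account, based on the processor's log."""
    if charge_went_through and refund_went_through:
        return "leave alone"
    if charge_went_through:
        return "redo refund"
    if refund_went_through:
        return "redo charge"
    return "mark bill and refund as $0"


# Hypothetical processor log: transactions that really succeeded.
processor_log = {
    ("acct-1", "charge"), ("acct-1", "refund"),
    ("acct-2", "charge"),
    ("acct-3", "refund"),
}

plan = {
    acct: fix_action((acct, "charge") in processor_log,
                     (acct, "refund") in processor_log)
    for acct in ("acct-1", "acct-2", "acct-3", "acct-4")
}
print(plan)
```

The point of working from the processor's log rather than the internal records is that the internal records were exactly what the bug had made unreliable.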
Once again, all the stuff mentioned in the last post still holds true (you may not see the correction on your statement yet, but if you call your processor they should see it coming, for REALs this time), and once again, I’m very sorry about this whole fiasco.
Sincerely,
Josh Jones
P.S. For people wondering how the “robust and stable” rebiller could have created multiple future charges for the same date… I guess I meant “robust and stable” in regards to normal use over the last ten years. It looks like in this case, when multiple instances were running in parallel on a future date, race conditions allowed some multiple charges for the same period to be created. That too should never happen again now that we don’t allow future bill dates.
The Aftermath
January 16, 2008 on 4:35 pm | In Foobars, Updates by Josh Jones | 342 Comments
It seems like it’s about time for a follow-up on things from yesterday.
First, I just want to apologize for the regular-style blog post about it yesterday. Hopefully this will be the (picture, bold, and italics-free) blog post many of you would have liked to have seen yesterday.
The current status: we believe we have refunded everybody who was incorrectly billed at this point. This was pretty much finished yesterday at 3pm, but there were a few stragglers who we got today. If you were charged and haven’t seen the refund show up on your credit card / bank statement yet, try calling your bank. Lots of places take a day or two or three or even four to update their statements even if the money’s already back in, but they should see it (by tomorrow for sure) if you call them.
If this/these erroneous charge(s) by us resulted in you having any sort of overdraft/bounced check/nsf fee from your financial institution, please contact our support team from the web panel. We’d just like to request that you include a copy of your statement with the necessary info showing the fees. It can be either a paper statement or a print out of your online statement, or even a screenshot of your online statement and it can be scanned and attached to your support message via our support form or faxed to us at 714-990-2600. If you fax it, please be sure to write your domain name or DreamHost account number on the fax. When we get this, we will put money on your credit card equal to the amount your bank charged you, as well as give you a DreamHost account credit for the same amount on top of that.
Another thing… if you’ve decided because of this fiasco you’d like to cancel hosting with us, we will allow you to get a full credit card refund of any unused portion of your pre-paid contract, even if you’re past our standard 97 day money-back guarantee. To do so, just close your account as normal from our web panel (“Billing > Manage Account” area). Then, after it’s done, write into support and let them know you’d like to get your remaining account credit refunded to your credit card due to the billing snafu of January 15th and we’ll be happy to comply.
Checks to Protect Your Balances
Finally, here are the precautions we’ve now added to our billing system to make sure nothing like this ever happens again:
1. Our biller service will no longer accept a date in the future.
2. This whole time, we did have an option to specify “never automatically bill me more than $X in a day” on our web panel. Of course, not too many people had this set, and why would they have to? Nevertheless, we’ve made a change now that even if you don’t have a specific daily limit set our system will not allow billing you in one day more than 50% more than the most you’ve ever authorized in the past.
3. Our rebiller does an automatic filling-in of old charges when it finds some missing. This should never actually happen anyway, but we’ve added a new check that if it ever finds itself filling in more than 3 missing charges on any account it stops immediately and notifies our financial team.
4. We’ve also added an overall check where if the total number of payments in a day is more than double the average number of payments we’ve gotten on that calendar day for the last seven months it fails and notifies our financial team.
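The four checks above can be sketched as small predicates. This is a minimal Python illustration, not the actual biller code; every function name is hypothetical, amounts are assumed to be stored as integer cents, and notifying the financial team is left out.

```python
from datetime import date

MAX_BACKFILLED_CHARGES = 3


def run_date_allowed(run_date, today):
    """Check 1: refuse any billing date later than today."""
    return run_date <= today


def daily_cap_cents(explicit_limit_cents, max_authorized_cents):
    """Check 2: a customer-set limit wins; otherwise cap the day at
    150% of the largest amount the customer has ever authorized."""
    if explicit_limit_cents is not None:
        return explicit_limit_cents
    return max_authorized_cents * 3 // 2


def backfill_allowed(missing_charges):
    """Check 3: stop once more than 3 missing charges would be filled in."""
    return missing_charges <= MAX_BACKFILLED_CHARGES


def volume_allowed(payments_today, average_for_calendar_day):
    """Check 4: fail when the day's payment count is more than double
    the average for that calendar day."""
    return payments_today <= 2 * average_for_calendar_day


# The run that caused the trouble would now be rejected by check 1.
print(run_date_allowed(date(2008, 12, 31), today=date(2008, 1, 14)))
```

Any one of these would have stopped the January 14th run on its own; stacking them means a future mistake has to get past all four.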
And that’s it.. I hope this puts things more or less behind us. And remember, if you have any specific issues, our support team is always there!
And of course, my sincere apologies for all of this.
Thanks,
Josh Jones
P.S. I apologize for that joke about the triple billing in the newsletter thing too, but you have to admit, it was kind of ironic that I actually did screw up billing less than a week later.
P.P.S. Some of you have attempted to email us directly with information about unresolved issues stemming from this billing fiasco and have received autoresponders telling you you can’t email us directly. That restriction was unintentional and has now been removed, so please re-send us your email if you have not already contacted us through other means.
Um, Whoops.
January 15, 2008 on 9:52 am | In Foobars, Insider View, Musings by Josh Jones | 667 Comments
Hello.. how’s your morning going?
I hope it’s been a little better than mine.
We had a teensy eensy weensy little billing error last night… my first clue something was up was when I saw this morning’s daily billing report (so far): $7,500,000.
It turns out due to my excessively fat fingers, nearly every one of our customers has been seriously over-billed in the last 12 hours.
I bet when you read this part of the last newsletter:
4. New Office!
Another important thing I’ve been doing instead of writing newsletters
is looking out the window of our NEW OFFICE:http://blog.dreamhost.com/2007/12/21/were-so-high-right-now-you-dont-even-know
If your next web hosting bill from us is mysteriously tripled, now you
know why.
.. you thought it was a joke!
Ha, the joke is on you! I guess. Um, okay, no, not really, I’m sorry.
How on earth could something like this happen?
Let Me Explain
A couple of weeks ago, just around new years, we started beefing up some of our internal “controller” servers. These are the machines that run all of our “behind-the-scenes” services; things from adding a user to registering a domain to configuring apaches to rebilling customers.
I was on a little-bit-too-long vacation, but when I got back, I noticed our daily credit card payments seemed a tad low in the new year.
So, late last week I tried re-running the billing services for all the days back three weeks or so. I knew this was safe, because after 10 years, the one thing you DO get perfect is your billing system. Our biller is pretty bug-free and robust at this point, because we’d be broke and eating bugs if it weren’t.
In fact, it’s so robust you can just run it on any day you want, and it’s safe. It won’t double-charge people and it’ll even automatically find any missing charges and catch everything up to the day you said.
Anyway, I ran it, and things were fine.. and sure enough, it caught a lot of missed payments. I didn’t have time to look into it right then, but I made a note to myself to check up on it on Monday (yesterday) and see if things were fine or still messed up.

Come Monday
Monday came. I checked the reports and sure enough, things were still pretty low. So I looked at the logs for some of the biller services, and I noticed they were only failing on the machines that had been recently upgraded!
That explained why we were getting some money still (since not all the controllers have been upgraded yet), but not all of it.
Anyway, it turned out there was no 64 bit version of the PFProAPI module we use to interface to the credit card transaction server. No big deal, there’s a new module that interfaces with their new and preferred https interface, and it was only a couple of lines of code to change to get us switched over!
So anyway, I made the change, and it worked, and I even tested it, and things were fine!
But then… late last night, I realized: when I re-ran those biller services last week, they must not have fixed everybody then either! It’s just that by running it again I randomly got different people being charged on the working controllers who had been assigned an upgraded (and therefore broken) one before.
So why not just run it all one more time?
Sure, it should be no problem! So I did, manually running the biller (which is normally automatically scheduled) for 2008-01-14, 2008-01-13, 2008-01-12, 2008-01-11, 2008-01-10, 2008-01-09, 2008-01-08, 2008-01-07, 2008-01-06, 2008-01-05, 2008-01-04, 2008-01-03, 2008-01-02, and 2008-01-01.
I probably should have just stopped there. But then I thought better. I thought to myself, “When did we start upgrading these controllers anyway?”
I couldn’t remember. But, since the biller is super-safe and robust anyway, I went ahead and ran it for 2008-12-31, 2008-12-30, 2008-12-29, 2008-12-28, 2008-12-27, 2008-12-26, and 2008-12-25, just for the hell of it.
Notice Anything?
Don’t feel bad if you didn’t. I kind of missed it myself.
THOSE SHOULD HAVE BEEN 2007!!
Heh, uh.. um, er.. my bad?
So what happened?
Well, that super-robust and stable biller did what it was programmed to do, it ran as though today was December 31st, 2008!
And what did it see? Well, it saw a whole lot of accounts (essentially all of them) who for some unknown, mysterious reason hadn’t been charged at all for eleven and a half months!
So off it went, busily through the night, “fixing” everything up for “today”, December 31st, 2008.
Really, it’s sort of amazing this never happened before in the last ten years.
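To see why one wrong digit mattered so much, here is a minimal Python sketch of a catch-up calculation. It assumes a hypothetical account billed monthly and last charged on January 10th, 2008; the real biller's logic is not shown in this post, so the function below is only an illustration.

```python
from datetime import date


def months_owed(last_charged, as_of):
    """Whole monthly charges missing between last_charged and as_of."""
    months = (as_of.year - last_charged.year) * 12 + (as_of.month - last_charged.month)
    if as_of.day < last_charged.day:
        months -= 1  # the current month's charge is not due yet
    return max(months, 0)


last_charged = date(2008, 1, 10)  # hypothetical account

intended = months_owed(last_charged, date(2007, 12, 31))  # the date that was meant
typed = months_owed(last_charged, date(2008, 12, 31))     # the date that was typed

print(intended, typed)
```

Run as of the intended date, nothing is owed. Run as of the typed date, the same account looks eleven charges behind, and a catch-up biller dutifully fills them all in.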

There IS a bug here.
I can imagine the half second or so of thought that sprinted through the programmer’s mind when he was adding the ability to allow you to pass in what day to run the biller as though today is:
Hmm.. well, I could see us POSSIBLY wanting to be able to bill for a future date.
Well guess what… NO! We will NEVER want to rebill as though today were a day that hasn’t happened yet! But instead, somebody along the line (Sage? Me? Somebody else?) figured, “What’s the harm in keeping it flexible?”
About $7,500,000 in harm, that’s what!
The serious part.
The end to this story is that of course, I’m very very sorry, we’re very very sorry, and I’m sure you’re very very sorry this happened. I really am. I understand the sort of problems that an unexpected large charge to your credit card (or worse yet, your debit card) can cause. If the tone of this blog post seemed a little light, I apologize; I don’t mean to offend and I realize how serious an issue this is. I’ve been up since 3:50am trying to undo the damage and maybe I’m a little shell-shocked.
A new service is running right now (in parallel on all the controllers) that fixes all those future charges, re-enables your account if it was erroneously suspended, and if your credit card was automatically rebilled, refunds the payment automatically. You don’t have to contact us or your bank, and you’ll get an email when your account is finished fixing up. It’s going to take several more hours to complete. There are (or were, after this incident) a lot of you these days!
If, because of this billing mistake, you somehow incurred some fees from your bank or credit card company, please let us know after tomorrow (today we are just replying to all 10,000+ billing messages with a generic explanation) and we’ll do our best to make it right for you.
And of course, the biller no longer allows dates in the future.
The moral of this story is that “flexibility” is rarely desired in programming! The less a program will accept/the less a program will do/the fewer options and preferences it has, the more usable it is/the more understandable it is/the more stable it is.
Tough Love

When designing a program, you’ve got to make some tough decisions .. and when you really can’t decide if this is something your users will need someday, err on the side of leaving it out.
Otherwise, your users will someday err on the side of your face.
Schadenfreude
July 24, 2007 on 11:30 pm | In Business, Foobars, Insider View, Musings, Tech News by Josh Jones | 20 Comments
Almost exactly a year ago today, DreamHost experienced its last unplanned power outage.
Last ever?
Last ever so far! Who knows what the future holds? (Besides me.)
But for now, I’m just glad the present has been a little better for DreamHost customers than for 365 Main’s!
Because in case you hadn’t heard or noticed, power outages in San Francisco today caused downtime at Craigslist, Technorati, TypePad, LiveJournal, Yelp, RedEnvelope, and more!

Who here is glad DreamHost is in sunny, safe, earthquake, mudslide, forest fire, riot, tsunami-free, Los Angeles now? And who here is publicly enjoying that 365 Main is not?
Here’s a big hint: he’s really good looking and wrote this post.
Of course, the real reason we had no problems is not because our data center is finally super reliable, or that Los Angeles itself never has so much as a cloudy day, or even that we’re just lucky.
It’s because I am in Chicago at HostingCon and so am temporarily unable to break anything.
Of course, that’s not really true either. I’m not in Chicago; as everyone knows, I’m a compulsive liar. In fact, this statement is a lie.
But, even if I was at hosting con (and everybody knows we don’t go to hosting conventions), my ability to break DreamHost systems knows no boundary of time or space, and strikes at any time, usually without warning and definitely without mercy.
Why were we spared this time?
The honest truth is that any data center can, at any time and for any reason, no matter what precautions they take, have an outage! You’d think making a reliable data center would be a lot easier than making a reliable software service, seeing as how it’s all just power cables, air conditioning, and gasoline.
And yet somehow, it seems like all even the best and most expensive data centers can do is make the outages a little less frequent.

What IS a poor host to do?
Nothing, really.
I mean, the only way you can really achieve “five nines” uptime is by having an entire architecture designed around the assumption that ANYTHING can fail… and at the worst possible time. Duh.
However, like most Las Vegas escorts, that sort of redundancy does not come easily. Or cheap. And the truth of the matter is unless you’re Google, most likely an entire day of downtime once a year is not going to cost you as much as it would to truly prevent it.
In fact, I wish there were some low-reliability data centers out there! I bet if somebody made an ultra low-cost data center, one that provided “adequate” cooling, network, and power capacity, but no UPS, fire-suppression, generators, crazy physical security, or extra earthquake protection, they would clean UP.
They could probably charge around half of any data center I’ve ever seen, and I bet with only twice the downtime… and that would be very appealing.
I mean, think about it… how many of you could deal with an extra day of downtime per year for half the price? Heck, you’d probably be fine with FOUR days of downtime a year if it meant 75% off.. but would you pay double to save 12 hours of downtime a year? Would you pay FOUR times as much to save 18? Eight times as much to save 21?
That’s pretty much how it works, and I’m guessing not a lot of you would.
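The numbers in that thought experiment all fit one simple rule: every doubling of price halves the downtime. A minimal Python sketch, assuming a baseline of one day of downtime per year at the normal price (the baseline and the function name are assumptions, not measured figures):

```python
BASELINE_DOWNTIME_HOURS = 24  # assumed: one day of downtime a year at 1x price


def downtime_hours(price_multiplier):
    """Yearly downtime if each doubling of price halves the downtime."""
    return BASELINE_DOWNTIME_HOURS / price_multiplier


for multiplier in (0.25, 0.5, 1, 2, 4, 8):
    hours = downtime_hours(multiplier)
    saved = BASELINE_DOWNTIME_HOURS - hours
    print(f"{multiplier}x price: {hours:g} hours down, {saved:g} hours saved vs. normal")
```

Under this rule, paying double saves 12 hours, four times as much saves 18, and eight times as much saves 21, which is why each extra dollar buys less and less.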
Of course, maybe I’m over-estimating the cost savings of skimping on redundancy in a data center a little, and maybe I’m under-estimating the reliability hit a tiny bit. On the other hand, my blog posts have never been wrong before.

AND, if somebody did come out with a “Crap-of-the-Art” data center, it’d make it a lot more feasible for those who really need reliability to get two; thereby keeping all their company’s eggs out of one risqué basket.
In fact, what we’ve been doing over the last year is breaking our system down into smaller and smaller isolated “clusters,” and distributing them between three data centers (all in LA). The idea being, data centers will go down.. let’s at least try and keep the eggs in our other baskets un-scrambled. And since we’re not really counting on much reliability from them anyway, it sure would be nice if those data centers all charged a lot less!
Of course our network still has a single (though redundant) point of failure, but we are working towards eventually making each data center a complete stand-alone “node”… some day.
This day, however, I think I’ll just go to bed… while taking pleasure in the fact that it was somebody else this time!
Super Lame Apology
February 28, 2007 on 10:14 pm | In Business, Foobars, Insider View, Musings, Rants, Updates by Josh Jones | 154 Comments
We are all really bearry sorry about the extended downtime this Sunday from the planned power outage!
The power was only out for about an hour, but as it came back on, there was trouble, trouble, trouble. Our router started acting funny, some file servers were mis-configured, some web servers didn’t want to come back on, and so on, and so on, and so on…
Although most things were back up and running within five hours, the network in general was still flakey for about 8 hours, and everything wasn’t TOTALLY fixed for about 36 hours.
We really thought things would go a lot smoother, given that for once we had some advance warning, but good old Murphy was in full effect, y’all, again.. urgh.
Anyway, to try and make up for it a little bit, we thought we’d offer something we’ve never offered before at DreamHost, something we thought we’d never need, something we always thought a little silly… an SLA!

That’s right, I’m offering you a… Super Lame Apology!
HA ha ha! Oh, did you think I meant a “Service Level Agreement”?
But really, isn’t that all a typical SLA is?
“We’re sorry we broke our promise, here’s credit for the 46 minutes you were down. Sorry.”
Lame!
In web hosting, it’s usually a credit for the exact amount of time you were down, sometimes a full day’s worth, or I guess if you are really paying a lot, a month’s worth.. though an SLA like that even in the high-end business world would be a rare animal indeed.

In the case of the outage this past weekend, if you were paying $8.95 a month you were down for anywhere from 6 to 44 cents worth of service. What would you think to yourself if we automatically credited you 44 cents on your next monthly bill?
You’d probably think either:
A. Is this 44 cent credit because February only had 28 days?
or
B. My site is down for hours and all I get is 44 cents?! That barely pays for the stamp I’m going to need to mail my foot all the way up your butt, DreamA$$Host!!
In fact, even if we gave you a full month’s credit, $8.95, you’d probably think the same thing. Either A. you didn’t really care, and the money doesn’t matter, or B. you really did care, and the money doesn’t matter.
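For the curious, here is one way the 6-to-44-cent range can be reproduced. This is a sketch under assumptions: a 30-day billing month, a straight pro-rated credit, and outage lengths of about 5 and 36 hours taken from the timeline above. The function name is hypothetical.

```python
MONTHLY_PRICE = 8.95          # dollars per month, from the post
HOURS_PER_MONTH = 30 * 24     # assumption: a 30-day billing month


def prorated_credit(hours_down, monthly_price=MONTHLY_PRICE,
                    hours_per_month=HOURS_PER_MONTH):
    """Dollar value of the service lost during `hours_down` hours of downtime."""
    return monthly_price * hours_down / hours_per_month


if __name__ == "__main__":
    # Assumed outage lengths: ~5 hours for most customers, ~36 hours worst case.
    for hours in (5, 36):
        cents = prorated_credit(hours) * 100
        print(f"{hours:>2} hours down is about {cents:.1f} cents of service")
```

Even the worst case comes to less than fifty cents, which is why a strictly pro-rated credit feels like an insult rather than an apology.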
The truth is though, we do offer an “SLA”… the same “service level agreement” you’ll find at McDonald’s, Nordstrom, Staples, or just about any other successful business. If any customer ever comes to us with even an eighth-way legitimate gripe, we’ll do our best to fix it, even if it means giving them an account credit or their money back (even after our 97-day money-back guarantee period). Better to lose a customer on good terms than on bad, eh?

So, if we’ll happily give refunds anyway, why not go ahead and lay it all out in a “real” SLA?
I guess mostly because we feel they’re B.S. Case in point, we actually have SLAs from our data centers! Which is why I sleep so well at night, knowing our servers are safe and sound. HA!
Not only do they fail to meet the SLA, I believe we’ve never gotten a single service credit out of them for outages… and I’ve asked!
The only useful thing you can get out of an SLA is the ability to break a long-term contract without penalty. All you really want is for everything to just work. If you’re constantly having to exercise your SLA, you’d trade all the service credits in the world for a new provider!
If that’s not the case, you don’t really care about the downtime and are just complaining to get the money! Shame on you! Go back to fatwallet.com where you come from! Hissssss!
All I’m saying is, since we’re in an industry with such a low barrier to entry, and since there’s nothing stopping you from switching hosts at any time, we really already have a lot of incentive to make our service as good as we can.
I know we fubar it sometimes, and I know we fubar it a lot, and when we do, you guys are doing the right thing by bitching and moaning and even quitting us. But a service level agreement wouldn’t change a thing.
So, so-o-o-o-o-o-o-orry!
And that’s the Super-est, Lame-est, Apology-est SLA you’re going to get!
Read This Now!
February 23, 2007 on 5:38 pm | In Foobars, Hardware, Insider View, Updates by Josh Jones | 32 Comments
Quick, before it’s gone!
If you enjoy all the hilarious hijinks, illuminating illustrations, and jovial jokes of the DreamHost Blog, you better suck down a local copy TODAY…
We’re having a planned power outage tomorrow night!
(Click that link for some more details.. it’ll be from 11:15pm PST (GMT -0800) tomorrow night (Saturday) to hopefully much less than 5 hours from then.)
Not planned by us though, planned by our building. It would have been very nice if they could have given us a little earlier heads up, or avoided the outage at all, but no, they just can’t. And trust me, we want this to happen even a tiny bit less than you do!
So, this site will be down then, as well as all other DreamHost services, with the exception of ns2.dreamhost.com and dreamhoststatus.com, which are kept off-site for exactly this sort of situation.

Well, I just thought I better post something about it here too.. thanks for your understanding, and we’re really really really really sorry.
P.S. Here’s the pic the building emailed us of the problem:

So, um, yeah. I think what that shows is a piece of metal is vibrating next to that wire and cutting into the rubber insulation… and if it gets much further in, KABOOM!
Some Late Night Moves!
January 25, 2007 on 6:10 pm | In Foobars, Funnyish, Hardware, Insider View, Updates by Josh Jones | 36 Comments
Last night we made some moves.
Patrick and I moved about 60 servers!
And I only dropped one! (Sorry about that, bomberman.)
It took about two hours, and here we are, wrapping things up:
Stage 2? At 12:30 in the morning, after moving 60 servers, what else could we possibly want to move for a STAGE 2?
Hmm… something about “Brea”?

We passed this car in the parking lot.. and soon, we were at the OTHER DreamHost office.
We waited.. THE CON WAS ON!
Patrick had told Pete (who lives right by the office) that he was just in the area, at 1:30am on a Wednesday, and I wanted him to pick up some WWF glasses for the downtown office. But, HAD PETE PLAYED US FOR FOUR FOOLS?!
Apparently not….
We made short work of the coveted sign.

And then I decided to go raid the kitchen…. WHAAaaaa!!!!
Yes, very funny, Brea. But who’s wearing the cool shades now?!

As long as we were there, we thought we might as well have some fun…
And some more fun…
We took our time. We even checked out the Official DreamHost Museum!
Why hello there, Señor Corona, you sure are working late tonight!

Of course, we couldn’t just leave those poor, unsuspecting Breaites bare-walled!
Around 3:30, we were back “home.” Mission complete. Tired. Satisfied. Ugly.
The neon sign was finally where it has always been destined to be. Down in our NOC. The HEART of DreamHost.

Epilogue…
No Run-Of-The-Mill Week, This!
January 19, 2007 on 3:24 pm | In Business, Foobars, Funnyish, Insider View, Musings, Updates by Josh Jones | 23 Comments
MONDAY!
We started the week off by having a little MLK, JR. weekend sale on our web site. Apparently, some of our affiliates hate civil rights because we got some angry emails that our little MLK, JR. day stunt was stealing their referrals by putting a promo code of our own right on our website!
They have a good point. Why would somebody use the promo code they were given when they get to the website and see a better one in a pop-up window? All these affiliates are working super-hard in the hopes of some good PayPal lovin’, only to find out the very company they’ve been shilling to all their cronies has turned around and STABBED THEM IN THE FACE!
Et tu, Joshé?
Now, why ever would we do something like that? Of course we don’t want to hurt our affiliates! So why steal their referrals in this way? Especially since our promo code was worth more than $97, which means it actually costs us more to “steal” people who were about to sign up with some other promo code.
The strange, but true, but unbelievable, but really honestly true, but you-can’t-comprehend-it, but it’s for reals, truth is we’ve found that putting a pop-up promo code like that on our main site actually helps all signups, ACROSS THE BOARD.
We’re not sure why ourselves, but we get more no-promo-code signups, more affiliate-promo-code signups, and of course, more DreamHost-promo-code signups whenever we have that pop-up there! And, it seems to also have no residual effect on signups on other days afterwards.
Maybe people just feel some loyalty to their original promo code. Maybe they don’t care. Maybe they just don’t see or read the pop-up. Maybe seeing that code gets everybody in some kind of weird web-hosting frenzy and they just decide to sign up, whether they use the code or not! I know, it’s CRAZY, but that’s why I like marketing!
So basically, please please PLEASE trust us, affiliates! We try our best to only take actions in everybody’s best interests, and we’ve got nothing to gain by “stealing” your referrals!
TUESDAY!
Brett was checking the public voicemail when he came across this little gem apparently from a phone number in British Columbia.
Now, most of us didn’t really think that was the scariest bomb threat we’d ever heard. But, it was decided to notify our main data center building anyway. They then called the LAPD, and so a lot of Tuesday was spent explaining what “we do,” and how we have a “voice mail” in a “computer file”.
Also that “Brea” is a city in the LA area, different from “La Brea”, a street.
P.S. No bomb so far.
P.P.S. And really, what sort of threat gives you a deadline that’s a 48-hour window?
WEDNESDAY!

Head Honcho Michael is out of town right now, so apparently ANONYMOUS LAZY HAPPY DREAMHOST EMPLOYEE thought it’d be fine to sneak in a little nap on his office couch. And it would have been fine, if he could have only kept his pants on.
(Notice the huge amounts of DreamHost power that permeate the office?)
THURSDAY!

Aw shucks! We just got a huge shipment of World Wrestling Fund (or something?) pint glasses and travel mugs as thanks for the generous donation DreamHost and her customers gave a few months ago! Well, we’d split all the glasses in half with you, but I guess we’re just pessimists.
Consider them gifts from you to us for keeping prices down while we REDICLOUS-LY add (and subtract) bandwidth and disk space!
FRIDAY!
Whoops, did I spell something ridiculously wrong up there?
My apologies, I must have just been influenced by this SECRET PACT email I received today from an UN-NAMED WEB HOST!
I’m just trying to get some key players in the industry to agree on a few things:
a. That the current disk space/bandwidth allocations are rediclous
b. To cap them at a specific range (based on price)
c. To create a self-enforcement method for the industry
I was wondering if DreamHost would be interested in joining the discussions. So far, I have spoken to almost every major hosting provider in our segment.
I guess somebody doesn’t read our blog! Doesn’t he know our whole “lowering disk and bandwidth” thing is just a coy marketing ploy?
And in summary, what a wild, zany, not-run-of-the-mill week it’s been!
And, OH, I just remembered.
This IS a run-of-the-mill week at DreamHost!
Busted!