The Official DreamHost Blog! Tales From the Inside!

A Brief Look Back


As the end of the decade approaches, it’s a perfect time to look back and reflect on the past.  What has gone wrong and what has gone right?  We have a way of making waves, even if not always in the way we might like.  Here are some reflections on some of those waves.

Power

Starting about four years ago, we were battling power constraints at our data center.  That led to a general inability to provision data center space the way we wanted to, and we had to become very creative to manage our systems.  The biggest problems from that time were two unplanned power outages of the entire building, followed by an emergency planned outage eight months later.  Power is one of the lifelines of a data center and not being able to rely on it can be a major distraction.  It was during that period that we established our off-network status site (dreamhoststatus.com), which has proven to be a great asset.  Those issues are long behind us now, and looking forward we have plenty of power capacity.

Network

Pretty much in the middle of the power situation we were also hit with a networking problem between our two core routers that was causing serious website slowness.  We had grown so quickly that nobody really understood exactly how the network devices were interacting.  That, combined with the distraction of the power situation, meant it took way too long to resolve the problem.  From that, we learned the hard way what we needed to do to improve and maintain the network.  Things have improved, but we were still recently hit with a couple of network outages.  They were caused by human error and by weaknesses in the procedures we had in place.  We have already refined those procedures, and future improvements to the networking infrastructure will further reduce the chance that human error results in an outage.  This is still very much a work in progress, but huge steps have been taken already and more are still to come.

Data Storage

The next major hurdle we faced involved our data storage infrastructure.  In the early days we had migrated from disks inside each server to network attached storage for the added redundancy and to allow us to better utilize our available storage.  At that time hard drives were only 9 gigabytes and our users were generally only using a hundred megabytes or less of space.  Boy, have things changed!  We now have users with multiple terabytes of data and even everyday websites sometimes have multiple gigabytes of files.  The huge growth of online video and the popularity of digital photography increased the demands on our storage infrastructure by a couple of orders of magnitude over the years.  To accommodate that growth, our simple system of file servers had grown to a large network of over 100 individual servers that we had to constantly juggle data between.  The cost per gigabyte for our Network Appliance-based system had also not come down nearly as fast as the per-user storage requirements had gone up, but we had become reliant on network storage for things such as backups, rapid recovery from server failures, and seamless sharing of data between separate hosting and email accounts.  We were addicted to network storage, and our next couple of major performance problems came from gloriously failed experiments with other storage products.  One of them was cheap and unreliable, and the other was expensive and unreliable.  We couldn’t win!  (Note that we haven’t used either of those for a while, so both of them may work better now than they did for us a couple of years ago.)

During the years in which our addiction to network storage developed, a shift had happened that we hadn’t noticed.  First, individual hard drives had dropped like a rock in price and skyrocketed in capacity.  Second, users were consuming storage at such a rate that evenly utilizing our available storage was no longer a problem.  Switching back to locally installed storage was the answer we had been looking for!  We started experimenting with the new server architecture and developing a backup strategy.  Then once the pieces were all in place, we started moving forward with it.  So the storage bottleneck was resolved and we had a clear path forward, but we were still throwing out a lot of knowledge we had learned over the preceding decade and were starting over from scratch in a lot of ways.  Every technology shift brings with it a new set of problems and this one was no exception.  It’s been about a year and a half since then and we’re already on our fourth revision of the server configuration (three different RAID cards with two different configs).  The current hardware has been working out quite well but we’re doing some testing to see if it can be optimized further.

Looking into next year, our core technology systems are under control and we’ll be able to focus more on improving the service than we have in years.  The future looks very bright… but more on that tomorrow!

Filed: Insider View

19 Responses to “A Brief Look Back”
  1. Rob Says:

    Nice to see some transparency in regard to your operations, and what problems your company has had. Unfortunately, it will be a case of “we will believe it when we see it” for the optimistic outlook! Here’s to a 100% up-time 2010!!

  2. Linto Says:

    Plz add me on Farmville

  3. Kevin Worthington @ kevinworthington.com Says:

    Great post! I’ve been 99.999% happy with Dreamhost, but the outages have not been exactly fun. Thankfully (fingers crossed), the worst is behind us. Thanks for being so up-front about what really goes on behind the scenes in the data center. Cheers!

  4. Tim Says:

    When I saw the picture of the thumb, I thought that was to represent Josh “fat-fingering” the billing system which caused everyone to be double billed.

    Nonetheless, I still love you guys.

    Just wish you’d ditch Lighttpd for NGINX on the private servers as a supported alternative to Apache.

  5. Brad Says:

    I look forward to seeing the Dreamhost improvements in 2010!

  6. JoelW Says:

    @Tim: you can choose between lighttpd, nginx, or apache on Private Servers already. =)

  7. M@ Says:

    If you had reliability problems with Coraid, you were doing something wrong. ;)

  8. Kyle Says:

    @M: The main issue we had was corruption of XFS volumes. And our volumes tend to be quite large. The XFS developers are an incredibly smart bunch of guys and have made a lot of headway in this department, so perhaps it’s much better now.

  9. Jessica H Says:

    @Dallas

    I love these kinds of technical posts. Please keep them coming.

    Question: so does this mean that everyone, private server and shared hosting customers alike, is now on local disk instead of NAS/SAN?

  10. Jessica H Says:

    @JoelW

    >> “@Tim: you can choose between lighttpd, nginx, or apache on Private Servers already. =)”

    Can you do this now as Tim said, with Dreamhost **supporting** it?

  11. Andrew F Says:

    @ Jessica H:

    We still have some older customers on networked storage, but we’re slowly transitioning away from it for most purposes. If you signed up or were moved between servers in the last year and a half or so, chances are that you’re on local storage.

    With regard to Nginx, we do officially support it now… we just haven’t announced it yet (it’ll probably be in the next newsletter). You can select it in the “Configure Server” section of the DreamHost Panel.

  12. Peter Hewitt Says:

    So how do we know when we’ve been moved onto a newer local storage system? Is it when the web server itself changes in the panel? I ask because I still have a file server listed.

    I clicked the “go for this first” button ages ago but have no idea whether I actually moved servers or not.

  13. DH Customer Says:

    @Peter Hewitt: I’ve clicked that button too, but I’m also still on the old networked storage server.

    A simple check is to type the following when connected with SSH:

    pwd -P

    If it returns something like /home/.something/username, you’re still on an old server. The essential part is the .something, which basically indicates the network share on which your files are located.
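The same check can be sketched in Python for anyone who would rather script it. This is only an illustration under assumptions: the `/home/.share/username` layout is taken from the comment above, the function name `network_share_of` and the regular expression are made up for this example, and it inspects the resolved home directory rather than the current directory that `pwd -P` reports.

```python
import os
import re
from typing import Optional

# Assumption: homes on the old networked storage resolve to
# /home/.<share>/<username>, as described in the comment above.
_NETWORK_HOME = re.compile(r"^/home/\.([^/]+)/([^/]+)(?:/|$)")


def network_share_of(resolved_path: str) -> Optional[str]:
    """Return the share name if the path looks like a networked-storage home, else None."""
    match = _NETWORK_HOME.match(resolved_path)
    return match.group(1) if match else None


if __name__ == "__main__":
    # os.path.realpath resolves symlinks, similar in spirit to `pwd -P`.
    resolved = os.path.realpath(os.path.expanduser("~"))
    share = network_share_of(resolved)
    if share:
        print(f"{resolved}: looks like networked storage (share '.{share}')")
    else:
        print(f"{resolved}: no hidden share component found")
```

A path with no hidden `.share` component only means the pattern did not match; it is not proof of which storage system an account is on.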

  14. Peter Hewitt Says:

    ah yes, so I’m still on the old system too as I get .hank in the reply :(

  15. sdayman Says:

    @Dallas, thanks for the enlightening post. I really enjoyed the read. I love hearing about the nuts and bolts of the operation.

    @peter/DH Customer, I clicked that link long ago and it took ages before I got switched over. I don’t think I received any notification, but had noticed that my Webserver had changed in my Account Status listing. I’ve not noticed any significant differences since my last server was pretty stable.

  16. John Herman Says:

    I was hoping to hear about the great billing disaster of December 2007 and how you might have improved customer service since then too.

  17. Dallas Kashuba Says:

    @John Herman I intentionally stuck with the systemic technical hurdles we have faced. The billing issue was a specific human and programming problem (though I think it was the culmination of some poor decisions we had made up to that point) and was out of the scope of what I was addressing.

    We did specifically discuss changes we had made to prevent a billing error of that type in the future back when the billing error occurred. You can see that here… http://blog.dreamhost.com/2008/01/16/the-aftermath/

    We have also been making some changes internally with regard to our programming methodology and management, and that does deserve its own post pretty soon. I think we need to work out a few more of the details before then, but we have made some simple procedure changes that have already reduced the amount of buggy code going live.

  18. The Digital Productivity Blog Says:

    One of the things I’ve always liked about Dreamhost is the openness with which it reveals its internal operations (warts and all). I think that’s the reason why customers (including myself) have stuck with it even with all the problems through the years.

    Hopefully, those are all behind us (right guys?).

  19. Frank Church Says:

    Are we going to see code in particular placed on flash drives? I am not sure of the economics, but if you can devise a system where users are persuaded to use separate filesystems, with media files on one and code on faster flash drives, your performance will greatly increase.

    Perhaps you can develop a file system which can detect media files and move them onto different storage transparently. That would be helpful.