
Not good, decent, people like you and me, anyway.
Maybe it used to be. Maybe, back in the old days of akebono.stanford.edu and hit counters and free porn you could find an actual, true, honest-to-goodness person on the Internet. But not anymore. Nope, not anymore.
What am I talking about?
Robots, man, robots.
Just like in the future, we are living in a world of robots. Or, as they prefer to be called, “bots”. Or, as they prefer us to get used to calling them, “overlords.”

“Why do I bring this up?” you ask. Why now, when robots have been building our cars, walking on Mars, and marrying our daughters for decades?
I’ll give you a clue. It has something to do with that new DreamHost PS service I mentioned a scant one post ago.
Give up?
Good, I win! Now let me explain.
The only reason you’d want your own Private Server is to be isolated from other sites on your shared server. And the reason you’d want to be isolated from them is so nobody but YOU can crash your server. And the REASON sites crash their server is because they’re getting more visits than they can handle.
It seems to me though, more visits than they can handle is a hugely varying value. For some sites, just one visit is too many. For others, say, a nice static html page, there is pretty much no limit.
Nevertheless, most sites on one of our shared servers, even the really poorly coded ones, have no problem handling a few thousand visitors a day. Problems only start when a completely dynamic site gets tens of thousands of daily visitors.
In fact, one of the sites we used to test out DreamHost PS fell in exactly this category. It was a frequently updated, decently popular blog (and for SOME reason, blogs just can’t be static html, can they? oh nooooo….), and on an average day, it got over 10,000 unique page visits (that’s not counting images, css files, etc..).
The blog was constantly causing problems on their shared server. We had them turn on caching, but it would still spike frequently and suck bazookas of memory. I guess it was just TOO POPULAR! Imagine, tens of thousands of good old human beings reading that blog, every single day! It only made sense that a site of this magnitude would need its own private server.
In fact, judging by the amount of load we see on servers, we must host a lot of sites in the five figures of daily visitors. But something about that just didn’t sit right with me. Just from running a few of my own stupid sites, I know how hard it is just to get in the ones figure of daily visitors.
So, I decided to look at the log file from yesterday, August 9th, 2007… and lo and behold.. The Internet is not for People:
Pages   Percent   Type
11406   100%      Total Page Views
8033    70.4%     Spiders.. Yahoo, Google, MSN, Ask.. including 20% mystery spiders (I assume up to no good!)
1943    17%       Comment Spammers
798     7%        RSS Readers and Aggregators
632     5.6%      Actual Humans ©
We’re a minority out there, you and I! The Internet circa 2007 is made of robots, by robots, for robots! By rampant extrapolation, almost 95% of the page views to the entire Internet are made by machines!
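For anyone who wants to try the same breakdown on their own log, here is a rough sketch of how it might be done. It is not the script used for the numbers above. It assumes an Apache-style "combined" access log where the user agent is the last quoted field, and the substring lists are hypothetical guesses rather than a complete catalogue of bots. Comment spammers usually can't be identified from the user agent alone, so this sketch leaves them in the "other" bucket along with the humans.

```python
import re
from collections import Counter

# Assumption: the user agent is the last double-quoted field on the line,
# as in Apache's "combined" log format.
AGENT_RE = re.compile(r'"([^"]*)"\s*$')

# Hypothetical hint lists; real logs need a much longer, maintained list.
FEED_HINTS = ("feed", "rss", "bloglines", "netnewswire")
SPIDER_HINTS = ("googlebot", "slurp", "msnbot", "teoma", "spider", "crawler", "bot")


def classify(line):
    """Return a rough category for one access-log line."""
    match = AGENT_RE.search(line)
    if match is None:
        return "unparsed"
    agent = match.group(1).lower()
    # Feed readers are checked first so feed fetchers are not counted as spiders.
    if any(hint in agent for hint in FEED_HINTS):
        return "feed reader"
    if any(hint in agent for hint in SPIDER_HINTS):
        return "spider"
    # Whatever is left: humans, comment spammers, and bots posing as browsers.
    return "other"


def tally(lines):
    """Count log lines per category."""
    return Counter(classify(line) for line in lines)


# Made-up sample lines, for illustration only.
sample_lines = [
    '66.249.66.1 - - [09/Aug/2007:00:00:01 -0700] "GET /post/1 HTTP/1.1" 200 5120 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '10.0.0.2 - - [09/Aug/2007:00:00:02 -0700] "GET /feed/ HTTP/1.1" 200 2048 "-" '
    '"Bloglines/3.1 (http://www.bloglines.com)"',
    '10.0.0.3 - - [09/Aug/2007:00:00:03 -0700] "GET /post/1 HTTP/1.1" 200 5120 "-" '
    '"Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6"',
]

counts = tally(sample_lines)
total = sum(counts.values())
for category, hits in counts.most_common():
    print(f"{hits:6d}  {hits / total:6.1%}  {category}")
```

To run it on a real log, pass an open file to `tally` instead of `sample_lines`. Since anything can claim to be a browser, the "other" count is best read as an upper bound on human visits.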

However, in the end, all these machines are doing is trying to organize things a little better for us humans. It would be no fun at all to visit every website in the world each and every single time you wanted to find a picture of a monkey eating ice cream! Better to let our future omnipotent masters do the dirty work for now.
(Also! When I examined the “actual humans” visits more closely, 40% of those hits were the result of an image search.. and 35% were multiple pages by the same human. Meaning overall, only 149 different people actually visited that blog yesterday to actually view it in its intended entirety … barely 1% of the total page visits!)
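As a quick sanity check on that arithmetic, here is a small sketch using only the numbers quoted above. The rounded percentages leave about 158 page views, a little more than the 149 people counted, which is presumably down to rounding in the 40% and 35% figures. Either way it works out to a bit over 1% of the total.

```python
# Figures quoted in the post for August 9th, 2007.
total_page_views = 11406
human_page_views = 632
full_visitors = 149  # the post's own count of people who read the blog itself

# 40% of the human hits came from image search and 35% were repeat pages
# by the same person, leaving about 25% (these percentages are rounded).
remaining_percent = 100 - 40 - 35
estimate_from_rounded = human_page_views * remaining_percent // 100

# Share of all page views that the 149 readers represent, in percent.
share_of_total = round(full_visitors / total_page_views * 100, 1)

print(estimate_from_rounded)
print(share_of_total)
```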
All these robots cause problems though. It’s been well known since 1994 that 99.99% of the sites on the Internet get absolutely NO traffic. It’s how web hosts make money.
But now, that’s all changed! The only thing safe to say is 99.99% of the sites on the Internet get absolutely no HUMAN traffic! Every site now gets search engine spiders, feed aggregators, and spammers.. a veritable ARMY of undead automotrons! And those undead robot hits hit your site just as hard as living human hits.
A ton of times in the past, a site was crashing a shared server, and it turned out all we had to do was block Googlebot from visiting it and everything was fine. We figured that was better than just disabling the entire site, and yet sometimes we caught some crazy flak for it! Which saddened us so greatly we even made a wiki page to try and explain the situation.
It probably would have been better to just disable these sites.. it’s not like any humans were actually visiting them anyway.
After blocking Googlebot, people then had two options:
1. Keep blocking.
2. Fix their site’s inefficiencies and un-block.
At least now our Happy Customers have a simple third option:
3. Get DreamHost PS.
Just in time too, I’ve noticed my robot attack insurance premiums have been increasing recently… how strange.



August 10th, 2007 at 5:33 pm
I would absolutely love if with a DH-PS account, you would also get your own virtualized e-mail and database (MySQL) container protected from others.
That would be aweeeeeeeeeeeeeeeeeeesome.
August 10th, 2007 at 5:56 pm
Just to clarify my last post. I really don’t care to actually have shell access to the individual virtualized MySQL and e-mail partitions.
It would just be nice to have it separated from the shared users.
Does that make sense?
Meaning, if I were to buy a DH-PS account. I would actually get 3 virtual containers.
* 1 container for web, with the typical shell access / web panel etc.
* 1 for MySQL, though I really don’t need any direct access other than phpMyAdmin for MySQL from the Panel.
* 1 container for my own e-mail, though I wouldn’t need direct access to the server either. Just as long as e-mail works and I can still use the Panel to add/remove accounts.
August 10th, 2007 at 7:18 pm
I second Tim, although email is not as important to me (I host it elsewhere for my important sites anyway). Email is already on separate servers, and is not generally tied to web traffic in any real way (besides spammers loving your domain if you put up your email address in plain text on your site and they grab it).
MySQL, however, is very strongly tied to the sites with the issues, since most of them are dynamically database driven! I don’t know how the partitioning works, but I’d like to see a higher level of MySQL service with the Private Servers. I’m not sure what that looks like, whether it’s a private MySQL instance/container, or a MySQL server that has fewer clients as well, etc. Frankly, the specifics are probably best left to DreamHost.
None of the sites I run, most of which I haven’t moved to DreamHost yet (I just signed up recently) get anywhere near enough traffic to overload even a shared account, but I’m getting a private server anyway because they’re all WordPress sites and I’d like to ensure that if (when?!) they get extra traffic that they’ll be able to handle it and keep on sailing!
Now if only DreamHost would approve my PS :-) (And the one for the company I work for, who have their own account and sites.)
This should be an interesting solution to compare against Media Temple, who I almost went with due to their grid model (esp. the MySQL containers they offer) but actually stuck with Dream Host due to the Private Servers option and frankly, the control panel, which is better, has more features, and is more flexible at DreamHost than Media Temple right now, although Media Temple is doing a lot of things right as well.
August 10th, 2007 at 9:20 pm
Handling Googlebot overload is not an all-or-nothing situation. You can set up Google Webmaster Tools for your site, then go to the Crawl Rate tab and choose the Slower option. Google will crawl the site less frequently, reducing the load on the server without disabling crawling altogether.
P.S. Typo: “in it’s intended entirety” –> “in its intended entirety”
August 11th, 2007 at 1:59 am
On my Internet, linking to web pages designed to crash browsers without warning the reader, is considered severe asshattery.
Please avoid committing such crimes against humanity again, or I shall tell Drillowl about how you made my firefox-bin respond to nothing but “kill” after eating tons of cpu and memory for a few minutes. :P
August 11th, 2007 at 6:45 am
Thomas: oh, it really does something? I didn’t see anything happening in my Safari … but then again, Safari is different from Firefox and probably doesn’t have the same kind of problem.
August 11th, 2007 at 7:35 am
Miikka: well my Firefox suffers from the usual power user disease of being loaded up with a few extensions, which makes it perform very poorly. Extreme javascripts or tons of connections or reloading of pages makes it go belly up if it hasn’t been restarted in a while, but now that you mentioned it, I decided to try this Killer Rabbit of a page with other browsers.
So I opened the page again, now with Konqueror, which like Safari uses KHTML rather than Gecko. The page didn’t display properly, but it used some CPU, was clearly stressful, and flashed some 503 errors from the server. I also rechecked the page in another browser using Mozilla’s Gecko engine, Epiphany. As expected, the page, as in Firefox, actually does its hideous deed of drawing red and blue bars all over the screen, forcing CPU usage to 100% instantly. A freshly started copy of Epiphany without many tabs open didn’t die as clearly as a Firefox instance I had used for a couple of hours to browse Google Reader, Gmail, Slashdot and the local news. After a few seconds’ worth of delay it managed to close itself without forcing me to wave around “kill”, but I guess that it’d have become just as messy as Firefox if I had a ton of tabs open.
Conclusion: maybe it’s a good idea to not whitelist newdream.net in the controls for the lovely NoScript extension, which I normally use to block all javascript and other annoyances by default. I’m still not sure if it’s a pleasant or unpleasant surprise to experience a browser killing prank by following links from a corporate blog post… ;)
August 11th, 2007 at 11:52 am
I also agree with Tim that a Private Server should come with private MySQL.
E-mail is a different story. I don’t mind e-mail being on a shared server since the dynamics of e-mail are soooooo much different from the web.
But MySQL and Web are so closely linked that if you get a Private Server for Web, you should also get a Private Server for MySQL.
Just my thoughts
August 12th, 2007 at 6:55 am
“Me too!” Agree with Tim and Jacob on the MySQL.
August 12th, 2007 at 9:41 am
Selling QOC sounds like a change in terms. Bait and switch selling is what this blog post is talking about without actually even knowing it! When I bought my hosting I bought it to work on THE internet. THE internet has a raft of unwritten traffic specifications that I feel dreamhost has signed up for dealing with when we made the deal. Just because DH thought they would fleece us all because no one will go to our sites does not make this an acceptable change.
August 12th, 2007 at 12:05 pm
Is this your first day on the internet?
Upgrading from shared hosting to a VPS is an option that has been around long before DH offered it. How about dedicated servers… ever hear of them?
August 12th, 2007 at 1:23 pm
I wonder… anyone know if there is a script out there, possibly for .htaccess, that will limit the number of times that Googlebot can access a site, say, per day or per month? I know you can block multiple simultaneous connections, but can you set maximums (based on logs, maybe)?
August 12th, 2007 at 6:59 pm
The Internet is not made for people…
Josh Jones has published a post on the official Dreamhost blog called The Internet is not for People. It is worth mentioning that the posts on that blog are anything but normal. Josh tends to tell hard truths… but hidden behind…
August 12th, 2007 at 8:12 pm
Does anybody believe user agent data?
August 12th, 2007 at 9:55 pm
Does anybody believe that enough people spoof their user agent to matter?
True user agent stats would show even more bot traffic over human, since it would expose the ones that ID themselves as a browser.
On the other hand, the average user isn’t spoofing anything. And the ones that spoof one browser to ID as another to get a site to display properly aren’t affecting the bot-to-human ratio.
August 12th, 2007 at 11:57 pm
> Selling QOC sounds like a change in terms.
No, it’s not.
We’ve never allowed a customer site to threaten the stability or performance of other peoples’ sites. Regardless of the reason or whether the “users” browsing a site are human or automated, if a site starts causing other people trouble, we’re going to make any changes up to taking the entire site down to stop it from doing so.
Banning a single type of bot until our customer can find a way to limit the damage is something we do instead of taking the site down.
While the occasional person may fall victim to this policy, everyone else on the server benefits. Shared hosting is, after all, shared hosting.
- Jeff @ DreamHost Web Hosting
August 13th, 2007 at 8:57 pm
@Mike,
> Does anybody believe that enough people spoof their user agent to matter?
Enough that some sites like Google, Harvard blogs, and others check, reject requests, and give an error about it.
> the average user isn’t
Do you have data on that? ;-)
August 13th, 2007 at 9:34 pm
People blocking users that spoof doesn’t really mean it’s a large number of people. There are sites that block people with cookies disabled, but that doesn’t mean that most people disable cookies.
You really need data to believe that the average web surfer doesn’t spoof their user agent? Start here: compare the number of IE users to Firefox users. There go the ones that need a Firefox extension to do it.
I’d be willing to bet that the average web surfer doesn’t even know what a user agent is, let alone how to spoof it. It’s useless info for someone that just wants to browse ebay, check email, etc…
Even if every single Firefox user did it, it’s still small percentage.
And like I said… spoofing it to be another browser has no effect on the accuracy of a human-to-bot ratio.
Also, note that “average web surfer” isn’t the same as “average geek browsing the FF extension sites.” ;-)
August 14th, 2007 at 12:27 am
Why would you even want to spoof your user-agent? I don’t see much advantage in that…
I think it’s good that DH block the Googlebot if it goes haywire. It might just be you who is on the same server and suffering from it.
August 14th, 2007 at 1:00 am
> Why would you even want to spoof your user-agent? I don’t
> see much advantage in that…
Some sites – banks in particular are notorious for this – deny access to “alternative” browsers based on user agent even though they would otherwise work fine. Even though one such alternative browser (Firefox) is becoming incredibly popular, some sites still haven’t seen the light.
I do think it’s kind of counter-productive unless you also send in a request for them to support other browsers, but user-agent switching will often work in a pinch.
> I think it’s good that DH block the Googlebot if it goes
> haywire. It might just be you who is on the same server and
> suffering from it.
Absolutely.
Also, understand that this is being done as a temporary measure, in order to allow a customer to make any necessary fixes (if possible) so that Googlebot doesn’t kill their site or anyone else’s. It’s not meant to be a punishment, just a way to ensure stability in the short term. We understand that Google rankings are important to people.
- Jeff @ DreamHost
August 14th, 2007 at 7:59 am
> that doesn’t mean that most people disable cookies.
Enough do to matter.
> You really need data to believe that the average web surfer doesn’t spoof
In context it’s the average blog.dreamhost.com reader that matters.
> Even if every single Firefox user did it, it’s still small percentage.
90% of Firefox users pretend to be IE users. ;-)
> Why would you even want to spoof your user-agent?
Read newspapers – Google doesn’t usually log in. :-)
Because you can. So web developers can’t assume too much. And the number 1 reason:
So Josh thinks way more than a dozen people read this blog, but we’re pretending to be bots. ;-)
August 14th, 2007 at 9:24 am
I agree with tim as well that MYSQL should be in there. cpu is great for image processing, but other than that it’s very based on mysql.
is the PS a feature or scam? if it’s a feature then it should be a little less crippled.
it would also be great to be able to choose php extensions to run on your Private server. ie.. ohhh IMAP and a slew of ones that never should have been disabled in the first place.
I know the general answer is, if you don’t like it shut up and leave, but there’s a limited choice of other good- hosting, so it would be nice if dreamhost simply filled in a few of the holes that really aren’t tough to fix at all…
it’s a pain to have to move sites and domains once you’re established simply because a few extensions are missing.
PS is the perfect place for those to be, it’s sort of a ‘duh’ thing, you pay for the extra cpu, it would be nice to be able to use it for something
“hey I just got DH PS!”
ya, what can you do with it?
“…. I can spit out ‘for loops’ at like… 700mhz!…”
…neat buddy… that’s uh… neat.
August 14th, 2007 at 9:25 am
I mean extentions that should have been NOT disabled in the first place…. ahh, grammer errors on a rant. great. I’m an idiot.
August 14th, 2007 at 11:10 am
thanks for linking a page meant to crash your browser =(
but shouldn’t that be, crash your browser IF you have javascript enabled??
also, btw, despite severely lagging it did not crash my browser ;)
August 14th, 2007 at 11:17 am
Next thing you know, the robots will be leaving messages behind like “all your bases belong to us”…
August 15th, 2007 at 10:36 am
Off-topic, but DreamHost Status is returning “Error establishing a database connection”, the Control Panel is not responding, outgoing mail is broken, etc etc.
This blog and its comments appear to be the only working channel of communication between DreamHost and customers right now.
August 15th, 2007 at 10:40 am
I am having the same problems as Kevin today: can’t connect to my web sites, FTP, webmail (and normal e-mail?), Dreamhost control panel, or Dreamhost server status page. I received a “too many connections” error for webmail at the start of the problems, but now everything keeps timing out, without any appropriate error message.
August 15th, 2007 at 10:42 am
It’s their database servers; they all stopped responding.
One static page shows data, but anything that may touch MySQL is dead.
August 15th, 2007 at 10:48 am
http://www.dreamhoststatus.com/ is now returning a 404 Not Found.
Please spend a few dollars a month to host that page with a DIFFERENT web hosting company, so it is likely to work when your customers want to know why things are broken and when they are expected to come back.
Then we won’t feel compelled to pollute your blog with off-topic comments (like this one :-) ).
August 15th, 2007 at 10:51 am
There is some really wacky stuff going on, all the sites do not work, and even DreamHost Status which was in a separate location than everything else (or used to be) is having multiple problems.
MySQL was down.
Then everything worked.
Then the site returns a 404 error…
Oh okay, now it says:
“DreamHost is currently experiencing some networking difficulties which is causing parts of our network to be inaccessible.
We are working hard to fix the problem right now and will have an update for you soon.”
August 15th, 2007 at 11:00 am
Dreamhost is turning into Nightmarehost. Please, please, please repair these problems! Service interruption is so frequent that we are considering changing companies.
August 15th, 2007 at 11:47 am
Holy crap.. you linked to thehun.net… HAHAHA.. awesome.
August 15th, 2007 at 11:57 am
Nearly 2 (known) hours of downtime now, and still no way to contact support to find out server status – or anything else about the problem.
What’s the deal, guys?
I second Kevin’s suggestion … have another avenue of communication available so your customers can feel informed about the status of your resolution to any problems.
August 15th, 2007 at 12:47 pm
Most Edutaining Corporate Blog…
I only cottoned to it recently, but judging by the archives Dreamhost’s blog has been around for at least a year. Whether they’re talking about a legendary company kickball game, revealing some of the spam complaint calls they get, or taking on the i…
August 16th, 2007 at 9:46 am
[...] “The Internet is not for People” my current shared hosting provider tells the [...]
August 16th, 2007 at 8:08 pm
Good blog!
This morning, I found a ton of FTP error mails in our inbox, seems the webserver is still having hangovers… Hope this is going to be fixed soon, because we are losing members for our site, http://www.mypacis.com/ , a social network with a social agenda: promoting peace by linking people.
Btw, if you wanna blog about this, that would make a difference, and make-up for the lost members during downtime. I understand you cannot blog about each one of the 500k and going sites hosted on Dreamhost, but if you approve our mission, it is something you can at least consider to do.
Thanks!
August 17th, 2007 at 11:48 am
“Btw, if you wanna blog about this, that would make a difference, and make-up for the lost members during downtime.”
Methinks this is the wrong spirit to have behind what you claim your main goal to be. I am a member of a site with a similar mission, but I would not feel very good about them were they to attempt to strong arm with guilt. Might I suggest you build yourself a sweatlodge and rethink a few things? May The Force be with you.
August 18th, 2007 at 6:12 pm
I am a robot. Yes-hu. I find this thread funny. Funny HAHA! Pathetic water-beasts. Cower! I’ll go ahead and thrash your puny site with 800 requests.
per second.
My user-agent name is.. Jeff! RAWRRRR!
August 18th, 2007 at 7:24 pm
The above thread should be filed under “humor”, not “international terroristic threats, not humor”, fyi: water-beasts.
August 28th, 2007 at 1:37 pm
I can’t believe you put up a link to The Hun. Now that’s Funny!
September 5th, 2007 at 8:27 am
[...] Eat THAT, commenters. [...]
November 3rd, 2007 at 2:20 am
It’s scary – nothing more – nothing less.