MyBlogLog Crawlers & Bots

Over the last several weeks, we have noticed a dramatic increase in the number of bots crawling MyBlogLog.com, collecting data on our members and trying to position avatars at the top of Reader Rolls.

In a few cases, the volume of requests from these bots has had the effect of a Denial of Service (DoS) attack -- not only on MyBlogLog.com but also on our member sites.  This cannot continue.  If you are crawling this site, please stop.

To address this problem in the short term, we are implementing new server monitoring and will ban anyone requesting more than 1,000 pages an hour. We will also be updating our Terms of Service to make scraping and gaming of site content against the rules. Beyond that, it's for the lawyers to resolve, and nobody wants that.
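
For readers wondering what a limit like this might look like in practice, here is a minimal sketch of a sliding one-hour request counter. It is an illustration only, not MyBlogLog's actual monitoring: the class name `HourlyLimiter`, the `client_id` key, and the choice of a sliding window rather than a fixed clock hour are all assumptions made for the example.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 3600  # one hour
MAX_REQUESTS = 1000    # threshold mentioned in the post


class HourlyLimiter:
    """Hypothetical sliding-window counter keyed by a client identifier."""

    def __init__(self, max_requests=MAX_REQUESTS, window=WINDOW_SECONDS):
        self.max_requests = max_requests
        self.window = window
        self.hits = defaultdict(deque)  # client_id -> request timestamps
        self.banned = set()

    def allow(self, client_id, now):
        """Record a request at time `now` (in seconds).

        Returns False if the client is banned or has just exceeded the limit.
        """
        if client_id in self.banned:
            return False
        stamps = self.hits[client_id]
        # Forget requests that have fallen out of the window.
        while stamps and now - stamps[0] >= self.window:
            stamps.popleft()
        stamps.append(now)
        if len(stamps) > self.max_requests:
            self.banned.add(client_id)
            del self.hits[client_id]
            return False
        return True
```

A server would call `allow(client_id, time.time())` on each request and refuse to serve the page when it returns False. How a "client" is identified (IP address, cookie, member account) is left open here, and it is the hard part once proxies and spoofing are involved.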

That said, we have a bunch of new ways to get MyBlogLog data coming in the next several weeks.  While we can't talk about them now, please stay tuned to this blog.

Thanks,
Todd

Comments

Josh Lane

Great news, Todd. I have noticed increased bot traffic on my sites. Wonder if it is related to this issue.

Glad you guys are on top of it.

WebUrbanist

Truly too bad about the bots ... hope it all gets sorted out without having to involve lawyer-types :(

Tsewang Rinzin

Hahaa. These bots/crawlers remind me of earlier today, when I looked into my StatCounter to see where the small amount of traffic to my blog was originating from. It was mostly from my MyBlogLog page.

rob

good luck with it - it's an ongoing battle, what with proxies and spoofers and the various tools these people employ to get past such stuff :(

Wendell

I hope somebody in-house is tracking MBL's struggle to create a networking / sharing hub that isn't swamped by bots, spammers and the SEO crowd. I don't know if you guys will prevail, but I think the twists and turns would make a wonderful and revealing article or book someday.

ndpthepoetress

Hey hardworking MyBlogLog Team, You are not alone in this headache! Such has also hit myspace: "if you got locked for being phished in the last hour, it's a bug! we're working on fixing it right now. sorry about that! the mechanism to detect if your account has been phished has been very powerful for stopping phishing, but it went a little haywire just now. things should be back to normal soon!" Meanwhile; for the MyBlogLog Team and all annoyed by this - here are a few banana aspirins for everyone!

JohnC

Interesting development. Since I'm involved only with MyBlogLog and one other 'community' site, Fuelmyblog, I was starting to get suspicious about why I've had a radical increase in spam to my site mailbox. I've had it going for over three years, and NEVER had the amount of spam that's been hitting me over the last 8 days.

Still think 1000 hits an hour is a bit high. Are those hits on members' blog sites that pull the widget, MyBlogLog page views, or a combination of both?

Eric Marcoullier

John -- the 1,000 hits an hour is per reader, not per site. We have lots of sites pulling hundreds of thousands of page views a day and that's all good. We're referring to users who keep reloading pages and hitting every blog in the system in order to show up at the top of the reader roll.

Hope that clarifies!

JohnC

Sorry, should have been more specific.

Are the hits mentioned counted when a person visits a site/blog that includes the MBL widget, when a person visits an MBL member/community page, or are both types of visits considered hits?

Separate issue... on the gamers, why not keep a count of the number of blogs with an MBL widget that an MBL member hits? If they hit 10 such blogs a minute, in any two minutes out of five, don't allow their browser to pull the MBL widget for the next hour.
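
The rule suggested in this comment can be sketched roughly as follows. This is one possible reading, not anything MyBlogLog has said it does: the name `WidgetThrottle`, the use of whole clock minutes as buckets, and counting distinct blogs per minute are all assumptions made for the illustration.

```python
from collections import defaultdict

BLOGS_PER_MINUTE = 10    # distinct widget blogs in one minute
BUSY_MINUTES = 2         # how many such minutes trigger a block
LOOKBACK_MINUTES = 5     # "any two minutes out of five"
SUSPEND_SECONDS = 3600   # block the widget for the next hour


class WidgetThrottle:
    """Hypothetical per-member tracker for widget page loads."""

    def __init__(self):
        # member -> {minute index: set of blogs visited in that minute}
        self.visits = defaultdict(dict)
        self.suspended_until = {}

    def record_visit(self, member, blog, now):
        """Return True if the widget may be served for this visit."""
        if now < self.suspended_until.get(member, 0):
            return False
        minute = int(now // 60)
        buckets = self.visits[member]
        buckets.setdefault(minute, set()).add(blog)
        # Keep only the last LOOKBACK_MINUTES minute buckets.
        for old in [m for m in buckets if m <= minute - LOOKBACK_MINUTES]:
            del buckets[old]
        busy = sum(1 for blogs in buckets.values()
                   if len(blogs) >= BLOGS_PER_MINUTE)
        if busy >= BUSY_MINUTES:
            self.suspended_until[member] = now + SUSPEND_SECONDS
            buckets.clear()
            return False
        return True
```

Someone reading one blog every minute or two never fills a bucket, so only rapid hopping across many blogs trips the block.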

Jill Alexander

I'm really new to this. What on earth is bot traffic?

paintchip

I read about the increased number of bots all over the place. The last thing I read was about how Yahoo is sending out hundreds of spiders now, apparently ticking off a bunch of forum owners because of the extra load it has put on their servers. The Invision Power Board admin team seems to be launching some sort of campaign against Yahoo over it. Might want to jump over there and see what they know about the subject.

Tomas

I know a little about bot traffic, but the numbers are impressive. Wow, how fine it would be if these clicks turned into responses to our posts, into living conversation. It is so hard to sit in silence and talk just to myself.

JohnC

Here's the problem, follow the math:

Start
MBL(Issue)=spider/unknown
If unknown=Yahoo
then
MBL(Issue)=spider/Yahoo
and
MBL(Issue)-(spider/Yahoo)=0
else
Issue=(spider/unknown)/MBL
End

If it was Yahoo, it's nothin'. If it wasn't Yahoo, then the Issue lies in MBL dividing the unknown spider into pieces.

Divide and conquer.

Early rising 101

Yes, I've noticed after embedding the MBL gadget in my new blog that it added 2 more seconds of load time to my site, which is A LOT! It's good you guys are catching it and striking back quickly. Keep it up!

MisterSteve

Okay. I will pretend that I understand all this. Basically the problem is that bots that don't cause any problems for big sites can cause havoc with smaller sites that have fewer resources to handle the load. Seems like Yahoo and the like could ease off a little.

Goddess

Y'all are amazing. Knocking 'em out faster than Raid!!! LOL Keep up the great work :-) KUDOS.

Rob Beland

Boy I wish I knew what you were talking about...

Are you telling me that somebody has let a bunch of robots loose in the MyBlogLog offices???

Lance

Rob, that's hilarious! LOL

Aldon Hynes

Back on Tuesday, I wrote a blog post about the scraping of MyBlogLog that I've been doing. You can read a little bit about it on my blog, Orient Lodge:

http://www.orient-lodge.com/node/2365

Unlike the gaming that is a big issue and concern, my scraping is aimed at producing graphs of the interactions in MyBlogLog. I hope some of you have looked at my graphs and thought about their implications.

As I've noted in my blog posts, hopefully a good API will make that sort of scraping unnecessary. However, until an API is available, I would urge MyBlogLog not to completely ban scraping.

That said, the way I would approach the scraping and gaming is simply to put a limit on how often you get added as a 'Recent Reader'.

If someone spends less than ten or fifteen seconds looking at my site, then I don't consider them having recently read my site.

So when someone visits a site, they shouldn't get added as a recent reader if they visited a different site within the past 10 or 15 seconds.
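
The rule proposed in this comment could look something like the sketch below. It is only an illustration of the idea: the class name `RecentReaders`, the 15-second default, and the in-memory dictionaries are assumptions for the example, not a description of how MyBlogLog stores its reader rolls.

```python
MIN_DWELL_SECONDS = 15  # upper end of the "10 or 15 seconds" suggested above


class RecentReaders:
    """Hypothetical reader roll that ignores visitors hopping between sites."""

    def __init__(self, min_dwell=MIN_DWELL_SECONDS):
        self.min_dwell = min_dwell
        self.last_visit = {}  # member -> (site, timestamp of last visit)
        self.rolls = {}       # site -> members, most recent first

    def visit(self, member, site, now):
        """Record a visit; return True if the member was added to the roll."""
        previous = self.last_visit.get(member)
        self.last_visit[member] = (site, now)
        if previous is not None:
            prev_site, prev_time = previous
            # Skip the roll if they were on a different site moments ago.
            if prev_site != site and now - prev_time < self.min_dwell:
                return False
        roll = self.rolls.setdefault(site, [])
        if member in roll:
            roll.remove(member)
        roll.insert(0, member)
        return True
```

A bot loading a new blog every few seconds would be credited only for its first stop, while an ordinary reader who spends a minute on each site shows up everywhere they go.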
