AgXphoto.info
Program: Blekko Bot "scoutjet"
Symptoms: DDOS Attack; "scoutjet" in URLs in server access logs; DDOS storm IP addresses often blacklisted as "commenter spam."
Targets: Self-contained primary content such as JPEGs and PDFs.
Trends: 18-month-old URLs (old enough to be old; new enough to be new); violates the website's Terms of Use for Service and Content; over-aggressive spider.
UPDATE: We have noticed not only that Blekko's bots appear in the middle of the incoming requests in our logs during this DDOS attack, but also that Blekko is in the business of providing "crowdsourced" ratings of websites for its commercial search engine.
It may be that Blekko is the core participant in the DDOS; it looks like they have a financial motive for seizing content.
Fault can't be determined just from reviewing access logs, but Blekko's bot is clearly in there. So if they are not part of the DDOS, at the very least they are trying to scrape data during someone else's DDOS attack.
Once again, review your files and know your system. Blekko's bot got my attention because these guys never show up on my website.
Some of the files Blekko has tried to gain access to from our website have been registered with the Library of Congress. As our faithful readers know, some of the content files have been sold for thousands of dollars.
Blekko has not attempted to contact us for use of those files in accordance with our website's Terms of Use. We have sent them an email advising them that they are coming close to receiving our bill, should they persist.
Blekko investors have made millions on promises of commercializing "crowdsourced" search engine optimization. Effectively, it's a large popularity contest in which investors get rich and content providers, like us, get paid nothing.
The Blekko bot was easily detectable in server-side access logs. Inside a swarm of IP addresses associated with "comment spammers", the Blekko bot's URL is plainly visible.
Look for "scoutjet".
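For anyone who wants to do the same search programmatically, here is a minimal sketch. It assumes the Apache/Nginx "combined" log format (your server may log differently), and the function name `find_bot_hits` is hypothetical. It matches the user-agent field case-insensitively, since the crawler may identify itself as "ScoutJet", and reports which IP addresses made the requests and which paths they asked for.

```python
import re
from collections import Counter

# Assumption: Apache/Nginx "combined" log format, e.g.
# 192.0.2.10 - - [12/Nov/2010:10:00:00 -0500] "GET /a.pdf HTTP/1.1" 404 210 "-" "agent"
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def find_bot_hits(lines, needle="scoutjet"):
    """Return (requests per IP, requested paths) for log lines whose
    user-agent field contains `needle`, ignoring case."""
    needle = needle.lower()
    per_ip = Counter()
    paths = []
    for line in lines:
        match = LOG_RE.match(line)
        if match is None:
            continue  # skip lines that are not in combined format
        if needle in match.group("agent").lower():
            per_ip[match.group("ip")] += 1
            paths.append(match.group("path"))
    return per_ip, paths
```

Because `find_bot_hits` accepts any iterable of lines, you can pass it an open log file directly and then print `per_ip.most_common()` to see which addresses hit hardest.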
The Blekko bot seemed to favor PDF copies of our original content, like equipment reviews. The PDF, once taken, would not need to be linked back to our website to be reused.
The URLs they were using were at least 18 months old, and were neutralized by our periodic system improvements.
Blekko's website features some quaint instructions for webmasters, effectively telling us that it is up to us to keep their spider off our website. We're told to set the crawl rate and to build our robots.txt file to keep it out. It's not our program. It's Blekko's.
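To make the point concrete, here is how a compliant crawler would read such a robots.txt file, using Python's standard `urllib.robotparser`. The `/reviews/` path, the 30-second delay, and the `example.com` host are hypothetical values, not our actual configuration. Note that `Crawl-delay` is a non-standard extension, and every rule here is only a request: whether any of it is honored is entirely up to the crawler's owner.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the /reviews/ path and the 30-second delay are
# illustrative values only.
ROBOTS_TXT = """\
User-agent: ScoutJet
Crawl-delay: 30
Disallow: /reviews/

User-agent: *
Disallow: /private/
"""


def load_rules(text):
    """Parse robots.txt text the way a compliant crawler would."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)
# A well-behaved ScoutJet would skip /reviews/ and wait 30 s between requests.
print(rules.can_fetch("ScoutJet", "http://example.com/reviews/lens.pdf"))  # False
print(rules.crawl_delay("ScoutJet"))  # 30
```

The user-agent name is matched against the short product token ("ScoutJet"), not the full browser-style string that shows up in the logs.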
If they had put even a minimal effort into actually reading our website, they would have seen our publicly posted passwords. I hand them out on business cards to actual people interested in reading the content. I don't think I gave any to Blekko.
They would have had to read the card and adjust the bot to negotiate the password challenge, which, by the way, is just as easy to program a spider to do as it is to tell a person to do.
Instead, Blekko's bot and its cloud of surround-sound ____ decided to hammer the nameserver with a 1980s war-dialing of antiquated URLs.
Blekko's sales pitch is that they know where the good stuff is.
Well, hell, we gave them the password.
ORIGINAL:
Our nameserver has been bogged down, on and off, for the past day (12 NOV 2010 to 13 NOV 2010) with a DDOS attack. Our files are still intact.
The DDOS time frame coincided with the simultaneous arrival of the Blekko bot and an international comment spammer. Nothing but love for you both.
A check of our logs shows that right in the middle of this attack, unwanted and previously unknown bots from "Blekko," a search engine from http://scoutjet.com, were hammering our website with antiquated file requests.
According to their website, webmasters can limit how frequently their bot crawls our site.
How quaint.
Coincidentally, the files Blekko's bots were looking for are available on the website to people who actually read it. The addresses that went with those files have long since been phased out and deleted, and human readers have been referred to replacement directories. It's obvious, just from looking at the URLs, that no one actually read and evaluated the instructions given to the Blekko bot. They're just grabbing whatever they can to start up their search engine.
A recent trip to my local public library offers some good advice on how to avoid causing just this sort of problem with your bot, if you run one. I recommend the book, "Spidering Hacks" by Kevin Hemenway. Be sure to check out the subsection on not rudely hogging bandwidth. Just a suggestion.
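The basic idea of not hogging bandwidth can be sketched as a per-host throttle: before each request, the crawler waits until a minimum gap has passed since its previous request to that same host. This is a generic sketch, not code from "Spidering Hacks". The class name `HostThrottle` and the 10-second default gap are assumptions chosen for illustration, and the clock and sleep functions are injectable so the behavior can be checked without actually waiting.

```python
import time
from urllib.parse import urlsplit


class HostThrottle:
    """Enforce a minimum gap between requests to the same host.

    Hypothetical sketch: the 10-second default gap is an assumed value,
    not a figure taken from "Spidering Hacks".
    """

    def __init__(self, min_gap=10.0, clock=time.monotonic, sleep=time.sleep):
        self.min_gap = min_gap
        self.clock = clock
        self.sleep = sleep
        self.last_hit = {}  # host name -> time of the previous request

    def wait_for(self, url):
        """Block until it is polite to request `url`, then record the request."""
        host = urlsplit(url).hostname
        now = self.clock()
        previous = self.last_hit.get(host)
        if previous is not None:
            remaining = self.min_gap - (now - previous)
            if remaining > 0:
                self.sleep(remaining)  # back off instead of hammering the server
                now = self.clock()
        self.last_hit[host] = now
```

A crawler would call `wait_for(url)` immediately before each fetch. Where a site publishes a `Crawl-delay`, a crawler could use that value as `min_gap` instead of the default.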
In the meantime, we're going to use this Blogger site; content-wise, it is close to a mirror. Eventually, someone in Silly Valley will realize that Blekko's new bot crawls right around the time this large DDOS seems to be taking place. Unfortunately, it seems to be the Blekko bot and an international comment spammer at the same time.
We're building a better mousetrap, regardless of what happens with the nameserver.
# # #
Blekko's Twitter: http://twitter.com/blekko
Scoutjet webcrawler for Blekko: http://www.scoutjet.com/
Blekko.com: http://blekko.com/
Reference for update:
http://www.mwd.com/2010/11/bleko-launches-human-filtered-search-engine/
Title updated.
# # #