Archive for the ‘WWW’ Category
Comments Spammer 404s Again
Monday, January 10th, 2011
Inevitably this post about 404 errors and comment spam got spam comments that broke the pattern, with multiple 404s generated for one comment from IPs around the world. Nice to know that it stimulated someone to go to all that trouble. The usual suspects still fit the pattern, though. None of them got past the spam blocking anyway.
Vue Search Sites Automated and Other Site Tweaks
Saturday, January 8th, 2011
Back in October I did some fiddling with the Vue sites search engine that uses Google custom search to let you search the sites in my Vue links directory. Each time the list of sites changed I had to generate a file listing the sites to be included from the database behind the directory and upload it by hand. Not particularly time consuming, but in theory it could be automated. I tried to set up the automation but for no readily apparent reason it just didn’t work.
In December my 404 error log checking picked up hits on one of the pages that I’d made to automate this. So I republished that page and now, through the magic of the Internet, the automation seems to be all systems go.
I’d somehow managed to knock out the redirection of my old blogspot blog to the one integrated here. I hadn’t realised just how many visitors were coming through from there still so breaking / fixing it has had a surprising impact on the impworks visitor statistics!
I was also seeing occasional 404s on the directory page for tags that didn’t exist so I’ve thrown in a simple tag cloud page to give visitors to that page something to chew on.
Comments Spammer 404s Revisited
Tuesday, January 4th, 2011
I’ve been meaning to revisit my WordPress 404 Errors for Spam Comments post but somehow it kept slipping through the cracks. I was at a bit of a loose end so I did a quick run through my spam comment log and my site’s error logs…
There were 424 spam comments to my site in December 2010. Of those, 74 had an associated 404 error for the comment page that WordPress would have made had the comment been approved. Every comment and 404 error that paired up had matching IP addresses. I’d say that suggests the spambots are checking whether they succeeded in getting a comment posted, rather than someone auditing that the inbound links they’ve paid for exist.
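The pairing described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the process I actually followed: the record types, field names and sample values are all made up for the example, and it assumes the spam log and the error log have already been reduced to a comment ID, an IP address and a timestamp per entry.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Tuple


@dataclass
class SpamComment:
    """One entry from the spam comment log (hypothetical shape)."""
    comment_id: int
    ip: str
    posted_at: datetime


@dataclass
class NotFoundHit:
    """One 404 hit on a comment page (hypothetical shape)."""
    comment_id: int
    ip: str
    requested_at: datetime


def pair_spam_with_404s(
    comments: List[SpamComment], hits: List[NotFoundHit]
) -> List[Tuple[SpamComment, NotFoundHit, bool, timedelta]]:
    """Pair each 404 hit with the spam comment it names.

    Each result is (comment, hit, same_ip, delay), where delay is the
    time between the comment being posted and the 404 being served.
    Hits that name an unknown comment ID are skipped.
    """
    by_id = {c.comment_id: c for c in comments}
    pairs = []
    for hit in hits:
        comment = by_id.get(hit.comment_id)
        if comment is None:
            continue
        same_ip = comment.ip == hit.ip
        delay = hit.requested_at - comment.posted_at
        pairs.append((comment, hit, same_ip, delay))
    return pairs


# Made-up sample data using documentation-only IP ranges.
comments = [
    SpamComment(101, "192.0.2.1", datetime(2010, 12, 1, 9, 0)),
    SpamComment(102, "198.51.100.7", datetime(2010, 12, 2, 9, 0)),
]
hits = [
    NotFoundHit(101, "192.0.2.1", datetime(2010, 12, 1, 15, 0)),
    NotFoundHit(999, "203.0.113.5", datetime(2010, 12, 3, 9, 0)),
]
pairs = pair_spam_with_404s(comments, hits)
checked_ids = {comment.comment_id for comment, _, _, _ in pairs}
```

Counting `checked_ids` against the total number of spam comments gives the kind of 74 out of 424 figure above, and the `same_ip` flag and `delay` cover the other two questions: whether the check comes from the posting address and how long the bot waits before checking.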
The test for a comment page that causes the 404 error isn’t happening immediately, so the bots have been coded to allow for comments being delayed. I can’t think of a reason a comment would be delayed other than moderation. Most of the comments are so blatantly spam that I can’t see many of them being approved. In the quick sample I looked at, the ones with long lists of links weren’t tested to see if they got through, while the ones with random garbage text and a couple of links were sometimes tested.
The most heavily checked spam comments were short, like “A very interesting post thanks for writing it!” or “Thanks For the exelent info. I’ll be back in the future. Thanks again! [Link to drugs spam site]”. Maybe a few of the short ones like the second one might slip past someone who’s taken the drugs from the site linked to? I imagine some like the first one will get through on a high traffic blog with lots of comments to moderate.
WordPress 404 Errors for Spam Comments
Wednesday, May 5th, 2010
Ever since I rebuilt my website with WordPress and imported my blog from blogspot I’ve kept an eye on my server error logs. I like to make sure people who arrive here from the old blog or from other links end up where they were looking to go. While I’ve redirects that catch almost all of the possible routes in, I want to be sure, and checking the error logs gives me a way to spot any I’ve missed.
I just went through last month’s logs. Of the around 250 error pages served, 5 hits were for one rather anonymous blog page that needed a redirect setting up. 10 were typos or calls to deleted pages. 60 were bots attempting crude hack attacks on the site, trying known vulnerabilities in a variety of software. That left around 170 that I’d not been able to explain. Every one was a call to a comments page that, when I checked, didn’t exist. The address was a unique combination of a unique comment number and the page the comment had been posted on.
2006/02/more-thoughts-on-the-game-without-a-snappy-title/comment-page-1/#comment-9940
I’ve seen 404s like this for months and couldn’t work out why they were appearing.
Then it struck me what they were. Those comment pages were for comments that Akismet caught as spam. The only way (short of brute force, which would show up in the logs) that someone could know the combination of the comment’s unique ID and the page it had been posted on was to be the original poster (or the spam system that posted it). Those 404 pages must either be the spam bot coming back later to see if it worked or some other system running quality control before paying out for links to a site having been created.
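To make the "unique combination" concrete, here is a minimal Python sketch that splits an address like the one above into the post it was left on, the comment page number and the comment ID. The pattern is an assumption based purely on that single example address; a real error log might record the request differently, and the names `COMMENT_URL`, `CommentRef` and `parse_comment_url` are invented for the illustration.

```python
import re
from typing import NamedTuple, Optional

# Assumed shape, copied from the example address in this post:
#   <post path>/comment-page-<n>/#comment-<id>
COMMENT_URL = re.compile(
    r"^/?(?P<post>.+?)/comment-page-(?P<page>\d+)/?#comment-(?P<comment_id>\d+)$"
)


class CommentRef(NamedTuple):
    post: str        # path of the post the comment was left on
    page: int        # comment page number
    comment_id: int  # the comment's unique ID


def parse_comment_url(address: str) -> Optional[CommentRef]:
    """Return the parts of a comment address, or None if it doesn't match."""
    match = COMMENT_URL.match(address)
    if match is None:
        return None
    return CommentRef(
        post=match.group("post"),
        page=int(match.group("page")),
        comment_id=int(match.group("comment_id")),
    )


example = parse_comment_url(
    "2006/02/more-thoughts-on-the-game-without-a-snappy-title/comment-page-1/#comment-9940"
)
```

For the example address this yields the post path `2006/02/more-thoughts-on-the-game-without-a-snappy-title`, page `1` and comment ID `9940`. Once the comment ID is pulled out like this, it can be looked up against the spam queue to see which comment the visitor was checking on.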
I’ve not had a chance to match up spam messages to 404s because I keep my spam logs clear, but I’m going to keep an eye on it and see if they support the idea. I’m intrigued to see how often they check a comment, if they come from the same IP as the spam message and how long after posting they check.
Fort Knox
Wednesday, March 3rd, 2010
I thought I’d post another satellite picture after last week’s picture of the boneyard. Being a fan of the old Bond film Goldfinger, I thought I’d go for Fort Knox and the United States Bullion Depository (the large building with nothing around it at the bottom of the map).
Aircraft Boneyard
Tuesday, February 23rd, 2010
I noticed today that several newspapers had shots taken from Google Earth of the eleven square kilometre US Air Force 309th Aerospace Maintenance and Regeneration Group Aircraft Boneyard, so I thought I’d dig it up on Google Maps to have a look myself.
Where Should Buzz Have Been?
Wednesday, February 10th, 2010
Google launching Buzz has crystallised a thought that’s been running through my head for a while. Ever since the WWW came on the scene there have been a series of landgrabs for bits of the Internet. Bits that give revenue, which might be page impressions giving advertising revenue or e-commerce giving cold, hard cash. Sometimes it’s been the potential for revenue or the myth of potential revenue. That’s where we had the first big crash when the bubble burst.
At the moment a pile of the players are going after the social networking pie. Google Buzz, Twitter, Facebook and all the other general purpose networks are busy trying to tie us into their setup. The problem from a user’s point of view is that we end up with our network fragmented depending on where our friends are. These networks are quite crude at the moment. Basically they tend to be either quite private or quite public, with messages shared with everyone or with just one person.
Then there are all the niche network sites built on old style forums where we meet up with people who share an interest. There are all sorts of sites which may have something you’re interested in, from personal blogs by friends to news sites and everything else in between.
Keeping up with all of those takes time. RSS and Atom feeds made it easier to keep up with sites by letting us know when there was something to read rather than having to trawl over the sites every few days. TweetDeck lets you pull your social networks together in one place.
So after all the preamble, here is my real thought: will the real winner down the road be not whichever social network grabs the most members or gets the model right, but the tool that we all use to keep up with social networks, RSS, forums and email in one place? I don’t know if it will be a web app, something you install or a mix of both, but I wish that was the Buzz Google had launched today instead of the one we got. That would seem to me to have been closer to Google’s mission to organise the world’s information.



