Blog pages and META tags
There are a number of files that you want search engines to follow in order to find other files, but that you don't want indexed themselves. These files are the following:
blog/blog.html
blog/files/category-*.html
blog/files/archive-*.html
blog/files/[0-9][0-9]-*-20[0-9][0-9].html
This assumes, of course, that you've set up "blog" as your blog page. These files change too often to be of any use as entries in a search engine's index, so you want to add the following meta tag to each one:
<meta name="robots" content="noindex,follow">
But you simply can't do it. If you set the tag in the blog page inspector, it is applied to every file in the blog, including the posts you do want indexed. Not good.
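To see which files the patterns listed earlier would catch, here is a minimal shell sketch. The function name is_listing_page and the example paths are my own inventions for illustration, and it assumes the blog page is named "blog" as described above.

```shell
#!/bin/bash
# Hypothetical helper: succeeds when a path (relative to the site root) is one
# of the listing pages that should get "noindex,follow".
is_listing_page() {
    case "$1" in
        blog/blog.html) return 0 ;;
        blog/files/category-*.html) return 0 ;;
        blog/files/archive-*.html) return 0 ;;
        blog/files/[0-9][0-9]-*-20[0-9][0-9].html) return 0 ;;
        *) return 1 ;;
    esac
}

# The paths below are made up for illustration.
for path in blog/blog.html blog/files/category-perl.html blog/files/my-first-post.html; do
    if is_listing_page "$path"; then
        echo "noindex: $path"
    else
        echo "index:   $path"
    fi
done
```

Anything that falls through to the last branch is left alone, which is the whole point: ordinary posts keep whatever robots setting the blog page gives them.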
The only way to do this is to massage the files after they have been updated on your site. I use Dreamhost, which provides a shell account as a standard feature, so I can add a cron job that runs every five minutes. That job (run through a framework I've built for this type of thing) changes the "robots" meta tag from "all" to "noindex,follow", as I'd like.
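Before the real script, here is the core idea as a minimal shell sketch. It is not the script I actually run: the function name fix_robots and the sample markup are made up, and it assumes the whole meta tag sits on a single line, since sed works line by line.

```shell
#!/bin/bash
# Hypothetical helper: rewrite the robots meta tag in one HTML file.
# Assumes the tag does not span multiple lines.
fix_robots() {
    local file=$1 tmp status
    tmp=$(mktemp) || return 1
    # Write the edited copy to a temp file, then copy it back over the
    # original so the original's permissions are preserved.
    sed 's/<meta name="robots"[^>]*>/<meta name="robots" content="noindex,follow">/' \
        "$file" > "$tmp" && cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    return "$status"
}

# Demo on a throwaway file; the sample markup is invented.
demo=$(mktemp)
printf '%s\n' '<head>' '<meta name="robots" content="all" />' '</head>' > "$demo"
fix_robots "$demo"
cat "$demo"
rm -f "$demo"
```

Running it twice on the same file is harmless, because the second pass replaces the tag with an identical one. That matters for something triggered from cron every few minutes.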
The source code for the script follows, but you can also download it: fixrobots source code.
#!/usr/bin/perl -w
use strict;
use File::Find();
my $search = 'meta name="robots"';
my $replace = 'meta name="robots" content="noindex,follow"';
my @filelist = ();
# Callback for File::Find: collect the blog listing pages.
sub wanted
{
    push @filelist, $File::Find::name if /^blog\.html\z/s or
        /^archive-.*\.html\z/s or
        /^category-.*\.html\z/s or
        /^\d\d-.*-20\d\d\.html\z/s;
}
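if ($#ARGV == -1)
{
    # No arguments: scan the whole blog directory.
    File::Find::find({wanted => \&wanted},
        '<my home directory>/derekwyatt.org/public/blog');
}
else
{
    # Otherwise process only the files named on the command line.
    @filelist = @ARGV;
}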
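foreach my $f (@filelist)
{
    # Slurp the whole file so the tag is matched even if it spans lines.
    # Skip unreadable files rather than overwriting them with nothing.
    open(my $in, '<', $f) or do { warn "Cannot read $f: $!\n"; next; };
    my $file = do { local $/; <$in> };
    close $in;
    $file =~ s/<$search.*?>/<$replace\/>/sig;
    open(my $out, '>', $f) or do { warn "Cannot write $f: $!\n"; next; };
    print $out $file;
    close $out;
}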
Feel free to take it, and change it to suit your tastes.
Note that it plugs very nicely into the changetrigger framework I put together, using the plugin file below (the above code being called "fixrobots").
The source code for the plugin follows, but you can also download it: blog_nasties source.
#!/bin/bash
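# List the blog pages whose robots tag should be rewritten.
# The patterns are quoted so the shell passes them to find unexpanded.
CTfilelist()
{
    find ~/derekwyatt.org/public/blog \
        -name 'blog.html' -o \
        -name 'archive-*.html' -o \
        -name 'category-*.html' -o \
        -name '[0-9][0-9]-*-20[0-9][0-9].html'
}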
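CTaction()
{
    local filelist=
    local opt OPTIND=1
    # -A and -C each name a file holding a list of paths; -D is accepted but ignored.
    while getopts A:D:C: opt
    do
        case $opt in
            A) filelist="$filelist $(<"$OPTARG")"
               ;;
            C) filelist="$filelist $(<"$OPTARG")"
               ;;
        esac
    done
    ~/bin/fixrobots $filelist
    return 3
}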
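To make the option handling in CTaction easier to follow, here is a small stand-in you can run without the framework. It is only a sketch: show_targets is a made-up name, it prints the file names instead of calling fixrobots, and the way it is invoked below is my assumption about how the options might be passed, not a description of changetrigger itself.

```shell
#!/bin/bash
# Stand-in for CTaction that only prints what it would hand to fixrobots.
# Assumption: each -A/-C/-D argument names a file containing a list of paths.
show_targets() {
    local opt OPTIND=1 filelist=
    while getopts A:D:C: opt; do
        case $opt in
            A|C) filelist="$filelist $(cat "$OPTARG")" ;;
            D)   ;;  # accepted but ignored, as in the plugin
        esac
    done
    # Left unquoted on purpose so the list collapses to single spaces.
    echo $filelist
}

# Throwaway list files with invented contents.
added=$(mktemp); changed=$(mktemp)
echo "blog/blog.html" > "$added"
echo "blog/files/archive-2009.html" > "$changed"
show_targets -A "$added" -C "$changed"
rm -f "$added" "$changed"
```

Like the plugin, this splits the list on whitespace, so it would break on file names containing spaces. The blog file names here don't have any, which is why I haven't bothered.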
Oh, and guys at RapidWeaver, please fix this bug :)