REL="nofollow" and Mailman
From WPKG | Open Source Software Deployment and Distribution
Has your Mailman mailing list archive ever been flooded by spambots? If so, consider adding REL="nofollow" attributes to the links in all archived postings, so that spammed links gain nothing from search engines.
Usage:
- copy the script to your server
- edit $searchpath so that it points to Mailman's "archives" directory
- if there are domains you wish to whitelist, add them to @excludedomains
- run the script by hand once; it prints the name of every file it converts
- add a cronjob in /var/spool/mailman:
# add nofollow
48 * * * * perl /path/to/this/script/add_nofollow.pl >/dev/null 2>&1
That's it! The script is Mailman-specific, but it can easily be modified to add REL="nofollow" attributes to other HTML files, too.
add_nofollow.pl:
#!/usr/bin/perl
# This script adds REL="nofollow" attributes to links in Mailman's HTML files
use strict;
use warnings;
use File::Copy;
use File::Find;

# Directory containing the HTML files we want to add REL="nofollow" to
# (Mailman's "archives" directory)
my $searchpath = "/srv/www/example.com/mailman/archives/private";

# Domains we want to exclude (whitelist): don't add REL="nofollow" there.
# Single quotes keep the backslash, so "\." really matches a literal dot.
my @excludedomains = ('[^"]*?example\.org', '[^"]*?example\.com');

# Collect all message files (names start with a digit and end in .html)
my @htmlfiles;
find(sub { push @htmlfiles, $File::Find::name if -f $_ && /^[0-9].*\.html$/ }, $searchpath);

foreach my $htmlfile (@htmlfiles) {
    open(my $in, '<', $htmlfile) or do { warn "Cannot read $htmlfile: $!\n"; next; };
    my @content = <$in>;    # content of a single HTML file
    close($in);

    my $oldhtml = join('', @content);    # old HTML contents
    my $newhtml = '';                    # new HTML contents

    foreach my $line (@content) {
        # This is where we do all the replacing
        if ($line =~ m/<A HREF="http/i) {
            # Remove any existing REL="nofollow"
            $line =~ s/(<A HREF="http)([^"]*")(\sREL="nofollow")/$1$2/gi;
            # Add REL="nofollow"
            $line =~ s/(<A HREF="http)([^"]*")/$1$2 REL="nofollow"/gi;
            # Remove REL="nofollow" from excluded domains
            foreach my $excludedomain (@excludedomains) {
                $line =~ s/(<A HREF="($excludedomain))([^"]*")(\sREL="nofollow")/$1$3/gi;
            }
        }
        $newhtml .= $line;
    }

    # If the contents differ, REL="nofollow" attributes changed: commit it to the file
    if ($oldhtml ne $newhtml) {
        print "Added REL=\"nofollow\": $htmlfile\n";
        open(my $out, '>', "$htmlfile.tmp") or do { warn "Cannot write $htmlfile.tmp: $!\n"; next; };
        print $out $newhtml;
        close($out);
        move("$htmlfile.tmp", $htmlfile) or warn "Cannot replace $htmlfile: $!\n";
    }
}
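
To try the substitutions on other HTML before touching any files, the three replacement steps can be pulled out into a small function. This is only a sketch: the add_nofollow helper, the sample line and the whitelist pattern below are made up for illustration and are not part of the script above. It assumes links are written as <A HREF="http..."> (in any letter case), the way Mailman's archives write them.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not in the original script): applies the same three
# substitutions to one line of HTML and returns the modified line.
sub add_nofollow {
    my ($line, @excludedomains) = @_;
    # 1. Remove any existing REL="nofollow" so it is never added twice
    $line =~ s/(<A HREF="http)([^"]*")(\sREL="nofollow")/$1$2/gi;
    # 2. Add REL="nofollow" to every link whose target starts with http
    $line =~ s/(<A HREF="http)([^"]*")/$1$2 REL="nofollow"/gi;
    # 3. Remove it again from links to whitelisted domains
    foreach my $excludedomain (@excludedomains) {
        $line =~ s/(<A HREF="($excludedomain))([^"]*")(\sREL="nofollow")/$1$3/gi;
    }
    return $line;
}

# Assumed sample data: one untrusted link and one whitelisted link
my @whitelist = ('[^"]*?example\.org');
my $sample = '<A HREF="http://spam.invalid/">x</A> '
           . '<A HREF="http://www.example.org/">y</A>' . "\n";

print add_nofollow($sample, @whitelist);
```

Only the first link in the sample should come back with REL="nofollow". Because step 1 strips existing attributes first, running the function again on its own output leaves the line unchanged, which is why the script can be run from cron repeatedly.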