Rails 3.2 - Sitemap Generation

March 20, 2012


There are several gems that can generate sitemaps and ping the search engines with them. I looked into a couple of them, but they seemed a bit heavyweight for my simple purposes, and the source code was over my head. So instead I followed the recipe here: Sitemap Generator for Ruby on Rails Applications. The code is nice and simple.

A few configuration changes were needed to get it working, though. The solution is in the comments on that post, but I think there were some red herrings too, so I thought I'd post what worked for me.

First, in config/application.rb, we need to add the lib directory to the autoload_paths array so rake can find sitemap.rb (per WebTempest):
config.autoload_paths += Dir["#{config.root}/lib/**/"]
Because I was getting an "uninitialized constant Net::HTTP" error, I also had to add this to config/application.rb
require 'net/http'
just before
require 'rails/all'
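To see what that Dir[...] glob actually hands to autoload_paths, here is a small standalone sketch. It is not part of the original recipe. It builds a throwaway directory tree (the tasks and helpers/xml subdirectories and the lib_autoload_dirs helper are names I made up for the demo) and prints what the same pattern matches.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical helper: returns what the glob from config/application.rb
# would match for a given application root.
def lib_autoload_dirs(root)
  # A pattern ending in "/" matches directories only, and "**/" also
  # matches zero levels, so lib/ itself is part of the result.
  Dir["#{root}/lib/**/"].sort
end

# Throwaway tree standing in for a Rails app root.
Dir.mktmpdir do |root|
  FileUtils.mkdir_p(File.join(root, 'lib', 'tasks'))
  FileUtils.mkdir_p(File.join(root, 'lib', 'helpers', 'xml'))
  FileUtils.touch(File.join(root, 'lib', 'sitemap.rb'))

  # Print each match relative to the temporary root.
  lib_autoload_dirs(root).each do |dir|
    puts dir.sub(root, '')
  end
end
```

Run as a plain Ruby script, this should list lib/ and every directory beneath it, each with a trailing slash. Files such as sitemap.rb are not matched, because the pattern ends in a slash.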
Next (per Ben), to fix the "undefined method 'post_path' for Sitemap:Class" error, I added
include Rails.application.routes.url_helpers
near the top of sitemap.rb, just after
require 'builder'
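The "for Sitemap:Class" part of that error says the helper was called on the Sitemap class itself rather than on an instance. An include at the top level of a file mixes the module into Object, so class-level code can see the helpers too. The sketch below shows the underlying Ruby behavior with a stand-in module. FakeUrlHelpers and both class names are hypothetical, and I have not tried the extend variant against the real Rails url_helpers module.

```ruby
# Hypothetical stand-in for Rails.application.routes.url_helpers.
module FakeUrlHelpers
  def post_path(id)
    "/posts/#{id}"
  end
end

class InstanceOnlySitemap
  include FakeUrlHelpers # helpers become instance methods only

  def self.entries
    [post_path(1)] # fails: the class object itself has no post_path
  end
end

class ClassLevelSitemap
  extend FakeUrlHelpers # helpers become methods of the class object

  def self.entries
    [post_path(1), post_path(2)]
  end
end

begin
  InstanceOnlySitemap.entries
rescue NoMethodError => e
  puts "include inside the class: #{e.class}"
end

puts ClassLevelSitemap.entries.inspect
```

The first call raises NoMethodError, which is the same kind of failure as the post_path error above. The second call works because the helpers are reachable from class methods.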
Then I put this script, named 'sitemap', in my app directory on WebFaction:
#!/bin/bash
# Generate the current sitemap and ping the search engines with it.
cd /home/{your login}/webapps/{your rails app}
export PATH=$PWD/bin:$PATH
export GEM_HOME=$PWD/gems
export RUBYLIB=$PWD/lib
cd /home/{your login}/webapps/{your rails app}/current
bundle exec rake RAILS_ENV=production sitemap:generate
echo "finished generating sitemap and pinging search engines"
Then I added this to my crontab to run the script every (Ruby) Tuesday at 12:50:
50 12 * * Tue /home/{your login}/webapps/{your rails app}/sitemap > $HOME/cron.log 2>&1
After running the script, I checked sitemap.xml in my public directory. Looking good! Then I looked at the tail of my production.log and found that all the search engines returned HTTPOK except Yahoo, which returned HTTPForbidden. I'm pretty sure Yahoo Site Explorer is part of Bing now, so it doesn't make sense to ping it anymore. I just removed the Yahoo lines from the 'update_search_engines' method in sitemap.rb.
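HTTPOK and HTTPForbidden are Net::HTTP response classes, so the log check can also be done in code. What follows is a minimal sketch, not the recipe's actual update_search_engines method. The helper names, the engine labels, and the search.example.com endpoint are all placeholders I made up, and the responses are built by hand so nothing touches the network.

```ruby
require 'net/http'
require 'uri'

# Build a ping URL by appending the URL-escaped sitemap address.
# ping_base is whatever endpoint a search engine documents; the one
# used in the demo below is a placeholder, not a real service.
def ping_url(ping_base, sitemap_url)
  "#{ping_base}#{URI.encode_www_form_component(sitemap_url)}"
end

# True for any 2xx response class (Net::HTTPOK is one of them).
def ping_accepted?(response)
  response.is_a?(Net::HTTPSuccess)
end

# One log line per engine, naming the response class and a verdict.
def ping_log_line(engine, response)
  status = ping_accepted?(response) ? 'ok' : 'FAILED'
  "#{engine}: #{response.class} (#{status})"
end

puts ping_url('http://search.example.com/ping?sitemap=',
              'http://www.example.com/sitemap.xml')

# Hand-built responses so the demo needs no network access.
ok        = Net::HTTPOK.new('1.1', '200', 'OK')
forbidden = Net::HTTPForbidden.new('1.1', '403', 'Forbidden')
puts ping_log_line('engine-a', ok)
puts ping_log_line('engine-b', forbidden)
```

In a real task the response would come from something like Net::HTTP.get_response(URI(url)). A FAILED line is the cue to look at that engine's entry, which is how the Yahoo ping stood out.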

Update

After posting this, I noticed that my sitemap.xml was getting deleted each time I deployed via Capistrano. The fix was to add a deployment step to deploy.rb and call it in the after "deploy" and after "deploy:migrations" hooks.
# Regenerate and submit the sitemap
namespace :sitemap do
  desc "Update the sitemap and resubmit"
  task :update do
    run "#{deploy_to}/sitemap"
  end
end

# only keep the last 5 deployments on the server
# also update the sitemap and create the symlink to uploads directory
after "deploy", "deploy:cleanup", "sitemap:update"
after "deploy:migrations", "deploy:cleanup", "sitemap:update"
after "deploy:finalize_update", "uploads:create_symlink"

Feedback

Your feedback is welcome! If you find any errors in this post or have any additional pointers or insights, please take a moment to register and share your thoughts.



Comments

On December 22, 2013, BashCoder wrote:

Good stuff - thanks for this. I also suggest triggering the sitemap generation just before Capistrano creates the "current/" symlink using:

before "deploy:create_symlink", "sitemap:update"

To do this, in the sitemap:update task I first change directories to the Capistrano "current_release" path. This way, when the symlink to "current" is made, the file is already there and the "/public/sitemap.xml.gz" file is never missing.

Best,

- Bash


