<h2><a href="https://www.endpointdev.com/blog/2021/04/benefits-of-google-anltcs-for-business/">3 Immediate Benefits of Google Analytics for Business Owners</a></h2>
<p>April 30, 2021 · Ben Witten</p>
<p><img src="/blog/2021/04/benefits-of-google-anltcs-for-business/banner.png" alt="">
Image from <a href="https://blog.google/products/marketingplatform/analytics/new_google_analytics/">Google’s marketing platform blog</a></p>
<p>Where is your traffic coming from? What drew the traffic to your website? Which parts of your website are most visited? How do visits change over time? And how can the answers to these questions help you?</p>
<p>Answering these questions, and acting on the answers, is at the heart of search engine optimization (SEO).</p>
<p>Google Analytics can help. It is a web analytics service that lets you track and understand your website traffic, and a valuable tool for businesses of all sizes that are looking to grow.</p>
<p>Here are three ways Google Analytics can benefit your business:</p>
<h3 id="determining-site-improvements-to-strengthen-website-flow">Determining Site Improvements to Strengthen Website Flow</h3>
<p>This is a great way to generate more “conversions” — visitors to your website taking a desired action. Are visitors behaving the way you expected them to? Can you observe any bottlenecks in audience flow?</p>
<p>A bottleneck might be visitors getting stuck on one page when you want them to move on to a different one, such as a contact page. Understanding where visitors get stuck might point you toward the need to refresh certain web pages, which could in turn lead to more conversions.</p>
<p>For example, we observed that our “Deployment Automation” Expertise subpage has had a 100% bounce rate over the past three months. This is concerning: it means the content may not be engaging, or that there is no clear navigation path toward the end goal of a contact submission. Analytics helped us start looking at how to strengthen this subpage.</p>
<p><img src="/blog/2021/04/benefits-of-google-anltcs-for-business/image-1.jpg" alt="">
Image from <a href="https://blog.google/products/marketingplatform/analytics/new_google_analytics/">Google’s marketing platform blog</a></p>
<h3 id="understanding-your-audience">Understanding Your Audience</h3>
<p>Who is coming to your site, and how are they finding you? What referral sites, partner sites, media, and blog posts are directing the most traffic to your page? How can you leverage that?</p>
<p><img src="/blog/2021/04/benefits-of-google-anltcs-for-business/image-2.png" alt=""></p>
<p>In reviewing your inbound traffic, you will see some combination of the following types of traffic:</p>
<ul>
<li>Direct: Traffic from directly typing the URL into the browser address bar.</li>
<li>Organic: Traffic from people who navigate to your website through search engines after seeing you in search results. Having a strong online presence, especially strong SEO, will help more visitors arrive on your website without the need to pay for them.</li>
<li>Referral: Traffic that comes to your website after being “referred” from a different website, i.e., when other websites link to your webpage. More backlinks and referral traffic typically lead to significant SEO benefits.</li>
<li>Paid: This traffic arrives from paid search campaigns on platforms such as Google Ads.</li>
<li>Email: Traffic from links in emails.</li>
<li>Social: Traffic that comes from posts on social media networks like Facebook, LinkedIn, and Twitter.</li>
<li>Other: All traffic that doesn’t fit into any other category.</li>
</ul>
<p>We recommend reviewing each type of traffic to better understand how it flows through your website, noting any trends you find within individual traffic sources and mediums.</p>
<p><img src="/blog/2021/04/benefits-of-google-anltcs-for-business/image-3.png" alt=""></p>
<h3 id="data-driven-decision-making-stop-relying-on-assumptions-and-rely-on-data">Data-Driven Decision Making: Stop Relying on Assumptions and Rely on Data</h3>
<p>A great challenge for businesses is overconfidence in how well they understand their audience. Google Analytics and similar analytics tools can transform your work culture from one based on opinions and assumptions to one based on hard data. Google Analytics presents data in an organized and impactful format, and using that data in tandem with sales efforts can lead to more conversions and revenue for your business.</p>
<h3 id="alternatives">Alternatives</h3>
<p>With Google having access to so much data and being one of the two major advertisers on the web, many people are looking for alternatives that allow them more control over their customer data, separation from Google’s advertising platforms, and a slimmer data footprint for compliance with privacy laws such as CCPA (California) and GDPR (European Union).</p>
<p>There have always been various options for web visitor analytics. Google Analytics was originally created by a company called Urchin Software, which Google acquired in 2005. Some current alternatives include:</p>
<ul>
<li><a href="https://www.cloudflare.com/web-analytics/">Cloudflare web analytics</a>, a new service offered by the popular CDN (Content Distribution Network) that simply shows visitor data already flowing through their systems.</li>
<li><a href="https://www.goatcounter.com/">GoatCounter</a>, a SaaS or self-hosted open source application, which aims to provide simple counters rather than collecting personal data, thus avoiding any need for a privacy notice.</li>
<li><a href="https://matomo.org/">Matomo</a>, formerly known as Piwik, a fully-featured SaaS or on-premises paid package with a limited open source version.</li>
<li><a href="http://www.openwebanalytics.com/">Open Web Analytics</a>, a customizable open source analytics framework.</li>
</ul>
<p>We at End Point have found success with these core ideas and several of these services. We are happy to provide a <a href="/contact/">free consultation</a> to discuss your website needs.</p>
<h2><a href="https://www.endpointdev.com/blog/2020/03/web-projects-for-rainy-day/">Web Projects for a Rainy Day</a></h2>
<p>March 25, 2020 · Elizabeth Garrett Christensen</p>
<p><img src="/blog/2020/03/web-projects-for-rainy-day/image-0.jpg" alt="raindrops on a plant"></p>
<p><a href="https://www.flickr.com/photos/yellowstonenps/32984582893/">Image</a> by <a href="https://www.flickr.com/photos/yellowstonenps/">Yellowstone NPS on Flickr</a></p>
<p>With the COVID-19 quarantine disrupting life for many of us, I thought I’d put together a list of things you can do with your website on a rainy day. These are things to keep your business moving even if you’re at home and some of your projects are stuck waiting on things to reopen. If you’re looking for some useful things to do to fill your days over the next few months, this post is for you!</p>
<h3 id="major-version-updates">Major Version Updates</h3>
<p>Make a list of your entire stack, from OS to database to development frameworks. Note the current version and research the current supported versions. I find Wikipedia pages to be fairly reliable for this (e.g. <a href="https://en.wikipedia.org/wiki/CentOS">en.wikipedia.org/wiki/CentOS</a>). Ok, so what things need to be updated, or will need to be in the next year? Start on those now and use some downtime to get ahead of your updates.</p>
<h4 id="sample-of-a-clients-stack-review">Sample of a client’s stack review</h4>
<div class="table-scroll">
<table>
<thead>
<tr>
<th>Software</th>
<th>Purpose</th>
<th>Our version</th>
<th>Release date</th>
<th>End of support</th>
<th>Next update</th>
<th>Newest version</th>
<th>Notes</th>
</tr>
</thead>
<tr>
<td>CentOS</td>
<td>OS for e-commerce server</td>
<td>7</td>
<td>July 2014</td>
<td>June 2024</td>
<td>Not imminent</td>
<td>8</td>
<td><a href="https://wiki.centos.org/About/Product">https://wiki.centos.org/About/Product</a></td>
</tr>
<tr>
<td>Nginx</td>
<td>Web server</td>
<td>1.16.0</td>
<td>March 2020</td>
<td>Unclear</td>
<td>Not imminent</td>
<td>1.16.1</td>
<td><a href="https://nginx.org/">https://nginx.org/</a></td>
</tr>
<tr>
<td>PostgreSQL</td>
<td>Database server</td>
<td>9.5.20</td>
<td>January 2016</td>
<td>Feb 2020</td>
<td>Medium term, to version 11</td>
<td>12</td>
<td><a href="https://www.postgresql.org/support/versioning/">https://www.postgresql.org/support/versioning/</a></td>
</tr>
<tr>
<td>Rails</td>
<td>App framework for store</td>
<td>5.1</td>
<td>February 2017</td>
<td>Current</td>
<td>Long Term, to version 6</td>
<td>6</td>
<td><a href="https://rubygems.org/gems/rails/versions">https://rubygems.org/gems/rails/versions</a></td>
</tr>
<tr>
<td>Spree</td>
<td>Ecommerce and admin gem</td>
<td>3.3</td>
<td>April 2017</td>
<td>Current</td>
<td>Long Term, to version 4</td>
<td>4</td>
<td><a href="https://rubygems.org/gems/spree/versions">https://rubygems.org/gems/spree/versions</a></td>
</tr>
<tr>
<td>Elasticsearch</td>
<td>Search platform for product import/search</td>
<td>5.6.x</td>
<td>September 2017</td>
<td>March 2019</td>
<td>Immediate, to version 6.8</td>
<td>7.4</td>
<td><a href="https://www.elastic.co/support/eol">https://www.elastic.co/support/eol</a></td>
</tr>
<tr>
<td>WordPress</td>
<td>Info site</td>
<td>5.2.3</td>
<td>September 2019</td>
<td>Unclear</td>
<td>5.2.4 shipped recently</td>
<td>5.2.4</td>
<td><a href="https://codex.wordpress.org/Supported_Versions">https://codex.wordpress.org/Supported_Versions</a></td>
</tr>
</table>
</div>
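<p>A review like the one above can be partially automated. As a rough sketch (the entries and dates below are illustrative samples, not authoritative; always check each vendor’s own end-of-life page), a small Node.js script can flag components whose end of support has passed or is approaching:</p>

```javascript
// Sketch: flag stack components nearing or past end of support.
// The entries below are illustrative; verify against vendor EOL pages.
const stack = [
  { software: "CentOS", version: "7", endOfSupport: "2024-06-30" },
  { software: "PostgreSQL", version: "9.5.20", endOfSupport: "2020-02-13" },
  { software: "Elasticsearch", version: "5.6", endOfSupport: "2019-03-11" },
];

function reviewStack(entries, today = new Date()) {
  return entries.map((e) => {
    const eol = new Date(e.endOfSupport);
    // Milliseconds per day = 86,400,000
    const daysLeft = Math.floor((eol - today) / 86400000);
    let status;
    if (daysLeft < 0) status = "unsupported";
    else if (daysLeft < 365) status = "update soon";
    else status = "not imminent";
    return { software: e.software, version: e.version, status };
  });
}

console.log(reviewStack(stack, new Date("2020-03-25")));
```

<p>Run periodically (e.g., from a cron job), a script like this turns the rainy-day review into a standing reminder.</p>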
<h3 id="content-cleanup--seo-review">Content Cleanup & SEO Review</h3>
<p>Everyone’s website gets cluttered with outdated content. Take a look at your pages, review them, and update what needs to be changed. Pay attention to search engine optimization (SEO) concerns as you go. Make sure your content has headers, accurate keywords, and good meta descriptions. Research SEO best practices if you need a refresher.</p>
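<p>If you want a quick automated pass over these basics, a small script can flag pages missing a title, a meta description, or a single top-level header. This is only a sketch using regular expressions; a real audit should use a proper HTML parser or one of the crawling tools mentioned below:</p>

```javascript
// Rough SEO sanity check for a page's HTML (sketch only; a real
// audit should parse the HTML rather than use regular expressions).
function seoIssues(html) {
  const issues = [];
  if (!/<title>[^<]+<\/title>/i.test(html)) issues.push("missing <title>");
  if (!/<meta\s+name=["']description["']/i.test(html))
    issues.push("missing meta description");
  const h1s = (html.match(/<h1[\s>]/gi) || []).length;
  if (h1s === 0) issues.push("no <h1> header");
  if (h1s > 1) issues.push("multiple <h1> headers");
  return issues;
}

const page =
  '<html><head><title>Deployment Automation</title></head>' +
  '<body><h1>Deployment Automation</h1></body></html>';
console.log(seoIssues(page)); // → [ 'missing meta description' ]
```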
<p>Nowadays, reducing repeated content has huge benefits for SEO, so we recommend that any content review include a review of duplication. If you have a small site, you can go through your content and SEO manually. Larger projects can use tools such as <a href="https://www.siteliner.com/">Siteliner</a> or <a href="https://wordpress.org/plugins/wp-optimize/">WP-Optimize</a>.</p>
<p>While you’re taking a dive into content, don’t forget to review your Google Analytics and understand what content is being used and what isn’t. Google has added many new features to Analytics and Ads, so it’s a good idea to refresh yourself on the updated documentation and new features.</p>
<h3 id="reporting">Reporting</h3>
<p>A lot of clients with big ecommerce data sets, or other applications that collect data, benefit from a separate reporting or business analytics tool. A rainy day can be a good time to think about what reports you want on last year’s business and what data will help you plan for the future. End Point has worked with a few different reporting tools that easily add on to your database, like <a href="https://www.hitachivantara.com/en-us/products/data-management-analytics/pentaho.html">Pentaho</a> and <a href="https://www.jaspersoft.com/reporting-software">Jasper</a>, and those can be really useful.</p>
<h3 id="documentation">Documentation</h3>
<p>I wouldn’t be a good project manager if I didn’t throw this one in the list. Documentation is so, so important, yet we can always do more. End Point uses a few different tools, including wikis running <a href="https://www.mediawiki.org/wiki/MediaWiki">MediaWiki</a> and Google Docs, for keeping track of project details. Now’s a good time to set up a nice documentation system, or to do a big review and make sure everything is updated and in order. Maybe dream of a vacation you <em>might</em> be able to take when this is over, and make sure everything’s ready for you to do that.</p>
<h3 id="disaster-recovery-tests">Disaster Recovery Tests</h3>
<p>For anyone with business-critical infrastructure, you need to know how to get everything back up and running after a major failure, whether with on-premises or cloud hosting. Now’s a good time to clarify things with your hosting vendor: What are your backups like? What is your disaster recovery plan? What is the timeline for recovering the application in the event of a major failure? If you can, take time to run a simulation and make sure all the pieces are in place if they’re ever needed. Simply put, backups must also be tested to ensure that they actually work.</p>
<h3 id="redesign">Redesign</h3>
<p>If you’ve been meaning to refresh your website, a rainy day is prime time to do it. Designers and developers are looking for projects and you’ll have extra time on your hands to oversee the process, spend time reviewing and testing, and get things done just the way you want.</p>
<h3 id="automated-testing">Automated Testing</h3>
<p>Good developers want an automated test suite as part of their application. Not all applications were built with this from the beginning and many didn’t have the budget or time to get it done. With extra time on your hands, this can be a great time to start building your test suite or to improve the coverage of your existing one.</p>
<p>Unit tests in particular are a good place to start. Unit tests are great not only because they help validate software correctness and protect against regressions, but also because they require a well-factored, modular system. This means that, while writing your unit tests, you will often be forced to go back to your application’s code base and refactor it to make it testable, and therefore better. Investing in a solid unit test suite is a great bang for your buck. You can also look at implementing continuous integration: a pipeline that lets multiple developers deploy code throughout the day, with your automated tests built into the workflow.</p>
<h3 id="versioning--deployment-tools">Versioning & Deployment Tools</h3>
<p>When you’re cleaning house, take a look at your Git version control repository and make sure everything important is in there. We have a few clients whose main project is in Git, but smaller projects and one-offs can go astray. This is a good time to get everything organized into one repository, or to make sure external repositories are connected and integrated.</p>
<p>Automated DevOps deployment tools can also be nice to work on. Tools like Ansible and Chef can take a lot of time to set up and test, but they offer great time-saving and accuracy advantages down the line. Our in-house security experts also recommend tools like AIDE and OSSEC, which automate daily monitoring of file changes.</p>
<h3 id="security-audit-and-monitoring">Security Audit and Monitoring</h3>
<p>Reviewing your personal security and that of your application is something you should do regularly, and now’s a good time to plan for it. Charlie’s got <a href="/blog/2020/02/end-point-security-tips/">a great security post</a> that’s a good top-level review. For application security, End Point uses some tools for vulnerability scanning. We also have a checklist of basic security items that includes password handling, PII data, and other common security holes. For certain projects or clients we must also take HIPAA or PCI DSS compliance into account. Also, don’t neglect to review your TLS status: ensure that web applications run on TLS 1.2 and are TLS 1.3 ready. This may also depend on whether the underlying operating systems can support the latest TLS version natively.</p>
<h3 id="optimization-and-performance">Optimization and Performance</h3>
<p>Most of the time, new features have higher priority than improving the performance of an existing system. This could be the right time to review core functionality and list the areas where optimization would give customers a better experience: code, database queries, image sizes, data compression over the network, caching, a CDN, and so on. We’ve been moving quite a few clients to the <a href="https://www.cloudflare.com/">Cloudflare</a> DNS and CDN service and we’ve been really happy with it. Optimization work can improve your customer retention rate, which helps increase profitability over the long term.</p>
<h3 id="refactoring">Refactoring</h3>
<p>Along the same lines as optimization, code refactoring can bring long-term gains in performance and ease of future development. Think of it like house cleaning: it is always easier to find an item when things are arranged in an orderly manner. Similarly, an organized, clean code base plays a vital role in future changes and development, reducing the chance of unexpected bugs, saving time by keeping each change in one place, and improving readability. Disciplined refactoring delivers readable, reusable, non-redundant code. Refactoring can be applied to your databases and user interfaces as well.</p>
<p>Want to get started on some background projects for your website? <a href="/contact/">Talk to us today</a>.</p>
<h2><a href="https://www.endpointdev.com/blog/2020/01/decreasing-website-load-time/">Decreasing your website load time</a></h2>
<p>January 7, 2020 · Juan Pablo Ventoso</p>
<p><img src="/blog/2020/01/decreasing-website-load-time/mobile-desktop-browsing.jpg" alt="Decreasing our website load time" /> <a href="https://www.flickr.com/photos/johanl/6798184016/">Photo</a> by <a href="https://www.flickr.com/photos/johanl/">Johan Larsson</a>, used under <a href="https://creativecommons.org/licenses/by/2.0/">CC BY 2.0</a></p>
<p>We live in a competitive world, and the web is no different. Improving latency issues is crucial to any Search Engine Optimization (SEO) strategy, increasing the website’s ranking and organic traffic (visitors from search engines) as a result.</p>
<p>There are many factors that can lead to a faster response time, including optimizing your hosting plan, keeping your server close to your main traffic source, or using a Content Distribution Network (CDN) if you expect international visitors. Some of these solutions, and many others, can be implemented with only a couple hours of coding.</p>
<h3 id="inline-styles-and-scripts-for-the-topmost-content">Inline styles and scripts for the topmost content</h3>
<p>Nobody enjoys waiting for long load times. Being met with a blank page or a loading GIF for several seconds after clicking a Google search result can seem agonizing. That’s why optimizing the initial rendering of your page is crucial.</p>
<p>The content that immediately appears to the user without the need to scroll down is referred to as “above the fold”. This is where your optimization efforts should be aimed. So here’s a plan to load and display it as quickly as possible:</p>
<ul>
<li>
<p>First, identify the critical styles and scripts you need to render the topmost content, and separate them from the rest of your stylesheets and external script references.</p>
</li>
<li>
<p>Then, <a href="https://www.imperva.com/learn/performance/minification/">minify</a> the separated <a href="https://csscompressor.com/">styles</a> and <a href="https://jscompress.com/">scripts</a>, and insert them directly into your page template, right before the closing <code>&lt;/head&gt;</code> tag.</p>
</li>
<li>
<p>Finally, take the stylesheet and script link references out of the <code>&lt;head&gt;</code> tag (where they are usually located) and move them to the end of the above-the-fold content.</p>
</li>
</ul>
<p>Now, the user won’t have to wait until all references are loaded before seeing content. <b>Tip</b>: Remember to use the <a href="https://developer.mozilla.org/en-US/docs/Web/HTML/Element/script#attr-async">async</a> attribute on scripts whenever possible.</p>
<p><strong>example.html:</strong></p>
<div class="highlight"><pre tabindex="0" style="background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-html" data-lang="html"><<span style="color:#b06;font-weight:bold">head</span>>
<<span style="color:#b06;font-weight:bold">style</span>>{<span style="color:#a61717;background-color:#e3d2d2">above-the-fold</span> <span style="color:#a61717;background-color:#e3d2d2">minified</span> <span style="color:#a61717;background-color:#e3d2d2">inline</span> <span style="color:#a61717;background-color:#e3d2d2">styles</span> <span style="color:#a61717;background-color:#e3d2d2">goes</span> <span style="color:#a61717;background-color:#e3d2d2">here</span>}</<span style="color:#b06;font-weight:bold">style</span>>
<<span style="color:#b06;font-weight:bold">script</span> <span style="color:#369">type</span>=<span style="color:#d20;background-color:#fff0f0">"text/javascript"</span>>{above-the-fold critical scripts goes here}</<span style="color:#b06;font-weight:bold">script</span>>
</<span style="color:#b06;font-weight:bold">head</span>>
<<span style="color:#b06;font-weight:bold">body</span>>
<<span style="color:#b06;font-weight:bold">div</span> <span style="color:#369">class</span>=<span style="color:#d20;background-color:#fff0f0">"above-the-fold-content"</span>></<span style="color:#b06;font-weight:bold">div</span>>
<<span style="color:#b06;font-weight:bold">link</span> <span style="color:#369">rel</span>=<span style="color:#d20;background-color:#fff0f0">"stylesheet"</span> <span style="color:#369">href</span>=<span style="color:#d20;background-color:#fff0f0">"{below-the-fold minified stylesheet reference goes here}"</span> />
<<span style="color:#b06;font-weight:bold">script</span> <span style="color:#369">async</span> <span style="color:#369">src</span>=<span style="color:#d20;background-color:#fff0f0">"{below-the-fold minified javascript reference goes here}"</span>></<span style="color:#b06;font-weight:bold">script</span>>
<<span style="color:#b06;font-weight:bold">div</span> <span style="color:#369">class</span>=<span style="color:#d20;background-color:#fff0f0">"below-the-fold-content"</span>></<span style="color:#b06;font-weight:bold">div</span>>
</<span style="color:#b06;font-weight:bold">body</span>>
</code></pre></div><h3 id="deferred-loading-of-ads">Deferred loading of ads</h3>
<p>If you’re monetizing your website through Google AdSense or another ad network that loads ads with scripts, consider loading ads after the content is fully rendered. This may have a small impact on your revenue, but it will improve the user’s experience while speeding up the page load.</p>
<p>Although there are several ways to achieve this, a technique I have successfully used on many websites is removing all of the script references to Google AdSense until your page is fully loaded. A short delay can be added in order to allow some browsing time before showing ads.</p>
<p>Remove script references, the comment, and extra spaces from your original ad code, to convert it from something like this…</p>
<div class="highlight"><pre tabindex="0" style="background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-html" data-lang="html"><<span style="color:#b06;font-weight:bold">script</span> <span style="color:#369">async</span> <span style="color:#369">src</span>=<span style="color:#d20;background-color:#fff0f0">"https://pagead2.googlesyndication.com/pagead/js/adsbygoogle.js"</span>></<span style="color:#b06;font-weight:bold">script</span>>
<span style="color:#888"><!-- Your ad name --></span>
<<span style="color:#b06;font-weight:bold">ins</span> <span style="color:#369">class</span>=<span style="color:#d20;background-color:#fff0f0">"adsbygoogle"</span>
<span style="color:#369">style</span>=<span style="color:#d20;background-color:#fff0f0">"display:inline-block;width:728px;height:90px"</span>
<span style="color:#369">data-ad-client</span>=<span style="color:#d20;background-color:#fff0f0">"ca-pub-XXXXXXXXXXXXXXXXX"</span>
<span style="color:#369">data-ad-slot</span>=<span style="color:#d20;background-color:#fff0f0">"XXXXXXXXX"</span>></<span style="color:#b06;font-weight:bold">ins</span>>
<<span style="color:#b06;font-weight:bold">script</span>>
(adsbygoogle = <span style="color:#038">window</span>.adsbygoogle || []).push({});
</<span style="color:#b06;font-weight:bold">script</span>>
</code></pre></div><p>… to something like this:</p>
<div class="highlight"><pre tabindex="0" style="background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-html" data-lang="html"><<span style="color:#b06;font-weight:bold">ins</span> <span style="color:#369">class</span>=<span style="color:#d20;background-color:#fff0f0">"adsbygoogle"</span> <span style="color:#369">style</span>=<span style="color:#d20;background-color:#fff0f0">"display:inline-block;width:728px;height:90px"</span> <span style="color:#369">data-ad-client</span>=<span style="color:#d20;background-color:#fff0f0">"ca-pub-XXXXXXXXXXXXXXXXX"</span> <span style="color:#369">data-ad-slot</span>=<span style="color:#d20;background-color:#fff0f0">"XXXXXXXXX"</span>></<span style="color:#b06;font-weight:bold">ins</span>>
</code></pre></div><p>A lot shorter, isn’t it? This will create an empty slot in which the ad will be displayed after the page is fully rendered. To accomplish that, a new script like the one below must be added (assuming jQuery is present on the website):</p>
<p><strong>async-ads.js:</strong></p>
<div class="highlight"><pre tabindex="0" style="background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-javascript" data-lang="javascript"><span style="color:#888">// Create a script reference
</span><span style="color:#888"></span><span style="color:#080;font-weight:bold">function</span> addScript(src, <span style="color:#080;font-weight:bold">async</span>, callback) {
<span style="color:#080;font-weight:bold">var</span> js = <span style="color:#038">document</span>.createElement(<span style="color:#d20;background-color:#fff0f0">"script"</span>);
js.type = <span style="color:#d20;background-color:#fff0f0">"text/javascript"</span>;
<span style="color:#080;font-weight:bold">if</span> (<span style="color:#080;font-weight:bold">async</span>)
js.<span style="color:#080;font-weight:bold">async</span> = <span style="color:#080;font-weight:bold">true</span>;
<span style="color:#080;font-weight:bold">if</span> (callback)
js.onload = callback;
js.src = src;
<span style="color:#038">document</span>.body.appendChild(js);
}
<span style="color:#888">// Called when document is ready
</span><span style="color:#888"></span>$(<span style="color:#038">document</span>).ready(<span style="color:#080;font-weight:bold">function</span>() {
<span style="color:#888">// Wait for one second to ensure the user started browsing
</span><span style="color:#888"></span> setTimeout(<span style="color:#080;font-weight:bold">function</span>() {
(adsbygoogle = <span style="color:#038">window</span>.adsbygoogle || []);
$(<span style="color:#d20;background-color:#fff0f0">"ins.adsbygoogle"</span>).each(<span style="color:#080;font-weight:bold">function</span>() {
$(<span style="color:#d20;background-color:#fff0f0">"<script>(adsbygoogle = window.adsbygoogle || []).push({})</script>"</span>).insertAfter($(<span style="color:#080;font-weight:bold">this</span>));
});
addScript(<span style="color:#d20;background-color:#fff0f0">"https://pagead2.googlesyndication.com/pagead/js/adsbygoogle.js"</span>, <span style="color:#080;font-weight:bold">true</span>);
}, <span style="color:#00d;font-weight:bold">1000</span>);
});
</code></pre></div><p>This code will wait for one second once the document is ready, and then leave instructions for Google to push a new ad for each slot. Finally, the AdSense external script will be loaded so that Google will read the instructions and start filling all the slots with ads.</p>
<p><b>Tip</b>: Enabling ad balance from your AdSense dashboard may improve the average load speed as well as the user’s experience, since ads will not be shown when the expected revenue is negligible. And if you’re still on the fence about showing fewer ads, <a href="https://fatstacksblog.com/adsense-ad-balance-experiment/">try out an experiment</a> like I did. A balance of 50% worked well in my case, but the right balance will depend on your niche and website characteristics.</p>
<h3 id="lazy-load-for-images">Lazy load for images</h3>
<p>Because the user will most likely spend the majority of the visit reading above-the-fold content (and may even leave before scrolling at all), loading all of the below-the-fold images up front is impractical. Implementing a custom lazy-loading script (also referred to as deferred loading or loading on scroll) for images can be an easy process. Even though backend changes will likely be needed, the concept of this approach is simple:</p>
<ul>
<li>
<p>Replace the <code>src</code> attribute on each image that will be lazy loaded with a custom attribute such as <code>data-src</code> (this part will probably require backend changes), and set a custom class on those images, like <code>lazy</code>.</p>
</li>
<li>
<p>Create a script that copies the <code>data-src</code> value into the <code>src</code> attribute as the user scrolls through the page.</p>
</li>
</ul>
<p><strong>lazy-load.js:</strong></p>
<div class="highlight"><pre tabindex="0" style="background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-javascript" data-lang="javascript">;(<span style="color:#080;font-weight:bold">function</span>($) {
$.fn.lazy = <span style="color:#080;font-weight:bold">function</span>(threshold, callback) {
<span style="color:#080;font-weight:bold">var</span> $w = $(<span style="color:#038">window</span>),
th = threshold || <span style="color:#00d;font-weight:bold">0</span>,
attrib = <span style="color:#d20;background-color:#fff0f0">"data-src"</span>,
images = <span style="color:#080;font-weight:bold">this</span>,
loaded;
<span style="color:#080;font-weight:bold">this</span>.one(<span style="color:#d20;background-color:#fff0f0">"lazy"</span>, <span style="color:#080;font-weight:bold">function</span>() {
<span style="color:#080;font-weight:bold">var</span> source = <span style="color:#080;font-weight:bold">this</span>.getAttribute(attrib);
source = source || <span style="color:#080;font-weight:bold">this</span>.getAttribute(<span style="color:#d20;background-color:#fff0f0">"data-src"</span>);
<span style="color:#080;font-weight:bold">if</span> (source) {
<span style="color:#080;font-weight:bold">this</span>.setAttribute(<span style="color:#d20;background-color:#fff0f0">"src"</span>, source);
<span style="color:#080;font-weight:bold">if</span> (<span style="color:#080;font-weight:bold">typeof</span> callback === <span style="color:#d20;background-color:#fff0f0">"function"</span>) callback.call(<span style="color:#080;font-weight:bold">this</span>);
}
});
<span style="color:#080;font-weight:bold">function</span> lazy() {
<span style="color:#080;font-weight:bold">var</span> inview = images.filter(<span style="color:#080;font-weight:bold">function</span>() {
<span style="color:#080;font-weight:bold">var</span> $e = $(<span style="color:#080;font-weight:bold">this</span>);
<span style="color:#080;font-weight:bold">if</span> ($e.is(<span style="color:#d20;background-color:#fff0f0">":hidden"</span>)) <span style="color:#080;font-weight:bold">return</span>;
<span style="color:#080;font-weight:bold">var</span> wt = $w.scrollTop(),
wb = wt + $w.height(),
et = $e.offset().top,
eb = et + $e.height();
<span style="color:#080;font-weight:bold">return</span> eb >= wt - th && et <= wb + th;
});
loaded = inview.trigger(<span style="color:#d20;background-color:#fff0f0">"lazy"</span>);
images = images.not(loaded);
}
$w.scroll(lazy);
$w.resize(lazy);
lazy();
<span style="color:#080;font-weight:bold">return</span> <span style="color:#080;font-weight:bold">this</span>;
};
})(<span style="color:#038">window</span>.jQuery);
$(<span style="color:#038">document</span>).ready(<span style="color:#080;font-weight:bold">function</span>() {
$(<span style="color:#d20;background-color:#fff0f0">'.lazy'</span>).each(<span style="color:#080;font-weight:bold">function</span> () {
$(<span style="color:#080;font-weight:bold">this</span>).lazy(<span style="color:#00d;font-weight:bold">0</span>, <span style="color:#080;font-weight:bold">function</span>() {
$(<span style="color:#080;font-weight:bold">this</span>).load(<span style="color:#080;font-weight:bold">function</span>() {
<span style="color:#080;font-weight:bold">this</span>.style.opacity = <span style="color:#00d;font-weight:bold">1</span>;
});
});
});
<span style="color:#888">// Set the correct attribute when printing
</span><span style="color:#888"></span><span style="color:#080;font-weight:bold">var</span> beforePrint = <span style="color:#080;font-weight:bold">function</span>() {
$(<span style="color:#d20;background-color:#fff0f0">"img.lazy"</span>).each(<span style="color:#080;font-weight:bold">function</span>() {
$(<span style="color:#080;font-weight:bold">this</span>).trigger(<span style="color:#d20;background-color:#fff0f0">"lazy"</span>);
<span style="color:#080;font-weight:bold">this</span>.style.opacity = <span style="color:#00d;font-weight:bold">1</span>;
});
};
<span style="color:#080;font-weight:bold">if</span> (<span style="color:#038">window</span>.matchMedia) {
<span style="color:#080;font-weight:bold">var</span> mediaQueryList = <span style="color:#038">window</span>.matchMedia(<span style="color:#d20;background-color:#fff0f0">'print'</span>);
mediaQueryList.addListener(<span style="color:#080;font-weight:bold">function</span>(mql) {
<span style="color:#080;font-weight:bold">if</span> (mql.matches)
beforePrint();
});
}
<span style="color:#038">window</span>.onbeforeprint = beforePrint;
</code></pre></div><p>This script finds all <code><img></code> tags with class <code>lazy</code> and copies the value of the <code>data-src</code> attribute into the <code>src</code> attribute once scrolling brings the image into view. It also includes some additional logic to set the <code>src</code> attribute before the page is printed.</p>
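<p>The heart of the script is the test that decides whether an element overlaps the visible window, expanded by a threshold on each side. Here is a standalone sketch of that same geometry, written in Python with hypothetical names, purely for illustration:</p>

```python
def in_view(scroll_top, window_height, elem_top, elem_height, threshold=0):
    # Mirrors the plugin's check: the element is "in view" when it overlaps
    # the window, with the window expanded by `threshold` pixels on each side.
    window_bottom = scroll_top + window_height
    elem_bottom = elem_top + elem_height
    return elem_bottom >= scroll_top - threshold and elem_top <= window_bottom + threshold
```

<p>With a positive threshold, images just below the fold start loading shortly before the user scrolls to them.</p>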
<h3 id="server-side-caching">Server-side caching</h3>
<p>Instead of performing all the backend rendering calculations every time, server-side caching allows you to output the same content to the clients over a period of time from a temporary copy of the response. This not only results in a decreased response time but also saves some resources on the server.</p>
<p>There are several ways to enable server-side caching, depending on factors such as the backend language and hosting platform (e.g. Windows/IIS vs. Linux/Apache), among other things. For this example, we will use ASP.NET (C#) since I’m mostly a Windows user.</p>
<p>The easiest and most efficient way to do this is by adding a declaration at the top of our ASP.NET page:</p>
<div class="highlight"><pre tabindex="0" style="background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-csharp" data-lang="csharp"><%<span style="color:#a61717;background-color:#e3d2d2">@</span> OutputCache Duration=<span style="color:#d20;background-color:#fff0f0">"10"</span> VaryByParam=<span style="color:#d20;background-color:#fff0f0">"id;date"</span> %>
</code></pre></div><p>This declaration tells the framework that we want to cache the server’s output for 10 seconds (the <code>Duration</code> attribute is expressed in seconds), saving a separate version for each combination of the <code>id</code> and <code>date</code> URL parameters. So pages like:</p>
<ul>
<li>https://www.your-url.com/cached-page/?id=1&date=2020-01-01</li>
<li>https://www.your-url.com/cached-page/?id=2&date=2020-01-01</li>
<li>https://www.your-url.com/cached-page/?id=2&date=2020-02-01</li>
</ul>
<p>will each be saved to, and served from, a different cache copy. If we set only the <code>id</code> parameter as the cache key, pages with different dates will be served from the same cached copy (useful when the <code>date</code> parameter is only evaluated by frontend scripts and ignored by the backend).</p>
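<p>Conceptually, what <code>OutputCache</code> with <code>VaryByParam</code> does can be sketched as a dictionary keyed by the varying parameters, with entries expiring after the duration. The following is a loose, hypothetical illustration in Python (not ASP.NET code; all names are made up):</p>

```python
import time

class OutputCache:
    """Loose sketch of output caching varied by URL parameters."""

    def __init__(self, duration, vary_by):
        self.duration = duration   # lifetime of a cached copy, in seconds
        self.vary_by = vary_by     # parameter names that select the cache copy
        self._store = {}

    def get_or_render(self, params, render):
        # One cache entry per distinct combination of the vary-by parameters.
        key = tuple(params.get(name) for name in self.vary_by)
        entry = self._store.get(key)
        now = time.monotonic()
        if entry is not None and now - entry[0] < self.duration:
            return entry[1]        # fresh enough: serve the cached response
        body = render(params)      # otherwise do the expensive backend work
        self._store[key] = (now, body)
        return body
```

<p>Two requests with the same <code>id</code> and <code>date</code> hit the same entry; changing either parameter renders and stores a new copy.</p>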
<p>There are other configurations in ASP.NET to set our output cache policy. The output can be set to be based on the browser, the request headers, or even custom strings. <a href="https://www.c-sharpcorner.com/UploadFile/chinnasrihari/Asp-Net-mvc-framework-server-side-html-caching-techniques/">This page</a> has more useful information on this subject.</p>
<h3 id="gzip-compression">GZip compression</h3>
<p>GZip compression—when the client supports it—compresses the response before sending it over the network; for text-heavy pages this can save more than 70% of the bandwidth used to load the website. Enabling GZip compression for dynamic and static content on a Windows server running IIS is simple: go to the “Compression” section in IIS Manager and check the “Enable dynamic content compression” and “Enable static content compression” options.</p>
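<p>The scale of the savings is easy to reproduce with any gzip implementation. Here is a quick illustration using Python’s standard <code>gzip</code> module on some repetitive (and therefore highly compressible) markup:</p>

```python
import gzip

# Markup tends to be repetitive, so it compresses very well.
html = ("<div class='row'><span>product</span><span>price</span></div>\n" * 500).encode()
compressed = gzip.compress(html)
saved = 1 - len(compressed) / len(html)
print(f"original: {len(html)} bytes, gzipped: {len(compressed)} bytes, saved {saved:.0%}")
```

<p>Real pages compress less dramatically than this toy input, but text-heavy responses routinely shrink by well over half.</p>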
<p><img src="/blog/2020/01/decreasing-website-load-time/enabling-compression-iis.jpg" alt="Enabling compression in IIS"></p>
<p>However, if you are running an ASP.NET MVC/WebForms website, this won’t be enough. For all backend responses to be compressed before sending them to the client, some custom code will also need to be added to the <code>global.asax</code> file in the website root:</p>
<p><strong>global.asax:</strong></p>
<div class="highlight"><pre tabindex="0" style="background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-csharp" data-lang="csharp"><%<span style="color:#a61717;background-color:#e3d2d2">@</span> Application Language=<span style="color:#d20;background-color:#fff0f0">"C#"</span> %>
<script runat=<span style="color:#d20;background-color:#fff0f0">"server"</span>>
<span style="color:#080;font-weight:bold">void</span> Application_PreRequestHandlerExecute(<span style="color:#888;font-weight:bold">object</span> sender, EventArgs e)
{
HttpApplication app = sender <span style="color:#080;font-weight:bold">as</span> HttpApplication;
<span style="color:#888;font-weight:bold">string</span> acceptEncoding = app.Request.Headers[<span style="color:#d20;background-color:#fff0f0">"Accept-Encoding"</span>];
System.IO.Stream prevUncompressedStream = app.Response.Filter;
<span style="color:#080;font-weight:bold">if</span> (app.Context.CurrentHandler == <span style="color:#080;font-weight:bold">null</span>)
<span style="color:#080;font-weight:bold">return</span>;
<span style="color:#080;font-weight:bold">if</span> (!(app.Context.CurrentHandler <span style="color:#080;font-weight:bold">is</span> System.Web.UI.Page ||
app.Context.CurrentHandler.GetType().Name == <span style="color:#d20;background-color:#fff0f0">"SyncSessionlessHandler"</span>) ||
app.Request[<span style="color:#d20;background-color:#fff0f0">"HTTP_X_MICROSOFTAJAX"</span>] != <span style="color:#080;font-weight:bold">null</span>)
<span style="color:#080;font-weight:bold">return</span>;
<span style="color:#080;font-weight:bold">if</span> (acceptEncoding == <span style="color:#080;font-weight:bold">null</span> || acceptEncoding.Length == <span style="color:#00d;font-weight:bold">0</span>)
<span style="color:#080;font-weight:bold">return</span>;
<span style="color:#080;font-weight:bold">if</span> (Request.ServerVariables[<span style="color:#d20;background-color:#fff0f0">"SCRIPT_NAME"</span>].ToLower().Contains(<span style="color:#d20;background-color:#fff0f0">".axd"</span>)) <span style="color:#080;font-weight:bold">return</span>;
<span style="color:#080;font-weight:bold">if</span> (Request.ServerVariables[<span style="color:#d20;background-color:#fff0f0">"SCRIPT_NAME"</span>].ToLower().Contains(<span style="color:#d20;background-color:#fff0f0">".js"</span>)) <span style="color:#080;font-weight:bold">return</span>;
<span style="color:#080;font-weight:bold">if</span> (Request.QueryString.ToString().Contains(<span style="color:#d20;background-color:#fff0f0">"_TSM_HiddenField_"</span>)) <span style="color:#080;font-weight:bold">return</span>;
acceptEncoding = acceptEncoding.ToLower();
<span style="color:#080;font-weight:bold">if</span> (acceptEncoding.Contains(<span style="color:#d20;background-color:#fff0f0">"deflate"</span>) || acceptEncoding == <span style="color:#d20;background-color:#fff0f0">"*"</span>)
{
app.Response.Filter = <span style="color:#080;font-weight:bold">new</span> System.IO.Compression.DeflateStream(prevUncompressedStream,
System.IO.Compression.CompressionMode.Compress);
app.Response.AppendHeader(<span style="color:#d20;background-color:#fff0f0">"Content-Encoding"</span>, <span style="color:#d20;background-color:#fff0f0">"deflate"</span>);
}
<span style="color:#080;font-weight:bold">else</span> <span style="color:#080;font-weight:bold">if</span> (acceptEncoding.Contains(<span style="color:#d20;background-color:#fff0f0">"gzip"</span>))
{
app.Response.Filter = <span style="color:#080;font-weight:bold">new</span> System.IO.Compression.GZipStream(prevUncompressedStream,
System.IO.Compression.CompressionMode.Compress);
app.Response.AppendHeader(<span style="color:#d20;background-color:#fff0f0">"Content-Encoding"</span>, <span style="color:#d20;background-color:#fff0f0">"gzip"</span>);
}
}
</script>
</code></pre></div><p>To make sure our code is working properly, an external tool like <a href="https://www.giftofspeed.com/gzip-test/">this GZip test</a> will tell you whether compression is enabled.</p>
<p><img src="/blog/2020/01/decreasing-website-load-time/gzip-compression-enabled.jpg" alt="It works!"></p>
<h3 id="summary">Summary</h3>
<p>There are many ways of decreasing the load time of a website, and many of them are complex and expensive. With a few minor tweaks like these, however, we can offer a better user experience while also improving our position in the search engine results. Every bit of optimization counts toward the SEO goal, and load time is a very important factor (to both the developer and the user), especially on mobile platforms, where users expect to get what they want instantly.</p>
<p>The image below is a Google Analytics report from one of my websites where, over several months, I implemented most of these techniques. A month ago I made the latest change, deferring ad loading, which had an observable impact on the average loading speed of the page:</p>
<p><img src="/blog/2020/01/decreasing-website-load-time/analytics-average-page-load.jpg" alt="Report from Google Analytics"></p>
<p>Do you have any other page load optimization techniques? <b>Leave a comment below!</b></p>
Designing for SEO from the Starthttps://www.endpointdev.com/blog/2018/03/designing-for-seo-from-the-start/2018-03-28T00:00:00+00:00Jon Allen
<p style="clear: both; text-align: right"><img border="0" src="/blog/2018/03/designing-for-seo-from-the-start/analytics-2.png" /></p>
<p><strong>Search engine optimization (SEO)</strong> is critical to the success of your website, and therefore critical to the success of your business. A high Google ranking means more page views—and more conversions. Google rewards websites that are user-friendly and easy-to-navigate, with fresh content and frequent updates. At End Point, we design with SEO in mind from the beginning of the project to help you get the most value from your online presence.</p>
<p>These are the five main areas that we focus on to improve your ranking:</p>
<ol>
<li>
<p><strong>Content Strategy</strong>—You want to provide your users with quality content. We can provide a content strategy to help ensure that your site stays on target, with clear copy, focused messaging, and consistent branding. We’ll help you put together a site that gets visitors where they want to be—and you’ll reap the rewards with an increase in traffic to your site.</p>
</li>
<li>
<p><strong>Sitemap, Keywords, and Semantic Markup</strong>—We dive into the nuts and bolts of your site to make sure that it can be crawled and indexed easily by Google. We produce a prioritized XML sitemap; relevant, long-tail keywords and metadata; and descriptive page headings, titles, and URLs. Your site’s code will use HTML5 semantic markup to produce a well-structured, hierarchical document that is easily read by web crawlers—and humans, too.</p>
</li>
<li>
<p><strong>Blogging and Social Media</strong>—Keeping your website fresh with new links and content is critical for SEO success. Setting up a blog platform makes it easy to quickly generate new content that can draw more visitors to your site. We can also connect you with social media platforms such as Twitter and Facebook to help carry your messages across the web.</p>
</li>
<li>
<p><strong>Analytics</strong>—Google Analytics allows you to watch in real time how users are finding and interacting with your site. Figure out which pages perform the best, which are underperforming, identify trends in user behavior, and calibrate your site to help users get the most out of their experience.</p>
</li>
<li>
<p><strong>Mobile Support</strong>—With just over <a href="https://www.statista.com/statistics/277125/share-of-website-traffic-coming-from-mobile-devices/" target="_blank">half</a> of all web traffic coming from mobile devices, it’s crucial to have a website that’s responsive on any device. Because of the ubiquity of mobile browsing, Google rewards sites that are mobile friendly. We optimize your site to ensure that pages load quickly and display on any laptop, tablet, or smartphone.</p>
</li>
</ol>
<p>There’s no big secret to a strong SEO strategy. It’s all about having engaging, useful content that’s easy to find and navigate. End Point has the knowledge and resources to help your business grow by building you a site that’s clear and focused, and one that’s up to the latest web standards.</p>
<p><strong>Here are a few guides that provide more details about what goes into a strong SEO strategy:</strong></p>
<ul>
<li><a href="https://moz.com/beginners-guide-to-seo" target="_blank">Moz Beginner’s Guide to SEO</a></li>
<li><a href="https://support.google.com/webmasters/answer/7451184?hl=en" target="_blank">Google SEO Starter Guide</a></li>
<li><a href="http://schema.org/" target="_blank">Schema.org</a></li>
</ul>
Improve SEO URLs for Interchange search pageshttps://www.endpointdev.com/blog/2016/01/improve-seo-urls-for-interchange-search/2016-01-27T00:00:00+00:00Jeff Boes
<p>This is an article aimed at beginner-to-intermediate Interchange developers.</p>
<p>A typical approach to a hierarchical Interchange site is:</p>
<div class="highlight"><pre tabindex="0" style="background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-plain" data-lang="plain">Categories -> Category -> Product
</code></pre></div><p>I.e., you list all your categories as links, each of which opens up a search results page filtering the products by category, with links to the individual product pages via the flypage.</p>
<p>Recently I upgraded a site so the category URLs were a bit more SEO-friendly. The original category filtering search produced these lovely specimens:</p>
<div class="highlight"><pre tabindex="0" style="background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-plain" data-lang="plain">/search.html?fi=products&st=db&co=1&sf=category&se=Shoes&op=rm
&sf=inactive&se=yes&op=ne&tf=category&ml=100
</code></pre></div><p>but what I really wanted was:</p>
<div class="highlight"><pre tabindex="0" style="background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-plain" data-lang="plain">/cat/Shoes.html
</code></pre></div><p>Such links are easier to communicate to users, more friendly to search engines, less prone to breakage (e.g., by getting word-wrapped in email clients), and avoid exposing details of your application (here, we’ve had to admit publicly that we have a table called “products” and that some items are “inactive”; a curious user might decide to see what happens if they change “sf=inactive&se=yes” to some other expression).</p>
<p>Here’s how I attacked this.</p>
<h3 id="creating-a-category-listing-page">Creating a category listing page</h3>
<p>First, I copied my “results.html” page to “catpage.html”. That way, my original search results page can continue to serve up ad hoc search results.</p>
<p>The search results were displayed via:</p>
<div class="highlight"><pre tabindex="0" style="background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-plain" data-lang="plain">[search-region]
...
[/search-region]
</code></pre></div><p>I converted this to a database query:</p>
<div class="highlight"><pre tabindex="0" style="background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-plain" data-lang="plain">[query sql="SELECT * FROM products WHERE NOT inactive AND category = [sql-quote][cgi category][/sql-quote]"
type=list prefix=item]
...
[/query]
</code></pre></div><p>I chose to use a prefix other than the default since it would avoid having to change so many tags in the page, and now both the original search page and new catpage would look much the same internally (and thus, if desired, I could refactor them in the future).</p>
<p>Note that I’ve defined part of the API for this page: the category to be searched is set in a CGI variable called “category”.</p>
<p>In my specific case, there was additional tinkering with this tag, because I had nested [query] tags already in the page within the search-region.</p>
<h3 id="creating-a-cat-actionmap">Creating a “cat” actionmap</h3>
<p>In order to translate a URL containing SEO-friendly “/cat/Shoes.html” into my search, I need an actionmap. Here’s mine; it’s very simple.</p>
<div class="highlight"><pre tabindex="0" style="background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-perl" data-lang="perl">Actionmap cat <span style="color:#d20;background-color:#fff0f0"><<"</span><span style="color:#d20;background-color:#fff0f0">CODE</span><span style="color:#d20;background-color:#fff0f0">"
</span><span style="color:#d20;background-color:#fff0f0">sub {
</span><span style="color:#d20;background-color:#fff0f0"> my $url = shift;
</span><span style="color:#d20;background-color:#fff0f0"> my @url_parts = split '/' => $url;
</span><span style="color:#d20;background-color:#fff0f0"> shift @url_parts if $url_parts[0] eq 'cat';
</span><span style="color:#d20;background-color:#fff0f0">
</span><span style="color:#d20;background-color:#fff0f0"> $CGI->{mv_nextpage} = 'catpage.html';
</span><span style="color:#d20;background-color:#fff0f0"> $CGI->{category} = shift @url_parts;
</span><span style="color:#d20;background-color:#fff0f0"> return 1;
</span><span style="color:#d20;background-color:#fff0f0">}
</span><span style="color:#d20;background-color:#fff0f0"></span><span style="color:#d20;background-color:#fff0f0">CODE</span>
</code></pre></div><p>Actionmaps are called when Interchange detects that a URL begins with the actionmap’s name; here “cat”. They are passed a parameter containing the URL fragment (after removing all the site stuff). Here, that would be (e.g.) “/cat/Shoes”. We massage the URL to get our category code, and set up the page to be called along with the CGI parameter(s) it expects.</p>
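<p>For readers who don’t speak Perl, the same logic can be sketched in Python (a hypothetical rewrite for illustration, not Interchange code):</p>

```python
def cat_actionmap(url):
    # Given a URL fragment like "cat/Shoes" (or "/cat/Shoes"), strip the
    # leading "cat" component and treat the rest as the category code.
    parts = [p for p in url.split("/") if p]
    if parts and parts[0] == "cat":
        parts.pop(0)
    return {
        "mv_nextpage": "catpage.html",           # page to serve
        "category": parts[0] if parts else None  # CGI parameter it expects
    }
```
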
<h3 id="cleaning-up-the-links">Cleaning up the links</h3>
<p>At the start of this article I noted that I may have a page listing all my categories. In my original setup, this generated links using a construction like this:</p>
<div class="highlight"><pre tabindex="0" style="background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-html" data-lang="html"><<span style="color:#b06;font-weight:bold">a</span> <span style="color:#369">href</span>=<span style="color:#d20;background-color:#fff0f0">"[area href=search form=|fi=products
</span><span style="color:#d20;background-color:#fff0f0"> st=db
</span><span style="color:#d20;background-color:#fff0f0"> sf=category
</span><span style="color:#d20;background-color:#fff0f0"> se=Shoes
</span><span style="color:#d20;background-color:#fff0f0"> tf=category
</span><span style="color:#d20;background-color:#fff0f0"> ml=100|]"</span>>
Shoes
</<span style="color:#b06;font-weight:bold">a</span>>
</code></pre></div><p>Now my links are the much simpler:</p>
<div class="highlight"><pre tabindex="0" style="background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-html" data-lang="html"><<span style="color:#b06;font-weight:bold">a</span> <span style="color:#369">href</span>=<span style="color:#d20;background-color:#fff0f0">"[area cat/Shoes]"</span>>Shoes</<span style="color:#b06;font-weight:bold">a</span>>
</code></pre></div><p>In my specific case, these links were generated within a [query] loop, but the approach is the same.</p>
<p>Note: the <a href="http://demo.icdevgroup.org/demo1/">Strap demo</a> supports SEO-friendly URLs out of the box, and is included with the latest <a href="http://www.icdevgroup.org/i/dev/news?mv_arg=00060">Interchange 5.10</a> release.</p>
Google Sitemap rapid deploymenthttps://www.endpointdev.com/blog/2013/03/google-sitemap-rapid-deployment/2013-03-21T00:00:00+00:00Jeff Boes
<p>I was going to call this “Quick and Dirty Sitemaps”, but “Rapid Deployment” sounds more buzz-word-worthy. This is how to get a Google sitemap up and running quickly, using the Google sitemap generator and the Web Developer Firefox plug-in.</p>
<p>I had occasion to set up a sitemap using the <a href="https://code.google.com/archive/p/googlesitemapgenerator/">Google sitemap generator</a> for a site recently. Here’s what I did:</p>
<p>Download the generator using the excellent documentation found at the previous link. Unpack it into a convenient location and copy the example_config.xml file to something else, e.g., <code>www.mysite.com_config.xml</code>. Edit the new configuration file and:</p>
<ol>
<li>Modify the “base_url” setting to your site;</li>
<li>Change the “store_into” setting to a file in your site’s document root;</li>
<li>Add a pointer to a file that will contain your list-of-links, e.g.,</li>
</ol>
<div class="highlight"><pre tabindex="0" style="background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-xml" data-lang="xml"><span style="color:#b06;font-weight:bold"><urllist</span> <span style="color:#369">path=</span><span style="color:#d20;background-color:#fff0f0">"site_urls.txt"</span><span style="color:#b06;font-weight:bold">></span>
<span style="color:#b06;font-weight:bold"></urllist></span>
</code></pre></div><p>I would locate this in the same path as your new configuration file.</p>
<p>Now, if you don’t already have <a href="https://chrispederick.com/work/web-developer/firefox/">Web Developer</a>, give yourself a demerit and go install it.</p>
<p>…</p>
<p>Okay, you’ll thank me for that. Now pick a few pages from your site: good choices, depending on your site’s design, are the home page, the sitemap (if you have one), and any of the top-level “nav links” you may have set up.</p>
<p>Visit each of those pages in turn. Use Web Developer to assemble the links from the pages, clicking:</p>
<ol>
<li>Tools menu</li>
<li>Web Developer extension</li>
<li>Information</li>
<li>View link information</li>
</ol>
<p>Copy and paste each informational list-of-links and append it to a text file. You can clean it up a bit when you are done, removing any links you don’t want in the sitemap, or you can let the sitemap generator tell you which ones to remove while testing.</p>
<p>You can sort and de-duplicate the file with something like this:</p>
<div class="highlight"><pre tabindex="0" style="background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-bash" data-lang="bash">$ sort site_urls.txt | uniq > site_urls.out
</code></pre></div><p>Inspect the site_urls.out file and when you’re happy with it, rename it to “site_urls.txt”.</p>
<p>You’re ready to run the sitemap generator:</p>
<div class="highlight"><pre tabindex="0" style="background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-bash" data-lang="bash">$ python sitemap_gen.py --config=www.mysite.com_config.xml --testing
</code></pre></div><p>Check the output for warnings, adjust the configuration and/or the site_urls.txt file, and eventually you can run this without the <code>--testing</code> flag. Then just add it to a crontab to run on an appropriate schedule, and you’re done!</p>
Slash URLhttps://www.endpointdev.com/blog/2012/12/slash-url/2012-12-04T00:00:00+00:00Jeff Boes
<p>There’s always more to learn in this job. Today I learned that <a href="http://www.jampmark.com/web-scripting/5-solutions-to-url-encoded-slashes-problem-in-apache.html">Apache web server</a> is smarter than me.</p>
<p>A typical SEO-friendly solution to Interchange pre-defined searches (item categories, manufacturer lists, etc.) is to put together a URL that includes the search parameter, but looks like a hierarchical URL:</p>
<p>/accessories/Mens-Briefs.html</p>
<p>/manufacturer/Hanes.html</p>
<p>Through the magic of <a href="http://interchange.rtfm.info/icdocs/config/ActionMap.html">actionmaps</a>, we can serve up a search results page that looks for products which match on the “accessories” or “manufacturer” field. The problem comes when a less-savvy person adds a field value that includes a slash:</p>
<p>accessories: “Socks/Hosiery”</p>
<p>or</p>
<p>manufacturer: “Disney/Pixar”</p>
<p>Within my actionmap Perl code, I wanted to redirect some URLs to the canonical actionmap page (because we were trying to short-circuit a crazy Web spider, but that’s beside the point). So I ended up (after several wild goose chases) with:</p>
<div class="highlight"><pre tabindex="0" style="background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-perl" data-lang="perl"><span style="color:#080;font-weight:bold">my</span> <span style="color:#369">$new_path</span> = <span style="color:#d20;background-color:#fff0f0">'/accessories/'</span> .
<span style="color:#b06;font-weight:bold">Vend::Tags</span>->filter({body => (<span style="color:#038">join</span> <span style="color:#d20;background-color:#fff0f0">'%2f'</span> => (<span style="color:#038">grep</span> { <span style="color:#080;background-color:#fff0ff">/\D/</span> } <span style="color:#369">@path</span>)),
op => <span style="color:#d20;background-color:#fff0f0">'urlencode'</span>, }) .
<span style="color:#d20;background-color:#fff0f0">'.html'</span>;
</code></pre></div><p>By this I mean: I put together my path out of my selected elements, joined them with a URL-encoded slash character (%2f), and then further URL-encoded the result. This was counter-intuitive, but as you can see at the first link in this article, it’s necessary because Apache is smarter than you. Well, than me anyway.</p>
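<p>The same double-encoding trick is easy to demonstrate outside Perl. Here it is in Python, using the standard <code>urllib.parse.quote</code>, purely as an illustration:</p>

```python
from urllib.parse import quote

# Join the path pieces with an already-encoded slash (%2f), then URL-encode
# the result again, so the path segment Apache sees contains %252f rather
# than a literal or singly-encoded slash.
parts = ["Socks", "Hosiery"]
once = "%2f".join(quote(p) for p in parts)  # Socks%2fHosiery
twice = quote(once)                         # Socks%252fHosiery
```
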
An Introduction to Google Website Optimizerhttps://www.endpointdev.com/blog/2012/04/google-website-optimizer-introduction/2012-04-17T00:00:00+00:00Steph Skardal
<p>On End Point’s website, <a href="/team/jon-jensen/">Jon</a> and I recently discussed trying out Google Website Optimizer to run a few A/B tests on content and various website updates. I’ve worked with a couple of clients who use Google Website Optimizer, but I had never installed it from start to finish. Here are a few basic notes that I made during the process.</p>
<h3 id="whats-the-point">What’s the Point?</h3>
<p>Before I get into the technical details of the implementation, I’ll give a quick summary of why you would want to A/B test something. A basic A/B test will test user experiences of content A versus content B. The goal is to decide which of the two (content A or content B) leads to higher conversion (or higher user interactivity that indirectly leads to conversion). After testing, one would continue to use the higher converting content. An example of this in ecommerce may be product titles or descriptions.</p>
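<p>The arithmetic behind the decision is simple: compute each variation’s conversion rate and keep the winner. The numbers below are invented, and a real test also needs a statistical significance check before declaring a winner:</p>

```python
def conversion_rate(conversions, visitors):
    # Fraction of visitors who took the desired action.
    return conversions / visitors

# Hypothetical results after running the test for a while.
rate_a = conversion_rate(30, 1000)  # content A converts at 3.0%
rate_b = conversion_rate(45, 1000)  # content B converts at 4.5%
winner = "A" if rate_a > rate_b else "B"
```
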
<h3 id="ab-tests-in-google-website-optimizer">A/B tests in Google Website Optimizer</h3>
<p>I jumped right into the Google Website Optimizer sign-up, wanting to set up a simple A/B test on variations of our home page content. Unfortunately, I found right away that basic A/B tests in Google Website Optimizer require two different URLs: in test A, the user would see index.html, and in test B, the user would see index_alt.html. This was unfortunate because, for SEO and technical reasons, I didn’t want to create an alternative index page.</p>
<table cellpadding="0" cellspacing="0" width="100%">
<tbody><tr>
<td valign="bottom"><img border="0" src="/blog/2012/04/google-website-optimizer-introduction/image-0.png" width="350"/></td>
<td valign="bottom"><img border="0" src="/blog/2012/04/google-website-optimizer-introduction/image-1.png" width="350"/>
</td>
</tr>
<tr>
<td>
<p><small>Test A: Keep existing text</small></p>
</td>
<td>
<p><small>Test B: Remove some paragraph text in first section</small></p>
</td>
</tr>
</tbody></table>
<h3 id="multivariate-testing">Multivariate Testing</h3>
<p>Rather than implement a basic A/B test in Google Website Optimizer, I decided to implement a multivariate test with just two options (A and B). The basic setup required the following:</p>
<ul>
<li>Copy provided JavaScript into my test page just above <code></head></code></li>
<li>Wrap <code><script>utmx_section("stephs_test")</script></code>…<code></noscript></code> around the section of text that will be modified by the test.</li>
<li>Copy provided JavaScript into my converting test page just above <code></head></code></li>
</ul>
<p>Google Website Optimizer will verify the code and provide a user interface to enter the variations of the test text.</p>
<p>I used multivariate testing to test the homepage changes described above. After a couple of weeks of testing, my test results were inconclusive:</p>
<p><img border="0" height="191" src="/blog/2012/04/google-website-optimizer-introduction/image-2.png" width="400"/><br/>
<small>Example multivariate test result in Google Website Optimizer</small></p>
<h3 id="a-limitation-with-multivariate-testing">A Limitation with Multivariate Testing</h3>
<p>One thing we wanted to test was a site-wide CSS change. Unfortunately, the multivariate testing in place is designed to test on page content only rather than global CSS changes. You could potentially come up with a “creative” hack and set a cookie inside the variation to specify which layout option you would use. And then the page would always look at that cookie while rendering to apply the special CSS behavior. However, this requires a bit of customization and development.</p>
Interchange Search Caching with “Permanent More”https://www.endpointdev.com/blog/2012/01/interchange-search-caching-with/2012-01-02T00:00:00+00:00Mark Johnson
<p>Most sites that use Interchange take advantage of Interchange’s “more lists”. These are built-in tools that support an Interchange “search” (either the search/scan action, or result of direct SQL via [query]) to make it very easy to paginate results. Under the hood, the more list is a drill-in to a cached “search object”, so each page brings back a slice from the cache of the original search. There are extensive ways to modify the look and behavior of more lists and, with a bit of effort, they can be configured to meet design requirements.</p>
<p>Where more lists tend to fall short, however, is with respect to SEO. There are two primary SEO deficiencies that get business stakeholders’ attention:</p>
<ul>
<li>There is little control over the construction of the URLs for more lists. They leverage the scan actionmap and contain a hash key for the search object and numeric data to identify the slice and page location. They possess no intrinsic value in identifying the content they reference.</li>
<li>The search cache is by default ephemeral and session-specific. This means all the results beyond page 1 that a search engine has cataloged become dead links for search users who try to land directly on the more-listed pages.</li>
</ul>
<p>It is the latter issue that I wish to address because there is—and has been for some time now—a simple mechanism called “permanent more” to remedy the default behavior.</p>
<p>You can leverage “permanent more” by adding the boolean <strong>mv_more_permanent</strong>, or the shorthand <strong>pm</strong>, to your search conditions. E.g.:</p>
<div class="highlight"><pre tabindex="0" style="background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-plain" data-lang="plain">Link:
<a href="[area search="
co=1
sf=category
se=Foo
op=rm
more=1
ml=5
<b>pm=1</b>
"]">All Foos</a>
Loop:
[loop search="
co=1
sf=category
se=Foo
op=rm
more=1
ml=5
<b>pm=1</b>
"]
...loop body with [more-list]...
[/loop]
Query:
[query
list=1
more=1
ml=10
<b>pm=1</b>
sql="SELECT * FROM products WHERE category LIKE '%Foo%'"
]
...same as loop but with 10 matches/page...
[/query]
</code></pre></div><p>If the initial search is defined with the “permanent more” setting, it will produce the following adjustments:</p>
<ul>
<li>The hash key used to store and identify the search cache is deterministic, based on the search conditions. Many Interchange searches are category driven, so all end users browsing a category click identical links; by default, each click creates a duplicate search cache belonging uniquely to that user. With permanent more, they all share the same cache, with the same identifier. As long as the search conditions don’t change, neither does the cache identifier. Even as the cache is refreshed by new executions of the search, the object remains in the same location. Thus, links a search engine cataloged this morning are still valid now, tomorrow, or next week, provided they reference the same search conditions.</li>
<li>The cached search object has no session affinity. Any link referencing the cache with the correct hash key has access to the content.</li>
</ul>
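<p>The essential property is that the cache identifier is a pure function of the search conditions. Interchange computes this internally; the following Ruby sketch is illustrative only (not Interchange’s actual algorithm) and shows the idea of deriving a stable key from canonicalized conditions:</p>

```ruby
require 'digest'

# Canonicalize the search conditions (sorted key=value pairs) and hash
# them, so identical searches map to the same cache identifier no
# matter who runs them or when.
def cache_key(conditions)
  canonical = conditions.sort.map { |k, v| "#{k}=#{v}" }.join(';')
  Digest::MD5.hexdigest(canonical)
end

a = cache_key('sf' => 'category', 'se' => 'Foo', 'op' => 'rm')
b = cache_key('op' => 'rm', 'se' => 'Foo', 'sf' => 'category')
# a == b: same conditions, same cache slot; changing any condition
# changes the key, and therefore the cache location.
```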
<p>Taken together, these changes remove (for the most part; see the caveat below) dead links from more lists cataloged by search engines. “Permanent more” also has benefits beyond the intended ones described above:</p>
<ul>
<li>As stated in passing, standard Interchange search caching produces duplicate search objects for common search conditions. For a busy site, these caches can have a real impact on storage. Typically, maintenance is implemented to clean up cache files whose age exceeds the session duration (standard is 48 hours) by some margin. With permanent more, duplicate caches are eliminated: a cache location is reused by all users with the same search requirements, keeping data-storage requirements for caches to the minimum necessary. As searches change, orphaned caches can still be cleaned up easily, since they immediately start to age once nothing references them.</li>
<li>For the same reason that “permanent more” fixes search-engine links, it also helps sites using a reverse proxy for caching. Because most (and certainly the easiest) caching keys are based on the URL, the deterministic hash keys of “permanent more” ensure that the cached content in the proxy accurately reflects the search content over time, and that all users hit the cached resource rather than generating new, unique links with varying hash keys.</li>
</ul>
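<p>The cleanup rule described above reduces to a simple age test. A Ruby sketch (the session duration comes from the article; the margin and cache path are hypothetical):</p>

```ruby
SESSION_HOURS = 48   # standard session duration noted above
MARGIN_HOURS  = 24   # hypothetical safety margin

# A cache file is stale once its age exceeds the session window plus
# the margin; with "permanent more", orphaned caches simply age out
# once no search references them.
def stale?(mtime, now)
  (now - mtime) / 3600.0 > SESSION_HOURS + MARGIN_HOURS
end

# A maintenance sweep over a hypothetical cache directory would be:
#   Dir.glob('/var/ic/cache/searches/*').each do |f|
#     File.delete(f) if stale?(File.mtime(f).to_i, Time.now.to_i)
#   end
```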
<p>One shortcoming of “permanent more” to be aware of is the impact of data changing underneath the search. Even if the search conditions don’t change, the count and order of the matching records may. For example, enough products may be removed from a given category that the last page of a more list becomes empty, turning any link directly into that page into a dead link. More minor, but still possible, is the addition or removal of products “bumping” a particularly searched-for term to another page within the search cache since the search engine last crawled the more lists. For searches backed by particularly volatile data, “permanent more” may not be sufficient to address search-engine or caching demands.</p>
<p>Finally, “permanent more” should be avoided for any search features that may cache data sensitive to an individual user. This is unlikely to happen as, under most circumstances, the configuration of the search itself will change based on the unique characteristics of the user executing the search (e.g., a username included in a query to review order history). However, it is still possible that context-sensitive information could be stored in the search object and, if so, all other users with access to the more lists would have access to that information.</p>
SEO friendly redirects in Interchangehttps://www.endpointdev.com/blog/2010/10/seo-friendly-redirects-in-interchange/2010-10-15T00:00:00+00:00Richard Templet
<p>In the past, a few of my <a href="http://www.icdevgroup.org/i/dev">Interchange</a> clients have wanted their site to issue an SEO-friendly 301 redirect to a new page for various reasons: a product had gone out of stock and wasn’t going to return, or they had completely reworked their URL structure to be more SEO friendly and wanted the link juice to transfer to the new URLs. The usual way to handle this kind of request is to set up a bunch of Apache rewrite rules.</p>
<p>There were a few issues with that route. The main one is that adding or removing rules means restarting or reloading Apache every time a change is made; clients don’t normally have the access to do this, so they would have to contact me. They also typically can’t modify the Apache virtual host file to add and remove rules. We could have put the rules in a .htaccess file and let them edit that, but some text editors and FTP clients don’t handle hidden files very well. Finally, even though basic rewrite rules are easy to copy, paste, and reuse, they can have nasty side effects if not done properly and can be difficult to troubleshoot. So I devised a way to let clients manage their 301 redirects using a simple database table and Interchange’s Autoload directive.</p>
<p>The database table is very simple: two fields, old_url and new_url, with old_url as the primary key. The Autoload directive accepts a list of subroutines as its arguments, so we create two GlobalSubs: one to actually perform the redirect, and one to check the database to see whether we need to redirect. The redirect sub is straightforward:</p>
<div class="highlight"><pre tabindex="0" style="background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-perl" data-lang="perl"><span style="color:#080;font-weight:bold">sub</span> <span style="color:#06b;font-weight:bold">redirect</span> {
<span style="color:#080;font-weight:bold">my</span> (<span style="color:#369">$url</span>, <span style="color:#369">$status</span>) = <span style="color:#369">@_</span>;
<span style="color:#369">$status</span> ||= <span style="color:#00d;font-weight:bold">302</span>;
<span style="color:#369">$</span><span style="color:#b06;font-weight:bold">Vend::</span><span style="color:#369">StatusLine</span> = <span style="color:#2b2;background-color:#f0fff0">qq|Status: $status moved\nLocation: $url\n|</span>;
<span style="color:#369">$::Pragma</span>->{download} = <span style="color:#00d;font-weight:bold">1</span>;
<span style="color:#080;font-weight:bold">my</span> <span style="color:#369">$body</span> = <span style="color:#d20;background-color:#fff0f0">''</span>;
::response(<span style="color:#369">$body</span>);
<span style="color:#369">$</span><span style="color:#b06;font-weight:bold">Vend::</span><span style="color:#369">Sent</span> = <span style="color:#00d;font-weight:bold">1</span>;
<span style="color:#080;font-weight:bold">return</span> <span style="color:#00d;font-weight:bold">1</span>;
}
</code></pre></div><p>The code for the sub that checks to see if we need to redirect looks like this:</p>
<div class="highlight"><pre tabindex="0" style="background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-perl" data-lang="perl"><span style="color:#080;font-weight:bold">sub</span> <span style="color:#06b;font-weight:bold">redirect_old_links</span> {
<span style="color:#080;font-weight:bold">my</span> <span style="color:#369">$db</span> = <span style="color:#b06;font-weight:bold">Vend::Data::</span>database_exists_ref(<span style="color:#d20;background-color:#fff0f0">'page_redirects'</span>);
<span style="color:#080;font-weight:bold">my</span> <span style="color:#369">$dbh</span> = <span style="color:#369">$db</span>->dbh();
<span style="color:#080;font-weight:bold">my</span> <span style="color:#369">$current_url</span> = <span style="color:#369">$::Tag</span>->env({ arg => <span style="color:#d20;background-color:#fff0f0">"REQUEST_URI"</span> });
<span style="color:#080;font-weight:bold">my</span> <span style="color:#369">$normal_server</span> = <span style="color:#369">$::Variable</span>->{NORMAL_SERVER};
<span style="color:#080;font-weight:bold">if</span> ( ! <span style="color:#038">exists</span> <span style="color:#369">$::Scratch</span>->{redirects} ) {
<span style="color:#080;font-weight:bold">my</span> <span style="color:#369">$sth</span> = <span style="color:#369">$dbh</span>->prepare(<span style="color:#2b2;background-color:#f0fff0">q{select * from page_redirects}</span>);
<span style="color:#080;font-weight:bold">my</span> <span style="color:#369">$rc</span> = <span style="color:#369">$sth</span>->execute();
<span style="color:#080;font-weight:bold">while</span> ( <span style="color:#080;font-weight:bold">my</span> (<span style="color:#369">$old</span>,<span style="color:#369">$new</span>) = <span style="color:#369">$sth</span>->fetchrow_array() ) {
<span style="color:#369">$::Scratch</span>->{redirects}{<span style="color:#d20;background-color:#fff0f0">"$old"</span>} = <span style="color:#369">$new</span>;
}
<span style="color:#369">$sth</span>->finish();
}
<span style="color:#080;font-weight:bold">if</span> ( <span style="color:#038">exists</span> <span style="color:#369">$::Scratch</span>->{redirects} ) {
<span style="color:#080;font-weight:bold">if</span> ( <span style="color:#038">exists</span> <span style="color:#369">$::Scratch</span>->{redirects}{<span style="color:#d20;background-color:#fff0f0">"$current_url"</span>} ) {
<span style="color:#080;font-weight:bold">my</span> <span style="color:#369">$path</span> = <span style="color:#369">$normal_server</span>.<span style="color:#369">$::Scratch</span>->{redirects}{<span style="color:#d20;background-color:#fff0f0">"$current_url"</span>};
<span style="color:#080;font-weight:bold">my</span> <span style="color:#369">$Sub</span> = <span style="color:#b06;font-weight:bold">Vend::Subs</span>-><span style="color:#080;font-weight:bold">new</span>;
<span style="color:#369">$Sub</span>->redirect(<span style="color:#369">$path</span>, <span style="color:#d20;background-color:#fff0f0">'301'</span>);
<span style="color:#080;font-weight:bold">return</span>;
} <span style="color:#080;font-weight:bold">else</span> {
<span style="color:#080;font-weight:bold">return</span>;
}
}
}
</code></pre></div><p>We normally create these as two separate files in our own directory under the Interchange directory, custom/GlobalSub, and add the line <code>include custom/GlobalSub/*.sub</code> to interchange.cfg so they are loaded when Interchange restarts. After those files are loaded, you need to tell the catalog to Autoload the subroutine, using the Autoload directive in your catalog.cfg file like this:</p>
<div class="highlight"><pre tabindex="0" style="background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-plain" data-lang="plain">Autoload redirect_old_links
</code></pre></div><p>After modifying your catalog.cfg file, reload the catalog so the change takes effect. Once these pieces are in place, you should be able to add rows to the page_redirects table, start a new session, and be redirected properly. While working on the system, I created an entry redirecting /cgi-bin/vlink/redirect_test.html to /cgi-bin/vlink/index.html to confirm it redirected me correctly.</p>
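<p>Stripped of the Interchange plumbing, the lookup at the heart of redirect_old_links is just a hash keyed by old_url. A stand-alone Ruby sketch of that step (the server name and table contents here are hypothetical):</p>

```ruby
NORMAL_SERVER = 'https://www.example.com'  # stand-in for the NORMAL_SERVER variable

# Mirrors the cached page_redirects lookup: old URL in, full redirect
# target out, or nil when there is no entry (serve the page normally).
REDIRECTS = {
  '/cgi-bin/vlink/redirect_test.html' => '/cgi-bin/vlink/index.html',
}.freeze

def redirect_target(request_uri)
  new_path = REDIRECTS[request_uri]
  new_path && NORMAL_SERVER + new_path
end

# redirect_target('/cgi-bin/vlink/redirect_test.html')
#   => "https://www.example.com/cgi-bin/vlink/index.html" (the 301 target)
```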
More Code and SEO with the Google Analytics APIhttps://www.endpointdev.com/blog/2010/02/code-seo-google-analytics-api/2010-02-22T00:00:00+00:00Steph Skardal
<p>The inspiration for my latest blog article came from an <a href="https://moz.com/">SEOmoz</a> pro webinar on Actionable Analytics. This time around, the article was published on SEOmoz’s YOUmoz Blog, and I thought I’d summarize and extend it here with some technical details more appealing to our audience. The article is titled <a href="https://moz.com/ugc/visualizing-keyword-data-with-the-google-analytics-api">Visualizing Keyword Data with the Google Analytics API</a>.</p>
<p>In the article, I discuss and show examples of how the number of unique keywords receiving search traffic has diversified and expanded over time, and how our SEO efforts (including writing blog articles) are likely driving this diversification. Some snapshots from the article:</p>
<p><img src="/blog/2010/02/code-seo-google-analytics-api/image-0.png" />
<img src="/blog/2010/02/code-seo-google-analytics-api/image-1.png" /></p>
<p>[The unique keyword (keywords receiving at least one search visit) count per month (top) compared to the number of articles available on our blog at that time (bottom).]</p>
<p>I also briefly examined how the unique keywords receiving at least one visit overlapped from month to month, and saw about 10-20% overlap (likely the short tail of SEO).</p>
<p><img src="/blog/2010/02/code-seo-google-analytics-api/image-2.png" /></p>
<p>[The keyword overlap per month, where the keywords receiving at least one visit in consecutive months are shown in the overlap section.]</p>
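<p>Measuring that overlap is a simple set intersection over each month’s keyword list. A Ruby sketch with hypothetical keyword data:</p>

```ruby
# Hypothetical keywords that received at least one visit in each month.
jan = ['postgres tuning', 'rails deploy', 'seo tips', 'interchange more']
feb = ['postgres tuning', 'spree setup', 'seo tips']

overlap = jan & feb                    # keywords seen in both months
pct = 100.0 * overlap.size / jan.size  # share of January keywords that recurred
# Here overlap is ["postgres tuning", "seo tips"], so pct is 50.0.
```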
<p>Now, on to something End Point’s developer audience may find more interesting: the code written against the Google Analytics API to generate the data for the article. I researched a bit and tried writing my own gem-less Ruby code to pull from the Google API, followed by the Gattica gem, and finally the garb gem. After wrestling with the former two options, I settled on the <a href="https://github.com/vigetlabs/garb">garb</a> gem, whose decent documentation <a href="https://github.com/vigetlabs/garb/wiki">here</a> got me up and running with a Google Analytics report quickly. Here’s an example of the code required to create your first Google Analytics API report:</p>
<div class="highlight"><pre tabindex="0" style="background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-ruby" data-lang="ruby"><span style="color:#888">#!/usr/bin/ruby</span>
<span style="color:#038">require</span> <span style="color:#d20;background-color:#fff0f0">'rubygems'</span>
<span style="color:#038">require</span> <span style="color:#d20;background-color:#fff0f0">'garb'</span>
<span style="color:#888"># set email, password, profile_id</span>
<span style="color:#036;font-weight:bold">Garb</span>::<span style="color:#036;font-weight:bold">Session</span>.login(email, password)
profile = <span style="color:#036;font-weight:bold">Garb</span>::<span style="color:#036;font-weight:bold">Profile</span>.first(profile_id)
report = <span style="color:#036;font-weight:bold">Garb</span>::<span style="color:#036;font-weight:bold">Report</span>.new(profile,
<span style="color:#a60;background-color:#fff0f0">:limit</span> => <span style="color:#00d;font-weight:bold">100</span>,
<span style="color:#a60;background-color:#fff0f0">:start_date</span> => <span style="color:#036;font-weight:bold">Date</span>.today - <span style="color:#00d;font-weight:bold">30</span>,
<span style="color:#a60;background-color:#fff0f0">:end_date</span> => <span style="color:#036;font-weight:bold">Date</span>.today)
report.dimensions <span style="color:#a60;background-color:#fff0f0">:keyword</span>
report.metrics <span style="color:#a60;background-color:#fff0f0">:visits</span>
report.results.each <span style="color:#080;font-weight:bold">do</span> |result|
<span style="color:#038">puts</span> <span style="color:#d20;background-color:#fff0f0">"</span><span style="color:#33b;background-color:#fff0f0">#{</span>result.keyword<span style="color:#33b;background-color:#fff0f0">}</span><span style="color:#d20;background-color:#fff0f0">:</span><span style="color:#33b;background-color:#fff0f0">#{</span>result.visits<span style="color:#33b;background-color:#fff0f0">}</span><span style="color:#d20;background-color:#fff0f0">"</span>
<span style="color:#080;font-weight:bold">end</span>
</code></pre></div><p>If you aren’t familiar with the Google Analytics API, the possible dimensions and metrics are documented <a href="https://developers.google.com/analytics/devguides/reporting/core/dimsmets">here</a>. There are some Google Analytics API limitations on metric and dimension combinations, but I think if you get creative you’d be able to overcome most of them (assuming you won’t exceed the limit of 1,000 API requests per day).</p>
<p>Why should you care about the Google Analytics API? Well, the API allowed me to programmatically aggregate the keyword counts in monthly increments for the SEOmoz article. One thing I consider to be pretty lame is the inability to select more than 3 custom segments and exclude the “All Visits” segment to allow a better visual comparison of the segments. In the data below, I have 3 defined custom segments. I would prefer to compare about 10 custom segments of End Point’s blog keyword groupings (e.g., “Rails Keywords”, “Postgres Keywords”), but Google Analytics limits the selected segments and includes “All Visits” when you select more than one custom segment.</p>
<p><img src="/blog/2010/02/code-seo-google-analytics-api/image-3.png" /></p>
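<p>The monthly aggregation mentioned above is straightforward once the API returns keyword/visit pairs per date range: bucket the results by month and count the distinct keywords with at least one visit. A Ruby sketch with hypothetical report rows:</p>

```ruby
# Hypothetical rows: one garb report per month, flattened into
# [month, keyword, visits] tuples.
rows = [
  ['2010-01', 'postgres tuning', 3],
  ['2010-01', 'rails deploy',    1],
  ['2010-02', 'postgres tuning', 2],
]

unique = Hash.new { |h, k| h[k] = {} }
rows.each do |month, keyword, visits|
  unique[month][keyword] = true if visits > 0
end
counts = unique.transform_values(&:size)
# counts == { '2010-01' => 2, '2010-02' => 1 }
```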
<p>Another thing I consider to be lame is the inability to merge Google Analytics profiles. Recently, End Point combined its corporate blog GA profile with its main website GA profile to better track conversion between the sites:</p>
<p><img src="/blog/2010/02/code-seo-google-analytics-api/image-4.png" /></p>
<p>[Dead metrics from migrated profile.]</p>
<p>With the Google Analytics API, we could compute different aggregates of data, compare more than a few custom segments, and combine two Google Analytics profiles if they have merged. None of this would necessarily be easy, but working with the gem proved simple, so in theory it could all be done; in the meantime we’ll keep our dead profile around.</p>
<p>Again, please read the original article <a href="https://moz.com/ugc/visualizing-keyword-data-with-the-google-analytics-api">here</a> if you are interested. :)</p>
Blog versus Forum, Blogger versus WordPress in Ecommercehttps://www.endpointdev.com/blog/2010/01/blog-vs-forum-blogger-vs-wordpress/2010-01-25T00:00:00+00:00Steph Skardal
<p><img src="https://s.w.org/style/images/about/WordPress-logotype-standard.png" alt=""></p>
<p>Today, Chris sent me an email with two questions for one of our ecommerce clients:</p>
<ul>
<li>For ecommerce client A, should a forum or blog be added?</li>
<li>For ecommerce client A, should the client use Blogger or WordPress if they add a blog?</li>
</ul>
<p>These are relevant questions for all of our clients, because forums and blogs can add value to a static or ecommerce site. I answered Chris’ question and thought I’d expand on it a bit in a brief article.</p>
<p>First, a rundown comparing the pros and cons of blog versus forum:</p>
<table cellpadding="0" cellspacing="5" class="blog_article" width="100%">
<tbody><tr class="alt">
<td width="10%"> </td>
<td align="center" width="45%"><b>Blog</b></td>
<td align="center" width="45%"><b>Forum</b></td>
</tr>
<tr>
<td valign="top"><b>Pros</b></td>
<td valign="top">
<ul>
<li>Content tends to be more organized.</li>
<li>Content can be built to be more targeted for search.</li>
<li>Content can be syndicated easily.</li>
</ul>
</td>
<td valign="top">
<ul>
<li>There can be much more content because users are contributing content.</li>
<li>Since there is more user generated content, it has the potential to cover more of the <a href="https://en.wikipedia.org/wiki/Long_Tail">long tail</a> in search.</li>
<li>There is more potential for user involvement and encouragement to build and contribute to a community.</li>
</ul>
</td>
</tr>
<tr>
<td valign="top"><b>Cons</b></td>
<td valign="top">
<ul>
<li>User generated content will remain minimal if comments are the only form of user generated content in a blog.</li>
<li>If internal staff is responsible for authoring content, you can’t write as much content as users can contribute.</li>
</ul>
</td>
<td valign="top">
<ul>
<li>A forum requires management to prevent user spam.</li>
<li>A forum requires organization to maintain usability and search engine friendliness.</li>
</ul>
</td>
</tr></tbody></table>
<p>If we assume it takes the same amount of effort to write articles as it does to manage user generated content, the decision comes down to whether or not you want user contributions to be part of the content. If the effort involved differs, the decision should be based on how much effort the site owners want to invest. Other opportunities for user generated content include product reviews and user Q&amp;A.</p>
<p>Next, a rundown comparing the pros and cons of Blogger versus <strong>self-hosted</strong> WordPress:</p>
<table cellpadding="0" cellspacing="5" class="blog_article" width="100%">
<tbody><tr>
<td width="10%"> </td>
<td align="center" style="background-color:#002255;" width="45%"><b><a href="https://www.blogger.com"><img alt="" border="0" id="BLOGGER_PHOTO_ID_5430766117686011266" src="/blog/2010/01/blog-vs-forum-blogger-vs-wordpress/image-0.png" style="margin:5px;width: 75px; height: 20px;"/></a></b></td>
<td align="center" style="background-color:#464646;" width="45%"><b><a href="https://wordpress.org"><img alt="" border="0" id="BLOGGER_PHOTO_ID_5430766118456236978" src="/blog/2010/01/blog-vs-forum-blogger-vs-wordpress/image-0.png" style="margin:5px;width: 151px; height: 26px;"/></a></b></td></tr>
<tr><td valign="top"><b>Pros</b></td>
<td valign="top">
<ul>
<li>There are a decent amount of widgets available to integrate into a Blogger instance.</li>
<li>Fast Google indexing of content may result since the content is hosted by Google.</li>
<li>There is decent search implementation on Blogger.</li>
<li>A Blogger instance is very easy to create and easy to use.</li>
</ul>
</td>
<td valign="top">
<ul>
<li>There is a very large feature set available through the <a href="https://wordpress.org/plugins/">WordPress plugin</a> community.</li>
<li>Self-hosted WordPress blogs are relatively easy to set up. Many hosting platforms include WordPress installation and setup at the click of a button.</li>
<li>WordPress gives you control over the URL structure (articles, categories, tags) through permalinks.</li>
<li>Self-hosted WordPress can live at www.yoursite.com/blog/ which can strengthen your domain value in search through external links.</li>
<li>WordPress has a very flexible taxonomy system.</li>
</ul>
</td>
</tr>
<tr>
<td valign="top"><b>Cons</b></td>
<td valign="top">
<ul>
<li>The Blogger taxonomy system is limited (using labels) and labels pages are blocked in robots.txt to reduce indexation and search traffic of the label pages.</li>
<li>Blogger does not allow for a flexible URL structure. Once an article is published and a title is changed, the URL does not change.</li>
<li>Developers must be familiar with the Blogger template language to customize the template.</li>
<li>With Blogger, a blog can’t be hosted at http://www.yoursite.com/blog/. It can be hosted at http://blog.yoursite.com/. While this results in a strong subdomain, it does not strengthen your domain for search through external links to the blog.</li>
</ul>
</td>
<td valign="top">
<ul>
<li>Self-hosted WordPress requires your own hosting, setup and installation.</li>
<li>Self-hosted WordPress requires management of upgrades and plugins. Plugins may require code changes to the template files.</li>
<li>Self-hosted WordPress allows you to select existing themes, but you must be familiar with the WordPress template structure if you want a custom blog look.</li>
</ul>
</td>
</tr></tbody></table>
<p>The decision to create a Blogger blog or install a WordPress blog will depend on resources such as engineering or designer involvement. A self-hosted solution will likely provide a larger feature set and more flexibility, but it also requires more time to enhance, manage, and maintain the software. A hosted solution such as Blogger is easy to set up and maintain, but is less flexible. I didn’t discuss the WordPress-hosted option because I’m not very familiar with that type of setup; however, I believe it limits the use of plugins and themes.</p>
<p>For our ecommerce clients, installing a self-hosted WordPress instance on top of their Spree or <a href="/expertise/perl-interchange/">Interchange</a> ecommerce site has been relatively simple. For another one of our clients, we developed a <a href="http://radiantcms.org/">Radiant</a> plugin to integrate Blogger article links into their site, which has worked well to fit their needs.</p>
SEO 2010 Trends and Strategieshttps://www.endpointdev.com/blog/2010/01/seo-2010-trends/2010-01-22T00:00:00+00:00Steph Skardal
<p>Yesterday I attended <a href="https://moz.com/">SEOmoz’s</a> webinar titled “SEO Strategies for 2010”. Some interesting factoids, comments and resources for SEO in 2010 were presented that I thought I’d highlight:</p>
<ul>
<li>
<p>Mobile browser search</p>
<ul>
<li>Mobile search and ecommerce will be a large area of growth in 2010.</li>
<li>Google Webmaster Tools allows you to submit mobile sitemaps, which can help battle duplicate content between non-mobile and mobile versions of site content. Another way to handle duplicate content would be to write semantic HTML that allows sites to serve non-mobile and mobile CSS.</li>
</ul>
</li>
<li>
<p>Social Media: Real Time Search</p>
<ul>
<li>Real time search marked its presence in 2009. The involvement of Twitter in search is evolving.</li>
<li>Tracking and monitoring on URL shortening services should be set up to measure traffic and benefit from Twitter.</li>
<li>Dan Zarrella published research on <a href="https://www.slideshare.net/danzarrella/the-science-of-re-tweets">The Science of Retweeting</a>. This is an interesting resource with fascinating statistics on retweets.</li>
</ul>
</li>
<li>
<p>Social Media: Facebook’s Dominance</p>
<ul>
<li>Recent research by comScore has shown that 5.5% of all time on the web is spent on Facebook.</li>
<li>Facebook has very affordable advertising, and so much demographic and psychographic data that sites can deliver very targeted advertisements.</li>
<li>Facebook shouldn’t be ignored as a potential business network, but metrics should be put in place to determine the value it brings.</li>
</ul>
</li>
<li>
<p>Social Media: Shifting LinkGraph</p>
<ul>
<li>In the past, sites received links from blogs, which became a factor in their popularity rankings in search. Now much of that linking has shifted to microblogging platforms such as Twitter and other social media. Some folks are stingier about linking from their own sites, preferring to pass links through social media instead. It’s interesting to observe how links and information are passed around the web and to consider how this can affect search.</li>
</ul>
</li>
</ul>
<p><a href="https://3.bp.blogspot.com/_wWmWqyCEKEs/S1ouYvroyaI/AAAAAAAADFA/Pdlm-n22ikM/s1600-h/twitter_bird.jpg" onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}"><img alt="" border="0" id="BLOGGER_PHOTO_ID_5429703303399786914" src="/blog/2010/01/seo-2010-trends/image-0.jpeg" style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 276px;"/></a></p>
<ul>
<li>
<p>Bing</p>
<ul>
<li>Despite the fact that Google is responsible for a large percentage of search, Bing shouldn’t be ignored.</li>
<li>Bing has shown some ranking differences from other search engines, such as being less sensitive to TLDs (.info, .cc, .net, etc.) and giving more weight to sites with keywords in the domain.</li>
</ul>
</li>
<li>
<p>Other</p>
<ul>
<li>Personalized search is on the rise. This is something to pay attention to, but hard to measure.</li>
<li>QDF (query deserves freshness), a search factor related to the freshness of content, has led to search engines indexing content faster. 2010 search strategies recommend becoming a news source to improve search performance.</li>
<li>Local search is definitely something to be aware of in 2010. Google’s Place Rank algorithm is similar to the PageRank algorithm—it looks at specific location or local attributes as a factor in local search.</li>
</ul>
</li>
</ul>
<p>A recurring theme of the discussion was metrics: not just good metrics, but the right metrics, such as conversion and engagement. Testing any of the recommendations above (improving your mobile experience, getting involved in social media, optimizing for Bing) should be measured against conversion to determine the value of the effort. Multivariate and A/B testing were also recommended for local search optimization and other efforts.</p>
Content Syndication, SEO, and the rel canonical Taghttps://www.endpointdev.com/blog/2009/12/content-syndication-seo-rel-canonical/2009-12-17T00:00:00+00:00Steph Skardal
<h3 id="end-point-blog-content-syndication">End Point Blog Content Syndication</h3>
<p>For the past couple of weeks, I’ve been discussing with <a href="/team/jon-jensen/">Jon</a> whether content syndication of our blog negatively affects our search traffic. Since the blog’s inception, full articles have been syndicated by <a href="http://www.osnews.com/">OSNews</a>. Over the last couple of weeks, I’ve been keeping an eye on the effects of content syndication on search to determine what (if any) negative effects we experience.</p>
<p>From my observations, immediately after we publish an article, it is indexed by Google and appears near the top of the results for a search with keywords similar to the article’s title. The next day, OSNews’ syndication of the article shows up in the same keyword search, and our article disappears from the results. Then, several days later, our article is ahead of OSNews, as if Google’s algorithm has determined the original source of the content. Here is a visual representation of this behavior:</p>
<p><a href="http://3.bp.blogspot.com/_wWmWqyCEKEs/SyrLDr2gVsI/AAAAAAAAC1U/qCiICz4Dk6U/s1600-h/contentsyndication.png" onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}"><img alt="" border="0" id="BLOGGER_PHOTO_ID_5416364766037825218" src="/blog/2009/12/content-syndication-seo-rel-canonical/image-0.png" style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 400px; height: 149px;"/></a></p>
<p>With content syndication of our blog articles, there is a several-day lag during which Google treats our blog article as the duplicate content and returns the OSNews article in search results for a query similar to the blog article’s title. After this lag, the OSNews article is treated as duplicate content and our article is shown in the search results.</p>
<p><a href="http://4.bp.blogspot.com/_wWmWqyCEKEs/SyrLCz9MRnI/AAAAAAAAC1E/aKcg77ZqpPg/s1600-h/example1.png" onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}"><img alt="" border="0" id="BLOGGER_PHOTO_ID_5416364751033484914" src="/blog/2009/12/content-syndication-seo-rel-canonical/image-0.png" style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 400px; height: 244px;"/></a></p>
<p>During the lag time, a search for “google pages indexed seo” (keywords from the title of an article I published last Thursday) showed the OSNews article at search position #5.</p>
<p><a href="http://3.bp.blogspot.com/_wWmWqyCEKEs/SyrLDcTVJxI/AAAAAAAAC1M/cwtg7yjTgy8/s1600-h/example2.png" onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}"><img alt="" border="0" id="BLOGGER_PHOTO_ID_5416364761863759634" src="/blog/2009/12/content-syndication-seo-rel-canonical/image-0.png" style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 400px; height: 238px;"/></a></p>
<p>After the lag time, the same search for “google pages indexed seo” returned the original End Point blog article at search position #2.</p>
<p>Several other factors influence the length of the lag time, but I’ve typically seen very similar behavior.</p>
<p>End Point’s content syndication has only been an issue with blog articles, since the majority of our new content comes in the form of blog articles. Examples of content syndication in the ecommerce space may include:</p>
<ul>
<li>intra-company content syndication of products across sister sites. For example, our client <a href="http://www.backcountry.com/">Backcountry.com</a> sells outdoor gear, while their site <a href="http://www.realcyclist.com/">RealCyclist</a> targets the road biking niche of the outdoor gear industry. Cycling products are sold on both sites and may compete directly for search engine traffic.</li>
<li>syndication of product information through affiliate programs like <a href="http://www.cj.com/">Commission Junction</a> and <a href="http://www.avantlink.com/">AvantLink</a>. Affiliates are paid a small portion of the sales and may target traffic by building supplementary content or communities around content provided by ecommerce sites through the affiliate program.</li>
</ul>
<h3 id="cross-domain-relcanonical-tag">Cross-Domain rel=canonical Tag</h3>
<p>I’d been planning to write this article, and with impeccable timing, <a href="https://webmasters.googleblog.com/2009/12/handling-legitimate-cross-domain.html">Google announced support for the rel=canonical tag across different domains</a> this week. I’ve referenced the use of the rel=canonical tag in two articles (<a href="/blog/2009/11/pubcon-vegas-7-takeaway-nuggets/">PubCon 2009 Takeaways</a>, <a href="/blog/2009/02/search-engine-optimization-thoughts/">Search Engine Thoughts</a>), but I haven’t gone into much depth about its use. Support for the rel=canonical tag was introduced earlier this year as a method to help reduce duplicate content within a single domain. A non-canonical URL that includes this tag suggests its canonical URL to search engines. Search engines then use this suggestion in their algorithms and results to reduce the effects of duplicate content.</p>
<div class="highlight"><pre tabindex="0" style="background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-plain" data-lang="plain"><link rel="canonical" href="http://www.example.com/product.php?item=swedish-fish" />
</code></pre></div><p>With the cross-domain rel=canonical support announcement, the rel=canonical tag presents another tool to battle duplicate content from content syndication across domains.</p>
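<p>Continuing the example above, a syndicating site would add the tag to its copy of the article, pointing at the original URL (both URLs here are hypothetical):</p>

```html
<!-- In the <head> of the syndicated copy on syndicator.example.org -->
<link rel="canonical" href="http://www.example.com/original-article" />
```

<p>With this in place, search engines receive a suggestion to consolidate indexing signals onto the original article rather than the syndicated copy.</p>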
<h3 id="back-to-content-syndication">Back to Content Syndication</h3>
<p>The point of my investigation was to identify whether or not content syndication to OSNews negatively affects our search traffic. The data above suggests that after the brief lag time, Google’s algorithm sorts out the source of the original content. The value of exposure, referral traffic, and link juice from OSNews outweighs lost search traffic during this lag time.</p>
<p>In the example of similar product content across backcountry.com’s sites, using the rel=canonical tag across domains would allow backcountry.com to suggest which site’s product URLs should be prioritized in search results. This may be a valuable tool for directing search traffic to the desired domain.</p>
<p>In the example of content syndication across sites that are not owned by the same company, use of the rel=canonical tag is more complicated. If the syndicating site’s goal is to compete directly for search traffic, it will likely not want to use the canonical tag. However, if its goal is to earn search traffic from aggregate content or to build a community around the valuable content, it may be more willing to implement the cross-domain rel=canonical tag pointing to the original source of the content. In the case of affiliate programs, I believe it will be difficult to negotiate cross-domain rel=canonical tag use into existing or future contracts.</p>
<p>The takeaways:</p>
<ul>
<li>Content syndication of our blog does not cause negative long-term effects on search. Sites whose behavior differs much from the data I provided above should still monitor this.</li>
<li>The announcement of support of the cross-domain rel=canonical tag may be helpful for battling duplicate content across sites, especially to sites owned by the same company.</li>
<li>The use of the cross-domain rel=canonical tag in affiliate programs or through sites owned by different companies will be trickier to negotiate.</li>
</ul>
List Google Pages Indexed for SEO: Two Step How Tohttps://www.endpointdev.com/blog/2009/12/google-pages-indexed-seo/2009-12-11T00:00:00+00:00Steph Skardal
<p>When I work on SEO reports, I often start by looking at the pages indexed in Google. I just want a simple list of the URLs indexed by the <em>GOOG</em>. I usually use this list to get a general idea of navigation, look for duplicate content, and examine initial counts of different types of pages indexed.</p>
<p>Yesterday, I finally got around to figuring out a command-line solution to generate this indexation list. Here’s how, using <a href="http://www.endpoint.com/">http://www.endpoint.com/</a> as an example:</p>
<p><strong>Step 1</strong></p>
<p>Grab the search results using the “site:” operator and make sure you run an advanced search that shows 100 results. The URL will look something like: <a href="https://www.google.com/search?num=100&as_sitesearch=www.endpoint.com">https://www.google.com/search?num=100&as_sitesearch=www.endpoint.com</a></p>
<p>But it will likely have lots of other query parameters of lesser importance [to us]. Save the search results page as search.html.</p>
<img alt="" border="0" id="BLOGGER_PHOTO_ID_5414143053987661506" src="/blog/2009/12/google-pages-indexed-seo/image-0.png" style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 750px;"/>
<p><strong>Step 2</strong></p>
<p>Run the following command:</p>
<div class="highlight"><pre tabindex="0" style="background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-bash" data-lang="bash">sed <span style="color:#d20;background-color:#fff0f0">'s/<h3 class="r">/\n/g; s/class="l"/LINK\n/g'</span> search.html | grep LINK | sed <span style="color:#d20;background-color:#fff0f0">'s/<a href="\|" LINK//g'</span>
</code></pre></div><p>There you have it. Interestingly enough, the order of pages can be an indicator of which pages rank well. Typically, pages with higher PageRank will be near the top, although I have seen some strange exceptions. End Point’s indexed pages:</p>
<div class="highlight"><pre tabindex="0" style="background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-plain" data-lang="plain">http://www.endpoint.com/
http://www.endpoint.com/clients
http://www.endpoint.com/team
http://www.endpoint.com/services
http://www.endpoint.com/sitemap
http://www.endpoint.com/contact
http://www.endpoint.com/team/selena_deckelmann
http://www.endpoint.com/team/josh_tolley
http://www.endpoint.com/team/steph_powell
http://www.endpoint.com/team/ethan_rowe
http://www.endpoint.com/team/greg_sabino_mullane
http://www.endpoint.com/team/mark_johnson
http://www.endpoint.com/team/jeff_boes
http://www.endpoint.com/team/ron_phipps
http://www.endpoint.com/team/david_christensen
http://www.endpoint.com/team/carl_bailey
http://www.endpoint.com/services/spree
...
</code></pre></div><p>For the site I examined yesterday, I saved the pages as one.html, two.html, three.html and four.html because the site had about 350 results. I wrote a simple script to concatenate all the results:</p>
<div class="highlight"><pre tabindex="0" style="background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-bash" data-lang="bash"><span style="color:#c00;font-weight:bold">#!/bin/bash
</span><span style="color:#c00;font-weight:bold"></span>
rm results.txt
<span style="color:#080;font-weight:bold">for</span> ARG in <span style="color:#369">$*</span>
<span style="color:#080;font-weight:bold">do</span>
sed <span style="color:#d20;background-color:#fff0f0">'s/<h3 class="r">/\n/g; s/class="l"/LINK\n/g'</span> <span style="color:#369">$ARG</span> | grep LINK | sed <span style="color:#d20;background-color:#fff0f0">'s/<a href="\|" LINK//g'</span> >> results.txt
<span style="color:#080;font-weight:bold">done</span>
</code></pre></div><p>And I called the script above with:</p>
<div class="highlight"><pre tabindex="0" style="background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-plain" data-lang="plain">./list_google_index.sh one.html two.html three.html four.html
</code></pre></div><p>This solution is neither scalable nor particularly elegant, but it’s good for a quick and dirty list of pages indexed by the <em>GOOG</em>. I’ve worked with the WWW::Google::PageRank module before, and there are restrictions on API request limits and frequency, so I would <strong>highly advise</strong> against writing a script that makes requests to Google repeatedly. I’ll likely use the script described above for sites with fewer than 1000 pages indexed. There may be other solutions out there to list pages indexed by Google, but as I said, I was going for a quick and dirty approach.</p>
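<p>Once you have results.txt, one quick way to get the “initial counts of different types of pages indexed” mentioned above is to tally URLs by their first path segment. A minimal sketch (the sample URLs stand in for a real results.txt):</p>

```shell
# Stand-in for a results.txt produced by the script above.
printf '%s\n' \
  'http://www.endpoint.com/' \
  'http://www.endpoint.com/team/jeff_boes' \
  'http://www.endpoint.com/team/ron_phipps' \
  'http://www.endpoint.com/services/spree' > results.txt

# Strip the scheme and host, keep the first path segment,
# drop the bare homepage entry, and count pages per section.
sed 's|^[a-z]*://[^/]*/||' results.txt | cut -d/ -f1 | grep . | sort | uniq -c | sort -rn
```

<p>For the sample above this prints a count of 2 for team and 1 for services, giving a rough page-type breakdown at a glance.</p>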
<img alt="" border="0" id="BLOGGER_PHOTO_ID_5414146774384722018" src="/blog/2009/12/google-pages-indexed-seo/image-1.png" style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 750px;"/>
<p>Remember not to get eaten by the Google Monster</p>
WordPress Plugin for Omniture SiteCatalysthttps://www.endpointdev.com/blog/2009/11/wordpress-plugin-for-omniture/2009-11-18T00:00:00+00:00Steph Skardal
<p>A couple of months ago, I integrated Omniture SiteCatalyst into an Interchange site for one of End Point’s clients, <a href="https://www.citypass.com/">CityPass</a>. Shortly after, the client added a blog to their site, which is a standalone WordPress instance that runs separately from the Interchange ecommerce application. I was asked to add SiteCatalyst tracking to the blog.</p>
<p>I’ve had some experience with WordPress plugin development, and I thought this was a great opportunity to develop a plugin to abstract the SiteCatalyst code from the WordPress theme. I was surprised that there were limited Omniture WordPress plugins available, so I’d like to share my experiences through a brief tutorial for building a WordPress plugin to integrate Omniture SiteCatalyst.</p>
<p>First, I created the base WordPress plugin file, which appends the tracking code near the footer of the WordPress theme. This file must live in the ~/wp-content/plugins/ directory. I named the file omniture.php.</p>
<div class="highlight"><pre tabindex="0" style="background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-php" data-lang="php"> <?php <span style="color:#888">/*
</span><span style="color:#888"> Plugin Name: SiteCatalyst for WordPress
</span><span style="color:#888"> Plugin URI: https://www.endpointdev.com/
</span><span style="color:#888"> Version: 1.0
</span><span style="color:#888"> Author: Steph Powell
</span><span style="color:#888"> */</span>
<span style="color:#080;font-weight:bold">function</span> <span style="color:#06b;font-weight:bold">omniture_tag</span>() {
}
add_action(<span style="color:#d20;background-color:#fff0f0">'wp_footer'</span>, <span style="color:#d20;background-color:#fff0f0">'omniture_tag'</span>);
<span style="color:#c00;font-weight:bold">?></span><span style="color:#a61717;background-color:#e3d2d2">
</span></code></pre></div><p>In the code above, wp_footer is a WordPress hook that runs just before the </body> tag. Next, I added the base Omniture code inside the omniture_tag function:</p>
<div class="highlight"><pre tabindex="0" style="background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-php" data-lang="php">...
<span style="color:#080;font-weight:bold">function</span> <span style="color:#06b;font-weight:bold">omniture_tag</span>() {
<span style="color:#c00;font-weight:bold">?></span><span style="color:#a61717;background-color:#e3d2d2">
</span><span style="color:#a61717;background-color:#e3d2d2"><script type="text/javascript">
</span><span style="color:#a61717;background-color:#e3d2d2"><!-- var s_account = 'omniture_account_id'; -->
</span><span style="color:#a61717;background-color:#e3d2d2"></script>
</span><span style="color:#a61717;background-color:#e3d2d2"><script type="text/javascript" src="/path/to/s_code.js"></script>
</span><span style="color:#a61717;background-color:#e3d2d2"><script type="text/javascript"><!--
</span><span style="color:#a61717;background-color:#e3d2d2">s.pageName='' //page name
</span><span style="color:#a61717;background-color:#e3d2d2">s.channel='' //channel
</span><span style="color:#a61717;background-color:#e3d2d2">s.pageType='' //page type
</span><span style="color:#a61717;background-color:#e3d2d2">s.prop1='' //traffic variable 1
</span><span style="color:#a61717;background-color:#e3d2d2">s.prop2='' //traffic variable 2
</span><span style="color:#a61717;background-color:#e3d2d2">s.prop3='' //traffic variable 3
</span><span style="color:#a61717;background-color:#e3d2d2">s.prop4= '' //traffic variable 4
</span><span style="color:#a61717;background-color:#e3d2d2">s.prop5= '' //traffic variable 5
</span><span style="color:#a61717;background-color:#e3d2d2">s.campaign= '' //campaign variable
</span><span style="color:#a61717;background-color:#e3d2d2">s.state= '' //user state
</span><span style="color:#a61717;background-color:#e3d2d2">s.zip= '' //user zip
</span><span style="color:#a61717;background-color:#e3d2d2">s.events= '' //user events
</span><span style="color:#a61717;background-color:#e3d2d2">s.products= '' //user products
</span><span style="color:#a61717;background-color:#e3d2d2">s.purchaseID= '' //purchase ID
</span><span style="color:#a61717;background-color:#e3d2d2">s.eVar1= '' //conversion variable 1
</span><span style="color:#a61717;background-color:#e3d2d2">s.eVar2= '' //conversion variable 2
</span><span style="color:#a61717;background-color:#e3d2d2">s.eVar3= '' //conversion variable 3
</span><span style="color:#a61717;background-color:#e3d2d2">s.eVar4= '' //conversion variable 4
</span><span style="color:#a61717;background-color:#e3d2d2">s.eVar5= '' //conversion variable 5
</span><span style="color:#a61717;background-color:#e3d2d2">/************* DO NOT ALTER ANYTHING BELOW THIS LINE ! **************/
</span><span style="color:#a61717;background-color:#e3d2d2">var s_code=s.t();if(s_code)document.write(s_code)
</span><span style="color:#a61717;background-color:#e3d2d2">--></script>
</span><span style="color:#a61717;background-color:#e3d2d2"><?php
</span><span style="color:#a61717;background-color:#e3d2d2">}
</span><span style="color:#a61717;background-color:#e3d2d2">
</span><span style="color:#a61717;background-color:#e3d2d2">...
</span></code></pre></div><p>To test the footer hook, I activated the plugin in the WordPress admin. A blog refresh should yield the Omniture code (with no variables defined) near the </body> tag of the source code.</p>
<p>After verifying that the code was correctly appended near the footer in the source code, I determined how to track the WordPress traffic in SiteCatalyst. For our client, the traffic was to be divided into the home page, static pages, articles, tag pages, category pages, and archive pages. The Omniture variables pageName, channel, pageType, prop1, prop2, and prop3 were modified to track these pages, using the existing WordPress functions is_home, is_page, is_single, is_category, is_tag, is_month, the_title, get_the_category, single_cat_title, single_tag_title, and the_date.</p>
<div class="highlight"><pre tabindex="0" style="background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-php" data-lang="php">...
<script type=<span style="color:#d20;background-color:#fff0f0">"text/javascript"</span>><!--
<?php
<span style="color:#080;font-weight:bold">if</span>(is_home()) { <span style="color:#888">//WordPress functionality to check if page is home page
</span><span style="color:#888"></span> <span style="color:#369">$pageName</span> = <span style="color:#369">$channel</span> = <span style="color:#369">$pageType</span> = <span style="color:#369">$prop1</span> = <span style="color:#d20;background-color:#fff0f0">'Blog Home'</span>;
} <span style="color:#080;font-weight:bold">elseif</span> (is_page()) { <span style="color:#888">//WordPress functionality to check if page is static page
</span><span style="color:#888"></span> <span style="color:#369">$pageName</span> = <span style="color:#369">$channel</span> = the_title(<span style="color:#d20;background-color:#fff0f0">''</span>, <span style="color:#d20;background-color:#fff0f0">''</span>, <span style="color:#080;font-weight:bold">false</span>);
<span style="color:#369">$pageType</span> = <span style="color:#369">$prop1</span> = <span style="color:#d20;background-color:#fff0f0">'Static Page'</span>;
} <span style="color:#080;font-weight:bold">elseif</span> (is_single()) { <span style="color:#888">//WordPress functionality to check if page is article
</span><span style="color:#888"></span> <span style="color:#369">$categories</span> = get_the_category();
<span style="color:#369">$pageName</span> = <span style="color:#369">$prop2</span> = the_title(<span style="color:#d20;background-color:#fff0f0">''</span>, <span style="color:#d20;background-color:#fff0f0">''</span>, <span style="color:#080;font-weight:bold">false</span>);
<span style="color:#369">$channel</span> = <span style="color:#369">$categories</span>[<span style="color:#00d;font-weight:bold">0</span>]-><span style="color:#369">name</span>;
<span style="color:#369">$pageType</span> = <span style="color:#369">$prop1</span> = <span style="color:#d20;background-color:#fff0f0">'Article'</span>;
} <span style="color:#080;font-weight:bold">elseif</span> (is_category()) { <span style="color:#888">//WordPress functionality to check if page is category page
</span><span style="color:#888"></span> <span style="color:#369">$pageName</span> = <span style="color:#369">$channel</span> = single_cat_title(<span style="color:#d20;background-color:#fff0f0">''</span>, <span style="color:#080;font-weight:bold">false</span>);
<span style="color:#369">$pageName</span> = <span style="color:#d20;background-color:#fff0f0">'Category: '</span> . <span style="color:#369">$pageName</span>;
<span style="color:#369">$pageType</span> = <span style="color:#369">$prop1</span> = <span style="color:#d20;background-color:#fff0f0">'Category'</span>;
} <span style="color:#080;font-weight:bold">elseif</span> (is_tag()) { <span style="color:#888">//WordPress functionality to check if page is tag page
</span><span style="color:#888"></span> <span style="color:#369">$pageName</span> = <span style="color:#369">$channel</span> = single_tag_title(<span style="color:#d20;background-color:#fff0f0">''</span>, <span style="color:#080;font-weight:bold">false</span>);
<span style="color:#369">$pageType</span> = <span style="color:#369">$prop1</span> = <span style="color:#d20;background-color:#fff0f0">'Tag'</span>;
} <span style="color:#080;font-weight:bold">elseif</span> (is_month()) { <span style="color:#888">//WordPress functionality to check if page is month page
</span><span style="color:#888"></span> <span style="color:#080;font-weight:bold">list</span>(<span style="color:#369">$month</span>, <span style="color:#369">$year</span>) = split(<span style="color:#d20;background-color:#fff0f0">' '</span>, the_date(<span style="color:#d20;background-color:#fff0f0">'F Y'</span>, <span style="color:#d20;background-color:#fff0f0">''</span>, <span style="color:#d20;background-color:#fff0f0">''</span>, <span style="color:#080;font-weight:bold">false</span>));
<span style="color:#369">$pageName</span> = <span style="color:#d20;background-color:#fff0f0">'Month Archive: '</span> . <span style="color:#369">$month</span> . <span style="color:#d20;background-color:#fff0f0">' '</span> . <span style="color:#369">$year</span>;
<span style="color:#369">$channel</span> = <span style="color:#369">$pageType</span> = <span style="color:#369">$prop1</span> = <span style="color:#d20;background-color:#fff0f0">'Month Archive'</span>;
<span style="color:#369">$prop2</span> = <span style="color:#369">$year</span>;
<span style="color:#369">$prop3</span> = <span style="color:#369">$month</span>;
}
<span style="color:#080;font-weight:bold">echo</span> <span style="color:#d20;background-color:#fff0f0">"s.pageName = '</span><span style="color:#33b;background-color:#fff0f0">$pageName</span><span style="color:#d20;background-color:#fff0f0">' //page name</span><span style="color:#04d;background-color:#fff0f0">\n</span><span style="color:#d20;background-color:#fff0f0">"</span>;
<span style="color:#080;font-weight:bold">echo</span> <span style="color:#d20;background-color:#fff0f0">"s.channel = '</span><span style="color:#33b;background-color:#fff0f0">$channel</span><span style="color:#d20;background-color:#fff0f0">' //channel</span><span style="color:#04d;background-color:#fff0f0">\n</span><span style="color:#d20;background-color:#fff0f0">"</span>;
<span style="color:#080;font-weight:bold">echo</span> <span style="color:#d20;background-color:#fff0f0">"s.pageType = '</span><span style="color:#33b;background-color:#fff0f0">$pageType</span><span style="color:#d20;background-color:#fff0f0">' //page type</span><span style="color:#04d;background-color:#fff0f0">\n</span><span style="color:#d20;background-color:#fff0f0">"</span>;
<span style="color:#080;font-weight:bold">echo</span> <span style="color:#d20;background-color:#fff0f0">"s.prop1 = '</span><span style="color:#33b;background-color:#fff0f0">$prop1</span><span style="color:#d20;background-color:#fff0f0">' //traffic variable 1</span><span style="color:#04d;background-color:#fff0f0">\n</span><span style="color:#d20;background-color:#fff0f0">"</span>;
<span style="color:#080;font-weight:bold">echo</span> <span style="color:#d20;background-color:#fff0f0">"s.prop2 = '</span><span style="color:#33b;background-color:#fff0f0">$prop2</span><span style="color:#d20;background-color:#fff0f0">' //traffic variable 2</span><span style="color:#04d;background-color:#fff0f0">\n</span><span style="color:#d20;background-color:#fff0f0">"</span>;
<span style="color:#080;font-weight:bold">echo</span> <span style="color:#d20;background-color:#fff0f0">"s.prop3 = '</span><span style="color:#33b;background-color:#fff0f0">$prop3</span><span style="color:#d20;background-color:#fff0f0">' //traffic variable 3</span><span style="color:#04d;background-color:#fff0f0">\n</span><span style="color:#d20;background-color:#fff0f0">"</span>;
<span style="color:#c00;font-weight:bold">?></span><span style="color:#a61717;background-color:#e3d2d2">
</span><span style="color:#a61717;background-color:#e3d2d2">s.prop4 = '' //traffic variable 4
</span><span style="color:#a61717;background-color:#e3d2d2">
</span><span style="color:#a61717;background-color:#e3d2d2">...
</span></code></pre></div><p>The plugin lets you switch freely between WordPress themes without having to manage the SiteCatalyst code in each theme, and it tracks the basic WordPress page hierarchy. Here are example outputs of the SiteCatalyst variables broken down by page type:</p>
<h3 id="homepage">Homepage</h3>
<div class="highlight"><pre tabindex="0" style="background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-php" data-lang="php">s.pageName = <span style="color:#d20;background-color:#fff0f0">'Blog Home'</span> <span style="color:#888">//page name
</span><span style="color:#888"></span>s.channel = <span style="color:#d20;background-color:#fff0f0">'Blog Home'</span> <span style="color:#888">//channel
</span><span style="color:#888"></span>s.pageType = <span style="color:#d20;background-color:#fff0f0">'Blog Home'</span> <span style="color:#888">//page type
</span><span style="color:#888"></span>s.prop1 = <span style="color:#d20;background-color:#fff0f0">'Blog Home'</span> <span style="color:#888">//traffic variable 1
</span><span style="color:#888"></span>s.prop2 = <span style="color:#d20;background-color:#fff0f0">''</span> <span style="color:#888">//traffic variable 2
</span><span style="color:#888"></span>s.prop3 = <span style="color:#d20;background-color:#fff0f0">''</span> <span style="color:#888">//traffic variable 3
</span></code></pre></div><h3 id="tag-page">Tag Page</h3>
<div class="highlight"><pre tabindex="0" style="background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-php" data-lang="php">s.pageName = <span style="color:#d20;background-color:#fff0f0">'chocolate'</span> <span style="color:#888">//page name
</span><span style="color:#888"></span>s.channel = <span style="color:#d20;background-color:#fff0f0">'chocolate'</span> <span style="color:#888">//channel
</span><span style="color:#888"></span>s.pageType = <span style="color:#d20;background-color:#fff0f0">'Tag'</span> <span style="color:#888">//page type
</span><span style="color:#888"></span>s.prop1 = <span style="color:#d20;background-color:#fff0f0">'Tag'</span> <span style="color:#888">//traffic variable 1
</span><span style="color:#888"></span>s.prop2 = <span style="color:#d20;background-color:#fff0f0">''</span> <span style="color:#888">//traffic variable 2
</span><span style="color:#888"></span>s.prop3 = <span style="color:#d20;background-color:#fff0f0">''</span> <span style="color:#888">//traffic variable 3
</span></code></pre></div><h3 id="category-page">Category Page</h3>
<div class="highlight"><pre tabindex="0" style="background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-php" data-lang="php">s.pageName = <span style="color:#d20;background-color:#fff0f0">'Category: Food'</span> <span style="color:#888">//page name
</span><span style="color:#888"></span>s.channel = <span style="color:#d20;background-color:#fff0f0">'Food'</span> <span style="color:#888">//channel
</span><span style="color:#888"></span>s.pageType = <span style="color:#d20;background-color:#fff0f0">'Category'</span> <span style="color:#888">//page type
</span><span style="color:#888"></span>s.prop1 = <span style="color:#d20;background-color:#fff0f0">'Category'</span> <span style="color:#888">//traffic variable 1
</span><span style="color:#888"></span>s.prop2 = <span style="color:#d20;background-color:#fff0f0">''</span> <span style="color:#888">//traffic variable 2
</span><span style="color:#888"></span>s.prop3 = <span style="color:#d20;background-color:#fff0f0">''</span> <span style="color:#888">//traffic variable 3
</span></code></pre></div><h3 id="static-page">Static Page</h3>
<div class="highlight"><pre tabindex="0" style="background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-php" data-lang="php">s.pageName = <span style="color:#d20;background-color:#fff0f0">'About'</span> <span style="color:#888">//page name
</span><span style="color:#888"></span>s.channel = <span style="color:#d20;background-color:#fff0f0">'About'</span> <span style="color:#888">//channel
</span><span style="color:#888"></span>s.pageType = <span style="color:#d20;background-color:#fff0f0">'Static Page'</span> <span style="color:#888">//page type
</span><span style="color:#888"></span>s.prop1 = <span style="color:#d20;background-color:#fff0f0">'Static Page'</span> <span style="color:#888">//traffic variable 1
</span><span style="color:#888"></span>s.prop2 = <span style="color:#d20;background-color:#fff0f0">''</span> <span style="color:#888">//traffic variable 2
</span><span style="color:#888"></span>s.prop3 = <span style="color:#d20;background-color:#fff0f0">''</span> <span style="color:#888">//traffic variable 3
</span></code></pre></div><h3 id="archive">Archive</h3>
<div class="highlight"><pre tabindex="0" style="background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-php" data-lang="php">s.pageName = <span style="color:#d20;background-color:#fff0f0">'Month Archive: November 2009'</span> <span style="color:#888">//page name
</span><span style="color:#888"></span>s.channel = <span style="color:#d20;background-color:#fff0f0">'Month Archive'</span> <span style="color:#888">//channel
</span><span style="color:#888"></span>s.pageType = <span style="color:#d20;background-color:#fff0f0">'Month Archive'</span> <span style="color:#888">//page type
</span><span style="color:#888"></span>s.prop1 = <span style="color:#d20;background-color:#fff0f0">'Month Archive'</span> <span style="color:#888">//traffic variable 1
</span><span style="color:#888"></span>s.prop2 = <span style="color:#d20;background-color:#fff0f0">'2009'</span> <span style="color:#888">//traffic variable 2
</span><span style="color:#888"></span>s.prop3 = <span style="color:#d20;background-color:#fff0f0">'November'</span> <span style="color:#888">//traffic variable 3
</span></code></pre></div><h3 id="article">Article</h3>
<div class="highlight"><pre tabindex="0" style="background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-php" data-lang="php">s.pageName = <span style="color:#d20;background-color:#fff0f0">'Hello world!'</span> <span style="color:#888">//page name
</span><span style="color:#888"></span>s.channel = <span style="color:#d20;background-color:#fff0f0">'Test Category'</span> <span style="color:#888">//channel
</span><span style="color:#888"></span>s.pageType = <span style="color:#d20;background-color:#fff0f0">'Article'</span> <span style="color:#888">//page type
</span><span style="color:#888"></span>s.prop1 = <span style="color:#d20;background-color:#fff0f0">'Article'</span> <span style="color:#888">//traffic variable 1
</span><span style="color:#888"></span>s.prop2 = <span style="color:#d20;background-color:#fff0f0">'Hello world!'</span> <span style="color:#888">//traffic variable 2
</span><span style="color:#888"></span>s.prop3 = <span style="color:#d20;background-color:#fff0f0">''</span> <span style="color:#888">//traffic variable 3
</span></code></pre></div><p>A followup step for this plugin would be to use the wp_options table in WordPress to manage the Omniture account ID, which would allow an admin to set it through the WordPress admin without editing the plugin code. I’ve uploaded the plugin to a GitHub repository <a href="https://github.com/stephskardal/wordpress-sitecatalyst/">here</a>.</p>
<p><em>Update: This plugin is included in the WordPress plugin registry and can be found at <a href="https://wordpress.org/extend/plugins/omniture-sitecatalyst-tracking/">https://wordpress.org/extend/plugins/omniture-sitecatalyst-tracking/</a>.</em></p>
PubCon Vegas: 7 Takeaway Nuggetshttps://www.endpointdev.com/blog/2009/11/pubcon-vegas-7-takeaway-nuggets/2009-11-16T00:00:00+00:00Steph Skardal
<p>I’m back at work after last week’s <a href="https://www.pubcon.com/">PubCon Vegas</a>. I published several articles about specific sessions, but I wanted to provide some nuggets on recurring themes of the conference.</p>
<p><a href="/blog/2009/11/pubcon-vegas-7-takeaway-nuggets/image-0-big.jpeg" onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}"><img alt="" border="0" id="BLOGGER_PHOTO_ID_5404762489849624482" src="/blog/2009/11/pubcon-vegas-7-takeaway-nuggets/image-0.jpeg" style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 300px;"/></a></p>
<h4 id="google-caffeine-update">Google Caffeine Update</h4>
<p>This year Google rolled out some changes referred to as the <a href="https://mashable.com/2009/08/10/google-caffeine/">Google Caffeine</a> update. This change increases the speed and size of the index, moves Google search to real-time, and improves search result relevancy and accuracy. It was a popular topic at the conference; however, not much light was shed on how the algorithm changes would affect search results, if at all. I’ll have to keep an eye on this to see if there are any significant changes in End Point’s search performance.</p>
<h4 id="bing">Bing</h4>
<p>Bing is gaining traction. They want to get [at least] 51% of the search market share.</p>
<p><a href="/blog/2009/11/pubcon-vegas-7-takeaway-nuggets/image-1-big.jpeg" onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}"><img alt="" border="0" id="BLOGGER_PHOTO_ID_5404762491452538162" src="/blog/2009/11/pubcon-vegas-7-takeaway-nuggets/image-1.jpeg" style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 208px;"/></a></p>
<h4 id="social-media">Social media</h4>
<p>Social media was a hot topic at the conference. An entire track was allocated to Twitter topics on the first day of the conference. However, it still pales in comparison to search: of all referrals on the web, search still accounts for 98%, while social media referrals account for less than 1% (<a href="https://web.archive.org/web/20091129014325/http://chitika.com/research/2009/social-vs-search/">view referral data here</a>). Dr. Pete from SEOmoz nicely summarized the <a href="https://www.seomoz.org/blog/is-social-media-roi-unmeasurable">elephant in the room</a> at PubCon: it’s important to measure social media response to determine whether it provides business value.</p>
<h4 id="ecommerce-advice">Ecommerce Advice</h4>
<p>I asked <a href="https://www.robsnell.com/">Rob Snell</a>, author of <em>Starting a Yahoo Business for Dummies</em>, for the most important advice for ecommerce SEO he could provide. He explained the importance of content development and link building to target keywords based on keyword conversion. Basically, SEO efforts shouldn’t be wasted on keywords that don’t convert well. I typically don’t have access to client keyword conversion data, but this is great advice.</p>
<h4 id="internal-seo-processes">Internal SEO Processes</h4>
<p>Another recurring topic I observed at PubCon was that internal SEO processes are often a much bigger obstacle than the actual SEO work. It’s important to get the entire team on your side. Alex Bennert of the Wall Street Journal discussed understanding your audience when presenting SEO. Here are some examples of appropriate topics for a given audience:</p>
<ul>
<li>IT Folks: sitemaps, duplicate content (parameter issues, pagination, sorting, crawl allocation, dev servers), canonical link elements, 301 redirects, intuitive link structure</li>
<li>Biz Dev & Marketing Folks: syndication of content, evaluation of vendor products & integration, assessing SEO value and link equity of partner sites, microsites, leveraging multiple assets</li>
<li>Content Developers: on page elements best practices, linking, anchor text best practices, keyword research, keyword trends, analytics</li>
<li>Management: progress, timelines, roadmaps</li>
</ul>
<p>On the topic of internal processes, I was entertained by the various comments illustrating the developer-marketer relationship, for example:</p>
<ul>
<li>“Don’t ever let a developer control your URL structure.”</li>
<li>“Don’t ever let a developer control your site architecture.”</li>
<li>“This site looks like it was designed by a developer.”</li>
</ul>
<p>Apparently developers are the most obvious scapegoat. Back to the point, though: it often requires more effort to build SEO understanding and support than to explain what actually needs to be done.</p>
<h4 id="search-engine-spam">Search Engine Spam</h4>
<p>Search engine spam detection is cool. During a couple of sessions with <a href="https://www.mattcutts.com/">Matt Cutts</a>, I became interested in writing code to detect search spam. For example:</p>
<ul>
<li>Crawling the web to detect links where the anchor text is ‘.’.</li>
<li>Crawling the web to identify sites where robots.txt blocks ia_archiver.</li>
<li>Crawling the web to detect pages with keyword stuffing.</li>
</ul>
<p>I’ve typically been involved in the technical side of SEO (duplicate content, indexation, crawlability), and haven’t been involved in link building or content development, but these discussions provoked me to start looking at search spam from an engineer’s perspective.</p>
<h4 id="google-parameter-masking">Google Parameter Masking</h4>
<p>Apparently I missed the announcement of parameter masking in Google Webmaster Tools. I’ve helped battle duplicate content for several clients, and at PubCon I learned more about this functionality, which was announced in October 2009 and allows you to provide suggestions to the crawler to ignore specific query parameters.</p>
<p>Parameter masking is yet another solution for managing duplicate content, in addition to the rel=“canonical” tag, creative uses of robots.txt, and the nofollow attribute. The ideal solution for SEO would be to build a site architecture that doesn’t require any of these workarounds. However, as developers we have all experienced how legacy code persists, and sometimes a low-effort, high-return solution is the best short term option.</p>
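<p>As a quick reference, the rel=“canonical” approach mentioned above is a single link element in the duplicate page’s head pointing at the preferred version. This is only an illustrative sketch with a made-up URL, not a real site’s markup:</p>

```html
<!-- On a parameterized duplicate URL such as /products?sort=price -->
<!-- point crawlers at the preferred canonical version: -->
<link rel="canonical" href="http://www.example.com/products" />
```

<p>The robots.txt and nofollow options work at the crawl level instead, telling crawlers not to fetch or not to follow, whereas rel=“canonical” lets the duplicate be crawled but consolidates its ranking signals.</p>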
<p><a href="/blog/2009/11/pubcon-vegas-7-takeaway-nuggets/image-2-big.png" onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}"><img alt="" border="0" id="BLOGGER_PHOTO_ID_5404762496685302306" src="/blog/2009/11/pubcon-vegas-7-takeaway-nuggets/image-2.png" style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 277px;"/></a></p>
PubCon Vegas Day 3: User Generated Contenthttps://www.endpointdev.com/blog/2009/11/pubcon-vegas-day-3-user-generated/2009-11-13T00:00:00+00:00Steph Skardal
<p>On day 3 of <a href="https://www.pubcon.com/">PubCon Vegas</a>, a great session I attended was <a href="https://www.pubcon.com/session-details?action=view&conference=pubcon85&record=164">Optimizing Forums For Search & Dealing with User Generated Content</a> with Dustin Woodard, Lawrence Coburn, and Roger Dooley. User generated content is content generated by users in the form of message boards, customizable profiles, forums, reviews, wikis, blogs, article submission, question and answer, video media, or social networks.</p>
<p>Some good statistics were presented about why to tap into user generated content. Recently released Nielsen research showed that 1 out of every 11 minutes spent online is on a social network and that 2/3rds of customer “touch points” are user generated.</p>
<p>Dustin provided some interesting details about long tail traffic. He looked at HitWise’s data on the top 10,000 search terms over a 3 month period. The top 100 terms accounted for 5.7% of all traffic, the top 1,000 terms accounted for 10.6% of all traffic, and the entire 10,000-term data set accounted for just 18.5% of all traffic. With this data, the long tail would be analogous to a lizard with a one-inch head and a tail 221 miles long.</p>
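<p>To put the HitWise figures above another way, a quick calculation shows how much traffic falls outside even the top 10,000 terms (the numbers are the percentages quoted above):</p>

```javascript
// Shares of all search traffic from the HitWise sample (percent).
var top100 = 5.7;
var top1000 = 10.6;
var top10000 = 18.5;

// Everything beyond the top 10,000 terms is the long tail.
var longTailShare = 100 - top10000; // 81.5% of all search traffic
```

<p>In other words, more than four out of every five searches use a term outside the top 10,000, which is why tapping long tail user generated content can be so valuable.</p>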
<p><a href="https://2.bp.blogspot.com/_wWmWqyCEKEs/Sv22ffApe0I/AAAAAAAACrU/5ht8OdMVoac/s1600-h/Zebra-Tailed-Lizard.jpg" onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}"><img alt="" border="0" id="BLOGGER_PHOTO_ID_5403675779930880834" src="/blog/2009/11/pubcon-vegas-day-3-user-generated/image-0.jpeg" style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 320px; height: 214px;"/></a></p>
<p>Dustin gave the following steps for developing a user generated content community:</p>
<ol>
<li>Seed it with a few editors and really good initial content.</li>
<li>Give them a voice.</li>
<li>Make it easy to contribute.</li>
<li>Make it cool or trendy.</li>
<li>Provide ownership.</li>
<li>Create competition with contests, ranking or by highlighting expertise.</li>
<li>Build a sense of community or a sense of exclusivity.</li>
<li>Give the community a purpose.</li>
</ol>
<p>All SEO best practices apply to user generated content, but throughout the session I learned several tips specific to user generated content:</p>
<ul>
<li>Predefining keyword-rich categories, topics, and tags will go a long way toward optimization. The better the topic structure created up front, the better the user generated content can perform in the long run. Users are not inherently good at content organization, so content can easily be buried by poor information architecture.</li>
<li>Developing automated cross-linking between user generated content helps improve authority, build clusters of content, and enrich the internal link structure. Dustin had experience building a widget that automatically links to 5 pieces of user generated content and another that lets the user select several pieces of user generated content from a set of related content.</li>
<li>Examples of battling duplicate content include disallowing duplicate page titles and meta descriptions. Content that is moved, renamed or deleted should be managed well.</li>
<li>Finally, building a badge or widget to display user involvement helps increase external linking to your site, but this should be carefully managed to avoid appearing spammy. Widget best practices: widgets should be highly accessible, simple with light branding, and always have fresh content.</li>
<li>Developing your own tiny URL service helps keep external links to your user generated content intact and passing link value. Lawrence suggested “gently tweeting” the highest quality user generated content.</li>
</ul>
<p>Several of End Point’s clients are either in the middle of or considering building a community with user generated content. In ecommerce, blogs, forums, reviews, and Q&A are the most prevalent types of user generated content that I’ve encountered. Many of the things mentioned in this session were good tips to consider throughout the development of user generated content for ecommerce.</p>
PubCon Vegas Day 2: International and Mega Site SEO, and Tools for SEOhttps://www.endpointdev.com/blog/2009/11/pubcon-vegas-day-2-international-and/2009-11-12T00:00:00+00:00Steph Skardal
<p>On the second day of <a href="https://www.pubcon.com/">PubCon Vegas</a>, I attended several SEO track sessions including “SEO for Ecommerce”, “International and European Site Optimization”, “Mega Site SEO”, and “SEO/SEM Tools”. A mini-summary of several of the sessions is presented below.</p>
<p>Derrick Wheeler from Microsoft.com spoke on <a href="https://www.pubcon.com/session-details?action=view&conference=pubcon85&record=116">Mega Site SEO</a> about “taming the beast”. Microsoft has 1.2 billion URLs spread across thousands of web properties. For mega site SEO, Derrick highlighted:</p>
<ul>
<li>Content is NOT king. Structure is! Content is like the princess-in-waiting after structure has been mastered.</li>
<li>Developing an overall SEO approach and the organization needed to get structure, content, and authority work completed is more valuable than the actual SEO work itself. This was a common theme among many of the presentations at PubCon.</li>
<li>Getting metrics set up at the beginning of SEO work is a very important step to measure and justify progress.</li>
<li>Don’t be afraid to say no to low priority items.</li>
</ul>
<p>Most developers deal with a large amount of legacy code. Derrick discussed the primary issues encountered when working with legacy problems:</p>
<ul>
<li>Duplicate and undesirable pages. For Microsoft.com, managing and dealing with 1.2 billion pages results in a lot of duplicate and undesirable pages from the past.</li>
<li>Multiple redirects.</li>
<li>Improper error handling (error handling on 404s or 500s).</li>
<li>International URL structure can be a problem for international sites. Having an appropriate TLD (top level domain) is the best solution, but if that’s not possible, a process should be implemented to regulate the international URLs.</li>
<li>Low Quality Page Titles and Meta Tags. For large sites with hundreds of thousands of pages, it’s really important to have unique page titles and meta descriptions or to have a template that forces uniqueness.</li>
</ul>
<p>In summary, <strong>structure</strong> and <strong>internal processes</strong> are areas to focus on for Mega Site SEO. Legacy problems are something to be aware of when you have a site so large where changes won’t be implemented as quickly as small site changes.</p>
<p>In <a href="https://www.pubcon.com/session-details?action=view&conference=pubcon85&record=172">International and European Search Management</a>, Michael Bonfils, Nelson James, and Andy Atkins-Krueger discussed international SEO and SEM tactics. Takeaways include:</p>
<ul>
<li>In terms of international search marketing, it’s important to incorporate culture into search optimization and marketing. What works in one country may not work in another, so don’t offend a culture by failing to understand it. Some examples of content differences for targeting different cultures include emphasizing price points, focusing on product quality, and asserting authority or trust on a site.</li>
<li>It’s also important to understand how linguistics affects your keyword marketing. Automatic translation should not be used (all the speakers mentioned this). A good example of linguistics and search targeting is the use of the search term “soccer cleats”, or “football boots”. In England, the term “football boot” has a very small portion of the traffic share, but singular terms in other languages (“scarpe de calcio”, “botas de futbal”) have a much larger percentage of the search market share. Andy shared many other examples of how direct translation would not be the best keywords to target (“car insurance”, “healthcare”, “30% off”, “cheap flights”).</li>
<li>Local hosting is important for metrics, linking, and developing trust. Nelson James shared research showing that 80% of the top 10 results for the top 30 keywords in China had a ‘.cn’ top level domain, and the other top sites that were ‘.com’ sites were all hosted in China.</li>
<li>Other technical areas for international search that were mentioned are using the meta language tag, pinyin, charset, and language set. Duplicate content also will become a problem across sites of the same language.</li>
<li>It’s important to understand search market share. In Russia, Google has 35% of the search market and Yandex has 54%. In China, Baidu has 76% and Google has 22%. There are some reasons for these differences: Yandex was written to manage the large Russian vocabulary, which Google does not handle as well, and Baidu handles media search better than Google; search traffic in China is also much more entertainment driven, while in the US it is more business driven.</li>
</ul>
<p>In the last session of the day, about 100 tools were discussed in <a href="https://www.pubcon.com/session-details?action=view&conference=pubcon85&record=210">SEO/SEM Tools</a>. I’m planning on writing another blog post with a summary of these tools, but here’s a short list of the tools mentioned by multiple speakers:</p>
<ul>
<li>SEMRush</li>
<li>Google: Keyword Ad Tool, Webmaster Tools, Adplanner, SocialGraph API, Google Trends, Analytics, Google Insights</li>
<li>SpyFu: Kombat, Domain Ad History, Smart Search, Keyword Ad History</li>
<li>SEOBook</li>
<li>SEOmoz: Linkscape, Mozbar, Top Pages, etc.</li>
<li>MajesticSEO</li>
<li>Raven SEO Tools: Website Analytics, Campaign Reports</li>
</ul>
<p>Stay tuned for a day 3 and wrap up article!</p>
PubCon Vegas Day 1: Keyword Research Sessionhttps://www.endpointdev.com/blog/2009/11/pubcon-vegas-keyword-research-session/2009-11-11T00:00:00+00:00Steph Skardal
<p>On the first day of <a href="https://www.pubcon.com/">PubCon Vegas</a>, I was bombarded by information, sessions, and people. PubCon is an SEO/SEM conference with a variety of sessions categorized into SEO (Search Engine Optimization), SEM (Search Marketing), Social Media, and Affiliates. My primary interest is in SEO, which is why I attended the SEO track yesterday, with sessions about in-house SEO, organic keyword research and selection, and hot topics in SEO.</p>
<p>Because my specific involvement in SEO has focused on technical SEO, I was surprised that my highlight of day one was <a href="https://www.pubcon.com/session-details?action=view&conference=pubcon85&record=180">“Smart Organic Keyword Research and Selection”</a> which included speakers Wil Reynolds, Craig Paddock, Carolyn Shelby, and Mark Jackson.</p>
<p>With good organization and humor, Carolyn first presented the “ABCs of Organic Keyword Research and Selection”: A is for analytics and knowing your audience. B is for brainstorm and bonus. And C is for <strong>Cookie!</strong>, crunch the numbers, cull the lists, and create a final list of keywords.</p>
<p>On the analytics side, Carolyn mentioned good sources of analytics include web server logs (<a href="/blog/2009/03/end-point-search-engine-bot-parsing/">read my article on the value of log or bot parsing</a>), Google Analytics “traffic generating” keyword list, and logs from internal site search.</p>
<p>In regards to knowing your audience, Carolyn shared her personal experience with focus group research: for a project that targeted teenage girls, she invited her daughter and several of her daughter’s friends to join her around the table with laptops. She showed them a picture and asked them to search for that image. She recorded the search terms used and used this information to help understand her target audience’s behavior.</p>
<p>On the brainstorm side, she likes to involve core web team members, product managers, marketing, developers, designers, promoters, marketers, and front liners (customer service representatives, tech support). B was also for bonus, which was to get input from the “suits” of a company to get a list of ideal keywords to understand how they measure keyword success.</p>
<p>Craig Paddock spoke on “Organic Keyword Research and Selection” next. He touched on some of the following SEO keyphrase concepts:</p>
<ul>
<li><strong>keyphrase research</strong>: Keyword research is based on keyword popularity, click through rate, quality (measured by conversion and engagement), keyword competitiveness, and current ranking.</li>
<li><strong>keyphrase expanders and variations</strong>: Broad keyword phrases should include variations of keywords that include words like ‘best’, ‘online’, ‘buy’, ‘cheap’, ‘discount’, ‘wholesale’, ‘accessories’, ‘supplies’, ‘reviews’, and abbreviations of words like states. For End Point’s ecommerce clients, targeting keyphrases with customer reviews is a great way to generate traffic from user generated content.</li>
<li><strong>keyphrase discovery</strong>: It shouldn’t be assumed that clients know the industry. Craig shared an example in which his boxing retailer client made the mistake of targeting specific boxing terms that had low traffic; they expanded to include more popular terms like “lose weight” and “burn calories”. Another tactic for keyphrase discovery is to ask what kinds of problems the website’s services solve and choose keywords that target those questions and answers.</li>
<li><strong>keyphrase quality</strong>: Keyphrase quality is typically measured by conversion rate (revenue / visitor) or engagement. Engagement is measured by the time on site, pages/visit, and bounce rate, which are commonly included in analytics packages.</li>
<li><strong>keyphrase selection</strong>: Using exact match and broad match on keywords is helpful, and customers should be allowed to guide keyword selection. Craig mentioned that data shows a higher conversion rate on more specific keyphrases, which isn’t surprising.</li>
<li><strong>keyphrase targeting</strong>: Keyphrase targeting should match competitiveness with link popularity. For example, more competitive words should be higher up in the site hierarchy, such as on the home page. For End Point, this would involve targeting competitive phrases like “ecommerce”, “ruby on rails development”, and “web application development” on our homepage and less competitive phrases such as “interchange development” or “ruby on rails ecommerce” on pages lower in the hierarchy.</li>
<li><strong>keyphrase analysis</strong>: One area of interest was how analytics tools attribute “credit” to keyphrases. In Google Analytics, if a customer searches “interchange consulting” and visits endpoint.com, then a week later searches “end point” and converts, the credit is attributed to the “end point” keyword rather than “interchange consulting”. This is important in ecommerce because this attribution doesn’t accurately credit targeted keywords for revenue. Craig did mention that other tools (including Omniture) provide the ability to select last-click versus first-click attribution to fix this problem. Another solution mentioned was to set a user defined variable in Google Analytics equal to a cookie that holds the first-click search term (“interchange consulting” in the example above) and set the cookie to never expire.</li>
</ul>
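<p>The first-click cookie idea from the keyphrase analysis bullet above could be sketched like this. This is a hedged illustration only: the function and parameter names are my own, not part of Google Analytics, and the real implementation would also write the cookie and pass the keyword to the analytics user-defined variable.</p>

```javascript
// Extract the search query (the "q" parameter) from a Google-style referrer URL.
function extractKeyword(referrer) {
  var match = /[?&]q=([^&]*)/.exec(referrer || '');
  return match ? decodeURIComponent(match[1].replace(/\+/g, ' ')) : null;
}

// Decide which keyword deserves conversion credit: a keyword already stored
// in a non-expiring cookie wins (first-click attribution); otherwise use the
// keyword from the current referrer, which would then be stored.
function firstClickKeyword(storedKeyword, referrer) {
  return storedKeyword || extractKeyword(referrer);
}
```

<p>With this approach, the visitor who found the site via “interchange consulting” keeps that credit even if they later return via a branded “end point” search.</p>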
<p>Wil Reynolds spoke next on “Keyword Analysis AFTER the rankings”. He touched on an important concept: SEO (specifically keyphrase research and targeting) is never <strong>done</strong>, because keywords are constantly evolving: people change the way they search, blended search (video, image) is on the rise, and social and economic influences affect keyword popularity. Some good examples of keyphrase trends include:</p>
<ul>
<li>“Shopping” was a good keyword in 1999 because ecommerce was growing on the web and users didn’t know what to search for.</li>
<li>“Handheld device” transitioned to “Smartphone”</li>
<li>“Eco-Friendly” has grown while “Environmentally Friendly” has declined — <a href="http://www.google.com/insights/search/#q=eco%20friendly%2Cenvironmentally%20friendly">view this trend here</a></li>
<li>“Netbooks” and “Ultraportables” are popular search terms on the rise that were non-existent two years ago — <a href="http://www.google.com/insights/search/#q=netbooks">view netbook trends here</a></li>
<li>Brands in the gear industry evolve at a much faster pace than the plumbing or wood floor industry</li>
</ul>
<p>Wil’s examples and advice apply directly to our clients, who should be aware of social and economic influences that may require them to change their keyphrase targeting over time.</p>
<p>Finally, Mark Jackson spoke on focusing your keywords for better results. He discussed the importance of analyzing the keyword competitiveness to determine which keywords to target to get the most value out of keyword SEO work.</p>
<p>In summary, I <strong>still</strong> don’t love keyword and keyphrase research and selection :), but the speakers presented a great overview of keyword research and selection with a good mixture of personal experience, expertise, and examples. Some great concepts to keep in mind regarding keyword research are:</p>
<ul>
<li>There are always missed opportunities in keyword targeting.</li>
<li>There are lots of tools! Tools are good for measuring keyphrase competitiveness, user engagement, and identifying missed opportunities.</li>
<li>SEO keyphrase research and selection is an ongoing process.</li>
</ul>
<p>Now, back to day 2 activities…</p>
Performance optimization of icdevgroup.orghttps://www.endpointdev.com/blog/2009/10/performance-optimization-of/2009-10-23T00:00:00+00:00Jon Jensen
<p>Some years ago Davor Ocelić redesigned <a href="http://www.icdevgroup.org/">icdevgroup.org</a>, Interchange’s home on the web. Since then, most of the attention paid to it has been on content such as news, documentation, release information, and so on. We haven’t looked much at implementation or optimization details. Recently I decided to do just that.</p>
<p><strong>Interchange optimizations</strong></p>
<p>There is currently no separate logged-in user area of icdevgroup.org, so Interchange is primarily used here as a templating system and database interface. The automatic read/write of a server-side user session is thus unneeded overhead, as is periodic culling of the old sessions. So I turned off permanent sessions by making all visitors appear to be search bots. Adding to interchange.cfg:</p>
<div class="highlight"><pre tabindex="0" style="background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-plain" data-lang="plain">RobotUA *
</code></pre></div><p>That would not work for most Interchange sites, which need a server-side session for storing mv_click action code, scratch variables, logged-in state, shopping cart, etc. But for a read-only content site, it works well.</p>
<p>By default, Interchange writes user page requests to a special tracking log as part of its UserTrack facility. It also outputs an X-Track HTTP response header with some information about the visit which can be used by a (to my knowledge) long defunct analytics package. Since we don’t need either of those features, we can save a tiny bit of overhead. Adding to catalog.cfg:</p>
<div class="highlight"><pre tabindex="0" style="background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-plain" data-lang="plain">UserTrack No
</code></pre></div><p>Very few Interchange sites have any need for UserTrack anymore, so this is commonly a safe optimization to make.</p>
<p><strong>HTTP optimizations</strong></p>
<p>Today I ran the excellent webpagetest.org test, and this was the <a href="http://www.webpagetest.org/result/091023_2M8V/">icdevgroup.org test result</a>. Even though icdevgroup.org is a fairly simple site without much bloat, two obvious areas for improvement stood out.</p>
<p>First, gzip/deflate compression of textual content should be enabled. That cuts down on bandwidth used and page delivery time by a significant amount, and with modern CPUs adds no appreciable extra CPU load on either the client or the server.</p>
<p>We’re hosting icdevgroup.org on Debian GNU/Linux with Apache 2.2, which has a reasonable default configuration of mod_deflate that does this, so it’s easy to enable:</p>
<div class="highlight"><pre tabindex="0" style="background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-plain" data-lang="plain">a2enmod deflate
</code></pre></div><p>That sets up symbolic links in /etc/apache2/mods-enabled for deflate.load and deflate.conf to enable mod_deflate. (Use a2dismod to remove them if needed.)</p>
<p>I added two content types for CSS & JavaScript to the default in deflate.conf:</p>
<div class="highlight"><pre tabindex="0" style="background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-plain" data-lang="plain">AddOutputFilterByType DEFLATE text/html text/plain text/xml text/css application/x-javascript
</code></pre></div><p>That used to be riskier when very old browsers such as Netscape 3 and 4 claimed to support compressed CSS & JavaScript but actually didn’t. But those browsers are long gone.</p>
<p>The next easy optimization is to enable proxy and browser caching of static content: images, CSS, and JavaScript files. By doing this we eliminate all HTTP requests for these files; the browser won’t even check with the server to see if it has the current version of these files once it has loaded them into its cache, making subsequent use of those files blazingly fast.</p>
<p>There is, of course, a tradeoff to this. Once the browser has the file cached, you can’t make it fetch a newer version unless you change the filename. So we’ll set a cache lifetime of only one hour. That’s long enough to easily cover most users' browsing sessions at a site like this, but short enough that if we need to publish a new version of one of these files, it will still propagate fairly quickly.</p>
<p>So I added to the Apache configuration file for this virtual host:</p>
<div class="highlight"><pre tabindex="0" style="background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-plain" data-lang="plain">ExpiresActive On
ExpiresByType image/gif "access plus 1 hour"
ExpiresByType image/jpeg "access plus 1 hour"
ExpiresByType image/png "access plus 1 hour"
ExpiresByType text/css "access plus 1 hour"
ExpiresByType application/x-javascript "access plus 1 hour"
FileETag None
Header unset ETag
</code></pre></div><p>This adds the HTTP response header “Cache-Control: max-age=3600” for those static files. I also have Apache remove the ETag header which is not needed given this caching and the Last-modified header.</p>
<p>There are cases where the above configuration would be too broad, for example, if you have:</p>
<ul>
<li>images that differ with the same filename, such as CAPTCHAs</li>
<li>static files that vary based on logged-in state</li>
<li>dynamically-generated CSS or JavaScript files with the same name</li>
</ul>
<p>If the website is completely static, including the HTML, or identical for all users at the same time even though dynamically generated, we could also enable caching the HTML pages themselves. But in the case of icdevgroup.org, that would probably cause trouble with the Gitweb repository browser, live documentation searches, etc.</p>
<p>After those changes, we can see the <a href="http://www.webpagetest.org/result/091023_2M91/">results of a new webpagetest.org run</a> and see that we reduced both the bytes transferred and the delivery time. It’s especially dramatic to see how much faster subsequent page views of the Hall of Fame are, since it has many screenshot thumbnail images.</p>
<p>Optimizing a simple non-commerce site such as icdevgroup.org is easy and even fun. With caution and practicing on a non-production system, complex ecommerce sites can be optimized using the same techniques, with even more dramatic benefits.</p>
SEO: External Links and PageRankhttps://www.endpointdev.com/blog/2009/09/seo-external-links-and-pagerank/2009-09-17T00:00:00+00:00Steph Skardal
<p>I had a flash of inspiration to write an article about external links in the world of search engine optimization. I’ve created many SEO reports for End Point’s clients with an emphasis on technical aspects of search engine optimization. However, at the end of each SEO report, I always like to point out that search engine performance depends on having high quality, fresh, relevant content and on popularity (for example, PageRank). The number of external links to a site is a large factor in its popularity, and so can positively influence search engine performance.</p>
<p>After wrapping up a report yesterday, I wondered if the external link data that I provide to our clients is meaningful to them. What is the average response when I report, “You should get high quality external links from many diverse domains”?</p>
<p>So, I investigated some data of well known and less well known sites to display a spectrum of external link and PageRank data. Here is the origin of some of the less well known domains referenced in the data below:</p>
<ul>
<li>www.petfinder.com: This is where my dogs came from.</li>
<li>www.endpoint.com: That’s Us!</li>
<li>www.sonypictures.com/movies/district9/: The site for the movie District 9 — I saw it last weekend.</li>
<li>marketstreetgrill.com: Market Street Grill is a great seafood restaurant in Salt Lake City.</li>
<li>divascupcakes.com: This is a great gourmet cupcake place in Salt Lake City.</li>
<li>rediguana.com: A great Mexican food restaurant in Salt Lake City.</li>
</ul>
<p>And here is the data:</p>
<img border="0" src="/blog/2009/09/seo-external-links-and-pagerank/image-0.png" style="display:block; margin:0px auto 10px; text-align:center; width: 400px; height: 121px;"/>
<p>I retrieved the PageRank from a generic PageRank tool. <a href="http://www.moz.org/">SEOmoz</a> was used to collect external link counts and external linking subdomain counts, and Yahoo Site Explorer was used to retrieve external link counts to the domain in question. I chose to examine external link counts from both SEOmoz and Yahoo Site Explorer to get a better representation of the data. SEOmoz compiles its data about once a month and does not have as many URLs indexed as Yahoo, which explains why its numbers may lag behind the Yahoo Site Explorer external link counts.</p>
<p>Out of curiosity, I went on to plot the PageRank data vs. the log (base 10) of the other data.</p>
<p>PageRank vs. Log of SEOmoz external link count:</p>
<img border="0" src="/blog/2009/09/seo-external-links-and-pagerank/image-1.png" style="display:block; margin:0px auto 10px; text-align:center; width: 400px; height: 275px;"/>
<p>PageRank vs. Log of SEOmoz external linking subdomain count:</p>
<img border="0" src="/blog/2009/09/seo-external-links-and-pagerank/image-2.png" style="display:block; margin:0px auto 10px; text-align:center; width: 400px; height: 272px;"/>
<p>PageRank vs. Log of Yahoo SiteExplorer external link count:</p>
<img border="0" src="/blog/2009/09/seo-external-links-and-pagerank/image-3.png" style="display:block; margin:0px auto 10px; text-align:center; width: 400px; height: 275px;"/>
<p>PageRank is described as a theoretical probability value on a logarithmic scale, based on inbound links, the PageRank of those inbound links, and other factors such as Google visit data, search click-through rates, etc. The true popularity rank is a rank between 1 and X, where X equals the total number of webpages crawled by the search engine. After pages are individually ranked between 1 and X, they are scaled logarithmically between 0 and 10.</p>
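To make the logarithmic scaling concrete, here is an illustrative sketch (a toy formula of my own, not Google’s actual algorithm) of mapping a raw popularity rank between 1 and X onto a 0–10 scale:

```ruby
# Illustrative only: maps a raw popularity rank (1 = most popular)
# among total_pages crawled pages onto a 0-10 logarithmic scale.
# This sketches the idea; it is not Google's actual formula.
def toolbar_pagerank(rank, total_pages)
  # The most popular page (rank 1) scores 10; the least popular scores 0.
  (10 * (1 - Math.log(rank) / Math.log(total_pages))).round
end

toolbar_pagerank(1, 1_000_000_000)              # most popular page => 10
toolbar_pagerank(1_000_000_000, 1_000_000_000)  # least popular page => 0
```

Note how a site must improve its raw rank by roughly an order of magnitude to move up one point on this kind of scale, which is why the link-count thresholds below grow so quickly.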
<p>The takeaway from this data is that when an “SEO report” advises you to “get more external links”, it means:</p>
<ul>
<li>If your site has a PageRank of < 4, getting external links on the scale of hundreds may impact your existing PageRank or popularity</li>
<li>If your site has a PageRank of >= 4 and < 6, getting external links on the scale of thousands may impact your existing PageRank or popularity</li>
<li>If your site has a PageRank of >= 6 and < 8, getting external links on the scale of tens to hundreds of thousands may impact your existing PageRank or popularity</li>
<li>If your site has a PageRank of >= 8, you probably are already doing something right…</li>
</ul>
<p>Furthermore, even if a site improves external link counts, other factors will play into the PageRank algorithm. Additionally, keyword relevance and popularity play key roles in search engine results.</p>
Site Search on Railshttps://www.endpointdev.com/blog/2009/08/site-search-on-rails/2009-08-14T00:00:00+00:00Steph Skardal
<p>I was recently tasked with implementing site search using a commercially available site search application for one of our clients (<a href="https://www.gear.com/">Gear.com</a>). The basic implementation requires that a SOAP request be made and the returned XML data be parsed for display. The SOAP request contains basic search information, plus additional parameters such as pagination and sort order. During the implementation in a Rails application, I applied a few unique solutions worthy of a blog article. :)</p>
<p>The first requirement I tackled was to design the web application in a way that produced search engine friendly canonical URLs. I used Rails routing to implement a basic search:</p>
<div class="highlight"><pre tabindex="0" style="background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-ruby" data-lang="ruby">map.connect <span style="color:#d20;background-color:#fff0f0">':id'</span>, <span style="color:#a60;background-color:#fff0f0">:controller</span> => <span style="color:#d20;background-color:#fff0f0">'basic'</span>, <span style="color:#a60;background-color:#fff0f0">:action</span> => <span style="color:#d20;background-color:#fff0f0">'search'</span>
</code></pre></div><p>Any simple search path would be sent to the basic search query that performed the SOAP request followed by XML data parsing. For example,
<a href="https://www.gear.com/s/climb">https://www.gear.com/s/climb</a> is a search for “climb” and
<a href="https://www.gear.com/s/bike">https://www.gear.com/s/bike</a> for “bike”.</p>
<p>After the initial search, a user can refine the search by brand, merchant, category, or price, or choose to sort the items, select a different page, or modify the number of items per page. I chose to force the order of refinement; for example, brand and merchant order were constrained with the following Rails routes:</p>
<div class="highlight"><pre tabindex="0" style="background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-ruby" data-lang="ruby">map.connect <span style="color:#d20;background-color:#fff0f0">':id/brand/:rbrand'</span>, <span style="color:#a60;background-color:#fff0f0">:controller</span> => <span style="color:#d20;background-color:#fff0f0">'basic'</span>, <span style="color:#a60;background-color:#fff0f0">:action</span> => <span style="color:#d20;background-color:#fff0f0">'search'</span>
map.connect <span style="color:#d20;background-color:#fff0f0">':id/merch/:rmerch'</span>, <span style="color:#a60;background-color:#fff0f0">:controller</span> => <span style="color:#d20;background-color:#fff0f0">'basic'</span>, <span style="color:#a60;background-color:#fff0f0">:action</span> => <span style="color:#d20;background-color:#fff0f0">'search'</span>
map.connect <span style="color:#d20;background-color:#fff0f0">':id/brand/:rbrand/merch/:rmerch'</span>, <span style="color:#a60;background-color:#fff0f0">:controller</span> => <span style="color:#d20;background-color:#fff0f0">'basic'</span>, <span style="color:#a60;background-color:#fff0f0">:action</span> => <span style="color:#d20;background-color:#fff0f0">'search'</span>
</code></pre></div><p>Rather than allowing the refinement parameters to appear in any order in the URL, such as
<code>http://www.gear.com/s/climb/brand/Arcteryx/merch/Altrec</code> versus <code>http://www.gear.com/s/climb/merch/Altrec/brand/Arcteryx</code>, the order of search refinement is always limited to the Rails routes specified above, so only the former URL is valid in this example.</p>
<p>For example, <code>http://www.gear.com/s/climb/brand/Arcteryx/merch/Altrec</code> is a valid URL for Arcteryx Altrec climb, <code>http://www.gear.com/s/climb/brand/Arcteryx</code> for Arcteryx climb, and <code>http://www.gear.com/s/climb/merch/Altrec</code> for Altrec climb.</p>
<p>All URLs on any given search result page are built with a single Ruby method that enforces the refinement and parameter order. The method takes the existing refinement values, the new refinement key, and the new refinement value, and builds a URL containing all previously existing refinement values plus the new one. Rather than generating millions of URLs with the various refinement combinations of brand, merchant, category, price, items per page, page number, and sort method, this logic minimizes duplicate content. The use of Rails routes and the chosen URL structure also creates search engine friendly URLs that can be targeted for traffic. Below is example pseudocode for the URL-building method:</p>
<div class="highlight"><pre tabindex="0" style="background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-ruby" data-lang="ruby"><span style="color:#080;font-weight:bold">def</span> <span style="color:#06b;font-weight:bold">build_url</span>(parameters, new_key, new_value)
<span style="color:#888"># set url to basic search information</span>
<span style="color:#888"># append brand info to url if parameters[:brand] exists or if new_key is brand</span>
<span style="color:#888"># append merchant info to url if parameters[:merchant] exists or if new_key is merchant</span>
<span style="color:#888"># append category info to url if parameters[:cat] exists or if new_key is cat</span>
<span style="color:#888"># ...</span>
<span style="color:#080;font-weight:bold">end</span>
</code></pre></div><p>The next requirement I encountered was breadcrumb functionality. Breadcrumbs are an important usability feature that lets users navigate backwards through their search and refinement history. Because of the canonical URL solution described above, the URL could not be used to indicate the search refinement history. For example, <code>http://www.gear.com/s/climb/brand/Arcteryx/merch/Altrec</code> does not indicate whether the user refined by brand then merchant, or by merchant then brand. Having implemented similar breadcrumb functionality for other End Point clients, I investigated a few solutions: appending a ‘#’ (hash) fragment to the URL with details of the user refinement path, using JavaScript to set a cookie containing the refinement path whenever a link is clicked, and using a session variable to track the refinement path. In the end, I found it easiest to use a single session variable, which contained all the information needed to display the breadcrumb with a bit of parsing.</p>
<p>For example, for the URL mentioned above, the session variable ‘brand-Arcteryx:merch-Altrec’ would yield the breadcrumb
<code>Your search: climb > Arcteryx > Altrec</code>,
while the session variable ‘merch-Altrec:brand-Arcteryx’ would yield
<code>Your search: climb > Altrec > Arcteryx</code>. I could have used more than one session variable, but this solution worked out to be simple and required fewer than 10 lines of code.</p>
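A minimal sketch of that parsing might look like the following (the method name and session format shown are illustrative, not the actual Gear.com code):

```ruby
# Hypothetical sketch of the breadcrumb parsing described above.
# The session variable stores the refinement path as "key-Value:key-Value",
# in the order the user applied the refinements.
def breadcrumb(search_term, refinement_path)
  # Split the path into pairs, keeping only the display value of each pair.
  crumbs = refinement_path.to_s.split(':').map { |pair| pair.split('-', 2).last }
  (["Your search: #{search_term}"] + crumbs).join(' > ')
end

breadcrumb('climb', 'brand-Arcteryx:merch-Altrec')
# => "Your search: climb > Arcteryx > Altrec"
breadcrumb('climb', 'merch-Altrec:brand-Arcteryx')
# => "Your search: climb > Altrec > Arcteryx"
```

Because only the session variable records order, the two navigation paths above share one canonical URL while still showing distinct breadcrumbs.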
<p>Another interesting necessity was determining the best way to parse the XML data. I researched several XML parsers including XmlSimple, Hpricot, REXML, and libxml. About a year ago, John Nunemaker reported on some benchmark testing of several of these packages (<a href="http://www.railstips.org/blog/archives/2008/08/11/parsing-xml-with-ruby/">Parsing XML with Ruby</a>). After some investigative work, I chose Hpricot because it made it very easy to write complex selectors reminiscent of jQuery selectors (which are also easy to use). One interesting thing I noticed throughout the implementation was that parsing the refinements took much more time than parsing and formatting the actual products. For Gear.com, the number of products returned ranges from 20–60, and products are quickly parsed. The number of refinements returned ranges from very small for a distinct search like <a href="https://www.gear.com/s/moccasym">Moccasym</a> (4 refinement options) to large for a general search like <a href="https://www.gear.com/s/jacket">jacket</a> (50+ refinement options). If performance becomes an issue in the future, I can investigate libxml-ruby or other Ruby XML parsing tools that may improve performance.</p>
<p>A final point of interest was the decision to tie the Rails application to the same database that drives the product pages (which was easily done). This decision was made to allow access to frontend taxonomy information for product categorization. For example, if a user chooses to refine a search by a specific category (<a href="https://www.gear.com/s/jacket?cat=kids-clothing">jacket in Kids Clothing</a>), the Rails app can retrieve all the taxonomy information for that category, such as the display name, the number of products in the category, subcategories, and sub-subcategories. This information may be required for additional features, such as the ability to view the subcategories of a category or to view other products in the category that aren’t shown in the search results.</p>
<p>I was happy to see the success of this project after working through the deliverables. Future work includes integration of additional search features common to many site search packages, such as implementing refinement by color and size, or retrieving recommended products or best sellers.</p>
nofollow in PageRank Sculptinghttps://www.endpointdev.com/blog/2009/06/nofollow-in-page-rank-sculpting/2009-06-24T00:00:00+00:00Steph Skardal
<p>Last week the SEO world reacted to <a href="https://www.mattcutts.com/blog/pagerank-sculpting/">Matt Cutts’ article</a> about the use of nofollow in PageRank sculpting.</p>
<p>Google uses the <a href="https://en.wikipedia.org/wiki/PageRank">PageRank</a> algorithm to calculate the popularity of pages on the web. Popularity is only one factor in determining which pages are returned in search results (relevance to the search terms is the other major factor). Other major search engines use similar popularity algorithms. Without describing the algorithm in detail, the important takeaways are:</p>
<ul>
<li>PageRank of a single page is influenced by all inbound (external) links</li>
<li>PageRank of a single page is passed on to all outgoing links after being normalized and divided by the total number of outgoing links</li>
</ul>
<p>So, given page C with inbound links from pages A and B, where pages A and B have equal PageRank X, page A has 3 total outgoing links, and page B has 5 total outgoing links, page C receives more PageRank from page A than from page B.</p>
<p><a href="https://1.bp.blogspot.com/_wWmWqyCEKEs/SkJx484BdLI/AAAAAAAABuY/HPa0OXDZC3c/s1600-h/pr1.png" onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}"><img alt="" border="0" id="BLOGGER_PHOTO_ID_5350964530497287346" src="/blog/2009/06/nofollow-in-page-rank-sculpting/image-0.png" style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 390px;"/></a></p>
<p>From an external link perspective, it’s great to get as many links as possible from a variety of sources that rank high and have a low number of outgoing links. From an internal site perspective, it’s important to examine how PageRank is passed throughout a site in order to apply the best site architecture. In addition to designing a site architecture that pleases users and passes link juice throughout a site effectively, the rel=“nofollow” attribute was adopted by several major search engines and was used as an additional tool to stop the flow of link juice from one page to another. The nofollow attribute can also be used to identify paid links (its early use) or to stop passing link juice to external sites entirely.</p>
<p>In the example above, rel=“nofollow” could be added to 2 links on page B which would result in the same PageRank passed from page B to page C as from page A to page C.</p>
<p><a href="https://1.bp.blogspot.com/_wWmWqyCEKEs/SkJx5C0oxcI/AAAAAAAABug/-WPznLcuCdI/s1600-h/pr2.png" onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}"><img alt="" border="0" id="BLOGGER_PHOTO_ID_5350964532093699522" src="/blog/2009/06/nofollow-in-page-rank-sculpting/image-1.png" style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 335px;"/></a></p>
<p>Then, at a recent SEO conference, Matt Cutts (head of the Google webspam team) commented that the PageRank algorithm had changed its use of nofollow, and just last week it was announced that the algorithm would no longer honor nofollow for PageRank sculpting. A link with the nofollow attribute no longer reduces the count of outgoing page links to increase the link juice passed through the remaining links, but link juice is still not passed through a nofollow link itself.</p>
<p>In the ongoing example, the link juice passed from page B to page C will be less than that passed from page A to page C, because page B has more outgoing links, even if some of them are nofollow links.</p>
<p><a href="https://4.bp.blogspot.com/_wWmWqyCEKEs/SkJx5Z86AuI/AAAAAAAABuo/e-bVOvB3Eik/s1600-h/pr3.png" onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}"><img alt="" border="0" id="BLOGGER_PHOTO_ID_5350964538302399202" src="/blog/2009/06/nofollow-in-page-rank-sculpting/image-2.png" style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 335px;"/></a></p>
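The arithmetic behind these three scenarios can be sketched in a few lines of Ruby (a simplification that ignores the damping factor and normalization; the method name is my own):

```ruby
# Sketch of the PageRank passed per outgoing link in the running example,
# ignoring the damping factor and normalization for simplicity.
def juice_per_link(pagerank, outgoing_links, nofollow_links, honors_sculpting:)
  # Under the old behavior, nofollow links were removed from the divisor;
  # under the 2009 change, they still count toward it.
  divisor = honors_sculpting ? outgoing_links - nofollow_links : outgoing_links
  pagerank / divisor.to_f
end

x = 6.0  # PageRank shared by pages A and B in the example
juice_per_link(x, 3, 0, honors_sculpting: true)   # page A: 2.0 per link
juice_per_link(x, 5, 2, honors_sculpting: true)   # page B, old sculpting: 2.0
juice_per_link(x, 5, 2, honors_sculpting: false)  # page B, new behavior: 1.2
```

With sculpting honored, nofollowing 2 of page B’s 5 links made B pass as much per link as A; under the new behavior, B’s links each pass less juice regardless.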
<p>One <a href="https://moz.com/blog/google-says-yes-you-can-still-sculpt-pagerank-no-you-cant-do-it-with-nofollow">SEOmoz article</a> I read suggests that SEO best practice may now be to recommend that blog owners disallow comments that may contain external links, to prevent the dilution of link juice. Other potential solutions would be to filter links out of user-generated content (comments or Q&amp;A specifically), display user-generated content in iframes, or embed external links in Flash or Java. The nofollow attribute may still be used to stop the flow of link juice to external pages; however, it can no longer be used for internal PageRank sculpting.</p>
SEO Ecommercehttps://www.endpointdev.com/blog/2009/04/ecommerce-and-seo/2009-04-20T00:00:00+00:00Steph Skardal
<p>I recently read an article that discusses <a href="https://web.archive.org/web/20090311033516/http://yoast.com/magento-seo/">Magento SEO</a> problems and solutions. This got me to think about common search engine optimization issues that I’ve seen in e-commerce. Below are some highlighted e-commerce search engine optimization issues. The <a href="https://github.com/spree/demo">Spree Demo</a>, <a href="http://demo.icdevgroup.org/i/demo1">Interchange Demo</a>, and <a href="https://web.archive.org/web/20090716030438/http://demo.magentocommerce.com/">Magento Demo</a> are used as references.</p>
<h3 id="duplicate-home-pages-www-non-www-indexhtml">Duplicate Home Pages (www, non-www, index.html)</h3>
<p>Duplicate home pages can come in the form of a homepage with www and without www, a homepage in the form of http://www.domain.com/ alongside a homepage with some variation of “index” appended to the URL, or a combination of the two. In the Interchange demo, <a href="http://demo.icdevgroup.org/i/demo1">http://demo.icdevgroup.org/i/demo1</a> and <a href="http://demo.icdevgroup.org/i/demo1/index.html">http://demo.icdevgroup.org/i/demo1/index.html</a> are duplicates, as are http://demo.spreecommerce.com/ and http://demo.spreecommerce.com/products/ in the Spree demo, and http://demo.magentocommerce.com/ and http://demo.magentocommerce.com/index.php in the Magento demo.</p>
<p>External links influence search engine performance more positively when they all point to a single index page rather than being divided between two or three home pages. Since the homepage most likely receives the most external links, this issue can be more problematic than other generated duplicate content. I’ve also seen this happen in several content management systems.</p>
<p><a href="https://web.archive.org/web/20090311033516/http://yoast.com/magento-seo/">This article</a> provides directions on using mod_rewrite to apply a 301 redirect from the www.domain.com/index.php homepage to www.domain.com. This or other redirect solutions can be applied to Spree, Interchange, and other ecommerce platforms.</p>
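A minimal mod_rewrite sketch of that idea (using example.com as a placeholder domain; exact conditions will vary by site and hosting setup) might look like:

```apache
# Hypothetical .htaccess sketch: 301-redirect index-page duplicates
# and non-www requests to a single canonical homepage.
RewriteEngine On

# Send non-www (and any other host variants) to www.example.com
RewriteCond %{HTTP_HOST} !^www\.example\.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]

# Collapse /index.php and /index.html onto the root URL
RewriteRule ^index\.(php|html)$ http://www.example.com/ [R=301,L]
```

A permanent (301) redirect is the key detail: it tells search engines to consolidate the link popularity of the duplicate URLs onto the canonical one.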
<h3 id="irrelevant-product-urls">Irrelevant Product URLs</h3>
<p>A search engine optimization best practice is to provide relevant and indicative text in product URLs. In the Interchange demo, the default catalog uses the product SKU in the product URL (http://demo.icdevgroup.org/i/demo1/os28073.html). In Magento and Spree, product permalinks with relevant text are used in the product URL. In WordPress, the author has the ability to set permalinks for articles. I am unsure whether Magento gives you the ability to customize product URLs, and Spree does not currently give you the ability to manage custom product permalinks. However, these fixes may all be in the works, since it is important for ecommerce platforms to implement search engine optimization best practices.</p>
<h3 id="duplicate-product-content">Duplicate product content</h3>
<p>I’ve observed several situations where dividing products into multiple taxonomies results in duplicate content created via different user navigation paths. For example, in the Spree demo, the “Ruby Baseball Jersey” can be reached through the Ruby brand page, the Clothing page, or the homepage. The three generated duplicate content URLs are http://demo.spreecommerce.com/products/ruby-on-rails-ringer-t-shirt, http://demo.spreecommerce.com/t/brands/ruby/p/ruby-baseball-jersey, and http://demo.spreecommerce.com/t/categories/clothing/shirts/p/ruby-baseball-jersey.</p>
<p>Another example of this can be found in the Interchange demo. The left navigation taxonomy tree provides links to any product URL with “?open=X,Y,Z” appended, where the “open” query string indicates how the DHTML tree should be displayed. For example, the “Digger Hand Trencher” has a base URL of http://demo.icdevgroup.org/i/demo1/os28076.html. Depending on which tree nodes are expanded, the product can be reached at http://demo.icdevgroup.org/i/demo1/os28076.html?open=0,11,13,19, http://demo.icdevgroup.org/i/demo1/os28076.html?open=0,11,13, etc. This standard demo functionality yields a lot of duplicate content.</p>
<p>In Magento, product URLs are in the form of www.domain.com/product-name, although the article mentioned above notes that www.domain.com/category/product.html product URLs were also generated. Perhaps this was recently fixed, or perhaps the demo is configured to avoid generating this type of duplicate content.</p>
<p>Duplicate product page content is often used to indicate which breadcrumb should display or to track user click-through behavior (for example, did a user click on a “featured product”? a “best seller”? a specific “product advertisement”?). In Interchange, session IDs appended to URLs are another source of duplicate content. Instead of using the URL to track user navigation or behavior, several other solutions, such as cookies, a ‘#’ (hash) fragment, or session data, can be used to avoid generating duplicate content.</p>
<h3 id="performance">Performance</h3>
<p>Performance should not be overlooked in ecommerce for search engine optimization. In March of 2008, Google wrote about <a href="https://adwords.googleblog.com/2008/03/landing-page-load-time-will-soon-be.html">how landing page load time will be incorporated into the Quality Score for Google Adwords</a>—which is also believed to apply to regular search results. And github recently released some data on <a href="https://github.com/blog/368-keeping-googlebot-happy">how performance improvements influenced http://www.github.com/ Googlebot visits</a>.</p>
<p>Keeping a high content to code ratio; consolidating, minifying, and gzipping CSS and JavaScript; and minimizing the use of JavaScript-based suckerfish menus can all improve search engine performance.</p>
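For instance, gzipping CSS and JavaScript under Apache can be enabled with mod_deflate (a hedged sketch; module availability and the MIME types your server emits for JavaScript vary by setup):

```apache
# Sketch: compress text assets with mod_deflate (Apache 2.x)
AddOutputFilterByType DEFLATE text/html text/css application/javascript
```

Compression applies per response, so it complements, rather than replaces, consolidating and minifying the files themselves.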
<p>The Interchange default catalog has a simple template with minimal CSS and JavaScript includes, so the developer is responsible for sticking to best performance practices. The Magento demo appears to have a decent content to code ratio, but still includes 5 CSS files that should be consolidated and minified if they are included on every page. Finally, Spree has undergone some changes in the last month and is moving in the direction of including one consolidated JavaScript file (plus any JavaScript required for extensions) on every page, and the <a href="https://groups.google.com/forum/#!topic/spree-user/gB3hZ3sgLpw">upcoming release of Spree 0.8.0</a> will have considerable frontend view improvements.</p>
<p>Ecommerce platforms should have decent performance; <a href="http://yslow.org/">YSlow</a> and <a href="http://shop.oreilly.com/product/9780596529307.do">this book on high performance website essentials</a> are good resources.</p>
<h3 id="lacking-basic-cms-management">Lacking basic CMS management</h3>
<p>Basic CMS features such as the ability to manage and update page titles and page meta data have been overlooked by ecommerce platforms in the past, but appear to have been given more attention recently. An ecommerce solution should also have functionality to create and manage static pages.</p>
<p>The Interchange demo does not have meta description and keyword functionality; however, page titles are set to product names, which is an acceptable default. It’s also very simple (as a developer) to add a static content page, and it would require just a bit more effort to have this content managed by a database in Interchange. The Spree core is missing some basic CMS management such as page title and meta data management, but this functionality is currently in development. One Spree contributor developed <a href="https://web.archive.org/web/20090619060215/http://github.com/PeterBerkenbosch/spree-static-content/tree/master">a Spree extension that provides management of simple static pages using a WYSIWYG editor</a>. At the moment, Magento appears to have the most traditional content management system functionality out of the box.</p>
<p>Another way to improve CMS capability within ecommerce is to find a solution for integrating a blog. A quick Google search of “magento add blog” revealed <a href="https://web.archive.org/web/20090508093030/http://chasesagum.com/setup-a-blog-inside-your-magento-store">how to set up a WordPress blog in Magento with an extension</a>. One of End Point’s clients, <a href="https://www.ccibeauty.com/">CCI Beauty</a>, also has <a href="https://www.ccibeauty.com/blog/">WordPress integrated into their Interchange setup</a>. Finally, there has been discussion about the development of “Spradiant”, or <a href="https://groups.google.com/forum/#!searchin/spree-user/radiant%7Csort:date">mixing Spree and Radiant</a>.</p>
<p>Another missed opportunity in ecommerce platforms is a solution for elegantly blending content and product listings to target specific keywords. A “landing page” can have a page title, meta data, and content targeted toward specific terms. http://www.backcountry.com/store/gear/arcteryx-vests.html and http://www.backcountry.com/store/gear/cargo-pant.html are examples of targeted terms with corresponding products. Going one step further, search pages themselves can have managed content to attract keywords, such as a page title and meta data for specific high traffic keywords along with the related products. For example, http://www.domain.com/s/ruby_shirt could be a search page for “Ruby Shirt” containing meaningful content and relevant products.</p>
<h3 id="mishandled-product-pagination">Mishandled Product Pagination</h3>
<p>Finding a search engine optimization solution for pagination can be a difficult problem in ecommerce. When a site has fewer than 100 products, this shouldn’t be an issue, because a simple taxonomy can appropriately group the products with low crawl depth. A website with 10,000 products must balance keeping a low taxonomy depth to minimize crawl depth against ensuring that all products are listed and indexable.</p>
<p>For example, products may be divided into three levels of navigation: category, subcategory, and group. If 10,000 products are divided into 10 categories, 10 subcategories per category, and 10 product groups per subcategory, then each group page can show its 10 products with no pagination. However, product taxonomy is not always so ideal: some groups may have 2 products and others may have 30. Pagination, or pages with an offset of product listings, is generated to accommodate these listings (for example, http://www.backcountry.com/store/group/61/Sun-Hats-Rain-Hats-Safari-Hats.html, http://www.backcountry.com/store/group/61/Sun-Hats-Rain-Hats-Safari-Hats-p1.html).</p>
<p>A few problems can arise from the pagination solution. First, by web 2.0 standards, the content should be generated via Ajax, so an SEO-friendly Ajax solution must be implemented, where the onclick event refreshes the content but the links are still crawlable by search engine bots. Second, page 1, with no product offset, sits one level shallower in the crawl, so it receives the most link juice from its parent page (the subcategory). As a result, there must be thoughtful analysis of which products to present on that page: should high traffic products get the traffic? Should popular items be listed on the first page? Should low traffic products be listed to try to bump the traffic on those pages? Should products with the most user interaction (reviews, Q&amp;A, ratings) be shown on that page? Another problem is that the page titles and meta data of paginated pages will most likely be very similar, since the content is a list of similar products. These pages can essentially compete for traffic and may be counted as duplicate content if the page titles and meta data are equal.</p>
<p>Interchange uses the <a href="http://docs.icdevgroup.org/cgi-bin/online/tags/more_list.html">more list</a> to handle pagination, but this functionality is not search engine friendly, as it generates URLs such as http://demo.icdevgroup.org/i/demo1/scan/MM=3ffffa066192cba677e1428d7461ddc9:10:19:10.html?mv_more_ip=1&mv_nextpage=results&mv_arg=, http://demo.icdevgroup.org/i/demo1/scan/MM=3ffffa066192cba677e1428d7461ddc9:20:27:10.html?mv_more_ip=1&mv_nextpage=results&mv_arg=, etc. The Spree demo had some pagination implementation, but after recent frontend changes it is no longer included in the demo. The Magento demo was carefully arranged so that product group pages have no more than 9 products, to avoid showing any pagination functionality. However, when modifying the number of products displayed per group or using the “Sort By” mechanism, ?limit=Y and &order=X&dir=asc are appended to the URL, which can produce a large volume of duplicate content (try the filters on <a href="https://web.archive.org/web/20090227041133/http://demo.magentocommerce.com/apparel/shirts">this page</a>).</p>
<p>It is difficult to determine which of the above problems is the most serious. From personal experience, I have been involved in tackling all the duplicate content issues first and then moving on to “optimization” opportunities such as enhancing the content management system. At the very least, developers and users of any ecommerce platform should be aware of these common search engine optimization issues.</p>