Content Syndication, SEO, and the rel canonical Tag
End Point Blog Content Syndication
Over the past couple of weeks, Jon and I have been discussing whether content syndication of our blog negatively affects our search traffic. Since the blog’s inception, full articles have been syndicated by OSNews. During that time, I’ve been keeping an eye on the effects of content syndication on search to determine what (if any) negative effects we experience.
By my observations, immediately after we publish an article, the article is indexed by Google and appears near the top of the results for a search with keywords similar to the article’s title. The next day, the OSNews syndication of the article shows up in the same keyword search, and our article disappears from the search results. Then, several days later, our article ranks ahead of OSNews, as if Google’s algorithm has determined the original source of the content. I’ve provided a visual representation of this behavior:
With content syndication of our blog articles, there is a several-day lag during which Google treats our blog article as the duplicate content and returns the OSNews article in search results for a search similar to our blog article’s title. After this lag time, the OSNews article is treated as …
seo
Editing large files in place
Running out of disk space seems to be an all too common problem lately, especially when dealing with large databases. One situation that came up recently was a client who needed to import a large Postgres dump file into a new database. Unfortunately, they were very low on disk space and the file needed to be modified. Without going into all the reasons, we needed the databases to use template1 as the template database, and not template0. This was a very large, multi-gigabyte file, and the amount of space left on the disk was measured in megabytes. It would have taken too long to copy the file somewhere else to edit it, so I did a low-level edit using the Unix utility dd. The rest of this post gives the details.
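The exact commands from the post are not part of this excerpt, so what follows is only a minimal sketch of the general idea: overwrite a few bytes of a file in place with dd, without making a second copy. The file name, the sample CREATE DATABASE line, and the use of GNU grep to find the byte offset are all assumptions made for illustration.

```shell
# Work in a scratch directory so nothing real is touched.
workdir=$(mktemp -d)
dump="$workdir/dump.sql"

# Hypothetical stand-in for the multi-gigabyte dump file.
printf '%s\n' \
  "CREATE DATABASE shop WITH TEMPLATE = template0 ENCODING = 'UTF8';" \
  "CREATE TABLE t (id int);" > "$dump"

size_before=$(wc -c < "$dump")

# Byte offset of the first match (GNU grep: -b prints the offset,
# -o restricts it to the match itself, -a treats the file as text).
offset=$(grep -a -b -o 'template0' "$dump" | head -n 1 | cut -d: -f1)

# Overwrite 9 bytes at that offset. conv=notrunc stops dd from
# truncating the rest of the file; the replacement must be exactly
# as long as the text it overwrites.
printf 'template1' | dd of="$dump" bs=1 seek="$offset" conv=notrunc 2>/dev/null

size_after=$(wc -c < "$dump")
head -n 1 "$dump"
```

Because template0 and template1 have the same length, the file size does not change and no extra disk space is needed. A replacement of a different length would shift everything after it and could not be done this way.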
To demonstrate the problem and the solution, we’ll need a disk partition that has little-to-no free space available. In Linux, it’s easy enough to create such a thing by using a RAM disk. Most Linux distributions already have these ready to go. We’ll check it out with:
$ ls -l /dev/ram*
brw-rw---- 1 root disk 1, 0 2009-12-14 13:04 /dev/ram0
brw-rw---- 1 root disk 1, 1 2009-12-14 22:27 /dev/ram1
From the above, we see that there are some RAM disks available (there are …
database postgres tips emacs vim
Live by the sword, die by the sword
In an amazing display of chutzpah, Monty Widenius recently asked on his blog for people to write to the EC about the takeover of Sun by Oracle and its effect on MySQL, saying:
I, Michael “Monty” Widenius, the creator of MySQL, is asking you urgently to help save MySQL from Oracle’s clutches. Without your immediate help Oracle might get to own MySQL any day now. By writing to the European Commission (EC) you can support this cause and help secure the future development of the product MySQL as an Open Source project.
“Help secure the future development”? Sorry, but that ship has sailed. Specifically, when MySQL was sold to Sun. There were many other missed opportunities over the years to keep MySQL as a good open source project. Some of the missteps:
- Bringing in venture capitalists
- Selling to Sun instead of making an IPO (Initial Public Offering)
- Failing to check on the long-term health of Sun before selling to them
- Choosing the proprietary dual-licensing route
- Making the documentation have a restricted license
- Failing to acquire InnoDB (which instead was bought by Oracle)
- Failing to acquire Sleepycat (which was instead bought by Oracle)
- Spreading FUD about the dual license and …
community database mysql open-source postgres
List Google Pages Indexed for SEO: Two Step How To
Whenever I work on SEO reports, I often start by looking at pages indexed in Google. I just want a simple list of the URLs indexed by the GOOG. I usually use this list to get a general idea of navigation, look for duplicate content, and examine initial counts of different types of pages indexed.
Yesterday, I finally got around to figuring out a command line solution to generate this indexation list. Here’s how to do it, using http://www.endpoint.com/ as an example:
Step 1
Grab the search results using the “site:” operator and make sure you run an advanced search that shows 100 results. The URL will look something like: https://www.google.com/search?num=100&as_sitesearch=www.endpoint.com
But it will likely have lots of other query parameters of lesser importance [to us]. Save the search results page as search.html.
Step 2
Run the following command:
sed 's/<h3 class="r">/\n/g; s/class="l"/LINK\n/g' search.html | grep LINK | sed 's/<a href="\|" LINK//g'
There you have it. Interestingly enough, the order of pages can be an indicator of which pages rank well. Typically, pages with higher PageRank will be near …
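To see what the Step 2 pipeline does, it can be run against a tiny hand-written page. This is only a sketch: the HTML fragment below is made up to imitate the markup the sed expressions look for (an h3 with class "r" wrapping a link with class "l"), and the second URL is a hypothetical example. The `\n` in the replacement and the `\|` alternation assume GNU sed.

```shell
page=$(mktemp)

# Made-up fragment imitating the result markup the pipeline expects.
cat > "$page" <<'EOF'
<ol><li><h3 class="r"><a href="http://www.endpoint.com/" class="l">End Point</a></h3></li><li><h3 class="r"><a href="http://www.endpoint.com/blog" class="l">Blog</a></h3></li></ol>
EOF

# 1. Break the page so each result link starts its own line and is
#    tagged with the marker word LINK.
# 2. Keep only the tagged lines.
# 3. Strip the leading <a href=" and the trailing " LINK, leaving the URL.
urls=$(sed 's/<h3 class="r">/\n/g; s/class="l"/LINK\n/g' "$page" \
  | grep LINK \
  | sed 's/<a href="\|" LINK//g')

printf '%s\n' "$urls"
```

The pipeline depends entirely on those two class names, so it would need adjusting whenever the result page markup differs.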
seo
Multiple links to files in /etc
I came across an unfamiliar error in /var/log/messages on a RHEL 5 server the other day:
Dec 2 17:17:23 X restorecond: Will not restore a file with more than one hard link (/etc/resolv.conf) No such file or directory
Sure enough, ls showed the inode pointed to by /etc/resolv.conf having 2 links. What was the other link?
# find /etc -samefile resolv.conf
/etc/resolv.conf
/etc/sysconfig/networking/profiles/default/resolv.conf
# ls -lai /etc/resolv.conf /etc/sysconfig/networking/profiles/default/resolv.conf
1526575 -rw-r--r-- 2 root root 69 Nov 30 2008 /etc/resolv.conf
1526575 -rw-r--r-- 2 root root 69 Nov 30 2008 /etc/sysconfig/networking/profiles/default/resolv.conf
I’ve worked with a lot of RHEL/CentOS 5 servers and hadn’t ever dealt with these network profiles. Kiel guessed it was probably a system configuration tool that we never use, and he was right: Running system-config-network (part of the system-config-network-tui RPM package) creates the hardlinks for the default profile.
/etc/hosts gets the same treatment as /etc/resolv.conf.
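The same checks can be tried safely in a scratch directory. The sketch below builds a hypothetical miniature of the layout above (the directory names and file contents are stand-ins, not the real /etc), creates a hard link, and then uses the link count from ls and find -samefile to locate it. The last command is an assumption about how one might list every multiply linked file in a tree, not something taken from the post.

```shell
workdir=$(mktemp -d)
mkdir -p "$workdir/etc/profiles/default"

# Stand-in for /etc/resolv.conf and its profile copy.
printf 'nameserver 192.0.2.1\n' > "$workdir/etc/resolv.conf"
ln "$workdir/etc/resolv.conf" "$workdir/etc/profiles/default/resolv.conf"

# The second column of ls -l is the hard link count.
links=$(ls -l "$workdir/etc/resolv.conf" | awk '{print $2}')

# Every path that shares the inode of resolv.conf.
matches=$(find "$workdir/etc" -samefile "$workdir/etc/resolv.conf" | sort)

# Number of regular files in the tree that have more than one link.
multi=$(find "$workdir/etc" -type f -links +1 | wc -l)

echo "link count: $links"
printf '%s\n' "$matches"
```

Running find with -type f -links +1 against a real /etc would show whether other files get the same treatment from the configuration tool.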
I suppose SELinux’s restorecond doesn’t want to apply any context changes because its rules are based on filesystem paths, …
hosting redhat security
CakePHP Infinite Redirects from Auto Login and Force Secure
Lately, Ron, Ethan, and I have been blogging about several of our CakePHP learning experiences, such as incrementally migrating to CakePHP, using the CakePHP Security component, and creating CakePHP fixtures for HABTM relationships. This week, I came across another blog-worthy topic while troubleshooting for JackThreads that involved auto login, requests that were forced to be secure, and infinite redirects.
Ack! Users were experiencing infinite redirects!
The Problem
Some users were seeing infinite redirects. The following use cases identified the problem:
- Auto login true, click on link to secure or non-secure homepage => Whammy: Infinite redirect!
- Auto login false, click on link to secure or non-secure homepage => No Whammy!
- Auto login true, type in secure or non-secure homepage in new tab => No Whammy!
- Auto login false, type in secure or non-secure homepage in new tab => No Whammy!
So, the problem boiled down to an infinite redirect when auto login customers clicked to the site through a referer, such as a promotional email or a link to the site.
Identifying the Cause of the Problem
After I applied initial surface-level debugging without success, I decided to add …
php
Cisco PIX mangled packets and iptables state tracking
Kiel and I had a fun time tracking down a client’s networking problem the other day. Their scp transfers from their application servers behind a Cisco PIX firewall failed after a few seconds, consistently, with a connection reset.
The problem was easily reproducible with packet sizes of 993 bytes or more, not just with TCP but also ICMP (bloated ping packets, generated with ping -s 993 $host). That raised the question of how this problem could go undetected for their heavy web traffic. We determined that their HTTP load balancer avoided the problem as it rewrote the packets for HTTP traffic on each side.
Kiel narrowed the connection resets down to iptables’ state tracking, which was classifying the packets as INVALID rather than ESTABLISHED or RELATED as they should have been.
Then he found via tcpdump that the problem was easily visible in scp connections when TCP window scaling adjustments were made by either side of the connection. We tried disabling window scaling but that didn’t help.
We tried having iptables allow packets in state INVALID when they were also ESTABLISHED or RELATED, and that reduced the frequency of terminated connections, but still didn’t eliminate them entirely. (And it was a kludge we …
hosting redhat security
Iterative Migration of Legacy Applications to CakePHP
As Steph noted, we recently embarked on an adventure with a client who had a legacy PHP app. The app was initially developed in rapid fashion, with changing business goals along the way. Some effort was made at the outset with this vanilla PHP app to put key business logic in classes, but as often happens over time the cleanliness of those classes degraded. While many of the business rules and much of the state management (e.g. database manipulation, session wrangling, authentication/access-control) were kept separate from the “views” (the PHP entry pages), the classes themselves became tightly coupled and overburdened with myriad responsibilities.
This was a far cry from the stereotypical spaghetti PHP app, but nevertheless it needed some reorganization; all but the smallest changes inevitably required touching a wide range of classes and pages, and the code would only grow more brittle unless some serious refactoring took place.
We determined at the outset that getting the application moved into an established MVC framework would be of great benefit, and further determined that CakePHP would be a good choice. (This is the point where anybody reading will inevitably ask in comments …
php

