Determining dominant image color
This grew out of a misunderstanding of a client’s request, so it never saw the light of day, but I thought it was an interesting problem.
The request was to provide a “color search” of products, i.e., “show me all the orange products”. Finding the specific products was not a challenge since it was just a database query. Instead I was interested in how to choose a “representative image” from among the available images for that product. (And as it turns out, the image filename gave me that information, but let’s assume you don’t have that luxury: how do you tell, from a group of images, which one is “more orange” than the others?)
Of course, this depends on the composition of the image. In this case, I knew that the majority were of solid-color (or two- or three-color at most) products on a white background. The approach that was settled on was to severely pixellate the image into something like 20x20 (arbitrary; this could be very dependent on the images under study, or the graphics library in use). If you also supply a color palette restricted to the colors you are interested in matching (e.g., primary and secondary colors, plus perhaps black, white, and gray), you would have a …
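A minimal sketch of that idea in Perl, assuming Image::Magick is available; the 20x20 grid, the palette, and the dominant_color() helper below are illustrative choices, not code from the project:
use strict;
use warnings;
use Image::Magick;

# Illustrative palette of the colors we care about: name => [R, G, B] (0-255)
my %palette = (
    orange => [255, 165,   0],
    red    => [255,   0,   0],
    blue   => [  0,   0, 255],
    black  => [  0,   0,   0],
    white  => [255, 255, 255],
    gray   => [128, 128, 128],
);

sub dominant_color {
    my ($file) = @_;
    my $img = Image::Magick->new;
    $img->Read($file);
    $img->Resize(geometry => '20x20!');   # severely pixellate; "!" forces exact size

    my %count;
    for my $y (0 .. 19) {
        for my $x (0 .. 19) {
            # GetPixel returns normalized 0..1 channel values
            my @rgb = map { $_ * 255 } $img->GetPixel(x => $x, y => $y);
            # Snap this pixel to the nearest palette color (squared RGB distance)
            my ($nearest) = sort {
                dist(\@rgb, $palette{$a}) <=> dist(\@rgb, $palette{$b})
            } keys %palette;
            $count{$nearest}++;
        }
    }
    # Most frequent palette color wins; for white-background product shots
    # you would likely discard 'white' before choosing.
    my ($winner) = sort { $count{$b} <=> $count{$a} } keys %count;
    return $winner;
}

sub dist {
    my ($p, $q) = @_;
    return ($p->[0]-$q->[0])**2 + ($p->[1]-$q->[1])**2 + ($p->[2]-$q->[2])**2;
}
Comparing the per-palette pixel counts across a product’s images then gives a rough way to decide which image is “more orange” than the others.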
graphics
Use ZIP+4, except when you shouldn’t
The USPS provides a handy API for looking up postal rates on the fly. Recently it started failing for code that had been working for a while, so I investigated. I found a couple of different problems with it:
- First, the “service description” field had been “augmented” by including copyright symbols via HTML mark-up. That meant internal comparisons started to fail, so I “canonicalized” all the responses by stripping out various things from both sides of my comparison.
# Strip HTML entities (e.g. &copy;) introduced by the new mark-up
$string =~ s{&(?:[a-z/;&])+}{}gis;
# Strip anything that isn't a letter
$string =~ s/[^a-z]//gis;
# Trim leading/trailing whitespace and collapse internal runs
$string =~ s/^\s+//;
$string =~ s/\s+$//;
$string =~ s/\s+/ /gis;
- Second, I found that the API inexplicably rejects 9-digit ZIP codes, the “ZIP+4” format. That’s right, you can’t look up a domestic shipping rate for a 9-digit ZIP. The documentation linked above specifically calls for 5-digit ZIPs. If you pass a 9-digit ZIP to the API, it doesn’t smartly recognize that you’ve given it too much info and just use what it needs. Instead, it throws an error; a workaround is sketched below.
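One way around it, sketched here with an illustrative $destination_zip variable, is simply to keep the first five digits before building the rate request:
# Reduce "10001-4356" or "100014356" to the 5-digit ZIP the rate API accepts;
# a plain 5-digit ZIP passes through unchanged.
my ($zip5) = $destination_zip =~ /^\s*(\d{5})/;
die "Unrecognized ZIP code: $destination_zip\n" unless defined $zip5;
# ... use $zip5, not the raw value, in the API request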
So the API got too clever in one regard, and not clever enough where it counts.
perl
Virtual Page Tracking and Goals with Google Analytics
Sometimes I come across websites that don’t use RESTful URLs, or whose checkout URLs are too unique (containing an order number, for example), and I need to implement Goal Tracking in Google Analytics on these user interactions. I’ve also had to implement Goal Tracking in a non-ecommerce web application where tabbed on-page browsing guides users through a 3-step process. Examples of situations that pose challenges to traditional page tracking in Google Analytics include:
- Throughout Interchange’s checkout, URLs are posts to “/process”, which makes the user interactions difficult to distinguish.
- Throughout Spree’s checkout, URLs are posts to “/order/:id/edit”, which are distinct and can be difficult to aggregate.
- In a Sinatra application we developed recently, the single page URL is “/locate.html”, but tabbed browsing occurs through three unique steps.
Google Analytics Tagging
To add Goal Tracking by URL, pages must first be tagged as “virtual pages”. To implement virtual page tracking in Google Analytics, it’s as simple as including a new virtual page URL in the _trackPageview action:
<script type="text/javascript">
var _gaq = _gaq || [];
_gaq.push(['_setAccount', …
analytics ecommerce javascript
DBD::Pg query cancelling in Postgres
A new version of DBD::Pg, the Perl driver for PostgreSQL, has just been released. In addition to fixing some memory leaks and other minor bugs, this release (version 2.18.0) introduces support for the DBI method known as cancel(). A giant thanks to Eric Simon, who wrote this new feature. The new method is similar to the existing pg_cancel() method, except it works on synchronous rather than asynchronous queries. I’ll show an example of both below.
DBD::Pg has been able to handle asynchronous queries for a while now. Basically, that means you don’t have to wait around for the database to finish a query. Your application can do other things while the query runs, then check back later to see if it has completed and grab the results. The way to cancel an already kicked-off asynchronous query is with the pg_cancel() method (the other asynchronous methods are pg_ready and pg_result, which have no synchronous equivalents).
The prefix “pg_” is used because there is no corresponding built-in DBI method to override, and the convention is to prefix everything custom to a driver with the driver’s prefix, in our case ‘pg’. Here’s an example showing one possible use of asynchronous …
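The example itself is cut off above; purely as a sketch of what asynchronous use plus cancellation can look like with DBD::Pg (the connection parameters and the deliberately slow query are placeholders):
use DBI;
use DBD::Pg qw(:async);   # exports PG_ASYNC and related constants

my $dbh = DBI->connect('dbi:Pg:dbname=pubsite', 'user', 'secret',
    { RaiseError => 1, AutoCommit => 1 });

# Kick off a slow query asynchronously; control returns to us immediately
$dbh->do('SELECT pg_sleep(60)', { pg_async => PG_ASYNC });

# ... the application is free to do other work here ...

# We decide we no longer need the result, so cancel the running query
if (! $dbh->pg_ready) {      # not finished yet?
    $dbh->pg_cancel;         # cancel the asynchronous query
}
The new cancel() method, by contrast, interrupts a synchronous query, so it has to be invoked from outside the blocked call (a signal handler, for instance), which is harder to show in a few lines.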
database dbdpg perl postgres
Ruby, Rails, and Ecommerce
I’m a big fan of Ruby. And I like Rails too. Lately, I’ve been investigating several Ruby, Rails, and Rails ecommerce framework options (follow-up to discussing general ecommerce options). I’ve also recently written about developing ecommerce on Sinatra (one, two, and three). Most of End Point’s clients are ecommerce clients, so we’ve seen it all in terms of feature requests (third party integration like QuickBooks, search, PayPal; product features like best sellers, recommended items, related items; checkout features like one-page checkout, guest checkout; backend features like advanced reporting, sales management, inventory management). Our services also include hosting and database consulting for many of our ecommerce clients, so we have a great understanding of what it takes to run an ecommerce solution.
When it comes to ecommerce development, someone who likes coding in Ruby (like me) has a few options:
- Ruby DSL (e.g. Sinatra)
- Pure Rails
- Open Source Ecommerce on Rails: Spree, ROR-Ecommerce, Substruct. End Point admittedly has the most experience with Spree.
Here’s a run down of some pros and cons of each option:
|  | Pros | Cons |
| Ruby DSL | … | … |
ecommerce ruby rails
Annotating Your Logs
We recently did some PostgreSQL performance analysis for a client with an application having some scaling problems. In essence, they wanted to know where Postgres was getting bogged down, and once we knew that we’d be able to target some fixes. But to get to that point, we had to gather a whole bunch of log data for analysis while the test software hit the site.
This is on Postgres 8.3 in a rather locked down environment, by the way. Coordinated pg_rotate_logfile() was useful, but occasionally it would seem to devolve to something resembling: “Okay, we’re adding 60 more users … now!” And I’d write down the time stamp, and figure out an appropriate place to slice the log file later.
Got me thinking, what if we could just drop an entry into the log file, and use it to filter things out later? My first instinct was to start looking at whether a patch would be accepted, maybe a wrapper for ereport(), something easy. Turns out, it’s even easier than that…
pubsite=# DO $$BEGIN RAISE LOG 'MARK: 60 users'; END;$$;
DO
Time: 0.464 ms
pubsite=# DO $$BEGIN RAISE LOG 'MARK: 120 users'; END;$$;
DO
Time: 0.378 ms
pubsite=# DO $$BEGIN RAISE LOG 'MARK: 360 …
database performance postgres
Interactive Git: My New Found Friend(s)
As a software engineer I’m naturally inclined to be at least somewhat introverted :-). Combine that with the fact that End Point is PhysicalWaterCooler-challenged and you have a recipe for two things to occur naturally: 1) talking to oneself (but then who doesn’t do that really? no, really.), and 2) finding friends in unusual places. Feeling a bit socially lacking after a personal residence move, I was determined to set out to find new friends, and I found one. His name is “--interactive”, or Mr. git add --interactive.
“How did we meet?” you ask. While working on a rather “long winded” project I started to notice myself sprinkling TODOs throughout the source code. Not a bad habit really (presuming they do actually eventually get fixed), but unfortunately the end result is a lot of changed files in git that you don’t really need to commit, but at the same time don’t really need to see every time you want to review code. I’m fairly anal about reviewing code, so I was generally in the habit of running a git status followed by a git diff on every file that was mentioned by status. These are two great friends, but of late they just don’t seem to be providing the …
git
Postgres Build Farm Animal Differences
I’m a big fan of the Postgres Build Farm, a distributed network of computers that are constantly installing, building, and testing Postgres to detect any problems in the code. The build farm works best when there is a wide variety of operating systems and architectures doing the testing. Thus, while I have a rather common x86_64 Linux box available for testing, I try to make it a little unique to get better test coverage.
One thing I’ve been working on is clang support (clang is an alternative to gcc). Unfortunately, the latest version of clang has a bug that prevents it from building Postgres on Linux boxes. I submitted a small patch to the Postgres source to fix this, but it was decided that we’ll wait until clang fixes their bug. Supposedly they have fixed it in their svn head, but I’ve not been able to get that to compile successfully.
So I also just installed gcc 4.6.0, the latest and greatest. Installing it was not easy (nasty problems with the mpfr dependencies), but it’s done now and working. It probably won’t make any difference as far as the results, but at least my box is somewhat different from all the other x86_64 Linux boxes in the farm. :)
I’ve asked before on the list (with no …
community database open-source postgres testing