Efficiency of find -exec vs. find | xargs
This is a quick tip for anyone writing a cron job to purge large numbers of old files.
Without xargs, this is a pretty common way to do such a purge, in this case of all files older than 31 days:
find /path/to/junk/files -type f -mtime +31 -exec rm -f {} \;
But that executes rm once for every single file to be removed, adding substantial overhead just to fork and exec rm so many times. Even on modern operating systems with efficient fork implementations, this can easily increase the I/O, load, and runtime tenfold or more compared to running a single rm command with many file arguments.
Instead do this:
find /path/to/junk/files -type f -mtime +31 -print0 | xargs -0 -r rm -f
That runs rm once per batch of file names, with each batch as long as the system allows, so the fork and exec overhead is incurred only rarely and the job can spend most of its effort actually unlinking files. (The xargs -r option says not to run the command at all if there is no input.)
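One way to see the difference without timing anything is to count invocations, using echo as a stand-in for rm. A rough illustration (bash, in a throwaway directory):

# Create a throwaway directory full of empty files (bash brace expansion).
mkdir /tmp/xargs-demo && cd /tmp/xargs-demo
touch file{1..5000}

# -exec ... \; runs the command once per file: 5000 lines, 5000 invocations.
find . -type f -exec echo {} \; | wc -l

# xargs batches arguments, so echo runs only a handful of times,
# printing one long line per invocation.
find . -type f -print0 | xargs -0 -r echo | wc -l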
How long can the argument list to xargs be? It depends on the system, but xargs --show-limits will tell us. Here’s output from a RHEL 5 x86_64 system (using findutils 4.2.27):
% xargs --show-limits
Your environment …
hosting optimization
PostgreSQL: per-version .psqlrc
File this under “you learn something new every day.” I came across this little tidbit while browsing the source code for psql: you can have a per-version .psqlrc file which will be executed only by the psql associated with that version. Just name the file .psqlrc-$version, substituting psql’s version for the $version token. So for PostgreSQL 8.4.4, psql would look for a file named .psqlrc-8.4.4 in your $HOME directory.
It’s worth noting that the version-specific .psqlrc file requires the full version, including the minor release, so you cannot currently define (say) an 8.4-only file which applies to all 8.4.x psqls. I don’t know whether this feature gets enough mileage to make such a modification worth it, but it would be easy enough to symlink each full-version .psqlrc-$version name to a shared per-major-version file, as sketched below.
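For example (hypothetical file names, assuming you keep the shared settings in .psqlrc-8.4):

# Point the full-version names psql actually looks for at one shared file.
ln -s ~/.psqlrc-8.4 ~/.psqlrc-8.4.4
ln -s ~/.psqlrc-8.4 ~/.psqlrc-8.4.5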
This seems of most interest to developers, who may simultaneously run many versions of psql with incompatible settings, but it could come in handy for regular users as well.
postgres
PostgreSQL: Dynamic SQL Function
Sometimes when you’re working in SQL, you find yourself doing something repetitive, which naturally leads to the desire to abstract out the boring parts. This pattern is prevalent in maintenance-related tasks, such as creating or otherwise modifying DDL in a systematic way. If you’ve ever thought, “Hey, I could write a query to handle this,” then you’re probably looking for dynamic SQL.
The standard approach to dynamic SQL in PostgreSQL is PL/pgSQL’s EXECUTE statement, which takes a text argument as the SQL statement to execute. One technique fairly well known on the #postgresql IRC channel is to create a function which essentially wraps the EXECUTE statement, commonly known as exec(). Here is the definition of exec():
CREATE FUNCTION exec(text) RETURNS text AS $$
BEGIN
    EXECUTE $1;
    RETURN $1;
END
$$ LANGUAGE plpgsql;
Using exec() then takes the form of a SELECT query with the appropriately generated query passed as the sole argument. The function returns the generated query text, which makes it easy to audit what was actually executed. Some examples:
SELECT exec('CREATE TABLE partition_' || generate_series(1,100) || ' (LIKE …
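A complete query in the same vein might drop such a series of tables in one statement (illustrative only; the partition_ tables are hypothetical):

-- Drop hypothetical tables partition_1 through partition_100 in one query.
SELECT exec('DROP TABLE IF EXISTS partition_' || n)
  FROM generate_series(1, 100) AS g(n);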
postgres
Localize $@ in DESTROY
I have been conditioned now for many years in Perl to trust the relationship of $@ to its preceding eval. The relationship goes something like this: if you have string or block eval, immediately after its execution, $@ will either be false or it will contain the die message of that eval (or the generic “Died at …” message if none is provided). Implicit here is that evals contained within an eval have their effects on $@ concealed, unless the containing eval “passes on” the inner eval’s die.
To quickly demonstrate:
use strict;
use warnings;

eval {
    print "Some stuff\n";
    eval {
        die 'Oops. Bad inner eval';
    };
    printf '$@ in outer eval: %s', $@;
};
printf '$@ after outer eval: %s', $@;
print $/;
produces the following output:
[mark@sokt ~]$ perl demo.pl
Some stuff
$@ in outer eval: Oops. Bad inner eval at demo.pl line 7.
$@ after outer eval:
[mark@sokt ~]$
Only if the containing eval itself dies do we find any data in $@:
use strict;
use warnings;

eval {
    print "Some stuff\n";
    eval {
        die 'Oops. Bad inner eval';
    };
    printf '$@ in outer eval: %s', $@;
    die 'Uh oh. Bad …
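Where this well-defined relationship breaks down, and what the title refers to, is object destruction: if an object’s DESTROY method fires while a die is propagating (for example, when a lexically scoped object goes out of scope during stack unwinding) and that destructor runs an eval of its own, the inner eval resets $@ and the original error is silently lost. Localizing $@ inside the destructor prevents this. A minimal sketch, with a hypothetical Widget class:

package Widget;
sub new { bless {}, shift }

sub DESTROY {
    # Without this, the eval below would clobber whatever error was
    # propagating when this object got reaped during unwinding.
    local $@;
    eval { warn "cleanup that might die\n"; };
}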
perl tips
PostgreSQL: Migration Support Checklist
A database migration (be it from some other database to PostgreSQL, or even from an older version of PostgreSQL to a nice shiny new one) can be a complicated procedure with many details and many moving parts. I’ve found it helpful to construct a list of questions to make sure you’re considering all aspects of the migration and to gauge the scope of what will be involved. This list includes questions we ask our clients; feel free to contribute your own additional considerations or suggestions.
Technical questions:
- Database servers: How many database servers do you have? For each, what are the basic system specifications (OS, CPU architecture, 32- vs 64-bit, RAM, disk, etc.)? What kind of storage are you using for the existing database, and what do you plan to use for the new database? Direct-attached storage (SAS, SATA, etc.), SAN (what vendor?), or other? Do you use any configuration management system such as Puppet, Chef, etc.?
- Application servers and other remote access: How many application servers do you have? For each, what are the basic system specifications (OS, CPU architecture, 32- vs 64-bit, RAM, disk, etc.)? Do you use any configuration management system …
database postgres scalability
Spree: Working with Sample Product Data
It’s taken me some time to gain a good understanding of working with Spree sample data and fixtures, but now that I’m comfortable with them, I thought I’d share some details. The first thing you might wonder is why you should even care about sample data. Well, in our project, we had a few motivations for creating it:
- Multiple developers: consistent sample data provides a common starting point during development. End Point offers SpreeCamps, a hosting solution that combines the open source Spree technology with devcamps to allow multiple development and staging instances of a Spree application. In a recent project, we had two developers working on different aspects of the custom application in SpreeCamps; creating meaningful sample data allowed each developer to work from the same data starting point.
- Unit testing. Another important element of our project was adding unit tests for our custom functionality. Consistent sample data gave us the ability to test individual methods and behaviors with confidence.
- Application testing. In addition to unit testing, sample data makes it possible to efficiently test the application repeatedly with a fresh data set. …
ecommerce rails spree
Why is my load average so high?
One of the most common ways people notice there’s a problem with their server is when Nagios, or some other monitoring tool, starts complaining about a high load average. Unfortunately this complaint carries very little information about what might be causing the problem. But there are ways around that. On Linux, where I spend most of my time, the load average represents the average number of processes in either the “run” or “uninterruptible sleep” states. This code snippet will display all such processes, including their process ID and parent process ID, current state, and command line:
#!/bin/sh
# List every process in the "run" (R) or "uninterruptible sleep" (D)
# state -- the processes that count toward the Linux load average.
ps -eo pid,ppid,state,cmd | \
    awk '$3 ~ /[RD]/ { print $0 }'
Most of the time, this script has simply confirmed what I already anticipated, such as, “PostgreSQL is trying to service 20 times as many simultaneous queries as normal.” On occasion, however, it’s very useful, such as when it points out that a backup job is running far longer than normal, or when it finds lots of “[pdflush]” operations in progress, indicating that the system was working overtime to write dirty pages to disk. I hope it can be similarly …
environment hosting monitoring optimization
Spree vs. Magento: Feature List Revisited
A little over a month ago, I wrote an article on Spree vs. Magento Features. Recently, a client asked me to describe the features mentioned in that article. I thought this was another great opportunity to expand on my response to the client. So, here I am, revisiting ecommerce features in Spree and Magento. The original article can be referenced to compare availability of these features in Spree and Magento.
Features on a Single Product or Group of Products
- Product reviews and/or ratings: functionality to allow customers to review and rate products. See a Backcountry.com product page for an example.
- Product QnA: functionality to allow customers to ask and answer questions on products. See a Backcountry.com product page for an example.
- Product SEO (URL, title, meta data control): functionality to allow site administrators to manage product URLs, product page titles, and product meta data.
- Advanced/flexible taxonomy: functionality to build a custom taxonomy / navigation structure for product browsing; for example, building multiple categories and subcategories with a complex hierarchy. The taxonomy at Spree’s demo includes two top-level taxonomies, Brand and Category, each with its own subcategories.
- SEO …
ecommerce rails spree cms magento localization