Parallel Inventory Access using PostgreSQL
Inventory management has a number of challenges. One of the more vexing issues I’ve dealt with is forced serial access. We have a product with X items in inventory. We also have multiple concurrent transactions vying for that inventory. Under normal circumstances, whether the count is a simple scalar or consists of any number of records up to one record per unit of quantity, the concurrent transactions will all home in on the same record or set of records. In doing so, all transactions must wait and take their inventory serially, even when that serialization isn’t actually needed.
If inventory is a scalar value, we don’t have much hope of circumventing the problem. And, in fact, we wouldn’t want to under that scenario because each transaction must reflect the part of the whole it consumed so that the next transaction knows how much is left to work with.
However, if we represent inventory with one record = one quantity, we aren’t forced to serialize in the same way. If we have multiple concurrent transactions vying for inventory, and the sum of their needs is less than the quantity available, why must the transactions wait at all? They would normally line up serially because, no …
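To make the one record = one quantity idea concrete, here is a minimal sketch of how concurrent transactions could each claim distinct rows without queueing behind one another. This is not necessarily the approach the post goes on to describe; the table, the column names, and the use of advisory locks are illustrative assumptions:

-- Hypothetical schema: one row per unit of inventory on hand.
CREATE TABLE inventory_item (
    item_id    serial  PRIMARY KEY,
    product_id integer NOT NULL
);

-- Each transaction tries to claim up to 3 units of product 42.
-- pg_try_advisory_lock() returns false for ids another session has
-- already locked, so concurrent claimants skip past each other instead
-- of waiting on the same rows.
SELECT item_id
FROM inventory_item
WHERE product_id = 42
  AND pg_try_advisory_lock(item_id)
LIMIT 3;

-- The advisory locks are session-level; release them (for example with
-- pg_advisory_unlock_all()) once the claimed rows have been deleted or
-- marked as sold.

A real implementation would need care about when the locking function is evaluated and when the locks are released, but the principle is the same: two claimants whose combined need fits within the available stock never have to wait on each other.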
postgres ecommerce
Why is my function slow?
I often hear people ask “Why is my function so slow? The query runs fast when I run it from the command line!” The answer lies in the fact that a function’s query plans are cached by Postgres, and the plan derived inside the function is not always the same as the one shown by EXPLAIN from the command line. To illustrate the difference, I downloaded the pagila test database. To show the problem, we’ll need a table with a lot of rows, so I used the largest table, rental, which has the following structure:
pagila# \d rental
                           Table "public.rental"
    Column    |   Type    |                     Modifiers
--------------+-----------+---------------------------------------------------
 rental_id    | integer   | not null default nextval('rental_rental_id_seq')
 rental_date  | timestamp | not null
 inventory_id | integer   | not null
 customer_id  | smallint  | not null
 return_date  | timestamp |
 staff_id     | smallint  | not null
 last_update  | timestamp | not null default now()
Indexes:
    "rental_pkey" PRIMARY KEY (rental_id)
    "idx_unq_rental" UNIQUE (rental_date, inventory_id, customer_id)
    "idx_fk_inventory_id" …

postgres
Best practices for cron
Cron is a wonderful tool, and a standard part of every sysadmin’s toolkit. Not only does it allow for precise timing of unattended events, but it has a straightforward syntax, and by default it emails all output. What follows are some best practices for writing crontabs that I’ve learned over the years. In the following discussion, “cron” means the program itself, “crontab” means the file changed by “crontab -e”, and “entry” means a single timed action specified inside the crontab file. Cron best practices:
Version control
This rule is number one for a reason. Always version control everything you do. It provides an instant backup, accountability, easy rollbacks, and a history. Keeping your crontabs in version control is slightly more work than for normal files, but all you have to do is pick a standard place for the file, then export it with crontab -l > crontab.postgres.txt. I prefer RCS for quick little version control jobs like this: no setup required, and everything is in one place. Just run ci -l crontab.postgres.txt and you are done. The name of the file should be something like the example shown, indicating what it is (a crontab file), which one it is (belongs to the user …
sysadmin
Creating a PL/Perl RPM linked against a custom Perl build (updated)
I recently needed to refer to a post I made on March 7, 2007, showing how to build a PL/Perl RPM linked against a custom Perl build. A few things have changed since that time, so I’ve reworked it here, updated for local Perl 5.10.0 built into RPMs:
We sometimes have to install a custom Perl build without thread support and with specific newer and/or older versions of CPAN modules, and we don’t want to affect the standard distribution Perl that lives in /usr/bin/perl and /usr/lib/perl5. We use standard PGDG RPMs to install PostgreSQL. We also use PL/Perl, and we want PL/Perl to link against our custom Perl build in /usr/local/bin and /usr/local/lib/perl5.
It’s easy to achieve this with a small patch to the source RPM spec file:
--- postgresql-8.3.spec 2008-10-31 17:34:34.000000000 +0000
+++ postgresql-8.3.custom.spec 2008-11-30 02:10:09.000000000 +0000
@@ -315,6 +315,7 @@
CFLAGS=`echo $CFLAGS|xargs -n 1|grep -v ffast-math|xargs -n 100`
export LIBNAME=%{_lib}
+export PATH=/usr/local/bin:$PATH
%configure --disable-rpath \
%if %beta
--enable-debug \
@@ -322,6 +323,7 @@
%endif
%if %plperl
--with-perl \
+ …

perl redhat sysadmin postgres
Multiple reverse DNS pointers per IP address
I recently ran across an IP address that had two PTR (reverse DNS) records in DNS. I’ve always thought that each IP address is limited to only a single PTR record, and I’ve seen this rule enforced by many ISPs, but I don’t remember ever seeing it conclusively stated.
I was going to note the problem to the responsible person but thought it’d be good to test my assumption first. Lo and behold, it’s not true. The Wikipedia “Reverse DNS lookup” page and a source it cites, an IETF draft on reverse DNS, note that multiple PTR records per IP address have always been allowed.
There is apparently plenty of software out there that can’t properly deal with more than one PTR record per IP address, and with too many PTR records, a DNS query response will no longer fit inside a single UDP packet, forcing a TCP response instead, which can cause trouble of its own. And as I noted, many ISPs won’t allow more than one PTR record, so in those cases it’s an academic question.
But it’s not invalid, and I saved myself and someone else a bit of wasted time by doing a quick bit of research. It was a good reminder of the value of checking assumptions.
networking
OpenSQL Camp 2008
I attended OpenSQL Camp last weekend, which ran from Friday night to Sunday, November 14-16. This was the first “unconference” I had been to, and Baron Schwartz did a great job pulling it all together. I drove down with Bruce Momjian, who said that this was the first cross-database conference of any kind since at least the year 2000.
The conference was slated to start at 6 pm, and Bruce and I arrived at our hotel a few minutes before then. Our hotel was at one end of the Charlottesville Downtown Mall, and the conference was at the other end, so we got a quick walking tour of the mall. Seems like a great place—lots of shops, people walking, temporary booths set out, outdoor seating for the restaurants. It reminded me a lot of Las Ramblas, but without the “human statue” performance artists. Having a hotel within walking distance of a conference is a big plus in my book, and I’ll go out of my way to find one.
The first night was simply mingling with other people and designing the next day’s sessions. There was a grid of talk slots on a wall, with large sticky notes stuck to some of them to indicate already-scheduled sessions. Next to the grid were two sections, where people …
conference database postgres mysql
Google Sponsored AFS Hack-A-Thon
Day One:
Woke up an hour early, due to some confusion about the start time (the initial email was a bit optimistic about what time AFS developers wanted to wake up for the conference).
Met up with Mike Meffie (an AFS developer from Sine Nomine) and got a shuttle from the hotel to the ‘Visitors Lobby’, only to find out that each building has a visitors lobby. One neat thing: Google provides free bikes (beach cruisers) to anyone who needs them. According to the receptionist, any bike that isn’t locked down is considered public property at Google. However, it’s hard to pedal a bike and hold a briefcase, so off we went, hiking several blocks to the correct building. Mike was smart enough to use a backpack, but hiked with me regardless.
The food was quite good, a reasonably healthy breakfast including fresh fruit (very ripe kiwi, and a good assortment). The coffee was decent as well! After much discussion, it was decided that Mike & I would work towards migrating the community CVS repository over to Git. Because Git sees the world as ‘patch sets’ instead of just individual file changes, migrating it from a view of the ‘Deltas’ makes the most sense. The new Git repo. …
openafs git community conference
10,000 databases on a PostgreSQL cluster
One of our clients wanted confirmation that PostgreSQL will have no problem handling 1000 or 2000 databases on a single database cluster. I remember testing some years ago, probably on Postgres 7.2 or 7.3, creating 1000 or so databases and finding that it worked fine. But that was a long time ago, the software has changed, and I thought I should make sure my old experiment results still stand.
There’s a PostgreSQL FAQ question, “What is the maximum size for a row, a table, and a database?”, but no mention of the maximum number (or, more importantly, the maximum practical number) of databases per cluster. So I threw together a test script to create 10,000 databases, each with (randomly) 1-5 tables of 2 columns each (INTEGER and TEXT), with each table getting randomly 1-10 inserts of random data, up to 100 or so characters in the TEXT field.
I ran the test on PostgreSQL 8.1, the default that ships with Red Hat Enterprise Linux 5 (x86_64). The hardware was a desktop-class HP with an Intel Core 2 @ 1.86 GHz that wasn’t always idle.
The short answer: Postgres 8.1 handles 10,000 databases just fine. \l in psql generates a long list of databases, of course, but returns quickly enough. …
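The test script itself isn’t shown above, but the bulk database creation can be sketched from psql alone; the database names here are made up. The idea is to generate the CREATE DATABASE statements with generate_series, write them to a file, and feed that file back to psql:

-- Emit one CREATE DATABASE statement per line (tuples only, unaligned,
-- redirected to a file), then execute the generated script.
\t
\a
\o /tmp/create_dbs.sql
SELECT 'CREATE DATABASE testdb_' || n || ';'
FROM generate_series(1, 10000) AS n;
\o
\t
\a
\i /tmp/create_dbs.sql

The random tables and inserts inside each new database would need a similar driver loop on top of this, which is omitted here.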
database postgres