PostgreSQL template databases to restore to a known state
Someone asked on the mailing lists recently about restoring a PostgreSQL database to a known state for testing purposes. How to do this depends a little bit on what one means by “known state”, so let’s explore a few scenarios and their solutions.
First, let’s assume you have a Postgres cluster with one or more databases that you create for developers or QA people to mess around with. At some point, you want to “reset” the database to the pristine state it was in before people started making changes to it.
The first situation is that people have made both DDL changes (such as ALTER TABLE … ADD COLUMN) and DML changes (such as INSERT/UPDATE/DELETE). In this case, what you want is a complete snapshot of the database at a point in time, which you can then restore from. The easiest way to do this is to use the TEMPLATE feature of the CREATE DATABASE command.
Every time you run CREATE DATABASE, it uses an already existing database as the “template”. Basically, it creates a copy of the template database you specify. If no template is specified, it uses “template1” by default, so that these two commands are equivalent:
CREATE DATABASE foobar;
CREATE DATABASE foobar TEMPLATE template1;
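With that in mind, the “reset” workflow is simply to stash a pristine copy of the database and recreate the working copy from it on demand. Here’s a minimal sketch, with illustrative database names (note that CREATE DATABASE requires that nobody else be connected to the template database):
-- One-time setup: snapshot the pristine database
CREATE DATABASE qa_snapshot TEMPLATE qa_db;
-- Later, to reset: disconnect all sessions from qa_db, then
DROP DATABASE qa_db;
CREATE DATABASE qa_db TEMPLATE qa_snapshot;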
database postgres testing
Using PostgreSQL Hooks
PostgreSQL is well known for its extensibility; users can build new functions, operators, data types, and procedural languages, among others, without having to modify the core PostgreSQL code. Less well known is PostgreSQL’s extensive set of “hooks”, available to the more persistent coder. These hooks allow users to interrupt and modify behavior in all kinds of places without having to rebuild PostgreSQL.
Few if any of these hooks appear in the documentation, mostly because the code documents them quite well, and anyone wanting to use them is assumed already to be sufficiently familiar with the code to find the information they’d need to use one. For those interested in getting started using hooks, though, an example can be useful. Fortunately, the contrib source provides one, in the form of passwordcheck, a simple contrib module that checks users’ passwords for sufficient strength. These checks include having a length greater than 8 characters, being distinct from the username, and containing both alphabetic and non-alphabetic characters. It can also use CrackLib for more intense password testing, if built against the CrackLib code.
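To see the hook in action, here’s a sketch of enabling passwordcheck and what a failure looks like; the role name and password are made up, and the module has to be loaded at server start:
# in postgresql.conf (requires a restart, so the module can install its hook)
shared_preload_libraries = 'passwordcheck'
-- then, in psql, a weak plaintext password is rejected:
ALTER USER alice PASSWORD 'short';
ERROR:  password is too short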
In general, these hooks consist of global …
postgres
Tail_n_Mail does Windows (log file monitoring)
I’ve just released version 1.10.1 of tail_n_mail.pl, the handy script for watching over your Postgres logs and sending email when interesting things happen.
Much of the recent work on tail_n_mail has been in improving the parsing of statements in order to normalize them and give reports like this:
[1] From files A to Q Count: 839
First: [A] 2010-05-08T05:10:46-05:00 alpha postgres[13567]
Last: [Q] 2010-05-09T05:02:27-05:00 bravo postgres[19334]
ERROR: duplicate key violates unique constraint "unique_email_address"
STATEMENT: INSERT INTO email_table (id, email, request, token) VALUES (?)
[2] From files C to E (between lines 12523 of A and 268431 of B, occurs 6159 times)
First: [C] 2010-05-04 16:32:23 UTC [22504]
Last: [E] 2010-05-05 05:04:53 UTC [23907]
ERROR: invalid byte sequence for encoding "UTF8": 0x????
HINT: This error can also happen if the byte sequence does not
match the encoding expected by the server, which is controlled
by "client_encoding".
## The above examples are from two separate instances, the first
## of which has the "find_line_number" option turned off
However, I’ve only ever used tail_n_mail on Linux-like systems, so it …
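For reference, tail_n_mail is driven by a small text config file. Here’s a minimal sketch with made-up paths and addresses; the directive names follow the tail_n_mail documentation, but check the docs for the exact set supported by your version:
## Config file for the tail_n_mail program
EMAIL: dba@example.com
MAILSUBJECT: HOST Postgres errors (FILE)
FILE: /var/log/postgresql/postgres-%Y-%m-%d.log
INCLUDE: ERROR:
INCLUDE: FATAL:
INCLUDE: PANIC: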
database monitoring postgres
Cassandra, Thrift, and Fibers in EventMachine
I’ve been working with Cassandra and EventMachine lately, in an attempt to maximize write throughput for bulk loading situations (and I would prefer to not abandon the pretty Ruby classes I have fronting Cassandra, hence EventMachine rather than hopping over to Java or Scala).
The Thrift client transport for EventMachine requires the use of fibers. The documentation available for how fibers and EventMachine interact is not all that clear just yet, so perhaps documenting my adventures will be of use to somebody else.
A single fiber is traditionally imperative
EventMachine handles the I/O in the background, but your code interacts with the I/O interface as if it were a traditional blocking operation.
#!/usr/bin/env ruby
require 'eventmachine'
require 'thrift_client'
require 'thrift_client/event_machine'
require 'cassandra'
def get_client
  Cassandra.new('Keyspace1',
                '127.0.0.1:9160',
                :transport_wrapper => nil,
                :transport => Thrift::EventMachineTransport)
end
def write(client, key, hash)
  puts "Writing #{key}."
  client.insert('Standard1', key, hash)
end
ruby scalability tips
PostgreSQL startup Debian logging failure
I recently ran into trouble debugging why a fresh PostgreSQL replica wasn’t starting on Debian. This was with a highly customized postgresql.conf file, with a custom logging location, data_directory, and other settings.
The system log files showed no information from the failed pg_ctlcluster invocation, nor was there anything in /var/log/postgresql/ or the configured log_directory.
I was able to successfully create a new cluster with pg_createcluster and see logs for that new cluster in /var/log/postgresql/. The utility pg_lsclusters showed both clusters, but the initial cluster was still down, listed with its custom log location. After reviewing the Debian wrapper scripts (fortunately written in Perl), I disabled log_filename, log_directory, and logging_collector, leaving log_destination = stderr. I was then finally able to get log output spit out to the terminal.
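In other words, the debugging configuration boiled down to something like this sketch, where the commented-out lines stand in for whatever custom settings were in the way:
# postgresql.conf: temporarily simplify logging so startup errors reach the terminal
logging_collector = off
log_destination = 'stderr'
#log_directory = '...'    # disabled while debugging
#log_filename = '...'     # disabled while debugging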
In this case, it was due to a fresh Amazon EC2 instance lacking appropriate sysctl.conf settings for kernel.shmmax and kernel.shmall. This particular error occurred before the logging was fully set up, which is why we did not get logging information in the postgresql.conf-designated location.
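For the record, the eventual fix was along these lines; the exact values are illustrative and should be sized to your shared_buffers and available RAM:
# /etc/sysctl.conf: raise the shared memory limits for PostgreSQL
kernel.shmmax = 4294967296    # largest single shared memory segment, in bytes
kernel.shmall = 1048576       # total shared memory allowed, in 4 kB pages
Then apply the new settings with sysctl -p (or a reboot).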
Once I had the …
postgres
Tickle me Postgres: Tcl inside PostgreSQL with pl/tcl and pl/tclu
Although I really love Pl/Perl and find it the most useful language to write PostgreSQL functions in, Postgres has had (for a long time) another set of procedural languages: Pl/Tcl and Pl/TclU. The Tcl language is pronounced “tickle”, so those two languages are pronounced as “pee-el-tickle” and “pee-el-tickle-you”. The pl/tcl languages have been around since before any others, even pl/perl; for a long time in the early days of Postgres using pl/tclu was the only way to do things “outside of the database”, such as making system calls, writing files, sending email, etc.
Sometimes people are surprised when they hear I still use Tcl. Although it’s not as widely mentioned as other procedural languages, it’s a very clean, easy to read, powerful language that shouldn’t be overlooked. Of course, with Postgres, you have a wide variety of languages to write your functions in.
The nice thing about Tcl is that not only is it an easy language to write in, it’s fully supported by Postgres. Only three languages are maintained inside the Postgres tree itself: Perl, Tcl, …
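If you’ve never seen it, here’s a quick sketch of a Pl/Tcl function, adapted from the style of the Postgres docs (the function itself is just an illustration):
-- assumes the pltcl language is installed in this database (CREATE LANGUAGE pltcl;)
CREATE FUNCTION tcl_max(integer, integer) RETURNS integer AS $$
    # arguments arrive as $1, $2, ...
    if {$1 > $2} {return $1}
    return $2
$$ LANGUAGE pltcl;

SELECT tcl_max(3, 7);  -- returns 7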
database postgres
LinuxFest Northwest: PostgreSQL 9.0 upcoming features
Once again, LinuxFest Northwest provided a full track of PostgreSQL talks during their two-day conference in Bellingham, WA.
Gabrielle Roth and I presented our favorite features in 9.0, including a live demo of Hot Standby with streaming replication! We also demonstrated features like:
- the newly improved ‘set storage MAIN’ behavior (TOAST related)
- ‘samehost’ and ‘samenet’ designations to pg_hba.conf (see CIDR-address section)
- Log changed parameter values when postgresql.conf is reloaded
- Allow EXPLAIN output in XML, JSON, and YAML formats, which our own Greg Sabino Mullane worked on! (see the quick example after this list)
- Allow NOTIFY to pass an optional string to listeners
- And of course—Hot Standby and Streaming Replication
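For a quick taste, here’s roughly what two of those features look like in a 9.0 session (the channel name is made up):
-- EXPLAIN output in one of the new formats:
EXPLAIN (FORMAT JSON) SELECT * FROM pg_class;
-- NOTIFY with an optional payload string for listeners:
NOTIFY testchannel, 'the optional payload string';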
The full feature list is available on the developer site right now!
postgres
Viewing Postgres function progress from the outside
Getting visibility into what your PostgreSQL function is doing can be a difficult task. While you can sprinkle notices inside your code, for example with the RAISE feature of plpgsql, that only shows the notices to the session that is currently running the function. Let’s look at a solution to peek inside a long-running function from any session.
While there are a few ways to do this, one of the most elegant is to use Postgres sequences, which have the unique property of living “outside” the normal MVCC visibility rules. We’ll abuse this feature to allow the function to update its status as it goes along.
First, let’s create a simple example function that simulates doing a lot of work, and taking a long time to do so. The function doesn’t really do anything, of course, so we’ll throw some random sleeps in to emulate the effects of running on a busy production machine. Here’s what the first version looks like:
DROP FUNCTION IF EXISTS slowfunc();
CREATE FUNCTION slowfunc()
RETURNS TEXT
VOLATILE
SECURITY DEFINER
LANGUAGE plpgsql
AS $BC$
DECLARE
x INT = 1;
mynumber INT;
BEGIN
RAISE NOTICE 'Start of function';
WHILE x <= 5 LOOP
-- Random number from 1 to 10
mynumber := 1 + trunc(random() * 10)::int;
PERFORM pg_sleep(mynumber);
RAISE NOTICE 'Finished step % of 5', x;
x := x + 1;
END LOOP;
RETURN 'End of function';
END;
$BC$;
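As a preview of where this is going: because sequence state bypasses MVCC, a sequence updated inside the function can be read from any other session while the function is still running. A minimal sketch, with illustrative names:
CREATE SEQUENCE slowfunc_progress;

-- Inside the loop in slowfunc(), record progress as each step finishes:
--   PERFORM setval('slowfunc_progress', x);

-- Meanwhile, from any other session:
SELECT last_value FROM slowfunc_progress;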
database postgres