A few PostgreSQL tricks
We ran into a couple of interesting situations recently, and used some helpful tricks to solve them, which of course should be recorded for posterity.
Unlogged tables
One of our customers needed a new database, created as a copy of an existing one but with some data obscured for privacy reasons. We could have done this with a view or probably any of several other techniques, but in this case, given the surrounding infrastructure, a new database, refreshed regularly from the original, was the simplest method. Except that with this new database, the regular binary backups became too large for the backup volume in the system. Since it seemed silly to re-provision the backup system (and talk the client into paying for it) to accommodate data we could throw away and recalculate at any time, we chose unlogged tables as an alternative.
“Unlogged,” in this case, means changes to the table aren’t written to the write-ahead log (WAL). This improves performance, but it also means that if the database crashes, these tables can’t be recovered in the usual way. As a side effect, it also means these tables aren’t copied via WAL-based replication, so the table won’t show up on a hot standby system, for instance, …
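As a minimal sketch of what this looks like (the table and column names here are hypothetical, not from the customer’s schema):

```sql
-- An unlogged table is created with one extra keyword:
CREATE UNLOGGED TABLE customer_scrubbed (
    id    integer PRIMARY KEY,
    email text    -- obfuscated copy of the original column
);

-- In newer Postgres versions (9.5+), an existing table can also be
-- switched in place:
ALTER TABLE customer_scrubbed SET UNLOGGED;
ALTER TABLE customer_scrubbed SET LOGGED;
```

Since the data can be regenerated from the source database at any time, losing it on a crash is an acceptable trade-off here.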
postgres
Shrink XFS partition: Almost possible with LVM
If you happen to have reached this page because you’re trying to shrink an XFS filesystem, let’s put things straight: sorry, that’s not possible.
But before you go away, you should know there’s still hope. I’d like to show you the little workaround I used to avoid reinstalling a Red Hat Enterprise Linux 7 or CentOS 7 VM, using on-the-fly XFS dump/restore and LVM capabilities, both standard choices for a regular RHEL/CentOS 7 setup.
First of all, let’s clarify the situation I found myself in. For various reasons I had a CentOS 7 VM with everything already configured and working, installed only a few days earlier to test new software we’re evaluating.
The VM itself is hosted on a dedicated server we manage on our own, so I had a certain degree of freedom in what I could do without paying any additional fees. You may not be in the same situation, but you can probably try a similar solution for little money if you’re using an “hourly-billed” VPS provider.
The problem was that, even if everything was working and configured, the virtual hard disk device attached to the machine was too big and on the wrong storage area of the virtualization hosting server.
There was also …
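The general shape of the dump/restore workaround, since XFS can grow but never shrink, looks roughly like this (a sketch only: the volume group, logical volume names, sizes, and mount points below are placeholders, and the whole sequence needs root and downtime for the filesystem involved):

```bash
# 1. Create a new, smaller logical volume and format it as XFS
lvcreate -L 20G -n srv_new vg0
mkfs.xfs /dev/vg0/srv_new

# 2. Mount it and copy the data across with xfsdump piped to xfsrestore
#    (-J suppresses the dump/restore inventory)
mkdir -p /mnt/srv_new
mount /dev/vg0/srv_new /mnt/srv_new
xfsdump -J - /srv | xfsrestore -J - /mnt/srv_new

# 3. Swap the volumes, then remove the old oversized one
#    once everything checks out
umount /srv /mnt/srv_new
lvrename vg0 srv srv_old
lvrename vg0 srv_new srv
mount /dev/vg0/srv /srv
lvremove vg0/srv_old
```

The freed extents can then be reclaimed by the volume group for other uses.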
redhat linux storage sysadmin virtualization
Getting realtime output using Python Subprocess
The Problem
When I launch a long-running Unix process within a Python script, it waits until the process is finished, and only then do I get the complete output of my program. This is annoying when the process takes a while to finish, and I want to capture the output and display it nicely, with clear formatting.
Using the subprocess and shlex library
Python has a “batteries included” philosophy, and I used two of its standard libraries to solve this problem.
import subprocess
import shlex

- subprocess—Works with additional processes.
- shlex—Lexical analysis of shell-style syntaxes.
subprocess.Popen
To run a process and read all of its output, set the stdout value to PIPE and call communicate().
import subprocess
process = subprocess.Popen(['echo', '"Hello stdout"'], stdout=subprocess.PIPE)
stdout = process.communicate()[0]
print 'STDOUT:{}'.format(stdout)

The above script waits for the process to complete and only then displays its output. So next we are going to read stdout line by line and display it in the console until the process completes.
output = process.stdout.readline()

This will read a line from the stdout. …
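Putting the pieces together, a loop like the following streams each line as it arrives. This is a sketch written for Python 3 (the snippets above target Python 2), and the helper name is my own:

```python
import shlex
import subprocess

def run_with_live_output(command):
    """Run a command and yield its stdout one line at a time."""
    process = subprocess.Popen(
        shlex.split(command),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,  # decode bytes to str as we read
    )
    # readline() returns '' only at EOF, so iterate until then
    for line in iter(process.stdout.readline, ''):
        yield line.rstrip('\n')
    process.stdout.close()
    process.wait()

if __name__ == '__main__':
    for line in run_with_live_output('echo "Hello stdout"'):
        print(line)
```

Because the function is a generator, the caller can format or log each line the moment it is produced instead of waiting for the process to exit.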
shell python
Postgres session_replication_role - Bucardo and Slony’s powerful ally
One of the lesser known Postgres parameters is also one of the most powerful: session_replication_role. In a nutshell, it allows you to completely bypass all triggers and rules for the duration of a session. It was invented to let replication systems bypass foreign keys and user triggers, but it can also be used to greatly speed up bulk loading and updating.
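As a sketch of how a bulk load might take advantage of it (the table name and file path here are hypothetical, and setting the parameter to replica requires superuser privileges):

```sql
BEGIN;
-- While set to 'replica', ordinary user triggers and the internal
-- foreign-key enforcement triggers no longer fire:
SET session_replication_role = 'replica';

COPY big_table FROM '/tmp/big_table.csv' WITH (FORMAT csv);

-- Restore normal trigger behavior before committing:
SET session_replication_role = 'origin';
COMMIT;
```

Triggers explicitly marked ENABLE ALWAYS or ENABLE REPLICA still fire, which is exactly the escape hatch replication systems rely on.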
The problem with disabling triggers
Once upon a time, there were two replication systems, Slony and Bucardo, that shared the same problem: triggers (and rules) on a “target” table could really mess things up. In general, when you are replicating table information, you only want to replicate the data itself, and avoid any side effects. In other words, you need to prevent any “post-processing” of the data, which is what rules and triggers may do. The disabling of those was done in a fairly standard, but very ugly, way: updating the system catalogs for the tables in question to trick Postgres into thinking that there were no rules or triggers. Here’s what such SQL looks like in the Bucardo source code:
$SQL = q{
UPDATE pg_class
SET …

bucardo postgres replication
Social Innovation Summit 2014
In November, Josh Ausborne and I set up a Liquid Galaxy at the Sofitel Hotel in Redwood City for the 2014 Social Innovation Summit. Prior to the event the End Point content team worked together with Chris Busselle and Sasha Buscho from Google.org to create presentations featuring 10 grantee organizations.
With the Liquid Galaxy we were able to feature “Street View” panoramas of people enjoying the High Line in New York City, penguins standing on the shoreline for Penguin Foundation, and seals swimming underwater for Conservation International. The touchscreen and space navigator control device enabled users to view 360 degrees of the landscape as if they had been teleported to each location.
image credit: Jensen Sutta
I was thrilled to see the Google.org team in front of the system sharing the larger narrative associated with each project. This highlights one of the many strengths of the Liquid Galaxy: the opportunity to share, explore, and collaborate in real time, in front of an immersive array of screens.
image credit: Chris Busselle
One of the prepared presentations highlighted a data collaboration with Polaris to fight human trafficking. With Google Earth, Chris …
conference visionport
Elastic Beanstalk Whenever
I recently got the opportunity to pick up development on a Ruby on Rails application that was originally set up to run on AWS using their Elastic Beanstalk deployment tools. One of our first tasks was to move some notification hooks out of the normal workflow into scripts and schedule those batch scripts using cron.
Historically, I’ve had extremely good luck with Whenever. In my previous endeavors I’ve used Capistrano, with which Whenever integrated seamlessly. Given how simple it was to integrate Whenever with Capistrano, I anticipated a similar experience with Elastic Beanstalk. While the integration was not as seamless, I did manage to make it work.
My first stumbling block was finding documentation on how to do after or post hooks. I managed to find this forum post and this blog post which helped me out a lot. The important detail is that there is a “post” directory to go along with “pre” and “enact”, but it’s not present by default, so it can be easy to miss.
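As a rough illustration of the hook mechanism on the legacy AMI-based Elastic Beanstalk platform (the script name and the Whenever invocation are my own; only the pre/enact/post directory layout comes from the posts above):

```bash
# Create the missing "post" hooks directory and drop in an executable
# script; anything here runs after each deploy finishes.
mkdir -p /opt/elasticbeanstalk/hooks/appdeploy/post
cat > /opt/elasticbeanstalk/hooks/appdeploy/post/99_whenever.sh <<'EOF'
#!/usr/bin/env bash
# Regenerate the crontab from the app's schedule.rb after deploy
cd /var/app/current
bundle exec whenever --update-crontab
EOF
chmod +x /opt/elasticbeanstalk/hooks/appdeploy/post/99_whenever.sh
```

In practice you would create this script via an .ebextensions config file so it survives instance replacement.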
I used Marcin’s delayed_job config as a base. The first thing I had to address was an apparent change in Elastic Beanstalk’s configuration structure. Marcin’s config has …

rails sysadmin aws
CentOS 7 on Hetzner server with more than 2 TB disk
We use a variety of hosting providers for ourselves and our clients, including Hetzner. They provide good servers for a great price, have decent support, and we’ve been happy with them for our needs.
Recently I was given the task of building out a new development server for one of our clients, and we wanted it to be set up identically to another one of their servers but with CentOS 7. I placed the order for the hardware with Hetzner and then began the procedure for installing the OS.
Hetzner provides a scripted install process that you can kick off after booting the machine into rescue mode. I followed this process and selected CentOS 7 and proceeded through the whole process without a problem. After rebooting the server and logging in to verify everything, I noticed that the disk space was capped at 2 TB, even though the machine had two 3 TB drives in it (in hardware RAID 1). I looked at the partitions and found the partition table was “msdos”. Ah ha!
At this point painful memories of running into this problem before hit me. I reviewed our notes of what we had done last time, and felt like it was worth a shot even though this time I’m dealing with CentOS 7. I went through the …
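For reference, the 2 TB ceiling comes from the “msdos” (MBR) partition table itself; the usual fix is a GPT label. A sketch with parted (the device name is a placeholder, and relabeling destroys the existing partitions):

```bash
# Show the current label type; "msdos" caps addressable space at 2 TB
parted /dev/sda print

# Destructive: write a GPT label so the full 3 TB is usable,
# then create a partition spanning the whole disk
parted /dev/sda mklabel gpt
parted /dev/sda mkpart primary 0% 100%
```

On a boot disk this also means the bootloader needs a BIOS boot partition (or UEFI setup), which is where the install scripts tend to trip up.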
redhat devops hosting
PGConf.US NYC: "Choosing a Logical Replication System"
I’m excited to have my talk “Choosing a Logical Replication System” accepted to PGConf.US! I’ll be speaking on Friday, March 27th from 2:00 - 2:50, as part of the Strategy track.
In this talk I will cover a variety of existing Logical Replication systems for PostgreSQL and go over some of the differences between requirements, supported capabilities, and why you might choose one system over another. I’ll also cover some of the changes in PostgreSQL 9.4.
conference postgres
