OSCON so far! Filesystem information bonanza on Wednesday
Wednesday was the first official day of OSCON, and I spent it elbow deep in filesystems. Val Aurora kicked off the morning with a great overview of Btrfs, a new filesystem currently in development. Some of the features include:
- Copy on write filesystem
- Cheap, easy filesystem snapshots
- Dynamically resizable partitions
- Indexed directory structure
- Very simple administration
Val demonstrated basic functionality, including creating snapshots and creating a Btrfs filesystem on top of an ext3 filesystem. Cool stuff! The filesystem is still under heavy development, but seems very promising.
Next I saw Theodore Ts’o, the primary developer behind ext4, talk about the future of filesystems and storage. He referenced a great paper that dives deep into the economics behind SSD (solid state drive) and platter hard drive manufacturing. One interesting calculation was that even if we could convert all the silicon fabs to manufacture flash, we would only be able to cover about 12% of worldwide hard drive production capacity. Because of this, Theodore believes that it is going to be challenging for the cost of SSDs to drop to the point where it becomes cost competitive with …
conference postgres
Gmail Contacts Notes Converter
As I mentioned previously, I recently got a Google Ion phone running Android. I recently began using it as my main mobile phone, and thus needed to finally migrate the contacts from my Nokia 6126 phone to Android.
This is apparently easy to do by first copying all the contacts from the Nokia 6126 internal memory to the SIM card, then moving the SIM card to the Ion and importing the contacts. But that only works if all your contacts fit on the SIM card. If not, they’re truncated, and you have to delete many contacts on the Nokia to fit more, which would be an irreversible move.
Several posts describe ways to do the export and import, such as this one that didn’t really apply to my phone, and this one that involves VCF export & import which I didn’t see a way to do.
Ultimately I found an article that described Nokia’s PC Suite software that I’d never heard of before, which I downloaded on an old Windows machine and used to download the contacts from the phone via Bluetooth, then export to a CSV file and import into Gmail. So far, so good.
Except that, as this post and another post describe, all the contact data showed up in a single Notes field, useless for dialing or emailing.
I …
mobile
pgGearman 0.1 release!
Yesterday, Brian Aker and Eric Day presented pgGearman: A distributed worker queue for PostgreSQL during the OSCON/SFPUG PgDay.
Gearman is a distributed worker queuing system that lets you farm work out to a collection of servers and run essentially arbitrary operations there. The example they presented was automating and distributing the load of image processing for LiveJournal. For example, everyone loves to share pictures of their kittens, but once an image is uploaded, it may need to be scaled or cropped in different ways to display in different contexts. Gearman is a tool you can use to farm these types of jobs out.
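To make the idea of farming jobs out to workers a bit more concrete, here is a minimal sketch in plain Python. It is not Gearman's API and it runs in a single process: the function names (resize_job, worker, run_jobs) and the job data are made up for illustration, and threads stand in for the separate worker servers a real Gearman setup would use.

```python
import queue
import threading


def resize_job(image_name, width):
    # Hypothetical stand-in for real image processing: just describe the work.
    return f"{image_name} scaled to {width}px"


def worker(jobs, results):
    # Each worker pulls jobs off the shared queue until it sees the sentinel.
    while True:
        job = jobs.get()
        if job is None:  # sentinel: no more work for this worker
            break
        image_name, width = job
        results.put(resize_job(image_name, width))


def run_jobs(job_list, worker_count=3):
    jobs = queue.Queue()
    results = queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(jobs, results))
        for _ in range(worker_count)
    ]
    for t in threads:
        t.start()
    for job in job_list:
        jobs.put(job)
    for _ in threads:
        jobs.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
    # Workers finish in no fixed order, so sort for a stable result.
    return sorted(results.get() for _ in range(len(job_list)))


if __name__ == "__main__":
    for line in run_jobs([("kitten.jpg", 100), ("kitten.jpg", 640)]):
        print(line)
```

The client only has to describe the job (which image, which size); which worker picks it up is decided by the queue. That is the same division of labor Gearman provides across machines.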
So, in anticipation of the talk, I worked with Eric Day on a set of C-language user defined functions for Postgres that allow client connections to a Gearman server.
You can try out the pgGearman 0.1 release on Launchpad!
postgres
CSS @font-face in Firefox 3.5
This has been frequently mentioned around the web already, but it’s important enough that I’ll bring it up again anyway. Firefox 3.5 adds the CSS @font-face rule, which makes it possible to reference fonts not installed in the operating system of the browser, just as is done with images or other embedded content.
Technically this is not a complicated matter, but font foundries (almost all of whom have a proprietary software business model) have tried to hold it back hoping for magical DRM to keep people from using fonts without paying for them, which of course isn’t possible. As one of the original Netscape developers mentioned, if they had waited for such a thing for images, the web would still be plain-text only.
The quickest way to get a feel for the impact this change can have is to look at Ian Lynam & Craig Mod’s article demonstrating @font-face in Firefox 3.5 side-by-side with any of the other current browsers. It is exciting to finally see this ability in a mainstream browser after all these years.
browsers css
Bucardo and truncate triggers
Version 8.4 of Postgres was recently released. One of the features that hasn’t gotten a lot of press, but which I’m excited about, is truncate triggers. This fixes a critical hole in trigger-based PostgreSQL replication systems, and support for these new triggers is now working in the Bucardo replication program.
Truncate triggers were added to Postgres by Simon Riggs (thanks Simon!), and unlike other types of triggers (UPDATE, DELETE, and INSERT), they are statement-level only, as truncate is not a row-level action.
Here’s a quick demo showing off the new triggers. This is using the development version of Bucardo—a major new version is expected to be released in the next week or two that will include truncate trigger support and many other things. If you want to try this out for yourself, just run:
$ git clone http://bucardo.org/bucardo.git/
Bucardo does three types of replication; for this example, we’ll be using the ‘pushdelta’ method, which is your basic “master to slaves” relationship. In addition to the master database (which we’ll name A) and the slave database (which we’ll name B), we’ll create a third database for Bucardo itself.
$ initdb -D bcdata
$ initdb -D …
database open-source perl postgres bucardo
MDX
Recently I’ve been working with Mondrian, an open source MDX engine. MDX stands for “multi-dimensional expressions”, and is a query language used in analytical databases. In MDX, data are considered in “cubes” made up of “dimensions”, which are concepts analogous to “tables” and “columns”, respectively, in a relational database. And in MDX, much as in SQL, queries written in a special query language tell the MDX engine to return a data set by describing filters in terms of the various dimensions.
But MDX and SQL return data sets in very different ways. Whereas a SQL query will return individual rows (unless aggregate functions are used), MDX always aggregates rows. In MDX, dimensions aren’t simple fields that contain arbitrary values; they’re hierarchical objects that can be queried at different levels. And finally, in MDX only certain dimensions can be returned in a query. These dimensions are known as “Measures”.
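As a rough illustration of what "always aggregates" and "hierarchical dimensions" mean, here is a small sketch in plain Python. It is not MDX syntax and not Mondrian's API; the rows, the two-level time hierarchy (year, quarter), and the "cases" measure are all invented for the example.

```python
from collections import defaultdict

# Hypothetical fact rows: a hierarchical "time" dimension (year, quarter),
# a flat "disease" dimension, and a numeric "cases" measure.
ROWS = [
    {"time": ("2009", "Q1"), "disease": "measles", "cases": 3},
    {"time": ("2009", "Q1"), "disease": "mumps", "cases": 1},
    {"time": ("2009", "Q2"), "disease": "measles", "cases": 2},
    {"time": ("2008", "Q4"), "disease": "measles", "cases": 4},
]


def aggregate(rows, level, measure="cases"):
    """Sum the measure over the time dimension at the given hierarchy depth.

    level=1 groups by year, level=2 groups by (year, quarter).
    """
    totals = defaultdict(int)
    for row in rows:
        key = row["time"][:level]  # truncate the hierarchy to the chosen level
        totals[key] += row[measure]
    return dict(totals)


if __name__ == "__main__":
    print(aggregate(ROWS, level=1))  # totals per year
    print(aggregate(ROWS, level=2))  # totals per (year, quarter)
```

Note that no individual row ever comes back: the caller picks a level of the dimension and gets the measure summed at that level, which is the main difference from a plain SQL SELECT.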
Without an example this doubtless makes little sense at first glance. In my case, the underlying data come from a public health application. Among other responsibilities, public health departments are tasked with preventing the spread of disease. Some diseases, such …
database open-source pentaho reporting casepointer
Subverting PostgreSQL Aggregates for Pentaho
In a recent post I described MDX and a project I’m working on with the Mondrian MDX engine. In this post I’ll describe a system I implemented to overcome one of Mondrian’s limitations.
Each Mondrian measure has an associated aggregate function defined. For instance, here’s a measure from the sample data that ships with Pentaho:
<Measure name="Quantity" column="QUANTITYORDERED" aggregator="sum" />
The schema defines the database connection properties and the table this cube deals with elsewhere; this line says there’s a column called QUANTITYORDERED which Mondrian can meaningfully aggregate with the sum() function. Mondrian knows about six aggregates: count, avg, sum, min, max, and distinct-count. And therein lies the problem. In this case, the client wanted to use other aggregates such as median and standard deviation, but Mondrian didn’t provide them[1].
Mondrian uses the aggregator attribute of the measure definition to generate SQL statements exactly as you might expect. In the case of the measure above, the SQL query involving that measure would read “sum(QUANTITYORDERED)”. In our case, Mondrian is backed by a PostgreSQL database, which offers a …
postgres pentaho casepointer reporting
MTU tweak: a fix for upload pain
While traveling and staying at Hostel Tyn in Prague’s city center, I ran into a strange problem with my laptop on their wireless network.
When many people were using the network (either on the hostel’s public computers or on the wireless network), sometimes things bogged down a bit. That wasn’t a big deal and required merely a little patience.
But after a while I noticed that absolutely no “uploads” worked. Not via ssh, not via browser POST, nothing. They always hung. Even when only a file upload of 10 KB or so was involved. So I started to wonder what was going on.
As I considered trying some kind of rate limiting via iptables, I remembered hearing somewhere that you can occasionally run into mismatched MTU settings between the Ethernet LAN you’re on and your operating system’s network settings.
I checked my setup and saw something like this:
ifconfig wlan0
wlan0 Link encap:Ethernet HWaddr xx:xx:xx:xx:xx:xx
inet addr:10.x.x.x Bcast:10.x.x.x Mask:255.255.255.0
inet6 addr: fe80::xxx:xxxx:xxxx:xxxx/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:1239 errors:0 dropped:0 overruns:0 frame:0
TX …
browsers environment networking