PostgreSQL


Optimization & Tuning


pglogd

I was looking at my Webalizer report, when for some random reason I wondered, "Hey, I wonder if Crystal Reports can do a better job?". Seems like CR8.5 can read Apache log files. The advantage is that it becomes very easy to do multiple groupings, formulas, sweet formating, pie charts, bar graphs, etc. The only thing it didn't want to find in the log file was the user agent, and possibly the referrer. Would be very nice to get those stats as well, so I did a very quick google search, and found pglogd. Basically the concept is that you tell Apache it's logs (via the CustomLog directive) to write to a socket in /tmp. plogd listens on the other end of this socket, parses each log line, and inserts it into PostgreSQL. Ends up being very fast, especially once fsync is turned off in PostgreSQL. And, it'll write to an overflow log file on disk, so that even if PostgreSQL can't keep up, the webserver can go back to work.

http://www.digitalstratum.com/pglogd/

I'd set this up, except I first need a script to parse my existing log files, and dump them into PostgreSQL.

--Wim


I've been working on a little project that's using Embperl2, mod_perl, PostgreSQL, DBI, and Apache::SessionX. It's been so long since I've used these that every once in a while I'll get rehungup on stuff I've fought with before.

Just now I was fighting with getting a foreign key constraint working. Ended up that I totally forget that in one of the PostgreSQL releases from the last year or two, the following behaviour was changed. The SERIAL datatype no longer means: * INTEGER PRIMARY KEY NOT NULL default nextval('some_sequence'::text)

But instead:

Which simply means that you need to manually make it a primary key. Not a big deal, the habit of taking the simple quick route is still ingrained from long ago.


Remember to VACUUM properly

One big thing I figured out tonight is how to drastically reduce the amount of space required by a table. In the database world, you typically always want things to take up as little space as possible in order to speed up any database operation. There are some situations where you don't mind leaving some free space scattered here and there to allow for future inserts. That said, for 99% of the situations, being able to go through and clean house periodically is a good thing. This means defragmenting the database, as lots of inserts/updates/deletes can leave things quite messy/slow.

PostgreSQL has a command called VACUUM that runs periodically and rebuilds things on the fly. Depending on your operating system or Linux distribution, this may run weekly, daily, or several times a day, or worse yet, not at all.

A normal VACUUM updates the table in such a way that the tables are not locked and are still operational. This does a pretty good job... Anyways, what I discovered is that their's an extra FULL option that forces it to lock the table (so that other users can't update/view it), then does a much more thorough job. This is quite useful when you have a database with 2+ million records that you need to update.


Back to Databases