Update:
In response, Garrett Albright suggested that one could turn on auto_vacuum through a bit of a hack (nice one, too — see the comments).
I wouldn’t do that and I have explained why in a followup post.
The weblogging world is all abuzz with this simple hack which makes Mail.app faster, more or less depending on how your Mail happens to be configured and your coincidental usage history.
There seem to be quite a few questions about what is going on. I figured I’d throw out a bit of clarity to the situation.
To invoke the command, do the following:
(1) Quit Mail. This will prevent any lock contention issues.
(2) Go to Terminal and copy/paste the following line:
/usr/bin/sqlite3 ~/Library/Mail/Envelope\\ Index vacuum
The details:
/usr/bin/sqlite3 refers to the SQLite command line tool. Nir Soffer suggested that I use the absolute path to avoid any conflict with sqlite3 binaries installed by things like Fink, MacPorts, or the like. He is correct and I updated the command. It is an interactive shell that can be used to interact with a database or with multiple databases.
~/Library/Mail/Envelope\ Index refers to Mail’s envelope index, a SQLite database that Mail uses as a cross reference for all your messages, mailboxes, and accounts. The ~ means “in my home directory” and the \ is shell speak for “no, really, I meant for the next character to be treated as data, not as some special invisible Unix goop”.
vacuum is a SQLite command that will remove all unused pages and generally compact the SQLite database. You don’t need to specify a table name as the original post suggests. It has been ignored since the vacuum command came back in SQLite 2.8.1 (it was originally in 1.x and removed in 2.x, to return in 2.8.1).
So, what is really going on? The Envelope Index file dropped from 10s of megabytes down to only a couple of megabytes??? What data was deleted?
OK — here starts a bit of technical spewage regarding the details of SQLite’s internals. Not in any depth, but enough to understand what is happening.
SQLite is, as its name implies, a SQL based database storage solution. The “lite” part is that it is designed to be embedded into an application and to store data pretty much directly to/from disk, not electing to use the more common client/server two or three-tiered architecture of a traditional relational store.
Unlike most database solutions, there is no server as a part of SQLite. Internally to your application you call — or you tell Core Data to call — some SQLite functions that go off and deal with the database. These are designed to be very fast and the whole thing is designed to have minimal impact on your application. So, just as there is no random external server process, SQLite doesn’t spawn threads — worker bees that can operate independently in an app — to do various tasks.
Now, a database is designed to retrieve data very quickly. As such, the data written to disk is generally written in a highly structured fashion such that scanning and retrieving subsets of the data can be done quickly and can be optimized– or indexed– to make certain kinds of queries very fast.
Whenever new data is written into such a data store or existing data is updated, the structure of the data in the store has to be adjusted to optimally represent the data now contained to maintain query performance.
SQLite happens to use a storage structure derived from B-Trees that uses pages — equally sized chunks of the overall data file — to store the data. As the data in a database is updated, the pages are re-organized and, occasionally, an update of the database will cause a page to be orphaned. That is, a page will no longer be used but, because of the need to optimize the queries, it won’t be reused either.
So, the vacuum command effectively copies each and every in-use page in the database file to a new copy of the database file. As it does so, it omits all unused pages and it optimizes the fresh copy such that query performance will be as fast as possible.
If this is the case, why doesn’t Mail periodically perform a vacuum command?
There are a number of reasons why you would avoid it. Most importantly, it can take a really long time and there is no way to predict how long it might run or what the savings might be. Not exactly the kind of thing you want to subject the user too if the general performance profile is “good enough” (and it mostly is).
And this is a problem that really should be addressed in the database itself. SQLite should be responsible for appropriately recycling unused pages and generally optimizing the database file as it goes. By doing so, it should be able to amortize the potentially large cost of a periodic vacuum across each INSERT or UPDATE of the database.
And that is exactly what SQLite has done in more recent versions. SQLite now has an auto-vacuum feature that will cause the database to effectively stay “vacuumed” of free pages automatically and without need to periodically do a rather intrusive copy-and-fix operation.
It can be turned on or off on a per-database basis for optimization purposes. But only at database creation time. If you have a database into which data will only ever be added — a logging application, for example — you would likely turn off auto-vacuum as it will just incur overhead without ever being able to shrink the database.
There is another useful command you can use at the command line in a similar fashion. During the development of Tiger, I lived by this command as there were a couple of periods of time where the filesystem had embraced chaos when it came to writing data.
/usr/bin/sqlite3 ~/Library/Mail/Envelope\\ Index 'PRAGMA integrity_check'
'PRAGMA integrity_check' causes SQLite to walk the entire internal structure of the database and make sure everything makes sense (as much as possible). The 's at the beginning and end of the SQL stuff tell the shell (the thing in Terminal that interprets what you type, for all intents and purposes) to pass 'PRAGMA integrity_check' as a single argument to sqlite3 command, instead of breaking it into two arguments separated by a space. Now, sqlite3 could be smart enough to deal with this. But it isn’t. Such is the unix way.
SQLite’s integrity checker will look for out-of-order records, missing pages, malformed records, and/or corrupt indices. If there are none, the command emits a single line saying “ok” (sorry about the mistake before — chalk it up to an unfortunate Terminal related issue).
If you happen to execute the above command and it emits any output save for “ok”, recovering from the problem is really quite terribly easy for Mail. Just delete the Envelope Index:
rm ~/Library/Mail/Envelope\\ Index
When deleted, Tiger’s Mail will re-import all of the messages as if you just migrated from Panther. Nothing lost save for a bit of time, potentially quite a bit of time if you have a particularly large set of Mail.
Obviously, if a SQLite database contains unrecoverable data, the above would be a very bad idea. Recovering data in such a situation is very difficult.
Update: Yeah– I hate Unix. When I wrote the post, I had included the appropriate “\” escaping of all the commands and even called out why they are necessary. Apparently, either Wordpress or MarsEdit ate the damned slashes. Grr. Fixed. Thanks for pointing out the error.
BoingBoing has picked it up, perpetuating the incorrect command (works, but the table name is not necessary — no big deal). The hawk wing site now has a couple of scripts that nicely automate the process.
Of course, this will work for any SQLite application that uses the built in SQLite on Tiger and does not periodically vacuum it. For example, Steve Weller has tossed together a script to update the sqlite database in Aperture’s library.