Vacuuming Mail’s Envelope Index to make Mail faster (and Aperture, too)
Update:
In response, Garrett Albright suggested that one could turn on auto_vacuum through a bit of a hack (nice one, too — see the comments).
I wouldn’t do that and I have explained why in a followup post.
The weblogging world is all abuzz with this simple hack which makes Mail.app faster, more or less depending on how your Mail happens to be configured and your coincidental usage history.
There seem to be quite a few questions about what is going on. I figured I’d throw out a bit of clarity to the situation.
To invoke the command, do the following:
(1) Quit Mail. This will prevent any lock contention issues.
(2) Go to Terminal and copy/paste the following line:
/usr/bin/sqlite3 ~/Library/Mail/Envelope\\ Index vacuum
The details:
/usr/bin/sqlite3 refers to the SQLite command line tool. Nir Soffer suggested that I use the absolute path to avoid any conflict with sqlite3 binaries installed by things like Fink, MacPorts, or the like. He is correct and I updated the command. It is an interactive shell that can be used to interact with a database or with multiple databases.
~/Library/Mail/Envelope\ Index refers to Mail’s envelope index, a SQLite database that Mail uses as a cross reference for all your messages, mailboxes, and accounts. The ~ means “in my home directory” and the \ is shell speak for “no, really, I meant for the next character to be treated as data, not as some special invisible Unix goop”.
vacuum is a SQLite command that will remove all unused pages and generally compact the SQLite database. You don’t need to specify a table name as the original post suggests. It has been ignored since the vacuum command came back in SQLite 2.8.1 (it was originally in 1.x and removed in 2.x, to return in 2.8.1).
So, what is really going on? The Envelope Index file dropped from 10s of megabytes down to only a couple of megabytes??? What data was deleted?
OK — here starts a bit of technical spewage regarding the details of SQLite’s internals. Not in any depth, but enough to understand what is happening.
SQLite is, as its name implies, a SQL based database storage solution. The “lite” part is that it is designed to be embedded into an application and to store data pretty much directly to/from disk, not electing to use the more common client/server two or three-tiered architecture of a traditional relational store.
Unlike most database solutions, there is no server as a part of SQLite. Internally to your application you call — or you tell Core Data to call — some SQLite functions that go off and deal with the database. These are designed to be very fast and the whole thing is designed to have minimal impact on your application. So, just as there is no random external server process, SQLite doesn’t spawn threads — worker bees that can operate independently in an app — to do various tasks.
Now, a database is designed to retrieve data very quickly. As such, the data written to disk is generally written in a highly structured fashion such that scanning and retrieving subsets of the data can be done quickly and can be optimized– or indexed– to make certain kinds of queries very fast.
Whenever new data is written into such a data store or existing data is updated, the structure of the data in the store has to be adjusted to optimally represent the data now contained to maintain query performance.
SQLite happens to use a storage structure derived from B-Trees that uses pages — equally sized chunks of the overall data file — to store the data. As the data in a database is updated, the pages are re-organized and, occasionally, an update of the database will cause a page to be orphaned. That is, a page will no longer be used but, because of the need to optimize the queries, it won’t be reused either.
So, the vacuum command effectively copies each and every in-use page in the database file to a new copy of the database file. As it does so, it omits all unused pages and it optimizes the fresh copy such that query performance will be as fast as possible.
If this is the case, why doesn’t Mail periodically perform a vacuum command?
There are a number of reasons why you would avoid it. Most importantly, it can take a really long time and there is no way to predict how long it might run or what the savings might be. Not exactly the kind of thing you want to subject the user too if the general performance profile is “good enough” (and it mostly is).
And this is a problem that really should be addressed in the database itself. SQLite should be responsible for appropriately recycling unused pages and generally optimizing the database file as it goes. By doing so, it should be able to amortize the potentially large cost of a periodic vacuum across each INSERT or UPDATE of the database.
And that is exactly what SQLite has done in more recent versions. SQLite now has an auto-vacuum feature that will cause the database to effectively stay “vacuumed” of free pages automatically and without need to periodically do a rather intrusive copy-and-fix operation.
It can be turned on or off on a per-database basis for optimization purposes. But only at database creation time. If you have a database into which data will only ever be added — a logging application, for example — you would likely turn off auto-vacuum as it will just incur overhead without ever being able to shrink the database.
There is another useful command you can use at the command line in a similar fashion. During the development of Tiger, I lived by this command as there were a couple of periods of time where the filesystem had embraced chaos when it came to writing data.
/usr/bin/sqlite3 ~/Library/Mail/Envelope\\ Index 'PRAGMA integrity_check'
'PRAGMA integrity_check' causes SQLite to walk the entire internal structure of the database and make sure everything makes sense (as much as possible). The 's at the beginning and end of the SQL stuff tell the shell (the thing in Terminal that interprets what you type, for all intents and purposes) to pass 'PRAGMA integrity_check' as a single argument to sqlite3 command, instead of breaking it into two arguments separated by a space. Now, sqlite3 could be smart enough to deal with this. But it isn’t. Such is the unix way.
SQLite’s integrity checker will look for out-of-order records, missing pages, malformed records, and/or corrupt indices. If there are none, the command emits a single line saying “ok” (sorry about the mistake before — chalk it up to an unfortunate Terminal related issue).
If you happen to execute the above command and it emits any output save for “ok”, recovering from the problem is really quite terribly easy for Mail. Just delete the Envelope Index:
rm ~/Library/Mail/Envelope\\ Index
When deleted, Tiger’s Mail will re-import all of the messages as if you just migrated from Panther. Nothing lost save for a bit of time, potentially quite a bit of time if you have a particularly large set of Mail.
Obviously, if a SQLite database contains unrecoverable data, the above would be a very bad idea. Recovering data in such a situation is very difficult.
Update: Yeah– I hate Unix. When I wrote the post, I had included the appropriate “\” escaping of all the commands and even called out why they are necessary. Apparently, either Wordpress or MarsEdit ate the damned slashes. Grr. Fixed. Thanks for pointing out the error.
BoingBoing has picked it up, perpetuating the incorrect command (works, but the table name is not necessary — no big deal). The hawk wing site now has a couple of scripts that nicely automate the process.
Of course, this will work for any SQLite application that uses the built in SQLite on Tiger and does not periodically vacuum it. For example, Steve Weller has tossed together a script to update the sqlite database in Aperture’s library.


March 2nd, 2007 at 6:15 pm
These commands should probably be rewritten with either quotes or backslashes for the “~/Library/Mail/Envelope Index” part. As it stands, just running them won’t work.
March 2nd, 2007 at 7:07 pm
Yikes, my Envelope Index database is 272MB right now!
I saw that post this morning but wanted to wait for confirmation before doing it. Thanks for that!
March 3rd, 2007 at 7:26 am
My index size barely changed. But then again I never delete email messages, ever. I think the potential for cleaning up the databases is proportional to how much you delete your mail, which causes subjects to become obsolete and thus a candidate for getting rid of.
In terms of the speed-up people are seeing, I think it’s likely that you would get a speed up if you’re index was reduced dramatically in size, for obvious reasons.
But if you experience a speed up when your index size has not been dramatically reduced, I suggest that it might be related to the fact that you just ran the vacuum command and the entire index is “in memory”, in particular, the systems i/o cache. A real test of speed-up requires a reboot, and I wonder how many people have tried that?
Still, it is fun to find a heavily used database and poke around to see how it works.
March 3rd, 2007 at 11:39 am
Jonathan–
You are correct in that the unused pages are generally created by records being deleted from the database. Additionally, when existing records change size, it can also cause changes to the database that vacuum will optimize.
While first queries might be impacted by the system’s in-memory disk cache having the database pages within it after a vacuum, that would not impact query performance over time.
The
vacuumcommand rewrites the entire database file. Every single page is copied from the old file to a new copy of the database. In the process, unused pages are eliminated and the entire set of pages are re-ordered for maximal query performance. The combination of the two yields the performance boost.March 3rd, 2007 at 2:28 pm
[...] Everyone’s talking about this tip to speed up Mail.app. I’ve linked to Bbum’s article as he provides a better overview of what to do and how it works. It’s surprisingly effective! [...]
March 3rd, 2007 at 2:36 pm
Here with 10.4.8/Intel, it seems to print “ok”.
March 3rd, 2007 at 8:54 pm
[...] There have been a few posts about vacuuming your Mail and Aperture databases to speed up operations. Great. Yay for vacuum. [...]
March 3rd, 2007 at 9:08 pm
Vacuuming Mail’s Envelope Index to make Mail faste
Bill Bumgarner has a great explanation of an easy hack to speed up Mail.app, which originated a few days ago in Hawk Wings [...]
March 4th, 2007 at 5:56 am
[...] CSSEdit master Jan told me about this neat little Terminal code he found at ‘bbum’s weblog-o-mat’. It makes your Mail.app a little faster. [...]
March 4th, 2007 at 6:05 am
[...] Also via Daring Fireball Linked List: Bill Bumgarner explains the how’s and why’s in detail. [...]
March 4th, 2007 at 6:09 am
so it’s basically manual garbage collection?
March 4th, 2007 at 6:47 am
[...] OPDATERING: Jeg har rettet lidt i instruktionerne for at gøre dem lettere at læse og følge. Bill Baumgartner forklarer, hvorfor tricket virker. [...]
March 4th, 2007 at 7:12 am
Hmm.
Quit mail…
sqlite3 ~/Library/Mail/Envelope\ Index ".dump" > EnvelopeIndex.sqlAdd the line
pragma auto_vacuum=1;to the TOP of EnvelopeIndex.sqlrm ~/Library/Mail/Envelope\ Indexsqlite3 ~/Library/Mail/Envelope\ Index ".read EnvelopeIndex.sql"
Seems to work…
March 4th, 2007 at 10:34 am
Hack zur Mail.app-Beschleunigung
Warum nicht zur Abwechslung einfach Mail.app staubsaugen und damit unter Umständen erheblich beschleunigen? Der Trick ist denkbar einfach: 1. Mail.app beenden (und falls nicht sowieso vorhanden ein Backup anlegen) 2. Terminal öffnen 3. sqlite3 ~/Libr…
March 4th, 2007 at 11:30 am
Yes. It is manual garbage collection.
No– I wouldn’t turn on auto_vacuum in Mail under Tiger. It was off for a reason. I’ll update the post.
March 4th, 2007 at 11:53 am
[...] A couple of days ago, I commented on the vacuum-your-mail-to-make-it-go-faster meme passing through the Mac weblogging community. [...]
March 4th, 2007 at 12:41 pm
[...] not crazy. Mail.app uses SQLite, and its database can stand a good vacuuming from time to time. Simple instructions on optimizing your Mail.app indices to re-optimize internal data structures. No metrics on this, [...]
March 4th, 2007 at 12:47 pm
I’ve seen this kind of thing in other mail apps before. For instance, Lotus Notes has a ‘compact’ function you can run against your local databases that removes the deleted junk.
March 4th, 2007 at 5:19 pm
Which other Applications use SQLlite and could profit from the same trick? I imagine that iSync (or it’s services) also use a SQL database.
March 4th, 2007 at 6:10 pm
[...] does the sqllite vacuum command actually [...]
March 5th, 2007 at 2:42 am
[...] des explications détaillées sont données ici [...]
March 5th, 2007 at 9:47 am
I had to terminate my command with a semi-colon to get it to work:
sqlite3 ~/Library/Mail/Envelope\ Index vacuum;As a C programmer, that occurred to me after nothing happened for a while when I typed the non-terminated command. People without a programming background are going to be mighty frustrated when the enter this command as written and nothing happens.
Remember the semi-colon (;) !!!
March 5th, 2007 at 10:26 am
The semi-colon is not necessary and, in your command above, won’t do anything. In the shell, a semi colon means “end this command and start a new one” (like C, more or less). So, without quoting, SQLite3 doesn’t even see the semi-colon!
For example:
vacuumcan take a long time to run.The semi-colon is necessary when running sqlite3 in interactive command mode:
March 5th, 2007 at 10:46 am
I don’t follow your logic for why Mail doesn’t automatically do this.
It would only take a really long time if it had not been done for a long time. The sensible time to vacuum would be on shutting down Mail. Unless you keep the program open for weeks on end, it’s hard to imagine the clutter getting too large. If I could figure out a way to get Mail to run an applescript on exit, this would be the way I would do it.
March 5th, 2007 at 11:18 am
Vacuum always rewrites the database file. In so doing, it always takes up to 2x the database size in disk space and you always pay the price of copying the file, page by page.
If vacuum were always to be run at termination of Mail, then you will typically be delaying logout or reboot by whatever time it takes the vacuum to run. Worse, if your account is against a remote filesystem, you might actually take long enough to logout to terminate the logout itself (or reboot).
Not a good user experience.
But, worse still, if the user decides to hard reboot or power cycle their machine because the damned power down / logout is taking too long, you can leave the database in an inconsistent state. SQLite generally does a very good job of recovering from such a situation, but it isn’t perfect –nothing can be in this day and age of screwy disk caches and the like.
March 5th, 2007 at 12:11 pm
This tip seems to be Tiger specific, right? Wikipedia claims that SQLite was only built in as of Tiger. Did Apple Mail use something else prior to Tiger?
March 5th, 2007 at 12:13 pm
That is correct. Tiger only. Panther and prior, I’m not sure what Mail used.
March 5th, 2007 at 2:19 pm
You should fix the command to /usr/bin/sqlite3 …, because if you installed sqlite3 from MacPorts, vacum will cause Mail to import all your mail as if you deleted the envelope index file.
For example, on my machine:
$ which sqlite3
/opt/local/bin/sqlite3
March 6th, 2007 at 8:25 am
Nir, I’m using sqlite3 from MacPorts as well, and I did not have this issue. Are you sure you didn’t somehow delete or corrupt the database file?
March 6th, 2007 at 9:54 am
[...] just stumbled upon this nice little hack for speeding up Mail. Although it didn’t really change the size of my [...]
March 6th, 2007 at 5:16 pm
Garret, I’m sure sqlite3 from MacPorts somehow corrupted my envelope index
I tried this twice.
March 11th, 2007 at 7:39 pm
I second the query regarding what apps use SQLite. Specifically does iTunes use it? And if not, why not?
iTunes seems to have changed its database engine in iTunes 7, when it went from being a really zippy app to being a freaking slow as molasses app.
I salute the desire to use a real database that, presumably, provides database integrity, since I have seen quite a few complaints in earlier versions about the iTunes database getting corrupted. But I question why the iTunes database has to be so damn slow. For all my complaints about the slowness of Mail, iT seems far worse, especially in the apparent random mindlessness of when it just times out for a minute or two doing god knows what, but not accepting any input.
I tried, in an act of bravery, to run this same SQLite vacuum command on the iTunes. sqlite3 told me that the database was in an encrypted format — which for all I know may mean what it says, that iTunes, for god knows what DRM reasons is encrypting the database, or may be the generic error SQLite throws up when given any file that isn’t a SQLite database.
So what’s the score here? Does iT use SQLite? Have they at least made an effort to use a real database engine that is reasonably fast, or did they just cobble together their own engine because god forbid they code like wimps and actually admit that someone else might be better able to solve a problem than themselves? Did they say “Core Data is nice, but we have to run on Windows, so screw you” rather than at least looking into what it would take to move the relevant parts of Core Data to Windows?
March 11th, 2007 at 9:09 pm
What apps use SQLite? Not sure. Look for ’sqlite3′ in the header.
None of the iTunes slowness you describe is due to any kind of database issue. As a matter of fact, iTunes’ database needs are really minor. It only has to search on a fairly limited and highly compressable set of metadata to perform the database oriented operations of the UI. iTunes database, by and large, can easily fit in memory without putting undue stress on the system.
Could iTunes use SQLite and gain a performance improvement? Maybe so, probably not though, certainly, it would gain scalability.
The kinds of “refine as user types” searches that iTunes uses really do not lend themselves to SQL based searches unless you do a lot of optimization work. If your data set is small enough to fit into memory (with some likely pruning of less commonly searched information), you are far better off optimizing your search-as-type around in memory data structure traversal.
March 14th, 2007 at 4:57 am
[...] optimizes the database of your Mail.app. You can find a more detailed description of how it is done here. Bookmark this: These icons link to social bookmarking sites where readers can share and [...]
May 11th, 2007 at 9:16 am
[...] master Jan told me about this neat little Terminal code he found at ‘bbum’s weblog-o-mat‘. It makes your Mail.app a little [...]
June 12th, 2007 at 4:14 pm
Why vacuuming Mail's envelope index makes Mail faster
[...][...]
September 3rd, 2007 at 11:19 am
Size of my Envelope Index file in unchanged but my Mail loads much faster now, thank you
January 2nd, 2008 at 10:37 am
Here’s a script to do the whole thing. Hole all the formatting survives.
#!/bin/sh
integrity_check() {
/usr/bin/sqlite3 “$HOME/Library/Mail/Envelope Index” “PRAGMA integrity_check;” 2>&1
}
case “`integrity_check`” in
ok) ;;
*)
echo “Integrity check failed.” >&2
echo “$integrity_check” >&2
exit 1
;;
esac
/usr/bin/sqlite3 “$HOME/Library/Mail/Envelope Index” “vacuum;”
May 6th, 2008 at 3:01 am
[...] more details see this blog post or hawk [...]
April 22nd, 2009 at 7:15 pm
Does anyone know if this trick works in 10.5 and beyond?