Vacuuming Mail’s Envelope Index to make Mail faster (and Aperture, too)

Update:

In response, Garrett Albright suggested that one could turn on auto_vacuum through a bit of a hack (nice one, too — see the comments).

I wouldn’t do that and I have explained why in a followup post.


The weblogging world is all abuzz with this simple hack which makes Mail.app faster, more or less depending on how your Mail happens to be configured and your coincidental usage history.

There seem to be quite a few questions about what is going on. I figured I’d throw out a bit of clarity to the situation.

To invoke the command, do the following:

(1) Quit Mail. This will prevent any lock contention issues.

(2) Go to Terminal and copy/paste the following line:

/usr/bin/sqlite3 ~/Library/Mail/Envelope\\ Index vacuum

The details:

/usr/bin/sqlite3 refers to the SQLite command line tool. Nir Soffer suggested that I use the absolute path to avoid any conflict with sqlite3 binaries installed by things like Fink, MacPorts, or the like. He is correct and I updated the command. It is an interactive shell that can be used to interact with a database or with multiple databases.

~/Library/Mail/Envelope\ Index refers to Mail’s envelope index, a SQLite database that Mail uses as a cross reference for all your messages, mailboxes, and accounts. The ~ means “in my home directory” and the \ is shell speak for “no, really, I meant for the next character to be treated as data, not as some special invisible Unix goop”.

vacuum is a SQLite command that will remove all unused pages and generally compact the SQLite database. You don’t need to specify a table name as the original post suggests. It has been ignored since the vacuum command came back in SQLite 2.8.1 (it was originally in 1.x and removed in 2.x, to return in 2.8.1).


So, what is really going on? The Envelope Index file dropped from 10s of megabytes down to only a couple of megabytes??? What data was deleted?

OK — here starts a bit of technical spewage regarding the details of SQLite’s internals. Not in any depth, but enough to understand what is happening.

SQLite is, as its name implies, a SQL based database storage solution. The “lite” part is that it is designed to be embedded into an application and to store data pretty much directly to/from disk, not electing to use the more common client/server two or three-tiered architecture of a traditional relational store.

Unlike most database solutions, there is no server as a part of SQLite. Internally to your application you call — or you tell Core Data to call — some SQLite functions that go off and deal with the database. These are designed to be very fast and the whole thing is designed to have minimal impact on your application. So, just as there is no random external server process, SQLite doesn’t spawn threads — worker bees that can operate independently in an app — to do various tasks.

Now, a database is designed to retrieve data very quickly. As such, the data written to disk is generally written in a highly structured fashion such that scanning and retrieving subsets of the data can be done quickly and can be optimized– or indexed– to make certain kinds of queries very fast.

Whenever new data is written into such a data store or existing data is updated, the structure of the data in the store has to be adjusted to optimally represent the data now contained to maintain query performance.

SQLite happens to use a storage structure derived from B-Trees that uses pages — equally sized chunks of the overall data file — to store the data. As the data in a database is updated, the pages are re-organized and, occasionally, an update of the database will cause a page to be orphaned. That is, a page will no longer be used but, because of the need to optimize the queries, it won’t be reused either.

So, the vacuum command effectively copies each and every in-use page in the database file to a new copy of the database file. As it does so, it omits all unused pages and it optimizes the fresh copy such that query performance will be as fast as possible.

If this is the case, why doesn’t Mail periodically perform a vacuum command?

There are a number of reasons why you would avoid it. Most importantly, it can take a really long time and there is no way to predict how long it might run or what the savings might be. Not exactly the kind of thing you want to subject the user too if the general performance profile is “good enough” (and it mostly is).

And this is a problem that really should be addressed in the database itself. SQLite should be responsible for appropriately recycling unused pages and generally optimizing the database file as it goes. By doing so, it should be able to amortize the potentially large cost of a periodic vacuum across each INSERT or UPDATE of the database.

And that is exactly what SQLite has done in more recent versions. SQLite now has an auto-vacuum feature that will cause the database to effectively stay “vacuumed” of free pages automatically and without need to periodically do a rather intrusive copy-and-fix operation.

It can be turned on or off on a per-database basis for optimization purposes. But only at database creation time. If you have a database into which data will only ever be added — a logging application, for example — you would likely turn off auto-vacuum as it will just incur overhead without ever being able to shrink the database.


There is another useful command you can use at the command line in a similar fashion. During the development of Tiger, I lived by this command as there were a couple of periods of time where the filesystem had embraced chaos when it came to writing data.

/usr/bin/sqlite3 ~/Library/Mail/Envelope\\ Index 'PRAGMA integrity_check'

'PRAGMA integrity_check' causes SQLite to walk the entire internal structure of the database and make sure everything makes sense (as much as possible). The 's at the beginning and end of the SQL stuff tell the shell (the thing in Terminal that interprets what you type, for all intents and purposes) to pass 'PRAGMA integrity_check' as a single argument to sqlite3 command, instead of breaking it into two arguments separated by a space. Now, sqlite3 could be smart enough to deal with this. But it isn’t. Such is the unix way.

SQLite’s integrity checker will look for out-of-order records, missing pages, malformed records, and/or corrupt indices. If there are none, the command emits a single line saying “ok” (sorry about the mistake before — chalk it up to an unfortunate Terminal related issue).

If you happen to execute the above command and it emits any output save for “ok”, recovering from the problem is really quite terribly easy for Mail. Just delete the Envelope Index:

rm ~/Library/Mail/Envelope\\ Index

When deleted, Tiger’s Mail will re-import all of the messages as if you just migrated from Panther. Nothing lost save for a bit of time, potentially quite a bit of time if you have a particularly large set of Mail.

Obviously, if a SQLite database contains unrecoverable data, the above would be a very bad idea. Recovering data in such a situation is very difficult.

Update: Yeah– I hate Unix. When I wrote the post, I had included the appropriate “\” escaping of all the commands and even called out why they are necessary. Apparently, either WordPress or MarsEdit ate the damned slashes. Grr. Fixed. Thanks for pointing out the error.

BoingBoing has picked it up, perpetuating the incorrect command (works, but the table name is not necessary — no big deal). The hawk wing site now has a couple of scripts that nicely automate the process.

Of course, this will work for any SQLite application that uses the built in SQLite on Tiger and does not periodically vacuum it. For example, Steve Weller has tossed together a script to update the sqlite database in Aperture’s library.



41 Responses to “Vacuuming Mail’s Envelope Index to make Mail faster (and Aperture, too)”

  1. Ryan Brown says:

    These commands should probably be rewritten with either quotes or backslashes for the “~/Library/Mail/Envelope Index” part. As it stands, just running them won’t work.

  2. Jim Roepcke says:

    Yikes, my Envelope Index database is 272MB right now!

    I saw that post this morning but wanted to wait for confirmation before doing it. Thanks for that!

  3. Jonathan says:

    My index size barely changed. But then again I never delete email messages, ever. I think the potential for cleaning up the databases is proportional to how much you delete your mail, which causes subjects to become obsolete and thus a candidate for getting rid of.

    In terms of the speed-up people are seeing, I think it’s likely that you would get a speed up if you’re index was reduced dramatically in size, for obvious reasons.

    But if you experience a speed up when your index size has not been dramatically reduced, I suggest that it might be related to the fact that you just ran the vacuum command and the entire index is “in memory”, in particular, the systems i/o cache. A real test of speed-up requires a reboot, and I wonder how many people have tried that?

    Still, it is fun to find a heavily used database and poke around to see how it works.

  4. bbum says:

    Jonathan–

    You are correct in that the unused pages are generally created by records being deleted from the database. Additionally, when existing records change size, it can also cause changes to the database that vacuum will optimize.

    While first queries might be impacted by the system’s in-memory disk cache having the database pages within it after a vacuum, that would not impact query performance over time.

    The vacuum command rewrites the entire database file. Every single page is copied from the old file to a new copy of the database. In the process, unused pages are eliminated and the entire set of pages are re-ordered for maximal query performance. The combination of the two yields the performance boost.

  5. Tip to speed up Mail.app at stefpause.com says:

    [...] Everyone’s talking about this tip to speed up Mail.app. I’ve linked to Bbum’s article as he provides a better overview of what to do and how it works. It’s surprisingly effective! [...]

  6. Chris Combs says:

    SQLite’s integrity checker will look for out-of-order records, missing pages, malformed records, and/or corrupt indices. If there are none, the command emits no output.

    Here with 10.4.8/Intel, it seems to print “ok”.

  7. SQLite Vacuum | NSLog(); says:

    [...] There have been a few posts about vacuuming your Mail and Aperture databases to speed up operations. Great. Yay for vacuum. [...]

  8. guillermoesteves.com says:

    Vacuuming Mail’s Envelope Index to make Mail faste

    Bill Bumgarner has a great explanation of an easy hack to speed up Mail.app, which originated a few days ago in Hawk Wings [...]

  9. Jonno Riekwel || Make Mail.app faster says:

    [...] CSSEdit master Jan told me about this neat little Terminal code he found at ‘bbum’s weblog-o-mat’. It makes your Mail.app a little faster. [...]

  10. soeren says » Blog Archive » Pruning Mail’s Envelope Index says:

    [...] Also via Daring Fireball Linked List: Bill Bumgarner explains the how’s and why’s in detail. [...]

  11. anon coward says:

    so it’s basically manual garbage collection?

  12. Sådan gør du Apple Mail hurtigere // mikkelwinther.dk says:

    [...] OPDATERING: Jeg har rettet lidt i instruktionerne for at gøre dem lettere at læse og følge. Bill Baumgartner forklarer, hvorfor tricket virker. [...]

  13. Garrett Albright says:

    It can be turned on or off on a per-database basis for optimization purposes. But only at database creation time. If you have a database into which data will only ever be added — a logging application, for example — you would likely turn off auto-vacuum as it will just incur overhead without ever being able to shrink the database.

    Hmm.

    Quit mail…
    sqlite3 ~/Library/Mail/Envelope\ Index ".dump" > EnvelopeIndex.sql
    Add the line pragma auto_vacuum=1; to the TOP of EnvelopeIndex.sql
    rm ~/Library/Mail/Envelope\ Index
    sqlite3 ~/Library/Mail/Envelope\ Index ".read EnvelopeIndex.sql"

    Seems to work…

  14. fscklog says:

    Hack zur Mail.app-Beschleunigung

    Warum nicht zur Abwechslung einfach Mail.app staubsaugen und damit unter Umständen erheblich beschleunigen? Der Trick ist denkbar einfach: 1. Mail.app beenden (und falls nicht sowieso vorhanden ein Backup anlegen) 2. Terminal öffnen 3. sqlite3 ~/Libr…

  15. bbum says:

    Yes. It is manual garbage collection.

    No– I wouldn’t turn on auto_vacuum in Mail under Tiger. It was off for a reason. I’ll update the post.

  16. bbum’s weblog-o-mat » Blog Archive » What about turning on auto_vacuum on Mail’s envelope index? says:

    [...] A couple of days ago, I commented on the vacuum-your-mail-to-make-it-go-faster meme passing through the Mac weblogging community. [...]

  17. scot hacker’s foobar blog » Speed Up Mail.app says:

    [...] not crazy. Mail.app uses SQLite, and its database can stand a good vacuuming from time to time. Simple instructions on optimizing your Mail.app indices to re-optimize internal data structures. No metrics on this, [...]

  18. Scott says:

    I’ve seen this kind of thing in other mail apps before. For instance, Lotus Notes has a ‘compact’ function you can run against your local databases that removes the deleted junk.

  19. Thomas says:

    Which other Applications use SQLlite and could profit from the same trick? I imagine that iSync (or it’s services) also use a SQL database.

  20. Sqllite vacuum command | David Paul Robinson says:

    [...] does the sqllite vacuum command actually [...]

  21. ménage dans mail at caisse à dires ?! says:

    [...] des explications détaillées sont données ici [...]

  22. macFanDave says:

    I had to terminate my command with a semi-colon to get it to work:

    sqlite3 ~/Library/Mail/Envelope\ Index vacuum;

    As a C programmer, that occurred to me after nothing happened for a while when I typed the non-terminated command. People without a programming background are going to be mighty frustrated when the enter this command as written and nothing happens.

    Remember the semi-colon (;) !!!

  23. bbum says:

    The semi-colon is not necessary and, in your command above, won’t do anything. In the shell, a semi colon means “end this command and start a new one” (like C, more or less). So, without quoting, SQLite3 doesn’t even see the semi-colon!

    For example:

    % cd /tmp/ \; ls
    tcsh: cd: Too many arguments.
    % cd /tmp/ ; ls
    cs_cache_lock_502       cs_cache_lock_92        ics231                  icssuis502
    

    vacuum can take a long time to run.

    The semi-colon is necessary when running sqlite3 in interactive command mode:

    % sqlite3 ~/Library/Mail/Envelope\ Index 
    SQLite version 3.1.3
    Enter ".help" for instructions
    sqlite> vacuum;
    sqlite> vacuum
       ...> ;
    sqlite> 
    
  24. Stephen says:

    I don’t follow your logic for why Mail doesn’t automatically do this.

    There are a number of reasons why you would avoid it. Most importantly, it can take a really long time and there is no way to predict how long it might run or what the savings might be. Not exactly the kind of thing you want to subject the user too if the general performance profile is “good enough” (and it mostly is).

    It would only take a really long time if it had not been done for a long time. The sensible time to vacuum would be on shutting down Mail. Unless you keep the program open for weeks on end, it’s hard to imagine the clutter getting too large. If I could figure out a way to get Mail to run an applescript on exit, this would be the way I would do it.

  25. bbum says:

    Vacuum always rewrites the database file. In so doing, it always takes up to 2x the database size in disk space and you always pay the price of copying the file, page by page.

    If vacuum were always to be run at termination of Mail, then you will typically be delaying logout or reboot by whatever time it takes the vacuum to run. Worse, if your account is against a remote filesystem, you might actually take long enough to logout to terminate the logout itself (or reboot).

    Not a good user experience.

    But, worse still, if the user decides to hard reboot or power cycle their machine because the damned power down / logout is taking too long, you can leave the database in an inconsistent state. SQLite generally does a very good job of recovering from such a situation, but it isn’t perfect –nothing can be in this day and age of screwy disk caches and the like.

  26. Stephen W. Carson says:

    This tip seems to be Tiger specific, right? Wikipedia claims that SQLite was only built in as of Tiger. Did Apple Mail use something else prior to Tiger?

  27. bbum says:

    That is correct. Tiger only. Panther and prior, I’m not sure what Mail used.

  28. Nir Soffer says:

    You should fix the command to /usr/bin/sqlite3 …, because if you installed sqlite3 from MacPorts, vacum will cause Mail to import all your mail as if you deleted the envelope index file.

    For example, on my machine:
    $ which sqlite3
    /opt/local/bin/sqlite3

  29. Garrett Albright says:

    Nir, I’m using sqlite3 from MacPorts as well, and I did not have this issue. Are you sure you didn’t somehow delete or corrupt the database file?

  30. dPompa» Blog Archive » Speeding up Mail says:

    [...] just stumbled upon this nice little hack for speeding up Mail. Although it didn’t really change the size of my [...]

  31. Nir Soffer says:

    Garret, I’m sure sqlite3 from MacPorts somehow corrupted my envelope index :-) I tried this twice.

  32. Maynard Handley says:

    I second the query regarding what apps use SQLite. Specifically does iTunes use it? And if not, why not?
    iTunes seems to have changed its database engine in iTunes 7, when it went from being a really zippy app to being a freaking slow as molasses app.

    I salute the desire to use a real database that, presumably, provides database integrity, since I have seen quite a few complaints in earlier versions about the iTunes database getting corrupted. But I question why the iTunes database has to be so damn slow. For all my complaints about the slowness of Mail, iT seems far worse, especially in the apparent random mindlessness of when it just times out for a minute or two doing god knows what, but not accepting any input.

    I tried, in an act of bravery, to run this same SQLite vacuum command on the iTunes. sqlite3 told me that the database was in an encrypted format — which for all I know may mean what it says, that iTunes, for god knows what DRM reasons is encrypting the database, or may be the generic error SQLite throws up when given any file that isn’t a SQLite database.

    So what’s the score here? Does iT use SQLite? Have they at least made an effort to use a real database engine that is reasonably fast, or did they just cobble together their own engine because god forbid they code like wimps and actually admit that someone else might be better able to solve a problem than themselves? Did they say “Core Data is nice, but we have to run on Windows, so screw you” rather than at least looking into what it would take to move the relevant parts of Core Data to Windows?

  33. bbum says:

    What apps use SQLite? Not sure. Look for ‘sqlite3′ in the header.

    None of the iTunes slowness you describe is due to any kind of database issue. As a matter of fact, iTunes’ database needs are really minor. It only has to search on a fairly limited and highly compressable set of metadata to perform the database oriented operations of the UI. iTunes database, by and large, can easily fit in memory without putting undue stress on the system.

    Could iTunes use SQLite and gain a performance improvement? Maybe so, probably not though, certainly, it would gain scalability.

    The kinds of “refine as user types” searches that iTunes uses really do not lend themselves to SQL based searches unless you do a lot of optimization work. If your data set is small enough to fit into memory (with some likely pruning of less commonly searched information), you are far better off optimizing your search-as-type around in memory data structure traversal.

  34. Ruggero.blog » Blog Archive » Speed up your Mail.app by optimizing the datamase index says:

    [...] optimizes the database of your Mail.app. You can find a more detailed description of how it is done here. Bookmark this: These icons link to social bookmarking sites where readers can share and [...]

  35. Jonno Riekwel | Make Mail.app faster says:

    [...] master Jan told me about this neat little Terminal code he found at ‘bbum’s weblog-o-mat‘. It makes your Mail.app a little [...]

  36. Web 2.0 Announcer says:

    Why vacuuming Mail's envelope index makes Mail faster

    [...][...]

  37. Roma says:

    Size of my Envelope Index file in unchanged but my Mail loads much faster now, thank you

  38. Peter da Silva says:

    Here’s a script to do the whole thing. Hole all the formatting survives.

    #!/bin/sh

    integrity_check() {
    /usr/bin/sqlite3 “$HOME/Library/Mail/Envelope Index” “PRAGMA integrity_check;” 2>&1
    }

    case “`integrity_check`” in
    ok) ;;
    *)
    echo “Integrity check failed.” >&2
    echo “$integrity_check” >&2
    exit 1
    ;;
    esac

    /usr/bin/sqlite3 “$HOME/Library/Mail/Envelope Index” “vacuum;”

  39. Optimizing Mail.app index » Torsten Curdt’s weblog says:

    [...] more details see this blog post or hawk [...]

  40. Sulayman says:

    Does anyone know if this trick works in 10.5 and beyond?

  41. Mauricio Wolff says:

    Working on Apple mail 4.3. If you copy and paste, just be sure to change the quotes, save the file and chmod it as +x.

    Wolff

Leave a Reply

Line and paragraph breaks automatic.
XHTML allowed: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>