When and how to rebuild Spotlight indexes

0

Forcing Spotlight’s indexes to be rebuilt has become a panacea, popularly used when anything appears amiss with Spotlight. In many cases, it’s exactly the wrong response to what’s likely to be normal indexing activity. This article explains when it can sometimes be useful, and how to make more effective use of it.

Spotlight works by searching the indexes it maintains on each volume, stored in the hidden .Spotlight-V100 folder at the top level of each searchable volume. Within that folder is a property list containing the volume configuration details, and the Store-V2 folder containing a folder named using a UUID, within which are all the files composing the indexes. As those are opaque to the user they shouldn’t be tampered with.

Indexing

The process of indexing is simplest to understand when considered for a single newly created or changed file. That change is recorded in the volume’s FSEvents database, which in turn triggers an XPC call to process that file if it’s in a location within Spotlight’s scope.

Provided the changed file isn’t in an excluded location, an mdworker process should then start adding its contents to the volume indexes. To do this, it first checks what type of file it is, in terms of its UTI. If that’s incorrect, then the remainder of the steps won’t work properly. In most cases now, that means the file must have the correct extension for its type. If it doesn’t, then mdworker won’t be able to index it correctly.

Spotlight then looks up the correct mdimporter for that type. For many file types, those are provided as part of macOS and stored in the system, in /System/Library/Spotlight on the SSV. Importers for third-party apps may be in /Library/Spotlight or ~/Library/Spotlight, or in the /Library/Spotlight folder inside the app itself. To check all mdimporter plugins currently installed, use the command
mdimport -L

Spotlight importers and mdworker itself can crash when there’s a bug, or the mdimporter encounters a malformed file. If that happens, the log normally records repeated crashes and restarts of that mdworker process. If you can identify and remove the file that’s causing those, that should allow indexing to return to normal.

Once the mdworker has extracted the data from the file, that’s added to the volume’s indexes, typically reflected in a log entry from mds_stores containing the message
compressing 5686 bytes to <private>
or similar, for each file that has had content extracted and added to the indexes. Other services involved include mdsync and mdwrite.

Recent versions of macOS can extract additional information from certain types of files such as images and PDFs. This includes text recognised within images using Live Text, and object recognition and classification, performed in the background by other services such as photoanalysisd. Those are slower and take significantly more processing, and are normally run after mdimporter extraction.

When might it be useful to rebuild indexes?

The only indication for rebuilding Spotlight indexes on a volume is when they are known to be damaged or corrupted, in which case rebuilding offers the best chance of restoring normal search function to that volume.

One way to check the functional integrity of indexes is to perform searches for known targets, a feature available in my free Mints. I added this when the system mdimporter for Rich Text had a bug that effectively made searching the contents of any RTF file impossible, thankfully a rare situation, although third-party mdimporters may not be as reliable.

In the past rebuilding indexes was often used when mdworker processes repeatedly crashed when trying to extract index data from a file. However, that relied on the assumption that those crashes wouldn’t recur. If they did, then rebuilding wouldn’t solve the problem. One way to investigate this further is to discover from the log which file is causing mdworker workers to crash, and removing the cause. This isn’t as straightforward now, as log entries no longer identify the file(s) causing the problem unless privacy protection is removed from the log.

Recently, it has become popular to force indexes to be rebuilt whenever Spotlight’s indexing processes appear to be taking a long time maintaining current indexes, on the assumption that starting that from scratch is going to be quicker than leaving them to complete. This isn’t likely to help, as it’s likely to force rebuilding of indexes that are already fully up to date, and indexing provides no information as to its progress. So there’s no way of telling whether allowing current indexing activity to complete would take another few seconds or days. However, it’s most unlikely that forcing full reindexing would ever be faster than allowing the completion of indexing that’s already in progress.

Rebuilding the index

To force a volume to be re-indexed, open Spotlight (or Siri & Spotlight) in System Settings and click the Search Privacy… or Spotlight Privacy… button at the bottom. Click the + button at the foot, select the volume and add it to the list, then click Done. Pause thirty seconds or so, click the Search Privacy… button again, select that volume in the list, and click the – button to remove it from the list. You don’t normally need to close System Settings or restart between adding and removing the volume.

If you prefer, you can instead use the mdutil command in Terminal. The command
mdutil -E /
erases the indexes on the Data volume and forces them to be rebuilt, and you can use the same option on other volumes.

As Spotlight indexes are maintained and stored on each volume for its contents, you’ll need to include each volume on which you want to be able to search files by their contents. Unless you have been able to identify which volume’s indexes merit rebuilding, you may have to rebuild indexes on every mounted volume, which can take many hours or days to complete.

The simplest way to check that re-indexing is taking place is to open Activity Monitor, and in its CPU view check that processes with names starting with md are taking plenty of CPU. These should include mds_stores, mdworker (often multiple copies) and mds itself. On Apple silicon, those processes run almost exclusively on E cores, and are usually obvious in Activity Monitor’s CPU History window.

Key points

Each volume may have its own Spotlight indexes, used when searching that volume’s contents.
Modern macOS indexes more extensive data, including text recognised within images, and types of object found within them. More advanced metadata take longer to analyse and index.
Rebuilding indexes is indicated if they are known to be damaged or corrupted.
If you don’t know which volume’s indexes require to be rebuilt, rebuilding them on all volumes can take many hours or days.
If the problem is in a file or mdimporter then it’s likely to recur during rebuilding indexes unless the file is identified and removed from indexing.
As there’s no way to determine progress in building indexes, forcing a rebuild is likely to take longer than allowing current indexing to complete.
Rebuilding is best triggered by adding the volume to Spotlight exclusions, then removing it again.
Alternatively, use mdutil -E [volume].
Check rebuilding is taking place using Activity Monitor.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.