How to tell if Spotlight has indexed a file correctly

0

When you search but Spotlight doesn’t find what you’re looking for, you might wonder whether that’s because the file containing it hasn’t been indexed correctly. This article suggests some methods you can use to investigate this.

Spotlight relies on hidden indexes it maintains at the top level of each volume, in the .Spotlight-V100 folder. When an app or process creates a new file or changes the contents of an existing file, that’s recorded in the volume’s FSEvents database. If that file is in a location that isn’t excluded from Spotlight’s index, an mdworker process should then start adding its contents to the volume indexes. To do this, it first checks what type of file it is, by its UTI. Spotlight then looks up the correct mdimporter for that type, and uses that to generate data that Spotlight adds to that volume’s indexes as a set of attributes.

Recent versions of macOS can extract additional information from certain types of files such as images and PDFs. This includes text recognised within images using Live Text, and object recognition and classification, performed in the background by other services such as photoanalysisd. Those are slower and take significantly more processing, and are normally run after traditional mdimporter extraction.

mdimporter plugins

For many file types, these are provided as part of macOS and stored in the system, in /System/Library/Spotlight on the SSV. Importers for third-party apps may be in /Library/Spotlight or ~/Library/Spotlight, or in the /Library/Spotlight folder inside the app bundle. To discover all mdimporter plugins currently installed, use the command
mdimport -L
although that doesn’t inform you which file types they support.

To get more information, identify a file of the correct document type, normally with that type’s standard filename extension, and use mdimport to discover which mdimporter is used for that type, and what it can extract from that specific file, in a command of the form
mdimport -t -d3 [filepath]
where [filepath] is the path of that file. You can vary the digit used in the -d3 option from 1-3, with larger numbers delivering more detail. For -d3 you’ll first be given the file type and mdimporter used, following which all the data extracted is given according to its Spotlight attributes:
Imported ‘/Users/hoakley/Documents/SpotTestA.rtf’ of type ‘public.rtf’ with plugIn /System/Library/Spotlight/RichText.mdimporter.
37 attributes returned
followed by a long list.

If you’re still in the dark, the nuclear option is to perform a diagnostic dump similar to sysdiagnose but concentrating on Spotlight, using the command
mddiagnose -f [path]
to write the diagnostic output to the specified path. Further details are given in man mddiagnose.

Mints

My free utility Mints has a whole section devoted to testing and observing Spotlight. This includes creation of a folder of nine test files in the user’s Home folder, carrying out a test search that should find those test files, custom log analysis, and cleaning up test files afterwards. This is all fully documented in the Spotlight check and test section of its Help book.

To start the test, click on the Create Test button at a known clock time, such as 12:30:00. Wait at least ten seconds, say until 12:30:10, then click the Check Search button. Give your Mac a good 20 seconds to complete that, then open the Log Window and view entries from 12:30:00 for a period of 30 seconds, to cover both the indexing of the test files and the search. Once you’re happy that all is well, you can delete the test files.

Test files include:

RTF rich text
PDF with embedded text
plain UTF-8 text
HTML
Microsoft Word DOCX
JPEG image with the search term in its EXIF metadata
Numbers spreadsheet
Pages document
plain UTF-8 text with the search term in a kMDItemKeywords extended attribute.

You can also customise the test files with other formats, provided that they contain the search text syzygy999 that Mints will look for.

Exclusions

Perhaps the most common reason for Spotlight not indexing a file’s contents is that it has been excluded from doing so. This is always worth checking before making the assumption the problem is in Spotlight.

There have been three general methods of excluding folders from indexing and search, although only two of them still work reliably:

appending the extension .noindex to the folder name (this previously worked using .no_index instead);
making the folder invisible to the Finder by prefixing a dot ‘.’ to its name;
putting an empty file named .metadata_never_index inside the folder; that no longer works in recent macOS.

System Settings offers Spotlight Privacy settings in two sections. Items listed under Search results won’t normally prevent indexing of those items, but will block them from appearing in search results. Spotlight’s indexing exclusion list is accessed from the Search Privacy… button, where listed items won’t be indexed at all.

Summary of useful commands

mdimport -L lists known Spotlight mdimporter plugins.
mdimport -t -d3 [filepath] reports the mdimporter used for a file, and data extracted from that file.
mddiagnose -f [path] runs full Spotlight diagnostics.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.