Get more from your metadata: reversing Spotlight

Spotlight, particularly when enhanced by a search app like HoudahSpot, is powerful when you know what you’re looking for and that search term is already indexed. What it doesn’t reveal, though, is which search terms it has indexed. Say you have hundreds of images of the countryside, and you want to find some featuring cows. Should you search for cow, cattle, livestock or ungulate?

When we used to spend hours in Aperture tagging our images, we got to choose those keywords, although I don’t know of anyone who kept a dictionary of all those that they used. Today we’re more likely to leave image recognition to macOS and other software to pick keywords for us. Go beyond images to the keywords saved in other documents, such as Word, Pages and PDF files, and we’re even less likely to know which have been used.

Here’s a simple example in my everyday image processor, GraphicConverter. Open an image, even a photo of a painting, and click on the Analyze tool. After a few moments of analysis, you’re offered a set of keywords that the app will conveniently save for you as that image’s IPTC keywords.

You can inspect those using Command-I, and when you save that image those keywords are embedded in the file’s data.

Keywords metadata is also available in many other document formats, including RTFD as used by Pages, Nisus Writer Pro and other apps, Word’s .docx files, and PDF. In each of those, you can add keywords in the document information editor. You can even add keywords to plain text and other formats in the com.apple.metadata:kMDItemKeywords extended attribute, made accessible in my free Metamer and other utilities, and normally preserved even when passed through iCloud Drive.
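To give a feel for what sits inside that extended attribute, here is a minimal Python sketch. It assumes the value is a binary property list holding an array of strings, which is how com.apple.metadata attributes are generally stored. The function names are hypothetical, and the sketch only covers encoding and decoding the bytes, not reading or writing the attribute on a real file.

```python
import plistlib

# Name of the extended attribute discussed above.
ATTR_NAME = "com.apple.metadata:kMDItemKeywords"


def encode_keywords(keywords):
    """Serialise a list of keywords as a binary property list (assumed format)."""
    return plistlib.dumps(list(keywords), fmt=plistlib.FMT_BINARY)


def decode_keywords(raw):
    """Recover the keyword list from the attribute's raw bytes."""
    value = plistlib.loads(raw)
    if not isinstance(value, list):
        return []
    # Keep only string entries; anything else is ignored in this sketch.
    return [item for item in value if isinstance(item, str)]


raw = encode_keywords(["cow", "cattle", "livestock"])
print(ATTR_NAME, len(raw), "bytes")
print(decode_keywords(raw))
```

Round-tripping the list through both functions should return the same keywords in the same order.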

Having got your metadata there, how then can you tell which keywords have been used in your collection of thousands of images or PDFs? That’s where my new app Spotcord comes in.

Spotcord will scan the folders of your choice, inspect and analyse all the Keywords used, and generate an alphabetical list complete with their frequencies, just as you might get in a concordance.

You can either type in the path to the folder you want it to scan, or just click on the Scan button and select it there. In early testing, Spotcord has happily scanned through over 50,000 files, although as this is an exhaustive crawl, it will take its time. Because it’s inspecting what’s indexed by Spotlight, you’ll also notice that processes like mds_stores take plenty of CPU when running a scan.

What you get is an alphabetical list of all the keywords it found for files in the folder it has just scanned, together with the number of files in which each appears. These are currently sorted case-sensitively, with A-Z before a-z. You can search this list using the Find… command in the Edit menu. At the end of the list, Spotcord reports the total number of keywords it found, and the number of files that it checked in all.
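The listing described above can be sketched in a few lines of Python. This is an illustration of the idea rather than Spotcord’s own code: the function name and sample data are made up, and it assumes that the totals count distinct keywords, and that a keyword repeated within one file counts once for that file.

```python
from collections import Counter


def concordance(keywords_by_file):
    """Count the number of files in which each keyword appears."""
    counts = Counter()
    for keywords in keywords_by_file.values():
        # A set ensures a keyword repeated within one file is counted once.
        counts.update(set(keywords))
    # Plain string sorting compares code points, so A-Z sorts before a-z.
    return sorted(counts.items())


# Hypothetical sample: file name mapped to the keywords found for it.
sample = {
    "field1.jpg": ["cow", "Grass", "sky"],
    "field2.jpg": ["cow", "cattle"],
    "notes.pdf": [],
}

listing = concordance(sample)
for keyword, count in listing:
    print(f"{keyword}\t{count}")
print(f"{len(listing)} keywords in {len(sample)} files")
```

With this sample, Grass is listed ahead of cattle because of the case-sensitive ordering, and cow is the only keyword with a count of 2.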

While Spotcord is interesting enough when scanning files that you have added keywords to, it becomes more fascinating when you scan those you have downloaded. When you come across an intriguing keyword, it’s simple to paste it into a Finder Find window or HoudahSpot and locate those files with that set as a Keyword.
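If you prefer Terminal to a Find window, the same lookup can be expressed as an mdfind query. The Python sketch below only builds the command line and doesn’t run it. The function name and folder path are hypothetical, and the escaping is an assumption based on Spotlight’s query syntax; it doesn’t handle a keyword containing *, which would be treated as a wildcard.

```python
import shlex


def keyword_query(keyword, folder=None):
    """Build an mdfind command line for files with the given keyword."""
    # Escape backslashes and double quotes so the keyword stays inside the quoted value.
    escaped = keyword.replace("\\", "\\\\").replace('"', '\\"')
    args = ["mdfind"]
    if folder is not None:
        # -onlyin limits the search to one folder and its contents.
        args += ["-onlyin", folder]
    args.append(f'kMDItemKeywords == "{escaped}"')
    return args


# Hypothetical folder path, for illustration only.
cmd = keyword_query("cow", folder="/Users/me/Pictures")
print(shlex.join(cmd))
```

Returning a list of arguments rather than a single string means it can be passed straight to subprocess.run without going through a shell, which avoids a second layer of quoting.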

Not all file formats offer keywords, and some metadata conventions suggest storing them in the Subject field, so there is a checkbox that lets you include Subject as well as Keywords in the scan.

My first, proof-of-concept beta is now available from here: spotcord01, but not from anywhere else yet. It is, of course, properly notarized, but still rough in parts and far from complete.

I welcome your ideas as to where I should take this. Is there already another app with the same features and more? Would you like direct access to details of all the files containing a selected keyword? Is this potentially useful, or no more than a curiosity?

Enjoy exploring with Spotcord and please let me know where you’d like it to go next.

I’m very grateful to Grant for suggesting this utility.