Are sparse bundles faster than disk images?

Following tests made over two years ago, I recommended the use of sparse bundles (UDSB) rather than plain UDIF read-write (UDRW) disk images, because of their faster and more consistent performance. Since then, read-write disk images have adopted a sparse file format, making them more space-efficient. However, their write performance has remained substantially lower than that of sparse bundles, and the latter remain the preferred format.

More recent performance tests have confirmed this in Sequoia, as shown in the table below.

Most recently, tests of file compression showed relatively poor performance within a sparse bundle, with overall compression speed only 57% that of the SSD hosting the bundle, despite apparently good write speeds. This article explores why, and whether that should alter the choice of disk image format.

Consistency in write speed

As previous articles have noted, disk images are troubled by inconsistent performance. Because sparse bundles have a complex structure, with their data stored across thousands of band files, it’s important to consider whether this poor compression performance merely reflects that inconsistency.

The original Stibium write speed measurements, separated out from the collation shown in that recent article, show more dispersion about the regression line than other tests. As those were based on a limited number of file sizes, I performed a total of 130 further write speed measurements using file sizes ranging from 4.4 MB to 3.0 GB, 50 of which were randomly chosen.

Those confirm that some individual write speeds are unexpectedly low, and that dispersion is greater than hoped for, but not by enough to make inconsistency a plausible explanation for the poor compression performance.

Band size

Unlike other types of disk image, data saved in sparse bundles is contained in many smaller files termed bands, and the maximum size of those band files can be set when a sparse bundle is created. By default, band size is set at 8.4 MB and there are only two ways to use a custom size, either when creating the sparse bundle using the command hdiutil create or in my free utility Spundle. Although those allow you to set a preferred band size, the size actually used may differ slightly from that.
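For those using hdiutil create, the sketch below builds a command line with a custom band size. It relies on the -imagekey sparse-band-size option, which takes its value in 512-byte sectors, so the default of 16384 sectors is 8,388,608 bytes, or about 8.4 MB. The path, size and volume name are hypothetical, and the command is only printed, not run.

```python
import shlex

SECTOR_BYTES = 512  # hdiutil's sparse-band-size is given in 512-byte sectors

def band_sectors(band_mib):
    """Convert a band size in MiB to the sector count hdiutil expects."""
    return int(band_mib * 1024 * 1024) // SECTOR_BYTES

def hdiutil_create_command(path, size, band_mib, volname="Test"):
    """Build (but don't run) an hdiutil create command for a sparse bundle."""
    return [
        "hdiutil", "create",
        "-size", size,
        "-type", "SPARSEBUNDLE",
        "-fs", "APFS",
        "-volname", volname,
        "-imagekey", f"sparse-band-size={band_sectors(band_mib)}",
        path,
    ]

# Hypothetical example: a 100 GB sparse bundle with 16 MiB bands.
cmd = hdiutil_create_command("/tmp/example.sparsebundle", "100g", 16)
print(shlex.join(cmd))
```

As noted above, the band size measured after creation may differ slightly from the value requested.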

Band size could have significant effects on the performance and stability of sparse bundles, as it determines how many band files are used to store data. Using a size that results in more than about 100,000 band files has in the past (with HFS+ at least) made sparse bundles prone to failure.
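The arithmetic behind that limit is simple enough to sketch. The example below assumes a hypothetical 1 TB sparse bundle and decimal units, and treats 100,000 as the approximate ceiling mentioned above rather than a hard limit.

```python
BAND_CEILING = 100_000  # approximate figure quoted for HFS+, not a hard limit

def max_band_files(capacity_bytes, band_bytes):
    """Number of band files needed when the sparse bundle is full (rounded up)."""
    return -(-capacity_bytes // band_bytes)

def exceeds_ceiling(capacity_bytes, band_bytes, ceiling=BAND_CEILING):
    """True if a full sparse bundle would need more band files than the ceiling."""
    return max_band_files(capacity_bytes, band_bytes) > ceiling

# Hypothetical 1 TB sparse bundle with 8.4 MB and 33.8 MB bands.
one_tb = 1_000_000_000_000
for band in (8_400_000, 33_800_000):
    print(band, max_band_files(one_tb, band), exceeds_ceiling(one_tb, band))
```

With those assumed sizes, 8.4 MB bands would need about 119,000 band files, above the ceiling, while 33.8 MB bands would need fewer than 30,000.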

To assess the effect of different band sizes on compression rate, I therefore repeated exactly the same test file compression in sparse bundles with 12 band sizes ranging from 2.0 to 1020 MB. Those sizes are as measured, rather than the values set.

The graph above shows the time taken to complete the compression task for each of the band sizes tested. The minimum time (fastest compression) of about 8.3 seconds was at a band size of 15.4 MB, significantly lower than the 11.08 seconds at the default of 8.4 MB. Band sizes between 2.0 and 8.4 MB showed little difference, but increasing band size to 33.8 MB and greater resulted in much slower compression.

When expressed as compression rates, peak performance at a band size of 15.4 MB is even clearer, and approaches 1.9 GB/s, more than twice that of a read-write disk image, but still far below rates in excess of 2.6 GB/s found for the USB4 and internal SSDs. With band sizes above 33.8 MB, compression rates fall below those of a read-write disk image.

Another way of looking at optimum band size is to convert that into the number of band files required. Fastest rates were observed here in sparse bundles that would have about 6,000-7,000 bands in total when the sparse bundle is full, and in this case with 2,000-2,300 band files in use. Performance fell when the maximum number of band files fell below 3,000, or those in use fell below 1,000.
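Working in the other direction, a target number of band files can be turned into a range of band sizes to try. The sketch below assumes the 6,000-7,000 range seen in these tests as its default target, which may not hold for other workloads or storage, and uses a hypothetical 100 GB capacity with decimal units.

```python
def band_size_range_mb(capacity_gb, min_bands=6000, max_bands=7000):
    """Band sizes (MB) giving between min_bands and max_bands when full.

    The default targets are taken from the tests described above and are
    an assumption, not a general rule.
    """
    capacity_mb = capacity_gb * 1000
    return capacity_mb / max_bands, capacity_mb / min_bands

# Hypothetical 100 GB sparse bundle.
low, high = band_size_range_mb(100)
print(f"try band sizes between {low:.1f} and {high:.1f} MB")
```

For that assumed capacity, the range works out at roughly 14.3 to 16.7 MB, which brackets the 15.4 MB band size that was fastest here.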

Streaming

Compression tests impose different demands on storage from conventional read or write benchmarks because file data to be compressed is streamed by reading data from storage, and writing compressed data out to the same storage device. Although source and destination are separate files, allowing simultaneous reading and writing, if those files are stored in common band files, it’s likely that access has to be limited and can’t be simultaneous.

As observed here, that’s likely to result in performance substantially below that expected from single-mode transfer tests. Unfortunately, it’s also extremely difficult to measure attempts to read from and write to different files on the same storage medium simultaneously.
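The access pattern involved can be sketched as a loop that reads a source file in chunks and writes compressed output to a second file as it goes. This is only an illustration of streaming, not the test used for the results above: zlib is a stand-in compressor that is likely to be limited by the CPU rather than storage, and the chunk size is an arbitrary choice.

```python
import os
import time
import zlib

CHUNK = 1024 * 1024  # arbitrary 1 MiB read size

def stream_compress(src_path, dst_path, chunk=CHUNK):
    """Stream src through zlib into dst; return (bytes read, seconds taken)."""
    compressor = zlib.compressobj()
    total = 0
    start = time.perf_counter()
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            block = src.read(chunk)
            if not block:
                break
            total += len(block)
            # Reads and writes alternate against the same storage.
            dst.write(compressor.compress(block))
        dst.write(compressor.flush())
        dst.flush()
        os.fsync(dst.fileno())  # ask for the data to be written out before timing stops
    return total, time.perf_counter() - start
```

Running it with both files inside a mounted sparse bundle, then again with both on the host storage, would give a crude comparison, although caching of the source file can flatter the read side.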

Conclusions

Sparse bundles (UDSB) remain generally faster in use than read-write (UDRW) disk images.
Sparse bundle performance is sensitive to band size.
Default 8.4 MB bands aren’t fastest in all circumstances.
When sparse bundle performance is critical, it may be worth optimising band file size instead of using the default.
Tasks that stream data from and back to the same storage are likely to run more slowly in a sparse bundle than on its host storage.

Previous articles

Dismal write performance of Disk Images
Which disk image format?
Disk Images: Performance