Rust vs. Swift & Intel vs. Apple Silicon

Background

This project began a long time ago as a fun personal exploration of audio visualization. I needed a tool to process my extensive music library (over 50,000 audio files) and generate stereo waveform visualizations for each track. These visualizations would later be integrated into a custom web-based music player I had developed.

Initial Swift Implementation

I initially chose Swift for this project for several reasons:

  - Swift is the primary language for Apple’s ecosystem
  - It has built-in access to Apple’s powerful media processing libraries
  - I wanted to see how Swift performed as a more modern programming language

The Swift implementation performed reasonably well on my Intel MacBook Pro (2019, i9 processor), processing the entire library in approximately 29 minutes. When I later upgraded to an M4 Max MacBook Pro, the same Swift code ran in about 16 minutes - a significant improvement due to the superior hardware performance of Apple Silicon.

The Rust Alternative

I became interested in exploring Rust as an alternative implementation:

  - Rust could run on my Linux workstation
  - Rust has been gaining popularity for performance-critical applications
  - Rust’s audio processing libraries have matured considerably
  - I wanted to compare these languages in a real-world scenario
  - It provided an opportunity to use Rust for a fun project

This led me to reimplement the same functionality in Rust, ensuring both versions produced effectively identical outputs for comparative testing.

People who know me know what happened next: I benchmarked these implementations.

The Benchmarking Process

Once both implementations were complete, I conducted thorough performance testing. For the Swift version, I had already optimized audio buffer sizes and processing loops to achieve optimal performance. I applied similar optimization techniques to the Rust implementation to ensure a fair comparison.

The Unexpected Discovery (and dope-slap)

My benchmarking journey took an unexpected turn when I discovered something remarkable about the Rust implementation on my M4 Max MacBook Pro.

When I initially ran performance tests on the M4, the Rust version already showed impressive results, slightly outpacing the Swift implementation. But during my investigation, I realized that my Rust environment was still configured for the Intel architecture: the toolchain had been migrated from my old Intel machine and was generating x86_64 binaries that were then executed through Rosetta 2, Apple’s dynamic binary translator for running Intel applications on Apple Silicon.

In other words, the Rust code was running under emulation and still outperforming the native Swift implementation.
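
As an aside, macOS’s file command is the quickest way to confirm which architecture a binary targets, but it can also be checked by hand. Here is a minimal Rust sketch (the binary path is illustrative) that reads the Mach-O header’s magic and CPU type:

use std::fs::File;
use std::io::Read;

fn main() -> std::io::Result<()> {
    // Read the first 8 bytes of the binary: the Mach-O magic and CPU type.
    let mut header = [0u8; 8];
    File::open("target/release/waver")?.read_exact(&mut header)?;
    let magic = u32::from_le_bytes(header[0..4].try_into().unwrap());
    let cputype = u32::from_le_bytes(header[4..8].try_into().unwrap());
    match (magic, cputype) {
        // 0xfeedfacf = thin 64-bit Mach-O; 0x01000007 = x86_64, 0x0100000c = arm64
        (0xfeed_facf, 0x0100_0007) => println!("x86_64 (runs via Rosetta on Apple Silicon)"),
        (0xfeed_facf, 0x0100_000c) => println!("arm64 (native on Apple Silicon)"),
        _ => println!("not a thin 64-bit Mach-O (perhaps a universal binary)"),
    }
    Ok(())
}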

This earned me a massive “dope-slap” and the additional work of redoing things correctly. But it also drove home just how good Apple’s hardware and Rosetta software are.

This discovery was genuinely surprising, as conventional wisdom suggests that native code should always outperform emulated code. The fact that x86_64 Rust code running through Rosetta’s translation layer could match or exceed the performance of native ARM64 Swift code is a testament to both:

  1. The efficiency of Rust as a systems programming language
  2. The remarkable capabilities of Apple’s Rosetta 2 translation technology

After reinstalling Rust with native Apple Silicon support, performance improved even further, with the native ARM64 Rust implementation now significantly outperforming both the emulated Rust code and the native Swift version on Apple Silicon.

These improvements held true for both small single-file operations and large multi-file, highly parallel test cases.

Architecture Matters More Than Language: The performance difference between Intel and Apple Silicon is generally larger than the difference between Rust and Swift on the same architecture. The M4 is substantially faster than the Intel i9 regardless of programming language - so much so that even the code compiled for the Intel processor ran faster on Apple Silicon than it did running on its native Intel hardware!

Note: All files were stored on the internal high-speed SSD. The Rust version sustained ~650MB/s read rates, while the Swift version averaged around 320MB/s on the Apple M4 and noticeably lower rates on slower machines. On systems with spinning media, both would run much slower. (I did test this on my workstation’s spinning media, but the results are not interesting: they are I/O dominated and the CPUs were relatively idle.)


Performance

The following sections present visualizations of various performance metrics collected during testing. Each graph shows relative performance across platforms, normalized so the largest value equals 100%.
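
The normalization itself is trivial; here is a minimal sketch of the calculation in Rust:

// Express each value as a percentage of the largest value in its series.
fn normalize_to_percent(values: &[f64]) -> Vec<f64> {
    let max = values.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    values.iter().map(|v| v / max * 100.0).collect()
}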

Wall Clock Time (Real Time)

This metric represents the actual elapsed time from start to finish. It includes all computation, I/O operations, and scheduling delays. Lower values indicate faster overall completion.

Platform                   Time (seconds) [lower is faster]
M4 MacBookPro Rust                 531.35
M4 MacBookPro Swift                952.81
i9 MacBookPro Rust               2,310.94
i9 MacBookPro Swift              1,763.86
M4 MacBookPro x86_64!              862.21
i9 Linux Desktop Rust            1,372.13

User Time (CPU Time)

This metric shows the total CPU time spent executing user-level code across all cores. Lower values indicate more efficient processing. The M4 Rust implementation shows significantly less CPU usage than other configurations.

Platform                   User Time (seconds) [lower is better]
M4 MacBookPro Rust               8,233.19
M4 MacBookPro Swift             13,871.20
i9 MacBookPro Rust              34,821.80
i9 MacBookPro Swift             25,992.10
M4 MacBookPro x86_64!           13,080.89
i9 Linux Desktop Rust           21,735.25

System Time

This graph displays the CPU time spent in the kernel on behalf of the process, including operations like file I/O and memory allocation. Lower values indicate less kernel overhead. Swift on Intel i9 shows notably high system time, while Rust implementations generally have less kernel overhead.

Platform                   System Time (seconds) [lower is better]
M4 MacBookPro Rust                  71.93
M4 MacBookPro Swift                357.92
i9 MacBookPro Rust                 224.28
i9 MacBookPro Swift                695.70
M4 MacBookPro x86_64!              106.17
i9 Linux Desktop Rust              115.48

Maximum Resident Set Size

This metric represents the peak amount of physical memory allocated during execution. The x86_64 binary running under Rosetta on the M4 uses the most memory, while the Linux implementation is the most memory efficient.

Platform                   Maximum Resident Set Size (megabytes) [lower is better]
M4 MacBookPro Rust                    310
M4 MacBookPro Swift                   303
i9 MacBookPro Rust                    230
i9 MacBookPro Swift                   286
M4 MacBookPro x86_64!                 421
i9 Linux Desktop Rust                 108

Voluntary Context Switches

Voluntary context switches occur when a process willingly gives up the CPU, typically when waiting for I/O. The Swift implementations show dramatically higher voluntary context switches, suggesting they use more asynchronous I/O operations than the Rust versions.

Platform                   Voluntary Context Switches (thousands) [lower is better]
M4 MacBookPro Rust                    163
M4 MacBookPro Swift                 7,787
i9 MacBookPro Rust                    166
i9 MacBookPro Swift                 8,352
M4 MacBookPro x86_64!                 160
i9 Linux Desktop Rust                 214

Involuntary Context Switches

These context switches happen when the OS preemptively switches out a process. Higher numbers can indicate CPU contention or scheduler pressure. The Swift implementation on M4 shows an extremely high number, suggesting significant preemption.

Platform                   Involuntary Context Switches (thousands) [lower is better]
M4 MacBookPro Rust                  3,562
M4 MacBookPro Swift                43,719
i9 MacBookPro Rust                  8,297
i9 MacBookPro Swift                10,417
M4 MacBookPro x86_64!               7,048
i9 Linux Desktop Rust                 505

Instructions Retired

This metric counts the total number of CPU instructions executed to complete the task. Lower values can indicate less computational work. Interestingly, the x86_64 binary under Rosetta executes significantly more instructions, suggesting emulation overhead.

Platform                   Instructions (trillions) [lower is better]
M4 MacBookPro Rust                 126.72
M4 MacBookPro Swift                200.54
i9 MacBookPro Rust                 144.87
i9 MacBookPro Swift                119.54
M4 MacBookPro x86_64!              218.01
i9 Linux Desktop Rust              147.02

Cycles Elapsed

This shows the number of CPU cycles used during execution. The Apple Silicon implementations used substantially fewer cycles than their Intel counterparts, demonstrating the architecture’s efficiency.

Platform                   Cycles (trillions) [lower is better]
M4 MacBookPro Rust                  28.98
M4 MacBookPro Swift                 49.93
i9 MacBookPro Rust                 107.39
i9 MacBookPro Swift                 79.28
M4 MacBookPro x86_64!               44.33
i9 Linux Desktop Rust              103.17

Instructions Per Cycle

IPC measures CPU efficiency - how many instructions are completed per clock cycle on average. It is simply instructions retired divided by cycles elapsed; for example, the native Rust run on the M4 retired 126.72 trillion instructions over 28.98 trillion cycles, giving 126.72 / 28.98 ≈ 4.37. Higher is better. Apple Silicon shows dramatically better IPC than Intel, with the Rosetta x86_64 implementation achieving the highest efficiency despite running emulated code.

Platform                   IPC [higher is better]
M4 MacBookPro Rust                   4.37
M4 MacBookPro Swift                  4.02
i9 MacBookPro Rust                   1.35
i9 MacBookPro Swift                  1.51
M4 MacBookPro x86_64!                4.92
i9 Linux Desktop Rust                1.42

Peak Memory Footprint

This represents the maximum memory actually touched (not just allocated) during execution. The Rosetta emulated code has the highest memory footprint, while native Rust implementations on both platforms show efficient memory usage.

Platform                   Peak Memory (megabytes) [lower is better]
M4 MacBookPro Rust                    118
M4 MacBookPro Swift                   296
i9 MacBookPro Rust                    104
i9 MacBookPro Swift                   280
M4 MacBookPro x86_64!                 421

Binary Sizes (bytes)

MacOS Apple M4  Waver-rust:  2,075,176
MacOS Apple M4  Waver-swift:   760,728

MacOS Intel i9  Waver-rust:  2,145,472
MacOS Intel i9  Waver-swift:   770,512

Linux Intel i9  Waver-rust:  2,335,512

Performance Details

The following performance metrics were collected during various runs of my Waver-rust and Waver-swift tools across my different machines, all processing the same dataset. I made sure the machines had cooled down between runs and chose the best run from several test runs (though they were all quite consistent in performance).
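
These fields match what macOS’s /usr/bin/time -l reports on recent systems. Here is a minimal sketch of how each run can be wrapped for measurement; the binary name and arguments are illustrative:

use std::process::Command;

fn main() -> std::io::Result<()> {
    // `/usr/bin/time -l` writes its statistics to stderr;
    // the measured tool's own output stays on stdout.
    let output = Command::new("/usr/bin/time")
        .args(["-l", "./waver", "--quiet", "/path/to/music"])
        .output()?;
    eprint!("{}", String::from_utf8_lossy(&output.stderr));
    Ok(())
}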

Test runs:

Each test is grouped by CPU and platform to help compare relative performance between architectures and toolchains.

CPU: Apple M4 Max - MacBookPro 16-inch

>Waver-rust          >Waver-swift          files: 50056 total size: 362G
              531.35               952.81  real
            8,233.19            13,871.20  user
               71.93               357.92  sys
         324,927,488          317,652,992  maximum resident set size
             162,271            7,786,825  voluntary context switches
           3,562,056           43,719,084  involuntary context switches
 126,715,886,032,825  200,537,519,156,445  instructions retired
  28,978,240,976,295   49,925,846,249,479  cycles elapsed
                4.37                 4.02  instructions per cycle
         123,241,312          310,166,752  peak memory footprint

This is the correctly compiled version of the code as native Apple Silicon ARM code.

Note how the Rust version significantly beat the Swift version on the Apple Silicon machine!

Note also that the Rust version seems to allocate more memory than it actually touches.

CPU: Intel Core i9-9980HK - MacBookPro 16-inch

The same code was also compiled and tested on the Intel MacBookPro (i9, high-end configuration). The Intel MacBook was plugged in and actively cooling with its fans; the M4 Max also ran its fans, but for a shorter period and while unplugged.

>Waver-rust          >Waver-swift          files: 50056 total size: 362G
            2,310.94             1,763.86  real
           34,821.80            25,992.10  user
              224.28               695.70  sys
         241,524,736          299,642,880  maximum resident set size
             166,490            8,351,647  voluntary context switches
           8,296,980           10,416,933  involuntary context switches
 144,865,762,426,892  119,543,993,473,802  instructions retired
 107,392,710,469,137   79,283,453,250,177  cycles elapsed
                1.35                 1.51  instructions per cycle
         108,707,840          293,179,392  peak memory footprint

Swift wins on x86_64 for Intel even though it loses on Apple Silicon/ARM. This is surprising, but likely explained by the deeply optimized Apple media libraries that Swift uses, while the Rust code is pure Rust and lacks the benefit of those same optimizations. However, the fact that Rust was able to outdo Apple’s optimized media libraries on Apple’s own silicon is amazing.

The high context switch counts in the Swift runs support the idea that it is using OS- and library-level offloading. Rust, on the other hand, doesn’t tap into these Apple-optimized frameworks and suffers for that on the Intel hardware.

CPU: Apple M4 Max - MacBookPro 16-inch - Rosetta

Here are the numbers for the x86_64-compiled code running on the M4! Yes, its wall clock time beat the native Swift code! This just shows how good Apple’s Rosetta is at translating Intel code to Apple Silicon in real time. The M4 Max runs the Intel-native binary nearly 3 times faster than the i9 MacBookPro and over 50% faster than the Intel i9 desktop!

>Waver-rust x86_64    files: 50056 total size: 362G
              862.21  real
            13080.89  user
              106.17  sys
         441,946,112  maximum resident set size
             159,579  voluntary context switches
           7,048,201  involuntary context switches
 218,012,619,278,690  instructions retired
  44,331,868,186,106  cycles elapsed
                4.92  instructions per cycle
         441,674,608  peak memory footprint

This is the same code but with the wrong toolchain! Look at those numbers! In fact, to validate this, I took the actual binary as built on my Intel MacBookPro, ran the tests again, and it produced the same results!

CPU: Intel i9-9900K - Linux Tower (Water-Cooled, ~4.72 GHz Average)

>Waver-rust           files: 50056 total size: 362G
            1,372.13  real
           21,735.25  user
              115.48  sys
         113,156,096  maximum resident set size
             214,256  voluntary context switches
             504,873  involuntary context switches
 147,020,034,428,541  instructions retired
 103,173,937,600,610  cycles elapsed
                1.42  instructions per cycle
                 n/a  peak memory footprint

The Linux desktop outperformed the Intel MacBookPro thanks to its higher clock speed and better cooling. It even beat Swift on the Intel platform. Still, it couldn’t keep up with the M4 Max on battery power — a remarkable testament to Apple Silicon.

While the Linux system’s SSD wasn’t quite as fast as Apple’s, it was more than fast enough (averaging 520MB/s, peaking at 600MB/s). With water cooling maintaining ~81°C max CPU temp, it ran at full boost the entire time — unlike the thermally throttled MacBooks.


Toolchains:

All tests used the latest available toolchains for each system.

Apple M4 Max MacBookPro:

  swift-driver version: 1.120.5
  Apple Swift version 6.1 (swiftlang-6.1.0.110.21 clang-1700.0.13.3)
  rustc 1.86.0 (05f9846f8 2025-03-31) cargo 1.86.0 (adf9b6ad1 2025-02-28)

Intel i9 MacBookPro:

  swift-driver version: 1.120.5
  Apple Swift version 6.1 (swiftlang-6.1.0.110.21 clang-1700.0.13.3)
  rustc 1.86.0 (05f9846f8 2025-03-31) cargo 1.86.0 (adf9b6ad1 2025-02-28)

Linux i9 Desktop:

  rustc 1.86.0 (05f9846f8 2025-03-31) cargo 1.86.0 (adf9b6ad1 2025-02-28)

Notable Differences

While both implementations produce visually comparable waveform images, they differ significantly in design philosophy, capabilities, and performance characteristics. The Rust version prioritizes cross-platform purity and memory safety at the cost of reduced feature breadth, whereas the Swift version leverages Apple’s deeply optimized frameworks for platform-specific efficiency but introduces trade-offs in abstraction and flexibility.

Rust Trade-Offs/Benefits

  1. Format Support Limitations: Reliance on the symphonia crate restricts primary support to MP3 files, in contrast with Swift’s broader format compatibility via Apple’s media libraries. This limitation stems from a less mature ecosystem compared to Apple’s well-established frameworks.

  2. Memory Efficiency & Output Optimization: Rust generates smaller PNG files by using indexed color palettes (4-color), a feature not available in the default Core Graphics implementation used by Swift (see the sketch after this list). This results in lower memory footprints during processing, with Rust showing 105MB peak residency on small datasets versus Swift’s 164MB.

  3. Native Streaming Performance: Rust’s buffer management avoids memory spikes through optimized stream-based decoding, contrasting with Swift’s reliance on manual buffer size tuning to balance I/O and CPU efficiency (visible in larger max resident set sizes).

  4. Compile-Time Safety Guarantees: Rust enforces strict memory safety at compile time via its ownership model, eliminating classes of runtime errors such as null pointer dereferences or data races inherent in frameworks using Objective-C or manual memory management.
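
To illustrate point 2 above, here is a minimal sketch of writing a 2-bits-per-pixel, 4-color indexed PNG with the png crate; the palette values and pixel packing are illustrative, not the tool’s exact ones:

use std::error::Error;
use std::fs::File;
use std::io::BufWriter;

use png::{BitDepth, ColorType, Encoder};

fn write_indexed_png(path: &str, width: u32, height: u32, pixels: &[u8]) -> Result<(), Box<dyn Error>> {
    let mut encoder = Encoder::new(BufWriter::new(File::create(path)?), width, height);
    encoder.set_color(ColorType::Indexed);
    encoder.set_depth(BitDepth::Two); // 2 bits per pixel -> a 4-entry palette
    encoder.set_palette(vec![
        0xff, 0xff, 0xff, // index 0: background
        0x00, 0xff, 0x99, // index 1: left channel
        0x99, 0xff, 0x00, // index 2: right channel
        0x00, 0x00, 0x00, // index 3: spare
    ]);
    let mut writer = encoder.write_header()?;
    // `pixels` must already be packed four 2-bit samples per byte,
    // with each row padded to a whole byte.
    writer.write_image_data(pixels)?;
    Ok(())
}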


Swift Trade-Offs/Benefits

  1. Overhead from Advanced Frameworks: Apple’s Core Audio and Core Graphics libraries include capabilities like HDR color profiling and real-time video effects—unnecessary for basic waveform generation—that contribute to increased computational overhead (evident in higher context switch counts and IPC inefficiency).

  2. Manual Buffer Optimization Dependency: Swift requires explicit tuning of buffer sizes (e.g., bufferSize = AVAudioFrameCount(65535)) to maintain optimal performance, as seen in its sensitivity to I/O vs CPU workload balance across different file sizes.

  3. Framework Abstraction Layers: Much of the Swift code delegates core processing to Apple’s lower-level Objective-C implementations (e.g., AVAudioFile, NSBitmapImageRep), which introduces indirection and reduces control over low-level optimizations compared to Rust’s direct, pure-Rust execution paths.

  4. Leverages Apple’s Media Support: This is hard to overstate, but Apple’s advanced media support brings with it significant wins, including very robust support of not just MP3 but many other audio formats. This means that if I were to need AAC support, the Swift version would already have it, unlike in “pure” Rust. Swift applications can easily tap into hardware accelerations through Apple’s frameworks and APIs, which are deeply integrated with the language and development environment. This integration allows Swift code to efficiently leverage specialized hardware capabilities with minimal effort. It just happens that in this case, that support was not a significant factor.


Big Surprise!

Now, some time after having written the above, I happened to wonder how well the Rosetta system really worked for other code. The first thing I did was take the same Swift code, compiled on my i9 MacBookPro, and bring it to my M4 Max MacBookPro and an M1 Max MacBookPro.

The big surprise? The Swift compiler produces horrible Apple Silicon native code (or, put another way, the Swift compiler produces amazing Intel CPU code).

The x86_64-compiled Swift code was actually slightly faster than the natively compiled code! The effect was even more pronounced on the M1 Max CPU, where the x86_64 code was significantly faster than the native code! How is that possible? Maybe the Swift compiler is just really bad at generating ARM/Apple Silicon code.

Note that the M1 Max CPU has only 10 cores (8 P-cores and 2 E-cores) while the M4 Max has 16 cores (12 P-cores and 4 E-cores), which is much of why wall clock time showed roughly a 2.6x improvement for the native code on the M4 compared to the M1 (57.88 vs 149.02 seconds); if you look at CPU time, the gap is much smaller (901.07 vs 1,440.82 seconds, about 1.6x). Still, the M4 Max definitely retires more instructions per cycle in addition to running at higher clock speeds.

The i9 MacBookPro shows that the Swift code gen for the Intel processor is really good in comparison. It performed better than the M1 MacBookPro in wall clock time but used more CPU time - the i9 MacBookPro has 8 physical cores with 2 hardware threads per core, so it acts close to a 16-core machine under the right conditions, and this test seems to have shown that!

Note: These tests were run over a different dataset than those earlier in the document. I no longer had that exact same dataset available.

Platform                      Time (seconds) [lower is faster]
M4 MacBookPro Swift Native            57.88
M4 MacBookPro Swift x86_64            57.08
M1 MacBookPro Swift Native           149.02
M1 MacBookPro Swift x86_64           113.76
i9 MacBookPro Swift                  106.81

 M4 - Swift Native   M4 - Swift x86_64   M1 - Swift Native   M1 - Swift x86_64         i9 - Swift
             57.88               57.08              149.02              113.76             106.81  real
            901.07              886.71            1,440.82            1,098.64           1,657.28  user
              6.44                8.13               12.14                9.66              18.56  sys
     3,353,591,808       3,324,108,800       1,839,431,680       1,864,417,280      1,851,748,352  maximum resident set size
             3,120               3,470               3,193               3,186              3,146  voluntary context switches
           173,175             180,453             213,618             172,124            215,222  involuntary context switches
12,844,888,747,702  13,992,329,799,520  12,849,258,068,293  13,965,809,647,675  7,708,049,163,527  instructions retired
 3,173,763,286,259   3,113,048,935,885   4,151,163,543,513   3,164,590,363,249  5,343,710,243,119  cycles elapsed
              4.05                4.49                3.10                4.41               1.44  instructions per cycle
     3,317,648,480       3,329,125,512       1,810,191,360       1,865,813,696      1,795,973,120  peak memory footprint
        46,406,588          46,408,925          46,406,588          46,408,925         46,408,925  total bytes png files

Platform                      CPU Time (seconds) [lower is faster]
M4 MacBookPro Swift Native           901.07
M4 MacBookPro Swift x86_64           886.71
M1 MacBookPro Swift Native         1,440.82
M1 MacBookPro Swift x86_64         1,098.64
i9 MacBookPro Swift                1,657.28

Rust produces better Apple Silicon code

The Rust-compiled code performs as you would expect. It also produces exactly the same results on both processor architectures: the outputs are bit-for-bit identical, with none of the subtle rounding/image differences showing up. This is what you want from code that processes data.
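
That claim is easy to verify: the check amounts to a byte-for-byte comparison of the outputs (the paths here are illustrative):

use std::fs;

// True when the two output files are bit-for-bit identical.
fn identical(a: &str, b: &str) -> std::io::Result<bool> {
    Ok(fs::read(a)? == fs::read(b)?)
}

fn main() -> std::io::Result<()> {
    let same = identical("out-arm64/track.mp3.png", "out-x86_64/track.mp3.png")?;
    println!("bit-for-bit identical: {}", same);
    Ok(())
}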

It also shows how much better the M4 processor is over the M1 and how the M1 already is significantly better than the Intel i9.

Not only that, the Rust code also produces much better optimized PNG files, which explains the significant size savings even though they contain the same waveform images.

Note that in all cases the x86_64 binaries are the same (built on the i9 machine).

Platform                      Time (seconds) [lower is faster]
M4 MacBookPro Rust Native             33.69
M4 MacBookPro Rust x86_64             48.29
M1 MacBookPro Rust Native             74.62
M1 MacBookPro Rust x86_64             94.08
i9 MacBookPro Rust                   120.23

 M4 - Rust Native    M4 - Rust x86_64   M1 - Rust Native    M1 - Rust x86_64          i9 - Rust
            33.69               48.29              74.62               94.08             120.23  real
           525.52              747.10             720.93              910.52           1,885.22  user
             3.68                4.85               6.78                7.31               7.21  sys
       35,618,816          36,241,408         23,789,568          24,494,080         32,022,528  maximum resident set size
                4                   1                  4                   3                 15  voluntary context switches
           86,282             124,135            104,329             127,973            231,475  involuntary context switches
7,850,819,714,697  12,221,211,323,564  7,850,597,715,658  12,244,086,240,882  8,330,060,241,836  instructions retired
1,865,337,341,480   2,590,336,001,001  2,077,492,279,600   2,620,509,208,004  6,118,377,513,298  cycles elapsed
             4.21                4.72               3.78                4.67               1.36  instructions per cycle
       32,850,688          35,223,304         21,022,400          22,689,472         29,450,240  peak memory footprint
       13,539,020          13,539,020         13,539,020          13,539,020         13,539,020  total bytes png files

Platform                      CPU Time (seconds) [lower is faster]
M4 MacBookPro Rust Native            525.52
M4 MacBookPro Rust x86_64            747.10
M1 MacBookPro Rust Native            720.93
M1 MacBookPro Rust x86_64            910.52
i9 MacBookPro Rust                 1,885.22

What this all shows is that there is a significant performance benefit to using Rust, in addition to its stricter correctness and safety. It also shows that the Swift compiler toolchain, somewhere, is leaving a lot of performance on the floor. The Intel code gen is clearly doing well, while the Apple Silicon code gen is, to put it frankly, rather poor - especially when it can be beaten by Rosetta doing its magic on the Intel code generated from the same source.

Rust clearly shows the performance difference that is expected when the compiler produces quality code.

This also, again, shows the amazing capabilities of Rosetta and the Apple Silicon CPU. It is hard to express just how impressive they are given the performance gains and power savings compared to the Intel counterparts.

Note on the data set

Since a significant amount of time had passed since I first did the Swift and Rust testing, I no longer had the exact same dataset to test on, so I made a benchmark dataset specifically designed to test the code again. It contains 4 MP3 files of 4 very different sorts - 48kHz stereo, 44.1kHz joint stereo, 44.1kHz stereo, and 22kHz mono - all with different bit rates. These 4 files were then duplicated (via hard links) to produce a total of 3,116 MP3 files (equal counts of each) distributed over many directories and sub-directories such that the tree was complex and mixed.

Why the hard links? Hard links allowed the operating system to cache just the 4 unique MP3 files in memory, completely eliminating disk I/O as a performance factor. While the test was processing over 3,000 MP3 files representing more than 21 gigabytes of logical data, the actual unique file content was less than 30 megabytes - small enough to fit entirely in the OS buffer cache. This approach ensured that file reading time was effectively zero, allowing the tests to focus purely on processing performance. (Write operations for the PNG output files were still required, but these were minimal - totaling less than 46 megabytes across all files created.)
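
Here is a minimal sketch of how such a tree can be built with hard links; the source file names and directory fan-out are illustrative (779 directories x 4 links = 3,116 files):

use std::fs;

fn main() -> std::io::Result<()> {
    // Four unique source files; every other "file" is a hard link to one of
    // these, so the OS block cache only ever holds the ~30MB of unique audio.
    let sources = [
        "masters/48k_stereo.mp3",
        "masters/44k_joint_stereo.mp3",
        "masters/44k_stereo.mp3",
        "masters/22k_mono.mp3",
    ];
    for d in 0..779 {
        let dir = format!("dataset/d{:02}/s{:03}", d % 16, d);
        fs::create_dir_all(&dir)?;
        for (i, src) in sources.iter().enumerate() {
            fs::hard_link(src, format!("{}/track_{:03}_{}.mp3", dir, d, i))?;
        }
    }
    Ok(())
}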


Small Dataset Performance

CPU: Apple M4 Max - MacBookPro 16-inch

>Waver-rust        >Waver-swift        files: 579 total size: 3.4G
              5.14               8.82  real
             78.84             137.04  user
              0.51               0.88  sys
       104,759,296        164,511,744  maximum resident set size
                 0                  0  voluntary context switches
            28,308             52,119  involuntary context switches
 1,221,008,526,268  1,932,847,466,808  instructions retired
   284,369,020,788    493,442,491,292  cycles elapsed
              4.93               3.92  instructions per cycle
        42,664,728        159,794,136  peak memory footprint

CPU: Intel(R) Core(TM) i9-9980HK - MacBookPro 16-inch

>Waver-rust        >Waver-swift        files: 579 total size: 3.4G
             23.15              15.85  real
            320.66             226.38  user
              2.61               6.42  sys
        84,205,568        156,160,000  maximum resident set size
               571             81,332  voluntary context switches
           220,396            186,828  involuntary context switches
 1,397,457,239,759  1,163,196,148,748  instructions retired
 1,032,475,223,430    776,781,237,103  cycles elapsed
              1.35               1.50  instructions per cycle
        41,934,848        152,006,656  peak memory footprint

M1 MacBookAir performance

This is a smaller, different dataset, as the base model M1 MacBookAir has neither the SSD capacity nor the performance of the newer M4.

You can definitely see the slower clock rate and lower IPC, but the performance difference between the Rust code and the Swift code is real. At roughly 2 times slower, the Swift code definitely loses in this specific scenario, even on Apple Silicon.

CPU: Apple M1 - MacBookAir

>Waver-rust       >Waver-swift      files: 284 total size: 1.7G
            8.35             16.83  real
           60.64            115.39  user
            0.78              2.43  sys
      18,694,144       130,940,928  maximum resident set size
             885            38,124  voluntary context switches
          40,642           167,697  involuntary context switches
 564,405,682,270   887,674,571,101  instructions retired
 156,718,297,503   303,945,723,413  cycles elapsed
            3.60              2.92  instructions per cycle
      15,402,496       126,043,968  peak memory footprint

The Code

Here are the stripped-down sources of the code used for the above tests.

Swift

Built with: swift build -c release

// Simple hack program that, on MacOS with Swift, reads audio file (MP3/etc)
// and produces a PNG file that represents its waveform visually.

// This specifically uses streaming such that the whole audio does not
// need to be loaded into memory at once.  This allows the processing of
// some large/long audio files at the speed of decoding them without
// significant memory overhead.  Because of this we can then safely
// parallelize the operations such that multiple audio files can be
// processed at once.

// We use the swift argument parser library to have declarative arguments
// and argument validation handled in both a standard and simple way

import AVFoundation
import AppKit
import ArgumentParser
import Foundation


/// Waver Command Line Tool
/// The main command line tool struct for converting audio files to waveform
/// PNG images.
///
/// This tool accepts various command line arguments to customize the output of
/// the waveforms, such as image dimensions, colors for different channels and
/// background, buffer size for streaming, and more.
@main
struct Waver: ParsableCommand {
    @Option(help: "Image Width")
    var width: Int = 2048

    @Option(help: "Image Height (must be even)")
    var height: Int = 128

    @Option(help: "Left channel (and mono) color in RGB/RRGGBB/RRGGBBAA hex")
    var leftColor = "00ff99"

    @Option(help: "Right channel color in RGB/RRGGBB/RRGGBBAA hex")
    var rightColor = "99ff00"

    @Option(help: "Background color in RGB/RRGGBB/RRGGBBAA hex")
    var backgroundColor = "ffffff00"

    @Option(name: .shortAndLong, help: "Optional output file name (defaults to input+.png)")
    var outputFilename = ""

    @Option(help: "File extensions to process when given a directory")
    var fileExtensions = ["mp3", "m4a"]

    // Constrain our audio reading buffer size such that we stream
    // the audio but not in too small of chunk sizes and not too large.
    //
    // If you set this too large, the memory load and cache performance
    // can be impacted.  If you set it too small, the number stream read
    // requests go up and can impact performance.
    //
    // Perf Tests - when processing a directory tree with 579 MP3 files that
    // total around 3.4GB in size, all in highest quality variable bit rate
    // encodings, we get the following performance on my MacBookPro:
    // (Measured after multiple repeated runs with the same buffer size,
    // taking the best run which usually was the last run)
    //
    //       4194303        131071         65535         32767          4095  --buffer-size
    // ------------- ------------- ------------- ------------- -------------  -------------------------
    //          9.35          8.79          8.64          8.77          9.07  real time
    //        139.18        137.43        135.36        136.81        136.76  user (cpu user time)
    //          4.12          0.77          0.75          1.02          5.11  sys (cpu OS time)
    //    2806333440     206422016     162676736     141836288     123486208  maximum resident set size
    // 1956819286348 1944140432707 1944272337536 1946504453175 1979054857635  instructions retired
    //  511558401161  493050273427  485401415345  482658543430  506685982189  cycles elapsed
    //    2802748008     201704432     157926360     137069504     118653792  peak memory footprint
    //
    // Note how increased buffer sizes slow things down due to the impact on
    // CPU cache and smaller sizes slow things down due to number of additional
    // read stream calls.
    //
    // The peak performance is somewhere between 32K and 128K.  In order to test
    // all of this, I had it as a parameter but have the default value at the
    // peak performance point.
    //
    // I have this as an option so that it can be tuned for performance
    // and experimented with.  I have also found some cases where the
    // AVAudioFile.length is not accurate and reading until that length
    // can fail.  This is actually more common than I expected.  See comments
    // in the code where I handle those conditions.
    @Option(help: "Buffer size to use while streaming audio (in frames)")
    var bufferSize = AVAudioFrameCount(65535)

    @Flag(help: "Do not write anything")
    var dryRun = false

    @Flag(help: "Overwrite existing output file")
    var overwrite = false

    @Flag(help: "Quieter output")
    var quiet = false

    @Flag(help: "Verbose - show overwrite warnings")
    var verbose = false

    // The audio file(s) or directory to process.
    @Argument(help: "The audio file to process (or files/directory if not using --output-filename)")
    var audioFilenames: [String]

    /// An error type for handling argument validation issues.
    struct ArgumentError: LocalizedError {
        let description: String

        init(_ description: String) {
            self.description = description
        }

        var errorDescription: String? {
            description
        }
    }

    /// An error type for handling waveform generation issues.
    struct GenerationError: LocalizedError {
        let description: String

        init(_ description: String) {
            self.description = description
        }

        var errorDescription: String? {
            description
        }
    }

    /// Converts a hex color string to a `CGColor` object.
    /// Supports these RGB formats:
    ///  RGB      - transforms to RRGGBB
    ///  RRGGBB   - assumes alpha is 1.0
    ///  RRGGBBAA - alpha value is 00 to FF (0.0 to 1.0)
    ///
    /// - Parameter rgbString: hex string representing the color (RGB, RRGGBB, or RRGGBBAA).
    /// - Returns: A `CGColor` object.
    /// - Throws: An `ArgumentError` if the color string is invalid.
    func cgColorFrom(rgbString: String) throws -> CGColor {
        guard
            let rgb = UInt(rgbString, radix: 16)
        else {
            throw ArgumentError(
                "Invalid color format: '\(rgbString)' - invalid hex number")
        }
        // Simple 3 digit RGB color
        if rgbString.count == 3 {
            return CGColor(
                red: (CGFloat)(((rgb & 0x0F00) >> 8) * 17) / 255.0,
                green: (CGFloat)(((rgb & 0x00F0) >> 4) * 17) / 255.0,
                blue: (CGFloat)((rgb & 0x000F) * 17) / 255.0,
                alpha: 1.0)
        }
        // Common 6 digit RGB color
        if rgbString.count == 6 {
            return CGColor(
                red: (CGFloat)((rgb & 0x00FF_0000) >> 16) / 255.0,
                green: (CGFloat)((rgb & 0x0000_FF00) >> 8) / 255.0,
                blue: (CGFloat)(rgb & 0x0000_00FF) / 255.0,
                alpha: 1.0)
        }
        // Common 8 digit RGBA color
        if rgbString.count == 8 {
            return CGColor(
                red: (CGFloat)((rgb & 0x00_FF00_0000) >> 24) / 255.0,
                green: (CGFloat)((rgb & 0x00_00FF_0000) >> 16) / 255.0,
                blue: (CGFloat)((rgb & 0x00_0000_FF00) >> 8) / 255.0,
                alpha: (CGFloat)(rgb & 0x00_0000_00FF) / 255.0)
        }
        // Unknown format
        throw ArgumentError(
            "Invalid color format: '\(rgbString)' - We support RGB, RRGGBB, and RRGGBBAA hex formats"
        )
    }

    /// Validates the command-line arguments provided.  It can not mutate
    /// anything, just success or throws an error.  Automatically called
    /// by the argument parser after completing the parsing of the command
    /// line arguments and before calling the run() entry point.
    ///
    /// - Throws: An `ArgumentError` if any validation fails.
    func validate() throws {
        // Since validation can't actually save the results, just checks, we
        // just try to convert each color argument to validate that they do
        // correctly convert during validation such that we don't have to do it
        // later.
        _ = try cgColorFrom(rgbString: leftColor)
        _ = try cgColorFrom(rgbString: rightColor)
        _ = try cgColorFrom(rgbString: backgroundColor)

        if width < 16 {
            throw ArgumentError("--width must be at least 16 pixels")
        }

        // We need the height to be even (left/right channels) and at least 6 pixels
        if height < 6 || height & 1 == 1 {
            throw ArgumentError("--height must be at least 6 pixels and even")
        }

        if bufferSize < 1024 {
            throw ArgumentError("--buffer-size must be at least 1024 frames")
        }

        if audioFilenames.count < 1 {
            throw ArgumentError("No audio file specified")
        }

        // Check that output filename is not specified with multiple audio files
        if audioFilenames.count > 1 && outputFilename.count > 0 {
            throw ArgumentError("Cannot specify --output-filename with multiple audio files")
        }

        // Check that audio files/directories exist
        for filename in audioFilenames {
            var isDirectory: ObjCBool = false
            if !FileManager.default.fileExists(atPath: filename, isDirectory: &isDirectory) {
                throw ArgumentError("Could not find audio-filename: \(filename)")
            }
            // Check that output filename is not specified with a directory
            if outputFilename.count > 0 && isDirectory.boolValue {
                throw ArgumentError("Cannot specify --output-filename with a directory")
            }
        }
    }

    /// Prints messages to standard error.
    ///
    /// - Parameter items: The items to print.
    /// - Parameter separator: A string to separate the printed items (default is space).
    /// - Parameter terminator: The character(s) to print after the last item (default is newline).
    func printToStdErr(_ items: Any..., separator: String = " ", terminator: String = "\n") {
        if !quiet {
            let output =
                items
                .map { String(describing: $0) }
                .joined(separator: separator) + terminator

            FileHandle.standardError.write(output.data(using: .utf8)!)
        }
    }

    /// Generates a waveform PNG image from an audio file.
    ///
    /// This method reads audio data in chunks (streaming) to minimize memory usage,
    /// processes each chunk to compute the waveform, and draws it onto a Core Graphics context.
    ///
    /// - Parameters:
    ///   - inputFile: The path to the input audio file.
    ///   - outputFile: The path where the output PNG image should be saved.
    ///   - colors: An array of `CGColor` objects representing the left channel, right channel,
    ///     and background colors in that order.
    /// - Throws: A `GenerationError` if any issues occur during processing, such as file I/O errors
    ///           or invalid audio formats.
    func generateWavePng(
        inputFile: String,
        outputFile: String,
        colors: [CGColor]
    ) throws {
        // We were going to make the image and PNG as a 2-bit (4 color) image
        // but MacOS libraries no longer seem to support that.  This results in
        // PNG files that are a bit larger (about twice as large) as they would
        // be if they were 2bpp.  But they are rather small anyway.

        // Check for existing output file if overwriting is not allowed
        if !overwrite && FileManager.default.fileExists(atPath: outputFile) {
            if !verbose {
                return
            }
            throw GenerationError(
                "Output file '\(outputFile)' already exists - use --overwrite to overwrite")
        }
        do {
            // Open the MP3 file without loading it fully into memory
            guard
                let audioFile = try? AVAudioFile(forReading: URL(fileURLWithPath: inputFile))
            else {
                throw GenerationError("Could not open for audio processing")
            }

            guard
                let buffer = AVAudioPCMBuffer(
                    pcmFormat: audioFile.processingFormat,
                    frameCapacity: bufferSize)
            else {
                throw GenerationError("Could not create buffer for audio file")
            }

            let imageWidth = CGFloat(width)
            let imageHeight = CGFloat(height)
            let imageCenter = imageHeight / 2
            let pixelsPerSample = imageWidth / Double(audioFile.length)

            // Get our rendering context for an image of our size
            guard
                let context = CGContext(
                    data: nil,
                    width: width,
                    height: height,
                    bitsPerComponent: 8,  // 8 bits per component
                    bytesPerRow: 0,  // Let CGContext define this
                    space: CGColorSpace(name: CGColorSpace.sRGB)!,
                    bitmapInfo: CGImageAlphaInfo.premultipliedLast.rawValue)
            else {
                throw GenerationError("Could not create rendering context")
            }

            // Set the background color and fill the entire image with it
            context.setFillColor(colors[colors.count - 1])
            context.fill(CGRect(origin: .zero, size: CGSize(width: imageWidth, height: imageHeight)))

            // Tracks the maximum values for each channel
            var maxVals = [Float](repeating: 0.0, count: 2)

            // Track our progress along the graph.  We use fractional
            // pixel progress as the samples are unlikely to be a perfect
            // multiple of the image width.
            var pixelProgress = CGFloat(0.0)
            var pixelPos = CGFloat(0.0)
            var nextPixel = pixelPos + 1.0

            // Stream the audio data in chunks and render the waveform
            while audioFile.framePosition < audioFile.length {
                let beforePos = audioFile.framePosition
                do {
                    try audioFile.read(into: buffer)
                } catch {
                    // Some audio terminate a bit earlier than expected and
                    // if the buffer size happens to match a multiple of the
                    // early termination point then we did not detect it at
                    // the bottom of the loop due to partial buffer fill and
                    // have to notice it here.
                    // If we are under a buffer length left over, it was likely
                    // this same problem and we just happened to have a buffer
                    // size that was a whole fraction of the early termination
                    // point.  What this really means is that the length of
                    // an AVAudioFile looks to be inaccurate at times.  (Likely
                    // due to VBR MP3 files, which is what I have)
                    // So, in this case, we accept it as "it is what it is"
                    // with optional verbose "warning" being logged
                    if audioFile.length - beforePos < bufferSize {
                        if verbose {
                            printToStdErr("Warning: Premature end reading '\(inputFile)' @ \(beforePos)/\(audioFile.length) \(Double(beforePos*10000/audioFile.length)/100.0)% : \(error.localizedDescription)")
                        }
                        break
                    } else {
                        throw GenerationError(
                            "Error reading @ \(beforePos)/\(audioFile.length): \(error.localizedDescription)"
                        )
                    }
                }

                let channelCount = min(2, Int(buffer.format.channelCount))
                let frameCount = buffer.frameLength

                guard
                    let channelData = buffer.floatChannelData
                else {
                    break
                }

                // Loop over each frame (sample) in the buffer keeping track of
                // the maximum value for each channel.  We do this until we have
                // combined enough samples to fill a single pixel width of the
                // image.  Then we render the pixel and reset the maximum values
                // back to 0.
                for frame in 0..<Int(frameCount) {
                    for channel: Int in 0..<channelCount {
                        // Note that we assume values are -1.0 to 1.0 in the
                        // samples but we protect ourselves by clamping this to
                        // 1.0 max value using the min(1.0, value) trick.
                        maxVals[channel] = max(maxVals[channel], min(1.0, abs(channelData[channel][frame])))
                    }
                    pixelProgress += pixelsPerSample
                    if pixelProgress > nextPixel {
                        // Render this pixel of the image...
                        context.setFillColor(colors[0])
                        context.fill(
                            CGRect(
                                x: pixelPos, y: imageCenter, width: 1,
                                height: round(imageCenter * CGFloat(maxVals[0]))))
                        context.setFillColor(colors[channelCount - 1])
                        context.fill(
                            CGRect(
                                x: pixelPos, y: imageCenter, width: 1,
                                height: -round(imageCenter * CGFloat(maxVals[channelCount - 1]))))

                        // Get ready for the next pixel
                        pixelPos = nextPixel
                        nextPixel += 1.0
                        for channel: Int in 0..<channelCount {
                            maxVals[channel] = 0.0
                        }
                    }
                }

                if buffer.frameLength < bufferSize {
                    // Render the last pixel in the wave for
                    // those cases where we did not get to a
                    // perfect multiple of the image size.
                    if pixelPos < imageWidth {
                        context.setFillColor(colors[0])
                        context.fill(
                            CGRect(
                                x: pixelPos, y: imageCenter, width: 1,
                                height: round(imageCenter * CGFloat(maxVals[0]))))
                        context.setFillColor(colors[channelCount - 1])
                        context.fill(
                            CGRect(
                                x: pixelPos, y: imageCenter, width: 1,
                                height: -round(imageCenter * CGFloat(maxVals[channelCount - 1]))))
                    }
                    // We are done with the file - we should not need to break
                    // out but it turns out AVAudioFile.length is not always
                    // accurate enough and we may think there are a few more
                    // samples to get but, really, we are done since we got
                    // a partial read.  It is faster for us to just break out
                    // here rather than do the next read which will throw and
                    // then break out.
                    break
                }
            }

            // Generate the final image
            let image = context.makeImage()!

            // If we are not dry-run, write the image to the file.
            if !dryRun {
                guard
                    let pngData = NSBitmapImageRep(cgImage: image).representation(
                        using: NSBitmapImageRep.FileType.png, properties: [:])
                else {
                    throw GenerationError("Could not create PNG representation of image")
                }

                try pngData.write(to: URL(fileURLWithPath: outputFile))
                if !quiet {
                    print("Created \(outputFile)")
                }
            }
            else if verbose {
                    print("DryRun \(outputFile)")
            }
        } catch {
            throw GenerationError("Error processing '\(inputFile)' : \(error.localizedDescription)")
        }
    }

    /// Recursively processes files in a directory tree.
    ///
    /// This method traverses the directory structure starting at the given URL and applies
    /// the provided handler closure to each matching file.
    ///
    /// - Parameters:
    ///   - path: The URL pointing to a file or directory to process.
    ///   - handler: A closure that processes each matching file.  The closure receives a `URL`
    ///             pointing to the file to be processed.
    /// - Throws: A `GenerationError` if there are issues accessing files or directories.
    func handleFiles(path: URL, handler: (URL) -> Void) throws {
        let fileManager = FileManager.default

        // Check if the directory exists or is a file
        var isDirectory: ObjCBool = false
        guard
            fileManager.fileExists(atPath: path.path, isDirectory: &isDirectory)
        else {
            throw GenerationError("Directory does not exist: \(path.path)")
        }
        if !isDirectory.boolValue {
            return handler(path)  // If it is not a directory, just call the handler
        }

        // It is a directory.  Get the contents of the directory
        // and process them.
        let items = try fileManager.contentsOfDirectory(at: path, includingPropertiesForKeys: nil)

        for item in items {
            // If the item is a directory, recursively call the function
            if item.hasDirectoryPath {
                try handleFiles(path: item, handler: handler)
            } else {
                // Check if the item matches the pattern
                for fileExtension in fileExtensions {
                    if item.pathExtension == fileExtension {
                        handler(item)  // Call the handler with the matching file
                        break
                    }
                }
            }
        }
    }

    /// The main execution method that runs the Waver tool.
    ///
    /// - Throws: A `GenerationError` if any issues occur during processing.
    func run() throws {
        // This is the main code - after all of the options have been
        // parsed and somewhat validated.

        // Note that since validation already checked these, the "try" will
        // always work here.  (The validation would have failed if they were
        // invalid and thus run() would never have been called.)
        let colors = [
            try cgColorFrom(rgbString: leftColor),  // User left channel color (and mono)
            try cgColorFrom(rgbString: rightColor),  // User right channel color
            try cgColorFrom(rgbString: backgroundColor),  // User background color (last element)
        ]

        // We need a queue group to wait for work to finish...
        let workGroup = DispatchGroup()
        // We need a dispatch group that lets work run concurrently...
        let workQueue = DispatchQueue(label: "waver.workQueue", attributes: .concurrent)

        // Safely accumulate errors from concurrent file processing
        // We need a safe way to accumulate errors from the generators
        // Unfortunately, Swift does not have such a safe thing built in
        // and building it safely still requires users to mark it as unsafe.
        class ErrorList {
            private var strings: [String] = ["while processing files:"]
            private let lock = NSLock()

            // Method to append a string in a thread-safe manner
            func safeAppend(_ string: String) {
                lock.lock()
                defer { lock.unlock() }
                strings.append(string)
            }

            func hasErrors() -> Bool {
                lock.lock()
                defer { lock.unlock() }
                return strings.count > 1
            }

            func getText() -> String {
                lock.lock()
                defer { lock.unlock() }
                return strings.joined(separator: "\n* ")
            }
        }

        // This is safe when used to append to the list
        nonisolated(unsafe) let errors = ErrorList()

        for path in audioFilenames {
            do {
                // We run handleFiles which recursively processes a path
                // if it is a directory.  In all cases, it then calls
                // our closure (lambda) for each of the files.
                try handleFiles(path: URL(fileURLWithPath: path)) { fileURL in
                    // At this point we have a single file name
                    // as passed to us from the handleFiles function.
                    // We use the dispatch queue to dispatch the processing
                    // of that file here.  This way we don't actually do the
                    // work directly but just queue the work and let each
                    // file be processed in parallel with other files.
                    let filename = fileURL.path
                    workQueue.async(group: workGroup) {
                        do {
                            try generateWavePng(
                                inputFile: filename,
                                // If output filename given, use it, otherwise generate one
                                outputFile: (outputFilename.count > 0) ? outputFilename : filename + ".png",
                                colors: colors)
                        } catch {
                            let errorText = "\(error.localizedDescription)"
                            printToStdErr(errorText)
                            errors.safeAppend(errorText)
                        }
                    }
                }
            } catch {
                let errorText = "Error handling '\(path)' : \(error.localizedDescription)"
                printToStdErr(errorText)
                errors.safeAppend(errorText)
            }
        }

        // Wait for all file processing to complete
        workGroup.wait()

        // Throw an error if any issues occurred during processing
        if errors.hasErrors() {
            // If there are errors, print a blank line to split the output
            // and then throw a detailed error text that will be rendered
            // by the argument parser entry point.
            printToStdErr("")
            throw GenerationError(errors.getText())
        }
    }
}

Rust

Built with: cargo build --release

//! Waver: Generate waveform visualizations from audio files.
//!
//! This tool creates PNG visualizations of audio waveforms from various audio
//! file formats.  It supports multiple audio file processing, customizable
//! colors, and various output options.
//!
//! # Architecture
//!
//! The program follows a data processing pipeline:
//! 1. Parse and validate command-line arguments
//! 2. Collect audio files to process
//! 3. Process each file in parallel, generating waveform images
//! 4. Report any errors that occurred during processing
//!
//! # Performance
//!
//! Key performance optimizations:
//! - Parallel processing of audio files using rayon
//! - Streaming audio decoding rather than buffering
//! - 2-bit pixel depth in PNG output for smaller files

use std::fs::File;
use std::io;
use std::io::BufWriter;
use std::path::{Path, PathBuf};
use std::str::FromStr;
use std::sync::Mutex;

use clap::Parser;
use png::{Encoder, FilterType};
use rayon::prelude::*;
use symphonia::core::audio::{AudioBuffer, Signal};
use symphonia::core::codecs::{DecoderOptions, CODEC_TYPE_NULL};
use symphonia::core::formats::FormatOptions;
use symphonia::core::io::MediaSourceStream;
use symphonia::core::meta::MetadataOptions;
use symphonia::core::probe::Hint;
use thiserror::Error;
use walkdir::WalkDir;

//------------------------------------------------------------------------------
// Audio Processing
//------------------------------------------------------------------------------

// Audio processing functionality for waveform generation.

/// Generates a waveform visualization from an audio file.
///
/// # Arguments
///
/// * `input_path` - Path to the input audio file
/// * `output_path` - Path where the output PNG will be saved
/// * `args` - Command-line arguments containing configuration
///
/// # Returns
///
/// `Ok(())` on success, or an error if processing fails
pub fn generate_waveform(
    input_path: &AudioPath,
    output_path: impl AsRef<Path>,
    args: &WaverArgs,
) -> Result<()> {
    let input_path = input_path.path();
    let output_path = output_path.as_ref();

    // Skip if output exists and overwrite isn't allowed; in verbose mode
    // the skip is surfaced as an error rather than passing silently
    if !args.overwrite && output_path.exists() {
        if args.verbose {
            return Err(WaverError::generation_error(format!(
                "Output file '{}' already exists - use --overwrite",
                output_path.display()
            )));
        }
        return Ok(());
    }

    // Generate the image buffer
    let mut image = WaveImage::new(args.width, args.height);

    // Process audio file and generate waveform
    process_audio_file(input_path, &mut image, args.width())?;

    // Save or log the result
    if !args.dry_run {
        image.save_png(
            &args.background_color,
            &args.left_color,
            &args.right_color,
            output_path,
        )?;
        args.print_to_stdout(&format!("Created {}", output_path.display()));
    } else {
        args.print_verbose(&format!("DryRun {}", output_path.display()));
    }

    Ok(())
}

/// Processes an audio file and generates a waveform visualization using a streaming approach.
///
/// This function opens an audio file, decodes it frame by frame, and immediately
/// processes each frame to generate the waveform image, without storing all audio data in memory.
///
/// # Performance
///
/// This is a performance-critical function.  It uses a streaming approach rather than buffering the
/// entire audio file, which results in:
/// - ~24x lower memory usage
/// - ~6.7x faster execution time
/// - Significantly fewer system calls
///
/// Do not change this to buffer all audio samples, as that would cause severe performance degradation.
///
/// # Arguments
///
/// * `input_path` - Path to the input audio file
/// * `image` - The waveform image to draw into
/// * `width` - Width of the output image in pixels
///
/// # Returns
///
/// `Ok(())` on success, or an error if processing fails
fn process_audio_file(input_path: &Path, image: &mut WaveImage, width: u32) -> Result<()> {
    // Open and probe the audio file
    let file = File::open(input_path)?;
    let mss = MediaSourceStream::new(Box::new(file), Default::default());

    let probed = symphonia::default::get_probe().format(
        &Hint::new(),
        mss,
        &FormatOptions::default(),
        &MetadataOptions::default(),
    )?;

    // Extract the first audio track
    let mut format = probed.format;
    let track = format
        .tracks()
        .iter()
        .find(|t| t.codec_params.codec != CODEC_TYPE_NULL)
        .ok_or_else(|| {
            WaverError::generation_error(format!(
                "No audio track found in '{}'",
                input_path.display()
            ))
        })?;

    // Initialize decoder
    let mut decoder = symphonia::default::get_codecs().make(
        &track.codec_params,
        &DecoderOptions::default(),
    )?;

    // Get channel information (clamped to stereo; any channels beyond
    // the first two are ignored)
    let channel_count = track
        .codec_params
        .channels
        .map(|c| c.count())
        .unwrap_or(1)
        .min(2) as usize;

    // Get total number of frames (samples per channel) for scaling;
    // an unknown frame count falls back to 1 so the arithmetic stays defined
    let total_samples = track.codec_params.n_frames.unwrap_or(0).max(1);

    // Calculate samples per pixel plus the fractional remainder in
    // 1/width units.  Since width is needed as a u64 several times, do
    // that conversion once.  samples_per_pixel is clamped to at least 1
    // so the countdown below can never underflow when a file has fewer
    // frames than the image has pixel columns.
    let width64 = width as u64;
    let samples_per_pixel = (total_samples / width64).max(1);
    let fractional_samples = total_samples % width64;
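
    // Worked example (numbers purely illustrative): 10_000_000 frames at
    // width 2048 gives samples_per_pixel = 4882 and fractional_samples =
    // 1664.  The loop below then stretches 1664 of the 2048 columns by one
    // extra sample, Bresenham-style, so 2048 * 4882 + 1664 = 10_000_000 and
    // no rounding error accumulates across the image.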

    // Initialize state variables for processing
    let mut left = 0.0f32;
    let mut right = 0.0f32;
    let mut sample_progress = samples_per_pixel;
    let mut partial_progress = 0u64;
    let mut pixel_pos = 0;

    // Process audio stream packet by packet
    while let Ok(packet) = format.next_packet() {
        let decoded = decoder.decode(&packet)?;
        let mut buffer = AudioBuffer::<f32>::new(decoded.capacity() as u64, *decoded.spec());
        decoded.convert(&mut buffer);

        // Process each frame in the current packet
        for frame in 0..buffer.frames() {
            // Update max amplitude values for each channel
            left = left.max(buffer.chan(0)[frame].abs().min(1.0));
            if channel_count > 1 {
                right = right.max(buffer.chan(1)[frame].abs().min(1.0));
            }

            // Map samples to pixels
            sample_progress -= 1;

            if sample_progress == 0 {
                // When we've accumulated enough samples for a pixel, draw it
                if channel_count > 1 {
                    image.draw_point(pixel_pos, left, right);
                } else {
                    image.draw_point_mono(pixel_pos, left);
                }
                left = 0.0;
                right = 0.0; // Reset max values for next pixel
                pixel_pos += 1;
                sample_progress = samples_per_pixel;
                partial_progress += fractional_samples;
                // If we got enough fractional samples to get another
                // sample in this next section, bump it by one and
                // subtract the width.
                if partial_progress >= width64 {
                    partial_progress -= width64;
                    sample_progress += 1;
                }
            }
        }
    }

    // Draw any remaining partial pixel
    if pixel_pos < width {
        if channel_count > 1 {
            image.draw_point(pixel_pos, left, right);
        } else {
            image.draw_point_mono(pixel_pos, left);
        }
    }

    Ok(())
}

//------------------------------------------------------------------------------
// Image Generation
//------------------------------------------------------------------------------

// Waveform image generation functionality with optimized 2-bit PNG output.

/// Represents the different channel types in a waveform image.
///
/// Using an enum instead of constants provides better type safety and
/// makes the code more self-documenting.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Channel {
    /// Background (transparent or base color)
    Background = 0,
    /// Left audio channel (typically drawn above center)
    Left = 1,
    /// Right audio channel (typically drawn below center)
    Right = 2,
}

impl From<Channel> for u8 {
    fn from(channel: Channel) -> Self {
        channel as u8
    }
}

impl From<u8> for Channel {
    fn from(value: u8) -> Self {
        match value {
            0 => Channel::Background,
            1 => Channel::Left,
            2 => Channel::Right,
            // 3 (a left/right collision) and any invalid value render as background
            _ => Channel::Background,
        }
    }
}

/// Represents a waveform visualization image with 2-bit pixel depth optimization.
///
/// Contains the image dimensions and pixel data where each pixel is represented
/// by an index value (0 for background, 1 for left channel, 2 for right channel).
/// Uses 2 bits per pixel for significant space savings in the output PNG.
pub struct WaveImage {
    /// Width of the image in pixels.
    width: u32,

    /// Height of the image in pixels.
    height: u32,

    /// Vertical center line position.
    center: u32,

    /// Line size in bytes (due to 2 bits per pixel)
    line_width: u32,

    /// Pixel data stored as channel indices.
    /// During image generation, we use 1 byte per pixel for simplicity.
    pixels: Vec<u8>,
}

/// Convert a color index to the bit location based on the x coordinate
///
/// # Arguments
///
/// * `color` - The 2-bit color
/// * `x` - The horizontal position of the pixel
///
/// # Returns
///
/// The u8 with the color bits shifted to the correct location for 2-bpp
fn draw_bits(color: u8, x: u32) -> u8 {
    // PNG packs sub-byte pixels with the leftmost pixel in the high-order
    // bits of each byte, so pixel x & 3 == 0 must land in bits 6-7.
    (color & 3) << (6 - 2 * (x & 3))
}
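
// Worked example: draw_bits(Channel::Right as u8, 5) targets the second
// pixel in its byte (5 & 3 == 1), so the value 2 is shifted into bits 4-5:
// (2 & 3) << (6 - 2 * 1) == 0b0010_0000.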

impl WaveImage {
    /// Creates a new waveform 2-bit per pixel image with the specified
    /// dimensions.  We render directly into the 2-bit per pixel form
    /// to reduce memory footprint and because we can do it efficiently.
    ///
    /// # Arguments
    ///
    /// * `width` - Width of the image in pixels
    /// * `height` - Height of the image in pixels (must be even)
    ///
    /// # Returns
    ///
    /// A new WaveImage instance initialized with background pixels
    ///
    /// # Notes
    ///
    /// This is a 2-bit per pixel image since we really only need
    /// at most 4 colors:
    ///   0:  Background color
    ///   1:  Left Channel  (or mono)
    ///   2:  Right Channel
    ///   3:  Background due to Left and Right collision
    pub fn new(width: Width, height: Height) -> Self {
        let width_val = width.value();
        let line_val = (width_val + 3) >> 2;
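        // e.g. a 2048-pixel-wide image needs (2048 + 3) >> 2 = 512 bytes
        // per scanline at 2 bits per pixel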
        let height_val = height.value();

        Self {
            width: width_val,
            height: height_val,
            line_width: line_val,
            center: height.center(),
            pixels: vec![0u8; (line_val * height_val) as usize],
        }
    }

    /// Draws a single point (left and right channels) of the waveform.
    ///
    /// # Arguments
    ///
    /// * `x` - The horizontal position to draw at
    /// * `left` - Left channel maximum amplitude
    /// * `right` - Right channel maximum amplitude
    pub fn draw_point(&mut self, x: u32, left: f32, right: f32) {
        if x >= self.width {
            return;
        }

        // The byte offset where the 2-bit pixel will be
        let offset = x >> 2;

        // Draw left channel (above center, going up)
        // The bits for the left channel at this pixel offset
        let draw_left = draw_bits(Channel::Left as u8, x);
        let left_height = (self.center as f32 * left + 0.5) as u32;
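        // e.g. with height 128 (center 64), an amplitude of 0.5 gives
        // left_height = 32, filling rows 32..64 above the center line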
        for y in self.center.saturating_sub(left_height)..self.center {
            let idx = (offset + y * self.line_width) as usize;
            self.pixels[idx] |= draw_left;
        }

        // Draw right channel (below center, going down)
        // The bits for the right channel at this pixel offset
        let draw_right = draw_bits(Channel::Right as u8, x);
        let right_height = (self.center as f32 * right + 0.5) as u32;
        let max_y = std::cmp::min(self.center + right_height, self.height);
        for y in self.center..max_y {
            let idx = (offset + y * self.line_width) as usize;
            self.pixels[idx] |= draw_right;
        }
    }

    /// Draws a single point for mono audio (symmetric around center).
    ///
    /// # Arguments
    ///
    /// * `x` - The horizontal position to draw at
    /// * `mono` - Mono channel maximum amplitude
    pub fn draw_point_mono(&mut self, x: u32, mono: f32) {
        if x >= self.width {
            return;
        }

        // The byte offset where the 2-bit pixel will be
        let offset = x >> 2;

        // Bit position for the pixel
        let draw = draw_bits(Channel::Left as u8, x);

        let wave_height = (self.center as f32 * mono + 0.5) as u32;
        let y_start = self.center.saturating_sub(wave_height);
        let y_end = std::cmp::min(self.center + wave_height, self.height);

        for y in y_start..y_end {
            let idx = (offset + y * self.line_width) as usize;
            self.pixels[idx] |= draw;
        }
    }

    /// Saves the waveform image as a PNG file with 2-bit pixel depth optimization.
    ///
    /// # Performance
    ///
    /// This function implements several critical optimizations:
    /// - Uses 2-bit color depth instead of 8-bit (75% size reduction)
    /// - Uses indexed color mode with a minimal 3-color palette
    /// - Applies the Up filter which is optimal for waveform imagery
    /// - Uses maximum PNG compression for smallest possible files
    ///
    /// Changing these settings, especially the bit depth or filter type,
    /// would significantly impact file size or performance.
    ///
    /// # Arguments
    ///
    /// * `background` - Background color
    /// * `left` - Left channel color
    /// * `right` - Right channel color
    /// * `output_path` - Path where the PNG file will be saved
    ///
    /// # Returns
    ///
    /// `Ok(())` on success, or a WaverError on failure
    pub fn save_png(
        &self,
        background: &Rgba,
        left: &Rgba,
        right: &Rgba,
        output_path: impl AsRef<Path>,
    ) -> Result<()> {
        // Create palette for indexed color PNG
        let palette = [
            background.red,
            background.green,
            background.blue,
            left.red,
            left.green,
            left.blue,
            right.red,
            right.green,
            right.blue,
            background.red,
            background.green,
            background.blue,
        ];

        // Create transparency array
        let transparent = [background.alpha, left.alpha, right.alpha, background.alpha];
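
        // Entry 3 in both tables repeats the background so that a pixel
        // where the left and right channel bits collided (index 0b11)
        // still renders as background, per the color table in WaveImage::new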

        // Create the output file and BufWriter
        let file = File::create(output_path)?;
        let mut encoder = Encoder::new(BufWriter::new(file), self.width, self.height);

        // Configure the PNG encoder - use 2-bit depth since we only need 3 colors
        encoder.set_color(png::ColorType::Indexed);
        encoder.set_depth(png::BitDepth::Two);
        encoder.set_palette(&palette);
        encoder.set_trns(&transparent);

        // Optimize for waveform imagery which typically has vertical runs
        encoder.set_filter(FilterType::Up);

        // Use maximum compression
        encoder.set_compression(png::Compression::Best);

        // Write the PNG data
        let mut writer = encoder.write_header()?;
        writer.write_image_data(&self.pixels)?;
        writer.finish()?;

        Ok(())
    }
}

//------------------------------------------------------------------------------
// CLI Types and Arguments
//------------------------------------------------------------------------------

// Command-line interface functionality.

// Custom types for command-line argument validation.
//
// These types provide strongly-typed representations of command-line arguments
// with built-in validation.  They make invalid states unrepresentable and
// push validation to the earliest possible point - during argument parsing.
//
// # Design Philosophy
//
// Rather than validate arguments after parsing, we use Rust's type system to:
// 1. Ensure values meet constraints (e.g., minimum width/height)
// 2. Provide clear, targeted error messages directly during parsing
// 3. Allow functions to assume arguments are already valid
// 4. Make the code more self-documenting
//
// # Usage
//
// These types implement FromStr and can be used with clap's value_parser:
// ```
// #[arg(value_parser = clap::value_parser!(Width))]
// pub width: Width,
// ```

/// A validated width value for the waveform image.
///
/// Ensures the width is at least 16 pixels.
#[derive(Debug, Clone, Copy)]
pub struct Width(u32);

impl Width {
    /// The minimum allowed width in pixels.
    pub const MIN_WIDTH: u32 = 16;

    /// Creates a new validated width.
    pub fn new(width: u32) -> Result<Self> {
        if width < Self::MIN_WIDTH {
            return Err(WaverError::argument_error(format!(
                "Width must be at least {} pixels",
                Self::MIN_WIDTH
            )));
        }
        Ok(Self(width))
    }

    /// Returns the width value.
    pub fn value(&self) -> u32 {
        self.0
    }
}

impl FromStr for Width {
    type Err = WaverError;

    fn from_str(s: &str) -> Result<Self> {
        let width = s
            .parse::<u32>()
            .map_err(|_| WaverError::argument_error("Width must be a positive integer"))?;
        Self::new(width)
    }
}

/// A validated height value for the waveform image.
///
/// Ensures the height is at least 6 pixels and even.
#[derive(Debug, Clone, Copy)]
pub struct Height(u32);

impl Height {
    /// The minimum allowed height in pixels.
    pub const MIN_HEIGHT: u32 = 6;

    /// Creates a new validated height.
    pub fn new(height: u32) -> Result<Self> {
        if height < Self::MIN_HEIGHT {
            return Err(WaverError::argument_error(format!(
                "Height must be at least {} pixels",
                Self::MIN_HEIGHT
            )));
        }
        if height % 2 != 0 {
            return Err(WaverError::argument_error("Height must be an even number"));
        }
        Ok(Self(height))
    }

    /// Returns the height value.
    pub fn value(&self) -> u32 {
        self.0
    }

    /// Returns the vertical center line position.
    pub fn center(&self) -> u32 {
        self.0 / 2
    }
}

impl FromStr for Height {
    type Err = WaverError;

    fn from_str(s: &str) -> Result<Self> {
        let height = s
            .parse::<u32>()
            .map_err(|_| WaverError::argument_error("Height must be a positive integer"))?;
        Self::new(height)
    }
}

/// A validated audio file path.
///
/// Ensures the path exists and is a file.
#[derive(Debug, Clone)]
pub struct AudioPath(PathBuf);

impl AudioPath {
    /// Creates a new validated audio path.
    pub fn new(path: impl AsRef<Path>) -> Result<Self> {
        let path = path.as_ref();
        if !path.exists() {
            return Err(WaverError::argument_error(format!(
                "Path does not exist: {}",
                path.display()
            )));
        }
        Ok(Self(path.to_path_buf()))
    }

    /// Returns whether this path points to a directory.
    pub fn is_dir(&self) -> bool {
        self.0.is_dir()
    }

    /// Returns the path.
    pub fn path(&self) -> &Path {
        &self.0
    }
}

impl FromStr for AudioPath {
    type Err = WaverError;

    fn from_str(s: &str) -> Result<Self> {
        Self::new(s)
    }
}

/// A validated audio file extension.
///
/// Ensures the extension is valid.
#[derive(Debug, Clone)]
pub struct FileExtension(String);

impl FileExtension {
    /// Creates a new validated file extension.
    pub fn new(extension: impl AsRef<str>) -> Result<Self> {
        let extension = extension.as_ref().trim().to_lowercase();
        if extension.is_empty() {
            return Err(WaverError::argument_error("File extension cannot be empty"));
        }
        Ok(Self(extension))
    }

    /// Returns the extension string.
    pub fn as_str(&self) -> &str {
        &self.0
    }
}

impl FromStr for FileExtension {
    type Err = WaverError;

    fn from_str(s: &str) -> Result<Self> {
        Self::new(s)
    }
}

/// A collection of validated file extensions.
#[derive(Debug, Clone)]
pub struct FileExtensions(Vec<FileExtension>);

impl FileExtensions {
    /// Creates a new collection of validated file extensions.
    pub fn new(extensions: Vec<impl AsRef<str>>) -> Result<Self> {
        let mut validated_extensions = Vec::with_capacity(extensions.len());

        for ext in extensions {
            validated_extensions.push(FileExtension::new(ext)?);
        }

        Ok(Self(validated_extensions))
    }

    /// Returns the file extensions as a vector of strings.
    pub fn as_strings(&self) -> Vec<String> {
        self.0.iter().map(|e| e.as_str().to_string()).collect()
    }
}

impl FromStr for FileExtensions {
    type Err = WaverError;

    fn from_str(s: &str) -> Result<Self> {
        let extensions = s
            .split(',')
            .map(|part| part.trim())
            .filter(|part| !part.is_empty())
            .collect::<Vec<_>>();

        if extensions.is_empty() {
            return Err(WaverError::argument_error("No file extensions specified"));
        }

        Self::new(extensions)
    }
}
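
// e.g. "--file-extensions mp3, FLAC" parses to ["mp3", "flac"]: each
// comma-separated part is trimmed, lower-cased by FileExtension::new,
// and empty segments are dropped.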

//------------------------------------------------------------------------------
// Color Handling
//------------------------------------------------------------------------------

// Color handling functionality for waveform visualization.

/// Represents an RGBA color.
#[derive(Clone, Debug)]
pub struct Rgba {
    /// Red component (0-255)
    pub red: u8,
    /// Green component (0-255)
    pub green: u8,
    /// Blue component (0-255)
    pub blue: u8,
    /// Alpha component (0-255)
    pub alpha: u8,
}

impl FromStr for Rgba {
    type Err = WaverError;

    /// Parses a color from a string in the following formats:
    /// - RGB (3-digit hex): e.g. "F00" for bright red
    /// - RRGGBB (6-digit hex): e.g. "FF0000" for bright red
    /// - RRGGBBAA (8-digit hex): e.g. "FF0000FF" for opaque bright red
    fn from_str(color: &str) -> Result<Self> {
        let hex = color.trim();
        let value = u32::from_str_radix(hex, 16)
            .map_err(|e| WaverError::argument_error(format!("Invalid color format: {}", e)))?;

        match hex.len() {
            3 => Ok(Rgba {
                red: ((value & 0xF00) >> 8) as u8 * 17,
                green: ((value & 0x0F0) >> 4) as u8 * 17,
                blue: (value & 0x00F) as u8 * 17,
                alpha: 255,
            }),
            6 => Ok(Rgba {
                red: ((value & 0xFF0000) >> 16) as u8,
                green: ((value & 0x00FF00) >> 8) as u8,
                blue: (value & 0x0000FF) as u8,
                alpha: 255,
            }),
            8 => Ok(Rgba {
                red: ((value & 0xFF000000) >> 24) as u8,
                green: ((value & 0x00FF0000) >> 16) as u8,
                blue: ((value & 0x0000FF00) >> 8) as u8,
                alpha: (value & 0x000000FF) as u8,
            }),
            _ => Err(WaverError::argument_error(
                "Color must be in RGB, RRGGBB, or RRGGBBAA format",
            )),
        }
    }
}
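
// Worked example: "f80" parses as red 0xf * 17 = 255, green 0x8 * 17 = 136,
// blue 0, alpha 255.  Multiplying a nibble by 17 (0x11) replicates it into
// a full byte, so "f80" is exactly equivalent to "ff8800".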

//------------------------------------------------------------------------------
// Error Handling
//------------------------------------------------------------------------------

// Error types for the waver application.
//
// This section defines the error types used throughout the application and
// provides a consistent approach to error handling and propagation.
//
// # Error Handling Strategy
//
// The waver application uses a structured error handling approach:
//
// 1. **Custom Error Types**: All errors are consolidated into the WaverError enum
// 2. **Context Preservation**: External errors (IO, etc.) are wrapped with context
// 3. **Early Validation**: Most errors are caught at argument parsing time
// 4. **Result Propagation**: Errors bubble up with the `?` operator
// 5. **User-Friendly Messages**: Errors are formatted to be helpful to the user
//
// This approach makes errors easier to handle, debug, and report to users.

/// Represents all possible errors that can occur in the waver application.
#[derive(Error, Debug)]
pub enum WaverError {
    /// Error when parsing or validating command line arguments.
    #[error("Invalid argument: {0}")]
    ArgumentError(String),

    /// Error during waveform generation process.
    #[error("Waveform generation error: {0}")]
    GenerationError(String),

    /// Error from the underlying IO operations.
    #[error("IO error: {0}")]
    IoError(#[from] io::Error),

    /// Error from the Symphonia audio decoding library.
    #[error("Audio decoding error: {0}")]
    SymphoniaError(#[from] symphonia::core::errors::Error),

    /// Error from the PNG encoding library.
    #[error("PNG encoding error: {0}")]
    PngError(#[from] png::EncodingError),
}
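
// Thanks to thiserror's #[error(...)] attributes, these variants format
// user-facing messages directly; e.g. an ArgumentError from Width::new
// displays as "Invalid argument: Width must be at least 16 pixels".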

/// Type alias for Result with WaverError.
pub type Result<T> = std::result::Result<T, WaverError>;

impl WaverError {
    /// Create a new ArgumentError with the given message.
    ///
    /// Use this for errors related to command-line arguments or configuration.
    pub fn argument_error(msg: impl Into<String>) -> Self {
        WaverError::ArgumentError(msg.into())
    }

    /// Create a new GenerationError with the given message.
    ///
    /// Use this for errors during the waveform generation process.
    pub fn generation_error(msg: impl Into<String>) -> Self {
        WaverError::GenerationError(msg.into())
    }
}

/// Command line arguments for waveform generation.
#[derive(Parser, Debug)]
#[command(
    name = "waver",
    about = "Generate waveform visualizations from audio files",
    version,
    author
)]
pub struct WaverArgs {
    /// Width of the output image in pixels
    #[arg(long = "width", default_value = "2048", value_parser = clap::value_parser!(Width))]
    pub width: Width,

    /// Height of the output image in pixels (must be even)
    #[arg(long = "height", default_value = "128", value_parser = clap::value_parser!(Height))]
    pub height: Height,

    /// Color for left channel (RGB, RRGGBB, or RRGGBBAA)
    #[arg(long = "left-color", default_value = "00ff99", value_parser = clap::value_parser!(Rgba))]
    pub left_color: Rgba,

    /// Color for right channel (RGB, RRGGBB, or RRGGBBAA)
    #[arg(long = "right-color", default_value = "99ff00", value_parser = clap::value_parser!(Rgba))]
    pub right_color: Rgba,

    /// Background color (RGB, RRGGBB, or RRGGBBAA)
    #[arg(long = "background-color", default_value = "ffffff00", value_parser = clap::value_parser!(Rgba))]
    pub background_color: Rgba,

    /// Output PNG file name (only in single-file mode)
    #[arg(short = 'o', long = "output-filename")]
    pub output_filename: Option<String>,

    /// Comma-separated list of audio file extensions
    #[arg(long = "file-extensions", default_value = "mp3", value_parser = clap::value_parser!(FileExtensions))]
    pub file_extensions: FileExtensions,

    /// Perform actions without generating files
    #[arg(long = "dry-run")]
    pub dry_run: bool,

    /// Overwrite existing output files
    #[arg(long = "overwrite")]
    pub overwrite: bool,

    /// Suppress most output
    #[arg(long = "quiet")]
    pub quiet: bool,

    /// Print additional information
    #[arg(long = "verbose")]
    pub verbose: bool,

    /// Audio files or directories to process
    #[arg(required = true, num_args = 1.., value_parser = clap::value_parser!(AudioPath))]
    pub audio_paths: Vec<AudioPath>,
}

impl WaverArgs {
    /// Parse command-line arguments and validate them.
    pub fn parse_and_validate() -> Result<Self> {
        let args = Self::parse();
        args.validate()?;
        Ok(args)
    }

    /// Validates inter-argument constraints that can't be handled by individual type validations.
    pub fn validate(&self) -> Result<()> {
        // Validate output filename constraints
        if self.audio_paths.len() > 1 && self.output_filename.is_some() {
            return Err(WaverError::argument_error(
                "Cannot specify --output-filename with multiple audio files",
            ));
        }

        // Check directory constraints
        if self.output_filename.is_some() {
            for path in &self.audio_paths {
                if path.is_dir() {
                    return Err(WaverError::argument_error(
                        "Cannot specify --output-filename with a directory",
                    ));
                }
            }
        }

        Ok(())
    }

    /// Prints messages to stderr unless quiet mode is enabled.
    pub fn print_to_stderr(&self, message: &str) {
        if !self.quiet {
            eprintln!("{message}");
        }
    }

    /// Prints messages to stdout (usually for successful operations).
    pub fn print_to_stdout(&self, message: &str) {
        if !self.quiet {
            println!("{message}");
        }
    }

    /// Prints verbose messages if verbose mode is enabled.
    pub fn print_verbose(&self, message: &str) {
        if self.verbose {
            println!("{message}");
        }
    }

    /// Returns the width value.
    pub fn width(&self) -> u32 {
        self.width.value()
    }

    /// Returns the file extensions as strings.
    pub fn file_extensions(&self) -> Vec<String> {
        self.file_extensions.as_strings()
    }
}

/// Main entry point for the waver application.
fn main() -> std::result::Result<(), Box<dyn std::error::Error>> {
    // Parse and validate command-line arguments
    let args = WaverArgs::parse_and_validate()?;

    // Collect all audio files to process
    let mut audio_files = Vec::new();
    for audio_path in &args.audio_paths {
        let path = audio_path.path();
        if path.is_file() {
            // Explicitly listed files are used as-is; we don't filter
            // them by extension
            audio_files.push(path.to_path_buf());
        } else if path.is_dir() {
            // Hoist the extension list out of the filter closure so it
            // isn't rebuilt for every directory entry
            let extensions = args.file_extensions();
            // WalkDir handles the traversal details (symlink loops and
            // the like) for us
            for entry in WalkDir::new(path)
                .follow_links(true)
                .into_iter()
                .filter_map(|e| e.ok())
                .filter(|entry| entry.file_type().is_file())
                .filter(|entry| {
                    entry
                        .path()
                        .extension()
                        .and_then(|ext| ext.to_str())
                        .map(|ext_str| extensions.iter().any(|e| e.eq(ext_str)))
                        .unwrap_or(false)
                })
                .map(|entry| entry.into_path())
            {
                audio_files.push(entry);
            }
        }
    }

    if audio_files.is_empty() {
        return Err(Box::new(WaverError::argument_error(
            "No matching audio files found",
        )));
    }

    // print_verbose already checks the verbose flag
    args.print_verbose(&format!(
        "Found {} audio files to process",
        audio_files.len()
    ));

    // Process files in parallel, collecting errors
    // PERFORMANCE: Parallel processing is critical for handling multiple files efficiently
    // This section uses Rayon's parallel iterator to process files concurrently
    // while safely collecting errors using a synchronized Mutex
    let errors = Mutex::new(Vec::<String>::new());
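    // The Mutex is locked only on the error path, so the common case adds
    // no synchronization overhead across Rayon's worker threads.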

    // Convert PathBuf to AudioPath for processing
    audio_files.into_par_iter().for_each(|file_path| {
        // For each file, create a validated AudioPath
        match AudioPath::new(&file_path) {
            Ok(audio_path) => {
                let output_file = args
                    .output_filename
                    .clone()
                    .unwrap_or_else(|| format!("{}.png", file_path.display()));

                if let Err(e) = generate_waveform(&audio_path, &output_file, &args) {
                    let error_msg = format!("{}: {}", file_path.display(), e);
                    args.print_to_stderr(&error_msg);
                    errors.lock().unwrap().push(error_msg);
                }
            }
            Err(e) => {
                let error_msg = format!("Invalid audio path {}: {}", file_path.display(), e);
                args.print_to_stderr(&error_msg);
                errors.lock().unwrap().push(error_msg);
            }
        }
    });

    // Report any errors
    let errors = errors.lock().unwrap();
    if !errors.is_empty() {
        return Err(Box::new(WaverError::generation_error(format!(
            "{} errors occurred while processing files",
            errors.len()
        ))));
    }

    Ok(())
}
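
For reference, a typical invocation looks like this (the path is illustrative; every flag maps directly onto the WaverArgs definitions above):

waver --width 2048 --height 128 --file-extensions mp3,flac --overwrite ~/Music

This walks ~/Music recursively, renders a 2048x128 stereo waveform PNG alongside each matching audio file (song.mp3 becomes song.mp3.png), and replaces any previous output.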