Developer enables zero-copy GPU inference from WebAssembly on Apple Silicon by sharing linear memory directly between CPU and GPU via mmap, Metal, and Wasmtime—eliminating serialization overhead for stateful AI workloads.

Zero-Copy GPU Inference from WebAssembly on Apple Silicon

tl;dr: on Apple Silicon, a WebAssembly module's linear memory can be shared directly with the GPU: no copies, no serialization, no intermediate buffers. The CPU and GPU read and write the same physical bytes. End-to-end, it works: a Wasm guest fills a matrix in its linear memory, the GPU reads it, computes, writes back, and the guest sees the result through the same pointer, same memory, zero copies.

Normally Wasm and GPUs are separated by an expensive serialization boundary: on most hardware, ge...
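
The round trip described in the tl;dr can be modeled on the CPU alone. The sketch below is not the author's code. It uses Python's mmap module, with two memoryview objects standing in for the Wasm guest and the GPU. The names (backing, guest_view, device_view, fake_kernel) and the 4x4 float32 matrix are assumptions made for illustration. It shows only the aliasing property the post relies on: two views over one page-aligned mapping see each other's writes without any copy.

```python
import mmap

N = 4  # hypothetical 4x4 float32 matrix


def round_up(n, multiple):
    """Round n up to the next multiple (here: whole pages)."""
    return (n + multiple - 1) // multiple * multiple


# One page-aligned anonymous mapping stands in for the Wasm linear memory.
size = round_up(N * N * 4, mmap.PAGESIZE)
backing = mmap.mmap(-1, size)

# Two views over the same bytes; creating a view copies no data.
guest_view = memoryview(backing).cast("f")   # stand-in for the Wasm guest
device_view = memoryview(backing).cast("f")  # stand-in for the GPU


def fake_kernel(buf, count):
    """Stand-in for a GPU kernel: doubles each element in place."""
    for i in range(count):
        buf[i] = buf[i] * 2.0


# The "guest" fills the matrix in place.
for i in range(N * N):
    guest_view[i] = float(i)

# The "GPU" computes through its own view of the same memory.
fake_kernel(device_view, N * N)

# The guest reads the result through its original view.
result = list(guest_view[: N * N])
print(result)
```

In the real setup the second view would presumably be a Metal buffer wrapping the same pages rather than a memoryview. Metal's no-copy buffer creation requires a page-aligned pointer and a length that is a multiple of the page size, which is why the mapping here is rounded up to whole pages.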