Abstract
A robotic phonograph which can remix vinyl records in real-time using analogue techniques.
A new automated-sound-system (ASS) which performs Hermeneutic Machine Variations of any existing vinyl record. The first HMV was Bright Eyes (HMV–I), occurring during the site-specific installation and performance by Donnachie and Simionato at the notFair exhibition, Melbourne, Australia (see exhibitions section for details).
At the centre of the system is a custom-designed robotic phonograph which can remix vinyl records in real-time using analogue techniques originating in DJ culture of the 1970s, such as breakbeating, backspinning, sampling, and scratching. In parallel to producing a continuously evolving soundscape, the automated-art-system performs simultaneous computational processes (which we call ‘nonhuman listening’) by monitoring the audio output in order to adjust and even modify its own behaviours.
The HMV system incorporates computational sound analysis (sometimes simply referred to as ‘nonhuman listening’) which is capable of identifying potential words and phrases produced when the vinyl record is played, constituting an automated search for ‘back-masked’ lyrics sometimes associated with ‘hidden meanings’ (whether intentional or otherwise). The emergent language, composed into statistically probable phrases, is displayed in real time on a screen situated alongside the robotic phonograph. Furthermore, the system ‘rewards’ the listener by extending any loops which contain a rich source of phonemes. In this paper we briefly outline the project and attempt to situate it within creative practices of glitch-turntablism and broader notions of ‘computational unknowing’.
Keywords: electroacoustic, nonhuman performance, remixologies, glitch turntablism, cybernetics, infinite knowable, conspiracy
Glitch Turntablism
This automated-sound-system explores the combinatorial potential of language within existing literary sources, which in turn forms part of a broader enquiry at the CCU into 'computational unknowing'.
The conceptual framing of the HMV can be traced to our earlier automated-art-systems which used computational processes to alter existing publications. For example, concepts underpinning HMV-V (exhibited at Palazzo Madama, Turin, 2026) link to the Library of Nonhuman Books, where a mix of Computer Vision and Natural Language Processing algorithms was used to detect and isolate words within the pages of existing (paper) books, a process we describe as ‘nonhuman reading’. Working within the medium of audio recordings rather than print media, HMV-V also uses algorithmic processes to detect latent language by analysing the nonlinear and reversed playback of existing vinyl records and presenting any newly ‘discovered’ words as lyrics on a screen. Where the nonhuman books resulted in isolated words on the page, the HMV-V system repeats the ‘uncovered’ back-masked audio sample as a looping sequence, allowing the emergent ‘phantom’ lyrics to become a powerful suggestion to the listener, who may perceive them despite their initial implausibility (an effect called auditory pareidolia).
The design of the HMV-V machine, with its distinctive counter-balanced dual-arm stylus, emerged while experimenting with another automated-art-system called I Don’t Know What I Think Until I Read What I Write (2025), a robotic scribe which attempts to search for glyphs within non-Euclidean space using the rotary movement of stepper motors to move a calligraphic brush. By adapting this earlier work to move a custom-made electromagnetic stylus across the surface of a vinyl record instead of a brush across paper, we stumbled upon a highly efficient and unique range of motions for ‘playing’ vinyl records in non-indexical ways, which we likened to gestures used by DJs of the 1970s, such as backspinning, sampling, and scratching.
Nonhuman Listening
The ‘nonhuman listening’ system controlling the robotic phonograph is designed to detect audio generated by its styli moving at different speeds, angles, pressures and directions across the surface of a vinyl record. This (analogue) audio signal is initially passed through standard circuits for pre-amplification [RIAA] and transformation into digital data [ADC] before processing by our custom-coded software in real time.
Once digitised, the audio data is decoded and prepared for processing via the Librosa and PyAudio libraries, while NumPy-based Fast Fourier Transform [FFT] spectral noise filtering and signal refinements enhance the real-time audio stream for subsequent analysis.
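As an illustration of the kind of FFT-based noise filtering mentioned above, the following is a minimal sketch of a spectral gate in NumPy. It is not the installation's actual code: the function names, the frame length, the threshold value and the synthetic test signal are all assumptions made for the example.

```python
import numpy as np

def estimate_noise_profile(noise_frames):
    """Average FFT magnitude per bin over frames assumed to contain only noise."""
    return np.mean([np.abs(np.fft.rfft(f)) for f in noise_frames], axis=0)

def spectral_gate(frame, noise_profile, threshold=4.0):
    """Zero every FFT bin whose magnitude does not exceed threshold * noise level."""
    spectrum = np.fft.rfft(frame)
    mask = np.abs(spectrum) > threshold * noise_profile
    return np.fft.irfft(spectrum * mask, n=len(frame))

# Synthetic stand-in for one frame of digitised stylus audio (hypothetical values).
rate, n = 16000, 1024
t = np.arange(n) / rate
rng = np.random.default_rng(0)
tone = np.sin(2 * np.pi * 437.5 * t)              # the 'signal' we want to keep
noisy = tone + 0.05 * rng.standard_normal(n)      # signal plus surface noise
noise_frames = [0.05 * rng.standard_normal(n) for _ in range(8)]

profile = estimate_noise_profile(noise_frames)
cleaned = spectral_gate(noisy, profile)
```

In a real-time setting the same gate would be applied frame by frame to the incoming stream, with the noise profile measured, for instance, from the run-in groove of the record.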
Acoustic events detected by this initial analysis are processed for phoneme recognition and language construction with both Allosaurus and Kaldi (Vosk), with the outcomes of the system’s iterative, context-aware meaning generation then output to a screen in sync with the audio (or at least with negligible latency).
The resulting phonemes, words and potential phrases are finally logged, along with the phonograph’s relative motor positions and speeds.
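One plausible shape for such a log is a stream of JSON lines, one per detection. The sketch below uses only the Python standard library; the record fields (step counts, RPM, a negative speed for reversed playback) are hypothetical names chosen for illustration, not the system's actual schema.

```python
import io
import json
from dataclasses import dataclass, asdict

@dataclass
class DetectionRecord:
    """One logged detection; all field names are illustrative assumptions."""
    timestamp: float              # seconds since the start of the performance
    phonemes: list                # phoneme symbols in order of detection
    words: list                   # words mapped from those phonemes
    platter_position_steps: int   # relative stepper position of the platter
    arm_position_steps: int       # relative stepper position of the stylus arm
    platter_speed_rpm: float      # negative values stand for reversed playback

def log_record(stream, record):
    """Append one record to the stream as a single JSON line."""
    stream.write(json.dumps(asdict(record), ensure_ascii=False) + "\n")

# Usage: an in-memory buffer stands in for the log file.
log = io.StringIO()
log_record(log, DetectionRecord(12.5, ["M", "IY"], ["me"], 480, -35, -33.3))
```

Keeping the motor state alongside each detection would allow a given loop to be located again on the record surface.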
The often slowed, distorted or slurred nature of the speech emerging from the backmasked audio (similar to speech produced by individuals suffering from ‘dysarthria’) is not easily recognised by standard speech recognition systems. In order to accommodate and improve voice detection within these highly distorted audio inputs, we extended the phonetic detection of the model by using TensorFlow to re-train for improved translation (mapping) of the acoustic vectors of the captured audio. This bespoke and expanded mapping of phonemes was then incorporated into our local Vosk library.
It is worth noting that other than this early training of the phoneme recognition, the system does not use any auto-regressive or ‘generative’ AI for its language detection at run-time. The words displayed are constructed almost entirely from audio analysis and phoneme mapping, rather than statistical probability and context. An auto-regressive model in this specific use case tends to create entirely new sequences of words, prioritising the generation of a fluid (‘natural’) text over the direct mapping of words to phonemes, often hallucinating new words to ‘resolve’ the sentence, particularly in the absence of any unambiguous or predictable meaning in the (backmasked) audio. Combined with the hypnotic effect of phoneme-rich audio-looping samples, this ‘old-fashioned AI’ approach may contribute to the pareidolic perception of emergent language in the recording.
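The contrast with an auto-regressive model can be made concrete with a small sketch of direct phoneme-to-word mapping. This is an assumed, simplified technique (greedy longest-match lookup against a pronunciation lexicon), not the system's actual Vosk-based pipeline; the tiny lexicon and its ARPAbet-style symbols are hypothetical.

```python
# Hypothetical pronunciation lexicon: phoneme sequences mapped to words.
LEXICON = {
    ("T", "ER", "N"): "turn",
    ("M", "IY"): "me",
    ("AA", "N"): "on",
    ("D", "EH", "D"): "dead",
    ("M", "AE", "N"): "man",
}

def phonemes_to_words(phonemes, lexicon=LEXICON):
    """Map a phoneme stream to words by greedy longest-match lookup.

    Phonemes that begin no known word are skipped; no word is ever
    invented to 'resolve' the phrase.
    """
    max_len = max(len(key) for key in lexicon)
    words = []
    i = 0
    while i < len(phonemes):
        # Try the longest candidate first, then progressively shorter ones.
        for length in range(min(max_len, len(phonemes) - i), 0, -1):
            key = tuple(phonemes[i:i + length])
            if key in lexicon:
                words.append(lexicon[key])
                i += length
                break
        else:
            i += 1  # nothing matches here: drop this phoneme and move on
    return words

detected = ["T", "ER", "N", "ZH", "M", "IY", "AA", "N"]
print(phonemes_to_words(detected))
```

Every word in the output is anchored to phonemes that were actually detected, and the unmatched "ZH" simply disappears rather than prompting a plausible-sounding filler word.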
Janky-Hermeneutics
The historical invention of the phonograph simultaneously evokes a sense of "immediacy" (it is one of the earliest examples of a seemingly direct, unmediated connection to a past performance) and "hypermediacy" (the invention draws attention to the technology that shapes the sound). The assumption that the raw, acoustic data provided by a recording is sufficient for a simple, self-evident interpretation of musical or spoken content has been widely internalised over the 20th century. Indeed, the famous logo for the original HMV (the His Master's Voice company) was a small dog listening to a phonograph playing a record, its head inquisitively cocked to one side as it seems to confound the recording of the voice with the presence of its master. Our own HMV (Hermeneutic Machine Variation) re-instates the tension between these two perceptions, requiring a type of interpretive (hermeneutic) analysis to understand how the sound is being produced and experienced, and how this contributes to the making of meaning.
Arguably any technological reproduction introduces a new context, material limitations, and sonic characteristics that inherently require interpretation. This work problematises such attempts by the listener to interpret the vinyl recording by reconfiguring the phonograph and medium as decoding tools.
Our (robotic) phonograph is deliberately designed not to work as a neutral conveyor of sound. We therefore introduce hermeneutically ‘janky’ concepts, not only by adopting and introducing technology that actively disrupts, disturbs, even destroys the sound it reproduces, thus demanding that the listener engage in active interpretation, but also by including processes of auto-suggestion which influence and problematise our perception of the audio recording and the technology itself.
Like other technologies which have rapidly become ubiquitous, current (so-called) intelligent systems such as AI are often granted the privilege of neutrality, insinuating their presence into the everyday tasks we perform across digital and physical worlds, especially when those tasks happen through voice and conversational AI agents, sometimes called chatbots. Yet these systems carry the potential to suggest how we interpret these worlds. Through entangled ‘nonhuman listening’, words and phrases may be perceived, meaning made (and unmade), and new realities emerge. Will we be able to discern our master’s voice?
Acknowledgments. This artwork uses a number of open-source libraries, software and tools: Debian OS; Python; multiple open-source Python libraries such as Librosa and Fast Fourier Transform tools for acoustic analysis, refinement, and analogue-to-digital conversion; the Allosaurus phoneme recogniser; Kaldi speech recognition and Vosk speech-to-text; TensorFlow for phoneme recognition dictionary refinement; the Natural Language Toolkit (NLTK); and VADER sentiment analysis.
Images in this document are of a prototype of the robotic-phonograph which was part of a larger site-specific installation & performance by Donnachie and Simionato in notFair2026, Melbourne, Australia, curated by Brie Trenerry.