Welcome to DAFx 2018, the international conference on Digital Audio Effects, to be held in Aveiro, Portugal, on September 4–8, 2018.

DAFx18 is organised by the University of Aveiro, through its Institute of Electronics and Informatics Engineering (IEETA), in collaboration with the Portuguese Audio Engineering Association (APEA).

The conference will be hosted at the university campus and will feature oral and poster presentations of accepted papers, keynote addresses, tutorials and demonstrations. The social program – including welcome reception, concert and banquet – will offer opportunities for more informal interaction while enjoying the city and the region.

This annual conference is a coming together of those working across the globe in research on digital audio processing for music and speech, sound art, acoustics and related applications.


Meet people

People who share the same passion for sound

Free Lunch & Coffee Breaks

Don't miss it


“Perceptual and cognitive factors for VR audio”, by Catarina Mendonça

Catarina Mendonça

Aalto University, Dept. Signal Processing and Acoustics

There are many challenges faced by those aiming to render and reproduce convincing virtual audio. This tutorial defines key concepts and goals to allow for the feeling of presence in a simulated audio world. The specific role of factors such as individualization of HRTFs and headphones, sensory adaptation, room cues, motion cues, real-time rendering, and multimodal interfaces is addressed. There is a complex interplay between the ideal sound accuracy and several of these factors. When is accuracy perceptually relevant? When can we fool the listener? These questions are answered with indicators such as localization accuracy, externalization, multimodal interactions and attentional effects in mind. There are three main conclusions: 1) what the listener perceives depends on what we ask, 2) sensory adaptation ultimately allows most technical limitations to be overcome, and 3) more accurate rendering will always have benefits.

Dr. Catarina Mendonça is an Adjunct Professor in Psychoacoustics at Aalto University’s Acoustics Lab, in Finland. She has a background in Psychology and Cognitive Sciences, having specialised in Psychoacoustics. Throughout her research career she has worked on perceptual studies in virtual reality. Her main areas of work have been spatial hearing, auditory adaptation, and multisensory processes. Before obtaining her current title, Dr. Catarina Mendonça held three post-doctoral fellowships. First, she was a post-doctoral researcher at Carl von Ossietzky University (Germany) for the German Cluster of Excellence Hearing4all, working on auditory cognition. She then became a post-doctoral fellow for the Academy of Finland (Finland) on the topic of multimodal interactions in spatial hearing and spatial audio. She later became a Marie Skłodowska-Curie Fellow under the EU’s H2020 programme, working on attention mechanisms and perception in different spatial audio setups.

“Digital Audio Filters”, by Vesa Välimäki

Vesa Välimäki

Aalto University

This tutorial will review the basic digital filters used in audio and music processing, such as FIR, allpass, and equalizing filters. FIR filtering is carried out by convolving the samples of the input signal with the filter coefficients. An allpass filter has a flat magnitude response and a nonlinear phase response. It is useful in numerous audio applications, such as artificial reverberation and delay equalization. Equalizing filters enable enhancement of sound reproduction systems. The tutorial will include sound examples and interactive demonstrations to explain how the digital filters work and what they can achieve.
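As a rough illustration of two statements in the tutorial description, the sketch below filters a short signal by direct convolution with FIR coefficients and checks numerically that a first-order allpass filter has unit magnitude at several frequencies. The function names, the two-tap averaging filter, and the allpass coefficient 0.5 are illustrative assumptions and are not taken from the tutorial material.

```python
import cmath

def fir_filter(x, h):
    """Direct-form FIR filtering: y[n] = sum_k h[k] * x[n - k]."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k, hk in enumerate(h):
            if n - k >= 0:  # samples before the start of x are taken as zero
                acc += hk * x[n - k]
        y.append(acc)
    return y

def allpass_response(a, omega):
    """Frequency response of the first-order allpass
    H(z) = (a + z^-1) / (1 + a z^-1) at normalized frequency omega (rad/sample)."""
    z_inv = cmath.exp(-1j * omega)
    return (a + z_inv) / (1 + a * z_inv)

# Hypothetical example values: a two-tap moving average and coefficient a = 0.5.
moving_average = fir_filter([1.0, 2.0, 3.0, 4.0], [0.5, 0.5])
magnitudes = [abs(allpass_response(0.5, w)) for w in (0.1, 1.0, 2.0, 3.0)]
phases = [cmath.phase(allpass_response(0.5, w)) for w in (0.1, 1.0, 2.0, 3.0)]

print(moving_average)  # smoothed version of the input
print(magnitudes)      # all equal to 1 within rounding error
print(phases)          # varies with frequency, and not linearly
```

The magnitudes stay at 1 for any real coefficient with absolute value below 1, while the phase changes with frequency, which is what makes the allpass structure useful for delay equalization.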

"Vesa Välimäki is a Full Professor of audio signal processing and the Vice Dean for research at the Aalto University School of Electrical Engineering, Espoo, Finland. He received the MSc in Technology and the Doctor of Science in Technology degrees, both in electrical engineering, from the Helsinki University of Technology, Espoo, Finland, in 1992 and 1995, respectively. In 1996, he was a Postdoctoral Research Fellow at the University of Westminster, London, UK. In 2008-2009, he was a Visiting Scholar at the Center for Computer Research in Music and Acoustics (CCRMA), Stanford University, Stanford, CA, USA. He has collaborated in research with companies such as Genelec and Nokia Technologies. Prof. Välimäki is a Fellow of the AES (Audio Engineering Society), a Fellow of the IEEE, and a Life Member of the Acoustical Society of Finland. He is a Senior Area Editor of the IEEE/ACM Transactions on Audio, Speech, and Language Processing. He has organized several special issues for scientific journals, such as the Audio Signal Processing special issue for the Applied Sciences in 2016. He was the Chairman of the International Conference on Digital Audio Effects DAFx-08 in 2008, and was the Chairman of the Sound and Music Computing Conference SMC-17 in 2017."

“Building plugins and DSP with JUCE”, by Julian Storer

Julian Storer

This talk is an introduction to how the JUCE library provides classes and tools that can help developers who are building plugins (or plugin hosts) and writing DSP algorithms. The topics covered are:

- A quick high-level overview of JUCE and the functional areas it covers;
- A dive into how the audio plugin abstraction layer works and how it would be used to build a simple plugin;
- An overview of how JUCE's plugin hosting classes work and how they might be used to write a simple plugin host;
- A dive into what JUCE's DSP module provides;
- If time permits, a quick introduction to some JUCE GUI library concepts.

No familiarity with JUCE is expected, but the talk will require some experience with C++ to get the most out of it.

“Machine Learning with Applications to Audio”, by Shahan Nercessian

Shahan Nercessian

More info coming soon...

9:00 Keynote 1 Joshua Reiss Auditorium

Poster Session 1 / Coffee break

Efficient emulation of tape-like delay modulation behaviour

A Combined Model for a Bucket Brigade Device and its Input and Output Filters

Removing Lavalier Microphone Rustle With Recurrent Neural Networks

A Micro-Controlled Digital Effect Unit for Guitars

Various Foyer

Oral Session 1: Analysis/Synthesis 1

Creating Endless Sounds

Autoencoding Neural Networks as Musical Audio Synthesizers

Audio style transfer with rhythmic constraints

Parametric Synthesis of Glissando Note Transitions - A User Study in a Real-Time Application

Various Auditorium
Oral Session 2: Percussive sound separation/transcription

Towards Multi-Instrument Drum Transcription

Increasing Drum Transcription Vocabulary Using Data Synthesis

Stationary/transient audio separation using convolutional autoencoders

Automatic drum transcription with convolutional neural networks

Various Auditorium
15:20 Poster Craze 2 - Auditorium

Poster Session 2 / Coffee break

Optimized Velvet-Noise Decorrelator

Surround Sound without Rear Loudspeakers: Multichannel Compensated Amplitude Panning and Ambisonics

A Feedback Canceling Reverberator

Efficient signal extrapolation by granulation and convolution with velvet noise

Various Foyer

Oral Session 3: Intelligibility & Perception

Improving intelligibility prediction under informational masking using an auditory saliency model

Using semantic differential scales to assess the subjective perception of auditory warning signals

Soundscape auralisation and visualisation: A cross-modal approach to Soundscape evaluation

Acoustic assessment of a classroom and rehabilitation guided by simulation

Various Auditorium
Keynote 1, by Joshua Reiss

Joshua Reiss

More info coming soon...

Poster Session 1-1
"Efficient emulation of tape-like delay modulation behaviour"

Vadim Zavalishin and Julian Parker

A significant part of the appeal of tape-based delay effects is the manner in which the pitch of their output responds to changes in delay-time. Straightforward approaches to implementation of delays with tape-like modulation behavior result in algorithms with time complexity proportional to the tape speed, leading to noticeable increases of CPU load at smaller delay times. We propose a method which has constant time complexity, except during tape speedup transitions, where the complexity grows logarithmically, or, if proper antialiasing is desired, linearly with respect to the speedup factor.

Poster Session 1-2
"A Combined Model for a Bucket Brigade Device and its Input and Output Filters"

Martin Holters and Julian Parker

Bucket brigade devices (BBDs) were invented in the late 1960s as a method of introducing a time-delay into an analog electrical circuit. They work by sampling the input signal at a certain clock rate and shifting it through a chain of capacitors to obtain the delay. BBD chips have been used to build a large variety of analog effects processing devices, ranging from chorus to flanging to echo effects. They have therefore attracted interest in virtual analog modeling and a number of approaches to modeling them digitally have appeared. In this paper, we propose a new model for the bucket-brigade device. This model is based on a variable sample rate, and utilizes the surrounding filtering circuitry found in real devices to avoid the need for the interpolation usually needed in such a variable sample-rate system.

Poster Session 1-3
"Removing Lavalier Microphone Rustle With Recurrent Neural Networks"

Gordon Wichern and Alexey Lukin

The noise that lavalier microphones produce when rubbing against clothing (typically referred to as rustle) can be extremely difficult to automatically remove because it is highly non-stationary and overlaps with speech in both time and frequency. Recent breakthroughs in deep neural networks have led to novel techniques for separating speech from non-stationary background noise. In this paper, we apply neural network speech separation techniques to remove rustle noise, and quantitatively compare multiple deep network architectures and input spectral resolutions. We find the best performance using bidirectional recurrent networks and spectral resolution of around 20 Hz. Furthermore, we propose an ambience preservation post-processing step to minimize potential gating artifacts during pauses in speech.

Poster Session 1-4
"A Micro-Controlled Digital Effect Unit for Guitars"

Geovani Alves and Marcelo Rosa

Here we present a micro-controlled digital effect unit for guitars. Unlike other undergraduate projects, we used high-quality 16-bit Analog-to-Digital (A/D) and Digital-to-Analog (D/A) converters operating at 48 kHz that respectively transfer data to and from a micro-controller through serial peripheral interfaces (SPIs). We discuss the design decisions for interconnecting all these components, the design of the anti-aliasing (low-pass) filters, and additional features useful for players. Finally, we show some results obtained from this device, and discuss future improvements.

Oral Session 1-1
"Creating Endless Sounds"

Vesa Välimäki, Jussi Rämö and Fabian Esqueda

This paper proposes signal processing methods to extend a stationary part of an audio signal endlessly. A frequent occasion is that there is not enough audio material to build a synthesizer, but an example sound must be extended or modified for more variability. Filtering of a white noise signal with a filter designed based on high-order linear prediction or concatenation of the example signal can produce convincing arbitrarily long sounds, such as ambient noise or musical tones, and can be interpreted as a spectral freeze technique without looping. It is shown that the random input signal will pump energy to the narrow resonances of the filter so that lively and realistic variations in the sound are generated. For real-time implementation, this paper proposes to replace white noise with velvet noise, as this reduces the number of operations by 90% or more, with respect to standard convolution, without affecting the sound quality, or by FFT convolution, which can be simplified to the randomization of spectral phase and only taking the inverse FFT. Examples of producing endless airplane cabin noise and piano tones based on a short example recording are studied. The proposed methods lead to a new way to generate audio material for music, films, and gaming.

Oral Session 1-2
"Autoencoding Neural Networks as Musical Audio Synthesizers"

Joseph Colonel, Christopher Curro and Sam Keene

A method for musical audio synthesis using autoencoding neural networks is proposed. The autoencoder is trained to compress and reconstruct magnitude short-time Fourier transform frames. The autoencoder produces a spectrogram by activating its smallest hidden layer, and a phase response is calculated using real-time phase gradient heap integration. Taking an inverse short-time Fourier transform produces the audio signal. Our algorithm is lightweight when compared to current state-of-the-art audio-producing machine learning algorithms. We outline our design process, produce metrics, and detail an open-source Python implementation of our model.

Oral Session 1-3
"Audio style transfer with rhythmic constraints"

Maciek Tomczak, Carl Southall and Jason Hockman

In this paper we present a rhythmically constrained audio style transfer technique for automatic mixing and mashing of two audio inputs. In this transformation the rhythmic and timbral features of both input signals are combined through the use of an audio style transfer process that transforms the files so that they adhere to a larger metrical structure of the chosen input. This is accomplished by finding beat boundaries of both inputs and performing the transformation on beat-length audio segments. In order for the system to perform a mashup between two signals, we reformulate the previously used audio style transfer loss terms into three loss functions and enable them to be independent of the input. We measure and compare rhythmic similarities of the transformed and input audio signals using their rhythmic envelopes to investigate the influence of the tested transformation objectives.

Oral Session 1-4
"Parametric Synthesis of Glissando Note Transitions - A User Study in a Real-Time Application"

Henrik von Coler, Moritz Götz and Steffen Lepa

This paper investigates the use of different mathematical models for the parametric synthesis of fundamental frequency trajectories in glissando note transitions. Hyperbolic tangent, cubic splines and Bézier curves were implemented in a real-time synthesis system. In a user study, test subjects were presented with two-note sequences with glissando transitions, which had to be re-synthesized using the three different trajectory models, employing a pure sine wave synthesizer. Resulting modeling errors and user feedback on the models were evaluated, indicating a significant disadvantage of the hyperbolic tangent in the modeling accuracy. Its reduced complexity and number of parameters were however not rated to increase the usability.
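To make the idea of a hyperbolic tangent trajectory concrete, here is a minimal sketch of one possible parameterization of a fundamental frequency glide between two notes. The function name, the `steepness` parameter, the linear-frequency blend, and the 220 Hz to 440 Hz example are all assumptions chosen for illustration; the paper's actual model and parameters may differ.

```python
import math

def tanh_glissando(f_start, f_end, num_points, steepness=6.0):
    """Sample an S-shaped f0 trajectory from f_start to f_end (in Hz).

    The blend factor follows 0.5 * (1 + tanh(steepness * (t - 0.5))) with
    normalized time t in [0, 1]; larger steepness gives a faster transition.
    This parameterization is hypothetical, not the one used in the paper.
    """
    trajectory = []
    for i in range(num_points):
        t = i / (num_points - 1)
        blend = 0.5 * (1.0 + math.tanh(steepness * (t - 0.5)))
        trajectory.append(f_start + (f_end - f_start) * blend)
    return trajectory

# Hypothetical example: an octave glide sampled at 11 control points.
traj = tanh_glissando(220.0, 440.0, 11)
for value in traj:
    print(round(value, 2))
```

With a single shape parameter, this curve is simpler to control than a spline or Bézier curve, but it cannot represent asymmetric transitions, which is consistent with the accuracy trade-off the abstract reports.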

Oral Session 2-1
"Towards Multi-Instrument Drum Transcription"

Richard Vogl, Gerhard Widmer and Peter Knees

Automatic drum transcription, a subtask of the more general automatic music transcription, deals with extracting drum instrument note onsets from an audio source. Recently, progress in transcription performance has been made using non-negative matrix factorization as well as deep learning methods. However, these works primarily focus on transcribing three drum instruments only: snare drum, bass drum, and hi-hat. Yet, for many applications, the ability to transcribe more drum instruments which make up standard drum kits used in western popular music would be desirable. In this work, convolutional and convolutional recurrent neural networks are trained to transcribe a wider range of drum instruments. First, the shortcomings of publicly available datasets in this context are discussed. To overcome these limitations, a larger synthetic dataset is introduced. Then, methods to train models using the new dataset focusing on generalization to real world data are investigated. Finally, the trained models are evaluated on publicly available datasets and results are discussed. The contributions of this work comprise: (i.) a large-scale synthetic dataset for drum transcription, (ii.) first steps towards an automatic drum transcription system that supports a larger range of instruments by evaluating and discussing training setups and the impact of datasets in this context, and (iii.) a publicly available set of trained models for drum transcription. Additional materials are available at http://ifs.tuwien.ac.at/~vogl/dafx2018.

Oral Session 2-2
"Increasing Drum Transcription Vocabulary Using Data Synthesis"

Mark Cartwright and Juan Pablo Bello

Current datasets for automatic drum transcription (ADT) are small and limited due to the tedious task of annotating onset events. While some of these datasets contain large vocabularies of percussive instrument classes (e.g. ~20 classes), many of these classes occur very infrequently in the data. This paucity of data makes it difficult to train models that support such large vocabularies. Therefore, data-driven drum transcription models often focus on a small number of percussive instrument classes (e.g. 3 classes). In this paper, we propose to support large-vocabulary drum transcription by generating a large synthetic dataset (210,000 eight-second examples) of audio examples for which we have ground-truth transcriptions. Using this synthetic dataset along with existing drum transcription datasets, we train convolutional-recurrent neural networks (CRNNs) in a multi-task framework to support large-vocabulary ADT. We find that training on both the synthetic and real music drum transcription datasets together improves performance on not only large-vocabulary ADT, but also beat/downbeat detection and small-vocabulary ADT.

Oral Session 2-3
"Stationary/transient audio separation using convolutional autoencoders"

Gerard Roma, Owen Green and Pierre Alexandre Tremblay

Extraction of stationary and transient components from audio has many potential applications to audio effects for audio content production. In this paper we explore stationary/transient separation using convolutional autoencoders. We propose two novel unsupervised algorithms for individual and joint separation. We describe our implementation and show examples. Our results show promise for the use of convolutional autoencoders in the extraction of sparse components from audio spectrograms, particularly using monophonic sounds.

Oral Session 2-4
"Automatic drum transcription with convolutional neural networks"

Celine Jacques and Axel Roebel

Automatic drum transcription (ADT) aims to detect drum events in polyphonic music. This task is part of the more general problem of transcribing a music signal in terms of its musical score and additionally can be very interesting for extracting high level information e.g. tempo, downbeat, measure. This article has the objective to investigate the use of Convolutional Neural Networks (CNN) in the context of ADT. Two different strategies are compared. First an approach based on a CNN-based detection of drum-only onsets is combined with an algorithm using Non-negative Matrix Deconvolution (NMD) for drum onset transcription. Then an approach relying entirely on CNN for the detection of individual drum instruments is described. The question of which loss function is the most adapted for this task is investigated together with the question of the optimal input structure. All algorithms are evaluated using the publicly available ENST Drum database, a widely used established reference dataset, allowing easy comparison with other algorithms. The comparison shows that the purely CNN-based algorithm significantly outperforms the NMD-based approach, and that the results are significantly better for the snare drum, but slightly worse for both the bass drum and the hi-hat when compared to the best results published so far and ones using also a neural network model.

Poster Session 2-1
"Optimized Velvet-Noise Decorrelator"

Sebastian J. Schlecht, Benoit Alary, Vesa Välimäki and Emanuel A. P. Habets

Decorrelation of audio signals is a critical step for spatial sound reproduction on multichannel configurations. Correlated signals yield a focused phantom source between the reproduction loudspeakers and may produce undesirable comb-filtering artifacts when the signal reaches the listener with small phase differences. Decorrelation techniques reduce such artifacts and extend the spatial auditory image by randomizing the phase of a signal while minimizing the spectral coloration. This paper proposes a method to optimize the decorrelation properties of a sparse noise sequence, called velvet noise, to generate short sparse FIR decorrelation filters. The sparsity allows a highly efficient time-domain convolution. The listening test results demonstrate that the proposed optimization method can yield effective and colorless decorrelation filters. In comparison to a white noise sequence, the filters obtained using the proposed method better preserve the spectrum of a signal and produce good-quality broadband decorrelation while using 76% fewer operations for the convolution. Satisfactory results can be achieved with an even lower impulse density which decreases the computational cost by 88%.
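The efficiency argument above rests on the sparsity of velvet noise: most samples are zero and the rest are +1 or -1, so convolution reduces to adding and subtracting delayed copies of the input. The following sketch generates a basic velvet-noise sequence (one randomly placed, randomly signed impulse per grid period) and convolves with it sparsely. It does not implement the optimization proposed in the paper, and the function names, grid size, and seed are illustrative assumptions.

```python
import random

def velvet_noise(length, grid, seed=0):
    """Return a list of (position, sign) pairs: one impulse per grid period.

    Each impulse is placed at a random offset inside its period and given a
    random sign. Basic (unoptimized) velvet noise, for illustration only.
    """
    rng = random.Random(seed)
    impulses = []
    for start in range(0, length, grid):
        position = start + rng.randrange(min(grid, length - start))
        sign = 1 if rng.random() < 0.5 else -1
        impulses.append((position, sign))
    return impulses

def sparse_convolve(x, impulses, length):
    """Convolve x with a velvet-noise filter of the given length.

    Only additions and subtractions are needed: one shifted copy of x per
    impulse, instead of one multiply-add per filter tap.
    """
    y = [0.0] * (len(x) + length - 1)
    for position, sign in impulses:
        for n, sample in enumerate(x):
            y[position + n] += sign * sample
    return y

# Hypothetical example: a 40-tap filter with one impulse per 10 samples.
impulses = velvet_noise(40, 10)
response = sparse_convolve([1.0], impulses, 40)  # impulse response of the filter
print(impulses)
print(response)
```

Here a 40-tap filter costs 4 additions per input sample rather than 40 multiply-adds; the savings quoted in the abstract depend on the impulse density chosen by the authors.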

Poster Session 2-2
"Surround Sound without Rear Loudspeakers: Multichannel Compensated Amplitude Panning and Ambisonics"

Dylan Menzies and Filippo Maria Fazi

Conventional panning approaches for surround sound require loudspeakers to be distributed over the regions where images are needed. However, in many listening situations it is not practical or desirable to place loudspeakers in some positions, such as behind or above the listener. Compensated Amplitude Panning (CAP) is a method that adapts dynamically to the listener’s head orientation to provide images in any direction, in the frequency range up to ≈ 1000 Hz using only 2 loudspeakers. CAP is extended here for more loudspeakers, which removes some limitations and provides additional benefits. The new CAP method is also compared with an Ambisonics approach that is adapted for surround sound without rear loudspeakers.

Poster Session 2-3
"A Feedback Canceling Reverberator"

Jonathan S. Abel, Eoin F. Callery and Elliot K. Canfield-Dafilou


Poster Session 2-4
"Efficient signal extrapolation by granulation and convolution with velvet noise"

Stefano D'Angelo and Leonardo Gabrielli

Several methods are available nowadays to artificially extend the duration of a signal for audio restoration or creative music production purposes. The most common approaches include overlap-and-add (OLA) techniques, FFT-based methods, and linear predictive coding (LPC). In this work we describe a novel OLA algorithm based on convolution with velvet noise, in order to exploit its sparsity and spectrum flatness. The proposed method suppresses spectral coloration and achieves remarkable computational efficiency. Its issues are addressed and some design choices are explored. Experimental results are presented and compared to a well-known FFT-based method.

Oral Session 3-1
"Improving intelligibility prediction under informational masking using an auditory saliency model"

Yan Tang and Trevor J. Cox

The reduction of speech intelligibility in noise is usually dominated by energetic masking (EM) and informational masking (IM). Most state-of-the-art objective intelligibility measures (OIM) estimate intelligibility by quantifying EM. Few measures model the effect of IM in detail. In this study, an auditory saliency model, which intends to measure the probability of the sources obtaining auditory attention in a bottom-up process, was integrated into an OIM for improving the performance of intelligibility prediction under IM. While EM is accounted for by the original OIM, IM is assumed to arise from the listener’s attention switching between the target and competing sounds existing in the auditory scene. The performance of the proposed method was evaluated along with three reference OIMs by comparing the model predictions to the listener word recognition rates, for different noise maskers, some of which introduce IM. The results show that the predictive accuracy of the proposed method is as good as the best reported in the literature. The proposed method, however, provides a physiologically-plausible possibility for both IM and EM modelling.

Oral Session 3-2
"Using semantic differential scales to assess the subjective perception of auditory warning signals"

Joana Vieira, Jorge Almeida Santos and Paulo Noriega

The relationship between physical acoustic parameters and the subjective responses they evoke is important to assess in audio alarm design. While the perception of urgency has been thoroughly investigated, the perception of other variables such as pleasantness, negativeness and irritability has not. To characterize the psychological correlates of variables such as frequency, speed, rhythm and onset, twenty-six participants evaluated fifty-four audio warning signals according to six different semantic differential scales. Regression analysis showed that speed predicted mostly the perception of urgency, preoccupation and negativity; frequency predicted the perception of pleasantness and irritability; and rhythm affected the perception of urgency. No correlation was found with onset and offset times. These findings are important to human-centred design recommendations for auditory warning signals.

Oral Session 3-3
"Soundscape auralisation and visualisation: A cross-modal approach to Soundscape evaluation"

Francis Stevens, Damian Murphy and Stephen Smith

Soundscape research is concerned with the study and understanding of our relationship with our surrounding acoustic environments and the sonic elements that they are comprised of. Whilst much of this research has focussed on sound alone, any practical application of soundscape methodologies should consider the interaction between aural and visual environmental features: an interaction known as cross-modal perception. This presents an avenue for soundscape research exploring how an environment’s visual features can affect an individual’s experience of the soundscape of that same environment. This paper presents the results of two listening tests: one a preliminary test making use of static stereo UHJ renderings of first-order-ambisonic (FOA) soundscape recordings and static panoramic images; the other using YouTube as a platform to present dynamic binaural renderings of the same FOA recordings alongside full motion spherical video. The stimuli for these tests were recorded at several locations around the north of England including rural, urban, and suburban environments exhibiting soundscapes comprised of many natural, human, and mechanical sounds. The purpose of these tests was to investigate how the presence of visual stimuli can alter soundscape perception and categorisation. This was done by presenting test subjects with each soundscape alone and then with visual accompaniment, and then comparing collected subjective evaluation data. Results indicate that the presence of certain visual features can alter the emotional state evoked by exposure to a soundscape, for example, where the presence of ‘green infrastructure’ (parks, trees, and foliage) results in a less agitating experience of a soundscape containing high levels of environmental noise. This research represents an important initial step toward the integration of virtual reality technologies into soundscape research, and the use of suitable tools to perform subjective evaluation of audiovisual stimuli. Future research will consider how these methodologies can be implemented in real-world applications.

Oral Session 3-4
"Acoustic assessment of a classroom and rehabilitation guided by simulation"

Raquel Ribeiro and Diamantino Freitas

The acoustics of spaces whose purpose is acoustic communication through speech, namely classrooms, is a subject that has not been given due importance in architectural projects, resulting in adverse acoustic conditions which affect on a daily basis the learning of the students and the well-being of teachers. One of the lecture rooms of the Faculty of Engineering of the University of Porto (FEUP) was chosen, more precisely amphitheater B013, with a criterion of generality, in which the acoustic conditions were evaluated and compared with those that are known to be necessary for the intended acoustic communication effect. Several measurements were made in the space to investigate how the acoustic parameters stand relative to the appropriate range. An acoustic model of the amphitheater under study was developed in the EASE software, with which it was possible to obtain simulated results for comparison with the previously measured parameters and to introduce changes in the model to perceive their impact in the real space. In this phase it was possible to use the auralization resources of the software to create a perception of how the sound is heard in the built model. This was useful for the phase of rehabilitation of the space because it was possible to judge subjectively the improvement of the sound intelligibility in that space. Finally, possible solutions are presented in the acoustic domain and using electroacoustic sound reinforcement, aiming to provide better acoustic comfort and communicational effectiveness for the people who use it.

Paul Hanson

"Paul Hansonʼs musical journey is a testament of fearless dedication to craft and creativity. (...) His explorations have transcended limitations and created new possibilities-all while making music of the highest quality. Paulʼs repertoire encompasses musical aspects of all modern styles of improvised music." - Paul Hanson's Bio

Paul Hanson's Website: http://paulhansonmusic.com/

Paul Hanson's Facebook: https://www.facebook.com/paulhansonmusic/

Paul Hanson's YouTube: https://www.youtube.com/user/jazzbassoonpaul

9:00 Keynote: "Confessions from a plug-in junkie" David Farmer Auditorium
10:00 Poster Craze 3 - Auditorium

Poster Session 3 / Coffee break

Real-Time Wave Digital Simulation of Cascaded Vacuum Tube Amplifiers using Modified Blockwise Method

Time Warping in Digital Audio Effects

Joint modeling of impedance and radiation as a recursive parallel filter structure for efficient synthesis of wind instrument sound

Interpretation and control in AM/FM-based audio effects

Various Foyer

Oral Session 4: Analysis/Synthesis 2

High frequency magnitude spectrogram reconstruction for music mixtures using convolutional autoencoders

A holistic glottal phase-related feature

Sound morphologies due to non-linear interactions: towards a perceptive control of environmental sound-synthesis processes

Group Delay-Based Allpass Filters for Abstract Sound Synthesis and Audio Effects Processing

Various Auditorium
Oral Session 5: Virtual environments

Assessing the Effect of Adaptive Music on Player Navigation in Virtual Environments

Modeling and Rendering for Virtual Dropping Sound based on Physical Model of Rigid Body

Objective Evaluations of Synthesised Environmental Sounds

Resizing Rooms in Convolution, Delay Network, and Modal Reverberators

Various Auditorium
15:20 Poster Craze 4 - Auditorium

Poster Session 4 / Coffee break

BIVIB: A Multimodal Piano Sample Library Of Binaural Sounds And Keyboard Vibrations

Position-Based Attenuation And Amplification For Stereo Mixes

The Application of Dimensionality Reduction Techniques for Fear Emotion Detection from Speech

Immersive Audio-Guiding

Various Foyer

Oral Session 6: Analog Systems & Processing

Power-Balanced Modelling Of Circuits As Skew Gradient Systems

Modeling Time-Varying Reactances using Wave Digital Filters

Experimental Study of Guitar Pickup Nonlinearity

Waveshaping with Norton Amplifiers: Modeling the Serge Triple Waveshaper

Various Auditorium
"Confessions from a plug-in junkie",
by David Farmer

David Farmer

Here, the intention is simply to give a window into an actual user's experience. Some examples will be shown of how plugins are used in a typical day. This will include what draws somebody to use certain plugins over others that may do similar things. Some GUI features that are found useful will be explored, as well as those that are a hindrance. It will also be discussed what it is like to be an end user in a saturated market of products, and what it is like to discover, try, and buy developers' products.

"David Farmer was born and raised in Virginia and sound captured his interest as a young boy. He moved to the Los Angeles area in 1992 and in 1996 began Sound Designing at Skywalker Sound on The Arrival. David worked with Chris Boyes on numerous films including Armageddon, Con Air, Space Cowboys, and The 13th Warrior. In 1999 David began an extended period in New Zealand as the Sound Designer for the Lord of the Rings trilogy, King Kong, and followed most recently by The Hobbit trilogy. David recently finished Sound Design on Marvel’s Ant-Man."

Poster Session 3-1
"Real-Time Wave Digital Simulation of Cascaded Vacuum Tube Amplifiers using Modified Blockwise Method"

Jingjie Zhang and Julius Smith

Vacuum tube amplifiers, known for their acclaimed distortion characteristics, are still widely used in hi-fi audio devices. However, bulky, fragile and power-consuming vacuum tube devices have also motivated much research on digital emulation of vacuum tube amplifier behaviors. Recent studies on Wave Digital Filters (WDF) have made possible the modeling of multi-stage vacuum tube amplifiers within single WDF SPQR trees. Our research combines the latest progress on WDF with the modified blockwise method to reduce the overall computational complexity of modeling cascaded vacuum tube amplifiers by decomposing the whole circuit into several small stages containing only two adjacent triodes. Certain performance optimization methods are discussed and applied in the eventual real-time implementation.

Poster Session 3-2
"Time Warping in Digital Audio Effects"

Gianpaolo Evangelista

Time warping is an important paradigm in sound processing, which consists of composing the signal with another function of time called the warping map. This paradigm leads to different points of view in signal processing, fostering the development of new effects or the conception of new implementations of existing ones. While the introduction of time warping in continuous-time signals is in principle not problematic, time warping of discrete-time signals is not self-evident. On one hand, if the signal samples were obtained by sampling a bandlimited signal, the warped signal is not necessarily bandlimited: it has a sampling theorem of its own, based on irregular sampling, unless the map is linear. On the other hand, most signals are regularly sampled so that the samples at non-integer multiples of the sampling interval are not known. While the use of interpolation can partly solve the problem, it usually introduces artifacts. Moreover, in many sound applications, the computation already involves a phase vocoder. In this paper we introduce new methods and algorithms for time warping based on warped time-frequency representations. These lead to alternative algorithms for warping for use in sound processing tools and digital audio effects and shed new light on the interaction of time warping with phase vocoders. We also outline the applications of time warping in digital audio effects.
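As a minimal illustration of the interpolation route that the abstract mentions (and not of the paper's warped time-frequency methods), the sketch below composes a regularly sampled signal with a warping map using linear interpolation. The function name `warp_signal` and the example maps are assumptions made for this example only.

```python
import math

def warp_signal(x, warp_map):
    """Return y[n] = x(warp_map(n)), evaluated by linear interpolation.

    x        -- list of samples taken at integer times 0..len(x)-1
    warp_map -- hypothetical function mapping an output index n to a
                (possibly non-integer) input time
    Requests outside the known sample range yield 0.0.
    """
    last = len(x) - 1
    y = []
    for n in range(len(x)):
        t = warp_map(n)
        i = math.floor(t)
        frac = t - i
        if i < 0 or i > last:
            y.append(0.0)            # no samples known out here
        elif i == last:
            # Only the exact last sample is known; beyond it we have nothing.
            y.append(x[i] if frac == 0 else 0.0)
        else:
            y.append((1.0 - frac) * x[i] + frac * x[i + 1])
    return y

ramp = [0.0, 1.0, 2.0, 3.0, 4.0]
identity = warp_signal(ramp, lambda n: n)           # unchanged signal
half_speed = warp_signal(ramp, lambda n: 0.5 * n)   # linear map: slowed down
```

With a non-linear map, the same code still runs, but the interpolation error is the kind of artifact the abstract refers to.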

Poster Session 3-3
"Joint modeling of impedance and radiation as a recursive parallel filter structure for efficient synthesis of wind instrument sound"

Esteban Maestre, Gary Scavone and Julius Smith

In the context of efficient synthesis of wind instrument sound, we introduce a technique for joint modeling of input impedance and sound pressure radiation as digital filters in parallel form, with the filter coefficients derived from experimental data. In a series of laboratory measurements taken on an alto saxophone, the input impedance and sound pressure radiation responses were obtained for each fingering. In a first analysis step, we iteratively minimize the error between the frequency response of an input impedance measurement and that of a digital impedance model constructed from a parallel filter structure akin to the discretization of a modal expansion. With the modal coefficients in hand, we propose a digital model for sound pressure radiation which relies on the same parallel structure, thus suitable for coefficient estimation via frequency-domain least-squares. For modeling the transition between fingering positions, we propose a simple model based on linear interpolation of input impedance and sound pressure radiation models. For efficient sound synthesis, the common impedance-radiation model is used to construct a joint reflectance-radiation digital filter realized as a digital waveguide termination that is interfaced to a reed model based on nonlinear scattering.

Poster Session 3-4
"Interpretation and control in AM/FM-based audio effects"

Antonio Goulart, Marcelo Queiroz, Joseph Timoney and Victor Lazzarini

This paper is a continuation of our first studies on AM/FM digital audio effects, where the AM/FM decomposition equations were reviewed and some exploratory examples of effects were introduced. In the current paper we present more insight on the signals obtained with the AM/FM decomposition, intending to illustrate manipulations in the AM/FM domain that can be applied as interesting audio effects. We provide high-quality AM/FM effects and their implementations, alongside a brief objective evaluation. Audio samples and codes for real-time operation are also supplied.

Oral Session 4-1
"High Frequency Magnitude Spectrogram Reconstruction For Music Mixtures Using Convolutional Autoencoders"

Marius Miron and Matthew Davies

We present a new approach for audio bandwidth extension for music signals using convolutional neural networks (CNNs). Inspired by the concept of inpainting from the field of image processing, we seek to reconstruct the high-frequency region (i.e., above a cutoff frequency) of a time-frequency representation given the observation of a band-limited version. We then invert this reconstructed time-frequency representation using the phase information from the band-limited input to provide an enhanced musical output. We contrast the performance of two musically adapted CNN architectures which are trained separately using the STFT and the invertible CQT. Through our evaluation, we demonstrate that the CQT, with its logarithmic frequency spacing, provides better reconstruction performance as measured by the signal to distortion ratio.

Oral Session 4-2
"A Holistic Glottal Phase-Related Feature"

Aníbal Ferreira and José Tribolet

This paper addresses a phase-related feature that is time-shift invariant, and that expresses the relative phases of all harmonics with respect to that of the fundamental frequency. We identify the feature as Normalized Relative Delay (NRD) and we show that it is particularly useful to describe the holistic phase properties of voiced sounds produced by a human speaker, notably vowel sounds. We illustrate the NRD feature with real data obtained from five sustained vowels uttered by 20 female speakers and 17 male speakers. It is shown that not only do NRD coefficients carry idiosyncratic information, but their estimation is also quite stable and robust for all harmonics encompassing, for most vowels, at least the first four formant frequencies. The average NRD model that is estimated using data pertaining to all speakers in our database is compared to that of the idealized Liljencrants-Fant (L-F) and Rosenberg glottal models. We also present results on the phase effects of linear-phase FIR and IIR vocal tract filter models when a plausible source excitation is used that corresponds to the derivative of the L-F glottal flow model. These results suggest that the shape of NRD feature vectors is mainly determined by the glottal pulse and only marginally affected by either the group delay of the vocal tract filter model, or by the acoustic coupling between glottis and vocal tract structures.

Oral Session 4-3
"Sound Morphologies Due To Non-Linear Interactions : Towards A Perceptive Control Of Environmental Sound-Synthesis Processes"

Samuel Poirot, Stefan Bilbao, Mitsuko Aramaki and Richard Kronland-Martinet

This paper is concerned with perceptual control strategies for physical modeling synthesis of vibrating resonant objects colliding non-linearly with rigid obstacles. For this purpose, we investigate sound morphologies from samples synthesized using physical modeling for non-linear interactions. As a starting point, we study the effect of linear and non-linear springs and collisions on a single-degree-of-freedom system and on a stiff string. We then synthesize realistic sounds of a stiff string colliding with a rigid obstacle. Numerical simulations allowed the definition of specific signal patterns characterizing the non-linear behavior of the interaction according to the attributes of the obstacle. Finally, a global description of the sound morphology associated with this type of interaction is proposed. This study constitutes a first step towards further perceptual investigations geared towards the development of intuitive synthesis controls.

Oral Session 4-4
"Group Delay-Based Allpass Filters for Abstract Sound Synthesis and Audio Effects Processing"

Elliot K. Canfield-Dafilou and Jonathan S. Abel

An algorithm for artistic spectral audio processing and synthesis using allpass filters is presented. These filters express group delay trajectories, allowing fine control of their frequency-dependent arrival times. We present methods for designing the group delay trajectories to yield a novel class of filters for sound synthesis and audio effects processing. A number of categories of group delay trajectory design are discussed, including stair-stepped, modulated, and probabilistic. Synthesis and processing examples are provided.

Oral Session 5-1
"Assessing the Effect of Adaptive Music on Player Navigation in Virtual Environments"

Manuel López Ibáñez, Nahum Álvarez and Federico Peinado

Through this research, we develop a study aiming to explore how adaptive music can help in guiding players across virtual environments. A video game consisting of a virtual 3D labyrinth was built, and two groups of subjects played through it, having the goal of retrieving a series of objects in as short a time as possible. Each group played a different version of the prototype in terms of audio: one had the ability to state their preferences by choosing several musical attributes, which would influence the actual spatialised music they listened to during gameplay; the other group played a version of the prototype with a default, non-adaptive, but also spatialised soundtrack. Time elapsed while completing the task was measured as a way to test user performance. Results show a statistically significant correlation between player performance and the inclusion of a soundtrack adapted to each user. We conclude that there are no firm musical criteria for making sounds prominent and easy for users to track, and that an adaptive system like the one we propose proves useful and effective when dealing with a complex user base.

Oral Session 5-2
"Modeling and Rendering for Virtual Dropping Sound based on Physical Model of Rigid Body"

Sota Nishiguchi and Katunobu Itou

Sound production by means of a physical model for falling objects, which is intended for audio synthesis of immersive contents, is described here. Our approach is a mathematical model to synthesize sound and audio for animation with rigid body simulation. To consider various conditions, a collision model of an object was introduced for vibration and propagation simulation. The generated sound was evaluated by comparing the model output with real sound using numerical criteria and psychoacoustic analysis. Experiments were performed for a variety of objects and floor surfaces, approximately 90% of which were similar to real scenarios. The usefulness of the physical model for audio synthesis in virtual reality was represented in terms of breadth and quality of sound.

Oral Session 5-3
"Objective Evaluations of Synthesised Environmental Sounds"

David Moffat and Joshua D. Reiss

There is a range of different methods for comparing or measuring the similarity between environmental sound effects. These methods can be used as objective evaluation techniques, to evaluate the effectiveness of a sound synthesis method by assessing the similarity between synthesised sounds and recorded samples. We propose to evaluate a number of different objective evaluation metrics for synthesis, by using the different distance metrics as fitness functions within a resynthesis algorithm. A recorded sample is used as a target sound, and the resynthesis is intended to produce a set of synthesis parameters that will synthesise a sound as close to the recorded sample as possible, within the restrictions of the synthesis model. The recorded samples are excerpts of selections from a sound effects library, and the results are evaluated through a subjective listening test. Results show that one of the objective functions performs significantly worse than several others. Only one method had a significant and strong correlation between the user perceptual distance and the objective distance. A recommendation of an objective evaluation function for measuring similarity between synthesised environmental sounds is made.

Oral Session 5-4
"Resizing Rooms in Convolution, Delay Network, and Modal Reverberators"

Elliot K. Canfield-Dafilou and Jonathan S. Abel

In music recording and virtual reality applications, it is often desirable to control the perceived size of a synthesized acoustic space. Here, we demonstrate a physically informed method for enlarging and shrinking room size. A room size parameter is introduced to modify the time and frequency components of convolution, delay network, and modal artificial reverberation architectures to affect the listener’s sense of the size of the acoustic space, taking into account air and materials absorption.

Poster Session 4-1
"BIVIB: A Multimodal Piano Sample Library Of Binaural Sounds And Keyboard Vibrations"

Stefano Papetti, Federico Avanzini and Federico Fontana

An extensive piano sample library consisting of binaural sounds and keyboard vibration signals is made available through an open-access data repository. Samples were acquired with high-quality audio and vibration measurement equipment on two Yamaha Disklavier pianos (one grand and one upright model) by means of computer-controlled playback of each key at ten different MIDI velocity values. The nominal specifications of the equipment used in the acquisition chain are reported in a companion document, allowing researchers to calculate physical quantities (e.g., acoustic pressure, vibration acceleration) from the recordings. Also, project files are provided for straightforward playback in a free software sampler available for Windows and Mac OS systems. The library is especially suited for acoustic and vibration research on the piano, as well as for research on multimodal interaction with musical instruments.

Poster Session 4-2
"Position-Based Attenuation And Amplification For Stereo Mixes"

Luca Marinelli and Holger Kirchhoff

This paper presents a position-based attenuation and amplification method suitable for source separation and enhancement. Our novel sigmoidal time-frequency mask allows us to directly control the level within a target azimuth range and to exploit a trade-off between the production of musical noise artifacts and separation quality. The algorithm is fully describable in a closed and compact analytical form. The method was evaluated on a multitrack dataset and compared to another position-based source separation algorithm. The results show that although the sigmoidal mask leads to a lower source-to-interference ratio, the overall sound quality measured by the source-to-distortion ratio and the source-to-artifacts ratio is improved.
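To make the idea of a sigmoidal, position-based mask concrete, here is a rough sketch. The abstract does not give the authors' closed-form expression, so the formula below, the function name `sigmoid_mask` and all of its parameters are assumptions chosen only for illustration; azimuth is taken to be a normalized panning position in [-1, 1].

```python
import math

def sigmoid_mask(azimuth, center, half_width, gain, slope=50.0):
    """Gain applied to one time-frequency bin, given its estimated azimuth.

    Hypothetical form (not taken from the paper): bins inside the target
    range [center - half_width, center + half_width] are scaled by `gain`,
    bins far outside are left at unity, with a sigmoid transition between.
    """
    s = 1.0 / (1.0 + math.exp(-slope * (half_width - abs(azimuth - center))))
    return 1.0 + (gain - 1.0) * s

attenuated = sigmoid_mask(0.0, center=0.0, half_width=0.2, gain=0.0)  # inside range
untouched = sigmoid_mask(1.0, center=0.0, half_width=0.2, gain=0.0)   # far outside
boosted = sigmoid_mask(0.0, center=0.0, half_width=0.2, gain=2.0)     # amplification
```

In this sketch a smaller `slope` gives a softer transition at the edges of the range, which is one plausible way to trade separation sharpness against audible artifacts.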

Poster Session 4-3
"The Application Of Dimensionality Reduction Techniques For Fear Emotion Detection From Speech"

Safa Chebbi and Sofia Ben Jebara

In this paper, we propose to reduce the relatively high dimension of pitch-based features for fear emotion recognition from speech. To do so, the K-nearest neighbors algorithm has been used to classify three emotion classes: fear, neutral and ’other emotions’. Many techniques of dimensionality reduction are explored. First of all, optimal features ensuring better emotion classification are determined. Next, several families of dimensionality reduction, namely PCA, LDA and LPP, are tested in order to reveal the suitable dimension range guaranteeing the highest overall and fear recognition rates. Results show that the optimal features group permits 93.34% and 78.7% as overall and fear accuracy rates respectively. Using dimensionality reduction, Principal Component Analysis (PCA) has given the best results: 92% as overall accuracy rate and 93.3% as fear recognition percentage.
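For readers unfamiliar with the classifier named in the abstract, the following is a minimal K-nearest-neighbors sketch over the three classes. The two-dimensional feature vectors are made-up numbers standing in for pitch-based features; they are not data from the paper, and the function name `knn_predict` is an assumption.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Label `query` by majority vote among its k nearest training points.

    train -- list of (feature_vector, label) pairs
    query -- feature vector with the same dimension as the training vectors
    """
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical (mean pitch in Hz, pitch range in Hz) pairs, invented for illustration.
train = [
    ((300.0, 120.0), "fear"), ((310.0, 110.0), "fear"), ((290.0, 130.0), "fear"),
    ((180.0, 30.0), "neutral"), ((175.0, 25.0), "neutral"), ((185.0, 35.0), "neutral"),
    ((240.0, 70.0), "other"), ((235.0, 80.0), "other"), ((245.0, 75.0), "other"),
]

label_a = knn_predict(train, (305.0, 115.0))
label_b = knn_predict(train, (178.0, 28.0))
label_c = knn_predict(train, (240.0, 75.0))
```

A dimensionality reduction step such as PCA would be applied to the feature vectors before this classification stage; it is left out here to keep the sketch short.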

Poster Session 4-4
"Immersive Audio-Guiding"

Nuno Carriço, Guilherme Campos and José Vieira

An audio-guide prototype was developed which makes it possible to associate virtual sound sources to tourist route focal points. An augmented reality effect is created, as the (virtual) audio content presented through headphones seems to originate from the specified (real) points. A route management application allows specification of source positions (GPS coordinates), audio content (monophonic files) and route points where playback should be triggered. The binaural spatialisation effects depend on user pose relative to the focal points: position is detected by a GPS receiver; for head-tracking, an IMU is attached to the headphone strap. The main application, developed in C++, streams the audio content through a real-time auralisation engine. HRTF filters are selected according to the azimuth and elevation of the path from the virtual source, continuously updated based on user pose. Preliminary tests carried out with ten subjects confirmed the ability to provide the desired audio spatialisation effects and identified position detection accuracy as the main aspect to be improved in the future.
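The pose-dependent filter selection described above can be sketched as follows. This is not the prototype's C++ code: the function names, the 15-degree azimuth grid, the convention that positive azimuth means "to the right", and the omission of elevation are all assumptions made for illustration. The bearing uses the standard great-circle initial bearing formula.

```python
import math

def bearing_deg(lat1, lon1, lat2, lon2):
    """Initial bearing from point 1 to point 2, in degrees clockwise from north."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    y = math.sin(dlon) * math.cos(phi2)
    x = (math.cos(phi1) * math.sin(phi2)
         - math.sin(phi1) * math.cos(phi2) * math.cos(dlon))
    return math.degrees(math.atan2(y, x)) % 360.0

def relative_azimuth(bearing, head_yaw):
    """Source azimuth relative to where the head points, wrapped to [-180, 180)."""
    return (bearing - head_yaw + 180.0) % 360.0 - 180.0

def nearest_hrtf_azimuth(azimuth, grid):
    """Pick the grid azimuth with the smallest angular distance to `azimuth`."""
    return min(grid, key=lambda a: abs((a - azimuth + 180.0) % 360.0 - 180.0))

# Hypothetical HRTF set measured every 15 degrees in the horizontal plane.
hrtf_grid = list(range(-180, 180, 15))

# User at (40.0, -8.0) facing east (yaw 90), source slightly to the north.
b = bearing_deg(40.0, -8.0, 40.001, -8.0)
az = relative_azimuth(b, head_yaw=90.0)
chosen = nearest_hrtf_azimuth(az, hrtf_grid)
```

In a running system the bearing would be recomputed from each GPS fix and the yaw from each IMU reading, so `chosen` changes as the user walks and turns.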

Oral Session 6-1
"Power-Balanced Modelling Of Circuits As Skew Gradient Systems"

Remy Müller and Thomas Hélie

This article is concerned with the power-balanced simulation of analog audio circuits, governed by nonlinear differential algebraic equations (DAE). The proposed approach is to combine principles from the port-Hamiltonian and Brayton-Moser formalisms to yield a skew-symmetric gradient system. The practical interest is to provide a solver, using an average discrete gradient, that handles differential and algebraic relations in a unified way, and avoids having to pre-solve the algebraic part. This leads to a structure-preserving method that conserves the power balance and total energy. The proposed formulation is then applied on typical nonlinear audio circuits to study the effectiveness of the method.

Oral Session 6-2
"Modeling Time-Varying Reactances using Wave Digital Filters"

Olafur Bogason and Kurt Werner

Wave Digital Filters were developed to discretize linear time-invariant lumped systems, particularly electronic circuits. The time-invariant assumption is baked into the underlying theory and becomes problematic when simulating audio circuits that are by nature time-varying. We present extensions to WDF theory that incorporate proper numerical schemes, allowing for the accurate simulation of time-varying systems. We present generalized continuous-time models of reactive components that encapsulate the time-varying lossless models presented by Fettweis, the circuit-theoretic time-varying models, as well as traditional LTI models as special cases. Models of time-varying reactive components are valuable tools to have when modeling circuits containing variable capacitors or inductors or electrical devices such as condenser microphones. A power metric is derived and the model is discretized using the alpha-transform numerical scheme and parametric wave definition. Case studies of circuits containing time-varying resistance and capacitance are presented and help to validate the proposed generalized continuous-time model and discretization.

Oral Session 6-3
"Experimental Study of Guitar Pickup Nonlinearity"

Antonin Novak, Bertrand Lihoreau, Pierrick Lotton, Emmanuel Brasseur and Laurent Simon

In this paper, we focus on studying the nonlinear behavior of the pickup of an electric guitar and on its modeling. The approach is purely experimental, based on physical assumptions, and attempts to find a nonlinear model that, with few parameters, would be able to predict the nonlinear behavior of the pickup. In our experimental setup a piece of string is attached to a shaker and vibrates perpendicularly to the pickup in the frequency range between 60 Hz and 400 Hz. The oscillations are controlled by a linearization feedback to create a purely sinusoidal steady-state movement of the string. In the first step, harmonic distortions of three different magnetic pickups (a single-coil, a humbucker, and a rail-pickup) are compared to check if they provide different distortions. In the second step, a static nonlinearity of Paiva’s model is estimated from experimental signals. In the last step, the pickup nonlinearities are compared and an empirical model that fits all three pickups well is proposed.

Oral Session 6-4
"Waveshaping with Norton Amplifiers: Modeling the Serge Triple Waveshaper"

Geoffrey Gormond, Fabián Esqueda, Henri Pöntynen and Julian Parker

The Serge Triple Waveshaper (TWS) is a synthesizer module designed in 1973 by Serge Tcherepnin, founder of Serge Modular Music Systems. It contains three identical waveshaping circuits that can be used to convert sawtooth waveforms into sine waves. However, its sonic capabilities extend well beyond this particular application. Each processing section in the Serge TWS is built around what is known as a Norton amplifier. These devices, unlike traditional operational amplifiers, operate on a current differencing principle and are featured in a handful of iconic musical circuits. This work provides an overview of Norton amplifiers within the context of virtual analog modeling and presents a digital model of the Serge TWS based on an analysis of the original circuit. Results obtained show the proposed model closely emulates the salient features of the original device and can be used to generate the complex waveforms that characterize “West Coast” synthesis.


9:00 Keynote: "The top ten things you have to know as Developer from the idea to a product, based on History of Audio Plugin´s formats" Yvan Grabit Auditorium
10:00 Poster Craze 5 - Auditorium

Poster Session 5 / Coffee break

End-To-End Equalization With Convolutional Neural Networks

Acoustic Instrument Sensor Matching Using a Modal Architecture

TU-Note Violin Sample Library -- A Database of Violin Sounds with Segmentation Ground Truth

Parametric Multi-Channel Separation and Re-Panning of Harmonic Sources

Various Foyer

Oral Session 7: Frequency / Impulse Estimation

Fast Partial Tracking of Audio with Real-Time Capability through Linear Programming

Modal Analysis Of Room Impulse Responses Using Subband ESPRIT

FAST MUSIC – An Efficient Implementation Of The MUSIC Algorithm For Frequency Estimation Of Approximately Periodic Signals

Hard Real-Time Onset Detection Of Percussive Instruments

Various Auditorium
Oral Session 8: Digital Audio Effects & Processing

Musikverb: A harmonically adaptive audio reverberation

A virtual tube delay effect

Generative Timbre Spaces: Regularizing Variational Auto-Encoders With Perceptual Metrics

Various Auditorium
"The top ten things you have to know as Developer from the idea to a product, based on History of Audio Plugin´s formats",
by Yvan Grabit (Steinberg)

Yvan Grabit

From the idea for an algorithm to a final commercial plugin, there are many steps that a developer has to know and understand in order to make the best of an idea. The top ten of these, from DSP design to UX/UI design, including concerns such as latency, bypassing, parameters, precision, automation, surround and persistency, will be discussed with reference to the development and history of audio plugin formats, mainly based on VST 3. The goal of this keynote is to help future and already established plugin developers be prepared and aware of what should not be forgotten during development.

After a quiet childhood and youth in “Pays de Gex” (an area jammed between the French Jura mountains and Geneva), during which he played drums in different brass bands, Yvan Grabit completed an Engineering degree in Image Processing and Computing at ISEP (“Institut Supérieur d´Électronique de Paris”). He started research at Fraunhofer IGD in Rostock for 8 months, then moved to Paris and Cannes to work for Aérospatiale (now part of Thales Alenia Space) in satellite image processing as developer and project manager. 21 years ago, Yvan decided to change his field of work from image to audio and began his career as a developer at Steinberg (Hamburg). He started in the Nuendo team, then developed different plug-ins, such as LM-4, The Grand and HALion with Charlie Steinberg, and took on the responsibility for the development of plug-in integration in DAWs, surround features and the VST SDK (version 2 and later version 3). Today, he is team leader of the Research group, which works in different fields like MIR, 3D/VR audio, restoration, machine learning, etc. As technical lead of VST he promotes and maintains VST 3, and supports third-party developers. He continues to play drums, guitar and piano in several music groups.

Poster Session 5-1
"End-To-End Equalization With Convolutional Neural Networks"

Marco A. Martínez Ramírez and Joshua D. Reiss

This work aims to implement a novel deep learning architecture to perform audio processing in the context of matched equalization. Most existing methods for automatic and matched equalization show effective performance and their goal is to find a respective transfer function given a frequency response. Nevertheless, these procedures require prior knowledge of the type of filters to be modeled. In addition, fixed filter bank architectures are required in automatic mixing contexts. Based on end-to-end convolutional neural networks, we introduce a general purpose architecture for equalization matching. Thus, by using an end-to-end learning approach, the model approximates the equalization target as a content-based transformation without directly finding the transfer function. The network learns how to process the audio directly in order to match the equalized target audio. We train the network through unsupervised and supervised learning procedures. We analyze what the model is actually learning and how the given task is accomplished. We show the model performing matched equalization for shelving, peaking, lowpass and highpass IIR and FIR equalizers.

Poster Session 5-2
"Acoustic Instrument Sensor Matching Using a Modal Architecture"

Mark Rau, Jonathan Abel and Julius Smith

This paper proposes a method to filter the output of instrument contact sensors to approximate the response of a well placed microphone. A modal approach is proposed in which mode frequencies and damping ratios are fit to the frequency response of the contact sensor, and the mode gains are then determined for both the contact sensor and the microphone. The mode frequencies and damping ratios are presumed to be associated with the resonances of the instrument. Accordingly, the corresponding contact sensor and microphone mode gains will account for the instrument radiation. The ratios between the contact sensor and microphone gains are then used to create a parallel bank of second-order biquad filters to filter the contact sensor signal to estimate the microphone signal.

Poster Session 5-3
"TU-Note Violin Sample Library -- A Database of Violin Sounds with Segmentation Ground Truth"

Henrik von Coler

The presented sample library of violin sounds is designed as a tool for the research, development and testing of sound analysis/synthesis algorithms. The library features single sounds which cover the entire frequency range of the instrument in four dynamic levels, two-note sequences for the study of note transitions and vibrato, and solo pieces for performance analysis. All parts come with a hand-labeled segmentation ground truth which marks attack, release and transition/transient segments. Additional relevant information on the samples’ properties is provided for single sounds and two-note sequences. Recordings took place in an anechoic chamber with a professional violinist and a recording engineer, using two microphone positions. This document describes the content and the recording setup in detail, alongside basic statistical properties of the data.

Poster Session 5-4
"Parametric Multi-Channel Separation and Re-Panning of Harmonic Sources"

Martin Weiss Hansen, Jacob Møller Hjerrild, Mads Græsbøll Christensen and Jesper Kjeldskov

In this paper, a method for separating stereophonic mixtures into their harmonic constituents is proposed. The method is based on a harmonic signal model. An observed mixture is decomposed by first estimating the panning parameters of the sources, and then estimating the fundamental frequencies and the amplitudes of the harmonic components. The number of sources and their panning parameters are estimated using an approach based on clustering of narrowband interaural level and time differences. The panning parameter distribution is modelled as a Gaussian mixture and the generalized variance is used for selecting the number of sources. The fundamental frequencies of the sources are estimated using an iterative approach. To enforce spectral smoothness when estimating the fundamental frequencies, a codebook of magnitude amplitudes is used to limit the amount of energy assigned to each harmonic. The source models are used to form Wiener filters which are used to reconstruct the sources. The proposed method can be used for source re-panning (demonstration given), remixing, and multi-channel upmixing, e.g. for hi-fi systems with multiple loudspeakers.
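The final reconstruction step, forming Wiener filters from source models, can be illustrated with the textbook single-channel Wiener mask, in which each source receives the fraction of every frequency bin that matches its share of the modeled power. The sketch below assumes the per-source power estimates are already available; the function names and the toy numbers are hypothetical, and the paper's harmonic and panning models are not reproduced.

```python
def wiener_masks(source_powers):
    """Compute masks[i][k] = P_i[k] / sum_j P_j[k], or 0.0 where the total is 0.

    source_powers[i][k] -- modeled power of source i in frequency bin k
    """
    n_bins = len(source_powers[0])
    totals = [sum(p[k] for p in source_powers) for k in range(n_bins)]
    return [
        [p[k] / totals[k] if totals[k] > 0 else 0.0 for k in range(n_bins)]
        for p in source_powers
    ]

def apply_mask(mask, mixture_bins):
    """Scale each complex mixture bin by the corresponding real-valued mask."""
    return [m * x for m, x in zip(mask, mixture_bins)]

# Two hypothetical sources, two frequency bins.
powers = [[4.0, 1.0], [1.0, 1.0]]
masks = wiener_masks(powers)

mixture = [1 + 2j, 0.5 - 1j]  # made-up complex STFT bins of the mixture
estimates = [apply_mask(m, mixture) for m in masks]
```

Because the masks sum to one in every bin with non-zero total power, the source estimates add back up to the mixture, which is what makes re-panning and remixing of the separated sources straightforward.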

Oral Session 7-1
"Fast Partial Tracking of Audio with Real-Time Capability through Linear Programming"

Julian Neri and Philippe Depalle

This paper proposes a new partial tracking method, based on linear programming, that can run in real-time, is simple to implement, and performs well in difficult tracking situations by considering spurious peaks, crossing partials, and a non-stationary short-term sinusoidal model. Complex constant parameters of a generalized short-term signal model are explicitly estimated to inform peak matching decisions. Peak matching is formulated as a variation of the linear assignment problem. Combinatorially optimal peak-to-peak assignments are found in polynomial time using the Hungarian algorithm. Results show that the proposed method creates high-quality representations of monophonic and polyphonic sounds.
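
As a rough illustration of peak matching as a linear assignment problem, the sketch below matches spectral peaks of two consecutive frames with a compact textbook O(n^3) Hungarian algorithm. The cost used here (absolute frequency difference) and the peak values are assumptions made for the example; the paper's cost is informed by estimated signal-model parameters and also accounts for spurious peaks, which this sketch does not.

```python
def hungarian(cost):
    """Minimum-cost assignment of rows to columns (requires rows <= columns).

    Returns a list where entry i is the column assigned to row i.
    """
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    u = [0.0] * (n + 1)      # row potentials
    v = [0.0] * (m + 1)      # column potentials
    p = [0] * (m + 1)        # p[j]: row currently matched to column j (1-based)
    way = [0] * (m + 1)      # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:   # reached a free column: augmenting path found
                break
        while j0 != 0:       # flip the matching along the path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assignment = [-1] * n
    for j in range(1, m + 1):
        if p[j] != 0:
            assignment[p[j] - 1] = j - 1
    return assignment


def match_peaks(prev_freqs, next_freqs):
    """Match peaks between frames using |frequency difference| as the cost."""
    cost = [[abs(a - b) for b in next_freqs] for a in prev_freqs]
    return hungarian(cost)


# Hypothetical peak frequencies (Hz) in two consecutive frames.
print(match_peaks([440.0, 880.0, 1320.0], [885.0, 438.0, 1330.0]))  # [1, 0, 2]
```

Unlike a greedy nearest-peak rule, the assignment is optimal over all peaks jointly, which matters when partials cross and the locally closest peak is not the right continuation.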

Oral Session 7-2
"Modal Analysis Of Room Impulse Responses Using Subband ESPRIT"

Corey Kereliuk, Woody Herman, Russell Wedelich and Daniel Gillespie

This paper describes a modification of the ESPRIT algorithm which can be used to determine the parameters (frequency, decay time, initial magnitude and initial phase) of a modal reverberator that best match a provided room impulse response. By applying perceptual criteria we are able to match room impulse responses using a variable number of modes, with an emphasis on high quality for lower mode counts; this allows the synthesis algorithm to scale to different computational environments. A hybrid FIR/modal reverb architecture is also presented which allows for the efficient modeling of room impulse responses that contain sparse early reflections and dense late reverb. MUSHRA tests comparing the analysis/synthesis using various mode numbers for our algorithms, and for another state-of-the-art algorithm, are included as well.
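
To make the four modal parameters concrete, the sketch below synthesizes an impulse response as a sum of exponentially decaying sinusoids. It covers only the synthesis side, not the ESPRIT analysis, and it assumes that "decay time" means the time for a mode to fall by 60 dB; the abstract does not state the definition, and the function name and mode values are hypothetical.

```python
import math


def modal_impulse_response(modes, sample_rate, n_samples):
    """Sum of decaying sinusoids, one per mode.

    Each mode is (frequency_hz, t60_seconds, initial_magnitude, initial_phase).
    t60_seconds is ASSUMED to be the 60 dB decay time of the mode.
    """
    out = [0.0] * n_samples
    for freq_hz, t60_s, magnitude, phase in modes:
        # Per-sample factor such that the envelope drops by 1000x after t60_s.
        decay = math.exp(-math.log(1000.0) / (t60_s * sample_rate))
        omega = 2.0 * math.pi * freq_hz / sample_rate
        envelope = magnitude
        for n in range(n_samples):
            out[n] += envelope * math.cos(omega * n + phase)
            envelope *= decay
    return out


# Hypothetical three-mode response, one second at 8 kHz.
ir = modal_impulse_response(
    [(100.0, 0.5, 1.0, 0.0), (220.0, 0.3, 0.5, 0.0), (450.0, 0.2, 0.25, 1.0)],
    sample_rate=8000,
    n_samples=8000,
)
```

The cost of this representation grows linearly with the number of modes, which is why matching a response well with few modes lets the synthesis scale to different computational environments.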

Oral Session 7-3
"FAST MUSIC – An Efficient Implementation Of The MUSIC Algorithm For Frequency Estimation Of Approximately Periodic Signals"

Orchisama Das, Jonathan Abel and Julius Smith III

Oral Session 7-4
"Hard Real-Time Onset Detection Of Percussive Instruments"

Luca Turchet

To date, the most successful onset detectors are those based on a frequency representation of the signal. However, for such methods the time between the physical onset and the reported one is unpredictable and may vary widely according to the type of sound being analyzed. Such variability and unpredictability of spectrum-based onset detectors may not be convenient in some real-time applications. This paper proposes a real-time method to improve the temporal accuracy of state-of-the-art onset detectors. The method is grounded on the theory of hard real-time operating systems where the result of a task must be reported at a certain deadline. It consists of the combination of a time-based technique (which has a high degree of accuracy in detecting the physical onset time but is more prone to false positives and false negatives) with a spectrum-based technique (which has a high detection accuracy but a low temporal accuracy). The developed hard real-time onset detector was tested on a dataset of single non-pitched percussive sounds using the high frequency content detector as spectral technique. Experimental validation showed that the proposed approach was effective in better retrieving the physical onset time of about 50% of the hits detected by the spectral technique, with an average improvement of about 3 ms and a maximum of about 12 ms. The results also revealed that the use of a longer deadline may better capture the variability of the spectral technique, but at the cost of higher latency.
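
One possible reading of the combination described above is sketched below: spectral detections decide whether an onset occurred, and a time-based candidate that precedes a spectral detection by no more than the deadline supplies the refined onset time. The fusion rule, the function name and the example timings are assumptions for illustration and are not the paper's implementation.

```python
def fuse_onsets(time_onsets_ms, spectral_onsets_ms, deadline_ms):
    """Return a list of (onset_time_ms, refined) pairs, one per spectral hit.

    ASSUMED rule: a spectral hit is refined by the closest time-based
    candidate that precedes it by at most deadline_ms. Time-based candidates
    with no spectral confirmation are treated as false positives and dropped.
    """
    fused = []
    for spectral in spectral_onsets_ms:
        candidates = [
            t for t in time_onsets_ms if 0.0 <= spectral - t <= deadline_ms
        ]
        if candidates:
            fused.append((max(candidates), True))   # closest preceding candidate
        else:
            fused.append((spectral, False))         # keep the spectral time
    return fused


# Hypothetical detections in milliseconds with a 15 ms deadline.
result = fuse_onsets([100.0, 250.0, 400.0], [104.0, 412.0, 600.0], 15.0)
print(result)  # [(100.0, True), (400.0, True), (600.0, False)]
```

In this sketch a longer deadline lets more spectral hits find a time-based candidate, but every report then has to wait longer, which mirrors the latency trade-off noted in the abstract.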

Oral Session 8-1
"MusikVerb: A harmonically adaptive audio reverberation"

João Pereira, Gilberto Bernardes and Rui Penha

We present MusikVerb, a novel digital reverberation capable of adapting its output to the harmonic context of a live music performance. The proposed reverberation is aware of the harmonic content of an audio input signal and ‘tunes’ the reverberation output to its harmonic content using a spectral filtering technique. The dynamic behavior of MusikVerb avoids the sonic clutter of traditional reverberation, and most importantly, fosters creative endeavor by providing new expressive and musically-aware uses of reverberation. Despite its applicability to any input audio signal, the proposed effect has been designed primarily as a guitar pedal effect and a standalone software application.

Oral Session 8-2
"A virtual tube delay effect"

Riccardo Simionato, Juho Liski, Vesa Välimäki and Federico Avanzini

A virtual tube delay effect based on the real-time simulation of acoustic wave propagation in a garden hose is presented. The paper describes the acoustic measurements conducted and the analysis of the sound propagation in long narrow tubes. The obtained impulse responses are used to design delay lines and digital filters, which simulate the propagation delay, losses, and reflections from the end of the tube which may be open, closed, or acoustically attenuated. A study on the reflection caused by a finite-length tube is described. The resulting system consists of a digital waveguide model and produces delay effects having a realistic low-pass filtering. A stereo delay effect plugin in Pure Data has been implemented and it is described here.
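
The general structure of a delay line with low-pass filtering and an end reflection can be sketched as follows. This is a generic feedback delay with a one-pole low-pass filter in the loop, not the measured filters or the waveguide model of the paper; the function name, the coefficient values and the parameter names are assumptions.

```python
def tube_delay(signal, delay_samples, reflection=0.5, lowpass=0.3):
    """Feedback delay with a one-pole low-pass filter in the loop.

    delay_samples: propagation delay of the (hypothetical) tube, in samples.
    reflection:    gain of the reflection fed back into the delay line.
    lowpass:       one-pole coefficient in [0, 1); 0 disables the filtering.
    """
    buffer = [0.0] * delay_samples   # circular delay line
    lp_state = 0.0
    index = 0
    out = []
    for x in signal:
        delayed = buffer[index]      # sample written delay_samples ago
        # One-pole low-pass stands in for the propagation losses.
        lp_state = (1.0 - lowpass) * delayed + lowpass * lp_state
        out.append(x + lp_state)
        # Feed the input plus the attenuated reflection back into the line.
        buffer[index] = x + reflection * lp_state
        index = (index + 1) % delay_samples
    return out


# Impulse through a 4-sample delay, low-pass disabled to expose the echoes.
echoes = tube_delay([1.0] + [0.0] * 8, delay_samples=4, lowpass=0.0)
print(echoes)  # [1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.5]
```

With a non-zero `lowpass` value every pass through the loop removes more high-frequency energy, so later echoes sound progressively duller, which is the qualitative behaviour the abstract attributes to the tube.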

Oral Session 8-3
"Generative Timbre Spaces: Regularizing Variational Auto-Encoders With Perceptual Metrics"

Philippe Esling, Axel Chemla-Romeu-Santos, Adrien Bitton

Timbre spaces have been used in music perception to study the perceptual relationships between instruments based on dissimilarity ratings. However, these spaces do not generalize to novel examples and do not provide an invertible mapping, preventing audio synthesis. In parallel, generative models have aimed to provide methods for synthesizing novel timbres. However, these systems do not provide an understanding of their inner workings and are usually not related to any perceptually relevant information. Here, we show that Variational Auto-Encoders (VAE) can alleviate all of these limitations by constructing generative timbre spaces. To do so, we adapt VAEs to learn an audio latent space, while using perceptual ratings from timbre studies to regularize the organization of this space. The resulting space allows us to analyze novel instruments, while being able to synthesize audio from any point of this space. We introduce a specific regularization that allows any given similarity distances to be enforced onto these spaces. We show that the resulting space provides distance relationships almost similar to those of timbre spaces. We evaluate several spectral transforms and show that the Non-Stationary Gabor Transform (NSGT) provides the highest correlation to timbre spaces and the best quality of synthesis. Furthermore, we show that these spaces can generalize to novel instruments and can generate any path between instruments to understand their timbre relationships. As these spaces are continuous, we study how audio descriptors behave along the latent dimensions. We show that even though descriptors have an overall non-linear topology, they follow a locally smooth evolution. Based on this, we introduce a method for descriptor-based synthesis and show that we can control the descriptors of an instrument while keeping its timbre structure.

We are pleased to announce that the 21st International Conference on Digital Audio Effects (DAFx2018) will be held at Aveiro, Portugal, on September 4–8 2018.

DAFx2018 is organised by the University of Aveiro, through its Institute of Electronics and Informatics Engineering (IEETA), in collaboration with the Portuguese Audio Engineering Association (APEA). The conference will be hosted at the university campus and will feature oral and poster presentations of accepted papers, keynote addresses, tutorials and demonstrations. The social program – including welcome reception, concert and banquet – will offer opportunities for more informal interaction while enjoying the city and the region.

This annual conference is a coming together of those working across the globe in research on digital audio processing for music and speech, sound design, acoustics and related applications. Original contributions for DAFx2018 are encouraged in, but not limited to, the following topics:

  • Capture and analysis
  • Representation, transformation and modelling
  • Transmission and resynthesis
  • Speech/voice effects and manipulation
  • Perception, psychoacoustics and evaluation
  • Spatial sound analysis, coding and synthesis
  • Sound source separation
  • Sound synthesis and composition
  • Hardware and software design
  • Computational auditory scene analysis

We especially welcome submissions addressing:

  • Digital audio processing for inclusion
  • Immersive and AR/VR audio effects
  • Sonification and sound design using non-acoustic data

Prospective authors are invited to submit full-length papers, eight pages maximum, for both oral and poster presentations, before March 29th, 2018.

Submitted papers must be camera-ready and formatted according to the templates and instructions available at the DAFx2018 website. All papers have to be submitted through the EasyChair conference management system and are subject to peer review. Acceptance may be conditional upon changes being made to the paper as directed by the reviewers. Proceedings with the final versions of the accepted contributions will be made freely accessible on the DAFx2018 website after the conference closure.

Volumes 2008 to 2017 of DAFx proceedings are now indexed in Scopus and this will apply similarly to DAFx2018 proceedings. Extended versions of the best DAFx2018 papers will be given special consideration for publication in the Journal of the Audio Engineering Society.

Important dates:

  • Full-Paper Submission: April 9th, 2018 (extended from March 29th, 2018)
  • Notification of Acceptance: May 25th, 2018
  • Final Paper Submission: June 22nd, 2018
  • Author registration deadline: June 29th, 2018

A PDF version of the CFP and the paper templates (with instructions) are available on the DAFx2018 website. Any questions can be sent to dafx2018_papers@ua.pt.


