Welcome to DAFx 2018, the international conference on Digital Audio Effects, to be held in Aveiro, Portugal, on September 4–8, 2018.
DAFx18 is organised by the University of Aveiro, through its Institute of Electronics and Informatics Engineering (IEETA), in collaboration with the Portuguese Audio Engineering Association (APEA).
The conference will be hosted at the university campus and will feature oral and poster presentations of accepted papers, keynote addresses, tutorials and demonstrations. The social program – including welcome reception, concert and banquet – will offer opportunities for more informal interaction while enjoying the city and the region.
This annual conference brings together researchers from across the globe working on digital audio processing for music and speech, sound art, acoustics and related applications.
DAFx18 Local Organizing Committee:
Aveiro University
Aveiro, Portugal
Tuesday to Saturday
September 4 to 8, 2018
- Keynote: Joshua Reiss (Queen Mary University of London)
- Keynote: David Farmer ("The Lord of the Rings", "The Hobbit")
- Keynote: Yvan Grabit (Steinberg)
- Tutorial: "Building plugins and DSP with JUCE”, Julian Storer (ROLI)
- Tutorial: “Machine Learning with Applications to Audio”, Shahan Nercessian (iZotope)
- Tutorial: “Digital Audio Filters”, Vesa Välimäki (Aalto Univ.)
- Tutorial: “Perceptual and cognitive factors for VR audio”, Catarina Mendonça (Aalto Univ.)
- Concert: Paul Hanson (Bassoon + Audio Effects)
- Session: Audio Effects Jam Session, Xperimus Ensemble
Time | Session | Speakers | Venue |
---|---|---|---|
08:30 | Registration (all day) | - | Foyer |
09:00 | Opening Session | - | Auditorium |
09:40 | "Perceptual and cognitive factors for VR audio" | Catarina Mendonça | Auditorium |
10:40 | Coffee Break | - | Foyer |
11:10 | “Digital Audio Filters” | Vesa Välimäki | Auditorium |
12:20 | Lunch | - | Foyer |
14:00 | "Building plugins and DSP with JUCE” | Julian Storer | Auditorium |
15:00 | Coffee Break | - | Foyer |
15:30 | “Machine Learning with Applications to Audio” | Shahan Nercessian | Auditorium |
17:30 | DAFx Welcome Reception | - | Museum of St. Joana |
Aalto University, Dept. Signal Processing and Acoustics
There are many challenges faced by those aiming to render and reproduce convincing virtual audio. This tutorial defines key concepts and goals that allow for a feeling of presence in a simulated audio world. The specific roles of factors such as individualization of HRTFs and headphones, sensory adaptation, room cues, motion cues, real-time rendering, and multimodal interfaces are addressed. There is a complex interplay between ideal sound accuracy and several of these factors. When is accuracy perceptually relevant? When can we fool the listener? These questions are answered with reference to indicators such as localization accuracy, externalization, multimodal interactions and attentional effects. There are three main conclusions: 1) what the listener perceives depends on what we ask, 2) sensory adaptation ultimately allows most technical limitations to be overcome, and 3) more accurate rendering will always have benefits.
BIO
"Dr. Catarina Mendonca is an Adjunct Professor in Psychoacoustics at Aalto University’s Acoustics Lab, in Finland. She has a background in Psychology and Cognitive Sciences, having specialised in Psychoacoustics. Throughout her research career, she has always worked in perceptual studies in virtual reality. Her main areas of work have been spatial hearing, auditoryadaptation, and multisensory processes. Before obtaining her current title, Dr. Catarina Mendonca held three post-doctoral fellowships. First, she was a post-doctoral researcher at Carl von Ossietzky University (Germany) for the German Cluster of Excellence Hearing4all. The topic of her post was auditory cognition. She then became a post-doctoral fellow for the Academy of Finland (Finland) on the topic of multimodal interactions in spatial hearing and spatial audio. She later became a Marie Sklodowska Curie Fellow by EU’s H2020 programme. She worked on the topic of attention mechanisms and perception in different spatial audio setups."
Aalto University
This tutorial will review the basic digital filters used in audio and music processing, such as FIR, allpass, and equalizing filters. FIR filtering is carried out by convolving the samples of the input signal with the filter coefficients. An allpass filter has a flat magnitude response and a nonlinear phase response; it is useful in numerous audio applications, such as artificial reverberation and delay equalization. Equalizing filters enable enhancement of sound reproduction systems. The tutorial will include sound examples and interactive demonstrations to explain how these digital filters work and what they can achieve.
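As a rough illustration of those two ideas (not part of the tutorial materials; the filter coefficients below are arbitrary), the sketch convolves an input with an FIR impulse response and runs it through a first-order allpass section using NumPy/SciPy:

```python
# Minimal sketch, assuming arbitrary example coefficients.
import numpy as np
from scipy.signal import lfilter

fs = 48000
n = np.arange(fs)                       # one second of samples
x = np.sin(2 * np.pi * 440 * n / fs)    # test tone

# FIR filtering: convolve the input samples with the filter coefficients.
h = np.ones(8) / 8                      # simple 8-tap moving-average FIR
y_fir = np.convolve(x, h)[:len(x)]

# First-order allpass: flat magnitude response, frequency-dependent phase.
a = 0.5                                 # allpass coefficient (|a| < 1 for stability)
y_ap = lfilter([a, 1.0], [1.0, a], x)   # H(z) = (a + z^-1) / (1 + a z^-1)
```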
BIO
"Vesa Välimäki is a Full Professor of audio signal processing and the Vice Dean for research at the Aalto University School of Electrical Engineering, Espoo, Finland. He received the MSc in Technology and the Doctor of Science in Technology degrees, both in electrical engineering, from the Helsinki University of Technology, Espoo, Finland, in 1992 and 1995, respectively. In 1996, he was a Postdoctoral Research Fellow at the University of Westminster, London, UK. In 2008-2009, he was a Visiting Scholar at the Center for Computer Research in Music and Acoustics (CCRMA), Stanford University, Stanford, CA, USA. He has collaborated in research with companies such as Genelec and Nokia Technologies. Prof. Välimäki is a Fellow of the AES (Audio Engineering Society), a Fellow of the IEEE, and a Life Member of the Acoustical Society of Finland. He is a Senior Area Editor of the IEEE/ACM Transactions on Audio, Speech, and Language Processing. He has organized several special issues for scientific journals, such as the Audio Signal Processing special issue for the Applied Sciences in 2016. He was the Chairman of the International Conference on Digital Audio Effects DAFx-08 in 2008, and was the Chairman of the Sound and Music Computing Conference SMC-17 in 2017."
This talk is an introduction to how the JUCE library provides classes and tools that can help developers who are building plugins (or plugin hosts) and writing DSP algorithms. The topics covered are:
- A quick high-level overview of JUCE and the functional areas it covers;
- A dive into how the audio plugin abstraction layer works and how it would be used to build a simple plugin;
- An overview of how JUCE's plugin hosting classes work and how they might be used to write a simple plugin host;
- A dive into what JUCE's DSP module provides;
- If time permits, a quick introduction to some JUCE GUI library concepts.
No familiarity with JUCE is expected, but some experience with C++ is needed to get the most out of the talk.
BIO
"Jules has been a C++ programmer since the early 1990s, workingmainly in the audio technology industry. He’s best-known for creating theTracktion DAW and the C++ library JUCE, which has been used for over adecade in thousands of commercial and open-source audio products. Now withROLI, he continues to work on JUCE as well as other ROLI products."
Machine learning is an exploding field which over the past few years has seen great advances, received arguably excessive hype, and has become ubiquitous in our everyday lives. Applied correctly, machine learning enables (and has already demonstrated) borderline science-fiction-like processing of data and decision making, particularly in the domain of image processing and analysis. In this tutorial, machine learning and its associated buzzwords will be de-mystified: what it is, what it isn’t, and how it works. After formulating some common machine learning problems and giving a short overview of more “classical” machine learning approaches, we will take a deeper dive into neural networks and touch on some modern deep learning architectures. Throughout, applications of machine learning to audio problems will be explored, showing how it is used in iZotope products for carrying out various audio classification and restoration tasks.
BIO
"Shahan Nercessian received his B.S., M.S. and Ph.D. in ElectricalEngineering from Tufts University in 2007, 2009, and 2012 respectively. HisPh.D. research was focused on multi-resolution algorithms for image processinginspired by human visual system phenomena. In 2012, he became a member ofTechnical Staff at MIT Lincoln Laboratory. In 2017, he joined iZotope as a DSPresearch engineer (actually his first week on the job was at DAFx2017!), wherehe develops new DSP algorithms and researches machine learning techniquesfor their product line. He is an avid jazz musician, and continues to produceand play his own genre-bending original music."
The DAFx2018 welcome reception will open with an Aveiro d’Honra – a special reception with sparkling wine from Aveiro’s Bairrada wine region accompanied by ovos moles de Aveiro, a traditional treat consisting of a creamy egg mixture with a light wafer casing.
The reception will be followed by an informal tour of the museum – a former convent founded in the 15th century, closely associated with Princess Joana (1452-1490), who lived there from 1472 to 1490 and was beatified in 1693.
The museum, named after St. Joana, the Princess, holds a stunning art collection spanning the 15th to the 20th century, most notably from the baroque period.
A short musical performance in the cloister, by the a cappella choir Voz Nua, conducted by Aoife Hiney, will bring the reception to a close.
Aveiro D'Honra
Informal Tour
Music at the cloister
In films, games, music and virtual reality, we recreate surrounding sounds, or create unreal sounds to evoke emotions and capture the imagination. But there is a world of fascinating phenomena related to sound and perception that is not yet understood. If we can gain a deep understanding of how complex audio is perceived and responded to, we can not only interpret produced content but also create new content of unprecedented quality and range. This talk is targeted at a general audience, and considers the possibilities opened up by such research. What are the limits of human hearing? Can a realistic virtual world be created without relying on recorded samples? If every sound in a major film or game soundtrack were computer-generated, could a level of realism comparable to modern computer graphics be reached? Could a robot replace the sound engineer? Investigating such questions reveals surprising aspects of auditory perception, and has the potential to revolutionise sound design and music production.
BIO
"Josh Reiss is a Professor of Audio Engineering with the Centrefor Digital Music at Queen Mary University of London. He has publishedmore than 200 scientific papers (including over 50 in premier journals and5 best paper awards), and co-authored the textbook Audio Effects: Theory,Implementation and Application. His research has been featured in dozens oforiginal articles and interviews on TV, radio and in the press. He is a formerGovernor of the Audio Engineering Society (AES), chair of their PublicationsPolicy Committee, and co-chair of the Technical Committee on High-resolutionAudio. He co-founded the highly successful spin-out company, LandR, and iscurrently forming a second start-up, FXive. His primary focus of research is onthe use of state-of-the-art signal processing techniques for sound design andaudio production. He maintains a popular blog, YouTube channel and twitterfeed for scientific education and dissemination of research activities."
Authors
Vadim Zavalishin and Julian Parker
Abstract
A significant part of the appeal of tape-based delay effects is the manner in which the pitch of their output responds to changes in delay-time. Straightforward approaches to implementation of delays with tape-like modulation behavior result in algorithms with time complexity proportional to the tape speed, leading to noticeable increases of CPU load at smaller delay times. We propose a method which has constant time complexity, except during tape speedup transitions, where the complexity grows logarithmically, or, if proper antialiasing is desired, linearly with respect to the speedup factor.
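For readers unfamiliar with the behaviour being modelled, the hedged sketch below shows a naive interpolated delay line whose read head advances at a variable "tape speed", so changes in delay time glide the pitch like a tape echo. It is purely illustrative, uses arbitrary parameter values, and is not the constant-complexity method proposed in the paper.

```python
# Illustrative sketch only: a naive variable-speed delay-line read.
import numpy as np

def naive_tape_delay(x, speed, initial_delay=12000, buf_len=1 << 17):
    """Interpolated delay-line read; read-head speed changes glide the pitch."""
    buf = np.zeros(buf_len)
    y = np.zeros(len(x))
    write = initial_delay              # write head starts ahead of the read head
    read = 0.0
    for n in range(len(x)):
        buf[write % buf_len] = x[n]
        i = int(read)
        frac = read - i
        y[n] = (1 - frac) * buf[i % buf_len] + frac * buf[(i + 1) % buf_len]
        write += 1
        read += speed[n]               # speed = 1 keeps the delay fixed; > 1 shortens it
    return y

fs = 48000
x = np.random.randn(fs)
speed = np.ones(fs)
speed[fs // 2:] = 1.5                  # speed up halfway through: echo pitch rises
y = naive_tape_delay(x, speed)
```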
Authors
Martin Holters and Julian Parker
Abstract
Bucket brigade devices (BBDs) were invented in the late 1960s as a method of introducing a time-delay into an analog electrical circuit. They work by sampling the input signal at a certain clock rate and shifting it through a chain of capacitors to obtain the delay. BBD chips have been used to build a large variety of analog effects processing devices, ranging from chorus to flanging to echo effects. They have therefore attracted interest in virtual analog modeling and a number of approaches to modeling them digitally have appeared. In this paper, we propose a new model for the bucket-brigade device. This model is based on a variable sample rate, and utilizes the surrounding filtering circuitry found in real devices to avoid the need for the interpolation usually needed in such a variable sample-rate system.
Authors
Gordon Wichern and Alexey Lukin
Abstract
The noise that lavalier microphones produce when rubbing against clothing (typically referred to as rustle) can be extremely difficult to automatically remove because it is highly non-stationary and overlaps with speech in both time and frequency. Recent breakthroughs in deep neural networks have led to novel techniques for separating speech from non-stationary background noise. In this paper, we apply neural network speech separation techniques to remove rustle noise, and quantitatively compare multiple deep network architectures and input spectral resolutions. We find the best performance using bidirectional recurrent networks and spectral resolution of around 20 Hz. Furthermore, we propose an ambience preservation post-processing step to minimize potential gating artifacts during pauses in speech.
Authors
Geovani Alves and Marcelo Rosa
Abstract
Here we present a micro-controlled digital effect unit for guitars. Different from other undergraduate projects, we used high-quality 16-bit Analog-to-Digital (A/D) and Digital-to-Analog (D/A) converters operating at 48 kHz that respectively transfer data to and from a micro-controller through serial peripheral interfaces (SPIs). We discuss the design decisions for interconnecting all these components, the design of anti-aliasing (low-pass) filters, and additional features useful for players. Finally, we show some results obtained from this device, and discuss future improvements.
Authors
Vesa Välimäki, Jussi Ramo and Fabian Esqueda
Abstract
This paper proposes signal processing methods to extend a stationary part of an audio signal endlessly. A frequent occasion is that there is not enough audio material to build a synthesizer, but an example sound must be extended or modified for more variability. Filtering of a white noise signal with a filter designed based on high-order linear prediction or concatenation of the example signal can produce convincing arbitrarily long sounds, such as ambient noise or musical tones, and can be interpreted as a spectral freeze technique without looping. It is shown that the random input signal will pump energy to the narrow resonances of the filter so that lively and realistic variations in the sound are generated. For real-time implementation, this paper proposes to replace white noise with velvet noise, as this reduces the number of operations by 90% or more, with respect to standard convolution, without affecting the sound quality, or by FFT convolution, which can be simplified to the randomization of spectral phase and only taking the inverse FFT. Examples of producing endless airplane cabin noise and piano tones based on a short example recording are studied. The proposed methods lead to a new way to generate audio material for music, films, and gaming.
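The sketch below illustrates the general recipe described above: a high-order all-pole fit of an example sound, excited by velvet noise (a sparse signed impulse sequence). The helper functions and parameter values are illustrative, not the paper's implementation.

```python
# Rough sketch of the idea, with arbitrary order and density values.
import numpy as np
from scipy.linalg import solve_toeplitz
from scipy.signal import lfilter

def lpc(x, order):
    """All-pole model of x via the autocorrelation method."""
    r = np.correlate(x, x, mode='full')[len(x) - 1:len(x) + order]
    a = solve_toeplitz((r[:-1], r[:-1]), r[1:])
    return np.concatenate(([1.0], -a))           # filter denominator coefficients

def velvet_noise(length, fs, density=2000):
    """One +/-1 impulse per grid period, at a random position with a random sign."""
    period = int(fs / density)
    v = np.zeros(length)
    for start in range(0, length - period, period):
        v[start + np.random.randint(period)] = np.random.choice([-1.0, 1.0])
    return v

fs = 44100
example = np.random.randn(fs)                     # stand-in for a recorded excerpt
a = lpc(example, order=200)                       # high-order all-pole fit
excitation = velvet_noise(10 * fs, fs)            # ten seconds of velvet noise
extended = lfilter([1.0], a, excitation)          # arbitrarily long "frozen" texture
```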
Authors
Joseph Colonel, Christopher Curro and Sam Keene
Abstract
A method for musical audio synthesis using autoencoding neural networks is proposed. The autoencoder is trained to compress and reconstruct magnitude short-time Fourier transform frames. The autoencoder produces a spectrogram by activating its smallest hidden layer, and a phase response is calculated using real-time phase gradient heap integration. Taking an inverse short-time Fourier transform produces the audio signal. Our algorithm is light-weight when compared to current state-of-the-art audio-producing machine learning algorithms. We outline our design process, produce metrics, and detail an open-source Python implementation of our model.
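A hypothetical skeleton of such a pipeline is sketched below; the layer sizes and training signal are arbitrary, and the paper's real-time phase gradient heap integration is replaced here with Griffin-Lim for brevity, so this is not the authors' model.

```python
# Hypothetical skeleton: MLP autoencoder on magnitude STFT frames.
import numpy as np
import torch
import torch.nn as nn
import librosa

sr, n_fft = 22050, 1024
t = np.arange(2 * sr) / sr
y = np.sin(2 * np.pi * 440 * t).astype(np.float32)    # stand-in for training audio

mag = np.abs(librosa.stft(y, n_fft=n_fft)).T           # frames x (n_fft/2 + 1) bins
frames = torch.tensor(mag)

model = nn.Sequential(                                 # encoder / bottleneck / decoder
    nn.Linear(n_fft // 2 + 1, 256), nn.ReLU(),
    nn.Linear(256, 32), nn.ReLU(),                     # smallest hidden layer
    nn.Linear(32, 256), nn.ReLU(),
    nn.Linear(256, n_fft // 2 + 1), nn.ReLU(),
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(200):                                   # train to reconstruct frames
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(frames), frames)
    loss.backward()
    opt.step()

# Synthesis: run frames through the trained autoencoder, then recover a phase
# (Griffin-Lim here; the paper uses phase gradient heap integration instead).
out_mag = model(frames).detach().numpy().T
audio = librosa.griffinlim(out_mag)
```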
Authors
Maciek Tomczak, Carl Southall and Jason Hockman
Abstract
In this work we present a rhythmically constrained audio style transfer technique for automatic mixing and mashing of two audio inputs. In this transformation the rhythmic and timbral features of both input signals are combined through the use of an audio style transfer process that transforms the files so that they adhere to a larger metrical structure of the chosen input. This is accomplished by finding beat boundaries of both inputs and performing the transformation on beat-length audio segments. In order for the system to perform a mashup between two signals, we reformulate the previously used audio style transfer loss terms into three loss functions and enable them to be independent of the input. We measure and compare rhythmic similarities of the transformed and input audio signals using their rhythmic envelopes to investigate the influence of the tested transformation objectives.
Authors
Henrik von Coler, Moritz Götz and Steffen Lepa
Abstract
This paper investigates the use of different mathematical models for the parametric synthesis of fundamental frequency trajectories in glissando note transitions. Hyperbolic tangent, cubic splines and Bézier curves were implemented in a real-time synthesis system. In a user study, test subjects were presented two-note sequences with glissando transitions, which had to be re-synthesized using the three different trajectory models, employing a pure sine wave synthesizer. Resulting modeling errors and user feedback on the models were evaluated, indicating a significant disadvantage of the hyperbolic tangent in the modeling accuracy. Its reduced complexity and number of parameters were however not rated to increase the usability.
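As an illustration of one of the three trajectory models, the sketch below renders a two-note sequence whose glissando follows a hyperbolic-tangent fundamental-frequency trajectory with a pure sine oscillator. The parameter values are arbitrary; this is not the study's synthesis system.

```python
# Illustrative sketch with made-up note frequencies and transition parameters.
import numpy as np

fs = 44100
f_a, f_b = 220.0, 330.0                      # start / end fundamental frequencies
t = np.arange(int(1.5 * fs)) / fs            # 1.5 s two-note sequence
t0, slope = 0.75, 25.0                       # transition centre (s) and steepness
f0 = f_a + (f_b - f_a) * 0.5 * (1.0 + np.tanh(slope * (t - t0)))
phase = 2 * np.pi * np.cumsum(f0) / fs       # integrate f0 to obtain the phase
signal = np.sin(phase)                       # pure sine rendering of the glissando
```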
Authors
Richard Vogl, Gerhard Widmer and Peter Knees
Abstract
Automatic drum transcription, a subtask of the more general automatic music transcription, deals with extracting drum instrument note onsets from an audio source. Recently, progress in transcription performance has been made using non-negative matrix factorization as well as deep learning methods. However, these works primarily focus on transcribing three drum instruments only: snare drum, bass drum, and hi-hat. Yet, for many applications, the ability to transcribe more drum instruments which make up standard drum kits used in western popular music would be desirable. In this work, convolutional and convolutional recurrent neural networks are trained to transcribe a wider range of drum instruments. First, the shortcomings of publicly available datasets in this context are discussed. To overcome these limitations, a larger synthetic dataset is introduced. Then, methods to train models using the new dataset focusing on generalization to real world data are investigated. Finally, the trained models are evaluated on publicly available datasets and results are discussed. The contributions of this work comprise: (i.) a large-scale synthetic dataset for drum transcription, (ii.) first steps towards an automatic drum transcription system that supports a larger range of instruments by evaluating and discussing training setups and the impact of datasets in this context, and (iii.) a publicly available set of trained models for drum transcription. Additional materials are available at http://ifs.tuwien.ac.at/~vogl/dafx2018.
Authors
Mark Cartwright and Juan Pablo Bello
Abstract
Current datasets for automatic drum transcription (ADT) are small and limited due to the tedious task of annotating onset events. While some of these datasets contain large vocabularies of percussive instrument classes (e.g. ~20 classes), many of these classes occur very infrequently in the data. This paucity of data makes it difficult to train models that support such large vocabularies. Therefore, data-driven drum transcription models often focus on a small number of percussive instrument classes (e.g. 3 classes). In this paper, we propose to support large-vocabulary drum transcription by generating a large synthetic dataset (210,000 eight-second examples) of audio examples for which we have ground-truth transcriptions. Using this synthetic dataset along with existing drum transcription datasets, we train convolutional-recurrent neural networks (CRNNs) in a multi-task framework to support large-vocabulary ADT. We find that training on both the synthetic and real music drum transcription datasets together improves performance on not only large-vocabulary ADT, but also beat/downbeat detection and small-vocabulary ADT.
Authors
Gerard Roma, Owen Green and Pierre Alexandre Tremblay
Abstract
Extraction of stationary and transient components from audio has many potential applications to audio effects for audio content production. In this paper we explore stationary/transient separation using convolutional autoencoders. We propose two novel unsupervised algorithms for individual and joint separation. We describe our implementation and show examples. Our results show promise for the use of convolutional autoencoders in the extraction of sparse components from audio spectrograms, particularly using monophonic sounds.
Authors
Celine Jacques and Axel Roebel
Abstract
Automatic drum transcription (ADT) aims to detect drum events in polyphonic music. This task is part of the more general problem of transcribing a music signal in terms of its musical score and additionally can be very interesting for extracting high-level information, e.g. tempo, downbeat, measure. The objective of this article is to investigate the use of Convolutional Neural Networks (CNN) in the context of ADT. Two different strategies are compared. First, an approach based on a CNN detection of drum-only onsets is combined with an algorithm using Non-negative Matrix Deconvolution (NMD) for drum onset transcription. Then an approach relying entirely on CNN for the detection of individual drum instruments is described. The question of which loss function is the most adapted for this task is investigated together with the question of the optimal input structure. All algorithms are evaluated using the publicly available ENST Drum database, a widely used established reference dataset, allowing easy comparison with other algorithms. The comparison shows that the purely CNN based algorithm significantly outperforms the NMD based approach; compared to the best previously published results, which also use a neural network model, its results are significantly better for the snare drum but slightly worse for the bass drum and the hi-hat.
Authors
Sebastian J. Schlecht, Benoit Alary, Vesa Välimäki and Emanuel A. P. Habets
Abstract
Decorrelation of audio signals is a critical step for spatial sound reproduction on multichannel configurations. Correlated signals yield a focused phantom source between the reproduction loudspeakers and may produce undesirable comb-filtering artifacts when the signal reaches the listener with small phase differences. Decorrelation techniques reduce such artifacts and extend the spatial auditory image by randomizing the phase of a signal while minimizing the spectral coloration. This paper proposes a method to optimize the decorrelation properties of a sparse noise sequence, called velvet noise, to generate short sparse FIR decorrelation filters. The sparsity allows a highly efficient time-domain convolution. The listening test results demonstrate that the proposed optimization method can yield effective and colorless decorrelation filters. In comparison to a white noise sequence, the filters obtained using the proposed method better preserve the spectrum of a signal and produce good quality broadband decorrelation while using 76% fewer operations for the convolution. Satisfactory results can be achieved with an even lower impulse density, which decreases the computational cost by 88%.
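The efficiency argument rests on the sparsity of the filter: convolution with a signed sparse sequence needs only delayed additions and subtractions. The sketch below illustrates that point with an arbitrary sparse sequence; it does not reproduce the paper's optimized velvet-noise design.

```python
# Illustration of sparse time-domain convolution (impulse placement simplified).
import numpy as np

def sparse_fir(x, positions, signs):
    """y[n] = sum_k signs[k] * x[n - positions[k]] for a velvet-noise-like FIR."""
    y = np.zeros(len(x))
    for p, s in zip(positions, signs):
        y[p:] += s * x[:len(x) - p]        # one delayed add/subtract per impulse
    return y

# A ~30 ms filter at 48 kHz with 30 signed impulses (positions chosen at random here)
rng = np.random.default_rng(0)
positions = np.sort(rng.choice(1440, size=30, replace=False))
signs = rng.choice([-1.0, 1.0], size=30)
decorrelated = sparse_fir(rng.standard_normal(48000), positions, signs)
```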
Authors
Dylan Menzies and Filippo Maria Fazi
Abstract
Conventional panning approaches for surround sound require loudspeakers to be distributed over the regions where images are needed. However, in many listening situations it is not practical or desirable to place loudspeakers at some positions, such as behind or above the listener. Compensated Amplitude Panning (CAP) is a method that adapts dynamically to the listener’s head orientation to provide images in any direction, in the frequency range up to ≈ 1000 Hz, using only 2 loudspeakers. CAP is extended here for more loudspeakers, which removes some limitations and provides additional benefits. The new CAP method is also compared with an Ambisonics approach that is adapted for surround sound without rear loudspeakers.
Authors
Jonathan S. Abel, Eoin F. Callery and Elliot K. Canfield-Dafilou
Abstract
Authors
Stefano D'Angelo and Leonardo Gabrielli
Abstract
Several methods are available nowadays to artificially extend the duration of a signal for audio restoration or creative music production purposes. The most common approaches include overlap-and-add (OLA) techniques, FFT-based methods, and linear predictive coding (LPC). In this work we describe a novel OLA algorithm based on convolution with velvet noise, in order to exploit its sparsity and spectrum flatness. The proposed method suppresses spectral coloration and achieves remarkable computational efficiency. Its issues are addressed and some design choices are explored. Experimental results are proposed and compared to a well-known FFT-based method.
Authors
Yan Tang and Trevor J. Cox
Abstract
The reduction of speech intelligibility in noise is usually dominated by energetic masking (EM) and informational masking (IM). Most state-of-the-art objective intelligibility measures (OIM) estimate intelligibility by quantifying EM. Few measures model the effect of IM in detail. In this study, an auditory saliency model, which intends to measure the probability of the sources obtaining auditory attention in a bottom-up process, was integrated into an OIM for improving the performance of intelligibility prediction under IM. While EM is accounted for by the original OIM, IM is assumed to arise from the listener’s attention switching between the target and competing sounds existing in the auditory scene. The performance of the proposed method was evaluated along with three reference OIMs by comparing the model predictions to the listener word recognition rates, for different noise maskers, some of which introduce IM. The results show that the predictive accuracy of the proposed method is as good as the best reported in the literature. The proposed method, however, provides a physiologically-plausible possibility for both IM and EM modelling.
Authors
Joana Vieira, Jorge Almeida Santos and Paulo Noriega
Abstract
The relationship between physical acoustic parameters and the subjective responses they evoke is important to assess in audio alarm design. While the perception of urgency has been thoroughly investigated, the perception of other variables such as pleasantness, negativeness and irritability has not. To characterize the psychological correlates of variables such as frequency, speed, rhythm and onset, twenty-six participants evaluated fifty-four audio warning signals according to six different semantic differential scales. Regression analysis showed that speed predicted mostly the perception of urgency, preoccupation and negativity; frequency predicted the perception of pleasantness and irritability; and rhythm affected the perception of urgency. No correlation was found with onset and offset times. These findings are important to human-centred design recommendations for auditory warning signals.
Authors
Francis Stevens, Damian Murphy and Stephen Smith
Abstract
Soundscape research is concerned with the study and understanding of our relationship with our surrounding acoustic environments and the sonic elements that they are comprised of. Whilst much of this research has focussed on sound alone, any practical application of soundscape methodologies should consider the interaction between aural and visual environmental features: an interaction known as cross-modal perception. This presents an avenue for soundscape research exploring how an environment’s visual features can affect an individual’s experience of the soundscape of that same environment. This paper presents the results of two listening tests: one a preliminary test making use of static stereo UHJ renderings of first-order-ambisonic (FOA) soundscape recordings and static panoramic images; the other using YouTube as a platform to present dynamic binaural renderings of the same FOA recordings alongside full motion spherical video. The stimuli for these tests were recorded at several locations around the north of England including rural, urban, and suburban environments exhibiting soundscapes comprised of many natural, human, and mechanical sounds. The purpose of these tests was to investigate how the presence of visual stimuli can alter soundscape perception and categorisation. This was done by presenting test subjects with each soundscape alone and then with visual accompaniment, and then comparing collected subjective evaluation data. Results indicate that the presence of certain visual features can alter the emotional state evoked by exposure to a soundscape, for example, where the presence of ‘green infrastructure’ (parks, trees, and foliage) results in a less agitating experience of a soundscape containing high levels of environmental noise. This research represents an important initial step toward the integration of virtual reality technologies into soundscape research, and the use of suitable tools to perform subjective evaluation of audiovisual stimuli. Future research will consider how these methodologies can be implemented in real-world applications.
Authors
Raquel Ribeiro and Diamantino Freitas
Abstract
The acoustics of spaces whose purpose is acoustic communication through speech, namely classrooms, is a subject that has not been given due importance in architectural projects, resulting in adverse acoustic conditions that affect students' learning and teachers' well-being on a daily basis.
One of the lecture rooms of the Faculty of Engineering of the University of Porto (FEUP) was chosen with a criterion of generality, more precisely amphitheater B013. Its acoustic conditions were evaluated and compared with those known to be necessary for the intended acoustic communication effect. Several measurements were made in the space to investigate how the acoustic parameters compare with the appropriate range.
An acoustic model of the amphitheater under study was developed in the EASE software, with which it was possible to obtain simulated results for comparison with the previously measured parameters and to introduce changes in the model to assess their impact on the real space. In this phase it was possible to use the auralization resources of the software to create a perception of how sound is heard in the modelled room. This was useful for the rehabilitation phase, because it allowed the improvement of speech intelligibility in the space to be judged subjectively.
Finally, possible solutions are presented, both in the acoustic domain and using electroacoustic sound reinforcement, aiming to provide better acoustic comfort and communicational effectiveness for the people who use the room.
Wednesday, Sept. 5th / Teatro Aveirense
"Paul Hansonʼs musical journey is a testament of fearless dedication to craft and creativity. (...) His explorations have transcended limitations and created new possibilities-all while making music of the highest quality. Paulʼs repertoire encompasses musical aspects of all modern styles of improvised music." - Paul Hanson's Bio
Paul Hanson's Website: http://paulhansonmusic.com/
Paul Hanson's Facebook: https://www.facebook.com/paulhansonmusic/
Paul Hanson's YouTube: https://www.youtube.com/user/jazzbassoonpaul
Here, the intention is simply to give a window into an actual user's experience. Some examples will be shown of how plugins are used in a typical day, including what draws somebody to use certain plugins over others that may do similar things. Some GUI features that are found useful will be explored, as well as some that are a hindrance. The talk will also discuss what it's like to be an end user in a saturated market of products, and what it is like to discover, try, and buy developers' products.
BIO
"David Farmer was born and raised in Virginia and sound captured his interest as a young boy. He moved to the Los Angeles area in 1992 and in 1996 began Sound Designing at Skywalker Sound on The Arrival. David worked with Chris Boyes on numerous films including Armageddon, Con Air, Space Cowboys, and The 13th Warrior. In 1999 David began an extended period in New Zealand as the Sound Designer for the Lord of the Rings trilogy, King Kong, and followed most recently by The Hobbit trilogy. David recently finished Sound Design on Marvel’s Ant-Man."
Authors
Jingjie Zhang and Julius Smith
Abstract
Vacuum tube amplifiers, known for their acclaimed distortion characteristics, are still widely used in hi-fi audio devices. However, bulky, fragile and power-consuming vacuum tube devices have also motivated much research on digital emulation of vacuum tube amplifier behaviors. Recent studies on Wave Digital Filters (WDF) have made possible the modeling of multi-stage vacuum tube amplifiers within single WDF SPQR trees. Our research combines the latest progress on WDF with the modified blockwise method to reduce the overall computational complexity of modeling cascaded vacuum tube amplifiers by decomposing the whole circuit into several small stages containing only two adjacent triodes. Certain performance optimization methods are discussed and applied in the eventual real-time implementation.
Author
Gianpaolo Evangelista
Abstract
Time warping is an important paradigm in sound processing, which consists of composing the signal with another function of time called the warping map. This paradigm leads to different points of view in signal processing, fostering the development of new effects or the conception of new implementations of existing ones. While the introduction of time warping in continuous-time signals is in principle not problematic, time warping of discrete-time signals is not self-evident. On one hand, if the signal samples were obtained by sampling a bandlimited signal, the warped signal is not necessarily bandlimited: it has a sampling theorem of its own, based on irregular sampling, unless the map is linear. On the other hand, most signals are regularly sampled so that the samples at non-integer multiples of the sampling interval are not known. While the use of interpolation can partly solve the problem it usually introduces artifacts. Moreover, in many sound applications, the computation already involves a phase vocoder. In this paper we introduce new methods and algorithms for time-warping based on warped time-frequency representations. These lead to alternative algorithms for warping for use in sound processing tools and digital audio effects and shed new light on the interaction of time warping with phase vocoders. We also outline the applications of time warping in digital audio effects.
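To make the discrete-time difficulty concrete, the sketch below shows the naive interpolation approach mentioned above, i.e. approximating y(t) = x(θ(t)) by resampling regular samples. The warping map is arbitrary, and the paper's warped time-frequency methods are not reproduced here.

```python
# Naive discrete-time warping by interpolation (illustrative only).
import numpy as np

fs = 44100
n = np.arange(fs)
x = np.sin(2 * np.pi * 220 * n / fs)           # one second of a test tone

def theta(t):
    """Example warping map (seconds -> seconds): a slow vibrato-like modulation."""
    return t + 0.002 * np.sin(2 * np.pi * 3 * t)

warped_times = theta(n / fs)
y = np.interp(warped_times * fs, n, x)          # sample x at theta(nT) by interpolation
```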
Authors
Esteban Maestre, Gary Scavone and Julius Smith
Abstract
In the context of efficient synthesis of wind instrument sound, we introduce a technique for joint modeling of input impedance and sound pressure radiation as digital filters in parallel form, with the filter coefficients derived from experimental data. In a series of laboratory measurements taken on an alto saxophone, the input impedance and sound pressure radiation responses were obtained for each fingering. In a first analysis step, we iteratively minimize the error between the frequency response of an input impedance measurement and that of a digital impedance model constructed from a parallel filter structure akin to the discretization of a modal expansion. With the modal coefficients in hand, we propose a digital model for sound pressure radiation which relies on the same parallel structure, thus suitable for coefficient estimation via frequency-domain least-squares. For modeling the transition between fingering positions, we propose a simple model based on linear interpolation of input impedance and sound pressure radiation models. For efficient sound synthesis, the common impedance-radiation model is used to construct a joint reflectance-radiation digital filter realized as a digital waveguide termination that is interfaced to a reed model based on nonlinear scattering.
Authors
Antonio Goulart, Marcelo Queiroz, Joseph Timoney and Victor Lazzarini
Abstract
This paper is a continuation of our first studies on AM/FM digital audio effects, where the AM/FM decomposition equations were reviewed and some exploratory examples of effects were introduced. In the current paper we present more insight on the signals obtained with the AM/FM decomposition, intending to illustrate manipulations in the AM/FM domain that can be applied as interesting audio effects. We provide high-quality AM/FM effects and their implementations, alongside a brief objective evaluation. Audio samples and codes for real-time operation are also supplied.
Authors
Marius Miron and Matthew Davies
Abstract
We present a new approach for audio bandwidth extension for music signals using convolutional neural networks (CNNs). Inspired by the concept of inpainting from the field of image processing, we seek to reconstruct the high-frequency region (i.e., above a cutoff frequency) of a time-frequency representation given the observation of a band-limited version. We then invert this reconstructed time-frequency representation using the phase information from the band-limited input to provide an enhanced musical output. We contrast the performance of two musically adapted CNN architectures which are trained separately using the STFT and the invertible CQT. Through our evaluation, we demonstrate that the CQT, with its logarithmic frequency spacing, provides better reconstruction performance as measured by the signal to distortion ratio.
Authors
Aníbal Ferreira and José Tribolet
Abstract
This paper addresses a phase-related feature that is time-shift invariant, and that expresses the relative phases of all harmonics with respect to that of the fundamental frequency. We identify the feature as Normalized Relative Delay (NRD) and we show that it is particularly useful to describe the holistic phase properties of voiced sounds produced by a human speaker, notably vowel sounds. We illustrate the NRD feature with real data that is obtained from five sustained vowels uttered by 20 female speakers and 17 male speakers. It is shown that not only NRD coefficients carry idiosyncratic information, but also their estimation is quite stable and robust for all harmonics encompassing, for most vowels, at least the first four formant frequencies. The average NRD model that is estimated using data pertaining to all speakers in our database is compared to that of the idealized Liljencrants-Fant (L-F) and Rosenberg glottal models. We also present results on the phase effects of linear-phase FIR and IIR vocal tract filter models when a plausible source excitation is used that corresponds to the derivative of the L-F glottal flow model. These results suggest that the shape of NRD feature vectors is mainly determined by the glottal pulse and only marginally affected by either the group delay of the vocal tract filter model, or by the acoustic coupling between glottis and vocal tract structures.
Authors
Samuel Poirot, Stefan Bilbao, Mitsuko Aramaki and Richard Kronland-Martinet
Abstract
This paper is concerned with perceptual control strategies for physical modeling synthesis of vibrating resonant objects colliding nonlinearly with rigid obstacles. For this purpose, we investigate sound morphologies from samples synthesized using physical modeling for non-linear interactions. As a starting point, we study the effect of linear and non-linear springs and collisions on a single-degree-of-freedom system and on a stiff string. We then synthesize realistic sounds of a stiff string colliding with a rigid obstacle. Numerical simulations allowed the definition of specific signal patterns characterizing the non-linear behavior of the interaction according to the attributes of the obstacle. Finally, a global description of the sound morphology associated with this type of interaction is proposed. This study constitutes a first step towards further perceptual investigations geared towards the development of intuitive synthesis controls.
Authors
Elliot K. Canfield-Dafilou and Jonathan S. Abel
Abstract
An algorithm for artistic spectral audio processing and synthesis using allpass filters is presented. These filters express group delay trajectories, allowing fine control of their frequency-dependent arrival times. We present methods for designing the group delay trajectories to yield a novel class of filters for sound synthesis and audio effects processing. A number of categories of group delay trajectory design are discussed, including stair-stepped, modulated, and probabilistic. Synthesis and processing examples are provided.
Authors
Manuel López Ibáñez, Nahum Álvarez and Federico Peinado
Abstract
Through this research, we develop a study aiming to explore how adaptive music can help in guiding players across virtual environments. A video game consisting of a virtual 3D labyrinth was built, and two groups of subjects played through it, with the goal of retrieving a series of objects in as short a time as possible. Each group played a different version of the prototype in terms of audio: one had the ability to state their preferences by choosing several musical attributes, which would influence the actual spatialised music they listened to during gameplay; the other group played a version of the prototype with a default, non-adaptive, but also spatialised soundtrack. Time elapsed while completing the task was measured as a way to test user performance. Results show a statistically significant correlation between player performance and the inclusion of a soundtrack adapted to each user. We conclude that there is an absence of firm musical criteria for making sounds prominent and easy to track for users, and that an adaptive system like the one we propose proves useful and effective when dealing with a complex user base.
Authors
Sota Nishiguchi and Katunobu Itou
Abstract
Sound production by means of a physical model for falling objects, which is intended for audio synthesis of immersive contents, is described here. Our approach is a mathematical model to synthesize sound and audio for animation with rigid body simulation. To consider various conditions, a collision model of an object was introduced for vibration and propagation simulation. The generated sound was evaluated by comparing the model output with real sound using numerical criteria and psychoacoustic analysis. Experiments were performed for a variety of objects and floor surfaces, approximately 90% of which were similar to real scenarios. The usefulness of the physical model for audio synthesis in virtual reality was represented in terms of breadth and quality of sound.
Authors
David Moffat and Joshua D. Reiss
Abstract
There are a range of different methods for comparing or measuring the similarity between environmental sound effects. These methods can be used as objective evaluation techniques, to evaluate the effectiveness of a sound synthesis method by assessing the similarity between synthesised sounds and recorded samples. We propose to evaluate a number of different synthesis objective evaluation metrics, by using the different distance metrics as fitness functions within a resynthesis algorithm. A recorded sample is used as a target sound, and the resynthesis is intended to produce a set of synthesis parameters that will synthesise a sound as close to the recorded sample as possible, within the restrictions of the synthesis model. The recorded samples are excerpts of selections from a sound effects library, and the results are evaluated through a subjective listening test. Results show that one of the objective functions performs significantly worse than several others. Only one method had a significant and strong correlation between the user perceptual distance and the objective distance. A recommendation of an objective evaluation function for measuring similarity between synthesised environmental sounds is made.
Authors
Elliot K. Canfield-Dafilou and Jonathan S. Abel
Abstract
In music recording and virtual reality applications, it is often desirable to control the perceived size of a synthesized acoustic space. Here, we demonstrate a physically informed method for enlarging and shrinking room size. A room size parameter is introduced to modify the time and frequency components of convolution, delay network, and modal artificial reverberation architectures to affect the listener’s sense of the size of the acoustic space, taking into account air and materials absorption.
Authors
Stefano Papetti, Federico Avanzini and Federico Fontana
Abstract
An extensive piano sample library consisting of binaural sounds and keyboard vibration signals is made available through an open-access data repository. Samples were acquired with high-quality audio and vibration measurement equipment on two Yamaha Disklavier pianos (one grand and one upright model) by means of computer-controlled playback of each key at ten different MIDI velocity values. The nominal specifications of the equipment used in the acquisition chain are reported in a companion document, allowing researchers to calculate physical quantities (e.g., acoustic pressure, vibration acceleration) from the recordings. Also, project files are provided for straightforward playback in a free software sampler available for Windows and Mac OS systems. The library is especially suited for acoustic and vibration research on the piano, as well as for research on multimodal interaction with musical instruments.
Authors
Luca Marinelli and Holger Kirchhoff
Abstract
This paper presents a position-based attenuation and amplification method suitable for source separation and enhancement. Our novel sigmoidal time-frequency mask allows us to directly control the level within a target azimuth range and to exploit a trade-off between the production of musical noise artifacts and separation quality. The algorithm is fully describable in a closed and compact analytical form. The method was evaluated on a multitrack dataset and compared to another position-based source separation algorithm. The results show that although the sigmoidal mask leads to a lower source-to-interference ratio, the overall sound quality measured by the source-to-distortion ratio and the source-to-artifacts ratio is improved.
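As a generic illustration of position-based masking (a simplified formulation, not the paper's exact sigmoidal mask), the sketch below computes a per-bin panning index from a stereo STFT and passes it through logistic functions that select a target panning range.

```python
# Illustrative sketch with an invented two-source stereo mixture.
import numpy as np
import librosa

def expit(z):
    return 1.0 / (1.0 + np.exp(-z))

sr = 22050
t = np.arange(2 * sr) / sr
src_a = np.sin(2 * np.pi * 440 * t)            # panned mostly left
src_b = np.sin(2 * np.pi * 660 * t)            # panned mostly right
left = 0.9 * src_a + 0.3 * src_b
right = 0.3 * src_a + 0.9 * src_b

L = librosa.stft(left)
R = librosa.stft(right)
pan = (np.abs(R) - np.abs(L)) / (np.abs(L) + np.abs(R) + 1e-12)   # -1 (left) .. +1 (right)

lo, hi, steep = 0.2, 1.0, 20.0                  # pass only the right-panned region
mask = expit(steep * (pan - lo)) * expit(-steep * (pan - hi))     # ~1 inside, ~0 outside

out_left = librosa.istft(mask * L)              # enhanced right-panned source
out_right = librosa.istft(mask * R)
```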
Authors
Safa Chebbi and Sofia Ben Jebara
Abstract
In this paper, we propose to reduce the relatively high dimension of pitch-based features for fear emotion recognition from speech. To do so, the K-nearest neighbors algorithm has been used to classify three emotion classes: fear, neutral and ’other emotions’. Many techniques of dimensionality reduction are explored. First of all, optimal features ensuring better emotion classification are determined. Next, several families of dimensionality reduction, namely PCA, LDA and LPP, are tested in order to reveal the suitable dimension range guaranteeing the highest overall and fear recognition rates. Results show that the optimal features group permits 93.34% and 78.7% as overall and fear accuracy rates respectively. Using dimensionality reduction, Principal Component Analysis (PCA) has given the best results: 92% as overall accuracy rate and 93.3% as fear recognition percentage.
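A minimal sketch of the classification pipeline described above is given below, with synthetic data standing in for the pitch-based features; the feature set, dataset and parameter choices of the paper are not reproduced.

```python
# Minimal PCA + KNN pipeline sketch with synthetic stand-in data.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.standard_normal((300, 40))                 # 40-dim pitch-based feature vectors
y = rng.integers(0, 3, size=300)                   # 0: fear, 1: neutral, 2: other emotions

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = make_pipeline(StandardScaler(),
                    PCA(n_components=10),          # dimensionality reduction
                    KNeighborsClassifier(n_neighbors=5))
clf.fit(X_tr, y_tr)
print("accuracy:", clf.score(X_te, y_te))
```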
Authors
Nuno Carriço, Guilherme Campos and José Vieira
Abstract
An audio-guide prototype was developed which makes it possible to associate virtual sound sources to tourist route focal points. An augmented reality effect is created, as the (virtual) audio content presented through headphones seems to originate from the specified (real) points.
A route management application allows specification of source positions (GPS coordinates), audio content (monophonic files) and route points where playback should be triggered.
The binaural spatialisation effects depend on user pose relative to the focal points: position is detected by a GPS receiver; for head-tracking, an IMU is attached to the headphone strap. The main application, developed in C++, streams the audio content through a real-time auralisation engine. HRTF filters are selected according to the azimuth and elevation of the path from the virtual source, continuously updated based on user pose.
Preliminary tests carried out with ten subjects confirmed the ability to provide the desired audio spatialisation effects and identified position detection accuracy as the main aspect to be improved in the future.
Authors
Remy Müller and Thomas Hélie
Abstract
This article is concerned with the power-balanced simulation of analog audio circuits, governed by nonlinear differential algebraic equations (DAE). The proposed approach is to combine principles from the port-Hamiltonian and Brayton-Moser formalisms to yield a skew-symmetric gradient system. The practical interest is to provide a solver, using an average discrete gradient, that handles differential and algebraic relations in a unified way, and avoids having to pre-solve the algebraic part. This leads to a structure-preserving method that conserves the power balance and total energy. The proposed formulation is then applied on typical nonlinear audio circuits to study the effectiveness of the method.
Authors
Olafur Bogason and Kurt Werner
Abstract
Wave Digital Filters were developed to discretize linear time-invariant lumped systems, particularly electronic circuits. The time-invariant assumption is baked into the underlying theory and becomes problematic when simulating audio circuits that are by nature time-varying. We present extensions to WDF theory that incorporate proper numerical schemes, allowing for the accurate simulation of time-varying systems.
We present generalized continuous-time models of reactive components that encapsulate the time-varying lossless models presented by Fettweis, the circuit-theoretic time-varying models, as well as traditional LTI models as special cases. Models of time-varying reactive components are valuable tools to have when modeling circuits containing variable capacitors or inductors or electrical devices such as condenser microphones. A power metric is derived and the model is discretized using the alpha-transform numerical scheme and parametric wave definition.
Case studies of circuits containing time-varying resistance and capacitance are presented and help to validate the proposed generalized continuous-time model and discretization.
Authors
Antonin Novak, Bertrand Lihoreau, Pierrick Lotton, Emmanuel Brasseur and Laurent Simon
Abstract
In this paper, we focus on studying the nonlinear behavior of the pickup of an electric guitar and on its modeling. The approach is purely experimental, based on physical assumptions, and attempts to find a nonlinear model that, with few parameters, would be able to predict the nonlinear behavior of the pickup. In our experimental setup a piece of string is attached to a shaker and vibrates perpendicularly to the pickup in the frequency range between 60 Hz and 400 Hz. The oscillations are controlled by a linearization feedback to create a purely sinusoidal steady-state movement of the string. In the first step, harmonic distortions of three different magnetic pickups (a single-coil, a humbucker, and a rail-pickup) are compared to check if they provide different distortions. In the second step, a static nonlinearity of Paiva’s model is estimated from experimental signals. In the last step, the pickup nonlinearities are compared and an empirical model that fits well all three pickups is proposed.
Authors
Geoffrey Gormond, Fabián Esqueda, Henri Pöntynen and Julian Parker
Abstract
The Serge Triple Waveshaper (TWS) is a synthesizer module designed in 1973 by Serge Tcherepnin, founder of Serge Modular Music Systems. It contains three identical waveshaping circuits that can be used to convert sawtooth waveforms into sine waves. However, its sonic capabilities extend well beyond this particular application. Each processing section in the Serge TWS is built around what is known as a Norton amplifier. These devices, unlike traditional operational amplifiers, operate on a current differencing principle and are featured in a handful of iconic musical circuits. This work provides an overview of Norton amplifiers within the context of virtual analog modeling and presents a digital model of the Serge TWS based on an analysis of the original circuit. Results obtained show the proposed model closely emulates the salient features of the original device and can be used to generate the complex waveforms that characterize “West Coast” synthesis.
From the idea for an algorithm to a final commercial plugin, there are many steps a developer needs to know and understand in order to make the best of an idea. The most important of these will be addressed, from DSP design to UX/UI design, including concerns such as latency, bypassing, parameters, precision, automation, surround and persistence, with reference to audio plugin formats, mainly VST3. The goal of this keynote is to help future or already established plugin developers be prepared and aware of what should not be forgotten during development.
BIO
"After a quiet childhood and youth in “Pays de Gex” (an area jammed between the French Jura mountain and Geneva), during which he played as drummer in different Brass Bands, Yvan Grabit completed an Engineering degree in Image Processing and computing at ISEP (“Institut Supérieur d´Électronique de Paris”).
He spent eight months doing research at Fraunhofer IGD in Rostock, then moved to Paris and Cannes to work for Aérospatiale (now part of Thales Alenia Space) on satellite image processing as a developer and project manager.
21 years ago, Yvan decided to change his field of work from image to audio and began his career as a developer at Steinberg (Hamburg). He started in the Nuendo team, then developed different plug-ins, such as LM-4, The Grand and HALion with Charlie Steinberg, and took on the responsibility for the development of plug-in integration in DAWs, surround features and the VST SDK (version 2 and later version 3).
Today, he is team leader of the Research group, which works in different fields such as MIR, 3D/VR audio, restoration and machine learning. As technical lead of VST he promotes and maintains VST 3, and supports third-party developers.
He continues to play drums, guitar and piano in several music groups."
Authors
Marco A. Martínez Ramírez and Joshua D. Reiss
Abstract
This work aims to implement a novel deep learning architecture to perform audio processing in the context of matched equalization. Most existing methods for automatic and matched equalization show effective performance and their goal is to find a respective transfer function given a frequency response. Nevertheless, these procedures require prior knowledge of the type of filters to be modeled. In addition, fixed filter bank architectures are required in automatic mixing contexts. Based on end-to-end convolutional neural networks, we introduce a general purpose architecture for equalization matching. Thus, by using an end-to-end learning approach, the model approximates the equalization target as a content-based transformation without directly finding the transfer function. The network learns how to process the audio directly in order to match the equalized target audio. We train the network through unsupervised and supervised learning procedures. We analyze what the model is actually learning and how the given task is accomplished. We show the model performing matched equalization for shelving, peaking, lowpass and highpass IIR and FIR equalizers.
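As a hedged sketch of what an end-to-end convolutional equalization matcher could look like, the toy PyTorch model below maps raw audio to raw audio and is trained to minimize the waveform error against an equalized target; the layer sizes, kernel lengths and loss are illustrative and do not reflect the architecture in the paper.

```python
import torch
import torch.nn as nn

class ToyEQMatcher(nn.Module):
    """Toy end-to-end conv model: raw audio in, 'equalized' audio out.
    Hypothetical layer sizes; the paper's actual architecture differs."""
    def __init__(self, channels=32, kernel=65):
        super().__init__()
        self.encode = nn.Conv1d(1, channels, kernel, padding=kernel // 2)
        self.nonlin = nn.Tanh()
        self.decode = nn.Conv1d(channels, 1, kernel, padding=kernel // 2)

    def forward(self, x):                 # x: (batch, 1, samples)
        return self.decode(self.nonlin(self.encode(x)))

model = ToyEQMatcher()
x = torch.randn(4, 1, 4096)               # dry input (placeholder data)
y = torch.randn(4, 1, 4096)               # equalized target (placeholder data)
loss = nn.MSELoss()(model(x), y)          # match the equalized target waveform
loss.backward()
```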
Authors
Mark Rau, Jonathan Abel and Julius Smith
Abstract
This paper proposes a method to filter the output of instrument contact sensors to approximate the response of a well-placed microphone. A modal approach is proposed in which mode frequencies and damping ratios are fit to the frequency response of the contact sensor, and the mode gains are then determined for both the contact sensor and the microphone. The mode frequencies and damping ratios are presumed to be associated with the resonances of the instrument. Accordingly, the corresponding contact sensor and microphone mode gains will account for the instrument radiation. The ratios between the contact sensor and microphone gains are then used to create a parallel bank of second-order biquad filters to filter the contact sensor signal to estimate the microphone signal.
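The following Python sketch illustrates the general idea of a parallel bank of second-order mode filters whose gains are the microphone-to-sensor ratios; the resonator design, T60-based pole radii and mode list are assumptions for illustration, not the filters fitted in the paper.

```python
import numpy as np
from scipy.signal import lfilter

def mode_resonator(f_hz, t60, gain, fs=48000):
    """Second-order resonator on one mode: pole radius from the mode's T60,
    simple bandpass numerator scaled by the desired gain."""
    r = 10 ** (-3.0 / (t60 * fs))                # per-sample decay reaching -60 dB at t60
    w = 2.0 * np.pi * f_hz / fs
    b = gain * np.array([1.0 - r, 0.0, -(1.0 - r)])
    a = np.array([1.0, -2.0 * r * np.cos(w), r * r])
    return b, a

def sensor_to_mic(sensor_sig, modes, fs=48000):
    """Sum of per-mode resonators whose gains are the mic/sensor gain ratios.
    modes: list of (frequency_hz, t60_s, mic_gain, sensor_gain) tuples."""
    out = np.zeros(len(sensor_sig))
    for f_hz, t60, g_mic, g_sensor in modes:
        b, a = mode_resonator(f_hz, t60, g_mic / g_sensor, fs)
        out += lfilter(b, a, sensor_sig)
    return out
```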
Author
Henrik von Coler
Abstract
The presented sample library of violin sounds is designed as a tool for the research, development and testing of sound analysis/synthesis algorithms. The library features single sounds which cover the entire frequency range of the instrument in four dynamic levels, two-note sequences for the study of note transitions and vibrato, and solo pieces for performance analysis. All parts come with a hand-labeled segmentation ground truth which marks attack, release and transition/transient segments. Additional relevant information on the samples’ properties is provided for single sounds and two-note sequences. Recordings took place in an anechoic chamber with a professional violinist and a recording engineer, using two microphone positions. This document describes the content and the recording setup in detail, alongside basic statistical properties of the data.
Authors
Martin Weiss Hansen, Jacob Møller Hjerrild, Mads Græsbøll Christensen and Jesper Kjeldskov.
Abstract
In this paper, a method for separating stereophonic mixtures into their harmonic constituents is proposed. The method is based on a harmonic signal model. An observed mixture is decomposed by first estimating the panning parameters of the sources, and then estimating the fundamental frequencies and the amplitudes of the harmonic components. The number of sources and their panning parameters are estimated using an approach based on clustering of narrowband interaural level and time differences. The panning parameter distribution is modelled as a Gaussian mixture and the generalized variance is used for selecting the number of sources. The fundamental frequencies of the sources are estimated using an iterative approach. To enforce spectral smoothness when estimating the fundamental frequencies, a codebook of magnitude amplitudes is used to limit the amount of energy assigned to each harmonic. The source models are used to form Wiener filters which are used to reconstruct the sources. The proposed method can be used for source re-panning (demonstration given), remixing, and multi-channel upmixing, e.g. for hi-fi systems with multiple loudspeakers.
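As a small illustration of the Wiener-filter reconstruction step, the sketch below builds soft masks from per-source magnitude estimates and applies them to the mixture STFT; the source models themselves (panning clustering, harmonic codebook) are not reproduced here.

```python
import numpy as np

def wiener_masks(source_mags):
    """Given per-source magnitude spectrograms (same shape), build the
    Wiener masks |S_i|^2 / sum_j |S_j|^2."""
    power = [np.abs(m) ** 2 for m in source_mags]
    total = np.sum(power, axis=0) + 1e-12      # avoid division by zero
    return [p / total for p in power]

def reconstruct(mixture_stft, masks):
    """Apply each mask to the mixture STFT to estimate the source STFTs."""
    return [mask * mixture_stft for mask in masks]
```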
Authors
Julian Neri and Philippe Depalle
Abstract
This paper proposes a new partial tracking method, based on linear programming, that can run in real-time, is simple to implement, and performs well in difficult tracking situations by considering spurious peaks, crossing partials, and a non-stationary short-term sinusoidal model. Complex constant parameters of a generalized short-term signal model are explicitly estimated to inform peak matching decisions. Peak matching is formulated as a variation of the linear assignment problem. Combinatorially optimal peak-to-peak assignments are found in polynomial time using the Hungarian algorithm. Results show that the proposed method creates high-quality representations of monophonic and polyphonic sounds.
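The optimal peak-to-peak assignment step can be illustrated with SciPy's linear assignment solver, as in the toy sketch below; the cost terms and weights are placeholders and omit the paper's spurious-peak handling and non-stationary model parameters.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_peaks(prev_peaks, new_peaks, freq_weight=1.0, amp_weight=0.1):
    """Toy peak matching: cost matrix from frequency and amplitude
    differences, solved optimally (Hungarian-style)."""
    cost = np.zeros((len(prev_peaks), len(new_peaks)))
    for i, (f0, a0) in enumerate(prev_peaks):
        for j, (f1, a1) in enumerate(new_peaks):
            cost[i, j] = freq_weight * abs(f1 - f0) + amp_weight * abs(a1 - a0)
    rows, cols = linear_sum_assignment(cost)
    return list(zip(rows, cols))

# Example: peaks given as (frequency in Hz, amplitude)
prev = [(440.0, 0.8), (880.0, 0.4)]
new = [(442.0, 0.7), (879.0, 0.5), (1320.0, 0.1)]
print(match_peaks(prev, new))
```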
Authors
Corey Kereliuk, Woody Herman, Russell Wedelich and Daniel Gillespie
Abstract
This paper describes a modification of the ESPRIT algorithm which can be used to determine the parameters (frequency, decay time, initial magnitude and initial phase) of a modal reverberator that best match a provided room impulse response. By applying perceptual criteria we are able to match room impulse responses using a variable number of modes, with an emphasis on high quality for lower mode counts; this allows the synthesis algorithm to scale to different computational environments. A hybrid FIR/modal reverb architecture is also presented which allows for the efficient modeling of room impulse responses that contain sparse early reflections and dense late reverb. MUSHRA tests comparing the analysis/synthesis using various mode numbers for our algorithms, and for another state-of-the-art algorithm, are included as well.
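As a rough sketch of the synthesis side, the snippet below renders a modal reverberator impulse response as a sum of exponentially decaying sinusoids from (frequency, T60, magnitude, phase) tuples; the mode values shown are invented, and the ESPRIT-based analysis is not included.

```python
import numpy as np

def modal_ir(modes, dur, fs=48000):
    """Render a modal impulse response as a sum of decaying sinusoids."""
    t = np.arange(int(dur * fs)) / fs
    ir = np.zeros_like(t)
    for f_hz, t60, mag, phase in modes:
        decay = np.exp(-np.log(1000.0) * t / t60)   # -60 dB after t60 seconds
        ir += mag * decay * np.cos(2 * np.pi * f_hz * t + phase)
    return ir

modes = [(150.0, 1.2, 0.5, 0.0), (300.0, 0.9, 0.3, 0.7)]  # illustrative values
ir = modal_ir(modes, dur=2.0)
```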
Authors
Orchisama Das, Jonathan Abel and Julius Smith III
Abstract
This paper describes a modification of the ESPRIT algorithm which can be used to determine the parameters (frequency, decay time, initial magnitude and initial phase) of a modal reverberator that best match a provided room impulse response. By applying perceptual criteria we are able to match room impulse responses using a variable number of modes, with an emphasis on high quality for lower mode counts; this allows the synthesis algorithm to scale to different computational environments. A hybrid FIR/modal reverb architecture is also presented which allows for the efficient modeling of room impulse responses that contain sparse early reflections and dense late reverb. MUSHRA tests comparing the analysis/synthesis using various mode numbers for our algorithms, and for another state-of-the-art algorithm, are included as well.
Author
Luca Turchet
Abstract
To date, the most successful onset detectors are those based on a frequency representation of the signal. However, for such methods the time between the physical onset and the reported one is unpredictable and may vary greatly according to the type of sound being analyzed. Such variability and unpredictability of spectrum-based onset detectors may not be convenient in some real-time applications. This paper proposes a real-time method to improve the temporal accuracy of state-of-the-art onset detectors. The method is grounded on the theory of hard real-time operating systems, where the result of a task must be reported by a certain deadline. It consists of the combination of a time-based technique (which has a high degree of accuracy in detecting the physical onset time but is more prone to false positives and false negatives) with a spectrum-based technique (which has a high detection accuracy but a low temporal accuracy). The developed hard real-time onset detector was tested on a dataset of single non-pitched percussive sounds using the high frequency content detector as the spectral technique. Experimental validation showed that the proposed approach was effective in better retrieving the physical onset time of about 50% of the hits detected by the spectral technique, with an average improvement of about 3 ms and a maximum of about 12 ms. The results also revealed that the use of a longer deadline may better capture the variability of the spectral technique, but at the cost of greater latency.
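A minimal sketch of the hybrid idea, assuming a simple amplitude trigger as the time-domain technique and high-frequency content as the spectral confirmation, is given below; thresholds, frame sizes and the deadline handling are illustrative rather than the paper's implementation.

```python
import numpy as np

def hf_content(frame):
    """High-frequency content: magnitude spectrum weighted by bin index."""
    mags = np.abs(np.fft.rfft(frame))
    return float(np.sum(np.arange(len(mags)) * mags))

def hybrid_onsets(x, frame=512, hop=128, amp_thresh=0.1, hfc_ratio=1.5):
    """A time-domain amplitude trigger proposes sample-accurate onset positions;
    a rise in HFC over the same frame confirms them before the 'deadline'."""
    onsets, prev_hfc = [], 1e-9
    for start in range(0, len(x) - frame, hop):
        seg = x[start:start + frame]
        time_trigger = np.max(np.abs(seg[:hop])) > amp_thresh
        hfc = hf_content(seg)
        if time_trigger and hfc > hfc_ratio * prev_hfc:
            onsets.append(start)
        prev_hfc = hfc
    return onsets
```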
Authors
João Pereira, Gilberto Bernardes and Rui Penha
Abstract
We present MusikVerb, a novel digital reverberation effect capable of adapting its output to the harmonic context of a live music performance. The proposed reverberation is aware of the harmonic content of an audio input signal and ‘tunes’ the reverberation output to its harmonic content using a spectral filtering technique. The dynamic behavior of MusikVerb avoids the sonic clutter of traditional reverberation and, most importantly, fosters creative endeavor by providing new expressive and musically-aware uses of reverberation. Despite its applicability to any input audio signal, the proposed effect has been designed primarily as a guitar pedal effect and a standalone software application.
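One way to picture the 'tuning' of a reverb to detected harmonic content is a soft spectral mask centred on the harmonics of an estimated fundamental, as in the sketch below; the mask shape, bandwidths and the placeholder reverb_stft_frame variable are assumptions, not MusikVerb's actual filtering.

```python
import numpy as np

def harmonic_mask(n_bins, fs, f0, n_harmonics=10, width_hz=30.0):
    """Soft spectral mask keeping energy near harmonics of the detected f0."""
    freqs = np.linspace(0.0, fs / 2.0, n_bins)
    mask = np.zeros(n_bins)
    for k in range(1, n_harmonics + 1):
        mask += np.exp(-0.5 * ((freqs - k * f0) / width_hz) ** 2)
    return np.clip(mask, 0.0, 1.0)

# tuned_frame = reverb_stft_frame * harmonic_mask(1025, 48000, f0=220.0)
# (reverb_stft_frame is a placeholder complex STFT frame of the reverb output)
```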
Authors
Riccardo Simionato, Juho Liski, Vesa Välimäki and Federico Avanzini
Abstract
A virtual tube delay effect based on the real-time simulation of acoustic wave propagation in a garden hose is presented. The paper describes the acoustic measurements conducted and the analysis of the sound propagation in long narrow tubes. The obtained impulse responses are used to design delay lines and digital filters, which simulate the propagation delay, losses, and reflections from the end of the tube, which may be open, closed, or acoustically attenuated. A study on the reflection caused by a finite-length tube is described. The resulting system consists of a digital waveguide model and produces delay effects with realistic low-pass filtering. A stereo delay effect plugin in PURE DATA has been implemented and is described here.
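A very reduced Python sketch of the idea, assuming a single delay line with a reflection coefficient and a one-pole low-pass loss filter in the feedback path, is shown below; the coefficients are illustrative and not fitted to the hose measurements described in the paper.

```python
import numpy as np

def tube_delay(x, delay_samples, reflection=-0.9, damping=0.4, mix=0.5):
    """Toy waveguide-style tube delay: one delay line whose feedback path
    applies a reflection coefficient and a one-pole low-pass loss filter."""
    buf = np.zeros(delay_samples)
    lp_state = 0.0
    out = np.zeros(len(x))
    idx = 0
    for n, xn in enumerate(x):
        delayed = buf[idx]
        lp_state = (1.0 - damping) * delayed + damping * lp_state  # loss filter
        out[n] = (1.0 - mix) * xn + mix * delayed
        buf[idx] = xn + reflection * lp_state                      # feedback with reflection
        idx = (idx + 1) % delay_samples
    return out
```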
Authors
Philippe Esling, Axel Chemla-Romeu-Santos, Adrien Bitton
Abstract
Timbre spaces have been used in music perception to study the perceptual relationships between instruments based on dissimilarity ratings. However, these spaces do not generalize to novel examples and do not provide an invertible mapping, preventing audio synthesis. In parallel, generative models have aimed to provide methods for synthesizing novel timbres. However, these systems do not provide an understanding of their inner workings and are usually not related to any perceptually relevant information. Here, we show that Variational Auto-Encoders (VAE) can alleviate all of these limitations by constructing generative timbre spaces. To do so, we adapt VAEs to learn an audio latent space, while using perceptual ratings from timbre studies to regularize the organization of this space. The resulting space allows us to analyze novel instruments, while being able to synthesize audio from any point of this space. We introduce a specific regularization that allows enforcing any given set of similarity distances onto these spaces. We show that the resulting space provides distance relationships very similar to those of timbre spaces. We evaluate several spectral transforms and show that the Non-Stationary Gabor Transform (NSGT) provides the highest correlation to timbre spaces and the best quality of synthesis. Furthermore, we show that these spaces can generalize to novel instruments and can generate any path between instruments to understand their timbre relationships. As these spaces are continuous, we study how audio descriptors behave along the latent dimensions. We show that even though descriptors have an overall non-linear topology, they follow a locally smooth evolution. Based on this, we introduce a method for descriptor-based synthesis and show that we can control the descriptors of an instrument while keeping its timbre structure.
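The perceptual regularization can be pictured as an extra loss term that pulls pairwise latent distances toward the target timbre-space distances, as in the hedged PyTorch sketch below; the weighting and exact formulation in the paper may differ.

```python
import torch

def timbre_regularizer(latents, target_dists):
    """Penalize mismatch between pairwise latent distances and target
    perceptual (timbre-space) distances for a batch of examples."""
    d_latent = torch.cdist(latents, latents)        # pairwise Euclidean distances
    return torch.mean((d_latent - target_dists) ** 2)

# Illustrative use with random stand-in values:
z = torch.randn(8, 16, requires_grad=True)          # latent codes for 8 examples
target = torch.rand(8, 8)                           # hypothetical perceptual distances
target = (target + target.T) / 2                    # symmetrize
# total_loss = reconstruction + kl_term + gamma * timbre_regularizer(z, target)
loss = timbre_regularizer(z, target)
loss.backward()
```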
In this improvised concert by Xperimus Ensemble — on this occasion led by Helena Marinho (piano) and including Belquior Guerrero (guitar), Gilberto Bernardes (saxophone), and Luís Bittencourt (percussion) — the musicians will explore extreme settings of submitted Audio Effects as resonant sonic spaces, searching in real time for their unique musical expressivity.
We asked DAFx participants to submit their Audio Effects for this Jam Session, particularly those being presented at this year’s conference (as a VST, Audio Unit, Pure Data or Max/MSP plugin, running on macOS High Sierra), together with an idea for an extreme preset they would like to see explored and instructions on how to use the plugin.
Time | Session | Venue |
---|---|---|
8:30 | DAFx Saturday Trip: Tour of Arouca | University - bus departure |
Saturday, Sept. 8th / Arouca
Arouca is a mountainous region of 328 km² renowned for its outstanding natural and cultural heritage, declared, in its entirety, a UNESCO Global Geopark. It is also a biodiversity hotspot, with 47% of its total area protected under the Natura 2000 Network.
PRICE PER PERSON: €95 (VAT INCLUDED)
GROUP SIZE: up to 50 participants
INCLUDED: private coach, all admissions, coffee break, Arouca Geopark guiding, lunch with wine, pipe organ concert, taste of the Arouca Monastery sweet treats and personal accident insurance.
TOUR LEADER: Eduarda Paz / Managing Director; Biologist (U. of Coimbra), MA in Conservation - Historic Gardens and Landscapes (U. of York)
More info here.
We are pleased to announce that the 21st International Conference on Digital Audio Effects (DAFx2018) will be held at Aveiro, Portugal, on September 4–8 2018.
DAFx2018 is organised by the University of Aveiro, through its Institute of Electronics and Informatics Engineering (IEETA), in collaboration with the Portuguese Audio Engineering Association (APEA). The conference will be hosted at the university campus and will feature oral and poster presentations of accepted papers, keynote addresses, tutorials and demonstrations. The social program – including welcome reception, concert and banquet – will offer opportunities for more informal interaction while enjoying the city and the region.
This annual conference is a coming together of those working across the globe in research on digital audio processing for music and speech, sound design, acoustics and related applications. Original contributions for DAFx2018 are encouraged in, but not limited to, the following topics:
We especially welcome submissions addressing:
Prospective authors are invited to submit full-length papers, eight pages maximum, for both oral and poster presentations, before March 29th, 2018.
Submitted papers must be camera-ready and formatted according to the templates and instructions available at the DAFx2018 website. All papers have to be submitted through the EasyChair conference management system and are subject to peer review. Acceptance may be conditional upon changes being made to the paper as directed by the reviewers. Proceedings with the final versions of the accepted contributions will be made freely accessible on the DAFx2018 website after the conference closure.
Volumes 2008 to 2017 of DAFx proceedings are now indexed in Scopus and this will apply similarly to DAFx2018 proceedings. Extended versions of the best DAFx2018 papers will be given special consideration for publication in the Journal of the Audio Engineering Society.
Important dates:
PDF version of the CFP can be found here. Paper Templates (and instructions) can be found here. Any questions can be sent to dafx2018_papers@ua.pt. Publication Ethics and Malpractice Statement here.
DAFx18 Programme Committee
DAFx Board
# | Hotel | Stars |
---|---|---|
H1 | Hotel As Américas (Recommended) | ✻ ✻ ✻ ✻ |
H2 | Meliá Ria Hotel & Spa | ✻ ✻ ✻ ✻ |
H3 | Hotel Imperial | ✻ ✻ ✻ |
H4 | Hotel das Salinas | ✻ ✻ |
H5 | Hotel José Estêvão | ✻ ✻ |
H6 | HI Hostel Aveiro - Pousada da Juventude (Youth Hostel) | - |
Rooms Prices
- Individual: 76€ per night
- Double: 94€ per night
Reservations (and other info): www.hotelasamericas.com
Note: when making the reservation, don't forget to mention the DAFx 2018 event to receive the best price.
Rooms Prices*
- Individual: 84€ per night (taxes included)
- Double: 94€ per night (taxes included)
* When making the reservation, don't forget to mention the DAFx 2018 event to receive the best price; the reservation must be made using the contacts below and must be paid directly by the DAFx 2018 participant(s).
Reservations
- E-mail: melia.ria@meliaportugal.com
- Phone: (+351) 234 401 000
More info: https://www.meliaria.com/
Rooms Prices
- Individual: 39.5€ per night
- Double: 55€ per night
Reservations
- E-mail: reservas@hotelimperial.pt
- Phone: (+351) 234 380 159
Note: when making the reservation, don't forget to mention the DAFx 2018 event to receive the best price.
Rooms Prices
- Individual: 50€ per night
- Double (one double bed): 65€ per night
Studios Prices
- Individual: 55€ per night
- Double (two individual beds): 75€ per night
Reservations
- E-mail: reservas@hoteldassalinas.com
- Phone: (+351) 234 404 190
More info: http://www.hoteldassalinas.com/
Note: when making the reservation, don't forget to mention the DAFx 2018 event to receive the best price.
Rooms Prices
- Individual: 50€ per night
- Double: 60€ per night
Reservations (and other info): http://www.joseestevao.com/
Note: when making the reservation, don't forget to mention the DAFx 2018 event to receive the best price.
Rooms prices: starting at 13,30€
Reservations (and other info): https://pousadasjuventude.pt/pt/pousadas/aveiro/
From the north using the A1 motorway or from the east using the IP5/A25. Take the A1 motorway toward Lisbon. Exit the A1 toward Aveiro and take the A25 motorway. There are two exits to Aveiro from the A25: "Aveiro-Norte" and some kilometres further on, "Aveiro". This second exit is best for reaching the University of Aveiro. (The University is near the hospital).
From the south using the A1 motorway: take the A1 motorway toward Porto. Exit the motorway at "Aveiro-Sul/Águeda" (exit 15) and follow the EN235 road directly to the University Campus (The University is near the hospital). From the south using the A8 and A17 motorways: exit the motorway at "Aveiro-Sul” and follow the EN235 road directly to the University campus (The University is near the hospital).
The closest airport is Francisco Sá Carneiro International Airport in Porto, located some 70 km to the north of Aveiro. Humberto Delgado International Airport in Lisbon is located 250 km to the south of Aveiro.
From Porto to Aveiro
The trip between the airport and the Porto-Campanhã railway station, which has direct rail connections to Aveiro, can be done by taxi for around 20€ and takes around 30 minutes. A less expensive way to reach Porto-Campanhã is the Metro, which has a terminal at the airport. The Metro trip takes about 32 minutes and costs €2,60 (rechargeable ‘Andante’ card for €0,60 plus a Z4 ticket for €2,00). Click here for Metro information. The train journey to Aveiro takes between 40 minutes and 1h15, depending on the type of train. There are regular trains to Aveiro from Porto. Click here for train timetables.
From Lisbon to Aveiro
If you're arriving by plane, the simplest way to reach Aveiro is by train from Lisboa-Oriente railway station. A taxi ride to the train station takes about 10 minutes and costs around 10€, but there are less expensive alternatives, such as the bus (lines 5 and 44). Click here for train timetables.
To reach the University Campus
Aveiro railway station is located about 20 minutes' walk, or a 5-minute taxi ride, from the University Campus. To reach the Campus, you can also use the bus (green line), which departs at regular intervals from just outside the railway station.