1151

Dr.Godfried-Willem RAES

Course in Experimental Music: Volume 1: Algorithmic Composition

Hogeschool Gent: Department of Music & Drama


1151:

Digital Sound Processing

In the previous section we discussed a number of methods for generating sounds and noises by programming. Here we deal with some techniques and methods that can be used to further process and transform such sounds and noises.

Our starting point is the assumption that we have sample files stored in the computer. These files can be the result of calculations such as those in the previous section. They may just as well have been produced by recording sounds from 'acoustic reality' and converting them with an ADC.

For a good understanding of the techniques treated here, it is advisable to study Volume 2 of this syllabus first, so that you become familiar with the electronic concepts and techniques of signal processing. Their digital implementations always build on a knowledge of their electronic equivalents.

Operations on digital samples

1. Pitch operations

Given a collection of samples, the playback rate determines the pitch of the sound. The most obvious way to reproduce a given sound over the entire audio range is therefore to change (and, if desired, modulate) the sampling rate during playback. Halving the playback rate transposes the sound one octave down; doubling it transposes the sound one octave up.

Simple as this may sound, this approach has quite a few drawbacks and limitations:

1. The duration of the sound is now coupled to its pitch.

2. When we transpose the sound downward, we also reduce its resolution: the number of available samples per unit of time decreases. As a result, the sampling noise increases sharply.

3. Once the playback sampling rate is lowered so far that it falls within the audio range, it becomes audible as an (annoying) tone in the signal. The only remedy is a higher-order low-pass filter whose cutoff frequency is set to half the playback sampling rate.

4. The formant of the original sound is transposed along with it, so the characteristic features of its timbre are largely lost. The resulting sound immediately starts to sound 'electronic'. After all, this phenomenon does not occur with any acoustic instrument!
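The coupling of pitch and duration (drawback 1) can be made concrete with a little arithmetic. The following Python sketch is only an illustration. The function name is hypothetical, and equal-tempered semitones (frequency ratio 2^(n/12)) are assumed. For a sound of a given number of samples, it computes the playback rate needed for a transposition and the resulting duration.

```python
# Sketch: naive transposition by changing only the playback sampling rate.
# Assumption: equal-tempered semitones, frequency ratio 2 ** (semitones / 12).
# The function name is hypothetical, chosen for this illustration.


def naive_transpose(n_samples: int, rate: float, semitones: float) -> tuple[float, float]:
    """Return (playback_rate, duration_in_seconds) for a naive transposition."""
    ratio = 2 ** (semitones / 12)          # frequency ratio of the transposition
    playback_rate = rate * ratio           # the DAC is clocked faster or slower
    duration = n_samples / playback_rate   # same samples, different clock
    return playback_rate, duration


sr = 44100.0
for st in (-12, 0, 12):
    r, d = naive_transpose(44100, sr, st)  # one second of sound at the original rate
    print(f"{st:+d} semitones: rate {r:.0f} Hz, duration {d:.2f} s")
```

An octave down halves the rate to 22050 Hz and doubles the duration. Two octaves down give 11025 Hz, which already lies well inside the audio range (drawback 3).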

Clearly, a musically more acceptable transposition calls for other techniques. These techniques may differ depending on the origin of the sample collection we want to transpose.

If we produced our samples purely algorithmically, the appropriate way to vary the pitch is to set it at the level of the sound-synthesis algorithm itself. The pitch to be generated must be a parameter of the synthesis algorithm, and a different sample is then computed for each pitch.
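As an illustration of making pitch a parameter of the synthesis algorithm, here is a minimal Python sketch. The additive recipe (harmonic sine partials with amplitude 1/k) and the name synth_tone are arbitrary stand-ins chosen for brevity. They are not a method taken from this course. The point is only that a new sample array is computed for every pitch, while the duration stays independent of it.

```python
import math

# Sketch: pitch as a parameter of a (hypothetical) synthesis algorithm.
# The 1/k harmonic recipe is an illustrative assumption, not the course's method.


def synth_tone(freq: float, duration: float, rate: float = 44100.0,
               partials: int = 5) -> list[float]:
    """Compute a fresh sample array for one pitch (freq in Hz)."""
    n = int(round(duration * rate))      # the length depends on duration only
    out = []
    for i in range(n):
        t = i / rate
        s = 0.0
        for k in range(1, partials + 1):
            if k * freq < rate / 2:      # leave out partials above Nyquist (no aliasing)
                s += math.sin(2 * math.pi * k * freq * t) / k
        out.append(s)
    return out


# One array per pitch, all with the same length:
a4 = synth_tone(440.0, 0.5)
a3 = synth_tone(220.0, 0.5)
print(len(a4), len(a3))  # 22050 22050
```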

For samples of acoustic origin, too, it is in principle better to start from a set of samples as large as the number of pitches for which we want to use them, that is, one recorded sample per pitch.

If we nevertheless want to derive from a single sample a series of other samples with different pitches, the best method is to compute these derived samples by interpolation. A simple example should make this clear:

Assume a sample array Samp(i) consisting of 64 samples (i = 1 to 64).

Suppose the original sample rate was 44.1 kS/s. The duration of a single sample (the sample period) is then Period = 1/44100 s ≈ 22.68 microseconds.

The [pseudo]code for normal playback would look as follows:

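t = TIMER                     ' start time; a timer with microsecond resolution is assumed
FOR i = 1 TO 64
  t = t + Period              ' deadline for the next sample (no cumulative drift)
  OUT DAC_adres, Samp(i)      ' send sample i to the DAC
  DO: LOOP UNTIL TIMER >= t   ' busy-wait until one sample period has elapsed
NEXT i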

Transposing one octave down using (linear) interpolation then proceeds as follows:

t = TIMER
FOR i = 1 TO 63
  t = t + Period
  OUT DAC_adres, Samp(i)                       ' original sample
  DO: LOOP UNTIL TIMER >= t
  t = t + Period
  OUT DAC_adres, (Samp(i) + Samp(i + 1)) / 2   ' linearly interpolated midpoint
  DO: LOOP UNTIL TIMER >= t
NEXT i

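Note that the playback sampling rate now stays constant! What changes is the number of samples played. Each pair of neighboring samples gets an interpolated value in between, so 64 samples become 126.

Compare this with the primitive way of transposing mentioned first, which in pseudocode reads as follows: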

t = TIMER
FOR i = 1 TO 64
  t = t + Period * 2          ' twice the sample period: half the playback sampling rate
  OUT DAC_adres, Samp(i)
  DO: LOOP UNTIL TIMER >= t
NEXT i

Clearly, the interpolation technique is easiest to implement when the required transposition can be expressed as a ratio of integers. Chromatic transpositions in equal temperament require ratios that are powers of the twelfth root of two (2^(n/12) for n semitones), which demands a lot of computing power. Look-up tables are therefore the usual technique here.
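The look-up table and the interpolation of the pseudocode above can be combined in a small offline sketch. It is written in Python rather than in the BASIC-like pseudocode of this course, and the names SEMITONE_RATIO and transpose_linear are hypothetical. The table holds 2^(n/12) for n = -12 to +12. For every output sample, a fractional read position advances by that ratio, and each output value is linearly interpolated between the two neighboring input samples. No low-pass filtering is done, so upward transpositions of sounds with strong high partials will alias.

```python
# Sketch: transposition by linear interpolation with a semitone look-up table.
# Names and the table range (-12 .. +12 semitones) are illustrative assumptions.

# Equal-tempered ratios 2 ** (n / 12), computed once; index 0 is -12 semitones.
SEMITONE_RATIO = [2 ** (n / 12) for n in range(-12, 13)]


def transpose_linear(samp: list[float], semitones: int) -> list[float]:
    """Resample `samp` so that, played back at the unchanged original sampling
    rate, it sounds `semitones` higher (negative: lower). The number of samples
    changes by the inverse of the frequency ratio."""
    step = SEMITONE_RATIO[semitones + 12]   # read increment per output sample
    last = len(samp) - 1
    out = []
    pos = 0.0
    while pos <= last:
        i = int(pos)                        # left neighbor
        frac = pos - i                      # weight of the right neighbor
        if i < last:
            out.append(samp[i] + frac * (samp[i + 1] - samp[i]))
        else:
            out.append(samp[i])             # exactly on the last sample
        pos += step
    return out


print(transpose_linear([0.0, 2.0, 4.0, 6.0], -12))  # [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
print(transpose_linear([0.0, 2.0, 4.0, 6.0], 12))   # [0.0, 4.0]
```

For -12, the function reproduces the midpoint averaging of the octave-down pseudocode. From 64 input samples it returns 127 values: the 126 of the pseudocode plus the final original sample.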

The interpolation technique shown here in simplified form keeps the playback sampling rate constant, which addresses drawbacks 2 and 3 above. On its own, however, it still changes the number of samples, and hence the length, in proportion to the transposition. The duration is preserved only in two cases: when a short periodic waveform (such as the 64-sample array of the example) is repeated for as long as the note lasts, or when the interpolation is combined with time-stretching (see below). The technique is also widely used in digital recording, for instance for pitch correction of recorded material.

But even with the most sophisticated mathematical interpolation techniques, we can rarely go further than one octave down or up without a strong deviation from the original timbre. Small shifts, on the other hand, are hardly perceptible to the ear as electronic manipulations of the signal.

More sophisticated pitch-transposition techniques could, for example, use a formant analysis of the original signal (sample). After the transposition, the original formant is restored in the result by means of a (programmable) parametric filter. This, however, requires some insight into spectral analysis and digital filters (see further on). The methods presented here are limited to transpositions computed in the time domain. More sophisticated methods carry out the transposition in the frequency domain. The sample data (time domain) are converted by a Fourier transform into a spectrum, which is then shifted in a way that usually depends on the content (e.g. the fundamental frequency band can be shifted while the formant is preserved...). This shifted spectrum is then converted back to the time domain, and thus to sample data, by an inverse Fourier transform.
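The frequency-domain approach can be illustrated on a single frame. The Python sketch below is a deliberately crude example, not a usable pitch shifter. To stay self-contained, it uses a plain O(N^2) DFT instead of an FFT. It simply moves the content of every bin k to bin round(k * ratio), which multiplies all frequencies by the same ratio. It does none of the content-dependent (formant-preserving) shifting mentioned above. Moreover, when a real signal is processed frame by frame, the phases of successive frames no longer fit together; keeping track of those phases is what the phase vocoder does. The function names are hypothetical.

```python
import cmath
import math

# Sketch: crude frequency-domain scaling of one frame (names are illustrative).


def dft(x):
    """Plain O(N^2) discrete Fourier transform (stand-in for an FFT)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(X):
    """Inverse of dft()."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def scale_spectrum(frame, ratio):
    """Move the content of bin k to bin round(k * ratio) for one frame,
    keeping conjugate symmetry so that the result is real again."""
    n = len(frame)
    X = dft(frame)
    Y = [0j] * n
    Y[0] = X[0]                         # keep the DC component
    for k in range(1, n // 2):          # positive-frequency bins (Nyquist bin skipped)
        j = round(k * ratio)
        if 0 < j < n // 2:              # discard content pushed past Nyquist
            Y[j] += X[k]
            Y[n - j] += X[n - k]        # mirrored negative-frequency bin
    return [v.real for v in idft(Y)]


N = 64
frame = [math.sin(2 * math.pi * 4 * t / N) for t in range(N)]   # a sine at bin 4
shifted = scale_spectrum(frame, 1.5)                            # now at bin 6
```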

There is no universal algorithmic method for implementing pitch transformation or time-stretching (touched on below). The algorithm to use depends entirely on what you want to achieve, on the computing power at your disposal ...

2. Time-stretching

Time-stretching is the complementary operation to pitch transposition. The pitch of the original signal is preserved, but its duration can be changed.

Here too there is an almost trivial and obvious technique: looping. The pseudocode for it could look as follows:

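duur = TIMER + Tim            ' Tim: desired total duration in seconds
t = TIMER
DO
  FOR i = 1 TO 64             ' play the whole sample once...
    t = t + Period
    OUT DAC_adres, Samp(i)
    DO: LOOP UNTIL TIMER >= t
  NEXT i
LOOP UNTIL TIMER >= duur      ' ...and repeat it until the duration is reached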

To shorten a sample, we can play only part of it (FOR i = 1 TO 32, for example). Alternatively, we can play only every n-th sample (FOR i = 1 TO 64 STEP n), although this also raises the pitch by a factor n, just as the playback-rate method does.

Here again, simplicity is about the only virtue of this technique. For most sound material, looping over the whole sample produces either a 'rhythmic' pulse or an added tone. Which one depends on the properties of the sample and on how far we want to stretch it. The loop restarts (sample rate / loop length) times per second: when that rate lies in the audio range it is heard as a tone, and when it is lower it is heard as a pulse.
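Both the looping stretch and the reason for its artifacts fit in a few lines. In this Python sketch, the function names are hypothetical. loop_stretch repeats the whole sample, as the looping pseudocode does, and loop_rate gives the number of loop restarts per second. For the 64-sample array of the earlier example at 44.1 kS/s, that is about 689 restarts per second. This lies well inside the audio range and is therefore heard as an added tone. A loop lasting half a second restarts only twice per second and is heard as a pulse.

```python
# Sketch: naive time-stretching by looping (function names are illustrative).


def loop_stretch(samp: list[float], n_out: int) -> list[float]:
    """Repeat the whole sample until n_out output samples exist
    (the last pass is cut off where n_out is reached)."""
    return [samp[i % len(samp)] for i in range(n_out)]


def loop_rate(loop_len: int, rate: float = 44100.0) -> float:
    """Loop restarts per second; roughly above 20 Hz this is heard as a tone,
    below it as a rhythmic pulse (approximate perceptual threshold)."""
    return rate / loop_len


print(loop_stretch([1.0, 2.0, 3.0], 7))  # [1.0, 2.0, 3.0, 1.0, 2.0, 3.0, 1.0]
print(loop_rate(64))                     # 689.0625
```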

As with pitch transposition, we again have to rely on interpolation techniques to compute and insert the 'missing' samples. Here too, more sophisticated algorithms work by manipulation in the frequency domain.
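A first refinement replaces the hard jumps of looping with crossfades. It is sketched below in Python as an illustration; the frame length, hop size and function name are arbitrary assumptions. Hann-windowed, overlapping segments are read from the input at one hop size and added to the output at another (plain overlap-add, OLA). Strictly speaking this is not sample interpolation, but it serves the same purpose of filling in 'missing' material. Because the segment boundaries are not aligned with the pitch period, it still produces phasing artifacts on pitched material. The pitch-synchronous methods discussed in the FAQ below try to avoid exactly that.

```python
import math

# Sketch: plain overlap-add (OLA) time-stretching.
# Frame length, hop size and the function name are illustrative assumptions.


def ola_stretch(x: list[float], factor: float,
                frame: int = 1024, hop_out: int = 256) -> list[float]:
    """Time-stretch x by `factor` (> 1 makes it longer) with plain overlap-add."""
    hop_in = hop_out / factor                       # read hop in the input
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame) for n in range(frame)]
    n_frames = max(1, int((len(x) - frame) / hop_in) + 1)
    out_len = (n_frames - 1) * hop_out + frame
    out = [0.0] * out_len
    norm = [0.0] * out_len                          # summed window, for gain correction
    for f in range(n_frames):
        start_in = int(round(f * hop_in))
        start_out = f * hop_out
        for n in range(frame):
            idx = start_in + n
            s = x[idx] if idx < len(x) else 0.0     # zero-pad past the end of x
            out[start_out + n] += window[n] * s
            norm[start_out + n] += window[n]
    return [o / w if w > 1e-8 else 0.0 for o, w in zip(out, norm)]


y = ola_stretch([1.0] * 44100, 2.0)
print(len(y))   # 87040
```

With a factor of 2.0, a one-second input (44100 samples) yields 87040 samples. This is slightly less than twice the length, because only whole frames are read from the input.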


Students interested in this subject are advised to follow, on the internet, the technical discussions on DSP algorithms for digital audio, which are as interesting as they are intense. The web page reproduced here offers some good starting points for a search: http://www.prosoniq.com/time-pitch-faq.html (This link was valid as of 4 March 1998. The links in the reproduced page that follows have not (yet) been rechecked.)

Time/Pitch Scaling FAQ/References Page

written by Stephan M. Sprenger

Introduction

As opposed to pitch transposition achieved by a simple sample rate conversion, pitch scaling is a way to change the pitch of a signal without changing its length. In practical applications, this is achieved by changing the length of a sound using one of the methods below and then performing a sample rate conversion to change the pitch.

There are several fairly good methods for time/pitch scaling, but none of them performs well on all kinds of signals and for any desired amount of scaling. Good algorithms allow pitch shifting by up to 5 semitones on average, or stretching the length by 130%. When time/pitch scaling recordings of a single instrument, you might even achieve 200% time scaling with no audible loss in quality.

Techniques Used

Currently, most applications employ one of two different time/pitch scaling schemes:

Phase Vocoder. This method was introduced by Flanagan and Golden in 1966. It uses a Short Time Fourier Transform (STFT) to convert the audio signal to its complex Fourier representation. The FFT returns the frequency-domain representation of the signal on a fixed frequency grid. The actual frequencies of the partials therefore have to be found by converting the relative phase change between two successive FFT outputs into actual frequency deviations. (Note that the term 'partial' here has nothing to do with the harmonics of the signal. In fact, an FFT will never give any information about true harmonics unless the FFT length is matched to (a multiple of) the fundamental period of the signal. Even then, its frequency-domain resolution is quite different from what our ear and auditory system perceive.) The timebase of the signal is changed by interpolating and recalculating the frequency changes in the Fourier domain on a different time basis. An IFFT is then performed to regain the time-domain representation of the signal.
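The conversion of phase change into actual frequency can be sketched for a single FFT bin. This Python example is an illustration under stated assumptions: Hann window, frame length 1024, hop 256 and 44.1 kHz, with hypothetical function names. It computes one DFT bin directly rather than calling an FFT library. Between two frames, a partial exactly on bin k would advance in phase by 2π·k·hop/N. The wrapped deviation from that expected advance gives the fractional distance of the partial from the bin center. The estimate is unambiguous only while the partial lies within N/(2·hop) bins of bin k (here 2 bins).

```python
import cmath
import math

# Sketch: phase-vocoder frequency estimate for one bin.
# Window, frame length, hop and function names are illustrative assumptions.


def hann_bin(frame: list[float], k: int) -> complex:
    """DFT bin k of one Hann-windowed frame (a single bin, not a full FFT)."""
    n = len(frame)
    return sum((0.5 - 0.5 * math.cos(2 * math.pi * t / n)) * s
               * cmath.exp(-2j * math.pi * k * t / n)
               for t, s in enumerate(frame))


def bin_true_frequency(x: list[float], k: int, start: int = 0,
                       n: int = 1024, hop: int = 256,
                       rate: float = 44100.0) -> float:
    """Estimate the actual frequency (Hz) of the partial in bin k from the
    phase change between two frames that start `hop` samples apart."""
    p1 = cmath.phase(hann_bin(x[start:start + n], k))
    p2 = cmath.phase(hann_bin(x[start + hop:start + hop + n], k))
    expected = 2 * math.pi * k * hop / n              # advance for a bin-centered partial
    dev = p2 - p1 - expected
    dev = (dev + math.pi) % (2 * math.pi) - math.pi   # wrap into [-pi, pi)
    return (k + dev * n / (2 * math.pi * hop)) * rate / n


sr = 44100.0
x = [math.sin(2 * math.pi * 1000.0 * t / sr) for t in range(2048)]
print(bin_true_frequency(x, 23))   # close to 1000.0, although bin 23 is centered on ~990.5 Hz
```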

Pointers:

Jean Baptiste Joseph Fourier bio

http://capella.dur.ac.uk/doug/fourier.html

Discrete time FT basics

http://cnmat.CNMAT.Berkeley.EDU/~alan/MS-html/MSv2_ToC.html

Phase vocoder algorithms are used mainly in scientific and educational software products (to show the use and limitations of the FFT). They have severe drawbacks and introduce a considerable amount of artifacts (even at low expansion ratios) due to the interpolation that must be used to change the timebase.

Related topics
There is often some confusion between the 'regular' (channel) vocoder and the phase vocoder. They differ in that they are used to achieve different effects. The channel vocoder uses two input signals to produce a single output channel, while the phase vocoder has a one-in, one-out signal path. In the channel vocoder as applied to music processing, the modulator input signal is split into different filter bands. Their amplitudes modulate the (usually) corresponding filter bands into which the carrier signal is split. More sophisticated (and expensive) approaches also separate voiced and unvoiced components in the modulator input (for historical reasons also called 'speech'), i.e. vowels and sibilants, for independent processing. The channel vocoder cannot be applied successfully to the time/pitch scaling problem. In a musical context, it is mainly a device for analyzing the formant frequencies of one sound and imposing them on another. The two are similar in that both use filter banks (the FFT can be seen as a filter bank consisting of steep, slightly overlapping filters). However, channel vocoders typically use at most 22 bands, while a phase vocoder usually employs at least 512 or 1024.

Pointers:

The MIT Lab Phase Vocoder

http://mars.let.uva.nl/gather/accci/60/60-index.txt.html (Manual)
http://mars.let.uva.nl/gather/accci/60/60-read.txt.html (References to more literature)

Pointers:

SMS sound processing package (incl. C source code)

http://www.iua.upf.es/eng/recerca/mit/sms/

Lemur (Mac program along with references and documentation)

http://datura.cerl.uiuc.edu/Kelly/ICMC95/TimbreManipulationTool.html

However, in today's commercial music/audio DSP software you will most likely find the technique of

Time Domain Harmonic Scaling. This is based on a method proposed by Rabiner and Schafer in 1978. The short-time autocorrelation of the signal is computed, and the fundamental period is found by picking its maximum. (Alternatively, one can use the short-time Average Magnitude Difference Function and find its minimum. This is faster on typical CISC-based computer systems but usually yields lower audio quality.) The timebase is changed by copying the input to the output in an overlap-and-add manner. At the same time, the input pointer is incremented by the overlap size minus a multiple of the fundamental period. As a result, the input is traversed at a different speed than the one at which the original data were recorded, while staying aligned to the basic period estimated by the method above. This algorithm works well with signals that have a prominent fundamental frequency, and it can be used with all kinds of signals consisting of a single signal source. For mixed-source signals, this method produces satisfactory results only if the overlapping segments are made large enough to include several cycles. This averages the phase error over a longer segment and makes it less audible.
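The period-estimation step described above can be sketched in Python as follows. The function names and the lag search range are assumptions made for this illustration. The sketch only estimates the period; the overlap-and-add splicing of TDHS that uses this estimate is not shown. Restricting the search range matters, because the autocorrelation also peaks at multiples of the period (and the AMDF also dips there), which is one source of octave errors.

```python
import math

# Sketch: short-time autocorrelation and AMDF period estimation.
# Function names and the lag range are illustrative assumptions.


def acf_period(x: list[float], min_lag: int, max_lag: int) -> int:
    """Return the lag (in samples) with the largest mean product
    x[i] * x[i + lag], taken as the fundamental period."""
    n = len(x)
    best_lag, best = min_lag, -math.inf
    for lag in range(min_lag, max_lag + 1):
        r = sum(x[i] * x[i + lag] for i in range(n - lag)) / (n - lag)
        if r > best:
            best_lag, best = lag, r
    return best_lag


def amdf_period(x: list[float], min_lag: int, max_lag: int) -> int:
    """Return the lag with the smallest mean |x[i] - x[i + lag]|
    (no multiplications inside the sum), taken as the period."""
    n = len(x)
    best_lag, best = min_lag, math.inf
    for lag in range(min_lag, max_lag + 1):
        d = sum(abs(x[i] - x[i + lag]) for i in range(n - lag)) / (n - lag)
        if d < best:
            best_lag, best = lag, d
    return best_lag


sr = 44100
x = [math.sin(2 * math.pi * 441 * t / sr) for t in range(1024)]  # period: 100 samples
print(acf_period(x, 50, 150), amdf_period(x, 50, 150))            # 100 100
```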
For Time Domain Harmonic Scaling the basic problem is estimating the basic pitch period of the signal, especially in cases where the actual fundamental frequency is missing. Numerous pitch estimation algorithms have been proposed and can be found in the following references:

With C source code

- 'C Algorithms for Realtime DSP' by Paul M. Embree, Prentice Hall, 1995
- 'Numerical Recipes in C' by W. Press, S. Teukolsky, W. Vetterling, B. Flannery, Cambridge University Press, 1988/92

Without source code

- 'Digital Processing of Speech Signals' by L.R. Rabiner and R.W. Schafer, Prentice Hall, 1978

There are other methods not mentioned here, including some wavelet-based timebase conversion algorithms. They are currently of minor importance in practical applications, however, since most of them are still under development. For a free PowerMacintosh time/pitch scaling developer object library, visit our InTimePro page.

Timbre Correction

Since timbre (formant) correction is closely related to pitch scaling, it is also discussed here. Formants are prominent frequency regions, produced by resonances in the instrument's body, that largely determine the timbre of a sound. In the human voice, they come from the resonances and cancellations of the vocal tract and contribute to the specific character of a speaker's or singer's voice.

For more details, visit our Formant Correction page.

If the pitch of a recording is shifted, the formants move with it. This produces the well-known 'Mickey Mouse' effect that is audible when pitch is shifted upward. It is usually an unwanted side effect, since the formants of a human voice singing at a higher pitch do not change their position.

To compensate for this, there exist formant correction algorithms that restore the position of the formant frequencies after or during the pitch scaling process. They also allow changing the gender of a singer by scaling formants without changing pitch.
Some of them work on the basis of a method detailed in
'A Detailed Analysis of a Time-Domain Formant-Corrected Pitch-Shifting Algorithm', by Robert Bristow-Johnson, Journal of the Audio Engineering Society, May 1995.

This paper was announced to appear online. Note that most of these algorithms only allow correcting the timbre of musically monophonic material. A method for timbre manipulation of polyphonic material can be found in the Emagic Time Machine II.

Other methods have been proposed in

http://ccrma-www.stanford.edu/CCRMA/Overview/node34.html

or can be achieved using the SMS package found at

http://www.iua.upf.es/eng/recerca/mit/sms/

Commercial software applications of these or related algorithms can be found at

http://www.motu.com (product:Digital Performer 1.7)
http://www.emagicusa.com (product:Notator Logic Audio 2.6)
http://www.emagic.de (product:Notator Logic Audio 2.6)
http://www.prosoniq.com (product:sonicWORX Artist and Studio)
http://www.steinberg.de (product:TimeBandit 2.5)
http://www.wavemechanics.com (product:PurePitch TDM PlugIn)

The following newsgroups can be accessed for more information and help on time/pitch scaling:

comp.dsp
comp.music.research

Filedate: 960728/971129/1998-09-07
