Jasmine Music Technology - JMT Tips for using Yamaha Vocaloid

1. These notes are based on experiments with the LEON library only. However, we assume they also hold for LOLA and MIRIAM, since they concern the synthesis itself.

2. These observations primarily concern the acoustic parameters used to control synthesis; they do not address the phonetic or linguistic peculiarities of Vocaloid.

3. Yamaha Vocaloid is in fact a special synthesizer whose control parameters are far more complicated than, and different from, those of ordinary synthesizers. Vocaloid parameters can be regarded as purely acoustic and are not designed to be user-friendly. For example, if you raise the Amplitude of a Resonance freely, you will overload the output, whereas this could be avoided if a change in one parameter were automatically compensated by a change in another. Since all Vocaloid parameters are closely interconnected, we do not give the exact values needed to achieve a particular effect; we only indicate the direction in which to change them.

How can I export a Vocaloid User Dictionary I have created to another PC (e.g. to share it with a friend), or import someone else's user dictionary to my PC?
A user-created dictionary is saved as a <file name>.udc file in the UDIC folder of the VOCALOID directory under PROGRAM FILES. Just copy this file to any storage device, such as a flash drive (the file is small enough to send by email), and paste it into the corresponding UDIC folder on the other PC. Note that the UDIC folder is normally hidden, so to see it you have to enable the SHOW HIDDEN FILES option.
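If you move dictionaries often, the copy step can be scripted. The sketch below is a minimal illustration, assuming Python 3 is installed; the function name `copy_user_dictionaries` is hypothetical, and both folder paths are supplied by you because the actual UDIC location depends on your installation. It copies only files ending in `.udc`, as described above.

```python
from __future__ import annotations

import shutil
from pathlib import Path


def copy_user_dictionaries(src_udic: Path, dst_udic: Path) -> list[Path]:
    """Copy every *.udc user dictionary from one folder to another.

    Both paths come from the caller; the UDIC location depends on your own
    installation (the folder may be hidden in Explorer, but scripts can still read it).
    """
    dst_udic.mkdir(parents=True, exist_ok=True)
    copied = []
    for udc in sorted(src_udic.glob("*.udc")):
        target = dst_udic / udc.name
        shutil.copy2(udc, target)  # copy2 also keeps the file's timestamps
        copied.append(target)
    return copied
```

For example, `copy_user_dictionaries(Path(r"C:\Program Files\VOCALOID\UDIC"), Path(r"E:\UDIC"))` would copy the dictionaries to a flash drive mounted as E: (both paths are hypothetical). Copying in the other direction, into PROGRAM FILES, may require administrator rights.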

Why does the result have little in common with live singing?
Indeed, unless you add Vibrato, Attack and other "expressive elements", Vocaloid sounds far from natural. Without them it provides only a "correct static synthesis of vocal speech", i.e. a correct melody and correct speech. However:
1) No vocalist in the world can produce a static sound. An amateur singer has plenty of instabilities and distortions in pitch, volume and spectrum. A professional also has many instabilities, but these are precisely the "expressive elements" that a "correct static synthesis of vocal speech" lacks.
2) Every professional singer, and even every amateur vocalist, has his or her own unique performing nuances. In terms of acoustic parameters, these are coordinated changes in Pitch, Spectrum, Volume accents and breathing, i.e. the very opposite of a static sound. Moreover, these changes are often slight and go unnoticed by an untrained ear. Yet, as in cooking, a pinch of spice may go unnoticed, but without it the food tastes less delicious.
3) "Expressive elements" matter more in "creating" a performer than his or her initial samples/spectrum. We would never recognize Tom Jones, Mariah Carey, Sting or Celine Dion if their voices were stripped of the "expressive elements" while their static samples/spectrum remained intact. The result you get in Vocaloid without Vibrato, Attack and other additions is a clear illustration of this.

What does "Resonances" mean?
When producing different vowels, the human vocal cords generate essentially identical oscillations, i.e. the waves for A, E, I, O, U are the same; the vowels get their characteristic color from the filtering of these waves by the speech resonators. We can regard the vowels in Vocaloid as waves that have already been filtered in this way, and Resonances as additional filters that let you adjust the timbre. Vocaloid has four such filters (speech resonators) with frequencies from about 350 to 3500 Hz; the first is the lowest and the fourth is the highest. Each filter has three parameters: Frequency, Width and Amplitude. Those familiar with classic synthesizers such as the Moog can picture four adjustable resonant (band-pass) filters in place of the single adjustable filter (the so-called Moog filter). Increasing the Frequency value raises the filter's center frequency, and decreasing it lowers the frequency.
Increasing the Width value broadens the bandwidth (i.e. more harmonics fall within the filter's active zone), and decreasing it narrows the band.
Increasing the Amplitude value boosts the harmonics within the filter's active zone, so that more of them stand out; decreasing it attenuates them.
By changing the Resonances parameters you can significantly change the character of the sound, even producing special effects such as "Wah-Wah", "Robot Voice", "Throat Singing" and so forth.
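To make the three Resonance parameters concrete, here is a toy model of a single resonance. It is only an illustration and does not describe Yamaha's actual filter design: the bell-shaped (Gaussian) response and the rule that a harmonic is "in the active zone" when it lies within the half-maximum band are assumptions, and the class name `Resonance` is hypothetical.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class Resonance:
    """Toy model of one resonance: center frequency and width in Hz, linear amplitude."""
    frequency: float
    width: float
    amplitude: float

    def gain(self, f: float) -> float:
        """Assumed bell-shaped (Gaussian) response; `width` is the full width at half maximum."""
        sigma = self.width / (2 * math.sqrt(2 * math.log(2)))
        return self.amplitude * math.exp(-((f - self.frequency) ** 2) / (2 * sigma ** 2))

    def active_harmonics(self, f0: float, max_hz: float = 5000.0) -> list[int]:
        """Harmonic numbers of fundamental f0 lying inside the half-maximum band."""
        n_max = int(max_hz // f0)
        return [n for n in range(1, n_max + 1)
                if abs(n * f0 - self.frequency) <= self.width / 2]


if __name__ == "__main__":
    r3 = Resonance(frequency=2500.0, width=400.0, amplitude=1.0)
    print(r3.active_harmonics(220.0))  # [11, 12]
    r3.width = 800.0                   # wider band: more harmonics inside
    print(r3.active_harmonics(220.0))  # [10, 11, 12, 13]
    r3.amplitude = 2.0                 # same band, every gain doubled
    print(r3.gain(2500.0))             # 2.0
```

With a 220 Hz fundamental, doubling Width moves the active zone from two harmonics to four, while Amplitude only scales how strongly the harmonics already in the zone are boosted.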

How can I get the timbre characteristic of professional academic singers, i.e. "pressure singing"?
Simply put, we have two fundamentally different ways of singing:
a) strong pressure singing
b) breathy singing (with weak pressure).
Acoustically, "strong pressure" shows up as a predominant "high singing formant" at about 2500 Hz. In fact, the academic manner of singing distorts vowels compared with ordinary speech. To get this effect in Vocaloid, increase the Amplitude of Resonances 3 and 4 while decreasing the Width of these Resonances. Note that intelligibility will decrease, just as it does in real academic singing. Also bear in mind that the same preset does not give satisfactory results across different ranges, so it must be adjusted for the medium-high and low registers. (You can try the settings used for our example "The Phantom of the Opera" at http://www.jasminemusic.com/vocaloid/06-02-2004.htm.)

How can I get the timbre characteristic of Pop and Soul singers, i.e. "breathy singing"?
This effect is harder to achieve with the LEON library, even though the source material was perhaps not recorded in the "strong pressure singing" manner. Decrease the Amplitude of Resonances 3 and 4 while increasing the Width of these same Resonances (which widens the band). You should also increase the Noise and Gender Factor values. Again, the same preset does not give satisfactory results across different ranges, so it must be adjusted for the medium-high and low registers. (You can try the settings used for our example "Touch Me Lola" at http://www.jasminemusic.com/vocaloid/05-31-2004.htm.)
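Because these recipes specify only the direction of each change (raise Amplitude and narrow Width of Resonances 3 and 4 for pressure singing; the opposite, plus more Noise and Gender Factor, for breathy singing), they can be written down as signs rather than values. The sketch below is a hypothetical bookkeeping aid, not a Vocaloid API: the parameter keys, the 0..127 control range and the step size are all assumptions. Re-running it with a different `step` for each register mirrors the advice that presets must be re-tuned per range.

```python
from __future__ import annotations

# Directions only (+1 = raise, -1 = lower), following the two recipes.
# The parameter keys are hypothetical labels, not Vocaloid's own names.
PRESSURE_SINGING = {
    "res3_amplitude": +1, "res4_amplitude": +1,
    "res3_width": -1, "res4_width": -1,
}
BREATHY_SINGING = {
    "res3_amplitude": -1, "res4_amplitude": -1,
    "res3_width": +1, "res4_width": +1,
    "noise": +1, "gender_factor": +1,
}


def apply_directions(values: dict[str, int], directions: dict[str, int],
                     step: int = 8, lo: int = 0, hi: int = 127) -> dict[str, int]:
    """Nudge each listed parameter by `step` in its recipe direction, clamped to [lo, hi].

    The 0..127 range is an assumed control range; unlisted parameters are left unchanged,
    and the input dict is not modified.
    """
    out = dict(values)
    for name, sign in directions.items():
        out[name] = max(lo, min(hi, out[name] + sign * step))
    return out


if __name__ == "__main__":
    base = dict.fromkeys(["res3_amplitude", "res4_amplitude", "res3_width",
                          "res4_width", "noise", "gender_factor"], 64)
    print(apply_directions(base, PRESSURE_SINGING))
    print(apply_directions(base, BREATHY_SINGING, step=12))
```

Clamping matters because, as noted at the top of this page, pushing a Resonance Amplitude too far overloads the output.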

How can I make Vocaloid sing the vowel of a one-syllable word ending in a consonant, like MAN or SUN, over several notes?
Write the word on the first note and add a hyphen to the end of the word. On every following note write a hyphen (-), and on the last note of the melisma write a forward slash (/). For example, "sun" would be written sun- - - - / . Alternatively, for words that are not in Vocaloid's dictionary, you can enter phonemes: [sV] on the first note, [V] repeated on all the following notes and [Vn] on the closing note.
You can also do this by drawing Pitch; however, in this case the sound will acquire spectral distortions similar to those that Autotune tools introduce into a real voice.
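The lyric pattern described above is regular enough to generate. This is a small sketch with hypothetical helper names; it only produces the sequence of entries you would type note by note and does not read or write any Vocaloid file format.

```python
from __future__ import annotations


def melisma_lyrics(word: str, n_notes: int) -> list[str]:
    """Lyrics for one word held over n_notes: 'word-', then '-' per note, then '/' on the last."""
    if n_notes < 2:
        return [word]
    return [word + "-"] + ["-"] * (n_notes - 2) + ["/"]


def melisma_phonemes(onset: str, vowel: str, coda: str, n_notes: int) -> list[str]:
    """Phoneme variant: [onset+vowel] first, [vowel] on each middle note, [vowel+coda] last."""
    if n_notes < 2:
        return [f"[{onset}{vowel}{coda}]"]
    return ([f"[{onset}{vowel}]"] + [f"[{vowel}]"] * (n_notes - 2)
            + [f"[{vowel}{coda}]"])


if __name__ == "__main__":
    print(" ".join(melisma_lyrics("sun", 5)))            # sun- - - - /
    print(" ".join(melisma_phonemes("s", "V", "n", 4)))  # [sV] [V] [V] [Vn]
```

The five-note call reproduces the "sun- - - - /" example exactly; the phoneme variant follows the [sV] / [V] / [Vn] scheme.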

Can I make Vocaloid sing in a language other than English or Japanese?
It is possible, at least for European languages; however, because phonemes differ from language to language, you will inevitably get a foreign accent, and the greater the phonemic difference, the stronger the accent. The end result will resemble a native song performed by a foreign artist who does not actually understand the lyrics he or she sings, as sometimes happens. To do this, you need to enter each phoneme manually, choosing in the Vocaloid Phoneme Editor the one that sounds closest to the phoneme of your language. Try to avoid entering similar-sounding English words in the Lyrics view (e.g. tall car instead of the Russian word tolko (just)), as this will only increase the accent. It takes a lot of experimenting, and often compromise, but the result can be rewarding, though sometimes you will need a sense of humor to fully appreciate it. You can listen to Vocaloid singing in Russian here: http://www.jasminemusic.com/vocaloid/09-07-2004.htm

Why does the Harmonics parameter affect volume more than spectral frequency content?
Indeed, changes in Harmonics lead to proportional changes in Volume, so this parameter can be used to control Volume. To change the spectral frequency content, you have to raise or lower the Amplitude of Resonances at the same time as you raise or lower Harmonics.

How can I increase intelligibility?
In general, intelligibility depends on the volume balance between consonants and vowels. To shift this balance towards the consonants, you can increase the Noise value at the beginning and end of a syllable (a note), where consonants are usually located, and also decrease the Amplitude of Resonances at the same positions.
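The shape of such an edit is the same for both parameters: a different value at the edges of the note than in the middle. The sketch below builds that shape as a list of control points across one note; the function name, the 20% edge fraction and the example values (40, 48, 64) are hypothetical, and a real edit would probably use smoother ramps rather than hard steps.

```python
from __future__ import annotations


def edge_emphasis_curve(n_points: int, base: float, edge_value: float,
                        edge_fraction: float = 0.2) -> list[float]:
    """Control curve over one note: `edge_value` in the first and last `edge_fraction`
    of the note (where consonants usually sit) and `base` in the middle."""
    edge = max(1, round(n_points * edge_fraction))
    return [edge_value if i < edge or i >= n_points - edge else base
            for i in range(n_points)]


if __name__ == "__main__":
    noise = edge_emphasis_curve(10, base=0, edge_value=40)     # more Noise at the edges
    res_amp = edge_emphasis_curve(10, base=64, edge_value=48)  # less Resonance Amplitude there
    print(noise)    # [40, 40, 0, 0, 0, 0, 0, 0, 40, 40]
    print(res_amp)  # [48, 48, 64, 64, 64, 64, 64, 64, 48, 48]
```

Raising Noise and lowering Resonance Amplitude at the same positions pushes the consonant/vowel balance in the same direction from two sides.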

Why are Pitch changes in Vocaloid less noticeable than in an ordinary synthesizer?
The point is that the Pitch parameter, as a tone-changing control, can only clearly affect sounds with a distinguishable pitch, which in Vocaloid means the vowels. Pitch affects consonants to a much lesser extent. Since most syllables contain both vowels and consonants, and Pitch barely affects the consonant zone, the consonants mask the effect of Pitch.

Why do short notes sound in staccato manner?
Normally, the tempo of speech (and singing) does not affect the length of consonants as strongly as it affects vowels, i.e. in a short syllable the consonants sound relatively longer than in the same syllable sung on a longer note. When you write a succession of short notes, you leave little room for the vowels, which are essentially what we perceive as sung notes. That is why there are toneless stretches between the consonants (at the boundaries between notes), which create a staccato-like effect.
By the way, Vocaloid seems to have more serious restrictions here. For example, if you try to get four legato 1/16 notes in a row with the word STRANGE at Tempo=80, it sings only the first and second of them correctly, and the 3rd and 4th notes do not sound at all.
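A little arithmetic shows how tight this gets. At Tempo=80 a quarter note lasts 60/80 = 0.75 s, so a 1/16 note lasts 0.1875 s. The sketch below, with hypothetical function names, assumes the tempo counts quarter-note beats and that the consonants take a fixed time regardless of tempo; the 0.15 s consonant figure for STRANGE is an illustrative guess, not a measured Vocaloid value.

```python
from __future__ import annotations


def note_seconds(tempo_bpm: float, note_fraction: float) -> float:
    """Length of a note given as a fraction of a whole note (1/16 -> 0.0625),
    assuming the tempo counts quarter-note beats."""
    quarter = 60.0 / tempo_bpm
    return quarter * 4 * note_fraction


def vowel_seconds(note_len: float, consonant_len: float) -> float:
    """Time left for the vowel once the tempo-independent consonants are sung (never negative)."""
    return max(0.0, note_len - consonant_len)


if __name__ == "__main__":
    CONSONANTS_S = 0.15  # illustrative guess for the consonants of STRANGE, not measured
    for fraction in (1 / 4, 1 / 16):
        length = note_seconds(80, fraction)
        print(f"1/{round(1 / fraction)} note: {length:.4f} s, "
              f"vowel {vowel_seconds(length, CONSONANTS_S):.4f} s")
```

Under that guess, a quarter note leaves 0.6 s for the vowel, while a 1/16 note leaves under 0.04 s, which is consistent with the toneless, staccato-like gaps described above.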

Which acoustic parameters are changed with adding "Vibrato"?
"Vibrato" in Vocaloid is implemented as a "complex object" that includes periodic modulation not only of frequency but also of Volume and Spectrum. In that sense, Vocaloid's Vibrato is closer to the real vibrato of a live vocalist than to the Modulation parameter of ordinary synthesizers, where you choose Pitch Modulation, Amplitude Modulation or Spectrum Filter Frequency Modulation.
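The difference from a single-target synthesizer LFO can be sketched as one oscillator driving three curves at once. This is not Vocaloid's internal implementation: the function name, the 5.5 Hz rate, the depths, the frame rate and the assumption that all three curves share the same phase are all illustrative choices.

```python
from __future__ import annotations

import math


def vibrato_curves(duration_s: float, rate_hz: float = 5.5, pitch_cents: float = 50.0,
                   volume_db: float = 2.0, spectrum_depth: float = 0.1,
                   frame_rate: int = 100) -> tuple[list[float], list[float], list[float]]:
    """Per-frame offsets for pitch (cents), volume (dB) and a spectral parameter,
    all driven by one periodic oscillator (a synth LFO usually drives just one target).
    Rate, depths and the shared phase are illustrative assumptions."""
    n = int(duration_s * frame_rate)
    pitch, volume, spectrum = [], [], []
    for i in range(n):
        s = math.sin(2 * math.pi * rate_hz * i / frame_rate)
        pitch.append(pitch_cents * s)
        volume.append(volume_db * s)
        spectrum.append(spectrum_depth * s)
    return pitch, volume, spectrum
```

For example, `vibrato_curves(1.0)` returns three 100-frame curves that rise and fall together; setting `volume_db` and `spectrum_depth` to zero reduces it to plain Pitch Modulation.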

Which acoustic parameters are changed with adding "Attack"?
"Attack" in Vocaloid is implemented as a "complex object" that includes non-periodic modulation not only of frequency but also of Volume and Spectrum. This is close to the real accents a live vocalist uses to mark stressed sounds, as well as to characteristic techniques such as a micro-approach to a note or a melisma (mordent etc.).
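By contrast with the periodic Vibrato, an attack of the micro-approach kind can be pictured as a one-shot movement that settles. The sketch below shows only that one shape (pitch approaching the note from below while volume and a spectral parameter swell); it is not Vocaloid's attack model, and the function name, start offsets and 40 ms time constant are assumptions. A mordent would need a different, non-decaying shape.

```python
from __future__ import annotations

import math


def attack_curves(duration_s: float = 0.15, start_cents: float = -100.0,
                  start_volume_db: float = -12.0, start_spectrum: float = -0.2,
                  time_constant_s: float = 0.04, frame_rate: int = 100
                  ) -> tuple[list[float], list[float], list[float]]:
    """One-shot (non-periodic) attack: pitch approaches the note from below while
    volume and a spectral parameter swell, all settling exponentially towards 0
    (the note's own values). Start offsets and the time constant are assumptions."""
    n = int(round(duration_s * frame_rate))
    pitch, volume, spectrum = [], [], []
    for i in range(n):
        decay = math.exp(-(i / frame_rate) / time_constant_s)
        pitch.append(start_cents * decay)
        volume.append(start_volume_db * decay)
        spectrum.append(start_spectrum * decay)
    return pitch, volume, spectrum
```

With the defaults, the pitch starts 100 cents below the note and is within about 3 cents of it after 0.14 s; nothing repeats, which is what distinguishes it from the Vibrato curves.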