Image transcription: a long infographic with visual aids, starting with this conversation:
"Is Miku AI?"
"No."
"Are vocal synths ethical?"
"Yes."
"How so?"
First section: Compensation.
Hatsune Miku is made out of recordings by Saki Fujita. Saki Fujita is contracted to record Miku samples, and is paid for her work.
Section: Recording Method. 
This is why Miku is not AI: Saki Fujita records from a list of sounds. It's necessary to have at least one recording per sound Miku should be able to sing. (Visual aid has examples of these sounds, such as "kaka".)
She can also sing the recording list a second time in a different octave, so that she sounds more natural. 
Section: Labelling. 
The samples Saki Fujita sang are then labelled with the sound each one makes, and the engine reproduces singing by stitching these labelled sounds together. This is how vocal synth software such as VOCALOID and UTAU works. This model is called "concatenative". (Visual aid shows how "kaka" is split into "k" and "a", which is how it looks in the VOCALOID software.)
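To make the concatenative idea concrete, here is a minimal Python sketch under heavy assumptions. The voicebank is a dictionary keyed by (phoneme label, MIDI note the sample was sung at). Samples are tiny lists of numbers standing in for audio. The engine picks the recording sung closest to the target note, which is why a second octave of recordings helps, and joins neighbouring samples with a short crossfade. Every name here is hypothetical: this is not how VOCALOID or UTAU are actually implemented, and real engines also pitch-shift and time-stretch each sample.

```python
from typing import Dict, List, Tuple

# A "sample" is a short run of audio amplitudes. Real voicebank recordings are
# thousands of frames long; tiny lists stand in for them here.
Sample = List[float]
# Hypothetical voicebank layout: (phoneme label, MIDI note it was sung at) -> sample.
Voicebank = Dict[Tuple[str, int], Sample]


def pick_sample(bank: Voicebank, label: str, note: int) -> Sample:
    """Return the recording of `label` that was sung closest to the requested note."""
    candidates = [(abs(n - note), n) for (lab, n) in bank if lab == label]
    if not candidates:
        raise KeyError(f"no recording labelled {label!r} in this voicebank")
    _, best_note = min(candidates)
    return bank[(label, best_note)]


def crossfade_concat(a: Sample, b: Sample, overlap: int) -> Sample:
    """Join two samples, blending the last `overlap` frames of `a` into the
    first `overlap` frames of `b` so the seam does not click."""
    overlap = min(overlap, len(a), len(b))
    blended = []
    for i in range(overlap):
        w = (i + 1) / (overlap + 1)  # weight of `b` ramps up across the overlap
        blended.append(a[len(a) - overlap + i] * (1 - w) + b[i] * w)
    return a[: len(a) - overlap] + blended + b[overlap:]


def render(notes: List[Tuple[str, int]], bank: Voicebank, overlap: int = 1) -> Sample:
    """Concatenative rendering: look up each labelled sound and stitch them together.
    (A real engine would also pitch-shift each sample to the exact note; skipped here.)"""
    out: Sample = []
    for label, note in notes:
        out = crossfade_concat(out, pick_sample(bank, label, note), overlap)
    return out


bank: Voicebank = {("k", 60): [0.1] * 3, ("a", 60): [1.0] * 4, ("a", 72): [2.0] * 4}
audio = render([("k", 62), ("a", 62), ("k", 62), ("a", 62)], bank)  # "kaka"
```

Here "kaka" at note 62 reuses the note-60 "k" and "a" recordings twice, while a request near note 72 would pick the higher-octave "a" instead.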
Section: User interfacing.
These voicebanks sound very flat by default. Users must adjust the vocals themselves to produce expressive singing. This is referred to as "tuning". If you listen to "Tuning BLANK in the style of Vocaloid producers", you can hear that there are countless ways to tune Hatsune Miku. It is considered a form of artistic expression.
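A big part of tuning is shaping the pitch curve instead of leaving each note perfectly level. The Python sketch below is an illustration only: its parameters (bend length, vibrato depth in cents, vibrato rate, vibrato delay) are hypothetical and do not correspond to any particular editor's controls. With every expressive parameter at zero it returns the flat, untuned line; the pitch bend and delayed vibrato are two common ways a tuner adds expression by hand.

```python
import math
from typing import List


def midi_to_hz(note: float) -> float:
    """Equal-temperament conversion: MIDI note 69 (A4) = 440 Hz."""
    return 440.0 * 2 ** ((note - 69) / 12)


def tuned_pitch_curve(
    prev_note: int,
    note: int,
    frames: int,
    frame_rate: float,
    bend_frames: int = 0,
    vib_depth_cents: float = 0.0,
    vib_rate_hz: float = 5.5,
    vib_delay_frames: int = 0,
) -> List[float]:
    """Per-frame pitch (Hz) for one note. All parameters are hypothetical
    stand-ins for the kind of controls a tuner adjusts by hand."""
    curve = []
    for i in range(frames):
        # Pitch bend: glide linearly from the previous note over `bend_frames` frames.
        if i < bend_frames:
            base = prev_note + (note - prev_note) * (i / bend_frames)
        else:
            base = float(note)
        # Vibrato: a sine wobble measured in cents (100 cents = 1 semitone),
        # held off for `vib_delay_frames` frames so the note starts steady.
        cents = 0.0
        if i >= vib_delay_frames:
            t = (i - vib_delay_frames) / frame_rate
            cents = vib_depth_cents * math.sin(2 * math.pi * vib_rate_hz * t)
        curve.append(midi_to_hz(base + cents / 100))
    return curve


untuned = tuned_pitch_curve(69, 69, frames=50, frame_rate=100.0)
tuned = tuned_pitch_curve(
    67, 69, frames=50, frame_rate=100.0,
    bend_frames=8, vib_depth_cents=40.0, vib_delay_frames=20,
)
```

Every choice here (how long the bend is, how deep and how late the vibrato starts) is a decision the tuner makes, which is where the personal style comes from.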
Compare Scratchin' Melodii's original songs to the updated versions. The difference is the result of hiring an experienced Vocaloid tuner.
Question: How do AI Vocal Synths work?
Answer: They are actually extremely similar!
Section: Compensation.
Let's use the Synthesizer V Studio library "Solaria". Solaria is made out of recordings by Emma Rowley. Emma Rowley is contracted to record Solaria samples, and is paid for her work.
Section: Recording.
Emma Rowley then records several hours of singing data. This is the substance of the library.
Section: Base model.
The AI needs a base to understand what it's interpreting. Unlike with images, there is a large amount of volunteered voice data out there. It's typically assumed that base models are trained ethically. (Visual aid shows Dreamtonics, the developer company behind Synthesizer V, asking a university "Can I use this voice data you made for TTS research?" and observing a person saying "Hi! Here are a few hours of singing data you can use for voice technology.")
Section: Labelling.
Labelling is also the same. The singing is broken up into phonemes the engine will interpret. 
Header Section: Deep Learning.
In casual speech, "AI" refers to machine learning algorithms. "Diffusion" models are one kind of AI built on a DNN, or deep neural network. The DNN is the most drastic difference between concatenative and AI voicebanks.
Section: Teaching the base model.
The computer must be taught what the sounds are. The concept it builds is the "base model". (Visual guide is a cartoon of two computers talking. "Here's a British man saying 'bath'." "Added to my concept of 'a'." "Here's a Japanese girl saying 'baka'." "Added to my concept of 'a'.")
Section: Training the voice model.
Emma Rowley's recordings are then made into a reference point. This makes the model render only from what it knows about Emma Rowley's singing. (Visual aid is a similar cartoon where a person talks to a computer while giving it a drive. Computer: "Now that I know what 'a' is, how should it sound?" Person: "I've labelled every time Emma Rowley says 'a'. Use this!")
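The two steps above (teach a base model on many voices, then train a voice model on one singer) follow the general pattern of pretraining and fine-tuning. The Python sketch below is only a very loose toy analogy: a two-parameter linear model fitted by gradient descent stands in for the deep neural network, the input is a hypothetical normalized pitch, and the target is a made-up "brightness" score for the vowel 'a'. None of this reflects how Synthesizer V or Solaria are actually trained. The point is the order of operations: learn the general shape from many voices, then start from those weights and nudge them toward one singer's labelled data.

```python
from typing import List, Tuple

# (input feature, target value) pairs. The input is a hypothetical normalized
# pitch (0 = low, 1 = high); the target is a made-up "brightness" score for 'a'.
Data = List[Tuple[float, float]]


def mse(data: Data, w: float, b: float) -> float:
    """Mean squared error of the prediction w*x + b."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


def train(data: Data, w: float, b: float, lr: float = 0.1, epochs: int = 500) -> Tuple[float, float]:
    """Gradient descent on mean squared error, starting from the given weights."""
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Step 1, "base model": many hypothetical voices, each a little brighter or darker.
base_data: Data = [
    (x, x + offset)
    for offset in (-0.2, -0.1, 0.0, 0.1, 0.2)  # one offset per hypothetical speaker
    for x in (0.0, 0.25, 0.5, 0.75, 1.0)
]
base_w, base_b = train(base_data, w=0.0, b=0.0)

# Step 2, "voice model": fine-tune on one singer's (hypothetical) labelled data,
# starting from the base weights instead of from scratch.
singer_data: Data = [(0.0, 0.3), (0.5, 0.8), (1.0, 1.3)]
voice_w, voice_b = train(singer_data, w=base_w, b=base_b, epochs=200)

print(f"base model error on singer:  {mse(singer_data, base_w, base_b):.4f}")
print(f"voice model error on singer: {mse(singer_data, voice_w, voice_b):.4f}")
```

The base model captures what 'a' generally does across voices; the fine-tuned copy keeps that shape but shifts toward the one singer, which is roughly why the finished library only sounds like that singer.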
Section: Diffusion.
The Solaria model uses everything it learned from Emma Rowley's recordings and the base model to determine how 'a' should sound based on the note it's sung on, the sounds next to it, and so on.
Section: Interfacing.
Tuners have mixed feelings about this: it sounds much clearer, yet the AI also has voice pitch models, so there's less incentive to develop your own personal flair.
Question: Are voice changers ethical?
Answer: Oh geez.
Section: ARE they ethical?
We don't need to break this down a third time. Voice changers are the generative AI of voice synthesis. They require far less work from both the developer and the user: they simply apply everything the machine knows onto a piece of audio. What is the range of ethics here?
Vocaloid 6 is packaged with a voice changer. It is only for AI libraries, voiced by people who agreed to this and were compensated. This is definitely ethical.
If you bought Hatsune Miku, you're nominally permitted to use the results as you see fit. Is tuning Miku and then creating a voice changer of her singing ethical? I genuinely don't know.
There's also a question of art. If you were to project the voice actor onto your own personal tuning work, isn't that still artistic expression? A voice is different from an art style. Where is human expression being interrupted by automation? I can't make an explainer for those subjective concepts.
I hope you're now educated enough to think on it yourself. End of image transcription.

A lot of people try to explain this without knowing anything about how voice synthesis works, so here's my breakdown on No, Hatsune Miku Is Not AI, And No, AI Voice Synthesis Is Not Bad.

skelerose - Angel