new tts just dropped (just kidding)

Played 115 times

Uploaded by: michaelos

Upload date: 2/17/2025

Description:

michaelOs tts... Well, this is a proof of concept. This was created with the help of my voice and earcons and speech rules.
the only caveat here, and it's a huge drawback, is that you can't make it pronounce things in english normally. What you can do instead is write the text phonetically, so the synthesizer reads it the way it's spelled. As this is a proof of concept, it can only speak spanish, and even then, some things have to be changed to get it to pronounce them correctly.
the speech might be unintelligible
For example, que and qui. those should be pronounced "kay" and "kee", but if you input them as they are, they will come out as written, something like the way an english speech synthesizer would read "que" and "qui". The same happens with gue and gui, those special letter combinations that change the pronunciation of the g
the same happens with c. This is why you won't hear the proof of concept pronounce audacity as ah oo dah see tee; you'll hear a oo dah kee tee instead, so there's that.
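The respelling described above could be automated before the text reaches the synthesizer. Here is a minimal Python sketch, assuming only the three cases named in this description (que/qui, gue/gui, and c before e or i). The `RULES` table and the `respell` function are hypothetical names made up for illustration; they are not part of nvda or the addon.

```python
import re

# Hypothetical respelling rules, limited to the cases named in the description.
# Each rule rewrites a spelling so that a letter-by-letter synthesizer,
# which plays one clip per letter, lands closer to the spoken word.
RULES = [
    (re.compile(r"qu(?=[eiéí])", re.IGNORECASE), "k"),  # que, qui -> ke, ki
    (re.compile(r"gu(?=[eiéí])", re.IGNORECASE), "g"),  # gue, gui -> ge, gi
    (re.compile(r"c(?=[eiéí])", re.IGNORECASE), "s"),   # ce, ci -> se, si
]


def respell(text: str) -> str:
    """Apply every rule in order. The case of replaced letters is not preserved."""
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text


print(respell("quiero tocar la guitarra en audacity"))
```

This doesn't cover other spanish spelling quirks, so treat it as a starting point, not a full set of rules.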
There's another drawback: it sometimes won't shut up when you press control. That's a problem, yes it is, but it only happens in certain situations.
and here are the steps on making it
1: create a portable copy of nvda with earcons and speech rules. If you already have the addon installed, you can create a portable copy from your current nvda installation.
2: record your voice. You need at least 26 clips, one for each letter from a to z. A, e, i, o and u should be pronounced similar to ah, a, ee, o and oo. The other letters can be pronounced phonetically, like how you'd normally pronounce them
I haven't yet discovered a way to make numbers unless I prerecorded my own and made those, so for now, that's how it might be
3: make a new folder inside userConfig\addons\phoneticPunctuation\sounds\. It should be named character. Export all the audio files of your prerecorded voice uttering the vowels and other letters into it, each one named after its letter.
Make sure to edit and cut. If you've recorded a whole audio of you uttering them, from a to z, then you should cut and split into different tracks. Or you can go the longer route of making separate tracks. Or if you don't have something that allows for multitrack editing then you can record one, export, record another, export. It depends on your workflow and editor preference
4: once that is done...
replace letters
go to nvda settings, earcons and speech rules. In here, press the button to add a rule. Your first letter will be an a. Put that into the pattern box, go to the category combo box, and press c to get to characters. Then press tab to move to the next combo box, where you select the wav file, and pick the file of yourself uttering the vowel a
do the same for each subsequent letter, pairing every pattern with its audio file, till you get to z. Then go back to a, but this time in caps, and go through to Z again
5: extras
you can also make a silence track, export that, and use that as punctuation. In my case I just manually inserted silences
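For anyone curious how the same letter-to-clip replacement might look outside nvda, here is a rough Python sketch using only the standard `wave` module. It is not how the addon works internally. It assumes a folder holding one wav file per letter (a.wav, b.wav and so on) plus a silence.wav for pauses, all recorded in the same format. The function name `speak_to_wav` and the file names are made up for illustration.

```python
import wave
from pathlib import Path


def speak_to_wav(text, voice_dir, out_path, pause_name="silence"):
    """Join one clip per character of `text` into a single wav file.

    Returns the number of clips that were joined.
    """
    voice_dir = Path(voice_dir)
    params = None  # (channels, sample width, frame rate) of the first clip
    frames = []
    for ch in text.lower():
        # Letters map to their own clip; anything else (spaces,
        # punctuation, digits) becomes a pause.
        name = ch if ch.isalpha() else pause_name
        clip = voice_dir / f"{name}.wav"
        if not clip.exists():
            continue  # no recording for this character, so skip it
        with wave.open(str(clip), "rb") as w:
            fmt = (w.getnchannels(), w.getsampwidth(), w.getframerate())
            if params is None:
                params = fmt
            elif fmt != params:
                raise ValueError(f"{clip} has a different audio format")
            frames.append(w.readframes(w.getnframes()))
    if params is None:
        raise ValueError("no clips found for this text")
    with wave.open(str(out_path), "wb") as out:
        out.setnchannels(params[0])
        out.setsampwidth(params[1])
        out.setframerate(params[2])
        out.writeframes(b"".join(frames))
    return len(frames)
```

Calling something like `speak_to_wav("ola", "character", "out.wav")` would write the joined clips to out.wav. Since the folder is just a parameter, pointing it at a different folder of recordings would give a different voice.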
this is a very, very rudimentary way of doing this stuff. If you want higher quality, you could likely record a long audio of your voice and somehow make a piper tts version of it, but you're on your own there because I've never done it, so yeah. i don't know any other alternatives, especially local ones. Local voice cloning likely requires a good gpu. The voice cloning could be done with rvc I believe, though I don't exactly know, or with eleven labs if you're into that and prefer online voice cloning.

Comments

patricus

that sounds horrible!
also that's not English tts roflflflflf

jim_pickens

so, when does the nvda addon drop? Sounds like it'd be a good eloquence replacement, certainly sounds better than eloquence anyway.

MichaelOs

I like the irony in one of the comments. Well, addressing that.
the synthesizer is bad because it's a proof of concept, and a proof of concept is just the beginning. For now I cannot make it speak english in any way that isn't phonetic, and english pronounced the way it's written is not easy to understand. Second, I don't have a gpu to run rvc. Third, I don't even know how to use and run rvc, at least for now.
if it sounds horrible it's because i replaced all the letters of the alphabet including accented letters and ñ with recorded phonemes of my voice. It's only meant to be an experiment. A proof of concept. I used earcons and speech rules, also known as phonetic punctuation. If I could code in python and make a sort of synthesizer which has folders each one with different characters then I could create multiple voices, but for now, I can only change characters to other spliced prerecorded voices with another portable copy of nvda.
anyway, thx. Mwahaha

danestange

use chatgpt and make this thing awesome!! This is a very cool concept and with work it could be great! 

danestange

if you release the code or speech samples with a readme, maybe I can play with it and make a version and share it with you. 

MichaelOs

perhaps I might release the speech samples, because that's what I used. I'd have to check the files. And about chatGPT...
perhaps I might try to see if it will generate some python script for a synthesizer, but I don't exactly trust this stuff, as I don't know what I'd be getting into. The only thing I can understand to a degree is html, and even then, I get confused around things like borders and button labels. In the description there's a step by step to making your own "speech synthesizer" with the earcons and speech rules nvda addon. The step by step is similar to making entries for a dictionary.
earcons and speech rules apparently supports the creation of folders, so this is how I got this working. Putting the characters folder into the sound folders of the portable I made. Perhaps I might be able to upload the nvda portable with the whole thing but I don't promise it will be good.

MichaelOs

I can't promise I'll do all of that, though, as I am often forgetful and sometimes distracted by stuff.

patricus

also, what language are the samples?

MichaelOs

it's more like phonetic. One can make it speak any language that has the vowels included. It cannot speak english perfectly, of course not, but it can manage, provided you spell things the way they should be pronounced. Perhaps with more phonemes I might be able to make it speak some other languages that require extra phonemes. It can speak english with a very horrible accent, and spanish, though the new version is harder to understand because I decreased the vowel lengths

patricus

I meant, that didn't sound like an English tts, phonemes sounded like some non English stuff.

MichaelOs

it isn't meant for english, as english is not phonetically spelled. To make it speak a very horrible english I had to spell the whole thing phonetically, else it wouldn't work. It says things like they're spelled, and it was originally meant for spanish as it's pretty phonetic except for some other things, so yeah

patricus

rofl, the same as Polish, really phonetical, Polish is phonetical 99% of the time.

MichaelOs

right. Vaguely reminds me of gregor except gregor sounded even worse, at least in my opinion, though this one seems to use similar techniques as gregor to produce rudimentary tts.

MichaelOs

transcription of the audio (quotes mean there's a grammatical error, and occasionally a false statement):
speech mode talk
blank
this is a new synthesizer
blank
espaci...
blank
espacio (space)
this is a new synthesizer but it only supports spanish for now
show hidden icons button
audacity
audacity
blank
let's read a text in spanish to show you
the power of this synthesizer
show hidden...
documents
documents file explorer
books
AI gener...
old spanish documents
AI generated local content
bo...
Unti... Auda...
I have to modify the document to "read it correctly" so apologies, and to read in english you have to "literally" write it phonetically.
Warning the following document is in spanish, so only proceed if you understand spanish. Take note the spanish here is a horrible attempt at using the vosotros forms and I "made it" when I was a little kid so excuse me for "the very cringy things"
without further ado here we go
translation of spanish text:
how to make industrial potato chips
hi folks. Today I'll teach y'all how to make industrial potato chips, so buckle your belts, that this recipe will leave your jaws dropped.
first, you peel the shells from the potatoes, "dredging them in flour and egg" after you're done with them so that when they're fried they look so amazing and crunchy that you sh*t yourself.
y'all gotta fry them in a frying pan while "stirring the flour and the egg mixture"
so they come out as cool as the factory ones, you have to use a machine to leave the taters thin, with the final touch
"dredge with a bit of salt and stir well"
oh, how cool it came out!
finish y'all's industrial potato chips.
share em with your family
bruh, see how they ended out? they're cool, right? my mouth is watering!
Original spanish text:
cómo hacer patatas fritas de fábrica
hola chavales, hoy, os enseñaré como hacer patatas fritas de fábrica. Así que abróchense los cinturones, que esta receta los dejará con la boca abierta
primero, os sacaros las cascaras de las patatas y poneros huevo y harina para que tengan un buen aspecto chulo y crugiente que te cagas. os tenéis que freírlas  en una sartén y os revolveros el huevo con la harina
para que queden tan chulas como las de fábrica, tenéis que usar una máquina para dejar finas las patacas con el toque final
echaros sal y os revolveros muy bien. ¡o que chulo quedó! terminaros buestras patatas fritas de fabrica
comparteros con tú familia
¡os tío cómo ha quedado! molan mucho y se me hace agua la boca

Athlon

yeah I would never have figured that out from the synth