Want to get better, more realistic voices out of AI text-to-speech systems? Want to impress your friends and colleagues with your "professional voiceover"? You're in the right place. Here are 10 basic tips to improve the quality of your text-to-speech outputs.
Video and instructions below. Pick your poison!
1) Pick an AI voice you love
The first step is to pick an AI voice that you love. It will not get better with use, so do not delude yourself. Take time to explore the voice libraries and find a voice that really resonates with you and your personality. This step is really important because you are going to be hearing this voice a lot!
2) Plan your work
Before you start entering your text, stop and think about how you really want the words to sound. Which words do you want to emphasize? Where do you want pauses?
Start with a plan for how you want the output to sound, then enter the text.
3) Emphasize words
You can use quotation marks or ALL CAPS to emphasize words. Don't worry about using punctuation in a weird way. No one's gonna see it; they're only gonna hear it. The same goes for ALL CAPS: you don't need to worry that writing in all caps will make it come out like a scream. It won't.
I can't believe he ate the "whole" thing.
I can't believe he ate the WHOLE thing.
4) Break up the text
The next thing you need to consider is the length of a single sentence. If you have a particularly long sentence, you may want to break it up into smaller sentences.
Why do we do this? Sometimes, when the AI is given a very long sentence, it can sound like it's running out of air toward the end. For example, it may start talking really fast. Or it may sound like you do when you try to read aloud for as long as you can in one breath: you start to sound flat toward the end.
You may be asking, "Why on earth does it do this?" Remember, these models were trained on real human speech, and real people run short of breath on long sentences too. The models work with the data they've been fed. I'm already seeing improvements in this area, so I expect these smaller issues will be addressed within the next year. Meanwhile, use shorter sentences.
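If you prepare scripts in bulk, a few lines of Python can flag sentences that may be too long before you paste them into your text-to-speech tool. This is a minimal sketch, not part of any TTS product. The function name `find_long_sentences` is hypothetical, the 25-word threshold is an arbitrary assumption rather than a documented limit, and the sentence splitting is deliberately naive.

```python
import re

# Assumption: 25 words is an arbitrary threshold, not a documented limit
# of any text-to-speech system. Tune it to the voice you use.
MAX_WORDS = 25


def find_long_sentences(text: str, max_words: int = MAX_WORDS) -> list[str]:
    """Return the sentences in `text` that have more than `max_words` words."""
    # Naive split after ., ! or ? followed by whitespace. It is good enough
    # for a rough check but will mis-split abbreviations such as "e.g. ".
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in sentences if len(s.split()) > max_words]


if __name__ == "__main__":
    script = "Short sentences are easy. " + "This one keeps going " * 10 + "until it ends."
    for sentence in find_long_sentences(script):
        print(f"Consider splitting ({len(sentence.split())} words): {sentence[:40]}...")
```

Anything the function returns is a candidate for splitting into two or three shorter sentences.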
5) Slowing it down
If you want your AI to pause or slow down, you can do this by adding a comma, an ellipsis, or a double dash.
Crime, and punishment
Crime…and punishment
Crime--and punishment
You can also start a new paragraph if you want a more significant pause between sentences or topics.
6) Grouping words and phrases
When it comes to titles, topics, and concepts, AI sometimes struggles to recognize when a group of words belongs together. This is particularly obvious with industry terms or business jargon that the model may not have been trained on.
There are three ways that you can solve this problem:
-1- Capitalize each of the words to indicate that it is a proper noun or similar. E.g. Goody Two Shoes
-2- Put quotes around the group of words so it knows it's a single group of words to be said together. E.g. "goody two shoes"
-3- You can use dashes to hold the words together. E.g. goody-two-shoes
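If you keep a glossary of phrases that your voice tends to split apart, the third option (dashes) is easy to apply automatically before you paste your script in. This is a hedged sketch: `GROUPED_PHRASES` and `hyphenate_phrases` are hypothetical names, and whether dashes actually help depends on the voice and system you use.

```python
import re

# Hypothetical glossary: phrases your chosen voice keeps splitting apart.
GROUPED_PHRASES = ["goody two shoes", "return on investment"]


def hyphenate_phrases(text: str, phrases: list[str]) -> str:
    """Join each listed phrase with dashes, e.g. goody two shoes -> goody-two-shoes."""
    for phrase in phrases:
        # Match the phrase as whole words, ignoring case and allowing any
        # run of whitespace between the words.
        words = [re.escape(w) for w in phrase.split()]
        pattern = r"\b" + r"\s+".join(words) + r"\b"
        # Keep the original capitalization; only the spaces become dashes.
        text = re.sub(pattern, lambda m: "-".join(m.group(0).split()), text,
                      flags=re.IGNORECASE)
    return text


print(hyphenate_phrases("She is such a goody two shoes.", GROUPED_PHRASES))
```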
7) Using bulleted or numbered lists
If for some reason your AI text-to-speech is not recognizing your numbered or bulleted list, there are a few ways to address this.
First: when formatting a bulleted list, make sure the sentence leading into the list ends with a colon. Then put each item in the list on its own line with a comma at the end. The second-to-last item ends with a comma and the word "and", and the final item ends with a period. You need that period; otherwise, any content below the list will run on as part of the same sentence.
There are a few ways to cook an egg:
scrambled,
fried,
poached, and
you can make an omelet.
Second: AI sometimes has trouble recognizing and saying the numbers in a numbered list. Use the same technique as in the first example, and start each line with the number written out as a word, as in the example below.
How to get stuff done:
One, write down all tasks,
Two, break down large tasks,
Three, prioritize tasks,
Four, limit the number of items, and
Five, organize by energy level.
Lastly: you can format the list as an ordinary sentence, but use caution. If your list is too long, you may run into the problem described in tip 4 above.
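The colon, comma, ", and", period pattern from the first two methods is mechanical enough to script. This is a minimal sketch under the assumption that you write your scripts in plain text; `format_list_for_tts` and `NUMBER_WORDS` are hypothetical names, and the function only produces text for you to paste into your TTS tool.

```python
# Number words for the "One, Two, Three" style; extend for longer lists.
NUMBER_WORDS = ["One", "Two", "Three", "Four", "Five",
                "Six", "Seven", "Eight", "Nine", "Ten"]


def format_list_for_tts(intro: str, items: list[str], numbered: bool = False) -> str:
    """Lay out a list using the colon / comma / ", and" / period pattern."""
    if not items:
        raise ValueError("need at least one item")
    if numbered and len(items) > len(NUMBER_WORDS):
        raise ValueError("add more entries to NUMBER_WORDS for longer lists")
    # The lead-in sentence always ends with a colon.
    lines = [intro.rstrip(":. ") + ":"]
    for i, item in enumerate(items):
        text = f"{NUMBER_WORDS[i]}, {item}" if numbered else item
        if i == len(items) - 1:
            lines.append(text + ".")      # final item closes the sentence
        elif i == len(items) - 2:
            lines.append(text + ", and")  # second-to-last item gets ", and"
        else:
            lines.append(text + ",")      # every other item ends with a comma
    return "\n".join(lines)


print(format_list_for_tts("How to get stuff done", [
    "write down all tasks", "break down large tasks", "prioritize tasks",
    "limit the number of items", "organize by energy level"], numbered=True))
```

With `numbered=True`, this reproduces the "How to get stuff done" example line for line; leave it off for a bulleted list.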
8) Acronyms and industry jargon
As you play around with AI text-to-speech, you're going to realize that there are words the AI does not know because they are not really words... they are acronyms or business jargon. In these cases, you may need to spell them out phonetically for the AI to interpret them.
NAACP = in double a see pea
NFL = in eff elle
NASA = nass-sah
ASAP = Readers, please let me know…I spent 30 minutes trying everything I could dream up and I never could get it to sound like ASAP.
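Once you have found phonetic spellings that work, you can keep them in a small lookup table and apply them to every new script. This is a sketch under assumptions: the table holds the spellings above (ASAP is left out because no working spelling was found), and `apply_phonetics` is a hypothetical helper, not a feature of any TTS system.

```python
import re

# Phonetic spellings from above; add your own entries as you find them.
PHONETIC = {
    "NAACP": "in double a see pea",
    "NFL": "in eff elle",
    "NASA": "nass-sah",
}


def apply_phonetics(text: str, table: dict[str, str] = PHONETIC) -> str:
    """Replace each acronym (whole word, exact case) with its phonetic spelling."""
    for term, spoken in table.items():
        # \b ensures "NFL" is not replaced inside a longer token like "NFLX";
        # the lambda inserts `spoken` literally, with no regex escapes applied.
        text = re.sub(rf"\b{re.escape(term)}\b", lambda _m, s=spoken: s, text)
    return text


print(apply_phonetics("The NFL and NASA."))
```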
9) Special Characters
AI can struggle with the ampersand (&) and other special characters such as $ and %. The models were trained on spoken human language, and these symbols are written shorthand rather than spoken words, so the models are not great at reading them aloud. When you input your text, you may need to write out words like "and", "dollar", or "percent" to make sure the AI speaks them correctly.
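Writing these symbols out by hand gets tedious, so here is a minimal sketch that does it with regular expressions before you paste the text in. It assumes US-style amounts such as "$1,200.50" and covers only &, $ and %; `spell_out_symbols` is a hypothetical helper, and you should still listen to the result.

```python
import re


def _dollars(match: re.Match) -> str:
    amount = match.group(1)
    # "$1" reads as "1 dollar"; everything else as "N dollars".
    return f"{amount} dollar" if amount == "1" else f"{amount} dollars"


def spell_out_symbols(text: str) -> str:
    """Write out &, $ and % as words before sending text to a TTS tool."""
    # "$5" -> "5 dollars", "$1,200.50" -> "1,200.50 dollars"
    text = re.sub(r"\$(\d+(?:,\d{3})*(?:\.\d+)?)", _dollars, text)
    # "20%" or "20 %" -> "20 percent"
    text = re.sub(r"(\d)\s*%", r"\1 percent", text)
    # Any remaining symbols become plain words.
    text = text.replace("&", " and ").replace("$", " dollar ").replace("%", " percent ")
    # Collapse the doubled spaces the replacements can create.
    return re.sub(r"[ \t]{2,}", " ", text).strip()


print(spell_out_symbols("Salt & pepper cost $5, a 20% markup."))
```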
10) Meaningful file names
My final piece of advice is to give your audio files meaningful names. The file names that come directly out of AI text-to-speech systems can be long and meaningless. My recommendation is to use a few meaningful words from the recording, such as the topic of the recording or its first or last few words. Pick whichever works best with your memory.
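If you save many clips, you can build the name from the script itself. This is a minimal sketch, assuming you still have the text you fed to the TTS tool; `meaningful_filename` is a hypothetical helper, and the ".mp3" extension is an assumption that you should match to your actual export format.

```python
import re


def meaningful_filename(transcript: str, n_words: int = 5,
                        from_end: bool = False, ext: str = "mp3") -> str:
    """Build a short file name from the first (or last) few words of a script."""
    if n_words < 1:
        raise ValueError("n_words must be at least 1")
    # Keep letters, digits and apostrophes; drop all other punctuation.
    words = re.findall(r"[A-Za-z0-9']+", transcript)
    words = words[-n_words:] if from_end else words[:n_words]
    slug = "-".join(w.lower().replace("'", "") for w in words) or "untitled"
    return f"{slug}.{ext}"


print(meaningful_filename("I can't believe he ate the WHOLE thing."))
```

For that sentence, the call returns "i-cant-believe-he-ate.mp3"; pass `from_end=True` if you remember clips by how they end.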
Go get started
And that's it for the basics. The best way to learn is by doing, so get out there and start practicing.
NOTE: The system we use at Medusaas is ElevenLabs. It's my personal favorite, and I think the voices are pretty darn good right out of the box. That said, with AI competition the way it is, you never know when a new leader in this space will emerge. Stay informed and do your own research.