Date: July 16, 2024
Microsoft has developed an advanced AI speech generator that it considers potentially too dangerous to place in the public’s hands.
Deepfakes are an AI innovation that has alarmed the world for more than a year. The technology was at first limited to creating video replicas of people that are almost indistinguishable from the real thing, but deepfake voices have now emerged to add to the nightmare. What would you do if someone on the internet started using your voice to share offensive content? Voice mimicry means impersonating someone’s unique voice so closely that listeners do not notice the imitation.
Microsoft’s latest text-to-speech work marks a significant leap in the field of AI development. However, releasing it may not be in the public’s best interest. The AI tool, VALL-E 2, generates lifelike human speech that is almost impossible to recognize as synthetic.
Part of what makes the AI so believable is Repetition Aware Sampling, which keeps the model from getting stuck in a loop that repeats the same sounds. During decoding, the method tracks how often a token has recurred in the recent output and changes how the next token is sampled when repetition gets too high, which stabilizes the generated speech.
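The idea can be illustrated with a minimal Python sketch. This is not Microsoft's code: the function names, the `window` and `threshold` values, and the fallback to sampling from the full distribution are assumptions chosen for illustration, based only on the description above.

```python
import random


def nucleus_sample(probs, top_p, rng):
    """Sample from the smallest set of top tokens whose total mass reaches top_p."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in ranked:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]


def repetition_aware_sample(probs, history, top_p=0.8, window=10,
                            threshold=0.1, rng=random):
    """Pick the next token, backing off when it repeats too often.

    window and threshold are hypothetical settings, not published values.
    """
    token = nucleus_sample(probs, top_p, rng)
    recent = history[-window:]
    if recent:
        ratio = recent.count(token) / len(recent)
        if ratio > threshold:
            # Assumed fallback: draw from the full distribution instead,
            # which gives less likely tokens a chance to break the loop.
            token = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return token
```

With an empty history the function behaves like ordinary nucleus sampling; once the same token dominates the recent window, the fallback lets other tokens through.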
Another advanced feature of VALL-E 2 is Grouped Code Modeling, which organizes the audio codec codes into groups so that the sequence the model has to process is shorter. In simple words, the model handles speech in small chunks rather than one code at a time. This speeds up inference and makes long sequences easier to model.
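The effect on sequence length can be shown with a short Python sketch. It only illustrates the grouping step on a plain list of integer codes; the helper names and the `pad_id` value are hypothetical, and the real model works on learned representations rather than raw lists.

```python
def group_codes(codes, group_size, pad_id=0):
    """Split a flat list of codec codes into consecutive groups.

    The last group is padded with pad_id (a hypothetical filler value)
    so that every group has exactly group_size entries.
    """
    codes = list(codes)
    remainder = len(codes) % group_size
    if remainder:
        codes += [pad_id] * (group_size - remainder)
    return [codes[i:i + group_size] for i in range(0, len(codes), group_size)]


def ungroup_codes(groups, original_length):
    """Flatten the groups back into one list and drop any padding."""
    flat = [code for group in groups for code in group]
    return flat[:original_length]


codes = [11, 12, 13, 14, 15, 16, 17]
groups = group_codes(codes, group_size=2)
# The model would now step over len(groups) positions instead of len(codes).
print(len(codes), len(groups))
```

Here seven codes become four groups of two, so a model stepping over groups takes roughly half as many steps as one stepping over individual codes.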
According to researchers who have worked with the AI tool, it is the first text-to-speech generator to achieve this level of robustness, naturalness, and similarity to the original speaker. This breakthrough can be useful in many areas, but it can just as easily create chaos.
"VALL-E 2 is purely a research project. Currently, we have no plans to incorporate VALL-E 2 into a product or expand access to the public," said the researchers at Microsoft.
While the world could benefit from a wide range of applications in the education and entertainment sectors, giving the general public access to the technology could be extremely dangerous. Microsoft has decided to keep the AI tool out of public use, aiming to prevent misleading, fake, and scandalous content. Its advanced voice cloning capability could bypass many security systems that identify fake voices and masked actors.
By Arpit Dubey
Arpit is a dreamer, wanderer, and tech nerd who loves to jot down tech musings and updates. Armed with a Bachelor's in Business Administration, a knack for crafting compelling narratives, and a sharp specialization in everything from Predictive Analytics to FinTech (and let’s not forget SaaS, healthcare, and more), Arpit crafts content that’s as strategic as it is compelling. With a Logician mind, he is always chasing sunrises and tech advancements while secretly preparing for the robot uprising.