India's largest platform and marketplace for AI & Analytics leaders & professionals

Microsoft Announces Limited Access to its Custom Neural Voice

3AI February 15, 2021

Microsoft announced limited access to its neural text-to-speech AI called Custom Neural Voice. The service allows developers to create custom synthetic voices.

Custom Neural Voice is a Text-to-Speech (TTS) feature of Speech in Azure Cognitive Services that allows users to create a one-of-a-kind, customized synthetic voice for their brand. Since its preview in September last year, the feature has helped several customers, such as AT&T, Duolingo, Progressive, and Swisscom, develop branded speech solutions for their own customers. Although the feature is generally available (GA), access is gated by technical controls intended to prevent misuse of the service, and customers have to apply to use it.

Microsoft’s underlying Neural TTS technology for Custom Neural Voice consists of three major components: Text Analyzer, Neural Acoustic Model, and Neural Vocoder. Together, they generate natural-sounding synthetic speech from text. The text is first fed into the Text Analyzer, which outputs a sequence of phonemes (a phoneme is a basic unit of sound that distinguishes one word from another in a particular language). This phoneme sequence defines how the words in the text are pronounced, and it goes into the Neural Acoustic Model, which predicts the acoustic features that define the speech signal, such as timbre, speaking style, speed, intonation, and stress patterns. Finally, the Neural Vocoder converts the acoustic features into audible waveforms to produce the synthetic speech.
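
The three-stage flow described above can be sketched as a toy pipeline. This is only an illustration of how the stages hand data to one another, not Microsoft's implementation: the lexicon, the pitch and duration values, and all function names are assumptions made up for this example, and simple rules and sine waves stand in for the neural models.

```python
import math

# Toy pronunciation lexicon (hypothetical). A real text analyzer also
# normalizes text and handles words that are missing from the lexicon.
LEXICON = {
    "hello": ["HH", "AH", "L", "OW"],
    "world": ["W", "ER", "L", "D"],
}
VOWELS = {"AH", "OW", "ER"}
SAMPLE_RATE = 16000  # samples per second (assumed value)


def text_analyzer(text):
    """Stage 1: turn text into a phoneme sequence."""
    phonemes = []
    for word in text.lower().split():
        phonemes.extend(LEXICON[word.strip(".,!?")])
    return phonemes


def acoustic_model(phonemes):
    """Stage 2: predict acoustic features for each phoneme.

    A neural model would learn these from recordings. Here, vowels are
    simply made longer and lower pitched than consonants.
    """
    features = []
    for p in phonemes:
        if p in VOWELS:
            features.append({"phoneme": p, "pitch_hz": 180.0, "duration_s": 0.12})
        else:
            features.append({"phoneme": p, "pitch_hz": 240.0, "duration_s": 0.06})
    return features


def vocoder(features):
    """Stage 3: render the acoustic features as waveform samples."""
    samples = []
    for f in features:
        n = int(round(f["duration_s"] * SAMPLE_RATE))
        for i in range(n):
            samples.append(math.sin(2 * math.pi * f["pitch_hz"] * i / SAMPLE_RATE))
    return samples


def synthesize(text):
    """Chain the three stages: text -> phonemes -> features -> waveform."""
    return vocoder(acoustic_model(text_analyzer(text)))


if __name__ == "__main__":
    wave = synthesize("Hello world")
    print(len(wave), "samples at", SAMPLE_RATE, "Hz")
```

Each stage depends only on the output of the previous one, so the part that captures a particular speaker's sound (the acoustic model, in this sketch) could in principle be swapped out without changing the text analysis.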

Neural TTS voice models are trained using deep neural networks based on real voice recording samples. With Custom Neural Voice’s customization capability, customers can adapt the Neural TTS engine to better fit their user scenarios. To use Custom Neural Voice, customers need an Azure account and subscription. Once approved to use the feature, they can create a custom voice project, upload data, and then train, test, and deploy the voice model.
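
The order of these steps matters: data cannot be uploaded before a project exists, and a model cannot be deployed before it has been trained and tested. A minimal sketch of that ordering is shown below. The `Stage` and `VoiceProject` names are hypothetical and do not correspond to any Azure API; the code only models the sequence described above.

```python
from enum import IntEnum


class Stage(IntEnum):
    """Workflow stages, in the order given in the article."""
    APPROVED = 0
    PROJECT_CREATED = 1
    DATA_UPLOADED = 2
    TRAINED = 3
    TESTED = 4
    DEPLOYED = 5


class VoiceProject:
    """Tracks a hypothetical custom voice project through its stages."""

    def __init__(self, name):
        self.name = name
        self.stage = Stage.APPROVED  # assumes access was already granted

    def advance(self, target):
        """Move to the next stage, refusing to skip or repeat steps."""
        if target != self.stage + 1:
            raise ValueError(
                f"cannot go from {self.stage.name} to {target.name}"
            )
        self.stage = target


if __name__ == "__main__":
    project = VoiceProject("brand-voice")
    for step in (Stage.PROJECT_CREATED, Stage.DATA_UPLOADED,
                 Stage.TRAINED, Stage.TESTED, Stage.DEPLOYED):
        project.advance(step)
        print(project.name, "->", project.stage.name)
```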

Customers can apply Custom Neural Voice to a range of use cases, such as customer service chatbots, voice assistants, online learning, audiobooks, public service announcements, and real-time translations. One early adopter, Swisscom, wanted to create more engaging customer experiences by building a voice assistant that uniquely represents its brand. In a Microsoft Switzerland news item, the author wrote:

“Using the Speech service, Swisscom has given its customers access to an intelligent, multilingual voice assistant, helping improve the customer experience and accelerate its own digital transformation.”

Qinying Liao, principal program manager at Microsoft, described the benefit of Custom Neural Voice in an Azure AI blog post:

“Empowered with this technology, Custom Neural Voice enables users to build highly-realistic voices with just a small number of training audios. This new technology allows companies to spend a tenth of the effort traditionally needed to prepare training data while at the same time significantly increasing the naturalness of the synthetic speech output when compared to traditional training methods.”

In addition, Holger Mueller, principal analyst and vice president at Constellation Research Inc., told InfoQ:

“In order to make computers more human, speech is a crucial ingredient, and in 2020 enterprises need to depart from the robotic and standardized voices, accents of synthetic speech in the past. The cloud enables this level of personalized creation of personalized voice experience – with availability, cheap compute, and operational capacity. So it is a widespread use case across the IaaS / PaaS players – and suitable for enterprises and their customers, and even employees as they get a more human experience.”

Lastly, besides the capability to customize TTS voice models, Microsoft offers over 200 neural and standard voices covering 54 languages and locales.
