Voice-based AI platforms flourished in pandemic, breaking language barriers

The Covid-19 pandemic sparked greater innovation and proliferation of speech technologies in India
Voice-based artificial intelligence (AI) platforms offered by tech giants like Google and Amazon have been around for a few years. However, though the need for speech technologies has always been felt in India, the country’s 22 regional languages and hundreds of dialects had made it tough to crack the language barrier.

 
The Covid-19 pandemic sparked greater innovation and proliferation of speech technologies in India. The push has been led by several new-age companies that are coming up with practical solutions suited to Indian conditions.

 
Take Axis Bank. Though it had been trying to build a speech-based platform for over a year, it was only after the pandemic led to falling call volumes that the bank redoubled its efforts and succeeded in rolling out its voice-based AI platform, AXAA, along with partner Vernacular.ai.

 
“IVR systems are painful. Customers have to manoeuvre the system to reach the service they want, and the entire process is time-consuming. Conversational bots powered by AI bring in much more efficiency, as they are able to understand and make your interaction much sharper,” says Ratan Kesh, executive vice-president and head of retail operations and service at the bank.

 
Axis Bank receives some 70,000 calls per day and half of them are terminated at the IVR system. Since AXAA’s deployment in June, it has handled about nine million calls, and the bot has learnt to understand the intent of callers, which is helping it to route calls to a suitable queue.

 
“We are getting close to 90 per cent accuracy with conversational bots,” Kesh adds. At present, AXAA supports English, Hindi and Hinglish.

 
Navigating the internet

 
For Sourabh Gupta, co-founder and CEO of Vernacular.ai, getting Axis Bank on board is just the first step towards solving the Indic language conundrum. After graduating from IIT-Roorkee, Gupta and fellow co-founder Akshay Deshraj noticed how people in smaller cities struggle to navigate the internet owing to a lack of knowledge of English.

 
Says Gupta: “At Kanakapura, a town about 55 km from Bengaluru, we met a farmer who told us that though he receives SMSs from his bank, to decipher them he has to travel 15-20 km to his bank branch, which means loss of a day's income.” That’s when they realised, he adds, that if internet usage is to be expanded, users will need voice to navigate through it.

 
Building a voice engine requires the creation of a speech-to-text engine, which in turn requires all possible data on that language. For instance, when creating a Hindi language database, all possible variations of Hindi spoken across India must be included.

 
So, the co-founders spent the first few months figuring out how to create this database. Failing to collect voice samples from the co-working space where they started out, they built a crowdsourcing platform where people from across India could contribute data to the company. At present, the platform has over one million users.

 
Vernacular.ai’s next challenge was code mixing. Many Indians are bilingual and mix the vernacular language with English when speaking. “We realised that we had to build a text-to-speech engine as well, because India is a country of many languages and over 700 dialects,” says Gupta.

 
At Bengaluru-headquartered Gnani.ai, established at the same time as Vernacular.ai, the idea was to create proprietary technology that would beat the global giants. Founders Ganesh Gopalan and Ananth Nagaraj, former colleagues at Texas Instruments, spent the first two years building four capabilities — core algorithms, a database for voice, tuning the algorithms and using linguists to understand the language. The firm now supports 14 Indian languages and is expanding globally as well.
Gnani.ai has largely focused on solving specific problems for certain industries. For instance, it has created conversational collection bots for the BFSI (banking, financial services and insurance) sector that take the customer’s emotions into consideration.

 
“Voice bots have a persona that evolves, based on the messaging. For instance, when the moratorium was announced, we had to make calls, but the bot's voice had to be more considerate, kind and careful. It would start by asking the health of the customer and their family, and would then come to the topic of the moratorium,” says Gopalan.

 
Now, the bots are more focused on the task of collection. After the experience of the lockdown, Gnani is trying to make its bots more human-like in their responses, and is working with linguists to train them.

 
“It’s not always about algorithms, but how you tune it. When we work with linguists, this helps us to come up with things which are not there in the written world. For instance, in Kannada for every root word, the colloquial language will have 60-70 variations. Understanding linguistic nuances such as phoneme and dialects is a challenge,” Gopalan explains.

 
Voice assistants in e-commerce

 
The importance of conversational bots is even more visible in e-commerce, as the major players try to enter semi-urban and rural areas. In June, Flipkart launched a virtual voice assistant for grocery shopping, which was based on a five-month ethnographic study in multiple cities and towns.

 
The voice-first conversational AI platform was built by Flipkart’s in-house technology team with solutions for speech recognition, natural language understanding, machine translation, and text-to-speech for Indian languages. These solutions are capable of understanding Indian languages, e-commerce categories, and tasks such as product searches, as well as product details and order placement. The platform can also understand users’ intent in real-time, ensuring engaging shopping-related conversations in various Indian languages.

