How IBM researchers enabled AI platform Watson to understand native Hindi

Gargi Dasgupta, director, IBM Research India and chief technology officer, IBM India and South Asia
With work-from-home becoming the new normal due to the coronavirus pandemic, Gargi Dasgupta has taken up gardening as a new hobby. With everyone at home, she is also attending to her family's needs and working to maintain a work-life balance. However, as the director of IBM Research India and the chief technology officer of IBM India and South Asia, Dasgupta has also been busy realising a vision for the future of computing. She is leading her team to do this by infusing artificial intelligence and blockchain into the enterprise ecosystem. In her CTO role, she is helping IBM India lead with innovations in areas such as cloud and cognitive.

She says IBM's focus is on helping clients innovate across three foundations of AI: natural language processing, building trust, and automation. In natural language processing, researchers at IBM have used a combination of deep learning and shallow semantic understanding models to enable Watson to understand Hindi natively.

Watson Discovery now includes support for 10 new languages, including Hindi. With this capability, it can natively understand the meaning of Hindi text written in Devanagari script. It extracts useful information such as business keywords, named entities and relationships, and performs sentiment analysis and other such NLP tasks. IBM Research-India collaborated with its AI Horizon Network partner IIT Bombay to lead this effort.

“Watson Discovery can natively understand documents in Hindi and extract meaning, without having to resort to any translation to English,” says Dasgupta. “This is an outcome of our collaboration with IIT Bombay’s Center for Indian Language Technology as a part of IBM’s AI Horizon Network.”

With this work, India has taken positive steps towards establishing a leadership role in AI. The IBM-IIT Bombay AI Horizon Network (AIHN) project is an important step, both for science and technology and for the country’s progress.

Prof Pushpak Bhattacharyya, professor in the Department of Computer Science and Engineering, IIT Bombay, said the collaboration is focused on the automation of cognition and perception. There is an emphasis on cutting-edge research, high-quality publications, and open and widely usable AI resources. There are social and commercial needs whose servicing requires user interaction and information dissemination in Indian languages (IL). However, the complexity of Indian languages, limited corpora and other constraints make it very difficult to adopt and adapt English-centric NLP for IL-NLP.

“Through our partnership with IBM, we have been able to use machine learning for IL-NLP and address challenges related to the low resource, understanding of Hindi language sense, intent, sentiment and more,” says Bhattacharyya. “In two years, the endeavour has seen stellar publications coming out of the collaboration and creation of quality software (in NLP, speech and multimodal AI).”

These capabilities would help enterprises draw more insight from their data and have immediate uses in several areas. One is customer care, where the frequent use of casual language has made accurate understanding, classification and fine-grained analysis difficult.

At a time when Covid-19 has wreaked havoc on people and businesses globally, IBM’s Dasgupta also sees important applications of natural language processing technology in addressing the challenges posed by the pandemic.

The National Health Mission, under the Government of Andhra Pradesh, has collaborated with IBM to deploy Watson Assistant for citizens. It provides Covid-19-related information on the response efforts and measures taken by the Andhra Pradesh Government. The Watson virtual agent on the IBM public cloud brings together Watson Assistant, natural language processing capabilities from IBM Research, and enterprise AI search capabilities with Watson Discovery. This allows it to understand and respond to common questions about Covid-19 in English, Telugu and Hindi.
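A virtual agent that answers in English, Telugu and Hindi first has to work out which of these an incoming question is written in. The article does not describe how Watson does this, so the following is only an illustrative sketch, not IBM's method: it guesses the script of a message from the Unicode blocks of its characters (Devanagari at U+0900 to U+097F, Telugu at U+0C00 to U+0C7F). The function name `detect_script` is hypothetical, and a script is not the same as a language, since Devanagari is also used for languages such as Marathi and Nepali.

```python
def detect_script(text: str) -> str:
    """Guess the dominant script of `text` by counting characters per Unicode block.

    Hypothetical helper for illustration only; a production system would use
    a trained language-identification model rather than this heuristic.
    """
    counts = {"devanagari": 0, "telugu": 0, "latin": 0}
    for ch in text:
        cp = ord(ch)
        if 0x0900 <= cp <= 0x097F:      # Devanagari block (used for Hindi)
            counts["devanagari"] += 1
        elif 0x0C00 <= cp <= 0x0C7F:    # Telugu block
            counts["telugu"] += 1
        elif ch.isascii() and ch.isalpha():  # basic Latin letters (English)
            counts["latin"] += 1
    best = max(counts, key=counts.get)
    # If no character matched any known script, report "unknown".
    return best if counts[best] > 0 else "unknown"


if __name__ == "__main__":
    for question in ["कोविड के लक्षण क्या हैं?",
                     "కోవిడ్ లక్షణాలు ఏమిటి?",
                     "What are the symptoms of Covid-19?"]:
        print(detect_script(question), "|", question)
```

The result could then be used to route the question to a language-specific pipeline, which is one way to avoid translating everything to English first.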

The Indian Council of Medical Research (ICMR) also collaborated with IBM to implement Watson Assistant on its portal. The assistant responds to specific Covid-19 queries from front-line staff and data entry operators at testing and diagnostic facilities across the country. The queries could relate to the nature of the data that test labs must capture and the process for capturing it. This includes how to record the inventory of test kits and reagents, the process of reporting to various Government agencies and references to the latest guidance, in addition to general queries on Covid-19. “We helped their frontline agents actually respond to a deluge of requests that they were getting on the covid testing procedures,” says Dasgupta.

The innovation is important because less than 10 per cent of the country's population knows English, which leaves the majority excluded from the benefits of the technology.

“If you're serious about Indians, you have to be serious about Indian languages,” says Karthik Sankaranarayanan, senior manager, AI for Interaction, IBM Research India. “Hindi which is being spoken by almost like 50 to 60 per cent of the population, was naturally our first goal post to go after.”

With the new technology, Watson can natively understand Hindi written in Devanagari, including content available on the internet and in text messages, emails and medical reports. The firm plans to focus on other Indian languages as well, including Bengali, Kannada and Gujarati. Understanding a language natively is very different from the way one approaches a new foreign language such as Mandarin or Spanish, where one first translates it into English and then interprets the meaning.

Watson can currently understand Hindi only in written form. Sankaranarayanan said IBM is advancing its AI technologies to understand spoken languages as well. “We're advancing Watson to understand spoken Hindi,” he said. “You'll see announcements related to advancements next year.”

