Genesis of Bhashini: India’s First AI Mission
A conversation with Prof. Rajeev Sangal by Shivanand Kanavi
Prof. Rajeev Sangal, a pioneering computer
scientist, former Director of IIT (BHU) Varanasi and founder Director of
IIIT Hyderabad, offers a masterclass on AI and language technology. A
distinguished alumnus of IIT Kanpur and the University of Pennsylvania, Prof.
Sangal is a world-renowned expert in computational linguistics, best known for
his groundbreaking work on the Computational Paninian Grammar framework for
Indian languages.
He conceived Mission Bhashini and continues to guide it as the founding Chair of the Mission's Executive Committee. He provides a rare, behind-the-scenes look at
the conception and execution of India’s ambitious Bhashini mission for
speech-to-speech translation, detailing its visionary strategy, the ethical
dilemmas of AI opacity, the future of capturing linguistic nuance, and a
strategic roadmap for India to achieve global leadership in domain-specific AI
applications, making it an essential read for anyone interested in technology,
governance, and innovation.
-------------------------------------------------------
One of the dreams of the founders of Artificial Intelligence, who met at Dartmouth College in New Hampshire, USA, in 1956 and produced a “Manifesto of AI”, was speech to speech translation using computers. Are we close to achieving it today? Will Mission Bhashini be the answer?
---------------------------------------
Shivanand Kanavi: Tell us how Mission
Bhashini was conceived.
Rajeev Sangal: Mission Bhashini was thought of
by the Prime Minister’s Science Technology and Innovation Advisory Council
(PM-STIAC). I was approached by Prof K Vijayraghavan, then chairman of the
council, in September 2018, asking me if language translation could be
addressed by technology, and to draw up a plan for it, particularly for S&T
content in English.
I was happy that language technology was coming
to the forefront of national priorities. I had demonstrated the machine
translation technology we had developed to our PM, in February 2016 at BHU,
when I was the Director of IIT (BHU) Varanasi.
Shivanand Kanavi: How did you go about
conceptualizing the Mission?
Rajeev Sangal: When I was thinking about the
mission, I had to look at the current situation - the state of the technology,
access to devices by people, and their needs that could be fulfilled.
By then, smartphones had come into the hands of a
large population, who wanted to access content in their own languages on
the internet. At that time, the volume of Indian language content was even
smaller than it is today.
Even today the total content in all the Indian
languages put together is less than 0.1% of the content on the internet. Yes,
less than 0.1%!
So, providing English content translated into
local languages to the common man would become desirable if the technology could be
made ready.
The prevailing mindset at the time of
conception of Bhashini in 2018-19 was that India did not have a demonstrated
working machine translation system, much less any speech to speech machine translation
system, and many people thought that there was no way India could catch up with
the MNC tech giants.
These sceptics did not realize that Indian
academia had the technical know-how for building prototype models. This was a
result of research and development of the past 30 years with government
funding. Now, it was a matter of rebuilding those systems using the latest
tools and approaches, and engineering the models for large scale use.
India had also gained experience in building
Aadhaar (2009) and UPI (2016), part of what is today known as Digital Public Infrastructure.
With the above capability and experience, it is no wonder that the Bhashini
Mission has delivered a working technology at large scale, which is as good as
or better than that of the MNC tech giants.
Shivanand Kanavi: What were the key ideas in
the conception of the Mission?
Rajeev Sangal: One had to work out the scope,
tasks, and types of uses. I felt that we should take up speech to speech machine
translation (SSMT), and not limit ourselves to text to text machine translation
(MT). This might look like a simple expansion of the scope, but researchers in
MT and in speech processing are quite separate: they work separately, and are
often located in two different departments. Would it be possible to make them
work together, towards a single goal?
A workshop was organized with leading researchers from both the areas in January 2019 at IIIT Hyderabad, to discuss possible approaches to be taken in the Mission. After consulting colleagues from both the areas, I felt confident that there was a willingness to work together. I knew that there was already a high level of capability in both the areas in India.
It is a testimony to the strength of Indian academia and proper governmental
funding in the past under the Technology Development for Indian Languages (TDIL)
program of Meity (Ministry of Electronics & IT), even though in the recent past
there was a lull in funding.
I decided to take the plunge for Speech to
Speech technology!
Educational courses on NPTEL / Swayam, besides
websites, were the prime targets identified to be used for training in Machine
Learning. This would allow students across the country to access content in
higher education in their own languages.
Moreover, translation of formal lectures would
be easier than conversations, because conversations use very short or partial
sentences, and are highly contextual.
Complex technologies like speech processing and
text to text translation make errors. So the system was designed to take human
inputs as well, including corrections. It would normally function as a human-machine
combination, although as its quality improves it could also be used in a fully
automatic mode.
Finally, it was also decided that all 22
official Indian languages together with English would be covered. The SSMT
capability would be developed to translate among all these languages. When technology
development is left to the MNCs, they choose to develop technology only in
those languages for which there is a market need. As a result, almost one third
of the languages are left out completely. Here, we would cover all 22 official
Indian languages.
Shivanand Kanavi: What kind of technology was
decided to be developed?
Rajeev Sangal: It was decided to develop AI
models for spoken language translation. This included developing automatic
speech recognition (ASR or speech to text) models, text to text machine
translation (MT), and text to speech (TTS) models. These technologies when put
in a pipeline would give the SSMT system. The pipeline would also contain, as
needed, ancillary models for disfluency (breaks in speech) correction, named
entity recognition, lip synchronization, etc.
Parts of the pipeline would also be usable as a
standalone MT system for text to text translation, or a transcription system
for speech. Human intervention would also be possible at every stage.
Such intervention would be important for
correcting errors in recordings, though not in online live use. This basic
technology would also open up the market for tools and support applications of
various kinds, such as summarization, LLMs (which have come later), and sentiment
analysis. It was also decided to build OCR technology for recognition of Indian
language text from images.
The above SSMT pipeline would be built for all
22 official languages of India, and go even beyond these languages later. If
the technology is developed within the country, one has full control over it
and one can put it to myriad uses.
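The pipeline described in this answer can be illustrated with a small sketch. This is not Bhashini's code or API: the stage functions below are toy stand-ins (hypothetical names, a two-word lexicon, and a string prefix in place of real audio), and the reviewer hook is only one assumed way of allowing human correction after each stage. It shows the chaining of ASR, disfluency correction, MT, and TTS, and how a part of the chain can be reused as a standalone text to text system.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

# A stage maps one intermediate result to the next (strings stand in for audio and text).
Stage = Callable[[str], str]
# A reviewer sees (stage name, machine output) and returns a correction,
# or None to accept the machine output unchanged.
Reviewer = Callable[[str, str], Optional[str]]


@dataclass
class Pipeline:
    stages: List[Tuple[str, Stage]] = field(default_factory=list)

    def add(self, name: str, stage: Stage) -> "Pipeline":
        self.stages.append((name, stage))
        return self

    def run(self, data: str, reviewer: Optional[Reviewer] = None) -> str:
        for name, stage in self.stages:
            data = stage(data)
            if reviewer is not None:  # optional human intervention after each stage
                corrected = reviewer(name, data)
                if corrected is not None:
                    data = corrected
        return data


# --- Toy stand-ins for real models (all hypothetical) ---
def toy_asr(audio: str) -> str:
    """Pretend speech recognition: strip the 'audio:' marker."""
    return audio[len("audio:"):] if audio.startswith("audio:") else audio


FILLERS = {"um", "uh"}


def toy_disfluency(text: str) -> str:
    """Pretend disfluency correction: drop filler words."""
    return " ".join(w for w in text.split() if w not in FILLERS)


TOY_LEXICON = {"hello": "namaste", "friend": "mitra"}


def toy_mt(text: str) -> str:
    """Pretend translation: word-by-word lookup in a tiny lexicon."""
    return " ".join(TOY_LEXICON.get(w, w) for w in text.split())


def toy_tts(text: str) -> str:
    """Pretend speech synthesis: add the 'audio:' marker back."""
    return "audio:" + text


ssmt = (
    Pipeline()
    .add("asr", toy_asr)
    .add("disfluency", toy_disfluency)
    .add("mt", toy_mt)
    .add("tts", toy_tts)
)

# Fully automatic mode.
auto_output = ssmt.run("audio:um hello friend")


# Human-machine mode: a reviewer fixes a recognition error after the ASR stage.
def fix_asr(stage_name: str, output: str) -> Optional[str]:
    return "hello friend" if stage_name == "asr" else None


reviewed_output = ssmt.run("audio:hallo friend", reviewer=fix_asr)

# A part of the pipeline used on its own as a text to text system.
text_only_output = Pipeline().add("mt", toy_mt).run("hello friend")
```

In a real system each toy function would be replaced by a trained model, and the reviewer would be an editing interface rather than a function; the structure of the chain is the only point being illustrated.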
Shivanand Kanavi: What were some of the
strategic elements in the design of the Mission?
Rajeev Sangal: A question arose as to how the
technology built indigenously (“Made in India”) could compete with those
developed by MNC tech giants like Google, Microsoft and Meta. They have the
Indian language data (a hundred times more than what we possess), the compute
resources, and have captured the market as well.
When AI systems are tested under standard
artificial benchmarks, they perform very differently compared to use in real
life situations. Each “real life area” is a niche area. The Mission should have
a mechanism for supporting the niche areas. For each domain or application
area, the Mission should be able to help enhance technology and nurture
startups. The support would be provided through “Technology Acceleration
Centres”. Startups in these areas can compete and win against MNC tech giants.
These ideas were built into the Mission document.
On the question of building strong research
teams, the idea of a “consortium” of academic institutions was used to build
a critical mass of researchers in a project. The language technology area needs
computer scientists, linguists, Sanskrit grammarians, and also language experts
of the concerned languages, all working together. A single institution usually
lacks the required expertise. The 13 approved consortia included 70+ research
groups located in 30+ institutions covering 22 Indian languages.
It was possible to run such a large distributed
Mission only because of the consortia approach. Even though the accounting
software in Meity and other ministries at times makes it very difficult for
consortia projects to work, this method of work has been crucial for progress.
On the issue of data, it was clear that a large
amount of money would have to be budgeted for the creation of data for all 22
official Indian languages. This would mean capturing spoken data and its
transcription for all the languages, and parallel sentences in the original
language and its translation. It was also felt that to make the data freely
available to Indian researchers and Indian startups, the data would be made
open and freely downloadable by anybody.
Ironically, this openness in the project would
mean that this high quality data would be available to MNC tech giants also,
for free.
They have a much larger amount of data but of
poor quality, gleaned from their users or from the internet. Therefore, I had
reservations on this count, but mechanisms to restrict distribution of data do
not work; they end up denying it to Indian researchers and startups. Not only
the data, the models were also made open source and freely downloadable by
anyone.
The goal of Bhashini was not just to deliver a
technology, but to build an eco-system for language technology, with all these
elements.
Shivanand Kanavi: What were the elements of
the eco-system of language translation that were identified?
Rajeev Sangal: The eco-system that Bhashini
seeks to develop consists of R&D groups, data creation and collection
groups, technology acceleration centres, mechanisms for technology transfer,
incubation of startups, participation by other companies, state governments,
and the users including publishers, course-ware developers, government
departments, end users, etc.
This eco-system would be nurtured by Meity
using Bhashini funds.
One can think of them as being a part of three
different cycles in society: (a) technology cycle, (b) market cycle, and (c)
social cycle. Each of the cycles has to be made active, and moreover, these
cycles have to be mutually reinforcing.
Shivanand Kanavi: What are these cycles? Can
you explain?
Rajeev Sangal: The first cycle is the
technology cycle. It provides linkages between R&D and startups. R&D
does research, finds new ways of doing things, builds lab prototypes to field
prototypes - leading to new technology development and its demonstration.
Startups and existing companies take this
technology, convert it into products, and service the customers.
However, for the technology to be transferred
to companies, the technology has to be “engineered” for robustness,
ruggedness, and adaptation to needs of real life customers. This task needs
to be done by separate entities, call them technology acceleration centres
(TACs). They have to connect with startups and help them solve problems which
come in the way of adaptation of “new” technologies. This is called a cycle
because there is a two way flow between the two.
The second cycle is the market cycle. It
involves, for example, the content providers such as publishers giving services
to their customers or end users. However, they need modern translation tools as
well as other AI tools, to make their tasks easier. This is where technology
based startups come in, making the content available in multiple languages or providing
new kinds of services, including voicebots. These help the providers in
reaching their end users.
The third is the social cycle. The task here is
to get a large number of people into creating Indian language digital content,
both original and translated, using language tools widely, and
contributing to languages through teaching, contests, games, etc.
The principal actors here are schools,
colleges, language departments and academies, culture departments, students,
state governments, and general masses. This cycle yields love for culture and
languages, encourages language aware and digitally trained manpower including
e-translators, and of course, precious data. Linking with state governments is
an important step in this cycle.
These cycles are driven by their inner
dynamics. The technology cycle is driven by knowledge, the market cycle by money, and
the social cycle by service.
In the Mission, major progress has been made
currently in the development of technology, and some progress with central
government or its ministries as user. The market cycle and the social cycle
need to be specially energized, as they are much delayed.
Shivanand Kanavi: What are the outcomes of the
Bhashini Mission so far?
Rajeev Sangal: Mission Bhashini has led to the
development of a range of technologies for SSMT (speech to speech machine
translation) for Indian languages. These technologies have been made ready not
just as a lab or field prototype, but engineered for large scale use. These are
the result of R&D and good engineering. OCR technology is also under
development.
The above technologies are available in 20+
Indian languages, with 350+ different AI models. The Bhashini app has been
available for free download for some time and provides basic services on
mobile phones.
A large number of government ministries are
using these technologies, provided as a free service by Meity. Many of these
uses are voicebots that assist users in availing online services, including enquiries
about government schemes, filling forms, etc.
Bhashini technology has been used in
translating lectures and course material for higher education available on
NPTEL and Swayam platforms. This has been accomplished by video to video
translation of lectures from original English into some 8 Indian languages.
More than two hundred courses have been translated. Subtitling facility has
also been made available. More languages and courses are being covered as an
ongoing activity.
Open sourcing of data and models has allowed
Indian language data to be used by a large number of individuals, institutions,
and startups as free downloads.
What needs to be done now is to develop the
market cycle by nurturing startups. They really need to provide services to
their customers, whether governments or private. Various sectors are waiting to be
exploited, such as health, agriculture, school education, etc.
Technology acceleration centres planned under
the Mission can go a long way in energizing the startups in the eco-system.
R&D needs to continue the exploration of
completely new ways of building technology which can handle prosody in speech
processing, and discourse in machine translation.
Shivanand Kanavi: What is prosody in speech?
Rajeev Sangal: Prosody in speech refers
to the rhythm, stress, and intonation of spoken language, or the
"music" of speech, and it conveys meaning and emotional nuance beyond
individual words. It uses features like pitch, loudness, and duration to
signal things such as a statement versus a question, the speaker's emotions,
sarcasm, or emphasis on certain words.
In future, these systems would utilize features
from prosody such as tonal changes, pauses, emphasis, and sentiments in Indian
languages. Similarly, MT would not be limited to translating one sentence
at a time but would do paragraph to paragraph translation. Indian academia is well
poised to do it.
Finally, the social cycle needs to be jump
started. It would mean involving the common man in building content for their
languages, becoming adept in using translation tools under Bhashini, and finally
becoming the creator of original content in Indian languages. State governments
can play a major role in this. A revolution in Indian languages is waiting to
be unleashed.
(to be continued)
Shivanand Kanavi, a frequent contributor to Rediff.com, is a theoretical physicist, business journalist and former VP at TCS.
He is the author of the award-winning book Sand to Silicon: The Amazing Story Of Digital Technology and edited Research by Design: Innovation and TCS.