'A Revolution In Indian Languages
Is Waiting To Be Unleashed'
( https://www.rediff.com/news/interview/mission-bhashini-must-read-interview/20251003.htm )
October 03, 2025
'The Bhashini Mission has delivered a
working technology at large scale, which is as good as or better than the one
with MNC tech giants.'
One of the dreams of the founders of Artificial Intelligence, who met at Dartmouth College in New Hampshire, USA, in 1956 and produced a Manifesto of AI, was speech to speech translation using computers.
Are we close to achieving it today?
Will Mission Bhashini be the answer?
Professor Rajeev Sangal,
a pioneering computer scientist, former director of IIT Varanasi and founder
director of IIIT Hyderabad, offers a masterclass on AI and language technology.
A distinguished alumnus of IIT Kanpur
and the University of Pennsylvania, Professor Sangal is a world-renowned expert
in computational linguistics, best known for his groundbreaking work on the
Computational Paninian Grammar framework for Indian languages.
He conceived Mission Bhashini and
continues to guide it, as the founding Chair of the mission's executive
committee.
He provides a rare, behind-the-scenes
look at the conception and execution of India's ambitious Bhashini Mission for
speech-to-speech translation, detailing its visionary strategy, the ethical
dilemmas of AI opacity, the future of capturing linguistic nuance, and a
strategic roadmap for India to achieve global leadership in domain-specific AI
applications, making it an essential read for anyone interested in technology,
governance, and innovation.
"Many people thought there is no
way India can catch up with the MNC tech giants. These sceptics did not realise
that Indian academia had the technical know-how of building prototype
models," the distinguished professor tells Shivanand Kanavi in
a must-read conversation.
Photograph: Kind courtesy bhashini.gov.in
How was Mission Bhashini conceived?
Mission Bhashini was thought of by
the Prime Minister's Science Technology and Innovation Advisory Council (PM-STIAC).
I was approached by Professor K
Vijayraghavan, then the chairman of the council, in September 2018, asking me
if language translation could be addressed by technology, and to draw up a plan
for it, particularly for S&T (science and technology) content in
English.
I was happy that language technology
was coming to the forefront of national priorities.
I had demonstrated the machine
translation technology we had developed to our PM in February 2016 at BHU, when
I was director of IIT Varanasi.
How did you go about conceptualising
the mission?
When I was thinking about the
mission, I had to look at the current situation -- the state of the technology,
access to devices by people, and their needs that could be fulfilled.
By then, smartphones had reached the
hands of a large population, who wanted to access content in their own
languages on the Internet.
At that time, the volume of Indian
language content was even smaller than it is today.
Even today the total content in all
the Indian languages put together is less than 0.1% of the content on the
Internet. Yes, less than 0.1%!
So, providing English content
translated into local languages to the common man would become desirable if the
technology could be made ready.
The prevailing mindset at the time of
conception of Bhashini in 2018-2019 was that India did not have a demonstrated
working machine translation system, much less any speech to speech machine
translation system. Many people thought there is no way India can catch up with
the MNC tech giants.
These sceptics did not realise that
Indian academia had the technical know-how of building prototype models.
This was a result of research and
development of the past 30 years with government funding.
Now, it was a matter of rebuilding
those systems using the latest tools and approaches, and engineering the models
for large scale use.
India had also gained experience in
building Aadhaar and UPI, what is today known as Digital Public Infrastructure.
With the above capability and
experience, it is no wonder that the Bhashini Mission has delivered a working
technology at large scale, which is as good as or better than the one with MNC
tech giants.
IMAGE: Professor Rajeev Sangal with Prime Minister Narendra Modi.
What were the key ideas in the
conception of the mission?
One had to work out the scope, tasks,
and types of uses. I felt that we should take speech to speech machine
translation (SSMT), and not limit ourselves to text to text machine
translation (MT).
This might look like a simple
expansion of the scope, but researchers in MT and in speech processing are
quite separate -- they work separately, and are often located in two different
departments.
Would it be possible to make them
work together, towards a single goal?
A workshop was organised with leading
researchers from both the areas in January 2019 at IIIT Hyderabad, to discuss
possible approaches to be taken in the mission.
Consulting colleagues from both the
areas, I felt confident that there was a willingness to work together.
I knew that already there was a high
level capability in both the areas in India.
It is a testimony to the strength of
Indian academia and proper governmental funding in the past under the
Technology Development for Indian Languages (TDIL) programme of Meity
(Ministry of Electronics & IT), even though there was a lull in funding in the
recent past.
I decided to take the plunge for
Speech to Speech technology!
Educational courses on NPTEL/Swayam,
besides websites, were the prime targets identified to be used for training in
Machine Learning.
It would allow students across the
country to access content in higher education in their own languages.
Moreover, translation of formal
lectures would be easier than conversations, because conversations use very
short or partial sentences, and are highly contextual.
Complex technologies like speech
processing and text to text translation make errors. So the system was designed
to take human inputs as well, including corrections.
It would normally function as a human
machine combination, although as its quality improves it could also be used in
a fully automatic mode.
Finally, it was also decided that all
22 official Indian languages together with English would be covered.
The SSMT capability would be
developed to translate among all these languages. When technology development
is left to the MNCs, they choose to develop technology only in those languages
for which there is a market need.
As a result, almost one third of the
languages are left out completely. Here, we would cover all 22 official Indian
languages.
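To give a sense of the scale this implies, here is a small illustrative sketch (not taken from the mission documents). It counts translation directions, assuming every language can be both source and target; the language labels are placeholders. With 22 official languages plus English there are 23 languages, and therefore 23 x 22 = 506 ordered source-target pairs.

```python
from itertools import permutations

# Placeholder labels only: 22 official Indian languages plus English.
languages = [f"indian_language_{i}" for i in range(1, 23)] + ["english"]

# Each ordered (source, target) pair is a distinct translation direction.
directions = list(permutations(languages, 2))

print(len(languages), "languages,", len(directions), "translation directions")
```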
What kind of technology was decided to be developed?
It was decided to develop AI models
for spoken language translation. This included developing automatic speech
recognition (ASR or speech to text) models, text to text machine
translation (MT), and text to speech (TTS) models.
These technologies when put in a
pipeline would give the SSMT system.
The pipeline would also contain, as
needed, ancillary models for disfluency (breaks in speech) correction, named
entity recognition, lip synchronisation, etc.
Parts of the pipeline would also be
usable as a standalone MT system for text to text translation, or a
transcription system for speech.
Human intervention would also be
possible at every stage.
Such intervention would be important
for correcting errors in recordings, though not in online live use.
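The pipeline idea described above can be sketched in a few lines of Python. This is purely illustrative and is not Bhashini's code: every class, function and word list below is a hypothetical stand-in, with toy functions playing the roles of the ASR, disfluency correction, MT and TTS models, and an optional reviewer hook standing in for human correction after each stage.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Every name below is hypothetical; real ASR, MT and TTS models would replace the toys.
Stage = Callable[[str], str]
Reviewer = Callable[[str, str], str]  # (stage name, stage output) -> corrected output


@dataclass
class SpeechTranslationPipeline:
    stages: List[Tuple[str, Stage]]
    reviewer: Optional[Reviewer] = None  # optional human-in-the-loop hook

    def run(self, source: str) -> str:
        current = source
        for name, stage in self.stages:
            current = stage(current)
            if self.reviewer is not None:
                # A person may inspect and correct the output of any stage.
                current = self.reviewer(name, current)
        return current


def toy_asr(audio: str) -> str:
    # Stand-in for speech recognition: the "audio" is just text tagged with a prefix.
    prefix = "audio:"
    return audio[len(prefix):] if audio.startswith(prefix) else audio


def toy_disfluency_filter(text: str) -> str:
    # Ancillary step: drop filler words left over from spontaneous speech.
    fillers = {"um", "uh"}
    return " ".join(word for word in text.split() if word not in fillers)


TOY_LEXICON = {"hello": "namaste", "world": "duniya"}


def toy_mt(text: str) -> str:
    # Stand-in for machine translation: word-by-word lookup, unknown words pass through.
    return " ".join(TOY_LEXICON.get(word, word) for word in text.split())


def toy_tts(text: str) -> str:
    # Stand-in for speech synthesis: tag the text as if it were audio output.
    return "speech:" + text


full_pipeline = SpeechTranslationPipeline(
    stages=[("asr", toy_asr), ("disfluency", toy_disfluency_filter),
            ("mt", toy_mt), ("tts", toy_tts)]
)

# The MT stage on its own doubles as a text to text translator.
text_only = SpeechTranslationPipeline(stages=[("mt", toy_mt)])


def sample_reviewer(stage_name: str, output: str) -> str:
    # A human editor might prefer a different word after the MT stage.
    return output.replace("duniya", "sansar") if stage_name == "mt" else output


reviewed_pipeline = SpeechTranslationPipeline(full_pipeline.stages, sample_reviewer)
```

Keeping each stage as a separate, swappable step is what lets part of the chain serve as a standalone translation or transcription system, and lets a reviewer step in between stages when recordings are being corrected offline.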
This basic technology would also open
up the market for tools and support applications of various kinds, such as
summarisation, LLMs (which have come later), and sentiment analysis.
It was also decided to build OCR
technology for recognition of Indian language text from images.
The above SSMT pipeline would be
built for all 22 official languages of India, and go even beyond these
languages later.
If the technology is developed within
the country, one has full control over it and one can put it to myriad uses.
What were some of the strategic
elements in the design of the Mission?
A question arose as to how the
technology built indigenously ('Made in India') can compete with those
developed by MNC tech giants like Google, Microsoft and Meta.
They have the Indian language data
(a hundred times more than what we possess), the compute resources, and have
captured the market as well.
When AI systems are tested under
standard artificial benchmarks, they perform very differently compared to use
in real life situations. Each 'real life area' is a niche area.
The mission should have a mechanism
for supporting the niche areas.
For each domain or application area,
the mission should be able to help enhance technology and nurture startups.
The support would be provided through
'Technology Acceleration Centres'.
Startups in these areas can compete
and win against MNC tech giants. These ideas were built into the mission
document.
On the question of building strong
research teams, the idea of a 'consortium' of academic institutions was used to
build a critical mass of researchers in a project.
The language technology area needs
computer scientists, linguists, Sanskrit grammarians, and also language experts
of the concerned languages, all working together.
A single institution usually lacks
the required expertise. The 13 approved consortia included 70+ research groups
located in 30+ institutions covering 22 Indian languages.
It was possible to run such a large
distributed mission only because of the consortia approach.
This method of work has been crucial
for progress, even though at times the accounting software in Meity and other
ministries makes it very difficult for consortia projects to work.
On the issue of data, it was clear
that a large amount of money would have to be budgeted for the creation of data
for all 22 official Indian languages.
This would mean capturing spoken data
and its transcription for all the languages, and parallel sentences in the
original language and its translation.
It was also felt that to make the
data freely available to Indian researchers and Indian startups, the data would
be made open and freely downloadable by anybody.
Ironically this openness in the
project would mean that this high quality data would be available to MNC tech
giants also, for free.
They have a much larger amount of data
but of poor quality, gleaned from their users or from the Internet.
Therefore, I had reservations on this
count, but mechanisms to restrict distribution of data do not work; they end up
denying it to Indian researchers and startups.
Not only the data but the models too
were made open source and freely downloadable by anyone.
The goal of Bhashini was not just to
deliver a technology, but to build an eco-system for language technology, with
all these elements.
What were the elements of the
eco-system of language translation that were identified?
The eco-system that Bhashini seeks to
develop consists of R&D groups, data creation and collection groups,
technology acceleration centres, mechanisms for technology transfer, incubation
of startups, participation by other companies, state governments, and the users
including publishers, course-ware developers, government departments, end
users, etc.
This eco-system would be nurtured by
Meity using Bhashini funds.
One can think of them as being a part
of three different cycles in society: a. technology cycle; b. market cycle, and
c. social cycle.
Each of the cycles had to be made
active, and moreover, these cycles have to be mutually reinforcing.
What are these cycles? Can you explain?
The first cycle is the technology
cycle. It provides linkages between R&D and startups. R&D does
research, finds new ways of doing things, and builds everything from lab
prototypes to field prototypes, leading to new technology development and its
demonstration.
Startups and existing companies take
this technology, convert it into products, and service the customers.
However, for the technology to be
transferred to companies, the technology has to be 'engineered' for robustness,
ruggedness, and adaptation to the needs of real life customers.
This task needs to be done by a
separate entity, call it technology acceleration centres (TACs).
They have to connect with startups
and help them solve problems which come in the way of adaptation of 'new'
technologies.
This is called a cycle because there
is a two way flow between the two.
The second cycle is the market cycle.
It involves, for example, the content providers such as publishers giving
services to their customers or end users.
However, they need modern translation
tools as well as other AI tools, to make their tasks easier.
This is where technology based
startups come in, making content available in multiple languages or providing
new kinds of services, including voicebots.
These help the providers in reaching
their end users.
The third is the social cycle. The
task here is to get a large number of people into creating Indian language
digital content -- both original and translated, proliferation of use of
language tools, contributing to languages through teaching, contests, games,
etc.
The principal actors here are
schools, colleges, language departments and academies, culture departments,
students, state governments, and the general masses.
This cycle yields love for culture
and languages, encourages language aware and digitally trained manpower
including e-translators, and of course, precious data.
Linking with state governments is an
important step in this cycle.
These cycles are driven by their
inner dynamics. The technology cycle is driven by knowledge, the market cycle by
money, and the social cycle by service.
In the mission, major progress has
been made so far in the development of technology, and some progress with the
central government or its ministries as users.
The market cycle and the social cycle
need to be specially energised, as they are much delayed.
What are the outcomes of the Bhashini
Mission so far?
Mission Bhashini has led to the
development of a range of technologies for SSMT (speech to speech machine
translation) for Indian languages.
These technologies have been made ready
not just as a lab or field prototype, but engineered for large scale use.
These are the result of R&D and
good engineering. OCR technology is also under development.
The above technologies are available
in 20+ Indian languages, with 350+ different AI models.
The Bhashini app has been available
for free download for some time and provides basic services over mobile phone.
A large number of government
ministries are using these technologies, provided as a free service by Meity.
Many of these serve as voicebots to
assist the users in availing online services, including enquiries about
government schemes, filling forms, etc.
Bhashini technology has been used in
translating lectures and course material for higher education available on the
NPTEL and Swayam platforms.
This has been accomplished by video
to video translation of lectures from original English into some 8 Indian
languages.
More than 200 courses have been
translated. Subtitling facility has also been made available.
More languages and courses are being
covered as an ongoing activity.
Open sourcing of data and models has
allowed Indian language data to be used by a large number of individuals,
institutions, and startups as free downloads.
What needs to be done now is to
develop the market cycle by nurturing startups. They really need to provide
services to their customers, whether governments or private.
Various sectors are waiting to be
exploited, such as health, agriculture, school education, etc.
Technology acceleration centres
planned under the mission can go a long way in energising the startups in the
eco-system.
R&D needs to continue the
exploration of completely new ways of building technology which can handle
prosody in speech processing, and discourse in machine translation.
What is prosody in speech?
Prosody in speech refers to the
rhythm, stress, and intonation of spoken language, or the 'music' of speech,
and it conveys meaning and emotional nuance beyond individual words.
It uses features like pitch,
loudness, and duration to signal things such as a statement versus a question,
the speaker's emotions, sarcasm, or emphasis on certain words.
In future, these systems would
utilise features from prosody such as tonal changes, pauses, emphasis, and
sentiments in Indian languages.
Similarly, MT would not be limited to
sentence to sentence translation at a time but do paragraph to paragraph
translation. Indian academia is well poised to do it.
Finally, the social cycle needs to be
jump started. It would mean involving the common man in building content for
their languages, helping them become adept at using translation tools under
Bhashini, and finally become creators of original content in Indian languages.
State governments can play a major role in this. A revolution in
Indian languages is waiting to be unleashed.