BharatGen AI to Support All 22 Scheduled Indian Languages by June 2026


  • Centre announces BharatGen will cover all 22 scheduled Indian languages by June 2026, with 15 languages targeted by December 2025
  • BharatGen, India’s first government-led AI model initiative, supports text, speech, and vision-language applications in agriculture, governance, and defense
  • Led by IIT Bombay under NM-ICPS, the BharatGen consortium includes top institutions like IIT Madras, IIIT Hyderabad, and IIM Indore for model development and deployment

The Centre has announced that BharatGen, India's first government-sponsored effort to build foundational AI models designed specifically for Indian languages and social contexts, will support all 22 scheduled languages of India by June 2026. BharatGen models currently cover nine languages: Hindi, Marathi, Tamil, Malayalam, Bengali, Punjabi, Gujarati, Telugu, and Kannada.

Responding to a question in the Lok Sabha on Wednesday (6 August), Union Minister of Science and Technology Jitendra Singh said in a written reply that BharatGen spans AI technologies across several modalities, including large language models for text, speech-to-text and text-to-speech systems, and vision-language models. The second phase is targeted for December 2025, by which time the models will cover a cumulative total of 15 languages, with the remaining scheduled languages added by mid-2026.

The minister stated that BharatGen models at present span 9 Indian languages: Hindi, Marathi, Tamil, Malayalam, Bengali, Punjabi, Gujarati, Telugu, and Kannada.

"By December 2025, all 15 Indian Languages (Assamese, Bengali, Gujarati, Hindi, Kannada, Maithili, Malayalam, Marathi, Nepali, Odia, Punjabi, Sanskrit, Sindhi, Tamil and Telugu) will be covered", he said. "All 22 scheduled Indian languages will be covered by June 2026", Singh added.

The minister added that BharatGen has developed sector-specific applications in agriculture, governance, and defense. These have been tested in targeted areas and are to be rolled out across all states and districts once the deployment phase is complete. The project operates under the Department of Science and Technology's National Mission on Interdisciplinary Cyber-Physical Systems (NM-ICPS).

The deployment is guided by two Technology Innovation Hubs (TIHs): the TIH Foundation for IoT and IoE at IIT Bombay, which manages the national program and guides model development, and the IITM Pravartak Technologies Foundation at IIT Madras, which focuses on real-world deployment in sectors such as governance, media, and security. The BharatGen consortium comprises several premier academic and research institutions.

IIT Bombay is the lead institution, overseeing research integration among partners. IIIT Hyderabad leads vision-language modelling, IIT Madras is responsible for speech model development, IIT Kanpur works on legal AI and multilingual tokenization methods, IIT Hyderabad handles vocabulary optimization for large language models, IIT Mandi focuses on inclusive model creation and effective training strategies, and IIM Indore is in charge of model testing and multilingual data collection.