Call for R&D Project Proposals

This is a rolling call; received proposals will be evaluated every month.

VISHLESHAN I-HUB FOUNDATION, IIT Patna is the nodal centre and a Technology Innovation Hub (TIH) for technology development and activities in the core areas of ‘Speech, Video, and Text Analytics Technologies’ in synergy with ‘wireless, sensor, and IoT technologies, material sciences etc.’ under National Mission on Interdisciplinary Cyber-Physical Systems (NMICPS). TIH, IIT Patna aims to promote translational research in CPS technologies, with a central focus on ‘Speech, Video and Text Analytics’ combining the fields of:

  1. AI & Internet of Things (IoT).

  2. Drone and counter-drone technologies.

  3. Point-of-care diagnostics and healthcare.

  4. Embedded systems and nanotechnologies: sensors, drug delivery.

  5. Civil and defence cyber-security aspects.

  6. RF & Telematics.


Vision

To become a key contributor to Digital India by promoting translational research, innovation, and entrepreneurial support systems in CPS technologies to provide commercially viable solutions.


Mission

  1. To foster technological innovation and excellence for the benefit of humanity.

  2. To foster research & innovation towards good governance, ease of living, health and family welfare, and data management and analysis-related products.

  3. To continuously identify areas of need and provide solutions through CPS technologies.

  4. To assist and motivate young, talented, and ambitious individuals to pursue entrepreneurship.

  5. To offer a top-notch interactive networked platform to all CPS innovation ecosystem participants.

Eligibility & Application Process

  1. Applicants should be Indian nationals.

  2. The proposal can be submitted, either individually or in collaboration, by a Principal Investigator (PI) from academia (as per point 1 below) or by a Project Leader (PL) on behalf of industry (as per point 2 below).

    1. A research institute or university with a well-established research support system. The institute should be established in India and hold NAAC/UGC/AICTE or equivalent recognition, or be any other public/Government-supported organization or Institute of National Importance.

    2. A company (start-up, small, medium, or large): a Pvt. Ltd. company incorporated under the Indian Companies Act, 1956/2013, or an LLP incorporated under the Limited Liability Partnership Act, 2008.

  3. TIH will be the co-owner of the developed technologies, technology product, publication of patents, or research papers related to the project.

  4. Applicants are expected to demonstrate a minimum working prototype of the proposed idea; applicants satisfying this condition will be given priority.

Funding

  1. The project duration shall be two years.

  2. The maximum funding support for this call is INR 25 lakh.

  3. The funding depends on the year-wise recurring and non-recurring expenditure heads detailed in the project proposal.

    1. In the expenditure details, the applicant(s) must state the amount of funding expected from TIH, IIT Patna.

    2. The applicant(s) must mention if there is a co-funding source to support the above R&D activity.

*Note: Each applicant can submit only one proposal. All proposals from an applicant will be disqualified if multiple applications are received.


Review: The TIH will conduct a review once every six months, covering both technical progress and financial expenditure. The TIH reserves the right to withdraw support based on progress.

Problem Statements

Speech Analytics

Audio classification

Description: Audio classification is among the most in-demand speech processing tasks. Just as deep learning aims to build networks that mimic human perception, recognizing sound is as essential as recognizing images. While image classification has become much more advanced and widespread, audio classification is still a relatively new area. It has applications across AI and data science, such as chatbots, automated voice translators, virtual assistants, music genre identification, and text-to-speech systems.
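As a minimal sketch of the usual pipeline (feature extraction followed by classification), the toy example below classifies synthetic clips using two hand-crafted features, zero-crossing rate and energy, with a nearest-centroid rule. The "hum"/"whine" classes and all names are illustrative assumptions; a real audio classifier would learn from spectrograms with a deep network.

```python
import math

def features(signal):
    """Two toy features: zero-crossing rate and mean energy."""
    zcr = sum(1 for a, b in zip(signal, signal[1:]) if a * b < 0) / len(signal)
    energy = sum(x * x for x in signal) / len(signal)
    return (zcr, energy)

def nearest_centroid(feat, centroids):
    """Assign the label whose feature centroid is closest (Euclidean)."""
    return min(centroids, key=lambda lbl: math.dist(feat, centroids[lbl]))

# Synthetic training data: a low-frequency "hum" vs a high-frequency "whine".
sr = 1000
hum   = [math.sin(2 * math.pi * 30  * t / sr) for t in range(sr)]
whine = [math.sin(2 * math.pi * 300 * t / sr) for t in range(sr)]
centroids = {"hum": features(hum), "whine": features(whine)}

test_clip = [math.sin(2 * math.pi * 280 * t / sr) for t in range(sr)]
print(nearest_centroid(features(test_clip), centroids))  # high ZCR -> "whine"
```

The same two-step structure (features in, label out) carries over unchanged when the features come from a spectrogram and the classifier is a neural network.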

Audio fingerprint generation

Description: Audio fingerprinting is one of the most recent and impressive technologies in this space. It is the process of extracting the relevant acoustic features from a piece of audio and condensing them into a compact signature; an audio fingerprint is, in effect, a summary of a particular audio signal. The name ‘fingerprint’ is apt because every audio fingerprint is unique, just like a human fingerprint. Major applications of audio fingerprinting include content-based audio retrieval, broadcast monitoring, etc.
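To make the "condense features into a signature" idea concrete, here is a deliberately tiny sketch: it summarizes each frame by its single strongest frequency bin (via a naive DFT) and matches fingerprints by the fraction of agreeing frames. Production fingerprinting (e.g. spectral-peak hashing) is far more robust; everything below is an illustrative assumption.

```python
import cmath, math

def dominant_bin(frame):
    """Index of the DFT bin with the largest magnitude (naive O(n^2) DFT)."""
    n = len(frame)
    mags = [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(frame))) for k in range(1, n // 2)]
    return mags.index(max(mags)) + 1

def fingerprint(signal, frame=64):
    """Condense audio into a sequence of per-frame peak-frequency bins."""
    return tuple(dominant_bin(signal[i:i + frame])
                 for i in range(0, len(signal) - frame + 1, frame))

def similarity(fp_a, fp_b):
    """Fraction of frames whose peak bins agree."""
    return sum(a == b for a, b in zip(fp_a, fp_b)) / min(len(fp_a), len(fp_b))

sr = 640
tone  = [math.sin(2 * math.pi * 50 * t / sr) for t in range(sr)]
noisy = [x + 0.2 * math.sin(2 * math.pi * 260 * t / sr) for t, x in enumerate(tone)]
other = [math.sin(2 * math.pi * 200 * t / sr) for t in range(sr)]

print(similarity(fingerprint(tone), fingerprint(noisy)))  # 1.0: same dominant peaks
print(similarity(fingerprint(tone), fingerprint(other)))  # 0.0: different content
```

Note how the fingerprint survives the added interference: robustness to distortion while remaining discriminative is exactly the design goal of this task.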

Separate Audio Sources

Description: A prevalent task in speech processing is the separation of audio sources: distinguishing the different source signals present in a mixture. We perform audio source separation every day; a rough real-life example is picking out the lyrics of a song, separating the vocal signal from the rest of the music. Deep learning can be used to perform this task as well.
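A classical baseline for separation is masking in the frequency domain. The sketch below (all signals synthetic, naive DFT, assumptions throughout) splits a two-tone mixture into its low and high components by zeroing DFT bins on either side of a cutoff; learned time-frequency masks in modern systems generalize this same idea.

```python
import cmath, math

def dft(x):
    n = len(x)
    return [sum(x[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n))
            for k in range(n)]

def idft(X):
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * i / n) for k in range(n)).real / n
            for i in range(n)]

def separate(mix, sr, cutoff_hz):
    """Split a mixture into low-pass and high-pass parts by masking DFT bins."""
    n = len(mix)
    X = dft(mix)
    cut = int(cutoff_hz * n / sr)
    low  = [X[k] if k <= cut or k >= n - cut else 0 for k in range(n)]
    high = [0 if k <= cut or k >= n - cut else X[k] for k in range(n)]
    return idft(low), idft(high)

sr, n = 1024, 256
bass   = [math.sin(2 * math.pi * 40  * t / sr) for t in range(n)]
treble = [0.5 * math.sin(2 * math.pi * 320 * t / sr) for t in range(n)]
mix = [b + tr for b, tr in zip(bass, treble)]
low, high = separate(mix, sr, 100)  # low ~ bass, high ~ treble
```

Real sources overlap in frequency, which is why deep networks estimate soft masks instead of a fixed cutoff.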

Music tagger to automatically identify and organize digital music collection

Description: This task is similar to audio classification, with a slight difference. Music tagging creates metadata for songs so people can find them easily in an extensive database. Because a song can carry several tags at once, we have to implement a multi-label classification algorithm. One application of this task is a platform that helps hearing-impaired persons access and understand music and songs.
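The multi-label distinction is easy to show in code: instead of keeping only the top-scoring class, every tag whose score clears a threshold is kept. The per-tag probabilities below are hypothetical stand-ins for the output of some trained one-vs-rest model.

```python
def multilabel_tags(scores, threshold=0.5):
    """Multi-label decision: keep every tag whose score clears the threshold,
    unlike single-label classification where only the top class survives."""
    return sorted(tag for tag, p in scores.items() if p >= threshold)

# Hypothetical per-tag probabilities from some trained one-vs-rest model.
song_scores = {"rock": 0.91, "guitar": 0.77, "jazz": 0.08, "vocal": 0.55}
print(multilabel_tags(song_scores))  # ['guitar', 'rock', 'vocal']
```

The tags produced this way become the searchable metadata the description refers to.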


Recommender System for Music

Description: Thanks to the expansion of digital content distribution, people now have access to music collections on a never-before-seen scale. Commercial music collections hold well over 15 million songs, far more than any one person could possibly listen to, and this abundance can leave individuals feeling overwhelmed. An effective music recommendation system is therefore in the best interests of both music service providers and users: users no longer struggle with choosing what to listen to, and music companies can retain their customer base and attract new listeners by improving user satisfaction.


Gender Recognition through voice

Description: With this task, we can provide a facility to recognize the gender of a person through their voice. Gender recognition is a technique that is often utilized to determine the gender category of a speaker by processing speech signals. Speech signals taken from a recorded speech can be used to acquire acoustic attributes such as duration, intensity, frequency, and filtering. Some applications where gender recognition can be useful are speech emotion recognition, human-to-machine interaction, sorting of telephone calls by gender categorization, automatic salutations, muting sounds for gender, and audio/video categorization with tagging.
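One of the acoustic attributes the description mentions, fundamental frequency, already carries a weak gender signal. The sketch below estimates pitch from the autocorrelation peak and applies a crude ~165 Hz split; this heuristic and its threshold are illustrative assumptions, and a deployable system would combine many features with a trained model.

```python
import math

def pitch_hz(signal, sr, fmin=60, fmax=400):
    """Estimate fundamental frequency via the autocorrelation peak lag."""
    lo, hi = int(sr / fmax), int(sr / fmin)
    best = max(range(lo, hi + 1),
               key=lambda lag: sum(signal[i] * signal[i + lag]
                                   for i in range(len(signal) - lag)))
    return sr / best

def guess_gender(signal, sr, split_hz=165):
    """Very rough heuristic: typical adult male F0 is below ~165 Hz,
    typical adult female F0 above it."""
    return "male" if pitch_hz(signal, sr) < split_hz else "female"

sr = 8000
voice = [math.sin(2 * math.pi * 120 * t / sr) for t in range(800)]
print(guess_gender(voice, sr))  # 120 Hz fundamental -> "male"
```

Real speech is not a pure tone, so production systems add spectral features (MFCCs, formants) on top of pitch.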

Speech Detector to separate speech from background sounds, noise, etc.

Description: A primary complaint of hearing-impaired (HI) listeners is poor speech recognition in background noise. This issue can be quite debilitating and persists despite considerable efforts to improve hearing technology. The primary limitation resulting from sensorineural hearing impairment of cochlear origin involves elevated audiometric thresholds and the resulting limited audibility. Because intense sounds are often perceived at normal loudness, these listeners often have a reduced dynamic range and display a steep growth of loudness as signal intensity increases. A robust detector that separates speech from background sounds and noise would therefore directly benefit such listeners.
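The simplest form of such a detector is energy-based voice activity detection: frames whose short-time energy clearly exceeds the noise floor are flagged as speech. The sketch below, with a synthetic noisy clip and an assumed threshold factor, is a baseline only; modern detectors use learned spectral models.

```python
import math, random

def speech_frames(signal, frame=160, factor=4.0):
    """Flag frames whose short-time energy exceeds `factor` x the noise floor
    (estimated here as the minimum frame energy)."""
    energies = [sum(x * x for x in signal[i:i + frame]) / frame
                for i in range(0, len(signal) - frame + 1, frame)]
    floor = min(energies) + 1e-12
    return [e > factor * floor for e in energies]

random.seed(0)
sr = 8000
noise = [random.gauss(0, 0.02) for _ in range(sr)]          # 1 s of background noise
speech = [0.5 * math.sin(2 * math.pi * 200 * t / sr) for t in range(sr // 2)]
# Quiet noise, then 0.5 s of "speech" over noise, then quiet noise again.
clip = (noise[:2000]
        + [n + s for n, s in zip(noise[2000:2000 + len(speech)], speech)]
        + noise[6000:])
flags = speech_frames(clip)  # True only for frames inside the speech region
```

Energy thresholds fail when the noise itself is loud or speech-like, which is precisely why this problem statement asks for something stronger.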

Plug-in development to enable voice command for existing applications

Description: Physically impaired persons can find it difficult to access information on the internet through conventional input devices. They require a voice-command control system that can be plugged into any application to issue instructions. Such a plug-in adds voice capabilities to existing applications, allowing users to control everything in the application with their voice.


Indian Sign Language Translator

Description: Sign language is a visual language used by deaf people as their mother tongue. Unlike acoustically conveyed sound patterns, sign language uses body language and manual communication to fluidly convey a person's thoughts. Due to the considerable time required to learn sign language, communicating with these specially-abled people becomes difficult, creating a communication gap. Hence, there is a need for software that takes in live speech or an audio recording as input, converts it into text, and displays the relevant Indian Sign Language images or GIFs.

A real-time speech translator for efficient communication

Description: A real-time speech translator can enhance communication between two or more persons from different language regions, acting as an intermediary for people holding a conversation in different languages. Real-time speech translation is one of the most difficult areas of speech recognition technology and underpins speech-to-speech translation software. Many companies have used this technology to build real-time voice communication systems, such as Skype Translator, Google Translate, YouTube Subtitles, Alexa Translate, and IBM Watson Language Translator. This proposal investigates the performance of real-time speech translation in applications such as the entertainment industry, international conferences and events, the news industry, and education, capturing words in one language and reproducing them in another.


End-to-End Modular Multilingual Audio Transcription system for Airborne Platform

Description: Automatic speech recognition (ASR) systems are a technology for identifying and processing human voices with the help of computer hardware and software-based techniques. We can use it to determine the words spoken or authenticate the person’s identity. In the case of defence surveillance, we can use it for gathering intelligence from intercepted communications, regarding personnel or aircraft identity. The state-of-the-art ASR system recognizes wholly spontaneous speech that is natural, unrehearsed, and contains minor errors or hesitation markers.


Text Analytics

Hate Speech and Offensive Content Identification


Description: Due to their accessibility and user-friendliness, social media platforms like Twitter and Facebook give users a platform to voice their opinions. These platforms are overloaded with data because users of all ages share every moment of their lives. Alongside these positive aspects, social media also has drawbacks: the prevalence of hate speech and other undesirable, offensive content online presents societies with significant challenges. Derogatory, harsh, insulting, or vulgar language directed at one person but visible to others compromises the objectivity of conversations, and as this kind of rhetoric becomes more prevalent online, disputes become more extreme.

Intelligent, critical discourse is necessary to shape public opinion, and objectionable content can pose a threat to democracy; at the same time, open societies need an acceptable way to react to such content without imposing rigid censorship regimes. As a consequence, many social media platforms monitor user posts, leading to a pressing demand for methods that automatically identify suspicious posts. Online communities, social media enterprises, and technology companies have been investing heavily in technology and processes to identify offensive language and prevent abusive behavior on social media.

Furthermore, a conversational thread can contain hate and offensive content that is not apparent from a single comment or reply but can be identified given the context of the parent content. Moreover, content on social media is spread across many languages, including code-mixed languages such as Hinglish (Hindi+English) and German-English. It therefore becomes a huge responsibility for these sites to identify such hateful content before it disseminates to the masses.
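At its core, automatic identification of offensive posts is text classification. The toy sketch below trains a multinomial Naive Bayes model on a tiny hand-made corpus with hypothetical "offensive"/"clean" labels; it is a baseline illustration only, and real systems use large annotated datasets, contextual models, and multilingual/code-mixed handling.

```python
import math
from collections import Counter

def train_nb(docs):
    """Fit a toy multinomial Naive Bayes over (text, label) pairs."""
    counts, totals, vocab = {}, Counter(), set()
    for text, label in docs:
        words = text.lower().split()
        counts.setdefault(label, Counter()).update(words)
        totals[label] += 1
        vocab.update(words)
    return counts, totals, vocab

def predict(text, model):
    counts, totals, vocab = model
    n_docs = sum(totals.values())
    def log_prob(label):
        c = counts[label]
        denom = sum(c.values()) + len(vocab)        # Laplace smoothing
        return math.log(totals[label] / n_docs) + sum(
            math.log((c[w] + 1) / denom) for w in text.lower().split())
    return max(counts, key=log_prob)

# Tiny illustrative corpus; the labels are hypothetical annotations.
corpus = [
    ("you are stupid and worthless", "offensive"),
    ("i hate people like you", "offensive"),
    ("what a lovely morning today", "clean"),
    ("thanks for the helpful reply", "clean"),
]
model = train_nb(corpus)
print(predict("you are worthless", model))  # offensive
```

Bag-of-words models like this miss exactly the thread-context and code-mixing difficulties the description highlights, which is where the proposed research would go beyond the baseline.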


Vaccine Stance Detection from Microblogs

Description: The only long-term remedy for the COVID-19 pandemic appears to be widespread immunization of the population. Many people, however, have reservations about vaccines for a variety of reasons, including the politicization of the issue and the hasty development of the vaccines. Understanding public opinion on vaccines is crucial, and social media can be used to swiftly gather a large amount of information about how people feel about them. The initial stage in any analysis of vaccine stance is to build an efficient classifier that predicts a user's stance (towards vaccination) from social media posts (for example, microblogs).

Text Summarization for Indian Languages

Description: The NLP research community has paid surprisingly little attention to automatic text summarization for Indian languages. Large-scale datasets are available for several languages, including English, Chinese, French, German, and Spanish; however, no Indian language has any comparable dataset. The majority of existing datasets are either private or too small to be of much use. By developing reusable corpora for Indian-language summarization and building abstractive summarization models, we aim to close this gap.



Image/Video Analytics

AI Vision-Based Social Distancing Detection

With its devastating spread to more than 180 nations, 3,519,901 confirmed cases, and 247,630 fatalities worldwide as of May 4, 2020, coronavirus disease 2019 (COVID-19) sparked a global disaster. The population is more vulnerable due to the lack of effective therapeutics and protection against COVID-19. With no vaccines available at the time, social distancing was the only practical strategy. This calls for a deep learning-based system that automates the monitoring of social distancing using surveillance footage.
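Once a person detector has produced centroids for each frame, the distancing check itself is a pairwise-distance test. The sketch below assumes detections are already available (e.g. from a YOLO-style model, which is out of scope here) and uses an assumed pixel-to-metre scale.

```python
import math
from itertools import combinations

def distancing_violations(centroids, min_dist_m=2.0, px_per_m=50.0):
    """Flag pairs of detected people closer than `min_dist_m`.
    `centroids` are (x, y) pixel positions from an upstream person detector;
    `px_per_m` is an assumed camera calibration constant."""
    return [(i, j) for (i, a), (j, b) in combinations(enumerate(centroids), 2)
            if math.dist(a, b) / px_per_m < min_dist_m]

people = [(100, 100), (160, 100), (400, 300)]
print(distancing_violations(people))  # persons 0 and 1 are 60 px = 1.2 m apart
```

In practice the pixel-to-metre mapping varies across the image, so a real system would apply a perspective (homography) transform before measuring distances.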

Human Activity Recognition 

Human activity recognition has become a foundational research area of great interest because it has many significant and futuristic applications, including automated surveillance, automated vehicles, language interpretation, and human-computer interfaces (HCI). In recent times, exhaustive and in-depth research has been done in this area. The idea of this project is to develop a system that can be used for surveillance and monitoring applications.

Image processing-based Tracking and Counting Vehicles

In smart cities, there is a growing need for innovative and effective technology to tackle many visible problems and to make cities less congested. Finding a parking spot is one of the most aggravating issues for drivers, particularly at public venues such as retail malls, five-star hotels, and multiplex cinema halls. Even inside a car park, drivers waste time and fuel hunting for a spot, which frustrates them and pollutes the environment. Thus, we need to design and build a smart parking system that effectively addresses these issues.
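The counting half of "tracking and counting vehicles" can be sketched independently of the image processing: given per-vehicle centroid tracks from some upstream tracker (assumed here), count how many tracks cross a virtual line across the roadway or car-park entrance.

```python
def count_crossings(tracks, line_y):
    """Count vehicles whose tracked centroid crosses a virtual line.
    `tracks` maps a track id (from an assumed upstream tracker) to its
    per-frame (x, y) centroids."""
    count = 0
    for track_id, points in tracks.items():
        ys = [y for _, y in points]
        if any(a < line_y <= b for a, b in zip(ys, ys[1:])):  # downward crossing
            count += 1
    return count

tracks = {
    1: [(50, 10), (52, 90), (53, 160)],    # crosses y=100 between frames 2 and 3
    2: [(200, 30), (201, 60), (203, 80)],  # never reaches the line
}
print(count_crossings(tracks, line_y=100))  # 1
```

The hard part in practice is producing reliable tracks from video (detection plus association); the counting logic above stays the same regardless of which tracker supplies them.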

An image descriptor for visually challenged people

In recent years, owing to the widespread use of the Internet and the massive use of audio-visual information in digital format for communications, designing systems that describe the content of multimedia information for search and classification has become really important. In computer vision, image descriptors capture elementary visual characteristics of images such as shape, color, texture, or motion. One application of image descriptors is to provide a full description of an image to visually impaired people so that they may visualize the given image.

Paddy crop disease detection using machine learning

Crop diseases are one of the main reasons farmers today experience crop output losses. Harvest losses from numerous crop diseases have driven many farmers to suicide. Insufficient understanding of the diseases, and of the variety of pesticides available to control them, is to blame. However, identifying a new disease and the appropriate, effective pesticide to control it is difficult and requires expert guidance, which is time-consuming and expensive. We require software that identifies the illness and suggests proper treatment in order to resolve this problem.

Multimedia Networking, Multimedia Traffic Management

Multimedia traffic streams have high bandwidth requirements. The best-effort Internet model does not provide any mechanism for applications to reserve network resources to meet such high bandwidth requirements and also does not prevent anyone from sending data at such high rates. Uncontrolled transmissions at such high rates can cause heavy congestion in the network, leading to a congestion collapse that can completely halt the Internet. There is no mechanism in the best-effort Internet to prevent this from happening (except using a brute force technique of disconnecting the source of such congestion). It is left to the discretion of the application to dynamically adapt to network congestions. Elastic applications that use TCP utilize a closed-loop feedback mechanism (built into TCP) to prevent congestion (this method of congestion control is called reactive congestion control). However, most multimedia applications use UDP for transmitting media streams; UDP does not have any mechanism to control congestion and has the capability to create a congestion collapse.
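The reactive (closed-loop) congestion control the paragraph credits to TCP follows an additive-increase/multiplicative-decrease (AIMD) rule: grow the congestion window by one segment per loss-free round trip, halve it on loss. The simulation below is a schematic sketch of that dynamic, not an implementation of any real TCP stack.

```python
def aimd(loss_events, start=1, add=1):
    """TCP-style AIMD congestion window: +`add` segments per loss-free
    round trip, halved (never below 1) when a loss is signalled."""
    cwnd, history = start, []
    for lost in loss_events:
        cwnd = max(1, cwnd // 2) if lost else cwnd + add
        history.append(cwnd)
    return history

# 0 = clean round trip, 1 = loss signalled by the network
rtts = [0, 0, 0, 0, 1, 0, 0, 1, 0]
print(aimd(rtts))  # [2, 3, 4, 5, 2, 3, 4, 2, 3]
```

A UDP media stream has no such feedback loop by default: its send rate is whatever the application chooses, which is why uncontrolled UDP streams can drive the network toward congestion collapse.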

Image/Video Security and Privacy

Over the past couple of decades, video monitoring has become much more prevalent. Modern video surveillance systems are outfitted with tools that make data traversal effective and efficient, giving operators tremendous control and potentially jeopardizing the privacy of anyone being monitored. A number of methods to safeguard people's privacy have thus been put forward, but relatively little research has concentrated on the unique security needs of video surveillance data (in transit or in storage) and on permitting access to this data.

Real-time image and video processing

Real-time image and video processing is a challenging problem in smart surveillance applications. In many such applications it is necessary to trade off high frame rate against high resolution to meet limited bandwidth requirements, so image super-resolution has become a commonly used technique in surveillance platforms. Existing image super-resolution methods have demonstrated that making full use of image priors can improve algorithm performance. However, previous deep-learning-based image super-resolution methods rarely take image priors into account. Therefore, how to make full use of image priors remains one of the unsolved problems for deep-network-based single-image super-resolution methods.


Robust IT connectivity and digitalization for smart cities

Technology is among the recent priorities in India: the government has sanctioned Rs. 98,000 crore for the transformation of 100 cities in the country into Smart Cities. Living and lifestyle in these cities will be facilitated by intensive use of information and communication technology (ICT). The central concept of these cities is improving lifestyle through efficient management of resources without wastage. The role of digital technology is critical in this transformation, as it provides the best possible means to monitor the wide array of aspects of city living and amenities, connecting every citizen to policy-making and administration. The government is on the way to building concrete infrastructure for e-governance through the Digital India Program, which will be the essential supporting framework for the success of smart city implementation in the country. The Digital India platform will be the base of connectivity between the diverse authorities, public bodies, private entities, and other players. The vision of the program is to empower society through digital means and to drive the knowledge economy of the country. The program will act as the backbone of information sharing and connectivity in Smart Cities, rendering an effectual impetus to shape a healthy and aware society, and facilitating people's participation in governance and administrative programs, inducing growth in human and social capital.


Applicants who submitted earlier proposals are requested to align their proposals with the above areas or the thrust areas given below and resubmit.



  1. The applicant from Academia (PI)/Industry (PL) must submit an Endorsement Letter, in the prescribed format given in the document below, from their parent organization.

  2. For Terms and Conditions, see the document below.

For any query, please contact TIH, IIT Patna at: 

Apply Here

Download the proposal format by clicking the button here

Click Here to submit your proposal, or click on the Submit Application button below to be directed to the online form. You can provide additional information related to the project proposal (if not covered in the format).




Terms and Conditions

Equipment purchase

  1. All the Assets acquired from the funds will be the property of the TIH and should not be disposed of, encumbered or utilized for purposes other than those for which the funds had been sanctioned.

  2. At the conclusion/termination of the project, the TIH will be free to sell or otherwise dispose of the Assets which are the property of the TIH.

  3. By the end of the project, the equipment should be returned to the TIH in good condition. If it is not returned within the period specified by the TIH, rent will be charged to the Principal Investigator (PI)/Project Leader (PL) on a basis decided by the TIH.

  4. In case of damaged equipment, the PI/PL is responsible for repairing it or providing replacement equipment to the TIH.

  5. The TIH has the discretion to give the Assets to the Institutions or transfer them to any other institution if it is considered appropriate.

  6. Major capital expenditure will be made directly by the TIH or with TIH approval.


Funds milestone

  1. Funds will be released half-yearly and based on milestones.

  2. Staff hired with project funds will work only on project activities, not on institutional activities. Staff salaries will be released on a quarterly basis, and staff will be on the payroll of the TIH.

  3. The PI/PL must bring a share of the funds/matching funds from the industry.



Project Monitoring

  1. TIH may designate a Scientist/Specialist or an Expert Panel to visit the Institution periodically to review the progress of the work and suggest suitable measures to ensure the realization of the Project's objectives.

  2. During the project's implementation, the Institution will provide all facilities to the visiting scientist/specialist or the Expert Panel by way of accommodation, etc., at the time of their visit.

  3. The technical evaluation of the project would be done on a half-yearly basis.


Patents and Publications:

All patents and publications require prior written consent from the TIH. Depending on relevance, the TIH reserves the right to refuse patent/publication permission.

  1. All the patents and publications should be published through TIH or by the review from the TIH. Patents, IP, and other intellectual properties emerging from the project will be governed by the IP Policy of the TIH.

  2. TIH will be the co-owner of the developed technologies, technology product, publication of patents, or research papers related to the project.
