Vaaniverses

hemant Kadam

Collaborators

CHIRAG YADAV

@erchirag129213

morvi

@morvi


Seamless Transcription, Translation, and Text-to-Speech for Multilingual Dynamics.

Designed With 😇 :

  • JavaScript
  • Next.js
  • Python

Features

  • Automatic Speech Recognition (ASR): Converts audio input into text using OpenAI Whisper.
  • Translation: Facilitates text translation between Indian languages and English with IndicTrans.
  • Text-to-Speech (TTS): Generates audio output from text using Google TTS.
  • Dynamic Language Support: Automatically detects and adapts to various languages.
  • Model Integration: Supports tools like LangFlow for workflow optimization and Datastack for efficient data management.
  • Transformers: Utilizes state-of-the-art transformer models like BERT for enhanced NLP tasks.

Components

  1. OpenAI Whisper: Recognizes and transcribes speech into text.
  2. IndicTrans: Handles language translation between supported Indian languages and English.
  3. Google TTS: Converts translated text into audio output.
  4. Transformers: Utilizes powerful transformer-based models like BERT for improved language understanding.

Modes of Operation

1. Transcription Mode

Input audio ➔ Transcription (ASR) ➔ Translation ➔ Audio output (TTS).

2. Translation Mode

Input audio ➔ Transcription (ASR) ➔ Translation ➔ Text output.
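
The two modes share the same first two stages and differ only in whether speech is synthesized at the end. A minimal sketch of that control flow follows. The names run_pipeline and PipelineResult and the three callables (asr, translate, tts) are hypothetical placeholders for Whisper, IndicTrans, and Google TTS wrappers, not the project's actual code.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class PipelineResult:
    transcript: str
    source_language: str
    translation: str
    audio: Optional[bytes] = None  # populated only in transcription mode


def run_pipeline(
    audio_path: str,
    mode: str,
    target_language: str,
    asr: Callable[[str], Tuple[str, str]],       # hypothetical: path -> (text, language)
    translate: Callable[[str, str, str], str],   # hypothetical: (text, src, tgt) -> text
    tts: Callable[[str, str], bytes],            # hypothetical: (text, language) -> audio bytes
) -> PipelineResult:
    """Run ASR and translation, then synthesize audio only in transcription mode."""
    if mode not in ("transcription", "translation"):
        raise ValueError(f"unknown mode: {mode!r}")

    transcript, source_language = asr(audio_path)
    translation = translate(transcript, source_language, target_language)

    # Transcription mode ends with audio output; translation mode ends with text.
    audio = tts(translation, target_language) if mode == "transcription" else None
    return PipelineResult(transcript, source_language, translation, audio)
```

Passing the stages in as callables keeps the flow testable without loading any model.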

Getting Started

Prerequisites

  • Python 3.7 or higher.
  • Required packages: torch, transformers, google-cloud-texttospeech.
  • Access to APIs for OpenAI Whisper, IndicTrans, and Google TTS.

Installation

  1. Clone the repository:
    git clone <repository-url>
    cd transcription-translation-pipeline
  2. Install dependencies:
    pip install -r requirements.txt
  3. Configure API keys and environment variables:
    • Add your API keys for Google TTS.
    • Ensure access to OpenAI Whisper and IndicTrans models.
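
Before running the pipeline, it can help to fail fast when credentials are missing. The sketch below assumes the Google TTS client authenticates through the standard GOOGLE_APPLICATION_CREDENTIALS environment variable used by Google Cloud client libraries. The project may expect a different variable, and the helper name missing_env_vars is hypothetical.

```python
import os
from typing import Iterable, List, Mapping, Optional

# Assumption: service-account credentials are supplied via this variable.
REQUIRED_ENV_VARS = ("GOOGLE_APPLICATION_CREDENTIALS",)


def missing_env_vars(
    environ: Optional[Mapping[str, str]] = None,
    required: Iterable[str] = REQUIRED_ENV_VARS,
) -> List[str]:
    """Return the names of required variables that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        raise SystemExit("Missing environment variables: " + ", ".join(missing))
    print("Environment looks configured.")
```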

Running the Pipeline

Transcription Mode:

python pipeline.py --mode transcription --input <audio_file_path>

Translation Mode:

python pipeline.py --mode translation --input <audio_file_path> --target_language <language_code>
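
For reference, the flags in the commands above could be parsed with argparse along these lines. This is a sketch, not the project's actual pipeline.py. The default target language of "en" is an assumption, made because the transcription-mode command omits --target_language.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the --mode, --input, and --target_language flags."""
    parser = argparse.ArgumentParser(
        prog="pipeline.py",
        description="Transcribe, translate, and optionally synthesize speech.",
    )
    parser.add_argument(
        "--mode",
        choices=["transcription", "translation"],
        required=True,
        help="transcription ends with audio output; translation ends with text",
    )
    parser.add_argument("--input", required=True, help="path to the input audio file")
    parser.add_argument(
        "--target_language",
        default="en",  # assumed default; not stated in the original
        help="language code to translate into",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(f"mode={args.mode} input={args.input} target={args.target_language}")
```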

Dynamic Language Handling

  • Whisper detects the input language automatically.
  • IndicTrans dynamically translates text based on the desired target language.
  • Google TTS generates audio in the target language.
  • Workflow optimization and data management are supported through LangFlow and Datastack integrations.
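
One practical detail in chaining these stages is that each tool names languages differently: Whisper reports short codes such as "hi", while cloud TTS services typically expect locale codes such as "hi-IN". A minimal sketch of that mapping is shown below. The table contents, the fallback locale, and the function name resolve_tts_locale are illustrative assumptions, so check the TTS provider's list of supported voices before relying on any entry.

```python
# Illustrative mapping from short language codes to TTS locale codes.
# Assumption: these locales are available in the TTS service being used.
TTS_LOCALES = {
    "en": "en-IN",
    "hi": "hi-IN",
    "bn": "bn-IN",
    "ta": "ta-IN",
    "te": "te-IN",
    "mr": "mr-IN",
    "gu": "gu-IN",
}


def resolve_tts_locale(language_code: str, default: str = "en-IN") -> str:
    """Map a detected or requested language code to a TTS locale.

    Accepts either a bare code ("hi") or a locale ("hi-IN") in any letter case,
    and falls back to `default` for languages missing from the table.
    """
    base = language_code.strip().lower().split("-")[0]
    return TTS_LOCALES.get(base, default)
```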


Problem it solves 🙅‍♂️

  • This pipeline addresses the challenge of bridging language barriers in real-time communication. Its practical applications include:
    • Accessibility: Facilitates communication for individuals with hearing or speech impairments.
    • Education: Assists in creating multilingual learning materials.
    • Media Localization: Supports dubbing and subtitling for content creators.
    • Global Collaboration: Enables effective communication across diverse teams.

Challenges I ran into 🙅‍♂️

  • Dynamic Language Detection
    • Issue: Handling varied accents and dialects.
    • Solution: Fine-tuned Whisper for region-specific datasets.
  • Integration with APIs
    • Issue: Ensuring seamless communication between models.
    • Solution: Implemented robust error handling and asynchronous processing.
  • Performance Optimization
    • Issue: High latency during transcription and translation.
    • Solution: Used optimized batch processing and caching mechanisms.
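
The error handling, asynchronous processing, batching, and caching mentioned above can be combined in a small wrapper. The following is a sketch of one way to do it, not the project's implementation. CachedTranslator and its translate_fn argument are hypothetical, and the retry policy (exponential backoff on any exception) is an assumption.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Tuple


class CachedTranslator:
    """Wrap an async translation call with retries and an in-memory cache."""

    def __init__(
        self,
        translate_fn: Callable[[str, str], Awaitable[str]],  # hypothetical backend call
        max_retries: int = 3,
        base_delay: float = 0.5,
    ) -> None:
        if max_retries < 1:
            raise ValueError("max_retries must be at least 1")
        self._translate_fn = translate_fn
        self._max_retries = max_retries
        self._base_delay = base_delay
        self._cache: Dict[Tuple[str, str], str] = {}

    async def translate(self, text: str, target_language: str) -> str:
        key = (text, target_language)
        if key in self._cache:
            return self._cache[key]  # repeated requests skip the backend entirely

        for attempt in range(self._max_retries):
            try:
                result = await self._translate_fn(text, target_language)
            except Exception:
                # Broad catch for illustration; narrow it to transient errors in practice.
                if attempt == self._max_retries - 1:
                    raise
                await asyncio.sleep(self._base_delay * 2 ** attempt)
            else:
                self._cache[key] = result
                return result
        raise RuntimeError("unreachable: loop always returns or raises")

    async def translate_batch(self, texts: List[str], target_language: str) -> List[str]:
        """Translate several texts concurrently, preserving input order."""
        return await asyncio.gather(
            *(self.translate(text, target_language) for text in texts)
        )
```

Identical texts submitted in the same batch may each reach the backend once before the cache is filled; deduplicating the batch first would avoid that.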