- About
- Version & Release Notes
- Prerequisites
- Installation
- Usage
- Error Handling
- Testing AWS Configuration
- Troubleshooting
- Contributing
- License
This application transcribes audio files to text using AWS Transcribe, Google Cloud Speech-to-Text, Azure AI Speech, or local Whisper. It supports various audio file formats and allows for language specification. It also includes diarization where supported by the selected provider.
It is intended to run from the command line on macOS as a Python script, taking an audio file as input and producing a Markdown transcript as output.
For secure GCP local setup, see docs/gcp_config.md.
- Improved Error Handling: Enhanced error validation and user experience
- Added file format validation before AWS API calls
- Converts AWS ClientError exceptions to user-friendly messages
- Validates speaker count range (1-10) with clear error messages
- Added graceful exit with sys.exit(1) instead of raw exceptions
- Clear error messages with specific guidance on how to fix issues
- Visual indicators using ❌ emoji for error messages
- Supported formats: amr, flac, wav, ogg, mp3, mp4, webm, m4a
For detailed release notes and technical changes, see CHANGELOG.md
- added diarization functionality
- user can define the number of speakers in the audio file (default=2)
- with option --no-diarization the script will not do diarization
- added automatic language identification
- added optional parameter to define the language of the audio (supports ISO code like es-ES, fr-FR, en-US, etc)
- takes an audio file and transcribes it, output format of the transcription in markdown
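The language and diarization options listed above could map onto an AWS Transcribe job roughly as follows. This is a minimal sketch, not the script's actual code: the function name build_job_args and its parameters are hypothetical, and it only assembles the keyword arguments that boto3's start_transcription_job accepts, without calling AWS.

```python
from typing import Any, Dict, Optional


def build_job_args(
    job_name: str,
    media_uri: str,
    language: Optional[str] = None,
    diarization: bool = True,
    speakers: int = 2,
) -> Dict[str, Any]:
    """Assemble keyword arguments for boto3's start_transcription_job.

    Hypothetical helper: the real script may structure this differently.
    """
    args: Dict[str, Any] = {
        "TranscriptionJobName": job_name,
        "Media": {"MediaFileUri": media_uri},
    }
    if language:
        # An explicit code such as es-ES or en-US was given.
        args["LanguageCode"] = language
    else:
        # No language given: ask AWS Transcribe to identify it.
        args["IdentifyLanguage"] = True
    if diarization:
        # Speaker labels are only requested when diarization is enabled.
        args["Settings"] = {
            "ShowSpeakerLabels": True,
            "MaxSpeakerLabels": speakers,
        }
    return args
```

The resulting dictionary would be passed as boto3.client("transcribe").start_transcription_job(**args), assuming the audio file has already been uploaded to S3 at media_uri.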
- Python 3.x
- AWS account with appropriate permissions for AWS Transcribe and S3
- Virtual environment (recommended)
- Clone the repository:
    git clone https://github.com/yourusername/transcribe_app.git
    cd transcribe_app
- Create and activate a virtual environment:
    python3 -m venv venv_transcribe
    source venv_transcribe/bin/activate
- Install the required packages:
    pip install -r requirements.txt
In the command line, in your local directory:
- Activate the virtual environment:
    source ~/development/venvs/venv_transcribe/bin/activate
- Navigate to the project directory:
    cd ~/Documents/code/transcribe_app
- Run the transcription script:
    python3 ./scripts/mytranscript.py {input_audio_file.mp3} {output_transcript_file.md} --language en-US
- Run with --help for more options:
    % python3 ./scripts/mytranscript.py --help
    Usage: mytranscript.py [OPTIONS] AUDIO_FILE OUTPUT_FILE

      Transcribe audio file to markdown text

    Options:
      -l, --language TEXT             Language code (e.g., es-ES, en-US). If not
                                      provided, automatic detection will be used.
      -s, --speakers INTEGER          Maximum number of speakers to identify (2-10)
      --diarization / --no-diarization
                                      Enable/disable speaker diarization
      --help                          Show this message and exit.
The application now provides improved error handling with clear, user-friendly messages:
- File Format Validation: Automatically checks if your audio file format is supported before uploading
- Clear Error Messages: Instead of technical AWS errors, you'll see helpful messages like:
    ❌ Unsupported file format: 'mov'. AWS Transcribe supports: amr, flac, m4a, mp3, mp4, ogg, wav, webm
- Parameter Validation: Validates input parameters (e.g., speaker count must be between 2-10)
- AWS Error Translation: Converts complex AWS error codes into understandable messages
- Visual Indicators: Uses ❌ and ✅ emojis to clearly indicate success or failure
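The validation behaviour described above can be sketched as follows. This is an illustration under stated assumptions, not the script's actual implementation: the names check_format, check_speakers, and validate_or_exit are hypothetical, and the 2-10 speaker range is taken from the --help text.

```python
import sys
from pathlib import Path

# Formats listed in this README as supported by AWS Transcribe.
SUPPORTED_FORMATS = {"amr", "flac", "m4a", "mp3", "mp4", "ogg", "wav", "webm"}
# Assumed range, taken from the --help output.
MIN_SPEAKERS, MAX_SPEAKERS = 2, 10


def check_format(audio_path: str) -> str:
    """Return the lower-cased extension, or raise ValueError if unsupported."""
    ext = Path(audio_path).suffix.lower().lstrip(".")
    if ext not in SUPPORTED_FORMATS:
        supported = ", ".join(sorted(SUPPORTED_FORMATS))
        raise ValueError(
            f"Unsupported file format: '{ext}'. AWS Transcribe supports: {supported}"
        )
    return ext


def check_speakers(count: int) -> int:
    """Return count unchanged, or raise ValueError if it is out of range."""
    if not MIN_SPEAKERS <= count <= MAX_SPEAKERS:
        raise ValueError(
            f"Speaker count must be between {MIN_SPEAKERS} and {MAX_SPEAKERS}, got {count}"
        )
    return count


def validate_or_exit(audio_path: str, speakers: int) -> None:
    """Run both checks before any AWS call; exit with status 1 on failure."""
    try:
        check_format(audio_path)
        check_speakers(speakers)
    except ValueError as err:
        print(f"❌ {err}", file=sys.stderr)
        sys.exit(1)
```

Running the checks before uploading means an unsupported file fails immediately and locally, without an S3 upload or a Transcribe API call.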
The script test_aws.py helps to check that your AWS configuration is working:
python3 ./scripts/test_aws.py
Successfully connected to AWS
Available buckets: [<your-list-of-s3-buckets>]
If you encounter issues, check the following:
- Ensure your AWS credentials are correct and have the necessary permissions.
- Verify that the input audio file exists and is in a supported format.
- Check the AWS Transcribe service limits and quotas.
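For the first check above, a credentials test along the lines of test_aws.py might look like the sketch below. The actual contents of test_aws.py are not shown in this README, so the function names here are hypothetical; the boto3 calls used (boto3.client("s3") and list_buckets) are standard. The sketch assumes boto3 is installed and credentials are configured in the usual way.

```python
from typing import Any, List


def list_bucket_names(s3_client: Any) -> List[str]:
    """Return the names of all buckets visible to the given S3 client."""
    response = s3_client.list_buckets()
    return [bucket["Name"] for bucket in response.get("Buckets", [])]


def main() -> int:
    """Try to list S3 buckets; return 0 on success and 1 on failure."""
    # Imported here so the helper above can be used without boto3 installed.
    import boto3
    from botocore.exceptions import BotoCoreError, ClientError

    try:
        names = list_bucket_names(boto3.client("s3"))
    except (BotoCoreError, ClientError) as err:
        # Covers missing credentials as well as permission errors.
        print(f"Could not connect to AWS: {err}")
        return 1
    print("Successfully connected to AWS")
    print(f"Available buckets: {names}")
    return 0
```

Calling main() from a script entry point prints the same two lines shown in the test_aws.py output when the credentials are valid.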
Contributions are welcome! Please open an issue or submit a pull request.
This project is licensed under the MIT License. See the LICENSE file for details.
See the ./docs directory for more configuration settings.