Transcribe Python script

About

This application transcribes audio files to text using AWS Transcribe, Google Cloud Speech-to-Text, Azure AI Speech, or local Whisper. It accepts several audio file formats and lets you specify the language of the audio. It also performs speaker diarization where the selected provider supports it.

It is intended to run from the command line on macOS as a Python script. It takes an audio file as input and writes a Markdown transcript as output.

For secure GCP local setup, see docs/gcp_config.md.

Version & Release Notes

v.0.4 - release date 2025-07-19

Improved Error Handling: Enhanced error validation and user experience
- Added file format validation before AWS API calls
- Converts AWS ClientError exceptions to user-friendly messages
- Validates speaker count range (2-10) with clear error messages
- Added graceful exit with sys.exit(1) instead of raw exceptions
- Clear error messages with specific guidance on how to fix issues
- Visual indicators using ❌ emoji for error messages
- Supported formats: amr, flac, wav, ogg, mp3, mp4, webm, m4a

For detailed release notes and technical changes, see CHANGELOG.md

v.0.3 - release date 2025-01-07

- Added diarization functionality
- The user can set the number of speakers in the audio file (default: 2)
- The --no-diarization option turns diarization off

v.0.2 - release date 2024-12-22

- Added automatic language identification
- Added an optional parameter to set the language of the audio (accepts language codes such as es-ES, fr-FR, en-US)

v.0.1 - release date 2024-11-21

- Takes an audio file and transcribes it; the transcript is written in Markdown

Prerequisites

- Python 3.x
- AWS account with appropriate permissions for AWS Transcribe and S3
- Virtual environment (recommended)

Installation

Clone the repository:

git clone https://github.com/yourusername/transcribe_app.git
cd transcribe_app

Create and activate a virtual environment:

python3 -m venv venv_transcribe
source venv_transcribe/bin/activate

Install the required packages:
```
pip install -r requirements.txt
```

Usage

From the command line:

Activate the virtual environment (adjust the path to wherever you created it):

source ~/development/venvs/venv_transcribe/bin/activate

Navigate to the project directory (adjust the path to wherever you cloned the repository):
```
cd ~/Documents/code/transcribe_app
```

Run the transcription script:

python3 ./scripts/mytranscript.py {input_audio_file.mp3} {output_transcript_file.md} --language en-US

Run with --help to see all options:

% python3 ./scripts/mytranscript.py --help
Usage: mytranscript.py [OPTIONS] AUDIO_FILE OUTPUT_FILE

Transcribe audio file to markdown text

Options:
-l, --language TEXT             Language code (e.g., es-ES, en-US). If not
                                provided, automatic detection will be used.
-s, --speakers INTEGER          Maximum number of speakers to identify
                                (2-10)
--diarization / --no-diarization
                                Enable/disable speaker diarization
--help                          Show this message and exit.
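
The option set above can be approximated with the standard library alone. The following is a minimal sketch, not the script's actual implementation: the function names `speaker_count` and `build_parser` are hypothetical, the defaults (2 speakers, diarization on) are inferred from the release notes, and `argparse.BooleanOptionalAction` requires Python 3.9 or later.

```python
import argparse


def speaker_count(value: str) -> int:
    """argparse type: accept an integer from 2 to 10 inclusive."""
    count = int(value)  # a ValueError here is reported by argparse as a usage error
    if not 2 <= count <= 10:
        raise argparse.ArgumentTypeError("speaker count must be between 2 and 10")
    return count


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the options listed in the help output."""
    parser = argparse.ArgumentParser(
        prog="mytranscript.py",
        description="Transcribe audio file to markdown text",
    )
    parser.add_argument("audio_file")
    parser.add_argument("output_file")
    parser.add_argument(
        "-l", "--language", default=None,
        help="Language code (e.g., es-ES, en-US). If not provided, "
             "automatic detection will be used.",
    )
    parser.add_argument(
        "-s", "--speakers", type=speaker_count, default=2,  # default assumed from v.0.3 notes
        help="Maximum number of speakers to identify (2-10)",
    )
    parser.add_argument(
        "--diarization", action=argparse.BooleanOptionalAction, default=True,
        help="Enable/disable speaker diarization",
    )
    return parser


# Parse an explicit argument list so the example does not depend on sys.argv.
args = build_parser().parse_args(["talk.mp3", "talk.md", "--language", "en-US"])
print(args.language, args.speakers, args.diarization)
```

Validating the range inside the `type` callable means an out-of-range value is rejected during parsing, before any upload or API call is attempted.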

Error Handling

As of v.0.4, the application reports errors with clear, user-friendly messages:

- File Format Validation: Automatically checks that your audio file format is supported before uploading
- Clear Error Messages: Instead of technical AWS errors, you'll see helpful messages like:

  ❌ Unsupported file format: 'mov'. AWS Transcribe supports: amr, flac, m4a, mp3, mp4, ogg, wav, webm

- Parameter Validation: Validates input parameters (e.g., speaker count must be between 2 and 10)
- AWS Error Translation: Converts complex AWS error codes into understandable messages
- Visual Indicators: Uses ❌ and ✅ emojis to clearly indicate success or failure
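
As an illustration of the file format check and graceful exit described above, here is a minimal sketch. It assumes the check is based on the file extension; the names `SUPPORTED_FORMATS`, `check_audio_format`, and `main` are hypothetical and may not match the script's own code.

```python
import sys
from pathlib import Path

# Formats listed in the v.0.4 release notes.
SUPPORTED_FORMATS = {"amr", "flac", "m4a", "mp3", "mp4", "ogg", "wav", "webm"}


def check_audio_format(audio_path: str) -> str:
    """Return the lower-cased extension, or raise ValueError if unsupported."""
    extension = Path(audio_path).suffix.lower().lstrip(".")
    if extension not in SUPPORTED_FORMATS:
        supported = ", ".join(sorted(SUPPORTED_FORMATS))
        raise ValueError(
            f"Unsupported file format: '{extension}'. "
            f"AWS Transcribe supports: {supported}"
        )
    return extension


def main(audio_path: str) -> None:
    """Validate the file before any upload and exit cleanly on failure."""
    try:
        check_audio_format(audio_path)
    except ValueError as error:
        print(f"❌ {error}")
        sys.exit(1)  # graceful exit instead of a raw traceback
    print("✅ Format accepted")
```

Calling `main("interview.mov")` would print the error line and exit with status 1, while `main("interview.mp3")` would print the success line.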

Testing AWS Configuration

The script test_aws.py checks that your AWS configuration is working. Run it and compare the output with the example below:

python3 ./scripts/test_aws.py
Successfully connected to AWS
Available buckets: [<your-list-of-s3-buckets>]
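
A connectivity check of this kind could look roughly like the sketch below. This is not the content of test_aws.py: it assumes the check lists S3 buckets through boto3, and the function names `bucket_names` and `check_aws_connection` are hypothetical.

```python
def bucket_names(list_buckets_response: dict) -> list:
    """Extract bucket names from an S3 ListBuckets response dictionary."""
    return [bucket["Name"] for bucket in list_buckets_response.get("Buckets", [])]


def check_aws_connection() -> None:
    """List S3 buckets with the default credentials and report the result."""
    # Imported here so the helper above can be used without boto3 installed.
    import boto3
    from botocore.exceptions import BotoCoreError, ClientError

    try:
        response = boto3.client("s3").list_buckets()
    except (BotoCoreError, ClientError) as error:
        # BotoCoreError covers missing credentials; ClientError covers API rejections.
        print(f"Could not connect to AWS: {error}")
        raise SystemExit(1)
    print("Successfully connected to AWS")
    print(f"Available buckets: {bucket_names(response)}")
```

Calling `check_aws_connection()` requires boto3 to be installed and credentials to be configured in your environment.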

Troubleshooting

If you encounter issues, check the following:

- Ensure your AWS credentials are correct and have the necessary permissions.
- Verify that the input audio file exists and is in a supported format.
- Check the AWS Transcribe service limits and quotas.
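
Several of these checks correspond to AWS error codes. The sketch below shows one way such codes could be turned into readable hints. It is an illustration under assumptions, not the application's own mapping: `FRIENDLY_MESSAGES` and `friendly_message` are hypothetical names, and the message texts are invented for this example. The input is the `response` dictionary carried by a botocore `ClientError`.

```python
# Hypothetical mapping from AWS error codes to readable hints.
FRIENDLY_MESSAGES = {
    "AccessDenied": "Access denied. Check that your IAM user has S3 and Transcribe permissions.",
    "InvalidAccessKeyId": "The AWS access key is not recognized. Check your credentials.",
    "NoSuchBucket": "The S3 bucket does not exist. Check the bucket name and region.",
    "BadRequestException": "AWS Transcribe rejected the request. Check the file format and language code.",
    "LimitExceededException": "An AWS Transcribe limit was reached. Check your service quotas and retry later.",
    "ConflictException": "A transcription job with this name already exists.",
}


def friendly_message(error_response: dict) -> str:
    """Turn a ClientError response dictionary into a single readable line."""
    error = error_response.get("Error", {})
    code = error.get("Code", "Unknown")
    hint = FRIENDLY_MESSAGES.get(code)
    if hint is None:
        # Unknown code: fall back to the message AWS supplied, if any.
        detail = error.get("Message", "no further details")
        return f"❌ AWS error ({code}): {detail}"
    return f"❌ {hint}"


print(friendly_message({"Error": {"Code": "NoSuchBucket", "Message": "..."}}))
```

In a real handler this would typically be called as `friendly_message(error.response)` inside an `except ClientError as error:` block, followed by `sys.exit(1)`.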

Contributing

Contributions are welcome! Please open an issue or submit a pull request.

License

This project is licensed under the MIT License. See the LICENSE file for details.
