Transcribing audio files can be a tedious task, especially if the files are long or if you have multiple files to process. To tackle this challenge, I developed a Python-based application that automates the transcription of M4A files using the OpenAI Whisper model. This blog post will walk you through the project, highlighting its key components and functionality.
Project Overview
The project consists of several Python scripts:
setup.py
app.py
split_m4a.py
setup_transcriber.py
Together, these scripts form a comprehensive solution for splitting M4A files into manageable segments, transcribing the audio, and presenting the results in a user-friendly GUI. Let's dive into the details of each component.
1. setup.py
The setup.py script is responsible for packaging the application with py2app, a Python setup tool for creating standalone macOS applications.
from setuptools import setup

APP = ['app.py']
DATA_FILES = []
OPTIONS = {
    'argv_emulation': True,
    'packages': ['pydub', 'argparse', 'tkinter'],
    'includes': ['tkinter']
}

setup(
    app=APP,
    data_files=DATA_FILES,
    options={'py2app': OPTIONS},
    setup_requires=['py2app'],
)
2. app.py
The app.py script is the core of the application, containing the GUI implementation and the main logic for processing M4A files. The application uses tkinter for the GUI and calls OpenAI's API for transcription.
import tkinter as tk
from tkinter import filedialog, messagebox, ttk
import json
import os
import threading
from split_m4a import split_m4a
import subprocess
import openai

# Read the API key from the environment
openai.api_key = os.getenv("OPENAI_API_KEY")

def transcribe_file(file_path):
    # Send one audio segment to the transcription endpoint via curl
    command = [
        'curl',
        '--request', 'POST',
        '--url', 'https://api.openai.com/v1/audio/transcriptions',
        '--header', f'Authorization: Bearer {openai.api_key}',
        '--header', 'Content-Type: multipart/form-data',
        '--form', f'file=@{file_path}',
        '--form', 'model=whisper-1',
        '--form', 'language=en'
    ]
    result = subprocess.run(command, capture_output=True, text=True)
    # The API replies with JSON; keep only the transcribed text
    response = json.loads(result.stdout)
    if 'error' in response:
        raise RuntimeError(response['error'].get('message', 'Transcription failed'))
    return response['text']

def update_status(progress_bar, status_label, current, total, message):
    progress_bar['value'] = (current / total) * 100
    status_label.config(text=f"{message} ({current}/{total})")

def process_file(progress_bar, status_label):
    file_path = filedialog.askopenfilename(filetypes=[("M4A files", "*.m4a")])
    if not file_path:
        return

    def process():
        try:
            # Segments and the transcript go into a folder named after the input file
            base_name = os.path.splitext(os.path.basename(file_path))[0]
            output_directory = os.path.join(os.path.dirname(file_path), base_name)
            os.makedirs(output_directory, exist_ok=True)
            update_status(progress_bar, status_label, 0, 1, "Splitting audio")
            segments = split_m4a(file_path, output_directory, overlap_seconds=10)
            total_segments = len(segments)
            transcript = ""
            for i, segment in enumerate(segments):
                update_status(progress_bar, status_label, i + 1, total_segments, "Transcribing segments")
                # Separate segments with a newline so their text does not run together
                transcript += transcribe_file(segment) + "\n"
            transcript_path = os.path.join(output_directory, base_name + ".txt")
            with open(transcript_path, "w") as f:
                f.write(transcript)
            update_status(progress_bar, status_label, total_segments, total_segments, "Completed")
            messagebox.showinfo("Success", f"Transcription saved to {transcript_path}")
        except Exception as e:
            messagebox.showerror("Error", str(e))

    # Run the work in a background thread so the window stays responsive
    threading.Thread(target=process).start()

def create_gui():
    root = tk.Tk()
    root.title(".m4a Transcriber")
    frame = tk.Frame(root, padx=20, pady=20)
    frame.pack(padx=10, pady=10)
    label = tk.Label(frame, text="Select an .m4a file to transcribe:")
    label.pack(pady=5)
    button = tk.Button(frame, text="Select File", command=lambda: process_file(progress_bar, status_label))
    button.pack(pady=5)
    progress_bar = ttk.Progressbar(frame, orient="horizontal", length=300, mode="determinate")
    progress_bar.pack(pady=10)
    status_label = tk.Label(frame, text="Status: Waiting for file selection")
    status_label.pack(pady=5)
    root.mainloop()

if __name__ == "__main__":
    create_gui()
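Because the segments overlap by 10 seconds, the words spoken in each overlap can show up twice in the combined transcript. One possible way to reduce that is sketched below. The merge_overlap helper and its max_words parameter are hypothetical and not part of the project; the sketch assumes the repeated words are transcribed identically in both segments, which will not always hold in practice.

```python
def merge_overlap(previous, current, max_words=50):
    """Append `current` to `previous`, dropping words repeated at the seam.

    Hypothetical helper: looks for the longest run of words (up to max_words)
    that ends `previous` and also starts `current`, compared case-insensitively.
    """
    prev_words = previous.split()
    curr_words = current.split()
    limit = min(len(prev_words), len(curr_words), max_words)
    for size in range(limit, 0, -1):
        tail = [w.lower() for w in prev_words[-size:]]
        head = [w.lower() for w in curr_words[:size]]
        if tail == head:
            # Keep the shared words once, then the rest of the new segment
            return " ".join(prev_words + curr_words[size:])
    # No shared run found: fall back to plain concatenation
    return " ".join(prev_words + curr_words)


if __name__ == "__main__":
    print(merge_overlap("the quick brown fox", "brown fox jumps over"))
```

If punctuation or wording differs between the two passes, the function simply concatenates, so the worst case matches the behavior of the original loop.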
3. split_m4a.py
The split_m4a.py script splits an M4A file into smaller segments so that each transcription request stays short. By default each segment is 30 seconds long and overlaps the next one by 10 seconds. This is particularly useful for long audio files.
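from pydub import AudioSegment
import os

def split_m4a(file_path, output_directory, segment_duration=30000, overlap_seconds=10):
    # segment_duration is in milliseconds; overlap_seconds is in seconds
    audio = AudioSegment.from_file(file_path, format="m4a")
    # Distance between segment starts, so consecutive segments share the overlap
    segment_length = segment_duration - (overlap_seconds * 1000)
    if segment_length <= 0:
        raise ValueError("overlap_seconds must be shorter than segment_duration")
    segments = []
    for i in range(0, len(audio), segment_length):
        segment = audio[i:i + segment_duration]
        segment_path = os.path.join(output_directory, f"segment_{i // segment_length}.m4a")
        # ffmpeg's muxer for .m4a files is named "ipod"; "m4a" is not accepted as an output format
        segment.export(segment_path, format="ipod")
        segments.append(segment_path)
    return segments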
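To see how the overlap affects where segments start and end, here is a small sketch that reproduces only the stepping arithmetic from split_m4a, without touching any audio. The segment_bounds function is a hypothetical helper written for illustration; it assumes the same defaults as above (30,000 ms segments, 10 s overlap).

```python
def segment_bounds(total_ms, segment_duration=30000, overlap_seconds=10):
    """Return (start_ms, end_ms) pairs using the same stepping rule as split_m4a."""
    step = segment_duration - overlap_seconds * 1000
    if step <= 0:
        raise ValueError("overlap must be shorter than the segment duration")
    # Each segment starts `step` ms after the previous one and is clipped at the end
    return [
        (start, min(start + segment_duration, total_ms))
        for start in range(0, total_ms, step)
    ]


if __name__ == "__main__":
    # A 70-second file yields segments starting every 20 seconds
    for start, end in segment_bounds(70_000):
        print(f"{start // 1000}s to {end // 1000}s")
```

For a 70-second file this gives 0-30 s, 20-50 s, 40-70 s, and 60-70 s. Note that the last segment lies entirely inside the one before it, so a short trailing segment may add nothing new to the transcript.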
4. setup_transcriber.py
The setup_transcriber.py script prepares the environment: it writes the project files, installs the necessary packages, and builds the application.
import subprocess

# Define the content of the setup.py script
setup_content = """
from setuptools import setup

APP = ['app.py']
DATA_FILES = []
OPTIONS = {
    'argv_emulation': True,
    'packages': ['pydub', 'argparse', 'tkinter', 'subprocess', 'threading', 'openai'],
}

setup(
    app=APP,
    data_files=DATA_FILES,
    options={'py2app': OPTIONS},
    setup_requires=['py2app'],
)
"""
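
# Create the Python scripts.
# split_m4a_content and app_content are not defined in this listing; they must be
# strings holding the source of split_m4a.py and app.py from the sections above.
with open('split_m4a.py', 'w') as f:
    f.write(split_m4a_content)
with open('app.py', 'w') as f:
    f.write(app_content)
with open('setup.py', 'w') as f:
    f.write(setup_content)

# Install required packages (argparse and tkinter ship with Python and are not installed via pip)
subprocess.run(['pip', 'install', 'pydub', 'py2app', 'openai'], check=True)

# Run py2app to create the macOS app
subprocess.run(['python', 'setup.py', 'py2app'], check=True)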
How It Works
File Selection: The user selects an M4A file through a graphical interface.
Audio Splitting: The selected file is split into smaller segments to ease the transcription process.
Transcription: Each segment is sent to the OpenAI API for transcription using the Whisper model.
Progress Tracking: The application provides real-time feedback on the transcription progress.
Output: The complete transcription is saved as a text file in a folder named after the original M4A file, created in the same directory as that file.
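The transcription and progress-tracking steps above can be sketched independently of tkinter and the network. In this minimal example, transcribe_segments is a hypothetical function (not part of app.py) that takes the transcription function and the progress callback as parameters, which makes the loop easy to try out with stand-in data.

```python
def transcribe_segments(segments, transcribe, on_progress=None):
    """Transcribe each segment in order and join the results with newlines.

    `transcribe` is any callable that takes a path and returns text.
    `on_progress`, if given, is called with (current, total) after each segment.
    """
    total = len(segments)
    parts = []
    for index, segment in enumerate(segments, start=1):
        parts.append(transcribe(segment))
        if on_progress is not None:
            on_progress(index, total)
    return "\n".join(parts)


if __name__ == "__main__":
    # Stand-in for the real API call: a lookup table of made-up results
    fake_results = {"segment_0.m4a": "hello", "segment_1.m4a": "world"}
    text = transcribe_segments(
        list(fake_results),
        fake_results.get,
        on_progress=lambda current, total: print(f"{current}/{total}"),
    )
    print(text)
```

In the real application, transcribe would be transcribe_file and on_progress would forward to update_status.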
Step-by-Step Guide to Create a Dockable macOS Application for a Python Script
Step 1: Prepare Your Environment
Ensure Python and Dependencies are Installed:
- Open Terminal.
- Activate your virtual environment:
source /Users/karl.bolinger/myenv/bin/activate
- Install required packages:
pip install pydub openai
Step 2: Create a Wrapper Script
Create the Wrapper Script:
- In Terminal, navigate to the directory containing your app.py script:
cd /Users/karl.bolinger/Documents
- Create the run_app.sh script:
touch run_app.sh
- Edit run_app.sh to include the following (after activation, python resolves to the virtual environment's interpreter, so the installed packages are available):
#!/bin/bash
source /Users/karl.bolinger/myenv/bin/activate
python /Users/karl.bolinger/Documents/app.py
- Make the script executable:
chmod +x run_app.sh
Step 3: Create an Automator Application
Open Automator:
- Open Automator from the Applications folder or use Spotlight search.
Create a New Application:
- Choose "New Document" and select "Application".
Add a Run Shell Script Action:
- In the search bar, type "Run Shell Script" and drag the action into the workflow pane.
- Set the shell to /bin/bash.
Enter the Shell Script:
- Enter the following line to run your run_app.sh script:
/Users/karl.bolinger/Documents/run_app.sh
Save the Automator Application:
- Save the Automator application to your Applications folder with a name like "MyPythonApp".
Add the Application to the Dock:
- Navigate to the saved Automator application in Finder.
- Drag the application to your Dock.
Final Steps
Test the Application:
- Click the new application icon in the Dock to ensure it launches your Python script correctly.
Debugging:
- If the application doesn't work as expected, open the Automator application and check the shell script for any typos or incorrect paths.
By following these steps, you will have a dockable macOS application that runs your Python script with the necessary environment activated.
Conclusion
This project demonstrates how Python can be used to automate the transcription of audio files, leveraging powerful APIs and providing a user-friendly interface. The modular design allows for easy customization and extension, making it a valuable tool for anyone who frequently works with audio transcriptions. Whether you're a journalist, researcher, or just someone looking to save time, this application can help streamline your workflow and improve productivity.