Integrating Speech-to-Text Functionality in Django Applications

Integrating Speech-to-Text functionality into Django applications can enhance the user experience by letting users transcribe audio directly within the app. According to AssemblyAI, developers can implement this feature with its API and Python SDK.

Setting Up the Project

To get started, create a new project folder and establish a virtual environment:

# Mac/Linux
python3 -m venv venv
. venv/bin/activate

# Windows
python -m venv venv
.\venv\Scripts\activate.bat

Next, install the required packages: Django, the AssemblyAI Python SDK, and python-dotenv:

pip install Django assemblyai python-dotenv
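Before moving on, a quick sanity check can confirm that the virtual environment is active and that the packages are importable. This is a minimal sketch using only the standard library. Note that the import names differ from the pip names (`django`, `assemblyai`, and `dotenv` for python-dotenv). The helper names `in_virtualenv` and `check_environment` are hypothetical.

```python
import importlib.util
import sys

# pip package name -> module name used in `import` statements
REQUIRED_MODULES = {
    "Django": "django",
    "assemblyai": "assemblyai",
    "python-dotenv": "dotenv",
}


def in_virtualenv() -> bool:
    """True inside a venv, where sys.prefix differs from the base interpreter's."""
    return sys.prefix != sys.base_prefix


def check_environment() -> dict:
    """Map each pip package name to whether its module can be found on sys.path."""
    return {
        package: importlib.util.find_spec(module) is not None
        for package, module in REQUIRED_MODULES.items()
    }


if __name__ == "__main__":
    print("virtualenv active:", in_virtualenv())
    for package, found in check_environment().items():
        print(f"{package}: {'ok' if found else 'MISSING'}")
```

Save it as a throwaway script and run it with the venv's interpreter. Any `MISSING` entry means the `pip install` step needs to be repeated inside the activated environment.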

Creating the Django Project

Create a new Django project named 'stt_project' and a new app within it called 'transcriptions':

django-admin startproject stt_project
cd stt_project
python manage.py startapp transcriptions

Building the View

In the 'transcriptions' app, create a view to handle file uploads and transcriptions. Open transcriptions/views.py and add the following code:

import os

from django import forms
from django.shortcuts import render
import assemblyai as aai

# Use the key loaded from .env (see "Setting the API Key" below)
aai.settings.api_key = os.getenv('ASSEMBLYAI_API_KEY')

class UploadFileForm(forms.Form):
    audio_file = forms.FileField()

def index(request):
    context = {}
    if request.method == 'POST':
        form = UploadFileForm(request.POST, request.FILES)
        if form.is_valid():
            file = request.FILES['audio_file']
            transcriber = aai.Transcriber()
            # Blocks until the file is uploaded and the transcript is complete
            transcript = transcriber.transcribe(file.file)
            file.close()
            if transcript.error:
                context = {'error': transcript.error}
            else:
                context = {'transcript': transcript.text}
    return render(request, 'transcriptions/index.html', context)
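The branch that builds the template context can be pulled into a small function so it can be unit-tested without calling the API. This is a minimal sketch. It assumes only that the transcript object exposes the `text` and `error` attributes used in the view. `SimpleNamespace` stands in for a real `aai.Transcript`, and `build_context` and the sample error message are hypothetical.

```python
from types import SimpleNamespace


def build_context(transcript) -> dict:
    """Return the template context for a finished transcript.

    Mirrors the view: show the error if one is set, otherwise the text.
    """
    if transcript.error:
        return {"error": transcript.error}
    return {"transcript": transcript.text}


# Hypothetical stand-ins for aai.Transcript objects
ok = SimpleNamespace(text="Hello world", error=None)
failed = SimpleNamespace(text=None, error="Audio file could not be decoded")

print(build_context(ok))      # {'transcript': 'Hello world'}
print(build_context(failed))  # {'error': 'Audio file could not be decoded'}
```

In the view, the context assignment would then reduce to `context = build_context(transcript)`.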

Defining URL Configuration

Map the view to a URL by creating transcriptions/urls.py:

from django.urls import path
from . import views

urlpatterns = [
    path('', views.index, name='index'),
]

Then include the app's URL patterns in the project-level URL configuration, stt_project/urls.py:

from django.contrib import admin
from django.urls import include, path

urlpatterns = [
    path('', include('transcriptions.urls')),
    path('admin/', admin.site.urls),
]

Creating the HTML Template

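The view renders 'transcriptions/index.html', and Django looks up that name inside each installed app's templates/ folder. Create the file at transcriptions/templates/transcriptions/index.html with the following content: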

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>AssemblyAI Django App</title>
</head>
<body>
    <h1>Transcription App with AssemblyAI</h1>
    <form method="post" enctype="multipart/form-data">
        {% csrf_token %}
        <input type="file" accept="audio/*" name="audio_file">
        <button type="submit">Upload</button>
    </form>
    <h2>Transcript:</h2>
    {% if error %}
        <p style="color: red">{{ error }}</p>
    {% endif %}
    <p>{{ transcript }}</p>
</body>
</html>
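The extra `transcriptions/` folder matters because Django's app-directories template loader searches each installed app's `templates/` folder for the exact name passed to `render()`. The function below is a simplified, hypothetical model of that lookup, not Django's actual loader. It shows why `'transcriptions/index.html'` resolves only when the file sits at `transcriptions/templates/transcriptions/index.html`.

```python
import os
import tempfile


def find_app_template(app_dirs, template_name):
    """Return the first <app>/templates/<template_name> that exists, or None.

    Simplified model of Django's app_directories loader: apps are searched
    in INSTALLED_APPS order and the first match wins.
    """
    for app_dir in app_dirs:
        candidate = os.path.join(app_dir, "templates", *template_name.split("/"))
        if os.path.isfile(candidate):
            return candidate
    return None


with tempfile.TemporaryDirectory() as root:
    app = os.path.join(root, "transcriptions")
    # Namespaced layout: transcriptions/templates/transcriptions/index.html
    os.makedirs(os.path.join(app, "templates", "transcriptions"))
    open(os.path.join(app, "templates", "transcriptions", "index.html"), "w").close()

    found = find_app_template([app], "transcriptions/index.html")
    missing = find_app_template([app], "index.html")  # not directly under templates/
    print(found is not None, missing)  # True None
```

The namespacing also prevents clashes: if two apps each shipped a bare `templates/index.html`, the first app in `INSTALLED_APPS` would silently win.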

Setting the API Key

Store the AssemblyAI API key in a .env file in the project root (the directory containing manage.py):

ASSEMBLYAI_API_KEY=your_api_key_here

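Load this environment variable at the top of stt_project/settings.py. In the same file, add 'transcriptions' to the INSTALLED_APPS list. Without this step, Django will not find the app's template.

from dotenv import load_dotenv
load_dotenv()  # copies the values from .env into os.environ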
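Here is roughly what `load_dotenv()` does. It reads `KEY=VALUE` lines from `.env` into `os.environ`, and by default it does not override variables that are already set. The stdlib-only sketch below is a deliberately simplified, hypothetical stand-in (no quoting, `export` prefixes, or variable expansion) meant only to illustrate that behavior; keep using python-dotenv in the project. The hypothetical `require_api_key` helper also fails fast when the key is missing, so a misconfigured `.env` surfaces at startup rather than on the first upload.

```python
import os
import tempfile


def load_env_file(path: str) -> None:
    """Simplified stand-in for python-dotenv's load_dotenv().

    Sets each KEY=VALUE pair in os.environ unless the key already exists.
    Blank lines and lines starting with '#' are skipped.
    """
    with open(path, encoding="utf-8") as fh:
        for raw in fh:
            line = raw.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, value = line.split("=", 1)
            os.environ.setdefault(key.strip(), value.strip())


def require_api_key() -> str:
    """Return the AssemblyAI key or raise a clear error if it is unset."""
    key = os.environ.get("ASSEMBLYAI_API_KEY")
    if not key:
        raise RuntimeError("ASSEMBLYAI_API_KEY is not set; check your .env file")
    return key


with tempfile.NamedTemporaryFile("w", suffix=".env", delete=False) as tmp:
    tmp.write("# AssemblyAI credentials\nASSEMBLYAI_API_KEY=your_api_key_here\n")
os.environ.pop("ASSEMBLYAI_API_KEY", None)  # start clean for the demo
load_env_file(tmp.name)
os.remove(tmp.name)
print(require_api_key())  # your_api_key_here
```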

Running the Django App

Start the server using the following command:

python manage.py runserver

Open http://127.0.0.1:8000/ in your browser and upload an audio file. The transcript appears below the form once processing finishes.

Non-blocking Implementations

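The view above blocks: transcriber.transcribe() uploads the file and waits until the transcript is ready, so the browser request stays open for the whole transcription. To avoid this, use a webhook or the SDK's asynchronous method. With a webhook, AssemblyAI notifies your app when the transcript is ready. With an asynchronous call, the transcription runs in the background while your code continues.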

Using Webhooks

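Set a webhook URL in the transcription config and submit the file with transcriber.submit(), which returns without waiting for the result. The webhook URL must be absolute and reachable by AssemblyAI from the internet. Then handle the webhook delivery in a separate view function:

# In index(), replace the transcriber.transcribe(...) call with:
webhook_url = request.build_absolute_uri('/webhook/')
config = aai.TranscriptionConfig().set_webhook(webhook_url)
transcriber.submit(file.file, config)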
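However the URL is built, it needs a scheme and a host that AssemblyAI can reach. Django's `request.get_host()` on its own returns only the host and port, and during local development `127.0.0.1` is not reachable from outside, so a tunnel (for example ngrok) is typically used. This is a minimal sketch, assuming a hypothetical public base URL such as one read from an environment variable. The helper name `build_webhook_url` and the host `stt.example.com` are placeholders.

```python
from urllib.parse import urljoin, urlsplit


def build_webhook_url(public_base_url: str, path: str = "/webhook/") -> str:
    """Join a public base URL and the webhook route, rejecting scheme-less bases.

    The trailing slash in '/webhook/' matches the route path('webhook/', ...).
    """
    parts = urlsplit(public_base_url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"expected an absolute http(s) URL, got {public_base_url!r}")
    return urljoin(public_base_url, path)


print(build_webhook_url("https://stt.example.com"))
# https://stt.example.com/webhook/
```

Passing a bare host such as `"localhost:8000"` raises `ValueError` instead of producing a URL that AssemblyAI cannot call.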

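Define the webhook receiver in transcriptions/views.py. AssemblyAI's POST request does not carry Django's CSRF token, so the view must be exempt from CSRF checks:

import json
from django.http import HttpResponse
from django.views.decorators.csrf import csrf_exempt
from django.views.decorators.http import require_POST

@csrf_exempt  # AssemblyAI's request carries no CSRF token
@require_POST
def webhook(request):
    data = json.loads(request.body)
    transcript = aai.Transcript.get_by_id(data['transcript_id'])  # save or process transcript.text here
    return HttpResponse(status=200)  # acknowledge the delivery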
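The parsing step of the receiver can be kept framework-free and tested on its own. This is a minimal sketch. It assumes the payload carries `transcript_id` (as the receiver uses) and also a `status` field, which is an assumption worth checking against AssemblyAI's webhook documentation. `parse_webhook` and the ID `abc123` are hypothetical.

```python
import json


def parse_webhook(body: bytes) -> tuple:
    """Extract (transcript_id, status) from a webhook POST body.

    Raises ValueError for malformed JSON or a missing transcript_id, so the
    view can answer 400 instead of failing with a 500.
    """
    try:
        data = json.loads(body)
    except ValueError as exc:  # covers JSONDecodeError and bad encodings
        raise ValueError("webhook body is not valid JSON") from exc
    if not isinstance(data, dict) or "transcript_id" not in data:
        raise ValueError("webhook body has no transcript_id")
    # 'status' is assumed to be present; default to None if it is not.
    return data["transcript_id"], data.get("status")


body = b'{"transcript_id": "abc123", "status": "completed"}'
transcript_id, status = parse_webhook(body)
if status == "completed":
    print("fetch transcript", transcript_id)  # fetch transcript abc123
```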

Then add a route for it in transcriptions/urls.py:

urlpatterns = [
    path('', views.index, name='index'),
    path('webhook/', views.webhook, name='webhook'),
]

Using Async Functions

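Alternatively, call the SDK's transcribe_async() method. It starts the transcription in the background and immediately returns a future that can be checked later without blocking. This does not require a Django async view:

transcript_future = transcriber.transcribe_async(file.file)
# ...the code can continue with other work here...
if transcript_future.done():  # non-blocking check
    transcript = transcript_future.result()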
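The future returned above exposes `done()` and `result()`. Assuming it is a standard `concurrent.futures.Future`, the usual patterns apply: poll with `done()`, wait with `result(timeout=...)`, or register a callback with `add_done_callback()`. The sketch below demonstrates these patterns with a hypothetical `fake_transcribe` function running on a `ThreadPoolExecutor` in place of the SDK call, so it runs without an API key.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_transcribe(audio_name: str) -> str:
    """Hypothetical stand-in for the SDK call: sleeps, then returns text."""
    time.sleep(0.2)
    return f"transcript of {audio_name}"


results = []

with ThreadPoolExecutor(max_workers=2) as executor:
    future = executor.submit(fake_transcribe, "meeting.mp3")

    # Runs when the job finishes (immediately if it already has)
    future.add_done_callback(lambda f: results.append(f.result()))

    print("done yet?", future.done())  # most likely False: the job is still sleeping
    text = future.result(timeout=5)  # blocks until finished or 5 s elapse

print(text)  # transcript of meeting.mp3
```

Inside a request handler, calling `result()` would reintroduce the wait. For long recordings, a callback that persists the text, or the webhook approach, keeps the request short.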

Speech-to-Text Options for Django Apps

When implementing Speech-to-Text, consider cloud-based APIs like AssemblyAI or Google Cloud Speech-to-Text for high accuracy and scalability, or open-source libraries like SpeechRecognition and Whisper for greater control and privacy.
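Since the choice of engine may change, one option is to keep the view independent of any single provider behind a small interface. This is a minimal sketch using `typing.Protocol`. The `SpeechToText` protocol, `EchoBackend`, and `transcribe_upload` names are hypothetical. A real backend would wrap the AssemblyAI SDK, Google Cloud Speech-to-Text, or a local Whisper model.

```python
import io
from typing import BinaryIO, Protocol


class SpeechToText(Protocol):
    """Anything that can turn an audio stream into text."""

    def transcribe(self, audio: BinaryIO) -> str: ...


class EchoBackend:
    """Hypothetical test double: 'transcribes' by reporting the byte count."""

    def transcribe(self, audio: BinaryIO) -> str:
        return f"{len(audio.read())} bytes received"


def transcribe_upload(backend: SpeechToText, audio: BinaryIO) -> str:
    """The view-facing function depends only on the protocol, not a vendor SDK."""
    return backend.transcribe(audio)


print(transcribe_upload(EchoBackend(), io.BytesIO(b"\x00" * 16)))  # 16 bytes received
```

Swapping providers then means writing one new backend class, while the view and its tests stay unchanged.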

Conclusion

This guide shows how to integrate Speech-to-Text into Django apps using the AssemblyAI API. Developers can choose between blocking and non-blocking implementations and select the best Speech-to-Text solution based on their needs.

For more details, visit the AssemblyAI blog.