ADDING SUBTITLES TO AVENGERS TRAILER WITH AI | AWS TRANSCRIBE, S3 & IAM | OPENCV | PYTHON | 2023

By Kerchow

Write Up

This project focuses on AWS Transcribe, S3, and IAM, along with OpenCV. The goal of the project was to add subtitles to a 30-second Avengers trailer.

Using AWS Transcribe, the goal was accomplished in two different ways.

First, individual subtitles were overlaid on top of the video while each word was being spoken.

Second, the VTT file, which groups the subtitles into sentences, was used to overlay those sentences on top of the video as they were spoken.

The GitHub for this project is here: https://github.com/kerchow-inc/Subtitle-Transcribe-Trailer


Important Functions

The main functions to discuss are "aws_start_transcription" and "convert_trailer_frames_to_transcribed_video_vtt".

aws_start_transcription - Calls the start_transcription_job API so that the transcription job starts. Notice that Subtitles.Formats is set to vtt. This is what makes the response give us both the individual-word JSON and the VTT file for sentence-level transcription.

convert_trailer_frames_to_transcribed_video_vtt - Inserts the VTT transcription onto the original trailer. A helper function then stitches the audio back onto the trailer, which the frame-writing step alone does not do. Most importantly, the start_time and end_time of each caption are tracked to ensure that text does not stay on screen longer than it should.
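The audio-stitching helper itself is not shown in this write-up. One way to sketch that step (an assumption, not necessarily the repo's implementation, and build_audio_mux_command is a hypothetical name) is to render the subtitled frames into a silent video and then let ffmpeg copy the audio track over from the original trailer:

```python
def build_audio_mux_command(silent_video, original_video, out_file):
    # take the video stream from the subtitled render (input 0) and the
    # audio stream from the original trailer (input 1), copying both
    # without re-encoding
    return ['ffmpeg', '-y',
            '-i', silent_video,
            '-i', original_video,
            '-map', '0:v:0', '-map', '1:a:0',
            '-c', 'copy',
            out_file]
```

The resulting command list can be handed to subprocess.run, e.g. subprocess.run(build_audio_mux_command('avengers_subtitles_silent.mp4', 'photos//avengers.mp4', 'avengers_subtitles.mp4'), check=True).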

# start transcription function
import boto3

# Transcribe client (region and credentials come from the usual boto3 config)
transcribe_client = boto3.client('transcribe')


def aws_start_transcription(name, key, format):
    '''
    name(string): name we want our transcription job name to be
    key(string): location in s3 where media is stored
    format(string): format transcription will be handling
    '''
    transcription = transcribe_client.start_transcription_job(
        TranscriptionJobName=name,
        Media={'MediaFileUri': key},
        MediaFormat=format,
        LanguageCode='en-US',
        OutputBucketName='kerchow-content',
        Subtitles={
            'Formats': [
                'vtt',
            ],
            'OutputStartIndex': 1
        },
    )
    return transcription['TranscriptionJob']
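Note that start_transcription_job only kicks the job off; the transcript JSON and VTT file only exist in the bucket once the job reaches COMPLETED. A minimal polling sketch (the client is passed in, and wait_for_transcription is a hypothetical helper, not part of the repo):

```python
import time


def wait_for_transcription(client, name, delay=5, max_attempts=60):
    # poll get_transcription_job until the job finishes or fails
    for _ in range(max_attempts):
        job = client.get_transcription_job(
            TranscriptionJobName=name)['TranscriptionJob']
        if job['TranscriptionJobStatus'] in ('COMPLETED', 'FAILED'):
            return job
        time.sleep(delay)
    raise TimeoutError('transcription job %s did not finish' % name)
```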
# converting trailer frames to transcribed video with audio
import cv2
import webvtt

import helpers


def convert_trailer_frames_to_transcribed_video_vtt(trailer, file_name):
    '''
    trailer(string): location of local video
    file_name(string): location of vtt file
    '''
    cap = cv2.VideoCapture(trailer)

    if not cap.isOpened():
        return {}

    # count the number of frames and fps
    frames = cap.get(cv2.CAP_PROP_FRAME_COUNT)
    fps = round(cap.get(cv2.CAP_PROP_FPS))
    cap.release()

    # parse the vtt file once instead of re-reading it for every frame
    captions = webvtt.read(file_name)

    for i in range(int(frames)):
        # dividing the current frame by fps to get the current time in seconds
        curr_second = i / fps
        save_file_name = 'trailer//frame' + str(i + 1) + '.jpg'
        # read our current frame in
        photo = cv2.imread(save_file_name)

        # check to see if any captions fall within our bounds
        for caption in captions:
            start_time = convert_vtt_to_seconds(caption.start)
            end_time = convert_vtt_to_seconds(caption.end)
            word = caption.text
            # see if our caption is within our time frame
            if float(start_time) <= curr_second <= float(end_time):
                # put the text onto the screen
                cv2.putText(photo, word, (600, 600),
                            cv2.FONT_HERSHEY_COMPLEX, 1, (255, 255, 255), 2)
        # write our new file out
        cv2.imwrite(save_file_name.replace('trailer', 'subtitles'), photo)
    # create the movie with audio
    helpers.turn_trailer_back_to_movie(
        'subtitles', 'avengers_subtitles.mp4', 'photos//avengers.mp4')
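The function above relies on convert_vtt_to_seconds, a small helper from the repo that is not shown here. A plausible sketch, assuming WebVTT timestamps in HH:MM:SS.mmm (or the shorter MM:SS.mmm) form:

```python
def convert_vtt_to_seconds(timestamp):
    # 'HH:MM:SS.mmm' or 'MM:SS.mmm' -> seconds as a float
    parts = timestamp.split(':')
    seconds = float(parts[-1])          # SS.mmm
    seconds += int(parts[-2]) * 60      # minutes
    if len(parts) == 3:
        seconds += int(parts[0]) * 3600  # hours, if present
    return seconds
```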



Want a more in-depth explanation of why certain code is written the way it is, or explanations of the rest of the code?

Watch the video below!