ADDING SUBTITLES TO AVENGERS TRAILER WITH AI | AWS TRANSCRIBE, S3 & IAM | OPENCV | PYTHON | 2023
Write Up
This project focuses on AWS Transcribe, S3, and IAM, as well as OpenCV. The goal of the project was to add subtitles to a 30-second Avengers trailer.
Using AWS Transcribe, the goal was accomplished with two different methods.
The first overlays individual words on top of the video while each word is being spoken, driven by Transcribe's word-level JSON output.
The second uses the VTT file, which groups the subtitles into sentences, and overlays each sentence on top of the video while it is being spoken.
The GitHub for this project is here: https://github.com/kerchow-inc/Subtitle-Transcribe-Trailer
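For the first method, Transcribe's word-level output is a JSON document whose results.items entries each carry a start_time, an end_time, and the spoken word. The repo has its own parsing code; below is only a minimal sketch of how those timings can be pulled out (the helper name load_word_timings is hypothetical).
# parsing word-level timings out of the Transcribe JSON (sketch, not from the repo)
import json

def load_word_timings(transcript_path):
    '''
    transcript_path(string): local path to the transcription job's JSON output
    '''
    with open(transcript_path) as f:
        data = json.load(f)
    words = []
    for item in data['results']['items']:
        # punctuation items carry no timestamps, so skip them
        if item['type'] == 'pronunciation':
            words.append((float(item['start_time']),
                          float(item['end_time']),
                          item['alternatives'][0]['content']))
    return words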
Important Functions
The main functions to discuss are "aws_start_transcription" and "convert_trailer_frames_to_transcribed_video_vtt".
aws_start_transcription - Calls the start_transcription_job API to kick off the transcription job. Notice that Subtitles.Formats is specified with vtt. This is what makes the job output the individual-word JSON as well as the VTT file for sentence-level transcription.
convert_trailer_frames_to_transcribed_video_vtt - Inserts the VTT transcription onto the original trailer. A helper function then stitches the audio back onto the trailer, since the frame-extraction helper did not keep it. Most importantly, the start_time and end_time of each caption are tracked to ensure that text does not stay on screen longer than it should.
# start transcription function
import boto3

transcribe_client = boto3.client('transcribe')

def aws_start_transcription(name, key, format):
    '''
    name(string): name we want our transcription job to have
    key(string): S3 URI where the media is stored
    format(string): media format the transcription will be handling (e.g. 'mp4')
    '''
    transcription = transcribe_client.start_transcription_job(**{
        'TranscriptionJobName': name,
        'Media': {'MediaFileUri': key},
        'MediaFormat': format,
        'LanguageCode': 'en-US',
        'OutputBucketName': 'kerchow-content',
        # requesting vtt subtitles makes the job emit a sentence-level
        # .vtt file alongside the word-level JSON transcript
        'Subtitles': {
            'Formats': [
                'vtt',
            ],
            'OutputStartIndex': 1
        },
    })
    return transcription['TranscriptionJob']
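Note that start_transcription_job only kicks the job off; it does not wait for it to finish. The write-up does not show the waiting step, but a minimal polling sketch using the same transcribe_client might look like this:
# waiting for the transcription job to finish (sketch, not from the repo)
import time

def wait_for_transcription(name):
    while True:
        job = transcribe_client.get_transcription_job(
            TranscriptionJobName=name)['TranscriptionJob']
        if job['TranscriptionJobStatus'] in ('COMPLETED', 'FAILED'):
            return job
        # Transcribe jobs are asynchronous, so poll every few seconds
        time.sleep(5)
Once the job completes, its Subtitles.SubtitleFileUris field points at the generated .vtt file in the output bucket.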
# converting trailer frames to transcribed video with audio
import cv2
import webvtt

import helpers  # project-local: frame extraction / audio stitching

def convert_trailer_frames_to_transcribed_video_vtt(trailer, file_name):
    '''
    trailer(string): location of local video
    file_name(string): location of vtt file
    '''
    cap = cv2.VideoCapture(trailer)
    if not cap.isOpened():
        return {}
    # count the number of frames and fps
    frames = cap.get(cv2.CAP_PROP_FRAME_COUNT)
    fps = round(cap.get(cv2.CAP_PROP_FPS))
    cap.release()
    # parse the vtt file once instead of re-reading it for every frame
    captions = webvtt.read(file_name)
    for i in range(int(frames)):
        # dividing the current frame index by fps gives the current time in seconds
        curr_second = i / fps
        save_file_name = 'trailer//frame' + str(i + 1) + '.jpg'
        # read our current frame in
        photo = cv2.imread(save_file_name)
        # check whether any caption falls within our bounds
        for caption in captions:
            start_time = convert_vtt_to_seconds(caption.start)
            end_time = convert_vtt_to_seconds(caption.end)
            word = caption.text
            # see if this caption is within our time frame
            if float(start_time) <= curr_second <= float(end_time):
                # put the text onto the frame
                cv2.putText(photo, word, (600, 600),
                            cv2.FONT_HERSHEY_COMPLEX, 1, (255, 255, 255), 2)
        # write our new frame out
        cv2.imwrite(save_file_name.replace('trailer', 'subtitles'), photo)
    # create the movie with audio
    helpers.turn_trailer_back_to_movie(
        'subtitles', 'avengers_subtitles.mp4', 'photos//avengers.mp4')
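convert_vtt_to_seconds is a small helper from the repo that is not shown above; it turns a VTT timestamp like 00:00:12.500 into seconds so the caption bounds can be compared against curr_second. A minimal sketch of what it likely does:
# converting a vtt timestamp (HH:MM:SS.mmm) into seconds (sketch, not from the repo)
def convert_vtt_to_seconds(timestamp):
    hours, minutes, seconds = timestamp.split(':')
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)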
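Likewise, helpers.turn_trailer_back_to_movie lives in the repo and is what re-attaches the original audio. One way to implement it is with moviepy, stitching the subtitled frames into a clip and copying the audio track over from the original trailer; this is an assumption, and the repo's helper may do it differently.
# stitching subtitled frames back into a video with the original audio
# (sketch using moviepy, not necessarily how the repo does it)
import glob
import re
from moviepy.editor import ImageSequenceClip, VideoFileClip

def turn_trailer_back_to_movie(frames_dir, out_file, original_video):
    # sort frame1.jpg, frame2.jpg, ... numerically rather than lexically
    frame_files = sorted(
        glob.glob(frames_dir + '/frame*.jpg'),
        key=lambda p: int(re.search(r'(\d+)', p).group(1)))
    original = VideoFileClip(original_video)
    # rebuild the video from the frames at the original frame rate
    clip = ImageSequenceClip(frame_files, fps=original.fps)
    # re-attach the original audio track
    clip = clip.set_audio(original.audio)
    clip.write_videofile(out_file, audio_codec='aac')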
Watch on YouTube
Want more in-depth reasoning behind why certain code is written the way it is, or explanations of the rest of the code?
Watch the video below!