Content is king, and in 2020 video is the ruler of the Internet’s content kingdom. Consumption of video-based content has snowballed in the last few years, so much so that even social networks like Twitter and LinkedIn, which were originally built around text and images, have now embraced it. Moreover, new services such as TikTok have emerged that focus entirely on video. One of the key considerations for every content delivery medium is personalization, and for video it mostly revolves around language. That’s where subtitles have a huge role to play.
Subtitles for YouTube API is an API geared specifically towards extracting the subtitles of a YouTube video. You might be wondering what use subtitles have outside the context of watching the video. We will get back to the use cases of this API towards the end of this tutorial. Let’s first explore the Subtitles for YouTube API and get to know its endpoints and options. The API is available on Rakuten RapidAPI, so you can jump to the API Console to take a look.
In case you don’t have an account on Rakuten RapidAPI, sign up now and get your universal API key to access the Subtitles for YouTube API and thousands of other APIs hosted on Rakuten RapidAPI.
Overview of Subtitles for YouTube API
The Subtitles for YouTube API provides a set of utility endpoints for working with the subtitle tracks of videos hosted on YouTube. It can report whether a video has subtitles and in which languages, as well as extract the entire subtitle track.
Check out the API console to get a glimpse of the supported endpoints.
Here is how you can use each of the four endpoints.
GET List languages
The “GET List languages” endpoint is purely informational. It returns the languages that the API supports, that is, the ones it can detect in the subtitles of a video.
GET Get subtitle in SRT format
The “GET Get subtitle in SRT format” endpoint returns the subtitles in SRT format. SRT is the subtitle file format (extension *.srt) that originated with SubRip, a popular subtitle extraction tool. An SRT file is plain text: a sequence of numbered entries, each pairing a subtitle with the time range during which it is displayed in the video.
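If you choose the SRT endpoint, you will need to turn that text back into structured data before you can work with it. The sketch below shows one way to do this, assuming the response follows the usual SubRip layout (a counter line, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` line, then the subtitle text, with blank lines between entries). The `Cue` type, the function names, and the sample text are made up for illustration and are not part of the API.

```python
import re
from typing import List, NamedTuple


class Cue(NamedTuple):
    index: int
    start: float  # seconds from the beginning of the video
    end: float    # seconds from the beginning of the video
    text: str


_TIME = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})")


def _to_seconds(stamp: str) -> float:
    """Convert an SRT timestamp such as 00:01:02,500 into seconds."""
    match = _TIME.fullmatch(stamp.strip())
    if match is None:
        raise ValueError(f"bad SRT timestamp: {stamp!r}")
    hours, minutes, seconds, millis = (int(g) for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds + millis / 1000


def parse_srt(srt_text: str) -> List[Cue]:
    """Parse SRT text into a list of cues, skipping malformed blocks."""
    cues = []
    normalized = srt_text.replace("\r\n", "\n").strip()
    # Entries are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", normalized):
        lines = block.split("\n")
        if len(lines) < 2:
            continue
        start, _, end = lines[1].partition("-->")
        cues.append(
            Cue(int(lines[0]), _to_seconds(start), _to_seconds(end), "\n".join(lines[2:]))
        )
    return cues


# Made-up sample, not taken from a real video.
SAMPLE_SRT = """1
00:00:00,000 --> 00:00:02,500
Hello and welcome.

2
00:00:02,500 --> 00:00:05,000
This is a made-up example.
"""

for cue in parse_srt(SAMPLE_SRT):
    print(cue.index, cue.start, cue.end, cue.text)
```

Converting the timestamps to seconds up front keeps the SRT data comparable with the JSON endpoint, which reports its times in seconds.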
GET List all available subtitles
The “GET List all available subtitles” endpoint is used to query the availability of subtitles in a video. It returns the available subtitle tracks along with their language.
GET Get Subtitles in JSON format
The “GET Get Subtitles in JSON format” endpoint returns the subtitles in JSON format. It contains all the information that you get from the SRT format. However, JSON is easier to read and easier to integrate with applications that process the subtitle data from videos, which makes it the preferred way of consuming this API.
API Pricing
The Subtitles for YouTube API is offered on a “freemium” subscription with 100 free API calls a day under the BASIC subscription.
Let’s Play The Subtitle Track
Before you can extract the subtitles of a YouTube video, you must know its video id. The video id can be obtained from the YouTube URL of the video.
Open youtube.com and search for any video which has subtitles. You can do this by selecting the “Subtitles/CC” option from the filter menu below the search bar.
Let’s search for Arnold Schwarzenegger with the subtitles filter enabled. Here is one of the videos about Arnold that pops up at the top of the search results.
The video id of this video is the last part of the URL, after the “?v=”. For the above video, the video id is “X5rZHlqIm3E”.
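If you would rather pull the video id out of a URL programmatically than copy it by hand, here is a minimal sketch using Python’s standard library. It only handles the “?v=” style of URL described above, and the function name `extract_video_id` is our own.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def extract_video_id(url: str) -> Optional[str]:
    """Return the value of the 'v' query parameter, or None if it is absent."""
    query = parse_qs(urlparse(url).query)
    values = query.get("v")
    return values[0] if values else None


print(extract_video_id("https://www.youtube.com/watch?v=X5rZHlqIm3E"))
# prints X5rZHlqIm3E
```

Parsing the query string, rather than splitting the URL on “?v=”, means the function still works when other parameters appear before or after the video id.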
Once you know the video id, you can do two things. You can check whether the video has subtitles with the help of the “GET List all available subtitles” endpoint. Alternatively, you can extract the subtitles directly with the “GET Get subtitle in SRT format” or “GET Get Subtitles in JSON format” endpoint.
To query the subtitles in the video, select the “GET List all available subtitles” endpoint and feed in this video id for the ‘videoid’ parameter.
This API returns the list of subtitles with their language and translation options. In the case of video id “X5rZHlqIm3E”, you can see that the API response indicates that the subtitles are present.
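Outside the API console, the same call can be made from your own code. The sketch below only builds the request and does not send it, because the host name and endpoint path are specific to the API listing and should be copied from the console’s code snippet. The values used here (`example.invalid`, `/hypothetical-endpoint`, `YOUR_RAPIDAPI_KEY`) are placeholders, not the real ones. It assumes the two headers that RapidAPI’s generated snippets normally include, `X-RapidAPI-Key` and `X-RapidAPI-Host`, together with the ‘videoid’ parameter mentioned above.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_subtitle_request(host: str, path: str, video_id: str, api_key: str) -> Request:
    """Build (but do not send) a GET request for one of the subtitle endpoints.

    host and path are placeholders: copy the real values from the API console.
    """
    url = f"https://{host}{path}?{urlencode({'videoid': video_id})}"
    headers = {
        "X-RapidAPI-Key": api_key,  # your RapidAPI key
        "X-RapidAPI-Host": host,    # host name shown in the API console
    }
    return Request(url, headers=headers)


request = build_subtitle_request(
    host="example.invalid",          # placeholder host
    path="/hypothetical-endpoint",   # placeholder path
    video_id="X5rZHlqIm3E",
    api_key="YOUR_RAPIDAPI_KEY",     # placeholder key
)
print(request.get_method(), request.full_url)
```

Once the placeholders are replaced with the values from the console, passing the request to `urllib.request.urlopen` sends it and returns the response.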
Now select the “GET Get Subtitles in JSON format” endpoint and feed in this video id for the ‘videoid’ parameter.
Fire the API and you should see a long JSON array containing all the subtitles of this video.
Each array element is a JSON object with ‘start’, ‘dur’, and ‘end’ keys, which hold the start time, duration, and end time of the subtitle’s display in the video. These times are in seconds. Finally, the ‘text’ field contains the actual subtitle text.
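To get a feel for this structure, the sketch below turns such an array into a readable transcript with one timestamped line per subtitle. The sample response is invented for illustration and only mirrors the keys described above. Since it is an assumption whether the API sends the times as numbers or as strings, the code converts them with `float()` to be safe.

```python
import json

# Invented sample that mirrors the keys described above; not a real response.
SAMPLE_RESPONSE = """[
  {"start": 0.0, "dur": 2.5, "end": 2.5, "text": "Hello and welcome."},
  {"start": 2.5, "dur": 3.0, "end": 5.5, "text": "This is a made-up example."}
]"""


def format_timestamp(seconds: float) -> str:
    """Format a number of seconds as HH:MM:SS, dropping the fraction."""
    minutes, secs = divmod(int(seconds), 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def to_transcript(subtitles):
    """Return one '[HH:MM:SS] text' line per subtitle entry."""
    return [
        f"[{format_timestamp(float(item['start']))}] {item['text']}"
        for item in subtitles
    ]


for line in to_transcript(json.loads(SAMPLE_RESPONSE)):
    print(line)
```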
You can verify the subtitles and their display times by playing the first few seconds of the video.
What’s the Use of Subtitles?
This question was posed at the beginning of this tutorial. You may think of subtitles only as a way to personalize your video viewing experience. However, the subtitle information is actually a goldmine of data, and there are a few ways in which you can leverage it.
In Video Search
How many times have you started a long video lecture, only to get bored midway and go looking for the transcript? If you get hold of the subtitle data of the video, you can search through the subtitle text for the keywords you are interested in. Then you can jump straight to the relevant parts of the lecture based on each subtitle’s display time, without wasting your time watching the entire video. In-video search is a big thing, and there are companies building tools that can even search through videos without relying on the subtitles.
Invideo is a Chrome extension which allows you to search within YouTube videos.
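The keyword lookup described in this section takes only a few lines once you have the JSON subtitles. Here is a minimal sketch, assuming the entries carry the ‘start’ and ‘text’ keys described earlier. The sample data and the function name `find_keyword` are made up for illustration.

```python
def find_keyword(subtitles, keyword):
    """Return (start_seconds, text) for every subtitle that mentions keyword."""
    needle = keyword.casefold()
    return [
        (float(item["start"]), item["text"])
        for item in subtitles
        if needle in item["text"].casefold()
    ]


# Invented sample entries, shaped like the JSON endpoint's output.
SAMPLE_SUBTITLES = [
    {"start": 0.0, "dur": 2.5, "end": 2.5, "text": "Hello and welcome."},
    {"start": 2.5, "dur": 3.0, "end": 5.5, "text": "Today we talk about training."},
    {"start": 5.5, "dur": 4.0, "end": 9.5, "text": "Training every day builds a habit."},
]

for start, text in find_keyword(SAMPLE_SUBTITLES, "training"):
    print(f"{start:>6.1f}s  {text}")
```

Each hit comes with its start time, which is the position you would seek to in the video player.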
Video Based Search
If you can search within a video based on its subtitles, then why not extend the idea to web search results? With the subtitle information, you can include videos in SERP listings whenever the search keywords match the subtitle text.
Google possibly uses this technique already, but it’s not clear whether they rely on subtitles or employ some other means.
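As a toy illustration of matching search keywords against subtitle text across many videos, the sketch below builds a small inverted index that maps each word to the videos whose subtitles contain it. This is only a simplified sketch and says nothing about how any real search engine works. The video ids and subtitle text are invented.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_index(subtitles_by_video):
    """Map each word to the set of video ids whose subtitles contain it."""
    index = defaultdict(set)
    for video_id, subtitles in subtitles_by_video.items():
        for item in subtitles:
            for word in tokenize(item["text"]):
                index[word].add(video_id)
    return index


def search(index, query):
    """Return the ids of videos whose subtitles contain every query word."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Invented video ids and subtitle text.
SUBTITLES_BY_VIDEO = {
    "video_a": [{"text": "Lifting weights in the gym"}],
    "video_b": [{"text": "A gym routine for beginners"}],
    "video_c": [{"text": "How to bake bread at home"}],
}

index = build_index(SUBTITLES_BY_VIDEO)
print(sorted(search(index, "gym")))
print(sorted(search(index, "gym routine")))
```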
Topical Visualization
The subtitle track of a reasonably long video yields hundreds of sentences containing a few thousand words. Now imagine the scale of this data across the millions of videos uploaded to YouTube every day. This is huge, and it’s an opportunity to establish topical relationships between the words contained in the subtitles of different videos. This can help us build a visual search system where similar videos are categorized together.
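One simple way to quantify such a topical relationship is to compare the word counts of two subtitle tracks. The sketch below uses cosine similarity over word frequencies. This is our own choice of technique for illustration, the stop-word list is deliberately tiny, and the sample subtitles are invented.

```python
import math
import re
from collections import Counter

# A deliberately tiny stop-word list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "for"}


def term_counts(subtitles):
    """Count how often each non-stop-word appears in a subtitle track."""
    counts = Counter()
    for item in subtitles:
        for word in re.findall(r"[a-z']+", item["text"].lower()):
            if word not in STOPWORDS:
                counts[word] += 1
    return counts


def cosine_similarity(a, b):
    """Cosine similarity of two word-count vectors, from 0.0 to 1.0."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


# Invented subtitle tracks for three hypothetical videos.
gym_one = term_counts([{"text": "Lift weights in the gym"}])
gym_two = term_counts([{"text": "Gym weights routine"}])
baking = term_counts([{"text": "Bake bread and cake"}])

print(cosine_similarity(gym_one, gym_two))  # shares two words, so well above zero
print(cosine_similarity(gym_one, baking))   # no shared words, so 0.0
```

Videos whose subtitle tracks score close to each other can then be grouped together, which is the kind of categorization described above.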
Summary
The above use cases are just the tip of the iceberg. With detailed monitoring of video viewership, it is possible to tag subtitles with additional information linking them to the most viewed portions of a video. This way, you can build more sophisticated search features for videos.
If you are on the lookout for more options for extracting data from videos, then check out our Video API Collection. Please leave your comments below if you have more questions. We will be back soon with yet another interesting API tutorial.