Video Terminology

Here is some useful video conversion terminology that may assist you.

Bitrate - In computers, data is stored as 1's and 0's, called bits. 8 bits make 1 byte, 1024 bytes make 1 kilobyte, and 1024 kilobytes make 1 megabyte. The more data a file contains, the more disk space it uses. The bitrate is the number of bits per second that the encoder is allowed to use. Bitrates are normally counted in powers of 1000 rather than 1024, so a bitrate of 3Mbit (3 megabits per second) means that roughly 3 x 1000 x 1000 bits of data are stored for every second of video. Since we do not want huge video files, we can lower the bitrate to produce a smaller output file. There is a constant trade-off between quality and storage: increasing the bitrate increases video quality, and reducing it reduces the storage needed for the file. Some codecs encode more efficiently than others, so for a given codec and resolution there is generally a sweet spot that balances quality against file size. For example, with H264 at 1280x720, approximately 3Mbit generally produces very good quality output.
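
To make the trade-off between bitrate and storage concrete, here is a minimal sketch that estimates file size from bitrate and duration. It assumes the common convention that 1 Mbit is 1,000,000 bits and ignores audio and container overhead; the function name and the one-hour duration are made up for illustration.

```python
def estimated_size_bytes(bitrate_bits_per_second: float, duration_seconds: float) -> float:
    """Rough size of a stream: bits per second times seconds, divided by 8 bits per byte."""
    return bitrate_bits_per_second * duration_seconds / 8


# Assumed example values: a 3 Mbit video stream lasting one hour.
video_bitrate = 3_000_000      # bits per second (1 Mbit taken as 1,000,000 bits)
duration = 60 * 60             # seconds

size_bytes = estimated_size_bytes(video_bitrate, duration)
print(f"{size_bytes / 1_000_000:.0f} MB")  # prints "1350 MB"
```

Halving the bitrate halves the estimate, which is why the bitrate is usually the first setting to adjust when an output file is too large.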

CODEC - CODEC stands for coder-decoder. In short, it is the method used to encode raw video data into a specific compressed format and to decode it again. When creating the file, the CODEC is responsible for encoding and compressing the data. When viewing the file, the CODEC is responsible for decoding the data into a video sequence for playback. At the time of this writing, the H264 CODEC is the most popular CODEC and is widely supported across mobile and computer devices. A typical video file uses 2 CODECs: one for the video and one for the audio.

File Type - File Type, or container, refers to the type of file that stores the video and audio streams. For example, H264 video can be stored in an MKV container or an MP4 container, among others. The MP4 container is the most widely used container across many different types of devices.

FPS - Frames per second. As video is simply still images shown rapidly in succession, the FPS is the number of still images shown in 1 second of video. The more frames shown, the smoother the video may appear. Other factors, such as the refresh rate of the display, also affect smoothness. In the USA, film frame rates (used by most scripted TV shows and movies) are 24 frames per second (23.976 to be exact), and video frame rates (news, commercials, non-scripted TV shows) are generally 30 frames per second (29.97 to be exact). 720P HDTV broadcasts run at 60 frames per second. On sporting events shown on, say, FOX or ABC, the action might look extra smooth because of this. However, scripted TV shows on those stations simply have frames repeated to fill the 60 frames, since they are filmed at 24 fps.
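
As a rough illustration of what a frame rate means in time, the sketch below converts FPS into how long each still image stays on screen. It assumes that the "23.976" figure is the usual NTSC rate of 24000/1001; the function name is made up for this example.

```python
from fractions import Fraction


def frame_duration_ms(fps) -> float:
    """Time each frame is displayed, in milliseconds (1000 ms divided by frames per second)."""
    return float(1000 / Fraction(fps))


# Assumption: "23.976" is shorthand for the exact NTSC film rate 24000/1001.
rates = [("film", Fraction(24000, 1001)), ("video", 30), ("720P", 60)]
for label, fps in rates:
    print(f"{label}: {frame_duration_ms(fps):.2f} ms per frame")
```

At 60 frames per second each image is on screen for under 17 ms, compared with almost 42 ms for film, which is one reason fast motion can look smoother at the higher rate.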

Interlace - Interlacing is the process of taking a video resolution such as 1920x1080 and sending only every other line in each pass, alternating between the odd and the even lines. The result is that only half the vertical resolution is displayed at any given time. Pre-HD TVs were interlaced, so this technology has been around for ages. 1920x1080i (i for interlaced) works like this. On the first pass (called a field), you get 1920 dots per row and only 540 rows of picture. Those 540 rows might be the odd rows. On the next field you again get 1920 dots per row and 540 rows, but those rows are the even rows. For this reason, the early 1080i HDTVs could actually look worse than the 720P TVs that came out later, because a 1080i HDTV only had to show 540 lines of vertical resolution at any one moment. On a modern 1080p capable TV, a 1080i picture needs to be deinterlaced. On still images, deinterlacing is very easy: you simply show all odd and even lines together. On moving content, you can get a combing effect, because the two fields were captured at different moments and the difference between alternating lines becomes visible. Most scripted TV shows and movies broadcast in HD are actually shot at 24 progressive frames per second and then telecined to create 1080i at 30fps. This content can be reverse telecined to reconstruct the original 1080p content at 24fps.
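
The odd/even split and the simple "show all lines together" deinterlace described above can be sketched in a few lines. This is only an illustration: a frame is modeled as a plain list of rows, rows are numbered from 1 so the first row counts as odd, and the function names are made up for this example.

```python
def split_fields(frame):
    """Split a progressive frame into its odd-row field and its even-row field."""
    odd_rows = frame[0::2]    # rows 1, 3, 5, ... (counting from 1)
    even_rows = frame[1::2]   # rows 2, 4, 6, ...
    return odd_rows, even_rows


def weave(odd_rows, even_rows):
    """Deinterlace by interleaving the two fields back into one full frame."""
    frame = []
    for odd, even in zip(odd_rows, even_rows):
        frame.append(odd)
        frame.append(even)
    return frame


# A tiny 6-row "picture"; each string stands in for one row of dots.
still = ["row1", "row2", "row3", "row4", "row5", "row6"]
odd, even = split_fields(still)
print(len(odd), len(even))          # prints "3 3": each field has half the rows
print(weave(odd, even) == still)    # prints "True": a still image is rebuilt exactly

# With motion, the two fields come from different moments, so weaving mixes them.
later = ["ROW1", "ROW2", "ROW3", "ROW4", "ROW5", "ROW6"]
_, later_even = split_fields(later)
print(weave(odd, later_even))       # alternating old and new rows: the combing effect
```

Weaving is only one way to deinterlace. Real players often use more elaborate methods on moving content to hide the combing.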

Progressive - Progressive refers to a picture where every line is drawn in every frame, as opposed to Interlaced, where only every other line is drawn in each pass. See Interlace for a more detailed explanation.

Resolution - As explained above, a video is a sequence of still images. Each image is composed of many colored dots (pixels). The number of dots is known as the resolution of the video. A resolution of 1280x720 means that there are 1280 dots per row and 720 rows of those dots in the image. The more dots used, the more detail the image can hold, but the higher the bitrate needed to keep the quality acceptable. HDTV defines 1280x720 as "720P" and 1920x1080 as "1080P". UHD, commonly called "4K", is 3840x2160P (4x the number of dots of 1080P).
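
The dot counts behind these resolutions are simple multiplication, as the short sketch below shows. The function name and the dictionary of labels are made up for this example.

```python
def pixel_count(width: int, height: int) -> int:
    """Total number of dots in one image: dots per row times number of rows."""
    return width * height


resolutions = {"720P": (1280, 720), "1080P": (1920, 1080), "4K": (3840, 2160)}
for name, (width, height) in resolutions.items():
    print(f"{name}: {pixel_count(width, height):,} dots")

# 4K has twice the width and twice the height of 1080P, so four times the dots.
print(pixel_count(3840, 2160) / pixel_count(1920, 1080))  # prints "4.0"
```

Since every one of those dots has to be encoded for every frame, a larger resolution generally needs a higher bitrate to reach the same quality.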

Sync - Audio and video are stored in a "video file" as separate streams. During playback, the video and audio are decoded and played together. When the sound matches the picture, the two are considered to be in sync.