Viewing video productions is a very popular activity, whether at the movie theater, over broadcast or cable television, on DVD or VHS tape, or from a video server on a local area network. As broadband networks proliferate, streaming video to the majority of home desktop computers over the Internet is just around the corner, though it requires a high-bandwidth connection, a fast microprocessor, and up-to-date decoding software. Something mystical and wondrous surrounds the production of high-quality video, and that lack of knowledge, coupled with the complexity of production tools, helps keep the promise of distributed video to the desktop a thing of the future. But production equipment is now within the budget of hobbyists, and software applications are highly evolved and much easier to use, so the door has been flung open to anyone with creative ideas and the incentive to learn how to produce digital video. Creating digital video involves these steps:
First, determine exactly what you hope to accomplish with your project. Evaluate the target audience and the requirements of delivering video to them. With a clear idea of the project goals and audience, choose the delivery methods. Then create project specifications, a storyboard, and a script. If this is a work for hire, the client should approve the project plan in writing prior to production.
What are you trying to communicate to the viewer? How will it be delivered? If it will be online, how many users will need to be able to access it simultaneously? Will the media be viewed in a linear fashion, or will it be part of an interactive experience?
Make these decisions early in the planning process. Higher quality video requires faster machines and newer technologies. Will viewers need to have a particular codec installed? It may be wise to prepare multiple versions of the video files and deliver the most appropriate version to each viewer, depending on the system and bandwidth. The alternative is to produce more highly compressed video of poorer quality intended for the "lowest common denominator" machine, if reaching a broad audience is the goal.
Make sure the technology can support your goals, required media types, playback platforms, and interactivity. Select the hardware and software that will be used to create the project, including audio/video capture hardware, editing software, compression tools, codecs and encoders, authoring software, CD burners, HTML tools, and media servers. Architectures, such as QuickTime or RealMedia, are system extensions that allow a computer to display video. Applying a codec (compressor/decompressor) makes the video and audio compact enough to play from a CD-ROM or over the web. Each codec has different characteristics and applications. A format is the file structure in which media is stored; formats belong to an architecture. For example, the QuickTime architecture has the QuickTime movie file format (.mov), which may be compressed with the Sorenson codec.
An architecture controls how dynamic media is handled by a computer, including how movies are displayed. The various architectures have some features in common, but there are differences between them. Some are intended for streaming over the Internet, while others are intended for CD-ROM delivery. Some work best on particular types of computers and operating systems. Selecting the architecture depends on the video application and the delivery platform.
QuickTime, RealMedia, and Windows Media are examples of digital A/V architectures. Each of these includes software components that provide for the creation, storage, and playback of media; each defines standard formats for storing media; and each supports certain codecs for audio and video compression.
QuickTime is a multiplatform, industry-standard multimedia software architecture developed by Apple Computer. It is used to author and publish synchronized media types, including graphics, sound, video, text, music, VR (virtual reality), and 3D files. QuickTime 4 supports "real-time" streaming.
RealMedia is exclusively intended to deliver audio and video content over the Internet. It supports both live and "on-demand" video. The RealMedia Server is required to stream videos, but the player may be used in a stand-alone context.
Windows Media is Microsoft's solution for delivery of multimedia. The Windows Media server supports both live and "on-demand" video over a TCP/IP connection. The AVI format was originally created for CD-ROM video, although it is also used on the web to some degree; it is no longer supported by Microsoft, and its functionality has been incorporated into DirectShow. The extension for Windows Media files is .asf.
Emblaze is a Java-based video architecture for distributing video over the Internet. It does not require a plug-in, but it does require the latest version of Java and a fast computer. Recent versions intended for wireless devices are available.
MPEG is a family of compression algorithms, including MPEG-1, MPEG-2, and MPEG-4. MPEG Layer III Audio, commonly referred to as MP3, is a subset of the MPEG specification.
Create a detailed project specification and a master plan to track your progress. Write a script for the video prior to filming. Too much improvisation wastes time and tape. Carefully prepared questions speed up an interview and improve the quality of responses. Make a master form for logging videotape footage with scenes, time code, and comments.
Apply the complete production process from start to finish on a few sample clips. You may learn something later in the process that requires a change early in the process. For example, the face of a presenter may be too small when the movie is reduced to 160 x 120 resolution. Shooting up close would solve this problem, but it is not possible to shoot all the footage again. Play back samples of compressed video on your minimum target machine, looking for dropped frames, skips in the audio track, or lack of synchronization between the audio and video tracks. Simulate the user's experience by uploading a clip to the server and accessing it at the same data rate on the same type of machine the target user will use.
These recommendations are for creating movies that will compress well, since most of the loss in quality occurs during compression. The goal of shooting for compressed video is to produce a crisp video signal with as little noise, camera movement, and fine detail as possible, so that the movie will look good in a small window. Before selecting a master tape format, evaluate your options and the technical side of the process.
It is helpful to understand how the information on analog videotape is formatted in order to understand the data that results when it is encoded digitally. Most video recorded or broadcast in the U.S. is in the NTSC format. This specification calls for a broadcast bandwidth of 4 MHz and a color subcarrier frequency of 3.58 MHz. It has 525 horizontal scan lines, 29.97 frames per second, and two fields per frame. These two fields alternate and are interlaced.
There are several differences between how a computer monitor displays an image and how a television screen works. One major difference is that the computer monitor displays images in one pass from top to bottom, called progressive scan, as opposed to the interlaced fields found in the NTSC signal. When converting material on videotape, it is necessary to compensate for these and other differences between television video and computer video.
A television picture and a computer monitor both have a 4:3 aspect ratio, but their pixels differ. The spatial resolution of a television signal is typically 720 x 486 pixels, and when it is squeezed into a 4:3 computer screen set to 640 x 480, the result is pixels that are not square. Objects may look taller and thinner than they really are. The capture hardware or software may need to compensate for non-square pixels.
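A quick sketch of the arithmetic behind non-square pixels, using the figures above (the helper function and its name are illustrative, not part of any capture API):

```python
# Sketch: why 720 x 486 NTSC pixels are not square on a 4:3 display.

def pixel_aspect_ratio(width_px, height_px, display_aspect=4 / 3):
    """Width-to-height ratio of a single pixel when width_px x height_px
    pixels must fill a display with the given aspect ratio."""
    return (display_aspect * height_px) / width_px

# NTSC capture: 720 x 486 squeezed into a 4:3 frame.
ntsc_par = pixel_aspect_ratio(720, 486)
# Square-pixel computer screen: 640 x 480.
vga_par = pixel_aspect_ratio(640, 480)

print(round(ntsc_par, 3))  # less than 1.0: objects look taller and thinner
print(round(vga_par, 3))   # exactly 1.0: square pixels
```

A ratio below 1.0 means each captured pixel is narrower than it is tall, which is why software may need to rescale the frame.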
A standard NTSC television set is designed to receive and display interlaced video. Each interlaced video frame consists of two fields. Each field is made up of either the even or the odd lines of the image, and these are alternated. When displaying video, a television screen draws alternating fields about 60 times per second. Our vision assembles these fields to create approximately 30 whole frames per second. The images appear smooth to the eye due to a phenomenon known as persistence of vision. On a television, the phosphors respond slowly, which helps to obscure the comb-like patterns of alternating lines. A computer monitor uses faster phosphors with sharper resolution. The effects of interlaced video are undesirable, and they should be minimized during capture and compression.
A computer monitor scans each frame from the top to bottom, left to right, progressively drawing each line. It does not interlace fields or lines of video. This method is referred to as noninterlaced, or progressive scan. There are two deinterlacing techniques. One is to blend the two fields together, which preserves motion well and can produce sharp images. However, if individual frames are composed of two fields, a motion blur or double image may appear in areas of fast motion. The other way to deinterlace a video is to discard either even or odd fields. This removes the motion blur in still frames, but motion may not appear as smooth.
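The two deinterlacing techniques can be sketched on a tiny grayscale frame stored as a list of rows; this is a toy illustration of the trade-off, not how a real capture card implements it:

```python
# Sketch of the two deinterlacing techniques described above.

def deinterlace_blend(frame):
    """Blend each line with the one below it; preserves motion well,
    but fast motion can leave a blur or double image."""
    out = []
    for i in range(len(frame)):
        below = frame[min(i + 1, len(frame) - 1)]
        out.append([(a + b) / 2 for a, b in zip(frame[i], below)])
    return out

def deinterlace_discard(frame):
    """Keep only one field (the even lines) and repeat each line to
    restore the frame height; removes blur but motion is less smooth."""
    out = []
    for i in range(0, len(frame), 2):
        out.append(frame[i][:])
        out.append(frame[i][:])
    return out

frame = [[0, 0], [10, 10], [20, 20], [30, 30]]  # 4 scan lines, 2 fields
print(deinterlace_blend(frame))
print(deinterlace_discard(frame))
```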
A more complex form of interlacing occurs when material originally shot on film at 24 frames per second (fps) is transferred to video that plays at 30 fps. This conversion is called the telecine process. In order to create six additional frames per second, new frames are made by interlacing repeated fields of the source material. Frames that are derived from two different frames of the original film are called interfield frames. These interfield frames result from 3:2 pulldown, which is another name for telecine. Inverse telecine is the term for removing the 3:2 pulldown.
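The 3:2 pulldown pattern can be sketched with letters standing in for film frames; the field labels below are illustrative:

```python
# Sketch of 3:2 pulldown (telecine): 4 film frames (A, B, C, D) become
# 10 video fields = 5 interlaced video frames, turning 24 fps into 30 fps.

def pulldown_32(film_frames):
    """Repeat each frame's field 3 times, then 2, alternating; return
    (field, field) pairs for each resulting interlaced video frame."""
    fields = []
    for i, f in enumerate(film_frames):
        copies = 3 if i % 2 == 0 else 2
        fields.extend([f] * copies)
    return [(fields[i], fields[i + 1]) for i in range(0, len(fields), 2)]

video = pulldown_32(["A", "B", "C", "D"])
print(video)
# Frames built from two different film frames are "interfield" frames:
interfield = [pair for pair in video if pair[0] != pair[1]]
print(len(video), len(interfield))
```

Inverse telecine is conceptually the reverse: detect the repeated-field pattern and discard the extra fields to recover the original 24 fps frames.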
A computer monitor is capable of a much broader color range, or gamut, than a television screen: vivid colors and pure blacks and whites are possible. Among the compromises necessary for television is the use of "NTSC-safe" colors, which rules out highly saturated reds and oranges; on a television, these colors bleed onto the more neutral hues around them.
A composite signal, with color and light information combined on a single channel, contains a lot of video noise. This appears as "snow," or a dirty residue on the picture. The VHS format is composite by nature. The RCA connector used to patch the video signal carries all of the information. A component signal, with color and light separated, has less inherent noise. The S-video format provides for two channels, or Y/C video. Hi-8 and S-VHS tapes use this format. It is also the analog output signal from a DVCAM. The highest quality analog format is the three-part component (Y, R-Y, B-Y), which further divides the color spectrum into two channels. It is used in Betacam SP and other master formats.
A high-quality original is the first important step towards a high-quality compressed movie. In addition to lower noise, professional cameras produce a sharper image and better colors with their superior optics and multichip design. Common types of cameras are described below:
Movies that are well-lit and have a low level of contrast between images will compress better than video shot with weak lighting. Low light conditions produce a grainy image that does not compress well. Cinepak is a common codec that performs best with bright images.
The use of a tripod often makes a dramatic impact in the quality of the final movie. Keeping the camera steady reduces subtle differences between frames, improving the temporal compression of the video. Any change in the image will cause the compressor to work harder. This applies both to camera movement and to subject movement. Use hard cuts instead of panning rapidly across a scene. Zoom slowly and only when necessary. Keep subjects as static as possible.
Keeping the detail within a scene to a minimum will help the video compress better spatially. It will also make the video easier to see when the movie is reduced in size for desktop delivery. Ask subjects to wear clothes that don't have high-contrast patterns. Plain colors are best. Stripes and checked patterns can cause moiré patterns when the video is resized and compressed. Keep the background plain for an interview. Painted backdrops are very good. Do not shoot in front of a window (to avoid reflections). You may wish to put the background out of focus to minimize detail. Bushes and trees are a particularly poor choice for the background because of the high degree of detail and motion.
Shooting with a blue curtain or painted background can improve the final results if you composite an actor into a digital still frame and "key" out the blue. The background has little video noise in it, and it compresses well. However, blue screen video is difficult to produce. One of the secrets to shooting good blue screen video is to light with slightly yellow gels (colored filters) to improve the color spectrum. If it is not necessary to composite an actor over a different background, a painted backdrop is a simple, effective option.
The goal is to record a high-quality, noise-free audio signal with a strong level. Use remote microphones whenever possible to reduce camera noise. The internal microphone installed in a camcorder picks up excessive noise from the zoom motor. The operator who handles the camera also introduces noise. Minimize unnecessary noise in the audio signal such as wind and ambient sound. A wireless lavaliere microphone is ideal for recording a speaker's voice.
The quickest and easiest way to get digital video files into a computer for editing and compression is by transferring them from a DV camera or deck using the Firewire interface. DV cameras already store their video in a digital format, so there is no need to encode or capture from a DV source. One convenient way to digitize video is simply to dub it from an analog tape format to a DV deck or camera. The digital capture operation is performed automatically, without the hazards of dropped frames and other problems that may be introduced by a computer system with a capture card. The Sony Vaio line of computers comes with Firewire ports and software for capture, supporting the "i-Link" feature on their cameras. Macintosh G3 and G4 systems come with built-in Firewire and the iMovie application for capturing and editing DV formats.
A capture card is used to digitize an analog video signal and store it on the computer's hard drive. Depending on the system, the captured file may be as large as 20 megabytes for each running second. Many gigabytes of free space on the hard drive may be required for a video project. Be sure that the hard drive used for capture does not periodically pause to recalibrate. Higher capture rates yield higher image quality. The quality of a capture card will affect the quality of the final movie. More expensive systems, such as the Media 100, Avid Media Composer, Digital Origin Telecast, and Truevision Targa 2000, provide better image quality and more features than less expensive cards such as the Pinnacle DC-30 or the Truevision Bravado card.
Configuring a system to capture video can be a tedious undertaking. Only minimal system software should be running, and all extensions that are not required should be unloaded. A fast drive that has been defragmented should be prepared to receive the video file in real time. All unnecessary devices, such as scanners and Zip drives, should be removed. Some compressionists place a diskette in the internal floppy drive and a CD in the player to prevent the system from checking those resources.
Most cards are shipped with an application to control the capture process. One of the more popular and robust of these is Adobe Premiere. In Premiere, if the "Warn on Dropped Frames" option is selected, the system flashes a message when frames are lost during capture.
The most common problem in capturing video is dropped frames, meaning that some frames of the original video do not get encoded. An attempt to capture NTSC video at the original rate of 29.97 frames per second (fps) is futile if the computer system is only capable of encoding at 15 fps. Dropped frames may be sporadic, which makes the digitized product appear to pause randomly. Although NTSC video plays at 29.97 frames per second, some applications attempt to capture at 30 fps; this difference in frame rates may trigger a warning that frames have been dropped. Dropped frames may also result from trying to capture video at a rate, in megabits per second, that the system cannot sustain. Reducing the quality setting during capture may allow the system to make its best effort without dropping frames.
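The sustainability check can be sketched with simple arithmetic; the 48 Mbps system limit below is purely illustrative:

```python
# Sketch: a capture drops frames when the required data rate exceeds
# what the system can sustain. All limits here are hypothetical.

def capture_mbps(width, height, bytes_per_pixel, fps):
    """Megabits per second needed to capture uncompressed frames."""
    return width * height * bytes_per_pixel * 8 * fps / 1_000_000

def will_drop_frames(required_mbps, sustained_mbps):
    return required_mbps > sustained_mbps

# 640 x 480 at 24-bit color, 29.97 fps:
needed = capture_mbps(640, 480, 3, 29.97)
print(round(needed, 1))           # far beyond a hypothetical 48 Mbps system
print(will_drop_frames(needed, 48.0))
```

Lowering the quality setting, the frame rate, or the capture resolution all reduce the required rate in the same way.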
Generally, it is best to capture at the largest possible resolution. A 640 x 480 capture may yield better results if the final resolution is to be a 320 x 240 movie. When a larger image is reduced to a smaller final size, several pixels are averaged to make each final image pixel. This may reduce video noise and improve the image quality. Some capture cards are optimized to encode at 320 x 240 or 640 x 480. Capture at a default size, in the 4:3 aspect ratio, to avoid dropping frames.
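The pixel-averaging effect of reducing a larger capture can be sketched as a 2 x 2 box average on a grayscale frame; this is a toy stand-in for what scaling software does:

```python
# Sketch: when a 2x-larger capture is reduced, each output pixel averages
# a 2 x 2 block of source pixels, which smooths out random video noise.

def downsample_2x(frame):
    """Box-average 2 x 2 pixel blocks of a grayscale frame (list of rows)."""
    out = []
    for r in range(0, len(frame), 2):
        row = []
        for c in range(0, len(frame[r]), 2):
            block = [frame[r][c], frame[r][c + 1],
                     frame[r + 1][c], frame[r + 1][c + 1]]
            row.append(sum(block) / 4)
        out.append(row)
    return out

# A flat gray area with one pixel of noise:
noisy = [[100, 100], [100, 140]]
print(downsample_2x(noisy))   # the noise spike is averaged down
```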
There are a couple of deinterlacing options when video is captured at full-screen resolution, such as blending fields to preserve the motion blur effect of interlacing. If the original source was shot on film and transferred to videotape, capturing at full-screen resolution and the original frame rate will allow the software to remove the 3:2 pulldown and return the material to its original 24 fps. This will look much better when compressed, and the file size will be smaller.
The typical capture process converts the analog video into a digital file and stores it on a hard drive. A quality capture card, a fast hard drive, and a finely tuned system are all interdependent in getting good results. The audio card may be integrated to receive a signal while capturing. Test a few sample clips, and check audio levels carefully on playback. Capture audio at 44.1 kHz, 16-bit resolution, in either mono or stereo depending on the project specification. Audio that is recorded with video in the DV format is normally sampled at 48 kHz by the camera itself.
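The audio data rates implied by these settings are easy to compute:

```python
# Sketch of the uncompressed audio data rate for the capture settings above.

def audio_bytes_per_second(sample_rate_hz, bit_depth, channels):
    return sample_rate_hz * (bit_depth // 8) * channels

# 44.1 kHz, 16-bit stereo (CD quality):
print(audio_bytes_per_second(44_100, 16, 2))   # 176,400 bytes per second
# 48 kHz, 16-bit stereo, as sampled by a DV camera:
print(audio_bytes_per_second(48_000, 16, 2))   # 192,000 bytes per second
```

Capturing in mono halves these figures, which is one reason the project specification should state the channel count up front.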
Captured video often exhibits a black edge on the perimeter, called "overscan" or "edge blanking." If the source material was edited for television and intended for a normal receiver in the underscan mode, the black edges are part of the original content, and almost all capture cards will grab images all the way out to the edge, including the black stripe. Black borders, along with any edge noise and ragged edges that crawl on playback, are easy to crop if the captured image is larger than the final movie dimensions. If video is captured at the final size, it will be necessary to scale it up to remove edge noise, which will degrade image quality.
The quality setting on the capture system controls how much hardware compression is applied during capture. Larger files with superior image quality are captured on the higher quality settings. If the maximum rate of capture is exceeded by the demands of the quality setting, the capture card will drop frames. Capture video at the maximum quality the system can handle, usually a minimum of 4 to 6 megabits per second. A faster drive, an Ultra SCSI interface card, or a RAID system can improve capture rates.
Digitize directly from master tapes, not copies. Any second-generation tape will have more noise than the original master. Excessive playing of master tapes will degrade their quality, particularly with a fragile Hi-8 master.
Test the audio levels on a few clips before capturing your whole project. If the track is distorted, it will need to be captured again.
Video editing is an enjoyable and creative process. An editor combines segments of video, assembles them in order, and adds transitions and effects. Adobe Premiere is one of the more widely used desktop editing applications.
Many simple capture programs also include the capability to assemble selected cuts into a larger movie. This is generally done by importing the desired segments of the original clips and arranging them in order, along a timeline, to create a new movie. It may be possible to add some simple transitions between cuts, such as fades and wipes. When the assembly is complete with transitions, see that the audio track is synchronized with the video and shows strong levels. Title and save the final movie uncompressed and archive it. If using the QuickTime architecture, save it as "Self Contained." The final step will be to compress the movie.
Adobe Premiere, Avid Cinema, Strata VideoShop, U-Lead Media Studio, and other editing programs can all produce sophisticated results. These programs feature a timeline-based interface that allows a user to put clips in order and apply transitions between the clips. There are many plug-ins made by third parties for Premiere to add special effects, transitions, and filters. Avid Cinema is one of the easiest editing packages to learn, but it lacks some of the advanced features of Premiere. Adobe After Effects allows a user to composite various layers of video and to create special effects with plug-ins. It is a powerful and complicated tool, the video equivalent of Photoshop in the graphics world.
Clips with shifts in color, gamma problems, and other problems that appear during shooting or capture should be repaired prior to editing. Ideally, all the clips that are assembled will conform to similar color and contrast levels.
Like using too many different fonts in a document, it is a bad practice to add too many elaborate transitions. They may be distracting to the viewer, and they pose problems in compression. Quick cuts from one scene to the next work well, as do quick cross-fades, keeping both to a minimum. It is possible to create a scene that zooms out from the center, spins, wraps around a cube, and flies away. Only in rare instances is this kind of "eye candy" appropriate to enhance or clarify the message. A simple cross-fade dissolve with a maximum one-second duration will compress well and can be used in just about any editing situation.
Perform all editing and effects processing on uncompressed video at the highest resolution available. Do not resize the video with the editing software. Render, which means to output the final product, at the highest quality possible. If you input DV using Firewire, render to the DV format with the same window size and audio sample rate to avoid losing quality.
Figure 1: Screen from Adobe Premiere editing window
Save the edited file uncompressed or at least in the same resolution and size at which it was captured and edited. Do not save it in the capture hardware format, which is usually lossy (or degraded) and requires the exact same system to open the file. The Animation codec at 100 percent quality is a lossless software-only format. Budget lots of room for archiving this uncompressed, edited master file. If there is not enough space for the backup, a compromise is to use the Photo JPEG codec at 100 percent quality. This will compress it significantly, and some scenes may contain visible artifacts. Among the options for archiving the uncompressed, edited movie are CD, DVD, removable cartridges, and DAT tapes.
Raw video contains more data than can be stored or transferred easily. Uncompressed NTSC video requires roughly 30 megabytes of data for each running second. To arrive at this figure, multiply the pixels in each frame (720 x 486) by 24 bits of color and by 30 frames per second. At this rate, less than 25 seconds would fit on a CD-ROM, and the data transfer rate of the fastest CD-ROM player would not be adequate to play uncompressed video without dropping frames. Another bottleneck in desktop video is the speed with which the video card can redraw the screen. Current DirectDraw routines on the Windows platform improve performance, but this is just for the video portion. The data rate of uncompressed CD-quality audio is about 176 kilobytes per second (KBps), which approaches the full data transfer rate of a T1 Internet connection before adding any video.
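The arithmetic above can be checked directly:

```python
# Sketch of the raw data rate of uncompressed NTSC video, and how little
# of it fits on a 650 MB CD-ROM.

WIDTH, HEIGHT = 720, 486          # NTSC spatial resolution
BYTES_PER_PIXEL = 3               # 24-bit color
FPS = 30

bytes_per_second = WIDTH * HEIGHT * BYTES_PER_PIXEL * FPS
mb_per_second = bytes_per_second / 1_000_000
print(round(mb_per_second, 1))    # roughly 31 MB for each running second

cd_capacity_mb = 650
seconds_on_cd = cd_capacity_mb / mb_per_second
print(round(seconds_on_cd, 1))    # well under 25 seconds of raw video
```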
Compression is a process that reduces the size of files by removing redundant data. Significant reduction is accomplished if some of the less critical data is also removed, which results in degraded images and sound, or lossy compression. Once data is thrown out during lossy compression, it can never be restored. A codec is an algorithm that both compresses and decompresses a media file, and the same codec must be applied at each end of the process. When selecting a codec, a major consideration is whether the same version of the codec will already be installed on the target machine, or whether the viewer will be required to download and install a new codec to see the movie.
Video is generally compressed with both spatial (intraframe) and temporal (interframe) techniques to remove redundant data. These are very different processes.
Spatial Compression: This method removes redundant data within a given image or frame. It operates primarily on areas of flat color with very similar pixels. The codec specifies the coordinates of the area and the color without great detail. Spatial compression also occurs when the JPEG algorithm is applied to a still graphic. Removing fine details before compressing can improve the spatial compression of an image. Video noise may appear to be fine detail to the codec, and removing it will improve spatial compression. Shooting video with basic, still backgrounds leads to better compression.
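A toy run-length encoder, a minimal stand-in for spatial compression, shows why flat backgrounds compress better than noisy ones (real codecs are far more sophisticated than this):

```python
# Toy spatial compression: rows of flat color collapse into a few
# (value, count) runs, while noisy detail does not.

def rle_encode(pixels):
    runs = []
    for p in pixels:
        if runs and runs[-1][0] == p:
            runs[-1][1] += 1
        else:
            runs.append([p, 1])
    return runs

flat = [200] * 16                    # a flat background row
noisy = [200, 201, 200, 199] * 4     # the same row with video noise
print(len(rle_encode(flat)), len(rle_encode(noisy)))   # 1 run vs. 16 runs
```

The noise adds no visible detail, yet it defeats the encoder entirely, which is why noise reduction before compression pays off.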
Temporal Compression: In this method, the codec identifies only differences between consecutive frames and stores those differences rather than the entire image. The original reference frame from which the differences are calculated is called a keyframe. A keyframe in any video stream contains the complete image and may be used as an index point. Frames based on the differences between frames are delta frames. They define only the areas that are different from the previous frame and are smaller than keyframes. A new keyframe is placed at regular intervals to compensate for errors in delta frames.
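Keyframes and delta frames can be illustrated with a toy encoder; the frame representation and interval below are illustrative only:

```python
# Toy temporal compression: each delta frame stores only the pixels
# that changed since the previous frame; keyframes store the whole image.

def encode(frames, keyframe_every=3):
    stream, prev = [], None
    for i, frame in enumerate(frames):
        if i % keyframe_every == 0 or prev is None:
            stream.append(("key", list(frame)))      # complete image
        else:
            diffs = [(j, p) for j, p in enumerate(frame) if p != prev[j]]
            stream.append(("delta", diffs))          # changed pixels only
        prev = frame
    return stream

def decode(stream):
    frames, current = [], None
    for kind, data in stream:
        if kind == "key":
            current = list(data)
        else:
            current = list(current)
            for j, p in data:
                current[j] = p
        frames.append(current)
    return frames

frames = [[1, 1, 1, 1], [1, 1, 9, 1], [1, 1, 9, 9]]
stream = encode(frames)
print(stream[1])                 # a small delta frame
print(decode(stream) == frames)  # round-trips exactly
```

A steady camera means small delta frames; the periodic keyframes bound how far decoding errors can propagate and provide index points for seeking.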
Using a tripod when shooting reduces camera movement, providing a stable background that improves temporal compression. Avoiding complex transitions and frequent cuts that completely redraw the frame can contribute to smoother-looking compressed video.
A codec performs many mathematical calculations that generate each compressed frame, which may take several seconds. Later, the frame must be decompressed fast enough to play in real time at the established frame rate. An asymmetric codec takes longer to compress, but it decompresses without delay. Codecs used for live broadcasts and video teleconferencing must be symmetric, meaning they both compress and decompress in the same amount of time. The H.263 specification is a symmetric codec used for teleconferencing.
There are several ways to determine the codecs currently loaded on a machine. In Windows 95 or later, open the Control Panel and check under Multimedia/Devices/Video Compression Codecs. The list of video codecs there will include the ones that the operating system installed, as well as those that are installed along with a new version of Media Player (or any other media-viewing device). That list may include such items as Cinepak, ClearVideo, Duck TrueMotion, Indeo (versions 3.2 and 5.04), Microsoft H.263, Microsoft MPEG-4, mvicod32, RLE, VDOWave, Video 1, and the Vivo H.263 codecs. When a new version of QuickTime is installed, all of the recent versions of both video and audio codecs are automatically installed on the computer.
Audio codecs are listed in the same location and may include the following: Microsoft CCITT A-Law and u-Law, Fraunhofer MPEG Layer-3 (MP3), Indeo audio, Microsoft IMA ADPCM, TrueSpeech, Voxware, Windows Media Audio, and Microsoft PCM Converter.
Some codecs are designed to work with Windows Media, and others are designed to work with QuickTime. The RealMedia codecs are intended for media streamed from a RealServer, though RealMedia files can also be played back locally in the RealPlayer. For wide distribution, the Sorenson Video or the Cinepak codec is used in most cases for compressing QuickTime movies. Both play very well on any platform. For Windows delivery, the Windows Media codecs or the Indeo 5.04 codec are popular choices. Each codec performs best within a specific range of data rates.
The Sorenson Video codec produces high-quality video at any data rate. Although it places considerably high playback requirements on the client machine, Sorenson is a good choice when streaming at data rates of around 100 KBps or less. It is a preferred QuickTime codec for low-bandwidth delivery and is widely used to compress movies on the Internet. Conversely, Cinepak has low playback requirements and looks better at data rates of 250 KBps and higher. Cinepak is installed on a wide range of machines and is most commonly used for CD-ROM titles targeting a broad audience.
The recent Windows Media ASF compression algorithms are excellent for both video and audio, whether streaming at low bandwidth or delivering on CD-ROM. The Windows server can automatically scale the transfer to best fit the client bandwidth. Indeo 5.04 is a decent general-purpose codec that uses YUV color space.
MPEG-1 is a very efficient algorithm for creating highly compressed audio/video multiplexed files and was designed for playback at bit rates between 1 and 3 megabits per second from single-speed CD-ROMs. The native screen resolution of MPEG-1 is 352 x 240, which may be interpolated smoothly to double that size or resized to any resolution. The default frame rate is 30 frames per second. The Fraunhofer MPEG Audio Layer-3 codec is the de facto standard for MP3 audio-only files.
MPEG-2 is the higher-bit-rate successor to MPEG-1, designed for playback at bit rates between 6 and 15 megabits per second. This is faster than many older computer systems can display smoothly without hardware assistance. It is the standard compressed video format for DVD movies, along with the Dolby AC-3 audio compression format used in the U.S. In Europe, the MPEG-2 audio compression format is more commonly used than AC-3.
Several factors contribute to the apparent quality of a digital video. Among those that are easily improved with software are the contrast, black and white levels, hue and saturation, and undesirable video noise. One of the first steps to perform before compressing is to crop the video frame to eliminate any edge noise or black borders introduced by capturing scenes intended to be viewed in the underscan mode. Sophisticated software can be used to scale the image to your target resolution. The most powerful and widely used tool for processing and compressing digital video is Media Cleaner, or simply Cleaner, from Discreet (formerly Terran). Cleaner will allow all of the following operations to be performed with excellent results.
Most video segments can be improved for desktop delivery by increasing the contrast by 10 or 15 percent. Doing this appears to remove a thick residue from the video screen. It is also a good idea to restore black areas to true black, using the Black Restore feature. Perform a similar operation to restore white areas, improving image quality and compression. The video may be improved by carefully adjusting the brightness or gamma in small increments. After selecting settings for these adjustments, scrub through the video looking for scenes in which the changes introduced may be unnatural or too severe.
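These adjustments can be sketched on 8-bit grayscale values; the contrast factor and restore thresholds below are illustrative, not Cleaner's actual settings:

```python
# Sketch of a contrast stretch plus black/white restore on 8-bit values.

def adjust(pixels, contrast=1.15, black_below=16, white_above=235):
    out = []
    for p in pixels:
        v = (p - 128) * contrast + 128     # boost contrast ~15% around midgray
        if v <= black_below:
            v = 0                          # restore true black
        elif v >= white_above:
            v = 255                        # restore true white
        out.append(int(max(0, min(255, round(v)))))
    return out

row = [10, 60, 128, 200, 245]
print(adjust(row))   # near-black and near-white values snap to the extremes
```

As the text advises, any such settings should be previewed across the whole clip, since a stretch that helps one scene can look too severe in another.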
Much better spatial compression is achieved if the granular detail in an image is reduced. Video noise appears to the compressor as though it were fine details that should be retained. A blur filter can be applied to reduce the noise and improve compression, but the final result will be less crisp. Adaptive noise reduction is a much better solution, provided by Media Cleaner. The adaptive noise reduction filter blurs areas of low contrast, but leaves edges sharp, improving the compressed result. Any live video will benefit from this filter.
When choosing the settings for a compressed video file, the most important decisions to be made are which codec to use, the size of the video window (screen resolution), the frame rate, the data rate, and the frequency of keyframes. There are limiting factors and trade-offs involved with each choice, and they are usually based on the way in which the audience will access the video files.
There are two very different types of audiences for digital video. In one case, the developer is able to specify and configure the exact system on which the video will be viewed, such as in a kiosk or on a corporate intranet. In the other case, the general public is the audience, and the minimum system requirements for viewing the video will be rather low. Within these two categories, there is the option to deliver the video locally, from a CD-ROM or other media, or to deliver it over a network. The latter choice severely limits the data rate and frame rate in most cases and negatively impacts the quality of the video experience.
The choice of codec is in some cases dependent on the architecture selected. If QuickTime is used, a broad set of options exists. For movies that may need to stream over the Internet at less than 100 kilobytes per second, Sorenson Video will probably yield the best results. Quality is improved considerably if variable bit rate encoding is used. The current Windows Media video compression formats compete favorably with Sorenson.
When using the QuickTime architecture, there are several good options. For streaming complex audio at low bandwidth over the Internet, the QDesign Music codec is a good choice for content containing music. It is possible to select the data rate with the QDesign Music codec. Test the result by first allocating about 1 kilobit per kilohertz. At this rate, an audio file with vocals sampled at 22 kHz will stream at 22 kilobits per second (about 2.75 KBps). Instrumental music may only require half of a kilobit per kilohertz. A stereo soundtrack requires a somewhat higher data rate than mono. QDesign produces better results on audio tracks with levels reduced to 70 percent of full (-3 dB). This can be accomplished by normalizing the track to 70 percent.
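The rule of thumb above can be sketched as a small calculator. This is an illustrative sketch only; the function name, the 0.5 factor for instrumental material, and the stereo adjustment are assumptions drawn from the guidelines in the text, not part of any QDesign API.

```python
def qdesign_start_rate(sample_rate_khz, vocal=True, stereo=False):
    """Starting-point data rate (in kilobits per second) for the QDesign
    Music codec: about 1 kilobit per kilohertz for vocal material,
    roughly half that for instrumental music."""
    kbps = sample_rate_khz * (1.0 if vocal else 0.5)
    if stereo:
        kbps *= 1.5  # stereo needs somewhat more than mono (assumed factor)
    return kbps

print(qdesign_start_rate(22))                # vocals at 22 kHz -> 22.0 kbps
print(qdesign_start_rate(22, vocal=False))   # instrumental -> 11.0 kbps
```

Treat the result only as a first value to test; listen to the encoded output and adjust from there.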
As the data rate is increased, the quality improves, but QDesign requires considerable processing power. This can rob cycles from a video track and cause dropped frames if the audio data rate is set too high. As with all audio/video architectures, the audio track will be allocated the computing cycles needed to play smoothly at the expense of dropped video frames. This is a good argument for testing combined audio/video rates and compromising to get the best mix.
The Qualcomm PureVoice codec is designed to produce clearly intelligible speech at extremely low bit rates. This is an excellent choice for video of a talking head. It is not useful for music or complex tracks, because it tries to model everything as speech. It is typically used to compress 8 kHz audio, but higher sample rates yield better results.
IMA 4:1 compression is a good choice for CD-ROM audio since it is widely installed and leaves a lot of computing cycles for video. MPEG-1 is an excellent choice for the highest quality multiplexed audio/video delivered from CD-ROM or DVD and plays on most platforms. For Windows-specific delivery over the Internet, the latest Windows Media codecs for both video and audio yield excellent quality.
The data rate, or bit rate, is the most important factor in determining the quality of compressed video. It determines the file size and must be matched to the method of delivery that is specified for the video. The factors that contribute to the data rate requirements are the media (CD-ROM or DVD) or connection speed (56K, ISDN, or T1) that is used, the amount of storage space available, and the minimum speed of the target machine.
Here are some data rate guidelines. It will be necessary to test playback on the minimum target machine to determine the optimal rate.
The data transfer rate of modems is expressed in kilobits per second. A 56K modem usually processes data at around 40 Kbps, due to error correction and other factors. This translates into five kilobytes per second (KBps), after dividing by eight to convert bits to bytes. Video bit rates are typically expressed in kilobytes per second (KBps), not kilobits per second (Kbps).
A standard 74-minute CD-ROM holds about 650 megabytes (MB) of data. Components other than video, such as installers, read-me files, and a menu-driven program with graphics, might leave 600 MB for video. If a predetermined number of minutes of video must fit on the disc, divide the number of kilobytes of space on the disc by the length of the movie in seconds to determine the maximum bit rate for the movie. A 40-minute video lasts 2,400 seconds. A disc with 600,000 kilobytes of space divided by 2,400 allows a maximum bit rate of 250 kilobytes per second (KBps).
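The two conversions above can be worked through in a few lines. The function names are illustrative only; the arithmetic simply restates the bits-to-bytes division and the disc-budget calculation from the text.

```python
def kbit_to_kbyte(kbit_per_second):
    """Convert kilobits per second (Kbps) to kilobytes per second (KBps);
    there are 8 bits in a byte."""
    return kbit_per_second / 8

def max_video_rate_kbps(free_space_kb, minutes):
    """Maximum sustainable data rate in KBps: disc space left for video,
    in kilobytes, divided by the running time in seconds."""
    return free_space_kb / (minutes * 60)

print(kbit_to_kbyte(40))                 # 56K modem's ~40 Kbps -> 5.0 KBps
print(max_video_rate_kbps(600_000, 40))  # 600,000 KB, 40 minutes -> 250.0 KBps
```
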
Video compressed with the Sorenson Video codec requires a fast processor to decode video at high bit rates. Video compressed at 200 KBps with the Sorenson codec may require at least a 300 MHz Pentium II or a 250 MHz Macintosh G3 to play smoothly. Only testing the compressed video on the minimum target machine will prove whether it can decode without dropping frames.
The frame rate is the number of times per second the computer completely redraws the video window. A high frame rate requires considerable computing power and an extremely high-speed connection if the video is streamed over the Internet. Frame rates for desktop video typically range from 12 to 30 frames per second (fps). Motion appears relatively smooth at about 15 frames per second. For reference, film is shot at 24 fps, PAL video at 25 fps, and NTSC video at slightly under 30 fps (29.97). Choosing the frame rate that seems to suit the content is rather subjective. At a given bit rate, a low frame rate will produce sharper images but jerky motion. At the same bit rate, a high frame rate produces blurred images but smoother motion. There is always a trade-off in creating multimedia. The best way to select the frame rate is to test several for the best compromise between clarity and smoothness. A frame rate that is an even divisor of the source frame rate usually yields best results. For NTSC video, use 30, 15, 10, or 7.5 fps. For PAL video use 25 or 12.5 fps, and for film use 24 or 12 fps.
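The "even divisor" guideline above can be sketched as follows. The function and the cutoff of four divisors are illustrative assumptions; in practice you would still test each candidate rate for the best balance of clarity and smoothness.

```python
def candidate_rates(source_fps, max_divisor=4):
    """Playback frame rates that divide evenly into the source rate,
    per the guideline to use an even divisor of the source frame rate."""
    return [source_fps / d for d in range(1, max_divisor + 1)]

print(candidate_rates(30))  # NTSC -> [30.0, 15.0, 10.0, 7.5]
print(candidate_rates(25))  # PAL  -> [25.0, 12.5, ...]
print(candidate_rates(24))  # film -> [24.0, 12.0, ...]
```
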
A number of factors must be considered in making this decision, most importantly the nature of the content itself and the degree of detail that needs to be clearly displayed. A talking head that fills the window may be effective at a small window size, such as 240 x 180. However, a training video demonstrating the performance of a process that requires detail may need to be shown at 320 x 240 or larger. At a given bit rate, larger window sizes will be very pixelated, with poor image quality compared to smaller windows. Many video cards are able to "interpolate" a double-sized image from the original with much better results than are achieved by decoding an image that is twice as large. Avoid expanding the video window on playback to an arbitrary size that is not an exact multiple of the original.
Other factors to consider in determining the optimal window size are the data rate, the frame rate, the codec, and the target machine. Changing any one of these factors will impact the others radically. Use the smallest possible window to get the message across when streaming video over the Internet. The maximum size for a window delivered over a 56K modem is about 160 x 120, over ISDN about 192 x 144, and over a T1 line about 240 x 180. From a CD-ROM, 320 x 240 is most common, or 352 x 240 for MPEG. With a fast processor, 640 x 480 is easily decoded from CD-ROM. A DVD-ROM can play back at 640 x 480. Twice as much content can be stored at 320 x 240 and then doubled on playback with smooth interpolation.
Each keyframe completely defines the image in the frame, while delta frames are approximations based on the best guesses that the compression engine can make. Much more data is required to define each keyframe than the frames between them. It is important to have enough keyframes to support changes in the video scenes and maintain the integrity of images while avoiding unnecessary keyframes that merely add bulk to the size of the video.
The content of the video and the method of compression are factors in choosing the keyframe rate. Using the Cinepak codec, it is common to place a keyframe each second. For video with fast action and rapidly shifting backgrounds, it may be better to place one every half second. In an active clip running at 12 frames per second (fps), it may be best to place a keyframe every six frames. With Indeo 5.04 and Windows Media codecs, the need for keyframes is very closely related to the pace of the action. For a talking head, every few seconds is sufficient, but for fast action this is not enough. The Sorenson Video codec works best with relatively infrequent keyframes, typically every 10 seconds. A keyframe every 150 frames for video running at 15 fps is usually adequate.
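The relationship between keyframe interval and frame rate described above is a simple multiplication, sketched here. The function name is illustrative; the example values restate the guidelines in the text.

```python
def keyframe_spacing(fps, interval_seconds):
    """Number of frames between keyframes for a target interval in
    seconds at a given playback frame rate."""
    return round(fps * interval_seconds)

print(keyframe_spacing(15, 10))   # Sorenson, ~10 s at 15 fps -> every 150 frames
print(keyframe_spacing(12, 0.5))  # fast action, 0.5 s at 12 fps -> every 6 frames
```
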
If a user will randomly access points in the video, more frequent keyframes will allow more freedom. If a delta frame is accessed, it must be calculated from the nearest keyframe, which may take time to recover. It is best to place cues in the video track at keyframes you plan to make accessible to the user.
To begin, launch Cleaner and open a video clip from the File menu, or drag the source movie onto Media Cleaner's Process Window. A small icon of the movie appears in the Process Window. View the original movie by double-clicking on its icon in the Process Window. Click and drag over the movie to set the cropping rectangle, if the image needs to be cropped. The movie controller may be set to start and end points as desired. Return to the Process Window and open the Advanced Settings dialog box by double-clicking in the "Setting" column of the clip.
On the left column of the Settings Window, there are many presets defined for various delivery options, ranging from "QT-1X CD-ROM, Cinepak" to "RealVideo-Web Movie." A good way to learn about Cleaner is to select a preset similar to the project specifications and look closely at the Summary settings for Output, Tracks, Image, Adjust, Compress, and Audio.
The real power of Cleaner lies in the many ways the user can process a video. Under the "Adjust" tab, experiment with increasing the brightness and contrast of the clip. Windows display gamma is different from Mac OS gamma, and video looks much darker in Windows at the same gamma setting. Try setting the gamma in Cleaner 20 to 30 points lower when working in Windows. Also, vary the Hue and Saturation settings to see the results. Test variations in the Black restore and the White restore features to achieve more consistent levels.
Open the "Compress" tab and select a codec, the frame rate, keyframe frequency, and the bit rate for the video. In QuickTime, there is the option of Variable Bit Rate (VBR) encoding, which greatly improves the final product. The movie may also be constrained to a specified data rate. Next, open the "Image" tab and resize the image as needed, apply Blur or Sharpen filters, and use the Flat Field Adaptive Noise Reduction filter for good results in most situations. Under the "Audio" tab, choose a codec, the sample rate, the bit depth, and the data rate. Mono is generally chosen over stereo since it consumes half the space and requires half the bandwidth. There is also a control for the volume level, where the Normalize option is found. Among the other audio processing options are High and Low Filters, Noise Removal, and Dynamic Range adjustments. It is best not to be too aggressive with any of these processes, which can significantly alter the sound quality.
It is often best to preview the results of processing on the clip before committing to a full compression run, which can save considerable time. Click on the "Start" button to apply any of the operations chosen in the settings. Cleaner will prompt for a destination for the final files before processing. While processing or compression is being applied to the clip, the results can be previewed using the Before/After slider in the output window. After a few seconds, the process can be cancelled and changes made to the settings as needed. Remember that QuickTime movies will need to be "flattened" to play in the Windows environment if they are created in the Mac OS. To flatten a movie is to remove the Macintosh headers from the data stream and apply the .mov extension.
Figure 2: Screen from Media Cleaner Pro
The two primary options for distribution are removable media and network delivery. The majority of compressed video is designed for delivery from removable media, such as CD-ROM or DVD, rather than over a network. This is because the data that defines video and audio content needs to flow uninterrupted in a fast, reliable way to the computer's processor, and then to the video and sound cards. The Internet is a packet-driven network. HTTP was never intended to stream data continuously, but rather to move small, discrete packets of data from place to place. The route through which data is passed and the transfer rate are both variables. With faster connections and the Real Time Streaming Protocol (RTSP), streaming media over the Internet is possible, but it still is not as reliable as delivery from local media. When using an RTSP server, replace the http:// with rtsp:// in the URL. A web-CD offers the best of both worlds. It is a hybrid CD-ROM that contains video files for delivery along with HTML-based programs for use while connected to the Internet.
Project specifications often require a context for video files. This may include background information and a navigation system that allows users to interact with the content, to select clips, and to control playback. The most widely used tool for authoring an interactive title is Macromedia Director. QuickTime integrates especially well with Director. Movies can be linked to interactive menus, and the program provides control over text, graphics, animation, and audio media types. It is wise to test the playback of compressed video running within the Director environment to ensure that the bit rate is not too high. This can cause dropped frames, as Director manages other events happening along with the video. It is best to minimize other activity while a video is playing. It is advisable to place the video window on the stage of Director so that the top left coordinate is a multiple of four on both the X and the Y axis. Place nothing on top of the video window, and set the property of the movie to play "direct to stage." Place all of the compressed video files in a directory at the root level of the CD-ROM, and link them as cast members to the Director "projector," which is a stand-alone executable application created by Director for distribution. Another option is to create a Shockwave version of the Director movie; however, embedded links to video files on a server should be carefully tested on various playback platforms.
After testing the compressed video linked to an interactive presentation program, the next step is to record a CD-ROM for mastering or distribution. CD-Recordable (CD-R) drives, or burners, are commonplace and easy to use, and the media is inexpensive. Unless there are extreme time constraints, it is safest to record the master at a slower speed (2X or 4X). This has no effect on the playback speed, but it lets the recording process work at a more reliable pace and avoids buffer overrun. The standard length of a CD-R is 74 minutes, which holds 650 megabytes (MB) of data. The number 74 refers to the number of minutes of stereo Red Book audio (44.1 kHz, 16-bit) that the disc can contain. The recommended software packages for burning CD-Rs are Easy CD Creator from Roxio for Windows and Toast from Astarte for the Mac OS.
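The 650 MB figure follows from the CD format's sector layout, which can be verified with a short calculation. This sketch assumes standard Mode 1 data sectors (2048 bytes of user data each) at the Red Book rate of 75 sectors per second of playing time.

```python
SECTOR_BYTES = 2048        # user data bytes per Mode 1 sector
SECTORS_PER_SECOND = 75    # Red Book sectors per second of playing time

def data_capacity_mb(minutes):
    """Approximate data capacity of a disc, in megabytes, from its
    Red Book playing time in minutes."""
    total_bytes = minutes * 60 * SECTORS_PER_SECOND * SECTOR_BYTES
    return total_bytes / (1024 * 1024)

print(round(data_capacity_mb(74)))  # 74-minute disc -> ~650 MB
```
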
This involves building HTML pages that contain pointers to the video files on the server and placing the files on the server using file transfer protocol (FTP). One very easy way to do both is with the Macromedia application Dreamweaver, which has built-in FTP capability. It can also synchronize new files on a hard drive with a web site. Those who use the Windows environment may also perform these functions with Microsoft FrontPage. It is important to plan and design the information architecture of the site thoroughly before creating any graphics or HTML documents.
The <EMBED> tag is used to link a compressed video file to an HTML document. Most editors, such as Dreamweaver and FrontPage, automate this process. If Media Cleaner is used to compress the video, it creates the <EMBED> tags ready to paste into a web page. An audio file can also be embedded to provide a soundtrack for a web page.
QuickTime allows two options: displaying the movie within the browser or calling up the player. It is necessary to use the Pro version of QuickTime, available from Apple, to create a movie link. To link to an image, open the image (JPEG, GIF, PNG, TIFF, or BMP) in QuickTime, and select Save As from the File menu. Choose Make Movie Self-Contained, and close the dialog box. This static movie has the same dimensions as the image and can be used as a "poster" on the web page to play the movie in place within the browser. When the movie plays, playback controls will appear, making it necessary to add 16 pixels to the height attribute of the movie. An embed tag for a poster movie might appear as follows:
<embed src="Qtvideo1.mov" height="336" width="240"
controller="false" href="http or rtsp content"
type="video/quicktime" target="myself"> </embed>
The target becomes "quicktimeplayer" if you choose to launch the player. The height and width attributes can be any size, since the player will pop up over the browser window.
Download the MakeRefMovie tool from the Apple site to create a reference movie. The reference movie provides a single link to streaming movies encoded at different bit rates. Another useful tool is the Plug-in Helper from Apple. If a movie is exported from the Plug-in Helper with the "Disallow Saving" box checked, viewers cannot copy it. Another useful tool is LiveStage Pro from Totally Hip Software. It can be used to create wired sprites for intermovie communication.
True streaming differs from downloading a file and then playing it. Progressive download is still not considered streaming, although a video or audio file can begin playing before downloading is complete. For streaming video, a special server and protocol is required. The RealMedia server was designed for this purpose, along with the RealMedia Player, recently dubbed the "G2" player. Windows Media can be streamed from a properly configured Windows server, and it is capable of sensing the connection rate and "scaling" the data transfer rate of the video to match the client. A Macintosh server can stream QuickTime, and there are several options that can be controlled in this flexible environment. QuickTime movies can be "wired" for interactivity. Wired movies can include custom controllers, hotspots to jump around in the movie, or links to another URL.
Another popular method of distributing video is through free-standing kiosks, usually with a touch-screen interface. The major advantage in this type of delivery is total control over all the technology used for playback. Hours of full-screen, high-quality movies can be stored on a hard drive in a secure enclosure. Kiosks are popular at museums and nature centers, and they are used to provide point-of-sale information and product demonstration in the marketplace.
With the proliferation of DV camcorders and the relative ease with which video content can be imported into the computing environment, the amount of video that shows up on computers will increase exponentially. And, as broadband Internet service is available to rapidly growing numbers of subscribers, many more individuals will be able to receive videos on their home computers. Before long, compressed video may be as commonly attached to email messages as scanned photos have become. It will be very interesting to see how the Internet community addresses the incredible demand for bandwidth and storage capacity that digital video introduces. Compressed video will continue to have a significant impact on education and training, as well as on business communications. This, in turn, will influence how we humans think about the art of communication itself.