High Resolution Video on the Web
With the advancement of broadband Internet services, high-quality video on the web has become possible. Watching movie trailers over the net is now a common activity among movie lovers, and many webmasters have begun to enrich their web sites with video presentations. To support e-Learning activities, the Computing Services Centre (CSC) has recently upgraded both the video encoding and the streaming servers of its video streaming service, so that video teaching materials can be accessed anywhere by taking advantage of the increased bandwidth of Internet access. The following is a brief account of the technologies involved in this upgrade.
Roughly speaking, there are two common methods of publishing video materials over the net. The first is to store video clips as data files on web sites. This requires users to download a video file completely to their local disk (which may take several minutes, depending on the file size and the bandwidth) before the video can be played on their client PC. Users dislike the wait, although the approach is simple and inexpensive for web developers. The second method is to store video clips as streaming files on a dedicated video server. This allows videos to be accessed in real time, which means users can watch any part of a video without waiting for a long download. This is usually called Video-on-Demand (VoD). The University has offered such a service, called CityVoD, since 1999.
When CityVoD was first released, videos were captured at a resolution of 352x288 (known as CIF, which is also the resolution of VCD). Although this was the best resolution available for VoD services at the time, it is not good enough for some of our video materials, such as recordings of lectures and forums. Lectures and forums often include PowerPoint slide shows and demonstrations, and CIF resolution cannot show all of their details. To show the content clearly, we would need to roughly quadruple the pixel count by moving to 720x576 (known as full-D1 or SD, which is also the resolution of DVD-Video).
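As a quick check of the figures above, the following minimal sketch compares the pixel counts of the two resolutions. The constant and function names are illustrative only.

```python
# Pixel counts for the two resolutions named in the text.
CIF = (352, 288)       # width, height
FULL_D1 = (720, 576)   # width, height


def pixel_count(resolution):
    """Return the number of pixels in a (width, height) resolution."""
    width, height = resolution
    return width * height


cif_pixels = pixel_count(CIF)
d1_pixels = pixel_count(FULL_D1)
ratio = d1_pixels / cif_pixels

print(f"CIF: {cif_pixels} pixels, full-D1: {d1_pixels} pixels, ratio: {ratio:.2f}")
```

Full-D1 carries a little more than four times as many pixels as CIF, which is why it needs considerably more processing power, storage and bandwidth.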
When we launched CityVoD, it was nearly impossible to adopt such a resolution. However, as technology advanced, we foresaw that streaming full-D1 video over the net would become possible in the near future. We therefore kept an eye on video processing and encoding technology over the past few years, with a view to adopting full-D1 resolution in our video services. Once powerful servers with large, fast storage became available, two challenges remained. The first was how to convert interlaced videos, which are stored on VHS tapes or DVD-Video discs, into non-interlaced video clips. This is called de-interlacing (or line-doubling). The second was how to reduce the bandwidth of the encoded files so that our campus network would not be overloaded when many such videos are streamed over it. Let us look at each in turn.
Video cameras were originally designed for interlaced displays such as CRT TVs, which display images by drawing lines (or scanning lines) from top to bottom in alternating interlaced fields (see figure 1). Each field contains half of the scanning lines of the display. The first field consists of the odd lines (1, 3, 5, ...), the second field consists of the even lines (2, 4, 6, ...), and so on. In the PAL video standard, there are 50 such fields per second, and each field consists of 288 lines, which gives a total of 576 lines on the display.
Figure 1: Alternating interlaced fields of CRT TV
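The line layout of the two fields can be sketched as follows. This is an illustration only: a frame is modelled as a plain list of labelled scan lines, and the function name split_fields is hypothetical. In real camera footage the two fields are captured at different instants; here a single still frame is split purely to show which lines go where.

```python
def split_fields(frame):
    """Split a frame (a list of scan lines) into its two interlaced fields.

    Lines are numbered from 1, so the first field holds lines 1, 3, 5, ...
    and the second field holds lines 2, 4, 6, ...
    """
    first_field = frame[0::2]   # odd-numbered lines (list indices 0, 2, 4, ...)
    second_field = frame[1::2]  # even-numbered lines (list indices 1, 3, 5, ...)
    return first_field, second_field


# A PAL-sized frame: 576 scan lines, each labelled with its line number.
PAL_LINES = 576
frame = [f"line {n}" for n in range(1, PAL_LINES + 1)]

first_field, second_field = split_fields(frame)
print(len(first_field), len(second_field))
print(first_field[:3], second_field[:3])
```

Each field ends up with 288 of the 576 lines, matching the PAL figures given above.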
Traditionally, people say PAL video has 25 frames (complete images with all scanning lines) per second, and each frame is comprised of two interlaced fields (see figure 2a). This is INCORRECT. To be exact, PAL video has 50 unique fields per second, and each field is an independent snapshot in time (see figure 2b).
Figure 2a: Traditional thinking of interlaced fields of a moving circle in video-based materials
Figure 2b: Interlaced fields of a moving circle in video-based materials
A PC, however, uses a non-interlaced (progressive) display, which shows all scanning lines (without alternating fields) in each time interval. Therefore, to play interlaced video on a PC (or any progressive display), we have to convert the interlaced fields into non-interlaced frames (complete images) so that they can be displayed progressively (see figure 3). This process is called de-interlacing*.
De-interlacing video-based materials is not an easy job†. Many de-interlacers on the market cannot do the job well. Some of them simply scale each field into a frame by interpolating the "missing" lines from the lines above and below. This results in images that look very soft, because half of the vertical resolution is lost. Others merge two consecutive fields directly to form a frame. Although this maintains the image resolution, a moving object looks as if spiky lines (like the tines of a comb) were sticking out from its sides, because the two fields were captured at different moments. This is usually called combing or feathering (see figure 4).
† De-interlacing movies (film-based materials), on the other hand, is quite easy, since movies themselves are series of progressive images. Only simple 3:2 Pulldown detection is needed to de-interlace them, which is not covered in detail here.
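The two simple approaches described above can be sketched on a toy image. This is a minimal illustration under stated assumptions, not the solution adopted by the CSC: images are small grids of 0/1 values, the moving object is a vertical bar, and the function names make_snapshot, weave and interpolate are hypothetical.

```python
def make_snapshot(height, width, bar_start, bar_width=3):
    """Return a height x width image (rows of 0/1) containing a vertical bar."""
    return [[1 if bar_start <= x < bar_start + bar_width else 0
             for x in range(width)]
            for _ in range(height)]


def weave(first_field, second_field):
    """Merge two consecutive fields line by line into one frame."""
    frame = []
    for odd_line, even_line in zip(first_field, second_field):
        frame.append(odd_line)
        frame.append(even_line)
    return frame


def interpolate(field):
    """Scale one field to a full frame by averaging the lines above and below."""
    frame = []
    for i, line in enumerate(field):
        frame.append(line)
        # The last line has nothing below it, so it is simply repeated.
        below = field[i + 1] if i + 1 < len(field) else line
        frame.append([(a + b) / 2 for a, b in zip(line, below)])
    return frame


HEIGHT, WIDTH = 6, 10
snapshot_t0 = make_snapshot(HEIGHT, WIDTH, bar_start=2)  # bar at the first field time
snapshot_t1 = make_snapshot(HEIGHT, WIDTH, bar_start=4)  # bar has moved by the next field

first_field = snapshot_t0[0::2]   # lines 1, 3, 5 sampled at t0
second_field = snapshot_t1[1::2]  # lines 2, 4, 6 sampled at t1

woven = weave(first_field, second_field)
interpolated = interpolate(first_field)

# Print the woven frame; alternate rows show the bar in different positions.
for row in woven:
    print("".join("#" if value else "." for value in row))
```

In the woven frame the bar jumps sideways on every other line, which is the combing effect. The interpolated frame shows the bar in one place only, but half of its lines are synthesized rather than captured, which is why fine detail looks soft in real footage.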
Many well-known, effective de-interlacing techniques, such as Directional Correlational Deinterlacing (DCDi™) by Faroudja, are unfortunately available only in hardware chipsets and not in PC software.
After sourcing and studying both commercial and open-source software solutions for some time, we finally came up with a reasonably good solution by enhancing and integrating several products.
As you may know, the bandwidth of DVD-Video is about 5 to 6 Mb/s on average. Streaming many such videos at once could congest our campus network. To tackle this problem, we tested the most common encoding methods, such as Windows Media, RealMedia and MPEG-4, looking for an efficient one that could encode video at a much lower bandwidth without sacrificing quality. Thanks to the release of RealMedia version 10, a good-quality (well developed) video with full-D1 resolution can be encoded at a bandwidth as low as 1 Mb/s. If the source video is of low quality, a bandwidth of 2 Mb/s is needed.
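To give a feel for what these bit rates mean for a shared network, the sketch below estimates how many simultaneous streams fit into a link. The 100 Mb/s link capacity is a hypothetical figure chosen for illustration, not a description of the campus network, and protocol overhead is ignored; 5.5 Mb/s is taken as the midpoint of the DVD-Video range quoted above.

```python
def max_concurrent_streams(link_capacity_mbps, stream_bitrate_mbps):
    """Return how many whole streams of a given bit rate fit into a link."""
    return int(link_capacity_mbps // stream_bitrate_mbps)


LINK_CAPACITY_MBPS = 100.0  # hypothetical link capacity, for illustration only

bitrates_mbps = {
    "DVD-Video (about 5.5 Mb/s)": 5.5,
    "full-D1, low-quality source (2 Mb/s)": 2.0,
    "full-D1, good-quality source (1 Mb/s)": 1.0,
}

streams = {name: max_concurrent_streams(LINK_CAPACITY_MBPS, rate)
           for name, rate in bitrates_mbps.items()}

for name, count in streams.items():
    print(f"{name}: {count} concurrent streams")
```

Under these assumptions, encoding at 1 Mb/s lets the same link carry more than five times as many viewers as DVD-rate video.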
Low-quality video needs a higher bandwidth mainly because of the noise (technically, it is called mosquito noise) contained in the video frames. Most encoded video formats do not store every frame individually, but store the differences between frames. If a video contains a certain degree of noise, the differences between frames are large and require more bandwidth to encode. Many of our recordings were captured indoors (for example in classrooms and lecture theatres) without special lighting, which inevitably introduces some noise into the recorded videos. To keep the required bandwidth to a minimum, we tried applying de-noise filters to such videos. Unfortunately, the simple de-noise filters available in common video encoders may also remove details along with the noise. Thanks to the many video enthusiasts all over the world, a lot of open-source de-noise filters are available to tackle different kinds of noise. In the end, a filter named "high quality three-dimensional de-noise filter" was found to work very well on most of our video materials. With the help of this de-noise filter, we can keep our video materials to a bandwidth of about 1 Mb/s.
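The link between noise and frame differences can be illustrated with a toy experiment. This is a sketch under stated assumptions: a frame is a flat list of pixel values, the scene is completely static, the noise is uniform random, and temporal_denoise is a simple recursive blend written for this example. It is not the filter named above, which is presumably more elaborate.

```python
import random


def frame_difference(frame_a, frame_b):
    """Sum of absolute pixel differences between two frames."""
    return sum(abs(a - b) for a, b in zip(frame_a, frame_b))


def total_difference(frames):
    """Sum of the differences between each pair of consecutive frames."""
    return sum(frame_difference(a, b) for a, b in zip(frames, frames[1:]))


def add_noise(frame, rng, amplitude):
    """Add uniform random noise in [-amplitude, amplitude] to every pixel."""
    return [pixel + rng.uniform(-amplitude, amplitude) for pixel in frame]


def temporal_denoise(frames, strength=0.8):
    """Blend each frame with the previous filtered frame (recursive low-pass)."""
    output = [frames[0]]
    for frame in frames[1:]:
        previous = output[-1]
        output.append([strength * p + (1 - strength) * c
                       for p, c in zip(previous, frame)])
    return output


rng = random.Random(0)              # fixed seed so the run is repeatable
static_scene = [128.0] * 1000       # a flat grey frame of 1000 pixels
clean = [static_scene[:] for _ in range(10)]
noisy = [add_noise(frame, rng, 10.0) for frame in clean]
denoised = temporal_denoise(noisy)

print("clean:   ", total_difference(clean))
print("noisy:   ", round(total_difference(noisy)))
print("denoised:", round(total_difference(denoised)))
```

A static scene with no noise has nothing to encode between frames, while the noisy version changes everywhere. Filtering shrinks those changes again. A filter this crude would smear moving objects in real footage, which is the loss of detail that simple de-noise filters tend to cause.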
Video streaming with full-D1 resolution is now available in our video services. To experience what this new standard can bring you, simply visit our CityVoD web pages (select "CityVoD" in the box of "News Services" in the e-Portal) and click on the "high resolution version" icon to the right of your desired video. Please note that this version is available on on-campus machines only. For smooth playback, it also requires a Pentium III 1GHz processor with at least 256MB of system memory and RealPlayer 10 installed.