With the advancement of broadband Internet services, high-quality
video on the web has become possible. Watching movie trailers
over the net is now a common activity among movie lovers, and
many webmasters have begun to enrich their web sites by incorporating
video presentations. To support e-Learning activities,
the Computing Services Centre (CSC) has recently upgraded
both the video encoding and the streaming servers of its video
streaming service, so that video teaching materials can
be accessed anywhere by taking advantage of the increased
bandwidth of Internet access. The following is a brief account
of the technologies involved in this upgrade.
Roughly speaking, there are two common methods of publishing
video materials over the net. The first is to store video
clips as data files on web sites. This requires users to download
a video file completely to their local disk (which may take
several minutes, depending on the file size and the bandwidth)
before the video can be played on their PC. Users do not
welcome the wait, though it is a simple and
inexpensive solution for web developers. The second method
is to store video clips as streaming files on a dedicated video
server. This allows videos to be accessed in real time,
which means users can watch any part of a video without
waiting for a long download. This is usually
called Video-on-Demand (VoD). It was introduced to the University
in 1999 under the name CityVoD.
When CityVoD was first released, videos were
captured at a resolution of 352x288 (called CIF, which
is also the resolution of VCD). Although this was the
best resolution available for VoD service at the time, it is not good
enough for some of our video materials, such as recordings
of lectures and forums. Since lectures and forums often include
PowerPoint slide shows and demonstrations, CIF
resolution cannot show all of their details. In order
to show the content clearly, we would need roughly four times as
many pixels, at a resolution of 720x576 (called full-D1
or SD, which is also the resolution of DVD-Video).
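The "roughly four times" figure can be checked with a little arithmetic. The short Python sketch below is only an illustration; the names CIF, FULL_D1 and pixels are made up for this example and do not come from any CityVoD tool.

```python
# Pixel counts for the two capture resolutions mentioned above.
CIF = (352, 288)       # VCD resolution
FULL_D1 = (720, 576)   # DVD-Video resolution

def pixels(resolution):
    """Return the number of pixels in a (width, height) picture."""
    width, height = resolution
    return width * height

ratio = pixels(FULL_D1) / pixels(CIF)
print(pixels(CIF), pixels(FULL_D1), round(ratio, 2))
# prints: 101376 414720 4.09
```

Both the width and the height roughly double, which is why the pixel count (and the raw data to be encoded) grows about fourfold.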
When we launched CityVoD, it was nearly impossible
to adopt such a resolution. However, as technology advanced,
we foresaw that streaming full-D1 video over the net would
become possible in the near future. Therefore, we kept an
eye on video processing and encoding technology over
the past few years in order to adopt full-D1 resolution in
our video services. Once powerful servers with large, fast
storage became available, two challenges remained.
The first was how to convert interlaced
videos, which are stored on VHS tapes or DVD-Video
discs, into non-interlaced video clips. This is called de-interlacing
(or line-doubling). The second was to reduce the bandwidth
of encoded files so that our campus network would not be overloaded
when many such videos are streamed over it. Let us look at
each in turn.
De-interlacing
Video cameras were originally designed for interlaced displays
such as CRT TVs, which display images by drawing lines (or
scanning lines) from top to bottom in alternating interlaced
fields (see figure 1). Each field contains half of the scanning
lines of the display. The first field consists of the odd lines
(1, 3, 5, ...), the second field consists of the even lines (2, 4,
6, ...), and the fields continue to alternate in this way. In the
PAL video standard, there
are 50 such fields per second, and each field consists of
288 lines, which gives a total of 576 lines on the display.
Figure 1: Alternating interlaced fields of CRT TV
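As a rough illustration of the line layout described above, the following Python sketch splits a list of 576 numbered scan lines into an odd field and an even field. It models the numbering only; the names PAL_LINES, FIELDS_PER_SECOND and split_fields are hypothetical, and in real video the two fields are captured at different moments rather than cut from one picture.

```python
PAL_LINES = 576          # visible scanning lines in a full PAL picture
FIELDS_PER_SECOND = 50   # PAL field rate

def split_fields(frame):
    """Split a list of scan lines into (odd_field, even_field).

    Lines are numbered from 1 as in the text, so the odd field holds
    lines 1, 3, 5, ... (list indices 0, 2, 4, ...).
    """
    odd_field = frame[0::2]
    even_field = frame[1::2]
    return odd_field, even_field

frame = [f"line {n}" for n in range(1, PAL_LINES + 1)]
odd_field, even_field = split_fields(frame)

print(len(odd_field), len(even_field))   # prints: 288 288
print(odd_field[:3])                     # prints: ['line 1', 'line 3', 'line 5']
print(even_field[:3])                    # prints: ['line 2', 'line 4', 'line 6']
print(1 / FIELDS_PER_SECOND)             # seconds between fields, prints: 0.02
```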
Traditionally, people say PAL video has 25 frames (complete
images with all scanning lines) per second, and that each frame
is made up of two interlaced fields (see figure 2a). For
video-based materials, this is INCORRECT. To be exact, PAL video
has 50 unique fields
per second, and each field is an independent snapshot in time
(see figure 2b).
Figure 2a: Traditional thinking of interlaced fields of a moving
circle in video-based materials
Figure 2b: Interlaced fields of a moving circle in video-based materials
A PC, however, uses a non-interlaced (progressive) display, which
shows all scanning lines (without alternating fields) in each
time interval. Therefore, to play interlaced video on a PC (or
any progressive display), we have to convert the interlaced fields
into non-interlaced frames (complete images) in order to display
them progressively (see figure 3). This process is called
de-interlacing*.
Figure 3
De-interlacing video-based materials is not an easy job.
Many de-interlacers on the market cannot do the job well.
Some of them simply scale each field into a frame by interpolating
the "missing" lines from the lines above and below.
This results in images that look very soft, because vertical
resolution is lost. Others merge two consecutive fields directly to
form a frame. Although this maintains the image resolution,
a moving object then appears to have
spiky lines (like the tines of a comb) sticking out from its
sides. This is usually called combing or feathering
(see figure 4).
Figure 4
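The two simple approaches described above can be sketched in a few lines of Python. This is a toy illustration only, not the solution adopted by the CSC: the function names weave, bob and make_field are invented for this example, and the "video" is a tiny hypothetical picture of a white bar that moves two pixels to the right between fields.

```python
def weave(odd_field, even_field):
    """Merge two consecutive fields line by line into one frame."""
    frame = []
    for odd_line, even_line in zip(odd_field, even_field):
        frame.append(list(odd_line))
        frame.append(list(even_line))
    return frame

def bob(field):
    """Scale one field to full height by interpolating the missing lines.

    Each missing line is the average of the field lines above and below
    it. The last missing line has no line below, so the line above is
    repeated.
    """
    frame = []
    for i, line in enumerate(field):
        frame.append(list(line))
        below = field[i + 1] if i + 1 < len(field) else line
        frame.append([(a + b) / 2 for a, b in zip(line, below)])
    return frame

def make_field(bar_x, width=8, height=3):
    """Hypothetical test field: a 2-pixel-wide white bar on black."""
    return [[255 if bar_x <= x < bar_x + 2 else 0 for x in range(width)]
            for _ in range(height)]

odd_field = make_field(bar_x=2)    # snapshot at time t
even_field = make_field(bar_x=4)   # snapshot 1/50 s later; the bar has moved

woven = weave(odd_field, even_field)
bobbed = bob(odd_field)

for row in woven:
    print("".join("#" if value else "." for value in row))
```

In the woven frame the rows alternate between "..##...." and "....##..", which is the comb-like edge described above. The bobbed frame has no combing, but all six of its lines are derived from only three real lines, which is why such output looks soft.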
* De-interlacing is not needed when encoding videos to
CIF resolution, since the resolution is reduced by half and each
interlaced field can simply be squeezed into a progressive frame.
De-interlacing movies (film-based materials),
on the other hand, is quite easy, since movies themselves are
series of progressive images. Only simple 3:2 pulldown detection
is needed to de-interlace them; the details are not covered
here.
Many well-known, good de-interlacing techniques, such as Directional
Correlational Deinterlacing (DCDi) by Faroudja, are
unfortunately available only in hardware chipsets and not
in PC software.
After sourcing and studying both commercial and open-source
software solutions for some time, we finally came up with
a reasonably good solution by enhancing and integrating several
products.
Bandwidth
As you may know, the bandwidth of DVD-Video is about 5 to
6 Mb/s on average. Streaming many such videos at once could
congest our campus network. To tackle this problem, we tested
the most common encoding methods, e.g. Windows Media, RealMedia,
MPEG-4, etc., to find an efficient one with which video
can be encoded at a much lower bandwidth without sacrificing
video quality. Thanks to the release of RealMedia version
10, a good-quality (well-produced) video at full-D1 resolution
can be encoded at a bandwidth as low as 1 Mb/s. If the source video
is of low quality, a bandwidth of 2 Mb/s is needed.
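To get a feel for what these figures mean for a network, the Python sketch below counts how many simultaneous streams fit on a link. The 100 Mb/s link capacity is purely an assumption for illustration (the article does not state the capacity of the campus network), protocol overhead is ignored, and max_streams is a made-up helper name.

```python
def max_streams(link_mbps, stream_mbps):
    """Number of simultaneous streams a link can carry, ignoring overhead."""
    return int(link_mbps // stream_mbps)

LINK_MBPS = 100.0   # assumption: a hypothetical 100 Mb/s network segment

rates = [
    ("DVD-Video, about 5.5 Mb/s", 5.5),
    ("RealMedia 10, low-quality source, 2 Mb/s", 2.0),
    ("RealMedia 10, good-quality source, 1 Mb/s", 1.0),
]
for label, rate in rates:
    print(f"{label}: {max_streams(LINK_MBPS, rate)} streams")
# prints 18, 50 and 100 streams respectively
```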
Higher bandwidth is needed for low-quality video mainly because
of the noise (technically, it is called mosquito noise)
contained in the video frames. Most encoded video formats
do not store every frame individually, but store the differences
between frames. If a video contains a certain degree of noise,
the differences between frames will be large and will require more
bandwidth to encode. Since many of our recordings were
captured indoors (in classrooms and lecture theatres, for example)
without special lighting, a certain
degree of noise in the recorded videos is inevitable. To keep the
required bandwidth to a minimum, we tried applying de-noise filters
to this kind of video. Unfortunately, the simple de-noise filters
available in common video encoders may also remove detail
along with the noise. Thanks to video enthusiasts all over
the world, many open-source de-noise filters are available
to tackle different kinds of noise. Eventually, a filter named
"high quality three-dimensional de-noise filter"
was found to work very well on most of our video materials.
With the help of this de-noise filter, we can keep all
our video materials at a bandwidth of about 1 Mb/s.
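The link between noise and frame differences can be demonstrated with a small Python sketch. It is a simplified model under stated assumptions: the scene is a static, flat grey picture, the noise is uniform random noise, and the "filter" is a plain average over several frames. It is not the "high quality three-dimensional de-noise filter" named above, and a plain average like this would blur real motion; all function names are hypothetical.

```python
import random

def frame_difference(frame_a, frame_b):
    """Sum of absolute pixel differences between two frames."""
    return sum(abs(a - b) for a, b in zip(frame_a, frame_b))

def add_noise(frame, amplitude, rng):
    """Return a copy of the frame with uniform random noise added."""
    return [p + rng.uniform(-amplitude, amplitude) for p in frame]

def temporal_average(frames):
    """Very simple temporal de-noise: average each pixel across frames."""
    count = len(frames)
    return [sum(pixels) / count for pixels in zip(*frames)]

rng = random.Random(0)    # fixed seed so the run is repeatable
scene = [128.0] * 1000    # assumption: a static, flat grey scene

# Without noise, two frames of a static scene are identical.
clean_diff = frame_difference(scene, scene)

# With noise, every pixel changes from frame to frame.
noisy = [add_noise(scene, 10.0, rng) for _ in range(8)]
noisy_diff = frame_difference(noisy[0], noisy[1])

# Average frames 0-3 and frames 4-7, then compare the two results.
smoothed_diff = frame_difference(temporal_average(noisy[:4]),
                                 temporal_average(noisy[4:]))

print(clean_diff, noisy_diff > smoothed_diff > clean_diff)
```

The noisy frames differ substantially even though nothing in the scene moves, and smoothing over time shrinks that difference. A smaller difference leaves the encoder with less to store, which is the reason de-noising lowers the bandwidth required.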
Reminders
Video streaming at full-D1 resolution is now available in
our video services. To experience what this new standard can
offer, simply visit our CityVoD web pages (select "CityVoD"
in the "News Services" box in the e-Portal) and
click on the "high resolution version" icon to the
right of your desired video. Please note that this version
of the video is available on on-campus machines only. For smooth
playback, it also requires a Pentium III 1GHz processor with
at least 256MB of system memory, with RealPlayer 10 installed.