
This application report
describes a parallel video restoration system to restore old motion
picture archives. A Gaussian Weighted, Bi-directional 3D Auto-Regressive
(B3D-AR) algorithm is used to alleviate the presence of noise
in the old archives. Common forms of degradation found in such
archives are "dirt and sparkle" and scratches. The distortion
is caused either by the accumulation of dirt or by the film material
being abraded.
While most of the existing
image restoration algorithms will blur edges of moving objects
in the vicinity of occluded and uncovered image regions, this
algorithm is able to suppress mixed noise processes and recover
lost signals in both the covered and uncovered regions in an image
sequence. This video restoration system is tested on artificially
corrupted image sequences and on naturally degraded video (full PAL
image size). Samples of the original and corresponding restored
image sequences are contained in this report.
The B3D-AR algorithm is
implemented in parallel on an array of 15 Texas Instruments TMS320C40
processors connected in a tree configuration. Two different parallel
algorithms are implemented, and close to linear speed-up
is achieved by means of a load-balanced parallel algorithm.
While many old movies are
recorded on flammable nitrate-based negatives that decay rapidly,
modern movies are made with safer acetate-based 35mm films. However,
both types of media are susceptible to degradation such as gouges,
scratches and the accumulation of dirt. The result is a variety
of artefacts that make the old movies look their age.
The deterioration in old
movies can be stopped by adopting digital film archiving technology,
but defects that are already present in the films will be carried over
into the digital storage. Restoration of degraded motion pictures
is a highly labour-intensive and extremely costly undertaking.
A much publicised event [1] is the restoration work of Disney's
1937 masterpiece -- Snow White and the Seven Dwarfs, which
was re-released in digital form in 1993. It would be financially
rewarding to reproduce the old movies with as much fidelity to
the original negatives as possible so that the movies can be re-released
in higher quality formats such as video-on-demand, digital video-disk,
and HDTV. Therefore, a video restoration system which can automatically
remove artefacts in film archives will be useful to the entertainment
and broadcast industry.
This application report
describes an Auto-Regressive (AR) model-based restoration algorithm
and its parallel implementation on a network of Texas Instruments
TMS320C40 processors. The restoration process begins with the
conversion of the degraded film into its digital form with the
aid of a real-time video digitiser. The success of automatic video
restoration relies on the fact that image frames in a movie do
not change significantly from one frame to the next, except for
changes due to moving objects in the scene. This means the frames
preceding and succeeding the current image frame will provide
enough repeated information to allow us to detect the presence
of degraded regions in the image. This same redundancy provides
us with a way to mathematically model the image region in the vicinity
of these artefacts so that meaningful information may be used
to fill in the corrupted image regions, resulting in a restored
image frame. The scene changes due to moving objects and uncovered
background signals must be identified to yield an accurate model.
To account for the inter-frame changes that are caused by moving
objects in the scene, the motion of these objects is first computed
by a Motion Estimation algorithm. Once the moving regions have
been compensated for, a 3-Dimensional Auto-regressive (3D-AR)
model is built from the information contained in both the preceding
and succeeding frames. Two of these dimensions describe changes
within the image frame and the third describes changes between
frames. Restoration is done one frame at a time so that the restored
frame can then be used to help restore the subsequent corrupted
frames in the image sequence. The proposed AR model-based approach
has an important advantage over the global filtering strategy
[2], which tends to blur sharp edges or homogenise highly textured
regions in both the distorted and uncorrupted image regions. Statistical
approaches such as Markov Random Field Modelling [3] have produced
good results, albeit at a higher computational cost.
The computational demands
of estimating the motion of moving objects in the image
and formulating the 3D image sequence model are still huge. The time
required to restore just a single PAL frame using one workstation
can run up to a few hundred seconds. Timely restoration can
only be achieved through the design of fast (computationally
efficient) algorithms and the use of parallel processing
techniques on a network of Digital Signal Processors (DSPs).
In this project, we describe
in detail a fast video restoration system where distorted old
archives can be digitised, restored, and transferred to new storage
media with minimal human supervision.
The schematic diagram of a video restoration algorithm is shown in Figure 1.

Figure 1. Video Restoration Schematic Diagram
The image is first partitioned
into blocks of E × E pixels for the computation of the motion
vectors. The motion is first estimated by a motion estimation
algorithm and then processing is directed along the calculated
motion trajectories. A robust motion estimation algorithm is necessary
for restoration of the image sequences, and it is noted that motion
estimation is a vibrant research field. In this application report,
the motion estimation algorithm used is a robust Overlapped
Block Matching (OBM) algorithm, as shown in Figure 2.
In order to get a reliable and accurate displacement estimate,
the size of blocks for Block-Matching has to be chosen carefully.
Since the image sequences are bound to be degraded, the estimate
will be unreliable and affected by noise if small blocks are used.
On the other hand, if large blocks are used, the estimate
becomes inaccurate because the displacement vector field inside the
large blocks would not be constant. Small blocks are therefore
required to keep the displacement vector field estimate sufficiently
local and adaptive. The proposed OBM scheme attempts to circumvent
the above-mentioned problems when estimating the motion vectors
for each frame of the image.
First, the whole frame
is divided into blocks of E × E pixels. For each block
of E × E pixels, a forward motion
vector (FMV) is searched for in a past reference frame (temporal index, t
= -1) and a backward motion vector (BMV) in a future frame
(temporal index, t = +1). For each block of E × E pixels,
one motion vector is selected from the pair of FMV and BMV: the
motion vector that yields the smaller sum of absolute errors is selected.
As shown
in Figure 2, the block matching is done with overlapped blocks
of D × D pixels in the current frame, where D >
E and each E × E pixel block is centred within
its D × D pixel block. This D × D pixel block is
compared with candidate blocks within a search area of size
(D+2P) × (D+2P) pixels in the previous frame, and the best
match is found based on the minimum absolute error (MAE) cross-correlation [5].
The motion vector found by comparing the D × D pixel block
in the present frame with the (D+2P) × (D+2P) pixel search area
in the previous frame is then assigned to the E × E pixel
block. The search procedure adopted in the proposed OBM scheme
is based on a threshold exhaustive search [5].
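The block-matching step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the report's TMS320C40 implementation: frames are plain lists of rows, the matching criterion is the sum of absolute differences over the matching window, the search is a plain exhaustive search (the threshold used by the report's search is not specified and is omitted), and the names `window_origin`, `sad`, `best_vector` and `bidirectional_vector` are hypothetical.

```python
def window_origin(block_row, block_col, e, d):
    """Top-left corner of the D x D window centred on the E x E block
    with block indices (block_row, block_col)."""
    margin = (d - e) // 2
    return block_row * e - margin, block_col * e - margin


def sad(cur, ref, top, left, dy, dx, size):
    """Sum of absolute differences between the size x size window of `cur`
    at (top, left) and the window displaced by (dy, dx) in `ref`."""
    total = 0
    for r in range(size):
        for c in range(size):
            total += abs(cur[top + r][left + c] - ref[top + dy + r][left + dx + c])
    return total


def best_vector(cur, ref, top, left, size, p):
    """Exhaustive search over displacements in [-p, p] x [-p, p].
    Returns (error, (dy, dx)). Candidates falling outside `ref` are skipped;
    the window itself is assumed to lie inside `cur`."""
    rows, cols = len(ref), len(ref[0])
    best = None
    for dy in range(-p, p + 1):
        for dx in range(-p, p + 1):
            if not (0 <= top + dy and top + dy + size <= rows):
                continue
            if not (0 <= left + dx and left + dx + size <= cols):
                continue
            err = sad(cur, ref, top, left, dy, dx, size)
            if best is None or err < best[0]:
                best = (err, (dy, dx))
    return best


def bidirectional_vector(prev, cur, nxt, top, left, size, p):
    """Return (t, (dy, dx), error), choosing the forward match (t = -1) or
    the backward match (t = +1), whichever has the smaller error."""
    fwd = best_vector(cur, prev, top, left, size, p)
    bwd = best_vector(cur, nxt, top, left, size, p)
    if fwd[0] <= bwd[0]:
        return (-1, fwd[1], fwd[0])
    return (+1, bwd[1], bwd[0])
```

The vector returned for the D × D window would then be assigned to the E × E block at its centre. Matching on the larger window is what gives the estimate its robustness to noise, while assigning the result to the smaller block keeps the vector field local.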

Figure 2. Overlapped Block-Matching Motion Estimation Algorithm
The 3-Dimensional Auto-Regressive
(3D-AR) model[6] has been successfully applied to remove impulsive
noise and other types of degradation in image sequences albeit
at a higher computational cost in the interpolation process. Kokaram's
3D-AR model[6] was modified to the Bi-directional 3D Auto-Regressive
(B3D-AR) model[4] where the computational cost in the interpolation
process is reduced significantly. In this application report,
B3D-AR as described by equation (1) is used for the detection
of corrupted pixels.
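$$\hat{I}(i,j,n) = -\sum_{k=1}^{N} a_k \, I\big(i + q_{ik} + d_x^{n,n+q_{tk}}(i,j),\; j + q_{jk} + d_y^{n,n+q_{tk}}(i,j),\; n + q_{tk}\big) \qquad (1)$$
where:
$\hat{I}(i,j,n)$ = Predicted pixel intensity at the location (i,j) in the nth frame
$a_k$ = Auto-Regressive (AR) model coefficients
$N$ = Total number of AR model coefficients
$[q_{ik}, q_{jk}, q_{tk}]$ = Offset vector that points to each pixel neighbourhood used for
the AR model, as shown in Figure 3. The component of the offset
vector which determines the temporal direction of the supporting
pixel is $q_{tk}$, and
its value is -1 for a support pixel in the preceding frame and
+1 for a support pixel in the succeeding frame. Therefore
$I\big(i + q_{ik} + d_x^{n,n+q_{tk}}(i,j),\; j + q_{jk} + d_y^{n,n+q_{tk}}(i,j),\; n + q_{tk}\big)$
is the pixel intensity at the kth support position for
the pixel at (i,j,n).
$\mathbf{d}^{n,m}(i,j) = [d_x^{n,m}(i,j),\, d_y^{n,m}(i,j)]$ =
Displacement vector between the nth frame and the mth frame.
The argument (i,j) denotes that the displacement is a function of the
position in the image.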
For parameter estimation,
the task is to choose the parameters so as to minimise
some function of the prediction error $\varepsilon(i,j,n)$,
as shown in equation (2):
$$\varepsilon(i,j,n) = I(i,j,n) - \hat{I}(i,j,n) \qquad (2)$$
There are two sets of parameters
to compute (estimate): the model coefficients and the displacement
vectors. The motion vectors are computed first using a Motion
Estimation Algorithm. Subsequently, the Least Mean Square (LMS)
approach is used to compute the model coefficients.
Choosing the coefficients to minimise the square of the error in equation (2) leads to the normal equations:
$$\mathbf{R}\mathbf{a} = -\mathbf{r} \qquad (3)$$
where R is an N × N matrix of correlation coefficients, a is the
N × 1 vector of model coefficients and r is an N × 1 vector of
correlation coefficients. The solution to equation (3) yields
the model coefficients [5].
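Solving the normal equations for one block can be sketched as follows. The sketch makes several assumptions that are not taken from the report: each training sample is a target pixel together with its N support pixels, R and r are formed as plain (unnormalised) sums of products, the linear system is solved by Gaussian elimination, and the prediction is taken as the negated weighted sum of the support pixels, which is the sign convention implied by Ra = -r. All function names are hypothetical.

```python
def solve_linear(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(aug[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def estimate_ar(targets, supports):
    """Build R and r from the samples and solve R a = -r for the coefficients."""
    n = len(supports[0])
    R = [[sum(s[a] * s[b] for s in supports) for b in range(n)] for a in range(n)]
    r = [sum(t * s[a] for t, s in zip(targets, supports)) for a in range(n)]
    return solve_linear(R, [-v for v in r])


def predict(coeffs, support):
    """Predicted intensity: negated weighted sum of the support pixels."""
    return -sum(a * s for a, s in zip(coeffs, support))


# Toy data: each target equals 0.5 * s0 + 0.25 * s1, so a = [-0.5, -0.25].
coeffs = estimate_ar([0.5, 0.25, 0.75, 1.25], [(1, 0), (0, 1), (1, 1), (2, 1)])
```

With nine coefficients per block, R is only 9 × 9, so a direct solve of this kind is cheap compared with the motion search.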
In our implementation,
as shown in Figure 3, each block of 16x16 pixels in the current
(nth) frame is modelled with a set of 9 AR coefficients.
The predicted intensity of a pixel within the 16x16 block in the
nth frame is calculated from its corresponding motion-compensated
3x3 support region in either the previous or the next frame.

Figure 3. The support region is selected based on the value of t obtained from the bi-directional motion estimator
The position of a local
distortion can be detected by applying a threshold to
$\varepsilon^2(i,j,n)$,
the square of the error between the actual and predicted intensity
of the pixel at location (i,j,n), which is given by:
$$\varepsilon^2(i,j,n) = \big[I(i,j,n) - \hat{I}(i,j,n)\big]^2 \qquad (4)$$
where the predicted intensity
$\hat{I}(i,j,n)$, given in equation (1), is calculated
from the AR coefficients $a_k$
estimated in
equation (3).
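The thresholding step can be illustrated with a short sketch. It assumes the actual and predicted intensities of a block are available as two equally sized lists of rows; the threshold value is a free parameter that the report does not state, and `detect_distorted` is a hypothetical name.

```python
def detect_distorted(actual, predicted, threshold):
    """Return the (i, j) positions whose squared prediction error
    exceeds `threshold`."""
    flagged = []
    for i, (row_a, row_p) in enumerate(zip(actual, predicted)):
        for j, (a, p) in enumerate(zip(row_a, row_p)):
            if (a - p) ** 2 > threshold:
                flagged.append((i, j))
    return flagged


# One pixel (row 1, column 0) is far from its prediction and gets flagged.
flagged = detect_distorted([[10, 12], [200, 11]], [[11, 12], [12, 10]], 100)
```

The list of flagged positions is all that the later steps need: it tells the weighting scheme which errors to down-weight and the interpolator which pixels to replace.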
The restoration process
can be seen as a three-step process. First, the pixels which are
detected as "distorted" pixels are weighted according
to a Gaussian Weighting scheme. Second, a new set of
unbiased AR coefficients is re-computed using equation (3).
Finally, the "distorted" pixels identified by using
equation (4) are removed by substituting them with the value
of $\hat{I}(i,j,n)$ calculated with the new set of AR
coefficients.
To restore the "distorted"
pixels, the re-computed model coefficients are required. As shown
in Figure 3, the support region for each predicted pixel has a
size of 3 x 3 pixels only. Thus, the normal equation (equation
3) must be altered to solve for the AR coefficients using
Gaussian Weighted coefficient estimation. Normally, the model
coefficients are chosen so as to minimise the expected value
of the squared error at the point concerned. Once detection of
dirt has been done, some of this data is known to be degraded.
Therefore the prediction error at these points may be weighted
by a function $w(i,j,n)$, so that these degraded
portions do not affect the estimation process.
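The new weighted error
equation may be written as:
$$\varepsilon_w(i,j,n) = w(i,j,n) \sum_{k=0}^{N} a_k \, I\big(i + q_{ik} + d_x^{n,n+q_{tk}}(i,j),\; j + q_{jk} + d_y^{n,n+q_{tk}}(i,j),\; n + q_{tk}\big) \qquad (5)$$
The Gaussian weighting
function $w(i,j,n)$ is assigned to each degraded
point depending on the magnitude of the error at location (i,j)
in the nth frame, during the re-computation
of the video model. The rest of the symbols have their usual meaning
as presented in equation (1), and $a_0 = 1$, so that the k = 0 term
is the current pixel $I(i,j,n)$. The Gaussian
weighting function is defined piecewise, with three cases, in equation (6).
(6) [equation not recoverable from the source: piecewise definition of $w(i,j,n)$ with three cases]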
The square of the
Gaussian weighted error in equation (5) is minimised with respect
to the coefficients, yielding a normal equation similar to equation
(3).
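The step from the weighted error to the weighted normal equation can be written out as follows. This is a sketch under assumptions: $I_k$ abbreviates the kth motion-compensated support pixel of the pixel at (i,j,n), with $I_0 = I(i,j,n)$ and $a_0 = 1$; $w$ abbreviates the weight assigned to that pixel; and the weighted error is taken to be the weight times the ordinary prediction error. The symbols $\mathbf{R}_w$ and $\mathbf{r}_w$ are introduced here only for illustration.

```latex
\begin{aligned}
\varepsilon_w &= w \sum_{k=0}^{N} a_k I_k, \qquad a_0 = 1 \\
\frac{\partial}{\partial a_m} E\!\left[\varepsilon_w^2\right]
  &= 2\,E\!\left[w^2 I_m \sum_{k=0}^{N} a_k I_k\right] = 0, \qquad m = 1,\dots,N \\
\sum_{k=1}^{N} a_k\,E\!\left[w^2 I_k I_m\right] &= -E\!\left[w^2 I_0 I_m\right] \\
\mathbf{R}_w \mathbf{a} &= -\mathbf{r}_w, \qquad
[\mathbf{R}_w]_{mk} = E\!\left[w^2 I_k I_m\right], \quad
[\mathbf{r}_w]_m = E\!\left[w^2 I_0 I_m\right]
\end{aligned}
```

With $w = 1$ everywhere this reduces to the unweighted normal equation (3). A weight near zero removes a degraded pixel's contribution from both $\mathbf{R}_w$ and $\mathbf{r}_w$, which is how the degraded points are kept from biasing the re-estimated coefficients.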
To restore a block of B ×
B pixels centred within a block of size M × M pixels
in the current frame, the M × M block's motion-compensated counterpart
in the previous or next frame must first be determined. The
choice of the previous or next frame is decided by the B3D-AR
model as discussed earlier. Then, using these two blocks of pixels,
a set of coefficients $a_k$
is derived from the normal equation. It is assumed that the information
within a block of size M × M is stationary enough to enable
the use of one model, i.e. one set of coefficients, for all the
M² pixels within
the block. The model is applied to the B × B block and pixels
identified as "noise" are restored.
The support region used for prediction can be represented as x:y. A 9:8 support region means that the support region consists of 9 pixels from the previous frame and 8 from the current frame. We have implemented this model considering information only from the previous or next frame. In other words, we employ the 9:0 or 0:9 model. Each pixel in the current frame is thus modelled by 9 pixels in the previous or next frame. A support region wholly in the previous/next frame is unlikely to be affected by noise around the same relative areas as in the current frame (noise is essentially temporally isolated). The use of the 9:0 support region ensures that the current frame information is not used for detection and cleaning.
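Gathering a 9:0 (or 0:9) support region can be sketched as below. The sketch assumes that frames are lists of rows, that the motion vector (dy, dx) of the pixel's block is already known, and that the 3 × 3 support is centred on the motion-compensated position in the reference frame; border handling is omitted. `OFFSETS` and `support_pixels` are hypothetical names.

```python
# The nine spatial offsets of a 3 x 3 support region, in row-major order.
OFFSETS = [(di, dj) for di in (-1, 0, 1) for dj in (-1, 0, 1)]


def support_pixels(ref, i, j, dy, dx):
    """Return the 9 support pixels for pixel (i, j) of the current frame,
    read from reference frame `ref` at the position displaced by (dy, dx)."""
    return [ref[i + dy + di][j + dx + dj] for di, dj in OFFSETS]


# Toy reference frame whose pixel value encodes its position (row * 10 + col).
ref = [[r * 10 + c for c in range(5)] for r in range(5)]
support = support_pixels(ref, 2, 2, 1, -1)
```

Passing the previous frame as `ref` gives the 9:0 model and passing the next frame gives the 0:9 model; in neither case is the current frame read, which matches the point that current-frame information is not used for detection and cleaning.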
The "noisy" pixels are now interpolated with their predicted values after re-calculating the model coefficients. The interpolation equation used is:
$$\mathbf{i}_u = \mathbf{i}_k \mathbf{A}_k \qquad (7)$$
Here $\mathbf{i}_u$ holds the unknown (noisy) pixel intensities
and $\mathbf{i}_k$ holds the known pixel intensities.
$\mathbf{A}_k$ holds the model coefficients a.
The structure of
$\mathbf{i}_k$ and $\mathbf{A}_k$
has been arranged so that equation (7) has a simple
and computationally efficient solution. Matrix $\mathbf{i}_k$
is of size u × N, and $\mathbf{A}_k$ is
of size N × 1, where u is the total number of unknown
(noisy) pixels in the B × B block and N is the number
of model coefficients $a_k$.
The above solution consists of N × u operations, which is
O(u), since N is fixed. The cleaning time
therefore depends on the level of noise present in the B × B
block: a less noisy block takes less time to restore than
one with more noise, since fewer computations are involved.
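Equation (7) amounts to one length-N dot product per noisy pixel, which the following sketch makes explicit. It assumes that each row of the known-pixel matrix holds the N support pixels of one noisy pixel and that the entries of the coefficient vector are used as given (how their signs relate to the coefficients of the normal equations is not stated in the text). `interpolate` is a hypothetical name.

```python
def interpolate(known_support, weights):
    """Compute i_u = i_k A_k: one dot product of length N per noisy pixel.

    known_support: u rows, each with the N known support pixels of one
                   noisy pixel.
    weights:       the N entries of the coefficient vector.
    """
    return [sum(p * w for p, w in zip(row, weights)) for row in known_support]


# Two noisy pixels (u = 2), three coefficients (N = 3).
restored = interpolate([[1, 2, 3], [4, 5, 6]], [0.5, 0.25, 0.25])
```

The outer loop runs u times and the inner sum N times, so the cost is N × u multiplications; with N fixed, the work grows linearly with the number of noisy pixels in the block.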
The parallelism inherent
in image restoration is geometric parallelism. Each frame in the
image can be partitioned into independent sub-blocks. These sub-blocks
are then distributed among the worker (also known as
slave) processors by a master (root) processor. Each
of these blocks will undergo the same restoration operations.
Since the master processor distributes different data packets
to the slave processors, each of which performs the same
set of operations on its packet, the parallel machine can be said to
employ the SPMD (Single Program Multiple Data) paradigm.
The parallel implementation of the B3D-AR model is carried out on a network of fifteen TMS320C40 digital signal processors. Each TMS320C40 has 8 Mbytes DRAM while the root processor has 32 Mbytes DRAM. The processors are connected in a tree configuration. This particular configuration was chosen as it strikes the right balance between efficiency and algorithm simplicity. The architecture of the TMS320C40 also limits the maximum number of possible connections to each processor to 6. The logical configuration is shown in Figure 6.
Figure 6. Logical Arrangement of Tasks
There are two entities
involved in our system, tasks and processors. The
tasks represent the logical configuration of the system,
while the processors represent the physical configuration.
The physical configuration of the system is decided by the underlying
hardware layout, while the logical configuration is decided by
the parallel algorithm used. Figure 6 shows the logical layout.
There are three different tasks : master, sub-master
and worker. M represents the master task;
SM1-SM4 are the 4 sub-master tasks; W1-W14
are the worker tasks. The master task resides on
the root processor (first level processor) which also communicates
with the host SUN SPARC10 workstation.
A single processor may
have more than one task running concurrently on it. On
the second level of the tree configuration, two tasks,
namely a sub-master and a worker task, run on each of the
four processors. The dashed arrows depicted in Figure 6 show a
logical (non-physical) channel over which the sub-master
and worker tasks within a processor communicate. Thus, 10 of the 14
worker tasks are dedicated, i.e. the processors they
are assigned to perform only the processing job. The 4 remaining
worker tasks are non-dedicated: they are placed
on processors which perform both the distribution and processing
jobs. The performance of the non-dedicated
worker tasks will be lower due to the additional distribution
workload on them. The master task M distributes
packets of work to the sub-master tasks, which in turn distribute
them to the workers.
The restoration algorithm
is implemented in the C programming language and
compiled using the 3L Parallel C compiler [7].
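The task-to-processor arrangement described above can be tallied with a short sketch. It is written in Python for brevity and is not the report's C code. It assumes one root processor, four second-level processors and ten third-level processors, and it does not model how the ten third-level processors are divided among the four sub-masters, since the text does not state this. All names are hypothetical.

```python
def task_layout(second_level=4, third_level=10):
    """Return a list of (tree_level, index, tasks_on_that_processor)."""
    layout = [(1, 0, ["master"])]
    layout += [(2, i, ["sub-master", "worker"]) for i in range(second_level)]
    layout += [(3, i, ["worker"]) for i in range(third_level)]
    return layout


def count_tasks(layout, kind):
    """Count the tasks of a given kind across all processors."""
    return sum(tasks.count(kind) for _, _, tasks in layout)


def dedicated_workers(layout):
    """Count the processors that run a worker task and nothing else."""
    return sum(1 for _, _, tasks in layout if tasks == ["worker"])


layout = task_layout()
```

With the default arguments the layout has 15 processors carrying 1 master, 4 sub-master and 14 worker tasks, of which 10 workers are dedicated, matching the counts given in the text.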
Load balancing of the entire
workload is the most important consideration for parallel
algorithms. We have employed the RILB (Receiver Initiated
Load Balancing) technique [8]. This scheme is characterised
by the fact that the distribution of work is performed only when
an idle task requests work. The request for work may be explicit,
i.e. the passing of a message requesting work, or implicit,
i.e. the task finishes processing the work packet assigned to
it and passes back the results. We use implicit requesting, since
the processed (cleaned) block of data must eventually be sent
back up to the master for re-combination with the rest of the
processed image.
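The effect of implicit, receiver-initiated requesting can be illustrated with a small event simulation. This is only a sketch under simplifying assumptions: communication time is ignored, there is a single distributor rather than a master and sub-masters, each packet has a known processing cost, and each worker has a relative speed. `simulate_rilb` is a hypothetical name.

```python
import heapq


def simulate_rilb(packet_costs, worker_speeds):
    """Hand out packets one at a time; a worker gets its next packet only
    when it returns a result. Returns (makespan, packets_done_per_worker)."""
    pending = list(packet_costs)[::-1]  # pop() then yields the original order
    done = [0] * len(worker_speeds)
    events = []  # heap of (finish_time, worker_id)
    # Initial distribution: one packet per worker.
    for w, speed in enumerate(worker_speeds):
        if pending:
            heapq.heappush(events, (pending.pop() / speed, w))
    makespan = 0.0
    while events:
        t, w = heapq.heappop(events)  # earliest returned result
        done[w] += 1
        makespan = t
        if pending:  # the returned result doubles as the request for work
            heapq.heappush(events, (t + pending.pop() / worker_speeds[w], w))
    return makespan, done


# Six equal packets, one full-speed worker and one half-speed worker.
makespan, done = simulate_rilb([1.0] * 6, [1.0, 0.5])
```

In this toy case the faster worker ends up processing four packets and the slower one two, finishing at time 4.0, whereas a fixed three-and-three split would leave the slower worker busy until time 6.0. This is the sense in which slower, non-dedicated workers are automatically given less work.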
Each workload (packet) consists of a 16x16 pixel block in the current frame as well as its search space in the previous and next frames. The sub-masters serve as work distributors. Initially, they send out work packets to all workers under them. Each worker receives its work packet, performs motion estimation, detects the noise and restores the corrupted pixels. The workers then pass back the pixel positions of the noise and their new 'clean' intensity values for the final image tiling at the master. This also serves as a signal indicating that the worker has finished its assigned task. Whenever a sub-master receives such a signal from any worker, it relays the signal upwards to the master. The sub-master then receives the next work packet from the master.
The size of each work packet
must be small enough to ensure that while the master is
distributing work packets, no worker has to wait for too
long. Performance degradation would set in if a processor had
to wait for long.
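As a rough sense of scale, the number of 16 × 16 work packets in one frame follows directly from the frame size. The sketch assumes the 576 × 720 PAL frame size quoted for the test video and non-overlapping packets; `packet_grid` is a hypothetical helper.

```python
def packet_grid(frame_rows, frame_cols, block):
    """Return (packet_rows, packet_cols, total_packets) for a frame tiled
    with block x block packets, rounding up at the frame borders."""
    rows = -(-frame_rows // block)  # ceiling division
    cols = -(-frame_cols // block)
    return rows, cols, rows * cols


grid = packet_grid(576, 720, 16)
```

For a 576 × 720 frame this gives 36 × 45 = 1620 packets, so with 14 workers each one handles on the order of a hundred packets per frame. That many small packets is what lets the load-balancing scheme even out differences between workers.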
The proposed algorithm is evaluated by applying different image sequences containing different noise processes:
(1) uncovered-background region in an image sequence which is artificially corrupted by single- to multiple-pixel-sized impulses (Salesman Sequence).
(2) occluded region in an image sequence which is artificially corrupted by single- to multiple-pixel-sized impulses (Salesman Sequence).
(3) a sequence that undergoes translational motion and is artificially corrupted by single- to multiple-pixel-sized impulses (Salesman Sequence).
(4) area which undergoes a zooming process and is artificially corrupted by single- to multiple-pixel-sized impulses (Corridor Sequence).
(5) real degraded image sequence (Frankenstein Sequence).
All the artificially added
noise is temporally isolated, which is usually the case in real
degraded motion pictures [9].
Figure 7 shows an artificially
corrupted frame in the Salesman sequence. The blotches
and scratch line are temporally isolated. The proposed algorithm
(B3D-AR) was applied to the Salesman sequence, which contains
regions undergoing self-occlusion, and to the Corridor sequence,
which consists of a scene undergoing zooming. Multiple-pixel-sized
blotches and artificial scratches were synthetically added to
several frames of each sequence. The picture quality can be seen
from Figure 8, which shows the corresponding restored frame using
the B3D-AR model.
Figure 9a shows a magnified
portion of the degraded Salesman frame. Figure 9b shows
the corresponding restored frame using the B3D-AR model. Figure
10a shows a degraded Corridor frame. The Corridor
sequence exhibits a motion called zooming. Figure 10b shows the
corresponding restored frame using the B3D-AR model.

Figure 7. A corrupted frame of the 'salesman' sequence with blotches varying in size from 2×2 to 4×4 and a line of width 2.

Figure 9. (a) A magnified portion of the corrupted 'salesman' frame at the region of self-occlusion; (b) the corresponding restored frame using the bi-directional 3D-AR model.
Figure 10. (a) A corrupted frame in the 'corridor' sequence; (b) the corresponding restored frame using the bi-directional 3D-AR model.
The restoration
quality of the algorithm on a naturally degraded image sequence
(obtained by digitising the Frankenstein video) is shown in Figures
11a, 11b, 12a and 12b. The video was first digitised from a PAL
format video tape before the algorithm was applied to the image
sequence. The size of each frame in the PAL image sequence is
576 × 720. The original Frankenstein sequence is heavily blotched
and has been effectively restored by the Gaussian Weighted, B3D-AR
algorithm.

Figure 11a: Sample A -- a selected frame of a noise-corrupted image sequence

Figure 11b: The corresponding restored frame (Sample A) using bi-directional 3D-AR model.
Figure 12a: Sample B -- a selected frame of a noise-corrupted image sequence
Figure 12b: The corresponding restored frame (Sample B) using bi-directional 3D-AR model.
The 2nd level processors
(as shown in Figure 6) actually execute two different tasks: sub-master
and worker tasks. These tasks run concurrently. This means
that if the time spent distributing the workload exceeds the time
spent processing it on the sub-master processor, drastic performance
degradation could take place.
We implemented two algorithms on these four 2nd-level processors to measure the effect of the distribution work-load on their performance. The first algorithm (Algorithm A) consisted of a simple mechanism where no distinction was made between the concurrently executing tasks. The sub-master processor divides time slices equally between the two tasks.
The second algorithm (Algorithm
B) followed from a careful analysis of the sub-master and
worker processing burdens. It was found that the computational
load of the worker exceeded that of the sub-master
by a substantial margin. It is thus not justified for the two
tasks to receive the same share of processing time. The sub-master
task needs to be active only when a worker under it has
completed a work-task and requires a fresh work packet. Thus,
the worker should receive a larger share of processor time.
This was achieved by using priority[7]. The worker
tasks were given the highest priority. The sub-master task
was accorded a low priority, becoming active only when required.
Otherwise, it remained descheduled. This is in contrast
to Algorithm A, where the distributor task is constantly
running without doing any useful processing.
Figure 13 shows the speedup characteristics of algorithms A and B.

The improvement in the
results for Algorithm B is evident. Up to a network of 4 processors,
the two algorithms display the same speed-up. This is because
only the 4 first-level processors are in use, and they act as dedicated
workers only. When the tree-configured network grows into
the 3rd level, the performance of Algorithm
A starts to degrade. Algorithm B, on the other hand, does not
degrade as much. This clearly demonstrates the precedence that
the worker tasks should take over the distributor
tasks. It is also observed that the performance of Algorithm B
is well above the P/log P speed-up that is usually accepted
as good performance for a parallel algorithm.
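For reference, the P/log P yardstick can be evaluated with a few lines of code. The base of the logarithm is not stated in the text; base 2 is assumed here, and `p_over_log_p` and `efficiency` are hypothetical helper names.

```python
import math


def p_over_log_p(p):
    """Reference speed-up P / log2(P) for P > 1 processors (base 2 assumed)."""
    return p / math.log2(p)


def efficiency(speedup, p):
    """Parallel efficiency: achieved speed-up divided by processor count."""
    return speedup / p


reference_15 = p_over_log_p(15)
```

Under the base-2 assumption the yardstick is 2.0 for 4 processors and roughly 3.8 for 15 processors, so a close to linear speed-up on the 15-processor network sits well above it.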
The video restoration algorithm
and its implementation on a network of 15 TMS320C40s are presented
in this application report. The system is shown to have better restoration
quality when tested on a set of image sequences. The results and
analysis show that the B3D-AR model is capable of restoring noise-corrupted
video. While the Gaussian Weighting scheme provides
good spatial support, the bi-directional scheme prevents the progressive
degradation of image sequences due to the corruption in regions
exhibiting different motion processes, such as occlusion, zooming,
rotation and panning. The video restoration has been tested on
different image sequences containing different noise processes
such as variable-size blotches and line scratches. Its effectiveness
in the restoration of these different noise artefacts has been
demonstrated. More importantly, when the system is applied to
a naturally degraded (PAL-size) video, the noise level of the image
sequence is significantly reduced while retaining the crispness and
sharpness of the original image sequence.
Parallel implementation
of the proposed algorithm is realised where close to linear speed-up
is achieved on a 15-node TMS320C40 system hosted by a SUN SPARC10
workstation.
[1] B. Fisher, "Digital Restoration of Snow White : 120,000 Famous Frames are Back", Advanced Imaging, pp. 32-36, September 1993.
[2] G. R. Arce, "Multistage Order Statistic Filters for Image Sequence Processing", IEEE Transactions on Signal Processing, 39(5), pp. 1146-1163, May 1991.
[3] R. D. Morris, "Image Sequence Restoration using Gibbs Distributions", PhD. Thesis, Department of Engineering, University of Cambridge, U.K., May 1995.
[4] W. B. Goh, M. N. Chong, S. Kalra, D. Krishnan, "A Bi-directional 3D AR model approach to motion picture restoration", IEEE Int. Conf. On Acoustics, Speech & Signal Processing, pp. 2277-2280, May 1996.
[5] A. K. Jain, " Fundamentals of Digital Image Processing", Prentice Hall, 1989.
[6] A. C. Kokaram, R. D. Morris, W. J. Fitzgerald, P. J. W. Rayner, "Interpolation of Missing Data in Image Sequences", IEEE Trans. On Image Processing, vol 4 no. 11, pp. 1509-1519, Nov. 1995.
[7] 3L Ltd., "PARALLEL C User Guide for Texas Instruments TMS320C40", 1995.
[8] Vipin Kumar, Ananth Y. Grama and Nageshwara Rao Vempaty, "Scalable Load Balancing Techniques for Parallel Computers", Journal of Parallel and Distributed Computing, pp. 60-79, 1994.
[9] S. Geman and D. McClure, "A Nonlinear Filter for Film Restoration and Other Problems in Image Processing", CVGIP: Graphical Models and Image Processing, pp. 281-289, July 1992.
This application report
describes a video restoration algorithm and its implementation
on a network of 15 Texas Instruments TMS320C40 Digital Signal
Processors. The video restoration algorithm used is a Gaussian
Weighted, Bi-directional 3D Auto-Regressive (B3D-AR) model. This
restoration algorithm alleviates the presence of noise in the
old video archives. Common forms of degradation found in such
archives are "dirt and sparkle" and scratches. The distortion
is caused by the accumulation of dirt, by the degradation
of the films due to chemical processes, or by the film material
being abraded.
This parallel video restoration
system is shown to have better restoration quality when tested
on a set of image sequences. The results and analysis show that
the B3D-AR model is capable of restoring noise-corrupted video. While
the Gaussian Weighting scheme provides good spatial support for
the model, the bi-directional scheme prevents the progressive
degradation of image sequences due to the corruption in regions
exhibiting different motion processes, such as occlusion, zooming,
rotation and panning. The video restoration has been tested on
different image sequences containing different noise processes
such as variable-size blotches and line scratches. Its effectiveness
in the restoration of these different noise artefacts has been
demonstrated. More importantly, when the system is applied to
a naturally degraded (PAL-size) video, the noise level of the image
sequence is significantly reduced while retaining the crispness and
sharpness of the original image sequence. The results are in contrast
with most of the existing image restoration algorithms, which
blur edges of moving objects in the vicinity of occluded and uncovered
image regions; the video restoration algorithm described here
can successfully suppress mixed noise processes and recover lost
signals in both the covered and uncovered regions in image sequences.
Parallel implementation
of the proposed algorithm is realised where close to linear speed-up
is achieved on a 15-node TMS320C40 system hosted by a SUN SPARC10
workstation. This application report describes a fast video restoration
system with which distorted old archives can be digitised, restored,
and transferred to new storage media with minimal human supervision.