The generic empirical retrieval problem

**Y** = *f*(**X**)    (1)

is essentially a mapping (Krasnopolsky, 1997) which maps a vector of sensor measurements, **X** in R^n, onto a vector of geophysical parameters, **Y** in R^m. Retrievals which produce a single (scalar) geophysical parameter,

y_i = f_i(**X**),

may be considered as degenerate mappings where a vector is mapped onto a scalar (or a vector space onto a line). This empirical mapping can be performed using conventional tools (linear and nonlinear regression) and NNs.

Linear regression is an appropriate tool for developing many empirical algorithms. It is simple to apply and has a well-developed theoretical basis. In the case of linear regression, a linear model is constructed for the transfer function (TF) **f**, (2),

y_i = f_i(**X**) = a_0 + Σ_{j=1}^{n} a_j x_j    (2)

This model is linear with respect to both the parameters a_j and the arguments x_j.
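
As an illustration (not part of the original note), the linear model (2) can be fitted by ordinary least squares. The following Python sketch uses synthetic data and arbitrary coefficients purely as an assumed example:

```python
# A minimal sketch: fitting the linear model (2), y_i = a_0 + sum_j a_j x_j,
# by least squares. The synthetic "sensor measurements" below are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n = 5                                    # dimension of the input vector X
X = rng.normal(size=(1000, n))           # 1000 hypothetical measurement vectors
y = 2.0 + X @ np.array([0.5, -1.0, 0.3, 0.0, 1.2])   # a linear "TF" (assumed)

# Augment with a column of ones so the intercept a_0 is estimated as well.
A = np.hstack([np.ones((X.shape[0], 1)), X])
coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
a0, a = coeffs[0], coeffs[1:]
print(a0, a)                             # recovers the coefficients of the linear TF
```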

The most important limitation of such a linear approximation is that it works well over a broad range of variability of the arguments only if the function which it represents (the TF in our case) is linear. If the TF is nonlinear, linear regression can only provide a local approximation; when applied globally, the approximation becomes inaccurate.

Because TFs are generally nonlinear functions of their arguments **X**, linear regression can be generalized into a nonlinear approximation with respect to **X** by introducing a new argument built from a basis of nonlinear functions {φ_j}, (3) and (4),

z_j = φ_j(**X**)    (3)

y_i = a_0 + Σ_j a_j z_j = a_0 + Σ_j a_j φ_j(**X**)    (4)

Finally, nonlinear regression may be applied. For example, **f** in (2) can be specified as a complicated nonlinear function of both its arguments and its parameters, (5),

y_i = f(**X**; a)    (5)

The expression (4) is nonlinear with respect to its argument **X** but linear with respect to the parameters a_j, whereas the nonlinear regression (5) is nonlinear both with respect to its argument **X** and with respect to its parameters a.
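
To illustrate why (4) remains tractable, note that it is still linear in the coefficients a_j, so the same least-squares machinery applies once the basis functions are fixed. The sketch below assumes a scalar input and an arbitrary basis {x, x^2, sin x}; both are illustrative choices, not part of the original text:

```python
# Sketch of the expansion (4): nonlinear in X, but still linear in the
# coefficients a_j, so ordinary least squares applies.
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-2.0, 2.0, size=500)           # scalar input for simplicity
y = np.sin(x) + 0.3 * x**2                     # a nonlinear "TF" to be represented

basis = [lambda t: t, lambda t: t**2, np.sin]  # the assumed basis {phi_j}
Phi = np.column_stack([np.ones_like(x)] + [phi(x) for phi in basis])
a, *_ = np.linalg.lstsq(Phi, y, rcond=None)    # coefficients a_0 ... a_N of (4)
print(a)
```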

However, in either case, we must specify a particular type of nonlinear function in advance; that is, we are forced to implement a particular type of nonlinearity a priori. This may not always be possible, because we may not know in advance what kind of nonlinear behavior a particular TF demonstrates, or this nonlinear behavior may be different in different regions of the TF's domain. If an inappropriate nonlinear regression function is chosen, it may represent a nonlinear TF with less accuracy than its linear counterpart.

In the situation described above, where the TF is nonlinear and the form of the nonlinearity is not known, we need a more flexible, self-adjusting approach that can accommodate various types of nonlinear behavior and thus represent a broad class of nonlinear mappings. Neural networks (NNs) are well suited for a very broad class of nonlinear approximations and mappings. Neural networks consist of layers of uniform processing elements, nodes, units, or neurons. The neurons and layers are connected according to a specific architecture or topology. Fig. 1 shows a simple architecture which is sufficient for any continuous nonlinear mapping, a multilayer perceptron. The number of input neurons, *n*, in the input layer is equal to the dimension of the input vector **X**. The number of output neurons, *m*, in the output layer is equal to the dimension of the output vector **Y**.

Fig. 1. Multilayer perceptron employing a feed-forward, fully connected topology.

A typical neuron (processing element) usually has several inputs (components of the vector **X**), one output, and consists of two parts, a linear part and a nonlinear part. The linear part forms the inner product of the input vector with a weight vector **W**_j and adds a bias B_j; the nonlinear part then applies an activation function to this linear transformation of the input vector. For the activation function, it is sufficient that it be a Tauber-Wiener (nonpolynomial, continuous, bounded) function (Chen and Chen, 1995). Here we use a standard activation function, the hyperbolic tangent. Then the neuron output, z_j, can be written as,

z_j = tanh( Σ_{i=1}^{n} W_{ji} x_i + B_j )    (6)

The neuron is a nonlinear element because its output z_j is a nonlinear function of its inputs x_i.
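
For concreteness, a single neuron of the form (6) can be written in a few lines of Python; the weight vector and bias below are arbitrary placeholder values, not values from the note:

```python
# A single neuron as in (6): an inner product with a weight vector, a bias,
# and a hyperbolic-tangent activation.
import numpy as np

def neuron(x, w, b):
    """Output z_j = tanh(sum_i w_i * x_i + b) of one neuron."""
    return np.tanh(np.dot(w, x) + b)

x = np.array([0.2, -1.0, 0.5])     # components of the input vector X
w = np.array([1.5, 0.3, -0.7])     # weight vector of this neuron (placeholder)
b = 0.1                            # bias (placeholder)
print(neuron(x, w, b))
```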

From the discussion above it is clear that a NN generally performs a nonlinear mapping of an input vector **X** in R^n (*n* is the dimension of the input vector, or the number of inputs) onto an output vector **Y** in R^m (*m* is the dimension of the output vector, or the number of outputs). Symbolically, this mapping can be written as,

**Y** = f_NN(**X**)    (7)

where f_NN denotes this neural network mapping (the NN input/output relation).

For the topology shown in Fig. 1, for a NN with *k* neurons in one hidden layer, and using (6) for each neuron in the hidden and output layers, (7) can be written explicitly as,

y_q = tanh( Σ_{j=1}^{k} ω_{qj} tanh( Σ_{i=1}^{n} W_{ji} x_i + B_j ) + b_q ),   q = 1, ..., m    (8)

where the matrix W_{ji} and the vector B_j represent the weights and biases of the neurons in the hidden layer, and the matrix ω_{qj} and the vector b_q represent the weights and biases of the neurons in the output layer. It can be seen from (8) that any component y_q of the NN output vector **Y** is a nonlinear function of all the components of the NN's input vector **X**.
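
A minimal sketch of the forward pass (8), assuming tanh activations in both the hidden and output layers as stated above; the dimensions and the randomly chosen weights are purely illustrative:

```python
# Forward pass of the one-hidden-layer perceptron of Fig. 1, i.e. an explicit
# evaluation of (8): n inputs, k hidden tanh neurons, m output neurons.
import numpy as np

def mlp_forward(x, W, B, omega, beta):
    """x: (n,) input; W: (k, n), B: (k,) hidden weights/biases;
    omega: (m, k), beta: (m,) output weights/biases."""
    z = np.tanh(W @ x + B)              # hidden-layer outputs, eq. (6)
    return np.tanh(omega @ z + beta)    # output-layer neurons, eq. (8)

rng = np.random.default_rng(2)
n, k, m = 4, 6, 2                       # input, hidden, output dimensions (placeholders)
W, B = rng.normal(size=(k, n)), rng.normal(size=k)
omega, beta = rng.normal(size=(m, k)), rng.normal(size=m)
print(mlp_forward(rng.normal(size=n), W, B, omega, beta))   # the output vector Y
```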

It has been shown (e.g., Funahashi, 1989; Hornik, 1991; Chen and Chen, 1995; Cybenko, 1989) that a NN with one hidden layer (e.g., the NN (8)) can approximate any continuous mapping defined on compact sets in R^n.

Thus, any problem which can be mathematically reduced to a nonlinear mapping as in (1) or (2) can be solved using the NN represented by (8). NNs are robust with respect to random noise and sensitive to systematic, regular signals (e.g., Kerlirzin and Réfrégier, 1995). NN solutions given by (8) for different problems will differ in several important ways. For each particular problem, *n* and *m* are determined by the dimensions of the input and output vectors **X** and **Y**. The number of hidden neurons, *k*, depends on the complexity of the particular problem: the more complicated the mapping, the more hidden neurons are required. Unfortunately, there is no universal rule that applies. Usually, if *k* is too large, the NN will reproduce noise as well as the desired signal; conversely, if *k* is too small, the NN cannot reproduce the desired signal accurately.

After these topological parameters are defined, the weights and biases can be found using a procedure called NN training. A number of methods have been developed for NN training (e.g., Beale and Jackson, 1990; Chen, 1996). Here we use a simplified version of the steepest (or gradient) descent method known as the back-propagation training algorithm. Although NN training is often time consuming, NN application, after training, is not. After the training is finished (it is usually performed only once), each application of the trained NN is practically instantaneous and requires only the evaluation of (8) with known weights and biases.
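
The following is a minimal back-propagation sketch, i.e., plain gradient descent on a mean-squared error for the one-hidden-layer network (8). It is an illustrative toy implementation on synthetic data, not the training code used for the retrieval algorithms described here:

```python
# Minimal back-propagation (full-batch gradient descent on MSE) for the
# one-hidden-layer tanh network (8). Data, sizes, and learning rate are
# illustrative assumptions only.
import numpy as np

rng = np.random.default_rng(3)

# Synthetic training set: a nonlinear "TF" from R^2 to R^1 (assumed).
X = rng.uniform(-1.0, 1.0, size=(500, 2))
Y = np.tanh(X[:, :1] * X[:, 1:] + 0.5 * X[:, :1])

n, k, m = 2, 8, 1
W, B = 0.5 * rng.normal(size=(k, n)), np.zeros(k)          # hidden layer
omega, beta = 0.5 * rng.normal(size=(m, k)), np.zeros(m)   # output layer
lr = 0.1

for epoch in range(2000):
    # Forward pass (vectorized over the whole training set).
    Z = np.tanh(X @ W.T + B)               # hidden outputs, eq. (6)
    Yhat = np.tanh(Z @ omega.T + beta)     # network outputs, eq. (8)

    # Backward pass: gradients of the mean-squared error.
    dY = (Yhat - Y) * (1.0 - Yhat**2) / len(X)   # through the output tanh
    dZ = (dY @ omega) * (1.0 - Z**2)             # through the hidden tanh

    omega -= lr * dY.T @ Z
    beta  -= lr * dY.sum(axis=0)
    W     -= lr * dZ.T @ X
    B     -= lr * dZ.sum(axis=0)

print("final MSE:", float(np.mean((Yhat - Y) ** 2)))
```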

Because the dimension of the output vector **Y** may be greater than one, NNs are well suited for modeling multi-parameter TFs (1). All components of the output vector **Y** are related through the common hidden neurons; however, each particular component of the output vector **Y** also has its own output neuron, which is unique to it.

REFERENCES

Beale, R., and T. Jackson, 1990, *Neural Computing: An Introduction*, Adam Hilger, Bristol, Philadelphia and New York.

Chen, C. H. (Editor in Chief), 1996, *Fuzzy Logic and Neural Network Handbook*, McGraw-Hill, New York.

Chen, T., and H. Chen, 1995, "Approximation Capability to Functions of Several Variables, Nonlinear Functionals and Operators by Radial Basis Function Neural Networks," *IEEE Transactions on Neural Networks*, Vol. 6, pp. 904-910.

Chen, T., and H. Chen, 1995, "Universal Approximation to Nonlinear Operators by Neural Networks with Arbitrary Activation Functions and Its Application to Dynamical Systems," *IEEE Transactions on Neural Networks*, Vol. 6, pp. 911-917.

Cybenko, G., 1989, "Approximation by Superpositions of a Sigmoidal Function," *Mathematics of Control, Signals and Systems*, Vol. 2, No. 4, pp. 303-314.

Funahashi, K., 1989, "On the Approximate Realization of Continuous Mappings by Neural Networks," *Neural Networks*, Vol. 2, pp. 183-192.

Hornik, K., 1991, "Approximation Capabilities of Multilayer Feedforward Networks," *Neural Networks*, Vol. 4, pp. 251-257.

Kerlirzin, P., and P. Réfrégier, 1995, "Theoretical Investigation of the Robustness of Multilayer Perceptrons: Analysis of the Linear Case and Extension to Nonlinear Networks," *IEEE Transactions on Neural Networks*, Vol. 6, pp. 560-571.

Krasnopolsky, V., 1997, "Neural Networks for Standard and Variational Satellite Retrievals," Technical Note, OMB Contribution No. 148, NCEP/NOAA.
