Radial Basis Functions


M.J.D. Powell

February 10, 2005

Professor Mike J. D. Powell spent three weeks at IMM in November – December 2004.

During the visit he gave five lectures on radial basis functions. These notes are a TeXified version of his hand-outs, made by Hans Bruun Nielsen, IMM.


Lecture 1

Interpolation in one dimension

Let values of $f : \mathbb{R} \to \mathbb{R}$ be given.

Figure 1.1. Data points (the values $f(x)$ plotted at the abscissae $x_1, x_2, x_3, \ldots, x_n$).

The data are $f(x_i)$, $i = 1, 2, \ldots, n$, where $x_1 < x_2 < \cdots < x_n$.

Pick a function $s(x)$, $x_1 \le x \le x_n$, such that $s(x_i) = f(x_i)$, $i = 1, 2, \ldots, n$.

Piecewise linear interpolation. Let $s$ be a linear polynomial on each of the intervals $[x_i, x_{i+1}]$, $i = 1, 2, \ldots, n-1$. Then $s$ can be expressed in the form

$$s(x) = \sum_{j=1}^{n} \lambda_j |x - x_j|, \qquad x_1 \le x \le x_n .$$

To see that, define $\lambda_j$, $j = 2, 3, \ldots, n-1$, by the first derivative discontinuities of $s$, and then pick $\lambda_1$ and $\lambda_n$ to satisfy $s(x_1) = f(x_1)$ and $s(x_n) = f(x_n)$.
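The representation above can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the function names and the sample data are illustrative and are not taken from the hand-outs. It solves for the coefficients $\lambda_j$ and compares the result with ordinary piecewise linear interpolation.

```python
import numpy as np

def linear_rbf_coefficients(x, f):
    """Solve sum_j lam_j |x_i - x_j| = f_i for lam (x strictly increasing)."""
    x = np.asarray(x, dtype=float)
    Phi = np.abs(x[:, None] - x[None, :])        # Phi_ij = |x_i - x_j|
    return np.linalg.solve(Phi, np.asarray(f, dtype=float))

def evaluate(x_nodes, lam, t):
    """Evaluate s(t) = sum_j lam_j |t - x_j| at the points t."""
    t = np.atleast_1d(np.asarray(t, dtype=float))
    return np.abs(t[:, None] - np.asarray(x_nodes)[None, :]) @ lam

# Illustrative data (hypothetical values).
x_nodes = np.array([0.0, 0.3, 0.7, 1.2, 2.0])
f_vals = np.array([1.0, -0.5, 0.25, 2.0, 0.0])

lam = linear_rbf_coefficients(x_nodes, f_vals)
t = np.linspace(x_nodes[0], x_nodes[-1], 101)
s_vals = evaluate(x_nodes, lam, t)

# Each interior lam_j is half the jump in slope at x_j.
slopes = np.diff(f_vals) / np.diff(x_nodes)
half_jumps = 0.5 * np.diff(slopes)
```

Here `s_vals` should agree with `np.interp(t, x_nodes, f_vals)`, and `lam[1:-1]` with `half_jumps`, because the derivative of $\lambda_j |x - x_j|$ jumps by $2\lambda_j$ at $x_j$.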

Extension to two dimensions

Now $f : \mathbb{R}^2 \to \mathbb{R}$ and the data are $f(x_i)$, $i = 1, 2, \ldots, n$, where $x_i = (\xi_i, \eta_i)^T \in \mathbb{R}^2$.

Figure 1.2. Points in 2D and triangulation (axes $\xi$ and $\eta$).

Method 1. Form a triangulation and then employ linear polynomial interpolation on each triangle.

Method 2. Set

$$s(x) = \sum_{j=1}^{n} \lambda_j \|x - x_j\|, \qquad x \in \mathbb{R}^2 ,$$

where the coefficients $\lambda_j$, $j = 1, 2, \ldots, n$, have to satisfy $s(x_i) = f(x_i)$, $i = 1, 2, \ldots, n$.


Clearly, $s$ is different in the two cases; one way of showing this is to consider where the gradient $\nabla s$ is discontinuous.

Higher dimensions

Let $f : \mathbb{R}^d \to \mathbb{R}$ for some positive integer $d$. Method 2, but not Method 1, allows large values of $d$.

Radial basis function interpolation

Pick a function $\phi(r)$, $r \ge 0$, for example $\phi(r) = r$. Then let $s$ have the form

$$s(x) = \sum_{j=1}^{n} \lambda_j \phi(\|x - x_j\|), \qquad x \in \mathbb{R}^d ,$$

where $\|\cdot\|$ denotes the Euclidean norm (the vector 2-norm). The parameters $\lambda_j$ should satisfy

$$\Phi \lambda = f , \qquad (1.1)$$

where $\Phi$ is the $n \times n$ matrix with elements

$$\Phi_{ij} = \phi(\|x_i - x_j\|), \qquad 1 \le i, j \le n ,$$

and where $f \in \mathbb{R}^n$ has the components $f(x_i)$, $i = 1, 2, \ldots, n$. Does (1.1) have a unique solution, i.e. is $\Phi$ nonsingular?
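For concreteness, the system (1.1) can be assembled and solved in a few lines. This is only a sketch under stated assumptions: NumPy is used, the helper names `rbf_matrix`, `fit_rbf` and `eval_rbf` are hypothetical, and the Gaussian parameter $\alpha = 4$ and the data are arbitrary illustrative choices.

```python
import numpy as np

def rbf_matrix(phi, X, Y):
    """Matrix with entries phi(||X_i - Y_j||) for the rows of X and Y."""
    diff = X[:, None, :] - Y[None, :, :]
    return phi(np.sqrt((diff ** 2).sum(axis=-1)))

def fit_rbf(phi, X, f):
    """Solve the interpolation equations Phi lam = f, which is (1.1)."""
    return np.linalg.solve(rbf_matrix(phi, X, X), f)

def eval_rbf(phi, X, lam, T):
    """Evaluate s at the rows of T."""
    return rbf_matrix(phi, T, X) @ lam

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(20, 3))      # 20 distinct points in R^3
f = np.sin(X.sum(axis=1))                     # illustrative data values
gaussian = lambda r: np.exp(-4.0 * r ** 2)    # alpha = 4 is an arbitrary choice

lam = fit_rbf(gaussian, X, f)
residual = np.max(np.abs(eval_rbf(gaussian, X, lam, X) - f))
```

The dense solve costs $O(n^3)$ operations, so the sketch is only suitable for small $n$.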

In these lectures we consider the following cases of radial basis functions, where $\alpha$ and $c$ are given positive constants.

Gaussian               $\phi(r) = e^{-\alpha r^2}$,        $r \ge 0$, $\alpha > 0$
inverse multiquadric   $\phi(r) = (r^2 + c^2)^{-1/2}$,     $r \ge 0$, $c > 0$
linear                 $\phi(r) = r$,                      $r \ge 0$
multiquadric           $\phi(r) = (r^2 + c^2)^{1/2}$,      $r \ge 0$, $c > 0$
thin plate spline      $\phi(r) = r^2 \log r$,             $r \ge 0$
cubic                  $\phi(r) = r^3$,                    $r \ge 0$

Table 1.1. Cases of radial basis functions.

The inverse multiquadric case is illustrated on the front page. We do not consider $\phi(r) = r^2$, $r \ge 0$, because then $s(x) = \sum_{j=1}^{n} \lambda_j \phi(\|x - x_j\|)$, $x \in \mathbb{R}^d$, would always be a polynomial of degree at most two, so there would not be enough freedom to interpolate the values $f(x_i)$, $i = 1, 2, \ldots, n$, for sufficiently large $n$.

In the Gaussian case we have the following theorem.

Theorem 1.1. If $d$ is any positive integer and if the points $x_i \in \mathbb{R}^d$, $i = 1, 2, \ldots, n$, are all different, then the matrix $\Phi$ with elements $\Phi_{ij} = e^{-\alpha \|x_i - x_j\|^2}$ is positive definite.


Remarks. Each diagonal element of Φ is one. If the points are fixed and α is increased, then Φ becomes diagonally dominant.

The method of proof employs contour integration.
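Theorem 1.1 and the remarks can be illustrated numerically. The sketch below assumes NumPy; the eight points and the two values of $\alpha$ are hypothetical choices made only for the demonstration.

```python
import numpy as np

def gaussian_matrix(X, alpha):
    """Phi_ij = exp(-alpha ||x_i - x_j||^2)."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-alpha * sq)

# Eight distinct points in the unit square (illustrative values).
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0],
              [0.5, 0.5], [0.25, 0.75], [0.75, 0.2], [0.1, 0.6]])

# All eigenvalues should be positive (positive definiteness).
eigenvalues = np.linalg.eigvalsh(gaussian_matrix(X, 5.0))

# For a large alpha the off-diagonal row sums fall below the unit diagonal.
Phi_large = gaussian_matrix(X, 500.0)
off_diagonal_sums = Phi_large.sum(axis=1) - np.diag(Phi_large)
```

A numerical check of this kind is not a proof, of course; it only shows the claimed behaviour for one set of points.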

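Corollary 1.2. If $\phi$ can be written in the form

$$\phi(r) = \int_0^\infty w(\alpha) \, e^{-\alpha r^2} \, d\alpha ,$$

where $w(\alpha) \ge 0$ for $\alpha \ge 0$ and $\int_\varepsilon^\infty w(\alpha) \, d\alpha > 0$ for some $\varepsilon > 0$, then $\Phi$ is positive definite.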

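Proof. If $v \in \mathbb{R}^n$, we can write

$$v^T \Phi v = \sum_{i=1}^{n} \sum_{j=1}^{n} v_i v_j \phi(\|x_i - x_j\|) = \int_0^\infty w(\alpha) \sum_{i=1}^{n} \sum_{j=1}^{n} v_i v_j e^{-\alpha \|x_i - x_j\|^2} \, d\alpha ,$$

which is positive for $v \ne 0$.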

Example 1.1. The matrix $\Phi$ corresponding to inverse multiquadric rbf interpolation is positive definite because of the identity

$$\int_0^\infty \alpha^{-1/2} e^{-\alpha (r^2 + c^2)} \, d\alpha = \sqrt{\pi} \, (r^2 + c^2)^{-1/2}, \qquad r \ge 0 .$$
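The identity of Example 1.1 can be confirmed by quadrature. The sketch below uses only the Python standard library; the substitution $\alpha = u^2$, the step size and the truncation width are choices made here for illustration, not part of the hand-outs.

```python
import math

def weight_integral(r, c, h=0.01, half_width=12.0):
    """Approximate the integral of alpha^(-1/2) exp(-alpha (r^2 + c^2)), alpha > 0.

    The substitution alpha = u^2 turns it into the integral of
    exp(-(r^2 + c^2) u^2) over the whole real line, which the
    trapezoidal rule approximates very accurately.
    """
    s = r * r + c * c
    n = int(half_width / h)
    return h * sum(math.exp(-s * (k * h) ** 2) for k in range(-n, n + 1))

def closed_form(r, c):
    """Right-hand side of the identity: sqrt(pi) (r^2 + c^2)^(-1/2)."""
    return math.sqrt(math.pi) / math.sqrt(r * r + c * c)
```

For example, `weight_integral(1.0, 0.5)` and `closed_form(1.0, 0.5)` should agree to many digits.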

Completely monotonic functions

If $\phi(r) = \int_0^\infty w(\alpha) e^{-\alpha r^2} \, d\alpha$, $r \ge 0$, where $w \ge 0$, we consider the derivatives of the function

$$\psi(r) = \phi(r^{1/2}) = \int_0^\infty w(\alpha) \, e^{-\alpha r} \, d\alpha .$$

We find $\psi(r) \ge 0$ and $(-1)^k \psi^{(k)}(r) \ge 0$, $r \ge 0$, for all positive integers $k$. Such a function is called a “completely monotonic function”.

Theorem 1.3. $\phi(r)$ can be expressed as $\int_0^\infty w(\alpha) e^{-\alpha r^2} \, d\alpha$, where $w \ge 0$, if and only if $\psi(r) = \phi(r^{1/2})$, $r \ge 0$, is completely monotonic.

Example 1.2. $\phi(r) = e^{-\beta r^2} \Rightarrow \psi(r) = e^{-\beta r}$, which is completely monotonic for fixed $\beta > 0$.

$\phi(r) = (r^2 + c^2)^{-1/2} \Rightarrow \psi(r) = (r + c^2)^{-1/2}$, which is also completely monotonic.

Linear rbf. Now $\psi(r) = r^{1/2}$, $r \ge 0$, and we find $(-1)^k \psi^{(k)}(r) < 0$, $r > 0$, for all positive integers $k$. Hence the function $M - \psi(r)$, $r \ge 0$, “tends” to be completely monotonic for large enough $M$. Thus the matrix with elements $M - \|x_i - x_j\|$, $1 \le i, j \le n$, is positive definite for large enough $M$.

Lemma 1.4. If $\Phi_{ij} = \|x_i - x_j\|$, $1 \le i, j \le n$, and if $v$ is a nonzero vector in $U = \{ v \in \mathbb{R}^n \mid \sum_{i=1}^{n} v_i = 0 \}$, then $v^T \Phi v < 0$.


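Proof. For sufficiently large $M$ we have

$$0 < \sum_{i=1}^{n} \sum_{j=1}^{n} v_i (M - \|x_i - x_j\|) v_j = -v^T \Phi v ,$$

the equation holding because $\sum_{i=1}^{n} v_i = 0$ removes the terms that include $M$.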

Theorem 1.5. Let $d$ be any positive integer and $n \ge 2$. If the points $x_i \in \mathbb{R}^d$, $i = 1, 2, \ldots, n$, are all different, then the $n \times n$ matrix $\Phi$ with the elements $\Phi_{ij} = \|x_i - x_j\|$ is nonsingular. It has one positive and $n-1$ negative eigenvalues.

Proof. According to the lemma, $v^T \Phi v < 0$ for any nonzero $v \in U \subset \mathbb{R}^n$. The dimension of $U$ is $n-1$, so there are $n-1$ negative eigenvalues. Moreover, $\Phi_{ii} = 0$, $i = 1, 2, \ldots, n$, implies that the sum of the eigenvalues is zero. Therefore one of the eigenvalues is positive.
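The eigenvalue pattern of Theorem 1.5 and the inequality of Lemma 1.4 are easy to observe numerically. The following sketch assumes NumPy; the random points, the dimension $d = 4$ and the seed are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(12, 4))                   # 12 distinct points in R^4

# Distance matrix Phi_ij = ||x_i - x_j||.
Phi = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
eigenvalues = np.linalg.eigvalsh(Phi)          # returned in ascending order

# A vector in U: subtracting the mean makes the components sum to zero.
v = rng.normal(size=12)
v -= v.mean()
quadratic_form = v @ Phi @ v                   # Lemma 1.4 says this is negative
```

The trace of `Phi` is zero, so the eigenvalues should sum to zero up to rounding, which is the last step of the proof.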

Multiquadric case

Now $\psi(r) = (r + c^2)^{1/2}$, $r \ge 0$, and $(-1)^k \psi^{(k)}(r) < 0$, $r \ge 0$, for all positive integers $k$.

Hence $\Phi$ has $n-1$ negative eigenvalues and one positive eigenvalue, as before.

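Multiquadrics with a constant term

Let $s$ have the form

$$s(x) = \sum_{j=1}^{n} \lambda_j \left( \|x - x_j\|^2 + c^2 \right)^{1/2} + \mu, \qquad x \in \mathbb{R}^d ,$$

with the constraint $\sum_{j=1}^{n} \lambda_j = 0$, which we write as $1^T \lambda = 0$, where $1$ is the $n$-vector of all ones. Now the parameters have to satisfy the equations

$$\begin{pmatrix} \Phi & 1 \\ 1^T & 0 \end{pmatrix} \begin{pmatrix} \lambda \\ \mu \end{pmatrix} = \begin{pmatrix} f \\ 0 \end{pmatrix} . \qquad (1.2)$$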

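The matrix is nonsingular. Indeed, if $(\lambda^T, \mu)^T$ is in its nullspace, then $1^T \lambda = 0$ holds, and premultiplication of $\Phi \lambda + \mu 1 = 0$ by $\lambda^T$ gives $\lambda^T \Phi \lambda + \mu \lambda^T 1 = \lambda^T \Phi \lambda = 0$. Since $\lambda^T \Phi \lambda$ is negative for every nonzero $\lambda$ that satisfies $1^T \lambda = 0$ (the multiquadric analogue of Lemma 1.4), $\lambda = 0$ holds, which implies $\mu = 0$.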

This form of multiquadric interpolation will be important in lectures two and three.


Lecture 2

The integer m

Given $\phi(r)$, $r \ge 0$, we define $\psi(r) = \phi(\sqrt{r})$, $r \ge 0$. We write $\psi^{(0)} \equiv \psi$, and for each positive integer $k$ we write $\psi^{(k)}(r)$, $r \ge 0$, for the $k$th derivative of $\psi$. The “integer $m$ of $\phi$” is the least nonnegative integer such that the sign of $(-1)^k \psi^{(k)}(r)$, $r > 0$, $k = m, m+1, m+2, \ldots$, is independent of $r$ and $k$. Let $\sigma$ denote this sign.

Example 2.1. In the cases given in Table 1.1 the integer $m$ takes the values 0, 0, 1, 1, 2, 2, respectively, and the values of $\sigma$ are 1, 1, $-1$, $-1$, 1, 1.
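As a worked instance of the definition, the thin plate spline entry of Example 2.1 can be checked directly. The short derivation below is an added illustration and is not part of the hand-outs.

```latex
\begin{align*}
\phi(r) &= r^2 \log r \;\Longrightarrow\; \psi(r) = \phi(\sqrt{r}) = \tfrac12\, r \log r, \\
\psi'(r) &= \tfrac12 (\log r + 1), \qquad \psi''(r) = \frac{1}{2r}, \\
\psi^{(k)}(r) &= \tfrac12 (-1)^k (k-2)!\, r^{-(k-1)}, \qquad k \ge 2, \\
(-1)^k \psi^{(k)}(r) &= \tfrac12 (k-2)!\, r^{-(k-1)} > 0, \qquad r > 0,\ k \ge 2 .
\end{align*}
```

The sign of $-\psi'(r)$ changes at $r = e^{-1}$, so neither $m = 0$ nor $m = 1$ satisfies the definition, while the last line shows that $m = 2$ does, with $\sigma = 1$.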

The following theorem gives a useful property of $m$. It holds for points $x \in \mathbb{R}^d$ for any positive integer value of $d$. $\Pi_{m-1}$ is the space of polynomials of degree at most $m-1$ from $\mathbb{R}^d$ to $\mathbb{R}$.

Theorem 2.1. Let the points $x_i \in \mathbb{R}^d$, $i = 1, 2, \ldots, n$, be different, where $d$ is any positive integer, let $\Phi$ have the elements $\phi(\|x_i - x_j\|)$, $1 \le i, j \le n$, for some choice of $\phi$ such that the integer $m$ exists, and let $v \in \mathbb{R}^n$ satisfy $\sum_{i=1}^{n} v_i p(x_i) = 0$, $p \in \Pi_{m-1}$, except that $v$ is unconstrained in the case $m = 0$. Then $v^T \Phi v$ is nonzero if $v$ is nonzero, and its sign is equal to $\sigma$ in the definition of the integer $m$.

Proof. Some advanced analysis is required, using the theory of completely monotonic functions; see the material on completely monotonic functions in Lecture 1 (pp 4–5).

Rbf interpolation including low order polynomials

Given $\phi$, we seek $m$, and, if $m = 0$, we apply radial basis function interpolation as before.

Otherwise, we let $s$ have the form

$$s(x) = \sum_{j=1}^{n} \lambda_j \phi(\|x - x_j\|) + p(x), \qquad x \in \mathbb{R}^d , \qquad (2.1)$$

with the constraints $\sum_{j=1}^{n} \lambda_j q(x_j) = 0$, $q \in \Pi_{m-1}$, and $p$ chosen from $\Pi_{m-1}$. Now the interpolation conditions $s(x_i) = f(x_i)$, $i = 1, 2, \ldots, n$, are satisfied by a unique $s$ for general right hand sides, if the points $x_i$, $i = 1, 2, \ldots, n$, are different, and if $q \in \Pi_{m-1}$ with $q(x_i) = 0$, $i = 1, 2, \ldots, n$, imply that $q$ is zero.

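Proof of the last assertion. We express the polynomial term in the form

$$p(x) = \sum_{j=1}^{\hat m} \mu_j b_j(x), \qquad x \in \mathbb{R}^d ,$$

where $\{ b_j : j = 1, 2, \ldots, \hat m \}$ is a basis of $\Pi_{m-1}$, and we let $B$ be the $\hat m \times n$ matrix with elements $B_{ij} = b_i(x_j)$, $i = 1, 2, \ldots, \hat m$, $j = 1, 2, \ldots, n$. Then the interpolation conditions and the constraints give the linear system of equations

$$\begin{pmatrix} \Phi & B^T \\ B & 0 \end{pmatrix} \begin{pmatrix} \lambda \\ \mu \end{pmatrix} = \begin{pmatrix} f \\ 0 \end{pmatrix} . \qquad (2.2)$$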

It is sufficient to prove that the matrix of the system is nonsingular, so we suppose that $(v^T, w^T)^T$ is in its nullspace, which is equivalent to

$$\Phi v + B^T w = 0 \quad \text{and} \quad B v = 0 .$$

Thus we find $v^T \Phi v = 0$, so Theorem 2.1 gives $v = 0$. It now follows that $B^T w = 0$, so the polynomial $\sum_{j=1}^{\hat m} w_j b_j(x)$, $x \in \mathbb{R}^d$, vanishes at $x_j$, $j = 1, 2, \ldots, n$. Hence the condition at the end of the assertion gives $w = 0$, which completes the proof.
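The system (2.2) is straightforward to assemble. The sketch below treats the thin plate spline in two dimensions, for which $m = 2$ and a basis of $\Pi_1$ is $1, \xi, \eta$. It assumes NumPy; the function names, the random points and the data are illustrative choices, not part of the hand-outs.

```python
import numpy as np

def tps(r):
    """Thin plate spline phi(r) = r^2 log r, with the limit value phi(0) = 0."""
    out = np.zeros_like(r)
    mask = r > 0
    out[mask] = r[mask] ** 2 * np.log(r[mask])
    return out

def distances(X, Y):
    return np.sqrt(((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1))

def fit_tps(X, f):
    """Solve system (2.2) for m = 2 in two dimensions (basis 1, xi, eta)."""
    n = X.shape[0]
    Phi = tps(distances(X, X))
    B = np.vstack([np.ones(n), X[:, 0], X[:, 1]])    # B_ij = b_i(x_j)
    A = np.block([[Phi, B.T], [B, np.zeros((3, 3))]])
    sol = np.linalg.solve(A, np.concatenate([f, np.zeros(3)]))
    return sol[:n], sol[n:]                          # lambda and mu

def eval_tps(X, lam, mu, T):
    """Evaluate s at the rows of T, including the linear polynomial."""
    return tps(distances(T, X)) @ lam + mu[0] + T @ mu[1:]

rng = np.random.default_rng(3)
X = rng.uniform(0.0, 1.0, size=(15, 2))              # illustrative points
f = np.cos(3.0 * X[:, 0]) + X[:, 1] ** 2             # illustrative data
lam, mu = fit_tps(X, f)
```

When the data come from a linear polynomial, the solution has $\lambda = 0$ and $p$ reproduces the data exactly, in agreement with the uniqueness assertion.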

Example 2.2. In the multiquadric case $m = 1$, which implies $\hat m = 1$, we choose $b_1(x) = 1$ as the basis function of $\Pi_0$. This leads to $B = 1^T$, and we see that in this case (2.2) is identical to (1.2).

The native scalar product and semi-norm

The constant sign of $v^T \Phi v$ that has been mentioned provides the following “native” scalar product and semi-norm, the sign being $(-1)^m$ in all the cases that we consider. We let $S$ be the linear space of all functions of the form

$$t(x) = \sum_{j=1}^{k} \mu_j \phi(\|x - y_j\|) + q(x), \qquad x \in \mathbb{R}^d ,$$

where $k$ is any finite positive integer, where the points $y_j$ are all different and can be anywhere in $\mathbb{R}^d$, where $q$ is any element of $\Pi_{m-1}$, and where the real numbers $\mu_j$ can take any values that satisfy $\sum_{j=1}^{k} \mu_j p(y_j) = 0$, $p \in \Pi_{m-1}$. The function

$$s(x) = \sum_{i=1}^{n} \lambda_i \phi(\|x - x_i\|) + p(x), \qquad x \in \mathbb{R}^d ,$$

is also in $S$, and we define the scalar product

$$\langle s, t \rangle_\phi = (-1)^m \sum_{i=1}^{n} \sum_{j=1}^{k} \lambda_i \phi(\|x_i - y_j\|) \mu_j , \qquad s, t \in S .$$

Further, because $\langle s, s \rangle_\phi$ is nonnegative by Theorem 2.1, we define the semi-norm

$$\|s\|_\phi = \sqrt{\langle s, s \rangle_\phi}, \qquad s \in S .$$

Thus $\|s\|_\phi$ is zero if and only if all the coefficients $\lambda_i$, $i = 1, 2, \ldots, n$, of $s \in S$ are zero.

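An identity for the scalar product

$$\langle s, t \rangle_\phi = (-1)^m \sum_{i=1}^{n} \lambda_i t(x_i) = (-1)^m \sum_{j=1}^{k} \mu_j s(y_j) ,$$

the first part being proved as follows:

$$(-1)^m \sum_{i=1}^{n} \lambda_i t(x_i) = (-1)^m \sum_{i=1}^{n} \lambda_i \Big\{ \sum_{j=1}^{k} \mu_j \phi(\|x_i - y_j\|) + q(x_i) \Big\} = (-1)^m \sum_{i=1}^{n} \sum_{j=1}^{k} \lambda_i \phi(\|x_i - y_j\|) \mu_j = \langle s, t \rangle_\phi ,$$

where the terms in $q$ vanish because of the constraints $\sum_{i=1}^{n} \lambda_i q(x_i) = 0$, $q \in \Pi_{m-1}$.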
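With $t = s$, the identity says that $\|s\|_\phi^2$ can be computed either from the double sum or from the data values alone. The sketch below checks this for multiquadric interpolation with a constant term ($m = 1$). It assumes NumPy; the function name, the value $c = 0.5$ and the data are illustrative.

```python
import numpy as np

def fit_multiquadric(X, f, c):
    """Solve (1.2): multiquadric interpolation with a constant term (m = 1)."""
    n = X.shape[0]
    r2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    Phi = np.sqrt(r2 + c * c)
    A = np.block([[Phi, np.ones((n, 1))], [np.ones((1, n)), np.zeros((1, 1))]])
    sol = np.linalg.solve(A, np.append(f, 0.0))
    return Phi, sol[:n], sol[n]

rng = np.random.default_rng(4)
X = rng.uniform(-1.0, 1.0, size=(10, 2))             # illustrative points
f = np.exp(X[:, 0]) * np.sin(2.0 * X[:, 1])          # illustrative data
Phi, lam, mu = fit_multiquadric(X, f, c=0.5)

m = 1
norm_sq_double_sum = (-1) ** m * (lam @ Phi @ lam)   # definition of <s, s>
norm_sq_identity = (-1) ** m * (lam @ f)             # identity, using s(x_i) = f(x_i)
```

The second expression needs only the coefficients and the data, which is what makes the semi-norm cheap to evaluate in the algorithms of the later lectures.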


Rbf interpolation is “best”

Let $s \in S$ be the given rbf interpolant to the data $f(x_i)$, $i = 1, 2, \ldots, n$, of the form

$$s(x) = \sum_{j=1}^{n} \lambda_j \phi(\|x - x_j\|) + p(x), \qquad x \in \mathbb{R}^d .$$

There are infinitely many interpolants to the data from $S$, but the following argument shows that $s$ is the function in $S$ that minimizes $\|s\|_\phi$ subject to the interpolation conditions.

Any other interpolant can be written as $s + t$, where $t$ is in $S$ and satisfies $t(x_i) = 0$, $i = 1, 2, \ldots, n$. Thus the scalar product identity gives the inequality

$$\|s + t\|_\phi^2 = \|s\|_\phi^2 + \|t\|_\phi^2 + 2 \langle s, t \rangle_\phi = \|s\|_\phi^2 + \|t\|_\phi^2 + 2 (-1)^m \sum_{i=1}^{n} \lambda_i t(x_i) = \|s\|_\phi^2 + \|t\|_\phi^2 \ge \|s\|_\phi^2 .$$

The inequality is strict unless $\|t\|_\phi$ is zero, and then $t$ is in $\Pi_{m-1}$, which implies $t = 0$, assuming the condition stated soon after equation (2.1).

Thin plate spline interpolation in two dimensions

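We consider this “best” property in the thin plate spline case $\phi(r) = r^2 \log r$, $r \ge 0$, when $d = 2$, the value of $m$ being 2. Let $\xi$ and $\eta$ denote the components of $x \in \mathbb{R}^2$, and, for $s \in S$, let $I(s)$ be the integral

$$I(s) = \iint_{\mathbb{R}^2} \left[ \left( \frac{\partial^2 s}{\partial \xi^2} \right)^2 + 2 \left( \frac{\partial^2 s}{\partial \xi \, \partial \eta} \right)^2 + \left( \frac{\partial^2 s}{\partial \eta^2} \right)^2 \right] d\xi \, d\eta ,$$

which is finite, because the coefficients of $s$ satisfy $\sum_{j=1}^{n} \lambda_j q(x_j) = 0$, $q \in \Pi_1$. By applying integration by parts to $I(s)$, one can deduce

$$I(s) = 8\pi \sum_{i=1}^{n} \lambda_i s(x_i) = 8\pi \|s\|_\phi^2 .$$

Thus the rbf interpolant $s$ is the solution to the following problem: Find the function $s$ with square integrable second derivatives that minimizes $I(s)$ subject to $s(x_i) = f(x_i)$, $i = 1, 2, \ldots, n$.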

The rbf interpolation method usually gives good results in practice, and the “best” property may be the main reason for this success. We now turn to a technique for global optimization that uses the “best” property to implement an attractive idea.


The idea for global optimization in one dimension

This idea is due to Don Jones as far as I know.

Let a function $f(x)$, $0 \le x \le 1$, take the four values that are shown, the crosses being the points (0, 1), (0.5, 0), (0.6, 0) and (1, 2).

[Figure: the four data points marked by crosses, with the labels $f = 1$ and $f = 2$ and the horizontal axis $x$.]

We seek the minimum of $f$ using only calculated values of $f$. Therefore, when only $f(x_i)$, $i = 1, 2, \ldots, n$, are available, we have to pick $x_{n+1}$, the case $n = 4$ being depicted. This task becomes less daunting if we assume that $f^* = \min \{ f(x), \ 0 \le x \le 1 \}$ is known. For example, we would pick $x_5$ from $[0.5, 0.6]$ or $[0, 0.5]$ in the cases $f^* = -0.1$ or $f^* = -10$, respectively. Specifically, we are trying to choose $x_5$ so that the data $f(x_i)$, $i = 1, 2, 3, 4$, and the hypothetical value $f(x_5) = f^*$ can be interpolated by as smooth a curve as possible, for the current choice of $f^*$. The rbf interpolation method is suitable, because we take the view that $s$ is smoothest if $\|s\|_\phi$ is least.

An outline of the procedure for global optimization

We seek the least value of $f(x)$, $x \in D \subset \mathbb{R}^d$. We choose one of the functions $\phi$ that have been mentioned, in order to apply rbf interpolation, including the polynomial term $p \in \Pi_{m-1}$. We pick a few well-separated points $x_i$, $i = 1, 2, \ldots, n$, for the first iteration such that $q \in \Pi_{m-1}$ and $q(x_i) = 0$, $i = 1, 2, \ldots, n$, imply $q = 0$. Then the values $f(x_i)$, $i = 1, 2, \ldots, n$, are calculated. Let $s \in S$ be the usual interpolant to these values, which minimizes $\|s\|_\phi$. We are now ready to begin the first iteration.

On each iteration, $f^*$ is set to a value that has the property $f^* < \min \{ s(x) : x \in D \}$. For each $y \in D \setminus \{ x_1, x_2, \ldots, x_n \}$, we define $s_y$ to be the element of $S$ that minimizes $\|s_y\|_\phi$ subject to $s_y(x_i) = f(x_i)$, $i = 1, 2, \ldots, n$, and $s_y(y) = f^*$. We seek $y^*$, say, which is the $y$ that minimizes $\|s_y\|_\phi$. The value $x_{n+1} = y^*$ is chosen and $f(x_{n+1})$ is calculated. Further, $s$ is changed to the best interpolant to $f(x_i)$, $i = 1, 2, \ldots, n+1$. Then $n$ is increased by one for the next iteration.

On some iterations, $f^*$ is close to $\min \{ s(x) : x \in D \}$, and on other iterations we pick $f^* = -\infty$. Thus the method combines local convergence properties with exploration of regions where $f$ has not yet been calculated.

Further remarks. For each $y$, the dependence of $s_y$ on $f^*$ is shown explicitly by the formula

$$s_y(x) = s(x) + \{ f^* - s(y) \} \, \ell_y(x), \qquad x \in \mathbb{R}^d ,$$

where $\ell_y(x)$, $x \in \mathbb{R}^d$, is the function of least norm in $S$ that satisfies $\ell_y(x_i) = 0$, $i = 1, 2, \ldots, n$, and $\ell_y(y) = 1$. Further, the scalar product identity provides $\langle s, s - s_y \rangle_\phi = 0$ and the formula

$$\|s_y\|_\phi^2 = \|s\|_\phi^2 + \|s - s_y\|_\phi^2 = \|s\|_\phi^2 + \{ f^* - s(y) \}^2 \|\ell_y\|_\phi^2 .$$

Therefore it may be helpful to calculate $s(y)$ and $\|\ell_y\|_\phi$ for several values of $y$ before choosing $f^*$. We see also that, if $f^* = -\infty$, then $y^*$ is the value of $y$ that minimizes $\|\ell_y\|_\phi$.
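The formula for $\|s_y\|_\phi^2$ can be tried on the one-dimensional example with four data points. The sketch below is an illustration under stated assumptions: it uses the cubic $\phi(r) = r^3$ with a linear polynomial ($m = 2$), a simple grid of candidate points $y$, and NumPy; none of these choices, nor the function names, come from the hand-outs.

```python
import numpy as np

def fit_cubic(x, f):
    """Cubic rbf interpolant in one dimension with a linear polynomial (m = 2).

    Returns lam, the polynomial coefficients mu, and the squared semi-norm,
    which is (-1)^m * sum_i lam_i f_i with m = 2.
    """
    n = len(x)
    Phi = np.abs(x[:, None] - x[None, :]) ** 3
    P = np.column_stack([np.ones(n), x])
    A = np.block([[Phi, P], [P.T, np.zeros((2, 2))]])
    sol = np.linalg.solve(A, np.concatenate([f, np.zeros(2)]))
    lam, mu = sol[:n], sol[n:]
    return lam, mu, float(lam @ f)

def evaluate(x, lam, mu, y):
    """Value s(y) of the interpolant with centres x."""
    return float(np.abs(y - x) ** 3 @ lam + mu[0] + mu[1] * y)

x = np.array([0.0, 0.5, 0.6, 1.0])        # the four points of the example
f = np.array([1.0, 0.0, 0.0, 2.0])
lam, mu, norm_s_sq = fit_cubic(x, f)

def norm_sy_sq(y, f_star):
    """Squared semi-norm of s_y, which also takes the value f_star at y."""
    return fit_cubic(np.append(x, y), np.append(f, f_star))[2]

def norm_ly_sq(y):
    """Squared semi-norm of l_y: zero at the data points and one at y."""
    return fit_cubic(np.append(x, y), np.append(np.zeros(len(x)), 1.0))[2]

# Grid search for y*, skipping candidates that coincide with data points.
candidates = [y for y in np.linspace(0.01, 0.99, 99)
              if np.min(np.abs(x - y)) > 1e-6]
f_star = -0.1
y_star = min(candidates, key=lambda y: norm_sy_sq(y, f_star))
```

A grid search is used only for simplicity. Each call refits the interpolant from scratch, whereas the formula suggests computing $s(y)$ and $\|\ell_y\|_\phi$ once per candidate and reusing them for several values of $f^*$.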


Local adjustments to $y$ can be based on the property that, if $\hat y \in D$ satisfies $s_y(\hat y) < s_y(y)$, then the replacement of $y$ by $\hat y$ gives a reduction in $\|s_y\|_\phi$.

If $D$ is bounded, and if $f^* = -\infty$ is chosen on every third iteration, say, then the points $x_i$, $i = 1, 2, \ldots, n$, tend to become dense in $D$ as $n \to \infty$. This observation is of hardly any value in practice, but it does provide a theoretical proof of convergence to the least value of $f(x)$, $x \in D$, when the objective function is continuous.

Further information about the algorithm can be found in “A radial basis function method for global optimization” by H-M. Gutmann, Journal of Global Optimization, 19, pp 201–207 (2001).


Lecture 3

Rbf interpolation when n is large

Imagine that a radial basis function interpolant $s(x)$, $x \in \mathbb{R}^2$, is required to 10,000 values of a function $f$ on the unit square, the distribution of the data points being nearly uniform.

Then, when $x \in \mathbb{R}^2$ is a general point that is well inside the square, the value of $s(x)$ should depend mainly on values of $f$ at interpolation points that are close to $x$. Therefore it should be possible to construct a good estimate of $s(x)$ from a small subset of the data.

Further, it may be possible to construct an adequate approximation to $s(x)$, $x \in \mathbb{R}^2$, in only $O(n)$ operations, where $n$ is still the number of given values of $f$. On the other hand, the work of deriving the coefficients of $s$ directly from the interpolation equations would require $O(n^3)$ work, because there is no useful sparsity in the matrix $\Phi$ that has the elements $\Phi_{ij} = \phi(\|x_i - x_j\|)$, $1 \le i, j \le n$.

Therefore some iterative procedures have been developed for the calculation of s.

This lecture will describe a successful one that has been the subject of much research at Cambridge. It is a Krylov subspace method that employs the semi-norm that we studied in Lecture 2.

Notation. As in the previous lectures, we wish to interpolate the values $f(x_i)$, $i = 1, 2, \ldots, n$, of a function $f : \mathbb{R}^d \to \mathbb{R}$ by a function of the form

$$s(x) = \sum_{j=1}^{n} \lambda_j \phi(\|x - x_j\|) + p(x), \qquad x \in \mathbb{R}^d ,$$

where $p \in \Pi_{m-1}$ and $m \in \{0, 1, 2\}$, the term $p$ being dropped in the case $m = 0$. Further, the coefficients have to satisfy the constraints

$$\sum_{j=1}^{n} \lambda_j q(x_j) = 0, \qquad q \in \Pi_{m-1} ,$$

when $m \ge 1$. We now let $S$ be the linear space of functions $s$ of this form, the centres $x_j$, $j = 1, 2, \ldots, n$, being fixed, and we let

$$t(x) = \sum_{j=1}^{n} \mu_j \phi(\|x - x_j\|) + q(x), \qquad x \in \mathbb{R}^d ,$$

be another element of $S$. We recall that the relation between $m$ and $\phi$ provides a scalar product

$$\langle s, t \rangle_\phi = (-1)^m \sum_{j=1}^{n} \lambda_j t(x_j) = (-1)^m \sum_{j=1}^{n} \mu_j s(x_j)$$

and a semi-norm

$$\|s\|_\phi = \sqrt{\langle s, s \rangle_\phi} = \Big\{ (-1)^m \sum_{j=1}^{n} \lambda_j s(x_j) \Big\}^{1/2}, \qquad s \in S .$$

Outline of the procedure for calculating the interpolant

We let $s_* \in S$ be the required interpolant that satisfies $s_*(x_i) = f(x_i)$, $i = 1, 2, \ldots, n$. The procedure is iterative and constructs $s_{k+1} \in S$ from $s_k \in S$, starting with $s_1 \equiv 0$. The norms $\|s_k - s_*\|_\phi$, $k = 1, 2, 3, \ldots$, decrease strictly monotonically, and in theory $\|s_k - s_*\|_\phi = 0$ occurs for an integer $k$ that is at most $n + 1 - \hat m$, where $\hat m = \dim(\Pi_{m-1})$. Far fewer iterations are usually required in practice.

If $\|s_{k+1} - s_*\|_\phi = 0$ happens, then $p \in \Pi_{m-1}$ can be calculated from the interpolation conditions, such that $s_* = s_{k+1} + p$. Therefore the iterations are terminated in this case.

We do not consider the possibility $\|s_*\|_\phi = 0$, as no iterations are required.

The iterative procedure employs an operator $A : S \to S$ that has the following properties.

(1) $\|As\|_\phi = 0 \Leftrightarrow \|s\|_\phi = 0$, $s \in S$ (nonsingularity).

(2) $\langle t, As \rangle_\phi = \langle At, s \rangle_\phi$, $s, t \in S$ (symmetry).

(3) $\langle s, As \rangle_\phi > 0 \Leftrightarrow \|s\|_\phi > 0$, $s \in S$ (strict positivity).

Then, for each iteration number $k > 0$, $s_{k+1}$ is the element of the linear subspace spanned by $A^j s_*$, $j = 1, 2, \ldots, k$, that minimizes $\|s_{k+1} - s_*\|_\phi$. The following argument proves that the dimension of this subspace is $k$.

Otherwise, there are coefficients $\theta_j$, $j = 1, 2, \ldots, k$, not all zero, such that $\| \sum_{j=1}^{k} \theta_j A^j s_* \|_\phi = 0$. We let $\ell$ be the least integer that satisfies $\theta_\ell \ne 0$. Then the nonsingularity of $A$ implies the identity

$$\Big\| \sum_{j=1}^{k-\ell} (-\theta_{\ell+j} / \theta_\ell) A^j s_* - s_* \Big\|_\phi = 0 .$$

Therefore termination would have occurred no later than the $(k-\ell)$-th iteration, which is a contradiction.

The construction of $s_{k+1}$ from $s_k$

Our procedure is analogous to the conjugate gradient method for minimizing $\|s - s_*\|_\phi^2$, $s \in S$, using $A$ as a pre-conditioner, and starting at $s = s_1 = 0$. Letting $A$ be the identity operator would provide perfect conditioning, but $A s_*$ is required and $s_*$ is not available.

Therefore $A$ has a form, given later, that allows $A s_*$ to be calculated by the scalar product identity, using the values $s_*(x_i) = f(x_i)$, $i = 1, 2, \ldots, n$, which are data.

For each $k \ge 1$, $s_{k+1}$ has the form

$$s_{k+1} = s_k + \alpha_k d_k \in S ,$$

where $d_k \in S$ is a “search direction”, and where the “step length” $\alpha_k$ is set to the value of $\alpha$ that minimizes $\|s_k + \alpha d_k - s_*\|_\phi^2$, $\alpha \in \mathbb{R}$. The Krylov subspace construction is achieved by picking each $d_k$ from $\mathrm{span} \{ A^j s_* : j = 1, 2, \ldots, k \}$ in a way that gives the orthogonality properties $\langle d_k, d_j \rangle_\phi = 0$, $j = 1, 2, \ldots, k-1$, for $k \ge 2$, and the descent condition $\langle d_k, s_k - s_* \rangle_\phi < 0$.

Thus each $\alpha_k$ is positive until termination, and $s_{k+1}$ is the function

$$s_{k+1} = s_k - \frac{\langle d_k, s_k - s_* \rangle_\phi}{\langle d_k, d_k \rangle_\phi} \, d_k \in S ,$$

which satisfies $\langle d_k, s_{k+1} - s_* \rangle_\phi = 0$.

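After calculating $t_k = A(s_k - s_*)$, the search direction $d_k$ is defined by the formula

$$d_k = \begin{cases} -t_k, & k = 1 , \\[4pt] -t_k + \dfrac{\langle d_{k-1}, t_k \rangle_\phi}{\langle d_{k-1}, d_{k-1} \rangle_\phi} \, d_{k-1}, & k \ge 2 . \end{cases}$$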


Thus the identity

$$\langle d_k, s_k - s_* \rangle_\phi = \langle -t_k, s_k - s_* \rangle_\phi = -\langle s_k - s_*, A(s_k - s_*) \rangle_\phi$$

holds for every $k$, the first equation being due to the line search of the previous iteration. It follows from the strict positivity of $A$ that the descent condition is achieved. The orthogonality $\langle d_k, d_{k-1} \rangle_\phi = 0$ is given by the definition of $d_k$, and the properties $\langle d_k, d_j \rangle_\phi = 0$, $1 \le j \le k-2$, can be proved by induction.

On termination

The value of $\|s_{k+1} - s_*\|_\phi$ is not known for any $k$, because $s_*$ is not available. On the other hand, the residuals

$$s_{k+1}(x_i) - s_*(x_i) = s_{k+1}(x_i) - f(x_i), \qquad i = 1, 2, \ldots, n ,$$

can be calculated. Termination occurs when all the residuals can be made less than a prescribed tolerance $\varepsilon > 0$ by adding a polynomial $p \in \Pi_{m-1}$, which is the condition

$$\min_{p \in \Pi_{m-1}} \max \{ |s_{k+1}(x_i) + p(x_i) - f(x_i)| : i = 1, 2, \ldots, n \} < \varepsilon .$$

The form of the operator A

If $\hat m = \dim(\Pi_{m-1})$ exceeds one, the data points are reordered if necessary so that the zero function is the only element of $\Pi_{m-1}$ that vanishes at the last $\hat m$ points. Thus, if the coefficients $\lambda_j$, $j = 1, 2, \ldots, n - \hat m$, of $s \in S$ are given, the remaining coefficients $\lambda_j$, $n - \hat m + 1 \le j \le n$, are defined by the constraints $\sum_{j=1}^{n} \lambda_j q(x_j) = 0$, $q \in \Pi_{m-1}$. We pick functions $\ell_j \in S$, $j = 1, 2, \ldots, n - \hat m$, that have the form

$$\ell_j(x) = \sum_{i=j}^{n} \Lambda_{ji} \phi(\|x - x_i\|), \qquad x \in \mathbb{R}^d ,$$

with the important property that the leading coefficient $\Lambda_{jj}$ is nonzero for each $j$. Thus the functions $\ell_j$, $j = 1, 2, \ldots, n - \hat m$, are linearly independent. Then the operator $A$ is defined by the equation

$$A s = \sum_{j=1}^{n - \hat m} \frac{\langle \ell_j, s \rangle_\phi}{\langle \ell_j, \ell_j \rangle_\phi} \, \ell_j , \qquad s \in S .$$

This construction provides the nonsingularity, symmetry and strict positivity conditions that have been mentioned. Further, although $s_*$ is not available, the function $t_k = A(s_k - s_*)$ can be calculated by using the formulae

$$\langle \ell_j, s_k - s_* \rangle_\phi = (-1)^m \sum_{i=j}^{n} \Lambda_{ji} \{ s_k(x_i) - f(x_i) \}, \qquad j = 1, 2, \ldots, n - \hat m .$$

The choice of $\ell_j$, $j = 1, 2, \ldots, n - \hat m$

The given procedure picks $s_1 = 0$, $d_1 = -t_1 = A s_*$ and $s_2 = \alpha_1 d_1$, where $\alpha_1$ minimizes $\|s_2 - s_*\|_\phi = \|\alpha_1 A s_* - s_*\|_\phi$. It follows that the termination condition $\|s_{k+1} - s_*\|_\phi = 0$ is achieved on the first iteration for every $s_*$, if $A$ has the property $\|As - s\|_\phi = 0$, $s \in S$, which is equivalent to $\|A \ell_j - \ell_j\|_\phi = 0$, $j = 1, 2, \ldots, n - \hat m$. These conditions are satisfied by the following choice of the functions $\ell_j$, $j = 1, 2, \ldots, n - \hat m$.

For each $j \in [1, n - \hat m]$, we apply the radial basis function interpolation method to the data $f(x_j) = 1$ and $f(x_i) = 0$, $i = j+1, j+2, \ldots, n$, letting the interpolant be the function

$$\tilde g_j(x) = \sum_{i=j}^{n} \Gamma_{ji} \phi(\|x - x_i\|) + p_j(x), \qquad x \in \mathbb{R}^d ,$$

where $p_j \in \Pi_{m-1}$. We define $g_j = \tilde g_j - p_j$, and we consider the choice $\ell_j = g_j$, $j = 1, 2, \ldots, n - \hat m$. The interpolation conditions imply $\|\tilde g_j\|_\phi > 0$, and the scalar product identity gives the equation

$$\|\tilde g_j\|_\phi^2 = (-1)^m \sum_{i=j}^{n} \Gamma_{ji} \tilde g_j(x_i) = (-1)^m \Gamma_{jj} ,$$

so the coefficients $\Gamma_{jj}$, $j = 1, 2, \ldots, n - \hat m$, are nonzero as required. Further, if $j$ and $k$ are any integers that satisfy $1 \le j < k \le n - \hat m$, we find the orthogonality property

$$\langle g_j, g_k \rangle_\phi = \langle g_k, \tilde g_j \rangle_\phi = (-1)^m \sum_{i=k}^{n} \Gamma_{ki} \tilde g_j(x_i) = 0 .$$

Thus the definition of $A$ given earlier, in the case $\ell_j = g_j$, $j = 1, 2, \ldots, n - \hat m$, provides $A \ell_k = \ell_k$, $k = 1, 2, \ldots, n - \hat m$. It follows that our iterative procedure takes at most one iteration for general right hand sides $f(x_i)$, $i = 1, 2, \ldots, n$.

The bad news, however, is that the calculation of $\tilde g_j$ is about as difficult as the calculation of $s_*$, because $\tilde g_j$ also has to satisfy $n$ interpolation conditions. In practice, therefore, guided by the remarks at the beginning of this lecture (page 12), we impose conditions on $\tilde g_j$ of the form $\tilde g_j(x_i) = \delta_{ij}$, $i \in L_j$, where $L_j$ is a small subset of the integers $\{ j, j+1, \ldots, n \}$ that includes $j$ itself. The new $\tilde g_j$ has the form

$$\tilde \ell_j(x) = \sum_{i \in L_j} \Lambda_{ji} \phi(\|x - x_i\|) + \tilde p_j(x), \qquad x \in \mathbb{R}^d ,$$

and we choose $\ell_j = \tilde \ell_j - \tilde p_j$, $j = 1, 2, \ldots, n - \hat m$.

From now on we avoid $m \ge 2$ difficulties by restricting attention to linear and multiquadric radial functions, and we change the meaning of $q$ to a prescribed positive integer that bounds the number of elements in each set $L_j$, the value $q = 30$ being typical.

Then $\{ x_i : i \in L_j \}$ is chosen to contain the $q$ points from the set $\{ x_i : i = j, j+1, \ldots, n \}$ that are closest to $x_j$, except that $L_j$ is the complete set $\{ j, j+1, \ldots, n \}$ in the cases $n - q + 1 \le j \le n - \hat m$.
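The whole procedure of this lecture fits in a short program when $\phi(r) = r$, so that $m = 1$ and $\hat m = 1$. The sketch below is not the Fortran software that is described next; it is a dense NumPy illustration with hypothetical function names. It assumes distinct points and $q \ge 2$, and it uses the fact that $\langle \ell_j, \ell_j \rangle_\phi = -\Lambda_{jj}$ when $m = 1$.

```python
import numpy as np

def distance_matrix(X):
    """Phi_ij = ||x_i - x_j|| for the linear radial function phi(r) = r."""
    return np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))

def lagrange_coefficients(Phi, q):
    """Row j holds the coefficients Lambda_ji of l_j (zero-based j = 0..n-2).

    L_j contains the (at most q) points among j, j+1, ..., n-1 that are
    closest to x_j, and it is the complete set when fewer than q remain.
    """
    n = Phi.shape[0]
    Lam = np.zeros((n - 1, n))
    for j in range(n - 1):
        later = np.arange(j, n)
        L = later[np.argsort(Phi[j, later], kind="stable")[:q]]  # L[0] == j
        k = len(L)
        A = np.zeros((k + 1, k + 1))
        A[:k, :k] = Phi[np.ix_(L, L)]
        A[:k, k] = A[k, :k] = 1.0          # constant term and its constraint
        rhs = np.zeros(k + 1)
        rhs[0] = 1.0                       # Lagrange conditions delta_ij on L_j
        Lam[j, L] = np.linalg.solve(A, rhs)[:k]
    return Lam

def krylov_interpolate(X, f, q=30, eps=1e-10, max_iter=500):
    """Iterate s_{k+1} = s_k + alpha_k d_k for phi(r) = r, so m = 1."""
    Phi = distance_matrix(X)
    Lam = lagrange_coefficients(Phi, q)
    diag = Lam[np.arange(len(Lam)), np.arange(len(Lam))]   # Lambda_jj
    lam = np.zeros(len(f))                 # coefficients of s_k, with s_1 = 0
    res = -np.asarray(f, dtype=float)      # residuals s_k(x_i) - f(x_i)
    d = d_vals = None
    iterations = 0
    # Termination test: the best constant p is minus the midrange of res.
    while 0.5 * (res.max() - res.min()) >= eps and iterations < max_iter:
        t = Lam.T @ ((Lam @ res) / diag)   # coefficients of t_k = A(s_k - s_*)
        t_vals = Phi @ t                   # values t_k(x_i), the O(n^2) step
        if d is None:
            d, d_vals = -t, -t_vals
        else:
            beta = (d @ t_vals) / (d @ d_vals)
            d, d_vals = -t + beta * d, -t_vals + beta * d_vals
        alpha = -(d @ res) / (d @ d_vals)  # exact line search in the semi-norm
        lam += alpha * d
        res += alpha * d_vals
        iterations += 1
    mu = -0.5 * (res.max() + res.min())    # constant that centres the residuals
    return lam, mu, iterations
```

A call such as `krylov_interpolate(X, f, q=15)` returns the coefficients, the constant term and the number of iterations. Choosing $q \ge n$ makes every $L_j$ complete, so the sketch should then stop after about one iteration, in line with the discussion of the choice $\ell_j = g_j$.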

The Fortran software

A Fortran implementation of this procedure has been written at Cambridge for the radial basis functions $\phi(r) = \sqrt{r^2 + c^2}$, $r \ge 0$, where $c$ can be zero. Of course the coefficients $\{ \Lambda_{ji} : i \in L_j \}$, $j = 1, 2, \ldots, n-1$, are found before beginning the iterations, which takes $O(nq^3) + O(n^2)$ operations, including the searches for nearest points.

The residuals $s_k(x_i) - s_*(x_i)$, $i = 1, 2, \ldots, n$, and the coefficients of $s_k$ are required at the beginning of the $k$-th iteration, with the coefficients and values $d_{k-1}(x_i)$, $i = 1, 2, \ldots, n$, of the previous search direction if $k \ge 2$. The calculation of the coefficients of $t_k = A(s_k - s_*)$ takes $O(nq)$ operations, and the extra work of finding the coefficients of $d_k$ for $k \ge 2$ is negligible. The values $d_k(x_i)$, $i = 1, 2, \ldots, n$, are also required, the work of this task being $O(n^2)$, which is the most expensive part of an iteration, unless fast multipole techniques are applied. This has been done very successfully by Beatson when $d$, the number of components in $x$, is only 2 or 3. Finding the step length $\alpha_k$ is easy. Then the coefficients of $s_{k+1}$ are given by $s_{k+1} = s_k + \alpha_k d_k$, and the new residuals take the values

$$s_{k+1}(x_i) - s_*(x_i) = s_k(x_i) - s_*(x_i) + \alpha_k d_k(x_i), \qquad i = 1, 2, \ldots, n .$$

The k-th iteration is completed by trying the test for termination. Further information can be found in “A Krylov subspace algorithm for multiquadric interpolation in many dimensions” by A.C. Faul, G. Goodsell and M.J.D. Powell, IMA Jnl. of Numer. Anal., 25, pp 1–24 (2005).

A few numerical results

We sample each $x_i$ and $f(x_i)$ randomly from the uniform distribution on $\{ x : \|x\| \le 1 \}$ and $[-1, 1]$, respectively, and we set $\varepsilon = 10^{-10}$. Thus 10 problems were generated for each $d$ and $n$. The ranges of the number of iterations of the Fortran software with $c = 0$ are given in the table below.

            d = 2               d = 5
   n     q = 30   q = 50     q = 30   q = 50
  250     7–8      6–6       20–21    13–14
  500     8–9      6–7       26–27    17–18
 1000     9–10     7–8       34–36    22–23
 2000    10–10     8–8       44–47    29–30

This performance is very useful, but we will find in Lecture 5 that more iterations occur for $c > 0$.


Lecture 4

Purpose of the lecture

We apply the radial basis function method when the data are the values $f(j)$, $j \in \mathbb{Z}^d$, of a smooth function $f : \mathbb{R}^d \to \mathbb{R}$, where $\mathbb{Z}^d$ is the set of all points in $\mathbb{R}^d$ whose components are integers. We retain the notation $s(x)$, $x \in \mathbb{R}^d$, for the interpolating function. We ask whether $s \equiv f$ occurs when $f$ is a constant or a low order polynomial. If the answer is affirmative for all $f \in \Pi_\ell$, which is the space of polynomials of degree at most $\ell$ from $\mathbb{R}^d$ to $\mathbb{R}$, and if $s_h$ is the interpolant to the data $f(hj)$, $j \in \mathbb{Z}^d$, then, as $h \to 0$, we expect the errors $f(x) - s_h(x)$, $x \in \mathbb{R}^d$, to be of magnitude $h^{\ell+1}$.

Fourier transforms

Some brilliant answers to the questions above have been found by considering Fourier transforms. When $g(x)$, $x \in \mathbb{R}^d$, is absolutely integrable, its Fourier transform is the function

$$\hat g(t) = \int_{\mathbb{R}^d} e^{-i x \cdot t} g(x) \, dx, \qquad t \in \mathbb{R}^d .$$

Further, if $\hat g$ is absolutely integrable and if $g$ is continuous, then the inverse function has the form

$$g(x) = \frac{1}{(2\pi)^d} \int_{\mathbb{R}^d} e^{i x \cdot t} \hat g(t) \, dt, \qquad x \in \mathbb{R}^d .$$

For example, the Fourier transform of the Gaussian $\phi(\|x\|) = e^{-\alpha \|x\|^2}$, $x \in \mathbb{R}^d$, is the expression

$$\hat\phi(\|t\|) = (\pi / \alpha)^{d/2} e^{-\|t\|^2 / (4\alpha)}, \qquad t \in \mathbb{R}^d .$$
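The Gaussian transform can be confirmed by quadrature in the case $d = 1$. The sketch below uses only the Python standard library; the step size, the truncation width and the function names are illustrative choices.

```python
import math

def gaussian_transform_1d(t, alpha, h=0.01, half_width=12.0):
    """Trapezoidal estimate of the integral of exp(-i x t) exp(-alpha x^2) dx.

    The imaginary part vanishes by symmetry, so only cos(x t) is integrated.
    """
    n = int(half_width / h)
    return h * sum(math.cos(k * h * t) * math.exp(-alpha * (k * h) ** 2)
                   for k in range(-n, n + 1))

def gaussian_transform_exact(t, alpha, d=1):
    """Closed form (pi / alpha)^(d/2) exp(-t^2 / (4 alpha)) for ||t|| = |t|."""
    return (math.pi / alpha) ** (d / 2) * math.exp(-t * t / (4.0 * alpha))
```

For instance, `gaussian_transform_1d(1.0, 2.0)` and `gaussian_transform_exact(1.0, 2.0)` should agree to many digits.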

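It is elementary that, if $g$ is absolutely integrable, then the function

$$u(x) = \sum_{j=1}^{n} \lambda_j g(x - x_j), \qquad x \in \mathbb{R}^d ,$$

has the Fourier transform

$$\hat u(t) = \Big\{ \sum_{j=1}^{n} \lambda_j e^{-i x_j \cdot t} \Big\} \, \hat g(t), \qquad t \in \mathbb{R}^d . \qquad (4.1)$$

Thus, if $\hat u$, $\lambda_j$ and $x_j$, $j = 1, 2, \ldots, n$, are available, we expect to be able to identify $\hat g(t)$, $t \in \mathbb{R}^d$.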

Generalized Fourier transforms

The function $\phi(\|x\|)$, $x \in \mathbb{R}^d$, is not absolutely integrable for most of the choices of $\phi$ that have been mentioned, but they do have “generalized” Fourier transforms that can be identified by the relation (4.1) between $\hat g$ and $\hat u$. For example, in the case $\phi(r) = r$ and $d = 1$, the hat function

$$u(x) = \tfrac12 |x + 1| - |x| + \tfrac12 |x - 1|, \qquad x \in \mathbb{R} ,$$

satisfies $u(x) = 0$, $|x| \ge 1$, so it is absolutely integrable. Further, it has the Fourier transform

$$\hat u(t) = \int_0^1 \cos(xt) \, (2 - 2x) \, dx = \frac{2}{t^2} (1 - \cos t) = \{ \tfrac12 e^{-it} - 1 + \tfrac12 e^{it} \} (-2 / t^2), \qquad t \ne 0 .$$

Therefore the relation (4.1) between $\hat g$ and $\hat u$ suggests that $g(x) = |x|$, $x \in \mathbb{R}$, has the transform $\hat g(t) = -2 / t^2$, $t \in \mathbb{R} \setminus \{0\}$.
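The three expressions for $\hat u(t)$ can be compared numerically. The sketch below uses only the Python standard library; the quadrature rule and the function names are choices made for this illustration.

```python
import cmath
import math

def hat_transform(t, n=20000):
    """Trapezoidal estimate of the integral of cos(x t) (2 - 2x) over [0, 1]."""
    h = 1.0 / n
    g = lambda x: math.cos(x * t) * (2.0 - 2.0 * x)
    return h * (0.5 * g(0.0) + sum(g(k * h) for k in range(1, n)) + 0.5 * g(1.0))

def closed_form(t):
    """The expression (2 / t^2) (1 - cos t), valid for t != 0."""
    return 2.0 * (1.0 - math.cos(t)) / t ** 2

def via_generalized_transform(t):
    """Trigonometric factor for the centres -1, 0, 1 times g_hat(t) = -2 / t^2."""
    factor = 0.5 * cmath.exp(-1j * t) - 1.0 + 0.5 * cmath.exp(1j * t)
    return (factor * (-2.0 / t ** 2)).real
```

All three functions should return the same value for any nonzero `t`, which is the observation that motivates $\hat g(t) = -2/t^2$.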

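Further, if $g(x)$, $x \in \mathbb{R}^d$, is any function such that $u(x) = \sum_{j=1}^{n} \lambda_j g(x - x_j)$, $x \in \mathbb{R}^d$, is absolutely integrable, for some choices of $\lambda_j$ and $x_j$, $j = 1, 2, \ldots, n$, then $g$ has a “generalized transform” $\hat g(t)$, $t \in \mathbb{R}^d$, independent of $u$, such that $u$ has the Fourier transform $\hat u(t) = \{ \sum_{j=1}^{n} \lambda_j e^{-i x_j \cdot t} \} \, \hat g(t)$, $t \in \mathbb{R}^d$.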

All the radial basis functions given in Table 1.1 have Fourier transforms or generalized Fourier transforms. We let $\hat\phi(\|t\|)$, $t \in \mathbb{R}^d$, be the transform of $\phi(\|x\|)$, $x \in \mathbb{R}^d$, although this notation does not show the dependence of $\hat\phi$ on $d$. The following cases will be useful later.

$\phi$                          $\hat\phi(\|t\|)$, $t \in \mathbb{R}^d$
$\phi(r) = r$                   const $\times \, \|t\|^{-d-1}$
$\phi(r) = r^2 \log r$          const $\times \, \|t\|^{-d-2}$
$\phi(r) = r^3$                 const $\times \, \|t\|^{-d-3}$
$\phi(r) = e^{-\alpha r^2}$     $(\pi / \alpha)^{d/2} e^{-\|t\|^2 / (4\alpha)}$

Table 4.1. Fourier transforms of radial basis functions.

Interpolation on Z^d

We choose $\phi(r)$, $r \ge 0$, and then we seek a function of the form

$$\chi(x) = \sum_{k \in \mathbb{Z}^d} \mu_k \phi(\|x - k\|), \qquad x \in \mathbb{R}^d ,$$

that satisfies the Lagrange conditions

$$\chi(j) = \delta_{j0}, \qquad j \in \mathbb{Z}^d ,$$

where $\delta_{j0}$ is the Kronecker delta. Thus, if all the sums are absolutely convergent, the values $f(j)$, $j \in \mathbb{Z}^d$, are interpolated by

$$s(x) = \sum_{j \in \mathbb{Z}^d} f(j) \, \chi(x - j), \qquad x \in \mathbb{R}^d . \qquad (4.2)$$

We see that $\chi$ is independent of $f$, and that the convergence of the last sum may impose some restrictions on $f$. We see also that $\chi$ is without a $p \in \Pi_{m-1}$ polynomial term.

The importance of χ

When $\phi(r) = r$ and $d = 1$, $\chi$ is the hat function

$$\chi(x) = \tfrac12 |x + 1| - |x| + \tfrac12 |x - 1|, \qquad x \in \mathbb{R} .$$

It follows that the interpolant (4.2) is well defined for any $f(x)$, $x \in \mathbb{R}$, including the case $f(x) = 1$, $x \in \mathbb{R}$. On the other hand, it is not possible to express a nonzero constant function as the sum $\sum_{j=-\infty}^{\infty} \lambda_j |x - j|$, $x \in \mathbb{R}$, as all the coefficients $\lambda_j$ have to be zero.
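The role of $\chi$ in the case $\phi(r) = r$, $d = 1$ can be seen in a few lines. The sketch below uses only the Python standard library; the function names are hypothetical, and the sum (4.2) is truncated to the two integers nearest to $x$, which is exact here because the hat function vanishes for $|x| \ge 1$.

```python
import math

def chi(x):
    """Hat function: the Lagrange function for phi(r) = r on the integers."""
    return 0.5 * abs(x + 1.0) - abs(x) + 0.5 * abs(x - 1.0)

def interpolate_on_integers(f, x):
    """Evaluate s(x) = sum_j f(j) chi(x - j); only j = floor(x), floor(x) + 1 contribute."""
    j0 = math.floor(x)
    return sum(f(j) * chi(x - j) for j in (j0, j0 + 1))
```

With `f = lambda j: 1.0` the function `interpolate_on_integers` returns one for every `x`, so the constant function is reproduced even though it cannot be written as a combination of the functions $|x - j|$ themselves.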