Application of the Non-Hermitian Singular Spectrum Analysis to the exponential retrieval problem

We present a new approach to the exponential retrieval problem. We derive a stable technique based on the singular value decomposition (SVD) of lag-covariance and cross-covariance matrices consisting of covariance coefficients computed for index-translated copies of an initial time series. For these matrices a generalized eigenvalue problem is solved. The initial signal is mapped into the basis of the generalized eigenvectors, and the resulting phase portraits are analyzed. Pattern recognition techniques can be applied to distinguish phase portraits related to the exponentials from those related to noise. Each frequency is evaluated by unwrapping the phases of the corresponding portrait, detecting potential wrapping events, and estimating the phase slope. The efficiency of the proposed and existing methods is compared on a set of examples, including white Gaussian and autoregressive model noise.


Introduction
The present paper originates from a classical problem in signal processing, namely: how to find the number of exponential constituents and their frequencies {ν_j} in a time series {f(k)}_{k=0}^{m−1}.*

* To appear in the Journal of the Russian Universities Radioelectronics.
One of the techniques is to assume that

f(k) = s(k) + ε w(k), s(k) = Σ_{j=−n}^{n} c_j e^{iν_j k}, k = 0, 1, …, m − 1, (1)

and then to apply some least-squares method. Here, the complex-valued amplitudes {c_j} and the real distinct frequencies {ν_j} are such that c_{−j} is the complex conjugate of c_j and ν_{−j} = −ν_j. Note that ν_0 = 0 and c_0 is a real-valued constant. The random component w is commonly interpreted as noise, s is called the signal, and ε is a real constant.
A variety of subspace methods, such as the Maximum Entropy Method [1], MUltiple SIgnal Classification (MUSIC) [2], Linear Prediction Methods [3, 4], Estimation of Signal Parameters via Rotational Invariance Techniques (ESPRIT) [5], Matrix Pencil (MP) [6], and the Minimum-Norm Method [7], have been used to find the exponentials {ν_j} in the measured data {f(k)}_{k=0}^{m−1}. In [8], a unification of the concepts of the subspace methods is presented in terms of the singular value decomposition (SVD) [9] of the d×(l+1) trajectory matrix X:

X = [Y_0, Y_1, …, Y_l], Y_k = (f(k + κ_1), f(k + κ_2), …, f(k + κ_d))^t, (2)

for some constant d, translations κ_i, and the constant l = m − κ_d. We emphasize that in [8], κ_i = i − 1, whereas we propose to choose arbitrary translations. The new choice of translations allows us to increase the numerical rank [9] of the trajectory matrix X and to improve the accuracy of frequency evaluation.
Our method, together with ESPRIT and MP, employs shift-invariance properties of the trajectory matrix X; however, there are some differences. ESPRIT was developed to estimate the direction of arrival by a uniform linear array (ULA) of sensors. The data readings from the i-th sensor are associated with the i-th row of the trajectory matrix X. The data in MP are similarly arranged in the row-wise format. Consequently, in both methods, the matrix X is partitioned into two submatrices H_0 and H_1 composed of the first d−1 and last d−1 rows of X, respectively. Note that equal spacings between sensors in a ULA yield κ_i = i − 1, and hence the shift-invariance in space property can be applied. If noise is absent and d = 2n + 2, then the space shift-invariance property lets us derive

H_1 = Ψ H_0, Ψ = {e^{i(j−1)ν_k}} Λ {e^{i(j−1)ν_k}}^{−1}, Λ = diag[e^{iν_{−n}}, …, e^{iν_n}], (3)

where j = 1, …, d − 1 and k = −n, …, n, and Λ is the eigenvalue matrix of Ψ. For a nonuniform linear array (NULA) of sensors the translations {κ_i} are arbitrary, and hence the shift-invariance in space property of X does not directly apply [10]. However, if d = 2n + 1, then the matrix X has a time shift-invariance property:

X_1 = Ω X_0, Ω = V Λ V^{−1}, V = {e^{iκ_j ν_k}}, (4)

where j = 1, …, d and k = −n, …, n; the matrices X_0, X_1 are given by the first l and last l columns of the matrix X, respectively; and the matrix Λ is as in (3). The matrix V is a generalized Vandermonde matrix, and we assume that it is non-singular [11, 12]. Note that each frequency ν_j is given by the argument of the corresponding eigenvalue of Ω.
In the presence of noise, (4) no longer holds. Therefore, we construct a d×d matrix Ω̂ such that Y_k − Ω̂ Y_{k−1} is minimal in some sense, where {Y_k} are given in (2). In the framework of perturbation analysis it is possible to show that if d = 2n + 1, then the frequencies {ν_j} can be approximated by the arguments of the eigenvalues of Ω̂. However, the number 2n + 1 of exponentials in the time series f is a priori unknown and needs to be found. To deal with this problem we propose a two-step approach. In the first step, we select d to be greater than the number of exponentials found either by existing methods [13, 14, 15, 16, 17, 18, 19] or by computing the rank of X_0. We stress that at this step we do not need to estimate the number of exponentials exactly, but only to ensure that d ≥ 2n + 1.
In this case some eigenvalues of Ω̂ are associated with the exponentials, while others are related to the noise. Note that taking into account only the information about the eigenvalues, it is impossible to judge whether an eigenvalue of Ω̂ is associated with an exponential or with noise. Therefore, at the second step we cast the trajectory matrix X_0 into the basis of the eigenvectors of Ω̂. In the new basis, the rows of X_0 that are associated with the exponentials have a very specific structure, i.e. the phase portrait is either a circle or an arc. We hence propose to evaluate the number n by a pattern recognition technique. We also show that the frequencies {ν_j}_{j=−n}^{n} estimated using the information carried by the rows are more accurate than those estimated by the eigenvalues of Ω̂.
Before we proceed further, we adopt notation for the inner product (f, g)_l = Σ_{k=0}^{l−1} f(k) ḡ(k) and the norm ‖f‖_l = √((f, f)_l) of time series f = {f(k)}_{k=0}^{l−1} and g = {g(k)}_{k=0}^{l−1}. The bar operator stands for complex conjugation, i.e. the conjugate of a + ib is a − ib. If l = ∞, we define ‖f‖_∞ = lim_{l→∞} ‖f‖_l and (f, g)_∞ = lim_{l→∞} (f, g)_l. Also, we define the translation operator T by (Tf)(k) = f(k + 1), where f = {f(k)}_{k=0}^{l−1} stands for a time series.

1 The case of a single exponential
In this section, we consider a single exponential corrupted by noise:

f(k) = e^{iνk} + ε w(k), k = 0, 1, …, l, (5)

and highlight the key elements of the proposed technique. Our goal is to estimate the value of ν given the values of f(k) for k = 0, 1, …, l.
A number of methods have been developed to estimate a single exponential in a time series, e.g. [2, 3, 5, 6, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29]. Some of them are based on the observation that the exponential satisfies a first-order auto-regressive process, and the poles of the associated filter can be used to identify the frequency ν [24, 28].
To estimate ν when the observations are corrupted by noise, it is possible to introduce some averaging by solving a first-order autoregressive problem, i.e. finding the scalar value of a such that the error term e in the following relation is minimized:

f(k + 1) = a f(k) + e(k), ‖e‖_l → min,

or in the matrix form:

X_1 = a X_0 + e, (6)

where X_0 = [f(0), …, f(l−1)] and X_1 = [f(1), …, f(l)] are complex-valued 1×l matrices. The value of a is found by solving the least-squares problem and is given by λ̂_l such that

λ̂_l = (X_1 X_0*)(X_0 X_0*)^{−1},

where * stands for the matrix conjugate transpose. Consequently, the frequency ν can be evaluated by ν̂_l = Im(ln(λ̂_l)), where λ̂_l is an eigenvalue in the case of the 1×1 matrices X_1 X_0* and X_0 X_0*. After some algebra, it is possible to derive that λ̂_∞ = e^{iν}/(1 + ε²). When information about the number of exponentials in the time series is missing, one might increase the order of the autoregressive model and find constants a_1 and a_2:

f(k + 2) = a_1 f(k + 1) + a_2 f(k) + e(k), ‖e‖_l → min, (7)

or in the matrix form: find the matrix A such that

X_1 = A X_0 + E, (8)

where X_0 = [Y_0, …, Y_{l−2}], X_1 = [Y_1, …, Y_{l−1}] with Y_k = (f(k), f(k+1))^t, and E stands for the 2×(l−1) matrix associated with the noise. Note that in the case of the auto-regressive model of the first order, the information vectors Y_k are scalars equal to f(k).
Using least squares, the matrix A in (8) is found as Ω̂ = (X_1 X_0*)(X_0 X_0*)^{−1}. In the case of a single infinite exponential corrupted by white noise, i.e. (w, w)_∞ = 1, (s, w)_∞ = 0, and (Tw, w)_∞ = 0, we obtain that â_1 = e^{iν}/(ε² + 2) and â_2 = e^{2iν}/(ε² + 2). Therefore, the eigenvalues λ̂_k and eigenvectors v̂_k of Ω̂ can be computed explicitly; in particular, λ̂_1 = e^{iν}(1 − ε²/3) + O(ε⁴) and λ̂_2 = −e^{iν}(1/2 − ε²/12) + O(ε⁴). One may note that the argument of λ̂_1 can be used as an estimator of the frequency ν. The other eigenvalue, λ̂_2, is rotated by the angle of π with respect to the argument of e^{iν}. The arguments of the eigenvalues may thus be used to estimate the frequency ν, but λ̂_2 provides a false estimate. The modulus of the eigenvalues can be used to distinguish genuine and false estimates; e.g. eigenvalues with absolute values significantly less than one can be associated with damped exponentials and be discarded. At the same time, the eigenvectors v̂_1 and v̂_2 also carry information about the exponentials.
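The first-order least-squares estimator described above can be sketched in a few lines of NumPy. This is an illustration of ours, not the authors' code; the frequency, noise level, and series length are arbitrary choices:

```python
import numpy as np

# Single noisy exponential f(k) = e^{i nu k} + eps*w(k); estimate nu via the
# AR(1) least-squares coefficient lambda = (X1 X0*) / (X0 X0*).
rng = np.random.default_rng(0)
nu, l, eps = 0.3, 500, 0.05
k = np.arange(l + 1)
w = (rng.standard_normal(l + 1) + 1j * rng.standard_normal(l + 1)) / np.sqrt(2)
f = np.exp(1j * nu * k) + eps * w

X0 = f[:-1]                                  # [f(0), ..., f(l-1)]
X1 = f[1:]                                   # [f(1), ..., f(l)]
lam = (X1 @ X0.conj()) / (X0 @ X0.conj())    # least-squares AR(1) coefficient
nu_hat = np.angle(lam)                       # Im(ln(lambda)) = arg(lambda)
print(nu_hat)                                # close to 0.3 for weak noise
```

Note that the noise only biases the modulus of λ̂ (towards e^{iν}/(1 + ε²)), so the argument remains a consistent frequency estimate.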
In the proposed technique we look at the dynamics of the trajectory matrix X_0 by mapping it into the basis of the eigenvectors using the matrix V̂ = [v̂_1, v̂_2]. The image of the information vector Y_k in the new basis is Z_k = V̂^{−1} Y_k. The first coordinate of Z_k, i.e. Z_{k,1} for k = 0, …, l, rotates around the origin, with the angle between Z_{k+1,1} and Z_{k,1} approximately equal to the frequency ν. Its phase portrait {(Re(Z_{k,1}), Im(Z_{k,1}))}_{k=0}^{l} resembles a unit circle (or an arc) with the center at the origin. The second coordinate of Z_k, i.e. Z_{k,2} for k = 0, …, l, does not have a particular well-defined behavior, i.e. its phase portrait is given by a set of points randomly scattered around the origin. This difference in the phase portraits allows us to distinguish the eigenvector-eigenvalue pairs corresponding to the true frequency from their false counterparts.
We note that once the coordinate of Z_k related to the exponential signal is established (in our case the first coordinate), the problem is simplified: any appropriate frequency estimation method can be applied to recover a single exponential in the time series.
2 Linear regression approach in the case of multiple exponentials

In this section, we extend the proposed approach to time series composed of several exponentials and contaminated by noise. Now and for the rest of this article, we consider the information vectors Y_k = (f(k + κ_1), f(k + κ_2), …, f(k + κ_d))^t associated with arbitrary translations {κ_i}. We consequently partition Y_k into two parts, S_k and W_k:

Y_k = S_k + ε W_k,

where S_k = (s(k + κ_1), s(k + κ_2), …, s(k + κ_d))^t, and obtain that

S_k = V Λ^k C.

Here, V = {e^{iκ_j ν_k}}_{1≤j≤d, −n≤k≤n}, C = (c_{−n}, …, c_n)^t, and Λ = diag[e^{iν_{−n}}, …, e^{iν_n}]. Thus, the trajectory matrices X_0 and X_1 can be expressed as

X_0 = V [C, ΛC, …, Λ^{l−1} C] + ε [W_0, …, W_{l−1}], X_1 = V Λ [C, ΛC, …, Λ^{l−1} C] + ε [W_1, …, W_l].

Note that each row of [C, ΛC, …, Λ^{l−1} C] is associated with a unique frequency. The phase portrait of the k-th row is given by the set of points {c_k e^{ijν_k}}_{j=0}^{l−1}, representing a circle/arc with the radius |c_k| around the origin. A key idea of the proposed method is to represent X_0 in a new basis, extract its rows, and estimate the frequency associated with each phase portrait.
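The construction of the trajectory matrices with arbitrary translations can be sketched as follows (an illustration of ours; the helper name `trajectory_matrices` and the test values are hypothetical):

```python
import numpy as np

def trajectory_matrices(f, kappa, l):
    """Build X[i, j] = f(j + kappa[i]); return the first and last l-1 columns
    (the sub-blocks X0 and X1 used in the shift-invariance relation)."""
    f = np.asarray(f)
    X = np.array([f[k:k + l] for k in kappa])   # d x l trajectory matrix
    return X[:, :-1], X[:, 1:]                  # X0, X1

f = np.arange(20.0)
X0, X1 = trajectory_matrices(f, kappa=[0, 2, 4], l=10)
print(X0.shape, X1.shape)   # (3, 9) (3, 9)
```

With the classical choice κ_i = i − 1 the rows are consecutive lags and X is Hankel; arbitrary translations simply pick other starting offsets for each row.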

Linear regression problem
After some algebra, we derive that Y_{k+1} and Y_k are connected by the relation

Y_{k+1} = Ω Y_k + ε (W_{k+1} − Ω W_k), Ω = V Λ V^+,

where V^+ stands for the pseudo-inverse of V. In the case of 2n + 1 = d, the matrix V is square, and hence V^+ = V^{−1}. Furthermore, if 2n + 1 = d, then similar to (6) we obtain the relation between the signal components:

S_{k+1} = Ω S_k,

and to find the frequencies {ν_k} in a noiseless time series one needs to compute Ω and then find its eigenvalues.
As in the case of the single exponential, we introduce averaging and approximate the matrix Ω by the real-valued d×d matrix Ω̂, which is the solution of the linear regression problem ‖Y_{k+1} − A Y_k‖ → min. Namely, the matrix Ω̂ is called the best estimate of Ω if Ω̂ = arg min_A J(A), where

J(A) = Σ_k ‖Y_{k+1} − A Y_k‖².

Note that in the case of multiple exponentials we do not know the number of exponentials contributing to the time series f. For now we assume that 2n + 1 ≤ d. We discuss the selection of the translations {κ_i}_{i=1}^{d} and the dimensions of the information vectors Y_k later in this section. It is possible to prove (see Appendix A for the proofs) that if det Γ_0 ≠ 0, where Γ_0 = X_0 X_0* and Γ_1 = X_1 X_0*, then J(A) has a minimum only at Ω̂ satisfying

Ω̂ Γ_0 = Γ_1.

If 2n + 1 = d, then for the noiseless time series f the solution Ω̂ of the linear regression problem coincides with Ω, and for noisy data the eigenvalues of Ω and Ω̂ are connected via

Λ̂ = Λ + Σ_{n≥1} ε^n Λ^{(n)},

where the Λ^{(n)} are some diagonal matrices. Hence, we can use the eigenvalues of Ω̂ = Γ_1 Γ_0^{−1} to evaluate those of Ω and, by computing the arguments of the eigenvalues, to find the frequencies.
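As a sanity check of the noiseless relation Ω̂ = Γ_1 Γ_0^{−1} with d = 2n + 1, consider the following sketch (our own illustration; frequencies and sizes are arbitrary choices, and κ_i = i − 1 is used for simplicity):

```python
import numpy as np

# Three noiseless exponentials (n = 1, d = 2n + 1 = 3): the eigenvalues of
# Omega = Gamma_1 Gamma_0^{-1} are exactly e^{i nu_j}.
nus = np.array([-0.7, 0.0, 0.7])
m, d = 200, 3
k = np.arange(m)
f = sum(np.exp(1j * nu * k) for nu in nus)    # c_j = 1 for all j

l = m - d + 1
X = np.array([f[i:i + l] for i in range(d)])  # d x l trajectory matrix
X0, X1 = X[:, :-1], X[:, 1:]
Gamma0 = X0 @ X0.conj().T                     # lag-covariance matrix
Gamma1 = X1 @ X0.conj().T                     # cross-covariance matrix
Omega = Gamma1 @ np.linalg.inv(Gamma0)
est = np.sort(np.angle(np.linalg.eigvals(Omega)))
print(est)                                    # close to [-0.7, 0.0, 0.7]
```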
There are some difficulties since the number of exponentials is a priori unknown, but it can be estimated by the methods discussed in [13, 14, 15, 16, 17, 18, 19] or by computing the rank of X_0.
In the case of a noiseless time series, we have rank(X_0) = 2n + 1 for all values d ≥ 2n + 1. This means that after a certain increase in d the rank of X_0 stays constant, and this threshold can be used to estimate the number of exponentials. In the case of noisy data, one can instead estimate the numerical rank of Γ_0 by the SVD [8]. Under the white noise assumption and m → ∞, the singular values {λ_k}_{k=1}^{d} of Γ_0, according to [8, 28], are such that λ_k = μ_k + σ² for k = 1, …, 2n + 1 and λ_k = σ² otherwise, where μ_k > 0 are the singular values associated with the signal component and σ² stands for the noise variance. The values of μ_k depend on the choice of the translations {κ_i}_{i=1}^{d} and can be close to zero if the information vectors Y_i are almost linearly dependent. By varying the translations, it is possible to make the information vectors more linearly independent, consequently increase μ_k, and thus obtain a notable distinction between the singular values associated with the signal and those associated with the noise. Note that the singular values associated with the white noise are constant. In the case when m is not large, or when the noise is not white, the smallest singular values μ_k can start to overlap with those related to the noise [30, 31], and hence the notable decrease may be missing. Nevertheless, the SVD provides a good estimate for the numerical rank of the matrix [8].
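The numerical-rank estimate can be sketched as follows (our own illustration: the gap-detection rule via the largest ratio of consecutive singular values is one simple heuristic, not the authors' prescription):

```python
import numpy as np

# Two cosines (four exponentials) in white noise: the singular values of
# Gamma_0 = X0 X0^t show a sharp drop after the signal-related values.
rng = np.random.default_rng(1)
m, d = 400, 10
k = np.arange(m)
f = np.cos(0.5 * k) + np.cos(1.1 * k) + 0.1 * rng.standard_normal(m)

l = m - d + 1
X0 = np.array([f[i:i + l - 1] for i in range(d)])
sv = np.linalg.svd(X0 @ X0.T, compute_uv=False)   # sorted descending
gap = sv[:-1] / sv[1:]                            # consecutive ratios
n_exp = int(np.argmax(gap)) + 1                   # largest gap = numerical rank
print(n_exp)                                      # expected: 4
```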
We note that if d > 2n + 1, then some eigenvalues of Ω̂ approximate the frequencies {ν_i} while the others are associated with the noise. As in the case of the single exponential, the selection of the eigenvalues associated with the exponentials is a matter of belief if no information regarding the noise structure is provided. On the other hand, the eigenvectors of Ω̂ can bring more information to decide whether an eigenvalue-eigenvector pair is associated with an exponential or with noise. The idea of using the eigenvectors is closely related to the time series decomposition by the Singular Spectrum Analysis (SSA) [32, 33].

Time series decomposition and the principal component approach
Before introducing the principal components using the eigenvectors of Ω̂, we briefly review some key points of the Singular Spectrum Analysis (SSA). Our approach exploits ideas of SSA, yet SSA and its various modifications, such as Monte Carlo SSA [34, 35, 36], Multiscale SSA [37], Random Lag SSA [38], and Oblique SSA [39], do not compute the frequencies {ν} but allow representation of the data f in a new convenient way. In particular, SSA relies on the Karhunen-Loève decomposition of the correlation matrix and on representation of the vectors {Y_k} in a coordinate system defined by the eigenvectors {u_k} of Γ_0, namely

X_0 = U P, U = [u_1, …, u_d],

where the rows {p_k} of the matrix P can be seen as the coordinates of {Y_k} in the orthogonal basis {u_k} and are commonly called principal components. Note that the vectors {p_k} have the important property of orthogonality, P P* = Σ². However, arbitrary exponentials do not have to be orthogonal with respect to the inner product (•, •)_l due to the finite number of sampling points k = 0, …, l − 1. Therefore, each principal component p_k is usually a linear combination of exponentials even for noiseless data. We would like to emphasize that even for noiseless data there is no one-to-one correspondence between the exponentials and the principal components p_k [40]. Hence, our goal is to obtain a one-to-one correspondence between the frequencies {ν} and certain objects in the absence of noise, as follows.
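The SSA step being contrasted here can be sketched in NumPy (an illustration of ours; the orthogonality P P* = Σ² mentioned above is verified numerically):

```python
import numpy as np

# Principal components of the information vectors: rows of P = U* X0, where
# U holds the left singular vectors of X0 (eigenvectors of Gamma_0).
rng = np.random.default_rng(7)
m, d = 200, 8
f = np.cos(0.6 * np.arange(m)) + 0.1 * rng.standard_normal(m)

l = m - d + 1
X0 = np.array([f[i:i + l] for i in range(d)])
U, sing, Wt = np.linalg.svd(X0, full_matrices=False)
P = U.conj().T @ X0                     # principal components (rows)

# orthogonality property P P* = Sigma^2 noted in the text
print(np.allclose(P @ P.conj().T, np.diag(sing**2)))   # True
```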
We achieve our goal by representing the information vectors Y_k in the basis of the eigenvectors of Ω̂ instead of the basis associated with Γ_0, as is done in SSA. Consequently, we call the proposed method the Non-Hermitian Singular Spectrum Analysis (NH-SSA). For the rest of this article we assume that the eigenvalues of Ω̂ are all different and that the Jordan decomposition Ω̂ = V̂ Λ̂ V̂^{−1} holds. We define a central object of our approach, namely the image of the information vectors Y_k in the basis of V̂:

Z_k = V̂^{−1} Y_k. (15)

Using small perturbation theory it is possible to show that, in the case of d = 2n + 1, if all eigenvalues of Ω are distinct, then the eigenvector matrix V̂ is such that

V̂ = V (I + ε R(ε)).

Here, the matrix R(ε) is analytic with respect to ε, R(ε) = Σ_{n≥0} ε^n R^{(n+1)}, and each diagonal entry of R^{(n+1)} vanishes for any n. Recalling that Y_{k+1} = Ω Y_k + ε (W_{k+1} − Ω W_k), we obtain

Z_{k+1} = Λ̂ Z_k + ε η_k,

or in its coordinate-wise form,

Z_{k+1,j} = λ_j Z_{k,j} + ε η_{k,j}.

Here, j stands for the row index of the vector Z_k = (Z_{k,1}, …, Z_{k,2n+1})^t, λ_j = e^{iν_j} is the eigenvalue of Ω, and η_{k,j} represents the noise. For each value of j, the consecutive values of Z_{k,j}, k = 0, …, l, rotate around the origin, each time turning by the angle ν_j (up to the noise level O(ε)).
Furthermore, if d = 2n + 1 and ε → 0, then there is a one-to-one correspondence between the rows of Z_k and the exponentials.
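The rotation of the rows of Z_k can be demonstrated numerically (a sketch of ours; frequencies, noise level, and the per-step ratio estimator are our own choices):

```python
import numpy as np

# Map the trajectory matrix into the eigenvector basis of Omega-hat; with
# d = 2n + 1 each row of Z rotates by its own frequency nu_j per step.
rng = np.random.default_rng(2)
nus = np.array([-0.9, 0.0, 0.9])
m, d = 300, 3
k = np.arange(m)
f = sum(np.exp(1j * nu * k) for nu in nus) + 0.01 * rng.standard_normal(m)

l = m - d + 1
X = np.array([f[i:i + l] for i in range(d)])
X0, X1 = X[:, :-1], X[:, 1:]
Omega = (X1 @ X0.conj().T) @ np.linalg.inv(X0 @ X0.conj().T)
lam, V = np.linalg.eig(Omega)
Z = np.linalg.inv(V) @ X0               # row j holds the sequence Z_{k,j}

# average per-step rotation angle of each row approximates a frequency
est = np.sort([np.angle(np.mean(Z[j, 1:] / Z[j, :-1])) for j in range(d)])
print(est)                              # close to [-0.9, 0.0, 0.9]
```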
In many practical applications, the number 2n + 1 of exponentials in the time series f is unknown. Therefore, in order not to lose any exponentials, we need to have d ≥ 2n + 1. This implies that we decompose the noise w into exponentials as well. Therefore, some rows of Z_k are attributed to noise and do not have a stable "rotational" pattern around the origin, as was discussed in Section 1 for the case of the single exponential.
Thus, to decide whether the j-th row of Z_k is associated with the noise or with an exponential, we apply a pattern recognition technique by visualizing the dynamics of {Z_{k,j}}_{k=0}^{l}. Namely, the sequence {Z_{k,j}}_{k=0}^{l} is thought to represent a single exponential if the phase portrait, i.e. the set of points {(Re(Z_{k,j}), Im(Z_{k,j}))}_{k=0}^{l}, lies between two concentric circles, see Figure 1a; the modulus |Z_{k,j}| is bounded around a certain constant. On the other hand, the sequence {Z_{k,j}}_{k=0}^{l} is associated with noise when the phase portrait {(Re(Z_{k,j}), Im(Z_{k,j}))}_{k=0}^{l} is randomly distributed around the center, as shown in Figure 2a; the modulus |Z_{k,j}| varies significantly. Furthermore, for the single exponential signal, the phase ψ_j(k) is linearly increasing, as shown in Figure 1c, while for {Z_{k,j}}_{k=0}^{l} related to noise the phase ψ_j(m) = Σ_{k=0}^{m−1} Im(ln(Z_{k+1,j}/Z_{k,j})) can have some irregularities, see Figure 2c.
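A crude automated stand-in for the visual annulus test can be written as follows (our own heuristic, not the authors' classifier; the threshold ratio of 2 is an arbitrary choice):

```python
import numpy as np

def is_signal_row(z, ratio=2.0):
    """Signal-like if the phase portrait stays inside an annulus, i.e. the
    modulus never strays more than `ratio` between its extremes."""
    r = np.abs(z)
    return bool(np.max(r) < ratio * np.min(r))

rng = np.random.default_rng(3)
k = np.arange(200)
signal_row = np.exp(1j * 0.4 * k) * (1 + 0.05 * rng.standard_normal(200))
noise_row = rng.standard_normal(200) + 1j * rng.standard_normal(200)

print(is_signal_row(signal_row), is_signal_row(noise_row))   # True False
```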
Once the rows of Z_k associated with the exponentials are established, the problem is greatly simplified: each {Z_{k,j}}_{k=0}^{l} represents a single exponential, and a whole variety of frequency estimation methods can be applied to recover the frequency.
Finally, we note that, similar to the SSA methodology, sequences {Z_{k,j}}_{k=0}^{l} associated with either the exponentials or the noise can be grouped together and mapped back to the original space as follows. For example, mapping of {Z_{k,j}}_{k=0}^{l} associated with a single exponential back to the original space results in the trajectory matrix X^{(j)}_0 = {c_j e^{i(κ_s+k)ν_j} + O(ε)}_{ks}, which is a Hankel matrix when the translations are κ_s = s − 1. We average elements in X^{(j)}_0 and thus map the sequence {Z_{k,j}}_{k=0}^{l} to the original space as f^{(j)}(k) = c_j e^{iν_j k} + O(ε), k = 0, …, m − 1. Figures 1d and 2d show mappings of {Z_{k,j}}_{k=0}^{l} associated with the exponential and the noise, respectively, to the original space.

[Figure 2 caption, partial: … the modulus |Z_{k,j}| is not bounded away from zero; (C) the phase ψ_j(k) can have one or multiple "wrapping" events [41]; (D) the mapping of {Z_{k,j}}_{k=0}^{l} back to the original space, f^{(j)}(k), is an irregular signal.]

Note that the f^{(j)}(k) associated with the exponential is a cosine with an approximately constant amplitude, while the mapping of {Z_{k,j}}_{k=0}^{l} related to the noise results in some irregular structure. Similar to SSA, the sequences {Z_{k,j}}_{k=0}^{l} associated with noise can all be grouped together and mapped back to ŵ. We thus produce a decomposition of the original signal f = f̂ + ŵ + O(ε), where f̂ = Σ_j f^{(j)}. The decomposition of the time series f into a sum of f^{(j)}, where each f^{(j)} is associated with a single exponential, depends on the noise variance, the proposed number n of exponentials, and the translations {κ}. In the current model we use translations κ_i = (i − 1)·m, where the factor m is called the multiplicity (not to be confused with the length of the time series). Hence, the decomposition depends on the pair of parameters (d, m). The multiplicity m can be chosen arbitrarily; however, our numerical experiments show that m should be of the same order as the expected period. In this case, the information vectors {Y_k} tend to be more linearly independent in the computational sense, and the matrices Γ_0 and Γ_1 are better conditioned.
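The back-mapping step for κ_s = s − 1 amounts to the SSA-style anti-diagonal (Hankel) averaging, which can be sketched as follows (our own illustration; the helper name is hypothetical):

```python
import numpy as np

def diagonal_average(X):
    """Average each anti-diagonal of the d x l matrix X to produce a single
    time series of length d + l - 1 (element (i, j) contributes to index i+j)."""
    d, l = X.shape
    out = np.zeros(d + l - 1, dtype=X.dtype)
    cnt = np.zeros(d + l - 1)
    for i in range(d):
        out[i:i + l] += X[i]
        cnt[i:i + l] += 1
    return out / cnt

# an exact Hankel matrix built from a known exponential is recovered exactly
k = np.arange(12)
g = np.exp(0.3j * k)
H = np.array([g[i:i + 9] for i in range(4)])   # 4 x 9 Hankel matrix
rec = diagonal_average(H)
print(np.max(np.abs(rec - g)))                 # 0 up to rounding
```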

3 Numerical realization
In this section, we provide a numerical algorithm to evaluate V̂^{−1} in order to map the time series into the principal components. Note that Γ_0 and Γ_1 are typically ill-conditioned matrices, and the straightforward calculation of the eigenvectors V̂ of Ω̂ = Γ_1 Γ_0^{−1} can lead to computational errors [42]. We propose to maximize the "numerical" rank of Γ_0 and Γ_1 by varying m. Further, noting that the SVD is less error-prone for symmetric positive-definite matrices than the eigenvalue decomposition, in order to achieve better numerical accuracy we suggest exploiting the SVD of X_0 and X_1 as follows.
Assuming that the trajectory matrices X_0, X_1 have rank equal to 2n + 1, we denote the SVDs of X_0 and X_1 as

X_s = U_s D_s Π W_s*, s = 0, 1, (19)

where the (2n+1)×l matrix Π stands for the projector mapping R^l onto R^{2n+1}, and where the unitary matrices U_s, W_s and the non-negative diagonal matrices D_s are of size (2n+1)×(2n+1), l×l, and (2n+1)×(2n+1), respectively, for s = 0, 1. From (19) it follows that the equation Γ_1 = Ω̂ Γ_0 is equivalent to

Q = Υ R,

where

Q = Π W_1* W_0 Π^t, R = D_1^{−1} U_1* U_0 D_0, Υ = D_1^{−1} U_1* Ω̂ U_1 D_1.

Let us emphasize that the matrix Q is a projection of the unitary matrix W_1* W_0, and therefore ‖Q‖ ≤ 1. The matrix Υ has the same spectrum Λ̂ as Ω̂.
Let us denote by Φ the matrix consisting of the generalized eigenvectors of the pair of matrices (Q, R), namely Q Φ = R Φ Λ̂. Taking into account that Q = R Φ Λ̂ Φ^{−1} = Υ R and the definition of the matrix Υ, it is easy to check that

Υ = R Φ Λ̂ Φ^{−1} R^{−1}.

From the Jordan decomposition of Ω̂ it follows that the eigenvectors V̂ of Ω̂ and the generalized eigenvectors Φ are connected by

V̂ = U_0 D_0 Φ.

Therefore, by definition (15) we obtain

Z_k = Φ^{−1} D_0^{−1} U_0* Y_k.

Finally, we combine all the steps together and provide the numerical algorithm to estimate the exponentials in the time series.

Input: the time series f(k), k = 0, 1, …, m − 1.
1. Pick a value of d > n and consider a set of multiplicities m = 0, 1, 2, …
2. For each pair (d, m), set the translations κ_i = (i − 1)m.
3. Construct the trajectory matrix X_0 for each pair (d, m).
4. Compute the condition number cond(X_0) on the grid of (d, m).
5. Select a pair (d, m) at which cond(X_0) attains its minimum, and compute the trajectory matrices X_0 and X_1 for the found pair of d and m.
6. Perform the SVD of the matrices X_0 and X_1 as in (19).
7. Compute the matrices Q and R.
8. Solve the generalized eigenvalue problem Q Φ = R Φ Λ̂ for Φ and Λ̂.
9. Compute the sequences Z_k = Φ^{−1} D_0^{−1} U_0* Y_k, k = 0, …, l.
10. Discard the rows {Z_{k,j}}_{k=0}^{l} associated with the eigenvalues λ̂_j such that |λ̂_j| < λ_c, where λ_c is a given threshold. These rows are related to components f^{(j)} that damp with time. The value of λ_c can be chosen such that the number of generalized eigenvalues with |λ̂_j| ≥ λ_c equals the number of singular values of Γ_0 associated with the signal.
11. Apply the pattern recognition technique to the remaining {Z_{k,j}}_{k=0}^{l} and evaluate the number n of exponentials. The number n is determined by the number of sequences {Z_{k,j}}_{k=0}^{l} whose graph lies in a vicinity of the unit circle in the complex plane.
12. Apply one of the frequency estimation methods to recover a single exponential in each {Z_{k,j}}_{k=0}^{l} associated with the regular patterns. Here, we use ESPRIT to find a single exponential in {Z_{k,j}}_{k=0}^{l}.
Output: the list of frequencies {ν_j} associated with the {Z_{k,j}}_{k=0}^{l} showing the regular patterns.
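The algorithm above can be condensed into a short end-to-end sketch. This is a simplified variant of ours, not the authors' implementation: we form Ω̂ directly from the covariance matrices instead of the SVD-based generalized problem, use a crude annulus test in place of the visual pattern recognition, and use the per-step phase slope instead of ESPRIT in Step 12:

```python
import numpy as np

def nhssa_frequencies(f, d, mult=1, lam_c=0.8):
    kappa = [i * mult for i in range(d)]            # translations (i-1)*m
    l = len(f) - kappa[-1]
    X = np.array([f[k:k + l] for k in kappa])
    X0, X1 = X[:, :-1], X[:, 1:]
    Omega = (X1 @ X0.conj().T) @ np.linalg.pinv(X0 @ X0.conj().T)
    lam, V = np.linalg.eig(Omega)
    Z = np.linalg.inv(V) @ X0
    freqs = []
    for j in range(d):
        if abs(lam[j]) < lam_c:                     # Step 10: damping -> discard
            continue
        z = Z[j]
        if np.max(np.abs(z)) > 2 * np.min(np.abs(z)):  # Step 11: annulus test
            continue
        freqs.append(np.angle(np.mean(z[1:] / z[:-1])))  # Step 12: phase slope
    return np.sort(freqs)

rng = np.random.default_rng(4)
k = np.arange(400)
f = np.exp(0.5j * k) + np.exp(-0.5j * k)            # 2 cos(0.5 k)
f = f + 0.02 * (rng.standard_normal(400) + 1j * rng.standard_normal(400))
print(nhssa_frequencies(f, d=6))                    # close to [-0.5, 0.5]
```

Here d = 6 deliberately overestimates the number of exponentials (two); the extra noise-related rows are discarded by the threshold and annulus tests.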
4 Comparison with other high-resolution methods

Signal corrupted by the white noise
Let us consider the signal s consisting of four unit-amplitude cosines sampled at m = 300 points.
The frequencies of these cosines are ν_1/2π = 0.04, ν_2/2π = 0.06, ν_3/2π = 0.07, and ν_4/2π = 0.12. The signal s is corrupted by the white Gaussian noise w with zero mean, E(w) = 0, and variance Var(w) = 1. In this case, the SNR in dB, defined by 10 log_10(‖s‖²_m/‖w‖²_m), is approximately 3.5 dB. For a given realization of the time series, we compute the condition number cond(Γ_0) on the grid of (d, m), as shown in Figure 3. There are several potential pairs of m and d, marked by red asterisks, where the condition number is close to its minimum. We select the value of m equal to 4, while the choice of d is less restrictive; at the same time, it is better to select a rather large d in order to increase the size of Γ_0 and reduce the influence of noise on the information-carrying components. Our experience reveals that it is important to select the value of d to be at least three to four times larger than the number of exponentials, which can be estimated, for example, by analyzing the eigenvalues of the auto-covariance matrix Γ_0, shown by blue triangles in Figure 4. We note that a significant drop occurs at the 9th eigenvalue. Hence, n = 4 (since each cosine in the signal corresponds to two exponentials). Therefore, while applying the NHSSA method, we assume d = 18; however, any number greater than 12 would also be sufficient. We stress that the precise determination of d is not important.
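The test signal and its SNR can be reproduced as follows (a sketch of ours; the random seed is arbitrary, and the exact dB value fluctuates slightly between noise realizations):

```python
import numpy as np

# Four unit-amplitude cosines at the frequencies given above, m = 300 points,
# unit-variance white Gaussian noise; SNR = 10*log10(||s||^2 / ||w||^2).
rng = np.random.default_rng(5)
m = 300
k = np.arange(m)
nus = 2 * np.pi * np.array([0.04, 0.06, 0.07, 0.12])
s = sum(np.cos(nu * k) for nu in nus)
w = rng.standard_normal(m)              # E(w) = 0, Var(w) = 1
snr_db = 10 * np.log10(np.sum(s**2) / np.sum(w**2))
print(round(snr_db, 1))                 # around 3 dB for this realization
```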
Additional dimensions d − n are used to decompose the noise into some exponentials and these false exponentials are discarded later at either Step 10 or 11.
The number of these false estimates is controlled by the threshold parameter λ_c in Step 10. If the value of λ_c is reduced, then more false estimates are recovered, but they can then be discarded at Step 11. At the same time, if the value of λ_c is increased, then NHSSA may start to omit true frequencies.
The value of λ_c can be chosen such that the number of generalized eigenvalues with |λ̂_j| ≥ λ_c equals the number of Γ_0 singular values associated with the signal. Figure 5 shows the distribution of λ̂_j for one of the realizations of the white-noise-corrupted signal, in the case of n = 4 and d = 18. Four pairs of the generalized eigenvalues have a modulus close to 1, while the rest have significantly lower magnitudes. The value λ_c ≈ 0.8 provides a threshold separating the true and false estimates in this case. Our experience indicates that the optimal value is λ_c ≈ 0.8; however, it might need to be decreased if the SNR is reduced or the time series is shortened. We nevertheless propose to keep the value of λ_c at the lower end, so as to obtain some false estimates and then discard them using pattern recognition.

We consider 100 realizations of the time series and estimate the frequencies by ESPRIT and NHSSA. To improve the accuracy of frequency estimation, the size of the auto-covariance matrix for ESPRIT is selected to be m/3 = 100. We consider two realizations of ESPRIT, denoted by ESPRIT(n), where n = 4 (the exact number of cosines in the time series) or n = 7 is the assumed number of cosines in the ESPRIT algorithm.
Figure 6a plots the recovered frequencies for each realization according to each method, i.e. the frequencies recovered by NHSSA are plotted by red circles, and the results of the ESPRIT recovery are shown by dots and crosses. Figure 6b shows the probability of occurrence (the number of times the frequency is recovered within 0.005-wide intervals, which uniformly span the frequency domain) for the estimated frequencies. Both realizations of ESPRIT almost always recover the frequencies, whereas NHSSA shows good performance for ν/2π = 0.04 and ν/2π = 0.12 but slightly underperforms for ν/2π = 0.06 and ν/2π = 0.07. Instead of identifying both of the latter frequencies, NHSSA recovers a single frequency in the middle. We will discuss the separability of frequencies later in this section. Note that all methods recover the frequencies quite well, especially taking into account that NHSSA does not have any information about the number of exponentials.
We list the estimated mean and variance for each {ν_k}_{k=1}^{4} in Table 1. The variance of ESPRIT is smaller than that of NHSSA, but ESPRIT requires the number of cosines as an input variable.
Additionally, ESPRIT with n = 7 provides additional estimates, shown by black crosses, that are distributed rather randomly. It is hard to distinguish these false estimates from the true frequencies if only their values are given (as a potential solution, one may apply ESPRIT with different sizes of the auto-covariance matrix to investigate whether the estimates are true or false).
The NHSSA also has some false estimates, i.e. red circles lying away from the genuine frequencies.
Finally, we illustrate the separability of the original time series f = s + w into the components, namely ŝ, the recovered signal, and ŵ, the estimated noise. The recovered signal ŝ is obtained by grouping the sequences {Z_{k,j}}_{k=0}^{l} that have a circular phase portrait and mapping them to the original space. The noise ŵ is estimated by grouping and mapping the rest of the {Z_{k,j}}_{k=0}^{l} to the original space as well. Figure 7a shows the signal s and the time series f = s + w for one of the realizations of w. The recovered signal ŝ is compared to the original signal s in Figure 7b.
Note that the two time series match quite well, but ŝ still contains some noise. Figure 7c shows the comparison of the time series for the original noise w and the estimated noise ŵ, which also match quite well.
We emphasize that the decomposition of the time series f = ŝ + ŵ + O(ε) is obtained directly by grouping the appropriate {Z_{k,j}}_{k=0}^{l} and mapping them to the original space. The proposed method thus allows us to de-noise the original time series. Despite some inspiring empirical observations, further investigations are required to derive sharp estimates for ‖s − ŝ‖ and ‖w − ŵ‖.

Signal corrupted by the autoregressive noise
To further test the proposed method, we consider a signal s consisting of the same four cosines, but now corrupted by noise w generated according to an autoregressive (AR) process of the first (AR1) or second (AR2) order. The driving term ξ(k) is the white Gaussian noise such that E(ξ) = 0 and Var(ξ) = 1; the SNR in dB for the former and the latter model is 0.6 and 1.5 dB, respectively. Note that for the AR noise the eigenvalues of Γ_0 do not have a clear jump, as shown in Figure 4, and the number of exponentials for ESPRIT can be estimated between j = 9 and j = 15 or higher (since each cosine is associated with two exponentials, the number of cosines could thus be between 4 and 7). After computing the condition number cond(X_1) on the grid of (d, m), we choose m = 3 and d = 18. As before, we consider 100 different realizations of the noise and estimate the frequencies by the same three methods. The results of the frequency recovery for the AR1 and AR2 noise models are shown in Figures 8 and 9, respectively. Note that the AR1 noise model generates a significant number of false estimates for ESPRIT; there is also a comparable number of false estimates for NHSSA, but the latter can be discarded using pattern recognition.
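Generating AR noise driven by unit-variance white Gaussian noise can be sketched as follows (an illustration of ours: the AR coefficient φ = 0.7 is a placeholder of our own choosing, since the paper's coefficients are not reproduced here):

```python
import numpy as np

# AR(1) noise w(k) = phi * w(k-1) + xi(k), driven by unit-variance white
# Gaussian noise xi; higher-order AR models extend the recursion analogously.
rng = np.random.default_rng(6)
m, phi = 300, 0.7
xi = rng.standard_normal(m)          # E(xi) = 0, Var(xi) = 1
w = np.zeros(m)
for kk in range(1, m):
    w[kk] = phi * w[kk - 1] + xi[kk]
print(w.shape)                       # (300,)
```

The stationary variance of this process is 1/(1 − φ²), so unlike white noise it contributes unevenly across the singular-value spectrum of Γ_0, which is why the clear jump discussed above disappears.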
For the AR2 noise model, NHSSA performs significantly better than ESPRIT(7), but less effectively than ESPRIT(4). However, ESPRIT(4) has the advantage of exploiting information about the correct number of exponentials in the signal. If the assumed number of exponentials is increased, e.g. as in ESPRIT(7), false estimates occur. For completeness, we list the mean and variance of each recovered frequency in Table 1.

Separability
In this section, we consider a signal composed of two cosines with close frequencies ν_1 and ν_2 that are hard to detect simultaneously due to the shortness of the time series f. In this case, ESPRIT fails to resolve both cosines and rather estimates a single frequency somewhere in the middle between ν_1/2π and ν_2/2π. The NHSSA also fails to distinguish both cosines and recovers a single frequency, some sort of average of ν_1/2π and ν_2/2π, as well. This frequency is related to an eigenvalue pair (the cosine consists of two exponentials) lying close to the unit circle, while the other eigenvalues have significantly lower moduli and are related to the noise. The phase portrait {(Re(Z_{k,j}), Im(Z_{k,j}))}_{k=0}^{l} of the sequence {Z_{k,j}}_{k=0}^{l} associated with the pair lying close to the unit circle is a spiral, see Figure 12a.
We stress that the original problem of exponential recovery was concerned with finding frequencies {ν_k : ν_k ∈ ℝ}. However, both ESPRIT [43] and NHSSA can be applied to recover damped exponentials with frequencies {ν_k : ν_k ∈ ℂ_+}, where ℂ_+ is the upper half of the complex plane; a discussion of this case is beyond the scope of this manuscript. Therefore, instead of recovering both exponentials with closely lying frequencies, the NHSSA approximated them by a single damped cosine, as can be observed by mapping {Z_{k,j}}_{k=0}^l back to the original space, see Figure 12d. This information is not revealed by standard ESPRIT, since it relies only on the eigenvalues and discards the information carried by the eigenvectors. The noise ŵ and the signal ŝ are both recovered in NHSSA by grouping the appropriate eigenvalues and mapping back to the original space, see Figures 10b and 10c. Nevertheless, further investigation is necessary to understand when the recovery of two exponentials sampled on a short interval is feasible.
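Frequency evaluation from a component sequence, as described in the abstract (unwrap the phases, then estimate the phase slope), together with a simple damping check that distinguishes a unit-circle portrait from the inward spiral of Figure 12a, can be sketched as follows. The modulus ρ and frequency ν of the synthetic component are illustrative values, not taken from the paper's experiments.

```python
import numpy as np

l = 200
k = np.arange(l + 1)

# synthetic component Z_k = rho^k * exp(i*nu*k); rho, nu are illustrative
nu = 2 * np.pi * 0.015
rho = 0.995            # modulus < 1 gives a decaying spiral (damped cosine)
Z = rho ** k * np.exp(1j * nu * k)

# frequency estimate: slope of the unwrapped phase
psi = np.unwrap(np.angle(Z))
nu_hat = np.polyfit(k, psi, 1)[0]

# damping estimate: slope of log|Z_k|; a near-zero slope means the portrait
# hugs the unit circle, a negative slope means an inward spiral
damping = np.polyfit(k, np.log(np.abs(Z)), 1)[0]
```

Phase unwrapping is reliable here because the per-sample phase increment stays well below π; with heavy noise, potential wrapping events have to be detected and handled before fitting the slope.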

Conclusions
We have presented a new method for estimating exponentials and their frequencies in a time series. The proposed method decomposes a time series consisting of several exponentials into components by casting the information vectors into a new basis. Each component corresponds either to a single exponential or to noise. To the information-carrying components, any of the standard frequency estimation techniques (e.g. ESPRIT, MUSIC, ML) can be applied to recover a single exponential. The overall accuracy of the proposed method is comparable with that of the widely used ESPRIT method, provided the latter is given the correct number of exponentials in the signal.
Furthermore, when the model order (number of exponentials) in ESPRIT is overestimated, the proposed method can reduce the number of false frequency estimates, as shown by numerical examples.
One of the significant benefits of the proposed approach is a way to distinguish false and true frequency estimates using pattern recognition. The first stage of the pattern recognition is automated by discarding noise-related components, i.e. those associated with eigenvalues whose modulus is less than a certain threshold λ_c. At the second stage, the phase portraits and phase dynamics of the remaining components can be analyzed visually. Components associated with the true frequencies have phase portraits resembling unit circles and phase dynamics with zero or a minimal number of phase wrapping events, whereas false frequencies are associated with phase portraits that have a random structure. The closeness of a phase portrait to the unit circle, together with the mapping of the component back to the original space, can provide a certain level of confidence that the component is associated with an exponential rather than noise. At the same time, further research is needed to produce a fully automatic pattern recognition algorithm.
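A minimal sketch of this two-stage screening is given below. The threshold values `lam_c` and `circ_tol`, the helper name `classify_components`, and the synthetic test data are all illustrative assumptions; the second stage is reduced here to a single circle-likeness score (relative spread of |Z_k|) rather than a full visual analysis.

```python
import numpy as np

def classify_components(eigvals, Z, lam_c=0.9, circ_tol=0.15):
    """Keep components whose eigenvalue modulus exceeds lam_c, then keep
    only those whose phase portrait stays close to a circle (low relative
    spread of the moduli |Z_k|). Thresholds are illustrative."""
    keep = []
    for j, lam in enumerate(eigvals):
        if abs(lam) < lam_c:              # first stage: noise-related component
            continue
        r = np.abs(Z[:, j])
        spread = np.std(r) / np.mean(r)   # 0 for a perfect circle
        if spread < circ_tol:             # second stage: circle-likeness
            keep.append(j)
    return keep

# synthetic example: one exponential component, one noise-like component
k = np.arange(100)
Z = np.column_stack([
    np.exp(1j * 0.2 * k),                                   # unit-circle portrait
    np.random.default_rng(2).standard_normal(100)
    + 1j * np.random.default_rng(3).standard_normal(100),   # random portrait
])
eigvals = [np.exp(1j * 0.2), 0.3 + 0.1j]
keep = classify_components(eigvals, Z)
print(keep)  # -> [0]
```

A production version would replace the single spread score with the phase-dynamics criterion as well (counting phase wrapping events along ψ_j(k)).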
Finally, we note that under certain conditions the proposed method could be generalized to estimate exponentials in time series where some measurements are missing [44].

Figure 1: Typical results for the sequence {Z_{k,j}}_{k=0}^l associated with an exponential. (A) The phase portrait {(Re Z_{k,j}, Im Z_{k,j})}_{k=0}^l lies between two concentric circles; dashed and dotted lines represent one and two standard deviations from the mean modulus; (B) The modulus of {Z_{k,j}}_{k=0}^l is bounded away from zero; (C) The phase ψ_j(k) is a linearly increasing function; (D) Mapping of {Z_{k,j}}_{k=0}^l back to the original space.

Figure 3: Dependence of cond(Γ_0) on the multiplicity m̃ and the size of Γ_0. Cells marked by asterisks, where the condition number is minimal, are potential candidates for choosing the pair (m̃, d).

Figure 4: Eigenvalues of the auto-covariance matrix Γ_0 for the signal consisting of a sum of eight exponentials and a constant. The signal is corrupted by either white Gaussian noise or auto-regressive noise of the first or second order.

Figure 6: (A) Frequencies estimated by NHSSA and ESPRIT for different realizations of the white Gaussian noise. The number of cosines in ESPRIT is assumed to be either 4 or 7; (B) Probability of occurrence of the estimated frequencies.
(A) Comparison of the original signal s consisting of the four cosines to the corrupted signal f = s + w, where w is the white Gaussian noise; (B) Comparison of the original signal s and the recovered signal ŝ; (C) Comparison of the original noise w and the recovered noise ŵ.

Figure 8: (A) Frequencies estimated by NHSSA and ESPRIT for different realizations of the autoregressive noise of the first order. The number of cosines in ESPRIT is assumed to be either 4 or 7; (B) Probability of occurrence of the estimated frequencies.

Figure 9: (A) Frequencies estimated by NHSSA and ESPRIT for different realizations of the autoregressive noise of the second order. The number of cosines in ESPRIT is assumed to be either 4 or 7; (B) Probability of occurrence of the estimated frequencies.

Figure 10: (A) Comparison of the original signal s consisting of the two cosines with closely lying frequencies to the corrupted signal f = s + w, where w is the white Gaussian noise; (B) Comparison of the original signal s and the recovered signal ŝ; (C) Comparison of the original noise w and the recovered noise ŵ.

Figure 11: (A) Frequencies estimated by NHSSA and ESPRIT for different realizations of the white Gaussian noise. The number of cosines in ESPRIT is assumed to be either 2 or 5; (B) Probability of occurrence of the estimated frequencies.

Table 1: Estimation of the frequencies {ν_k} by ESPRIT and NHSSA in the case of the time series corrupted by white Gaussian noise or autoregressive noise of the first or second order.