I had a lot of confusion and trouble correctly carrying out the self-similarity and long-range dependence (LRD) analysis of vehicular mobility data. Finally, I sat down and read a few papers that focus specifically on preparing the dataset, estimating the Hurst parameter, and plotting. The following is the gist of those papers.
Questions Asked:
1) Stationary vs. Non-Stationary Data
2) Decomposing the Signal
3) Periodicity in the Data Set that Affects LRD Analysis
4) Accurate Way to Measure the Hurst Parameter
Long-Range Dependence in a Changing Internet Traffic Mix
The concepts and definitions for self-similarity and long-range dependence given in section 1 assume that the time series of arrivals is second-order stationary (a.k.a. weakly stationary). Loosely speaking, this means that the variance of the time series (and more generally, its covariance structure) does not change over time, and that the mean is constant (so the time series can always be transformed into a zero-mean stochastic process by simply subtracting the mean). The obvious question this raises is whether Internet traffic is stationary. This is certainly not the case at the scales in which the time-of-day effects are important (traffic sharply drops at night), so Internet traffic is usually characterized as self-similar and long-range dependent only for those scales between a few hundred milliseconds and a few thousand seconds.
For example, the UNC link showed an increase in traffic intensity during morning hours as more and more people became active Internet users. However, it is still useful to study such time series using the self-similarity and long-range dependence framework, and this is possible using methods that are robust to non-stationarities in the data (i.e., methods that first remove trends and other effects from the data to obtain a second-order stationary time series). In some cases, non-stationarities are so complex that conventional models fail to accommodate them, which can result in estimates of the Hurst parameter greater than or equal to 1. We examine some of these cases in section 5.
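To make the second-order stationarity assumption concrete, a quick first check is to split a trace into blocks and compare the block means and variances. The sketch below is a minimal illustration assuming NumPy; the function name `windowed_moments`, the block count, and the synthetic "morning ramp" are illustrative choices, not part of the paper.

```python
import numpy as np

def windowed_moments(x, n_windows=8):
    """Split x into n_windows contiguous blocks; return each block's mean and variance.

    For a second-order stationary series both should stay roughly constant across
    blocks; a systematic drift hints at non-stationarity (e.g. a time-of-day ramp).
    """
    x = np.asarray(x, dtype=float)
    blocks = np.array_split(x, n_windows)
    means = np.array([b.mean() for b in blocks])
    variances = np.array([b.var() for b in blocks])
    return means, variances

rng = np.random.default_rng(0)
n = 4096
stationary = rng.normal(0.0, 1.0, n)
# Hypothetical morning ramp: the level rises steadily over the trace.
ramped = stationary + np.linspace(0.0, 5.0, n)

m_s, _ = windowed_moments(stationary)
m_r, _ = windowed_moments(ramped)
spread_s = m_s.max() - m_s.min()
spread_r = m_r.max() - m_r.min()
print(f"spread of block means: stationary {spread_s:.2f}, ramped {spread_r:.2f}")
```

A large spread is only a hint of non-stationarity, not a formal test: a stationary long-memory series can also wander slowly for long stretches.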
The wavelet-based tools for analysis of time series are important because they have been shown to provide a better estimator (and confidence intervals) for the Hurst parameter than other approaches [14]. These methods are also robust in the presence of certain non-stationary behavior (notably linear trends) in the time series.
SiZer provides a useful method for finding statistically significant local trends in time series and is especially useful for finding important underlying structure in time series with complex structure. SiZer is based on local linear smooths of the data, shown as curves corresponding to different window widths in the top panel of Figure 4 (a random sample of time series values is also shown using dots). These curves are very good at revealing potential local trends in the time series, and provide a visualization of structure in the time series at different scales, i.e. at different window widths used for smoothing. See [10] for an introduction to local linear smoothing. Two important issues are: which of these curves (i.e., scales) is the right one, and which of the many visible trends (at a variety of different scales) are statistically significant (thus representing important underlying structure) as opposed to reflecting natural variation?
Summary: This paper provides insight into the SiZer tool, which can be used to detect trends in traffic. However, I need to look further into this paper to see how the tool is actually used. There is also an indication that trends can be part of the analysis, but I am a bit confused here.
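The local linear smooths that SiZer is built on can be sketched as follows. This covers only the family-of-smooths part (a Gaussian-kernel local linear fit at several bandwidths `h`); it does not implement SiZer's significance test for slopes. NumPy is assumed, and the function name, signal, and bandwidths are hypothetical.

```python
import numpy as np

def local_linear_smooth(t, y, h):
    """Gaussian-kernel local linear smoother evaluated at every t (O(n^2), small n only)."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    fitted = np.empty_like(y)
    for i, t0 in enumerate(t):
        d = t - t0
        w = np.exp(-0.5 * (d / h) ** 2)
        # Closed-form weighted least-squares line around t0, evaluated at t0.
        s0, s1, s2 = w.sum(), (w * d).sum(), (w * d * d).sum()
        fitted[i] = np.sum((s2 - s1 * d) * w * y) / (s0 * s2 - s1 * s1)
    return fitted

# A family of smooths at several window widths, in the spirit of SiZer's top panel.
rng = np.random.default_rng(1)
t = np.arange(500)
y = np.sin(t / 40.0) + 0.002 * t + rng.normal(0.0, 0.5, t.size)
family = {h: local_linear_smooth(t, y, h) for h in (5, 20, 80)}
```

Small bandwidths follow local wiggles, large ones show only broad trends; deciding which of those features are statistically significant is the part SiZer adds on top.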
Wavelet Analysis of Long-Range-Dependent Traffic
Patrice Abry and Darryl Veitch
The wavelet-based estimator can be implemented very efficiently, allowing the direct analysis of very large data sets, and is highly robust against the presence of deterministic trends, as well as allowing their detection and identification.
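As a rough illustration of the wavelet approach, the sketch below computes a logscale diagram with the Haar wavelet: at each octave j it averages the squared detail coefficients, and for long-range dependent noise log2 of that average grows roughly like (2H - 1)j, so H = (slope + 1)/2. This is a simplified, unweighted version assuming NumPy, not the Abry-Veitch estimator itself, which uses weighted regression, bias correction, and wavelets with more vanishing moments. In particular, Haar has a single vanishing moment, so it is insensitive only to constant offsets, not to linear trends.

```python
import numpy as np

def haar_logscale_H(x, j1=2, j2=None):
    """Simplified Haar-wavelet logscale-diagram estimate of H for a stationary series.

    Returns (H_hat, octaves, log2_mu), where log2_mu[j-1] is log2 of the mean squared
    detail coefficient at octave j. For LRD noise, log2_mu ~ (2H - 1) * j + const.
    """
    a = np.asarray(x, dtype=float)
    log2_mu = []
    while a.size >= 2:
        n = (a.size // 2) * 2                  # drop a trailing odd sample
        even, odd = a[0:n:2], a[1:n:2]
        detail = (even - odd) / np.sqrt(2.0)   # orthonormal Haar detail coefficients
        a = (even + odd) / np.sqrt(2.0)        # approximation passed to the next octave
        log2_mu.append(np.log2(np.mean(detail ** 2)))
    log2_mu = np.array(log2_mu)
    octaves = np.arange(1, log2_mu.size + 1)
    if j2 is None:
        j2 = log2_mu.size - 4                  # keep octaves with at least ~16 coefficients
    sel = (octaves >= j1) & (octaves <= j2)
    slope = np.polyfit(octaves[sel], log2_mu[sel], 1)[0]
    return (slope + 1.0) / 2.0, octaves, log2_mu

rng = np.random.default_rng(2)
white = rng.standard_normal(2**14)
H_hat, octs, log2_mu = haar_logscale_H(white)
print(f"H estimate for white noise: {H_hat:.3f}")
```

For white noise the estimate should sit near 0.5; plotting `log2_mu` against `octs` gives the logscale diagram itself.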
There are always some conflicting studies regarding the nature of dependence in a dataset.
Statistical Methods for Data with Long-Range Dependence
by Jan Beran
A typical data set where such a model seems to fit is the record of the Nile River minima (Figure 1). This data played a key role in the discovery of long-range dependence in hydrological data by the famous hydrologist Hurst (1951). The plot of the data reveals several interesting features. At first sight the data might seem non-stationary: in particular, parts of the data seem to have local trends or periodicities, and the expected value seems to be changing slowly. A closer look at the whole series, however, shows that both trends and periodicities change with time in an irregular way and that the overall mean seems to be constant. Such behaviour is typical for stationary processes with long memory.
A Practical Guide to Measuring the Hurst Parameter
Richard G. Clegg
This paper describes, in detail, techniques for measuring the Hurst parameter. Measurements are given on artificial data both in a raw form and corrupted in various ways to check the robustness of the tools in question. Measurements are also given on real data, both new data sets and well-studied data sets. All data and tools used are freely available for download along with simple “recipes” which any researcher can follow to replicate these measurements.
2 Measuring the Hurst Parameter
While the Hurst parameter is perfectly well-defined mathematically, measuring it is problematic. The data must be measured at high lags/low frequencies where fewer readings are available. Early estimators were biased and converged only slowly as the amount of data available increased. All estimators are vulnerable to trends in the data, periodicity in the data and other sources of corruption. Many estimators assume specific functional forms for the underlying model and perform poorly if this is misspecified. The techniques in this paper are chosen for a variety of reasons. The R/S parameter, aggregated variance and periodogram are well-known techniques which have been used for some time in measurements of the Hurst parameter. The local Whittle and wavelet techniques are newer techniques which generally fare well in comparative studies. All the techniques chosen have freely available code which can be used with free software to estimate the Hurst parameter.
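Of the estimators listed, the periodogram method is among the simplest to sketch. Near zero frequency an LRD series has a spectrum proportional to f^(1-2H), so regressing the log periodogram on log frequency over the lowest frequencies gives a slope b and H = (1 - b)/2. The sketch assumes NumPy; using the lowest 10% of frequencies (`frac=0.1`) is a common choice but an assumption here, and the function name is hypothetical.

```python
import numpy as np

def periodogram_H(x, frac=0.1):
    """Log-periodogram regression estimate of H.

    Near zero frequency an LRD series has I(f) ~ f^(1 - 2H), so the slope b of
    log I against log f over the lowest `frac` of frequencies gives H = (1 - b) / 2.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    spec = np.abs(np.fft.rfft(x - x.mean())) ** 2 / n
    freqs = np.fft.rfftfreq(n)                 # cycles per sample
    m = max(int(frac * (freqs.size - 1)), 2)
    f, I = freqs[1 : m + 1], spec[1 : m + 1]   # skip the zero frequency
    slope = np.polyfit(np.log10(f), np.log10(I), 1)[0]
    return (1.0 - slope) / 2.0

rng = np.random.default_rng(3)
white = rng.standard_normal(2**14)
print(f"periodogram H for white noise: {periodogram_H(white):.3f}")
```

Because this estimator only looks at the lowest frequencies, any slow trend or long-period cycle lands exactly in the region it regresses over, which is why trends and periodicity distort it so easily.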
The problems with real-life data are worse than those faced when measuring artificial data. Real-life data is likely to have periodicity (due to, for example, daily usage patterns), trends and perhaps quantisation effects if readings are taken to a given precision. The naive researcher taking a data set and running it through an off-the-shelf method for estimating the Hurst parameter is likely to end up with a misleading answer or possibly several different misleading answers.
Various techniques are tried to filter real-life traces in addition to making measurements purely on the raw data. These methods have been selected from the literature as commonly used by researchers in the field. Often in such cases a high-pass filter would be used to remove periodicity and trends; however, since LRD measurements are most important at low frequencies, that is an obviously inappropriate technique. The techniques used to preprocess data before estimating H are listed below.
• Transform to log of original data (only appropriate if data is positive).
• Removal of mean and linear trend (that is, subtract the best fit line Y = at + b for constant a and b).
• Removal of a high-order best-fit polynomial of degree ten (degree ten was chosen after higher degrees showed evidence of overfitting).
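The three preprocessing steps above can be sketched in NumPy as follows; the function names are illustrative. The degree-ten fit uses `numpy.polynomial.Polynomial.fit`, which maps the time axis onto [-1, 1] internally so that a high-degree fit stays numerically stable.

```python
import numpy as np

def log_transform(x):
    """Step 1: log of the data; only valid when every value is strictly positive."""
    x = np.asarray(x, dtype=float)
    if np.any(x <= 0):
        raise ValueError("log transform requires strictly positive data")
    return np.log(x)

def remove_linear_trend(x):
    """Step 2: subtract the least-squares line Y = a*t + b (this also removes the mean)."""
    x = np.asarray(x, dtype=float)
    t = np.arange(x.size, dtype=float)
    a, b = np.polyfit(t, x, 1)
    return x - (a * t + b)

def remove_polynomial_trend(x, degree=10):
    """Step 3: subtract a best-fit polynomial of the given degree (ten in the text)."""
    x = np.asarray(x, dtype=float)
    t = np.arange(x.size, dtype=float)
    # Polynomial.fit rescales t internally, keeping the high-degree fit well conditioned.
    p = np.polynomial.Polynomial.fit(t, x, degree)
    return x - p(t)
```

The steps can be chained, e.g. `remove_polynomial_trend(log_transform(x))` for positive data. Note that a degree-ten polynomial can also absorb genuine low-frequency structure, so it is worth comparing estimates with and without it.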
Long-Range Dependence: Now you see it, now you don’t!
By Thomas Karagiannis CSE Dept., UC Riverside tkarag@cs.ucr.edu
Michalis Faloutsos CSE Dept., UC Riverside michalis@cs.ucr.edu
Rudolf H. Riedi ECE Dept., Rice University riedi@rice.edu
This paper does a very good job of illustrating the difficulties one may face when trying to establish LRD and self-similarity (SS). Its main points are:
• No single estimator is reliable for estimating LRD on its own. For example, Whittle is the most accurate when LRD exists, but it is easily fooled by periodicity.
• Most of the estimators misjudge the Hurst value when periodicity is present in the data; such a signal obscures the results.
• Interpolation can be used to repair a time series affected by missing data, together with outlier removal and smoothing.
• It is very important to remove the periodic component. This can be done by processing and decomposing the signal.
• The authors recommend plotting the signal at various scales to reveal different characteristics.
Over the last few years, the network community has started to make heavy use of novel concepts such as self-similarity and Long-Range Dependence (LRD). Despite their wide use, there is still much confusion regarding the identification of such phenomena in real network traffic data. In this paper, we show that estimating Long-Range Dependence is not straightforward: there is no systematic or definitive methodology. There exist several estimating methodologies, but they can give misleading and conflicting estimates. More specifically, we arrive at several conclusions that could provide guidelines for a systematic approach to LRD. First, long-range dependence may exist even if the estimators have different estimates of the Hurst exponent in the interval 0.5 to 1. Second, long-range dependence is unlikely to exist if there are several estimators that fail to estimate the Hurst exponent. Third, we show that periodicity can obscure the analysis of a signal, giving partial evidence of long-range dependence. Fourth, the Whittle estimator is the most accurate in finding the exact value when LRD exists, but it can be fooled easily by periodicity. As a case study, we analyze real round-trip time data. We find and remove a periodic component from the signal before we can identify long-range dependence in the remaining signal.
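One simple way to find and remove a periodic component, shown here as an illustration rather than the authors' exact procedure, is to locate the strongest peak of the periodogram and subtract a least-squares sinusoid at that frequency. The sketch assumes NumPy; `remove_dominant_sinusoid` is a hypothetical name, and it also removes the mean.

```python
import numpy as np

def remove_dominant_sinusoid(x):
    """Find the strongest periodogram peak (excluding frequency 0) and subtract a
    least-squares sinusoid at that frequency. Returns (residual, cycles_per_sample)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    xc = x - x.mean()
    power = np.abs(np.fft.rfft(xc)) ** 2
    freqs = np.fft.rfftfreq(n)
    k = 1 + np.argmax(power[1:])          # skip the zero-frequency bin
    f = freqs[k]
    t = np.arange(n)
    # Fitting cos and sin terms lets least squares pick amplitude and phase.
    design = np.column_stack([np.cos(2 * np.pi * f * t), np.sin(2 * np.pi * f * t)])
    coef, *_ = np.linalg.lstsq(design, xc, rcond=None)
    return xc - design @ coef, f

rng = np.random.default_rng(7)
n = 4096
t = np.arange(n)
signal = 2.0 * np.cos(2 * np.pi * t / 64 + 0.3) + rng.normal(0.0, 0.5, n)
resid, f = remove_dominant_sinusoid(signal)
print(f"period found: {1 / f:.1f} samples, residual variance {resid.var():.3f}")
```

Real periodicity (e.g. daily cycles) usually has harmonics, so the step may need repeating, and the estimators should then be rerun on the residual.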
To extract the useful information from the raw RTT data, we applied typical time series methodologies such as interpolation to recover from loss (so that our signal would not have discontinuities), removal of outliers, and smoothing. Applying the estimators to the RTT signal resulted in inconsistent estimations, in the sense that some of the estimators showed long-range dependence for some of our datasets.
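The cleaning steps mentioned (interpolation over losses, outlier removal, smoothing) might look like the sketch below. It assumes NumPy, that lost samples are stored as NaN, and a median/MAD rule with threshold `z` for outliers; these choices, the window length, and the function name are illustrative assumptions rather than the paper's settings.

```python
import numpy as np

def clean_series(t, y, z=4.0, window=5):
    """Hypothetical cleaning pipeline: flag losses (NaN) and outliers, linearly
    interpolate over them, then apply a centred moving average."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float).copy()
    valid = ~np.isnan(y)
    # Robust centre and scale from the observed samples (median and scaled MAD).
    med = np.median(y[valid])
    mad = 1.4826 * np.median(np.abs(y[valid] - med))
    dev = np.abs(np.where(valid, y, med) - med)
    bad = ~valid | (dev > z * mad)
    # Linear interpolation so the series has no gaps or discontinuities.
    y[bad] = np.interp(t[bad], t[~bad], y[~bad])
    # Centred moving average; dividing by the per-sample count handles the edges.
    kernel = np.ones(window)
    counts = np.convolve(np.ones_like(y), kernel, mode="same")
    return np.convolve(y, kernel, mode="same") / counts

t = np.arange(100.0)
rtt = 10.0 + 0.1 * np.sin(t / 5.0)
rtt[10] = np.nan       # a lost probe
rtt[50] = 1000.0       # a spurious spike
cleaned = clean_series(t, rtt)
```

Heavy smoothing introduces short-lag correlation of its own, so a small window is the safer default when the end goal is estimating H.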
The evaluation of each estimator is achieved through three different Fractional Gaussian Noise (FGN) generators. FGN generators are often used to synthesize long-range dependent series with a specific Hurst value. The first is developed by Paxson [12], while the second is described in [13]. The third is based on the Durbin-Levinson coefficients. Due to space limitations, we only present results from the generator developed by Paxson. However, findings are similar for the other two generators.
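To generate test data with a known H, the sketch below is an exact fGn generator based on the Durbin-Levinson recursion (Hosking's method), in the spirit of the third generator mentioned; it is not Paxson's generator. It uses the unit-variance fGn autocovariance gamma(k) = 0.5 * (|k+1|^(2H) - 2|k|^(2H) + |k-1|^(2H)), assumes NumPy, and costs O(n^2), so it suits traces of a few thousand points. The function name is hypothetical.

```python
import numpy as np

def fgn_hosking(n, H, rng):
    """Exact unit-variance fractional Gaussian noise via the Durbin-Levinson recursion."""
    k = np.arange(n)
    gamma = 0.5 * (np.abs(k + 1) ** (2 * H) - 2 * np.abs(k) ** (2 * H)
                   + np.abs(k - 1) ** (2 * H))
    z = rng.standard_normal(n)
    x = np.empty(n)
    phi = np.zeros(0)   # phi_{t-1, 1..t-1} from the previous step
    v = gamma[0]        # one-step prediction error variance
    x[0] = np.sqrt(v) * z[0]
    for t in range(1, n):
        # Partial autocorrelation phi_{t,t}
        phi_tt = (gamma[t] - phi @ gamma[t - 1:0:-1]) / v
        # phi_{t,j} = phi_{t-1,j} - phi_{t,t} * phi_{t-1,t-j}, then append phi_{t,t}
        phi = np.append(phi - phi_tt * phi[::-1], phi_tt)
        v *= 1.0 - phi_tt ** 2
        # Conditional mean sum_j phi_{t,j} * x[t-j], plus the scaled innovation
        x[t] = phi @ x[t - 1::-1] + np.sqrt(v) * z[t]
    return x

x = fgn_hosking(4096, 0.8, np.random.default_rng(1))
r1 = np.corrcoef(x[:-1], x[1:])[0, 1]
print(f"lag-1 autocorrelation: {r1:.3f}")
```

For H = 0.8 the lag-1 autocorrelation should be close to 2^(2H-1) - 1 ≈ 0.52, and for H = 0.5 the generator reduces to white noise.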
We show that the estimators are quite sensitive and can be deceived into reporting LRD. In particular, we apply the estimators to synthesized signals such as cosine functions with noise or signals that show a trend.
The definition of LRD assumes stationary signals. In this case, we intend to identify the impact of non-stationarity on the estimators. Thus, we created various signals with slowly and quickly decaying or increasing trends. Such signals include combinations of White Gaussian Noise and cosine functions with a trend. In every case, only Whittle gives an estimate for Hurst, which is always 0.99. Also, the Periodogram estimates Hurst to be greater than 1.
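This kind of experiment can be reproduced in miniature. The sketch below builds non-LRD test signals (white Gaussian noise plus cosines and trends) and runs a simple aggregated-variance estimator on them instead of Whittle; amplitudes, periods, block sizes, and names are arbitrary assumptions, and NumPy is assumed. A linear trend alone is enough to push this estimator close to 1, even though none of these signals has long memory.

```python
import numpy as np

def make_test_signals(n=2**14, seed=0):
    """Non-LRD signals: white Gaussian noise (WGN) plus cosines and/or trends."""
    rng = np.random.default_rng(seed)
    t = np.arange(n)
    wgn = rng.standard_normal(n)
    cosine = 2.0 * np.cos(2 * np.pi * t / 500.0)
    trend = 3.0 * t / n
    return {
        "wgn": wgn,
        "wgn+cosine": wgn + cosine,
        "wgn+trend": wgn + trend,
        "wgn+cosine+trend": wgn + cosine + trend,
        "wgn+decaying trend": wgn + 5.0 * np.exp(-t / (0.2 * n)),
    }

def aggvar_H(x, sizes=(10, 20, 50, 100, 200, 500, 1000)):
    """Aggregated-variance estimate: Var(block means of size m) ~ m^(2H - 2)."""
    x = np.asarray(x, dtype=float)
    v = [x[: x.size // m * m].reshape(-1, m).mean(axis=1).var(ddof=1) for m in sizes]
    return 1.0 + np.polyfit(np.log10(sizes), np.log10(v), 1)[0] / 2.0

signals = make_test_signals()
for name, sig in signals.items():
    print(f"{name:20s} H ~ {aggvar_H(sig):.2f}")
```

The trend survives averaging at every block size, so the variance of the block means barely shrinks, which this estimator reads as H near 1.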
Reporting a Hurst exponent is meaningful only if it is accompanied by the method that was used, as well as the confidence intervals or correlation coefficient.
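One way to follow this advice with a log-log regression estimator is to report the slope's standard error and the correlation coefficient alongside H. The sketch uses the aggregated-variance method and a normal-approximation 95% interval (1.96 standard errors); both are assumptions, and because the points on a log-log plot are not independent, this interval likely understates the true uncertainty. NumPy is assumed and the function name is hypothetical.

```python
import numpy as np

def aggvar_with_ci(x, sizes):
    """Aggregated-variance H with an approximate 95% interval and the log-log correlation."""
    x = np.asarray(x, dtype=float)
    logm = np.log10(np.asarray(sizes, dtype=float))
    logv = np.log10([x[: x.size // m * m].reshape(-1, m).mean(axis=1).var(ddof=1)
                     for m in sizes])
    slope, intercept = np.polyfit(logm, logv, 1)
    resid = logv - (slope * logm + intercept)
    # Ordinary least-squares standard error of the slope.
    se_slope = np.sqrt((resid @ resid) / (logm.size - 2) / np.sum((logm - logm.mean()) ** 2))
    H = 1.0 + slope / 2.0                 # Var(block mean) ~ m^(2H - 2)
    half_width = 1.96 * se_slope / 2.0    # H is slope / 2 plus a constant
    r = np.corrcoef(logm, logv)[0, 1]
    return H, (H - half_width, H + half_width), r

rng = np.random.default_rng(5)
x = rng.standard_normal(2**15)
H, (lo, hi), r = aggvar_with_ci(x, [10, 20, 50, 100, 200, 500, 1000])
print(f"aggregated variance: H = {H:.3f}, 95% CI ~ [{lo:.3f}, {hi:.3f}], r = {r:.4f}")
```

Printing the method name in the same line as the estimate keeps the three pieces of information the text asks for together.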
• Researchers should not rely on only one estimator in deciding the existence of long-range dependence (e.g. [14]). As we saw, several of the estimators (Whittle, Periodogram) can be overly optimistic in identifying long-range dependence.
• For efficient characterization, it may be necessary to process and decompose the signal.
• A visual inspection of the signal can be very useful, providing a qualitative analysis and revealing many of its features, like periodicity. We recommend plotting the signal at several different scales, since each scale can reveal different characteristics.
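Plotting at several scales can be done by aggregating the series into non-overlapping block means of size m and plotting each level. The sketch assumes NumPy, and Matplotlib only for the optional plotting function (imported inside it); the names and default scales are illustrative.

```python
import numpy as np

def aggregate(x, m):
    """Non-overlapping block means of size m; samples that do not fill a block are dropped."""
    x = np.asarray(x, dtype=float)
    k = x.size // m
    return x[: k * m].reshape(k, m).mean(axis=1)

def plot_scales(x, scales=(1, 10, 100, 1000)):
    """One panel per aggregation level; requires Matplotlib (imported lazily)."""
    import matplotlib.pyplot as plt
    fig, axes = plt.subplots(len(scales), 1, figsize=(8, 2 * len(scales)), squeeze=False)
    for ax, m in zip(axes[:, 0], scales):
        ax.plot(aggregate(x, m), linewidth=0.7)
        ax.set_title(f"block size m = {m}")
    fig.tight_layout()
    return fig
```

For example, if `x` holds one sample per second, `plot_scales(x, scales=(1, 60, 3600))` shows the trace at second, minute, and hour resolution, where daily periodicity tends to become obvious at the coarsest level.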