Multiple outliers in circular regression
Research Analysis by Consensus
Multiple Outliers in Circular Regression: Detection Methods and Techniques
Introduction to Circular Regression and Outliers
Circular regression models are used to analyze data where the response and/or predictor variables are measured in angles, such as time of day or wind direction. Detecting outliers in these models is crucial because outliers can significantly distort parameter estimates and lead to incorrect inferences. This article synthesizes various methods and techniques for detecting multiple outliers in circular regression models.
Trigonometric Transformation and Row Deletion Approach
One effective method for detecting outliers in circular regression involves transforming circular residuals into linear measures using a trigonometric function. This transformation allows for the application of traditional outlier detection techniques. The row deletion approach is then used to identify observations that most affect the measure, flagging them as potential outliers. This method has been applied successfully to circadian data, demonstrating its practical utility.
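The transform-then-delete idea can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the published procedure: the toy model v = u + alpha (mod 2π), the choice of 1 - cos(residual) as the trigonometric transform, and the names `fit_shift`, `mean_circular_error` and `deletion_effects` are all hypothetical choices made for this example.

```python
import math

def circ_mean(angles):
    """Circular mean direction in radians."""
    s = sum(math.sin(a) for a in angles)
    c = sum(math.cos(a) for a in angles)
    return math.atan2(s, c)

def fit_shift(u, v):
    """Assumed toy model v = u + alpha (mod 2*pi); alpha is the circular mean of v - u."""
    return circ_mean([vi - ui for ui, vi in zip(u, v)])

def mean_circular_error(u, v, alpha):
    """Linear measure of fit: average of 1 - cos(residual), which lies in [0, 2]."""
    return sum(1 - math.cos(vi - ui - alpha) for ui, vi in zip(u, v)) / len(u)

def deletion_effects(u, v):
    """For each row, refit without it and record how much the measure changes."""
    full = mean_circular_error(u, v, fit_shift(u, v))
    effects = []
    for i in range(len(u)):
        u_red = u[:i] + u[i + 1:]
        v_red = v[:i] + v[i + 1:]
        reduced = mean_circular_error(u_red, v_red, fit_shift(u_red, v_red))
        effects.append(abs(full - reduced))
    return effects

# Made-up angles in radians; row 3 departs from the common shift of 0.5.
u = [0.2, 1.0, 2.1, 3.0, 4.2, 5.5]
v = [0.7, 1.5, 2.6, 0.4, 4.7, 6.0]

effects = deletion_effects(u, v)
suspect = max(range(len(effects)), key=effects.__getitem__)
print(suspect, [round(e, 3) for e in effects])
```

The row whose deletion changes the measure most is the candidate outlier. In practice a cut-off for the change would be needed to decide whether that row is actually flagged; none is assumed here.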
Clustering-Based Outlier Detection Methods
Agglomerative Hierarchical Clustering
Agglomerative hierarchical clustering algorithms, such as the single-linkage and average-linkage methods, have been developed to detect multiple outliers in circular regression models. These methods start with each data point in its own cluster and repeatedly merge the two most similar clusters until all data points are grouped into one cluster. The single-linkage method, in particular, has been shown to detect multiple outliers effectively, with low rates of masking (outliers that go undetected) and swamping (inliers wrongly flagged as outliers).
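As a rough sketch of the single-linkage idea, the code below clusters circular residuals using the shortest-arc distance and treats every point outside the largest cluster as a potential outlier. The cut height, the "largest cluster is the inlier group" rule, and the function names are assumptions made for illustration; the published algorithms may use a different distance and stopping rule. Stopping the merges at the cut height gives the same clusters as building the full tree and cutting it there.

```python
import math

def circ_dist(a, b):
    """Shortest arc between two angles, in [0, pi]."""
    d = abs(a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)

def single_linkage(values, cut_height):
    """Naive agglomerative single-linkage clustering, stopped at cut_height."""
    clusters = [[i] for i in range(len(values))]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(circ_dist(values[i], values[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        if best[0] > cut_height:
            break  # the next merge would exceed the cut height
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters

def flag_outliers(residuals, cut_height):
    """Indices of points that do not belong to the largest cluster."""
    clusters = single_linkage(residuals, cut_height)
    main = max(clusters, key=len)
    return sorted(i for c in clusters if c is not main for i in c)

# Made-up circular residuals in radians; two sit far from the rest.
residuals = [0.05, -0.10, 0.02, 2.90, 0.08, -0.04, 3.05, 0.01]
flagged = flag_outliers(residuals, cut_height=0.5)
print(flagged)
```

Because the two outlying residuals are close to each other, they merge into their own small cluster rather than hiding one another, which is the behaviour that helps against masking.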
Comparative Studies and Performance
Comparative studies of clustering-based methods have demonstrated that these algorithms perform well across various outlier scenarios and contamination levels. For instance, Satari’s S-SL algorithm has been found to be effective regardless of sample size and error concentration parameters. Additionally, the use of different distance measures, such as Euclidean distance, has been explored to enhance the performance of clustering algorithms in detecting outliers.
DFFITc Statistic for Multiple Circular Regression
The DFFITc statistic, an extension of the DFFITS statistic used in linear regression, has been adapted for multiple circular regression models. This statistic helps identify outliers by measuring the influence of each observation on the fitted values. Simulation studies have shown that the DFFITc statistic is effective in detecting outliers in models with more than one independent circular variable.
Graphical Techniques and COVRATIO Statistic
Graphical techniques, combined with numerical methods like the COVRATIO statistic, have also been employed to detect outliers in multiple circular regression models. The COVRATIO statistic, originally used in linear regression, has been extended to circular regression to assess the impact of outliers on the covariance matrix of the parameters. This method has proven effective in identifying outliers in datasets such as wind data.
Robust Circular Distance
A novel approach involves using robust circular distance to identify multiple outliers in simple circular regression models. This method calculates the distance between circular residuals and a circular location parameter, providing a robust measure that minimizes the effects of masking and swamping. Simulation studies have shown that this method has a high proportion of detected outliers and low rates of masking and swamping, making it a reliable tool for outlier detection.
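The distance-from-location idea can be illustrated as follows. This is a sketch under assumptions: the circular median is used as the location parameter and is approximated by the observed residual with the smallest total arc distance to the others, and the cut-off of 1.0 radian is a hypothetical value, not one taken from the literature.

```python
import math

def circ_dist(a, b):
    """Circular distance pi - |pi - |a - b||, in [0, pi]."""
    return math.pi - abs(math.pi - (abs(a - b) % (2 * math.pi)))

def circ_median(angles):
    """Sample circular median, restricted to observed values for simplicity."""
    return min(angles, key=lambda m: sum(circ_dist(m, a) for a in angles))

def robust_distances(residuals):
    """Circular distance of every residual from the circular median."""
    centre = circ_median(residuals)
    return [circ_dist(r, centre) for r in residuals]

# Made-up circular residuals in [0, 2*pi); 6.20 and 6.25 lie just below zero.
residuals = [0.05, 6.20, 0.02, 2.90, 0.08, 6.25, 3.05, 0.01]
cutoff = 1.0  # hypothetical threshold in radians

distances = robust_distances(residuals)
flagged = [i for i, d in enumerate(distances) if d > cutoff]
print(flagged)
```

Using the median rather than the mean as the centre keeps the reference point close to the bulk of the residuals even when several outliers are present, which is what limits masking and swamping in this kind of rule.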
Non-Parametric Methods
For non-parametric linear-circular regression, robust outlier detection methods based on the circular median have been proposed. These methods, which include Nadaraya-Watson and local linear regression techniques, perform well in medium to high contamination scenarios. The local linear estimation method, in particular, has been found to fit data better when the response variable contains outliers.
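A minimal sketch of the Nadaraya-Watson variant is given below. It assumes a circular response and a linear predictor, a Gaussian kernel, a fixed bandwidth, and a hypothetical cut-off on the circular distance between each residual and the circular median of the residuals; none of these choices is confirmed by the source, and the local linear version is not shown.

```python
import math

def circ_dist(a, b):
    """Shortest arc between two angles, in [0, pi]."""
    return math.pi - abs(math.pi - (abs(a - b) % (2 * math.pi)))

def circ_median(angles):
    """Sample circular median, restricted to observed values for simplicity."""
    return min(angles, key=lambda m: sum(circ_dist(m, a) for a in angles))

def nw_circular(x0, x, theta, bandwidth):
    """Kernel-weighted mean direction of theta at x0 (Gaussian kernel)."""
    w = [math.exp(-0.5 * ((x0 - xi) / bandwidth) ** 2) for xi in x]
    s = sum(wi * math.sin(t) for wi, t in zip(w, theta))
    c = sum(wi * math.cos(t) for wi, t in zip(w, theta))
    return math.atan2(s, c)

def flag_outliers(x, theta, bandwidth, cutoff):
    """Flag points whose residual lies far from the circular median residual."""
    fitted = [nw_circular(xi, x, theta, bandwidth) for xi in x]
    residuals = [t - f for t, f in zip(theta, fitted)]
    centre = circ_median(residuals)
    return [i for i, r in enumerate(residuals) if circ_dist(r, centre) > cutoff]

# Made-up data: a slow angular trend with one response shifted by 3 radians.
x = [float(i) for i in range(11)]
theta = [1.0 + 0.1 * i for i in range(11)]
theta[5] += 3.0

flagged = flag_outliers(x, theta, bandwidth=1.0, cutoff=1.0)
print(flagged)
```

The smoother is pulled only slightly by the shifted response because its neighbours carry more combined weight, so the shifted point keeps a large residual and is flagged.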
Conclusion
Detecting multiple outliers in circular regression models is a complex but essential task to ensure accurate model development and prediction. Various methods, including trigonometric transformations, clustering algorithms, DFFITc statistics, graphical techniques, robust circular distances, and non-parametric methods, have been developed and tested. Each method has its strengths and is suitable for different types of circular regression models and outlier scenarios. By employing these techniques, researchers can effectively identify and mitigate the impact of outliers in circular data.