Theil-Sen Regression Analysis in R
Source: Tecdat (拓端数据部落)
This article is about 1,000 words; suggested reading time is 5 minutes.
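The Theil-Sen estimator is a simple linear regression estimator that is not commonly used in the social sciences. It takes three steps:
Draw a line between every pair of points in the data
Calculate the slope of each line
The median of those slopes is the regression slope
Calculating the slope this way is quite robust. When the errors are normally distributed and there are no outliers, the slope is very similar to the OLS slope.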
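The three steps can be sketched in base R. This is a minimal illustration, not the implementation used by any particular package; the function name `theil_sen_slope` and the simulated data are assumptions made for the example.

```r
# Theil-Sen slope: the median of the slopes of all lines through pairs of points.
# theil_sen_slope is a hypothetical helper name, not a function from a package.
theil_sen_slope <- function(x, y) {
  idx <- combn(length(x), 2)        # every pair of observations (i < j)
  dx <- x[idx[2, ]] - x[idx[1, ]]
  dy <- y[idx[2, ]] - y[idx[1, ]]
  keep <- dx != 0                   # pairs with identical x have no defined slope
  median(dy[keep] / dx[keep])
}

set.seed(1)                         # arbitrary seed, for illustration only
x <- runif(50, 0, 10)
y <- 1 + 2 * x + rnorm(50)          # assumed setup: normal errors, no outliers

theil_sen_slope(x, y)               # Theil-Sen slope
unname(coef(lm(y ~ x))[2])          # OLS slope, for comparison
```

With well-behaved errors like these, the two slopes should land close to each other. Note that the number of pairs grows quadratically with the sample size, so this brute-force version is only practical for small to moderate n.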
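There are several methods for obtaining the intercept. If you care about the intercept in your regression, it is worth knowing which one your software uses.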
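To see why the choice matters, here is a small sketch of two intercept rules that are commonly paired with a Theil-Sen slope. The function name `ts_intercepts` and the toy data are assumptions for illustration; which rule a given package applies should be checked in its documentation.

```r
# Two common intercept choices for a given Theil-Sen slope b.
# ts_intercepts is a hypothetical helper name, not a function from a package.
ts_intercepts <- function(x, y, b) {
  c(
    median_of_residuals   = median(y - b * x),          # median of y_i - b * x_i
    difference_of_medians = median(y) - b * median(x)   # line through (median x, median y)
  )
}

x <- c(1, 2, 3, 4, 5)
y <- c(3, 5, 20, 9, 11)   # the third point is an outlier; the rest lie on y = 1 + 2x
b <- 2                    # the Theil-Sen slope for these data (median of the ten pairwise slopes)

ts_intercepts(x, y, b)
# median_of_residuals = 1, difference_of_medians = 3
```

The slope is the same in both cases, yet the two rules give different intercepts on these data.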
When I am worried about outliers and heteroscedasticity in a simple linear regression, Theil-Sen, as described above, is the estimator I turn to.
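I ran a simulation to see how Theil-Sen compares with OLS under heteroscedasticity. In that setting, Theil-Sen turned out to be the more efficient estimator.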
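library(simglm)
library(ggplot2)
library(dplyr)
library(WRS)

# Heteroscedastic case
nRep <- 100  # replications per sample size; the plot titles below assume 1500
n.s <- c(seq(50, 300, 50), 400, 550, 750, 1000)  # sample sizes
samp.dat <- sample((1:(nRep*length(n.s))), 25)   # indices of 25 datasets to plot

# Storage for coefficient estimates, one row per simulated dataset
lm.coefs.0 <- matrix(ncol = 3, nrow = nRep*length(n.s))
ts.coefs.0 <- matrix(ncol = 3, nrow = nRep*length(n.s))
lmt.coefs.0 <- matrix(ncol = 3, nrow = nRep*length(n.s))
dat.s <- list()

# The simulation loop that fills these objects and builds dat.frms.0
# and coefs.0 is not shown in the source.

# Scatterplots of a random sample of 25 simulated datasets with OLS fits
ggplot(dat.frms.0, aes(x = age, y = sim_data)) +
  geom_point(shape = 1, size = .5) +
  geom_smooth(method = "lm", se = FALSE) +
  facet_wrap(~ random.sample, nrow = 5) +
  labs(x = "Predictor", y = "Outcome",
       title = "Random sample of 25 datasets from 15000 datasets for simulation",
       subtitle = "Heteroscedastic relationships")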

ggplot(coefs.0, aes(x = n, colour = Estimator)) +
  geom_boxplot(
    aes(ymin = q025, lower = q25, middle = q50, upper = q75, ymax = q975),
    data = summarise(
      group_by(coefs.0, n, Estimator),
      q025 = quantile(Slope, .025),
      q25 = quantile(Slope, .25),
      q50 = quantile(Slope, .5),
      q75 = quantile(Slope, .75),
      q975 = quantile(Slope, .975)),
    stat = "identity") +
  geom_hline(yintercept = 2, linetype = 2) +
  scale_y_continuous(breaks = seq(1, 3, .05)) +
  labs(x = "Sample size", y = "Slope",
       title = "Estimation of regression slope in simple linear regression under heteroscedasticity",
       subtitle = "1500 replications - Population slope is 2",
       caption = paste(
         "Boxes are IQR, whiskers are middle 95% of slopes",
         "Both estimators are unbiased in the long run, however, OLS has higher variability",
         sep = "\n"))
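Since the data-generating step is not part of the listing, the following base-R sketch shows one way such a comparison could be set up. It is not the author's simulation: the helper names, the 200 replications, the single sample size of 50, and the error model (error standard deviation growing linearly with the predictor) are all assumptions chosen for illustration. Only the population slope of 2 is taken from the plot above.

```r
# Hypothetical helper: median of all pairwise slopes (Theil-Sen slope).
theil_sen_slope <- function(x, y) {
  idx <- combn(length(x), 2)
  dx <- x[idx[2, ]] - x[idx[1, ]]
  dy <- y[idx[2, ]] - y[idx[1, ]]
  median(dy[dx != 0] / dx[dx != 0])
}

# Hypothetical helper: simulate n_rep datasets of size n and return both slopes.
simulate_slopes <- function(n, n_rep = 200, slope = 2) {
  t(replicate(n_rep, {
    x <- runif(n, 0, 10)
    # ASSUMPTION: heteroscedastic normal errors whose SD increases with x
    y <- slope * x + rnorm(n, sd = 0.5 + 0.5 * x)
    c(ols = unname(coef(lm(y ~ x))[2]),
      theil_sen = theil_sen_slope(x, y))
  }))
}

set.seed(42)                                  # arbitrary seed
slopes <- simulate_slopes(n = 50)             # 200 x 2 matrix of slope estimates

# Mean and spread of each estimator across replications
apply(slopes, 2, function(s) c(mean = mean(s), sd = sd(s)))
```

Comparing the `sd` row for the two columns gives a rough sense of relative efficiency under this particular error model; a different form of heteroscedasticity may lead to a different picture.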

Original link: http://tecdat.cn/?p=10080
Editor: Yu Tengkai
Proofreader: Lin Yilin
