<pre>
> vars<-c("mpg","hp","wt")
> head(mtcars[vars])
                   mpg  hp    wt
Mazda RX4         21.0 110 2.620
Mazda RX4 Wag     21.0 110 2.875
Datsun 710        22.8  93 2.320
Hornet 4 Drive    21.4 110 3.215
Hornet Sportabout 18.7 175 3.440
Valiant           18.1 105 3.460
> summary(mtcars[vars])
      mpg              hp              wt
 Min.   :10.40   Min.   : 52.0   Min.   :1.513
 1st Qu.:15.43   1st Qu.: 96.5   1st Qu.:2.581
 Median :19.20   Median :123.0   Median :3.325
 Mean   :20.09   Mean   :146.7   Mean   :3.217
 3rd Qu.:22.80   3rd Qu.:180.0   3rd Qu.:3.610
 Max.   :33.90   Max.   :335.0   Max.   :5.424
</pre>

>The **`summary()`** function reports the minimum, maximum, quartiles, and mean for numeric variables, and frequency counts for factors and logical vectors.
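To make the difference between vector types concrete, here is a minimal sketch using only base R and the built-in `mtcars` data. The names `cyl_factor` and `high_mpg` are illustrative, not from the original text.

```r
# Sketch: how summary() treats numeric, factor, and logical vectors.
cyl_factor <- factor(mtcars$cyl)     # factor: cylinder count as categories
high_mpg   <- mtcars$mpg > 20        # logical: TRUE where mpg exceeds 20

num_summary <- summary(mtcars$mpg)   # numeric: min, quartiles, mean, max
fac_summary <- summary(cyl_factor)   # factor: one count per level
lgl_summary <- summary(high_mpg)     # logical: counts of FALSE and TRUE

print(num_summary)
print(fac_summary)
print(lgl_summary)
```

The factor and logical results are counts rather than quantiles, which is why `summary()` on a mixed data frame shows different kinds of columns side by side.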
<pre>
> library(Hmisc)
Loading required package: grid
Loading required package: lattice
Loading required package: survival
Loading required package: Formula
Loading required package: ggplot2
Attaching package: ‘Hmisc’
The following objects are masked from ‘package:base’:
    format.pval, round.POSIXt, trunc.POSIXt, units
> describe(mtcars[vars])
mtcars[vars]
 3  Variables      32  Observations
--------------------------------------------------------------------------------
mpg
      n missing  unique    Info    Mean     .05     .10     .25     .50     .75     .90     .95
     32       0      25       1   20.09   12.00   14.34   15.43   19.20   22.80   30.09   31.30
lowest : 10.4 13.3 14.3 14.7 15.0, highest: 26.0 27.3 30.4 32.4 33.9
--------------------------------------------------------------------------------
hp
      n missing  unique    Info    Mean     .05     .10     .25     .50     .75     .90     .95
     32       0      22       1   146.7   63.65   66.00   96.50  123.00  180.00  243.50  253.55
lowest :  52  62  65  66  91, highest: 215 230 245 264 335
--------------------------------------------------------------------------------
wt
      n missing  unique    Info    Mean     .05     .10     .25     .50     .75     .90     .95
     32       0      29       1   3.217   1.736   1.956   2.581   3.325   3.610   4.048   5.293
lowest : 1.513 1.615 1.835 1.935 2.140, highest: 3.845 4.070 5.250 5.345 5.424
--------------------------------------------------------------------------------
</pre>
<pre>
> library(pastecs)
Loading required package: boot
> stat.desc(mtcars[vars])
                     mpg           hp          wt
nbr.val       32.0000000   32.0000000  32.0000000
nbr.null       0.0000000    0.0000000   0.0000000
nbr.na         0.0000000    0.0000000   0.0000000
min           10.4000000   52.0000000   1.5130000
max           33.9000000  335.0000000   5.4240000
range         23.5000000  283.0000000   3.9110000
sum          642.9000000 4694.0000000 102.9520000
median        19.2000000  123.0000000   3.3250000
mean          20.0906250  146.6875000   3.2172500
SE.mean        1.0654240   12.1203173   0.1729685
CI.mean.0.95   2.1729465   24.7195501   0.3527715
var           36.3241028 4700.8669355   0.9573790
std.dev        6.0269481   68.5628685   0.9784574
coef.var       0.2999881    0.4674077   0.3041285
</pre>
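Some of the less familiar rows in the `stat.desc()` output can be reproduced by hand. The sketch below uses base R only and assumes the usual definitions (standard error as `sd/sqrt(n)`, the confidence-interval row as the half-width of a t-based 95% interval, and the coefficient of variation as `sd/mean`); these definitions are consistent with the `mpg` column shown above.

```r
# Sketch: recompute three stat.desc() rows for mpg with base R.
x  <- mtcars$mpg
n  <- length(x)

se <- sd(x) / sqrt(n)               # SE.mean: standard error of the mean
ci <- qt(0.975, df = n - 1) * se    # CI.mean.0.95: half-width of the 95% CI
cv <- sd(x) / mean(x)               # coef.var: coefficient of variation

by_hand <- c(SE.mean = se, CI.mean.0.95 = ci, coef.var = cv)
print(round(by_hand, 7))
```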
<pre>
> library(psych)
Attaching package: ‘psych’
The following object is masked from ‘package:boot’:
    logit
> describe(mtcars[vars])
    vars  n   mean    sd median trimmed   mad   min    max  range skew kurtosis    se
mpg    1 32  20.09  6.03  19.20   19.70  5.41 10.40  33.90  23.50 0.61    -0.37  1.07
hp     2 32 146.69 68.56 123.00  141.19 77.10 52.00 335.00 283.00 0.73    -0.14 12.12
wt     3 32   3.22  0.98   3.33    3.15  0.77  1.51   5.42   3.91 0.42    -0.02  0.17
</pre>
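When comparing groups of individuals or observations, the focus is usually on the descriptive statistics of each group rather than those of the sample as a whole. Again, R offers several ways to do this. We start by obtaining descriptive statistics for each level of transmission type (`am`).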
<pre>
> aggregate(mtcars[vars], by=list(am=mtcars$am),mean)
  am  mpg  hp   wt
1  0 17.1 160 3.77
2  1 24.4 127 2.41
> aggregate(mtcars[vars],by=list(am=mtcars$am),sd)
  am  mpg   hp    wt
1  0 3.83 53.9 0.777
2  1 6.17 84.1 0.617
</pre>

>The output shows that `am` takes two values. The `mtcars` data set is split into two groups by those values, giving the group means and standard deviations of `mpg`, `hp`, and `wt`.<br>
>Here, **`list(am=mtcars$am)`** labels the grouping column `am`, which is more helpful than the default `Group.1`.<br>
>A limitation is that `aggregate()` is convenient only with functions that return a single value per call, such as the mean or standard deviation. It is awkward for returning several statistics at once. For that task, use the **`by()`** function.
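The `by=` list can name more than one grouping variable. As a minimal sketch (base R only; the name `group_means` is illustrative), grouping by both `am` and `cyl` gives one row per combination that occurs in the data:

```r
# Sketch: aggregate() with two grouping variables.
vars <- c("mpg", "hp", "wt")

group_means <- aggregate(mtcars[vars],
                         by  = list(am = mtcars$am, cyl = mtcars$cyl),
                         FUN = mean)   # one single-value function per call

print(group_means)
```

Each name in the list becomes a column label in the result, so `am` and `cyl` appear in place of `Group.1` and `Group.2`.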
<pre>
> by(mtcars[vars],mtcars$am,summary)
mtcars$am: 0
      mpg             hp            wt
 Min.   :10.4   Min.   : 62   Min.   :2.46
 1st Qu.:14.9   1st Qu.:116   1st Qu.:3.44
 Median :17.3   Median :175   Median :3.52
 Mean   :17.1   Mean   :160   Mean   :3.77
 3rd Qu.:19.2   3rd Qu.:192   3rd Qu.:3.84
 Max.   :24.4   Max.   :245   Max.   :5.42
--------------------------------------------------------------------------------
mtcars$am: 1
      mpg             hp            wt
 Min.   :15.0   Min.   : 52   Min.   :1.51
 1st Qu.:21.0   1st Qu.: 66   1st Qu.:1.94
 Median :22.8   Median :109   Median :2.32
 Mean   :24.4   Mean   :127   Mean   :2.41
 3rd Qu.:30.4   3rd Qu.:113   3rd Qu.:2.78
 Max.   :33.9   Max.   :335   Max.   :3.57
</pre>
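`by()` passes each group's data frame to the supplied function, so a user-written function can return several statistics per variable. A minimal sketch, assuming an illustrative helper `mean_sd` that is not part of the original text:

```r
# Sketch: by() with a custom function returning mean and sd per column.
vars <- c("mpg", "hp", "wt")

# For a data frame d, build a 2 x ncol(d) matrix of means and sds.
mean_sd <- function(d) sapply(d, function(x) c(mean = mean(x), sd = sd(x)))

by_am <- by(mtcars[vars], mtcars$am, mean_sd)
print(by_am)

# Individual groups can be pulled out by level name.
manual_cars <- by_am[["1"]]
```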
Format: `summaryBy(formula, data=dataframe, FUN=function)`
> **`formula`** takes the form:<br>
> **var1 + var2 + var3 + … + varN ~ groupvar1 + groupvar2 + … + groupvarN**<br>
> Variables to the left of **`~`** are the numeric variables to analyze; variables to the right are the categorical grouping variables.<br>
> **`function`** can be any built-in or user-written R function.

<pre>
> library(doBy)
Loading required package: survival
Attaching package: ‘survival’
The following object is masked from ‘package:boot’:
    aml
> summaryBy(mpg+hp+wt~am,data=mtcars,FUN=mystats)
  am mpg.n mpg.mean mpg.stdev   mpg.skew mpg.kurtosis hp.n  hp.mean hp.stdev     hp.skew hp.kurtosis wt.n  wt.mean  wt.stdev
1  0    19 17.14737  3.833966 0.01395038   -0.8031783   19 160.2632 53.90820 -0.01422519  -1.2096973   19 3.768895 0.7774001
2  1    13 24.39231  6.166504 0.05256118   -1.4553520   13 126.8462 84.06232  1.35988586   0.5634635   13 2.411000 0.6169816
    wt.skew wt.kurtosis
1 0.9759294   0.1415676
2 0.2103128  -1.1737358
</pre>
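The `summaryBy()` call relies on a user-written function `mystats` whose definition does not appear in this section. The sketch below is one plausible definition, an assumption rather than the author's code, chosen so that it returns the five names seen in the output (`n`, `mean`, `stdev`, `skew`, `kurtosis`). It is applied here with base R's `split()` and `sapply()` so that it runs without doBy.

```r
# Assumed definition of mystats (not shown in the original text).
mystats <- function(x, na.omit = FALSE) {
  if (na.omit) x <- x[!is.na(x)]           # optionally drop missing values
  n <- length(x)
  m <- mean(x)
  s <- sd(x)
  skew <- sum((x - m)^3 / s^3) / n         # moment-based skewness
  kurt <- sum((x - m)^4 / s^4) / n - 3     # excess kurtosis
  c(n = n, mean = m, stdev = s, skew = skew, kurtosis = kurt)
}

# Apply it to mpg within each level of am (columns are the am levels).
mpg_by_am <- sapply(split(mtcars$mpg, mtcars$am), mystats)
print(mpg_by_am)
```

Any function that returns a named vector works the same way; `summaryBy()` combines the variable name and the element name into column names such as `mpg.mean`.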
<pre>
> library(reshape)
> dstats <- function(x) (c(n=length(x), mean=mean(x), sd=sd(x))
+ )
> dfm <- melt(mtcars, measure.vars=c("mpg","hp","wt"),id.vars=c("am","cyl"))
> cast(dfm,am+cyl+variable~.,dstats)
   am cyl variable  n       mean         sd
1   0   4      mpg  3  22.900000  1.4525839
2   0   4       hp  3  84.666667 19.6553640
3   0   4       wt  3   2.935000  0.4075230
4   0   6      mpg  4  19.125000  1.6317169
5   0   6       hp  4 115.250000  9.1787799
6   0   6       wt  4   3.388750  0.1162164
7   0   8      mpg 12  15.050000  2.7743959
8   0   8       hp 12 194.166667 33.3598379
9   0   8       wt 12   4.104083  0.7683069
10  1   4      mpg  8  28.075000  4.4838599
11  1   4       hp  8  81.875000 22.6554156
12  1   4       wt  8   2.042250  0.4093485
13  1   6      mpg  3  20.566667  0.7505553
14  1   6       hp  3 131.666667 37.5277675
15  1   6       wt  3   2.755000  0.1281601
16  1   8      mpg  2  15.400000  0.5656854
17  1   8       hp  2 299.500000 50.2045815
18  1   8       wt  2   3.370000  0.2828427
</pre>
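Numerical summaries of a distribution are important, but they are no substitute for a visual presentation. For quantitative variables, we have histograms (Section 6.3), density plots (Section 6.4), box plots (Section 6.5), and dot plots (Section 6.6). They can reveal details that are missed when relying on a small set of descriptive statistics.
The functions considered so far all summarize quantitative variables. The functions in the next section let us examine the distributions of categorical variables.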
The `Arthritis` data set used below ships with the vcd package, so load it first with `library(vcd)`.

<pre>
> mytable<-with(Arthritis,table(Improved))
> mytable
Improved
  None   Some Marked
    42     14     28
> prop.table(mytable)
Improved
     None      Some    Marked
0.5000000 0.1666667 0.3333333
> prop.table(mytable)*100
Improved
    None     Some   Marked
50.00000 16.66667 33.33333
</pre>
<pre>
> attach(Arthritis)
> mytable1<-xtabs(~Treatment+Improved,data=Arthritis)
> mytable1
         Improved
Treatment None Some Marked
  Placebo   29    7      7
  Treated   13    7     21
> margin.table(mytable1,1)
Treatment
Placebo Treated
     43      41
> margin.table(mytable1,2)
Improved
  None   Some Marked
    42     14     28
> prop.table(mytable,2)
Error in if (d2 == 0L) { : missing value where TRUE/FALSE needed
> # The call above fails because mytable is the one-way table; margin 2 needs the two-way mytable1.
> prop.table(mytable1,2)
         Improved
Treatment      None      Some    Marked
  Placebo 0.6904762 0.5000000 0.2500000
  Treated 0.3095238 0.5000000 0.7500000
> addmargins(mytable1)
         Improved
Treatment None Some Marked Sum
  Placebo   29    7      7  43
  Treated   13    7     21  41
  Sum       42   14     28  84
> addmargins(prop.table(mytable1))
         Improved
Treatment       None       Some     Marked        Sum
  Placebo 0.34523810 0.08333333 0.08333333 0.51190476
  Treated 0.15476190 0.08333333 0.25000000 0.48809524
  Sum     0.50000000 0.16666667 0.33333333 1.00000000
> addmargins(prop.table(mytable1),2)
         Improved
Treatment       None       Some     Marked        Sum
  Placebo 0.34523810 0.08333333 0.08333333 0.51190476
  Treated 0.15476190 0.08333333 0.25000000 0.48809524
</pre>
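> **`table(A, B)`**; **`xtabs(~A+B, data=mydata)`**: the variables to be cross-classified appear to the right of `~`, separated by `+`.<br>
> **`margin.table(mytable, margin)`**: computes marginal frequencies over the variable with index `margin`, where `mytable` comes from `table(A, B)` or `xtabs(~A+B)`. Applied to a table of proportions, it gives marginal proportions.<br>
> **`prop.table(mytable, margin)`**: computes proportions within the variable with index `margin`; without `margin`, proportions of the grand total.<br>
> **`addmargins(mytable, margin)`**: adds sums over the variable with index `margin`, where `mytable` may also be the result of `prop.table()`. If `margin` is omitted, sums are added for all variables.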
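The four table helpers can be tried without any extra package. The sketch below uses the built-in `mtcars` data (cross-classifying `am` by `cyl`) rather than `Arthritis`, purely so that it runs in base R; the object names are illustrative.

```r
# Sketch: two-way table helpers on built-in data.
tab <- xtabs(~ am + cyl, data = mtcars)   # counts of am (rows) by cyl (columns)

row_totals <- margin.table(tab, 1)        # marginal counts for am
col_props  <- prop.table(tab, 2)          # proportions within each cyl column
with_sums  <- addmargins(tab)             # adds a Sum row and a Sum column
row_sums   <- addmargins(tab, 2)          # sums over cyl only: adds a Sum column

print(tab)
print(row_totals)
print(col_props)
print(with_sums)
print(row_sums)
```

With `prop.table(tab, 2)` each column sums to 1, which is a quick way to confirm that the margin index refers to the intended variable.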
<pre>
> mytable2<-xtabs(~Treatment+Sex+Improved,data=Arthritis)
> mytable2
, , Improved = None
         Sex
Treatment Female Male
  Placebo     19   10
  Treated      6    7
, , Improved = Some
         Sex
Treatment Female Male
  Placebo      7    0
  Treated      5    2
, , Improved = Marked
         Sex
Treatment Female Male
  Placebo      6    1
  Treated     16    5
> ftable(mytable2)
                 Improved None Some Marked
Treatment Sex
Placebo   Female            19    7      6
          Male              10    0      1
Treated   Female             6    5     16
          Male               7    2      5
> margin.table(mytable2,1)
Treatment
Placebo Treated
     43      41
> margin.table(mytable2,2)
Sex
Female   Male
    59     25
> margin.table(mytable2,3)
Improved
  None   Some Marked
    42     14     28
> margin.table(mytable2,c(1,2,3))
, , Improved = None
         Sex
Treatment Female Male
  Placebo     19   10
  Treated      6    7
, , Improved = Some
         Sex
Treatment Female Male
  Placebo      7    0
  Treated      5    2
, , Improved = Marked
         Sex
Treatment Female Male
  Placebo      6    1
  Treated     16    5
> margin.table(mytable2,c(1,3))
         Improved
Treatment None Some Marked
  Placebo   29    7      7
  Treated   13    7     21
> ftable(prop.table(mytable2,c(2,3))
+ )
                 Improved      None      Some    Marked
Treatment Sex
Placebo   Female          0.7600000 0.5833333 0.2727273
          Male            0.5882353 0.0000000 0.1666667
Treated   Female          0.2400000 0.4166667 0.7272727
          Male            0.4117647 1.0000000 0.8333333
> ftable(addmargins(prop.table(mytable2,c(1,2)),3))*100
                 Improved       None       Some     Marked        Sum
Treatment Sex
Placebo   Female           59.375000  21.875000  18.750000 100.000000
          Male             90.909091   0.000000   9.090909 100.000000
Treated   Female           22.222222  18.518519  59.259259 100.000000
          Male             50.000000  14.285714  35.714286 100.000000
</pre>
<pre>
> mytable3<-xtabs(~Treatment+Improved,data=Arthritis);mytable3
         Improved
Treatment None Some Marked
  Placebo   29    7      7
  Treated   13    7     21
> chisq.test(mytable3)
	Pearson's Chi-squared test
data:  mytable3
X-squared = 13.055, df = 2, p-value = 0.001463
</pre>
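The chi-squared statistic reported above can be reproduced by hand from the printed counts. The sketch below re-enters the Treatment by Improved table as a plain matrix (so vcd is not needed) and applies the standard Pearson formula, the sum of (observed − expected)² / expected; the variable names are illustrative.

```r
# Sketch: Pearson chi-squared test of independence computed by hand.
observed <- matrix(c(29, 7,  7,
                     13, 7, 21),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(Treatment = c("Placebo", "Treated"),
                                   Improved  = c("None", "Some", "Marked")))

# Expected counts under independence: row total * column total / grand total.
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

x2 <- sum((observed - expected)^2 / expected)        # test statistic
df <- (nrow(observed) - 1) * (ncol(observed) - 1)    # degrees of freedom
p  <- pchisq(x2, df = df, lower.tail = FALSE)        # upper-tail p-value

print(c(X.squared = x2, df = df, p.value = p))
```

The result agrees with the `chisq.test()` output shown above (X-squared = 13.055, df = 2, p-value = 0.001463).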
<pre>
> mytable4<-xtabs(~Treatment+Improved,data=Arthritis);mytable4
         Improved
Treatment None Some Marked
  Placebo   29    7      7
  Treated   13    7     21
> fisher.test(mytable4)
	Fisher's Exact Test for Count Data
data:  mytable4
p-value = 0.001393
alternative hypothesis: two.sided
</pre>
<pre>
> mytable5<-xtabs(~Treatment+Improved+Sex, data=Arthritis)
> mantelhaen.test(mytable5)
	Cochran-Mantel-Haenszel test
data:  mytable5
Cochran-Mantel-Haenszel M^2 = 14.632, df = 2, p-value = 0.0006647
</pre>