Channel: Active questions tagged r - Stack Overflow

How to take an Average of + or - SD

I have data in which a dependent variable [1] is recorded against a controlled variable and an independent variable [2]. The mean and SD are computed from [1].

(a) This is the result for the SD:

   Year        Species Pop_Index                      
1  1994   Corn Bunting  2.082483                        
5  1998   Corn Bunting  2.048155                     
10 2004   Corn Bunting  2.061617                      
15 2009   Corn Bunting  2.497792                       
20 1994      Goldfinch  1.961236 
25 1999      Goldfinch  1.995600 
30 2005      Goldfinch  2.101403 
35 2010      Goldfinch  2.138496 
40 1995 Grey Partridge  2.162136

(b) And this is the result for the mean:

   Year        Species Pop_Index
1  1994   Corn Bunting  2.821668
5  1998   Corn Bunting  2.916975
10 2004   Corn Bunting  2.662797
15 2009   Corn Bunting  4.171538
20 1994      Goldfinch  3.226108
25 1999      Goldfinch  2.452807
30 2005      Goldfinch  2.954816
35 2010      Goldfinch  3.386772
40 1995 Grey Partridge  2.207708

(c) This is the dput() output for the SD data:

structure(list(Year = c(1994L, 1998L, 2004L, 2009L, 1994L, 1999L, 
2005L, 2010L, 1995L), Species = structure(c(1L, 1L, 1L, 1L, 2L, 
2L, 2L, 2L, 3L), .Label = c("Corn Bunting", "Goldfinch", "Grey Partridge"
), class = "factor"), Pop_Index = c(2.0824833420524, 2.04815530904537, 
2.06161673349657, 2.49779159320587, 1.96123572400404, 1.99559986715288, 
2.10140285528351, 2.13849611018009, 2.1621364896722)), row.names = c(1L, 
5L, 10L, 15L, 20L, 25L, 30L, 35L, 40L), class = "data.frame")

(d) This is the dput() output for the mean data:

structure(list(Year = c(1994L, 1998L, 2004L, 2009L, 1994L, 1999L, 
2005L, 2010L, 1995L), Species = structure(c(1L, 1L, 1L, 1L, 2L, 
2L, 2L, 2L, 3L), .Label = c("Corn Bunting", "Goldfinch", "Grey Partridge"
), class = "factor"), Pop_Index = c(2.82166841455814, 2.91697463618566, 
2.66279663056763, 4.17153795031277, 3.22610845074252, 2.45280743991572, 
2.95481600904799, 3.38677188055508, 2.20770835158744)), row.names = c(1L, 
5L, 10L, 15L, 20L, 25L, 30L, 35L, 40L), class = "data.frame")

(e) And this is the code used to take the mean of the mean Pop_Index for each year (df1 is the data frame from (d)):

df2 <- aggregate(Pop_Index ~ Year, df1, mean)

(f) And this is the result:

  Year Pop_Index
1 1994  3.023888
2 1995  2.207708
3 1998  2.916975
4 1999  2.452807
5 2004  2.662797
6 2005  2.954816
7 2009  4.171538
8 2010  3.386772

Now, it wouldn't make sense for me to average the SDs by repeating the same procedure with the function mean or sd.

I have looked online and found someone in a similar predicament with this data:

Month: January
Week 1 Mean: 67.3 Std. Dev: 0.8
Week 2 Mean: 80.5 Std. Dev: 0.6
Week 3 Mean: 82.4 Std. Dev: 0.8

And the response:
"With equal samples size, which is what you have, the standard deviation you are looking for is:
Sqrt [ (.64 + .36 + .64) / 3 ] = 0.739369"
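The arithmetic in that quoted response can be reproduced in R as a quick check. This is only a sketch of the quoted formula: it assumes equal sample sizes for every week, as the response states, and the names week_sd and pooled_sd are made up for illustration.

```r
# Weekly standard deviations from the quoted example
week_sd <- c(0.8, 0.6, 0.8)

# With equal sample sizes: square each SD to get the variances,
# average the variances, then take the square root
pooled_sd <- sqrt(mean(week_sd^2))

pooled_sd  # about 0.739369, the value given in the quoted response
```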

How would I do this in R, or is there another way of doing it? I want to plot error bars on a dataset like the one in (f), and I cannot plot the SDs from (a) against it directly because the vector lengths would differ.
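One possible way to apply the quoted formula per year is to pass a small root-mean-square helper to aggregate(), in the same way mean is used in (e). This is a minimal sketch, not a definitive answer: it assumes every SD in (c) comes from the same sample size, and the names sd_df, rms and sd_by_year are hypothetical. Species is rebuilt here as a plain character column for brevity.

```r
# SD values from (c); sd_df is a hypothetical name for that data frame
sd_df <- data.frame(
  Year = c(1994L, 1998L, 2004L, 2009L, 1994L, 1999L, 2005L, 2010L, 1995L),
  Species = c(rep("Corn Bunting", 4), rep("Goldfinch", 4), "Grey Partridge"),
  Pop_Index = c(2.0824833420524, 2.04815530904537, 2.06161673349657,
                2.49779159320587, 1.96123572400404, 1.99559986715288,
                2.10140285528351, 2.13849611018009, 2.1621364896722)
)

# Hypothetical helper: square root of the mean variance.
# Assumption: equal sample sizes behind every SD being combined.
rms <- function(s) sqrt(mean(s^2))

# One combined SD per year, in the same layout as the result in (f)
sd_by_year <- aggregate(Pop_Index ~ Year, sd_df, rms)
sd_by_year
```

The result has one row per year, sorted by Year like (f), so sd_by_year$Pop_Index has the same length as df2$Pop_Index and could serve as the error bar half-widths. If the sample sizes differ between species, a weighted version of the formula would be needed instead.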


Sample from original data.frame with a few columns and many rows not included:

structure(list(GRIDREF = structure(c(1L, 1L, 2L, 3L, 4L, 5L, 
6L, 7L, 8L, 9L, 10L), .Label = c("SP8816", "SP9212", "SP9322", 
"SP9326", "SP9440", "SP9513", "SP9632", "SP9939", "TF7133", "TF9437"
), class = "factor"), Lat = c(51.83568688, 51.83568688, 51.79908899, 
51.88880822, 51.92476157, 52.05042795, 51.80757645, 51.97818159, 
52.04057068, 52.86730817, 52.89542895), Long = c(-0.724233561, 
-0.724233561, -0.667258035, -0.650074995, -0.648996758, -0.630626734, 
-0.62349292, -0.603710436, -0.558026241, 0.538966197, 0.882597783
), Year = c(2006L, 2007L, 1999L, 2004L, 1995L, 2009L, 2011L, 
2007L, 2011L, 1996L, 2007L), Species = structure(c(4L, 7L, 5L, 
10L, 4L, 6L, 8L, 3L, 2L, 9L, 1L), .Label = c("Blue Tit", "Buzzard", 
"Canada Goose", "Collared Dove", "Greenfinch", "Jackdaw", "Linnet", 
"Meadow Pipit", "Robin", "Willow Warbler"), class = "factor"), 
    Pop_Index = c(0L, 0L, 2L, 0L, 1L, 0L, 1L, 4L, 0L, 0L, 8L)), row.names = c(1L, 
100L, 1000L, 2000L, 3000L, 4000L, 5000L, 6000L, 10000L, 20213L, 
30213L), class = "data.frame")

A look into this data.frame:

      GRIDREF      Lat       Long Year        Species Pop_Index TempJanuary
1      SP8816 51.83569 -0.7242336 2006  Collared Dove         0    2.128387
100    SP8816 51.83569 -0.7242336 2007         Linnet         0    4.233226
1000   SP9212 51.79909 -0.6672580 1999     Greenfinch         2    5.270968
2000   SP9322 51.88881 -0.6500750 2004 Willow Warbler         0    4.826452
3000   SP9326 51.92476 -0.6489968 1995  Collared Dove         1    4.390322
4000   SP9440 52.05043 -0.6306267 2009        Jackdaw         0    2.934516
5000   SP9513 51.80758 -0.6234929 2011   Meadow Pipit         1    3.841290
6000   SP9632 51.97818 -0.6037104 2007   Canada Goose         4    7.082580
10000  SP9939 52.04057 -0.5580262 2011        Buzzard         0    3.981290
20213  TF7133 52.86731  0.5389662 1996          Robin         0    3.532903
30213  TF9437 52.89543  0.8825978 2007       Blue Tit         8    7.028710
