# Sta303 Assignment Submission

STA303/1002: Assignment 1∗

Craig Burkett

February 3, 2016

Due in the tutorial on Thursday, Feb 4th. Please hand it in on 8.5 x 11 inch paper, stapled in the upper left, with no other packaging and no title page.

Please try to make this assignment look like something you might hand in to your boss at a job. In particular, it is inappropriate to hand in pages of R output without explanation or interpretation. Quote relevant numbers from your R output as part of your solutions. The only direct R output you should submit with the assignment are relevant plots.

You must append your R program file to the end of the assignment, formatted nicely with a fixed-width font. You do not need to append all of your console output. No assignment will be marked without a program file, and marks will be deducted if the instructions above are not followed.

Any time that I use the words {Present, State, Give, Show, Predict, Display}, you must supply that plot/table/output/prediction in your submission. If I say {Produce, Make}, you do not need to show what you produced or made, but you still need to do it.

∗ This assignment was prepared by Craig Burkett of the Department of Statistics, University of Toronto. It is licensed under a Creative Commons Attribution - ShareAlike 3.0 Unported License. Use any part of it as you like and share the result freely.

## A Ants (20 marks)

Let’s investigate some data collected by Peter Nonacs from UCLA. The dataset can be found on Portal, and a data dictionary and some relevant background information can be found here. You’ll need to do some mild cleaning first. In particular, you should set appropriate levels for the size factor, and make Colony a factor as well. Distance you can leave as numeric.

1. Let’s see if thatch ants are different from seed ants, on average. (5 mks)

(a) Present a layered histogram of mass (in mg) with one layer for Thatch ants and one for Seed ants.
You should have two histograms on one plot, and you can use colour or shading to differentiate them. Add a legend to say which is which.

This is in Figure 1.

Figure 1: layered histogram of mass (in mg) for Thatch ants and Seed ants (x-axis: Mass; y-axis: count; legend: Type, Seed or Thatch).

(b) Do a simple analysis to determine if the observed difference is statistically significant. If it is, give the magnitude, direction and evidence of the effect.

A two-sample t-test to compare ant mass suggests that the Thatch ants have a mass that is 41 mg higher, on average, than Seed ants (95% CI = (39.05, 42.93), p < 0.0001). The R output from the test is given below.

    Welch Two Sample t-test

    data:  ant.t$Mass and ant.s$Mass
    t = 41.451, df = 1770, p-value < 2.2e-16
    alternative hypothesis: true difference in means is not equal to 0
    95 percent confidence interval:
     39.04723 42.92584
    sample estimates:
    mean of x mean of y
     95.23264  54.24610

(c) Repeat the previous two parts using Headwidth in mm instead of Mass.

This is in Figure 2.

Figure 2: layered histogram of Headwidth in mm for Thatch ants and Seed ants (x-axis: Headwidth..mm.; y-axis: count; legend: Type, Seed or Thatch).

A two-sample t-test to compare ant headwidth suggests that Thatch ants have an average headwidth that is 0.074652 mm greater than Seed ants (95% CI = (0.05771, 0.09159), p < 0.0001). The R output from the test is given below.

    Two Sample t-test

    data:  ant.t$Headwidth..mm. and ant.s$Headwidth..mm.
    t = 8.6419, df = 1770, p-value < 2.2e-16
    alternative hypothesis: true difference in means is not equal to 0
    95 percent confidence interval:
     0.05770902 0.09159374
    sample estimates:
    mean of x mean of y
     1.697837  1.623185

2. Let’s look for differences in Mass between several groups. (5 mks)

(a) Looking at the Thatch ants only and treating Distance as a factor, present a boxplot of Mass vs. Distance and do some analysis to determine if the average ant Mass differs based on distance traveled.
If you decide that the groups are not all the same, follow the analysis up with some Tukey HSD corrected tests to determine which distances are different from each other.

The results in Table 1 provide no evidence to suggest the groups are not all the same. We do not reject the null hypothesis, and therefore no Tukey HSD corrected tests are required. Boxplots visualizing Mass by Distance and by Colony for the Thatch and Seed ants are in Figure 3.

| | Df | Sum Sq | Mean Sq | F value | Pr(>F) |
|---|---|---|---|---|---|
| factor(Distance) | 4 | 5610.01 | 1402.50 | 1.79 | 0.1284 |
| Residuals | 1190 | 932289.32 | 783.44 | | |

Table 1: ANOVA for Thatch Ants mass vs. distance

(b) Repeat the previous analysis using Colony as the independent variable instead.

Figure 3: boxplots of Mass (mg) in four panels: "Thatch ants all distances" and "Seed ants all distances" (x-axis: Colony), and "Thatch ants all colonies" and "Seed ants all colonies" (x-axis: Distance (m)).

The results in Table 2 provide no evidence to suggest the groups are not all the same. We do not reject the null hypothesis, and therefore no Tukey HSD corrected tests are required.

| | Df | Sum Sq | Mean Sq | F value | Pr(>F) |
|---|---|---|---|---|---|
| Colony | 10 | 11417.49 | 1141.75 | 1.46 | 0.1493 |
| Residuals | 1184 | 926481.84 | 782.50 | | |

Table 2: ANOVA for Thatch Ants mass vs. colony

(c) Repeat the previous two analyses, this time using the Seed ants instead.

The results in Table 3 and Table 4 provide evidence to suggest the groups are not all the same. The Tukey HSD corrected tests are in the R output given below. It looks like Seed ants going to 0 distance have a higher mass than those going to 5 and 10 m.

| | Df | Sum Sq | Mean Sq | F value | Pr(>F) |
|---|---|---|---|---|---|
| factor(Distance) | 2 | 1950.00 | 975.00 | 5.35 | 0.0050 |
| Residuals | 574 | 104523.05 | 182.10 | | |

Table 3: ANOVA for Seed Ants mass vs. distance

| | Df | Sum Sq | Mean Sq | F value | Pr(>F) |
|---|---|---|---|---|---|
| Colony | 7 | 33593.31 | 4799.04 | 37.47 | 0.0000 |
| Residuals | 569 | 72879.74 | 128.08 | | |

Table 4: ANOVA for Seed Ants mass vs. colony

Tukey multiple comparisons of means, 95% family-wise confidence level. Fit: aov(formula = Mass ~ factor(Distance), data = ant.s)

| factor(Distance) | diff | lwr | upr | p adj |
|---|---|---|---|---|
| 5-0 | -3.3403883 | -6.370741 | -0.3100354 | 0.0265577 |
| 10-0 | -4.2821381 | -7.714791 | -0.8494854 | 0.0098107 |
| 10-5 | -0.9417498 | -4.386596 | 2.5030968 | 0.7967414 |

Tukey multiple comparisons of means, 95% family-wise confidence level. Fit: aov(formula = Mass ~ Colony, data = ant.s)

| Colony | diff | lwr | upr | p adj |
|---|---|---|---|---|
| 2-101 | -23.06784610 | -28.6493351 | -17.4863572 | 0.0000000 |
| 23-101 | -12.49390244 | -18.5974770 | -6.3903279 | 0.0000000 |
| 25-101 | -4.24390244 | -9.7261075 | 1.2383026 | 0.2658975 |
| 28-101 | -13.32017363 | -19.1980044 | -7.4423428 | 0.0000000 |
| 3-101 | -19.57507127 | -25.0387698 | -14.1113727 | 0.0000000 |
| 4-101 | -10.59723577 | -16.0983761 | -5.0960955 | 0.0000002 |
| X-101 | -4.22625538 | -9.5557043 | 1.1031935 | 0.2371866 |
| 23-2 | 10.57394366 | 4.2895676 | 16.8583197 | 0.0000116 |
| 25-2 | 18.82394366 | 13.1411335 | 24.5067539 | 0.0000000 |
| 28-2 | 9.74767248 | 3.6823073 | 15.8130376 | 0.0000360 |
| 3-2 | 3.49277483 | -2.1721842 | 9.1577339 | 0.5682539 |
| 4-2 | 12.47061033 | 6.7695311 | 18.1716896 | 0.0000000 |
| X-2 | 18.84159072 | 13.3059982 | 24.3771833 | 0.0000000 |
| 25-23 | 8.25000000 | 2.0536353 | 14.4463647 | 0.0014939 |
| 28-23 | -0.82627119 | -7.3752603 | 5.7227179 | 0.9999424 |
| 3-23 | -7.08116883 | -13.2611660 | -0.9011717 | 0.0122970 |
| 4-23 | 1.89666667 | -4.3164572 | 8.1097905 | 0.9832112 |
| X-23 | 8.26764706 | 2.2060146 | 14.3292796 | 0.0009978 |
| 28-25 | -9.07627119 | -15.0503994 | -3.1021430 | 0.0001269 |
| 3-25 | -15.33116883 | -20.8983328 | -9.7640049 | 0.0000000 |
| 4-25 | -6.35333333 | -11.9572478 | -0.7494188 | 0.0139362 |
| X-25 | 0.01764706 | -5.4178233 | 5.4531174 | 1.0000000 |
| 3-28 | -6.25489764 | -12.2120477 | -0.2977476 | 0.0317526 |
| 4-28 | 2.72293785 | -3.2685711 | 8.7144469 | 0.8650608 |
| X-28 | 9.09391825 | 3.2596522 | 14.9281843 | 0.0000727 |
| 4-3 | 8.97783550 | 3.3920243 | 14.5636467 | 0.0000360 |
| X-3 | 15.34881589 | 9.9320117 | 20.7656201 | 0.0000000 |
| X-4 | 6.37098039 | 0.9164125 | 11.8255483 | 0.0097346 |

3. The last part does not take into account both factors at the same time. Now, let’s model the Mass as a function of both Distance and Colony and look at the Seed ants only. (5 mks)

(a) Present an appropriate plot showing how the means of Mass differ for different levels of Colony and Distance.

An interaction plot for Colony and Distance is in Figure 4, where mean mass is plotted against distance for each Colony. The crossing of the lines for different Colonies suggests the presence of interaction.

Figure 4: Interaction plot of Colony and Distance (x-axis: factor(Distance); y-axis: mean of Mass; one line per Colony).

(b) Present a 2D table showing the group means of Mass for each combination of Colony and Distance. Round the output to one decimal place.

This table is given in Table 5.

| Distance | 101 | 2 | 23 | 25 | 28 | 3 | 4 | X |
|---|---|---|---|---|---|---|---|---|
| 0 | 61.8 | 41.0 | 52.0 | 74.5 | 53.4 | 41.3 | 53.3 | 72.0 |
| 5 | 70.3 | 40.4 | 52.6 | 55.2 | 49.5 | 47.8 | 54.2 | 57.1 |
| 10 | 62.2 | 44.5 | | 48.0 | | 46.6 | 55.3 | 53.4 |

Table 5: Average Mass of Seed Ants for each Distance / Colony combination

(c) Do some analysis to determine if Colony and Distance are useful factors to predict the Mass of a Seed ant. Give your conclusion in plain English.

The exploratory work and results presented suggest a two-way ANOVA with interaction, with main effects of Colony and Distance. The ANOVA in Table 6 indicates that both Colony and Distance are significant in predicting mass, and that the effect of each depends on the level of the other.

| | Df | Sum Sq | Mean Sq | F value | Pr(>F) |
|---|---|---|---|---|---|
| factor(Distance) | 2 | 1950.00 | 975.00 | 9.80 | 0.0001 |
| Colony | 7 | 34667.69 | 4952.53 | 49.76 | 0.0000 |
| factor(Distance):Colony | 12 | 14612.10 | 1217.68 | 12.23 | 0.0000 |
| Residuals | 555 | 55243.26 | 99.54 | | |

Table 6: Two-way ANOVA of Mass on Distance and Colony, with interaction, for Seed Ants

4. The real goal of this study was to see if different colonies had different strategies for sending workers out to gather food. A colony may adopt a “worker-centric” mentality where they send more massive ants out to far away locations.
A reasonable explanation is that ants foraging at far distances are more likely to encounter predators and starvation, and the increased mass (a proxy for energy, or amount of food consumed by the ant) gives them a higher chance to survive. So, the colony puts a premium on keeping workers alive, which you’d think would be a no-brainer, except there may not be enough energy (food, back at home base) to keep all of these workers alive. In that case, a colony may adopt an “energy-centric” approach, whereby the weaker ants are sent out to far away distances. If they die, it’s not as big of a loss to the colony because they weren’t carrying that much energy with them. In fact, it’s more like a gain, although you won’t catch the ant politicians saying as much!

To answer this, it would be nice to see the relationship between Mass and Distance for each colony. In addition, the colony’s strategy may depend on the size of the ants, so we’ll have to take that into account as well. Using ggplot2, show the relationship between Mass and Distance (as a numeric this time) for each size classification factor level, for thatched ant colonies 2 and 11 only. Comment on the foraging strategy for each colony, being as detailed as your plots will allow. (5 mks)

Figures 5 (a) and (b) illustrate the relationship between distance and mass for two sample colonies of ants. For the first example colony in Figure 5 (a), the figure suggests that mass tends to increase with distance for small ants. This relationship weakens and eventually reverses as the size of the ant increases. For the second example colony in Figure 5 (b), there is a different trend. For all ant sizes, the colony tends to send its weakest ants out to the farthest distances. This is an example of an energy-centric strategy.
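One way to make the "slope by size class" reading above concrete is to fit a separate line of Mass on Distance within each size class and look at the sign of the slope. The sketch below uses simulated data, not the study data: the data frame `ants`, the assumed slopes, and the reduction to two size classes are all hypothetical, and labelling a positive slope "worker-centric" is just the informal reading described above, not a formal test.

```r
# Hypothetical data, NOT the Nonacs dataset; column names mirror the assignment.
set.seed(1)
ants <- expand.grid(Distance = c(0, 2.5, 5, 7.5, 10),
                    Size     = c("<30", ">43"),
                    rep      = 1:10)

# Assumed pattern: small ants get heavier with distance, large ants lighter.
true_slope <- ifelse(ants$Size == "<30", 2, -2)
ants$Mass  <- 80 + true_slope * ants$Distance + rnorm(nrow(ants), sd = 3)

# Fit Mass ~ Distance separately within each size class and keep the slope.
slopes <- sapply(split(ants, ants$Size), function(d) {
  coef(lm(Mass ~ Distance, data = d))[["Distance"]]
})

# Informal reading: positive slope = heavier ants sent farther.
strategy <- ifelse(slopes > 0, "worker-centric", "energy-centric")
print(data.frame(slope = round(slopes, 2), strategy))
```

With the real data, the same idea could be drawn with ggplot2 along the lines of `ggplot(ants, aes(Distance, Mass)) + geom_point() + geom_smooth(method = "lm") + facet_wrap(~ Size)`, assuming those column names.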
Figure 5: (a) Mass vs. Distance, one panel per size class (30−34, 35−39, 40−43, >43); (b) Mass vs. Distance, one panel per size class (<30, 30−34, 35−39, 40−43, >43).

Figure 6: (a) Mass vs. Distance, one panel per colony (2 and 11); legend: Size (<30, 30−34, 35−39, 40−43, >43).

## B Format

Please make your submission look nice. This means:

- Proper paragraph structure free of typing errors.
- Graphs and tables should be in the body of the report, not thrown in at the end.
- R Code should be appended to the end of the assignment, in a small fixed-width font like Courier New; two columns per page.
- Your assignment should be printed double-sided. We’re not trying to destroy a rainforest here!
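As a sketch of the kind of short, commented R code that could be appended to a submission, the snippet below re-derives the F statistics and p-values of the one-way ANOVAs from the mean squares and degrees of freedom quoted in Tables 1 to 4. The data frame `tab` and its column names are made up for illustration; only the numbers are taken from the tables.

```r
# Mean squares and degrees of freedom copied from Tables 1-4.
tab <- data.frame(
  table  = c("Table 1", "Table 2", "Table 3", "Table 4"),
  ms_fac = c(1402.50, 1141.75, 975.00, 4799.04),  # Mean Sq, factor row
  ms_res = c(783.44, 782.50, 182.10, 128.08),     # Mean Sq, Residuals row
  df_fac = c(4, 10, 2, 7),
  df_res = c(1190, 1184, 574, 569)
)

# F is the ratio of the factor mean square to the residual mean square.
tab$Fstat <- tab$ms_fac / tab$ms_res

# Upper-tail probability of the F distribution gives Pr(>F).
tab$p <- pf(tab$Fstat, tab$df_fac, tab$df_res, lower.tail = FALSE)

print(transform(tab, Fstat = round(Fstat, 2), p = signif(p, 3)))
```

Reproducing a reported table from its own summary numbers like this is only a consistency check; it does not replace fitting the models with `aov()` on the actual data.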

Help

It's difficult to give a general answer here, partly because the answer might vary by discipline, and partly just because hard cases make bad law. Basically, instructors are not out to get you if you're actually doing the work.

The way I would approach this as a student is to give credit for ideas that I myself think I would have difficulty coming up with or reproducing if I hadn't consulted the external source.

So for example, let's say that I'm new to proofs of statements like p is O(q) where p and q are polynomials, so when doing a problem for some course other than CSC165, I read the example in the old CSC165 notes on p. 55 and follow it closely. In that case I would probably say in my solution that I followed that example.

If it is for a CSC165 problem set and those kinds of examples were covered in class lots of times and it's assumed that everyone knows that I know about them, I would not cite the notes. It would be hard for someone to claim that I pretended that the ideas in the notes are my own if everyone knows that I've read the notes!

If I am not new to those kinds of proofs, and I don't need to consult the notes to write down a proof like the one on p. 55, I would not cite the notes.

Having been told how seriously U of T takes plagiarism, I’m always afraid that on simple problem sets I might just happen to have the same solution as someone else in my class.

That's a somewhat different concern.

There are problems with one-liner solutions where lots of people will submit the same thing, and everyone understands that. On the other hand, for anything that's longer than 15-20 words, there is just an amazing amount of variation in terms of how things can be written.

(For a demonstration of that, try googling "there is just an amazing amount of variation". I am the first person to have ever said that on the internet!)
