Homework 2: Visualization
Scott Alberts (modified Scott Thatcher)
September 5, 2018

Introduction

Read Chapter 3 (http://r4ds.had.co.nz/data-visualisation.html) of our text and complete Lab 2 on visualization before you start this assignment. You should use RStudio with ggplot2 for this assignment about Tasmanian abalone (abalone are yummy marine snails).

The data set abalone_clean.csv contains measurements of 10 variables (1 categorical, 1 integer and 8 numeric) on n = 4177 snails. Someone in Tasmania collected these 4,000 delicious measurements:

Categorical: sex (M, F, I). (I = Indeterminate sex. I have no idea how you tell the sex of an abalone, and often, neither could the abalone scientist.)
Integer: rings (in general, an abalone gets a new ring each year, sort of like a tree)
Numeric: length, diameter, height, whole.weight, shucked.weight, viscera.weight, shell.weight

Units for numerical variables are mm and g, as appropriate.

You can find the cleaned data set along with this homework set on Blackboard. The original is available from the UCI Machine Learning Repository (https://archive.ics.uci.edu/ml/datasets/Abalone), but you shouldn’t have to use it.

Assignment

Explore these questions using R. Do NOT do any “real” statistics, like t-tests or ANOVA. Do make relevant and awesome charts, tables, and graphs. You should use R Markdown to create a homework submission that shows your code and comments for each part below.

Note that it’s easiest if your downloaded data file and your markdown file are in the same directory.
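If the import fails later, the first thing to check is whether R is looking in the folder you think it is. Here is a minimal sketch of that check, assuming the file name abalone_clean.csv from this assignment and the suggested data frame name abalone. By default, knitting an R Markdown file runs the code from the folder that contains the .Rmd file, while the interactive console may be pointed somewhere else.

```r
# File name taken from this assignment; it is assumed to sit next to your .Rmd file.
data_file <- "abalone_clean.csv"

if (file.exists(data_file)) {
  # A bare file name is a relative path, so this only works from the right folder.
  abalone <- read.csv(data_file)
  print(dim(abalone))   # number of rows and columns actually read
} else {
  # Show where R is currently looking so you can move the file or the session.
  message("Could not find ", data_file, " in ", getwd())
}
```

If you see the "Could not find" message in the console but knitting works, the console's working directory is probably just different from the document's folder.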

1. Get ready to go: Load packages and import data.
   a. Load the appropriate package: tidyverse
   b. Import your data using the read.csv command. (Note: “csv” stands for comma-separated values, a common generic format for data.) You can call your data frame abalone if you’d like.

2. Use View, summary and head to examine your dataset. See anything interesting?

3. Use geom_histogram to make a one-variable histogram of whole.weight.
   a. Make a simple histogram using x=whole.weight (there is no y).
   b. Make it prettier: set binwidth=0.25, color="red", fill="tan". Or use colors of your choice!

4. Make a boxplot now, with geom_boxplot.
   a. Plot y=shucked.weight by x=sex, using fill=sex to make it pretty.
   b. Does it look like size differs much by sex?
   c. Based on your graph, what is one possible reason that researchers sometimes can’t determine sex?

5. Let’s make some scatterplots now, with geom_point or geom_jitter.
   a. Make a scatterplot with x=rings, y=shucked.weight and color=sex.
   b. What did you find? Are older abalone larger? Does sex matter?
   c. Add a trendline with geom_smooth(method="lm").
   d. What does the trendline show? Does the trendline help you understand the data?

6. Write a full paragraph to explain what you’ve learned.
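Syntax reminder for step 2: the sketch below runs the same inspection commands on mpg, the example data set from Chapter 3 that ships with ggplot2, so it is not the abalone answer. The table() call is an extra suggestion, not something the assignment asks for.

```r
library(ggplot2)   # provides the mpg example data; tidyverse loads it too

head(mpg)          # first six rows: a quick look at column names and values
summary(mpg)       # one summary per column (min, quartiles, mean, max for numbers)

# View(mpg)        # opens a spreadsheet-style viewer; interactive RStudio only

# summary() may say little about a text column; table() counts each category.
table(mpg$drv)
```

Leave View() out of the code you knit, or comment it out as above, since it is meant for interactive use.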
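Syntax reminder for steps 3 and 4: a hedged sketch of a histogram and a grouped boxplot, again on the mpg example data rather than the abalone data. The variable names hwy and drv belong to mpg, and the object names p_hist and p_box are just illustrative.

```r
library(ggplot2)

# Histogram: only x is mapped; ggplot2 computes the counts for the y axis.
# binwidth, color and fill are fixed settings, so they go outside aes().
p_hist <- ggplot(mpg, aes(x = hwy)) +
  geom_histogram(binwidth = 2, color = "black", fill = "steelblue")

# Boxplot: a numeric y split by a categorical x.
# fill = drv is inside aes() because the fill should vary with the data.
p_box <- ggplot(mpg, aes(x = drv, y = hwy, fill = drv)) +
  geom_boxplot()

print(p_hist)
print(p_box)
```

The thing to notice is where each argument lives: a constant like fill = "steelblue" sits in the geom, while a variable like fill = drv sits inside aes().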
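Syntax reminder for step 5: a sketch of a colored scatterplot with a linear trendline, using mpg so the abalone version is still yours to write. The names p_scatter and p_trend are illustrative, and the jitter and alpha values are arbitrary choices.

```r
library(ggplot2)

# Color is mapped in ggplot(), so every layer added later inherits it.
p_scatter <- ggplot(mpg, aes(x = displ, y = hwy, color = drv)) +
  geom_jitter(width = 0.1, height = 0, alpha = 0.6)  # small sideways jitter reduces overplotting

# method = "lm" fits a straight line; because color is inherited,
# this draws one line per drv group rather than a single overall line.
p_trend <- p_scatter + geom_smooth(method = "lm")

print(p_trend)
```

If you would rather have one trendline for all points, one option is to move the color mapping into the point layer, for example geom_jitter(aes(color = drv)), so that geom_smooth no longer inherits it.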