Exploratory Analysis & Hypothesis Testing: Tooth Growth in Guinea Pigs by Vitamin C Dose and Delivery Method

Date
October 29, 2023
Tags
R

Motive/Background

This analysis is part 2 of 2 of the Statistical Inference course project in Johns Hopkins University’s Data Science Specialization. This second part investigates the ToothGrowth dataset, starting with a general exploratory analysis and then running hypothesis tests to determine whether there are statistically significant differences in tooth growth by vitamin C dosage and delivery method.

Data

The ‘ToothGrowth’ dataset from the R ‘datasets’ library was used for this analysis. This dataset contains information regarding the effect of Vitamin C dosage on tooth growth in guinea pigs.

The response is the length of odontoblasts (cells responsible for tooth growth) in 60 guinea pigs. Each animal received one of three dose levels of vitamin C (0.5, 1, and 2 mg/day) by one of two delivery methods, orange juice or ascorbic acid (a form of vitamin C and coded as VC).

Writeup

Click here to view the full RPubs writeup/report for this project with all of the included code. The rest of this document will summarize/restate what is already on that page.

Vitamin C Dosage and Administration Methods on Guinea Pig Teeth Growth

For this section, we will be working with the ToothGrowth dataset from the datasets package. Let’s briefly load and preview the data to see what we’re working with:

image
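The original code is shown only as a screenshot. As a minimal sketch, loading and previewing the data could look like the following. The specific calls (str() and head()) are an assumption about what the preview contained.

```r
# ToothGrowth ships with base R in the 'datasets' package.
library(datasets)
data("ToothGrowth")

# Structure: 60 observations of len (numeric), supp (factor), dose (numeric).
str(ToothGrowth)

# First few rows of the data frame.
head(ToothGrowth)
```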

To provide some more context, here is a brief description of the dataset taken from the official R documentation help page:

The response is the length of odontoblasts (cells responsible for tooth growth) in 60 guinea pigs. Each animal received one of three dose levels of vitamin C (0.5, 1, and 2 mg/day) by one of two delivery methods, orange juice or ascorbic acid (a form of vitamin C and coded as VC).
  • [Numeric] len: Odontoblast length (micrometers)
  • [Factor] supp: Supplement type (VC = Ascorbic Acid, OJ = Orange Juice)
  • [Numeric] dose: Vitamin C dose in milligrams/day

Summary Statistics by Group

Let’s create some quick summary tables to get a better idea of how the values are distributed within each group. Keep in mind that we are trying to figure out to what extent dosage and supplement affect odontoblast (and thus tooth) length. First, let’s group by supplement and dose:

image
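One way such a table might be produced in base R is sketched below. The choice of aggregate() and of the statistics reported (count, mean, standard deviation) is an assumption, as is the name by_supp_dose. The original summary may have used different tools or columns.

```r
# Summarise len for every supplement x dose combination.
# The function returns a named vector, so 'len' becomes a matrix column
# with one sub-column per statistic.
by_supp_dose <- aggregate(
  len ~ supp + dose,
  data = ToothGrowth,
  FUN = function(x) c(n = length(x), mean = mean(x), sd = sd(x))
)

by_supp_dose
```

Each of the six supplement/dose cells contains 10 animals, which is worth keeping in mind when reading the per-dose tests later on.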

Grouping by supplement only:

image

Grouping by dose only:

image
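The two single-factor summaries above could be sketched with one small helper. The helper name summarise_len and the set of statistics are hypothetical choices for illustration, not taken from the original code.

```r
# Hypothetical helper: summary statistics of len for each level of 'group'.
summarise_len <- function(group) {
  stats <- sapply(
    split(ToothGrowth$len, group),
    function(x) c(n = length(x), mean = mean(x), median = median(x), sd = sd(x))
  )
  t(stats)  # one row per group, one column per statistic
}

by_supp <- summarise_len(ToothGrowth$supp)  # rows: OJ, VC
by_dose <- summarise_len(ToothGrowth$dose)  # rows: 0.5, 1, 2

by_supp
by_dose
```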

Visualizing the Data

Now let’s actually visualize the data by its groups using a box plot:

image
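A box plot of this kind can be sketched with base graphics as shown below. The original figure may well have been drawn with a different plotting library, so the function, colors, and labels here are assumptions.

```r
# One box per supplement x dose combination (six boxes in total).
# Colors alternate between the two supplements: OJ first, then VC.
bp <- boxplot(
  len ~ supp + dose,
  data = ToothGrowth,
  col  = c("orange", "steelblue"),
  xlab = "Supplement and dose (mg/day)",
  ylab = "Odontoblast length",
  main = "Tooth growth by supplement and vitamin C dose"
)

# boxplot() invisibly returns the plotted statistics; the group labels are:
bp$names
```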

By simply looking at the graph, it is fair to say that we can see some general trends, namely:

  1. Orange juice (generally) seems to exhibit higher odontoblast lengths compared to its ascorbic acid counterpart
  2. The higher the vitamin C dosage given to guinea pigs, the longer their odontoblasts

While these trends and differences may at a glance seem pretty clear, in order to truly say whether or not the dosage and supplements actually increase tooth length, we must seek statistical significance in the form of confidence intervals and hypothesis testing.

Hypothesis Testing

For the following hypothesis tests, we will be making use of the t.test() function. Because we pass the var.equal=FALSE argument, the tests do not assume that the two groups share a common variance, so these are technically Welch’s t-tests rather than Student’s t-tests. For each test, our null hypothesis will always be that the two group means are equal, regardless of dose or supplement. We reject the null hypothesis when the P-value is smaller than the significance level (P-value < 𝛼), and we take that as evidence that the dose or supplement makes a difference in guinea pig odontoblast length. The significance level (𝛼) for each test is fixed at 0.05, which corresponds to a confidence level of 95% for the accompanying confidence intervals.
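To make the Welch variant concrete, here is a small sketch that computes the test statistic and the Welch-Satterthwaite degrees of freedom by hand and compares them with what t.test() reports. The 0.5 and 1 mg/day dose groups are used purely as an illustration, and the variable names are hypothetical.

```r
x <- ToothGrowth$len[ToothGrowth$dose == 0.5]
y <- ToothGrowth$len[ToothGrowth$dose == 1]

# Each group contributes its own variance to the standard error,
# instead of a single pooled variance as in Student's t-test.
vx <- var(x) / length(x)
vy <- var(y) / length(y)
se <- sqrt(vx + vy)

t_manual  <- (mean(x) - mean(y)) / se
df_manual <- (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))

welch <- t.test(x, y, var.equal = FALSE)

# Both comparisons should return TRUE.
all.equal(t_manual,  unname(welch$statistic))
all.equal(df_manual, unname(welch$parameter))
```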

Looking at only the difference between dosages, with no regard to supplement

To begin, let’s see if there’s a statistically significant difference in odontoblast length between giving the guinea pigs 0.5mg versus 1mg of vitamin C. The first section of code assigns the results of the test to an object (in this case, doses_first_pair). The result is a list of length 10 containing components such as the p-value, test statistic, and confidence interval, all accessible using the “$” operator. The next line of code then accesses the p-value and checks whether or not it is smaller than 𝛼 = 0.05. If it is indeed smaller, the code returns a logical “TRUE” value, which means that we should reject our null hypothesis that the mean odontoblast length is the same for these two groups.

Running the first test:

image
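The code itself appears only as a screenshot, so the following is a minimal sketch of the two steps described above. The object name doses_first_pair comes from the text. The formula interface and the use of subset() are assumptions about how the test was written.

```r
# Keep only the two dose levels being compared.
low_doses <- subset(ToothGrowth, dose %in% c(0.5, 1))

# Welch two-sample t-test of len between the 0.5 and 1 mg/day groups.
doses_first_pair <- t.test(len ~ dose, data = low_doses, var.equal = FALSE)

# TRUE means the p-value is below alpha = 0.05, so we reject the null.
doses_first_pair$p.value < 0.05
```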

As you can see, we get a “TRUE” value back, which means that we can reject our null hypothesis. In other words, there is statistically significant evidence that the mean odontoblast length in guinea pigs given 0.5mg/day of vitamin C differs from that in guinea pigs given 1mg/day.

Now let’s check if the means are different between 1mg/day vs 2mg/day of vitamin C:

image
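A sketch of the same check for the upper pair of doses follows. The object name doses_second_pair is hypothetical and simply mirrors the naming used for the first test.

```r
# Keep only the 1 and 2 mg/day groups.
high_doses <- subset(ToothGrowth, dose %in% c(1, 2))

# Welch two-sample t-test of len between the 1 and 2 mg/day groups.
doses_second_pair <- t.test(len ~ dose, data = high_doses, var.equal = FALSE)

# TRUE means the p-value is below alpha = 0.05, so we reject the null.
doses_second_pair$p.value < 0.05
```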

Once again, we reject our null hypothesis. There is statistically significant evidence that the mean odontoblast length in guinea pigs given 1mg/day of vitamin C differs from that in guinea pigs given 2mg/day.

Looking at only the difference between supplement/delivery method

The tests above indicate that larger doses of vitamin C correspond to longer odontoblasts. Now let’s examine whether or not the delivery method (ascorbic acid vs. orange juice) makes any meaningful difference. First, we will test the effect of delivery method with all dosages pooled together:

image
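As a sketch under the same assumptions as before (formula interface, alpha of 0.05), the pooled comparison of delivery methods might be written as follows. The object name supp_all_doses is hypothetical.

```r
# Welch two-sample t-test of len between OJ and VC, ignoring dose.
supp_all_doses <- t.test(len ~ supp, data = ToothGrowth, var.equal = FALSE)

# FALSE means the p-value is not below alpha = 0.05.
supp_all_doses$p.value < 0.05

# The p-value itself shows how close the result is to the threshold.
supp_all_doses$p.value
```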

As you can see, we get a FALSE value back! We cannot reject our null hypothesis. In other words, there is insufficient evidence to conclude that the mean odontoblast lengths differ between guinea pigs that received their vitamin C through ascorbic acid and those that received it through orange juice (when all dosage amounts are pooled).

Let’s now examine whether or not our previous result changes when we test the delivery methods at each individual dosage amount. Testing for differences in supplement for subjects given 0.5mg/day of vitamin C:

image
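Restricting the comparison to a single dose only requires filtering the rows first. A minimal sketch follows, in which the object names are hypothetical.

```r
# Only the animals that received 0.5 mg/day (10 per supplement).
dose_half <- ToothGrowth[ToothGrowth$dose == 0.5, ]

# Welch two-sample t-test of len between OJ and VC at this dose.
supp_dose_half <- t.test(len ~ supp, data = dose_half, var.equal = FALSE)

# TRUE means the p-value is below alpha = 0.05, so we reject the null.
supp_dose_half$p.value < 0.05
```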

Our P-value is now lower than our alpha value, so we reject our null hypothesis. There is statistically significant evidence that the mean odontoblast length differs between guinea pigs fed ascorbic acid and those fed orange juice (when given 0.5mg/day of vitamin C).

Testing for differences in supplement for subjects given 1mg/day of vitamin C:

image

We reject our null hypothesis. There is statistically significant evidence that the mean odontoblast length differs between guinea pigs fed ascorbic acid and those fed orange juice (when given 1mg/day of vitamin C).

And finally, testing for differences in supplement for subjects given 2mg/day of vitamin C:

image
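Since the three per-dose tests share the same shape, they can also be run in one pass. The loop below is a sketch of that idea rather than the original code, and the names doses, p_values, and reject are hypothetical.

```r
doses <- c(0.5, 1, 2)

# Run the OJ vs VC Welch t-test separately within each dose level.
p_values <- sapply(doses, function(d) {
  one_dose <- ToothGrowth[ToothGrowth$dose == d, ]
  t.test(len ~ supp, data = one_dose, var.equal = FALSE)$p.value
})
names(p_values) <- paste0(doses, " mg/day")

# TRUE where the null hypothesis is rejected at alpha = 0.05.
reject <- p_values < 0.05

p_values
reject
```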

We fail to reject our null hypothesis. In other words, there is insufficient evidence to conclude that the mean odontoblast lengths differ between guinea pigs fed ascorbic acid and those fed orange juice (when given 2mg/day of vitamin C).

Conclusions

Now that we have run hypothesis tests for every relevant combination/pair of dosage and delivery method of vitamin C in guinea pigs, we can be more confident in our conclusions. Overall, we discovered a few things:

  • Vitamin C dosage has a positive (and statistically significant) association with odontoblast length when both delivery methods are pooled.
  • There is no conclusive evidence that either ascorbic acid or orange juice is more effective than the other at increasing odontoblast length when vitamin C dosage is not considered.
  • There is a statistically significant difference in odontoblast length between ascorbic acid and orange juice when guinea pigs are given vitamin C doses of 0.5mg/day and 1mg/day, but not 2mg/day.
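The two-sided tests only establish that the dose groups differ. The direction of the difference can be read from the 95% confidence intervals that t.test() returns alongside the p-value. The sketch below collects those intervals for the two dose comparisons. The helper name ci_for and the table layout are hypothetical.

```r
# Hypothetical helper: 95% CI for mean(len at 'low') - mean(len at 'high').
ci_for <- function(low, high) {
  pair <- ToothGrowth[ToothGrowth$dose %in% c(low, high), ]
  res  <- t.test(len ~ dose, data = pair, var.equal = FALSE)
  c(lower = res$conf.int[1], upper = res$conf.int[2])
}

dose_ci <- rbind(
  `0.5 vs 1` = ci_for(0.5, 1),
  `1 vs 2`   = ci_for(1, 2)
)

# Intervals lying entirely below zero mean the lower dose has the
# shorter mean length, i.e. length increases with dose.
dose_ci
```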

Note: When speaking about “conclusive evidence” in the context of this analysis, I am using the term as it pertains to this particular dataset and sample size. In reality, there should obviously be much larger and more conclusive studies done to accurately determine such results.