Programming Savvy: 2023-09-24

For this assignment, I will be answering the following questions:

Consider a population consisting of the following values, which represent the number of ice cream purchases during the academic year for each of the five housemates: 8, 14, 16, 10, 11.

For the four parts of the question, I will embed each of the sub-questions within the following R code.

Suppose that the sample size n = 100 and the population proportion p = 0.95.

1. Does the sample proportion p have approximately a normal distribution? Explain.
To determine whether the sample proportion p has an approximately normal distribution, we refer to the Central Limit Theorem, which implies that the sampling distribution of the sample proportion gets closer to a normal distribution as the sample size grows. In addition, when the population proportion is very close to 0 or 1, a larger sample size is needed before the normal approximation becomes reasonable.
Given that the sample size n is 100 and the population proportion p is 0.95, a sample of 100 may look large enough at first glance. However, because p is so close to 1, a larger sample size would be needed for the distribution to be approximately normal.
Let's break this idea down mathematically:
Using the above values, let’s solve for np and nq and determine whether both are greater than 10, as the rule of thumb for normality requires.
np = 100 * 0.95 = 95
nq = 100 * (1 - 0.95) = 5
Only np is greater than 10; nq = 5 falls short, so we can conclude that the distribution is not approximately normal under this rule. If precision is prioritized, a larger sample size is needed.
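The same check can be sketched in R. This is a minimal sketch, assuming the common rule of thumb that both np and nq should be at least 10; the variable names and the threshold variable are my own illustrative choices.

```r
# Rule-of-thumb check for the normal approximation to a sample proportion.
# Assumption: the cutoff of 10 is the usual textbook guideline.
n <- 100
p <- 0.95
q <- 1 - p

np <- n * p   # expected number of successes
nq <- n * q   # expected number of failures
threshold <- 10

# TRUE only when both expected counts reach the threshold
approx_normal <- (np >= threshold) && (nq >= threshold)

cat("np =", np, " nq =", nq, " approximately normal:", approx_normal, "\n")
```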

2. What is the smallest value of n for which the sampling distribution of p is approximately normal?

To answer this question, I must preface that there is no one-size-fits-all answer for the smallest value of n at which the sampling distribution is normal. The smallest value of n depends on the population proportion and the desired level of approximation. A common guideline is that both np and nq should be at least 10, so the required n depends on how close p is to 0 or 1. With p = 0.95, the binding condition is nq = 0.05n ≥ 10, which gives n ≥ 200 under this guideline.
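Under the rule of thumb that both np and nq be at least 10, the smallest n can be worked out directly. The sketch below assumes that guideline; the names threshold and n_min are hypothetical.

```r
# Smallest n satisfying n*p >= threshold and n*q >= threshold.
# Assumption: threshold = 10 is the textbook rule of thumb, not a strict law.
p <- 0.95
q <- 1 - p
threshold <- 10

# The smaller of p and q is the binding constraint.
# round() guards against floating-point noise before taking the ceiling.
n_min <- ceiling(round(threshold / min(p, q), 8))

cat("Smallest n under the rule of thumb:", n_min, "\n")
```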

The sample mean from a group of observations is an estimate of the population mean μ. Given a sample of size n, consider n independent random variables X1, X2, ..., Xn, each corresponding to one randomly selected observation. Each of these variables has the distribution of the population, with mean μ and standard deviation σ.

A. Population mean = (8 + 14 + 16 + 10 + 11) / 5 = 11.8 (5 is the number of values in the set)

B. Sample of size n = 5

C. Mean of the sampling distribution = 11.8

We can put together some samples using some R code:
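A minimal sketch of such sampling, assuming samples of size 5 drawn with replacement from the five housemates' values; the seed, the number of repetitions, and the variable names are my own choices for illustration.

```r
# Draw repeated samples from the population and look at their means.
# Assumptions: sampling with replacement, 10,000 repetitions, fixed seed.
set.seed(2023)
population <- c(8, 14, 16, 10, 11)
n <- 5
reps <- 10000

# Each repetition draws n values and records the sample mean
sample_means <- replicate(reps, mean(sample(population, size = n, replace = TRUE)))

cat("Population mean:", mean(population), "\n")
cat("Mean of the simulated sample means:", mean(sample_means), "\n")
```

The mean of the simulated sample means should land close to the population mean, which is the point of part C.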

And the standard error: σM = σ / √n = 4.4 / √5

D. I am looking for a table with the following columns: x, x - μ, and (x - μ)^2
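One way to build that table in R is sketched below. It treats the five values as the whole population, so the variance divides by N rather than N - 1; that choice and the column names are assumptions on my part.

```r
# Deviation table for the five housemates' ice cream purchases.
x <- c(8, 14, 16, 10, 11)
mu <- mean(x)  # population mean

deviation_table <- data.frame(
  x = x,
  deviation = x - mu,              # x - mu
  squared_deviation = (x - mu)^2   # (x - mu)^2
)
print(deviation_table)

# Sum of squared deviations, then population variance and standard deviation
# (assumption: divide by N because these five values are the full population)
ss <- sum(deviation_table$squared_deviation)
pop_var <- ss / length(x)
pop_sd <- sqrt(pop_var)
cat("Sum of squares:", ss, " variance:", pop_var, " sd:", pop_sd, "\n")
```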

Here's a little hint

The sample size n = 100 and the population proportion p = 0.95

Does the sample proportion p have approximately a normal distribution? The distribution is expected to be normal if both np and nq are greater ....... (Your Turn)

Since p = .95, q = .05.

p * n = .95 * 100 = 95

q * n = .05 * 100 = 5

A benchmark is needed to judge normality. A classic guideline associated with the Central Limit Theorem states that np and nq should both be greater than or equal to 10. Since nq = 5 is less than 10, the sample proportion does not have an approximately normal distribution. The takeaway is that whether the approximation is good enough also depends on the context and the level of precision required; here, a larger sample would be needed.
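A quick simulation can illustrate why nq = 5 is a concern. The sketch below assumes 10,000 simulated samples and a fixed seed, both arbitrary choices of mine, and uses a simple moment-based skewness measure; a value clearly below zero points to a left-skewed shape rather than a symmetric normal one.

```r
# Simulate sample proportions for n = 100, p = 0.95 and check their shape.
# Assumptions: 10,000 replications and the seed are illustrative only.
set.seed(42)
n <- 100
p <- 0.95
reps <- 10000

# Each draw is the number of successes in n trials, converted to a proportion
p_hat <- rbinom(reps, size = n, prob = p) / n

# Moment-based skewness: about 0 for a symmetric (normal-like) distribution
skewness <- mean((p_hat - mean(p_hat))^3) / sd(p_hat)^3

cat("Mean of simulated proportions:", mean(p_hat), "\n")
cat("Skewness:", skewness, "\n")
```

Calling hist(p_hat) afterwards gives a visual check of the same thing: the proportions pile up near 1 and cannot exceed it.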

From our textbook, Chapter 2 Probability, Exercise 2.4: Simulated coin tossing is probably better done using the function called rbinom than using the function called sample. Explain.

Comparing the functions rbinom and sample, we can see from the parameters rbinom takes (the number of observations, the number of trials, and the probability of success) that it is better equipped to handle a simulated coin toss than sample. As the textbook and this week’s lecture slides note, the binomial distribution assumes a constant probability of success on each trial and independent trials, which is exactly the coin-toss setting. Furthermore, rbinom is better suited to scenarios with multiple trials or different success probabilities, since it returns the number of successes directly. sample, by contrast, is a general-purpose function for drawing items from a vector.
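The contrast can be sketched with ten coin tosses done both ways. The seed, the number of tosses, and the 0.7 probability for the biased coin are my own assumptions for illustration; note that sample can also take a prob argument, so this is a comparison of convenience rather than capability.

```r
# Coin tossing with sample() versus rbinom().
set.seed(123)
n_tosses <- 10

# sample(): draw labels, then count heads in a separate step
tosses_sample <- sample(c("H", "T"), size = n_tosses, replace = TRUE)
heads_sample <- sum(tosses_sample == "H")

# rbinom(): n_tosses single trials (size = 1), where 1 means heads
tosses_binom <- rbinom(n_tosses, size = 1, prob = 0.5)
heads_binom <- sum(tosses_binom)

# rbinom() can also return the number of heads per experiment directly:
# 1000 experiments of n_tosses tosses each, in one call
heads_per_experiment <- rbinom(1000, size = n_tosses, prob = 0.5)

# A biased coin only requires changing prob (0.7 is an arbitrary example)
biased_tosses <- rbinom(n_tosses, size = 1, prob = 0.7)

cat("Heads via sample():", heads_sample, "\n")
cat("Heads via rbinom():", heads_binom, "\n")
```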

~ Katie

Saturday, September 30, 2023

LIS4273 - Module 6 Assignment
