PM 510 Homework #03
Problems not marked with [SPSS] should be done by hand, as problems similar to those could be on the Midterm and/or Final. (Of course, feel free to check your work using SPSS!)
Problems marked with [SPSS] are intended to be done with SPSS. For these problems, the first step you should always take (even if the problem doesn’t explicitly ask for it!) is to produce appropriate graph(s) to look at the data. The type(s) of graph that are appropriate will vary depending on the analysis you have planned. (And sometimes, the type of analysis you do will depend on what the data look like!)
1. A study was conducted investigating the long-term prognosis of children who have suffered an acute episode of bacterial meningitis, an inflammation of the membranes enclosing the brain and spinal cord. Listed below are the times to the onset of seizure for 13 children who took part in the study. In months, the measurements are:
0.10 0.25 0.50 4 12 12 24 24 31 36 42 55 96
(a) What type of data are these? (Nominal, ordinal, discrete numeric, continuous.) Why?
(b) Construct an appropriate plot or chart that displays the shape of these observed data.
Describe the shape of the observed distribution in terms of symmetry, skewness, modality, normality, and presence of outliers.
(c) Find the following numerical summary statistics of the data.
i. mean
ii. median
iii. mode
iv. range
v. IQR (Do this by both hand-calculation methods from lecture and compare the answers.)
vi. 10th and 90th percentiles
vii. standard deviation
(d) Which numbers would you select as measures of central tendency and dispersion/variability to summarize these data? Why?
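One way to check the hand calculations in Problem 1 outside SPSS is a short Python script. The sketch below is not part of the course materials: it assumes Python 3.8 or later, uses only the standard statistics module, and the helper name summarize is made up for illustration. Software packages use different quartile conventions, so the IQR reported here may not match either hand-calculation method from lecture.

```python
import statistics

# Times to onset of seizure, in months, copied from Problem 1.
times = [0.10, 0.25, 0.50, 4, 12, 12, 24, 24, 31, 36, 42, 55, 96]


def summarize(data):
    """Return common summary statistics (hypothetical helper, not from lecture)."""
    # quantiles(..., n=4) returns the three quartile cut points Q1, Q2, Q3.
    # The "exclusive" method is one software convention among several.
    q1, _, q3 = statistics.quantiles(data, n=4, method="exclusive")
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "modes": statistics.multimode(data),  # every value tied for most frequent
        "range": max(data) - min(data),
        "iqr": q3 - q1,
        "sd": statistics.stdev(data),  # sample SD, divides by n - 1
    }


for name, value in summarize(times).items():
    print(name, value)
```

Comparing this output with the hand-calculated IQR is a quick way to see how much the choice of quartile method matters for a sample this small.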
2. [SPSS] A study was conducted comparing female adolescents who suffer from bulimia to healthy females with similar body compositions and levels of physical activity. The file bulimia.sav contains measures of daily caloric intake, recorded in kilocalories per kilogram, for samples of adolescents from each group.
(a) Find the median daily caloric intake for both the bulimic adolescents and the healthy ones.
(b) Compute the IQR for each group.
(c) Construct box-and-whisker plots for each group.
(d) Describe the shape of the observed distribution for each group. Do you think that the sampled data come from a population with a normal distribution? Why or why not?
(e) Describe the qualitative differences between the two groups based on the box-and-whisker plots. (For example, which average is higher? Which group has more variability? Are there outlying values in either group?)
3. [SPSS] The declared concentrations of nicotine in milligrams for 35 brands of Canadian cigarettes are saved under the variable name nicotine in the file cigarett.sav.
(a) Find the mean and median concentrations of nicotine.
(b) Produce a histogram of the nicotine measurements. Describe the shape of the observed distribution. Do you think that the sampled data come from a population with a normal distribution? Why or why not?
(c) Which number do you think provides the best measure of central tendency for these concentrations, the mean or the median? Why?
4. [SPSS] The file lowbwt.sav contains information recorded for a sample of 100 low birth weight infants (those weighing less than 1500 grams) born in two teaching hospitals in Boston, Massachusetts. Measurements of systolic blood pressure are saved under the variable name sbp. The dichotomous (i.e., categorical with two possible values) variable sex designates the gender of each child, with 1 representing a male and 0 a female.
(a) Construct a pair of box plots for the systolic blood pressure measurements, one for boys and one for girls. Compare the two distributions in terms of the shape of the distribution and the values of the summary statistics.
(b) Compute the mean and standard deviation of the systolic blood pressure measurements for males and for females. Which group has the larger mean? The larger standard deviation?
(c) Calculate the coefficients of variation corresponding to each gender and compare them.
(d) Suppose that the population parameters μ and σ are actually equal to the sample mean x̄ and sample standard deviation s, respectively, from part (b). Consider calculating x̄ by taking another sample of size 100. What would the (approximate) distribution of x̄ be? What if we took a sample of size 900 instead?
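The arithmetic behind parts (c) and (d) of Problem 4 can be sketched in a few lines of Python. This is an illustration only: the values of mu and sigma below are made up and are NOT taken from lowbwt.sav, the function names are hypothetical, and the coefficient of variation is assumed to be expressed as a percentage (100 times s divided by the mean), which may differ from the convention in the Lecture Notes.

```python
import math


def coefficient_of_variation(mean, sd):
    """CV as a percentage, assuming the convention 100 * sd / mean."""
    return 100.0 * sd / mean


def standard_error(sigma, n):
    """Standard deviation of the sample mean for a sample of size n."""
    return sigma / math.sqrt(n)


# Hypothetical population values, chosen only to make the arithmetic easy.
mu, sigma = 50.0, 12.0

print("CV (%):", coefficient_of_variation(mu, sigma))
for n in (100, 900):
    # For large n the sample mean is approximately normal, centered at mu,
    # with standard deviation sigma / sqrt(n).
    print("n =", n, "standard error =", standard_error(sigma, n))
```

Because the standard error depends on the square root of n, going from a sample of 100 to a sample of 900 divides it by 3 rather than by 9.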
BEYOND THE BASICS
5. Show that the ordinary mean is a weighted mean as defined in the Lecture Notes.
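A numerical check can suggest what the proof in Problem 5 needs to show, although it does not replace the proof. The sketch below assumes the weighted mean is defined as the sum of weight times value divided by the sum of the weights; the definition in the Lecture Notes may be stated differently. The function name and the data values are made up for illustration.

```python
def weighted_mean(values, weights):
    """Weighted mean, assuming the definition sum(w_i * x_i) / sum(w_i)."""
    total_weight = sum(weights)
    return sum(w * x for w, x in zip(weights, values)) / total_weight


# Made-up data, not from any homework file.
values = [3.0, 5.0, 10.0]
ordinary_mean = sum(values) / len(values)

# Giving every observation the same weight reproduces the ordinary mean.
equal_weights = [1.0] * len(values)
print(ordinary_mean, weighted_mean(values, equal_weights))
```

Any set of equal positive weights gives the same result, which is the pattern the written argument has to establish in general.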
6. In this problem you will explore the effects of some simple data transformations on summary statistics.
(a) Show that if you take a dataset and add the same constant k to every data point, then the new sample mean is the old sample mean plus k and the new sample standard deviation is equal to the old sample standard deviation.
(b) Show that if you take a dataset and multiply every data point by the same constant k, then the new sample mean is the old sample mean times k and the new sample standard deviation is the old sample standard deviation times k.
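Before writing the proofs for Problem 6, it can help to confirm the two claims numerically. The sketch below uses made-up data and Python's standard statistics module; the helper name mean_and_sd is hypothetical. It illustrates the claims for one positive k and is not a proof. For a negative k the standard deviation is multiplied by the absolute value of k, since a standard deviation cannot be negative.

```python
import statistics


def mean_and_sd(data):
    """Return (sample mean, sample standard deviation) of a list of numbers."""
    return statistics.mean(data), statistics.stdev(data)


# Made-up data and constant, chosen only for illustration.
data = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0]
k = 3.0

shifted = [x + k for x in data]  # part (a): add k to every data point
scaled = [x * k for x in data]   # part (b): multiply every data point by k

print("original:", mean_and_sd(data))
print("shifted: ", mean_and_sd(shifted))
print("scaled:  ", mean_and_sd(scaled))
```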
7. The Lecture Notes state “This alternate method [for calculating the quartiles] gives the same answer as method from previous slides roughly 75% of the time.” Figure out exactly under what circumstances the two methods for calculating quartiles given in the Lecture Notes will agree.