Statistics Question 176: Log Transformation of Data

妙趣横生统计学

Published 2024-07-21 08:31:33

Original link: https://mp.weixin.qq.com/s?__biz=MzAwOTYyMDY3OQ==&mid=2650436404&idx=3&sn=b97ff3fe7535b01283441be0922ec000&chksm=82ebb7e5e767694e526d1aaf2ad02de7fd1117f12eaaecd9ae39e26c01226d7b0bfea7998b3b&scene=126&sessionid=0

Question

Researchers evaluated the effectiveness of early abdominopelvic computed tomography in patients with acute abdominal pain of unknown cause. A randomised controlled trial study design was used. The intervention was early computed tomography (within 24 hours of admission). The control treatment was standard practice (radiological investigations as indicated). In total, 55 patients were randomised to early computed tomography and 55 to control treatment.

The main outcome measures included length of hospital stay. The distribution of length of hospital stay was positively skewed. The logarithm function was used to transform the observations, and Student’s t test was then used to compare the treatment groups. The length of hospital stay for the standard practice group was on average 1.1 days longer than that in the early computed tomography group (geometric mean 6.4 days (range 1 to 60) versus 5.3 days (range 1 to 31)). The ratio of geometric means (standard treatment versus early computed tomography) was 1.21 (95% confidence interval 0.92 to 1.56).

The overall conclusions of the study were that early abdominopelvic computed tomography for acute abdominal pain may reduce length of hospital stay and mortality. Furthermore, it could also identify unforeseen conditions and potentially serious complications.

Which of the following statements, if any, are true?

·a) The purpose of the logarithm transformation of length of hospital stay was to achieve a normal distribution.

·b) In each treatment group, the geometric mean of length of hospital stay was larger than the arithmetic mean.

·c) The standard practice group spent on average 21% longer in hospital than the early computed tomography group.

·d) The difference between treatments in length of hospital stay was significant at the 5% level.

Hint: more than one answer is correct.

Answer

Statements a and c are true, while b and d are false.

Student’s t test compares the mean of a variable measured on a continuous scale between two independent groups. Described in a previous question, Student’s t test is sometimes referred to as the independent samples t test, the two sample t test, or simply the t test. Student’s t test is a parametric test, and assumptions are made about the data for the test to be applied. Parametric tests have been described in a previous question. It is assumed that the variable to be compared is approximately normally distributed in both groups and that the variances in the two groups are equal.

In the example above, the aim was to establish whether there was a significant difference between the treatment groups in length of hospital stay. The distribution of length of hospital stay was skewed, and therefore Student’s t test could not be applied to the untransformed data. Two options were available. Firstly, a non-parametric test could have been performed that did not make assumptions about the distribution of the data. The Wilcoxon rank sum test or Mann-Whitney U test, described in a previous question, could have been used. Alternatively, the observations of length of hospital stay could have been transformed, in the hope that the transformed data would then meet the assumptions of Student’s t test. The outcome variable, length of hospital stay, was transformed using the logarithm function (referred to simply as “log transformed”), which involved obtaining the logarithm of each observation. It was not indicated whether natural logarithms (to base e, a mathematical constant, ≈2.718) or common logarithms (to base 10) were obtained, but either would have been suitable. After transformation the data were approximately normally distributed (a is true), permitting Student’s t test to be used. Although it was not indicated whether the second assumption of equal variances required for Student’s t test was met after transformation, a logarithmic transformation typically achieves equality of variances between two groups if it was not already present.
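The remark that either base of logarithm would have been suitable can be checked with a small sketch. The lengths of stay below are hypothetical values made up for illustration (the trial data are not given in this article), and `pooled_t_statistic` is a helper name introduced here, not a library function. Because log10(x) is just ln(x) divided by a constant, the equal-variance t statistic should come out the same under both transformations.

```python
import math
from statistics import mean, variance


def pooled_t_statistic(x, y):
    """Two-sample Student's t statistic under the equal-variance assumption."""
    nx, ny = len(x), len(y)
    # Pooled variance: sample variances weighted by their degrees of freedom.
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    standard_error = math.sqrt(pooled_var * (1 / nx + 1 / ny))
    return (mean(x) - mean(y)) / standard_error


# Hypothetical lengths of stay in days (illustration only, NOT the trial data).
standard = [2, 3, 4, 5, 5, 6, 8, 12, 20, 45]
early_ct = [1, 2, 3, 4, 5, 5, 6, 7, 10, 25]

# Log transform each observation, once with base e and once with base 10.
t_ln = pooled_t_statistic([math.log(v) for v in standard],
                          [math.log(v) for v in early_ct])
t_log10 = pooled_t_statistic([math.log10(v) for v in standard],
                             [math.log10(v) for v in early_ct])

print(round(t_ln, 4), round(t_log10, 4))
```

The two printed values should agree, since rescaling every observation by the same constant rescales the mean difference and the standard error equally. Turning the statistic into a P value needs the t distribution, which is left out here to keep the sketch within the standard library.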

The distribution of length of hospital stay was skewed to the right, and therefore the arithmetic mean would be disproportionally raised by a small number of high values in the right hand tail of the distribution. The median length of hospital stay would be a better measure of central location than the arithmetic mean. However, the length of hospital stay was normally distributed after the log transformation, and in such circumstances the geometric mean is a good measure of central location. The geometric mean for each treatment group was derived by anti-logging (that is, back transforming) the arithmetic mean of the log transformed length of hospital stay for each treatment group. Anti-logging involves raising e or 10 (depending on whether natural logarithms or logarithms to base 10 were used to transform length of hospital stay) to the power of the group mean of the log transformed data. The geometric means are on the same scale and with the same units as the original outcome measure.
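As a minimal sketch of the back transformation just described, the following computes a geometric mean by anti-logging the mean of the logged values, once with natural logarithms and once with base 10. The sample `stays` is hypothetical and only for illustration; `statistics.geometric_mean` (Python 3.8 or later) is used as a cross-check.

```python
import math
from statistics import mean, geometric_mean

# Hypothetical lengths of stay in days (illustration only, NOT the trial data).
stays = [2, 3, 4, 5, 5, 6, 8, 12, 20, 45]

# Arithmetic mean of the log transformed observations, for each base.
mean_ln = mean([math.log(v) for v in stays])
mean_log10 = mean([math.log10(v) for v in stays])

# Anti-log: raise the base to the power of the mean of the logged data.
gm_from_ln = math.exp(mean_ln)
gm_from_log10 = 10 ** mean_log10

# Cross-check against the standard library's geometric mean.
gm_builtin = geometric_mean(stays)

print(round(gm_from_ln, 3), round(gm_from_log10, 3), round(gm_builtin, 3))
```

All three values should coincide up to floating point error, and the result is in days, the same unit as the original observations.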

The distribution of length of hospital stay was skewed to the right, and therefore the geometric mean will be larger in value than the median yet smaller than the arithmetic mean (b is false). It was reported that both treatment groups had a median length of stay of five days, while the mean on the original scale (untransformed) was 6.6 (SD=5.8) days in the early computed tomography group and 9.2 (SD=9.8) days in the standard practice group. The geometric mean was 5.3 days in the early computed tomography group and 6.4 days in the standard practice group.
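The ordering of the three summaries can be checked in a short sketch, first on a hypothetical right-skewed sample and then on the summary figures reported above. The sample `stays` is invented for illustration. Note that the geometric mean can never exceed the arithmetic mean for positive data, whereas its position relative to the median depends on the particular sample.

```python
from statistics import mean, median, geometric_mean

# Hypothetical right-skewed lengths of stay in days (illustration only).
stays = [2, 3, 4, 5, 5, 6, 8, 12, 20, 45]

med = median(stays)
gm = geometric_mean(stays)
am = mean(stays)
print(med, round(gm, 2), am)

# Summaries reported in the text: (median, geometric mean, arithmetic mean).
reported = {
    "early computed tomography": (5, 5.3, 6.6),
    "standard practice": (5, 6.4, 9.2),
}
for group, (m, g, a) in reported.items():
    # True when median < geometric mean < arithmetic mean.
    print(group, m < g < a)
```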

The anti-log of the mean difference between treatment groups for the log transformed data is equivalent to the ratio of the geometric means. Anti-logging the limits of the 95% confidence interval for the mean difference of the log transformed data gives a 95% confidence interval for the ratio of the geometric means. The geometric means were 6.4 days for the standard practice group and 5.3 days for the early computed tomography group. The ratio of these two means (6.4÷5.3) was 1.21, with a corresponding 95% confidence interval of 0.92 to 1.56. This ratio is interpreted in a similar fashion to a relative risk, and therefore the standard practice group stayed in hospital on average 21% longer than the early computed tomography group (c is true). The 95% confidence interval is an interval estimate for the population parameter of the ratio of geometric means, representing the uncertainty of the sample in estimating the population parameter as a result of sampling error. Therefore, with a probability of 0.95 it was estimated that in the population the standard practice group, when compared with the early computed tomography group, may have as much as an 8% shorter stay or a 56% longer stay in hospital. The 95% confidence interval for the ratio of geometric means straddled unity and therefore, as described in a previous question, the difference between treatment groups in length of hospital stay was not significant at the 5% level of significance (d is false). This was reflected by the reported P value for the test of this ratio, which was 0.17.
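The arithmetic in this paragraph can be retraced with a short sketch using only the figures reported above. The helper `percent_change` is a name introduced here for illustration. The sketch shows that anti-logging the difference of log means reproduces the ratio of geometric means, and how the ratio and its confidence limits translate into percentage differences.

```python
import math

# Figures reported in the text.
gm_standard = 6.4               # geometric mean, standard practice (days)
gm_early_ct = 5.3               # geometric mean, early computed tomography (days)
ci_low, ci_high = 0.92, 1.56    # 95% confidence interval for the ratio

# Ratio of geometric means, computed directly.
ratio = gm_standard / gm_early_ct

# The same ratio obtained by anti-logging the difference of the log means.
diff_of_logs = math.log(gm_standard) - math.log(gm_early_ct)
ratio_from_logs = math.exp(diff_of_logs)


def percent_change(r):
    """Express a ratio as a percentage difference from unity."""
    return (r - 1) * 100


print(round(ratio, 2))                   # 1.21
print(round(percent_change(ratio)))      # 21
print(round(percent_change(ci_low)))     # -8
print(round(percent_change(ci_high)))    # 56

# The interval contains 1, so the difference is not significant at the 5% level.
significant_at_5_percent = not (ci_low <= 1 <= ci_high)
print(significant_at_5_percent)          # False
```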

In medicine many variables have a distribution that is skewed to the right, and the logarithm transformation is typically used to achieve a normal distribution in the data. Such a data transformation is important in statistical analysis; although it may appear as a way of manipulating data to get the desired result, a logarithm scale is simply an alternative means of representing data originally measured on a linear scale. It has advantages if the distribution of a variable is normal after transformation, since it permits parametric statistical tests to be performed rather than non-parametric ones. Parametric statistical tests allow the estimation of treatment effects using confidence intervals in addition to hypothesis testing, whereas non-parametric tests are generally limited to hypothesis testing alone. Sometimes it is possible to obtain estimates of treatment effects when performing non-parametric tests, but generally it is not straightforward.

The logarithm transformation is one of several transformations that may be applied in statistical analysis. Generally, a data transformation will be applied so that the data satisfy the assumptions of a statistical test or procedure that is to be applied. The choice of transformation typically depends on the type of variable, scale of measurement, or shape of the distribution of the variable.

So the correct answers are a and c.

Learn a little every day and you will grow stronger!
