USYD (University of Sydney) DATA2002 [Module 1]: Categorical data study notes (Weeks 1-3)


不二程序猿



Article link: https://blog.csdn.net/weixin_43773228/article/details/119781309


DATA2002 lecture 01 02 03

Preface
Week 1
Week 2
Week 3
Summary

Preface

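This series of posts covers the key points from the lectures: data visualisation, data collection, the chi-square test, goodness-of-fit tests, measures of performance, measures of risk, testing for homogeneity, testing for independence, and testing in small samples. These are all fairly basic topics that are worth understanding and mastering. Some of them are not yet covered in enough detail and will be filled in later; for now the focus is on material for the final exam.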

Week 1

1.1 Data visualisation

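We need to get to know the Palmer penguins dataset and visualise it.
I won't go into much detail here; run the basic code yourself in RStudio while following the lecture.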

# install.packages("palmerpenguins")
library(palmerpenguins)

Learn more about the Palmer penguins dataset.

help(penguins, package = "palmerpenguins")
# or more simply
?penguins

Take a quick look at the basic structure of the dataset.

library(dplyr)
dplyr::glimpse(penguins) # glimpse the structure of the penguins data frame

Visualise the dataset with the ggplot2 package.

library(ggplot2) # needed for ggplot(), aes(), geom_bar(), etc.
# stacked bars scaled to proportions: sex within species, one panel per island
ggplot(data = penguins) + aes(x = species, fill = sex) + 
  geom_bar(position = "fill") + 
  labs(x = "", y = "Proportion of penguins", fill = "Sex") + 
  scale_y_continuous(labels = scales::percent_format()) + 
  facet_grid(cols = vars(island), scales = "free_x", space = "free_x") +
  theme_linedraw(base_size = 22)
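
The bar heights in a position = "fill" chart are just within-group proportions, so it can help to compute them directly. Below is a minimal sketch in base R. The helper `row_proportions` and the toy data frame are made up for illustration (they are not from the lecture and are not real penguin counts), and the sketch ignores the faceting by island.

```r
# Row-wise proportions of one categorical variable within levels of another.
# `row_proportions` is a hypothetical helper, not part of any package.
row_proportions <- function(data, group, category) {
  # Keep NA as its own category if present, as the bar chart would show it
  counts <- table(data[[group]], data[[category]], useNA = "ifany")
  prop.table(counts, margin = 1)  # margin = 1: each row sums to 1
}

# Toy data for illustration only (NOT the real penguin counts).
toy <- data.frame(
  species = c("A", "A", "A", "A", "B", "B"),
  sex     = c("female", "male", "male", "male", "female", "male")
)

props <- row_proportions(toy, "species", "sex")
print(props)

# With palmerpenguins loaded, the same call would be:
# row_proportions(penguins, "species", "sex")
```

Each row of the result corresponds to one bar, and the entries in that row are the segment heights.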

For more on this part, see other blog posts that cover plotting with ggplot.

1.2 Data collection

Sample and population

A sample is part of a population.
A statistic can be computed from a sample, and used to estimate a parameter.
A statistic summarises what the researcher knows. A parameter is what the researcher wants to know.
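
The distinction between a parameter and a statistic can be sketched with a small simulation. Everything below is hypothetical: the population is simulated, the 30% rate, the population size, the sample size and the seed are arbitrary choices for illustration, and none of it comes from the lecture data.

```r
set.seed(2002)  # arbitrary seed so the sketch is reproducible

# Hypothetical population: 100,000 people, 1 = has the trait, 0 = does not.
# The 30% rate is an assumption chosen only for illustration.
population <- rbinom(100000, size = 1, prob = 0.3)

# Parameter: the true proportion in the whole population (usually unknown).
parameter <- mean(population)

# Sample: 500 individuals drawn at random without replacement.
smp <- sample(population, size = 500)

# Statistic: the sample proportion, used to estimate the parameter.
statistic <- mean(smp)

cat("parameter:", parameter, "\n")
cat("statistic:", statistic, "\n")
```

In practice only `statistic` is available to the researcher; the simulation can print `parameter` only because the whole population was generated.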

Why work with a sample instead of collecting data on the entire population?
