向爱我 - CSDN Blog

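[Original] Stata Operations

Copying many rows in Excel: select the first row, press Shift+Ctrl+Down, then copy.
destring varlist, replace: convert string variables to numeric.
codebook varlist: show the number of distinct values of a discrete variable.
egen n = group(x): create a new variable n that assigns one number to each distinct value of the string variable x.
xttest3: test for groupwise heteroskedasticity (run after a fixed-effects panel regression).
_n is the system variable holding the observation number; keep if mod(_n,2)==0 keeps the even-numbered rows.
su …, separator(1): draw a separator line after every row of the summary table.
i.xxx and c.xxx denote ..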
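As a rough illustration of what two of these commands do, here is a minimal Python sketch. The helper names `keep_even_rows` and `group_ids` are hypothetical, the data is made up, and numbering the groups in sorted order is my understanding of how `egen ... group()` behaves rather than something stated in the notes.

```python
def keep_even_rows(rows):
    """Mimic `keep if mod(_n,2)==0`; _n is the 1-based observation number."""
    return [row for n, row in enumerate(rows, start=1) if n % 2 == 0]


def group_ids(values):
    """Mimic `egen n = group(x)`: one integer per distinct value.

    Assumption: groups are numbered following the sorted order of the values.
    """
    ids = {value: i for i, value in enumerate(sorted(set(values)), start=1)}
    return [ids[value] for value in values]


rows = ["a", "b", "c", "d", "e"]          # made-up observations
regions = ["east", "west", "east", "north"]  # made-up string variable

print(keep_even_rows(rows))
print(group_ids(regions))
```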

2022-01-16 14:22:28 3157

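[Original] Course Notes on 《傻瓜计量经济学与Stata应用》 (Econometrics for Dummies with Stata Applications)

encode a, generate(b): encode the string variable a as a labeled numeric variable b, so that it can be passed to summarize.
summarize a b if c == 10 [, detail]: restricting summarize with a condition and extending it with options.
Drawing a combined graph in a do-file:
kdensity gdp
graph save gdp.gph, replace
kdensity employ
graph save employ.gph, replace
graph combine gdp.gph employ.gph
ssc install [command]: install..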

2021-11-17 21:09:19 2266 6

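[Original] Notes on Liu Zeyun's 《计量经济学实验教程》 (Econometrics Lab Tutorial)

Chapter 1: Basic Stata Operations
Importing data: first arrange the data in Excel, then copy it into the Stata Data Editor and save.
General form of a command: command varlist [if] [in] [weight] [, options]
Write commands in a do-file; once saved, it can be edited later.
Commands:
summarize: descriptive statistics
corr: correlation coefficients
scatter: scatter plot
reg: run a regression
rename: rename a variable (rename a b renames a to b)
label: attach a descriptive label to a variable (label var code "省代码": give the variable code a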

2021-11-14 19:10:45 2044 1

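[Original] Instrumental Variable Estimation

Regression
ivregress 2sls lwage (educ = mothereduc) exper, vce(robust)
Inside the parentheses, the endogenous variable is on the left of the equals sign and the instrument is on the right.
vce(robust) gives heteroskedasticity-robust standard errors.
Several instruments can be used, e.g. (educ = mothereduc fathereduc).
Testing for endogeneity
Regress the suspected endogenous variable on the other explanatory variables and the instruments that replace it, and save the residuals.
Add the residuals to the original regression (with the original dependent variable) and check whether they are significant.
If they are significant, the variable is endogenous.
Testing whether the instruments are correlated with the endogenous variable
In the first of the regressions above.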
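To see why an instrument helps, here is a minimal Python sketch of the simplest case: one endogenous regressor, one instrument, and no other covariates, where 2SLS reduces to the ratio cov(z, y) / cov(z, x). The data is made up so that the error u is correlated with x but not with z, and the true slope is 2. The function names are hypothetical, and the sketch leaves out the exper control and the robust standard errors of the Stata command.

```python
def cross_dev(a, b):
    """Sum of cross-deviations from the means (proportional to the covariance)."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))


def ols_slope(x, y):
    """OLS slope of y on x."""
    return cross_dev(x, y) / cross_dev(x, x)


def iv_slope(z, x, y):
    """IV slope with a single instrument z for the regressor x."""
    return cross_dev(z, y) / cross_dev(z, x)


# Made-up data: z is the instrument, u the unobserved error.
z = [-1, -1, 1, 1]
u = [1, -1, 1, -1]                          # uncorrelated with z by construction
x = [zi + ui for zi, ui in zip(z, u)]       # x depends on u, so x is endogenous
y = [2 * xi + ui for xi, ui in zip(x, u)]   # true slope is 2

print("OLS slope:", ols_slope(x, y))   # biased away from 2
print("IV slope:", iv_slope(z, x, y))  # recovers the true slope here
```

In this toy data OLS is pulled away from 2 because x and u move together, while the IV ratio uses only the variation in x that comes from z.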

2021-08-22 18:12:10 1819

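[Original] Time Series Analysis

Creating the time variable
gen date = tq(1981q1) + _n - 1
tsset date
Running regressions
L denotes a lag; L2 is a two-period lag; L(0/3) denotes lags 0 through 3.
D denotes a difference.
Example: reg inf D.u L(0/1).u
Testing for serial correlation
After running the regression, type estat bgodfrey, lags(1) to check for serial correlation at one lag; likewise, … lags(5) uses five lags. A small p-value indicates serial correlation.
Forecasting with exponential smoothing
tssmooth exponential sm1 = g
sm1 is the newly created variable that holds the forecast.
The best smoothing parameter is chosen automatically.
Open the Data Editor and give the time..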
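The lag and difference operators and the smoothing recursion can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the smoothing parameter alpha is fixed by hand, and the smoothed series is initialized at the first observation, whereas Stata picks the parameter itself and may initialize differently.

```python
def lag(series, k=1):
    """L<k>.x: shift the series k periods back; the first k values are missing."""
    return [None] * k + series[: len(series) - k]


def diff(series):
    """D.x: first difference x_t - x_{t-1}; the first value is missing."""
    return [None] + [b - a for a, b in zip(series, series[1:])]


def exp_smooth(series, alpha):
    """Simple exponential smoothing: s_t = alpha * x_t + (1 - alpha) * s_{t-1}.

    Assumption: s_0 is set to the first observation.
    """
    smoothed = [series[0]]
    for value in series[1:]:
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed


u = [1, 3, 6, 10]  # made-up series
print(lag(u, 1))
print(diff(u))
print(exp_smooth([10, 20, 30], alpha=0.5))
```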

2021-08-22 17:47:10 296

[Original] Probability, Statistics and Simulation

The Normal Distribution
dnorm(0, mean = 0, sd = 1, log = FALSE)
pnorm(0, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE)
qnorm(0.05, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE)
rnorm(10, mean = 0, sd = 1)
Description
dnorm gives the density.
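For comparison, the same four quantities can be obtained with Python's standard library. This is a sketch of the correspondence, not part of the original notes; the variable names and the seed are arbitrary choices.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

density = std_normal.pdf(0)            # counterpart of dnorm(0)
cumulative = std_normal.cdf(0)         # counterpart of pnorm(0)
quantile = std_normal.inv_cdf(0.05)    # counterpart of qnorm(0.05)
draws = std_normal.samples(10, seed=1)  # counterpart of rnorm(10); seed is arbitrary

print(density, cumulative, quantile)
print(draws)
```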

2021-06-04 11:24:21 511

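[Original] Some Handy Operations

*names*: show the names of the variables in a dataset
*n_distinct*: the number of distinct values in a vector
*unique*: the vector of distinct values, with duplicates removed
*sample*: sampling, e.g.
sample(1:10, 3, replace = FALSE)  # draw 3 without replacement (the default); the sample size cannot exceed the length of the vector
sample(data$ID, 5, replace = TRUE)  # draw 5 from the dataset with replacement
sample(c("a","b","c"), 10, replace = TRUE, prob = c(0.1,0.5,0.4))  # sampling with given probabilities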
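A small Python sketch of the same ideas, using only the standard library. The helper names mirror the R functions but are hypothetical re-implementations, and the seed is arbitrary. It also shows that sampling without replacement cannot draw more items than the population holds.

```python
import random


def n_distinct(values):
    """Number of distinct values."""
    return len(set(values))


def unique(values):
    """Distinct values in order of first appearance."""
    return list(dict.fromkeys(values))


rng = random.Random(0)  # arbitrary seed for reproducibility

without_replacement = rng.sample(range(1, 11), 3)
with_probabilities = rng.choices(["a", "b", "c"], weights=[0.1, 0.5, 0.4], k=10)

# Asking for 10 items out of 3 without replacement is an error.
try:
    rng.sample(range(1, 4), 10)
    oversample_failed = False
except ValueError:
    oversample_failed = True

print(n_distinct([1, 1, 2, 3, 3]), unique([1, 1, 2, 3, 3]))
print(without_replacement, with_probabilities, oversample_failed)
```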

2021-06-03 21:32:57 149

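[Original] Assorted Exercises

1. (1) First drop the observations whose actual departure time (dep_time) is missing. Then create a new variable dep_interval that divides the actual departure time into four groups: morning (6:01-12:00), afternoon (12:01-18:00), evening (18:01-24:00) and early morning (0:01-6:00). Group the data by dep_interval within each day of each month of each year (year, month, day, dep_interval), and for each group compute the mean arrival delay, the mean number of arrival airports, and the variance of the arrival delay.
alter_flights <- flig.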
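The binning step of this exercise can be sketched in Python as follows. It assumes that dep_time is stored as an integer in HHMM form (for example 517 for 5:17 and 2400 for midnight), which the excerpt does not state; the function name and the sample values are made up.

```python
def dep_interval(dep_time):
    """Map an HHMM departure time to one of the four intervals in the exercise.

    Assumption: dep_time is an integer such as 517 (5:17) or 2400 (24:00).
    """
    if 1 <= dep_time <= 600:
        return "early morning"
    if 601 <= dep_time <= 1200:
        return "morning"
    if 1201 <= dep_time <= 1800:
        return "afternoon"
    if 1801 <= dep_time <= 2400:
        return "evening"
    raise ValueError(f"unexpected dep_time: {dep_time}")


dep_times = [517, None, 1200, 1201, 1805, 2400]  # made-up values; None marks a missing time

observed = [t for t in dep_times if t is not None]  # drop missing departure times first
intervals = [dep_interval(t) for t in observed]
print(intervals)
```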

2021-06-01 09:52:30 454

[Original] Modeling

OLS regression
## basics
lm(y ~ x, sim1)  # regress y on x using the dataset `sim1`
`add_predictions()` adds predictions
`add_residuals()` adds residuals
# for example
sim1_mod <- lm(y ~ x, sim1)
sim1 %>% data_grid(x) %>%  #`data_grid(

2021-05-21 11:24:40 136

[Original] Vector

Type
Atomic Vector
logical
integer
double
character
# to name the vector
> c(x=1,y=4,"father"=8)
     x      y father
     1      4      8
List
# use `str()` to show the exact content of a list
> list(1:2,5,"a")
[[1]]
[1] 1 2
[[2]]

2021-05-20 21:11:38 78

[Original] Creating Functions and Iterations

Basics
rescale01 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}
# a function needs a name, arguments and a body
# the value of the last expression is returned by default; use `return()` to return a value explicitly
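For readers more familiar with Python, here is a sketch of the same rescaling function. Missing values are represented by None, which is my stand-in for R's NA; they are ignored when computing the range and passed through unchanged.

```python
def rescale01(x):
    """Rescale values to the interval [0, 1], ignoring None when finding the range."""
    present = [v for v in x if v is not None]
    low, high = min(present), max(present)
    # Unlike R, Python needs an explicit return statement to hand back a value.
    return [None if v is None else (v - low) / (high - low) for v in x]


print(rescale01([0, 5, 10]))
print(rescale01([1, None, 3]))
```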

2021-05-19 22:27:06 106

[Original] Five Major Functions for Data Transformation

filter
filter(flights, month == 1, day == 1)
filter(flights, month %in% c(11, 12))
# a way to count the NAs
flights %>% mutate(temp = is.na(dep_time)) %>% select(temp) %>% table()
arrange
arrange(flights, arr_delay)  #from s

2021-05-19 21:30:53 100

[Original] Data Tidying

Pivot It Longer or Wider
table4a
table4a %>% pivot_longer(c("1999", "2000"), names_to = "year", values_to = "cases")
# gather the given columns, storing their original names and values under the newly named variables
table2
table2 %>% pivot_wider

2021-05-12 18:54:12 125

[Original] Data Parsing

Creating a Tibble
tibble(
  x = 1:5,
  y = 1,
  z = x ^ 2 + y
)
# A tibble: 5 x 3
#       x     y     z
#   <int> <dbl> <dbl>
# 1     1     1     2
# 2     2     1     5
# 3     3     1    10
# 4     4     1    17
# 5     5     1

2021-05-12 09:45:37 180

[Original] Drawing Plots

Commands
1. draw the plot
1.1 basic
# draw a bar plot, for categorical variables:
ggplot(diamonds) + geom_bar(aes(cut))
# for continuous variables:
ggplot(diamonds) + geom_histogram(aes(carat), binwidth = 0.2)
# this can be used to find the clusters
#

2021-05-11 20:07:54 274

[Original] Factor

Creating a factor
Fix the order and spelling of a string:
x1 <- c("Dec", "Apr", "Jan", "Mar")
month_levels <- c(
  "Jan", "Feb", "Mar", "Apr", "May", "Jun",
  "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
)
y1 <- factor(x1, levels = month_levels)
s

2021-05-07 10:44:42 150

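[Original] Regular Expression

Basic operations
1. str_length(): get the length (spaces included)
2. str_c(): concatenate
Example:
str_c("x","y",sep=",") gives x,y
str_c(c("x","y"),collapse=",") gives x,y
3. str_to_upper() str_to_lower(): convert to upper or lower case
4. str_sub(): extract a substring
Example:
x <- c("apple")
str_sub(x,1,3) gives app
str_sub(x,-3,-1) gives ple  # a negative index counts positions from the end
Matching
str_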
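The indexing rule of str_sub() (1-based, inclusive at both ends, negative positions counted from the end) differs from Python slicing, so a small sketch may help. The `str_sub` below is a hypothetical re-implementation for nonzero positions only, not the stringr function itself.

```python
def str_sub(text, start, end):
    """Substring with 1-based inclusive positions; negatives count from the end.

    Assumption: start and end are nonzero integers within the string.
    """
    begin = start - 1 if start > 0 else len(text) + start
    stop = end if end > 0 else len(text) + end + 1
    return text[begin:stop]


x = "apple"
print(len(x))                 # length, like str_length()
print(",".join(["x", "y"]))   # joining with a separator, like str_c(..., collapse = ",")
print(x.upper(), x.lower())   # like str_to_upper() and str_to_lower()
print(str_sub(x, 1, 3))
print(str_sub(x, -3, -1))
```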

2021-05-05 12:25:08 133

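[Original] Joining Dataframes

Keys
A key is a variable that exists in both tables and uniquely identifies the rows of one of them. With respect to the table it uniquely identifies, the variable is called the primary key; with respect to the other table, it is called the foreign key. A primary key added artificially is called a surrogate key. For example:
flights %>% mutate(iden = row_number())
# iden is a surrogate key that numbers the observations 1, 2, 3...
Mutating joins
1. Join types
Inner join (keeps only the matching rows), left join (keeps all rows of the left table), right join (keeps all rows of the right table), full join (keeps all rows of both tables)
# unmatched parts are filled with NA
# duplicated keys are matched with each corresponding row in turn
The format is always: inner_join(x,y,by="ke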
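The difference between an inner join and a left join can be sketched in plain Python with tables stored as lists of dictionaries. The function names echo the dplyr verbs but are hypothetical re-implementations, the tables are made up, and None plays the role of NA.

```python
def inner_join(x, y, key):
    """Keep only the row pairs whose key values match in both tables."""
    return [{**a, **b} for a in x for b in y if a[key] == b[key]]


def left_join(x, y, key):
    """Keep every row of x; fill the columns of y with None where nothing matches."""
    y_columns = list(dict.fromkeys(c for row in y for c in row if c != key))
    joined = []
    for a in x:
        matches = [b for b in y if b[key] == a[key]]
        if matches:
            joined.extend({**a, **b} for b in matches)
        else:
            joined.append({**a, **{c: None for c in y_columns}})
    return joined


# Made-up tables sharing the column "key".
x = [{"key": 1, "a": "x1"}, {"key": 2, "a": "x2"}]
y = [{"key": 1, "b": "y1"}, {"key": 3, "b": "y3"}]

print(inner_join(x, y, "key"))
print(left_join(x, y, "key"))
```

A right join is the left join with the two tables swapped, and a full join keeps the unmatched rows of both sides.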

2021-04-23 10:45:22 116

weixin_51674826's blog