Julia: The Biggest Advantages of DataFrame Are NA Handling and Name-Based Access

Article link: https://blog.csdn.net/wowotuo/article/details/38311867

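Compared with Array, DataFrames adds a new Julia type that represents a missing value, NA. It also makes data manipulation friendlier: columns are accessed by name (an object-style approach) rather than by bare numeric subscripts.
The drawback is that it is less efficient than Array. When the volume of data operations is large, performance takes some hit, much like dataset in MATLAB.
To use DataFrame, first install the DataFrames package (Pkg.add("DataFrames")) and then load it with using DataFrames.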
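To make the missing-value point concrete, here is a minimal sketch. It assumes a current DataFrames.jl release, where the role of NA is played by Julia's built-in `missing` value; the column names and numbers are hypothetical.

```julia
using DataFrames

# Hypothetical data: one closing price is unknown.
df = DataFrame(Code = ["a", "b", "c"], Close = [2010.0, missing, 1995.0])

# Reductions propagate missing unless it is skipped explicitly.
total_raw = sum(df.Close)               # the result is missing
total = sum(skipmissing(df.Close))      # sums only the known values

# Locate or drop the incomplete rows.
mask = ismissing.(df.Close)             # element-wise test, one Bool per row
complete = dropmissing(df, :Close)      # keeps only rows where Close is present
```

A plain `Vector{Float64}` has no slot for "unknown", so the same data stored in an Array would need a sentinel such as `NaN`.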
1. Name-based access
Suppose A is a field (column) and B is also a field:
julia> df = DataFrame(A = 1:4, B = ["M", "F", "F", "M"])
4x2 DataFrame:
A B
[1,] 1 "M"
[2,] 2 "F"
[3,] 3 "F"
[4,] 4 "M"
julia> df["A"] # or df[:A]: access by column name instead of "which row, which column"; far more user-friendly
4-element DataArray{Int64,1}:
1
2
3
4
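The transcript above uses the indexing syntax of the DataFrames version available when the post was written. As a hedged sketch, assuming a current DataFrames.jl release (where `df["A"]` is no longer accepted and a row selector is required), the same name-based access might look like this:

```julia
using DataFrames

df = DataFrame(A = 1:4, B = ["M", "F", "F", "M"])

col_a = df.A          # property syntax, returns the stored column
col_b = df[!, :B]     # same idea with a Symbol; `!` means "no copy"
copy_a = df[:, "A"]   # string name; `:` returns a copy of the column
col_names = names(df) # column names as strings
```

The `!` versus `:` distinction matters when mutating: changes to `col_b` are visible in `df`, while changes to `copy_a` are not.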
2. Adding a column
julia> df["c"] = 2:5 # add a column
2:5
julia> df
4x3 DataFrame:
A B c
[1,] 1 "M" 2
[2,] 2 "F" 3
[3,] 3 "F" 4
[4,] 4 "M" 5
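For comparison, a small sketch of adding columns under the assumption of a current DataFrames.jl release; the column name `d` is introduced only for illustration.

```julia
using DataFrames

df = DataFrame(A = 1:4, B = ["M", "F", "F", "M"])

df.c = collect(2:5)          # new column from a vector of matching length
df[!, :d] = df.A .+ df.c     # derived column, computed element-wise
```

The assigned vector must have as many elements as the table has rows, otherwise the assignment throws an error.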

3. Adding a row
julia> de = DataFrame(A = 2, B = "g", c = 5)
1x3 DataFrame:
A B c
[1,] 2 "g" 5
julia> df = vcat(df, de)
5x3 DataFrame:
A B c
[1,] 1 "M" 2
[2,] 2 "F" 3
[3,] 3 "F" 4
[4,] 4 "M" 5
[5,] 2 "g" 5
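The row is added above by building a one-row table and concatenating. As a sketch, assuming a current DataFrames.jl release, a row can also be appended in place with `push!`:

```julia
using DataFrames

df = DataFrame(A = 1:4, B = ["M", "F", "F", "M"], c = 2:5)

# Append one row in place; NamedTuple fields are matched to columns by name.
push!(df, (A = 2, B = "g", c = 5))

# vcat, as used in the post, builds a new table and leaves df untouched.
de = DataFrame(A = [2], B = ["g"], c = [5])
df2 = vcat(df, de)
```

`push!` mutates `df`, so it fits row-by-row accumulation, while `vcat` fits combining tables that already exist.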

4. Reading CSV and TXT files (I/O)
The main call forms are:
df = readtable("data.csv")
df = readtable("data.tsv")
df = readtable("data.wsv")
df = readtable("data.txt", separator = '\t')
df = readtable("data.txt", header = false)
For example, reading CSV data:
julia> @time df = readtable("C:\\Users\\Administrator\\Desktop\\julia\\mydatacsv.csv")
elapsed time: 0.321465977 seconds (12324052 bytes allocated)
DataFrame with 5085 rows, 11 columns
Columns:
FutureCode 5085 non-null values
DateTime 5085 non-null values
BarEndTime 5085 non-null values
Close 5085 non-null values
Open 5085 non-null values
High 5085 non-null values
Low 5085 non-null values
PreClose 5085 non-null values
Volume 5085 non-null values
OpenInterest 5085 non-null values
BarCount 5085 non-null values
5. Writing
Call forms:
writetable("output.csv",df)
writetable("output.dat",df, separator = ',', header = false)
writetable("output.dat",df, quotemark = '\'', separator = ',')
writetable("output.dat",df, header = false)
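`readtable` and `writetable` belong to the DataFrames version used in this post. The following is a sketch of the same read/write round trip under the assumption that the separate CSV.jl package is installed alongside a current DataFrames.jl; the file names are placeholders written to a temporary directory.

```julia
using CSV, DataFrames

df = DataFrame(A = 1:4, B = ["M", "F", "F", "M"])

# Write to a temporary directory so the sketch does not touch real data.
path = joinpath(mktempdir(), "output.csv")
CSV.write(path, df)

# Read it back; the second argument is the table type to materialize into.
df_back = CSV.read(path, DataFrame)

# Tab-separated variant of the same round trip.
tsv_path = joinpath(mktempdir(), "output.tsv")
CSV.write(tsv_path, df; delim = '\t')
df_tsv = CSV.read(tsv_path, DataFrame; delim = '\t')
```

The `delim` keyword plays the role that `separator` plays in the `readtable` and `writetable` calls above.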

6. Indexing and conditional selection

df = DataFrame(A = 1:10)
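# Note on indexing: unlike MATLAB, element-wise vector operations must be written with a leading "." (for example .> rather than >); leaving it out is an easy way to get errors!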
julia> df[df[:Close].>2000,:]
DataFrame with 5085 rows, 11 columns
Columns:
FutureCode 5085 non-null values
DateTime 5085 non-null values
BarEndTime 5085 non-null values
Close 5085 non-null values
Open 5085 non-null values
High 5085 non-null values
Low 5085 non-null values
PreClose 5085 non-null values
Volume 5085 non-null values
OpenInterest 5085 non-null values
BarCount 5085 non-null values
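A self-contained sketch of the same kind of row selection, assuming a current DataFrames.jl release. The table `bars` and its values are hypothetical stand-ins for the CSV data above.

```julia
using DataFrames

# Hypothetical bars; only the Close column matters here.
bars = DataFrame(FutureCode = ["IF01", "IF02", "IF03"],
                 Close = [1980.0, 2005.5, 2100.0])

by_mask = bars[bars.Close .> 2000, :]                # Boolean mask; note the dot in .>
by_filter = filter(:Close => >(2000), bars)          # predicate applied to each row's Close
by_subset = subset(bars, :Close => ByRow(>(2000)))   # same condition via subset
```

All three forms select the same rows; the mask form is closest to the expression used in the post, and it fails with an error if the dot is omitted because a vector cannot be compared with a scalar using plain `>`.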

7. Checking the type
julia> typeof(df)
DataFrame (constructor with 22 methods)

8. Combining tables
# For combining there are vcat and join, but I have not found push!, append!, or add!

julia> @time vcat(hh,kk)
elapsed time: 0.00010462 seconds (11080 bytes allocated)
2x2 DataFrame:
A B
[1,] 5.0 10.0
[2,] 1.0 9.0

julia> @time vcat(df,kk) # takes more than three times as long
elapsed time: 0.000374261 seconds (84696 bytes allocated)
1001x2 DataFrame:
A B
[1,] 1.0 9.0
[2,] 5.0 10.0
[3,] 3.0 6.0
.......
julia> @time vcat(df,df) # the time keeps growing
elapsed time: 0.000719703 seconds (181256 bytes allocated)
2000x2 DataFrame:
A B
[1,] 1.0 9.0
[2,] 5.0 10.0
[3,] 3.0 6.0
.....
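On the note about push! and append!: the sketch below assumes a current DataFrames.jl release, where both are available for data frames. The contents of `df` and `kk` are hypothetical, since the post does not show how they were built.

```julia
using DataFrames

df = DataFrame(A = [1.0, 5.0, 3.0], B = [9.0, 10.0, 6.0])
kk = DataFrame(A = [5.0, 1.0], B = [10.0, 9.0])

combined = vcat(df, kk)   # allocates a new 5-row table; df is unchanged

append!(df, kk)           # grows df in place to 5 rows
push!(df, (7.0, 8.0))     # a single row can also be given positionally
```

Because `vcat` allocates a fresh result on every call, its cost grows with the size of the inputs, which is consistent with the timings above; in-place `append!` may be the better fit when one table is extended repeatedly.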