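Compared with Array, a DataFrame can hold missing values: NA, "a new Julia type that represents a missing value". It is also friendlier to work with, because you operate on named columns as objects rather than on bare numeric indices.
The drawback is that it is less efficient than Array, so with large volumes of data performance suffers somewhat. In this respect it resembles MATLAB's dataset.
To use DataFrame, first install the DataFrames package, then load it with using DataFrames.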
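In current Julia (1.x) and DataFrames.jl (1.x), the NA value from the older DataArrays package has been replaced by Base's missing. The sketch below assumes those versions and shows a column that contains missing values and the usual ways of handling them. The column contents are made up for illustration.

```julia
using DataFrames

# Julia 1.x uses Base's `missing` where older DataFrames/DataArrays used NA
df = DataFrame(A = [1, missing, 3], B = ["x", "y", missing])

eltype(df.A)             # Union{Missing, Int64}
count(ismissing, df.A)   # 1 missing value in column A
sum(skipmissing(df.A))   # 4, missing values are skipped

# dropmissing returns a new DataFrame without the rows that contain `missing`
clean = dropmissing(df)  # only row 1 has no missing value
```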
1. Object-style access
Suppose A and B are both fields (columns):
julia> df = DataFrame(A = 1:4, B = ["M", "F", "F", "M"])
4x2 DataFrame:
A B
[1,] 1 "M"
[2,] 2 "F"
[3,] 3 "F"
[4,] 4 "M"
julia> df["A"]   # or df[:A]: columns are addressed by name instead of by row and column numbers, which is much friendlier
4-element DataArray{Int64,1}:
1
2
3
4
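Current DataFrames.jl (1.x) has removed the df["A"] and df[:A] forms shown above. The sketch below assumes that version and shows the column-access forms it provides on the same small frame.

```julia
using DataFrames

df = DataFrame(A = 1:4, B = ["M", "F", "F", "M"])

a1 = df.A          # property syntax: the stored column vector itself
a2 = df[!, :A]     # the same column, no copy
a3 = df[:, "A"]    # a copy; column names can be Symbols or Strings

names(df)          # ["A", "B"]
size(df)           # (4, 2)
df[2, :B]          # "F", the value in row 2, column B
```

The distinction between ! and : matters when you modify the result. A mutation through a1 or a2 changes df, but a mutation through a3 does not.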
2. Adding a column
julia> df["c"] = 2:5   # add a column
2:5
julia> df
4x3 DataFrame:
A B c
[1,] 1 "M" 2
[2,] 2 "F" 3
[3,] 3 "F" 4
[4,] 4 "M" 5
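Here is a sketch of the same column addition in current DataFrames.jl (1.x, assumed). It also uses insertcols! to put a new column at a chosen position. The columns d and id are hypothetical additions for illustration.

```julia
using DataFrames

df = DataFrame(A = 1:4, B = ["M", "F", "F", "M"])

df.c = 2:5                  # new column c; its length must equal nrow(df)
df[!, :d] = df.A .* 10      # indexing syntax, here computed from column A

# insertcols! places the new column at a chosen position (1 = first)
insertcols!(df, 1, :id => ["r1", "r2", "r3", "r4"])

names(df)                   # ["id", "A", "B", "c", "d"]
```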
3. Adding a row
julia> de = DataFrame(A = 2, B = "g", c = 5)
1x3 DataFrame:
A B c
[1,] 2 "g" 5
julia> df =vcat(df,de)
5x3 DataFrame:
A B c
[1,] 1 "M" 2
[2,] 2 "F" 3
[3,] 3 "F" 4
[4,] 4 "M" 5
[5,] 2 "g" 5
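In current DataFrames.jl (1.x, assumed here), vcat still works as shown above. There is also push!, which appends a single row in place without building a new frame. A minimal sketch of both:

```julia
using DataFrames

df = DataFrame(A = 1:4, B = ["M", "F", "F", "M"], c = 2:5)

# vcat returns a new DataFrame, as in the transcript above
de = DataFrame(A = [2], B = ["g"], c = [5])
df2 = vcat(df, de)          # 5 rows; df itself is unchanged

# push! appends one row in place; a NamedTuple is matched by column name
push!(df, (A = 2, B = "g", c = 5))
```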
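4. Reading CSV and TXT files
The main call forms are:
df = readtable("data.csv")                       # separator inferred from the extension: comma
df = readtable("data.tsv")                       # tab
df = readtable("data.wsv")                       # whitespace
df = readtable("data.txt", separator = '\t')     # separator given explicitly
df = readtable("data.txt", header = false)       # file has no header row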
For example, reading a CSV file:
julia> @time df =readtable("C:\\Users\\Administrator\\Desktop\\julia\\mydatacsv.csv")
elapsed time: 0.321465977 seconds (12324052 bytes allocated)
DataFrame with 5085 rows, 11 columns
Columns:
FutureCode 5085 non-null values
DateTime 5085 non-null values
BarEndTime 5085 non-null values
Close 5085 non-null values
Open 5085 non-null values
High 5085 non-null values
Low 5085 non-null values
PreClose 5085 non-null values
Volume 5085 non-null values
OpenInterest 5085 non-null values
BarCount 5085 non-null values
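Current DataFrames.jl no longer provides readtable; reading files moved to the separate CSV.jl package. The sketch below assumes CSV.jl and DataFrames.jl 1.x. To stay self-contained it reads from small in-memory strings (hypothetical data) instead of files. With a real file you pass the path, e.g. CSV.read("data.csv", DataFrame).

```julia
using CSV, DataFrames

# Inline sample text standing in for a file on disk (hypothetical data)
csvtext = """
A,B
1,M
2,F
"""
df = CSV.read(IOBuffer(csvtext), DataFrame)

# Tab-separated text without a header row; columns get default names
tsvtext = "1\t2\n3\t4\n"
raw = CSV.read(IOBuffer(tsvtext), DataFrame; delim = '\t', header = false)
```

The delim and header keywords play the role of readtable's separator and header options.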
5. Writing
Call forms:
writetable("output.csv",df)
writetable("output.dat",df, separator = ',', header = false)
writetable("output.dat",df, quotemark = '\'', separator = ',')
writetable("output.dat",df, header = false)
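writetable has likewise been replaced by CSV.write in CSV.jl. Here is a minimal sketch, assuming CSV.jl and DataFrames.jl 1.x. Temporary paths stand in for output.csv and output.dat. The writeheader and delim keywords correspond to the header and separator options above, and CSV.write also accepts quotechar for the quotemark case.

```julia
using CSV, DataFrames

df = DataFrame(A = 1:2, B = ["M", "F"])

# Temporary paths stand in for output.csv / output.dat
csvpath = tempname() * ".csv"
datpath = tempname() * ".dat"

CSV.write(csvpath, df)                                     # header row, comma-separated
CSV.write(datpath, df; delim = ';', writeheader = false)   # no header, ';' separator

read(csvpath, String)   # "A,B\n1,M\n2,F\n"
```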
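6. Indexing and conditional selection
Here df is the futures data read in section 4. A toy frame such as df = DataFrame(A = 1:10) has no Close column, so the filter below would fail on it.
# Indexing. Note: unlike MATLAB, vectorized (elementwise) operations in Julia are written with a leading "." (here .>); leaving it out easily leads to errors!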
julia> df[df[:Close].>2000,:]
DataFrame with 5085 rows, 11 columns
Columns:
FutureCode 5085 non-null values
DateTime 5085 non-null values
BarEndTime 5085 non-null values
Close 5085 non-null values
Open 5085 non-null values
High 5085 non-null values
Low 5085 non-null values
PreClose 5085 non-null values
Volume 5085 non-null values
OpenInterest 5085 non-null values
BarCount 5085 non-null values
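The sketch below demonstrates the same mask-based selection on a small made-up frame, assuming DataFrames.jl 1.x. It also shows the filter and subset helpers and how to combine conditions. The dot rule from the comment above still applies: df.A .> 7 compares elementwise, while df.A > 7 throws a MethodError.

```julia
using DataFrames

df = DataFrame(A = 1:10, B = 11:20)

# Boolean mask: .> compares elementwise; plain df.A > 7 throws a MethodError
big = df[df.A .> 7, :]                       # rows with A = 8, 9, 10

# The same selection with helper functions
big2 = filter(:A => >(7), df)
big3 = subset(df, :A => ByRow(>(7)))

# Combine conditions elementwise with .& (and) or .| (or)
mid = df[(df.A .> 3) .& (df.B .< 16), :]     # rows with A = 4, 5
```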
7. Type checking
julia> typeof(df)
DataFrame (constructor with 22 methods)
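Beyond typeof, it is often more useful to test against the abstract type or to inspect the element type of each column. Here is a sketch assuming DataFrames.jl 1.x, where a view of a DataFrame is a SubDataFrame and both are subtypes of AbstractDataFrame.

```julia
using DataFrames

df = DataFrame(A = 1:4, B = ["M", "F", "F", "M"])

typeof(df)                             # DataFrame
df isa AbstractDataFrame               # true
sub = view(df, 1:2, :)                 # a SubDataFrame sharing df's data
sub isa AbstractDataFrame              # true, so generic code can accept both

[eltype(col) for col in eachcol(df)]   # [Int64, String]
```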
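8. Combining
In the session below, hh and kk are one-row frames with Float64 columns A and B, and df is a 1000-row frame with the same columns.
# Combining: vcat and join are available, but I did not find push!, append! or add!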
julia> @time vcat(hh,kk)
elapsed time: 0.00010462 seconds (11080 bytes allocated)
2x2 DataFrame:
A B
[1,] 5.0 10.0
[2,] 1.0 9.0
julia> @time vcat(df,kk)   # about 3.6x the time of the previous call
elapsed time: 0.000374261 seconds (84696 bytes allocated)
1001x2 DataFrame:
A B
[1,] 1.0 9.0
[2,] 5.0 10.0
[3,] 3.0 6.0
.......
julia> @time vcat(df,df)   # and the time keeps growing
elapsed time: 0.000719703 seconds (181256 bytes allocated)
2000x2 DataFrame:
A B
[1,] 1.0 9.0
[2,] 5.0 10.0
[3,] 3.0 6.0
.....
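Current DataFrames.jl (1.x, assumed here) does provide in-place append! and push!. Its join function has been split into innerjoin, leftjoin, outerjoin and related functions. The sketch below uses hypothetical one-row frames in the roles of hh and kk, plus two small made-up frames for the join.

```julia
using DataFrames

# Hypothetical one-row frames playing the role of hh and kk above
hh = DataFrame(A = [5.0], B = [10.0])
kk = DataFrame(A = [1.0], B = [9.0])

both = vcat(hh, kk)              # new 2x2 DataFrame; hh and kk are unchanged

# In-place alternatives: append! adds all rows of another frame, push! adds one row
append!(hh, kk)                  # hh now has 2 rows
push!(hh, (A = 3.0, B = 6.0))    # and now 3

# join has been split into innerjoin, leftjoin, outerjoin, ...
lhs = DataFrame(id = [1, 2, 3], x = ["a", "b", "c"])
rhs = DataFrame(id = [2, 3, 4], y = [20, 30, 40])
j = innerjoin(lhs, rhs, on = :id)   # keeps only ids present in both: 2 and 3
```

When rows are added repeatedly, append! and push! avoid building a whole new frame on every call. The growing vcat timings above suggest this overhead can matter.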