Follow the WeChat official account: 小程在线
Follow the CSDN blog: 程志伟的博客 (Cheng Zhiwei's blog)
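Type ] at the julia> prompt to enter the package manager (Pkg mode), then run add DataFrames and wait for the download to finish. The first installation also downloads the packages DataFrames depends on. When it is done, press Ctrl+C (or Backspace on an empty pkg> line) to return to the julia> prompt.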
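The installation can also be done from a script instead of at the pkg> prompt. Here is a minimal sketch that uses the standard-library Pkg module. It assumes the machine can reach the General package registry over the network.

```julia
# Non-interactive equivalent of typing `add DataFrames` at the pkg> prompt.
# Assumes network access to the General package registry.
using Pkg
Pkg.add("DataFrames")

# Print the DataFrames version that was resolved and installed.
Pkg.status("DataFrames")
```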
Load the package:
julia> using DataFrames
First, use the DataFrame constructor to create a simple table:
julia> DataFrame(A = 1:4, B = ["M", "F", "F", "M"])
4×2 DataFrame
│ Row │ A │ B │
│ │ Int64 │ String │
├─────┼───────┼────────┤
│ 1 │ 1 │ M │
│ 2 │ 2 │ F │
│ 3 │ 3 │ F │
│ 4 │ 4 │ M │
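Below is a sketch of the same construction, assuming a current DataFrames 1.x release. The session above was recorded on an older version, so printed types and table borders will look different. The sketch also shows the equivalent constructor that takes column-name => values pairs.

```julia
using DataFrames

# Keyword-argument constructor, as in the REPL session above.
df1 = DataFrame(A = 1:4, B = ["M", "F", "F", "M"])

# Equivalent construction from column-name => values pairs.
df2 = DataFrame(:A => 1:4, :B => ["M", "F", "F", "M"])

println(df1 == df2)   # compares column names and values
println(names(df1))   # in DataFrames 1.x, names returns strings
```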
Columns can also be added one at a time to an empty DataFrame. The older df[:A] = ... syntax is deprecated, so use df[!, :A] = ... instead:
julia> df = DataFrame()
0×0 DataFrame
julia> df[!, :A] = 1:8
1:8
julia> df[!, :B] = ["M", "F", "F", "M", "F", "M", "M", "F"]
8-element Array{String,1}:
"M"
"F"
"F"
"M"
"F"
"M"
"M"
"F"
julia> df
8×2 DataFrame
│ Row │ A │ B │
│ │ Int64 │ String │
├─────┼───────┼────────┤
│ 1 │ 1 │ M │
│ 2 │ 2 │ F │
│ 3 │ 3 │ F │
│ 4 │ 4 │ M │
│ 5 │ 5 │ F │
│ 6 │ 6 │ M │
│ 7 │ 7 │ M │
│ 8 │ 8 │ F │
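Below is a sketch of the same column-by-column construction with the non-deprecated syntax, assuming DataFrames 1.x. Property assignment (df.B = ...) is a shorthand for df[!, :B] = ....

```julia
using DataFrames

df = DataFrame()   # start from an empty 0×0 table

# Add column A with df[!, col] = v, the replacement for the old df[:A] = v.
df[!, :A] = 1:8

# Property syntax adds column B the same way.
df.B = ["M", "F", "F", "M", "F", "M", "M", "F"]
```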
Check the dimensions of the data (rows, columns):
julia> size(df,1)
8
julia> size(df,2)
2
julia> size(df)
(8, 2)
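DataFrames also provides nrow and ncol as named shorthands for size(df, 1) and size(df, 2). Here is a small sketch, assuming DataFrames 1.x:

```julia
using DataFrames

df = DataFrame(A = 1:8, B = ["M", "F", "F", "M", "F", "M", "M", "F"])

# Destructure the (rows, columns) tuple returned by size.
nrows, ncols = size(df)

# Named helpers that return the same two numbers.
println(nrow(df), " rows, ", ncol(df), " columns")
```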
Add data row by row, either as a vector in column order or as a Dict keyed by column name:
julia> df = DataFrame(A = Int[], B = String[])
0×2 DataFrame
julia> push!(df,[1,"M"])
1×2 DataFrame
│ Row │ A │ B │
│ │ Int64 │ String │
├─────┼───────┼────────┤
│ 1 │ 1 │ M │
julia> push!(df,Dict(:B => "F",:A => 2))
2×2 DataFrame
│ Row │ A │ B │
│ │ Int64 │ String │
├─────┼───────┼────────┤
│ 1 │ 1 │ M │
│ 2 │ 2 │ F │
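Below is a sketch of the row-appending forms, assuming DataFrames 1.x. It adds a third form that is not in the session above: a NamedTuple, which, like a Dict, is matched to columns by name rather than by position.

```julia
using DataFrames

df = DataFrame(A = Int[], B = String[])   # typed, empty columns

push!(df, [1, "M"])                  # vector: values taken in column order
push!(df, Dict(:B => "F", :A => 2))  # Dict: matched by column name
push!(df, (A = 3, B = "F"))          # NamedTuple: also matched by name
```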
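View the first few rows, the last few rows, and a summary of each column (head and tail are deprecated in favor of first(df, n) and last(df, n)):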
julia> first(df, 6)
2×2 DataFrame
│ Row │ A │ B │
│ │ Int64 │ String │
├─────┼───────┼────────┤
│ 1 │ 1 │ M │
│ 2 │ 2 │ F │
julia> last(df, 6)
2×2 DataFrame
│ Row │ A │ B │
│ │ Int64 │ String │
├─────┼───────┼────────┤
│ 1 │ 1 │ M │
│ 2 │ 2 │ F │
julia> describe(df)
2×8 DataFrame
│ Row │ variable │ mean │ min │ median │ max │ nunique │ nmissing │ eltype │
│ │ Symbol │ Union… │ Any │ Union… │ Any │ Union… │ Nothing │ DataType │
├─────┼──────────┼────────┼─────┼────────┼─────┼─────────┼──────────┼──────────┤
│ 1 │ A │ 1.5 │ 1 │ 1.5 │ 2 │ │ │ Int64 │
│ 2 │ B │ │ F │ │ M │ 2 │ │ String │
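Below is a sketch of the same three calls, assuming DataFrames 1.x. first and last return at most n rows, so on this 2-row table both return the whole table. describe returns an ordinary DataFrame, so you can index its columns (variable, mean, ...) like any other table.

```julia
using DataFrames

df = DataFrame(A = [1, 2], B = ["M", "F"])

top = first(df, 6)          # replaces deprecated head(df)
bottom = last(df, 6)        # replaces deprecated tail(df)
summary_df = describe(df)   # one row per column: mean, min, median, max, ...
println(summary_df)
```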
Simple column-wise statistics (colwise is deprecated; a comprehension over eachcol does the same):
julia> df = DataFrame(A = 1:4, B = 4.0:-1.0:1.0)
4×2 DataFrame
│ Row │ A │ B │
│ │ Int64 │ Float64 │
├─────┼───────┼─────────┤
│ 1 │ 1 │ 4.0 │
│ 2 │ 2 │ 3.0 │
│ 3 │ 3 │ 2.0 │
│ 4 │ 4 │ 1.0 │
julia> [sum(col) for col in eachcol(df)]
2-element Array{Real,1}:
10
10.0
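Below is a sketch of two replacements for colwise(sum, df), assuming DataFrames 1.x. The first is the comprehension suggested by the deprecation warning. The second uses combine, which keeps the result as a one-row DataFrame with columns named A_sum and B_sum. The comprehension mixes Int and Float64 results, which is why the session above shows an Array{Real,1}.

```julia
using DataFrames

df = DataFrame(A = 1:4, B = 4.0:-1.0:1.0)

# One value per column, as a plain vector.
col_sums = [sum(col) for col in eachcol(df)]

# Same sums as a one-row DataFrame; output columns are A_sum and B_sum.
sums_df = combine(df, [:A, :B] .=> sum)
```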