1 Overview
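When working with tabular data, many field values are empty. Julia represents such empty values with `missing`, and many Julia functions handle `missing` by default, typically by propagating it to the result. Some common ways of dealing with it are listed below.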
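As a minimal sketch of this default behaviour, the snippet below uses only Base Julia. The variable names are made up for illustration.

```julia
# Basic behaviour of `missing` in Base Julia (no packages needed)
a = missing + 1                        # arithmetic propagates missing
b = missing == 1                       # comparing with == also yields missing
c = ismissing(a)                       # true: the reliable way to test for missing
d = isequal(missing, missing)          # true: isequal treats missing as equal to itself
e = coalesce(missing, 0)               # 0: the first non-missing argument
f = sum(skipmissing([1, missing, 2]))  # 3: missing entries are skipped
```

Because `==` itself returns `missing`, tests for missing values are usually written with `ismissing` or `isequal`.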
2 Code examples
using DataFrames, CSV, Statistics, Tables, CategoricalArrays  # CategoricalArrays provides recode
# Create a DataFrame
# by reading a CSV file (the path is resolved relative to the installed DataFrames package)
iris = DataFrame(CSV.File(joinpath(dirname(pathof(DataFrames)),
"../docs/src/assets/iris.csv")));
permutedims([1, 2, 3])
x = [1, 2, missing, 3]
x -> x([1,2])
show(x -> x([1,2]))
map(f -> f(missing), [sin, cos, zero, sqrt]) # each function returns missing when given missing
map(f -> f([1,2]), [minimum, maximum, extrema, mean, float]) # without missing, these return ordinary values
collect(skipmissing(x))
# Filter out missing before calling the following functions, otherwise reductions such as sum or mean return missing (`sum`, `prod`, `minimum`, `maximum`, `mean`, `var`, `std`, `first`, `last` and `length`)
map(f -> f(iris.PetalLength), [minimum, maximum, extrema, mean, float, prod, length]) # applied to the raw column
map(f -> f(collect(skipmissing(iris.PetalLength))), [minimum, maximum, extrema, mean, float, prod, length]) # applied after dropping missing
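# A minimal sketch (the vector `v` is a made-up example, not part of the iris data):
# reductions such as mean accept the lazy skipmissing iterator directly, so collect
# is optional, and count(ismissing, v) reports how many values were skipped.
v = [1.5, missing, 2.5, missing]
mean(skipmissing(v))   # mean of the non-missing values
count(ismissing, v)    # number of missing entries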
# Replace missing with another value
df = DataFrame(a=[1,2,missing], b=["a", "b", missing])
replace([1.0, missing, 2.0, missing], missing=>NaN)
replace!([1.0, missing, 2.0, missing], missing=>NaN)
replace!(df.a, missing=>100)
coalesce.([1.0, missing, 2.0, missing], NaN)
df.b = coalesce.(df.b, 100)
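# A minimal sketch of dropping rows instead of replacing values (`df2` is a made-up example):
# completecases flags the rows that contain no missing, dropmissing keeps only those rows.
df2 = DataFrame(a=[1, 2, missing], b=["a", missing, "c"])
completecases(df2)     # Bool vector marking rows with no missing value
dropmissing(df2)       # new DataFrame with only the complete rows
dropmissing(df2, :a)   # only require column a to be non-missing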
# recode maps every unmatched value to the default (here false); missing is matched as a value of its own
recode([1.0, missing, 2.0, missing], false, missing=>1)
recode([1.0, missing, 2.0, missing], false, missing=>true)
# Distinct values: unique keeps missing, levels drops it
unique([1, missing, 2, missing,2])
levels([1, missing, 2, missing,2])
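# Allowing or disallowing missing: disallowmissing throws an error if the array actually contains missing
x = [1,2,3,missing]
y = allowmissing(x)
z = disallowmissing(y) # throws here, because y contains missing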