https://stackoverflow.com/questions/34611109/julia-dataframe-replacing-missing-values
1. replace
Probably the easiest approach is to use replace! or replace from base Julia. Here is an example with replace!:
julia> using DataFrames
julia> df = DataFrame(x = [1, missing, 3])
3×1 DataFrame
│ Row │ x       │
│     │ Int64⍰  │
├─────┼─────────┤
│ 1   │ 1       │
│ 2   │ missing │
│ 3   │ 3       │
julia> replace!(df.x, missing => 0);
julia> df
3×1 DataFrame
│ Row │ x      │
│     │ Int64⍰ │
├─────┼────────┤
│ 1   │ 1      │
│ 2   │ 0      │
│ 3   │ 3      │
However, note that at this point the type of column x still allows missing values:
julia> typeof(df.x)
Array{Union{Missing, Int64},1}
This is also indicated by the ⍰ marker (printed as a plain question mark in newer DataFrames.jl versions) following Int64 in column x when the data frame is printed. You can narrow the column type with disallowmissing! (from the DataFrames.jl package):
julia> disallowmissing!(df, :x)
3×1 DataFrame
│ Row │ x     │
│     │ Int64 │
├─────┼───────┤
│ 1   │ 1     │
│ 2   │ 0     │
│ 3   │ 3     │
Alternatively, if you use replace (without the exclamation mark) as follows, then the output will already disallow missing values:
julia> df = DataFrame(x = [1, missing, 3]);
julia> df.x = replace(df.x, missing => 0);
julia> df
3×1 DataFrame
│ Row │ x     │
│     │ Int64 │
├─────┼───────┤
│ 1   │ 1     │
│ 2   │ 0     │
│ 3   │ 3     │
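The difference between the two variants comes down to the element type of the result. A minimal sketch on a plain vector (no DataFrame needed; the names v and w are illustrative only):

```julia
# replace! mutates the vector in place, so its element type cannot change.
v = [1, missing, 3]
replace!(v, missing => 0)
eltype(v)    # still a Union that includes Missing

# replace allocates a new vector and can narrow the element type,
# because every missing value has been substituted.
w = replace([1, missing, 3], missing => 0)
eltype(w)    # plain Int, no Missing
```

This is why assigning the result of replace back to df.x gives a column that no longer allows missing values, while replace! needs a follow-up disallowmissing!.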
Base.ismissing with logical indexing
You can use ismissing with logical indexing to assign a new value to all missing entries of an array:
julia> df = DataFrame(x = [1, missing, 3]);
julia> df.x[ismissing.(df.x)] .= 0;
julia> df
3×1 DataFrame
│ Row │ x      │
│     │ Int64⍰ │
├─────┼────────┤
│ 1   │ 1      │
│ 2   │ 0      │
│ 3   │ 3      │
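As the ⍰ in the output suggests, assigning through a logical mask fills the values but leaves the element type untouched. A small sketch on a plain vector, assuming Julia 1.3 or later for nonmissingtype (the name narrowed is illustrative only):

```julia
v = [1, missing, 3]

# Overwrite only the slots where the mask is true.
v[ismissing.(v)] .= 0
eltype(v)    # still a Union that includes Missing

# One way to narrow the type afterwards: convert to the non-missing
# element type. This throws if any missing value is still present.
narrowed = convert(Vector{nonmissingtype(eltype(v))}, v)
eltype(narrowed)    # plain Int
```

Inside a data frame, disallowmissing! plays the same role as the convert call here.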
Base.coalesce
Another approach is to use coalesce:
julia> df = DataFrame(x = [1, missing, 3]);
julia> df.x = coalesce.(df.x, 0);
julia> df
3×1 DataFrame
│ Row │ x     │
│     │ Int64 │
├─────┼───────┤
│ 1   │ 1     │
│ 2   │ 0     │
│ 3   │ 3     │
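coalesce itself is a scalar function that returns its first non-missing argument; the dot broadcasts it over the column. A short sketch on plain values (the names filled, fallback and patched are illustrative only):

```julia
# Scalar behaviour: the first argument that is not missing wins.
a = coalesce(missing, 0)           # 0
b = coalesce(5, 0)                 # 5
c = coalesce(missing, missing, 7)  # 7

# Broadcasting with a scalar default fills every missing entry.
v = [1, missing, 3]
filled = coalesce.(v, 0)

# Broadcasting against another vector takes the fallback elementwise.
fallback = [10, 20, 30]
patched = coalesce.(v, fallback)
```

The second form is handy when the replacement value should come from another column rather than from a constant.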
DataFramesMeta
Both replace and coalesce can be used with the @transform macro from the DataFramesMeta.jl package:
julia> using DataFramesMeta
julia> df = DataFrame(x = [1, missing, 3]);
julia> @transform(df, x = replace(:x, missing => 0))
3×1 DataFrame
│ Row │ x     │
│     │ Int64 │
├─────┼───────┤
│ 1   │ 1     │
│ 2   │ 0     │
│ 3   │ 3     │
julia> df = DataFrame(x = [1, missing, 3]);
julia> @transform(df, x = coalesce.(:x, 0))
3×1 DataFrame
│ Row │ x     │
│     │ Int64 │
├─────┼───────┤
│ 1   │ 1     │
│ 2   │ 0     │
│ 3   │ 3     │
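The calls above use the keyword-style syntax of older DataFramesMeta.jl releases. Newer releases (0.10 and later, as an assumption about the installed version) expect the target column on the left-hand side to be written as a Symbol too. A sketch under that assumption (df2 and df3 are illustrative names):

```julia
using DataFrames, DataFramesMeta

df = DataFrame(x = [1, missing, 3])

# Newer syntax: the destination column is written as :x, not x.
df2 = @transform(df, :x = coalesce.(:x, 0))

# The same pattern works with replace on the whole column.
df3 = @transform(df, :x = replace(:x, missing => 0))
```

Both calls return a new data frame and leave df unchanged.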
2. delete
Delete a column:
julia> df = DataFrame(A = 1:4, B = ["M", "F", "F", "M"], C = 2:5)
4x3 DataFrame
|-------|---|-----|---|
| Row # | A | B   | C |
| 1     | 1 | "M" | 2 |
| 2     | 2 | "F" | 3 |
| 3     | 3 | "F" | 4 |
| 4     | 4 | "M" | 5 |
julia> delete!(df, :B)
4x2 DataFrame
|-------|---|---|
| Row # | A | C |
| 1     | 1 | 2 |
| 2     | 2 | 3 |
| 3     | 3 | 4 |
| 4     | 4 | 5 |
Or, if you don't know the column name:
df = df[:, [1:2; 4:end]] # remove column 3 by position
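The delete! call on a column comes from an early DataFrames.jl release. In current releases, column removal is usually written with select or select! together with Not. A sketch assuming a recent DataFrames.jl (1.x); by_name and by_pos are illustrative names:

```julia
using DataFrames

df = DataFrame(A = 1:4, B = ["M", "F", "F", "M"], C = 2:5)

# Return a new data frame without column B, selected by name.
by_name = select(df, Not(:B))

# Same result, selecting the column to drop by position.
by_pos = select(df, Not(2))

# Drop the column in place.
select!(df, Not(:B))
```

Not(2) also covers the case where the column name is unknown, without building the index ranges by hand.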