I have a table in csv format, the data is the following:
1 3 1 2
1415_at 1 8.512147859 8.196725061 8.174426394 8.62388149
1411_at 2 9.119200527 9.190318548 9.149239039 9.211401637
1412_at 3 10.03383593 9.575728316 10.06998673 9.735217522
1413_at 4 5.925999419 5.692092375 5.689299161 7.807354922
When I read it with:
m
and print the values of m, I notice that they change to:
X X.1 X1 X3 X1.1 X4
1 1415_at 1 8.512148 8.196725 8.174426 8.623881
I made some manipulation to keep only those columns that are labelled 1 or 2, so I do that with:
smallerdat
write.csv(smallerdat,"table2.csv")
it writes me the file with those annoying headers and that first column added, which I do not need it:
X X.1 X1 X1.1 X2
1 1415_at 1 8.512148 8.174426 8.623881
so when I open that data in Excel the headers are still X, X.1 and son on. What I need is that the headers remain the same as:
1 1 2
1415_at 1 8.196725061 8.174426394 8.62388149
any help?
Please notice also that first column that is added automatically, I do not need it, so how I can get rid that of that column?
解决方案
There are two issues here.
For reading your CSV file, use:
m
Notice that by doing this, though, you can't use the column names as easily. You have to quote them with backticks instead, and will most likely still run into problems because of duplicated column names:
m$1
# Error: unexpected numeric constant in "mydf$1"
mydf$`1`
# [1] 8.512148 9.119201 10.033836 5.925999
For writing your "m" object to a CSV file, use:
write.csv(m, "table2.csv", row.names = FALSE)
After reading your file in using the method in step 1, you can subset as follows. If you wanted the first column and any columns named "3" or "4", you can use:
m[names(m) %in% c("", "3", "4")]
# 3 4
# 1 1415_at 1 8.196725 8.623881
# 2 1411_at 2 9.190319 9.211402
# 3 1412_at 3 9.575728 9.735218
# 4 1413_at 4 5.692092 7.807355
Update: Fixing the names before using write.csv
If you don't want to start from step 1 for whatever reason, you can still fix your problem. While you've succeeded in taking a subset with your grep statement, that doesn't change the column names (not sure why you would expect that it should). You have to do this by using gsub or one of the other regex solutions.
Here are the names of the columns with the way you have read in your CSV:
names(m)
# [1] "X" "X.1" "X1" "X3" "X1.1" "X2"
You want to:
Remove all "X"s
Remove all ".some-number"
So, here's a workaround:
# Change the names in your original dataset
names(m)
# Create a temporary object to match desired names
getme
# Subset your data
smallerdat
# Reassign names to your subset
names(smallerdat)