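When we work with a data set, the first thing to do is look at how it is structured: what type each variable has, which values each variable takes, and how those values are distributed.
Our data looks like this: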
V1 V2
1 7063b3d0c075a4d276c5f06f4327cf4a effb071415be51f11e845884e67c0f8c
2 0db66c0dd3993fd3504bb98c3beb15b3 f87ff481d85d2f95335ab602f38a7655
3 f8c065dc140ec74c6e44144164e618e3 8a27d9a6c59628c991c154e8d93f412e
4 2c6082cf0d68e244f2a10325e8d1b85b ecea5fe33e6817d09c395f2910479728
5 2c6082cf0d68e244f2a10325e8d1b85b 31a3d0420d89c9b121bb55dbdbbeda6b
V3 V4 V5
1 1426406400 1 20150315
2 1426417200 1 20150315
3 1426406400 2 20150315
4 1426417200 1 20150315
5 1426417200 1 20150315
Let's take it step by step. First, the structural information of the data:
str(c)
This gives:
'data.frame': 5 obs. of 5 variables:
$ V1: Factor w/ 349946 levels "0000110e00f7c85f550b329dc3d76210",..: 153443 18715 340011 60216 60216
$ V2: Factor w/ 10278 levels "00088cb1e6d740fcd42bc8ff2673c805",..: 9613 9962 5650 9482 2041
$ V3: int 1426406400 1426417200 1426406400 1426417200 1426417200
$ V4: int 1 1 2 1 1
$ V5: int 20150315 20150315 20150315 20150315 20150315
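One detail in this output is worth a note: V1 reports 349946 levels even though there are only 5 observations. A likely explanation, and it is only an assumption since the source does not say how c was built, is that c is a subset of a much larger data frame, and a factor keeps all of its original levels when it is subset. The sketch below uses a small made-up data frame (the names full, first5 and trimmed are hypothetical) to show the effect and how droplevels removes the unused levels.

```r
# Hypothetical stand-in for the full data set (values are made up)
full <- data.frame(
  V1 = factor(c("a1", "b2", "c3", "d4", "d4", "e5", "f6")),
  V4 = c(1L, 1L, 2L, 1L, 1L, 3L, 1L)
)

# Take the first five rows, as c presumably was taken from the real data
first5 <- head(full, 5)

# The factor still carries every level of the full data set
str(first5)
nlevels(first5$V1)

# droplevels() keeps only the levels that actually occur in the subset
trimmed <- droplevels(first5)
nlevels(trimmed$V1)
```

After droplevels, the level count matches the number of distinct values in the subset, which is the same figure describe reports as "unique".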
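Next, we want the distribution of each variable. We can use the summary function or the describe function for this (the output below is in the format printed by describe from the Hmisc package):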
library(Hmisc)  # assumption: describe() here comes from the Hmisc package
describe(c)
This gives:
5 Variables 5 Observations
-----------------------------------------------------------------------------
V1
n missing unique
5 0 4
0db66c0dd3993fd3504bb98c3beb15b3 (1, 20%)
2c6082cf0d68e244f2a10325e8d1b85b (2, 40%)
7063b3d0c075a4d276c5f06f4327cf4a (1, 20%)
f8c065dc140ec74c6e44144164e618e3 (1, 20%)
-----------------------------------------------------------------------------
V2
n missing unique
5 0 5
31a3d0420d89c9b121bb55dbdbbeda6b (1, 20%)
8a27d9a6c59628c991c154e8d93f412e (1, 20%)
ecea5fe33e6817d09c395f2910479728 (1, 20%)
effb071415be51f11e845884e67c0f8c (1, 20%)
f87ff481d85d2f95335ab602f38a7655 (1, 20%)
-----------------------------------------------------------------------------
V3
n missing unique Info Mean
5 0 2 0.75 1.426e+09
1426406400 (2, 40%), 1426417200 (3, 60%)
-----------------------------------------------------------------------------
V4
n missing unique Info Mean
5 0 2 0.5 1.2
1 (4, 80%), 2 (1, 20%)
-----------------------------------------------------------------------------
V5
n missing unique Info Mean
5 0 1 0 20150315
-----------------------------------------------------------------------------
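The summary function mentioned earlier is not shown in the text, so here is a minimal sketch of that route using base R only. It rebuilds the three integer columns from the five rows listed at the top (the name dat is hypothetical, and the two hash columns are left out for brevity), then uses table and prop.table to get the same kind of counts and percentages that describe prints.

```r
# Rebuild the integer columns from the five rows shown above
dat <- data.frame(
  V3 = c(1426406400L, 1426417200L, 1426406400L, 1426417200L, 1426417200L),
  V4 = c(1L, 1L, 2L, 1L, 1L),
  V5 = rep(20150315L, 5)
)

# Min, quartiles, median, mean and max for each numeric column
summary(dat)

# Frequency of each value of V4, and the same as proportions
freq <- table(dat$V4)
freq
prop.table(freq)

# Number of distinct values per column (the "unique" figure in describe)
sapply(dat, function(x) length(unique(x)))
```

For numeric columns summary gives quantiles rather than frequencies, so table is the closer match to the value counts shown by describe when a variable has only a few distinct values.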