How do you inspect data?
This section covers only a few simple functions: dim(), class(), nrow(), ncol(), object.size(), names(), head(), tail(), summary(), table(), and str().
> class(plants)
[1] "data.frame"
> dim(plants)
[1] 5166 10
> nrow(plants)
[1] 5166
> ncol(plants)
[1] 10
> object.size(plants)
644232 bytes
> names(plants)
[1] "Scientific_Name" "Duration" "Active_Growth_Period" "Foliage_Color" "pH_Min"
[6] "pH_Max" "Precip_Min" "Precip_Max" "Shade_Tolerance" "Temp_Min_F"
> head(plants)
               Scientific_Name          Duration Active_Growth_Period Foliage_Color pH_Min pH_Max Precip_Min Precip_Max Shade_Tolerance Temp_Min_F
1                  Abelmoschus              <NA>                 <NA>          <NA>     NA     NA         NA         NA            <NA>         NA
2       Abelmoschus esculentus Annual, Perennial                 <NA>          <NA>     NA     NA         NA         NA            <NA>         NA
3                        Abies              <NA>                 <NA>          <NA>     NA     NA         NA         NA            <NA>         NA
4               Abies balsamea         Perennial    Spring and Summer         Green      4      6         13         60        Tolerant        -43
5 Abies balsamea var. balsamea         Perennial                 <NA>          <NA>     NA     NA         NA         NA            <NA>         NA
6                     Abutilon              <NA>                 <NA>          <NA>     NA     NA         NA         NA            <NA>         NA
head() previews the first 6 rows by default; use head(plants, n = 10) to preview the first 10 rows instead.
> tail(plants, 3)
     Scientific_Name  Duration Active_Growth_Period Foliage_Color pH_Min pH_Max Precip_Min Precip_Max Shade_Tolerance Temp_Min_F
5164  Zostera marina Perennial                 <NA>          <NA>     NA     NA         NA         NA            <NA>         NA
5165          Zoysia      <NA>                 <NA>          <NA>     NA     NA         NA         NA            <NA>         NA
5166 Zoysia japonica Perennial                 <NA>          <NA>     NA     NA         NA         NA            <NA>         NA
This previews the last three rows; like head(), tail() shows 6 rows by default.
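As a small illustration of the n argument, here is a minimal sketch on a made-up 20-row data frame (toy is a hypothetical stand-in, since plants itself is not reproduced here). It also shows that a negative n keeps everything except the last n rows.

```r
# Hypothetical stand-in for `plants`: 20 rows, 2 columns.
toy <- data.frame(id = 1:20, value = (1:20)^2)

first_ten   <- head(toy, n = 10)  # first 10 rows instead of the default 6
last_three  <- tail(toy, 3)       # last 3 rows
all_but_two <- head(toy, n = -2)  # negative n: everything except the last 2 rows

print(dim(first_ten))
print(last_three$id)
print(nrow(all_but_two))
```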
> summary(plants)
Scientific_Name                    Duration                Active_Growth_Period       Foliage_Color      pH_Min         pH_Max
Abelmoschus                 :   1  Perennial        :3031  Spring and Summer   : 447  Dark Green  :  82  Min.   :3.000  Min.   : 5.100
Abelmoschus esculentus      :   1  Annual           : 682  Spring              : 144  Gray-Green  :  25  1st Qu.:4.500  1st Qu.: 7.000
Abies                       :   1  Annual, Perennial: 179  Spring, Summer, Fall:  95  Green       : 692  Median :5.000  Median : 7.300
Abies balsamea              :   1  Annual, Biennial :  95  Summer              :  92  Red         :   4  Mean   :4.997  Mean   : 7.344
Abies balsamea var. balsamea:   1  Biennial         :  57  Summer and Fall     :  24  White-Gray  :   9  3rd Qu.:5.500  3rd Qu.: 7.800
Abutilon                    :   1  (Other)          :  92  (Other)             :  30  Yellow-Green:  20  Max.   :7.000  Max.   :10.000
(Other)                     :5160  NA's             :1030  NA's                :4334  NA's        :4334  NA's   :4327   NA's   :4327
Precip_Min     Precip_Max      Shade_Tolerance    Temp_Min_F
Min.   : 4.00  Min.   : 16.00  Intermediate: 242  Min.   :-79.00
1st Qu.:16.75  1st Qu.: 55.00  Intolerant  : 349  1st Qu.:-38.00
Median :28.00  Median : 60.00  Tolerant    : 246  Median :-33.00
Mean   :25.57  Mean   : 58.73  NA's        :4329  Mean   :-22.53
3rd Qu.:32.00  3rd Qu.: 60.00                     3rd Qu.:-18.00
Max.   :60.00  Max.   :200.00                     Max.   : 52.00
NA's   :4338   NA's   :4338                       NA's   :4328
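summary() gives an overview of how the data is distributed, and what it reports depends on each column's type. For numeric columns it shows the minimum, first quartile, median, mean, third quartile, and maximum. For factor columns (categorical variables) it shows how many times each level occurs. In both cases the NA's row counts the missing values.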
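To make the type-dependent behavior concrete, here is a minimal sketch on a made-up data frame (toy and its columns ph and shade are hypothetical, not taken from plants). The shade column is built with factor() explicitly: since R 4.0.0, data.frame() and read.csv() no longer convert strings to factors by default, and summary() of a plain character column reports only its length, class, and mode.

```r
# Hypothetical toy data: one numeric column and one factor column, each with an NA.
toy <- data.frame(
  ph    = c(4, 5.5, NA, 7),
  shade = factor(c("Tolerant", "Intolerant", NA, "Tolerant"))
)

num_summary <- summary(toy$ph)     # quartiles, mean, and an NA count
fac_summary <- summary(toy$shade)  # one count per level, plus an NA count

print(num_summary)
print(fac_summary)
```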
> table(plants$Active_Growth_Period)
Fall, Winter and Spring  Spring  Spring and Fall  Spring and Summer  Spring, Summer, Fall  Summer  Summer and Fall  Year Round
                     15     144               10                447                    95      92               24           5
table() shows how often each value of a single variable occurs.
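Note that the counts above sum to 832, not 5166: table() leaves out NA values by default. The sketch below, on a made-up factor (period is hypothetical), shows how the useNA argument brings them back and how sort() orders the counts.

```r
# Hypothetical factor with two missing values.
period <- factor(c("Spring", "Summer", NA, "Spring", NA))

counts         <- table(period)                   # NA values are dropped
counts_with_na <- table(period, useNA = "ifany")  # adds an <NA> cell if any NA exists
sorted         <- sort(counts, decreasing = TRUE) # most frequent value first

print(counts)
print(counts_with_na)
print(sorted)
```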
> str(plants)
'data.frame': 5166 obs. of 10 variables:
$ Scientific_Name : Factor w/ 5166 levels "Abelmoschus",..: 1 2 3 4 5 6 7 8 9 10 ...
$ Duration : Factor w/ 8 levels "Annual","Annual, Biennial",..: NA 4 NA 7 7 NA 1 NA 7 7 ...
$ Active_Growth_Period: Factor w/ 8 levels "Fall, Winter and Spring",..: NA NA NA 4 NA NA NA NA 4 NA ...
$ Foliage_Color : Factor w/ 6 levels "Dark Green","Gray-Green",..: NA NA NA 3 NA NA NA NA 3 NA ...
$ pH_Min : num NA NA NA 4 NA NA NA NA 7 NA ...
$ pH_Max : num NA NA NA 6 NA NA NA NA 8.5 NA ...
$ Precip_Min : int NA NA NA 13 NA NA NA NA 4 NA ...
$ Precip_Max : int NA NA NA 60 NA NA NA NA 20 NA ...
$ Shade_Tolerance : Factor w/ 3 levels "Intermediate",..: NA NA NA 3 NA NA NA NA 2 NA ...
$ Temp_Min_F : int NA NA NA -43 NA NA NA NA -13 NA ...
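str() shows the structure of the data: the number of observations and variables, plus each column's type and its first few values.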