How do you inspect a dataset? This section covers only a few simple functions: dim(), class(), nrow(), ncol(), object.size(), names(), head(), tail(), summary(), table(), and str().

> class(plants)
[1] "data.frame"
> dim(plants)
[1] 5166   10
> nrow(plants)
[1] 5166
> ncol(plants)
[1] 10
> object.size(plants)
644232 bytes
> names(plants)
 [1] "Scientific_Name"      "Duration"             "Active_Growth_Period" "Foliage_Color"        "pH_Min"              
 [6] "pH_Max"               "Precip_Min"           "Precip_Max"           "Shade_Tolerance"      "Temp_Min_F"  
> head(plants) 
               Scientific_Name          Duration Active_Growth_Period Foliage_Color pH_Min pH_Max Precip_Min Precip_Max Shade_Tolerance Temp_Min_F
1                  Abelmoschus              <NA>                 <NA>          <NA>     NA     NA         NA         NA            <NA>         NA
2       Abelmoschus esculentus Annual, Perennial                 <NA>          <NA>     NA     NA         NA         NA            <NA>         NA
3                        Abies              <NA>                 <NA>          <NA>     NA     NA         NA         NA            <NA>         NA
4               Abies balsamea         Perennial    Spring and Summer         Green      4      6         13         60        Tolerant        -43
5 Abies balsamea var. balsamea         Perennial                 <NA>          <NA>     NA     NA         NA         NA            <NA>         NA
6                     Abutilon              <NA>                 <NA>          <NA>     NA     NA         NA         NA            <NA>         NA

head() previews the first 6 rows by default; use head(plants, n=10) to preview the first 10 rows.
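As a minimal sketch of how the n argument behaves, the example below uses a hypothetical toy data frame named df (not the plants dataset) so that it runs on its own. It also shows that a negative n keeps everything except the last rows.

```r
# Hypothetical toy data frame standing in for `plants`
df <- data.frame(id = 1:20, value = (1:20)^2)

first_default <- head(df)           # no n given: first 6 rows
first_ten     <- head(df, n = 10)   # first 10 rows
all_but_last3 <- head(df, n = -3)   # negative n: drop the last 3 rows

nrow(first_default)   # 6
nrow(first_ten)       # 10
nrow(all_but_last3)   # 17
```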

> tail(plants, 3)
     Scientific_Name  Duration Active_Growth_Period Foliage_Color pH_Min pH_Max Precip_Min Precip_Max Shade_Tolerance Temp_Min_F
5164  Zostera marina Perennial                 <NA>          <NA>     NA     NA         NA         NA            <NA>         NA
5165          Zoysia      <NA>                 <NA>          <NA>     NA     NA         NA         NA            <NA>         NA
5166 Zoysia japonica Perennial                 <NA>          <NA>     NA     NA         NA         NA            <NA>         NA

tail(plants, 3) previews the last three rows; like head(), tail() shows 6 rows by default.
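The output above keeps the original row numbers (5164 to 5166). A small sketch on a hypothetical toy data frame df (an assumption, not the plants data) shows the same behavior:

```r
# Hypothetical toy data frame with 20 rows
df <- data.frame(id = 1:20, value = (1:20)^2)

last_default <- tail(df)      # no n given: last 6 rows
last_three   <- tail(df, 3)   # last 3 rows

nrow(last_default)     # 6
rownames(last_three)   # "18" "19" "20": original row numbers are kept
```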

> summary(plants)
                     Scientific_Name              Duration              Active_Growth_Period      Foliage_Color      pH_Min          pH_Max      
 Abelmoschus                 :   1   Perennial        :3031   Spring and Summer   : 447      Dark Green  :  82   Min.   :3.000   Min.   : 5.100  
 Abelmoschus esculentus      :   1   Annual           : 682   Spring              : 144      Gray-Green  :  25   1st Qu.:4.500   1st Qu.: 7.000  
 Abies                       :   1   Annual, Perennial: 179   Spring, Summer, Fall:  95      Green       : 692   Median :5.000   Median : 7.300  
 Abies balsamea              :   1   Annual, Biennial :  95   Summer              :  92      Red         :   4   Mean   :4.997   Mean   : 7.344  
 Abies balsamea var. balsamea:   1   Biennial         :  57   Summer and Fall     :  24      White-Gray  :   9   3rd Qu.:5.500   3rd Qu.: 7.800  
 Abutilon                    :   1   (Other)          :  92   (Other)             :  30      Yellow-Green:  20   Max.   :7.000   Max.   :10.000  
 (Other)                     :5160   NA's             :1030   NA's                :4334      NA's        :4334   NA's   :4327    NA's   :4327    
   Precip_Min      Precip_Max         Shade_Tolerance   Temp_Min_F    
 Min.   : 4.00   Min.   : 16.00   Intermediate: 242   Min.   :-79.00  
 1st Qu.:16.75   1st Qu.: 55.00   Intolerant  : 349   1st Qu.:-38.00  
 Median :28.00   Median : 60.00   Tolerant    : 246   Median :-33.00  
 Mean   :25.57   Mean   : 58.73   NA's        :4329   Mean   :-22.53  
 3rd Qu.:32.00   3rd Qu.: 60.00                       3rd Qu.:-18.00  
 Max.   :60.00   Max.   :200.00                       Max.   : 52.00  
 NA's   :4338    NA's   :4338                         NA's   :4328

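summary() gives an overview of how the data is distributed, and what it reports depends on each column's type. For numeric columns it shows the minimum, first quartile, median, mean, third quartile, and maximum. For factor columns (categorical variables) it shows how many times each level occurs, with the less frequent levels grouped under (Other). In both cases the number of missing values is reported as NA's.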
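To make the type-dependent behavior concrete, here is a minimal sketch that calls summary() on one numeric column and one factor column. The data frame df and its columns ph and shade are hypothetical, chosen only to resemble the plants columns.

```r
# Hypothetical toy data: one numeric column, one factor column, each with an NA
df <- data.frame(
  ph    = c(4, 6.5, NA, 7, 5),
  shade = factor(c("Tolerant", "Intolerant", NA, "Tolerant", "Intermediate"))
)

num_summary <- summary(df$ph)     # min, quartiles, median, mean, max, NA count
fac_summary <- summary(df$shade)  # count per level, plus NA count

num_summary
fac_summary
```

The results are named vectors, so an individual statistic can be pulled out by name, for example num_summary[["Median"]] or fac_summary[["Tolerant"]].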

> table(plants$Active_Growth_Period)

Fall, Winter and Spring                  Spring         Spring and Fall       Spring and Summer    Spring, Summer, Fall                  Summer         Summer and Fall              Year Round 
                     15                     144                      10                     447                      95                      92                      24                       5

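table() shows the distribution of a single variable: how many times each value occurs. By default NA values are not counted, so the counts above add up to 832 rather than 5166; the 4334 missing values are left out.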
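If the missing values should be counted too, table() accepts a useNA argument. The sketch below uses a hypothetical factor named period rather than the real plants column, and also shows prop.table() for turning counts into proportions.

```r
# Hypothetical factor with two missing values
period <- factor(c("Spring", "Summer", NA, "Spring", NA))

counts_default <- table(period)                    # NA values are dropped
counts_with_na <- table(period, useNA = "ifany")   # adds a count for NA
shares         <- prop.table(counts_default)       # proportions of non-missing values

sum(counts_default)   # 3
sum(counts_with_na)   # 5
shares
```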

> str(plants)
'data.frame':    5166 obs. of  10 variables:
 $ Scientific_Name     : Factor w/ 5166 levels "Abelmoschus",..: 1 2 3 4 5 6 7 8 9 10 ...
 $ Duration            : Factor w/ 8 levels "Annual","Annual, Biennial",..: NA 4 NA 7 7 NA 1 NA 7 7 ...
 $ Active_Growth_Period: Factor w/ 8 levels "Fall, Winter and Spring",..: NA NA NA 4 NA NA NA NA 4 NA ...
 $ Foliage_Color       : Factor w/ 6 levels "Dark Green","Gray-Green",..: NA NA NA 3 NA NA NA NA 3 NA ...
 $ pH_Min              : num  NA NA NA 4 NA NA NA NA 7 NA ...
 $ pH_Max              : num  NA NA NA 6 NA NA NA NA 8.5 NA ...
 $ Precip_Min          : int  NA NA NA 13 NA NA NA NA 4 NA ...
 $ Precip_Max          : int  NA NA NA 60 NA NA NA NA 20 NA ...
 $ Shade_Tolerance     : Factor w/ 3 levels "Intermediate",..: NA NA NA 3 NA NA NA NA 2 NA ...
 $ Temp_Min_F          : int  NA NA NA -43 NA NA NA NA -13 NA ...

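str() shows the structure of the data: the number of observations and variables, plus each column's type and its first few values.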
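When only the column types are needed, one option is to apply class() to every column. This is a sketch on a hypothetical toy data frame df; the column names are made up for illustration.

```r
# Hypothetical toy data frame with a factor, a numeric, and an integer column
df <- data.frame(
  name   = factor(c("a", "b")),
  ph     = c(4.0, 6.5),
  precip = c(13L, 60L)
)

str(df)                            # compact overview; types appear as Factor, num, int
col_classes <- sapply(df, class)   # named character vector of full class names
col_classes
```

Note that str() abbreviates the types (num, int), while class() returns the full names (numeric, integer).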
