Making Summary Tables in R

By Deependra Dhakal in tidyverse R

August 4, 2020

Background

Table output of R is one of the richest and satisfying to use feature. Rmarkdown format provides loads of package support to create, format, and present tables beautifully. This is on one aspect extremely useful while on the other end it could very well be daunting as to choose between various package options to use while formating your table. I have a bunch of suggestions and enlistments here to help get off that dilemma.

Once in a while someone writes a blog post and addressess these issues. This is true for this topic too. https://rfortherestofus.com/2019/11/how-to-make-beautiful-tables-in-r/ has wonderfully curated list of several such options. My intnet too is to supplement the information included in the post.

General purpose tables

Here goes the list of packages:

Table output (or in general, dataframe printing) is a more general idea for Rmarkdown documents. It can be set with a print option set in the YAML header.

title: Some good amount of table
output:
  html_document:
    df_print: paged

The df_print option can take other values such as default, kable and tibble. More on this at https://bookdown.org/yihui/rmarkdown/html-document.html#data-frame-printing.

gt package
kable + kableExtra.

Here are a bunch of appealing examples that surely entice you into using this combination of packages.

vignette(package = "kableExtra", topic = "awesome_table_in_pdf")

Sharla Gelfand also has whole repository maintained for sharing examples on use of kableExtra. Check that out at: https://github.com/sharlagelfand/kableExtra-cookbook

Additionally, I have forked the repo and tried to contribute some of my own hacks (not exactly my own, but learnt elsewhere on the internet) to the bookdown project.

formattable
DT. More at: https://rstudio.github.io/DT/
reactable. A demonstration of use at: https://projects.fivethirtyeight.com/2019-womens-world-cup-predictions/
flextable: https://davidgohel.github.io/flextable/index.html
huxtable. https://hughjonesd.github.io/huxtable/
rhandsontable. This extremely helpful package in case if you have dirty data and data representation. This lets you manually edit the data like working in a spreadsheet software. More on: https://jrowen.github.io/rhandsontable/
pixiedust. https://github.com/nutterb/pixiedust

Summary tables

rtables package

For the time of creating this post, the package rtable was available only as github project of G. Becker. In particular, specific branch was used to compile a package. However, as he mentions here, the project has a long history of being released as open-source well after being using as proprietary for some time.

We start by installing and loading essential libraries.

Throughout, williams.trees dataset from the agridat package will be used. Apart from already existing factors and numeric variables, an additional factor is generated from random process because pre-existing gen (genotype information) variable is nested in structure. Nested variable means that summary for that is available only for specific grouping and not for overall use.

Using rtable::split_cols_by, we split the analysis variable into multiple columns formed by a grouping variable.

##             Chanthaburi              HuaiBong               Ratchaburi               SaiThong                Sakaerat                 SiSaKet       
##          D       B       C       D       B       C       D       B       C       D       B       C       D       B       C       D       B       C  
## ————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————
## mean   254.7   225.0   270.9   129.8   224.4   215.0   413.2   464.6   512.2   702.6   634.6   593.2   349.1   312.9   343.2   480.1   470.3   471.2

Row splitting can also be done as shown.

##           Chanthaburi    HuaiBong    Ratchaburi    SaiThong     Sakaerat     SiSaKet  
## ——————————————————————————————————————————————————————————————————————————————————————
## D         12 (40.0%)    6 (22.2%)    13 (38.2%)   8 (24.2%)    7 (21.2%)    12 (34.3%)
##   mean      254.67        129.83       413.23       702.62       349.14       480.08  
##   sd        164.54        37.45        131.67       224.19       114.81       228.98  
##   range     493.00        114.00       392.00       545.00       303.00       742.00  
##   max       569.00        189.00       608.00       894.00       538.00       839.00  
##   min        76.00        75.00        216.00       349.00       235.00       97.00   
## B         11 (36.7%)    9 (33.3%)    9 (26.5%)    17 (51.5%)   9 (27.3%)    11 (31.4%)
##   mean      225.00        224.44       464.56       634.65       312.89       470.27  
##   sd        119.19        105.25       161.15       302.51       131.65       194.21  
##   range     311.00        301.00       464.00       967.00       357.00       651.00  
##   max       391.00        397.00       673.00      1083.00       523.00       809.00  
##   min        80.00        96.00        209.00       116.00       166.00       158.00  
## C          7 (23.3%)    12 (44.4%)   12 (35.3%)   8 (24.2%)    17 (51.5%)   12 (34.3%)
##   mean      270.86        215.00       512.25       593.25       343.18       471.17  
##   sd        107.16        86.82        132.98       236.15       197.17       168.65  
##   range     278.00        237.00       438.00       632.00       595.00       518.00  
##   max       415.00        337.00       764.00       899.00       658.00       738.00  
##   min       137.00        100.00       326.00       267.00       63.00        220.00

In the previous table we used custom function for summarizing. However, we can use pre-existing helper functions of R like the summary function.

##          Chanthaburi    HuaiBong    Ratchaburi    SaiThong     Sakaerat     SiSaKet  
##            (N=30)        (N=27)       (N=34)       (N=33)       (N=33)       (N=35)  
## —————————————————————————————————————————————————————————————————————————————————————
## D        12 (40.0%)    6 (22.2%)    13 (38.2%)   8 (24.2%)    7 (21.2%)    12 (34.3%)
##   Mean     254.67        129.83       413.23       702.62       349.14       480.08  
## B        11 (36.7%)    9 (33.3%)    9 (26.5%)    17 (51.5%)   9 (27.3%)    11 (31.4%)
##   Mean     225.00        224.44       464.56       634.65       312.89       470.27  
## C         7 (23.3%)    12 (44.4%)   12 (35.3%)   8 (24.2%)    17 (51.5%)   12 (34.3%)
##   Mean     270.86        215.00       512.25       593.25       343.18       471.17

In earlier functions, we used variable as data parameter. But dataset entirely can also be provided as a data parameter if summary involves multiple variables.

##                                 Chanthaburi      HuaiBong       Ratchaburi        SaiThong         Sakaerat         SiSaKet   
##                                   (N=30)          (N=27)          (N=34)           (N=33)           (N=33)          (N=35)    
## ——————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————
## D                               12 (40.0%)       6 (22.2%)      13 (38.2%)       8 (24.2%)        7 (21.2%)       12 (34.3%)  
##   Total genotypes                   12               6              13               8                7               12      
##   Unique genotypes                   4               4               5               4                4                4      
##   Genotypes with > 1 records   8.00 (26.67%)   2.00 (7.41%)    8.00 (23.53%)   4.00 (12.12%)     3.00 (9.09%)    8.00 (22.86%)
## B                               11 (36.7%)       9 (33.3%)       9 (26.5%)       17 (51.5%)       9 (27.3%)       11 (31.4%)  
##   Total genotypes                   11               9               9               17               9               11      
##   Unique genotypes                   4               4               4               5                4                5      
##   Genotypes with > 1 records   7.00 (23.33%)   5.00 (18.52%)   5.00 (14.71%)   12.00 (36.36%)   5.00 (15.15%)    6.00 (17.14%)
## C                                7 (23.3%)      12 (44.4%)      12 (35.3%)       8 (24.2%)        17 (51.5%)      12 (34.3%)  
##   Total genotypes                    7              12              12               8                17              12      
##   Unique genotypes                   3               3               5               3                5                5      
##   Genotypes with > 1 records   4.00 (13.33%)   9.00 (33.33%)   7.00 (20.59%)   5.00 (15.15%)    12.00 (36.36%)   7.00 (20.00%)

qwraps2 package

For constructing simple whole sample or subsample summary tables, qwarps package have simple interface. It provides a richly showcased vignette using mtcars dataset.

It requires markup language to be set early on in code chunk to render proper format.

Alternatively, row group name can be used for informing p-value. This is exemplified in the vignette for package.

gtsummary package

This package is a recent development but has a lot more preview ready examples as vignette. It is richer and more easily extensible in feature because it draws upon the gt package. Check vignettes out at cran repo for the package: https://cran.r-project.org/web/packages/gtsummary/index.html.

Posted on:: August 4, 2020

Length:: 6 minute read, 1067 words

Categories:: tidyverse R

Tags:: R

See Also:: Simulating genetic drift; Missing negative from the normal; Internals of Mixed Models