fread
to read delimited filesrbindlist
to combine tablesfwrite
to write delimited filesThe documentation included during installation (2.1.1) is the best resource. Beyond base R, the Task Views (2.4.1.1) are a good way to see what is on CRAN. And beyond CRAN, specialized repositories offer further packages, like Bioconductor for bioinformatics.
The R website’s help page has a great overview of resources to use when you have a programming problem. It’s worth reading all the way through.
The first step is to talk through the problem, even if no one is around to listen:
totally commit to asking a thorough, detailed question of this imaginary person or inanimate object.
– Jeff Atwood, Rubber Duck Problem Solving
This rubber-ducking often solves it. If the problem persists, work on making a self-contained example of what you want to happen: reproducible input and associated desired output. Test code for the example in a new R session, since it’s possible that attached objects and packages will introduce confusing complications.
This example-building often solves it. If the problem persists, the next step is to formulate a question built around the example. This requires explaining the problem succinctly. For some advice on how to ask a good question, see BrodieG’s post.
If asking folks online instead of colleagues, make sure your post contains a single question, self-contained without a need for a sweeping narrative of the broader project. On the other hand, be open to the possibility that you have identified and characterized the core problem incorrectly. That is, be respectful when folks ask for clarification or suggest a different approach, and don’t respond by shouting “I told you what you need to know and what I want to do. Now, where is my answer!?”
The two main venues for questions, also identified on the R website’s help page, are:
mailing lists
Stack Overflow (“SO”)
There are different rules for posting to each mailing list and to SO. It is expected that every poster has read these rules and exhausted other resources.
Here are some common question types:
Parsing (date-time or regex). This tutorial has some pointers for date-time (3.6.4) and string parsing (4.8) issues. Make sure you’ve gone through the relevant docs and their examples, e.g., example(gsub)
or example(as.Date)
.
Performance. For a question like “How can I write this code better?”, consider the Code Review Stack Exchange site. Like SO and the mailing lists, it has its own posting guidelines. And look into profiling, ?Rprof
. If it’s related to data.table, toggle verbose messages on with options(datatable.verbose = TRUE)
before running the code.
For a question like “What is the fastest way to do x?”, make sure the example is big enough so timing matters. Generally, it is best to write a parameterized example, e.g., with n
for the number of rows, ng
for the number of groups, or similar. Common functions for timing are system.time
; benchmark
from the rbenchmark package; and microbenchmark
from the microbenchmark package. Here’s a good example question.
Recommendations. Questions asking about which package or software to use usually don’t yield good answers. Better to investigate directly, but mailing lists may serve as a last resort, assuming their posting rules allow it.
R>
, >
, or +
at the beginning of each new line. Be sure to explain the purpose and function of the code itself, either as code comments or in separate prose. Finally, it’s helpful to display important objects (e.g., I show input and output here).
Tools for outputting results for publication will always be in flux. When I started using R, tikzDevice, ggplot and Sweave were the packages for handling output. But since then, tikzDevice and ggplot went unmaintained and were removed from CRAN; and Sweave was removed from CRAN, integrated into base R’s utils package, and then supplanted in popularity by knitr (which builds on it). The upshot is: I think one should maintain a clean division between core code and code for publishing output, and as little time as humanly possible should be put into learning tools for the latter.
When plotting (3.7.4.3), graphs can be sent to various output file formats, like PNG and PDF, as seen in 3.2.4.4. See ?Devices
for a full list.
For multilevel tables, the tables package is a good option. It’s been around long enough to be stable and is maintained by a member of the core R development team. The tables package outputs in many formats, including LaTeX code. Others like it can be found in the Reproducible Research Task View. And other output formats, like spreadsheets, can be found using google.
Tables and graphs can be embedded in documents programmatically with LaTeX, but dropping R outputs off and then reading them into LaTeX can get complicated.
These days, knitr and related packages (rmarkdown, bookdown) are the friendliest tools for making documents that compile jointly with R code. They output to PDF; or html; or a multi-file document like this tutorial. They also have nice integration with Rstudio (syntax highlighting, a dynamic table of contents in the sidebar, one-click compilation). Knitr is compatible with LaTeX or markdown. LaTeX offers more control and typesets properly for the printed page, but only outputs nicely to PDF and can be cumbersome to write and maintain relative to markdown. Anyway, just pick one.
For interactive documents, there are a variety of shiny whizz-bang tools: shiny (yes, really); DT; and various packages for visualizing maps and graphs (“networks”). Code written for these is likely to have a short shelf-life compared to documents and code written mostly in R and Markdown or LaTeX, I reckon, but they can be fun.