How I use R

Hacks & Hackers - Data Journalism

Andy Pryke - Andy@The-Data-Mine.co.uk

Contents

  • Data Journalists Using R
  • Organising Projects
  • Reproducibility
  • Finding Out How to ...
  • Bonus - Nice Packages

These slides: source (Slidify / R Markdown), view online

Data Journalists Using R

Data Journalists Using R - Buzzfeed

Data Journalists Using R - Five Thirty Eight

How I Organise Projects

  • Keep original data somewhere separate
    • e.g. "originals" & "processed" directories
  • Start with an R Markdown file
  • Try code at the R console, then move it into the file
  • Aim for readable code
  • Create functions to make code easier to read
  • Use very clear variable & function names
    • Don't worry if they seem long
  • Use Git / GitHub to version your code
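The layout above might look like this in R. This is a minimal sketch under assumptions, not the author's actual setup: the temporary project folder, the file name `flowers.csv` and the function `summarise_mean_petal_length_by_species()` are hypothetical, and the built-in `iris` data stands in for a downloaded dataset.

```r
# Hypothetical project layout: "originals" holds untouched source data,
# "processed" holds everything the scripts write.
project_dir <- tempfile("my_story_")
originals_dir <- file.path(project_dir, "originals")
processed_dir <- file.path(project_dir, "processed")
dir.create(originals_dir, recursive = TRUE)
dir.create(processed_dir)

# Stand-in for a real downloaded file (the built-in iris data here)
write.csv(iris, file.path(originals_dir, "flowers.csv"), row.names = FALSE)

# Hypothetical helper: a small, clearly named function keeps the main script readable
summarise_mean_petal_length_by_species <- function(flower_measurements) {
  aggregate(Petal.Length ~ Species, data = flower_measurements, FUN = mean)
}

original_flowers <- read.csv(file.path(originals_dir, "flowers.csv"))
mean_petal_length_by_species <- summarise_mean_petal_length_by_species(original_flowers)

# Results only ever go to "processed"; the original file is never overwritten
write.csv(mean_petal_length_by_species,
          file.path(processed_dir, "mean_petal_length_by_species.csv"),
          row.names = FALSE)
```

Because the original file is only ever read, re-running the script from the top always starts from the same data.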

Reproducibility - Why

So that...

  • when I make a mistake, I can redo the work instantly
  • others can check how I did things (and whether it was right)
  • future me can check how I did things
  • future me can reuse my work for new data or projects

Reproducibility - How

  • Try to eliminate manual steps (within reason)
  • Use the "checkpoint" package to make sure you always use the same package versions, even years later
  • Keep your code somewhere safe and shareable, e.g. GitHub
  • Include your copy of the original data
  • Document what you did and why
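One way to cover the "Document" step, sketched under assumptions: save the output of `sessionInfo()` next to your results, so anyone re-running the work can see which R and package versions were used. `record_session_info()` and the output path are hypothetical names. The `checkpoint` call is left commented out because it needs network access, and the snapshot date shown is only an example.

```r
# Would go at the very top of the script; commented out here because it needs
# network access. The snapshot date is a hypothetical example.
# checkpoint::checkpoint("2016-01-01")

# Hypothetical helper: write the R version and attached packages for this run
record_session_info <- function(output_path) {
  writeLines(capture.output(sessionInfo()), output_path)
  invisible(output_path)
}

session_info_path <- file.path(tempdir(), "session_info.txt")
record_session_info(session_info_path)
```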

Finding Out How to ...

Searching for "R" can be hard: it's a single letter.

Google actually does a decent job;
try adding "package" to your search if you're not getting good results.
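R also has built-in search tools that can complement a web search. A minimal sketch using base R's `apropos()` and `help.search()`; the search pattern and phrase are only examples.

```r
# Names of functions (in attached packages) matching a regular expression
read_functions <- apropos("^read\\.")

# Search the help pages of all installed packages for a phrase;
# at the console, ??"linear model" is shorthand for the same search
linear_model_help <- help.search("linear model")
```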

Finding Out How to ... - Rseek

Finding Example Code - GitHub

Question...

What if someone copied your story?

  • All of it, exactly?
  • A paragraph?
  • Rephrased it?
  • Used the same themes?
  • etc.

Question...

What if someone copied your script?

  • All of it, exactly?
  • A function?
  • Rephrased it?
  • Used the same themes?
  • etc.


Where is the line for code?

  • Sometimes there is a licence, often not.

Extra slides follow...

Bonus - Nice Packages beginning with "D"

  • dplyr - transforming data
  • data.table - fast handling of larger datasets (tens of millions of rows)
  • DT - interactive display of tables in HTML
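To compare the first two packages, here is a sketch of the same summary (mean sepal length per species of the built-in `iris` data) written in each style. It assumes dplyr 1.0 or later (for `.groups = "drop"`) and data.table are installed; the result names are illustrative.

```r
library(dplyr)
library(data.table)

# dplyr: a readable, verb-by-verb pipeline
by_species_dplyr <- iris %>%
  group_by(Species) %>%
  summarise(mean_sepal_length = mean(Sepal.Length), .groups = "drop")

# data.table: the same summary in the compact DT[i, j, by] form
iris_dt <- as.data.table(iris)
by_species_dt <- iris_dt[, .(mean_sepal_length = mean(Sepal.Length)), by = Species]

# Both give setosa 5.006, versicolor 5.936, virginica 6.588
```

The dplyr version reads more like a sentence; the data.table version is terser and is the one to reach for as datasets grow.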

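Testing DT in Slidify

   # Build an interactive HTML table of the built-in iris data
   dynamic_table <- DT::datatable(iris)
   # saveWidget() comes from htmlwidgets (a DT dependency) and writes the table to an HTML file
   htmlwidgets::saveWidget(dynamic_table, "example.html")
   # Embed the saved file in the slide; height=20 (pixels) was too small to show the table
   cat('<iframe src="example.html" width="100%" height="400"></iframe>')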
