R Programming for Research

Chapter 5 Reproducible research #1

The video lectures for this chapter are embedded at relevant places in the text, with links to download a pdf of the associated slides for each video. You can also access a full playlist of the videos for this chapter.

5.1 What is reproducible research?

Download a pdf of the lecture slides for this video.

A data analysis is reproducible if all the information (data, files, etc.) required is available for someone else to re-do your entire analysis. This includes:

  • The data
  • All code for cleaning the raw data
  • All code and software (specific versions, packages) used for the analysis

Some advantages of making your research reproducible are:

  • You can (easily) figure out what you did six months from now.
  • You can (easily) make adjustments to code or data, even early in the process, and re-run all analysis.
  • When you’re ready to publish, you can (easily) do a last double-check of your full analysis, from cleaning the raw data through generating figures and tables for the paper.
  • You can pass along or share a project with others.
  • You can give useful code examples to people who want to extend your research.

Here are several articles about a famous research example of the dangers of writing code that is hard to double-check or confirm:

  • The Economist
  • The New York Times
  • Simply Statistics

Some of the steps required to make research reproducible are:

  • All your raw data should be saved in the project directory. You should have clear documentation on the source of all this data.
  • Scripts should be included with all the code used to clean this data into the data set(s) used for final analyses and to create any figures and tables.
  • You should include details on the versions of any software used in analysis (for R, this includes the version of R as well as versions of all packages used).
  • If possible, there should be no “by hand” steps used in the analysis; instead, all steps should be done using code saved in scripts. For example, you should use a script to clean data, rather than cleaning it by hand in Excel. If any “non-scriptable” steps are unavoidable, you should very clearly document those steps.
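For R, one way to capture the version information is to print it at the end of your analysis script or report. A minimal sketch using base R functions (knitr is only an example package here and is assumed to be installed):

```r
# Full record: R version, platform, and attached/loaded package versions
si <- sessionInfo()
print(si)

# The R version alone, as a character string
r_version <- R.version.string

# The version of one package (knitr, assumed to be installed)
knitr_version <- packageVersion("knitr")
```

Saving this output alongside your results documents exactly which software produced them.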

There are several software tools that can help you improve the reproducibility of your research:

  • knitr: Create files that include both your code and text. These can be rendered to create final reports and papers, and the code stays within the file that produces the final report.
  • knitr complements: Create fancier tables and figures within R Markdown documents. Packages include tikzDevice, animate, xtable, and pander.
  • packrat: Save versions of each package used for the analysis, then load those package versions when the code is run again in the future.

In this section, I will focus on using knitr and RMarkdown files.

5.2 Markdown

R Markdown files are mostly written using Markdown. To write R Markdown files, you need to understand what markup languages like Markdown are and how they work.

In Word and other word processing programs you have used, you can add formatting using buttons and keyboard shortcuts (e.g., “Ctrl-B” for bold). The file saves the words you type. It also saves the formatting, but you see the final output, rather than the formatting markup, when you edit the file (WYSIWYG – what you see is what you get).

In markup languages, on the other hand, you mark up the document directly to show what formatting the final version should have (e.g., you type **bold** in the file to get a document with the word “bold” in bold type).

Examples of markup languages include:

  • HTML (HyperText Markup Language)
  • Markdown (a “lightweight” markup language)

For example, Figure 5.1 shows some marked-up HTML code from CSU’s website, while Figure 5.2 shows how that file looks when it’s rendered by a web browser.

Figure 5.1: Example of the source of an HTML file.

Figure 5.2: Example of a rendered HTML file.

To write a file in Markdown, you’ll need to learn the conventions for creating formatting. This table shows what you would need to write in a flat file for some common formatting choices:

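Code                              Explanation
--------------------------------  -------------------
**text**                          boldface
*text*                            italicized
[text](https://www.example.com)   hyperlink
# text                            first-level header
## text                           second-level header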

Some other simple things you can do in Markdown include:

  • Lists (ordered or bulleted)
  • Figures from file
  • Block quotes
  • Superscripts
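A short sketch of the Markdown source for each of these, as supported in R Markdown (which uses Pandoc's version of Markdown); the image path figures/MyFigure.png is a hypothetical example:

```markdown
1. First item in an ordered list
2. Second item

- First item in a bulleted list
- Second item

![Caption for the figure](figures/MyFigure.png)

> This is a block quote.

The area of a square with side x is x^2^.
```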

For more Markdown conventions, see RStudio’s R Markdown Reference Guide (link also available through “Help” in RStudio).

5.3 Literate programming in R

Literate programming, an idea developed by Donald Knuth, mixes code that can be executed with regular text. The files you create can then be rendered, which runs any embedded code. The final output will include both the results from your code and the regular text.

The knitr package can be used for literate programming in R. In essence, knitr allows you to write an R Markdown file that can be rendered into a pdf, Word, or HTML document.

Here are the basics of opening and rendering an R Markdown file in RStudio:

  • To open a new R Markdown file, go to “File” -> “New File” -> “R Markdown…” -> for now, choose a “Document” in “HTML” format.
  • This will open a new R Markdown file in RStudio. The file extension for R Markdown files is “.Rmd”.
  • The new file comes with some example code and text. You can run the file as-is to try out the example. You will ultimately delete this example code and text and replace it with your own.
  • Once you “knit” the R Markdown file, R will render an HTML file with the output. This is automatically saved in the same directory where you saved your .Rmd file.
  • Write everything besides R code using Markdown syntax.

To include R code in an R Markdown document, you need to separate off the code chunk: open it with a line of three backticks followed by {r}, and close it with a line of three backticks.
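A minimal sketch of a chunk (the R code inside is only a placeholder):

````rmd
```{r}
x <- c(1, 2, 3)
mean(x)
```
````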

This syntax tells R how to find the start and end of pieces of R code when the file is rendered. R will walk through the file, find each piece of R code, run it, create output (printed output or figures, for example), and then pass the file along to another program to complete rendering (e.g., TeX for pdf files).

You can specify a name for each chunk, if you’d like, by including it after “r” when you begin your chunk. For example, to give the name load_nepali to a code chunk that loads the nepali dataset, put that name after “r” in the chunk’s opening line: {r load_nepali}.
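A sketch of that named chunk, assuming the nepali data come from the faraway package (the source of the data is an assumption here):

````rmd
```{r load_nepali}
library(faraway)  # assumed source of the nepali dataset
data(nepali)
```
````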

Here are a couple of tips for naming code chunks:

  • Chunk names must be unique across a document.
  • Any chunks you don’t name are given numbers by knitr.

You do not have to name each chunk. However, there are some advantages:

  • It will be easier to find any errors.
  • You can use the chunk labels to cross-reference figures.
  • You can reference chunks later by name.

You can add options when you start a chunk. Many of these options can be set to TRUE or FALSE and include:

Option    Action
--------  ----------------------------------------------------
echo      Print out the R code?
eval      Run the R code?
message   Print out messages?
warning   Print out warnings?
include   If FALSE, run code, but don’t print code or results

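Other chunk options take values other than TRUE or FALSE. Some you might want to include are:

Option      Action
----------  ----------------------------------------------------------------------------------
results     How to print results (e.g., results = "hide" runs the code, but doesn’t print the results)
fig.width   Width to print your figure, in inches
fig.height  Height to print your figure, in inches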

Add these options inside the curly braces that open the chunk, after the chunk name (if there is one), and separate multiple options with commas.
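For example, a chunk that draws a histogram of age in the nepali data without printing its code or warnings, at 5 by 4 inches, might start like this (the chunk name nepali_age_hist is hypothetical):

````rmd
```{r nepali_age_hist, echo = FALSE, warning = FALSE, fig.width = 5, fig.height = 4}
hist(nepali$age)
```
````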

I will cover other chunk options later, once you’ve gotten the chance to try writing R Markdown files.

You can set “global” options at the beginning of the document. This will create new defaults for all of the chunks in the document. For example, if you want echo, warning, and message to be FALSE by default in all code chunks, you can set them with knitr’s opts_chunk$set function in a chunk at the start of the document.
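A minimal sketch of that call. In an R Markdown file it would typically go in the first chunk (often named setup and given include = FALSE so it does not show up in the output):

```r
# Make echo, warning, and message default to FALSE for all later chunks
knitr::opts_chunk$set(echo = FALSE, warning = FALSE, message = FALSE)
```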

If you set both global and local chunk options, the local options that you set specifically for a chunk will take precedence over the global options. For example, if a document sets echo = FALSE as a global option but sets echo = TRUE for a chunk named check_nepali, knitr will print the code for the check_nepali chunk, because the option specified for that specific chunk (echo = TRUE) overrides the global option (echo = FALSE).
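A sketch of such a document, assuming nepali was loaded in an earlier chunk (what check_nepali runs is only illustrative):

````rmd
```{r setup, include = FALSE}
knitr::opts_chunk$set(echo = FALSE)
```

```{r check_nepali, echo = TRUE}
head(nepali)
```
````

Only the code for check_nepali appears in the rendered output; any other chunks follow the global echo = FALSE.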

You can also include R output directly in your text (“inline”) using backticks:

“There are `r nrow(nepali)` observations in the nepali data set. The average age is `r mean(nepali$age, na.rm = TRUE)` months.”

Once the file is rendered, this gives:

“There are 1000 observations in the nepali data set. The average age is 37.662 months.”

Here are three tips that will help you diagnose some problems rendering R Markdown files:

  • Be sure to save your R Markdown file before you run it.
  • All the code in the file will run “from scratch”, as if you had just opened a new R session.
  • The code will run using, as a working directory, the directory where you saved the R Markdown file.

You’ll want to try out pieces of your code as you write an R Markdown document. There are a few ways you can do that:

  • You can run code in chunks just like you can run code from a script (Ctrl-Return or the “Run” button).
  • You can run all the code in a chunk (or all the code in all chunks) using the different options under the “Run” button in RStudio.
  • All the “Run” options have keyboard shortcuts, so you can use those.

You can render R Markdown documents to other formats:

  • Pdf (requires that you’ve installed TeX on your computer)
  • Slides (ioslides)

Click the small arrow to the right of the “Knit” button to see the different rendering options available on your computer.

You can freely post your R Markdown documents at RPubs. If you want to post to RPubs, you need to create an account. Once you do, you can click the “Publish” button on the window that pops up with your rendered file. RPubs can also be a great place to look for interesting example code, although it can sometimes be pretty overwhelmed by MOOC homework.

If you’d like to find out more, here are two good how-to books on reproducible research in R (the CSU library has both in hard copy):

  • Reproducible Research with R and RStudio , Christopher Gandrud
  • Dynamic Documents with R and knitr , Yihui Xie

5.4 Style guidelines

R style guidelines provide rules for how to format code in an R script. Some people develop their own style as they learn to code. However, it is easy to get in the habit of following style guidelines, and they offer some important advantages:

  • Clean code is easier to read and interpret later.
  • It’s easier to catch and fix mistakes when code is clear.
  • Others can more easily follow and adapt your code if it’s clean.
  • Some style guidelines will help prevent possible problems (e.g., avoiding . in function names).

For this course, we will use R style guidelines from two sources:

  • Google’s R style guidelines
  • Hadley Wickham’s R style guidelines

These two sets of style guidelines are very similar.

Here are a few guidelines we’ve already covered in class:

  • Use <- , not = , for assignment.
  • Use only lowercase letters and numbers in names
  • Use underscores (_) to separate words, not camelCase or a dot (.) (the Google and Wickham style guides differ on this point)
  • Have some consistent names to use for “throw-away” objects (e.g., df, ex, a, b)
  • Descriptive names for R scripts (“random_group_assignment.R”)
  • Nouns for objects ( todays_groups for an object with group assignments)
  • Verbs for functions ( make_groups for the function to assign groups)
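A small sketch that follows these naming conventions. The names make_groups and todays_groups come from the list above, but the body of make_groups (random assignment with sample()) and the object class_list are assumptions about what such code might look like:

```r
# Verb for the function name, nouns for objects, underscores between words
make_groups <- function(students, n_groups) {
  # Spread group numbers as evenly as possible, then shuffle them
  group_ids <- rep(seq_len(n_groups), length.out = length(students))
  data.frame(student = students, group = sample(group_ids))
}

class_list <- paste0("student_", 1:6)
todays_groups <- make_groups(class_list, n_groups = 3)
```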

5.4.1 Line length

Google: Keep lines to 80 characters or less

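To show a guide line at 80 characters in your script pane, go to “RStudio” -> “Preferences” -> “Code” -> “Display”, check “Show margin”, and set “Margin column” to 80. (In recent versions of RStudio, and on Windows, these settings are under “Tools” -> “Global Options”.)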

This guideline helps ensure that your code is formatted in a way that you can see all of the code without scrolling horizontally (left and right).

5.4.2 Spacing

  • Binary operators (e.g., <- , + , - ) should have a space on either side
  • A comma should have a space after it, but not before.
  • Colons should not have a space on either side.
  • Put spaces before and after = when assigning parameter arguments
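A short sketch of these spacing rules; the object names are arbitrary examples:

```r
# Spaces around binary operators and around "=" in arguments;
# a space after each comma, none before it
x <- c(1, 2, 3)
y <- x * 2 + 1
first_two <- x[1:2]  # no spaces around ":"
x_mean <- mean(x, na.rm = TRUE)

# Harder to read: mean(x,na.rm=TRUE)
```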

5.4.3 Semicolons

Although you can use a semicolon to put two lines of code on the same line, you should avoid it.

5.4.4 Commenting

  • For a comment on its own line, use # . Follow with a space, then the comment.
  • You can put a short comment at the end of a line of R code. In this case, put two spaces after the end of the code, one # , and one more space before the comment.
  • If it helps make it easier to read your code, separate sections using a comment character followed by many hyphens (e.g., #------------ ). Anything after the comment character is “muted”.
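A sketch showing all three commenting styles; the section title and objects are only examples:

```r
#------------------------------------------------------------
# Set up
#------------------------------------------------------------

# A comment on its own line: "#", one space, then the comment
n_days <- 7

total_hours <- n_days * 24  # short end-of-line comment: two spaces, "#", one space
```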

5.4.5 Indentation

  • When a function call spans more than one line, line up each new line with the first character after the opening parenthesis.
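For example (the data frame and its columns are arbitrary):

```r
# The second line starts directly under "letters", the first
# character after the opening parenthesis of data.frame(
example_df <- data.frame(letters = c("a", "b", "c"),
                         numbers = c(1, 2, 3))
```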

5.4.6 Code grouping

  • Group related pieces of code together.
  • Separate blocks of code by empty spaces.

Note that this grouping often happens naturally when using tidyverse functions, since they encourage piping ( %>% and + ).

5.4.7 Broader guidelines

  • Omit needless code.
  • Don’t repeat yourself.

We’ll learn more about satisfying these guidelines when we talk about writing your own functions in the next part of the class.

5.5 More with knitr

5.5.1 Equations in knitr

You can write equations in R Markdown documents by setting them apart with dollar signs ($). For an equation on a line by itself (display equation), put two dollar signs ($$) on their own lines before and after the equation, and write the equation itself using LaTeX syntax.

To help with this, you may want to use this LaTeX math cheat sheet. You may also find an online LaTeX equation editor like Codecogs.com helpful.

Note: Equations denoted this way will always compile for pdf documents, but won’t always come through on Markdown files (for example, GitHub won’t compile math equations).

For example, a display equation written this way in your R Markdown file renders like this:

\[ E(Y_{t}) \sim \beta_{0} + \beta_{1}X_{1} \]
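In the .Rmd source, that display equation would be written with the $$ delimiters on their own lines:

```markdown
$$
E(Y_{t}) \sim \beta_{0} + \beta_{1}X_{1}
$$
```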

To put math within a sentence (inline equation), just use one $ on either side of the math. For example, if you write “We are trying to model $E(Y_{t})$.” in an R Markdown file, the rendered document will show:

“We are trying to model \(E(Y_{t})\).”

5.5.2 Figures from file

You can include not only figures that you create with R, but also figures that you have saved on your computer.

The best way to do that is with the include_graphics function in knitr.
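A minimal sketch of such a chunk. The path ../figures/MyFigure.png (a “figures” folder in the parent of the directory holding the .Rmd file) is an example, and echo = FALSE just keeps the code out of the rendered output:

````rmd
```{r echo = FALSE}
knitr::include_graphics("../figures/MyFigure.png")
```
````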

For example, to include a figure with the filename “MyFigure.png” that is saved in the “figures” sub-directory of the parent directory of the directory where your .Rmd file is saved, you would use the relative path “../figures/MyFigure.png”. Don’t forget that you need to give either an absolute path or a path relative to the directory where the .Rmd file is saved.

5.5.3 Saving graphics files

You can save figures that you create in R. Typically, you won’t need to save figures for an R Markdown file, since you can include figure code directly. However, you will sometimes want to save a figure from a script. You have two options:

  • Use the “Export” choice in RStudio
  • Write code to export the figure in your R script

To make your research more reproducible, use the second choice.

To use code to export a figure you created in R, take three steps:

  • Open a graphics device (e.g., pdf("MyFile.pdf") ).
  • Write the code to print your plot.
  • Close the graphics device using dev.off() .

For example, you could save a scatterplot of time versus passes as a pdf named “MyFigure” in the “figures” subdirectory of the current working directory.
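A minimal sketch of the three steps, wrapped in a small function so it works with any data frame that has Time and Passes columns. The function name save_time_passes_plot, the base R plot() call, and the 8 by 6 inch size are assumptions; the usage line assumes the worldcup data from the faraway package:

```r
save_time_passes_plot <- function(df, file = "figures/MyFigure.pdf") {
  dir.create(dirname(file), showWarnings = FALSE, recursive = TRUE)
  pdf(file, width = 8, height = 6)             # 1. open the device (inches)
  plot(df$Time, df$Passes,                     # 2. draw the plot
       xlab = "Time", ylab = "Passes")
  dev.off()                                    # 3. close the device
  invisible(file)
}

# Usage, assuming the faraway package is installed:
# library(faraway); data(worldcup); save_time_passes_plot(worldcup)
```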

If you create multiple plots before you close the device, they’ll all save to different pages of the same pdf file.

You can open a number of different graphics devices. Base R’s grDevices package includes functions that open devices for several file formats, such as pdf, png, jpeg, bmp, tiff, and svg.

You will use a device-specific function to open a graphics device (e.g., pdf). However, you will always close these devices with dev.off().

Most of the functions to open graphics devices include parameters like height and width . These can be used to specify the size of the output figure. The units for these depend on the device (e.g., inches for pdf , pixels by default for png ). Use the helpfile for the function to determine these details.
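A sketch contrasting the size units of pdf() and png(); the file names and sizes are arbitrary, and plot(1:10) is a placeholder plot:

```r
# pdf(): width and height are in inches
pdf("example_plot.pdf", width = 5, height = 4)
plot(1:10)
dev.off()

# png(): width and height are in pixels by default
png("example_plot.png", width = 500, height = 400)
plot(1:10)
dev.off()

# png() can take inches instead if you set units and a resolution
png("example_plot_hires.png", width = 5, height = 4, units = "in", res = 300)
plot(1:10)
dev.off()
```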

5.5.4 Tables in R Markdown

If you want to create a nice, formatted table from an R dataframe, you can do that using kable from the knitr package.
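For example, the table below could come from a small data frame like this one (the object name my_df is hypothetical); inside an R Markdown chunk, the kable() output is rendered as a formatted table:

```r
library(knitr)

my_df <- data.frame(letters = c("a", "b", "c"),
                    numbers = 1:3)
kable(my_df)
```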

letters  numbers
-------  -------
a        1
b        2
c        3

There are a few options for the kable function:

Argument   Explanation
---------  ------------------------------------------------------------------
col.names  Column names (default: column names in the dataframe)
align      A vector giving the alignment for each column (‘l’, ‘c’, ‘r’)
caption    Table caption
digits     Number of digits to round to. If you want to round columns different amounts, use a vector with one element for each column.
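A sketch of a call using these arguments that would give a table like the one below. The numbers column is assumed to be random draws from rnorm(), so your values will differ, and the alignment choice is arbitrary:

```r
library(knitr)

new_df <- data.frame(letters = c("a", "b", "c"),
                     numbers = rnorm(3))
kable(new_df,
      col.names = c("First 3 letters", "First 3 numbers"),
      align = c("c", "r"),
      caption = "My new table",
      digits = 2)
```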
Table 5.1: My new table

First 3 letters  First 3 numbers
---------------  ---------------
a                -1.13
b                 0.21
c                -0.21

From Yihui:

“ Want more features? No, that is all I have. You should turn to other packages for help. I’m not going to reinvent their wheels.”

If you want to do fancier tables, you may want to explore the xtable and pander packages. As a note, these might both be more effective when compiling to pdf, rather than html.

5.6 In-course exercise Chapter 5

For all of today’s tasks, you’ll use the code from last week’s in-course exercise to do the exercises. This week we are not focusing on writing new code, but rather on how to take R code and put it in an R Markdown file, so we can create reports from files that include the original code.

5.6.1 Creating a Markdown document

First, you’ll create a Markdown document, without any R code in it yet.

In RStudio, go to “File” -> “New File” -> “R Markdown”. From the window that brings up, choose “Document” on the left-hand column and “HTML” as the output format. A new file will open in the script pane of your RStudio session. Save this file (you may pick the name and directory). The file extension should be “.Rmd”.

First, before you try to write your own Markdown, try rendering the example that the script includes by default. (This code is always included, as a template, when you first open a new RMarkdown file using the RStudio “New file” interface we used in this example.) Try rendering this default R Markdown example by clicking the “Knit” button at the top of the script file.

For some of you, you may not yet have everything you need on your computer to be able to get this to work. If so, let me know. RStudio usually includes all the necessary tools when you install it, but there may be some exceptions.

If you could get the document to knit, do the following tasks:

  • Look through the HTML document that was created. Compare it to the R Markdown script that created it, and see if you can understand, at least broadly, what’s going on.
  • Look in the directory where you saved the R Markdown file. You should now also see a new, .html file in that folder. Try opening it with a web browser like Safari.
  • Go back to the R Markdown file. Delete everything after the initial header information (everything after the 6th line). In the header information, make sure the title, author, and date are things you’re happy with. If not, change them.
  • Write some text in Markdown in the body of the document that includes:
    • Bold and italic text
    • A list, either ordered or bulleted

5.6.2 Adding in R code

Now incorporate the R code from previous weeks’ exercises into your document. Once you get the document to render with some basic pieces of code in it, try the following:

  • Try some different chunk options. For example, try setting echo = FALSE in some of your code chunks. Similarly, try using the options results = "hide" and include = FALSE .
  • You should have at least one code chunk that generates figures. Try experimenting with the fig.width and fig.height options for the chunk to change the size of the figure.
  • Try using the global options. See if you can switch the echo default value for this document from TRUE (the usual default) to FALSE.

5.6.3 Working with R Markdown documents

Finally, try the following tasks to get some experience working with R Markdown files in RStudio:

  • Go to one of your code chunks. Locate the small gray arrow just to the left of the line where you initiate the code chunk. Click on it and see what happens. Then click on it again.
  • Put your cursor inside one of your code chunks. Try using the “Run” button (or Ctrl-Return) to run code in that chunk at your R console. Did it work?
  • Pick a code chunk in your document. Put your cursor somewhere in the code in that chunk. Click on the “Run” button and choose “Run All Chunks Above”. What did that do? If it did not work, what do you think might be going on? (Hint: Check getwd() and think about which directory you’ve used to save your R Markdown file.)
  • Pick another chunk of code. Put the cursor somewhere in the code for that chunk. Click on the “Run” button and choose “Run Current Chunk”. Then try “Run Next Chunk”. Try to figure out all the options the “Run” button gives you and when each might be useful.
  • Click on the small gray arrow to the right of the “Knit HTML” button. If the option is offered, select “Knit Word” and try it. What does this do?

5.6.4 R style guidelines

Go through all the R code in your R Markdown file. Are there places where your code is not following style conventions for R? Clean up your code to correct any of these issues.

Reproducible Research Course Project 1

This is my submission for Reproducible Research Course Project 1. For more information, see the README on GitHub.

The data for the assignment can be downloaded here.

Loading and preprocessing the data

Show any code that is needed to:

  1. Load the data (i.e. read.csv())
  2. Process/transform the data (if necessary) into a format suitable for your analysis

As we can see, the variables included in this dataset are:

  1. steps: Number of steps taken in a 5-minute interval (missing values are coded as NA)
  2. date: The date on which the measurement was taken in YYYY-MM-DD format
  3. interval: Identifier for the 5-minute interval in which measurement was taken
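One way to load and lightly process the data, as a minimal sketch. The helper name load_activity is hypothetical, this is not necessarily the code used in this submission, and the usage line assumes activity.csv is in the working directory:

```r
# Read the activity data and convert the date column to class Date
load_activity <- function(path = "activity.csv") {
  activity <- read.csv(path, stringsAsFactors = FALSE)
  activity$date <- as.Date(activity$date)  # stored as YYYY-MM-DD text
  activity
}

# Usage:
# activity <- load_activity()
# str(activity)
```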

Total number of steps taken per day

For this part of the assignment, you can ignore the missing values in the dataset.

  1. Calculate the total number of steps taken per day
  2. Make a histogram of the total number of steps taken each day
  3. Calculate and report the mean and median total number of steps taken per day

1. Number of steps per day

2. Histogram of the total number of steps taken each day

3. Mean and median of total number of steps taken per day
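A minimal sketch of these three steps in base R, ignoring NA values as the assignment allows (the helper names daily_totals and summarize_daily are hypothetical, not taken from this submission). Note that aggregate() with a formula drops rows with NA steps, so days with no recorded steps drop out entirely rather than counting as zero:

```r
# 1. Total steps per day (rows with NA steps are dropped)
daily_totals <- function(activity) {
  aggregate(steps ~ date, data = activity, FUN = sum)
}

# 2. and 3. Histogram of daily totals, then their mean and median
summarize_daily <- function(activity) {
  totals <- daily_totals(activity)
  hist(totals$steps, main = "Total steps per day", xlab = "Steps")
  c(mean = mean(totals$steps), median = median(totals$steps))
}
```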

Average daily activity pattern

  1. Make a time series plot (i.e. type = “l”) of the 5-minute interval (x-axis) and the average number of steps taken, averaged across all days (y-axis)
  2. Which 5-minute interval, on average across all the days in the dataset, contains the maximum number of steps?

1. Time series plot of the 5-minute interval (x) and the average number of steps taken, averaged across all days (y)

2. 5-minute interval (on average across all the days) with the maximum number of steps
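A minimal base R sketch of both steps (the helper names interval_means, plot_interval_means, and max_interval are hypothetical):

```r
# Average steps for each 5-minute interval across all days (NAs dropped)
interval_means <- function(activity) {
  aggregate(steps ~ interval, data = activity, FUN = mean)
}

# Time series plot of the interval averages
plot_interval_means <- function(means) {
  plot(means$interval, means$steps, type = "l",
       xlab = "5-minute interval", ylab = "Average number of steps")
}

# Interval with the highest average number of steps
max_interval <- function(means) {
  means$interval[which.max(means$steps)]
}
```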

Imputing missing values

Note that there are a number of days/intervals where there are missing values (coded as NA). The presence of missing days may introduce bias into some calculations or summaries of the data.

  1. Calculate and report the total number of missing values in the dataset (i.e. the total number of rows with NAs)
  2. Devise a strategy for filling in all of the missing values in the dataset. The strategy does not need to be sophisticated. For example, you could use the mean/median for that day, or the mean for that 5-minute interval, etc.
  3. Create a new dataset that is equal to the original dataset but with the missing data filled in.
  4. Make a histogram of the total number of steps taken each day and calculate and report the mean and median total number of steps taken per day. Do these values differ from the estimates from the first part of the assignment? What is the impact of imputing missing data on the estimates of the total daily number of steps?

1. Total number of missing values in the dataset

2. Replace missing values: the rounded average for each 5-minute interval is used to replace the NA values. CompleteSteps is the new column without missing values.

3. New dataset that is equal to the original dataset but with the missing data filled in. The first ten values of the new dataset are shown below.
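A minimal sketch of the strategy described above (rounded interval averages stored in a new CompleteSteps column). The helper names count_missing and impute_interval_mean are hypothetical, and an interval with no observed values at all would still be left missing (NaN):

```r
# Number of rows with missing step counts
count_missing <- function(activity) sum(is.na(activity$steps))

# Replace each NA with the rounded mean for the same 5-minute interval
impute_interval_mean <- function(activity) {
  means <- tapply(activity$steps, activity$interval, mean, na.rm = TRUE)
  filled <- round(means[as.character(activity$interval)])
  activity$CompleteSteps <- ifelse(is.na(activity$steps), filled, activity$steps)
  activity
}
```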

4A. Histogram of the total number of steps taken each day with missing data filled in

4B. Calculate and report the mean and median total number of steps taken per day. Do these values differ from the estimates from the first part of the assignment? What is the impact of imputing missing data on the estimates of the total daily number of steps?

Imputing missing data has only a small, negligible impact on the mean and the median of the total daily number of steps. Looking at the histogram, we can see that the only bin that changes is the interval between 10000 and 12500 steps, which grows from a frequency of 18 to a frequency of 26. Different methods for replacing missing values could give different results.

Are there differences in activity patterns between weekdays and weekends?

For this part, the weekdays() function may be of some help. Use the dataset with the filled-in missing values for this part.

  1. Create a new factor variable in the dataset with two levels, “weekday” and “weekend”, indicating whether a given date is a weekday or weekend day.
  2. Make a panel plot containing a time series plot (i.e. type = “l”) of the 5-minute interval (x-axis) and the average number of steps taken, averaged across all weekday days or weekend days (y-axis).

1. Create a new factor variable in the dataset with two levels - “weekday” and “weekend” indicating whether a given date is a weekday or weekend day. DayType is the new column indicating if the day is a weekday day or a weekend day: the first ten values of the new table are shown below
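A minimal sketch of this step. Because weekdays() returns day names in the current locale, this version uses as.POSIXlt(date)$wday instead (0 = Sunday, 6 = Saturday); the helper name add_day_type is hypothetical, and the date column is assumed to be of class Date:

```r
add_day_type <- function(activity) {
  wday <- as.POSIXlt(activity$date)$wday
  activity$DayType <- factor(ifelse(wday %in% c(0, 6), "weekend", "weekday"),
                             levels = c("weekday", "weekend"))
  activity
}
```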

2. Two time series plots of the 5-minute interval (x) and the average number of steps taken, averaged across weekday days or weekend days (y).
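A sketch of such a panel plot using the lattice package, which ships with R. The helper name plot_day_type_panels is hypothetical, and it assumes the filled-in step counts are in a column named steps and the day type in a factor column named DayType:

```r
library(lattice)

# Average steps per interval within each day type, as two stacked panels
plot_day_type_panels <- function(activity) {
  means <- aggregate(steps ~ interval + DayType, data = activity, FUN = mean)
  xyplot(steps ~ interval | DayType, data = means, type = "l",
         layout = c(1, 2), xlab = "Interval", ylab = "Number of steps")
}
```

The function returns a lattice object; in a script, wrap the call in print() to draw it.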

mGalarnyk / PA1_Template.md


GitHub repo for the rest of the specialization: Data Science Coursera

Introduction

It is now possible to collect a large amount of data about personal movement using activity monitoring devices such as a Fitbit, Nike Fuelband, or Jawbone Up. These types of devices are part of the “quantified self” movement – a group of enthusiasts who take measurements about themselves regularly to improve their health, to find patterns in their behavior, or because they are tech geeks. But these data remain under-utilized both because the raw data are hard to obtain and because there is a lack of statistical methods and software for processing and interpreting the data.

This assignment makes use of data from a personal activity monitoring device. This device collects data at 5 minute intervals throughout the day. The data consists of two months of data from an anonymous individual collected during the months of October and November, 2012 and include the number of steps taken in 5 minute intervals each day.

The data for this assignment can be downloaded from the course web site:

  • Dataset: Activity monitoring data

The variables included in this dataset are:

  • steps: Number of steps taken in a 5-minute interval (missing values are coded as NA)
  • date: The date on which the measurement was taken in YYYY-MM-DD format
  • interval: Identifier for the 5-minute interval in which measurement was taken

The dataset is stored in a comma-separated-value (CSV) file and there are a total of 17,568 observations in this dataset.

Loading and preprocessing the data

Unzip data to obtain a csv file.

Reading csv data into a data.table.

What is the mean total number of steps taken per day?

  • Calculate the total number of steps taken per day
  • If you do not understand the difference between a histogram and a barplot, research the difference between them. Make a histogram of the total number of steps taken each day.

  • Calculate and report the mean and median of the total number of steps taken per day

What is the average daily activity pattern?

  • Make a time series plot (i.e. type = "l") of the 5-minute interval (x-axis) and the average number of steps taken, averaged across all days (y-axis)

  • Which 5-minute interval, on average across all the days in the dataset, contains the maximum number of steps?

Imputing missing values

  • Calculate and report the total number of missing values in the dataset (i.e. the total number of rows with NAs)
  • Devise a strategy for filling in all of the missing values in the dataset. The strategy does not need to be sophisticated. For example, you could use the mean/median for that day, or the mean for that 5-minute interval, etc.
  • Create a new dataset that is equal to the original dataset but with the missing data filled in.
  • Make a histogram of the total number of steps taken each day and calculate and report the mean and median total number of steps taken per day. Do these values differ from the estimates from the first part of the assignment? What is the impact of imputing missing data on the estimates of the total daily number of steps?

Type of estimate                          Mean_Steps  Median_Steps
----------------------------------------  ----------  ------------
First part (with NA)                      10765       10765
Second part (filling in NA with median)   9354.23     10395

Are there differences in activity patterns between weekdays and weekends?

  • Create a new factor variable in the dataset with two levels – “weekday” and “weekend” indicating whether a given date is a weekday or weekend day.
  • Make a panel plot containing a time series plot (i.e. type = "l") of the 5-minute interval (x-axis) and the average number of steps taken, averaged across all weekday days or weekend days (y-axis). See the README file in the GitHub repository to see an example of what this plot should look like using simulated data.

lafnian1990 commented Dec 15, 2020

It is a popular consumer monitoring technique. I know big companies use eye-tracking or smart pick systems. This allows you to track the emotional state and stressful situations when choosing a product. The first site to use the smart gatherer system was an educational resource for students. This is a complex selection and variability of emotions. This system was supposed to show the productivity of the brain in stressful situations.

finistratbob commented Mar 17, 2021

It seems to me that you are talking about automation or false hopes. I recommend that you read a few books by Ray Bradbury https://freebooksummary.com/category/dark-they-were-and-golden-eyed or Elon Musk. Futurists will help you answer your questions. I like science fiction stories and it's not a secret. This forum is about programming.

AngelaLifman commented Apr 20, 2021

Peer-graded Assignment: Course Project 1

This is an R Markdown document for “Course Project 1” of “Reproducible Research” in Coursera.

1. Code for reading in the dataset and/or processing the data

Initialization, reading dataset and additional preparation for the next operations.

2. Histogram of the total number of steps taken each day

You can see the histogram of the total number of steps taken each day. The most frequent range is between 10,000 and 15,000 steps, and the distribution looks roughly normal. Days with more than 20,000 steps are rare. The R script simply uses tapply() to total “steps” by “date”.

3. Mean and median number of steps taken each day

The mean and median numbers of steps taken each day are 1.0767 × 10^4 and 10766, respectively. The median is rounded using “round()”.
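One way to compute these statistics, sketched on hypothetical data: the formula interface of aggregate() drops rows with NA by default (na.action = na.omit), so a day with no recorded steps disappears from the daily totals instead of being counted as 0.

```r
# Hypothetical sample; 2012-10-02 has no recorded steps at all.
activity <- data.frame(
  steps = c(100, 200, NA, NA, 0, 400, 300, 300),
  date  = as.Date(rep(c("2012-10-01", "2012-10-02",
                        "2012-10-03", "2012-10-04"), each = 2))
)

# Daily totals; rows with NA steps are removed before summing.
daily <- aggregate(steps ~ date, data = activity, FUN = sum)

mean(daily$steps)
median(daily$steps)
```

When such values are reported inline in R Markdown, knitr may print large numbers in scientific notation; formatting them first, for example with format() or round(), is one way to get plain digits.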

4. Time series plot of the average number of steps taken

Here are the R script and chart showing the average number of steps for each interval. The script cleans the activity data and plots the average number of steps for each 5-minute interval, after first converting the intervals to clock times. Many steps were observed around 9:00 AM, probably because of commuting.
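A sketch of averaging steps per interval and turning intervals into clock times, assuming (as in the course data) that interval is an HHMM-style integer, so 835 means 08:35. The data frame is hypothetical. Because HHMM values jump from 55 to 100 at each hour, the plot uses minutes since midnight to keep the x-axis evenly spaced.

```r
# Hypothetical sample: two days, three intervals each.
activity <- data.frame(
  steps    = c(0, 10, 200, NA, 20, 180),
  interval = c(0, 5, 835, 0, 5, 835)
)

# Average steps per interval across days; NA rows are dropped first.
interval_mean <- aggregate(steps ~ interval, data = activity, FUN = mean)

# Zero-pad to four digits ("0835") and insert a colon ("08:35").
hhmm <- sprintf("%04d", as.integer(interval_mean$interval))
interval_mean$time <- paste0(substr(hhmm, 1, 2), ":", substr(hhmm, 3, 4))

# Minutes since midnight: hours * 60 + minutes.
interval_mean$minutes <- (interval_mean$interval %/% 100) * 60 +
  interval_mean$interval %% 100

interval_mean

plot(interval_mean$minutes, interval_mean$steps, type = "l",
     xlab = "Minutes since midnight", ylab = "Average number of steps")
```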

5. The 5-minute interval that, on average, contains the maximum number of steps

The 5-minute interval that, on average, contains the maximum number of steps is 806.
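Finding the busiest interval reduces to which.max() on the per-interval averages. The averages below are hypothetical placeholders, not results from the course data; the last line also converts the HHMM-style identifier into a clock time, since it helps to state which of the two a reported value is.

```r
# Hypothetical per-interval averages (interval coded as HHMM).
interval_mean <- data.frame(interval = c(800, 805, 835, 840),
                            steps    = c(150.2, 170.9, 206.2, 195.9))

# which.max() returns the index of the first maximum.
top <- interval_mean[which.max(interval_mean$steps), ]
top$interval

# Same interval expressed as a clock time.
clock <- sprintf("%02d:%02d", top$interval %/% 100, top$interval %% 100)
clock
```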

6. Code to describe and show a strategy for imputing missing data

The number of missing values (coded as “NA”) is 2304.
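A sketch of counting missing values and then imputing them. The assignment suggests simple strategies such as the mean for that 5-minute interval; this example uses that one, on a small hypothetical data frame. It is not necessarily the strategy used in the writeup above.

```r
# Hypothetical sample: two days, three intervals each, two NAs.
activity <- data.frame(
  steps    = c(NA, 10, 200, 4, NA, 180),
  date     = as.Date(rep(c("2012-10-01", "2012-10-02"), each = 3)),
  interval = rep(c(0, 5, 835), times = 2)
)

sum(is.na(activity$steps))   # number of missing values

# Mean of each interval over the days on which it was observed;
# the result is named by interval ("0", "5", "835").
interval_mean <- tapply(activity$steps, activity$interval, mean, na.rm = TRUE)

# Work on a copy so the raw data stay untouched.
imputed <- activity
missing <- is.na(imputed$steps)

# Look up each missing row's interval mean by name.
imputed$steps[missing] <- interval_mean[as.character(imputed$interval[missing])]

imputed
```

If an interval were missing on every day, its mean would be NaN and the corresponding rows would stay missing, so checking anyNA(imputed$steps) afterwards is a cheap safeguard.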

7. Histogram of the total number of steps taken each day after missing values are imputed

In comparison with the previous histogram shown in #2, this one shows the total number of steps taken each day after missing values are imputed, that is, with all “NA” values filled in. The difference between the two histograms is very small.
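A sketch of putting the before and after histograms side by side, using hypothetical raw and imputed step vectors. Daily totals before imputation use na.rm = TRUE; after imputation there should be no NA left to remove.

```r
# Hypothetical steps before and after interval-mean imputation.
steps_raw <- c(NA, 10, 200, 4, NA, 180)
steps_imp <- c(4, 10, 200, 4, 10, 180)
date      <- rep(c("2012-10-01", "2012-10-02"), each = 3)

before <- tapply(steps_raw, date, sum, na.rm = TRUE)
after  <- tapply(steps_imp, date, sum)
rbind(before, after)

# Two panels in one row; the old graphics settings are restored afterwards.
op <- par(mfrow = c(1, 2))
hist(before, main = "Before imputation", xlab = "Steps per day")
hist(after,  main = "After imputation",  xlab = "Steps per day")
par(op)
```

Comparing mean() and median() of the two vectors alongside the plots makes small differences easier to quantify than by eye.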

8. Panel plot comparing the average number of steps taken per 5-minute interval across weekdays and weekends

In comparison with #4, here are the R script and chart showing the average number of steps for each interval on weekends and weekdays. Activity on weekday mornings is clearly very high. The timing of the morning commute also appears to be much more regular than that of the evening commute.
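A sketch of the weekday/weekend panel plot using lattice, a recommended package that ships with standard R installations. The data frame is hypothetical; the day type is derived from the wday component of as.POSIXlt() (0 = Sunday, 6 = Saturday), and the conditioning term | day_type produces one panel per level.

```r
library(lattice)

# Hypothetical sample: two weekdays and two weekend days, two intervals each.
activity <- data.frame(
  steps    = c(0, 200, 10, 50, 2, 180, 30, 60),
  date     = as.Date(rep(c("2012-10-05", "2012-10-06",
                           "2012-10-08", "2012-10-07"), each = 2)),
  interval = rep(c(0, 835), times = 4)
)
activity$day_type <- factor(
  ifelse(as.POSIXlt(activity$date)$wday %in% c(0, 6), "weekend", "weekday"),
  levels = c("weekday", "weekend"))

# Average steps for each interval within each day type.
avg <- aggregate(steps ~ interval + day_type, data = activity, FUN = mean)

# layout = c(1, 2) stacks the two panels in one column.
print(xyplot(steps ~ interval | day_type, data = avg, type = "l",
             layout = c(1, 2),
             xlab = "Interval", ylab = "Average number of steps"))
```

The explicit print() matters when the code runs inside a function or a script, where lattice objects are not drawn automatically.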

  • Project Assignment 1 - Coursera Course Reproducible Research
  • by Philip Ohlsson
  • Last updated over 9 years ago


COMMENTS

  1. Natasha-R/Reproducible-Research-Course-Project-1

    This is my submission for the Coursera assignment for the Reproducible Research course. The files in this repo are: the R Markdown document, containing the R code and written explanations, can be found in the file PA1_template.Rmd. The same document in Markdown format can be found at PA1_template.md, and the HTML version at PA1_template.html. The "figure" folder contains all of the ...

  2. schen57/Reproducible-Research-Course-Project-1

    Assignment instructions: 1. Code for reading in the dataset and/or processing the data 2. Histogram of the total number of steps taken each day 3. Mean and median number of steps taken each day 4. Time series plot of the average number of steps taken 5. The 5-minute interval that, on average, contains the maximum number of steps 6. Code to describe and show a strategy for imputing missing data 7 ...

  3. Course Project 1 Reproducible Research

    Course Project 1 aims to answer questions and accomplish tasks to create reproducible research. The dataset used for this assignment is repdata_data_activity.zip, which contains data about personal movement (steps): 279 kB (calculated using the pryr package), 3 variables, 17,568 observations.

  4. emilesilvis/reproducible_research_course_project_1

    This project forms part of the Reproducible Research course on Coursera. The goal of this project is to give the student an opportunity to conduct an analysis that is reproducible: "the ability of an entire [study] to be reproduced, either by the researcher or by someone else working independently" (Wikipedia).

  5. RPubs

    Course Project 1 of Reproducible Research by Johns Hopkins University on Coursera, by Anjana Ramesh.

  6. Reproducible Research: Course Project 1

    About. Reproducible Research (one of the Data Science Specialization courses) Course Project 1 makes use of data from a personal activity monitoring device. This device collects data at 5-minute intervals throughout the day. The data consist of two months of data from an anonymous individual, collected during October and November 2012, and include the number of steps taken in 5 ...

  7. Reproducible Research: Course Project 1

    1. Calculate the total number of steps taken per day. 2. If you do not understand the difference between a histogram and a barplot, research the difference between them. Make a histogram of the total number of steps taken each day. 3. Calculate and report the mean and median of the total number of steps taken per day.

  8. Chapter 5 Reproducible research #1

    Here are the basics of opening and rendering an R Markdown file in RStudio: To open a new R Markdown file, go to "File" -> "New File" -> "RMarkdown…" -> for now, choose a "Document" in "HTML" format. This will open a new R Markdown file in RStudio. The file extension for R Markdown files is ".Rmd".

  9. Reproducible-Research-Course-Project-1/Reproducible_Research ...

    First project of the Coursera course Reproducible Research - schen57/Reproducible-Research-Course-Project-1

  10. Reproducible Research Course Project 1

    For this part of the assignment, you can ignore the missing values in the dataset. 1. Calculate the total number of steps taken per day. 2. Make a histogram of the total number of steps taken each day. 3. Calculate and report the mean and median total number of steps taken per day. ...

  11. Reproducible Research: Course Project One

    Reproducible Research: Course Project One, Adam Blanch, 4/23/2020. This has been published on RPubs so the HTML can be viewed without needing to fork the repository or knit the R Markdown file. ... For questions 1 to 3 we found that 8 of 61 days contain no activity data; 0 days contain missing data. So we choose to just ignore the missing days.

  12. RPubs

    Reproducible Research Week 2 Course Project 1, by Jason Pemberton. Last updated over 2 years ago.

  13. RPubs

    Coursera - Reproducible research - Course project 1, by Julien Balmont. Last updated almost 8 years ago.

  14. Reproducible Research Week 2 Course Project 1

    Reproducible Research Week 2 Course Project 1, Jeff Charatan. Introduction. It is now possible to collect a large amount of data about personal movement using activity monitoring devices such as a Fitbit, Nike Fuelband, or Jawbone Up. These types of devices are part of the "quantified self" movement - a group of enthusiasts who take ...

  15. Coursera / Johns Hopkins Reproducible Research Assignment: Course Project 1

  16. RPubs

    Reproducible Research Course Project 1, by Josias Alvarenga, about 7 years ago.

  17. Reproducible Research. week 2. Course Project 1

    Reproducible Research. week 2. Course Project 1. 0. Turn off scientific notation: options(scipen = 999). 0.1. We load the libraries that we are going to use: packages <- c('dplyr', #For data manipulation. 'lubridate', #To work with date-times and time-spans. 'ggplot2', #For graphics 'sqldf', #configure and transparently import a database ...

  18. Reproducible Research Project 1 John Hopkins Data Science

    The strategy does not need to be sophisticated. For example, you could use the mean/median for that day, or the mean for that 5-minute interval, etc. # Filling in missing values with median of dataset. activityDT[is.na(steps), "steps"] <- activityDT[, c(lapply(.SD, median, na.rm = TRUE)), .SDcols = c("steps")] Create a new dataset that ...

  19. RPubs

    [Course Project 1] Reproducible Research; by Anderson Hitoshi Uyekita; last updated about 2 years ago.

  20. Bharath-kumar-R/Reproducible-Research-Week-2-Course-Project-1

    The data for this assignment can be downloaded from the course web site: Dataset: Activity monitoring data [52K]. The variables included in this dataset are: steps: number of steps taken in a 5-minute interval (missing values are coded as NA); date: the date on which the measurement was taken, in YYYY-MM-DD format; interval: identifier for the 5-minute interval in which the measurement was taken.