Jordan Tanley 2022-07-10

Outline

This project used the Online News Popularity dataset with the goal of predicting the number of shares for news articles in 6 different genres. We started with the Social Media genre and eventually expanded the code to cover all 6.

My contributions involved:

  • reading in the data
  • data cleaning / adding a new weekday variable for EDA
  • text describing interpretations for all EDA
  • 1 contingency table and 3 other numerical summary tables
  • 3 plots: 1 box plot and 2 scatterplots
  • set up the training and testing subsets
  • 1 linear model and explanation
  • random forest model and explanation
  • completed and automated the model comparisons
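
To give a sense of what automating the comparison step can look like, here is a minimal Python sketch. It is not the project's code: the RMSE metric, the function names `rmse` and `compare_models`, the model labels, and all of the numbers are assumptions made up for illustration. The idea is simply to score each model's test-set predictions with the same metric and pick the lowest error.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length, non-empty sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if len(actual) == 0:
        raise ValueError("actual and predicted must be non-empty")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def compare_models(actual, predictions):
    """Score each model's test-set predictions and rank them by RMSE.

    `predictions` maps a model name to its list of predicted values.
    Returns (best_name, ranking), where ranking is a list of
    (name, rmse) pairs sorted from lowest to highest error.
    """
    ranking = sorted(
        ((name, rmse(actual, preds)) for name, preds in predictions.items()),
        key=lambda pair: pair[1],
    )
    return ranking[0][0], ranking


# Made-up values purely for illustration, not results from the project.
test_shares = [100, 200, 300, 400]
test_predictions = {
    "linear_model": [110, 190, 310, 390],
    "random_forest": [100, 205, 295, 400],
}

best, ranking = compare_models(test_shares, test_predictions)
for name, score in ranking:
    print(f"{name}: RMSE = {score:.2f}")
print("Best model:", best)
```

Because the comparison only needs a dictionary of predictions, adding another model is a matter of adding one more entry, which is what makes it easy to rerun for each genre.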

What would you do differently?

I feel like this project was more straightforward than the previous one. There wasn’t much room for confusion, which I appreciated. Looking back, I would’ve liked to get an earlier start. Don’t get me wrong, we finished plenty early, but with the other courses I’m currently taking, it would’ve been nice to have this project out of the way before the other projects that were due this week. Overall, though, I feel like I gave this project my full focus, and there’s not much else I’d change.

What was the most difficult part for you?

The most difficult part of my portion of the project was the model comparisons, although it wasn’t that difficult once I wrapped my head around what I needed to do.

What are your big take-aways from this project?

I think it’s cool to see several different prediction methods in action and to be able to compare them all. I can see how this is helpful in a lot of fields.
