Tanley-Wood-Project2
Jordan Tanley and Jonathan Wood 2022-07-05
Introduction - Jonathan
Data
This analysis uses the Online News Popularity dataset, which contains a set of features describing articles published by Mashable.com over a two-year period.
The goal of this project is to predict the number of shares an article receives (how many times it was shared over social media). We will use these predictions to gauge how popular a new article is likely to be.
Notable Variables
While there are 61 variables in the data set, we will not use all of them for this project. The notable variables are the following:
- “shares” - the number of shares the article has gotten over social media. This is the label or variable we want our models to predict for new articles
- “data_channel_is” - a set of variables that tells if the article is in a particular category, such as business, sports, or lifestyle.
- “weekday_is” - a set of variables that tells what day of the week the article was published on.
- “num_keywords” - the number of keywords within the article
- “num_imgs” - the number of images within the article
- “num_videos” - the number of videos within the article
Methods
Multiple methods will be used for this project to predict the number of shares a new article can generate, including
- Linear regression
- Tree-based models
- Random forest
- Boosted tree
Data - Jordan
To read in the data using the relative path below, be sure to have the data file saved in an OnlineNewsPopularity folder inside your working directory.
# read in the data
news <- read_csv("OnlineNewsPopularity/OnlineNewsPopularity.csv")
## Rows: 39644 Columns: 61
## ── Column specification ─────────────────────────────────────────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): url
## dbl (60): timedelta, n_tokens_title, n_tokens_content, n_unique_tokens, n_non_stop_words, n_non_stop_unique_token...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# sneak peek at the dataset
head(news)
# Creating a weekday variable (basically undoing the 7 dummy variables that came with the data) for EDA
news$weekday <- ifelse(news$weekday_is_friday == 1, "Friday",
ifelse(news$weekday_is_monday == 1, "Monday",
ifelse(news$weekday_is_tuesday == 1, "Tuesday",
ifelse(news$weekday_is_wednesday == 1, "Wednesday",
ifelse(news$weekday_is_thursday == 1, "Thursday",
ifelse(news$weekday_is_saturday == 1, "Saturday",
"Sunday"))))))
Next, let’s subset the data so that we only look at the data channel of interest. This report is parameterized by channel; the output shown here is for the “world” data channel.
# Subset the data to one of the parameterized data channels and drop unnecessary variables
chan <- paste0("data_channel_is_", params$channel)
print(chan)
## [1] "data_channel_is_world"
filtered_channel <- news %>%
as_tibble() %>%
# .data[[chan]] looks up the channel column by name inside the pipeline
filter(.data[[chan]] == 1) %>%
select(-c(url, timedelta))
# take a peek at the data
filtered_channel %>%
select(ends_with(chan))
Summarizations - Both (at least 3 plots each)
For the numerical summaries, we can look at several aspects. Contingency tables allow us to examine the frequencies of categorical (or discrete) variables. The first output below, for example, shows the counts for each weekday. Similarly, the fifth table shows the frequencies of the number of tokens in the article content. Another set of summary statistics to look at is the five-number summary, which provides the minimum, first quartile (Q1), median, third quartile (Q3), and maximum of a particular variable. It may also be helpful to look at the average. Together these help in assessing skewness (a mean close to the median suggests symmetry, while a mean well above or below the median suggests right or left skew, respectively) and in looking for outliers (anything more than 1.5 × (Q3 - Q1) below Q1 or above Q3 is generally considered an outlier). Below, five-number summaries (plus the mean) are shown for shares, number of words in the content, number of words in the content for the upper quartile of shares, number of images in the article, number of videos in the article, positive word rate, and negative word rate.
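As a minimal sketch of the outlier and skewness checks described above, the base R snippet below computes the 1.5 × IQR fences for a small vector. The values in `x` are hypothetical and chosen only for illustration; they are not taken from the news data.

```r
# Hypothetical share counts; the last value is deliberately extreme
x <- c(35, 827, 900, 1100, 1500, 1900, 284700)

# First and third quartiles, and the interquartile range
q <- quantile(x, probs = c(0.25, 0.75), names = FALSE)
iqr <- q[2] - q[1]

# Fences: values beyond these are flagged as potential outliers
lower_fence <- q[1] - 1.5 * iqr
upper_fence <- q[2] + 1.5 * iqr
outliers <- x[x < lower_fence | x > upper_fence]

# A mean well above the median suggests right skew
right_skewed <- mean(x) > median(x)

outliers
right_skewed
```

The same steps could be applied to any numeric column of `filtered_channel`, such as `shares`.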
# Contingency table of frequencies for days of the week, added caption for clarity
kable(table(filtered_channel$weekday),
col.names = c("Weekday", "Frequency"),
caption = "Contingency table of frequencies for days of the week")
Weekday | Frequency |
---|---|
Friday | 1305 |
Monday | 1356 |
Saturday | 519 |
Sunday | 567 |
Thursday | 1569 |
Tuesday | 1546 |
Wednesday | 1565 |
Contingency table of frequencies for days of the week
# Numerical Summary of Shares, added caption for clarity
filtered_channel %>% summarise(Minimum = min(shares),
Q1 = quantile(shares, prob = 0.25),
Average = mean(shares),
Median = median(shares),
Q3 = quantile(shares, prob = 0.75),
Maximum = max(shares)) %>%
kable(caption = "Numerical Summary of Shares")
Minimum | Q1 | Average | Median | Q3 | Maximum |
---|---|---|---|---|---|
35 | 827 | 2287.734 | 1100 | 1900 | 284700 |
Numerical Summary of Shares
# Numerical Summary of Number of words in the content, added caption for clarity
filtered_channel %>% summarise(Minimum = min(n_tokens_content),
Q1 = quantile(n_tokens_content, prob = 0.25),
Average = mean(n_tokens_content),
Median = median(n_tokens_content),
Q3 = quantile(n_tokens_content, prob = 0.75),
Maximum = max(n_tokens_content)) %>%
kable(caption = "Numerical Summary of Number of words in the content")
Minimum | Q1 | Average | Median | Q3 | Maximum |
---|---|---|---|---|---|
0 | 332 | 597.2814 | 509 | 768 | 7081 |
Numerical Summary of Number of words in the content
# Numerical Summary of Number of words in the content for the upper quantile of Shares, added caption for clarity
filtered_channel %>% filter(shares > quantile(shares, prob = 0.75)) %>%
summarise(Minimum = min(n_tokens_content),
Q1 = quantile(n_tokens_content, prob = 0.25),
Average = mean(n_tokens_content),
Median = median(n_tokens_content),
Q3 = quantile(n_tokens_content, prob = 0.75),
Maximum = max(n_tokens_content)) %>%
kable(caption = "Numerical Summary of Number of words in the content for the upper quantile of Shares")
Minimum | Q1 | Average | Median | Q3 | Maximum |
---|---|---|---|---|---|
0 | 303 | 598.1955 | 476 | 761 | 4661 |
Numerical Summary of Number of words in the content for the upper quantile of Shares
kable(table(filtered_channel$n_tokens_content),
col.names = c("Tokens", "Frequency"),
caption = "Contingency table of frequencies for number of tokens in the article content")
Tokens | Frequency |
---|---|
0 | 259 |
29 | 1 |
32 | 1 |
34 | 2 |
37 | 1 |
39 | 1 |
41 | 2 |
42 | 2 |
47 | 1 |
48 | 1 |
51 | 1 |
53 | 1 |
55 | 1 |
56 | 1 |
57 | 2 |
59 | 1 |
60 | 1 |
63 | 2 |
64 | 1 |
65 | 1 |
71 | 2 |
72 | 1 |
73 | 2 |
75 | 1 |
76 | 1 |
77 | 1 |
79 | 1 |
80 | 2 |
82 | 1 |
83 | 1 |
84 | 1 |
85 | 2 |
87 | 2 |
90 | 1 |
91 | 1 |
92 | 1 |
94 | 1 |
95 | 2 |
96 | 3 |
97 | 2 |
98 | 3 |
100 | 1 |
101 | 1 |
102 | 4 |
103 | 4 |
104 | 3 |
105 | 2 |
107 | 3 |
108 | 2 |
109 | 4 |
110 | 1 |
111 | 5 |
112 | 1 |
113 | 3 |
114 | 8 |
115 | 2 |
116 | 3 |
117 | 2 |
118 | 6 |
119 | 1 |
120 | 1 |
121 | 5 |
122 | 2 |
123 | 2 |
124 | 1 |
125 | 3 |
126 | 6 |
127 | 4 |
128 | 4 |
129 | 2 |
130 | 2 |
131 | 2 |
132 | 2 |
134 | 4 |
135 | 6 |
136 | 3 |
137 | 4 |
138 | 6 |
139 | 3 |
140 | 3 |
141 | 4 |
142 | 5 |
143 | 6 |
144 | 6 |
145 | 1 |
146 | 2 |
147 | 7 |
148 | 6 |
149 | 8 |
150 | 3 |
151 | 3 |
152 | 1 |
153 | 7 |
154 | 5 |
155 | 7 |
156 | 9 |
157 | 6 |
158 | 4 |
159 | 5 |
160 | 2 |
161 | 4 |
162 | 5 |
163 | 2 |
164 | 5 |
165 | 6 |
166 | 6 |
167 | 3 |
168 | 3 |
169 | 2 |
170 | 3 |
171 | 8 |
172 | 4 |
173 | 1 |
174 | 2 |
175 | 8 |
176 | 3 |
177 | 2 |
178 | 2 |
179 | 5 |
180 | 6 |
181 | 8 |
182 | 4 |
184 | 7 |
185 | 4 |
186 | 6 |
187 | 10 |
188 | 3 |
189 | 3 |
190 | 11 |
191 | 7 |
192 | 7 |
193 | 5 |
194 | 7 |
195 | 3 |
196 | 9 |
197 | 5 |
198 | 6 |
199 | 8 |
200 | 6 |
201 | 3 |
202 | 3 |
203 | 6 |
204 | 6 |
205 | 9 |
206 | 6 |
207 | 9 |
208 | 7 |
209 | 13 |
210 | 4 |
211 | 7 |
212 | 9 |
213 | 6 |
214 | 8 |
215 | 9 |
216 | 4 |
217 | 7 |
218 | 5 |
219 | 5 |
220 | 7 |
221 | 12 |
222 | 11 |
223 | 11 |
224 | 8 |
225 | 18 |
226 | 10 |
227 | 5 |
228 | 10 |
229 | 13 |
230 | 12 |
231 | 13 |
232 | 8 |
233 | 9 |
234 | 12 |
235 | 7 |
236 | 14 |
237 | 11 |
238 | 11 |
239 | 14 |
240 | 10 |
241 | 10 |
242 | 9 |
243 | 2 |
244 | 6 |
245 | 7 |
246 | 9 |
247 | 10 |
248 | 9 |
249 | 10 |
250 | 9 |
251 | 10 |
252 | 8 |
253 | 13 |
254 | 7 |
255 | 11 |
256 | 10 |
257 | 4 |
258 | 14 |
259 | 12 |
260 | 9 |
261 | 13 |
262 | 9 |
263 | 12 |
264 | 8 |
265 | 7 |
266 | 6 |
267 | 7 |
268 | 7 |
269 | 9 |
270 | 9 |
271 | 11 |
272 | 10 |
273 | 11 |
274 | 7 |
275 | 12 |
276 | 16 |
277 | 7 |
278 | 7 |
279 | 16 |
280 | 11 |
281 | 17 |
282 | 10 |
283 | 11 |
284 | 9 |
285 | 11 |
286 | 15 |
287 | 7 |
288 | 10 |
289 | 11 |
290 | 7 |
291 | 8 |
292 | 18 |
293 | 9 |
294 | 13 |
295 | 15 |
296 | 11 |
297 | 9 |
298 | 13 |
299 | 16 |
300 | 11 |
301 | 13 |
302 | 11 |
303 | 19 |
304 | 11 |
305 | 9 |
306 | 14 |
307 | 13 |
308 | 9 |
309 | 8 |
310 | 8 |
311 | 14 |
312 | 10 |
313 | 15 |
314 | 8 |
315 | 19 |
316 | 17 |
317 | 19 |
318 | 11 |
319 | 8 |
320 | 19 |
321 | 10 |
322 | 12 |
323 | 17 |
324 | 13 |
325 | 16 |
326 | 10 |
327 | 17 |
328 | 16 |
329 | 14 |
330 | 15 |
331 | 10 |
332 | 14 |
333 | 17 |
334 | 10 |
335 | 16 |
336 | 16 |
337 | 13 |
338 | 15 |
339 | 13 |
340 | 14 |
341 | 13 |
342 | 13 |
343 | 9 |
344 | 13 |
345 | 11 |
346 | 14 |
347 | 12 |
348 | 15 |
349 | 11 |
350 | 15 |
351 | 12 |
352 | 11 |
353 | 8 |
354 | 15 |
355 | 15 |
356 | 9 |
357 | 14 |
358 | 19 |
359 | 6 |
360 | 11 |
361 | 14 |
362 | 15 |
363 | 17 |
364 | 14 |
365 | 8 |
366 | 11 |
367 | 12 |
368 | 14 |
369 | 10 |
370 | 12 |
371 | 9 |
372 | 13 |
373 | 11 |
374 | 13 |
375 | 16 |
376 | 14 |
377 | 13 |
378 | 8 |
379 | 13 |
380 | 16 |
381 | 11 |
382 | 8 |
383 | 13 |
384 | 14 |
385 | 13 |
386 | 10 |
387 | 7 |
388 | 14 |
389 | 7 |
390 | 15 |
391 | 17 |
392 | 18 |
393 | 17 |
394 | 13 |
395 | 20 |
396 | 10 |
397 | 13 |
398 | 7 |
399 | 13 |
400 | 9 |
401 | 13 |
402 | 11 |
403 | 11 |
404 | 18 |
405 | 11 |
406 | 12 |
407 | 7 |
408 | 15 |
409 | 9 |
410 | 13 |
411 | 13 |
412 | 6 |
413 | 14 |
414 | 7 |
415 | 15 |
416 | 16 |
417 | 16 |
418 | 15 |
419 | 13 |
420 | 12 |
421 | 14 |
422 | 9 |
423 | 10 |
424 | 13 |
425 | 12 |
426 | 4 |
427 | 8 |
428 | 13 |
429 | 17 |
430 | 7 |
431 | 12 |
432 | 14 |
433 | 11 |
434 | 17 |
435 | 15 |
436 | 14 |
437 | 20 |
438 | 13 |
439 | 10 |
440 | 12 |
441 | 16 |
442 | 10 |
443 | 11 |
444 | 15 |
445 | 15 |
446 | 12 |
447 | 10 |
448 | 10 |
449 | 13 |
450 | 15 |
451 | 13 |
452 | 15 |
453 | 19 |
454 | 11 |
455 | 6 |
456 | 9 |
457 | 8 |
458 | 10 |
459 | 12 |
460 | 22 |
461 | 16 |
462 | 6 |
463 | 10 |
464 | 11 |
465 | 10 |
466 | 4 |
467 | 7 |
468 | 11 |
469 | 11 |
470 | 11 |
471 | 8 |
472 | 15 |
473 | 11 |
474 | 7 |
475 | 9 |
476 | 15 |
477 | 13 |
478 | 3 |
479 | 10 |
480 | 8 |
481 | 8 |
482 | 7 |
483 | 9 |
484 | 11 |
485 | 5 |
486 | 12 |
487 | 12 |
488 | 13 |
489 | 10 |
490 | 17 |
491 | 7 |
492 | 10 |
493 | 12 |
494 | 6 |
495 | 12 |
496 | 12 |
497 | 5 |
498 | 12 |
499 | 18 |
500 | 7 |
501 | 17 |
502 | 14 |
503 | 8 |
504 | 10 |
505 | 13 |
506 | 11 |
507 | 9 |
508 | 11 |
509 | 18 |
510 | 10 |
511 | 8 |
512 | 10 |
513 | 14 |
514 | 9 |
515 | 12 |
516 | 12 |
517 | 13 |
518 | 11 |
519 | 13 |
520 | 13 |
521 | 5 |
522 | 13 |
523 | 7 |
524 | 6 |
525 | 18 |
526 | 16 |
527 | 13 |
528 | 5 |
529 | 8 |
530 | 13 |
531 | 18 |
532 | 12 |
533 | 13 |
534 | 12 |
535 | 5 |
536 | 8 |
537 | 14 |
538 | 7 |
539 | 12 |
540 | 11 |
541 | 11 |
542 | 8 |
543 | 12 |
544 | 11 |
545 | 8 |
546 | 10 |
547 | 8 |
548 | 12 |
549 | 7 |
550 | 14 |
551 | 9 |
552 | 13 |
553 | 10 |
554 | 15 |
555 | 7 |
556 | 11 |
557 | 7 |
558 | 7 |
559 | 13 |
560 | 14 |
561 | 6 |
562 | 14 |
563 | 4 |
564 | 14 |
565 | 9 |
566 | 7 |
567 | 5 |
568 | 9 |
569 | 9 |
570 | 13 |
571 | 10 |
572 | 8 |
573 | 12 |
574 | 6 |
575 | 11 |
576 | 10 |
577 | 13 |
578 | 6 |
579 | 12 |
580 | 10 |
581 | 9 |
582 | 8 |
583 | 6 |
584 | 2 |
585 | 8 |
586 | 8 |
587 | 9 |
588 | 11 |
589 | 6 |
590 | 11 |
591 | 5 |
592 | 8 |
593 | 14 |
594 | 10 |
595 | 9 |
596 | 8 |
597 | 6 |
598 | 8 |
599 | 5 |
600 | 5 |
601 | 8 |
602 | 17 |
603 | 13 |
604 | 10 |
605 | 9 |
606 | 9 |
607 | 4 |
608 | 6 |
609 | 12 |
610 | 8 |
611 | 15 |
612 | 8 |
613 | 14 |
614 | 10 |
615 | 14 |
616 | 13 |
617 | 6 |
618 | 4 |
619 | 5 |
620 | 5 |
621 | 6 |
622 | 8 |
623 | 6 |
624 | 5 |
625 | 5 |
626 | 7 |
627 | 7 |
628 | 11 |
629 | 8 |
630 | 8 |
631 | 8 |
632 | 8 |
633 | 9 |
634 | 6 |
635 | 4 |
636 | 8 |
637 | 6 |
638 | 7 |
639 | 9 |
640 | 7 |
641 | 6 |
642 | 20 |
643 | 9 |
644 | 7 |
645 | 10 |
646 | 4 |
647 | 10 |
648 | 16 |
649 | 8 |
650 | 3 |
651 | 6 |
652 | 12 |
653 | 8 |
654 | 10 |
655 | 8 |
656 | 10 |
657 | 11 |
658 | 10 |
659 | 5 |
660 | 6 |
661 | 7 |
662 | 8 |
663 | 6 |
664 | 5 |
665 | 6 |
666 | 4 |
667 | 9 |
668 | 4 |
669 | 6 |
670 | 7 |
671 | 4 |
672 | 5 |
673 | 6 |
674 | 5 |
675 | 5 |
676 | 8 |
677 | 11 |
678 | 5 |
679 | 2 |
680 | 9 |
681 | 7 |
682 | 14 |
683 | 7 |
684 | 6 |
685 | 3 |
686 | 11 |
687 | 4 |
688 | 6 |
689 | 4 |
690 | 11 |
691 | 4 |
692 | 5 |
693 | 7 |
694 | 4 |
695 | 13 |
696 | 9 |
697 | 7 |
698 | 7 |
699 | 6 |
700 | 4 |
701 | 8 |
702 | 5 |
703 | 9 |
704 | 6 |
705 | 4 |
706 | 5 |
707 | 8 |
708 | 10 |
709 | 5 |
710 | 10 |
711 | 9 |
712 | 4 |
713 | 8 |
714 | 6 |
715 | 7 |
716 | 11 |
717 | 5 |
718 | 7 |
719 | 6 |
720 | 5 |
721 | 6 |
722 | 7 |
723 | 10 |
724 | 11 |
725 | 6 |
726 | 8 |
727 | 3 |
728 | 7 |
729 | 5 |
730 | 9 |
731 | 10 |
732 | 6 |
733 | 7 |
734 | 2 |
735 | 10 |
736 | 4 |
737 | 8 |
738 | 7 |
739 | 10 |
740 | 4 |
741 | 7 |
742 | 5 |
743 | 3 |
744 | 12 |
745 | 7 |
746 | 2 |
747 | 5 |
748 | 8 |
749 | 4 |
750 | 9 |
751 | 6 |
752 | 3 |
753 | 5 |
754 | 6 |
755 | 6 |
756 | 5 |
757 | 8 |
758 | 2 |
759 | 3 |
760 | 5 |
761 | 11 |
762 | 5 |
763 | 5 |
764 | 5 |
765 | 8 |
766 | 1 |
767 | 2 |
768 | 4 |
769 | 6 |
770 | 5 |
771 | 5 |
772 | 11 |
773 | 7 |
774 | 3 |
775 | 1 |
776 | 8 |
777 | 5 |
778 | 6 |
779 | 3 |
780 | 4 |
781 | 6 |
782 | 3 |
783 | 6 |
784 | 7 |
785 | 9 |
786 | 5 |
787 | 11 |
788 | 7 |
789 | 5 |
790 | 9 |
791 | 6 |
792 | 9 |
793 | 6 |
794 | 7 |
795 | 6 |
796 | 7 |
797 | 7 |
798 | 8 |
799 | 9 |
800 | 7 |
801 | 4 |
802 | 9 |
803 | 5 |
804 | 12 |
805 | 3 |
806 | 6 |
807 | 4 |
808 | 7 |
809 | 7 |
810 | 8 |
811 | 7 |
812 | 8 |
813 | 3 |
814 | 5 |
815 | 7 |
816 | 3 |
817 | 4 |
818 | 6 |
819 | 7 |
820 | 2 |
821 | 7 |
822 | 1 |
823 | 6 |
824 | 5 |
825 | 7 |
826 | 9 |
827 | 5 |
828 | 7 |
829 | 4 |
830 | 6 |
831 | 2 |
832 | 5 |
833 | 2 |
834 | 6 |
835 | 5 |
836 | 3 |
837 | 2 |
838 | 11 |
839 | 4 |
840 | 4 |
841 | 5 |
842 | 7 |
843 | 3 |
844 | 4 |
845 | 2 |
846 | 3 |
847 | 8 |
848 | 6 |
849 | 3 |
850 | 5 |
851 | 4 |
852 | 8 |
853 | 2 |
854 | 7 |
855 | 1 |
856 | 2 |
857 | 2 |
858 | 5 |
859 | 1 |
860 | 8 |
861 | 6 |
862 | 6 |
863 | 5 |
864 | 4 |
865 | 4 |
866 | 4 |
867 | 3 |
869 | 4 |
870 | 4 |
871 | 3 |
872 | 2 |
873 | 6 |
874 | 9 |
875 | 2 |
876 | 2 |
877 | 8 |
878 | 9 |
879 | 2 |
880 | 3 |
881 | 1 |
882 | 3 |
883 | 2 |
884 | 3 |
885 | 4 |
886 | 4 |
887 | 3 |
888 | 3 |
889 | 4 |
890 | 5 |
891 | 3 |
892 | 6 |
893 | 10 |
894 | 2 |
895 | 7 |
896 | 3 |
897 | 3 |
898 | 3 |
899 | 9 |
900 | 5 |
901 | 4 |
902 | 6 |
903 | 1 |
904 | 4 |
905 | 1 |
906 | 6 |
908 | 4 |
909 | 1 |
910 | 2 |
911 | 3 |
912 | 3 |
913 | 7 |
914 | 5 |
915 | 3 |
916 | 3 |
917 | 4 |
918 | 6 |
919 | 6 |
920 | 3 |
921 | 7 |
922 | 2 |
923 | 6 |
924 | 4 |
925 | 3 |
926 | 3 |
927 | 3 |
928 | 1 |
929 | 5 |
930 | 5 |
931 | 2 |
932 | 4 |
933 | 1 |
934 | 3 |
935 | 1 |
936 | 3 |
937 | 2 |
938 | 5 |
939 | 3 |
940 | 5 |
941 | 3 |
942 | 1 |
943 | 6 |
944 | 6 |
945 | 4 |
946 | 4 |
947 | 6 |
948 | 4 |
949 | 3 |
950 | 1 |
951 | 6 |
952 | 3 |
953 | 2 |
954 | 5 |
955 | 5 |
956 | 7 |
957 | 4 |
958 | 6 |
959 | 2 |
960 | 2 |
961 | 7 |
962 | 4 |
963 | 4 |
964 | 5 |
965 | 6 |
966 | 4 |
967 | 2 |
968 | 2 |
969 | 5 |
970 | 2 |
971 | 4 |
972 | 3 |
973 | 4 |
974 | 7 |
975 | 6 |
976 | 3 |
977 | 5 |
978 | 4 |
979 | 6 |
980 | 9 |
981 | 1 |
982 | 2 |
983 | 3 |
984 | 4 |
985 | 1 |
986 | 3 |
987 | 1 |
988 | 3 |
989 | 5 |
990 | 5 |
991 | 5 |
992 | 7 |
994 | 3 |
995 | 2 |
996 | 2 |
997 | 5 |
998 | 2 |
1000 | 6 |
1001 | 2 |
1002 | 2 |
1004 | 3 |
1005 | 4 |
1006 | 2 |
1008 | 3 |
1009 | 1 |
1010 | 5 |
1011 | 4 |
1012 | 6 |
1013 | 3 |
1014 | 4 |
1015 | 2 |
1016 | 4 |
1017 | 2 |
1018 | 2 |
1019 | 5 |
1020 | 1 |
1021 | 2 |
1022 | 2 |
1023 | 2 |
1024 | 3 |
1025 | 3 |
1026 | 7 |
1027 | 1 |
1028 | 2 |
1029 | 5 |
1030 | 3 |
1031 | 2 |
1032 | 3 |
1033 | 1 |
1034 | 4 |
1035 | 1 |
1037 | 8 |
1038 | 2 |
1039 | 2 |
1040 | 5 |
1041 | 4 |
1042 | 8 |
1043 | 4 |
1044 | 3 |
1045 | 4 |
1046 | 3 |
1047 | 2 |
1048 | 1 |
1049 | 5 |
1050 | 5 |
1051 | 3 |
1052 | 2 |
1053 | 2 |
1054 | 1 |
1055 | 2 |
1056 | 3 |
1057 | 4 |
1058 | 1 |
1059 | 3 |
1060 | 3 |
1061 | 4 |
1062 | 2 |
1063 | 2 |
1065 | 7 |
1066 | 4 |
1067 | 4 |
1068 | 5 |
1069 | 5 |
1070 | 3 |
1071 | 5 |
1072 | 2 |
1073 | 1 |
1074 | 4 |
1076 | 5 |
1077 | 3 |
1078 | 3 |
1079 | 1 |
1080 | 3 |
1081 | 4 |
1082 | 3 |
1083 | 2 |
1084 | 1 |
1085 | 4 |
1086 | 2 |
1087 | 1 |
1088 | 3 |
1089 | 4 |
1090 | 3 |
1091 | 2 |
1092 | 4 |
1093 | 3 |
1094 | 3 |
1095 | 1 |
1096 | 7 |
1097 | 2 |
1098 | 3 |
1100 | 3 |
1101 | 3 |
1104 | 2 |
1105 | 1 |
1106 | 3 |
1108 | 2 |
1109 | 2 |
1110 | 5 |
1111 | 2 |
1112 | 3 |
1113 | 3 |
1114 | 2 |
1115 | 2 |
1116 | 3 |
1117 | 2 |
1118 | 3 |
1119 | 2 |
1120 | 1 |
1121 | 4 |
1122 | 1 |
1123 | 3 |
1124 | 1 |
1125 | 1 |
1126 | 1 |
1127 | 2 |
1128 | 4 |
1129 | 3 |
1131 | 3 |
1132 | 1 |
1133 | 3 |
1134 | 4 |
1135 | 1 |
1136 | 2 |
1137 | 3 |
1138 | 2 |
1139 | 2 |
1140 | 2 |
1141 | 2 |
1142 | 1 |
1143 | 1 |
1144 | 1 |
1145 | 4 |
1146 | 1 |
1147 | 2 |
1148 | 2 |
1149 | 1 |
1150 | 2 |
1151 | 5 |
1152 | 1 |
1153 | 1 |
1154 | 2 |
1155 | 5 |
1156 | 1 |
1158 | 3 |
1159 | 1 |
1161 | 2 |
1162 | 5 |
1163 | 2 |
1164 | 2 |
1165 | 1 |
1166 | 1 |
1167 | 2 |
1169 | 4 |
1172 | 3 |
1174 | 4 |
1175 | 2 |
1176 | 2 |
1177 | 3 |
1178 | 1 |
1179 | 1 |
1180 | 2 |
1181 | 1 |
1182 | 4 |
1183 | 2 |
1184 | 1 |
1185 | 1 |
1186 | 1 |
1187 | 2 |
1188 | 3 |
1189 | 1 |
1190 | 1 |
1191 | 2 |
1193 | 3 |
1194 | 1 |
1195 | 1 |
1196 | 1 |
1197 | 5 |
1198 | 2 |
1199 | 3 |
1200 | 1 |
1202 | 3 |
1204 | 2 |
1205 | 3 |
1206 | 2 |
1207 | 5 |
1208 | 1 |
1209 | 3 |
1210 | 1 |
1211 | 2 |
1212 | 1 |
1213 | 2 |
1214 | 3 |
1215 | 3 |
1216 | 1 |
1217 | 2 |
1218 | 1 |
1219 | 2 |
1220 | 1 |
1222 | 4 |
1223 | 3 |
1224 | 1 |
1225 | 1 |
1226 | 3 |
1228 | 3 |
1229 | 1 |
1230 | 1 |
1231 | 1 |
1232 | 1 |
1233 | 3 |
1234 | 1 |
1235 | 1 |
1237 | 1 |
1238 | 1 |
1239 | 1 |
1240 | 1 |
1242 | 1 |
1244 | 1 |
1245 | 1 |
1246 | 1 |
1247 | 1 |
1248 | 3 |
1249 | 1 |
1250 | 2 |
1251 | 4 |
1252 | 3 |
1253 | 3 |
1254 | 1 |
1255 | 2 |
1256 | 1 |
1257 | 2 |
1258 | 3 |
1259 | 1 |
1260 | 1 |
1261 | 3 |
1262 | 1 |
1265 | 1 |
1266 | 2 |
1267 | 1 |
1269 | 1 |
1270 | 1 |
1274 | 2 |
1275 | 2 |
1279 | 1 |
1280 | 1 |
1281 | 1 |
1282 | 1 |
1284 | 1 |
1286 | 1 |
1287 | 3 |
1288 | 1 |
1289 | 1 |
1290 | 1 |
1291 | 1 |
1292 | 1 |
1293 | 2 |
1294 | 3 |
1295 | 1 |
1296 | 2 |
1298 | 1 |
1299 | 1 |
1300 | 3 |
1302 | 1 |
1303 | 2 |
1305 | 2 |
1306 | 1 |
1307 | 2 |
1309 | 3 |
1310 | 1 |
1312 | 2 |
1315 | 1 |
1316 | 2 |
1318 | 1 |
1319 | 1 |
1320 | 2 |
1324 | 1 |
1325 | 2 |
1330 | 2 |
1331 | 1 |
1332 | 1 |
1333 | 1 |
1334 | 1 |
1335 | 2 |
1337 | 2 |
1339 | 1 |
1340 | 1 |
1341 | 3 |
1343 | 2 |
1344 | 3 |
1350 | 1 |
1351 | 2 |
1352 | 1 |
1356 | 2 |
1357 | 1 |
1358 | 3 |
1360 | 1 |
1361 | 1 |
1363 | 1 |
1368 | 1 |
1370 | 2 |
1372 | 3 |
1373 | 3 |
1374 | 2 |
1375 | 2 |
1376 | 2 |
1377 | 4 |
1378 | 1 |
1379 | 1 |
1380 | 1 |
1382 | 2 |
1383 | 3 |
1384 | 2 |
1388 | 1 |
1389 | 1 |
1390 | 1 |
1392 | 3 |
1394 | 1 |
1396 | 1 |
1397 | 1 |
1398 | 1 |
1400 | 2 |
1401 | 1 |
1405 | 1 |
1406 | 2 |
1409 | 1 |
1410 | 2 |
1411 | 1 |
1412 | 1 |
1413 | 1 |
1414 | 1 |
1416 | 1 |
1417 | 3 |
1418 | 2 |
1421 | 2 |
1423 | 1 |
1424 | 1 |
1427 | 1 |
1428 | 5 |
1431 | 2 |
1432 | 1 |
1433 | 2 |
1434 | 1 |
1436 | 2 |
1439 | 2 |
1440 | 1 |
1441 | 2 |
1443 | 1 |
1445 | 1 |
1446 | 2 |
1447 | 3 |
1449 | 2 |
1451 | 1 |
1453 | 1 |
1456 | 1 |
1457 | 1 |
1458 | 1 |
1462 | 1 |
1463 | 1 |
1464 | 1 |
1465 | 1 |
1468 | 1 |
1469 | 1 |
1472 | 1 |
1473 | 2 |
1476 | 1 |
1477 | 1 |
1481 | 1 |
1483 | 1 |
1484 | 1 |
1486 | 1 |
1489 | 1 |
1491 | 1 |
1492 | 1 |
1494 | 1 |
1497 | 1 |
1498 | 1 |
1499 | 1 |
1500 | 1 |
1501 | 1 |
1503 | 1 |
1504 | 1 |
1506 | 1 |
1515 | 1 |
1516 | 1 |
1517 | 1 |
1520 | 3 |
1521 | 1 |
1522 | 3 |
1524 | 2 |
1531 | 1 |
1532 | 1 |
1536 | 3 |
1537 | 2 |
1540 | 1 |
1541 | 1 |
1542 | 1 |
1543 | 1 |
1552 | 1 |
1553 | 2 |
1558 | 1 |
1567 | 1 |
1568 | 1 |
1569 | 2 |
1571 | 2 |
1572 | 1 |
1577 | 1 |
1579 | 1 |
1580 | 1 |
1582 | 1 |
1589 | 1 |
1598 | 1 |
1605 | 1 |
1608 | 2 |
1609 | 3 |
1612 | 4 |
1614 | 1 |
1616 | 1 |
1620 | 3 |
1625 | 1 |
1627 | 1 |
1632 | 1 |
1634 | 1 |
1637 | 2 |
1638 | 1 |
1640 | 2 |
1642 | 1 |
1643 | 1 |
1650 | 2 |
1651 | 1 |
1653 | 2 |
1656 | 2 |
1665 | 1 |
1668 | 1 |
1669 | 1 |
1670 | 1 |
1671 | 1 |
1678 | 1 |
1682 | 1 |
1684 | 1 |
1687 | 2 |
1690 | 2 |
1691 | 1 |
1694 | 2 |
1696 | 2 |
1699 | 1 |
1708 | 1 |
1709 | 2 |
1714 | 1 |
1719 | 1 |
1721 | 3 |
1728 | 1 |
1737 | 1 |
1744 | 1 |
1747 | 1 |
1751 | 1 |
1757 | 1 |
1765 | 2 |
1767 | 1 |
1771 | 1 |
1776 | 1 |
1786 | 1 |
1789 | 1 |
1800 | 1 |
1807 | 1 |
1814 | 2 |
1816 | 1 |
1821 | 1 |
1823 | 2 |
1824 | 1 |
1825 | 1 |
1827 | 1 |
1831 | 1 |
1834 | 1 |
1837 | 1 |
1844 | 1 |
1848 | 1 |
1864 | 1 |
1881 | 1 |
1885 | 1 |
1888 | 1 |
1891 | 2 |
1894 | 2 |
1896 | 1 |
1898 | 1 |
1903 | 1 |
1905 | 1 |
1920 | 1 |
1921 | 1 |
1922 | 1 |
1924 | 1 |
1938 | 1 |
1949 | 1 |
1959 | 1 |
1961 | 1 |
1964 | 1 |
1973 | 1 |
1977 | 2 |
1982 | 1 |
1985 | 1 |
1995 | 1 |
2004 | 1 |
2011 | 1 |
2028 | 2 |
2037 | 1 |
2040 | 1 |
2042 | 1 |
2047 | 1 |
2048 | 1 |
2066 | 1 |
2068 | 1 |
2069 | 1 |
2078 | 1 |
2088 | 1 |
2093 | 1 |
2095 | 1 |
2096 | 1 |
2105 | 1 |
2117 | 1 |
2133 | 1 |
2136 | 1 |
2137 | 1 |
2142 | 1 |
2156 | 1 |
2164 | 1 |
2183 | 1 |
2189 | 1 |
2200 | 1 |
2204 | 1 |
2237 | 1 |
2251 | 1 |
2255 | 1 |
2258 | 1 |
2314 | 1 |
2318 | 1 |
2324 | 1 |
2325 | 1 |
2334 | 1 |
2361 | 1 |
2365 | 1 |
2380 | 1 |
2385 | 1 |
2397 | 1 |
2399 | 1 |
2421 | 1 |
2424 | 1 |
2432 | 1 |
2439 | 1 |
2470 | 1 |
2490 | 1 |
2495 | 1 |
2513 | 1 |
2517 | 1 |
2521 | 1 |
2523 | 2 |
2556 | 1 |
2610 | 1 |
2649 | 1 |
2651 | 1 |
2689 | 2 |
2704 | 1 |
2708 | 1 |
2727 | 1 |
2732 | 1 |
2747 | 1 |
2793 | 1 |
2844 | 1 |
2889 | 1 |
2951 | 1 |
2957 | 1 |
2958 | 1 |
2997 | 1 |
3037 | 1 |
3125 | 1 |
3153 | 1 |
3220 | 1 |
3259 | 1 |
3290 | 1 |
3326 | 1 |
3332 | 1 |
3376 | 1 |
3467 | 1 |
3510 | 1 |
3547 | 1 |
3662 | 1 |
3810 | 1 |
3888 | 1 |
3905 | 1 |
4110 | 1 |
4155 | 1 |
4172 | 1 |
4331 | 1 |
4462 | 1 |
4661 | 1 |
7081 | 1 |
Contingency table of frequencies for number of tokens in the article content
# Summarizing the number of images in the article
filtered_channel %>%
summarise(Minimum = min(num_imgs),
Q1 = quantile(num_imgs, prob = 0.25),
Average = mean(num_imgs),
Median = median(num_imgs),
Q3 = quantile(num_imgs, prob = 0.75),
Maximum = max(num_imgs)) %>%
kable(caption = "Numerical summary of number of images in an article")
Minimum | Q1 | Average | Median | Q3 | Maximum |
---|---|---|---|---|---|
0 | 1 | 2.841225 | 1 | 2 | 100 |
Numerical summary of number of images in an article
# Summarizing the number of videos in the article
filtered_channel %>%
summarise(Minimum = min(num_videos),
Q1 = quantile(num_videos, prob = 0.25),
Average = mean(num_videos),
Median = median(num_videos),
Q3 = quantile(num_videos, prob = 0.75),
Maximum = max(num_videos)) %>%
kable(caption = "Numerical summary of number of videos in an article")
Minimum | Q1 | Average | Median | Q3 | Maximum |
---|---|---|---|---|---|
0 | 0 | 0.5495431 | 0 | 1 | 51 |
Numerical summary of number of videos in an article
# Summarizing the rate of positive words
filtered_channel %>%
summarise(Minimum = min(rate_positive_words),
Q1 = quantile(rate_positive_words, prob = 0.25),
Average = mean(rate_positive_words),
Median = median(rate_positive_words),
Q3 = quantile(rate_positive_words, prob = 0.75),
Maximum = max(rate_positive_words)) %>%
kable(caption = "Numerical Summary of the rate of positive words in an article")
Minimum | Q1 | Average | Median | Q3 | Maximum |
---|---|---|---|---|---|
0 | 0.5357143 | 0.6233722 | 0.6428571 | 0.7416574 | 1 |
Numerical Summary of the rate of positive words in an article
# Summarizing the rate of negative words
filtered_channel %>%
summarise(Minimum = min(rate_negative_words),
Q1 = quantile(rate_negative_words, prob = 0.25),
Average = mean(rate_negative_words),
Median = median(rate_negative_words),
Q3 = quantile(rate_negative_words, prob = 0.75),
Maximum = max(rate_negative_words)) %>%
kable(caption = "Numerical Summary of the rate of negative words in an article")
Minimum | Q1 | Average | Median | Q3 | Maximum |
---|---|---|---|---|---|
0 | 0.25 | 0.3458933 | 0.3461538 | 0.4482759 | 1 |
Numerical Summary of the rate of negative words in an article
The graphical summaries show the trends in the data more dramatically, including skewness and outliers. The boxplot below gives a visual representation of the five-number summary of shares, split up by weekday. Boxplots make it even easier to spot outliers (look for the dots separated from the main box). Next, we can examine several scatterplots. Scatterplots allow us to look at one numerical variable against another to see if there is any correlation between them. Look out for any plots that have most of the points along a diagonal line! There are five scatterplots below, plotting shares against the number of words in the content, the number of words in the title, the rate of positive words, the rate of negative words, and the global sentiment polarity. Finally, a histogram can show the overall distribution of a numerical variable, including skewness. The histogram below shows the distribution of the shares variable. Look for a left or right tail to signify skewness, and look out for multiple peaks to signify a multi-modal variable.
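A scatterplot can be paired with a numerical measure of association. The sketch below is an illustration only: it uses the built-in `mtcars` data as a stand-in for the news data and computes both the Pearson correlation and the rank-based Spearman correlation, which is less sensitive to outliers and heavy skew.

```r
# Built-in mtcars data stands in for the news data here
pearson <- cor(mtcars$wt, mtcars$mpg)

# Spearman works on ranks, so extreme values have less influence
spearman <- cor(mtcars$wt, mtcars$mpg, method = "spearman")

c(pearson = pearson, spearman = spearman)
```

For the news data, the analogous call would pass two numeric columns of `filtered_channel`, for example `n_tokens_content` and `shares`.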
# Boxplot of Shares for Each Weekday, colored gray with classic theme, added labels and title
ggplot(filtered_channel, aes(x = weekday, y = shares)) +
geom_boxplot(fill = "grey") +
labs(x = "Weekday", title = "Boxplot of Shares for Each Weekday", y = "Shares") +
theme_classic()
# Scatterplot of Number of words in the content vs Shares, colored gray with classic theme, added labels and title
ggplot(filtered_channel, aes(x = n_tokens_content, y = shares)) +
geom_point(color = "grey") +
labs(x = "Number of words in the content", y = "Shares",
title = "Scatterplot of Number of words in the content vs Shares") +
theme_classic()
# Scatterplot of Number of words in the title vs Shares, colored gray with classic theme, added labels and title
ggplot(filtered_channel, aes(x = n_tokens_title, y = shares)) +
geom_point(color = "grey") +
labs(x = "Number of words in the title", y = "Shares",
title = "Scatterplot of Number of words in the title vs Shares") +
theme_classic()
# Histogram of Shares with a bin width of 2000, classic theme, added label and title
ggplot(filtered_channel, aes(x = shares)) +
geom_histogram(color = "grey", binwidth = 2000) +
labs(x = "Shares",
title = "Histogram of number of shares") +
theme_classic()
# Scatterplot of rate of positive words vs Shares, colored gray with classic theme
ggplot(filtered_channel, aes(x = rate_positive_words, y = shares)) +
geom_point(color = "grey") +
labs(x = "rate of positive words in an article", y = "Shares",
title = "Scatterplot of rate of positive words in an article vs shares") +
theme_classic()
# Scatterplot of rate of negative words vs Shares, colored gray with classic theme
ggplot(filtered_channel, aes(x = rate_negative_words, y = shares)) +
geom_point(color = "grey") +
labs(x = "rate of negative words in an article", y = "Shares",
title = "Scatterplot of rate of negative words in an article vs shares") +
theme_classic()
# Scatterplot of global sentiment polarity vs Shares, colored gray with classic theme
ggplot(filtered_channel, aes(x = global_sentiment_polarity, y = shares)) +
geom_point(color = "grey") +
labs(x = "global sentiment polarity in an article", y = "Shares",
title = "Scatterplot of global sentiment polarity in an article vs shares") +
theme_classic()
# drop the weekday variable created for EDA (will get in the way for our models if we don't drop it)
filtered_channel <- subset(filtered_channel, select = -c(weekday))
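After filtering to a single channel, every `data_channel_is_*` column is constant (all 1 for the selected channel, all 0 for the others), so those columns carry no information for the models and cannot be meaningfully centered and scaled. The snippet below is a minimal sketch of one way to drop constant columns with base R. It is not part of the original analysis, and the `toy` data frame is a hypothetical stand-in for `filtered_channel`.

```r
# Hypothetical stand-in for a data frame filtered to one channel
toy <- data.frame(
  shares = c(100, 250, 900),
  data_channel_is_world = c(1, 1, 1),  # constant after filtering
  data_channel_is_tech = c(0, 0, 0),   # constant after filtering
  num_imgs = c(1, 0, 3)
)

# Flag columns that take a single unique value
is_constant <- vapply(toy, function(col) length(unique(col)) == 1, logical(1))

# Keep only the columns that actually vary
toy_reduced <- toy[, !is_constant, drop = FALSE]
names(toy_reduced)
```

Dropping such columns before fitting is a design choice; it keeps the predictor set smaller without changing the information available to the models.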
Modeling
Splitting the Data
First, let’s split up the data into a testing set and a training set using the proportions: 70% training and 30% testing.
set.seed(9876)
# Split the data into a training and test set (70/30 split)
# indices
train <- sample(1:nrow(filtered_channel), size = nrow(filtered_channel)*.70)
test <- setdiff(1:nrow(filtered_channel), train)
# training and testing subsets
Training <- filtered_channel[train, ]
Testing <- filtered_channel[test, ]
Linear Models
Linear regression models allow us to look at relationships between one response variable and several explanatory variables. A model can also include interaction terms and even higher-order terms. The general form for a linear model is $Y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \dots + E_i$, where each $x_{ji}$ represents a predictor variable and the “…” can include more predictors, interactions, and/or higher-order terms. Since our goal is to predict shares, we will fit these models on the subset of the data created for training, and then later test the models on the subset set aside for testing.
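As a minimal sketch of how main effects, an interaction, and a higher-order term are written in an R model formula, the example below fits a small linear model with `lm()`. It uses the built-in `mtcars` data purely as a stand-in; the variables `mpg`, `wt`, and `hp` are not part of the news data.

```r
# mtcars ships with R; mpg is the response, wt and hp are the predictors
fit <- lm(mpg ~ wt + hp + wt:hp + I(wt^2), data = mtcars)

# One coefficient per term, plus the intercept
coef_names <- names(coef(fit))
coef_names

# Predict the response for a new (hypothetical) observation
new_obs <- data.frame(wt = 3, hp = 120)
pred <- predict(fit, newdata = new_obs)
pred
```

Here `wt:hp` is an interaction term and `I(wt^2)` is a second-order term; `I()` is needed so that `^` is treated as arithmetic rather than as a formula operator.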
Linear Model #1: - Jordan
# linear model on training dataset with 5-fold cv
fit1 <- train(shares ~ . , data = Training, method = "lm",
preProcess = c("center", "scale"),
trControl = trainControl(method = "cv", number = 5))
Linear Model #2: - Jonathan
lm_fit <- train(
shares ~ .^2,
data=Training,
method="lm",
preProcess = c("center", "scale"),
trControl = trainControl(method = "cv", number = 5)
)
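The formula `shares ~ .^2` in the second linear model requests every main effect plus every pairwise interaction. The sketch below uses a hypothetical three-predictor data frame to show how R expands such a formula, and a small helper (`n_terms`, a name introduced here for illustration) to count the resulting terms. The value 58 assumes the training data holds 58 predictors (61 original columns minus `url`, `timedelta`, and the response).

```r
# Hypothetical data frame with a response y and three predictors
toy <- data.frame(
  y  = c(1, 2, 3, 4),
  x1 = c(1, 0, 1, 0),
  x2 = c(2, 3, 5, 7),
  x3 = c(0, 1, 1, 0)
)

# Expand the dot: main effects plus all two-way interactions
term_labels <- attr(terms(y ~ .^2, data = toy), "term.labels")
term_labels

# Number of terms for p predictors: p main effects + choose(p, 2) interactions
n_terms <- function(p) p + choose(p, 2)
n_terms(58)
```

Because the number of terms grows quadratically with the number of predictors, a model of this form can become very large relative to the training set.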
Random Forest - Jordan
Random forest is a tree-based method for fitting predictive models that averages predictions across many trees, each fit to a bootstrap sample of the data. One may choose a tree-based method for its prediction accuracy, because predictors do not need to be scaled, because it makes no statistical assumptions, and because it has a built-in variable selection process. Random forest, in particular, considers only a random subset of the predictors at each split. This addresses a weakness of bagging, where a single strong predictor tends to be chosen for the first split in every bootstrapped tree, leaving the trees highly correlated.
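The two sources of randomness in a random forest can be sketched in a few lines of base R. This is an illustration only and does not fit any trees; the actual fitting in this report is done through `caret` with `method = "rf"`. The built-in `mtcars` data is used as a stand-in, and `mtry` is set to the commonly used regression default of roughly one third of the predictors.

```r
set.seed(1)

# Candidate predictors: every column except the response (mpg)
predictors <- setdiff(names(mtcars), "mpg")

# Common default for regression forests: about p / 3 predictors per split
mtry <- max(floor(length(predictors) / 3), 1)

# Bagging step: draw a bootstrap sample of the rows (with replacement)
boot_rows <- sample(nrow(mtcars), replace = TRUE)

# Random forest step: only mtry randomly chosen predictors compete for a split
split_candidates <- sample(predictors, mtry)

mtry
split_candidates
```

Because each split sees a different random subset of predictors, a single dominant predictor cannot appear at the top of every tree, which is what decorrelates the trees.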
# random forest model on training dataset with 5-fold cv
ranfor <- train(shares ~ ., data = Training, method = "rf", preProcess = c("center", "scale"),
trControl = trainControl(method = "cv", number = 5),
tuneGrid = expand.grid(mtry = c(1:round(ncol(Training)/3))))
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
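The warning above says that the six data_channel_is_* indicators are constant in Training, which is what one would expect if the data had already been subset to a single channel (an assumption; the subsetting step is not shown in this section). A constant column has zero standard deviation, so it cannot be scaled and carries no information for the model. One possible way to avoid the warning is to drop constant columns before calling train(). The sketch below uses base R only; the data frame `toy` and its values are hypothetical stand-ins for Training.

```r
# Hypothetical stand-in for Training: two channel indicators are constant
toy <- data.frame(
  shares = c(593, 711, 1500, 1200),
  num_keywords = c(5, 4, 6, 7),
  data_channel_is_tech = c(0, 0, 0, 0),
  data_channel_is_world = c(0, 0, 0, 0)
)

# TRUE for every column that takes more than one distinct value
varies <- vapply(toy, function(col) length(unique(col)) > 1, logical(1))

# Keep only the columns that vary
toy_reduced <- toy[, varies, drop = FALSE]
print(names(toy_reduced))
```

caret also offers the "zv" method in preProcess, which filters zero-variance predictors as part of pre-processing; whether that is preferable to filtering up front depends on how the rest of the report uses Training.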
ranfor
## Random Forest
##
## 5898 samples
## 58 predictor
##
## Pre-processing: centered (58), scaled (58)
## Resampling: Cross-Validated (5 fold)
## Summary of sample sizes: 4720, 4718, 4718, 4718, 4718
## Resampling results across tuning parameters:
##
## mtry RMSE Rsquared MAE
## 1 5345.573 0.02647603 1870.972
## 2 5322.853 0.03188500 1881.222
## 3 5335.355 0.03149379 1913.333
## 4 5359.654 0.02785236 1931.492
## 5 5361.715 0.02918497 1945.100
## 6 5380.631 0.02673864 1959.963
## 7 5381.394 0.02792917 1962.778
## 8 5391.320 0.02716143 1976.360
## 9 5398.391 0.02676870 1974.328
## 10 5406.409 0.02697802 1984.644
## 11 5413.265 0.02564483 1985.580
## 12 5418.456 0.02629804 1996.720
## 13 5429.877 0.02537852 2002.805
## 14 5434.809 0.02435986 1998.388
## 15 5446.111 0.02321781 2005.398
## 16 5447.412 0.02366329 2005.893
## 17 5461.479 0.02265279 2011.009
## 18 5460.770 0.02299980 2021.381
## 19 5462.733 0.02246101 2017.977
## 20 5469.755 0.02265865 2021.448
##
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was mtry = 2.
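As a small sketch of how the grid and the selected value in the output above come about: the upper bound of the mtry grid is round(ncol(Training) / 3), and the reported optimum is the row with the smallest cross-validated RMSE. The column count of 59 (58 predictors plus shares) is an assumption inferred from the printed summary, and the RMSE values are copied from the first five rows of the table above.

```r
# Assumed column count of Training: 58 predictors plus the response `shares`
n_cols <- 59

# Upper bound of the mtry grid, as in the train() call
mtry_max <- round(n_cols / 3)
mtry_grid <- expand.grid(mtry = 1:mtry_max)

# Cross-validated RMSE for mtry = 1..5, copied from the printed results
cv_results <- data.frame(
  mtry = 1:5,
  RMSE = c(5345.573, 5322.853, 5335.355, 5359.654, 5361.715)
)

# caret's default selection rule for regression: smallest RMSE
best_mtry <- cv_results$mtry[which.min(cv_results$RMSE)]
print(mtry_max)
print(best_mtry)
```

Note that the RMSE differences across the grid are small relative to the RMSE itself, and Rsquared stays near 0.03 for every value of mtry, so the choice of mtry matters little here.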
Boosted Tree - Jonathan
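# candidate tuning grid for gbm (defined here but not passed to train() below)
tune_grid <- expand.grid(
  n.trees = c(5, 10, 50, 100),
  interaction.depth = c(1, 2, 3, 4),
  shrinkage = 0.1,
  n.minobsinnode = 10
)
# boosted tree model on the training dataset with 5-fold CV;
# no tuneGrid argument is supplied, so caret uses its default gbm grid
bt_fit <- train(
  shares ~ .,
  data = Training,
  method = "gbm",
  preProcess = c("center", "scale"),
  trControl = trainControl(method = "cv", number = 5)
)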
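To make the size of the candidate grid concrete, the sketch below rebuilds tune_grid and counts its rows. expand.grid() crosses every listed value, so the grid holds 4 x 4 x 1 x 1 = 16 settings. If the grid were supplied to train() through its tuneGrid argument, 5-fold cross-validation would score each setting on 5 folds. The names `n_settings`, `n_folds`, and `n_evaluations` are illustrative only.

```r
# Same grid as defined above
tune_grid <- expand.grid(
  n.trees = c(5, 10, 50, 100),
  interaction.depth = c(1, 2, 3, 4),
  shrinkage = 0.1,
  n.minobsinnode = 10
)

# One row per combination of tuning parameters
n_settings <- nrow(tune_grid)

# Number of setting-by-fold evaluations under 5-fold cross-validation
n_folds <- 5
n_evaluations <- n_settings * n_folds
print(n_settings)
print(n_evaluations)
```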
## Warning in preProcess.default(method = c("center", "scale"), x = structure(c(12, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in (function (x, y, offset = NULL, misc = NULL, distribution = "bernoulli", : variable 12:
## data_channel_is_lifestyle has no variation.
## Warning in (function (x, y, offset = NULL, misc = NULL, distribution = "bernoulli", : variable 13:
## data_channel_is_entertainment has no variation.
## Warning in (function (x, y, offset = NULL, misc = NULL, distribution = "bernoulli", : variable 14:
## data_channel_is_bus has no variation.
## Warning in (function (x, y, offset = NULL, misc = NULL, distribution = "bernoulli", : variable 15:
## data_channel_is_socmed has no variation.
## Warning in (function (x, y, offset = NULL, misc = NULL, distribution = "bernoulli", : variable 16:
## data_channel_is_tech has no variation.
## Warning in (function (x, y, offset = NULL, misc = NULL, distribution = "bernoulli", : variable 17:
## data_channel_is_world has no variation.
## Iter TrainDeviance ValidDeviance StepSize Improve
## 1 29264604.7162 nan 0.1000 61836.0089
## 2 29196359.9080 nan 0.1000 -7929.6096
## 3 29108711.9370 nan 0.1000 45402.8961
## 4 28997581.9665 nan 0.1000 53286.2615
## 5 28909376.8021 nan 0.1000 39659.7637
## 6 28823682.0484 nan 0.1000 33892.0311
## 7 28766984.2616 nan 0.1000 31692.5585
## 8 28716839.8662 nan 0.1000 -10845.0167
## 9 28653933.7658 nan 0.1000 25398.5578
## 10 28609847.6974 nan 0.1000 32222.4909
## 20 28147125.4339 nan 0.1000 29887.6417
## 40 27679620.2654 nan 0.1000 -15302.4912
## 60 27426876.6636 nan 0.1000 2454.2492
## 80 27228651.4542 nan 0.1000 905.0252
## 100 27095830.9434 nan 0.1000 -10987.2975
## 120 26930200.0818 nan 0.1000 -22808.9263
## 140 26771227.8311 nan 0.1000 -17589.8773
## 150 26679120.2772 nan 0.1000 -6478.9072
## Iter TrainDeviance ValidDeviance StepSize Improve
## 1 29188203.1607 nan 0.1000 19519.4706
## 2 28958451.2920 nan 0.1000 -6418.2257
## 3 28804927.1331 nan 0.1000 44516.8021
## 4 28435075.7068 nan 0.1000 76776.5898
## 5 28299753.4646 nan 0.1000 56152.3688
## 6 28145609.6246 nan 0.1000 -2319.1242
## 7 28031954.7424 nan 0.1000 -6194.2521
## 8 27910710.0460 nan 0.1000 65574.0219
## 9 27789784.7100 nan 0.1000 -51637.0581
## 10 27700721.3489 nan 0.1000 2241.5853
## 20 26627588.6009 nan 0.1000 11644.1148
## 40 25319022.9876 nan 0.1000 -10858.6012
## 60 24240808.0393 nan 0.1000 -11632.0009
## 80 23630685.2330 nan 0.1000 -27574.3158
## 100 23175042.9926 nan 0.1000 -46281.8703
## 120 22639910.5941 nan 0.1000 -62013.0875
## 140 22069985.5322 nan 0.1000 27742.9889
## 150 21755275.5963 nan 0.1000 -10251.6188
## Iter TrainDeviance ValidDeviance StepSize Improve
## 1 29020059.8034 nan 0.1000 272682.6076
## 2 28765378.5977 nan 0.1000 93606.1204
## 3 28666996.4149 nan 0.1000 33145.4331
## 4 28342731.9627 nan 0.1000 10853.0874
## 5 28154604.6211 nan 0.1000 -6602.7392
## 6 27977328.4967 nan 0.1000 11129.9392
## 7 27857186.6472 nan 0.1000 18691.3107
## 8 27561765.8823 nan 0.1000 47333.1438
## 9 27237951.8906 nan 0.1000 -103420.6852
## 10 27013336.0118 nan 0.1000 -6852.3875
## 20 25333508.1366 nan 0.1000 -54839.2357
## 40 23626826.1727 nan 0.1000 -87906.7666
## 60 22387312.2050 nan 0.1000 -59587.8955
## 80 21506441.2696 nan 0.1000 -19707.6403
## 100 20575345.8013 nan 0.1000 -21328.8213
## 120 19626346.5377 nan 0.1000 -34383.7622
## 140 18981322.8675 nan 0.1000 -16431.6213
## 150 18768626.6481 nan 0.1000 -53189.5615
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in (function (x, y, offset = NULL, misc = NULL, distribution = "bernoulli", : variable 12:
## data_channel_is_lifestyle has no variation.
## Warning in (function (x, y, offset = NULL, misc = NULL, distribution = "bernoulli", : variable 13:
## data_channel_is_entertainment has no variation.
## Warning in (function (x, y, offset = NULL, misc = NULL, distribution = "bernoulli", : variable 14:
## data_channel_is_bus has no variation.
## Warning in (function (x, y, offset = NULL, misc = NULL, distribution = "bernoulli", : variable 15:
## data_channel_is_socmed has no variation.
## Warning in (function (x, y, offset = NULL, misc = NULL, distribution = "bernoulli", : variable 16:
## data_channel_is_tech has no variation.
## Warning in (function (x, y, offset = NULL, misc = NULL, distribution = "bernoulli", : variable 17:
## data_channel_is_world has no variation.
## Iter TrainDeviance ValidDeviance StepSize Improve
## 1 30932049.0055 nan 0.1000 63679.8847
## 2 30805189.8539 nan 0.1000 89899.8494
## 3 30686598.6889 nan 0.1000 37778.0633
## 4 30641901.2560 nan 0.1000 21196.3586
## 5 30557926.0879 nan 0.1000 54681.4020
## 6 30523586.8417 nan 0.1000 -8887.5946
## 7 30474151.4663 nan 0.1000 13461.7504
## 8 30374724.9624 nan 0.1000 62785.8326
## 9 30304077.2970 nan 0.1000 22654.4562
## 10 30208006.5521 nan 0.1000 -14425.6910
## 20 29721785.8435 nan 0.1000 -18903.2676
## 40 29177934.6202 nan 0.1000 602.1123
## 60 28800170.5423 nan 0.1000 -76058.4376
## 80 28530404.5330 nan 0.1000 4470.0985
## 100 28261767.7557 nan 0.1000 -38580.1477
## 120 28085700.9562 nan 0.1000 -40795.6669
## 140 27889514.5927 nan 0.1000 19323.3526
## 150 27771878.6679 nan 0.1000 -58108.1982
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in (function (x, y, offset = NULL, misc = NULL, distribution = "bernoulli", : variable 12:
## data_channel_is_lifestyle has no variation.
## Warning in (function (x, y, offset = NULL, misc = NULL, distribution = "bernoulli", : variable 13:
## data_channel_is_entertainment has no variation.
## Warning in (function (x, y, offset = NULL, misc = NULL, distribution = "bernoulli", : variable 14:
## data_channel_is_bus has no variation.
## Warning in (function (x, y, offset = NULL, misc = NULL, distribution = "bernoulli", : variable 15:
## data_channel_is_socmed has no variation.
## Warning in (function (x, y, offset = NULL, misc = NULL, distribution = "bernoulli", : variable 16:
## data_channel_is_tech has no variation.
## Warning in (function (x, y, offset = NULL, misc = NULL, distribution = "bernoulli", : variable 17:
## data_channel_is_world has no variation.
## Iter TrainDeviance ValidDeviance StepSize Improve
## 1 30934703.5984 nan 0.1000 35669.6516
## 2 30582029.6190 nan 0.1000 -7066.7822
## 3 30361696.5492 nan 0.1000 112492.7280
## 4 30053020.9030 nan 0.1000 20918.5515
## 5 29815814.4801 nan 0.1000 -113.9750
## 6 29504133.9788 nan 0.1000 38243.9020
## 7 29333249.5731 nan 0.1000 109382.9248
## 8 29185894.7646 nan 0.1000 61430.6242
## 9 29099121.6130 nan 0.1000 32619.9296
## 10 28843394.9850 nan 0.1000 -25989.9709
## 20 27766112.8005 nan 0.1000 20810.0116
## 40 26187314.7564 nan 0.1000 -29628.7604
## 60 25508875.1951 nan 0.1000 -43108.6459
## 80 24814815.2366 nan 0.1000 -52659.4604
## 100 24264657.2770 nan 0.1000 -62891.9210
## 120 23869060.5724 nan 0.1000 -37029.8676
## 140 23498877.6033 nan 0.1000 -37555.6471
## 150 23255795.1341 nan 0.1000 -4910.4751
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in (function (x, y, offset = NULL, misc = NULL, distribution = "bernoulli", : variable 12:
## data_channel_is_lifestyle has no variation.
## Warning in (function (x, y, offset = NULL, misc = NULL, distribution = "bernoulli", : variable 13:
## data_channel_is_entertainment has no variation.
## Warning in (function (x, y, offset = NULL, misc = NULL, distribution = "bernoulli", : variable 14:
## data_channel_is_bus has no variation.
## Warning in (function (x, y, offset = NULL, misc = NULL, distribution = "bernoulli", : variable 15:
## data_channel_is_socmed has no variation.
## Warning in (function (x, y, offset = NULL, misc = NULL, distribution = "bernoulli", : variable 16:
## data_channel_is_tech has no variation.
## Warning in (function (x, y, offset = NULL, misc = NULL, distribution = "bernoulli", : variable 17:
## data_channel_is_world has no variation.
## Iter TrainDeviance ValidDeviance StepSize Improve
## 1 30812317.1735 nan 0.1000 126038.6600
## 2 30395096.1463 nan 0.1000 503.1965
## 3 30013498.9250 nan 0.1000 75662.7102
## 4 29791863.0498 nan 0.1000 1811.6108
## 5 29646392.5901 nan 0.1000 -11546.4511
## 6 29427565.5737 nan 0.1000 -5121.8606
## 7 29284607.7766 nan 0.1000 -25663.5292
## 8 29178963.3859 nan 0.1000 -10107.5054
## 9 28885018.5066 nan 0.1000 54445.3816
## 10 28709984.3348 nan 0.1000 105473.1604
## 20 26793323.8797 nan 0.1000 -44325.3534
## 40 24971873.7217 nan 0.1000 -135088.6769
## 60 23729846.1352 nan 0.1000 -9487.3679
## 80 22679254.7988 nan 0.1000 -66254.9954
## 100 21936301.7523 nan 0.1000 -42029.4602
## 120 21271332.6501 nan 0.1000 -35698.8292
## 140 20598256.4660 nan 0.1000 -81957.9635
## 150 20376316.7255 nan 0.1000 -29258.5822
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in (function (x, y, offset = NULL, misc = NULL, distribution = "bernoulli", : variable 12:
## data_channel_is_lifestyle has no variation.
## Warning in (function (x, y, offset = NULL, misc = NULL, distribution = "bernoulli", : variable 13:
## data_channel_is_entertainment has no variation.
## Warning in (function (x, y, offset = NULL, misc = NULL, distribution = "bernoulli", : variable 14:
## data_channel_is_bus has no variation.
## Warning in (function (x, y, offset = NULL, misc = NULL, distribution = "bernoulli", : variable 15:
## data_channel_is_socmed has no variation.
## Warning in (function (x, y, offset = NULL, misc = NULL, distribution = "bernoulli", : variable 16:
## data_channel_is_tech has no variation.
## Warning in (function (x, y, offset = NULL, misc = NULL, distribution = "bernoulli", : variable 17:
## data_channel_is_world has no variation.
## Iter TrainDeviance ValidDeviance StepSize Improve
## 1 26883116.7173 nan 0.1000 29030.2858
## 2 26811090.3886 nan 0.1000 45803.5275
## 3 26739979.8601 nan 0.1000 13768.3896
## 4 26674170.4152 nan 0.1000 49151.5210
## 5 26614153.9557 nan 0.1000 1809.0761
## 6 26549819.2700 nan 0.1000 35188.9170
## 7 26460453.4018 nan 0.1000 51722.7162
## 8 26411462.8062 nan 0.1000 37773.4472
## 9 26354508.7993 nan 0.1000 9327.6710
## 10 26307142.8546 nan 0.1000 17029.7098
## 20 25905886.1641 nan 0.1000 -5206.7313
## 40 25461627.8565 nan 0.1000 -638.2573
## 60 25281273.3239 nan 0.1000 -16783.9208
## 80 25035948.1564 nan 0.1000 -11541.4967
## 100 24890327.8232 nan 0.1000 -34259.8705
## 120 24751028.3026 nan 0.1000 -22291.3524
## 140 24656434.2367 nan 0.1000 -7510.8783
## 150 24574182.0966 nan 0.1000 -572.3681
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in (function (x, y, offset = NULL, misc = NULL, distribution = "bernoulli", : variable 12:
## data_channel_is_lifestyle has no variation.
## Warning in (function (x, y, offset = NULL, misc = NULL, distribution = "bernoulli", : variable 13:
## data_channel_is_entertainment has no variation.
## Warning in (function (x, y, offset = NULL, misc = NULL, distribution = "bernoulli", : variable 14:
## data_channel_is_bus has no variation.
## Warning in (function (x, y, offset = NULL, misc = NULL, distribution = "bernoulli", : variable 15:
## data_channel_is_socmed has no variation.
## Warning in (function (x, y, offset = NULL, misc = NULL, distribution = "bernoulli", : variable 16:
## data_channel_is_tech has no variation.
## Warning in (function (x, y, offset = NULL, misc = NULL, distribution = "bernoulli", : variable 17:
## data_channel_is_world has no variation.
## Iter TrainDeviance ValidDeviance StepSize Improve
## 1 26616491.9035 nan 0.1000 58044.2865
## 2 26515552.3031 nan 0.1000 27355.4438
## 3 26413354.9589 nan 0.1000 57703.8452
## 4 26212452.2355 nan 0.1000 623.4567
## 5 25999621.4658 nan 0.1000 -14614.9734
## 6 25894642.9877 nan 0.1000 62621.5598
## 7 25775822.2131 nan 0.1000 35421.7411
## 8 25685900.5866 nan 0.1000 40407.0831
## 9 25587345.0771 nan 0.1000 -24530.6114
## 10 25432055.7338 nan 0.1000 -36402.2007
## 20 24518424.6879 nan 0.1000 -27520.0528
## 40 23206858.6548 nan 0.1000 -28306.5986
## 60 22604618.9345 nan 0.1000 -17902.4601
## 80 22043522.5133 nan 0.1000 -15676.1289
## 100 21636987.2257 nan 0.1000 -29610.5895
## 120 21227202.8376 nan 0.1000 -28604.1290
## 140 20785547.4983 nan 0.1000 -69616.7645
## 150 20482405.6502 nan 0.1000 -72902.2067
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in (function (x, y, offset = NULL, misc = NULL, distribution = "bernoulli", : variable 12:
## data_channel_is_lifestyle has no variation.
## Warning in (function (x, y, offset = NULL, misc = NULL, distribution = "bernoulli", : variable 13:
## data_channel_is_entertainment has no variation.
## Warning in (function (x, y, offset = NULL, misc = NULL, distribution = "bernoulli", : variable 14:
## data_channel_is_bus has no variation.
## Warning in (function (x, y, offset = NULL, misc = NULL, distribution = "bernoulli", : variable 15:
## data_channel_is_socmed has no variation.
## Warning in (function (x, y, offset = NULL, misc = NULL, distribution = "bernoulli", : variable 16:
## data_channel_is_tech has no variation.
## Warning in (function (x, y, offset = NULL, misc = NULL, distribution = "bernoulli", : variable 17:
## data_channel_is_world has no variation.
## Iter TrainDeviance ValidDeviance StepSize Improve
## 1 26725545.4944 nan 0.1000 -3659.5361
## 2 26574823.3041 nan 0.1000 22213.2008
## 3 26269211.1945 nan 0.1000 10755.9305
## 4 26136852.7398 nan 0.1000 60763.6772
## 5 25976851.9458 nan 0.1000 26223.9027
## 6 25812393.6424 nan 0.1000 13110.7043
## 7 25697098.4355 nan 0.1000 -4289.9734
## 8 25570195.0529 nan 0.1000 -2810.5342
## 9 25341909.1762 nan 0.1000 -37632.5196
## 10 25144655.0968 nan 0.1000 41783.9296
## 20 23604751.8879 nan 0.1000 -72112.4247
## 40 21963610.9123 nan 0.1000 -11261.0860
## 60 20547180.6400 nan 0.1000 -52042.5297
## 80 19454207.8775 nan 0.1000 -25685.8714
## 100 18904517.3517 nan 0.1000 -5607.3178
## 120 18253505.5042 nan 0.1000 -54106.2997
## 140 17498224.3781 nan 0.1000 -73921.2261
## 150 17305416.1734 nan 0.1000 -35266.9876
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in (function (x, y, offset = NULL, misc = NULL, distribution = "bernoulli", : variable 12:
## data_channel_is_lifestyle has no variation.
## Warning in (function (x, y, offset = NULL, misc = NULL, distribution = "bernoulli", : variable 13:
## data_channel_is_entertainment has no variation.
## Warning in (function (x, y, offset = NULL, misc = NULL, distribution = "bernoulli", : variable 14:
## data_channel_is_bus has no variation.
## Warning in (function (x, y, offset = NULL, misc = NULL, distribution = "bernoulli", : variable 15:
## data_channel_is_socmed has no variation.
## Warning in (function (x, y, offset = NULL, misc = NULL, distribution = "bernoulli", : variable 16:
## data_channel_is_tech has no variation.
## Warning in (function (x, y, offset = NULL, misc = NULL, distribution = "bernoulli", : variable 17:
## data_channel_is_world has no variation.
## Iter TrainDeviance ValidDeviance StepSize Improve
## 1 30925841.7301 nan 0.1000 24968.6492
## 2 30823728.7785 nan 0.1000 70775.8891
## 3 30766490.8744 nan 0.1000 16210.5110
## 4 30697036.7630 nan 0.1000 26778.9835
## 5 30659114.8582 nan 0.1000 -16509.2064
## 6 30592598.2118 nan 0.1000 59853.6057
## 7 30503595.7297 nan 0.1000 35172.0975
## 8 30420644.4411 nan 0.1000 40520.2205
## 9 30355875.1648 nan 0.1000 39963.3769
## 10 30296589.2626 nan 0.1000 62596.6132
## 20 29794202.6881 nan 0.1000 7610.4034
## 40 29221686.5847 nan 0.1000 -11298.2000
## 60 28871033.4340 nan 0.1000 -29046.9068
## 80 28745532.5767 nan 0.1000 -45345.6352
## 100 28518195.4831 nan 0.1000 -14726.0158
## 120 28278214.0011 nan 0.1000 -12729.5430
## 140 28102506.1996 nan 0.1000 11323.4313
## 150 28063225.3212 nan 0.1000 -49304.6849
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in (function (x, y, offset = NULL, misc = NULL, distribution = "bernoulli", : variable 12:
## data_channel_is_lifestyle has no variation.
## Warning in (function (x, y, offset = NULL, misc = NULL, distribution = "bernoulli", : variable 13:
## data_channel_is_entertainment has no variation.
## Warning in (function (x, y, offset = NULL, misc = NULL, distribution = "bernoulli", : variable 14:
## data_channel_is_bus has no variation.
## Warning in (function (x, y, offset = NULL, misc = NULL, distribution = "bernoulli", : variable 15:
## data_channel_is_socmed has no variation.
## Warning in (function (x, y, offset = NULL, misc = NULL, distribution = "bernoulli", : variable 16:
## data_channel_is_tech has no variation.
## Warning in (function (x, y, offset = NULL, misc = NULL, distribution = "bernoulli", : variable 17:
## data_channel_is_world has no variation.
## Iter TrainDeviance ValidDeviance StepSize Improve
## 1 30753903.4736 nan 0.1000 17087.5011
## 2 30523510.6373 nan 0.1000 96345.6738
## 3 30223502.3581 nan 0.1000 -17011.2526
## 4 29968088.0830 nan 0.1000 -4350.2841
## 5 29845234.4025 nan 0.1000 80561.2103
## 6 29764613.8378 nan 0.1000 72734.7848
## 7 29651510.7507 nan 0.1000 47281.6226
## 8 29261888.2702 nan 0.1000 18203.8168
## 9 29162364.5705 nan 0.1000 24241.8382
## 10 28853157.1070 nan 0.1000 55152.1681
## 20 27783364.9341 nan 0.1000 30939.9048
## 40 26359180.1108 nan 0.1000 -16593.1322
## 60 24968393.0637 nan 0.1000 -3779.2909
## 80 24114398.6154 nan 0.1000 -36330.1719
## 100 23622045.4645 nan 0.1000 -71817.2338
## 120 23165608.1886 nan 0.1000 -14343.6940
## 140 22798477.7154 nan 0.1000 -52588.9827
## 150 22575297.5629 nan 0.1000 -30203.5940
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in (function (x, y, offset = NULL, misc = NULL, distribution = "bernoulli", : variable 12:
## data_channel_is_lifestyle has no variation.
## Warning in (function (x, y, offset = NULL, misc = NULL, distribution = "bernoulli", : variable 13:
## data_channel_is_entertainment has no variation.
## Warning in (function (x, y, offset = NULL, misc = NULL, distribution = "bernoulli", : variable 14:
## data_channel_is_bus has no variation.
## Warning in (function (x, y, offset = NULL, misc = NULL, distribution = "bernoulli", : variable 15:
## data_channel_is_socmed has no variation.
## Warning in (function (x, y, offset = NULL, misc = NULL, distribution = "bernoulli", : variable 16:
## data_channel_is_tech has no variation.
## Warning in (function (x, y, offset = NULL, misc = NULL, distribution = "bernoulli", : variable 17:
## data_channel_is_world has no variation.
## Iter TrainDeviance ValidDeviance StepSize Improve
## 1 30623331.2585 nan 0.1000 143080.2943
## 2 30454821.3524 nan 0.1000 67779.4230
## 3 29930279.3303 nan 0.1000 32900.6666
## 4 29583935.3764 nan 0.1000 -27414.8590
## 5 29445318.1875 nan 0.1000 89072.4408
## 6 29119560.5139 nan 0.1000 2839.4824
## 7 28815665.1505 nan 0.1000 15752.9229
## 8 28454073.6783 nan 0.1000 -5193.2734
## 9 28186914.3990 nan 0.1000 44187.4510
## 10 28037149.7492 nan 0.1000 -3828.2462
## 20 26430700.1916 nan 0.1000 -29959.3460
## 40 24393530.2007 nan 0.1000 -94463.4543
## 60 23222804.8493 nan 0.1000 -64121.8405
## 80 22289175.3609 nan 0.1000 -18912.6286
## 100 21648543.3408 nan 0.1000 -41755.5889
## 120 20995226.0975 nan 0.1000 -69224.5666
## 140 20140285.4790 nan 0.1000 -107209.6875
## 150 19712030.1117 nan 0.1000 -47758.2194
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in (function (x, y, offset = NULL, misc = NULL, distribution = "bernoulli", : variable 12:
## data_channel_is_lifestyle has no variation.
## Warning in (function (x, y, offset = NULL, misc = NULL, distribution = "bernoulli", : variable 13:
## data_channel_is_entertainment has no variation.
## Warning in (function (x, y, offset = NULL, misc = NULL, distribution = "bernoulli", : variable 14:
## data_channel_is_bus has no variation.
## Warning in (function (x, y, offset = NULL, misc = NULL, distribution = "bernoulli", : variable 15:
## data_channel_is_socmed has no variation.
## Warning in (function (x, y, offset = NULL, misc = NULL, distribution = "bernoulli", : variable 16:
## data_channel_is_tech has no variation.
## Warning in (function (x, y, offset = NULL, misc = NULL, distribution = "bernoulli", : variable 17:
## data_channel_is_world has no variation.
## Iter TrainDeviance ValidDeviance StepSize Improve
## 1 34054749.0627 nan 0.1000 55853.4023
## 2 33984388.0580 nan 0.1000 44933.6927
## 3 33889505.4221 nan 0.1000 15301.8419
## 4 33804572.2258 nan 0.1000 49990.7761
## 5 33726440.5477 nan 0.1000 36151.6104
## 6 33676369.4922 nan 0.1000 24280.3068
## 7 33594821.4502 nan 0.1000 40300.0645
## 8 33523705.0983 nan 0.1000 33636.5717
## 9 33447560.4104 nan 0.1000 10822.8620
## 10 33376972.5561 nan 0.1000 23904.4532
## 20 32854005.7861 nan 0.1000 -32214.8952
## 40 32351377.9957 nan 0.1000 1597.7000
## 60 32067529.2143 nan 0.1000 -30842.9899
## 80 31801186.6423 nan 0.1000 -1031.7182
## 100 31682550.3155 nan 0.1000 1401.6818
## 120 31575105.1736 nan 0.1000 -31299.0906
## 140 31386594.8317 nan 0.1000 29584.6099
## 150 31310562.0616 nan 0.1000 -22348.9562
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in (function (x, y, offset = NULL, misc = NULL, distribution = "bernoulli", : variable 12:
## data_channel_is_lifestyle has no variation.
## Warning in (function (x, y, offset = NULL, misc = NULL, distribution = "bernoulli", : variable 13:
## data_channel_is_entertainment has no variation.
## Warning in (function (x, y, offset = NULL, misc = NULL, distribution = "bernoulli", : variable 14:
## data_channel_is_bus has no variation.
## Warning in (function (x, y, offset = NULL, misc = NULL, distribution = "bernoulli", : variable 15:
## data_channel_is_socmed has no variation.
## Warning in (function (x, y, offset = NULL, misc = NULL, distribution = "bernoulli", : variable 16:
## data_channel_is_tech has no variation.
## Warning in (function (x, y, offset = NULL, misc = NULL, distribution = "bernoulli", : variable 17:
## data_channel_is_world has no variation.
## Iter TrainDeviance ValidDeviance StepSize Improve
## 1 34030350.5443 nan 0.1000 -28116.1996
## 2 33703495.3490 nan 0.1000 77296.5065
## 3 33500608.8208 nan 0.1000 85072.9102
## 4 33162166.9714 nan 0.1000 22336.5888
## 5 32997129.9958 nan 0.1000 42487.1908
## 6 32741609.6410 nan 0.1000 -10417.3359
## 7 32655648.5585 nan 0.1000 -1174.5096
## 8 32460228.3047 nan 0.1000 59644.6503
## 9 32206362.1889 nan 0.1000 -54610.8548
## 10 32079212.3803 nan 0.1000 6458.3804
## 20 30906454.6161 nan 0.1000 -10829.1176
## 40 29293382.4840 nan 0.1000 -56417.1533
## 60 28220600.5675 nan 0.1000 -29122.6479
## 80 27301795.8936 nan 0.1000 -24758.0899
## 100 26663732.4174 nan 0.1000 -58326.2607
## 120 26246854.7767 nan 0.1000 -31114.9156
## 140 25705428.9384 nan 0.1000 -137562.9005
## 150 25444559.2525 nan 0.1000 -20859.0501
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in (function (x, y, offset = NULL, misc = NULL, distribution = "bernoulli", : variable 12:
## data_channel_is_lifestyle has no variation.
## Warning in (function (x, y, offset = NULL, misc = NULL, distribution = "bernoulli", : variable 13:
## data_channel_is_entertainment has no variation.
## Warning in (function (x, y, offset = NULL, misc = NULL, distribution = "bernoulli", : variable 14:
## data_channel_is_bus has no variation.
## Warning in (function (x, y, offset = NULL, misc = NULL, distribution = "bernoulli", : variable 15:
## data_channel_is_socmed has no variation.
## Warning in (function (x, y, offset = NULL, misc = NULL, distribution = "bernoulli", : variable 16:
## data_channel_is_tech has no variation.
## Warning in (function (x, y, offset = NULL, misc = NULL, distribution = "bernoulli", : variable 17:
## data_channel_is_world has no variation.
## Iter TrainDeviance ValidDeviance StepSize Improve
## 1 33985999.3234 nan 0.1000 46270.5590
## 2 33753105.1758 nan 0.1000 66531.3274
## 3 33496765.5874 nan 0.1000 17416.7306
## 4 33400189.9782 nan 0.1000 26083.1259
## 5 33174616.3512 nan 0.1000 20256.4427
## 6 32968551.7236 nan 0.1000 53615.9867
## 7 32864944.4321 nan 0.1000 -30009.8999
## 8 32669362.8305 nan 0.1000 21845.5438
## 9 32584930.1020 nan 0.1000 9077.7721
## 10 32212592.5519 nan 0.1000 104539.9562
## 20 29883278.2422 nan 0.1000 -19885.8635
## 40 27252167.4540 nan 0.1000 -98771.0957
## 60 26103044.3488 nan 0.1000 -30661.5789
## 80 25180390.5232 nan 0.1000 -57060.8641
## 100 24404829.4473 nan 0.1000 -70163.9802
## 120 23589552.7478 nan 0.1000 -25752.4919
## 140 22745163.4650 nan 0.1000 -96207.2084
## 150 22517918.5695 nan 0.1000 -36817.5736
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in (function (x, y, offset = NULL, misc = NULL, distribution = "bernoulli", : variable 12:
## data_channel_is_lifestyle has no variation.
## Warning in (function (x, y, offset = NULL, misc = NULL, distribution = "bernoulli", : variable 13:
## data_channel_is_entertainment has no variation.
## Warning in (function (x, y, offset = NULL, misc = NULL, distribution = "bernoulli", : variable 14:
## data_channel_is_bus has no variation.
## Warning in (function (x, y, offset = NULL, misc = NULL, distribution = "bernoulli", : variable 15:
## data_channel_is_socmed has no variation.
## Warning in (function (x, y, offset = NULL, misc = NULL, distribution = "bernoulli", : variable 16:
## data_channel_is_tech has no variation.
## Warning in (function (x, y, offset = NULL, misc = NULL, distribution = "bernoulli", : variable 17:
## data_channel_is_world has no variation.
## Iter TrainDeviance ValidDeviance StepSize Improve
## 1 30419566.0074 nan 0.1000 37259.6557
## 2 30346448.9746 nan 0.1000 40213.6494
## 3 30251855.8699 nan 0.1000 45726.1742
## 4 30187009.9710 nan 0.1000 45728.2218
## 5 30111678.2665 nan 0.1000 61553.6041
## 6 30035862.2940 nan 0.1000 14870.1298
## 7 29998878.2434 nan 0.1000 -158.9341
## 8 29937511.8572 nan 0.1000 49689.2459
## 9 29904667.3883 nan 0.1000 13603.5112
## 10 29869229.4971 nan 0.1000 -4247.6697
## 20 29433405.9689 nan 0.1000 -343.2730
## 40 28887949.1899 nan 0.1000 -22252.5555
## 60 28611046.8965 nan 0.1000 -337.9091
## 80 28446284.4009 nan 0.1000 -37489.1241
## 100 28320089.3364 nan 0.1000 -5238.6498
## 120 28146778.2384 nan 0.1000 -8375.1433
## 140 28021572.2770 nan 0.1000 -29879.3715
## 150 27968117.8495 nan 0.1000 -15256.2617
bt_fit
## Stochastic Gradient Boosting
##
## 5898 samples
## 58 predictor
##
## Pre-processing: centered (58), scaled (58)
## Resampling: Cross-Validated (5 fold)
## Summary of sample sizes: 4718, 4718, 4718, 4720, 4718
## Resampling results across tuning parameters:
##
## interaction.depth n.trees RMSE Rsquared MAE
## 1 50 5400.354 0.02485865 1846.406
## 1 100 5397.992 0.02855140 1853.253
## 1 150 5397.704 0.02939816 1854.367
## 2 50 5458.574 0.01796376 1868.672
## 2 100 5481.362 0.02020405 1900.717
## 2 150 5521.553 0.01841250 1927.497
## 3 50 5431.065 0.02670551 1857.143
## 3 100 5471.017 0.02565600 1903.657
## 3 150 5512.787 0.02309192 1946.188
##
## Tuning parameter 'shrinkage' was held constant at a value of 0.1
## Tuning parameter 'n.minobsinnode' was held
## constant at a value of 10
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were n.trees = 150, interaction.depth = 1, shrinkage = 0.1 and n.minobsinnode
## = 10.
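The training log above repeats the same "zero variance" and "has no variation" warnings for the six data_channel_is_* columns in every resample. One likely cause (an assumption, since the filtering step is not shown in this part of the report) is that the data were subset to a single channel, which leaves each indicator column constant. A minimal base R sketch of dropping constant columns before fitting is given here; the toy data frame and its values are hypothetical stand-ins for the real training set.

```r
# Toy data standing in for the training set (hypothetical values).
train_toy <- data.frame(
  shares = c(500, 1200, 800, 3000),
  num_imgs = c(1, 4, 2, 7),
  data_channel_is_tech = c(0, 0, 0, 0),   # constant: all zeros
  data_channel_is_world = c(1, 1, 1, 1)   # constant: all ones
)

# Flag every column that holds a single distinct value.
is_constant <- vapply(
  train_toy,
  function(col) length(unique(col)) <= 1,
  logical(1)
)

# Keep only the columns that actually vary.
train_reduced <- train_toy[, !is_constant, drop = FALSE]
names(train_reduced)
```

Fitting on the reduced data should remove the warnings without changing the model, because a constant predictor cannot be used for a split. The iteration tables themselves are printed by gbm; passing verbose = FALSE to train() is the usual way to silence them.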
Comparison - Jordan
Finally, let’s compare our four models (two linear models, one random forest model, and one boosted tree model) by their performance on the testing data.
# random forest predictions on the testing data and their performance
predRF <- predict(ranfor, newdata = Testing)
RF <- postResample(predRF, Testing$shares)
# linear model 1 predictions on the testing data and their performance
predlm1 <- predict(fit1, newdata = Testing)
LM <- postResample(predlm1, Testing$shares)
# linear model 2 predictions on the testing data and their performance
predlm2 <- predict(lm_fit, newdata = Testing)
LM2 <- postResample(predlm2, Testing$shares)
# boosted tree predictions on the testing data and their performance
predbt <- predict(bt_fit, newdata = Testing)
BT <- postResample(predbt, Testing$shares)
# combine each of the performance stats for the models and add a column with the model names
dat <- data.frame(rbind(t(data.frame(LM)), t(data.frame(RF)), t(data.frame(LM2)), t(data.frame(BT))))
df <- as_tibble(rownames_to_column(dat, "models"))
# find the model with the lowest RMSE
best <- df %>% filter(RMSE == min(RMSE)) %>% select(models)
# print "The Best fitting model according to RMSE is [insert model name for lowest RMSE here]"
paste("The Best fitting model according to RMSE is", best$models, sep = " ")
## [1] "The Best fitting model according to RMSE is RF"
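To make the comparison above concrete, the sketch below computes by hand the three statistics that postResample reports (RMSE, Rsquared as the squared correlation, and MAE) and then picks the row with the smallest RMSE. The observed values, the two prediction vectors, and the model labels "A" and "B" are made up for illustration; they are not results from this analysis.

```r
# Hand-rolled versions of the three statistics reported by postResample.
metrics <- function(pred, obs) {
  c(
    RMSE = sqrt(mean((pred - obs)^2)),   # root mean squared error
    Rsquared = cor(pred, obs)^2,         # squared Pearson correlation
    MAE = mean(abs(pred - obs))          # mean absolute error
  )
}

# Hypothetical observed shares and predictions from two models.
obs <- c(100, 200, 300, 400)
pred_a <- c(110, 190, 310, 390)
pred_b <- c(150, 150, 350, 350)

# Naming the rbind arguments gives one labelled row per model.
results <- rbind(A = metrics(pred_a, obs), B = metrics(pred_b, obs))

# The best model is the row with the smallest RMSE.
best_model <- rownames(results)[which.min(results[, "RMSE"])]
best_model
```

Naming the arguments to rbind is one way to get the model labels as row names directly, which would avoid the repeated t(data.frame(...)) calls if the same pattern were applied to LM, RF, LM2, and BT.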
Automation - Jonathan
#rmarkdown::render(
# "Tanley-Wood-Project2.Rmd",
# output_format="github_document",
# output_dir="./Analysis",
# output_options = list(
# html_preview = FALSE
# )
#)
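The commented-out call above renders a single report. If one report per data channel is wanted, a loop along the following lines could work. This is only a sketch under stated assumptions: the parameter name `channel` and the output file naming scheme are hypothetical, and the .Rmd file would need to declare a matching entry under `params` in its YAML header. The channel suffixes are taken from the data_channel_is_* column names.

```r
# Channel suffixes as they appear in the data_channel_is_* column names.
channels <- c("lifestyle", "entertainment", "bus", "socmed", "tech", "world")

# One row per report. The `channel` parameter and the file naming
# scheme are hypothetical, chosen only for illustration.
jobs <- data.frame(
  channel = channels,
  output_file = paste0(channels, "Analysis.md"),
  stringsAsFactors = FALSE
)

rmd <- "Tanley-Wood-Project2.Rmd"

# Render only when the source file is present in the working directory.
if (file.exists(rmd)) {
  for (i in seq_len(nrow(jobs))) {
    rmarkdown::render(
      rmd,
      output_format = "github_document",
      output_dir = "./Analysis",
      output_file = jobs$output_file[i],
      output_options = list(html_preview = FALSE),
      params = list(channel = jobs$channel[i])
    )
  }
}
```

Keeping the job list in a data frame makes it easy to inspect which files will be produced before any rendering starts.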