Tanley-Wood-Project2
Jordan Tanley and Jonathan Wood 2022-07-05
Introduction - Jonathan
Data
This analysis uses the Online News Popularity dataset, which contains a set of features describing articles published on Mashable.com over a two-year period.
The goal of this project is to predict the number of shares (how many times the article was shared over social media) an article receives. We will use the predicted number of shares to judge whether a new article is likely to be popular.
Notable Variables
While there are 61 variables in the data set, we will not use all of them for this project. The notable variables are the following:
- “shares” - the number of shares the article has gotten over social media. This is the label, or response variable, that we want our models to predict for new articles.
- “data_channel_is_*” - a set of indicator variables that tell whether the article is in a particular category, such as business, entertainment, or lifestyle.
- “weekday_is_*” - a set of indicator variables that tell what day of the week the article was published on.
- “num_keywords” - the number of keywords within the article.
- “num_imgs” - the number of images within the article.
- “num_videos” - the number of videos within the article.
Methods
Multiple methods will be used for this project to predict the number of shares a new article can generate, including:
- Linear regression
- Tree-based models
  - Random forest
  - Boosted tree
Data - Jordan
In order to read in the data using a relative path, be sure to have the OnlineNewsPopularity folder containing the data file saved in your working directory.
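# load the tidyverse, which provides read_csv(), the dplyr verbs, and ggplot2 used below
library(tidyverse)
# read in the data
news <- read_csv("OnlineNewsPopularity/OnlineNewsPopularity.csv")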
## Rows: 39644 Columns: 61
## ── Column specification ─────────────────────────────────────────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): url
## dbl (60): timedelta, n_tokens_title, n_tokens_content, n_unique_tokens, n_non_stop_words, n_non_stop_unique_token...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# sneak peek at the dataset
head(news)
# Creating a weekday variable (basically undoing the 7 dummy variables that came with the data) for EDA
news$weekday <- ifelse(news$weekday_is_friday == 1, "Friday",
                ifelse(news$weekday_is_monday == 1, "Monday",
                ifelse(news$weekday_is_tuesday == 1, "Tuesday",
                ifelse(news$weekday_is_wednesday == 1, "Wednesday",
                ifelse(news$weekday_is_thursday == 1, "Thursday",
                ifelse(news$weekday_is_saturday == 1, "Saturday",
                       "Sunday"))))))
Next, let’s subset the data so that we only look at the data channel of interest. The channel is set by the report parameter params$channel; for this report, we will look at articles in the entertainment data channel.
# Subset the data to one of the parameterized data channels and drop unnecessary variables
chan <- paste0("data_channel_is_", params$channel)
print(chan)
## [1] "data_channel_is_entertainment"
filtered_channel <- news %>%
  as_tibble() %>%
  # .data[[chan]] looks up the column named in chan within the piped data
  filter(.data[[chan]] == 1) %>%
  select(-c(url, timedelta))
# take a peek at the data
filtered_channel %>%
select(ends_with(chan))
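As a minimal sketch of the subsetting step, the snippet below filters a small made-up data frame on a column whose name is built from a string. The toy data frame and its values are hypothetical, and channel stands in for the value that params$channel would supply; it assumes the dplyr package is installed.

```r
library(dplyr)

# Toy data standing in for the news data (hypothetical values)
toy <- data.frame(
  data_channel_is_entertainment = c(1, 0, 1, 0),
  data_channel_is_tech          = c(0, 1, 0, 1),
  shares                        = c(500, 1200, 900, 3000)
)

# Build the column name from a channel string, as params$channel would supply
channel <- "entertainment"
chan <- paste0("data_channel_is_", channel)

# .data[[chan]] looks the column up by name inside the data being filtered
toy_channel <- toy %>%
  filter(.data[[chan]] == 1)

# Number of rows kept for the chosen channel
nrow(toy_channel)
```

Changing channel to "tech" would keep the other two rows instead, which is the idea behind running the same report once per channel.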
Summarizations - Both (at least 3 plots each)
For the numerical summaries, we can look at several aspects. Contingency tables allow us to examine the frequencies of categorical variables. The first output below, for example, shows the counts for each weekday. Similarly, the fifth table shows the frequencies of the number of tokens in the article content. Another set of summary statistics to look at is the five-number summary, which provides the minimum, first quartile (Q1), median, third quartile (Q3), and maximum for a particular variable. It may also be helpful to look at the average. Comparing the mean with the median helps in judging skewness (a mean close to the median suggests a roughly symmetric distribution, while a mean well above or below the median suggests right or left skew, respectively), and the quartiles help in looking for outliers (a value more than 1.5 × (Q3 - Q1) below Q1 or above Q3 is generally considered an outlier). Below, the five-number summaries (plus the mean) are shown for shares, the number of words in the content, the number of words in the content for articles in the upper quartile of shares, the number of images in the article, the number of videos in the article, the positive word rate, and the negative word rate.
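To make the skewness and outlier checks concrete, here is a small sketch in base R. The vector x holds made-up share counts (not values from the dataset), and the names lower_fence and upper_fence are illustrative only.

```r
# Hypothetical share counts, right-skewed on purpose
x <- c(100, 400, 800, 1200, 1500, 2100, 2600, 50000)

# Quartiles, using the same quantile() function as the summaries in this report
q1 <- quantile(x, probs = 0.25)
q3 <- quantile(x, probs = 0.75)
iqr <- q3 - q1

# 1.5 * IQR rule: values beyond these fences are flagged as outliers
lower_fence <- q1 - 1.5 * iqr
upper_fence <- q3 + 1.5 * iqr
outliers <- x[x < lower_fence | x > upper_fence]

# A mean well above the median points to right skew
mean(x) > median(x)
outliers
```

The same comparison can be read off the tables that follow: when the Average column sits far above the Median column, the variable is likely right-skewed.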
# Contingency table of frequencies for days of the week, added caption for clarity
kable(table(filtered_channel$weekday),
col.names = c("Weekday", "Frequency"),
caption = "Contingency table of frequencies for days of the week")
Weekday | Frequency |
---|---|
Friday | 972 |
Monday | 1358 |
Saturday | 380 |
Sunday | 536 |
Thursday | 1231 |
Tuesday | 1285 |
Wednesday | 1295 |
Contingency table of frequencies for days of the week
# Numerical Summary of Shares, added caption for clarity
filtered_channel %>% summarise(Minimum = min(shares),
Q1 = quantile(shares, prob = 0.25),
Average = mean(shares),
Median = median(shares),
Q3 = quantile(shares, prob = 0.75),
Maximum = max(shares)) %>%
kable(caption = "Numerical Summary of Shares")
Minimum | Q1 | Average | Median | Q3 | Maximum |
---|---|---|---|---|---|
47 | 833 | 2970.487 | 1200 | 2100 | 210300 |
Numerical Summary of Shares
# Numerical Summary of Number of words in the content, added caption for clarity
filtered_channel %>% summarise(Minimum = min(n_tokens_content),
Q1 = quantile(n_tokens_content, prob = 0.25),
Average = mean(n_tokens_content),
Median = median(n_tokens_content),
Q3 = quantile(n_tokens_content, prob = 0.75),
Maximum = max(n_tokens_content)) %>%
kable(caption = "Numerical Summary of Number of words in the content")
Minimum | Q1 | Average | Median | Q3 | Maximum |
---|---|---|---|---|---|
0 | 255 | 607.4574 | 433 | 805 | 6505 |
Numerical Summary of Number of words in the content
# Numerical Summary of Number of words in the content for the upper quantile of Shares, added caption for clarity
filtered_channel %>% filter(shares > quantile(shares, prob = 0.75)) %>%
summarise(Minimum = min(n_tokens_content),
Q1 = quantile(n_tokens_content, prob = 0.25),
Average = mean(n_tokens_content),
Median = median(n_tokens_content),
Q3 = quantile(n_tokens_content, prob = 0.75),
Maximum = max(n_tokens_content)) %>%
kable(caption = "Numerical Summary of Number of words in the content for the upper quantile of Shares")
Minimum | Q1 | Average | Median | Q3 | Maximum |
---|---|---|---|---|---|
0 | 238 | 601.5838 | 410 | 809 | 6159 |
Numerical Summary of Number of words in the content for the upper quantile of Shares
kable(table(filtered_channel$n_tokens_content),
col.names = c("Tokens", "Frequency"),
caption = "Contingency table of frequencies for number of tokens in the article content")
Tokens | Frequency |
---|---|
0 | 201 |
31 | 1 |
43 | 1 |
51 | 1 |
53 | 1 |
54 | 2 |
55 | 1 |
58 | 1 |
66 | 2 |
69 | 1 |
70 | 1 |
73 | 3 |
74 | 1 |
75 | 1 |
76 | 2 |
77 | 1 |
78 | 2 |
79 | 1 |
80 | 1 |
81 | 3 |
82 | 2 |
83 | 2 |
84 | 1 |
86 | 2 |
87 | 3 |
88 | 2 |
90 | 2 |
91 | 1 |
92 | 3 |
93 | 6 |
94 | 1 |
95 | 4 |
96 | 2 |
97 | 2 |
98 | 3 |
99 | 1 |
100 | 1 |
101 | 1 |
102 | 2 |
103 | 2 |
104 | 3 |
105 | 5 |
106 | 4 |
107 | 5 |
108 | 1 |
109 | 8 |
110 | 5 |
111 | 4 |
112 | 4 |
113 | 5 |
114 | 2 |
115 | 4 |
116 | 2 |
117 | 6 |
118 | 7 |
119 | 5 |
120 | 3 |
121 | 1 |
122 | 6 |
123 | 9 |
124 | 6 |
125 | 4 |
126 | 9 |
127 | 4 |
128 | 8 |
129 | 10 |
130 | 7 |
131 | 4 |
132 | 11 |
133 | 11 |
134 | 6 |
135 | 7 |
136 | 12 |
137 | 5 |
138 | 8 |
139 | 7 |
140 | 8 |
141 | 12 |
142 | 17 |
143 | 11 |
144 | 14 |
145 | 10 |
146 | 11 |
147 | 6 |
148 | 10 |
149 | 6 |
150 | 3 |
151 | 9 |
152 | 9 |
153 | 7 |
154 | 9 |
155 | 5 |
156 | 8 |
157 | 12 |
158 | 11 |
159 | 10 |
160 | 10 |
161 | 11 |
162 | 9 |
163 | 6 |
164 | 12 |
165 | 11 |
166 | 12 |
167 | 10 |
168 | 12 |
169 | 9 |
170 | 11 |
171 | 7 |
172 | 15 |
173 | 12 |
174 | 11 |
175 | 14 |
176 | 11 |
177 | 17 |
178 | 16 |
179 | 13 |
180 | 12 |
181 | 11 |
182 | 8 |
183 | 5 |
184 | 12 |
185 | 15 |
186 | 13 |
187 | 8 |
188 | 6 |
189 | 11 |
190 | 14 |
191 | 9 |
192 | 11 |
193 | 14 |
194 | 15 |
195 | 12 |
196 | 11 |
197 | 18 |
198 | 18 |
199 | 13 |
200 | 11 |
201 | 13 |
202 | 11 |
203 | 14 |
204 | 6 |
205 | 9 |
206 | 8 |
207 | 17 |
208 | 9 |
209 | 11 |
210 | 13 |
211 | 18 |
212 | 12 |
213 | 7 |
214 | 17 |
215 | 9 |
216 | 8 |
217 | 12 |
218 | 16 |
219 | 12 |
220 | 13 |
221 | 11 |
222 | 16 |
223 | 10 |
224 | 10 |
225 | 12 |
226 | 9 |
227 | 15 |
228 | 8 |
229 | 8 |
230 | 17 |
231 | 12 |
232 | 15 |
233 | 8 |
234 | 14 |
235 | 12 |
236 | 10 |
237 | 6 |
238 | 10 |
239 | 11 |
240 | 10 |
241 | 14 |
242 | 16 |
243 | 7 |
244 | 12 |
245 | 9 |
246 | 22 |
247 | 9 |
248 | 12 |
249 | 9 |
250 | 11 |
251 | 8 |
252 | 11 |
253 | 9 |
254 | 11 |
255 | 9 |
256 | 8 |
257 | 7 |
258 | 15 |
259 | 12 |
260 | 11 |
261 | 13 |
262 | 11 |
263 | 14 |
264 | 13 |
265 | 11 |
266 | 10 |
267 | 16 |
268 | 13 |
269 | 16 |
270 | 12 |
271 | 12 |
272 | 10 |
273 | 13 |
274 | 17 |
275 | 13 |
276 | 11 |
277 | 16 |
278 | 18 |
279 | 14 |
280 | 7 |
281 | 12 |
282 | 16 |
283 | 17 |
284 | 11 |
285 | 11 |
286 | 18 |
287 | 12 |
288 | 15 |
289 | 11 |
290 | 11 |
291 | 10 |
292 | 13 |
293 | 12 |
294 | 17 |
295 | 7 |
296 | 11 |
297 | 10 |
298 | 10 |
299 | 7 |
300 | 14 |
301 | 11 |
302 | 14 |
303 | 7 |
304 | 11 |
305 | 9 |
306 | 14 |
307 | 13 |
308 | 17 |
309 | 13 |
310 | 14 |
311 | 13 |
312 | 13 |
313 | 6 |
314 | 13 |
315 | 12 |
316 | 9 |
317 | 11 |
318 | 8 |
319 | 4 |
320 | 12 |
321 | 7 |
322 | 13 |
323 | 12 |
324 | 14 |
325 | 3 |
326 | 12 |
327 | 15 |
328 | 10 |
329 | 10 |
330 | 4 |
331 | 13 |
332 | 11 |
333 | 13 |
334 | 11 |
335 | 14 |
336 | 13 |
337 | 11 |
338 | 8 |
339 | 15 |
340 | 10 |
341 | 8 |
342 | 9 |
343 | 9 |
344 | 14 |
345 | 13 |
346 | 11 |
347 | 9 |
348 | 11 |
349 | 10 |
350 | 13 |
351 | 9 |
352 | 10 |
353 | 9 |
354 | 17 |
355 | 7 |
356 | 14 |
357 | 8 |
358 | 6 |
359 | 12 |
360 | 5 |
361 | 9 |
362 | 8 |
363 | 7 |
364 | 8 |
365 | 15 |
366 | 5 |
367 | 5 |
368 | 9 |
369 | 13 |
370 | 3 |
371 | 6 |
372 | 4 |
373 | 7 |
374 | 7 |
375 | 12 |
376 | 10 |
377 | 9 |
378 | 8 |
379 | 8 |
380 | 13 |
381 | 4 |
382 | 12 |
383 | 5 |
384 | 8 |
385 | 8 |
386 | 10 |
387 | 7 |
388 | 10 |
389 | 9 |
390 | 5 |
391 | 13 |
392 | 7 |
393 | 8 |
394 | 9 |
395 | 11 |
396 | 10 |
397 | 4 |
398 | 5 |
399 | 11 |
400 | 5 |
401 | 4 |
402 | 4 |
403 | 6 |
404 | 5 |
405 | 6 |
406 | 6 |
407 | 5 |
408 | 6 |
409 | 11 |
410 | 12 |
411 | 7 |
412 | 7 |
413 | 7 |
414 | 10 |
415 | 9 |
416 | 6 |
417 | 2 |
418 | 8 |
419 | 6 |
420 | 10 |
421 | 5 |
422 | 8 |
423 | 7 |
424 | 10 |
425 | 10 |
426 | 5 |
427 | 7 |
428 | 9 |
429 | 6 |
430 | 8 |
431 | 2 |
432 | 3 |
433 | 9 |
434 | 6 |
435 | 10 |
436 | 12 |
437 | 12 |
438 | 6 |
439 | 5 |
440 | 6 |
441 | 7 |
442 | 8 |
443 | 6 |
444 | 11 |
445 | 8 |
446 | 6 |
447 | 7 |
448 | 2 |
449 | 4 |
450 | 2 |
451 | 6 |
452 | 10 |
453 | 11 |
454 | 6 |
455 | 7 |
456 | 11 |
457 | 4 |
458 | 5 |
459 | 9 |
460 | 9 |
461 | 11 |
462 | 7 |
464 | 3 |
465 | 7 |
466 | 3 |
467 | 6 |
468 | 6 |
469 | 11 |
470 | 5 |
471 | 8 |
472 | 5 |
473 | 4 |
474 | 9 |
475 | 8 |
476 | 7 |
477 | 6 |
478 | 5 |
479 | 7 |
480 | 9 |
481 | 6 |
482 | 9 |
483 | 5 |
484 | 3 |
485 | 8 |
486 | 3 |
487 | 8 |
488 | 7 |
489 | 6 |
490 | 7 |
491 | 3 |
492 | 3 |
493 | 7 |
494 | 5 |
495 | 9 |
496 | 3 |
497 | 6 |
498 | 8 |
499 | 2 |
500 | 2 |
501 | 8 |
502 | 2 |
503 | 9 |
504 | 5 |
505 | 10 |
506 | 6 |
507 | 7 |
508 | 6 |
509 | 4 |
510 | 6 |
511 | 8 |
512 | 3 |
513 | 4 |
514 | 4 |
515 | 6 |
516 | 9 |
517 | 8 |
518 | 9 |
519 | 6 |
520 | 6 |
521 | 10 |
522 | 3 |
523 | 3 |
524 | 4 |
525 | 4 |
526 | 8 |
527 | 8 |
528 | 3 |
529 | 4 |
530 | 6 |
531 | 8 |
532 | 4 |
533 | 4 |
534 | 7 |
535 | 8 |
536 | 6 |
537 | 6 |
538 | 2 |
539 | 6 |
540 | 8 |
541 | 2 |
542 | 7 |
543 | 3 |
544 | 6 |
545 | 6 |
546 | 3 |
547 | 5 |
548 | 3 |
549 | 5 |
550 | 5 |
551 | 1 |
552 | 3 |
553 | 6 |
554 | 9 |
555 | 5 |
556 | 6 |
557 | 10 |
558 | 3 |
559 | 5 |
560 | 6 |
561 | 5 |
562 | 5 |
563 | 5 |
564 | 3 |
565 | 4 |
566 | 7 |
567 | 6 |
568 | 2 |
569 | 5 |
570 | 2 |
571 | 4 |
572 | 8 |
573 | 5 |
574 | 2 |
575 | 4 |
576 | 5 |
577 | 2 |
578 | 3 |
579 | 4 |
580 | 3 |
581 | 6 |
582 | 6 |
583 | 3 |
584 | 5 |
585 | 4 |
586 | 4 |
587 | 4 |
588 | 3 |
589 | 3 |
590 | 4 |
591 | 6 |
592 | 6 |
593 | 7 |
594 | 14 |
595 | 3 |
596 | 1 |
597 | 5 |
598 | 1 |
599 | 7 |
600 | 3 |
601 | 6 |
602 | 4 |
603 | 1 |
604 | 9 |
605 | 8 |
606 | 5 |
607 | 2 |
608 | 5 |
609 | 4 |
610 | 3 |
611 | 3 |
612 | 3 |
613 | 4 |
614 | 1 |
615 | 7 |
616 | 6 |
617 | 9 |
618 | 5 |
619 | 4 |
620 | 3 |
621 | 6 |
622 | 6 |
623 | 5 |
624 | 9 |
625 | 3 |
626 | 2 |
627 | 2 |
628 | 2 |
629 | 4 |
630 | 5 |
631 | 2 |
632 | 6 |
633 | 5 |
634 | 6 |
635 | 5 |
636 | 2 |
637 | 3 |
638 | 4 |
639 | 7 |
640 | 8 |
641 | 5 |
642 | 8 |
643 | 7 |
644 | 1 |
645 | 3 |
646 | 2 |
647 | 9 |
648 | 7 |
649 | 7 |
650 | 6 |
651 | 2 |
652 | 4 |
653 | 4 |
654 | 2 |
655 | 6 |
656 | 3 |
657 | 4 |
658 | 5 |
659 | 7 |
660 | 7 |
661 | 4 |
662 | 4 |
664 | 4 |
665 | 3 |
666 | 3 |
667 | 2 |
668 | 6 |
669 | 5 |
670 | 5 |
671 | 3 |
672 | 3 |
673 | 4 |
674 | 3 |
675 | 7 |
676 | 1 |
677 | 4 |
678 | 4 |
679 | 3 |
680 | 6 |
681 | 6 |
682 | 6 |
683 | 3 |
684 | 5 |
685 | 5 |
686 | 1 |
687 | 1 |
688 | 2 |
689 | 4 |
690 | 1 |
691 | 2 |
692 | 1 |
693 | 2 |
694 | 7 |
695 | 4 |
696 | 2 |
697 | 2 |
698 | 3 |
699 | 3 |
700 | 1 |
701 | 3 |
702 | 5 |
703 | 4 |
704 | 5 |
705 | 5 |
706 | 7 |
707 | 4 |
708 | 4 |
709 | 1 |
710 | 6 |
711 | 5 |
712 | 1 |
713 | 2 |
714 | 7 |
715 | 1 |
716 | 4 |
717 | 4 |
719 | 8 |
720 | 8 |
721 | 2 |
723 | 2 |
725 | 5 |
726 | 2 |
727 | 7 |
728 | 3 |
729 | 3 |
730 | 5 |
731 | 2 |
732 | 4 |
733 | 3 |
734 | 5 |
735 | 6 |
736 | 7 |
737 | 4 |
738 | 5 |
739 | 3 |
740 | 4 |
741 | 4 |
742 | 1 |
743 | 4 |
744 | 6 |
745 | 4 |
746 | 5 |
747 | 4 |
748 | 5 |
749 | 9 |
750 | 3 |
751 | 1 |
752 | 7 |
753 | 4 |
754 | 3 |
755 | 6 |
756 | 3 |
757 | 3 |
758 | 6 |
759 | 1 |
760 | 5 |
761 | 3 |
762 | 4 |
763 | 7 |
764 | 3 |
765 | 2 |
766 | 4 |
767 | 3 |
768 | 3 |
769 | 2 |
770 | 5 |
771 | 5 |
772 | 3 |
773 | 4 |
774 | 3 |
775 | 1 |
776 | 1 |
777 | 5 |
778 | 3 |
779 | 8 |
780 | 3 |
781 | 3 |
782 | 5 |
783 | 3 |
784 | 3 |
785 | 1 |
786 | 2 |
787 | 4 |
788 | 3 |
789 | 2 |
790 | 4 |
791 | 2 |
792 | 5 |
793 | 2 |
794 | 1 |
795 | 5 |
796 | 5 |
797 | 2 |
798 | 5 |
799 | 3 |
800 | 5 |
801 | 9 |
802 | 6 |
803 | 2 |
804 | 1 |
805 | 2 |
806 | 1 |
807 | 1 |
808 | 5 |
809 | 3 |
810 | 2 |
811 | 2 |
812 | 1 |
813 | 1 |
814 | 5 |
815 | 1 |
816 | 2 |
817 | 1 |
818 | 2 |
819 | 2 |
820 | 3 |
822 | 2 |
823 | 2 |
824 | 2 |
825 | 2 |
826 | 3 |
827 | 3 |
828 | 1 |
829 | 3 |
830 | 5 |
831 | 1 |
832 | 3 |
833 | 3 |
834 | 4 |
835 | 4 |
836 | 4 |
838 | 5 |
839 | 6 |
840 | 1 |
841 | 3 |
842 | 4 |
843 | 1 |
844 | 7 |
845 | 1 |
846 | 3 |
847 | 2 |
848 | 3 |
849 | 2 |
850 | 1 |
851 | 1 |
852 | 3 |
853 | 4 |
854 | 2 |
855 | 7 |
856 | 4 |
857 | 4 |
859 | 3 |
862 | 3 |
863 | 4 |
864 | 4 |
865 | 5 |
866 | 1 |
867 | 5 |
869 | 1 |
870 | 1 |
871 | 2 |
872 | 2 |
874 | 6 |
875 | 1 |
876 | 1 |
877 | 1 |
879 | 4 |
880 | 4 |
881 | 1 |
882 | 1 |
883 | 4 |
884 | 4 |
885 | 4 |
886 | 1 |
887 | 2 |
888 | 2 |
889 | 4 |
890 | 3 |
891 | 4 |
892 | 6 |
893 | 5 |
894 | 3 |
895 | 4 |
896 | 4 |
897 | 3 |
898 | 1 |
901 | 3 |
902 | 2 |
904 | 2 |
905 | 4 |
906 | 6 |
907 | 4 |
909 | 2 |
910 | 3 |
911 | 2 |
912 | 3 |
913 | 4 |
914 | 4 |
915 | 2 |
916 | 5 |
917 | 5 |
918 | 2 |
919 | 1 |
920 | 4 |
921 | 3 |
922 | 5 |
923 | 2 |
924 | 2 |
925 | 1 |
926 | 3 |
927 | 2 |
928 | 1 |
929 | 4 |
930 | 2 |
931 | 2 |
932 | 6 |
934 | 1 |
935 | 2 |
936 | 2 |
937 | 2 |
938 | 3 |
939 | 5 |
940 | 4 |
941 | 1 |
942 | 3 |
943 | 2 |
944 | 4 |
945 | 4 |
946 | 3 |
947 | 6 |
948 | 1 |
949 | 2 |
950 | 2 |
951 | 3 |
952 | 4 |
955 | 2 |
956 | 4 |
957 | 2 |
959 | 3 |
960 | 4 |
961 | 4 |
962 | 3 |
963 | 4 |
964 | 2 |
965 | 2 |
966 | 2 |
967 | 4 |
968 | 2 |
969 | 3 |
970 | 7 |
971 | 3 |
972 | 5 |
973 | 2 |
974 | 1 |
975 | 1 |
976 | 1 |
977 | 1 |
978 | 3 |
979 | 3 |
980 | 2 |
981 | 4 |
983 | 4 |
985 | 3 |
986 | 2 |
987 | 2 |
989 | 2 |
990 | 5 |
991 | 4 |
992 | 2 |
993 | 3 |
994 | 2 |
995 | 3 |
996 | 1 |
997 | 1 |
998 | 5 |
999 | 2 |
1000 | 3 |
1002 | 3 |
1003 | 5 |
1004 | 4 |
1005 | 2 |
1007 | 1 |
1008 | 3 |
1009 | 4 |
1012 | 3 |
1013 | 3 |
1014 | 1 |
1015 | 6 |
1016 | 1 |
1018 | 1 |
1019 | 4 |
1020 | 1 |
1021 | 1 |
1022 | 1 |
1023 | 3 |
1024 | 2 |
1025 | 2 |
1026 | 2 |
1027 | 4 |
1029 | 4 |
1030 | 2 |
1031 | 4 |
1033 | 2 |
1034 | 2 |
1035 | 3 |
1036 | 2 |
1037 | 4 |
1038 | 3 |
1039 | 2 |
1040 | 3 |
1041 | 1 |
1042 | 4 |
1043 | 1 |
1044 | 1 |
1045 | 3 |
1046 | 5 |
1047 | 3 |
1048 | 3 |
1049 | 5 |
1050 | 2 |
1051 | 1 |
1052 | 3 |
1053 | 6 |
1054 | 1 |
1055 | 1 |
1056 | 6 |
1057 | 4 |
1058 | 1 |
1059 | 3 |
1060 | 2 |
1061 | 2 |
1062 | 1 |
1063 | 4 |
1064 | 1 |
1065 | 2 |
1066 | 3 |
1067 | 1 |
1068 | 1 |
1069 | 2 |
1070 | 3 |
1071 | 2 |
1072 | 2 |
1074 | 3 |
1075 | 1 |
1076 | 2 |
1077 | 3 |
1078 | 3 |
1079 | 1 |
1080 | 3 |
1081 | 1 |
1082 | 3 |
1083 | 3 |
1084 | 1 |
1085 | 1 |
1086 | 2 |
1087 | 1 |
1088 | 4 |
1089 | 1 |
1090 | 1 |
1091 | 4 |
1092 | 4 |
1093 | 2 |
1095 | 1 |
1096 | 5 |
1097 | 3 |
1098 | 2 |
1099 | 1 |
1100 | 1 |
1101 | 1 |
1103 | 3 |
1104 | 2 |
1105 | 3 |
1106 | 2 |
1107 | 3 |
1108 | 5 |
1109 | 5 |
1110 | 2 |
1111 | 2 |
1113 | 4 |
1114 | 3 |
1115 | 2 |
1117 | 2 |
1120 | 2 |
1123 | 1 |
1124 | 4 |
1125 | 2 |
1126 | 1 |
1127 | 2 |
1128 | 2 |
1129 | 1 |
1130 | 4 |
1132 | 2 |
1133 | 2 |
1135 | 2 |
1137 | 1 |
1138 | 1 |
1139 | 3 |
1140 | 1 |
1141 | 1 |
1142 | 2 |
1144 | 2 |
1145 | 1 |
1146 | 4 |
1147 | 1 |
1148 | 2 |
1149 | 2 |
1150 | 1 |
1151 | 1 |
1152 | 1 |
1153 | 2 |
1154 | 1 |
1155 | 4 |
1157 | 2 |
1158 | 1 |
1159 | 2 |
1160 | 2 |
1163 | 4 |
1165 | 4 |
1166 | 2 |
1167 | 1 |
1168 | 2 |
1169 | 2 |
1170 | 3 |
1171 | 2 |
1172 | 1 |
1173 | 1 |
1174 | 2 |
1175 | 3 |
1176 | 3 |
1177 | 1 |
1179 | 3 |
1181 | 1 |
1182 | 1 |
1183 | 2 |
1184 | 1 |
1185 | 2 |
1186 | 2 |
1187 | 1 |
1188 | 1 |
1189 | 3 |
1190 | 1 |
1191 | 2 |
1192 | 2 |
1193 | 1 |
1194 | 1 |
1195 | 2 |
1196 | 1 |
1197 | 2 |
1198 | 1 |
1199 | 2 |
1201 | 2 |
1202 | 1 |
1203 | 3 |
1204 | 2 |
1205 | 2 |
1207 | 2 |
1208 | 3 |
1209 | 1 |
1210 | 1 |
1211 | 4 |
1212 | 1 |
1213 | 1 |
1214 | 1 |
1215 | 3 |
1216 | 2 |
1217 | 3 |
1218 | 2 |
1220 | 2 |
1222 | 1 |
1223 | 1 |
1224 | 3 |
1225 | 3 |
1226 | 1 |
1227 | 2 |
1229 | 3 |
1231 | 3 |
1232 | 1 |
1233 | 2 |
1234 | 1 |
1235 | 1 |
1237 | 1 |
1238 | 1 |
1240 | 3 |
1241 | 1 |
1243 | 2 |
1244 | 4 |
1245 | 2 |
1247 | 2 |
1248 | 2 |
1249 | 1 |
1250 | 5 |
1251 | 1 |
1252 | 2 |
1253 | 2 |
1255 | 2 |
1256 | 3 |
1257 | 2 |
1259 | 2 |
1261 | 1 |
1262 | 1 |
1263 | 2 |
1264 | 1 |
1265 | 2 |
1267 | 2 |
1268 | 1 |
1269 | 2 |
1270 | 1 |
1271 | 1 |
1272 | 2 |
1273 | 2 |
1274 | 1 |
1275 | 2 |
1276 | 1 |
1277 | 3 |
1278 | 2 |
1280 | 3 |
1281 | 2 |
1282 | 1 |
1283 | 1 |
1284 | 2 |
1285 | 4 |
1286 | 1 |
1287 | 1 |
1288 | 2 |
1289 | 1 |
1290 | 1 |
1292 | 1 |
1293 | 1 |
1294 | 1 |
1295 | 3 |
1296 | 4 |
1297 | 4 |
1298 | 3 |
1302 | 2 |
1304 | 1 |
1305 | 2 |
1306 | 2 |
1308 | 3 |
1309 | 2 |
1310 | 3 |
1311 | 1 |
1312 | 1 |
1313 | 2 |
1314 | 1 |
1316 | 2 |
1317 | 1 |
1318 | 1 |
1319 | 1 |
1320 | 2 |
1321 | 1 |
1322 | 1 |
1324 | 1 |
1325 | 2 |
1326 | 1 |
1327 | 1 |
1328 | 1 |
1330 | 3 |
1331 | 3 |
1332 | 4 |
1334 | 4 |
1335 | 1 |
1336 | 3 |
1337 | 1 |
1338 | 4 |
1339 | 1 |
1340 | 3 |
1341 | 1 |
1342 | 4 |
1343 | 1 |
1345 | 1 |
1346 | 1 |
1347 | 1 |
1348 | 1 |
1349 | 1 |
1350 | 1 |
1351 | 3 |
1352 | 1 |
1353 | 1 |
1355 | 2 |
1357 | 1 |
1358 | 1 |
1361 | 1 |
1363 | 1 |
1364 | 1 |
1365 | 2 |
1367 | 1 |
1368 | 1 |
1369 | 3 |
1371 | 1 |
1372 | 3 |
1374 | 2 |
1375 | 3 |
1376 | 2 |
1377 | 2 |
1379 | 2 |
1381 | 1 |
1382 | 1 |
1383 | 1 |
1384 | 2 |
1385 | 2 |
1386 | 1 |
1387 | 3 |
1388 | 1 |
1390 | 3 |
1393 | 2 |
1396 | 2 |
1398 | 1 |
1399 | 1 |
1401 | 2 |
1404 | 2 |
1405 | 2 |
1406 | 1 |
1407 | 1 |
1408 | 1 |
1409 | 2 |
1410 | 1 |
1411 | 3 |
1412 | 2 |
1414 | 2 |
1415 | 2 |
1416 | 1 |
1422 | 1 |
1423 | 1 |
1424 | 1 |
1425 | 1 |
1426 | 1 |
1427 | 2 |
1429 | 1 |
1430 | 2 |
1432 | 1 |
1433 | 4 |
1434 | 1 |
1436 | 2 |
1438 | 1 |
1439 | 2 |
1440 | 3 |
1442 | 1 |
1443 | 2 |
1444 | 1 |
1445 | 4 |
1446 | 1 |
1447 | 2 |
1448 | 3 |
1450 | 3 |
1452 | 1 |
1453 | 1 |
1454 | 1 |
1456 | 2 |
1457 | 2 |
1458 | 2 |
1459 | 2 |
1460 | 2 |
1461 | 1 |
1462 | 1 |
1463 | 1 |
1464 | 1 |
1465 | 1 |
1467 | 1 |
1468 | 1 |
1469 | 2 |
1470 | 1 |
1471 | 1 |
1473 | 1 |
1477 | 1 |
1478 | 1 |
1479 | 2 |
1480 | 1 |
1481 | 2 |
1482 | 2 |
1485 | 2 |
1486 | 2 |
1487 | 2 |
1489 | 4 |
1491 | 4 |
1492 | 1 |
1493 | 1 |
1494 | 1 |
1496 | 1 |
1497 | 1 |
1499 | 5 |
1500 | 1 |
1501 | 1 |
1503 | 1 |
1504 | 1 |
1506 | 1 |
1507 | 1 |
1508 | 1 |
1511 | 3 |
1513 | 1 |
1514 | 2 |
1515 | 1 |
1516 | 1 |
1517 | 1 |
1518 | 2 |
1519 | 1 |
1520 | 1 |
1524 | 2 |
1525 | 2 |
1527 | 2 |
1528 | 1 |
1530 | 1 |
1531 | 1 |
1534 | 3 |
1536 | 1 |
1538 | 1 |
1541 | 1 |
1543 | 1 |
1545 | 1 |
1546 | 1 |
1549 | 2 |
1551 | 1 |
1554 | 2 |
1555 | 3 |
1556 | 2 |
1557 | 1 |
1558 | 2 |
1560 | 1 |
1562 | 2 |
1563 | 2 |
1564 | 1 |
1566 | 3 |
1567 | 2 |
1570 | 2 |
1571 | 2 |
1572 | 1 |
1573 | 1 |
1575 | 2 |
1576 | 2 |
1580 | 2 |
1581 | 1 |
1585 | 1 |
1586 | 1 |
1593 | 1 |
1597 | 1 |
1599 | 2 |
1602 | 1 |
1603 | 3 |
1604 | 1 |
1605 | 1 |
1608 | 1 |
1610 | 2 |
1612 | 1 |
1613 | 2 |
1615 | 1 |
1616 | 1 |
1617 | 3 |
1618 | 2 |
1622 | 1 |
1623 | 2 |
1625 | 1 |
1626 | 1 |
1628 | 1 |
1629 | 2 |
1630 | 1 |
1631 | 1 |
1633 | 2 |
1637 | 1 |
1642 | 2 |
1643 | 1 |
1644 | 4 |
1646 | 1 |
1648 | 1 |
1651 | 1 |
1653 | 1 |
1654 | 1 |
1655 | 1 |
1656 | 1 |
1659 | 1 |
1662 | 1 |
1665 | 1 |
1666 | 1 |
1668 | 2 |
1672 | 2 |
1673 | 1 |
1674 | 2 |
1676 | 2 |
1677 | 1 |
1678 | 1 |
1679 | 3 |
1682 | 1 |
1683 | 2 |
1684 | 1 |
1686 | 2 |
1687 | 1 |
1691 | 1 |
1692 | 2 |
1693 | 1 |
1694 | 1 |
1698 | 1 |
1701 | 1 |
1702 | 1 |
1703 | 1 |
1707 | 1 |
1709 | 1 |
1711 | 1 |
1712 | 1 |
1715 | 2 |
1716 | 1 |
1718 | 2 |
1722 | 1 |
1723 | 1 |
1725 | 1 |
1726 | 1 |
1730 | 1 |
1731 | 1 |
1735 | 1 |
1736 | 2 |
1739 | 1 |
1740 | 1 |
1742 | 2 |
1746 | 1 |
1752 | 1 |
1754 | 1 |
1759 | 1 |
1760 | 2 |
1764 | 1 |
1769 | 1 |
1770 | 1 |
1771 | 1 |
1776 | 2 |
1777 | 1 |
1783 | 3 |
1784 | 1 |
1785 | 2 |
1790 | 1 |
1792 | 2 |
1793 | 1 |
1803 | 1 |
1804 | 2 |
1807 | 1 |
1811 | 1 |
1816 | 1 |
1817 | 1 |
1819 | 1 |
1820 | 1 |
1824 | 1 |
1825 | 1 |
1830 | 1 |
1831 | 2 |
1832 | 1 |
1834 | 1 |
1839 | 2 |
1842 | 1 |
1843 | 2 |
1845 | 2 |
1846 | 1 |
1848 | 1 |
1849 | 1 |
1850 | 1 |
1852 | 2 |
1857 | 1 |
1861 | 1 |
1868 | 1 |
1872 | 1 |
1873 | 1 |
1874 | 1 |
1876 | 1 |
1878 | 1 |
1879 | 1 |
1880 | 1 |
1881 | 1 |
1884 | 2 |
1885 | 1 |
1887 | 2 |
1889 | 1 |
1892 | 1 |
1895 | 1 |
1896 | 2 |
1898 | 2 |
1900 | 1 |
1901 | 1 |
1912 | 1 |
1916 | 1 |
1920 | 2 |
1923 | 1 |
1929 | 1 |
1934 | 2 |
1938 | 1 |
1939 | 1 |
1945 | 2 |
1947 | 1 |
1949 | 1 |
1951 | 1 |
1952 | 1 |
1955 | 2 |
1956 | 1 |
1957 | 1 |
1960 | 1 |
1961 | 1 |
1963 | 1 |
1964 | 1 |
1965 | 1 |
1966 | 1 |
1970 | 1 |
1974 | 2 |
1977 | 1 |
1978 | 1 |
1989 | 1 |
1993 | 1 |
1997 | 1 |
1998 | 1 |
2002 | 1 |
2003 | 1 |
2009 | 1 |
2010 | 1 |
2011 | 1 |
2013 | 1 |
2016 | 1 |
2018 | 1 |
2021 | 1 |
2028 | 1 |
2029 | 1 |
2030 | 1 |
2032 | 1 |
2033 | 1 |
2036 | 1 |
2037 | 1 |
2038 | 1 |
2039 | 1 |
2040 | 1 |
2045 | 1 |
2049 | 1 |
2061 | 2 |
2063 | 1 |
2064 | 1 |
2066 | 1 |
2068 | 1 |
2072 | 1 |
2075 | 1 |
2083 | 1 |
2084 | 3 |
2086 | 1 |
2088 | 1 |
2089 | 1 |
2090 | 1 |
2095 | 1 |
2100 | 1 |
2102 | 1 |
2104 | 1 |
2105 | 1 |
2113 | 2 |
2118 | 1 |
2119 | 1 |
2122 | 1 |
2123 | 1 |
2141 | 1 |
2148 | 1 |
2153 | 1 |
2161 | 1 |
2166 | 1 |
2167 | 1 |
2170 | 1 |
2173 | 1 |
2178 | 1 |
2181 | 1 |
2182 | 1 |
2184 | 2 |
2188 | 2 |
2191 | 1 |
2201 | 1 |
2204 | 1 |
2208 | 1 |
2210 | 1 |
2216 | 1 |
2223 | 1 |
2224 | 1 |
2239 | 1 |
2254 | 1 |
2257 | 1 |
2276 | 1 |
2278 | 1 |
2290 | 1 |
2299 | 1 |
2314 | 2 |
2325 | 2 |
2335 | 1 |
2338 | 1 |
2343 | 1 |
2349 | 1 |
2360 | 1 |
2364 | 1 |
2365 | 1 |
2369 | 1 |
2371 | 1 |
2375 | 1 |
2377 | 1 |
2384 | 1 |
2386 | 1 |
2391 | 1 |
2419 | 1 |
2420 | 1 |
2424 | 1 |
2428 | 1 |
2436 | 2 |
2439 | 1 |
2452 | 1 |
2461 | 1 |
2486 | 1 |
2490 | 1 |
2505 | 1 |
2517 | 1 |
2521 | 1 |
2523 | 1 |
2540 | 1 |
2541 | 1 |
2565 | 1 |
2572 | 1 |
2580 | 1 |
2593 | 1 |
2610 | 1 |
2632 | 1 |
2636 | 1 |
2654 | 1 |
2668 | 1 |
2674 | 1 |
2679 | 1 |
2716 | 1 |
2718 | 1 |
2731 | 1 |
2735 | 1 |
2778 | 1 |
2795 | 1 |
2799 | 1 |
2808 | 1 |
2814 | 1 |
2828 | 1 |
2845 | 1 |
2849 | 1 |
2893 | 1 |
2917 | 1 |
2936 | 1 |
2942 | 1 |
2946 | 1 |
2963 | 1 |
2973 | 1 |
3019 | 1 |
3174 | 1 |
3342 | 1 |
3349 | 1 |
3359 | 1 |
3395 | 1 |
3428 | 1 |
3550 | 1 |
3652 | 1 |
3702 | 1 |
3736 | 1 |
3751 | 1 |
3815 | 1 |
4046 | 1 |
4306 | 1 |
4514 | 1 |
4574 | 1 |
5194 | 1 |
5553 | 1 |
5562 | 1 |
6159 | 1 |
6505 | 1 |
Contingency table of frequencies for number of tokens in the article content
# Summarizing the number of images in the article
filtered_channel %>%
summarise(Minimum = min(num_imgs),
Q1 = quantile(num_imgs, prob = 0.25),
Average = mean(num_imgs),
Median = median(num_imgs),
Q3 = quantile(num_imgs, prob = 0.75),
Maximum = max(num_imgs)) %>%
kable(caption = "Numerical summary of number of images in an article")
Minimum | Q1 | Average | Median | Q3 | Maximum |
---|---|---|---|---|---|
0 | 1 | 6.317699 | 1 | 8 | 128 |
Numerical summary of number of images in an article
# Summarizing the number of videos in the article
filtered_channel %>%
summarise(Minimum = min(num_videos),
Q1 = quantile(num_videos, prob = 0.25),
Average = mean(num_videos),
Median = median(num_videos),
Q3 = quantile(num_videos, prob = 0.75),
Maximum = max(num_videos)) %>%
kable(caption = "Numerical summary of number of videos in an article")
Minimum | Q1 | Average | Median | Q3 | Maximum |
---|---|---|---|---|---|
0 | 0 | 2.545841 | 1 | 1 | 74 |
Numerical summary of number of videos in an article
# Summarizing the rate of positive words
filtered_channel %>%
summarise(Minimum = min(rate_positive_words),
Q1 = quantile(rate_positive_words, prob = 0.25),
Average = mean(rate_positive_words),
Median = median(rate_positive_words),
Q3 = quantile(rate_positive_words, prob = 0.75),
Maximum = max(rate_positive_words)) %>%
kable(caption = "Numerical Summary of the rate of positive words in an article")
Minimum | Q1 | Average | Median | Q3 | Maximum |
---|---|---|---|---|---|
0 | 0.5789474 | 0.6663317 | 0.6875 | 0.7843137 | 1 |
Numerical Summary of the rate of positive words in an article
# Summarizing the rate of negative words
filtered_channel %>%
summarise(Minimum = min(rate_negative_words),
Q1 = quantile(rate_negative_words, prob = 0.25),
Average = mean(rate_negative_words),
Median = median(rate_negative_words),
Q3 = quantile(rate_negative_words, prob = 0.75),
Maximum = max(rate_negative_words)) %>%
kable(caption = "Numerical Summary of the rate of negative words in an article")
Minimum | Q1 | Average | Median | Q3 | Maximum |
---|---|---|---|---|---|
0 | 0.2 | 0.3050442 | 0.3 | 0.4038462 | 1 |
Numerical Summary of the rate of negative words in an article
The graphical summaries show the trends in the data more dramatically, including skewness and outliers. The boxplot below gives a visual representation of the five-number summaries for shares, split up by weekday. Boxplots make it even easier to spot outliers (look for the dots separated from the main box and whiskers). Next, we can examine several scatterplots. Scatterplots allow us to look at one numerical variable against another to see whether there is any correlation between them. Look out for any plots that have most of the points along a diagonal line! There are five scatterplots below, investigating shares against the number of words in the content, the number of words in the title, the rate of positive words, the rate of negative words, and the global sentiment polarity. Finally, a histogram can show the overall distribution of a numerical variable, including skewness. The histogram below shows the distribution of the shares variable. Look for a left or right tail to signify skewness, and look out for multiple peaks to signify a multi-modal variable.
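A scatterplot's visual impression of correlation can be backed up with a number. The sketch below uses base R's cor() on made-up word counts and share counts; the vectors words and shares_toy are hypothetical and are not taken from the dataset.

```r
# Hypothetical article lengths and share counts
words      <- c(200, 350, 500, 650, 800, 950)
shares_toy <- c(900, 1100, 1000, 1400, 1300, 1700)

# Pearson correlation measures the strength of a linear relationship
pearson <- cor(words, shares_toy)

# Spearman correlation is rank-based, so it is less sensitive to extreme values
spearman <- cor(words, shares_toy, method = "spearman")

pearson
spearman
```

Values near 1 or -1 correspond to points hugging a diagonal line, while values near 0 correspond to a shapeless cloud. With a heavily skewed variable such as shares, the rank-based version may be the more stable of the two.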
# Boxplot of Shares for Each Weekday, colored gray with classic theme, added labels and title
ggplot(filtered_channel, aes(x = weekday, y = shares)) +
geom_boxplot(fill = "grey") +
labs(x = "Weekday", title = "Boxplot of Shares for Each Weekday", y = "Shares") +
theme_classic()
# Scatterplot of Number of words in the content vs Shares, colored gray with classic theme, added labels and title
ggplot(filtered_channel, aes(x = n_tokens_content, y = shares)) +
geom_point(color = "grey") +
labs(x = "Number of words in the content", y = "Shares",
title = "Scatterplot of Number of words in the content vs Shares") +
theme_classic()
# Scatterplot of Number of words in the title vs Shares, colored gray with classic theme, added labels and title
ggplot(filtered_channel, aes(x = n_tokens_title, y = shares)) +
geom_point(color = "grey") +
labs(x = "Number of words in the title", y = "Shares",
title = "Scatterplot of Number of words in the title vs Shares") +
theme_classic()
# Histogram of Shares with a bin width of 2000, outlined in gray with classic theme, added label and title
ggplot(filtered_channel, aes(x = shares)) +
  geom_histogram(color = "grey", binwidth = 2000) +
  labs(x = "Shares",
       title = "Histogram of number of shares") +
  theme_classic()
# Scatterplot of rate of positive words vs Shares, colored gray with classic theme, added labels and title
ggplot(filtered_channel, aes(x = rate_positive_words, y = shares)) +
  geom_point(color = "grey") +
  labs(x = "rate of positive words in an article", y = "Shares",
       title = "Scatterplot of rate of positive words in an article vs shares") +
  theme_classic()
# Scatterplot of rate of negative words vs Shares, colored gray with classic theme, added labels and title
ggplot(filtered_channel, aes(x = rate_negative_words, y = shares)) +
  geom_point(color = "grey") +
  labs(x = "rate of negative words in an article", y = "Shares",
       title = "Scatterplot of rate of negative words in an article vs shares") +
  theme_classic()
# Scatterplot of global sentiment polarity vs Shares, colored gray with classic theme, added labels and title
ggplot(filtered_channel, aes(x = global_sentiment_polarity, y = shares)) +
  geom_point(color = "grey") +
  labs(x = "global sentiment polarity in an article", y = "Shares",
       title = "Scatterplot of global sentiment polarity in an article vs shares") +
  theme_classic()
# drop the weekday variable created for EDA (will get in the way for our models if we don't drop it)
filtered_channel <- subset(filtered_channel, select = -c(weekday))
Modeling
Splitting the Data
First, let’s split up the data into a testing set and a training set using the proportions: 70% training and 30% testing.
set.seed(9876)
# Split the data into a training and test set (70/30 split)
# indices
train <- sample(1:nrow(filtered_channel), size = nrow(filtered_channel)*.70)
test <- setdiff(1:nrow(filtered_channel), train)
# training and testing subsets
Training <- filtered_channel[train, ]
Testing <- filtered_channel[test, ]
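As a quick sanity check on this kind of index-based split, the sketch below repeats the same steps on a hypothetical row count and confirms that the two index sets cover every row without overlapping. The names n, train_idx, and test_idx are illustrative and not part of the report.

```r
set.seed(9876)

n <- 100  # hypothetical number of rows

# Draw 70% of the row indices for training; floor() makes the size an explicit whole number
train_idx <- sample(1:n, size = floor(0.70 * n))

# Everything not drawn for training goes to testing
test_idx <- setdiff(1:n, train_idx)

# The two sets should cover all rows and share none
length(train_idx) + length(test_idx) == n
length(intersect(train_idx, test_idx)) == 0
```

Naming the indices train_idx rather than train also avoids sharing a name with caret's train() function, which can make later code easier to read.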
Linear Models
Linear regression models allow us to look at relationships between one response variable and several explanatory variables. A model can also include interaction terms and even higher-order terms. The general form for a linear model is $Y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \dots + E_i$, where each $x_{ji}$ represents a predictor variable and the “…” can include more predictors, interactions, and/or higher-order terms. Since our goal is to predict shares, we will fit these models on the subset of the data created for training, and then later test the models on the subset set aside for testing.
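To illustrate what interaction and higher-order terms look like in an R model formula, here is a small sketch on simulated data using base R's lm(). The variables x1, x2, and y and the coefficients used to simulate them are made up for illustration and have nothing to do with the news data.

```r
set.seed(42)

# Simulate two predictors and a response with a known structure
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 2 * x1 - 0.5 * x2 + 0.8 * x1 * x2 + 0.3 * x1^2 + rnorm(n, sd = 0.1)
sim <- data.frame(y, x1, x2)

# x1:x2 adds the interaction term; I(x1^2) adds a second-order term
fit <- lm(y ~ x1 + x2 + x1:x2 + I(x1^2), data = sim)

# Estimated coefficients: intercept, two main effects, quadratic, interaction
coef(fit)
```

In a formula, shares ~ . uses every other column as a main effect, while shares ~ .^2 adds all pairwise interactions on top of the main effects, so the number of terms grows quickly with the number of predictors.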
Linear Model #1: - Jordan
# linear model on training dataset with 5-fold cv
fit1 <- train(shares ~ . , data = Training, method = "lm",
preProcess = c("center", "scale"),
trControl = trainControl(method = "cv", number = 5))
Linear Model #2: - Jonathan
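# linear model with all main effects and pairwise interactions on training dataset with 5-fold cv
lm_fit <- train(
  shares ~ .^2,
  data = Training,
  method = "lm",
  preProcess = c("center", "scale"),
  trControl = trainControl(method = "cv", number = 5)
)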
Random Forest - Jordan
Random forest is a tree-based method for fitting predictive models that averages predictions across many trees, each fit to a bootstrap sample of the data. One may choose a tree-based method because of its prediction accuracy, the fact that predictors do not need to be scaled, the lack of statistical assumptions, and the built-in variable selection process. Random forest, in particular, considers only a random subset of the predictors at each split. This corrects the issue with bagging where, if there is a strong predictor, most bootstrap trees use it for the first split, which makes the trees highly correlated.
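The two sources of randomness described above can be sketched in a few lines of base R. This is only an illustration of the sampling steps, not the algorithm as implemented by any package; n_rows and the short predictors vector are hypothetical, and mtry here plays the same role as the tuning parameter of that name.

```r
set.seed(123)

# A handful of predictor names from the data, for illustration
predictors <- c("n_tokens_title", "n_tokens_content", "num_imgs",
                "num_videos", "num_keywords", "rate_positive_words")
n_rows <- 10  # hypothetical number of training rows
mtry   <- 2   # number of predictors considered at a split

# Bagging step: each tree is grown on a bootstrap sample of the rows
boot_rows <- sample(n_rows, size = n_rows, replace = TRUE)

# Random forest step: each split considers only a random subset of predictors
split_candidates <- sample(predictors, size = mtry)

boot_rows
split_candidates
```

Because a strong predictor is left out of the candidate set at many splits, different trees end up splitting on different variables, which is what makes averaging them more effective than plain bagging.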
# random forest model on training dataset with 5-fold cv
ranfor <- train(shares ~ ., data = Training, method = "rf", preProcess = c("center", "scale"),
                trControl = trainControl(method = "cv", number = 5),
                tuneGrid = expand.grid(mtry = c(1:round(ncol(Training)/3))))
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
ranfor
## Random Forest
##
## 4939 samples
## 58 predictor
##
## Pre-processing: centered (58), scaled (58)
## Resampling: Cross-Validated (5 fold)
## Summary of sample sizes: 3952, 3952, 3950, 3952, 3950
## Resampling results across tuning parameters:
##
## mtry RMSE Rsquared MAE
## 1 7322.801 0.03719910 2884.251
## 2 7261.771 0.04416517 2905.477
## 3 7259.764 0.04683093 2945.705
## 4 7257.441 0.04813149 2969.800
## 5 7274.750 0.04651551 2981.350
## 6 7270.060 0.04894000 2980.080
## 7 7286.204 0.04692995 2994.422
## 8 7291.769 0.04644447 3010.634
## 9 7315.709 0.04309422 3018.477
## 10 7323.113 0.04357281 3028.397
## 11 7329.393 0.04418740 3026.235
## 12 7333.955 0.04400734 3035.712
## 13 7323.320 0.04555137 3028.184
## 14 7339.371 0.04346928 3037.240
## 15 7344.910 0.04406313 3041.250
## 16 7342.588 0.04518231 3046.621
## 17 7327.670 0.04779090 3035.731
## 18 7348.218 0.04595008 3053.670
## 19 7378.930 0.04128517 3056.058
## 20 7354.649 0.04561489 3053.310
##
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was mtry = 4.
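The zero-variance warnings emitted during training arise because, after filtering to a single channel, every data_channel_is_* column is constant and cannot be centered or scaled. One way to avoid them, sketched here on a toy data frame with hypothetical values, is to drop constant columns before modeling.

```r
# Toy data with one constant column, like the flagged channel indicators
toy <- data.frame(
  shares = c(500, 1200, 800, 3000),
  data_channel_is_tech = c(0, 0, 0, 0),
  num_imgs = c(1, 4, 2, 8)
)

# Flag columns that take a single unique value
is_constant <- vapply(toy, function(col) length(unique(col)) == 1, logical(1))

# Keep only the columns that vary
toy_clean <- toy[, !is_constant, drop = FALSE]
print(names(toy_clean))
```

caret also offers this as a built-in step: adding "zv" to the preProcess vector removes zero-variance predictors inside each resample. Whether that is preferable to dropping the columns up front is a judgment call.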
Boosted Tree - Jonathan
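A tuning grid only takes effect when it is handed to train() through the tuneGrid argument. The sketch below shows that pattern on simulated data; the data frame sim, its columns, and the object name sketch_fit are assumptions for illustration, and the model fit is skipped when caret or gbm is not installed.

```r
# Candidate values for the four gbm tuning parameters
tune_grid <- expand.grid(
  n.trees = c(5, 10, 50, 100),
  interaction.depth = c(1, 2, 3, 4),
  shrinkage = 0.1,
  n.minobsinnode = 10
)
n_candidates <- nrow(tune_grid)  # 4 * 4 * 1 * 1 combinations

if (requireNamespace("caret", quietly = TRUE) &&
    requireNamespace("gbm", quietly = TRUE)) {
  library(caret)
  set.seed(1)
  # Simulated data (hypothetical), large enough for gbm's minimum node size
  sim <- data.frame(x1 = rnorm(200), x2 = rnorm(200))
  sim$y <- 3 * sim$x1 - 2 * sim$x2 + rnorm(200)

  sketch_fit <- train(
    y ~ .,
    data = sim,
    method = "gbm",
    trControl = trainControl(method = "cv", number = 5),
    tuneGrid = tune_grid,  # pass the grid explicitly
    verbose = FALSE        # suppress gbm's per-iteration log
  )
  print(sketch_fit$bestTune)
}
```

Setting verbose = FALSE is passed through to gbm and keeps the iteration tables out of the knitted output.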
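# candidate tuning grid for the boosted tree
tune_grid <- expand.grid(
n.trees = c(5, 10, 50, 100),
interaction.depth = c(1, 2, 3, 4),
shrinkage = 0.1,
n.minobsinnode = 10
)
# boosted tree model on training dataset with 5-fold cv
# note: tune_grid is defined but not passed to train() (no tuneGrid argument),
# so caret falls back to its default gbm grid
bt_fit <- train(
shares ~ .,
data = Training,
method = "gbm",
preProcess = c("center", "scale"),
trControl = trainControl(method = "cv", number = 5)
)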
## Warning in preProcess.default(method = c("center", "scale"), x = structure(c(13, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in (function (x, y, offset = NULL, misc = NULL, distribution = "bernoulli", : variable 12:
## data_channel_is_lifestyle has no variation.
## Warning in (function (x, y, offset = NULL, misc = NULL, distribution = "bernoulli", : variable 13:
## data_channel_is_entertainment has no variation.
## Warning in (function (x, y, offset = NULL, misc = NULL, distribution = "bernoulli", : variable 14:
## data_channel_is_bus has no variation.
## Warning in (function (x, y, offset = NULL, misc = NULL, distribution = "bernoulli", : variable 15:
## data_channel_is_socmed has no variation.
## Warning in (function (x, y, offset = NULL, misc = NULL, distribution = "bernoulli", : variable 16:
## data_channel_is_tech has no variation.
## Warning in (function (x, y, offset = NULL, misc = NULL, distribution = "bernoulli", : variable 17:
## data_channel_is_world has no variation.
## Iter TrainDeviance ValidDeviance StepSize Improve
## 1 54251791.7616 nan 0.1000 237314.8847
## 2 53992161.4797 nan 0.1000 129904.0034
## 3 53738055.1779 nan 0.1000 48786.8624
## 4 53409366.9828 nan 0.1000 147660.7754
## 5 53224345.2766 nan 0.1000 171099.6320
## 6 52993564.4760 nan 0.1000 28181.0123
## 7 52828338.6556 nan 0.1000 -103380.8733
## 8 52686834.1232 nan 0.1000 138578.0817
## 9 52556745.9142 nan 0.1000 55864.4673
## 10 52412074.0512 nan 0.1000 -72220.9100
## 20 51365420.1527 nan 0.1000 -21778.2527
## 40 50424749.0667 nan 0.1000 -95290.2867
## 60 49884724.4365 nan 0.1000 -183778.3618
## 80 49469657.8269 nan 0.1000 -236233.3011
## 100 49138347.6260 nan 0.1000 -105562.3049
## 120 48885058.0393 nan 0.1000 -172495.4635
## 140 48642743.5955 nan 0.1000 -70834.1521
## 150 48566666.3101 nan 0.1000 -149299.2271
## Iter TrainDeviance ValidDeviance StepSize Improve
## 1 54170824.6348 nan 0.1000 199061.8610
## 2 53174417.9441 nan 0.1000 122421.4854
## 3 52901663.3075 nan 0.1000 171650.9865
## 4 52521228.2318 nan 0.1000 107373.9772
## 5 52206753.4887 nan 0.1000 48329.8708
## 6 52043755.8266 nan 0.1000 73736.7390
## 7 51935397.0900 nan 0.1000 50779.8031
## 8 51618515.1220 nan 0.1000 43357.5465
## 9 51380614.1682 nan 0.1000 -102181.4008
## 10 51176350.5529 nan 0.1000 -40285.6388
## 20 49454297.9489 nan 0.1000 -160856.2174
## 40 48038888.1653 nan 0.1000 -42887.5057
## 60 47547901.7347 nan 0.1000 -143257.4584
## 80 45936817.3564 nan 0.1000 -45658.5727
## 100 45182974.7463 nan 0.1000 -118846.1240
## 120 44092587.4313 nan 0.1000 -93068.8352
## 140 43446071.7913 nan 0.1000 -48529.2856
## 150 42476758.8676 nan 0.1000 -6859.9032
## Iter TrainDeviance ValidDeviance StepSize Improve
## 1 53754124.8033 nan 0.1000 307266.0764
## 2 53229233.2799 nan 0.1000 266960.5336
## 3 52749232.4040 nan 0.1000 -23587.2271
## 4 52214497.9988 nan 0.1000 214993.4762
## 5 52023743.1060 nan 0.1000 -74413.9534
## 6 51683839.8499 nan 0.1000 -28849.8338
## 7 51407116.8397 nan 0.1000 56780.0364
## 8 51100810.8438 nan 0.1000 24337.1057
## 9 50653782.0769 nan 0.1000 -199470.1282
## 10 50338906.5793 nan 0.1000 85867.0832
## 20 48423590.1394 nan 0.1000 -91546.2173
## 40 45884439.4624 nan 0.1000 -280512.0341
## 60 44204871.3287 nan 0.1000 -190295.8244
## 80 43041635.7395 nan 0.1000 -128361.7293
## 100 40789852.0312 nan 0.1000 -2458.7234
## 120 39819261.3826 nan 0.1000 -127332.4953
## 140 38637635.3767 nan 0.1000 -34486.1095
## 150 38173161.4008 nan 0.1000 -151681.7042
## Iter TrainDeviance ValidDeviance StepSize Improve
## 1 61615136.7379 nan 0.1000 132471.3528
## 2 61476416.6747 nan 0.1000 144183.3525
## 3 61113944.0358 nan 0.1000 303515.2975
## 4 60956398.9223 nan 0.1000 159478.3026
## 5 60519675.8070 nan 0.1000 -30469.9238
## 6 60252649.9272 nan 0.1000 203970.2792
## 7 59998843.0687 nan 0.1000 75372.1235
## 8 59847618.2947 nan 0.1000 122395.4506
## 9 59701783.8650 nan 0.1000 -4804.1592
## 10 59465868.1495 nan 0.1000 47907.6933
## 20 58025133.0042 nan 0.1000 -202012.7613
## 40 56791725.7496 nan 0.1000 29486.6901
## 60 55854365.9157 nan 0.1000 -104886.6501
## 80 55309201.6109 nan 0.1000 -33145.0696
## 100 55112349.7103 nan 0.1000 -215418.1093
## 120 54490683.3358 nan 0.1000 -58435.3181
## 140 53990365.3650 nan 0.1000 -38820.1269
## 150 53928195.1677 nan 0.1000 -153158.2262
## Iter TrainDeviance ValidDeviance StepSize Improve
## 1 61634837.2752 nan 0.1000 -11033.7087
## 2 61355313.5397 nan 0.1000 70957.8768
## 3 60867733.6209 nan 0.1000 454835.0973
## 4 60213927.4742 nan 0.1000 88639.6178
## 5 59933773.6125 nan 0.1000 44457.5191
## 6 59624583.4286 nan 0.1000 142493.1795
## 7 59097823.0697 nan 0.1000 -210147.9858
## 8 58691821.0981 nan 0.1000 -39655.8413
## 9 58353929.2484 nan 0.1000 151640.8943
## 10 58082387.0481 nan 0.1000 127042.4161
## 20 56311439.0676 nan 0.1000 -367305.1835
## 40 54044110.2151 nan 0.1000 -73964.6882
## 60 51594069.5111 nan 0.1000 -128687.7846
## 80 49895267.2927 nan 0.1000 -188754.6236
## 100 48668198.3339 nan 0.1000 -99501.5119
## 120 47459670.9373 nan 0.1000 -288776.2628
## 140 46110806.8622 nan 0.1000 -47712.4426
## 150 45714297.0763 nan 0.1000 -237119.2456
## Iter TrainDeviance ValidDeviance StepSize Improve
## 1 60948848.4954 nan 0.1000 124857.9055
## 2 60289096.0179 nan 0.1000 380768.3733
## 3 59806052.7044 nan 0.1000 90227.9455
## 4 59322739.8958 nan 0.1000 62198.6476
## 5 58989347.5539 nan 0.1000 -12445.0055
## 6 58235534.3599 nan 0.1000 -87720.9689
## 7 57996718.8810 nan 0.1000 41042.2524
## 8 57836302.6332 nan 0.1000 50972.0019
## 9 57496867.8390 nan 0.1000 -48575.2493
## 10 57293118.1286 nan 0.1000 -57488.6969
## 20 54825170.5647 nan 0.1000 -116004.5128
## 40 50779465.7364 nan 0.1000 -395132.1164
## 60 48144831.0324 nan 0.1000 -255253.2850
## 80 46325045.6942 nan 0.1000 -133814.1554
## 100 45069975.9825 nan 0.1000 -247123.6826
## 120 43769803.4811 nan 0.1000 -182455.7507
## 140 42141944.4139 nan 0.1000 -259239.7558
## 150 41441698.9913 nan 0.1000 -165272.1195
## Iter TrainDeviance ValidDeviance StepSize Improve
## 1 64209161.1876 nan 0.1000 521956.8568
## 2 63871776.2573 nan 0.1000 270892.3016
## 3 63429190.2589 nan 0.1000 433150.5340
## 4 63224592.8991 nan 0.1000 13142.3371
## 5 62635699.5720 nan 0.1000 -4724.5590
## 6 62478640.1681 nan 0.1000 91108.3045
## 7 62367822.1430 nan 0.1000 75191.6201
## 8 61956400.6269 nan 0.1000 -143432.9306
## 9 61652450.0062 nan 0.1000 -314718.1098
## 10 61551172.0285 nan 0.1000 111080.3270
## 20 60432565.9959 nan 0.1000 -32508.2786
## 40 59582751.9174 nan 0.1000 -201802.4320
## 60 58591677.3318 nan 0.1000 -300563.6443
## 80 57961052.5976 nan 0.1000 -272845.7421
## 100 57637102.2941 nan 0.1000 -134672.7259
## 120 57250500.5410 nan 0.1000 -143486.0030
## 140 56613535.9694 nan 0.1000 -123058.0542
## 150 56372198.9067 nan 0.1000 -390579.7086
## Iter TrainDeviance ValidDeviance StepSize Improve
## 1 64377941.3730 nan 0.1000 -48162.6115
## 2 64100601.1805 nan 0.1000 207713.8031
## 3 63611369.2530 nan 0.1000 238145.1221
## 4 63374883.4268 nan 0.1000 -32384.7373
## 5 63046045.7225 nan 0.1000 92162.4872
## 6 62700860.3800 nan 0.1000 -13021.1858
## 7 62135817.9955 nan 0.1000 192182.2202
## 8 61766734.0342 nan 0.1000 109671.0909
## 9 61236697.4416 nan 0.1000 403248.8510
## 10 60758761.2541 nan 0.1000 -112055.9786
## 20 58433716.6936 nan 0.1000 -129436.5127
## 40 55254253.8324 nan 0.1000 -72172.7643
## 60 53663069.9308 nan 0.1000 -229454.0264
## 80 52236090.4372 nan 0.1000 -103428.8113
## 100 50373888.4306 nan 0.1000 -132443.6718
## 120 48325416.5832 nan 0.1000 -2521.6225
## 140 47566721.7621 nan 0.1000 -172582.3839
## 150 47307627.3069 nan 0.1000 14941.6953
## Iter TrainDeviance ValidDeviance StepSize Improve
## 1 63369569.5020 nan 0.1000 287546.2003
## 2 62684602.1379 nan 0.1000 544191.5133
## 3 62209418.3793 nan 0.1000 -15245.4881
## 4 61342445.4873 nan 0.1000 -61413.0519
## 5 60708385.3285 nan 0.1000 136495.6383
## 6 60104374.4751 nan 0.1000 -68615.3401
## 7 59558436.7593 nan 0.1000 -149244.6786
## 8 59296293.2853 nan 0.1000 16488.5315
## 9 59009338.1459 nan 0.1000 -19509.8144
## 10 58722843.7726 nan 0.1000 -14606.8158
## 20 55825472.8258 nan 0.1000 -187258.9529
## 40 52200806.4432 nan 0.1000 -335456.5447
## 60 49945795.7220 nan 0.1000 -224667.9492
## 80 47740092.1342 nan 0.1000 -291999.0280
## 100 46128751.9716 nan 0.1000 -342165.1825
## 120 44477462.6093 nan 0.1000 -173573.5566
## 140 42982472.8301 nan 0.1000 -173228.9748
## 150 42266263.3228 nan 0.1000 -123374.6155
## Iter TrainDeviance ValidDeviance StepSize Improve
## 1 61722748.1842 nan 0.1000 -58311.4506
## 2 61444063.1655 nan 0.1000 67111.0253
## 3 61292321.3404 nan 0.1000 -9670.3428
## 4 60904845.2135 nan 0.1000 258979.5924
## 5 60463312.9537 nan 0.1000 -160330.8167
## 6 60334564.7980 nan 0.1000 -96513.2707
## 7 60114379.0941 nan 0.1000 209336.6560
## 8 59867098.6070 nan 0.1000 162800.6981
## 9 59560243.6981 nan 0.1000 -182353.5260
## 10 59397736.7839 nan 0.1000 23347.1638
## 20 58154133.3214 nan 0.1000 -182502.8065
## 40 57176220.8311 nan 0.1000 -375776.8573
## 60 56430031.7028 nan 0.1000 -20296.7819
## 80 55270973.4844 nan 0.1000 -205598.4379
## 100 54815172.8912 nan 0.1000 -158761.9683
## 120 54168696.2857 nan 0.1000 -70805.8870
## 140 53654648.5812 nan 0.1000 -182157.9761
## 150 53272664.4876 nan 0.1000 -346347.3236
## Iter TrainDeviance ValidDeviance StepSize Improve
## 1 61742572.4384 nan 0.1000 360731.1021
## 2 61319218.5495 nan 0.1000 321079.8199
## 3 60930069.3797 nan 0.1000 92362.4347
## 4 60691073.7808 nan 0.1000 -65747.7811
## 5 59863272.8315 nan 0.1000 -135684.6573
## 6 59658509.0360 nan 0.1000 3891.1120
## 7 59318685.7940 nan 0.1000 231928.1649
## 8 59059847.4691 nan 0.1000 -11922.1633
## 9 58542033.4743 nan 0.1000 -141607.4021
## 10 58172311.9263 nan 0.1000 -28971.3531
## 20 56241977.1385 nan 0.1000 -62158.5923
## 40 53621136.8237 nan 0.1000 -77319.2964
## 60 52365230.4222 nan 0.1000 -195830.8060
## 80 51136355.7956 nan 0.1000 -171189.8751
## 100 49267473.0459 nan 0.1000 -208381.8033
## 120 48002174.5657 nan 0.1000 -79359.5649
## 140 47042880.8101 nan 0.1000 -80807.3893
## 150 46604823.7837 nan 0.1000 -206773.8107
## Iter TrainDeviance ValidDeviance StepSize Improve
## 1 61361732.3182 nan 0.1000 22918.5390
## 2 60527180.2921 nan 0.1000 21008.1028
## 3 59829524.7755 nan 0.1000 -165263.0243
## 4 59350835.2167 nan 0.1000 -33096.5772
## 5 58975561.6758 nan 0.1000 91619.4672
## 6 58448858.3458 nan 0.1000 -214008.9410
## 7 58156173.1259 nan 0.1000 -68820.7895
## 8 57354028.8176 nan 0.1000 376110.2793
## 9 57015859.2136 nan 0.1000 -211659.9204
## 10 56802151.1642 nan 0.1000 -88442.6351
## 20 53967496.9270 nan 0.1000 -209140.2222
## 40 51044634.7436 nan 0.1000 -237544.2832
## 60 48952864.9747 nan 0.1000 -187874.9891
## 80 46736305.9114 nan 0.1000 -195424.6276
## 100 44827435.1396 nan 0.1000 42044.1723
## 120 43714208.8820 nan 0.1000 -151743.5425
## 140 42291031.6747 nan 0.1000 -260554.1759
## 150 41842287.7746 nan 0.1000 -115608.4363
## Iter TrainDeviance ValidDeviance StepSize Improve
## 1 45359626.6870 nan 0.1000 152375.5121
## 2 45131413.6063 nan 0.1000 40264.6931
## 3 44842867.0793 nan 0.1000 -170.7789
## 4 44658882.5694 nan 0.1000 188398.1732
## 5 44398523.0187 nan 0.1000 21026.6723
## 6 44211472.3245 nan 0.1000 -62307.1540
## 7 44061510.4699 nan 0.1000 42235.2246
## 8 43872108.9692 nan 0.1000 81727.1423
## 9 43699498.6605 nan 0.1000 126660.4728
## 10 43566532.2997 nan 0.1000 -46071.8861
## 20 42555610.5080 nan 0.1000 39958.4089
## 40 41600821.5580 nan 0.1000 -65711.2295
## 60 41116478.9935 nan 0.1000 -88685.5627
## 80 40770096.9544 nan 0.1000 -102054.1074
## 100 40509981.0456 nan 0.1000 -107386.4962
## 120 40269251.3421 nan 0.1000 -77379.0889
## 140 40148802.4980 nan 0.1000 -55677.7440
## 150 39991295.5460 nan 0.1000 -120621.3023
## Iter TrainDeviance ValidDeviance StepSize Improve
## 1 45269568.8170 nan 0.1000 141093.3922
## 2 44818138.5969 nan 0.1000 119129.0935
## 3 44495913.5877 nan 0.1000 120907.0369
## 4 44192413.1497 nan 0.1000 104588.6299
## 5 43881499.9987 nan 0.1000 104176.2211
## 6 43701537.2930 nan 0.1000 142322.9591
## 7 43512783.3041 nan 0.1000 -26614.5970
## 8 43313503.9783 nan 0.1000 -9222.1103
## 9 43097376.3834 nan 0.1000 1710.4018
## 10 42991158.6154 nan 0.1000 35189.6813
## 20 40831168.0092 nan 0.1000 31963.2064
## 40 38200116.5546 nan 0.1000 6301.9427
## 60 36761188.7320 nan 0.1000 -45024.5309
## 80 36050676.4381 nan 0.1000 -25575.7023
## 100 34850106.4707 nan 0.1000 -78042.0243
## 120 34240235.7902 nan 0.1000 -73781.0716
## 140 33632433.8315 nan 0.1000 -143496.7814
## 150 33123151.9142 nan 0.1000 -37703.4616
## Iter TrainDeviance ValidDeviance StepSize Improve
## 1 44860379.2244 nan 0.1000 335818.3685
## 2 44552761.9917 nan 0.1000 109045.7013
## 3 44297227.8064 nan 0.1000 147136.9047
## 4 43742997.3767 nan 0.1000 129320.2471
## 5 43305030.9107 nan 0.1000 34272.0629
## 6 43025077.6160 nan 0.1000 83304.7097
## 7 42569335.5351 nan 0.1000 122168.1302
## 8 42159800.5583 nan 0.1000 -47541.4092
## 9 41832408.4814 nan 0.1000 143720.8269
## 10 41550780.6300 nan 0.1000 120019.4003
## 20 39384501.6341 nan 0.1000 -129400.7025
## 40 36594556.7323 nan 0.1000 -29488.3701
## 60 35199305.1595 nan 0.1000 -131734.5064
## 80 33598775.0705 nan 0.1000 -31885.4157
## 100 32405299.5829 nan 0.1000 -106593.0136
## 120 31643660.2913 nan 0.1000 -92378.3149
## 140 30459562.0953 nan 0.1000 -132838.8905
## 150 30204680.8710 nan 0.1000 -79761.3002
## Iter TrainDeviance ValidDeviance StepSize Improve
## 1 57627640.1786 nan 0.1000 -11927.1630
## 2 57461052.5242 nan 0.1000 -103953.5982
## 3 57182495.7558 nan 0.1000 184411.1647
## 4 56867859.4972 nan 0.1000 43467.4084
## 5 56537115.3978 nan 0.1000 -77690.9639
## 6 56303884.8874 nan 0.1000 156197.1188
## 7 56110725.3429 nan 0.1000 138691.7657
## 8 55935822.6258 nan 0.1000 -104938.1087
## 9 55791581.3300 nan 0.1000 17400.1433
## 10 55631480.0037 nan 0.1000 70018.8459
## 20 54763737.4088 nan 0.1000 -19426.9318
## 40 53829618.3410 nan 0.1000 -64323.2236
## 50 53397196.7901 nan 0.1000 100716.1634
bt_fit
## Stochastic Gradient Boosting
##
## 4939 samples
## 58 predictor
##
## Pre-processing: centered (58), scaled (58)
## Resampling: Cross-Validated (5 fold)
## Summary of sample sizes: 3952, 3951, 3951, 3952, 3950
## Resampling results across tuning parameters:
##
##   interaction.depth  n.trees  RMSE      Rsquared    MAE
##   1                   50      7395.282  0.02085300  2865.590
##   1                  100      7404.537  0.02388135  2878.526
##   1                  150      7418.735  0.02181471  2868.734
##   2                   50      7410.129  0.02126213  2877.848
##   2                  100      7460.038  0.02148872  2904.557
##   2                  150      7479.911  0.02126598  2912.081
##   3                   50      7433.061  0.02215608  2862.701
##   3                  100      7494.895  0.02224647  2867.994
##   3                  150      7522.705  0.02286536  2896.889
##
## Tuning parameter 'shrinkage' was held constant at a value of 0.1
## Tuning parameter 'n.minobsinnode' was held
## constant at a value of 10
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were n.trees = 50, interaction.depth = 1, shrinkage = 0.1 and n.minobsinnode
## = 10.
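As a check on the selection rule reported above, the printed resampling results can be entered by hand and the smallest-RMSE row picked out. This is a minimal sketch in base R, not part of the original analysis: the `results` data frame is typed in from the table above rather than extracted from `bt_fit` (with the fitted object available, `bt_fit$results` and `bt_fit$bestTune` should hold the same information).

```r
# Cross-validation results copied by hand from the printed bt_fit table above
results <- data.frame(
  interaction.depth = rep(1:3, each = 3),
  n.trees = rep(c(50, 100, 150), times = 3),
  RMSE = c(7395.282, 7404.537, 7418.735,
           7410.129, 7460.038, 7479.911,
           7433.061, 7494.895, 7522.705)
)

# Pick the tuning combination with the smallest cross-validated RMSE
best_row <- results[which.min(results$RMSE), ]
print(best_row)

# Relative gap between the worst and best RMSE in the grid
rmse_spread <- (max(results$RMSE) - min(results$RMSE)) / min(results$RMSE)
print(rmse_spread)
```

The gap between the best and worst RMSE in the grid is under 2%, and every Rsquared value in the table is below 0.03, so the choice of `n.trees = 50` and `interaction.depth = 1` is only weakly favored over the alternatives.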
Comparison - Jordan
Finally, let’s compare our four models on the testing data: two linear models, one random forest model, and one boosted tree model. The model with the lowest test RMSE is reported as the best fit.
# random forest: predict on the testing data and measure performance
predRF <- predict(ranfor, newdata = Testing)
RF <- postResample(predRF, Testing$shares)
# linear model 1: predict on the testing data and measure performance
predlm1 <- predict(fit1, newdata = Testing)
LM <- postResample(predlm1, Testing$shares)
# linear model 2: predict on the testing data and measure performance
predlm2 <- predict(lm_fit, newdata = Testing)
LM2 <- postResample(predlm2, Testing$shares)
# boosted tree: predict on the testing data and measure performance
predbt <- predict(bt_fit, newdata = Testing)
BT <- postResample(predbt, Testing$shares)
# combine each of the performance stats for the models and add a column with the model names
dat <- data.frame(rbind(t(data.frame(LM)), t(data.frame(RF)), t(data.frame(LM2)), t(data.frame(BT))))
df <- as_tibble(rownames_to_column(dat, "models"))
# find the model with the lowest RMSE
best <- df %>% filter(RMSE == min(RMSE)) %>% select(models)
# print "The Best fitting model according to RMSE is [insert model name for lowest RMSE here]"
paste("The Best fitting model according to RMSE is", best$models, sep = " ")
## [1] "The Best fitting model according to RMSE is RF"
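For readers unfamiliar with `postResample()`, the three statistics it returns can be computed by hand. The sketch below uses made-up numbers (`obs`, `pred_a`, and `pred_b` are illustrative, not taken from the news data) and a hypothetical helper named `manual_metrics`. It assumes caret's definitions: RMSE, R-squared as the squared correlation between predictions and observations, and MAE. It then picks the lowest-RMSE model the same way the comparison above does.

```r
# Toy observed share counts and two sets of predictions (illustrative values only)
obs    <- c(1200, 800, 3100, 950, 1500)
pred_a <- c(1000, 900, 2500, 1100, 1600)
pred_b <- c(1150, 820, 3000, 1000, 1450)

# Hypothetical helper mirroring the three statistics reported by postResample()
manual_metrics <- function(pred, obs) {
  c(
    RMSE     = sqrt(mean((pred - obs)^2)),  # root mean squared error
    Rsquared = cor(pred, obs)^2,            # squared Pearson correlation
    MAE      = mean(abs(pred - obs))        # mean absolute error
  )
}

# One row of statistics per model, with the model name as the row name
scores <- rbind(
  model_a = manual_metrics(pred_a, obs),
  model_b = manual_metrics(pred_b, obs)
)
print(scores)

# Name of the model with the lowest RMSE
best_model <- rownames(scores)[which.min(scores[, "RMSE"])]
print(best_model)
```

Because RMSE squares the errors before averaging, a few badly missed high-share articles can dominate it, so it is worth checking MAE alongside RMSE when the rankings disagree.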
Automation - Jonathan
#rmarkdown::render(
# "Tanley-Wood-Project2.Rmd",
# output_format="github_document",
# output_dir="./Analysis",
# output_options = list(
# html_preview = FALSE
# )
#)
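The commented-out call above renders a single report. One way to automate a report per data channel is to loop over the channel names and pass each one to the document as a parameter. The sketch below is an assumption about how that could look, not the authors' actual setup: the parameter name `channel`, the output file naming scheme, and a matching `params:` entry in the Rmd's YAML header are all hypothetical. The `render()` call is guarded so the snippet still runs when the Rmd file or the rmarkdown package is not available.

```r
# Channel names taken from the data_channel_is_* columns of the data set
channels <- c("lifestyle", "entertainment", "bus", "socmed", "tech", "world")

# Build one set of render() arguments per channel
# (the `channel` parameter and the file naming scheme are hypothetical)
render_args <- lapply(channels, function(ch) {
  list(
    input = "Tanley-Wood-Project2.Rmd",
    output_format = "github_document",
    output_dir = "./Analysis",
    output_file = paste0(ch, "Analysis.md"),
    output_options = list(html_preview = FALSE),
    params = list(channel = ch)
  )
})

# Render only when the source file and the rmarkdown package are available
if (file.exists("Tanley-Wood-Project2.Rmd") &&
    requireNamespace("rmarkdown", quietly = TRUE)) {
  for (call_args in render_args) {
    do.call(rmarkdown::render, call_args)
  }
}
```

Keeping the argument lists in a plain R list makes it easy to inspect what will be rendered, for example with `str(render_args[[1]])`, before any report is built.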