Tanley-Wood-Project2
Jordan Tanley and Jonathan Wood 2022-07-05
Introduction - Jonathan
Data
The data in this analysis is the Online News Popularity data set, which contains a set of features describing articles published on Mashable.com over a two-year period.
The goal of this project is to predict the number of shares (how many times an article was shared on social media) an article receives. We treat the predicted number of shares as a measure of how popular a new article is likely to be.
Notable Variables
While there are 61 variables in the data set, we will not use all of them for this project. The notable variables are the following:
- “shares” - the number of shares the article has gotten over social media. This is the response variable we want our models to predict for new articles.
- “data_channel_is” - a set of variables that tells if the article is in a particular category, such as business, sports, or lifestyle.
- “weekday_is” - a set of variables that tells what day of the week the article was published on.
- “num_keywords” - the number of keywords within the article
- “num_imgs” - the number of images within the article
- “num_videos” - the number of videos within the article
Methods
Multiple methods will be used for this project to predict the number of shares a new article can generate, including
- Linear regression
- Tree-based models
  - Random forest
  - Boosted tree
Data - Jordan
In order to read in the data using a relative path, be sure to have the data file saved in your working directory.
# read in the data
news <- read_csv("OnlineNewsPopularity/OnlineNewsPopularity.csv")
## Rows: 39644 Columns: 61
## ── Column specification ─────────────────────────────────────────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): url
## dbl (60): timedelta, n_tokens_title, n_tokens_content, n_unique_tokens, n_non_stop_words, n_non_stop_unique_token...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# sneak peek at the dataset
head(news)
# Creating a weekday variable (basically undoing the 7 dummy variables that came with the data) for EDA
news$weekday <- ifelse(news$weekday_is_friday == 1, "Friday",
                ifelse(news$weekday_is_monday == 1, "Monday",
                ifelse(news$weekday_is_tuesday == 1, "Tuesday",
                ifelse(news$weekday_is_wednesday == 1, "Wednesday",
                ifelse(news$weekday_is_thursday == 1, "Thursday",
                ifelse(news$weekday_is_saturday == 1, "Saturday",
                       "Sunday"))))))
Next, let’s subset the data so that we only look at the data channel of interest. We will look at articles in the channel selected by the “channel” parameter, which is the business (“bus”) channel in this run.
# Subset the data to one of the parameterized data channels and drop unnecessary variables
chan <- paste0("data_channel_is_", params$channel)
print(chan)
## [1] "data_channel_is_bus"
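filtered_channel <- news %>%
  as_tibble() %>%
  # keep only the rows where the selected channel indicator equals 1
  filter(.data[[chan]] == 1) %>%
  # drop the url and timedelta columns, which are not used as predictors
  select(-c(url, timedelta))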
# take a peek at the data
filtered_channel %>%
select(ends_with(chan))
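The idea of filtering on a column whose name is built from a parameter can be sketched in base R on a small made-up data frame. The data frame `toy` and the variable `channel` (standing in for `params$channel`) are assumptions for illustration only, not part of the original analysis.

```r
# Hypothetical toy data with two channel indicator columns
toy <- data.frame(
  data_channel_is_bus  = c(1, 0, 1, 0),
  data_channel_is_tech = c(0, 1, 0, 1),
  shares               = c(1200, 800, 3100, 950)
)

# Stand-in for params$channel (an assumption for this sketch)
channel <- "bus"

# Build the indicator column name from the parameter
chan <- paste0("data_channel_is_", channel)

# [[ ]] looks the column up by its string name; keep rows where it equals 1
toy_filtered <- toy[toy[[chan]] == 1, ]
```

Changing `channel` to "tech" would select the other two rows without touching the rest of the code, which is the point of parameterizing the report.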
Summarizations - Both (at least 3 plots each)
For the numerical summaries, we can look at several aspects. Contingency tables allow us to examine the frequencies of categorical variables. The first output below, for example, shows the counts for each weekday. Similarly, the fifth table shows the frequencies of the number of tokens in the article content. Another set of summary statistics to look at is the five-number summary, which provides the minimum, first quartile (Q1), median, third quartile (Q3), and maximum of a variable. It may also be helpful to look at the average. Comparing the mean with the median helps in assessing skewness (a mean close to the median suggests a roughly symmetric distribution, mean > median suggests right skew, and mean < median suggests left skew), and the quartiles help in looking for outliers (anything more than 1.5 × (Q3 - Q1) below Q1 or above Q3 is generally considered an outlier). Below, the five-number summaries (plus the mean) are shown for shares, the number of words in the content, the number of words in the content for articles in the upper quartile of shares, the number of images in the article, the number of videos in the article, the rate of positive words, and the rate of negative words.
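As a small worked illustration of the five-number summary and the usual 1.5 × IQR outlier rule, the sketch below uses a made-up vector `shares_toy` (hypothetical values, not taken from the data set) and base R only.

```r
# Toy share counts (hypothetical values, not taken from the dataset)
shares_toy <- c(500, 900, 1100, 1400, 1600, 2500, 3000, 40000)

# Five-number summary: minimum, Q1, median, Q3, maximum
five_num <- quantile(shares_toy, probs = c(0, 0.25, 0.5, 0.75, 1), names = FALSE)

# Interquartile range and the 1.5 * IQR fences measured from Q1 and Q3
iqr <- five_num[4] - five_num[2]
lower_fence <- five_num[2] - 1.5 * iqr
upper_fence <- five_num[4] + 1.5 * iqr

# Values outside the fences are flagged as potential outliers
outliers <- shares_toy[shares_toy < lower_fence | shares_toy > upper_fence]

# A mean well above the median points to right skew
right_skewed <- mean(shares_toy) > median(shares_toy)
```

Here the single very large value pulls the mean far above the median and is the only value flagged by the fences, which mirrors the pattern seen in the shares summary for this channel.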
# Contingency table of frequencies for days of the week, added caption for clarity
kable(table(filtered_channel$weekday),
col.names = c("Weekday", "Frequency"),
caption = "Contingency table of frequencies for days of the week")
Weekday | Frequency |
---|---|
Friday | 832 |
Monday | 1153 |
Saturday | 243 |
Sunday | 343 |
Thursday | 1234 |
Tuesday | 1182 |
Wednesday | 1271 |
Contingency table of frequencies for days of the week
# Numerical Summary of Shares, added caption for clarity
filtered_channel %>% summarise(Minimum = min(shares),
Q1 = quantile(shares, prob = 0.25),
Average = mean(shares),
Median = median(shares),
Q3 = quantile(shares, prob = 0.75),
Maximum = max(shares)) %>%
kable(caption = "Numerical Summary of Shares")
Minimum | Q1 | Average | Median | Q3 | Maximum |
---|---|---|---|---|---|
1 | 952.25 | 3063.019 | 1400 | 2500 | 690400 |
Numerical Summary of Shares
# Numerical Summary of Number of words in the content, added caption for clarity
filtered_channel %>% summarise(Minimum = min(n_tokens_content),
Q1 = quantile(n_tokens_content, prob = 0.25),
Average = mean(n_tokens_content),
Median = median(n_tokens_content),
Q3 = quantile(n_tokens_content, prob = 0.75),
Maximum = max(n_tokens_content)) %>%
kable(caption = "Numerical Summary of Number of words in the content")
Minimum | Q1 | Average | Median | Q3 | Maximum |
---|---|---|---|---|---|
0 | 244 | 539.8714 | 400 | 727 | 6336 |
Numerical Summary of Number of words in the content
# Numerical Summary of Number of words in the content for the upper quantile of Shares, added caption for clarity
filtered_channel %>% filter(shares > quantile(shares, prob = 0.75)) %>%
summarise(Minimum = min(n_tokens_content),
Q1 = quantile(n_tokens_content, prob = 0.25),
Average = mean(n_tokens_content),
Median = median(n_tokens_content),
Q3 = quantile(n_tokens_content, prob = 0.75),
Maximum = max(n_tokens_content)) %>%
kable(caption = "Numerical Summary of Number of words in the content for the upper quantile of Shares")
Minimum | Q1 | Average | Median | Q3 | Maximum |
---|---|---|---|---|---|
0 | 263.25 | 685.6176 | 567 | 948 | 6336 |
Numerical Summary of Number of words in the content for the upper quantile of Shares
kable(table(filtered_channel$n_tokens_content),
col.names = c("Tokens", "Frequency"),
caption = "Contingency table of frequencies for number of tokens in the article content")
Tokens | Frequency |
---|---|
0 | 23 |
47 | 1 |
50 | 1 |
61 | 1 |
67 | 1 |
72 | 1 |
73 | 1 |
74 | 1 |
76 | 2 |
78 | 1 |
80 | 1 |
81 | 1 |
83 | 2 |
84 | 2 |
85 | 1 |
86 | 1 |
87 | 2 |
88 | 1 |
89 | 5 |
90 | 1 |
91 | 4 |
92 | 3 |
93 | 1 |
94 | 3 |
95 | 4 |
96 | 4 |
97 | 4 |
98 | 2 |
99 | 2 |
100 | 5 |
101 | 9 |
102 | 2 |
103 | 8 |
104 | 7 |
105 | 4 |
106 | 5 |
107 | 1 |
108 | 2 |
109 | 7 |
110 | 9 |
111 | 5 |
112 | 6 |
113 | 12 |
114 | 5 |
115 | 6 |
116 | 9 |
117 | 9 |
118 | 14 |
119 | 8 |
120 | 5 |
121 | 4 |
122 | 3 |
123 | 6 |
124 | 6 |
125 | 5 |
126 | 8 |
127 | 8 |
128 | 12 |
129 | 5 |
130 | 7 |
131 | 3 |
132 | 8 |
133 | 7 |
134 | 7 |
135 | 5 |
136 | 8 |
137 | 11 |
138 | 9 |
139 | 8 |
140 | 5 |
141 | 8 |
142 | 11 |
143 | 7 |
144 | 9 |
145 | 6 |
146 | 10 |
147 | 7 |
148 | 6 |
149 | 8 |
150 | 10 |
151 | 10 |
152 | 5 |
153 | 8 |
154 | 11 |
155 | 15 |
156 | 13 |
157 | 4 |
158 | 11 |
159 | 12 |
160 | 7 |
161 | 8 |
162 | 8 |
163 | 9 |
164 | 8 |
165 | 11 |
166 | 18 |
167 | 8 |
168 | 17 |
169 | 9 |
170 | 13 |
171 | 9 |
172 | 13 |
173 | 14 |
174 | 8 |
175 | 9 |
176 | 12 |
177 | 12 |
178 | 14 |
179 | 10 |
180 | 14 |
181 | 14 |
182 | 15 |
183 | 14 |
184 | 8 |
185 | 15 |
186 | 11 |
187 | 11 |
188 | 6 |
189 | 11 |
190 | 16 |
191 | 11 |
192 | 16 |
193 | 10 |
194 | 11 |
195 | 7 |
196 | 11 |
197 | 16 |
198 | 14 |
199 | 10 |
200 | 13 |
201 | 9 |
202 | 7 |
203 | 9 |
204 | 20 |
205 | 16 |
206 | 11 |
207 | 14 |
208 | 17 |
209 | 15 |
210 | 6 |
211 | 11 |
212 | 17 |
213 | 11 |
214 | 11 |
215 | 15 |
216 | 18 |
217 | 11 |
218 | 9 |
219 | 17 |
220 | 17 |
221 | 13 |
222 | 15 |
223 | 22 |
224 | 18 |
225 | 16 |
226 | 9 |
227 | 14 |
228 | 10 |
229 | 11 |
230 | 15 |
231 | 11 |
232 | 18 |
233 | 11 |
234 | 13 |
235 | 14 |
236 | 10 |
237 | 11 |
238 | 8 |
239 | 10 |
240 | 14 |
241 | 14 |
242 | 15 |
243 | 12 |
244 | 17 |
245 | 14 |
246 | 11 |
247 | 13 |
248 | 11 |
249 | 13 |
250 | 8 |
251 | 11 |
252 | 10 |
253 | 9 |
254 | 9 |
255 | 14 |
256 | 20 |
257 | 12 |
258 | 12 |
259 | 9 |
260 | 18 |
261 | 9 |
262 | 19 |
263 | 17 |
264 | 10 |
265 | 11 |
266 | 11 |
267 | 13 |
268 | 15 |
269 | 14 |
270 | 6 |
271 | 11 |
272 | 12 |
273 | 13 |
274 | 12 |
275 | 17 |
276 | 9 |
277 | 15 |
278 | 10 |
279 | 14 |
280 | 5 |
281 | 12 |
282 | 14 |
283 | 11 |
284 | 7 |
285 | 6 |
286 | 17 |
287 | 10 |
288 | 11 |
289 | 11 |
290 | 11 |
291 | 16 |
292 | 13 |
293 | 13 |
294 | 18 |
295 | 13 |
296 | 11 |
297 | 11 |
298 | 11 |
299 | 15 |
300 | 8 |
301 | 18 |
302 | 13 |
303 | 13 |
304 | 12 |
305 | 7 |
306 | 9 |
307 | 14 |
308 | 10 |
309 | 12 |
310 | 7 |
311 | 13 |
312 | 11 |
313 | 11 |
314 | 10 |
315 | 14 |
316 | 10 |
317 | 10 |
318 | 13 |
319 | 10 |
320 | 9 |
321 | 8 |
322 | 10 |
323 | 8 |
324 | 11 |
325 | 8 |
326 | 14 |
327 | 18 |
328 | 8 |
329 | 4 |
330 | 7 |
331 | 3 |
332 | 11 |
333 | 9 |
334 | 11 |
335 | 14 |
336 | 7 |
337 | 9 |
338 | 9 |
339 | 6 |
340 | 6 |
341 | 10 |
342 | 7 |
343 | 10 |
344 | 8 |
345 | 6 |
346 | 5 |
347 | 10 |
348 | 9 |
349 | 7 |
350 | 12 |
351 | 7 |
352 | 8 |
353 | 5 |
354 | 4 |
355 | 10 |
356 | 7 |
357 | 11 |
358 | 4 |
359 | 13 |
360 | 8 |
361 | 9 |
362 | 8 |
363 | 4 |
364 | 13 |
365 | 5 |
366 | 7 |
367 | 14 |
368 | 9 |
369 | 8 |
370 | 4 |
371 | 2 |
372 | 8 |
373 | 14 |
374 | 9 |
375 | 9 |
376 | 6 |
377 | 11 |
378 | 8 |
379 | 6 |
380 | 9 |
381 | 9 |
382 | 4 |
383 | 8 |
384 | 11 |
385 | 8 |
386 | 12 |
387 | 9 |
388 | 11 |
389 | 13 |
390 | 2 |
391 | 6 |
392 | 7 |
393 | 8 |
394 | 8 |
395 | 6 |
396 | 5 |
397 | 8 |
398 | 4 |
399 | 7 |
400 | 7 |
401 | 9 |
402 | 8 |
403 | 9 |
404 | 5 |
405 | 10 |
406 | 6 |
407 | 9 |
408 | 5 |
409 | 6 |
410 | 4 |
411 | 2 |
412 | 5 |
413 | 8 |
414 | 6 |
415 | 10 |
416 | 9 |
417 | 7 |
418 | 7 |
419 | 4 |
420 | 6 |
421 | 6 |
422 | 8 |
423 | 8 |
424 | 3 |
425 | 8 |
426 | 8 |
427 | 6 |
428 | 6 |
429 | 6 |
430 | 7 |
431 | 5 |
432 | 8 |
433 | 4 |
434 | 7 |
435 | 3 |
436 | 5 |
437 | 9 |
438 | 2 |
439 | 6 |
440 | 4 |
441 | 12 |
442 | 7 |
443 | 2 |
444 | 7 |
445 | 4 |
446 | 7 |
447 | 6 |
448 | 3 |
449 | 3 |
450 | 4 |
451 | 4 |
452 | 4 |
453 | 2 |
454 | 6 |
455 | 8 |
456 | 2 |
457 | 9 |
458 | 2 |
459 | 3 |
460 | 4 |
461 | 3 |
462 | 5 |
463 | 7 |
464 | 6 |
465 | 5 |
466 | 9 |
467 | 7 |
468 | 6 |
469 | 4 |
470 | 3 |
471 | 9 |
472 | 5 |
473 | 8 |
474 | 7 |
475 | 4 |
476 | 8 |
477 | 10 |
478 | 4 |
479 | 8 |
480 | 3 |
481 | 7 |
482 | 4 |
483 | 3 |
484 | 4 |
485 | 9 |
486 | 6 |
487 | 7 |
488 | 7 |
489 | 7 |
490 | 4 |
491 | 8 |
492 | 8 |
493 | 6 |
494 | 5 |
495 | 5 |
496 | 6 |
497 | 5 |
498 | 7 |
499 | 3 |
500 | 7 |
501 | 6 |
502 | 8 |
503 | 6 |
504 | 1 |
505 | 3 |
506 | 7 |
507 | 6 |
508 | 5 |
509 | 9 |
510 | 2 |
511 | 12 |
512 | 3 |
513 | 2 |
514 | 3 |
515 | 3 |
516 | 2 |
517 | 7 |
518 | 5 |
519 | 2 |
520 | 7 |
521 | 4 |
522 | 7 |
523 | 4 |
524 | 8 |
525 | 3 |
526 | 5 |
527 | 7 |
528 | 4 |
529 | 3 |
530 | 5 |
531 | 3 |
532 | 4 |
533 | 5 |
534 | 4 |
535 | 1 |
536 | 5 |
537 | 9 |
538 | 5 |
539 | 5 |
540 | 7 |
541 | 6 |
543 | 3 |
544 | 8 |
545 | 7 |
546 | 5 |
547 | 5 |
548 | 7 |
549 | 3 |
550 | 3 |
551 | 4 |
552 | 5 |
553 | 6 |
554 | 4 |
555 | 7 |
556 | 9 |
557 | 5 |
558 | 5 |
559 | 1 |
560 | 5 |
561 | 4 |
562 | 1 |
563 | 4 |
564 | 4 |
565 | 6 |
566 | 2 |
567 | 2 |
568 | 6 |
569 | 1 |
570 | 6 |
571 | 4 |
572 | 2 |
573 | 4 |
574 | 4 |
575 | 5 |
576 | 7 |
577 | 6 |
578 | 9 |
579 | 8 |
580 | 4 |
581 | 4 |
582 | 6 |
583 | 1 |
584 | 4 |
585 | 2 |
586 | 6 |
587 | 2 |
588 | 7 |
589 | 3 |
590 | 5 |
591 | 3 |
592 | 10 |
593 | 3 |
594 | 4 |
595 | 8 |
596 | 5 |
597 | 2 |
598 | 4 |
599 | 5 |
600 | 4 |
601 | 2 |
602 | 4 |
603 | 5 |
604 | 7 |
605 | 7 |
606 | 3 |
607 | 5 |
608 | 4 |
609 | 3 |
610 | 4 |
611 | 5 |
612 | 7 |
613 | 5 |
614 | 3 |
615 | 3 |
616 | 3 |
617 | 2 |
618 | 6 |
619 | 1 |
620 | 7 |
621 | 2 |
622 | 5 |
623 | 5 |
624 | 3 |
625 | 6 |
626 | 3 |
627 | 4 |
628 | 4 |
629 | 6 |
630 | 4 |
631 | 5 |
632 | 8 |
633 | 5 |
634 | 6 |
635 | 5 |
636 | 4 |
637 | 3 |
638 | 3 |
639 | 4 |
640 | 5 |
641 | 3 |
642 | 4 |
643 | 6 |
644 | 5 |
645 | 9 |
646 | 4 |
647 | 5 |
648 | 2 |
649 | 1 |
650 | 4 |
651 | 6 |
652 | 2 |
653 | 3 |
654 | 2 |
655 | 3 |
657 | 5 |
658 | 3 |
659 | 8 |
660 | 5 |
661 | 5 |
662 | 4 |
663 | 6 |
664 | 7 |
665 | 4 |
666 | 5 |
667 | 7 |
668 | 5 |
669 | 1 |
670 | 5 |
671 | 6 |
672 | 6 |
673 | 3 |
674 | 3 |
675 | 3 |
676 | 1 |
677 | 3 |
678 | 3 |
679 | 6 |
680 | 5 |
681 | 2 |
682 | 1 |
683 | 4 |
684 | 1 |
685 | 2 |
686 | 3 |
687 | 3 |
688 | 1 |
689 | 1 |
690 | 3 |
691 | 1 |
692 | 2 |
693 | 2 |
694 | 3 |
695 | 3 |
696 | 5 |
697 | 3 |
698 | 3 |
699 | 3 |
700 | 8 |
701 | 2 |
702 | 4 |
703 | 4 |
704 | 3 |
705 | 5 |
706 | 6 |
707 | 5 |
708 | 8 |
709 | 5 |
710 | 3 |
711 | 4 |
712 | 5 |
713 | 3 |
714 | 4 |
715 | 1 |
717 | 3 |
718 | 4 |
719 | 7 |
720 | 4 |
721 | 6 |
722 | 2 |
723 | 2 |
724 | 1 |
725 | 1 |
726 | 2 |
727 | 5 |
728 | 6 |
729 | 3 |
730 | 6 |
731 | 5 |
732 | 5 |
733 | 6 |
734 | 1 |
736 | 3 |
737 | 1 |
738 | 3 |
739 | 6 |
741 | 6 |
742 | 1 |
743 | 4 |
744 | 2 |
745 | 5 |
746 | 4 |
747 | 2 |
748 | 1 |
749 | 4 |
750 | 2 |
751 | 5 |
752 | 2 |
753 | 5 |
754 | 1 |
755 | 3 |
756 | 1 |
757 | 3 |
758 | 4 |
759 | 2 |
760 | 3 |
761 | 6 |
762 | 5 |
763 | 1 |
764 | 4 |
766 | 6 |
767 | 5 |
768 | 4 |
769 | 3 |
770 | 1 |
771 | 4 |
773 | 1 |
774 | 2 |
775 | 2 |
776 | 2 |
777 | 9 |
778 | 2 |
779 | 3 |
780 | 7 |
781 | 5 |
782 | 5 |
783 | 5 |
785 | 5 |
786 | 1 |
787 | 4 |
788 | 5 |
789 | 1 |
790 | 3 |
791 | 7 |
792 | 5 |
793 | 1 |
794 | 2 |
795 | 4 |
796 | 3 |
797 | 2 |
798 | 4 |
799 | 4 |
800 | 1 |
801 | 3 |
802 | 4 |
803 | 2 |
804 | 6 |
805 | 3 |
806 | 4 |
808 | 1 |
809 | 3 |
810 | 3 |
811 | 4 |
812 | 2 |
813 | 1 |
814 | 5 |
815 | 3 |
817 | 5 |
818 | 1 |
819 | 2 |
820 | 3 |
821 | 3 |
822 | 5 |
823 | 2 |
824 | 5 |
825 | 1 |
826 | 7 |
827 | 3 |
828 | 4 |
829 | 4 |
830 | 2 |
831 | 3 |
832 | 4 |
833 | 4 |
834 | 3 |
835 | 5 |
836 | 4 |
837 | 2 |
838 | 1 |
839 | 3 |
840 | 3 |
841 | 1 |
842 | 3 |
843 | 3 |
844 | 4 |
846 | 1 |
847 | 4 |
848 | 4 |
849 | 3 |
850 | 6 |
851 | 4 |
852 | 3 |
853 | 2 |
854 | 5 |
855 | 1 |
856 | 2 |
858 | 3 |
860 | 2 |
861 | 2 |
863 | 3 |
865 | 1 |
866 | 3 |
867 | 3 |
868 | 2 |
869 | 2 |
870 | 4 |
871 | 2 |
872 | 1 |
873 | 4 |
874 | 1 |
875 | 3 |
876 | 5 |
877 | 2 |
878 | 3 |
879 | 6 |
880 | 4 |
881 | 1 |
882 | 5 |
883 | 2 |
884 | 2 |
885 | 3 |
886 | 3 |
887 | 2 |
888 | 4 |
889 | 3 |
890 | 3 |
891 | 2 |
892 | 4 |
893 | 6 |
894 | 1 |
895 | 3 |
896 | 4 |
897 | 4 |
898 | 2 |
899 | 3 |
900 | 6 |
901 | 3 |
902 | 3 |
903 | 2 |
904 | 4 |
905 | 2 |
906 | 3 |
907 | 2 |
908 | 5 |
909 | 4 |
910 | 1 |
911 | 5 |
912 | 1 |
913 | 3 |
914 | 4 |
915 | 2 |
916 | 2 |
917 | 1 |
918 | 6 |
919 | 4 |
920 | 3 |
921 | 1 |
922 | 4 |
924 | 1 |
925 | 3 |
926 | 4 |
927 | 5 |
928 | 5 |
929 | 4 |
930 | 5 |
931 | 4 |
932 | 4 |
933 | 4 |
934 | 6 |
936 | 4 |
937 | 5 |
938 | 4 |
939 | 4 |
940 | 4 |
942 | 2 |
944 | 6 |
945 | 4 |
946 | 3 |
947 | 3 |
948 | 4 |
949 | 1 |
950 | 4 |
951 | 7 |
952 | 7 |
953 | 2 |
954 | 2 |
955 | 3 |
956 | 3 |
957 | 3 |
958 | 1 |
959 | 2 |
960 | 3 |
961 | 3 |
962 | 4 |
963 | 3 |
964 | 2 |
965 | 3 |
966 | 2 |
967 | 1 |
968 | 4 |
969 | 2 |
970 | 2 |
971 | 1 |
972 | 3 |
973 | 2 |
974 | 7 |
975 | 1 |
976 | 7 |
977 | 2 |
979 | 5 |
980 | 4 |
981 | 1 |
982 | 4 |
983 | 2 |
984 | 1 |
985 | 1 |
986 | 2 |
987 | 1 |
988 | 2 |
989 | 6 |
990 | 4 |
991 | 2 |
992 | 1 |
993 | 1 |
995 | 2 |
996 | 3 |
997 | 1 |
998 | 3 |
999 | 4 |
1000 | 2 |
1001 | 3 |
1002 | 2 |
1003 | 1 |
1004 | 3 |
1005 | 4 |
1006 | 5 |
1007 | 2 |
1008 | 2 |
1009 | 4 |
1010 | 1 |
1011 | 5 |
1012 | 3 |
1013 | 1 |
1014 | 2 |
1015 | 5 |
1018 | 1 |
1019 | 3 |
1020 | 5 |
1021 | 1 |
1022 | 3 |
1023 | 3 |
1024 | 2 |
1025 | 2 |
1026 | 2 |
1027 | 7 |
1028 | 1 |
1029 | 1 |
1030 | 3 |
1031 | 2 |
1032 | 2 |
1033 | 1 |
1034 | 3 |
1035 | 2 |
1036 | 1 |
1037 | 2 |
1038 | 6 |
1039 | 3 |
1040 | 4 |
1041 | 3 |
1042 | 1 |
1043 | 1 |
1044 | 3 |
1045 | 3 |
1046 | 4 |
1047 | 3 |
1048 | 2 |
1049 | 2 |
1050 | 2 |
1051 | 1 |
1052 | 2 |
1053 | 2 |
1054 | 3 |
1055 | 2 |
1056 | 2 |
1057 | 4 |
1058 | 4 |
1059 | 1 |
1060 | 2 |
1061 | 3 |
1062 | 2 |
1063 | 3 |
1064 | 3 |
1066 | 1 |
1067 | 3 |
1068 | 1 |
1069 | 3 |
1070 | 2 |
1072 | 4 |
1073 | 1 |
1074 | 1 |
1075 | 2 |
1076 | 4 |
1077 | 2 |
1078 | 2 |
1079 | 5 |
1080 | 3 |
1081 | 2 |
1082 | 1 |
1083 | 2 |
1084 | 2 |
1085 | 3 |
1087 | 1 |
1088 | 2 |
1089 | 2 |
1090 | 1 |
1091 | 1 |
1093 | 4 |
1094 | 4 |
1095 | 2 |
1096 | 1 |
1097 | 4 |
1098 | 2 |
1099 | 1 |
1100 | 1 |
1101 | 1 |
1104 | 2 |
1105 | 3 |
1106 | 3 |
1107 | 1 |
1110 | 4 |
1111 | 1 |
1112 | 3 |
1113 | 2 |
1114 | 2 |
1117 | 1 |
1118 | 5 |
1119 | 1 |
1120 | 1 |
1121 | 1 |
1122 | 2 |
1123 | 2 |
1124 | 3 |
1125 | 1 |
1128 | 2 |
1134 | 1 |
1137 | 2 |
1138 | 1 |
1139 | 2 |
1140 | 1 |
1141 | 2 |
1142 | 1 |
1143 | 3 |
1144 | 2 |
1145 | 4 |
1146 | 4 |
1149 | 1 |
1150 | 1 |
1151 | 2 |
1152 | 1 |
1153 | 1 |
1156 | 1 |
1157 | 4 |
1158 | 3 |
1159 | 2 |
1160 | 1 |
1162 | 2 |
1163 | 1 |
1164 | 3 |
1166 | 4 |
1167 | 1 |
1168 | 2 |
1169 | 4 |
1170 | 1 |
1171 | 1 |
1173 | 2 |
1174 | 1 |
1175 | 1 |
1176 | 2 |
1177 | 2 |
1179 | 2 |
1181 | 1 |
1182 | 1 |
1183 | 3 |
1184 | 1 |
1185 | 2 |
1186 | 2 |
1187 | 1 |
1193 | 3 |
1195 | 1 |
1196 | 2 |
1197 | 1 |
1198 | 1 |
1199 | 1 |
1201 | 1 |
1203 | 2 |
1205 | 4 |
1206 | 2 |
1207 | 1 |
1209 | 3 |
1211 | 1 |
1213 | 4 |
1215 | 1 |
1216 | 1 |
1217 | 2 |
1218 | 3 |
1219 | 1 |
1221 | 1 |
1223 | 3 |
1225 | 1 |
1226 | 1 |
1227 | 1 |
1228 | 2 |
1230 | 1 |
1231 | 1 |
1234 | 1 |
1235 | 2 |
1237 | 3 |
1238 | 2 |
1241 | 2 |
1242 | 3 |
1243 | 1 |
1245 | 2 |
1249 | 3 |
1250 | 1 |
1255 | 2 |
1257 | 1 |
1259 | 1 |
1260 | 3 |
1262 | 1 |
1263 | 1 |
1269 | 1 |
1270 | 3 |
1271 | 2 |
1274 | 1 |
1277 | 2 |
1279 | 2 |
1280 | 1 |
1281 | 1 |
1282 | 1 |
1283 | 2 |
1284 | 1 |
1285 | 1 |
1289 | 2 |
1290 | 1 |
1291 | 1 |
1292 | 3 |
1293 | 1 |
1294 | 2 |
1295 | 3 |
1297 | 2 |
1299 | 1 |
1303 | 1 |
1307 | 1 |
1308 | 1 |
1310 | 1 |
1311 | 1 |
1312 | 1 |
1315 | 1 |
1316 | 1 |
1317 | 1 |
1318 | 2 |
1320 | 1 |
1321 | 1 |
1325 | 1 |
1328 | 1 |
1329 | 2 |
1331 | 1 |
1332 | 2 |
1338 | 1 |
1339 | 2 |
1343 | 1 |
1345 | 1 |
1346 | 2 |
1348 | 1 |
1353 | 1 |
1355 | 1 |
1356 | 1 |
1358 | 2 |
1359 | 2 |
1361 | 2 |
1363 | 1 |
1368 | 2 |
1369 | 1 |
1370 | 1 |
1372 | 1 |
1375 | 1 |
1379 | 1 |
1380 | 2 |
1381 | 2 |
1386 | 1 |
1388 | 1 |
1390 | 3 |
1391 | 1 |
1393 | 1 |
1394 | 1 |
1398 | 1 |
1399 | 1 |
1405 | 1 |
1408 | 1 |
1413 | 1 |
1415 | 1 |
1419 | 1 |
1423 | 1 |
1425 | 1 |
1426 | 3 |
1427 | 1 |
1438 | 1 |
1439 | 1 |
1442 | 2 |
1447 | 1 |
1449 | 1 |
1451 | 2 |
1454 | 1 |
1457 | 1 |
1461 | 1 |
1462 | 1 |
1465 | 1 |
1466 | 2 |
1468 | 1 |
1470 | 1 |
1473 | 1 |
1477 | 1 |
1478 | 1 |
1483 | 1 |
1484 | 1 |
1492 | 1 |
1493 | 2 |
1494 | 1 |
1499 | 1 |
1516 | 1 |
1518 | 1 |
1522 | 1 |
1525 | 1 |
1528 | 1 |
1529 | 1 |
1536 | 1 |
1541 | 1 |
1544 | 2 |
1549 | 1 |
1550 | 1 |
1551 | 1 |
1559 | 1 |
1560 | 1 |
1568 | 1 |
1569 | 1 |
1570 | 1 |
1571 | 1 |
1579 | 1 |
1580 | 1 |
1587 | 1 |
1588 | 1 |
1593 | 1 |
1600 | 1 |
1601 | 1 |
1607 | 2 |
1608 | 1 |
1611 | 1 |
1615 | 1 |
1617 | 1 |
1622 | 1 |
1641 | 1 |
1642 | 1 |
1643 | 1 |
1645 | 1 |
1648 | 2 |
1656 | 1 |
1661 | 1 |
1665 | 1 |
1666 | 1 |
1667 | 2 |
1668 | 1 |
1673 | 1 |
1675 | 1 |
1681 | 1 |
1682 | 1 |
1684 | 1 |
1687 | 1 |
1706 | 1 |
1722 | 2 |
1723 | 1 |
1727 | 1 |
1731 | 1 |
1735 | 1 |
1745 | 1 |
1751 | 1 |
1758 | 2 |
1761 | 1 |
1769 | 1 |
1770 | 1 |
1771 | 1 |
1777 | 2 |
1778 | 1 |
1785 | 1 |
1790 | 1 |
1794 | 2 |
1796 | 1 |
1800 | 1 |
1804 | 1 |
1806 | 1 |
1809 | 1 |
1817 | 1 |
1823 | 1 |
1829 | 1 |
1833 | 1 |
1836 | 1 |
1839 | 1 |
1853 | 1 |
1854 | 1 |
1855 | 1 |
1858 | 1 |
1859 | 1 |
1860 | 1 |
1881 | 1 |
1887 | 1 |
1898 | 1 |
1902 | 1 |
1906 | 1 |
1931 | 2 |
1945 | 2 |
1954 | 1 |
1981 | 1 |
1986 | 1 |
2001 | 1 |
2004 | 1 |
2008 | 1 |
2022 | 1 |
2026 | 1 |
2031 | 1 |
2032 | 1 |
2037 | 1 |
2076 | 2 |
2094 | 1 |
2097 | 1 |
2099 | 1 |
2100 | 1 |
2103 | 1 |
2119 | 1 |
2132 | 1 |
2134 | 1 |
2147 | 1 |
2159 | 1 |
2165 | 1 |
2171 | 1 |
2173 | 1 |
2184 | 1 |
2188 | 1 |
2197 | 1 |
2228 | 1 |
2238 | 1 |
2247 | 1 |
2248 | 1 |
2253 | 1 |
2280 | 1 |
2294 | 1 |
2334 | 1 |
2347 | 1 |
2369 | 1 |
2373 | 1 |
2387 | 1 |
2416 | 1 |
2419 | 1 |
2444 | 1 |
2453 | 1 |
2458 | 1 |
2475 | 1 |
2478 | 1 |
2492 | 1 |
2499 | 1 |
2525 | 1 |
2536 | 1 |
2560 | 1 |
2632 | 1 |
2642 | 1 |
2691 | 1 |
2711 | 1 |
2728 | 1 |
2732 | 1 |
2761 | 1 |
2772 | 1 |
2784 | 1 |
2791 | 1 |
2885 | 1 |
2910 | 1 |
2962 | 1 |
3023 | 1 |
3050 | 1 |
3074 | 1 |
3157 | 1 |
3222 | 1 |
3320 | 1 |
3351 | 1 |
3455 | 1 |
3560 | 1 |
3603 | 1 |
3650 | 1 |
3940 | 1 |
3974 | 1 |
4044 | 1 |
4115 | 1 |
4119 | 1 |
4452 | 1 |
4747 | 1 |
4894 | 1 |
6336 | 1 |
Contingency table of frequencies for number of tokens in the article content
# Summarizing the number of images in the article
filtered_channel %>%
summarise(Minimum = min(num_imgs),
Q1 = quantile(num_imgs, prob = 0.25),
Average = mean(num_imgs),
Median = median(num_imgs),
Q3 = quantile(num_imgs, prob = 0.75),
Maximum = max(num_imgs)) %>%
kable(caption = "Numerical summary of number of images in an article")
Minimum | Q1 | Average | Median | Q3 | Maximum |
---|---|---|---|---|---|
0 | 1 | 1.808405 | 1 | 1 | 51 |
Numerical summary of number of images in an article
# Summarizing the number of videos in the article
filtered_channel %>%
summarise(Minimum = min(num_videos),
Q1 = quantile(num_videos, prob = 0.25),
Average = mean(num_videos),
Median = median(num_videos),
Q3 = quantile(num_videos, prob = 0.75),
Maximum = max(num_videos)) %>%
kable(caption = "Numerical summary of number of videos in an article")
Minimum | Q1 | Average | Median | Q3 | Maximum |
---|---|---|---|---|---|
0 | 0 | 0.6364653 | 0 | 0 | 75 |
Numerical summary of number of videos in an article
# Summarizing the rate of positive words
filtered_channel %>%
summarise(Minimum = min(rate_positive_words),
Q1 = quantile(rate_positive_words, prob = 0.25),
Average = mean(rate_positive_words),
Median = median(rate_positive_words),
Q3 = quantile(rate_positive_words, prob = 0.75),
Maximum = max(rate_positive_words)) %>%
kable(caption = "Numerical Summary of the rate of positive words in an article")
Minimum | Q1 | Average | Median | Q3 | Maximum |
---|---|---|---|---|---|
0 | 0.6666667 | 0.7377051 | 0.75 | 0.8333333 | 1 |
Numerical Summary of the rate of positive words in an article
# Summarizing the rate of negative words
filtered_channel %>%
summarise(Minimum = min(rate_negative_words),
Q1 = quantile(rate_negative_words, prob = 0.25),
Average = mean(rate_negative_words),
Median = median(rate_negative_words),
Q3 = quantile(rate_negative_words, prob = 0.75),
Maximum = max(rate_negative_words)) %>%
kable(caption = "Numerical Summary of the rate of negative words in an article")
Minimum | Q1 | Average | Median | Q3 | Maximum |
---|---|---|---|---|---|
0 | 0.1666667 | 0.2583 | 0.25 | 0.3333333 | 1 |
Numerical Summary of the rate of negative words in an article
The graphical summaries show the trends in the data more dramatically, including skewness and outliers. The boxplot below gives a visual representation of the five-number summary of shares, split up by weekday. Boxplots make it even easier to spot outliers (look for the dots separated from the main box and whiskers). Next, we can examine several scatterplots. Scatterplots allow us to look at one numerical variable against another to see whether there is any correlation between them. Look out for any plots that have most of the points along a diagonal line! There are five scatterplots below, plotting shares against the number of words in the content, the number of words in the title, the rate of positive words, the rate of negative words, and the global sentiment polarity. Finally, a histogram can show the overall distribution of a numerical variable, including skewness. The histogram below shows the distribution of the shares variable. Look for a left or right tail to signify skewness, and look out for multiple peaks to signify a multi-modal variable.
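The "points on a diagonal line" idea can be put into numbers with the correlation coefficient. The sketch below uses made-up vectors (all names and values are assumptions for illustration) to contrast a perfectly linear relationship with one that shows no linear trend.

```r
# Hypothetical predictor values
x <- c(1, 2, 3, 4, 5)

# A response that falls exactly on a line: every point sits on the diagonal
y_linear <- 3 * x + 2

# A response that bounces up and down with no overall trend in x
y_flat <- c(2, 1, 2, 1, 2)

# Pearson correlation: near 1 or -1 for a diagonal pattern, near 0 for no linear trend
cor_linear <- cor(x, y_linear)
cor_flat <- cor(x, y_flat)
```

A correlation near zero only rules out a linear trend; a scatterplot is still worth checking for curved patterns.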
# Boxplot of Shares for Each Weekday, colored gray with classic theme, added labels and title
ggplot(filtered_channel, aes(x = weekday, y = shares)) +
geom_boxplot(fill = "grey") +
labs(x = "Weekday", title = "Boxplot of Shares for Each Weekday", y = "Shares") +
theme_classic()
# Scatterplot of Number of words in the content vs Shares, colored gray with classic theme, added labels and title
ggplot(filtered_channel, aes(x = n_tokens_content, y = shares)) +
geom_point(color = "grey") +
labs(x = "Number of words in the content", y = "Shares",
title = "Scatterplot of Number of words in the content vs Shares") +
theme_classic()
# Scatterplot of Number of words in the title vs Shares, colored gray with classic theme, added labels and title
ggplot(filtered_channel, aes(x = n_tokens_title, y = shares)) +
geom_point(color = "grey") +
labs(x = "Number of words in the title", y = "Shares",
title = "Scatterplot of Number of words in the title vs Shares") +
theme_classic()
ggplot(filtered_channel, aes(x=shares)) +
geom_histogram(color="grey", binwidth = 2000) +
labs(x = "Shares",
title = "Histogram of number of shares") +
theme_classic()
ggplot(filtered_channel, aes(x=rate_positive_words, y=shares)) +
geom_point(color="grey") +
labs(x = "rate of positive words in an article", y = "Shares",
title = "Scatterplot of rate of positive words in an article vs shares") +
theme_classic()
ggplot(filtered_channel, aes(x=rate_negative_words, y=shares)) +
geom_point(color="grey") +
labs(x = "rate of negative words in an article", y = "Shares",
title = "Scatterplot of rate of negative words in an article vs shares") +
theme_classic()
ggplot(filtered_channel, aes(x=global_sentiment_polarity, y=shares)) +
geom_point(color="grey") +
labs(x = "global sentiment polarity in an article", y = "Shares",
title = "Scatterplot of global sentiment polarity in an article vs shares") +
theme_classic()
# drop the weekday variable created for EDA (will get in the way for our models if we don't drop it)
filtered_channel <- subset(filtered_channel, select = -c(weekday))
Modeling
Splitting the Data
First, let’s split up the data into a testing set and a training set using the proportions: 70% training and 30% testing.
set.seed(9876)
# Split the data into a training and test set (70/30 split)
# indices
train <- sample(1:nrow(filtered_channel), size = nrow(filtered_channel)*.70)
test <- setdiff(1:nrow(filtered_channel), train)
# training and testing subsets
Training <- filtered_channel[train, ]
Testing <- filtered_channel[test, ]
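The same index-based split can be checked on a small made-up data frame. In this sketch, `toy` and the object names are assumptions for illustration, and `floor()` is used only to make the rounding of the training size explicit.

```r
set.seed(9876)

# Hypothetical toy data with 10 rows
toy <- data.frame(id = 1:10, shares = seq(100, 1000, by = 100))
n <- nrow(toy)

# Draw 70% of the row indices for training; the rest go to testing
train_idx <- sample(seq_len(n), size = floor(n * 0.70))
test_idx <- setdiff(seq_len(n), train_idx)

toy_train <- toy[train_idx, ]
toy_test <- toy[test_idx, ]

# Sanity checks: no row appears in both sets, and every row appears in one of them
no_overlap <- length(intersect(train_idx, test_idx)) == 0
full_cover <- setequal(c(train_idx, test_idx), seq_len(n))
```

Checks like `no_overlap` and `full_cover` are a cheap way to confirm that the test set is truly held out before any model is fit.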
Linear Models
Linear regression models allow us to look at relationships between one response variable and several explanatory variables. A model can also include interaction terms and even higher order terms. The general form for a linear model is $Y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \dots + E_i$, where each $x_{ji}$ represents a predictor variable and the “…” can include more predictors, interactions, and/or higher order terms. Since our goal is to predict shares, we will fit these models on the subset of the data created for training, and then later test them on the subset set aside for testing.
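To make the general linear form concrete, the sketch below fits `lm()` to made-up data generated exactly from the line y = 2 + 3·x1 - x2, so the fitted coefficients can be compared with the known values. The data and object names are assumptions for illustration, not part of the original analysis.

```r
# Hypothetical predictors and a response built exactly as 2 + 3 * x1 - x2
toy <- data.frame(
  x1 = c(1, 2, 3, 4, 5, 6),
  x2 = c(2, 1, 4, 3, 6, 5)
)
toy$y <- 2 + 3 * toy$x1 - toy$x2

# Fit the linear model with both predictors as main effects
toy_fit <- lm(y ~ x1 + x2, data = toy)

# Estimated intercept and slopes (beta_0, beta_1, beta_2)
betas <- unname(coef(toy_fit))

# Predict the response for a new observation
new_obs <- data.frame(x1 = 7, x2 = 7)
pred <- unname(predict(toy_fit, newdata = new_obs))
```

Because the toy response has no noise, the fit recovers the coefficients up to floating-point error; with real data such as shares, the error term absorbs whatever the predictors cannot explain.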
Linear Model #1: - Jordan
# linear model on training dataset with 5-fold cv
fit1 <- train(shares ~ . , data = Training, method = "lm",
preProcess = c("center", "scale"),
trControl = trainControl(method = "cv", number = 5))
Linear Model #2: - Jonathan
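# linear model with all main effects and pairwise interactions on training dataset with 5-fold cv
lm_fit <- train(
  shares ~ .^2,
  data = Training,
  method = "lm",
  preProcess = c("center", "scale"),
  trControl = trainControl(method = "cv", number = 5)
)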
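The formula `shares ~ .^2` asks for every main effect plus every pairwise interaction. A quick way to see what that expands to is to inspect the term labels on a small made-up data frame; the data frame `toy` and its columns `x1`, `x2`, `x3` are assumptions for illustration.

```r
# Hypothetical toy data with a response and three predictors
toy <- data.frame(
  shares = c(10, 20, 30, 40),
  x1 = c(1, 2, 3, 4),
  x2 = c(0, 1, 0, 1),
  x3 = c(5, 3, 8, 1)
)

# Main effects only: the dot expands to every column except the response
main_terms <- attr(terms(shares ~ ., data = toy), "term.labels")

# Main effects plus all pairwise interactions
pairwise_terms <- attr(terms(shares ~ .^2, data = toy), "term.labels")

# With p predictors there are p main effects and choose(p, 2) interactions
p <- length(main_terms)
expected_count <- p + choose(p, 2)
```

The number of terms grows quadratically with the number of predictors, so on a data set with dozens of columns this model is far larger than the main-effects model and is more prone to overfitting.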
Random Forest - Jordan
Random forest is a tree-based method for fitting predictive models that averages the predictions across many trees, each fit to a bootstrap sample of the data. One may choose a tree-based method because of its prediction accuracy, the fact that predictors do not need to be scaled, the lack of statistical assumptions, and the built-in variable selection process. Random forest, in particular, considers only a random subset of the predictors at each split. This addresses the issue in bagging where a strong predictor tends to be chosen for the first split in every bootstrapped tree, which makes the trees highly correlated.
# random forest model on training dataset with 5-fold cv
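The two sources of randomness in a random forest can be sketched with base R sampling. This is only a conceptual illustration under assumed toy settings (8 rows, 6 predictor names, `mtry` of 2); it is not how the "rf" method is implemented internally.

```r
set.seed(42)

# Hypothetical toy setup: 8 articles and 6 of the predictors named in this report
n_obs <- 8
predictors <- c("n_tokens_title", "n_tokens_content", "num_imgs",
                "num_videos", "num_keywords", "rate_positive_words")
mtry <- 2  # number of predictors considered at each split

# Step 1: each tree is grown on a bootstrap sample (rows drawn with replacement)
boot_rows <- sample(seq_len(n_obs), size = n_obs, replace = TRUE)

# Step 2: at each split, only a random subset of mtry predictors is eligible
split_1_candidates <- sample(predictors, size = mtry)
split_2_candidates <- sample(predictors, size = mtry)
```

Because a strong predictor is eligible at only some of the splits, different trees are pushed toward different structures, and averaging less correlated trees reduces variance more than plain bagging does.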
ranfor <- train(shares ~ ., data = Training, method = "rf", preProcess = c("center", "scale"),
trControl = trainControl(method = "cv", number = 5),
tuneGrid = expand.grid(mtry = c(1:round(ncol(Training)/3))))
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
ranfor
## Random Forest
##
## 4380 samples
## 58 predictor
##
## Pre-processing: centered (58), scaled (58)
## Resampling: Cross-Validated (5 fold)
## Summary of sample sizes: 3504, 3503, 3505, 3503, 3505
## Resampling results across tuning parameters:
##
## mtry RMSE Rsquared MAE
## 1 15157.85 0.05377877 2848.805
## 2 15355.86 0.04598578 2910.789
## 3 15509.29 0.03864467 2973.342
## 4 15581.39 0.05048981 3009.556
## 5 15846.00 0.04680512 3026.772
## 6 15834.85 0.04211956 3057.069
## 7 16038.76 0.04496525 3088.450
## 8 15927.51 0.05563984 3086.149
## 9 16203.03 0.04891168 3116.612
## 10 16283.49 0.05031580 3130.486
## 11 16399.59 0.03799132 3179.524
## 12 16377.32 0.04150967 3145.641
## 13 16441.95 0.04935393 3163.392
## 14 16694.82 0.03963834 3198.351
## 15 16523.92 0.05139832 3163.985
## 16 16718.52 0.04564188 3190.055
## 17 16748.47 0.05288098 3223.845
## 18 16865.25 0.04847716 3224.299
## 19 16896.86 0.04987847 3238.862
## 20 16896.14 0.05147638 3231.719
##
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was mtry = 1.
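The selection rule reported above (smallest cross-validated RMSE) can be reproduced by hand. The following is a minimal base R sketch that uses only the first five rows of the table; the names `results`, `best`, and `best_mtry` are illustrative and are not part of the original analysis. On a fitted caret model, the same information should be available through `ranfor$results` and `ranfor$bestTune`.

```r
# First five rows of the resampling table printed above
results <- data.frame(
  mtry     = 1:5,
  RMSE     = c(15157.85, 15355.86, 15509.29, 15581.39, 15846.00),
  Rsquared = c(0.05377877, 0.04598578, 0.03864467, 0.05048981, 0.04680512),
  MAE      = c(2848.805, 2910.789, 2973.342, 3009.556, 3026.772)
)

# Select the row with the smallest cross-validated RMSE
best <- results[which.min(results$RMSE), ]
best_mtry <- best$mtry

print(best)
```

The RMSE rises almost steadily as mtry grows, so the smallest value tried is the one selected.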
Boosted Tree - Jonathan
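# candidate tuning grid for the boosted tree
tune_grid <- expand.grid(
  n.trees = c(5, 10, 50, 100),
  interaction.depth = c(1, 2, 3, 4),
  shrinkage = 0.1,
  n.minobsinnode = 10
)
# fit the boosted tree with 5-fold cross-validation
# note: tune_grid is not passed to train() in this call, so caret uses its default grid
# (the output below runs to 150 trees); add tuneGrid = tune_grid to use the grid above
bt_fit <- train(
  shares ~ .,
  data = Training,
  method = "gbm",
  preProcess = c("center", "scale"),
  trControl = trainControl(method = "cv", number = 5)
)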
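To see what a grid like this contains, the sketch below rebuilds it in base R and counts the candidate parameter combinations. The commented-out call at the end shows how such a grid could be supplied through caret's `tuneGrid` argument. It assumes caret is loaded and that a `Training` data frame exists, as in the rest of this report, and the name `bt_fit_grid` is hypothetical.

```r
# Rebuild the tuning grid: every combination of the listed values
tune_grid <- expand.grid(
  n.trees = c(5, 10, 50, 100),
  interaction.depth = c(1, 2, 3, 4),
  shrinkage = 0.1,
  n.minobsinnode = 10
)

# 4 tree counts x 4 depths x 1 shrinkage x 1 node size = 16 candidates
n_candidates <- nrow(tune_grid)
print(n_candidates)
print(head(tune_grid))

# Hypothetical usage, assuming caret and Training are available:
# bt_fit_grid <- train(
#   shares ~ .,
#   data = Training,
#   method = "gbm",
#   preProcess = c("center", "scale"),
#   trControl = trainControl(method = "cv", number = 5),
#   tuneGrid = tune_grid,
#   verbose = FALSE  # passed on to gbm to suppress the per-iteration log
# )
```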
## Warning in preProcess.default(method = c("center", "scale"), x = structure(c(12, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## Warning in (function (x, y, offset = NULL, misc = NULL, distribution = "bernoulli", : variable 12:
## data_channel_is_lifestyle has no variation.
## Warning in (function (x, y, offset = NULL, misc = NULL, distribution = "bernoulli", : variable 13:
## data_channel_is_entertainment has no variation.
## Warning in (function (x, y, offset = NULL, misc = NULL, distribution = "bernoulli", : variable 14:
## data_channel_is_bus has no variation.
## Warning in (function (x, y, offset = NULL, misc = NULL, distribution = "bernoulli", : variable 15:
## data_channel_is_socmed has no variation.
## Warning in (function (x, y, offset = NULL, misc = NULL, distribution = "bernoulli", : variable 16:
## data_channel_is_tech has no variation.
## Warning in (function (x, y, offset = NULL, misc = NULL, distribution = "bernoulli", : variable 17:
## data_channel_is_world has no variation.
## Iter TrainDeviance ValidDeviance StepSize Improve
## 1 286310951.7625 nan 0.1000 -76117.7233
## 2 282962681.6287 nan 0.1000 507170.0352
## 3 282880632.9716 nan 0.1000 -57163.8105
## 4 280754127.8342 nan 0.1000 -306999.1694
## 5 278463444.3548 nan 0.1000 273690.2509
## 6 276150745.3540 nan 0.1000 -152201.8970
## 7 274039672.2304 nan 0.1000 -844518.1717
## 8 272389463.6044 nan 0.1000 -537390.8229
## 9 269451061.7775 nan 0.1000 -608717.8499
## 10 268424874.8383 nan 0.1000 -655569.0219
## 20 259466437.7866 nan 0.1000 -2129733.1245
## 40 248619914.5133 nan 0.1000 -3015123.9923
## 60 243262475.8892 nan 0.1000 -2137014.4795
## 80 239467184.8952 nan 0.1000 -1708195.5877
## 100 236662756.2163 nan 0.1000 120433.4881
## 120 231759778.1030 nan 0.1000 -3230448.2450
## 140 227236285.7942 nan 0.1000 452018.5219
## 150 224652077.5395 nan 0.1000 -1487306.1614
## Iter TrainDeviance ValidDeviance StepSize Improve
## 1 282827431.5079 nan 0.1000 766253.2601
## 2 273481266.7657 nan 0.1000 -345034.9433
## 3 269653116.2655 nan 0.1000 -1427208.7746
## 4 270220833.7521 nan 0.1000 -1646726.9749
## 5 267164396.5393 nan 0.1000 -2242074.6337
## 6 267678579.4169 nan 0.1000 -1809366.5355
## 7 265787722.8687 nan 0.1000 -400511.4227
## 8 266884141.5152 nan 0.1000 -2838379.7247
## 9 265261976.5598 nan 0.1000 -1300235.3791
## 10 265430053.4145 nan 0.1000 -699273.4364
## 20 251214942.3203 nan 0.1000 -200633.1778
## 40 229010072.1250 nan 0.1000 -1751412.5008
## 60 203389437.6725 nan 0.1000 -1020305.2945
## 80 183014574.5583 nan 0.1000 -1610033.1437
## 100 173073394.9241 nan 0.1000 -815807.0853
## 120 151162689.1521 nan 0.1000 -719433.6448
## 140 143182032.0413 nan 0.1000 -2964806.8562
## 150 137874355.2025 nan 0.1000 -2146517.1083
## Iter TrainDeviance ValidDeviance StepSize Improve
## 1 286107242.4372 nan 0.1000 293175.4791
## 2 280740763.1760 nan 0.1000 -312712.4557
## 3 275729651.3617 nan 0.1000 -251843.6152
## 4 276117837.0671 nan 0.1000 -1370876.3594
## 5 274399250.1942 nan 0.1000 -410666.5891
## 6 266637517.5449 nan 0.1000 -2219944.7322
## 7 264803566.7398 nan 0.1000 -736067.8808
## 8 262366122.0615 nan 0.1000 -1899514.4746
## 9 261839464.4311 nan 0.1000 -1978751.4083
## 10 261539249.6497 nan 0.1000 -1953518.9821
## 20 242752375.5035 nan 0.1000 -1818714.4675
## 40 209931307.4781 nan 0.1000 -607600.9489
## 60 199194653.2343 nan 0.1000 -1963858.2487
## 80 173561088.5659 nan 0.1000 -1186834.3085
## 100 156368778.8291 nan 0.1000 -1954755.5511
## 120 142213332.8463 nan 0.1000 -767200.1665
## 140 127563125.0776 nan 0.1000 -1429875.7704
## 150 123017367.5855 nan 0.1000 -298732.2090
## Iter TrainDeviance ValidDeviance StepSize Improve
## 1 373855220.4512 nan 0.1000 -171576.0547
## 2 373553251.2663 nan 0.1000 405526.5603
## 3 373206795.9682 nan 0.1000 -202549.7838
## 4 372653111.1019 nan 0.1000 -179332.2407
## 5 372486876.9969 nan 0.1000 -165702.2983
## 6 372245436.5515 nan 0.1000 -185699.8008
## 7 371980463.2002 nan 0.1000 -248090.5992
## 8 369404986.2741 nan 0.1000 507463.7840
## 9 367578052.7164 nan 0.1000 -511523.9389
## 10 365624849.3614 nan 0.1000 -281012.9788
## 20 354163252.4734 nan 0.1000 -1728553.7533
## 40 342489902.1573 nan 0.1000 -1614734.5376
## 60 337583613.2383 nan 0.1000 -1105096.5442
## 80 330762311.5265 nan 0.1000 -2803176.7545
## 100 324397324.6587 nan 0.1000 -1616146.3504
## 120 320130243.0733 nan 0.1000 -2336785.6749
## 140 315842049.4504 nan 0.1000 -2030493.6457
## 150 313746593.1310 nan 0.1000 -2424374.6768
## Iter TrainDeviance ValidDeviance StepSize Improve
## 1 375449557.8549 nan 0.1000 -205568.1854
## 2 372277961.4090 nan 0.1000 -238882.7419
## 3 363633895.0626 nan 0.1000 -1755134.7138
## 4 363128366.6772 nan 0.1000 -225890.1661
## 5 355256425.4360 nan 0.1000 -2002716.6431
## 6 352068716.8565 nan 0.1000 556595.9261
## 7 351085711.5435 nan 0.1000 -1361485.7732
## 8 351483857.9002 nan 0.1000 -1915921.9152
## 9 349035548.4479 nan 0.1000 -3956882.3380
## 10 348074080.4855 nan 0.1000 -2834344.4972
## 20 333697338.0210 nan 0.1000 -3333023.2033
## 40 305357013.1736 nan 0.1000 -507915.2546
## 60 275532746.0461 nan 0.1000 -2413767.0637
## 80 252740475.1621 nan 0.1000 -1575674.3280
## 100 238888916.5485 nan 0.1000 -1111816.9339
## 120 230873929.0000 nan 0.1000 -1187229.9098
## 140 221827683.4528 nan 0.1000 -817592.2280
## 150 217009613.0951 nan 0.1000 -837383.0430
## Iter TrainDeviance ValidDeviance StepSize Improve
## 1 373027859.9656 nan 0.1000 -412781.4645
## 2 370091723.5580 nan 0.1000 -392839.9197
## 3 361673491.7237 nan 0.1000 -2703434.2184
## 4 359011247.1929 nan 0.1000 -915956.9047
## 5 351656225.4386 nan 0.1000 -2366038.3466
## 6 347510626.2161 nan 0.1000 -629329.3360
## 7 340609623.7895 nan 0.1000 -2477621.1481
## 8 340740022.5723 nan 0.1000 -1974364.6116
## 9 334865253.1802 nan 0.1000 -2899867.8080
## 10 331591076.7855 nan 0.1000 416075.6419
## 20 308099963.8859 nan 0.1000 -1091178.4541
## 40 283436625.7619 nan 0.1000 -2369501.1502
## 60 266521035.7377 nan 0.1000 -1242036.7243
## 80 239376465.9446 nan 0.1000 -1275558.4886
## 100 222464609.0982 nan 0.1000 -1419601.5389
## 120 203042745.4347 nan 0.1000 -3117391.7745
## 140 183312549.2614 nan 0.1000 -1201381.0496
## 150 176827511.9344 nan 0.1000 -459731.1470
## Iter TrainDeviance ValidDeviance StepSize Improve
## 1 239538677.0782 nan 0.1000 -1770.8581
## 2 236642684.5218 nan 0.1000 -317179.3756
## 3 234498530.0893 nan 0.1000 -119907.1100
## 4 232486452.7371 nan 0.1000 -93093.4510
## 5 232255636.6523 nan 0.1000 -27778.2034
## 6 230648967.2272 nan 0.1000 -968593.0943
## 7 230200148.0669 nan 0.1000 5669.3839
## 8 229871250.6884 nan 0.1000 -94478.2215
## 9 228154504.8902 nan 0.1000 -909191.5889
## 10 227950948.5368 nan 0.1000 -91825.6882
## 20 220441802.9398 nan 0.1000 -1122808.5018
## 40 216804216.4422 nan 0.1000 -349749.5888
## 60 211389604.4330 nan 0.1000 -1051509.4675
## 80 207989146.3241 nan 0.1000 -760158.6987
## 100 206166267.0766 nan 0.1000 -91676.6283
## 120 203319015.4015 nan 0.1000 493233.3327
## 140 200954183.5520 nan 0.1000 -918191.5998
## 150 200777080.1784 nan 0.1000 -1021740.2691
## Iter TrainDeviance ValidDeviance StepSize Improve
## 1 239512474.7079 nan 0.1000 -76929.0648
## 2 238965202.6995 nan 0.1000 26279.0670
## 3 235873912.9471 nan 0.1000 421814.7061
## 4 235363845.2666 nan 0.1000 463575.4125
## 5 234837632.1140 nan 0.1000 347126.8398
## 6 232593001.4046 nan 0.1000 -73028.0375
## 7 231772743.3224 nan 0.1000 220936.4702
## 8 230088459.8286 nan 0.1000 -315559.4705
## 9 228267969.4086 nan 0.1000 -1217179.7463
## 10 226676962.3435 nan 0.1000 -440978.4683
## 20 219794123.4154 nan 0.1000 -321108.1953
## 40 204660748.4865 nan 0.1000 -281666.5669
## 60 194840104.7536 nan 0.1000 -733039.0167
## 80 188659508.5330 nan 0.1000 -1768387.9348
## 100 178173586.7072 nan 0.1000 -1112312.6364
## 120 171397202.4012 nan 0.1000 -904655.5031
## 140 164157814.8576 nan 0.1000 -1412414.9368
## 150 158638821.7582 nan 0.1000 -591593.6457
## Iter TrainDeviance ValidDeviance StepSize Improve
## 1 235965922.7284 nan 0.1000 482638.9292
## 2 233255133.8616 nan 0.1000 -222344.5470
## 3 230905039.1060 nan 0.1000 -528441.4051
## 4 228890191.0639 nan 0.1000 -276990.0971
## 5 226954212.8264 nan 0.1000 -863877.7788
## 6 226251788.9568 nan 0.1000 -160532.2220
## 7 224055870.1309 nan 0.1000 -88102.3355
## 8 221952237.9246 nan 0.1000 -372358.8852
## 9 220021901.6704 nan 0.1000 91755.9032
## 10 219401642.5801 nan 0.1000 -273587.7436
## 20 208005336.9939 nan 0.1000 -700905.2758
## 40 193803157.9561 nan 0.1000 -2060811.6936
## 60 183875563.2728 nan 0.1000 -478821.2857
## 80 174828010.2961 nan 0.1000 -1025189.2421
## 100 163703239.3589 nan 0.1000 -1311610.6115
## 120 154790933.1268 nan 0.1000 -467450.9410
## 140 148251185.7556 nan 0.1000 -1530743.1478
## 150 146598578.9371 nan 0.1000 -527067.2629
## Iter TrainDeviance ValidDeviance StepSize Improve
## 1 374935951.1375 nan 0.1000 -131918.0101
## 2 370953029.1862 nan 0.1000 -286936.7164
## 3 369152006.3610 nan 0.1000 -798406.1285
## 4 367918520.9180 nan 0.1000 -1283388.8805
## 5 367108230.4191 nan 0.1000 -1184225.6920
## 6 364600906.3288 nan 0.1000 -4709102.0790
## 7 365060236.9216 nan 0.1000 -1619409.8489
## 8 364989997.3752 nan 0.1000 -1822811.0926
## 9 364119120.7049 nan 0.1000 -187826.6318
## 10 362434139.0545 nan 0.1000 1044512.8960
## 20 353216883.0372 nan 0.1000 60005.7008
## 40 342227972.3415 nan 0.1000 -1247997.5119
## 60 333603443.4748 nan 0.1000 -1168893.5563
## 80 332199593.8909 nan 0.1000 -393735.6394
## 100 330281197.4015 nan 0.1000 -1701956.8625
## 120 323512677.0882 nan 0.1000 778967.4014
## 140 315900035.4582 nan 0.1000 -1160794.1080
## 150 314692836.0084 nan 0.1000 -1106566.8126
## Iter TrainDeviance ValidDeviance StepSize Improve
## 1 372944337.6918 nan 0.1000 -145704.6798
## 2 364069318.2645 nan 0.1000 -1398666.4622
## 3 362304662.4913 nan 0.1000 -351054.2697
## 4 360198493.0612 nan 0.1000 -555822.9433
## 5 360610803.4386 nan 0.1000 -1382564.5594
## 6 359352498.8863 nan 0.1000 -633063.8856
## 7 358430587.6158 nan 0.1000 -179908.9796
## 8 358866247.6044 nan 0.1000 -1318083.1581
## 9 356257887.5396 nan 0.1000 -210584.1432
## 10 355029094.4657 nan 0.1000 -704780.3541
## 20 328118033.0431 nan 0.1000 -1286201.4708
## 40 307086027.0498 nan 0.1000 -396512.1081
## 60 286295808.9451 nan 0.1000 -2027261.5021
## 80 277743376.2950 nan 0.1000 -129537.8088
## 100 257356188.9535 nan 0.1000 -1844969.0694
## 120 236533471.6071 nan 0.1000 -1407198.8378
## 140 218831798.8086 nan 0.1000 -858533.3338
## 150 211289087.6822 nan 0.1000 -785204.5122
## Iter TrainDeviance ValidDeviance StepSize Improve
## 1 365460465.6650 nan 0.1000 -888964.9657
## 2 362118918.9240 nan 0.1000 -224786.1582
## 3 360056342.2023 nan 0.1000 -410695.4743
## 4 357931854.7212 nan 0.1000 2140086.6094
## 5 356753977.5928 nan 0.1000 -276111.6826
## 6 348917973.8965 nan 0.1000 -2377176.3434
## 7 339074697.2067 nan 0.1000 -570186.1088
## 8 337275296.2190 nan 0.1000 -463977.3387
## 9 333936147.3690 nan 0.1000 -270519.6636
## 10 331340641.2970 nan 0.1000 -634697.8100
## 20 312169382.3129 nan 0.1000 -3006399.1388
## 40 283130148.2977 nan 0.1000 -2625340.5536
## 60 248211145.0891 nan 0.1000 -578139.7662
## 80 227669005.9706 nan 0.1000 -244080.3089
## 100 204960299.8546 nan 0.1000 -2115856.1981
## 120 189091122.4248 nan 0.1000 -143152.9284
## 140 173771435.7999 nan 0.1000 -588544.9085
## 150 167670735.0858 nan 0.1000 -1052901.5226
## Iter TrainDeviance ValidDeviance StepSize Improve
## 1 252874696.7749 nan 0.1000 -132916.6042
## 2 252421407.5789 nan 0.1000 100072.8549
## 3 250020009.8939 nan 0.1000 -248314.6035
## 4 249673451.7268 nan 0.1000 93137.9411
## 5 247878849.4004 nan 0.1000 -655368.8219
## 6 247656624.8289 nan 0.1000 -110478.4343
## 7 247342884.8166 nan 0.1000 -38841.0126
## 8 246957658.4065 nan 0.1000 -165805.7050
## 9 245848145.8046 nan 0.1000 -1484325.9418
## 10 245669114.6160 nan 0.1000 -147706.7993
## 20 243187433.0503 nan 0.1000 -714229.0504
## 40 240476573.3833 nan 0.1000 -2542883.0829
## 60 235301345.2308 nan 0.1000 577822.9812
## 80 230871028.9008 nan 0.1000 -265777.1594
## 100 228425345.2860 nan 0.1000 -650181.9573
## 120 224796419.6964 nan 0.1000 -1482761.7323
## 140 222169889.0526 nan 0.1000 -1205106.6434
## 150 221579347.0535 nan 0.1000 -1652192.5361
## Iter TrainDeviance ValidDeviance StepSize Improve
## 1 250766129.8195 nan 0.1000 -120199.8395
## 2 248564022.0184 nan 0.1000 -417168.1618
## 3 243726522.7168 nan 0.1000 -454007.0291
## 4 242111959.8054 nan 0.1000 -456687.4443
## 5 241616430.6599 nan 0.1000 -92812.1462
## 6 240127714.7402 nan 0.1000 -789456.9438
## 7 239195146.8048 nan 0.1000 -947982.5507
## 8 238331670.4518 nan 0.1000 -294815.1564
## 9 237390495.1842 nan 0.1000 -1410373.3971
## 10 236918455.8150 nan 0.1000 -1407122.1551
## 20 232141494.0057 nan 0.1000 -1685499.9376
## 40 226020611.6901 nan 0.1000 -852839.3267
## 60 221400379.1324 nan 0.1000 -3243524.6944
## 80 218338375.0996 nan 0.1000 -931033.9343
## 100 215017927.5096 nan 0.1000 -1340408.6237
## 120 210638399.4758 nan 0.1000 -1902631.1702
## 140 205435228.1908 nan 0.1000 -1114159.8907
## 150 202298046.8557 nan 0.1000 -373188.6182
## Iter TrainDeviance ValidDeviance StepSize Improve
## 1 250398422.8465 nan 0.1000 -109769.0007
## 2 247603320.6953 nan 0.1000 -550827.4364
## 3 244943286.8613 nan 0.1000 -513396.4589
## 4 243312141.6676 nan 0.1000 -1544460.1232
## 5 241442627.6791 nan 0.1000 -374519.6976
## 6 239491306.3047 nan 0.1000 -457967.7883
## 7 238483208.9356 nan 0.1000 -1313128.4605
## 8 238374384.4908 nan 0.1000 -1139888.4877
## 9 237318809.6531 nan 0.1000 -2185472.8297
## 10 237341209.9695 nan 0.1000 -1540618.4168
## 20 227807774.7721 nan 0.1000 -555707.1865
## 40 211607554.4276 nan 0.1000 966712.1584
## 60 202634474.5630 nan 0.1000 -1401536.8374
## 80 187386272.6326 nan 0.1000 -487414.4154
## 100 173461093.0258 nan 0.1000 -213058.8397
## 120 164506086.3756 nan 0.1000 -888958.5489
## 140 157309970.6938 nan 0.1000 -965581.5288
## 150 155225602.3479 nan 0.1000 -1017925.7803
## Iter TrainDeviance ValidDeviance StepSize Improve
## 1 304181418.9313 nan 0.1000 520330.6038
## 2 303799814.3213 nan 0.1000 -81539.2991
## 3 301791146.8319 nan 0.1000 335795.3849
## 4 299876821.0557 nan 0.1000 -181863.5384
## 5 299647182.6686 nan 0.1000 -114851.2762
## 6 297719493.2358 nan 0.1000 -391557.0138
## 7 296399329.5459 nan 0.1000 -712681.4733
## 8 296224174.4025 nan 0.1000 96152.2242
## 9 293939404.5748 nan 0.1000 -554432.9029
## 10 292427477.2420 nan 0.1000 -112599.9810
## 20 286240995.3229 nan 0.1000 -820306.1566
## 40 279688070.1419 nan 0.1000 -1707349.1322
## 50 275159167.0077 nan 0.1000 -1306506.7414
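The warnings in the training log above report that the six `data_channel_is_*` columns have zero variance, presumably because the data was subset to a single channel before fitting. Constant columns carry no information for the model, and centering and scaling them is not meaningful. The following is a minimal base R sketch of dropping such columns before training. The helper name `drop_constant_cols` and the toy data frame are hypothetical and not part of the original analysis; caret also offers `nearZeroVar()` and the `"zv"` method of `preProcess()` for the same purpose.

```r
# Hypothetical helper: keep only columns that take more than one distinct value
drop_constant_cols <- function(df) {
  keep <- vapply(df, function(col) length(unique(col)) > 1, logical(1))
  df[, keep, drop = FALSE]
}

# Toy data (made up for illustration): two informative columns, two constant ones
toy <- data.frame(
  shares                = c(100, 250, 900, 40),
  num_imgs              = c(1, 0, 3, 2),
  data_channel_is_tech  = c(1, 1, 1, 1),
  data_channel_is_world = c(0, 0, 0, 0)
)

cleaned <- drop_constant_cols(toy)
names(cleaned)
```

Applying a filter like this to the training data before fitting should remove the "has no variation" warnings, assuming the constant columns are their only cause.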
bt_fit
## Stochastic Gradient Boosting
##
## 4380 samples
## 58 predictor
##
## Pre-processing: centered (58), scaled (58)
## Resampling: Cross-Validated (5 fold)
## Summary of sample sizes: 3506, 3503, 3503, 3504, 3504
## Resampling results across tuning parameters:
##
##   interaction.depth  n.trees  RMSE      Rsquared     MAE
##   1                   50      15831.95  0.009215988  3053.250
##   1                  100      16089.59  0.010770786  3136.111
##   1                  150      16069.62  0.014203229  3170.613
##   2                   50      15978.42  0.011840449  3115.630
##   2                  100      16265.49  0.012146965  3212.661
##   2                  150      16492.86  0.010227445  3311.518
##   3                   50      16013.91  0.014552547  3187.900
##   3                  100      16410.04  0.013086433  3310.393
##   3                  150      16808.60  0.012678378  3385.395
##
## Tuning parameter 'shrinkage' was held constant at a value of 0.1
## Tuning parameter 'n.minobsinnode' was held
## constant at a value of 10
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were n.trees = 50, interaction.depth = 1, shrinkage = 0.1 and n.minobsinnode
## = 10.
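The selection rule stated in the output ("RMSE was used to select the optimal model using the smallest value") can be reproduced by hand. The sketch below re-enters the cross-validated RMSE values printed above into a plain data frame and picks the row with the smallest RMSE. The object names `cv_results` and `best_row` are illustrative only. On the fitted object itself, the same information should be available in `bt_fit$results` and `bt_fit$bestTune`.

```r
# Cross-validated RMSE values copied from the bt_fit output above
cv_results <- data.frame(
  interaction.depth = rep(1:3, each = 3),
  n.trees           = rep(c(50, 100, 150), times = 3),
  RMSE              = c(15831.95, 16089.59, 16069.62,
                        15978.42, 16265.49, 16492.86,
                        16013.91, 16410.04, 16808.60)
)

# Pick the tuning combination with the smallest RMSE
best_row <- cv_results[which.min(cv_results$RMSE), ]
best_row
```

The selected row matches the final values reported by caret (`n.trees = 50`, `interaction.depth = 1`).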
Comparison - Jordan
Finally, let’s compare our four models on the testing data: two linear models, one random forest model, and one boosted tree model. The model with the lowest test RMSE is reported as the best.
# random forest predictions on the testing data and their performance
predRF <- predict(ranfor, newdata = Testing)
RF <- postResample(predRF, Testing$shares)
# linear model 1 predictions on the testing data and their performance
predlm1 <- predict(fit1, newdata = Testing)
LM <- postResample(predlm1, Testing$shares)
# linear model 2 predictions on the testing data and their performance
predlm2 <- predict(lm_fit, newdata = Testing)
LM2 <- postResample(predlm2, Testing$shares)
# boosted tree predictions on the testing data and their performance
predbt <- predict(bt_fit, newdata = Testing)
BT <- postResample(predbt, Testing$shares)
# combine each of the performance stats for the models and add a column with the model names
dat <- data.frame(rbind(t(data.frame(LM)), t(data.frame(RF)), t(data.frame(LM2)), t(data.frame(BT))))
df <- as_tibble(rownames_to_column(dat, "models"))
# find the model with the lowest RMSE
best <- df %>% filter(RMSE == min(RMSE)) %>% select(models)
# print "The Best fitting model according to RMSE is [insert model name for lowest RMSE here]"
paste("The Best fitting model according to RMSE is", best$models, sep = " ")
## [1] "The Best fitting model according to RMSE is RF"
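For reference, the three statistics that `postResample()` returns for a numeric outcome (RMSE, Rsquared, and MAE) can be computed by hand in base R. The sketch below uses made-up observed and predicted values, not results from this analysis, and the names `toy_metrics`, `model_a`, and `model_b` are hypothetical. It then picks the model with the lowest RMSE, mirroring the comparison above.

```r
# Hypothetical helper mirroring the three postResample() statistics
toy_metrics <- function(pred, obs) {
  c(
    RMSE     = sqrt(mean((pred - obs)^2)),  # root mean squared error
    Rsquared = cor(pred, obs)^2,            # squared correlation of predictions and observations
    MAE      = mean(abs(pred - obs))        # mean absolute error
  )
}

# Made-up observed shares and predictions from two imaginary models
obs    <- c(1200, 800, 1500, 3000, 950)
pred_a <- c(1000, 900, 1700, 2500, 1100)
pred_b <- c(1150, 850, 1450, 2900, 1000)

# One row per model, one column per statistic
comparison <- rbind(
  model_a = toy_metrics(pred_a, obs),
  model_b = toy_metrics(pred_b, obs)
)

# Name of the model with the smallest RMSE
best_model <- rownames(comparison)[which.min(comparison[, "RMSE"])]
comparison
best_model
```

Because RMSE squares each error, a few articles with very large share counts can dominate it, so it may be worth looking at MAE alongside RMSE when ranking models.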
Automation - Jonathan
#rmarkdown::render(
# "Tanley-Wood-Project2.Rmd",
# output_format="github_document",
# output_dir="./Analysis",
# output_options = list(
# html_preview = FALSE
# )
#)