Tanley-Wood-Project2

Jordan Tanley and Jonathan Wood 2022-07-05

Introduction - Jonathan

Data

This analysis uses the Online News Popularity dataset, which contains a set of features describing articles published by Mashable.com over a two-year period.

The goal of this project is to predict the number of shares an article receives (how many times it was shared on social media), using the number of shares as a measure of how popular the article is.

Notable Variables

While there are 61 variables in the data set, we will not use all of them for this project. The notable variables are the following:

Methods

Multiple methods will be used for this project to predict the number of shares a new article can generate, including

Data - Jordan

The data are read in with a relative path, so be sure the file OnlineNewsPopularity.csv is saved in a folder named OnlineNewsPopularity inside your working directory.

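# packages: tidyverse (read_csv, dplyr, ggplot2), knitr (kable), caret (train)
library(tidyverse); library(knitr); library(caret)
# read in the data
news <- read_csv("OnlineNewsPopularity/OnlineNewsPopularity.csv")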
## Rows: 39644 Columns: 61
## ── Column specification ─────────────────────────────────────────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (1): url
## dbl (60): timedelta, n_tokens_title, n_tokens_content, n_unique_tokens, n_non_stop_words, n_non_stop_unique_token...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# sneak peek at the dataset
head(news)
# Creating a weekday variable (basically undoing the 7 dummy variables that came with the data) for EDA
news$weekday <- ifelse(news$weekday_is_friday == 1, "Friday",
                       ifelse(news$weekday_is_monday == 1, "Monday",
                              ifelse(news$weekday_is_tuesday == 1, "Tuesday",
                                     ifelse(news$weekday_is_wednesday == 1, "Wednesday",
                                            ifelse(news$weekday_is_thursday == 1, "Thursday",
                                                   ifelse(news$weekday_is_saturday == 1, "Saturday", 
                                                          "Sunday"))))))

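The nested ifelse() calls above assume that exactly one weekday_is_* indicator equals 1 for each article. A more compact base R alternative uses max.col() to look up which indicator column holds the 1. The sketch below runs on a small hypothetical data frame rather than the Mashable data, and the labels and cols vectors are illustrative names, not part of the original analysis:

```r
# Weekday labels, in the same order as the indicator columns built below
labels <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")
cols <- paste0("weekday_is_", tolower(labels))

# Hypothetical toy data: three articles, each with exactly one indicator set to 1
dummies <- as.data.frame(matrix(0, nrow = 3, ncol = 7, dimnames = list(NULL, cols)))
dummies[1, "weekday_is_friday"] <- 1
dummies[2, "weekday_is_monday"] <- 1
dummies[3, "weekday_is_sunday"] <- 1

# max.col() returns, for each row, the index of the column holding the row maximum
idx <- max.col(as.matrix(dummies[, cols]), ties.method = "first")
weekday <- labels[idx]
print(weekday)
```

Applied to the real data, this would look something like news$weekday <- labels[max.col(as.matrix(news[, cols]), ties.method = "first")]. One difference to be aware of: if a row had no indicator set, max.col() with ties.method = "first" would return "Monday", whereas the nested ifelse() falls through to "Sunday".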
Next, let’s subset the data to the data channel of interest. The channel is chosen with the report’s channel parameter (params$channel); in this version the parameter is "bus", so we look at articles in the Business data channel.

# Subset the data to one of the parameterized data channels and drop unnecessary variables
chan <- paste0("data_channel_is_", params$channel)

print(chan)
## [1] "data_channel_is_bus"
filtered_channel <- news %>% 
                as_tibble() %>% 
                filter(.data[[chan]] == 1) %>%   # keep only articles in the chosen channel
                select(-c(url, timedelta))       # drop url and timedelta, which are not used as predictors

# take a peek at the data
filtered_channel %>%
  select(ends_with(chan))

Summarizations - Both

For the numerical summaries, we can look at several aspects. Contingency tables show the frequencies of categorical variables: the first table below, for example, gives the number of articles published on each weekday, and the fifth gives the frequency of each article length (number of tokens in the content). Another useful set of statistics is the five-number summary: the minimum, first quartile (Q1), median, third quartile (Q3), and maximum of a variable. We also report the mean. Comparing the mean with the median helps assess skewness (a mean well above the median suggests right skew, and a mean well below it suggests left skew), and the quartiles help flag outliers: a common rule treats any value below Q1 - 1.5(Q3 - Q1) or above Q3 + 1.5(Q3 - Q1) as an outlier. Below, these summaries (plus the mean) are shown for shares, the number of words in the content, the number of words in the content for articles in the upper quartile of shares, the number of images, the number of videos, the rate of positive words, and the rate of negative words.
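To make the five-number summary and the 1.5 × IQR rule concrete, here is a small sketch on a hypothetical vector of share counts (made-up values, not taken from the data). It uses quantile() with R's default method, the same function used in the summarise() calls below:

```r
# Hypothetical toy vector standing in for a right-skewed count such as shares
x <- c(800, 950, 1100, 1300, 1400, 1600, 2000, 2500, 3000, 40000)

# Five-number summary (min, Q1, median, Q3, max) plus the mean
q <- quantile(x, probs = c(0, 0.25, 0.5, 0.75, 1))
avg <- mean(x)

# 1.5 * IQR fences: values below Q1 - 1.5*IQR or above Q3 + 1.5*IQR are flagged
iqr <- q[["75%"]] - q[["25%"]]
lower <- q[["25%"]] - 1.5 * iqr
upper <- q[["75%"]] + 1.5 * iqr
outliers <- x[x < lower | x > upper]

print(q)
cat("mean:", avg, " median:", q[["50%"]], " upper fence:", upper, "\n")
print(outliers)
```

Here the single value of 40000 pulls the mean (5465) far above the median (1500), the same right-skew pattern that the shares summary below shows, and it is the only point beyond the upper fence of 4212.5.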

# Contingency table of frequencies for days of the week, added caption for clarity
kable(table(filtered_channel$weekday), 
      col.names = c("Weekday", "Frequency"), 
      caption = "Contingency table of frequencies for days of the week")
Weekday Frequency
Friday 832
Monday 1153
Saturday 243
Sunday 343
Thursday 1234
Tuesday 1182
Wednesday 1271

Contingency table of frequencies for days of the week

# Numerical Summary of Shares, added caption for clarity
filtered_channel %>% summarise(Minimum = min(shares), 
                          Q1 = quantile(shares, prob = 0.25), 
                          Average = mean(shares), 
                          Median = median(shares), 
                          Q3 = quantile(shares, prob = 0.75), 
                          Maximum = max(shares)) %>% 
                kable(caption = "Numerical Summary of Shares")
Minimum Q1 Average Median Q3 Maximum
1 952.25 3063.019 1400 2500 690400

Numerical Summary of Shares

# Numerical Summary of Number of words in the content, added caption for clarity
filtered_channel %>% summarise(Minimum = min(n_tokens_content), 
                          Q1 = quantile(n_tokens_content, prob = 0.25), 
                          Average = mean(n_tokens_content), 
                          Median = median(n_tokens_content), 
                          Q3 = quantile(n_tokens_content, prob = 0.75), 
                          Maximum = max(n_tokens_content)) %>% 
                kable(caption = "Numerical Summary of Number of words in the content")
Minimum Q1 Average Median Q3 Maximum
0 244 539.8714 400 727 6336

Numerical Summary of Number of words in the content

# Numerical Summary of Number of words in the content for articles in the upper quartile (top 25%) of Shares, added caption for clarity
filtered_channel %>% filter(shares > quantile(shares, prob = 0.75)) %>%
                summarise(Minimum = min(n_tokens_content), 
                          Q1 = quantile(n_tokens_content, prob = 0.25), 
                          Average = mean(n_tokens_content), 
                          Median = median(n_tokens_content), 
                          Q3 = quantile(n_tokens_content, prob = 0.75), 
                          Maximum = max(n_tokens_content)) %>% 
                kable(caption = "Numerical Summary of Number of words in the content for the upper quantile of Shares")
Minimum Q1 Average Median Q3 Maximum
0 263.25 685.6176 567 948 6336

Numerical Summary of Number of words in the content for the upper quantile of Shares

kable(table(filtered_channel$n_tokens_content),
  col.names = c("Tokens", "Frequency"), 
  caption = "Contingency table of frequencies for number of tokens in the article content")
Tokens Frequency
0 23
47 1
50 1
61 1
67 1
72 1
73 1
74 1
76 2
78 1
80 1
81 1
83 2
84 2
85 1
86 1
87 2
88 1
89 5
90 1
91 4
92 3
93 1
94 3
95 4
96 4
97 4
98 2
99 2
100 5
101 9
102 2
103 8
104 7
105 4
106 5
107 1
108 2
109 7
110 9
111 5
112 6
113 12
114 5
115 6
116 9
117 9
118 14
119 8
120 5
121 4
122 3
123 6
124 6
125 5
126 8
127 8
128 12
129 5
130 7
131 3
132 8
133 7
134 7
135 5
136 8
137 11
138 9
139 8
140 5
141 8
142 11
143 7
144 9
145 6
146 10
147 7
148 6
149 8
150 10
151 10
152 5
153 8
154 11
155 15
156 13
157 4
158 11
159 12
160 7
161 8
162 8
163 9
164 8
165 11
166 18
167 8
168 17
169 9
170 13
171 9
172 13
173 14
174 8
175 9
176 12
177 12
178 14
179 10
180 14
181 14
182 15
183 14
184 8
185 15
186 11
187 11
188 6
189 11
190 16
191 11
192 16
193 10
194 11
195 7
196 11
197 16
198 14
199 10
200 13
201 9
202 7
203 9
204 20
205 16
206 11
207 14
208 17
209 15
210 6
211 11
212 17
213 11
214 11
215 15
216 18
217 11
218 9
219 17
220 17
221 13
222 15
223 22
224 18
225 16
226 9
227 14
228 10
229 11
230 15
231 11
232 18
233 11
234 13
235 14
236 10
237 11
238 8
239 10
240 14
241 14
242 15
243 12
244 17
245 14
246 11
247 13
248 11
249 13
250 8
251 11
252 10
253 9
254 9
255 14
256 20
257 12
258 12
259 9
260 18
261 9
262 19
263 17
264 10
265 11
266 11
267 13
268 15
269 14
270 6
271 11
272 12
273 13
274 12
275 17
276 9
277 15
278 10
279 14
280 5
281 12
282 14
283 11
284 7
285 6
286 17
287 10
288 11
289 11
290 11
291 16
292 13
293 13
294 18
295 13
296 11
297 11
298 11
299 15
300 8
301 18
302 13
303 13
304 12
305 7
306 9
307 14
308 10
309 12
310 7
311 13
312 11
313 11
314 10
315 14
316 10
317 10
318 13
319 10
320 9
321 8
322 10
323 8
324 11
325 8
326 14
327 18
328 8
329 4
330 7
331 3
332 11
333 9
334 11
335 14
336 7
337 9
338 9
339 6
340 6
341 10
342 7
343 10
344 8
345 6
346 5
347 10
348 9
349 7
350 12
351 7
352 8
353 5
354 4
355 10
356 7
357 11
358 4
359 13
360 8
361 9
362 8
363 4
364 13
365 5
366 7
367 14
368 9
369 8
370 4
371 2
372 8
373 14
374 9
375 9
376 6
377 11
378 8
379 6
380 9
381 9
382 4
383 8
384 11
385 8
386 12
387 9
388 11
389 13
390 2
391 6
392 7
393 8
394 8
395 6
396 5
397 8
398 4
399 7
400 7
401 9
402 8
403 9
404 5
405 10
406 6
407 9
408 5
409 6
410 4
411 2
412 5
413 8
414 6
415 10
416 9
417 7
418 7
419 4
420 6
421 6
422 8
423 8
424 3
425 8
426 8
427 6
428 6
429 6
430 7
431 5
432 8
433 4
434 7
435 3
436 5
437 9
438 2
439 6
440 4
441 12
442 7
443 2
444 7
445 4
446 7
447 6
448 3
449 3
450 4
451 4
452 4
453 2
454 6
455 8
456 2
457 9
458 2
459 3
460 4
461 3
462 5
463 7
464 6
465 5
466 9
467 7
468 6
469 4
470 3
471 9
472 5
473 8
474 7
475 4
476 8
477 10
478 4
479 8
480 3
481 7
482 4
483 3
484 4
485 9
486 6
487 7
488 7
489 7
490 4
491 8
492 8
493 6
494 5
495 5
496 6
497 5
498 7
499 3
500 7
501 6
502 8
503 6
504 1
505 3
506 7
507 6
508 5
509 9
510 2
511 12
512 3
513 2
514 3
515 3
516 2
517 7
518 5
519 2
520 7
521 4
522 7
523 4
524 8
525 3
526 5
527 7
528 4
529 3
530 5
531 3
532 4
533 5
534 4
535 1
536 5
537 9
538 5
539 5
540 7
541 6
543 3
544 8
545 7
546 5
547 5
548 7
549 3
550 3
551 4
552 5
553 6
554 4
555 7
556 9
557 5
558 5
559 1
560 5
561 4
562 1
563 4
564 4
565 6
566 2
567 2
568 6
569 1
570 6
571 4
572 2
573 4
574 4
575 5
576 7
577 6
578 9
579 8
580 4
581 4
582 6
583 1
584 4
585 2
586 6
587 2
588 7
589 3
590 5
591 3
592 10
593 3
594 4
595 8
596 5
597 2
598 4
599 5
600 4
601 2
602 4
603 5
604 7
605 7
606 3
607 5
608 4
609 3
610 4
611 5
612 7
613 5
614 3
615 3
616 3
617 2
618 6
619 1
620 7
621 2
622 5
623 5
624 3
625 6
626 3
627 4
628 4
629 6
630 4
631 5
632 8
633 5
634 6
635 5
636 4
637 3
638 3
639 4
640 5
641 3
642 4
643 6
644 5
645 9
646 4
647 5
648 2
649 1
650 4
651 6
652 2
653 3
654 2
655 3
657 5
658 3
659 8
660 5
661 5
662 4
663 6
664 7
665 4
666 5
667 7
668 5
669 1
670 5
671 6
672 6
673 3
674 3
675 3
676 1
677 3
678 3
679 6
680 5
681 2
682 1
683 4
684 1
685 2
686 3
687 3
688 1
689 1
690 3
691 1
692 2
693 2
694 3
695 3
696 5
697 3
698 3
699 3
700 8
701 2
702 4
703 4
704 3
705 5
706 6
707 5
708 8
709 5
710 3
711 4
712 5
713 3
714 4
715 1
717 3
718 4
719 7
720 4
721 6
722 2
723 2
724 1
725 1
726 2
727 5
728 6
729 3
730 6
731 5
732 5
733 6
734 1
736 3
737 1
738 3
739 6
741 6
742 1
743 4
744 2
745 5
746 4
747 2
748 1
749 4
750 2
751 5
752 2
753 5
754 1
755 3
756 1
757 3
758 4
759 2
760 3
761 6
762 5
763 1
764 4
766 6
767 5
768 4
769 3
770 1
771 4
773 1
774 2
775 2
776 2
777 9
778 2
779 3
780 7
781 5
782 5
783 5
785 5
786 1
787 4
788 5
789 1
790 3
791 7
792 5
793 1
794 2
795 4
796 3
797 2
798 4
799 4
800 1
801 3
802 4
803 2
804 6
805 3
806 4
808 1
809 3
810 3
811 4
812 2
813 1
814 5
815 3
817 5
818 1
819 2
820 3
821 3
822 5
823 2
824 5
825 1
826 7
827 3
828 4
829 4
830 2
831 3
832 4
833 4
834 3
835 5
836 4
837 2
838 1
839 3
840 3
841 1
842 3
843 3
844 4
846 1
847 4
848 4
849 3
850 6
851 4
852 3
853 2
854 5
855 1
856 2
858 3
860 2
861 2
863 3
865 1
866 3
867 3
868 2
869 2
870 4
871 2
872 1
873 4
874 1
875 3
876 5
877 2
878 3
879 6
880 4
881 1
882 5
883 2
884 2
885 3
886 3
887 2
888 4
889 3
890 3
891 2
892 4
893 6
894 1
895 3
896 4
897 4
898 2
899 3
900 6
901 3
902 3
903 2
904 4
905 2
906 3
907 2
908 5
909 4
910 1
911 5
912 1
913 3
914 4
915 2
916 2
917 1
918 6
919 4
920 3
921 1
922 4
924 1
925 3
926 4
927 5
928 5
929 4
930 5
931 4
932 4
933 4
934 6
936 4
937 5
938 4
939 4
940 4
942 2
944 6
945 4
946 3
947 3
948 4
949 1
950 4
951 7
952 7
953 2
954 2
955 3
956 3
957 3
958 1
959 2
960 3
961 3
962 4
963 3
964 2
965 3
966 2
967 1
968 4
969 2
970 2
971 1
972 3
973 2
974 7
975 1
976 7
977 2
979 5
980 4
981 1
982 4
983 2
984 1
985 1
986 2
987 1
988 2
989 6
990 4
991 2
992 1
993 1
995 2
996 3
997 1
998 3
999 4
1000 2
1001 3
1002 2
1003 1
1004 3
1005 4
1006 5
1007 2
1008 2
1009 4
1010 1
1011 5
1012 3
1013 1
1014 2
1015 5
1018 1
1019 3
1020 5
1021 1
1022 3
1023 3
1024 2
1025 2
1026 2
1027 7
1028 1
1029 1
1030 3
1031 2
1032 2
1033 1
1034 3
1035 2
1036 1
1037 2
1038 6
1039 3
1040 4
1041 3
1042 1
1043 1
1044 3
1045 3
1046 4
1047 3
1048 2
1049 2
1050 2
1051 1
1052 2
1053 2
1054 3
1055 2
1056 2
1057 4
1058 4
1059 1
1060 2
1061 3
1062 2
1063 3
1064 3
1066 1
1067 3
1068 1
1069 3
1070 2
1072 4
1073 1
1074 1
1075 2
1076 4
1077 2
1078 2
1079 5
1080 3
1081 2
1082 1
1083 2
1084 2
1085 3
1087 1
1088 2
1089 2
1090 1
1091 1
1093 4
1094 4
1095 2
1096 1
1097 4
1098 2
1099 1
1100 1
1101 1
1104 2
1105 3
1106 3
1107 1
1110 4
1111 1
1112 3
1113 2
1114 2
1117 1
1118 5
1119 1
1120 1
1121 1
1122 2
1123 2
1124 3
1125 1
1128 2
1134 1
1137 2
1138 1
1139 2
1140 1
1141 2
1142 1
1143 3
1144 2
1145 4
1146 4
1149 1
1150 1
1151 2
1152 1
1153 1
1156 1
1157 4
1158 3
1159 2
1160 1
1162 2
1163 1
1164 3
1166 4
1167 1
1168 2
1169 4
1170 1
1171 1
1173 2
1174 1
1175 1
1176 2
1177 2
1179 2
1181 1
1182 1
1183 3
1184 1
1185 2
1186 2
1187 1
1193 3
1195 1
1196 2
1197 1
1198 1
1199 1
1201 1
1203 2
1205 4
1206 2
1207 1
1209 3
1211 1
1213 4
1215 1
1216 1
1217 2
1218 3
1219 1
1221 1
1223 3
1225 1
1226 1
1227 1
1228 2
1230 1
1231 1
1234 1
1235 2
1237 3
1238 2
1241 2
1242 3
1243 1
1245 2
1249 3
1250 1
1255 2
1257 1
1259 1
1260 3
1262 1
1263 1
1269 1
1270 3
1271 2
1274 1
1277 2
1279 2
1280 1
1281 1
1282 1
1283 2
1284 1
1285 1
1289 2
1290 1
1291 1
1292 3
1293 1
1294 2
1295 3
1297 2
1299 1
1303 1
1307 1
1308 1
1310 1
1311 1
1312 1
1315 1
1316 1
1317 1
1318 2
1320 1
1321 1
1325 1
1328 1
1329 2
1331 1
1332 2
1338 1
1339 2
1343 1
1345 1
1346 2
1348 1
1353 1
1355 1
1356 1
1358 2
1359 2
1361 2
1363 1
1368 2
1369 1
1370 1
1372 1
1375 1
1379 1
1380 2
1381 2
1386 1
1388 1
1390 3
1391 1
1393 1
1394 1
1398 1
1399 1
1405 1
1408 1
1413 1
1415 1
1419 1
1423 1
1425 1
1426 3
1427 1
1438 1
1439 1
1442 2
1447 1
1449 1
1451 2
1454 1
1457 1
1461 1
1462 1
1465 1
1466 2
1468 1
1470 1
1473 1
1477 1
1478 1
1483 1
1484 1
1492 1
1493 2
1494 1
1499 1
1516 1
1518 1
1522 1
1525 1
1528 1
1529 1
1536 1
1541 1
1544 2
1549 1
1550 1
1551 1
1559 1
1560 1
1568 1
1569 1
1570 1
1571 1
1579 1
1580 1
1587 1
1588 1
1593 1
1600 1
1601 1
1607 2
1608 1
1611 1
1615 1
1617 1
1622 1
1641 1
1642 1
1643 1
1645 1
1648 2
1656 1
1661 1
1665 1
1666 1
1667 2
1668 1
1673 1
1675 1
1681 1
1682 1
1684 1
1687 1
1706 1
1722 2
1723 1
1727 1
1731 1
1735 1
1745 1
1751 1
1758 2
1761 1
1769 1
1770 1
1771 1
1777 2
1778 1
1785 1
1790 1
1794 2
1796 1
1800 1
1804 1
1806 1
1809 1
1817 1
1823 1
1829 1
1833 1
1836 1
1839 1
1853 1
1854 1
1855 1
1858 1
1859 1
1860 1
1881 1
1887 1
1898 1
1902 1
1906 1
1931 2
1945 2
1954 1
1981 1
1986 1
2001 1
2004 1
2008 1
2022 1
2026 1
2031 1
2032 1
2037 1
2076 2
2094 1
2097 1
2099 1
2100 1
2103 1
2119 1
2132 1
2134 1
2147 1
2159 1
2165 1
2171 1
2173 1
2184 1
2188 1
2197 1
2228 1
2238 1
2247 1
2248 1
2253 1
2280 1
2294 1
2334 1
2347 1
2369 1
2373 1
2387 1
2416 1
2419 1
2444 1
2453 1
2458 1
2475 1
2478 1
2492 1
2499 1
2525 1
2536 1
2560 1
2632 1
2642 1
2691 1
2711 1
2728 1
2732 1
2761 1
2772 1
2784 1
2791 1
2885 1
2910 1
2962 1
3023 1
3050 1
3074 1
3157 1
3222 1
3320 1
3351 1
3455 1
3560 1
3603 1
3650 1
3940 1
3974 1
4044 1
4115 1
4119 1
4452 1
4747 1
4894 1
6336 1

Contingency table of frequencies for number of tokens in the article content

# Summarizing the number of images in the article
filtered_channel %>% 
  summarise(Minimum = min(num_imgs), 
      Q1 = quantile(num_imgs, prob = 0.25), 
      Average = mean(num_imgs), 
      Median = median(num_imgs), 
      Q3 = quantile(num_imgs, prob = 0.75), 
      Maximum = max(num_imgs)) %>% 
  kable(caption = "Numerical summary of number of images in an article")
Minimum Q1 Average Median Q3 Maximum
0 1 1.808405 1 1 51

Numerical summary of number of images in an article

# Summarizing the number of videos in the article
filtered_channel %>% 
  summarise(Minimum = min(num_videos), 
      Q1 = quantile(num_videos, prob = 0.25), 
      Average = mean(num_videos), 
      Median = median(num_videos), 
      Q3 = quantile(num_videos, prob = 0.75), 
      Maximum = max(num_videos)) %>% 
  kable(caption = "Numerical summary of number of videos in an article")
Minimum Q1 Average Median Q3 Maximum
0 0 0.6364653 0 0 75

Numerical summary of number of videos in an article

# Summarizing the rate of positive words in the article
filtered_channel %>% 
  summarise(Minimum = min(rate_positive_words), 
      Q1 = quantile(rate_positive_words, prob = 0.25), 
      Average = mean(rate_positive_words), 
      Median = median(rate_positive_words), 
      Q3 = quantile(rate_positive_words, prob = 0.75), 
      Maximum = max(rate_positive_words)) %>% 
  kable(caption = "Numerical Summary of the rate of positive words in an article")
Minimum Q1 Average Median Q3 Maximum
0 0.6666667 0.7377051 0.75 0.8333333 1

Numerical Summary of the rate of positive words in an article

# Summarizing the rate of negative words in the article
filtered_channel %>% 
  summarise(Minimum = min(rate_negative_words), 
      Q1 = quantile(rate_negative_words, prob = 0.25), 
      Average = mean(rate_negative_words), 
      Median = median(rate_negative_words), 
      Q3 = quantile(rate_negative_words, prob = 0.75), 
      Maximum = max(rate_negative_words)) %>% 
  kable(caption = "Numerical Summary of the rate of negative words in an article")
Minimum Q1 Average Median Q3 Maximum
0 0.1666667 0.2583 0.25 0.3333333 1

Numerical Summary of the rate of negative words in an article

The graphical summaries show trends in the data, including skewness and outliers, more directly. The boxplot below gives a visual version of the five-number summary of shares for each weekday; outliers appear as individual points beyond the whiskers. Next come several scatterplots, each plotting shares against one numerical variable so we can look for an association between them; points that cluster along an upward or downward diagonal would suggest a linear relationship. There are five scatterplots, of shares against the number of words in the content, the number of words in the title, the rate of positive words, the rate of negative words, and the global sentiment polarity. A histogram of shares is also included to show the overall distribution of the response: a long tail to the right or left signals skewness, and multiple peaks signal a multimodal variable.
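Pearson correlation is one way to put a number on what the scatterplots show. The sketch below uses hypothetical toy vectors (not the article data) to show that points on a straight line give a correlation of 1, that a pattern with no linear trend gives a weak correlation, and that a single extreme value, like a viral article with a very large number of shares, can change the correlation substantially:

```r
# Hypothetical toy data
x <- 1:10
y <- 3 * x + 2                          # points on an upward diagonal line
z <- c(5, 1, 4, 2, 5, 1, 4, 2, 5, 1)    # no clear linear trend

r_line  <- cor(x, y)  # Pearson correlation: 1 for a perfect increasing line
r_noise <- cor(x, z)  # weak, about -0.17

# Replace one value with an extreme one, as a viral article might be
z_out <- z
z_out[10] <- 1000
r_pearson_out <- cor(x, z_out)  # about 0.52, driven almost entirely by one point

print(c(line = r_line, noise = r_noise, with_outlier = r_pearson_out))
```

Because a handful of articles have extremely large share counts (the maximum above is 690400), it is worth keeping this sensitivity in mind when reading the scatterplots of shares.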

# Boxplot of Shares for Each Weekday, colored gray with classic theme, added labels and title
ggplot(filtered_channel, aes(x = weekday, y = shares)) + 
          geom_boxplot(fill = "grey") + 
          labs(x = "Weekday", title = "Boxplot of Shares for Each Weekday", y = "Shares") + 
          theme_classic()

# Scatterplot of Number of words in the content vs Shares, colored gray with classic theme, added labels and title
ggplot(filtered_channel, aes(x = n_tokens_content, y = shares)) + 
          geom_point(color = "grey") +
          labs(x = "Number of words in the content", y = "Shares", 
               title = "Scatterplot of Number of words in the content vs Shares") +
          theme_classic()

# Scatterplot of Number of words in the title vs Shares, colored gray with classic theme, added labels and title
ggplot(filtered_channel, aes(x = n_tokens_title, y = shares)) + 
          geom_point(color = "grey") +
          labs(x = "Number of words in the title", y = "Shares", 
               title = "Scatterplot of Number of words in the title vs Shares") +
          theme_classic()

# Histogram of Shares (bins of width 2000), classic theme, added labels and title
ggplot(filtered_channel, aes(x = shares)) +
  geom_histogram(color = "grey", binwidth = 2000) +
  labs(x = "Shares", 
       title = "Histogram of number of shares") +
  theme_classic()

# Scatterplot of rate of positive words vs Shares, colored gray with classic theme, added labels and title
ggplot(filtered_channel, aes(x = rate_positive_words, y = shares)) +
  geom_point(color = "grey") +
  labs(x = "Rate of positive words in an article", y = "Shares", 
       title = "Scatterplot of rate of positive words in an article vs shares") +
  theme_classic()

# Scatterplot of rate of negative words vs Shares, colored gray with classic theme, added labels and title
ggplot(filtered_channel, aes(x = rate_negative_words, y = shares)) +
  geom_point(color = "grey") +
  labs(x = "Rate of negative words in an article", y = "Shares", 
       title = "Scatterplot of rate of negative words in an article vs shares") +
  theme_classic()

# Scatterplot of global sentiment polarity vs Shares, colored gray with classic theme, added labels and title
ggplot(filtered_channel, aes(x = global_sentiment_polarity, y = shares)) +
  geom_point(color = "grey") +
  labs(x = "Global sentiment polarity in an article", y = "Shares", 
       title = "Scatterplot of global sentiment polarity in an article vs shares") +
  theme_classic()

# drop the weekday variable created for EDA (will get in the way for our models if we don't drop it)
filtered_channel <- subset(filtered_channel, select = -c(weekday))

Modeling

Splitting the Data

First, let’s split the data into a training set and a test set, using 70% of the articles for training and 30% for testing.

set.seed(9876)
# Split the data into a training and test set (70/30 split)
# indices
train <- sample(1:nrow(filtered_channel), size = nrow(filtered_channel)*.70)
test <- setdiff(1:nrow(filtered_channel), train)

# training and testing subsets
Training <- filtered_channel[train, ]
Testing <- filtered_channel[test, ]

Linear Models

Linear regression models describe the relationship between a single response variable and several explanatory variables. A model can also include interaction terms and higher-order terms. The general form of a linear model is $Y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + E_i$, where $x_{ij}$ is the value of the $j$th predictor for observation $i$, $E_i$ is the error term, and the "$\cdots$" can include more predictors, interactions, and/or higher-order terms. Since our goal is to predict shares, we fit these models on the training set and later evaluate them on the held-out test set.
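As a small illustration of the general form above, the sketch below fits lm() to a hypothetical toy data frame in which y is an exact linear function of a and b, so the estimated coefficients recover the true values. It then uses model.matrix() to show what the formula shorthand y ~ .^2, which Linear Model #2 uses with shares, expands to: every main effect plus every pairwise interaction. With p predictors that adds p(p - 1)/2 interaction columns, so the model grows quickly as predictors are added.

```r
# Hypothetical toy data: y = 1 + 2a - 3b exactly (no error term); c is unrelated
toy <- data.frame(a = c(1, 2, 3, 4, 5, 6),
                  b = c(2, 1, 4, 3, 6, 5),
                  c = c(0, 1, 1, 0, 1, 0))
toy$y <- 1 + 2 * toy$a - 3 * toy$b

# Main-effects model, analogous to `shares ~ .`
fit_main <- lm(y ~ ., data = toy)
print(round(coef(fit_main), 6))  # expected: intercept 1, a 2, b -3, c approximately 0

# `.^2` expands to all main effects plus every pairwise interaction,
# analogous to `shares ~ .^2`
X <- model.matrix(y ~ .^2, data = toy)
print(colnames(X))
```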

Linear Model #1: - Jordan

# linear model on training dataset with 5-fold cv
fit1 <- train(shares ~ . , data = Training, method = "lm",
              preProcess = c("center", "scale"), 
              trControl = trainControl(method = "cv", number = 5))

Linear Model #2: - Jonathan

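# linear model with all main effects plus every pairwise interaction (.^2) on training dataset with 5-fold cv
lm_fit <- train(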
  shares ~ .^2,
  data=Training,
  method="lm",
  preProcess = c("center", "scale"), 
  trControl = trainControl(method = "cv", number = 5)
)

Random Forest - Jordan

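Random forest is a tree-based ensemble method: it grows many regression trees, each on a bootstrap sample of the training data, and averages their predictions. Tree-based methods are attractive because they often predict well, do not require the predictors to be scaled, make few statistical assumptions, and perform a form of variable selection as part of fitting. What distinguishes random forest from bagging is that at each split only a random subset of m predictors is considered (for regression, m = p/3 is a common default, where p is the number of predictors). In bagging, every predictor is available at every split, so a single strong predictor tends to be chosen for the first split of nearly every tree; this makes the trees highly correlated and limits the benefit of averaging. Restricting each split to a random subset decorrelates the trees. Below, m (mtry) is tuned over the values 1 through round(ncol(Training)/3) using 5-fold cross-validation.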
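To make the two sources of randomness concrete, here is a deliberately tiny random forest in base R: each tree is a single-split stump grown on a bootstrap sample, and each split considers only m = p/3 randomly chosen predictors. It is a toy sketch on simulated data (fit_stump and predict_stump are hypothetical helpers), not how the randomForest package behind caret's method = "rf" is implemented; real forests grow deep trees and redraw the m candidate predictors at every split, which for a one-split stump is the same as drawing them once per tree.

```r
set.seed(1)

# Simulated toy data: 6 numeric predictors, but only x1 affects the response
n <- 200
p <- 6
X <- as.data.frame(matrix(rnorm(n * p), nrow = n,
                          dimnames = list(NULL, paste0("x", 1:p))))
y <- 5 * (X$x1 > 0) + rnorm(n, sd = 0.5)

m <- max(1, floor(p / 3))  # predictors considered per split (p/3, as for regression)

# Hypothetical helper: among m randomly drawn predictors, find the single
# split "predictor <= threshold" that minimizes the total squared error
fit_stump <- function(X, y, m) {
  best <- list(sse = Inf)
  for (j in sample(names(X), m)) {
    for (threshold in unique(X[[j]])) {
      left <- X[[j]] <= threshold
      if (all(left) || !any(left)) next  # skip splits that leave one side empty
      sse <- sum((y[left] - mean(y[left]))^2) + sum((y[!left] - mean(y[!left]))^2)
      if (sse < best$sse) {
        best <- list(sse = sse, var = j, threshold = threshold,
                     left_mean = mean(y[left]), right_mean = mean(y[!left]))
      }
    }
  }
  best
}

# Hypothetical helper: predict the mean of the side of the split each row falls on
predict_stump <- function(stump, newX) {
  ifelse(newX[[stump$var]] <= stump$threshold, stump$left_mean, stump$right_mean)
}

# Grow B stumps, each on a bootstrap sample of the rows, then average predictions
B <- 50
forest <- lapply(seq_len(B), function(b) {
  idx <- sample(n, n, replace = TRUE)
  fit_stump(X[idx, , drop = FALSE], y[idx], m)
})
pred <- rowMeans(sapply(forest, predict_stump, newX = X))

# How many stumps split on each predictor
split_vars <- sapply(forest, `[[`, "var")
print(table(split_vars))
cat("training MSE:", mean((y - pred)^2), " variance of y:", var(y), "\n")
```

Only x1 carries signal here, yet it is used in only some of the stumps, because it is unavailable whenever it is not among the m candidates. Setting m <- p turns the sketch into bagged stumps, and then nearly every stump splits on x1, which is the tree-correlation problem described above.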

# random forest model on training dataset with 5-fold cv
ranfor <- train(shares ~ ., data = Training, method = "rf", preProcess = c("center", "scale"),
                trControl = trainControl(method = "cv", number = 5), 
                tuneGrid = expand.grid(mtry = c(1:round(ncol(Training)/3))))
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world

ranfor
## Random Forest 
## 
## 4380 samples
##   58 predictor
## 
## Pre-processing: centered (58), scaled (58) 
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 3504, 3503, 3505, 3503, 3505 
## Resampling results across tuning parameters:
## 
##   mtry  RMSE      Rsquared    MAE     
##    1    15157.85  0.05377877  2848.805
##    2    15355.86  0.04598578  2910.789
##    3    15509.29  0.03864467  2973.342
##    4    15581.39  0.05048981  3009.556
##    5    15846.00  0.04680512  3026.772
##    6    15834.85  0.04211956  3057.069
##    7    16038.76  0.04496525  3088.450
##    8    15927.51  0.05563984  3086.149
##    9    16203.03  0.04891168  3116.612
##   10    16283.49  0.05031580  3130.486
##   11    16399.59  0.03799132  3179.524
##   12    16377.32  0.04150967  3145.641
##   13    16441.95  0.04935393  3163.392
##   14    16694.82  0.03963834  3198.351
##   15    16523.92  0.05139832  3163.985
##   16    16718.52  0.04564188  3190.055
##   17    16748.47  0.05288098  3223.845
##   18    16865.25  0.04847716  3224.299
##   19    16896.86  0.04987847  3238.862
##   20    16896.14  0.05147638  3231.719
## 
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was mtry = 1.
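caret chooses the final `mtry` by keeping the row of the cross-validated results with the smallest RMSE. The sketch below applies that rule to the RMSE column of the table above, typed in by hand. On the fitted object, the same table is available as `ranfor$results` and the chosen value as `ranfor$bestTune`.

```r
# cross-validated RMSE for the random forest, copied from the table above
rf_results <- data.frame(
  mtry = 1:20,
  RMSE = c(15157.85, 15355.86, 15509.29, 15581.39, 15846.00,
           15834.85, 16038.76, 15927.51, 16203.03, 16283.49,
           16399.59, 16377.32, 16441.95, 16694.82, 16523.92,
           16718.52, 16748.47, 16865.25, 16896.86, 16896.14)
)

# caret's default selection rule ("best"): keep the row with the smallest RMSE
best_row <- rf_results[which.min(rf_results$RMSE), ]
best_row$mtry
# [1] 1
```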

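The zero-variance warnings printed during fitting name the six `data_channel_is_*` dummies. This presumably happens because the training data were filtered to a single channel, which makes each of those columns constant. One option is to drop constant columns before calling `train()`. The helper below, `drop_constant_cols`, is hypothetical and not part of caret. The toy data frame uses made-up values for illustration only. caret's `nearZeroVar()` offers a broader check that also flags near-constant columns.

```r
# hypothetical helper: keep only columns that take more than one distinct value
drop_constant_cols <- function(df) {
  is_varying <- vapply(df, function(col) length(unique(col)) > 1, logical(1))
  df[, is_varying, drop = FALSE]
}

# made-up toy data standing in for Training (assumption): one constant dummy
toy <- data.frame(
  shares = c(100, 200, 300, 400),
  n_tokens_title = c(12, 9, 9, 10),
  data_channel_is_bus = c(0, 0, 0, 0)
)

# the constant data_channel_is_bus column is removed
names(drop_constant_cols(toy))
```

Applied as `Training <- drop_constant_cols(Training)` before fitting, this should silence the warnings without changing the models. A constant predictor carries no information for either method.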
Boosted Tree - Jonathan
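The `tune_grid` built in the chunk below crosses four `n.trees` values with four `interaction.depth` values at a fixed `shrinkage` and `n.minobsinnode`. This gives 16 candidate settings, each evaluated in every cross-validation fold. caret searches a custom grid only when it is passed to `train()` through the `tuneGrid` argument. Otherwise, with the default `tuneLength = 3`, it falls back to its own gbm grid: `n.trees` of 50, 100, 150 and `interaction.depth` of 1 to 3. A small sketch comparing the two grids:

```r
# the grid from the chunk below: 4 x 4 x 1 x 1 candidate settings
tune_grid <- expand.grid(
  n.trees = c(5, 10, 50, 100),
  interaction.depth = c(1, 2, 3, 4),
  shrinkage = 0.1,
  n.minobsinnode = 10
)
nrow(tune_grid)
# [1] 16

# caret's default gbm grid for tuneLength = 3, shown for comparison
default_grid <- expand.grid(
  n.trees = c(50, 100, 150),
  interaction.depth = 1:3,
  shrinkage = 0.1,
  n.minobsinnode = 10
)
nrow(default_grid)
# [1] 9
```

After fitting with `tuneGrid = tune_grid`, `bt_fit$bestTune` reports which of the 16 rows had the smallest cross-validated RMSE. Passing `verbose = FALSE` to `train()` forwards it to gbm and suppresses the per-iteration deviance log.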

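# candidate boosted-tree settings: 4 tree counts x 4 depths = 16 combinations
tune_grid <- expand.grid(
  n.trees = c(5, 10, 50, 100),
  interaction.depth = c(1, 2, 3, 4),
  shrinkage = 0.1,
  n.minobsinnode = 10
)

# fit a gradient boosted tree with 5-fold cross-validation;
# tuneGrid is needed for caret to search tune_grid instead of its default grid
bt_fit <- train(
  shares ~ .,
  data = Training,
  method = "gbm",
  preProcess = c("center", "scale"),
  trControl = trainControl(method = "cv", number = 5),
  tuneGrid = tune_grid
)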
## Warning in preProcess.default(method = c("center", "scale"), x = structure(c(12, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world

## Warning in (function (x, y, offset = NULL, misc = NULL, distribution = "bernoulli", : variable 12:
## data_channel_is_lifestyle has no variation.

## Warning in (function (x, y, offset = NULL, misc = NULL, distribution = "bernoulli", : variable 13:
## data_channel_is_entertainment has no variation.

## Warning in (function (x, y, offset = NULL, misc = NULL, distribution = "bernoulli", : variable 14:
## data_channel_is_bus has no variation.

## Warning in (function (x, y, offset = NULL, misc = NULL, distribution = "bernoulli", : variable 15:
## data_channel_is_socmed has no variation.

## Warning in (function (x, y, offset = NULL, misc = NULL, distribution = "bernoulli", : variable 16:
## data_channel_is_tech has no variation.

## Warning in (function (x, y, offset = NULL, misc = NULL, distribution = "bernoulli", : variable 17:
## data_channel_is_world has no variation.

## Iter   TrainDeviance   ValidDeviance   StepSize   Improve
##      1 286310951.7625             nan     0.1000 -76117.7233
##      2 282962681.6287             nan     0.1000 507170.0352
##      3 282880632.9716             nan     0.1000 -57163.8105
##      4 280754127.8342             nan     0.1000 -306999.1694
##      5 278463444.3548             nan     0.1000 273690.2509
##      6 276150745.3540             nan     0.1000 -152201.8970
##      7 274039672.2304             nan     0.1000 -844518.1717
##      8 272389463.6044             nan     0.1000 -537390.8229
##      9 269451061.7775             nan     0.1000 -608717.8499
##     10 268424874.8383             nan     0.1000 -655569.0219
##     20 259466437.7866             nan     0.1000 -2129733.1245
##     40 248619914.5133             nan     0.1000 -3015123.9923
##     60 243262475.8892             nan     0.1000 -2137014.4795
##     80 239467184.8952             nan     0.1000 -1708195.5877
##    100 236662756.2163             nan     0.1000 120433.4881
##    120 231759778.1030             nan     0.1000 -3230448.2450
##    140 227236285.7942             nan     0.1000 452018.5219
##    150 224652077.5395             nan     0.1000 -1487306.1614

## Iter   TrainDeviance   ValidDeviance   StepSize   Improve
##      1 282827431.5079             nan     0.1000 766253.2601
##      2 273481266.7657             nan     0.1000 -345034.9433
##      3 269653116.2655             nan     0.1000 -1427208.7746
##      4 270220833.7521             nan     0.1000 -1646726.9749
##      5 267164396.5393             nan     0.1000 -2242074.6337
##      6 267678579.4169             nan     0.1000 -1809366.5355
##      7 265787722.8687             nan     0.1000 -400511.4227
##      8 266884141.5152             nan     0.1000 -2838379.7247
##      9 265261976.5598             nan     0.1000 -1300235.3791
##     10 265430053.4145             nan     0.1000 -699273.4364
##     20 251214942.3203             nan     0.1000 -200633.1778
##     40 229010072.1250             nan     0.1000 -1751412.5008
##     60 203389437.6725             nan     0.1000 -1020305.2945
##     80 183014574.5583             nan     0.1000 -1610033.1437
##    100 173073394.9241             nan     0.1000 -815807.0853
##    120 151162689.1521             nan     0.1000 -719433.6448
##    140 143182032.0413             nan     0.1000 -2964806.8562
##    150 137874355.2025             nan     0.1000 -2146517.1083

## Iter   TrainDeviance   ValidDeviance   StepSize   Improve
##      1 286107242.4372             nan     0.1000 293175.4791
##      2 280740763.1760             nan     0.1000 -312712.4557
##      3 275729651.3617             nan     0.1000 -251843.6152
##      4 276117837.0671             nan     0.1000 -1370876.3594
##      5 274399250.1942             nan     0.1000 -410666.5891
##      6 266637517.5449             nan     0.1000 -2219944.7322
##      7 264803566.7398             nan     0.1000 -736067.8808
##      8 262366122.0615             nan     0.1000 -1899514.4746
##      9 261839464.4311             nan     0.1000 -1978751.4083
##     10 261539249.6497             nan     0.1000 -1953518.9821
##     20 242752375.5035             nan     0.1000 -1818714.4675
##     40 209931307.4781             nan     0.1000 -607600.9489
##     60 199194653.2343             nan     0.1000 -1963858.2487
##     80 173561088.5659             nan     0.1000 -1186834.3085
##    100 156368778.8291             nan     0.1000 -1954755.5511
##    120 142213332.8463             nan     0.1000 -767200.1665
##    140 127563125.0776             nan     0.1000 -1429875.7704
##    150 123017367.5855             nan     0.1000 -298732.2090

## Iter   TrainDeviance   ValidDeviance   StepSize   Improve
##      1 373855220.4512             nan     0.1000 -171576.0547
##      2 373553251.2663             nan     0.1000 405526.5603
##      3 373206795.9682             nan     0.1000 -202549.7838
##      4 372653111.1019             nan     0.1000 -179332.2407
##      5 372486876.9969             nan     0.1000 -165702.2983
##      6 372245436.5515             nan     0.1000 -185699.8008
##      7 371980463.2002             nan     0.1000 -248090.5992
##      8 369404986.2741             nan     0.1000 507463.7840
##      9 367578052.7164             nan     0.1000 -511523.9389
##     10 365624849.3614             nan     0.1000 -281012.9788
##     20 354163252.4734             nan     0.1000 -1728553.7533
##     40 342489902.1573             nan     0.1000 -1614734.5376
##     60 337583613.2383             nan     0.1000 -1105096.5442
##     80 330762311.5265             nan     0.1000 -2803176.7545
##    100 324397324.6587             nan     0.1000 -1616146.3504
##    120 320130243.0733             nan     0.1000 -2336785.6749
##    140 315842049.4504             nan     0.1000 -2030493.6457
##    150 313746593.1310             nan     0.1000 -2424374.6768

## Iter   TrainDeviance   ValidDeviance   StepSize   Improve
##      1 375449557.8549             nan     0.1000 -205568.1854
##      2 372277961.4090             nan     0.1000 -238882.7419
##      3 363633895.0626             nan     0.1000 -1755134.7138
##      4 363128366.6772             nan     0.1000 -225890.1661
##      5 355256425.4360             nan     0.1000 -2002716.6431
##      6 352068716.8565             nan     0.1000 556595.9261
##      7 351085711.5435             nan     0.1000 -1361485.7732
##      8 351483857.9002             nan     0.1000 -1915921.9152
##      9 349035548.4479             nan     0.1000 -3956882.3380
##     10 348074080.4855             nan     0.1000 -2834344.4972
##     20 333697338.0210             nan     0.1000 -3333023.2033
##     40 305357013.1736             nan     0.1000 -507915.2546
##     60 275532746.0461             nan     0.1000 -2413767.0637
##     80 252740475.1621             nan     0.1000 -1575674.3280
##    100 238888916.5485             nan     0.1000 -1111816.9339
##    120 230873929.0000             nan     0.1000 -1187229.9098
##    140 221827683.4528             nan     0.1000 -817592.2280
##    150 217009613.0951             nan     0.1000 -837383.0430

## Iter   TrainDeviance   ValidDeviance   StepSize   Improve
##      1 373027859.9656             nan     0.1000 -412781.4645
##      2 370091723.5580             nan     0.1000 -392839.9197
##      3 361673491.7237             nan     0.1000 -2703434.2184
##      4 359011247.1929             nan     0.1000 -915956.9047
##      5 351656225.4386             nan     0.1000 -2366038.3466
##      6 347510626.2161             nan     0.1000 -629329.3360
##      7 340609623.7895             nan     0.1000 -2477621.1481
##      8 340740022.5723             nan     0.1000 -1974364.6116
##      9 334865253.1802             nan     0.1000 -2899867.8080
##     10 331591076.7855             nan     0.1000 416075.6419
##     20 308099963.8859             nan     0.1000 -1091178.4541
##     40 283436625.7619             nan     0.1000 -2369501.1502
##     60 266521035.7377             nan     0.1000 -1242036.7243
##     80 239376465.9446             nan     0.1000 -1275558.4886
##    100 222464609.0982             nan     0.1000 -1419601.5389
##    120 203042745.4347             nan     0.1000 -3117391.7745
##    140 183312549.2614             nan     0.1000 -1201381.0496
##    150 176827511.9344             nan     0.1000 -459731.1470

## Iter   TrainDeviance   ValidDeviance   StepSize   Improve
##      1 239538677.0782             nan     0.1000 -1770.8581
##      2 236642684.5218             nan     0.1000 -317179.3756
##      3 234498530.0893             nan     0.1000 -119907.1100
##      4 232486452.7371             nan     0.1000 -93093.4510
##      5 232255636.6523             nan     0.1000 -27778.2034
##      6 230648967.2272             nan     0.1000 -968593.0943
##      7 230200148.0669             nan     0.1000 5669.3839
##      8 229871250.6884             nan     0.1000 -94478.2215
##      9 228154504.8902             nan     0.1000 -909191.5889
##     10 227950948.5368             nan     0.1000 -91825.6882
##     20 220441802.9398             nan     0.1000 -1122808.5018
##     40 216804216.4422             nan     0.1000 -349749.5888
##     60 211389604.4330             nan     0.1000 -1051509.4675
##     80 207989146.3241             nan     0.1000 -760158.6987
##    100 206166267.0766             nan     0.1000 -91676.6283
##    120 203319015.4015             nan     0.1000 493233.3327
##    140 200954183.5520             nan     0.1000 -918191.5998
##    150 200777080.1784             nan     0.1000 -1021740.2691

## Iter   TrainDeviance   ValidDeviance   StepSize   Improve
##      1 239512474.7079             nan     0.1000 -76929.0648
##      2 238965202.6995             nan     0.1000 26279.0670
##      3 235873912.9471             nan     0.1000 421814.7061
##      4 235363845.2666             nan     0.1000 463575.4125
##      5 234837632.1140             nan     0.1000 347126.8398
##      6 232593001.4046             nan     0.1000 -73028.0375
##      7 231772743.3224             nan     0.1000 220936.4702
##      8 230088459.8286             nan     0.1000 -315559.4705
##      9 228267969.4086             nan     0.1000 -1217179.7463
##     10 226676962.3435             nan     0.1000 -440978.4683
##     20 219794123.4154             nan     0.1000 -321108.1953
##     40 204660748.4865             nan     0.1000 -281666.5669
##     60 194840104.7536             nan     0.1000 -733039.0167
##     80 188659508.5330             nan     0.1000 -1768387.9348
##    100 178173586.7072             nan     0.1000 -1112312.6364
##    120 171397202.4012             nan     0.1000 -904655.5031
##    140 164157814.8576             nan     0.1000 -1412414.9368
##    150 158638821.7582             nan     0.1000 -591593.6457

## Iter   TrainDeviance   ValidDeviance   StepSize   Improve
##      1 235965922.7284             nan     0.1000 482638.9292
##      2 233255133.8616             nan     0.1000 -222344.5470
##      3 230905039.1060             nan     0.1000 -528441.4051
##      4 228890191.0639             nan     0.1000 -276990.0971
##      5 226954212.8264             nan     0.1000 -863877.7788
##      6 226251788.9568             nan     0.1000 -160532.2220
##      7 224055870.1309             nan     0.1000 -88102.3355
##      8 221952237.9246             nan     0.1000 -372358.8852
##      9 220021901.6704             nan     0.1000 91755.9032
##     10 219401642.5801             nan     0.1000 -273587.7436
##     20 208005336.9939             nan     0.1000 -700905.2758
##     40 193803157.9561             nan     0.1000 -2060811.6936
##     60 183875563.2728             nan     0.1000 -478821.2857
##     80 174828010.2961             nan     0.1000 -1025189.2421
##    100 163703239.3589             nan     0.1000 -1311610.6115
##    120 154790933.1268             nan     0.1000 -467450.9410
##    140 148251185.7556             nan     0.1000 -1530743.1478
##    150 146598578.9371             nan     0.1000 -527067.2629

## Iter   TrainDeviance   ValidDeviance   StepSize   Improve
##      1 374935951.1375             nan     0.1000 -131918.0101
##      2 370953029.1862             nan     0.1000 -286936.7164
##      3 369152006.3610             nan     0.1000 -798406.1285
##      4 367918520.9180             nan     0.1000 -1283388.8805
##      5 367108230.4191             nan     0.1000 -1184225.6920
##      6 364600906.3288             nan     0.1000 -4709102.0790
##      7 365060236.9216             nan     0.1000 -1619409.8489
##      8 364989997.3752             nan     0.1000 -1822811.0926
##      9 364119120.7049             nan     0.1000 -187826.6318
##     10 362434139.0545             nan     0.1000 1044512.8960
##     20 353216883.0372             nan     0.1000 60005.7008
##     40 342227972.3415             nan     0.1000 -1247997.5119
##     60 333603443.4748             nan     0.1000 -1168893.5563
##     80 332199593.8909             nan     0.1000 -393735.6394
##    100 330281197.4015             nan     0.1000 -1701956.8625
##    120 323512677.0882             nan     0.1000 778967.4014
##    140 315900035.4582             nan     0.1000 -1160794.1080
##    150 314692836.0084             nan     0.1000 -1106566.8126

## Iter   TrainDeviance   ValidDeviance   StepSize   Improve
##      1 372944337.6918             nan     0.1000 -145704.6798
##      2 364069318.2645             nan     0.1000 -1398666.4622
##      3 362304662.4913             nan     0.1000 -351054.2697
##      4 360198493.0612             nan     0.1000 -555822.9433
##      5 360610803.4386             nan     0.1000 -1382564.5594
##      6 359352498.8863             nan     0.1000 -633063.8856
##      7 358430587.6158             nan     0.1000 -179908.9796
##      8 358866247.6044             nan     0.1000 -1318083.1581
##      9 356257887.5396             nan     0.1000 -210584.1432
##     10 355029094.4657             nan     0.1000 -704780.3541
##     20 328118033.0431             nan     0.1000 -1286201.4708
##     40 307086027.0498             nan     0.1000 -396512.1081
##     60 286295808.9451             nan     0.1000 -2027261.5021
##     80 277743376.2950             nan     0.1000 -129537.8088
##    100 257356188.9535             nan     0.1000 -1844969.0694
##    120 236533471.6071             nan     0.1000 -1407198.8378
##    140 218831798.8086             nan     0.1000 -858533.3338
##    150 211289087.6822             nan     0.1000 -785204.5122

## Iter   TrainDeviance   ValidDeviance   StepSize   Improve
##      1 365460465.6650             nan     0.1000 -888964.9657
##      2 362118918.9240             nan     0.1000 -224786.1582
##      3 360056342.2023             nan     0.1000 -410695.4743
##      4 357931854.7212             nan     0.1000 2140086.6094
##      5 356753977.5928             nan     0.1000 -276111.6826
##      6 348917973.8965             nan     0.1000 -2377176.3434
##      7 339074697.2067             nan     0.1000 -570186.1088
##      8 337275296.2190             nan     0.1000 -463977.3387
##      9 333936147.3690             nan     0.1000 -270519.6636
##     10 331340641.2970             nan     0.1000 -634697.8100
##     20 312169382.3129             nan     0.1000 -3006399.1388
##     40 283130148.2977             nan     0.1000 -2625340.5536
##     60 248211145.0891             nan     0.1000 -578139.7662
##     80 227669005.9706             nan     0.1000 -244080.3089
##    100 204960299.8546             nan     0.1000 -2115856.1981
##    120 189091122.4248             nan     0.1000 -143152.9284
##    140 173771435.7999             nan     0.1000 -588544.9085
##    150 167670735.0858             nan     0.1000 -1052901.5226

## Iter   TrainDeviance   ValidDeviance   StepSize   Improve
##      1 252874696.7749             nan     0.1000 -132916.6042
##      2 252421407.5789             nan     0.1000 100072.8549
##      3 250020009.8939             nan     0.1000 -248314.6035
##      4 249673451.7268             nan     0.1000 93137.9411
##      5 247878849.4004             nan     0.1000 -655368.8219
##      6 247656624.8289             nan     0.1000 -110478.4343
##      7 247342884.8166             nan     0.1000 -38841.0126
##      8 246957658.4065             nan     0.1000 -165805.7050
##      9 245848145.8046             nan     0.1000 -1484325.9418
##     10 245669114.6160             nan     0.1000 -147706.7993
##     20 243187433.0503             nan     0.1000 -714229.0504
##     40 240476573.3833             nan     0.1000 -2542883.0829
##     60 235301345.2308             nan     0.1000 577822.9812
##     80 230871028.9008             nan     0.1000 -265777.1594
##    100 228425345.2860             nan     0.1000 -650181.9573
##    120 224796419.6964             nan     0.1000 -1482761.7323
##    140 222169889.0526             nan     0.1000 -1205106.6434
##    150 221579347.0535             nan     0.1000 -1652192.5361

## Iter   TrainDeviance   ValidDeviance   StepSize   Improve
##      1 250766129.8195             nan     0.1000 -120199.8395
##      2 248564022.0184             nan     0.1000 -417168.1618
##      3 243726522.7168             nan     0.1000 -454007.0291
##      4 242111959.8054             nan     0.1000 -456687.4443
##      5 241616430.6599             nan     0.1000 -92812.1462
##      6 240127714.7402             nan     0.1000 -789456.9438
##      7 239195146.8048             nan     0.1000 -947982.5507
##      8 238331670.4518             nan     0.1000 -294815.1564
##      9 237390495.1842             nan     0.1000 -1410373.3971
##     10 236918455.8150             nan     0.1000 -1407122.1551
##     20 232141494.0057             nan     0.1000 -1685499.9376
##     40 226020611.6901             nan     0.1000 -852839.3267
##     60 221400379.1324             nan     0.1000 -3243524.6944
##     80 218338375.0996             nan     0.1000 -931033.9343
##    100 215017927.5096             nan     0.1000 -1340408.6237
##    120 210638399.4758             nan     0.1000 -1902631.1702
##    140 205435228.1908             nan     0.1000 -1114159.8907
##    150 202298046.8557             nan     0.1000 -373188.6182

## Iter   TrainDeviance   ValidDeviance   StepSize   Improve
##      1 250398422.8465             nan     0.1000 -109769.0007
##      2 247603320.6953             nan     0.1000 -550827.4364
##      3 244943286.8613             nan     0.1000 -513396.4589
##      4 243312141.6676             nan     0.1000 -1544460.1232
##      5 241442627.6791             nan     0.1000 -374519.6976
##      6 239491306.3047             nan     0.1000 -457967.7883
##      7 238483208.9356             nan     0.1000 -1313128.4605
##      8 238374384.4908             nan     0.1000 -1139888.4877
##      9 237318809.6531             nan     0.1000 -2185472.8297
##     10 237341209.9695             nan     0.1000 -1540618.4168
##     20 227807774.7721             nan     0.1000 -555707.1865
##     40 211607554.4276             nan     0.1000 966712.1584
##     60 202634474.5630             nan     0.1000 -1401536.8374
##     80 187386272.6326             nan     0.1000 -487414.4154
##    100 173461093.0258             nan     0.1000 -213058.8397
##    120 164506086.3756             nan     0.1000 -888958.5489
##    140 157309970.6938             nan     0.1000 -965581.5288
##    150 155225602.3479             nan     0.1000 -1017925.7803

## Iter   TrainDeviance   ValidDeviance   StepSize   Improve
##      1 304181418.9313             nan     0.1000 520330.6038
##      2 303799814.3213             nan     0.1000 -81539.2991
##      3 301791146.8319             nan     0.1000 335795.3849
##      4 299876821.0557             nan     0.1000 -181863.5384
##      5 299647182.6686             nan     0.1000 -114851.2762
##      6 297719493.2358             nan     0.1000 -391557.0138
##      7 296399329.5459             nan     0.1000 -712681.4733
##      8 296224174.4025             nan     0.1000 96152.2242
##      9 293939404.5748             nan     0.1000 -554432.9029
##     10 292427477.2420             nan     0.1000 -112599.9810
##     20 286240995.3229             nan     0.1000 -820306.1566
##     40 279688070.1419             nan     0.1000 -1707349.1322
##     50 275159167.0077             nan     0.1000 -1306506.7414
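The warnings printed during training say that the six data_channel_is_* dummies have zero variance in the training data, so they carry no information for the model. A minimal base-R sketch of how such columns could be detected and dropped before fitting is below; the helper zero_variance_cols() and the toy data frame are hypothetical. Within caret, adding "zv" to the preProcess argument (for example preProcess = c("zv", "center", "scale")) should have a similar effect.

```r
# Hypothetical helper: names of columns that take at most one distinct non-missing value
zero_variance_cols <- function(df) {
  is_constant <- vapply(df, function(x) length(unique(x[!is.na(x)])) <= 1L, logical(1))
  names(df)[is_constant]
}

# Made-up data mimicking a single-channel subset of the news data
toy <- data.frame(
  data_channel_is_lifestyle = c(1, 1, 1, 1),
  data_channel_is_tech      = c(0, 0, 0, 0),
  n_tokens_title            = c(9, 12, 10, 8),
  shares                    = c(1200, 540, 3100, 870)
)

constant <- zero_variance_cols(toy)
print(constant)
# [1] "data_channel_is_lifestyle" "data_channel_is_tech"

# Drop the constant columns before fitting
toy_clean <- toy[, setdiff(names(toy), constant), drop = FALSE]
print(names(toy_clean))
```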
bt_fit
## Stochastic Gradient Boosting 
## 
## 4380 samples
##   58 predictor
## 
## Pre-processing: centered (58), scaled (58) 
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 3506, 3503, 3503, 3504, 3504 
## Resampling results across tuning parameters:
## 
##   interaction.depth  n.trees  RMSE      Rsquared     MAE     
##   1                   50      15831.95  0.009215988  3053.250
##   1                  100      16089.59  0.010770786  3136.111
##   1                  150      16069.62  0.014203229  3170.613
##   2                   50      15978.42  0.011840449  3115.630
##   2                  100      16265.49  0.012146965  3212.661
##   2                  150      16492.86  0.010227445  3311.518
##   3                   50      16013.91  0.014552547  3187.900
##   3                  100      16410.04  0.013086433  3310.393
##   3                  150      16808.60  0.012678378  3385.395
## 
## Tuning parameter 'shrinkage' was held constant at a value of 0.1
## Tuning parameter 'n.minobsinnode' was held
##  constant at a value of 10
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were n.trees = 50, interaction.depth = 1, shrinkage = 0.1 and n.minobsinnode
##  = 10.
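The resampling table above is a 3 x 3 grid over interaction.depth and n.trees, with shrinkage and n.minobsinnode held fixed. The base-R sketch below rebuilds that grid with expand.grid(), attaches the cross-validated RMSE values copied from the table, and picks the row with the smallest RMSE, which is the selection rule caret reports using. We believe a data frame like grid (without the RMSE column) could be passed to caret::train() through its tuneGrid argument to fix the grid explicitly; that usage is an assumption, not what the original code shows.

```r
# Rebuild the tuning grid shown in the bt_fit output.
# expand.grid varies its first argument fastest, so listing n.trees first
# reproduces the row order of caret's resampling table.
grid <- expand.grid(
  n.trees           = c(50, 100, 150),
  interaction.depth = 1:3,
  shrinkage         = 0.1,
  n.minobsinnode    = 10
)

# Cross-validated RMSE values copied from the table, in the same row order
grid$RMSE <- c(15831.95, 16089.59, 16069.62,   # interaction.depth = 1
               15978.42, 16265.49, 16492.86,   # interaction.depth = 2
               16013.91, 16410.04, 16808.60)   # interaction.depth = 3

# Keep the row with the smallest RMSE: n.trees = 50, interaction.depth = 1
best <- grid[which.min(grid$RMSE), ]
print(best)
```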

Comparison - Jordan

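Finally, let’s compare our four models (two linear models, one random forest, and one boosted tree) on the test set. For each model we predict shares for the Testing data and use postResample() to compute RMSE, R-squared, and MAE; the model with the lowest test RMSE is declared the best fit.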

# random forest: predict on the test set and compute RMSE, Rsquared, and MAE
predRF <- predict(ranfor, newdata = Testing)
RF <- postResample(predRF, Testing$shares)

# linear model 1: predict on the test set and compute its performance
predlm1 <- predict(fit1, newdata = Testing)
LM <- postResample(predlm1, Testing$shares)

# linear model 2: predict on the test set and compute its performance
predlm2 <- predict(lm_fit, newdata = Testing)
LM2 <- postResample(predlm2, Testing$shares)

# boosted tree: predict on the test set and compute its performance
predbt <- predict(bt_fit, newdata = Testing)
BT <- postResample(predbt, Testing$shares)

# stack the performance stats (one row per model, row names = model names)
# and move the row names into a "models" column
dat <- as.data.frame(rbind(LM = LM, RF = RF, LM2 = LM2, BT = BT))
df <- as_tibble(rownames_to_column(dat, "models"))

# find the model with the lowest RMSE
best <- df %>% filter(RMSE == min(RMSE)) %>% select(models)

# print "The Best fitting model according to RMSE is [name of the model with the lowest RMSE]"
paste("The Best fitting model according to RMSE is", best$models)
## [1] "The Best fitting model according to RMSE is RF"
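For a numeric outcome, postResample() returns three numbers. As far as we understand caret's defaults, they are the root mean squared error, the squared Pearson correlation between predictions and observations (not the 1 - SSE/SST form of R-squared), and the mean absolute error. The base-R sketch below uses a hypothetical helper, regression_metrics(), and made-up values to show those formulas and the same lowest-RMSE selection used above.

```r
# Hypothetical helper mirroring the three metrics postResample() reports for numeric outcomes
regression_metrics <- function(pred, obs) {
  c(
    RMSE     = sqrt(mean((pred - obs)^2)),  # root mean squared error
    Rsquared = cor(pred, obs)^2,            # squared correlation of predictions and observations
    MAE      = mean(abs(pred - obs))        # mean absolute error
  )
}

# Made-up share counts and predictions from two imaginary models
obs     <- c(1000, 2000, 3000, 4000)
model_a <- c(1000, 2000, 3000, 5000)
model_b <- c(2000, 3000, 2000, 3000)

# One row per model, as in the comparison table above
results <- rbind(A = regression_metrics(model_a, obs),
                 B = regression_metrics(model_b, obs))
print(results)

# Same selection rule: the row with the smallest RMSE (model A, RMSE 500 vs 1000)
print(rownames(results)[which.min(results[, "RMSE"])])
```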

Automation - Jonathan

#rmarkdown::render(
#  "Tanley-Wood-Project2.Rmd",
#  output_format="github_document",
#  output_dir="./Analysis",
#  output_options = list(
#    html_preview = FALSE
#  )
#)
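The commented-out call above renders a single report. If the goal is one report per data channel (the six data_channel_is_* columns named in the warnings above), the bookkeeping could look like the sketch below. It assumes, hypothetically, that the .Rmd YAML header declares a channel parameter and filters the data with it; the output file names and the render_all() wrapper are illustrative. rmarkdown::render() does accept output_file and params arguments.

```r
# Channel names taken from the data_channel_is_* columns in the data
channels <- c("lifestyle", "entertainment", "bus", "socmed", "tech", "world")

# One output file name and one params list per channel (names are illustrative)
output_files <- paste0(channels, "Analysis.md")
params_list  <- lapply(channels, function(ch) list(channel = ch))

# Hypothetical wrapper: render one report per channel.
# Assumes the Rmd declares a `channel` entry under `params:` in its YAML header.
render_all <- function(input = "Tanley-Wood-Project2.Rmd") {
  for (i in seq_along(channels)) {
    rmarkdown::render(
      input,
      output_format  = "github_document",
      output_file    = output_files[i],
      output_dir     = "./Analysis",
      output_options = list(html_preview = FALSE),
      params         = params_list[[i]]
    )
  }
}

# render_all()  # run from the console, not from inside the Rmd being knitted
```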