Tanley-Wood-Project2

Jordan Tanley and Jonathan Wood 2022-07-05

Introduction - Jonathan

Data

The data in this analysis is the Online News Popularity dataset, which contains a set of features describing articles published on Mashable.com over a two-year period.

The goal of this project is to predict the number of shares (how many times an article was shared over social media) an article receives. We will use this information to predict whether an article is likely to be popular, as measured by its number of shares.

Notable Variables

While there are 61 variables in the data set, we will not use all of them for this project. The notable variables are the following:

Methods

Multiple methods will be used for this project to predict the number of shares a new article can generate, including

Data - Jordan

In order to read in the data using a relative path, be sure to have the data file saved in your working directory.

# read in the data
news <- read_csv("OnlineNewsPopularity/OnlineNewsPopularity.csv")
## Rows: 39644 Columns: 61
## ── Column specification ─────────────────────────────────────────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (1): url
## dbl (60): timedelta, n_tokens_title, n_tokens_content, n_unique_tokens, n_non_stop_words, n_non_stop_unique_token...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# sneak peek at the dataset
head(news)
# Creating a weekday variable (basically undoing the 7 dummy variables that came with the data) for EDA
news$weekday <- ifelse(news$weekday_is_friday == 1, "Friday",
                       ifelse(news$weekday_is_monday == 1, "Monday",
                              ifelse(news$weekday_is_tuesday == 1, "Tuesday",
                                     ifelse(news$weekday_is_wednesday == 1, "Wednesday",
                                            ifelse(news$weekday_is_thursday == 1, "Thursday",
                                                   ifelse(news$weekday_is_saturday == 1, "Saturday", 
                                                          "Sunday"))))))

Next, let’s subset the data so that we look only at the data channel of interest. The channel is set by the report parameter params$channel; in this run it is the “World” data channel.

# Subset the data to one of the parameterized data channels and drop unnecessary variables
chan <- paste0("data_channel_is_", params$channel)

print(chan)
## [1] "data_channel_is_world"
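# keep only the rows where the chosen channel indicator equals 1;
# .data[[chan]] looks the column up by name inside the piped data
filtered_channel <- news %>% 
                as_tibble() %>% 
                filter(.data[[chan]] == 1) %>% 
                select(-c(url, timedelta))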

# take a peek at the data
filtered_channel %>%
  select(ends_with(chan))

Summarizations - Both (at least 3 plots each)

For the numerical summaries, we can look at several aspects. Contingency tables allow us to examine the frequencies of categorical variables. The first output below, for example, shows the counts for each weekday. Similarly, the fifth table shows the frequencies of the number of tokens in the article content. Another set of summary statistics to look at is the five-number summary, which provides the minimum, 1st quartile, median, 3rd quartile, and maximum for a particular variable. It may also be helpful to look at the average. Together these help in assessing skewness (a mean close to the median suggests symmetry, while a mean well above or below the median suggests right or left skew) and in looking for outliers (a value more than 1.5 × (Q3 - Q1) below Q1 or above Q3 is generally considered an outlier). Below, the five-number summaries (plus mean) are shown for shares, number of words in the content, number of words in the content for the upper quartile of shares, number of images in the article, number of videos in the article, positive word rate, and negative word rate.
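As a small illustration of the outlier rule and the mean-versus-median check described above, here is a minimal base R sketch. The vector `shares_demo` is invented for illustration and is not taken from the dataset; the variable names are assumptions as well.

```r
# Hypothetical share counts, invented for illustration (not from the dataset)
shares_demo <- c(35, 400, 827, 900, 1100, 1300, 1900, 2500, 284700)

# Five-number summary plus the mean
five_num <- quantile(shares_demo, probs = c(0, 0.25, 0.5, 0.75, 1))
avg <- mean(shares_demo)

# 1.5 x IQR fences: values below Q1 - 1.5*IQR or above Q3 + 1.5*IQR are flagged
q1 <- five_num[["25%"]]
q3 <- five_num[["75%"]]
iqr <- q3 - q1
lower_fence <- q1 - 1.5 * iqr
upper_fence <- q3 + 1.5 * iqr
outliers <- shares_demo[shares_demo < lower_fence | shares_demo > upper_fence]

# A mean well above the median suggests right skew
right_skewed <- avg > median(shares_demo)
```

With a single very large value in the vector, the mean is pulled far above the median while the quartiles barely move, which is why the fences are built from Q1 and Q3 rather than from the mean.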

# Contingency table of frequencies for days of the week, added caption for clarity
kable(table(filtered_channel$weekday), 
      col.names = c("Weekday", "Frequency"), 
      caption = "Contingency table of frequencies for days of the week")
Weekday Frequency
Friday 1305
Monday 1356
Saturday 519
Sunday 567
Thursday 1569
Tuesday 1546
Wednesday 1565

Contingency table of frequencies for days of the week

# Numerical Summary of Shares, added caption for clarity
filtered_channel %>% summarise(Minimum = min(shares), 
                          Q1 = quantile(shares, prob = 0.25), 
                          Average = mean(shares), 
                          Median = median(shares), 
                          Q3 = quantile(shares, prob = 0.75), 
                          Maximum = max(shares)) %>% 
                kable(caption = "Numerical Summary of Shares")
Minimum Q1 Average Median Q3 Maximum
35 827 2287.734 1100 1900 284700

Numerical Summary of Shares

# Numerical Summary of Number of words in the content, added caption for clarity
filtered_channel %>% summarise(Minimum = min(n_tokens_content), 
                          Q1 = quantile(n_tokens_content, prob = 0.25), 
                          Average = mean(n_tokens_content), 
                          Median = median(n_tokens_content), 
                          Q3 = quantile(n_tokens_content, prob = 0.75), 
                          Maximum = max(n_tokens_content)) %>% 
                kable(caption = "Numerical Summary of Number of words in the content")
Minimum Q1 Average Median Q3 Maximum
0 332 597.2814 509 768 7081

Numerical Summary of Number of words in the content

# Numerical Summary of Number of words in the content for the upper quantile of Shares, added caption for clarity
filtered_channel %>% filter(shares > quantile(shares, prob = 0.75)) %>%
                summarise(Minimum = min(n_tokens_content), 
                          Q1 = quantile(n_tokens_content, prob = 0.25), 
                          Average = mean(n_tokens_content), 
                          Median = median(n_tokens_content), 
                          Q3 = quantile(n_tokens_content, prob = 0.75), 
                          Maximum = max(n_tokens_content)) %>% 
                kable(caption = "Numerical Summary of Number of words in the content for the upper quantile of Shares")
Minimum Q1 Average Median Q3 Maximum
0 303 598.1955 476 761 4661

Numerical Summary of Number of words in the content for the upper quantile of Shares

kable(table(filtered_channel$n_tokens_content),
  col.names = c("Tokens", "Frequency"), 
  caption = "Contingency table of frequencies for number of tokens in the article content")
Tokens Frequency
0 259
29 1
32 1
34 2
37 1
39 1
41 2
42 2
47 1
48 1
51 1
53 1
55 1
56 1
57 2
59 1
60 1
63 2
64 1
65 1
71 2
72 1
73 2
75 1
76 1
77 1
79 1
80 2
82 1
83 1
84 1
85 2
87 2
90 1
91 1
92 1
94 1
95 2
96 3
97 2
98 3
100 1
101 1
102 4
103 4
104 3
105 2
107 3
108 2
109 4
110 1
111 5
112 1
113 3
114 8
115 2
116 3
117 2
118 6
119 1
120 1
121 5
122 2
123 2
124 1
125 3
126 6
127 4
128 4
129 2
130 2
131 2
132 2
134 4
135 6
136 3
137 4
138 6
139 3
140 3
141 4
142 5
143 6
144 6
145 1
146 2
147 7
148 6
149 8
150 3
151 3
152 1
153 7
154 5
155 7
156 9
157 6
158 4
159 5
160 2
161 4
162 5
163 2
164 5
165 6
166 6
167 3
168 3
169 2
170 3
171 8
172 4
173 1
174 2
175 8
176 3
177 2
178 2
179 5
180 6
181 8
182 4
184 7
185 4
186 6
187 10
188 3
189 3
190 11
191 7
192 7
193 5
194 7
195 3
196 9
197 5
198 6
199 8
200 6
201 3
202 3
203 6
204 6
205 9
206 6
207 9
208 7
209 13
210 4
211 7
212 9
213 6
214 8
215 9
216 4
217 7
218 5
219 5
220 7
221 12
222 11
223 11
224 8
225 18
226 10
227 5
228 10
229 13
230 12
231 13
232 8
233 9
234 12
235 7
236 14
237 11
238 11
239 14
240 10
241 10
242 9
243 2
244 6
245 7
246 9
247 10
248 9
249 10
250 9
251 10
252 8
253 13
254 7
255 11
256 10
257 4
258 14
259 12
260 9
261 13
262 9
263 12
264 8
265 7
266 6
267 7
268 7
269 9
270 9
271 11
272 10
273 11
274 7
275 12
276 16
277 7
278 7
279 16
280 11
281 17
282 10
283 11
284 9
285 11
286 15
287 7
288 10
289 11
290 7
291 8
292 18
293 9
294 13
295 15
296 11
297 9
298 13
299 16
300 11
301 13
302 11
303 19
304 11
305 9
306 14
307 13
308 9
309 8
310 8
311 14
312 10
313 15
314 8
315 19
316 17
317 19
318 11
319 8
320 19
321 10
322 12
323 17
324 13
325 16
326 10
327 17
328 16
329 14
330 15
331 10
332 14
333 17
334 10
335 16
336 16
337 13
338 15
339 13
340 14
341 13
342 13
343 9
344 13
345 11
346 14
347 12
348 15
349 11
350 15
351 12
352 11
353 8
354 15
355 15
356 9
357 14
358 19
359 6
360 11
361 14
362 15
363 17
364 14
365 8
366 11
367 12
368 14
369 10
370 12
371 9
372 13
373 11
374 13
375 16
376 14
377 13
378 8
379 13
380 16
381 11
382 8
383 13
384 14
385 13
386 10
387 7
388 14
389 7
390 15
391 17
392 18
393 17
394 13
395 20
396 10
397 13
398 7
399 13
400 9
401 13
402 11
403 11
404 18
405 11
406 12
407 7
408 15
409 9
410 13
411 13
412 6
413 14
414 7
415 15
416 16
417 16
418 15
419 13
420 12
421 14
422 9
423 10
424 13
425 12
426 4
427 8
428 13
429 17
430 7
431 12
432 14
433 11
434 17
435 15
436 14
437 20
438 13
439 10
440 12
441 16
442 10
443 11
444 15
445 15
446 12
447 10
448 10
449 13
450 15
451 13
452 15
453 19
454 11
455 6
456 9
457 8
458 10
459 12
460 22
461 16
462 6
463 10
464 11
465 10
466 4
467 7
468 11
469 11
470 11
471 8
472 15
473 11
474 7
475 9
476 15
477 13
478 3
479 10
480 8
481 8
482 7
483 9
484 11
485 5
486 12
487 12
488 13
489 10
490 17
491 7
492 10
493 12
494 6
495 12
496 12
497 5
498 12
499 18
500 7
501 17
502 14
503 8
504 10
505 13
506 11
507 9
508 11
509 18
510 10
511 8
512 10
513 14
514 9
515 12
516 12
517 13
518 11
519 13
520 13
521 5
522 13
523 7
524 6
525 18
526 16
527 13
528 5
529 8
530 13
531 18
532 12
533 13
534 12
535 5
536 8
537 14
538 7
539 12
540 11
541 11
542 8
543 12
544 11
545 8
546 10
547 8
548 12
549 7
550 14
551 9
552 13
553 10
554 15
555 7
556 11
557 7
558 7
559 13
560 14
561 6
562 14
563 4
564 14
565 9
566 7
567 5
568 9
569 9
570 13
571 10
572 8
573 12
574 6
575 11
576 10
577 13
578 6
579 12
580 10
581 9
582 8
583 6
584 2
585 8
586 8
587 9
588 11
589 6
590 11
591 5
592 8
593 14
594 10
595 9
596 8
597 6
598 8
599 5
600 5
601 8
602 17
603 13
604 10
605 9
606 9
607 4
608 6
609 12
610 8
611 15
612 8
613 14
614 10
615 14
616 13
617 6
618 4
619 5
620 5
621 6
622 8
623 6
624 5
625 5
626 7
627 7
628 11
629 8
630 8
631 8
632 8
633 9
634 6
635 4
636 8
637 6
638 7
639 9
640 7
641 6
642 20
643 9
644 7
645 10
646 4
647 10
648 16
649 8
650 3
651 6
652 12
653 8
654 10
655 8
656 10
657 11
658 10
659 5
660 6
661 7
662 8
663 6
664 5
665 6
666 4
667 9
668 4
669 6
670 7
671 4
672 5
673 6
674 5
675 5
676 8
677 11
678 5
679 2
680 9
681 7
682 14
683 7
684 6
685 3
686 11
687 4
688 6
689 4
690 11
691 4
692 5
693 7
694 4
695 13
696 9
697 7
698 7
699 6
700 4
701 8
702 5
703 9
704 6
705 4
706 5
707 8
708 10
709 5
710 10
711 9
712 4
713 8
714 6
715 7
716 11
717 5
718 7
719 6
720 5
721 6
722 7
723 10
724 11
725 6
726 8
727 3
728 7
729 5
730 9
731 10
732 6
733 7
734 2
735 10
736 4
737 8
738 7
739 10
740 4
741 7
742 5
743 3
744 12
745 7
746 2
747 5
748 8
749 4
750 9
751 6
752 3
753 5
754 6
755 6
756 5
757 8
758 2
759 3
760 5
761 11
762 5
763 5
764 5
765 8
766 1
767 2
768 4
769 6
770 5
771 5
772 11
773 7
774 3
775 1
776 8
777 5
778 6
779 3
780 4
781 6
782 3
783 6
784 7
785 9
786 5
787 11
788 7
789 5
790 9
791 6
792 9
793 6
794 7
795 6
796 7
797 7
798 8
799 9
800 7
801 4
802 9
803 5
804 12
805 3
806 6
807 4
808 7
809 7
810 8
811 7
812 8
813 3
814 5
815 7
816 3
817 4
818 6
819 7
820 2
821 7
822 1
823 6
824 5
825 7
826 9
827 5
828 7
829 4
830 6
831 2
832 5
833 2
834 6
835 5
836 3
837 2
838 11
839 4
840 4
841 5
842 7
843 3
844 4
845 2
846 3
847 8
848 6
849 3
850 5
851 4
852 8
853 2
854 7
855 1
856 2
857 2
858 5
859 1
860 8
861 6
862 6
863 5
864 4
865 4
866 4
867 3
869 4
870 4
871 3
872 2
873 6
874 9
875 2
876 2
877 8
878 9
879 2
880 3
881 1
882 3
883 2
884 3
885 4
886 4
887 3
888 3
889 4
890 5
891 3
892 6
893 10
894 2
895 7
896 3
897 3
898 3
899 9
900 5
901 4
902 6
903 1
904 4
905 1
906 6
908 4
909 1
910 2
911 3
912 3
913 7
914 5
915 3
916 3
917 4
918 6
919 6
920 3
921 7
922 2
923 6
924 4
925 3
926 3
927 3
928 1
929 5
930 5
931 2
932 4
933 1
934 3
935 1
936 3
937 2
938 5
939 3
940 5
941 3
942 1
943 6
944 6
945 4
946 4
947 6
948 4
949 3
950 1
951 6
952 3
953 2
954 5
955 5
956 7
957 4
958 6
959 2
960 2
961 7
962 4
963 4
964 5
965 6
966 4
967 2
968 2
969 5
970 2
971 4
972 3
973 4
974 7
975 6
976 3
977 5
978 4
979 6
980 9
981 1
982 2
983 3
984 4
985 1
986 3
987 1
988 3
989 5
990 5
991 5
992 7
994 3
995 2
996 2
997 5
998 2
1000 6
1001 2
1002 2
1004 3
1005 4
1006 2
1008 3
1009 1
1010 5
1011 4
1012 6
1013 3
1014 4
1015 2
1016 4
1017 2
1018 2
1019 5
1020 1
1021 2
1022 2
1023 2
1024 3
1025 3
1026 7
1027 1
1028 2
1029 5
1030 3
1031 2
1032 3
1033 1
1034 4
1035 1
1037 8
1038 2
1039 2
1040 5
1041 4
1042 8
1043 4
1044 3
1045 4
1046 3
1047 2
1048 1
1049 5
1050 5
1051 3
1052 2
1053 2
1054 1
1055 2
1056 3
1057 4
1058 1
1059 3
1060 3
1061 4
1062 2
1063 2
1065 7
1066 4
1067 4
1068 5
1069 5
1070 3
1071 5
1072 2
1073 1
1074 4
1076 5
1077 3
1078 3
1079 1
1080 3
1081 4
1082 3
1083 2
1084 1
1085 4
1086 2
1087 1
1088 3
1089 4
1090 3
1091 2
1092 4
1093 3
1094 3
1095 1
1096 7
1097 2
1098 3
1100 3
1101 3
1104 2
1105 1
1106 3
1108 2
1109 2
1110 5
1111 2
1112 3
1113 3
1114 2
1115 2
1116 3
1117 2
1118 3
1119 2
1120 1
1121 4
1122 1
1123 3
1124 1
1125 1
1126 1
1127 2
1128 4
1129 3
1131 3
1132 1
1133 3
1134 4
1135 1
1136 2
1137 3
1138 2
1139 2
1140 2
1141 2
1142 1
1143 1
1144 1
1145 4
1146 1
1147 2
1148 2
1149 1
1150 2
1151 5
1152 1
1153 1
1154 2
1155 5
1156 1
1158 3
1159 1
1161 2
1162 5
1163 2
1164 2
1165 1
1166 1
1167 2
1169 4
1172 3
1174 4
1175 2
1176 2
1177 3
1178 1
1179 1
1180 2
1181 1
1182 4
1183 2
1184 1
1185 1
1186 1
1187 2
1188 3
1189 1
1190 1
1191 2
1193 3
1194 1
1195 1
1196 1
1197 5
1198 2
1199 3
1200 1
1202 3
1204 2
1205 3
1206 2
1207 5
1208 1
1209 3
1210 1
1211 2
1212 1
1213 2
1214 3
1215 3
1216 1
1217 2
1218 1
1219 2
1220 1
1222 4
1223 3
1224 1
1225 1
1226 3
1228 3
1229 1
1230 1
1231 1
1232 1
1233 3
1234 1
1235 1
1237 1
1238 1
1239 1
1240 1
1242 1
1244 1
1245 1
1246 1
1247 1
1248 3
1249 1
1250 2
1251 4
1252 3
1253 3
1254 1
1255 2
1256 1
1257 2
1258 3
1259 1
1260 1
1261 3
1262 1
1265 1
1266 2
1267 1
1269 1
1270 1
1274 2
1275 2
1279 1
1280 1
1281 1
1282 1
1284 1
1286 1
1287 3
1288 1
1289 1
1290 1
1291 1
1292 1
1293 2
1294 3
1295 1
1296 2
1298 1
1299 1
1300 3
1302 1
1303 2
1305 2
1306 1
1307 2
1309 3
1310 1
1312 2
1315 1
1316 2
1318 1
1319 1
1320 2
1324 1
1325 2
1330 2
1331 1
1332 1
1333 1
1334 1
1335 2
1337 2
1339 1
1340 1
1341 3
1343 2
1344 3
1350 1
1351 2
1352 1
1356 2
1357 1
1358 3
1360 1
1361 1
1363 1
1368 1
1370 2
1372 3
1373 3
1374 2
1375 2
1376 2
1377 4
1378 1
1379 1
1380 1
1382 2
1383 3
1384 2
1388 1
1389 1
1390 1
1392 3
1394 1
1396 1
1397 1
1398 1
1400 2
1401 1
1405 1
1406 2
1409 1
1410 2
1411 1
1412 1
1413 1
1414 1
1416 1
1417 3
1418 2
1421 2
1423 1
1424 1
1427 1
1428 5
1431 2
1432 1
1433 2
1434 1
1436 2
1439 2
1440 1
1441 2
1443 1
1445 1
1446 2
1447 3
1449 2
1451 1
1453 1
1456 1
1457 1
1458 1
1462 1
1463 1
1464 1
1465 1
1468 1
1469 1
1472 1
1473 2
1476 1
1477 1
1481 1
1483 1
1484 1
1486 1
1489 1
1491 1
1492 1
1494 1
1497 1
1498 1
1499 1
1500 1
1501 1
1503 1
1504 1
1506 1
1515 1
1516 1
1517 1
1520 3
1521 1
1522 3
1524 2
1531 1
1532 1
1536 3
1537 2
1540 1
1541 1
1542 1
1543 1
1552 1
1553 2
1558 1
1567 1
1568 1
1569 2
1571 2
1572 1
1577 1
1579 1
1580 1
1582 1
1589 1
1598 1
1605 1
1608 2
1609 3
1612 4
1614 1
1616 1
1620 3
1625 1
1627 1
1632 1
1634 1
1637 2
1638 1
1640 2
1642 1
1643 1
1650 2
1651 1
1653 2
1656 2
1665 1
1668 1
1669 1
1670 1
1671 1
1678 1
1682 1
1684 1
1687 2
1690 2
1691 1
1694 2
1696 2
1699 1
1708 1
1709 2
1714 1
1719 1
1721 3
1728 1
1737 1
1744 1
1747 1
1751 1
1757 1
1765 2
1767 1
1771 1
1776 1
1786 1
1789 1
1800 1
1807 1
1814 2
1816 1
1821 1
1823 2
1824 1
1825 1
1827 1
1831 1
1834 1
1837 1
1844 1
1848 1
1864 1
1881 1
1885 1
1888 1
1891 2
1894 2
1896 1
1898 1
1903 1
1905 1
1920 1
1921 1
1922 1
1924 1
1938 1
1949 1
1959 1
1961 1
1964 1
1973 1
1977 2
1982 1
1985 1
1995 1
2004 1
2011 1
2028 2
2037 1
2040 1
2042 1
2047 1
2048 1
2066 1
2068 1
2069 1
2078 1
2088 1
2093 1
2095 1
2096 1
2105 1
2117 1
2133 1
2136 1
2137 1
2142 1
2156 1
2164 1
2183 1
2189 1
2200 1
2204 1
2237 1
2251 1
2255 1
2258 1
2314 1
2318 1
2324 1
2325 1
2334 1
2361 1
2365 1
2380 1
2385 1
2397 1
2399 1
2421 1
2424 1
2432 1
2439 1
2470 1
2490 1
2495 1
2513 1
2517 1
2521 1
2523 2
2556 1
2610 1
2649 1
2651 1
2689 2
2704 1
2708 1
2727 1
2732 1
2747 1
2793 1
2844 1
2889 1
2951 1
2957 1
2958 1
2997 1
3037 1
3125 1
3153 1
3220 1
3259 1
3290 1
3326 1
3332 1
3376 1
3467 1
3510 1
3547 1
3662 1
3810 1
3888 1
3905 1
4110 1
4155 1
4172 1
4331 1
4462 1
4661 1
7081 1

Contingency table of frequencies for number of tokens in the article content

# Summarizing the number of images in the article
filtered_channel %>% 
  summarise(Minimum = min(num_imgs), 
      Q1 = quantile(num_imgs, prob = 0.25), 
      Average = mean(num_imgs), 
      Median = median(num_imgs), 
      Q3 = quantile(num_imgs, prob = 0.75), 
      Maximum = max(num_imgs)) %>% 
  kable(caption = "Numerical summary of number of images in an article")
Minimum Q1 Average Median Q3 Maximum
0 1 2.841225 1 2 100

Numerical summary of number of images in an article

# Summarizing the number of videos in the article
filtered_channel %>% 
  summarise(Minimum = min(num_videos), 
      Q1 = quantile(num_videos, prob = 0.25), 
      Average = mean(num_videos), 
      Median = median(num_videos), 
      Q3 = quantile(num_videos, prob = 0.75), 
      Maximum = max(num_videos)) %>% 
  kable(caption = "Numerical summary of number of videos in an article")
Minimum Q1 Average Median Q3 Maximum
0 0 0.5495431 0 1 51

Numerical summary of number of videos in an article

# Summarizing the rate of positive words
filtered_channel %>% 
  summarise(Minimum = min(rate_positive_words), 
      Q1 = quantile(rate_positive_words, prob = 0.25), 
      Average = mean(rate_positive_words), 
      Median = median(rate_positive_words), 
      Q3 = quantile(rate_positive_words, prob = 0.75), 
      Maximum = max(rate_positive_words)) %>% 
  kable(caption = "Numerical Summary of the rate of positive words in an article")
Minimum Q1 Average Median Q3 Maximum
0 0.5357143 0.6233722 0.6428571 0.7416574 1

Numerical Summary of the rate of positive words in an article

# Summarizing the rate of negative words
filtered_channel %>% 
  summarise(Minimum = min(rate_negative_words), 
      Q1 = quantile(rate_negative_words, prob = 0.25), 
      Average = mean(rate_negative_words), 
      Median = median(rate_negative_words), 
      Q3 = quantile(rate_negative_words, prob = 0.75), 
      Maximum = max(rate_negative_words)) %>% 
  kable(caption = "Numerical Summary of the rate of negative words in an article")
Minimum Q1 Average Median Q3 Maximum
0 0.25 0.3458933 0.3461538 0.4482759 1

Numerical Summary of the rate of negative words in an article

The graphical summaries show the trends in the data more dramatically, including skewness and outliers. The boxplot below gives a visual representation of the five-number summary for shares, split up by weekday. Boxplots make it even easier to look out for outliers (look for the dots separated from the main boxplot). Next, we can examine several scatterplots. Scatterplots allow us to look at one numerical variable vs another to see if there is any correlation between them. Look out for any plots that have most of the points on a diagonal line! There are five scatterplots below, investigating shares vs number of words in the content, number of words in the title, rate of positive words, rate of negative words, and global sentiment polarity. Finally, a histogram can show the overall distribution of a numerical variable, including skewness. The histogram below shows the distribution of the shares variable. Look for a left or right tail to signify skewness, and look out for multiple peaks to signify a multi-modal variable.
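To make the "diagonal line" and "tail" cues concrete, the following base R sketch uses simulated data. All values and names here are hypothetical, chosen only to show what a strong correlation and a right-skewed variable look like numerically.

```r
set.seed(7)

# Simulated data (hypothetical): y depends linearly on x plus a little noise
x <- runif(100)
y <- 3 * x + rnorm(100, sd = 0.2)

# A correlation near +1 or -1 corresponds to points hugging a diagonal line
r <- cor(x, y)

# Simulated right-skewed variable: a long right tail pulls the mean above the median
skewed <- rexp(1000, rate = 1 / 2000)
skew_check <- c(mean = mean(skewed), median = median(skewed))
```

Calling `plot(x, y)` and `hist(skewed)` on these objects would show the corresponding diagonal point cloud and right tail.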

# Boxplot of Shares for Each Weekday, colored gray with classic theme, added labels and title
ggplot(filtered_channel, aes(x = weekday, y = shares)) + 
          geom_boxplot(fill = "grey") + 
          labs(x = "Weekday", title = "Boxplot of Shares for Each Weekday", y = "Shares") + 
          theme_classic()

# Scatterplot of Number of words in the content vs Shares, colored gray with classic theme, added labels and title
ggplot(filtered_channel, aes(x = n_tokens_content, y = shares)) + 
          geom_point(color = "grey") +
          labs(x = "Number of words in the content", y = "Shares", 
               title = "Scatterplot of Number of words in the content vs Shares") +
          theme_classic()

# Scatterplot of Number of words in the title vs Shares, colored gray with classic theme, added labels and title
ggplot(filtered_channel, aes(x = n_tokens_title, y = shares)) + 
          geom_point(color = "grey") +
          labs(x = "Number of words in the title", y = "Shares", 
               title = "Scatterplot of Number of words in the title vs Shares") +
          theme_classic()

# Histogram of shares with a bin width of 2000, classic theme, added label and title
ggplot(filtered_channel, aes(x=shares)) +
  geom_histogram(color="grey", binwidth = 2000) +
  labs(x = "Shares", 
               title = "Histogram of number of shares") +
  theme_classic()

# Scatterplot of rate of positive words vs Shares, colored gray with classic theme
ggplot(filtered_channel, aes(x=rate_positive_words, y=shares)) +
  geom_point(color="grey") +
  labs(x = "rate of positive words in an article", y = "Shares", 
               title = "Scatterplot of rate of positive words in an article vs shares") +
  theme_classic()

# Scatterplot of rate of negative words vs Shares, colored gray with classic theme
ggplot(filtered_channel, aes(x=rate_negative_words, y=shares)) +
  geom_point(color="grey") +
  labs(x = "rate of negative words in an article", y = "Shares", 
               title = "Scatterplot of rate of negative words in an article vs shares") +
  theme_classic()

# Scatterplot of global sentiment polarity vs Shares, colored gray with classic theme
ggplot(filtered_channel, aes(x=global_sentiment_polarity, y=shares)) +
  geom_point(color="grey") +
  labs(x = "global sentiment polarity in an article", y = "Shares", 
               title = "Scatterplot of global sentiment polarity in an article vs shares") +
  theme_classic()

# drop the weekday variable created for EDA (will get in the way for our models if we don't drop it)
filtered_channel <- subset(filtered_channel, select = -c(weekday))

Modeling

Splitting the Data

First, let’s split up the data into a testing set and a training set using the proportions: 70% training and 30% testing.

set.seed(9876)
# Split the data into a training and test set (70/30 split)
# indices
train <- sample(1:nrow(filtered_channel), size = nrow(filtered_channel)*.70)
test <- setdiff(1:nrow(filtered_channel), train)

# training and testing subsets
Training <- filtered_channel[train, ]
Testing <- filtered_channel[test, ]

Linear Models

Linear regression models allow us to look at relationships between one response variable and several explanatory variables. A model can also include interaction terms and even higher order terms. The general form for a linear model is Y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + ... + E_i, where x_{ji} is the value of the j-th predictor for observation i, E_i is the error term, and the “…” can include more predictors, interactions and/or higher order terms. Since our goal is to predict shares, we will fit these models on the subset of the data created for training, and then later test the models on the other subset set aside for testing.
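As a rough illustration of how main effects, an interaction, and a higher order term enter the model formula, here is a minimal base R sketch on simulated data. The variables `x1`, `x2`, and `y` are hypothetical and are not columns of the news dataset; the coefficients used to simulate `y` are arbitrary.

```r
set.seed(1)
n <- 200

# Simulated predictors and response (hypothetical names, not dataset columns)
x1 <- rnorm(n)
x2 <- rnorm(n)
y <- 2 + 1.5 * x1 - 0.5 * x2 + 0.8 * x1 * x2 + 0.3 * x1^2 + rnorm(n, sd = 0.5)
demo <- data.frame(y, x1, x2)

# Main effects only: y = b0 + b1*x1 + b2*x2 + error
fit_main <- lm(y ~ x1 + x2, data = demo)

# Main effects, the x1:x2 interaction, and a quadratic term in x1
fit_full <- lm(y ~ x1 * x2 + I(x1^2), data = demo)

# Estimated coefficients of the larger model
coef(fit_full)
```

In a formula, `x1 * x2` expands to `x1 + x2 + x1:x2`, and `I(x1^2)` protects the square so it is treated as arithmetic rather than formula syntax.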

Linear Model #1: - Jordan

# linear model on training dataset with 5-fold cv
fit1 <- train(shares ~ . , data = Training, method = "lm",
              preProcess = c("center", "scale"), 
              trControl = trainControl(method = "cv", number = 5))

Linear Model #2: - Jonathan

lm_fit <- train(
  shares ~ .^2,
  data=Training,
  method="lm",
  preProcess = c("center", "scale"), 
  trControl = trainControl(method = "cv", number = 5)
)

Random Forest - Jordan

Random forest is a tree-based method for fitting predictive models that averages predictions across many trees, each fit to a bootstrap sample of the data. One may choose a tree-based method for its prediction accuracy, the fact that predictors do not need to be scaled, the lack of statistical assumptions, and its built-in variable selection process. Random forest, in particular, considers only a random subset of the predictors at each split (commonly m = p / 3 for regression). This addresses the issue with bagging, where a single strong predictor tends to be chosen for the first split of every tree, leaving the trees highly correlated.
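The two sources of randomness described above, bootstrap rows and a random subset of predictors per split, can be sketched in base R. This is only an illustration of the sampling idea, not a random forest implementation; the predictor names, p = 12, and the row count are arbitrary assumptions.

```r
set.seed(42)

# Hypothetical predictor names; p = 12 is an arbitrary illustrative choice
predictors <- paste0("x", 1:12)
p <- length(predictors)

# Common choice for regression: consider about p / 3 predictors at each split
m <- max(floor(p / 3), 1)

# Each split draws its own candidate set, so a single strong predictor
# is not guaranteed to be available at the first split of every tree
candidates_split_1 <- sample(predictors, m)
candidates_split_2 <- sample(predictors, m)

# Each tree is also grown on a bootstrap sample of the rows (with replacement)
n_rows <- 10
boot_rows <- sample(seq_len(n_rows), size = n_rows, replace = TRUE)
```

In the model fit that follows, the number of candidates per split is the `mtry` tuning parameter, which is why the tuning grid searches over values from 1 up to roughly a third of the column count.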

# random forest model on training dataset with 5-fold cv
ranfor <- train(shares ~ ., data = Training, method = "rf", preProcess = c("center", "scale"),
                trControl = trainControl(method = "cv", number = 5), 
                tuneGrid = expand.grid(mtry = c(1:round(ncol(Training)/3))))
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world

ranfor
## Random Forest 
## 
## 5898 samples
##   58 predictor
## 
## Pre-processing: centered (58), scaled (58) 
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 4720, 4718, 4718, 4718, 4718 
## Resampling results across tuning parameters:
## 
##   mtry  RMSE      Rsquared    MAE     
##    1    5345.573  0.02647603  1870.972
##    2    5322.853  0.03188500  1881.222
##    3    5335.355  0.03149379  1913.333
##    4    5359.654  0.02785236  1931.492
##    5    5361.715  0.02918497  1945.100
##    6    5380.631  0.02673864  1959.963
##    7    5381.394  0.02792917  1962.778
##    8    5391.320  0.02716143  1976.360
##    9    5398.391  0.02676870  1974.328
##   10    5406.409  0.02697802  1984.644
##   11    5413.265  0.02564483  1985.580
##   12    5418.456  0.02629804  1996.720
##   13    5429.877  0.02537852  2002.805
##   14    5434.809  0.02435986  1998.388
##   15    5446.111  0.02321781  2005.398
##   16    5447.412  0.02366329  2005.893
##   17    5461.479  0.02265279  2011.009
##   18    5460.770  0.02299980  2021.381
##   19    5462.733  0.02246101  2017.977
##   20    5469.755  0.02265865  2021.448
## 
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was mtry = 2.
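
The selection rule reported above (smallest cross-validated RMSE) can be sketched in a few lines of base R. This is only an illustration: `cv_results` is a hypothetical data frame holding the first five rows of the table above, not an object from the original analysis.

```r
# First five rows of the resampling table above, copied by hand
# (cv_results is a hypothetical name used only for this sketch).
cv_results <- data.frame(
  mtry = 1:5,
  RMSE = c(5345.573, 5322.853, 5335.355, 5359.654, 5361.715)
)

# Pick the row with the smallest cross-validated RMSE.
best <- cv_results[which.min(cv_results$RMSE), ]
best$mtry
```

With these values the sketch selects `mtry = 2`, matching the final value reported by `train()`.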

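The repeated zero-variance warnings say that the six `data_channel_is_*` columns are constant in the training data, most likely because the data has been subset to a single channel. One possible way to avoid them is to drop constant columns before fitting. The sketch below uses base R on a small made-up data frame; `toy`, `is_constant` and `toy_varying` are hypothetical names, and this is not the approach taken in the original analysis.

```r
# Toy stand-in for a single-channel training subset (made-up values).
toy <- data.frame(
  shares = c(1200, 850, 4300, 990),
  n_tokens_title = c(12, 9, 14, 10),
  data_channel_is_tech = c(0, 0, 0, 0),   # constant within the subset
  data_channel_is_world = c(1, 1, 1, 1)   # constant within the subset
)

# A column has zero variance when it holds a single distinct value.
is_constant <- vapply(toy, function(col) length(unique(col)) == 1, logical(1))

# Keep only the columns that actually vary.
toy_varying <- toy[, !is_constant, drop = FALSE]
names(toy_varying)
```

Only `shares` and `n_tokens_title` survive the filter here. As an alternative, caret's `preProcess` accepts a `"zv"` method that removes zero-variance predictors as part of pre-processing.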
Boosted Tree - Jonathan

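# Candidate tuning grid for the boosted tree.
# NOTE: this grid is not passed to train() below (there is no tuneGrid argument),
# so caret falls back to its default gbm grid; the logs that follow run to 150 trees.
tune_grid <- expand.grid(
  n.trees = c(5, 10, 50, 100),
  interaction.depth = c(1, 2, 3, 4),
  shrinkage = 0.1,
  n.minobsinnode = 10
)

# Fit the boosted tree with centered and scaled predictors and 5-fold cross-validation
bt_fit <- train(
  shares ~ .,
  data = Training,
  method = "gbm",
  preProcess = c("center", "scale"),
  trControl = trainControl(method = "cv", number = 5)
)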
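
A tuning grid is only searched when it is handed to `train()` through its `tuneGrid` argument. The sketch below rebuilds the same grid, counts its candidate models, and shows (commented out, since it needs caret, gbm and the `Training` data) how the call might look with the grid supplied. `n_candidates` is a name introduced for this illustration.

```r
# Same candidate values as the grid above.
tune_grid <- expand.grid(
  n.trees = c(5, 10, 50, 100),
  interaction.depth = c(1, 2, 3, 4),
  shrinkage = 0.1,
  n.minobsinnode = 10
)

# One row per combination: 4 tree counts x 4 depths x 1 shrinkage x 1 node size.
n_candidates <- nrow(tune_grid)
n_candidates

# Sketch of the call (commented out; requires caret, gbm and the Training data):
# bt_fit <- train(
#   shares ~ ., data = Training, method = "gbm",
#   preProcess = c("center", "scale"),
#   trControl = trainControl(method = "cv", number = 5),
#   tuneGrid = tune_grid,  # search this grid instead of caret's default
#   verbose = FALSE        # passed through to gbm to silence the iteration log
# )
```

The grid has 16 candidate models. With `tuneGrid` supplied, the largest model would use 100 trees rather than the 150 seen in the logs below.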
## Warning in preProcess.default(method = c("center", "scale"), x = structure(c(12, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world

## Warning in (function (x, y, offset = NULL, misc = NULL, distribution = "bernoulli", : variable 12:
## data_channel_is_lifestyle has no variation.

## Warning in (function (x, y, offset = NULL, misc = NULL, distribution = "bernoulli", : variable 13:
## data_channel_is_entertainment has no variation.

## Warning in (function (x, y, offset = NULL, misc = NULL, distribution = "bernoulli", : variable 14:
## data_channel_is_bus has no variation.

## Warning in (function (x, y, offset = NULL, misc = NULL, distribution = "bernoulli", : variable 15:
## data_channel_is_socmed has no variation.

## Warning in (function (x, y, offset = NULL, misc = NULL, distribution = "bernoulli", : variable 16:
## data_channel_is_tech has no variation.

## Warning in (function (x, y, offset = NULL, misc = NULL, distribution = "bernoulli", : variable 17:
## data_channel_is_world has no variation.

## Iter   TrainDeviance   ValidDeviance   StepSize   Improve
##      1 29264604.7162             nan     0.1000 61836.0089
##      2 29196359.9080             nan     0.1000 -7929.6096
##      3 29108711.9370             nan     0.1000 45402.8961
##      4 28997581.9665             nan     0.1000 53286.2615
##      5 28909376.8021             nan     0.1000 39659.7637
##      6 28823682.0484             nan     0.1000 33892.0311
##      7 28766984.2616             nan     0.1000 31692.5585
##      8 28716839.8662             nan     0.1000 -10845.0167
##      9 28653933.7658             nan     0.1000 25398.5578
##     10 28609847.6974             nan     0.1000 32222.4909
##     20 28147125.4339             nan     0.1000 29887.6417
##     40 27679620.2654             nan     0.1000 -15302.4912
##     60 27426876.6636             nan     0.1000 2454.2492
##     80 27228651.4542             nan     0.1000  905.0252
##    100 27095830.9434             nan     0.1000 -10987.2975
##    120 26930200.0818             nan     0.1000 -22808.9263
##    140 26771227.8311             nan     0.1000 -17589.8773
##    150 26679120.2772             nan     0.1000 -6478.9072

## Iter   TrainDeviance   ValidDeviance   StepSize   Improve
##      1 29188203.1607             nan     0.1000 19519.4706
##      2 28958451.2920             nan     0.1000 -6418.2257
##      3 28804927.1331             nan     0.1000 44516.8021
##      4 28435075.7068             nan     0.1000 76776.5898
##      5 28299753.4646             nan     0.1000 56152.3688
##      6 28145609.6246             nan     0.1000 -2319.1242
##      7 28031954.7424             nan     0.1000 -6194.2521
##      8 27910710.0460             nan     0.1000 65574.0219
##      9 27789784.7100             nan     0.1000 -51637.0581
##     10 27700721.3489             nan     0.1000 2241.5853
##     20 26627588.6009             nan     0.1000 11644.1148
##     40 25319022.9876             nan     0.1000 -10858.6012
##     60 24240808.0393             nan     0.1000 -11632.0009
##     80 23630685.2330             nan     0.1000 -27574.3158
##    100 23175042.9926             nan     0.1000 -46281.8703
##    120 22639910.5941             nan     0.1000 -62013.0875
##    140 22069985.5322             nan     0.1000 27742.9889
##    150 21755275.5963             nan     0.1000 -10251.6188

## Iter   TrainDeviance   ValidDeviance   StepSize   Improve
##      1 29020059.8034             nan     0.1000 272682.6076
##      2 28765378.5977             nan     0.1000 93606.1204
##      3 28666996.4149             nan     0.1000 33145.4331
##      4 28342731.9627             nan     0.1000 10853.0874
##      5 28154604.6211             nan     0.1000 -6602.7392
##      6 27977328.4967             nan     0.1000 11129.9392
##      7 27857186.6472             nan     0.1000 18691.3107
##      8 27561765.8823             nan     0.1000 47333.1438
##      9 27237951.8906             nan     0.1000 -103420.6852
##     10 27013336.0118             nan     0.1000 -6852.3875
##     20 25333508.1366             nan     0.1000 -54839.2357
##     40 23626826.1727             nan     0.1000 -87906.7666
##     60 22387312.2050             nan     0.1000 -59587.8955
##     80 21506441.2696             nan     0.1000 -19707.6403
##    100 20575345.8013             nan     0.1000 -21328.8213
##    120 19626346.5377             nan     0.1000 -34383.7622
##    140 18981322.8675             nan     0.1000 -16431.6213
##    150 18768626.6481             nan     0.1000 -53189.5615

## Iter   TrainDeviance   ValidDeviance   StepSize   Improve
##      1 30932049.0055             nan     0.1000 63679.8847
##      2 30805189.8539             nan     0.1000 89899.8494
##      3 30686598.6889             nan     0.1000 37778.0633
##      4 30641901.2560             nan     0.1000 21196.3586
##      5 30557926.0879             nan     0.1000 54681.4020
##      6 30523586.8417             nan     0.1000 -8887.5946
##      7 30474151.4663             nan     0.1000 13461.7504
##      8 30374724.9624             nan     0.1000 62785.8326
##      9 30304077.2970             nan     0.1000 22654.4562
##     10 30208006.5521             nan     0.1000 -14425.6910
##     20 29721785.8435             nan     0.1000 -18903.2676
##     40 29177934.6202             nan     0.1000  602.1123
##     60 28800170.5423             nan     0.1000 -76058.4376
##     80 28530404.5330             nan     0.1000 4470.0985
##    100 28261767.7557             nan     0.1000 -38580.1477
##    120 28085700.9562             nan     0.1000 -40795.6669
##    140 27889514.5927             nan     0.1000 19323.3526
##    150 27771878.6679             nan     0.1000 -58108.1982

## Iter   TrainDeviance   ValidDeviance   StepSize   Improve
##      1 30934703.5984             nan     0.1000 35669.6516
##      2 30582029.6190             nan     0.1000 -7066.7822
##      3 30361696.5492             nan     0.1000 112492.7280
##      4 30053020.9030             nan     0.1000 20918.5515
##      5 29815814.4801             nan     0.1000 -113.9750
##      6 29504133.9788             nan     0.1000 38243.9020
##      7 29333249.5731             nan     0.1000 109382.9248
##      8 29185894.7646             nan     0.1000 61430.6242
##      9 29099121.6130             nan     0.1000 32619.9296
##     10 28843394.9850             nan     0.1000 -25989.9709
##     20 27766112.8005             nan     0.1000 20810.0116
##     40 26187314.7564             nan     0.1000 -29628.7604
##     60 25508875.1951             nan     0.1000 -43108.6459
##     80 24814815.2366             nan     0.1000 -52659.4604
##    100 24264657.2770             nan     0.1000 -62891.9210
##    120 23869060.5724             nan     0.1000 -37029.8676
##    140 23498877.6033             nan     0.1000 -37555.6471
##    150 23255795.1341             nan     0.1000 -4910.4751

## Iter   TrainDeviance   ValidDeviance   StepSize   Improve
##      1 30812317.1735             nan     0.1000 126038.6600
##      2 30395096.1463             nan     0.1000  503.1965
##      3 30013498.9250             nan     0.1000 75662.7102
##      4 29791863.0498             nan     0.1000 1811.6108
##      5 29646392.5901             nan     0.1000 -11546.4511
##      6 29427565.5737             nan     0.1000 -5121.8606
##      7 29284607.7766             nan     0.1000 -25663.5292
##      8 29178963.3859             nan     0.1000 -10107.5054
##      9 28885018.5066             nan     0.1000 54445.3816
##     10 28709984.3348             nan     0.1000 105473.1604
##     20 26793323.8797             nan     0.1000 -44325.3534
##     40 24971873.7217             nan     0.1000 -135088.6769
##     60 23729846.1352             nan     0.1000 -9487.3679
##     80 22679254.7988             nan     0.1000 -66254.9954
##    100 21936301.7523             nan     0.1000 -42029.4602
##    120 21271332.6501             nan     0.1000 -35698.8292
##    140 20598256.4660             nan     0.1000 -81957.9635
##    150 20376316.7255             nan     0.1000 -29258.5822

## Iter   TrainDeviance   ValidDeviance   StepSize   Improve
##      1 26883116.7173             nan     0.1000 29030.2858
##      2 26811090.3886             nan     0.1000 45803.5275
##      3 26739979.8601             nan     0.1000 13768.3896
##      4 26674170.4152             nan     0.1000 49151.5210
##      5 26614153.9557             nan     0.1000 1809.0761
##      6 26549819.2700             nan     0.1000 35188.9170
##      7 26460453.4018             nan     0.1000 51722.7162
##      8 26411462.8062             nan     0.1000 37773.4472
##      9 26354508.7993             nan     0.1000 9327.6710
##     10 26307142.8546             nan     0.1000 17029.7098
##     20 25905886.1641             nan     0.1000 -5206.7313
##     40 25461627.8565             nan     0.1000 -638.2573
##     60 25281273.3239             nan     0.1000 -16783.9208
##     80 25035948.1564             nan     0.1000 -11541.4967
##    100 24890327.8232             nan     0.1000 -34259.8705
##    120 24751028.3026             nan     0.1000 -22291.3524
##    140 24656434.2367             nan     0.1000 -7510.8783
##    150 24574182.0966             nan     0.1000 -572.3681

## Iter   TrainDeviance   ValidDeviance   StepSize   Improve
##      1 26616491.9035             nan     0.1000 58044.2865
##      2 26515552.3031             nan     0.1000 27355.4438
##      3 26413354.9589             nan     0.1000 57703.8452
##      4 26212452.2355             nan     0.1000  623.4567
##      5 25999621.4658             nan     0.1000 -14614.9734
##      6 25894642.9877             nan     0.1000 62621.5598
##      7 25775822.2131             nan     0.1000 35421.7411
##      8 25685900.5866             nan     0.1000 40407.0831
##      9 25587345.0771             nan     0.1000 -24530.6114
##     10 25432055.7338             nan     0.1000 -36402.2007
##     20 24518424.6879             nan     0.1000 -27520.0528
##     40 23206858.6548             nan     0.1000 -28306.5986
##     60 22604618.9345             nan     0.1000 -17902.4601
##     80 22043522.5133             nan     0.1000 -15676.1289
##    100 21636987.2257             nan     0.1000 -29610.5895
##    120 21227202.8376             nan     0.1000 -28604.1290
##    140 20785547.4983             nan     0.1000 -69616.7645
##    150 20482405.6502             nan     0.1000 -72902.2067

## Iter   TrainDeviance   ValidDeviance   StepSize   Improve
##      1 26725545.4944             nan     0.1000 -3659.5361
##      2 26574823.3041             nan     0.1000 22213.2008
##      3 26269211.1945             nan     0.1000 10755.9305
##      4 26136852.7398             nan     0.1000 60763.6772
##      5 25976851.9458             nan     0.1000 26223.9027
##      6 25812393.6424             nan     0.1000 13110.7043
##      7 25697098.4355             nan     0.1000 -4289.9734
##      8 25570195.0529             nan     0.1000 -2810.5342
##      9 25341909.1762             nan     0.1000 -37632.5196
##     10 25144655.0968             nan     0.1000 41783.9296
##     20 23604751.8879             nan     0.1000 -72112.4247
##     40 21963610.9123             nan     0.1000 -11261.0860
##     60 20547180.6400             nan     0.1000 -52042.5297
##     80 19454207.8775             nan     0.1000 -25685.8714
##    100 18904517.3517             nan     0.1000 -5607.3178
##    120 18253505.5042             nan     0.1000 -54106.2997
##    140 17498224.3781             nan     0.1000 -73921.2261
##    150 17305416.1734             nan     0.1000 -35266.9876

## Iter   TrainDeviance   ValidDeviance   StepSize   Improve
##      1 30925841.7301             nan     0.1000 24968.6492
##      2 30823728.7785             nan     0.1000 70775.8891
##      3 30766490.8744             nan     0.1000 16210.5110
##      4 30697036.7630             nan     0.1000 26778.9835
##      5 30659114.8582             nan     0.1000 -16509.2064
##      6 30592598.2118             nan     0.1000 59853.6057
##      7 30503595.7297             nan     0.1000 35172.0975
##      8 30420644.4411             nan     0.1000 40520.2205
##      9 30355875.1648             nan     0.1000 39963.3769
##     10 30296589.2626             nan     0.1000 62596.6132
##     20 29794202.6881             nan     0.1000 7610.4034
##     40 29221686.5847             nan     0.1000 -11298.2000
##     60 28871033.4340             nan     0.1000 -29046.9068
##     80 28745532.5767             nan     0.1000 -45345.6352
##    100 28518195.4831             nan     0.1000 -14726.0158
##    120 28278214.0011             nan     0.1000 -12729.5430
##    140 28102506.1996             nan     0.1000 11323.4313
##    150 28063225.3212             nan     0.1000 -49304.6849

## Iter   TrainDeviance   ValidDeviance   StepSize   Improve
##      1 30753903.4736             nan     0.1000 17087.5011
##      2 30523510.6373             nan     0.1000 96345.6738
##      3 30223502.3581             nan     0.1000 -17011.2526
##      4 29968088.0830             nan     0.1000 -4350.2841
##      5 29845234.4025             nan     0.1000 80561.2103
##      6 29764613.8378             nan     0.1000 72734.7848
##      7 29651510.7507             nan     0.1000 47281.6226
##      8 29261888.2702             nan     0.1000 18203.8168
##      9 29162364.5705             nan     0.1000 24241.8382
##     10 28853157.1070             nan     0.1000 55152.1681
##     20 27783364.9341             nan     0.1000 30939.9048
##     40 26359180.1108             nan     0.1000 -16593.1322
##     60 24968393.0637             nan     0.1000 -3779.2909
##     80 24114398.6154             nan     0.1000 -36330.1719
##    100 23622045.4645             nan     0.1000 -71817.2338
##    120 23165608.1886             nan     0.1000 -14343.6940
##    140 22798477.7154             nan     0.1000 -52588.9827
##    150 22575297.5629             nan     0.1000 -30203.5940

## Iter   TrainDeviance   ValidDeviance   StepSize   Improve
##      1 30623331.2585             nan     0.1000 143080.2943
##      2 30454821.3524             nan     0.1000 67779.4230
##      3 29930279.3303             nan     0.1000 32900.6666
##      4 29583935.3764             nan     0.1000 -27414.8590
##      5 29445318.1875             nan     0.1000 89072.4408
##      6 29119560.5139             nan     0.1000 2839.4824
##      7 28815665.1505             nan     0.1000 15752.9229
##      8 28454073.6783             nan     0.1000 -5193.2734
##      9 28186914.3990             nan     0.1000 44187.4510
##     10 28037149.7492             nan     0.1000 -3828.2462
##     20 26430700.1916             nan     0.1000 -29959.3460
##     40 24393530.2007             nan     0.1000 -94463.4543
##     60 23222804.8493             nan     0.1000 -64121.8405
##     80 22289175.3609             nan     0.1000 -18912.6286
##    100 21648543.3408             nan     0.1000 -41755.5889
##    120 20995226.0975             nan     0.1000 -69224.5666
##    140 20140285.4790             nan     0.1000 -107209.6875
##    150 19712030.1117             nan     0.1000 -47758.2194

## Iter   TrainDeviance   ValidDeviance   StepSize   Improve
##      1 34054749.0627             nan     0.1000 55853.4023
##      2 33984388.0580             nan     0.1000 44933.6927
##      3 33889505.4221             nan     0.1000 15301.8419
##      4 33804572.2258             nan     0.1000 49990.7761
##      5 33726440.5477             nan     0.1000 36151.6104
##      6 33676369.4922             nan     0.1000 24280.3068
##      7 33594821.4502             nan     0.1000 40300.0645
##      8 33523705.0983             nan     0.1000 33636.5717
##      9 33447560.4104             nan     0.1000 10822.8620
##     10 33376972.5561             nan     0.1000 23904.4532
##     20 32854005.7861             nan     0.1000 -32214.8952
##     40 32351377.9957             nan     0.1000 1597.7000
##     60 32067529.2143             nan     0.1000 -30842.9899
##     80 31801186.6423             nan     0.1000 -1031.7182
##    100 31682550.3155             nan     0.1000 1401.6818
##    120 31575105.1736             nan     0.1000 -31299.0906
##    140 31386594.8317             nan     0.1000 29584.6099
##    150 31310562.0616             nan     0.1000 -22348.9562

## Iter   TrainDeviance   ValidDeviance   StepSize   Improve
##      1 34030350.5443             nan     0.1000 -28116.1996
##      2 33703495.3490             nan     0.1000 77296.5065
##      3 33500608.8208             nan     0.1000 85072.9102
##      4 33162166.9714             nan     0.1000 22336.5888
##      5 32997129.9958             nan     0.1000 42487.1908
##      6 32741609.6410             nan     0.1000 -10417.3359
##      7 32655648.5585             nan     0.1000 -1174.5096
##      8 32460228.3047             nan     0.1000 59644.6503
##      9 32206362.1889             nan     0.1000 -54610.8548
##     10 32079212.3803             nan     0.1000 6458.3804
##     20 30906454.6161             nan     0.1000 -10829.1176
##     40 29293382.4840             nan     0.1000 -56417.1533
##     60 28220600.5675             nan     0.1000 -29122.6479
##     80 27301795.8936             nan     0.1000 -24758.0899
##    100 26663732.4174             nan     0.1000 -58326.2607
##    120 26246854.7767             nan     0.1000 -31114.9156
##    140 25705428.9384             nan     0.1000 -137562.9005
##    150 25444559.2525             nan     0.1000 -20859.0501

## Iter   TrainDeviance   ValidDeviance   StepSize   Improve
##      1 33985999.3234             nan     0.1000 46270.5590
##      2 33753105.1758             nan     0.1000 66531.3274
##      3 33496765.5874             nan     0.1000 17416.7306
##      4 33400189.9782             nan     0.1000 26083.1259
##      5 33174616.3512             nan     0.1000 20256.4427
##      6 32968551.7236             nan     0.1000 53615.9867
##      7 32864944.4321             nan     0.1000 -30009.8999
##      8 32669362.8305             nan     0.1000 21845.5438
##      9 32584930.1020             nan     0.1000 9077.7721
##     10 32212592.5519             nan     0.1000 104539.9562
##     20 29883278.2422             nan     0.1000 -19885.8635
##     40 27252167.4540             nan     0.1000 -98771.0957
##     60 26103044.3488             nan     0.1000 -30661.5789
##     80 25180390.5232             nan     0.1000 -57060.8641
##    100 24404829.4473             nan     0.1000 -70163.9802
##    120 23589552.7478             nan     0.1000 -25752.4919
##    140 22745163.4650             nan     0.1000 -96207.2084
##    150 22517918.5695             nan     0.1000 -36817.5736

## Iter   TrainDeviance   ValidDeviance   StepSize   Improve
##      1 30419566.0074             nan     0.1000 37259.6557
##      2 30346448.9746             nan     0.1000 40213.6494
##      3 30251855.8699             nan     0.1000 45726.1742
##      4 30187009.9710             nan     0.1000 45728.2218
##      5 30111678.2665             nan     0.1000 61553.6041
##      6 30035862.2940             nan     0.1000 14870.1298
##      7 29998878.2434             nan     0.1000 -158.9341
##      8 29937511.8572             nan     0.1000 49689.2459
##      9 29904667.3883             nan     0.1000 13603.5112
##     10 29869229.4971             nan     0.1000 -4247.6697
##     20 29433405.9689             nan     0.1000 -343.2730
##     40 28887949.1899             nan     0.1000 -22252.5555
##     60 28611046.8965             nan     0.1000 -337.9091
##     80 28446284.4009             nan     0.1000 -37489.1241
##    100 28320089.3364             nan     0.1000 -5238.6498
##    120 28146778.2384             nan     0.1000 -8375.1433
##    140 28021572.2770             nan     0.1000 -29879.3715
##    150 27968117.8495             nan     0.1000 -15256.2617
bt_fit
## Stochastic Gradient Boosting 
## 
## 5898 samples
##   58 predictor
## 
## Pre-processing: centered (58), scaled (58) 
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 4718, 4718, 4718, 4720, 4718 
## Resampling results across tuning parameters:
## 
##   interaction.depth  n.trees  RMSE      Rsquared    MAE     
##   1                   50      5400.354  0.02485865  1846.406
##   1                  100      5397.992  0.02855140  1853.253
##   1                  150      5397.704  0.02939816  1854.367
##   2                   50      5458.574  0.01796376  1868.672
##   2                  100      5481.362  0.02020405  1900.717
##   2                  150      5521.553  0.01841250  1927.497
##   3                   50      5431.065  0.02670551  1857.143
##   3                  100      5471.017  0.02565600  1903.657
##   3                  150      5512.787  0.02309192  1946.188
## 
## Tuning parameter 'shrinkage' was held constant at a value of 0.1
## Tuning parameter 'n.minobsinnode' was held
##  constant at a value of 10
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were n.trees = 150, interaction.depth = 1, shrinkage = 0.1 and n.minobsinnode
##  = 10.
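
The selection rule stated in the summary above ("RMSE was used to select the optimal model using the smallest value") can be checked by hand. The sketch below copies the cross-validated RMSE values from the printed table into a data frame and picks the row with the smallest value. The names `cv_results` and `best_row` are illustrative and not part of the original analysis. On a fitted caret object the same information should be available through `bt_fit$results` and `bt_fit$bestTune`.

```r
# Cross-validation results copied from the printed bt_fit summary
# (cv_results and best_row are illustrative names, not from the original code)
cv_results <- data.frame(
  interaction.depth = rep(1:3, each = 3),
  n.trees           = rep(c(50, 100, 150), times = 3),
  RMSE              = c(5400.354, 5397.992, 5397.704,
                        5458.574, 5481.362, 5521.553,
                        5431.065, 5471.017, 5512.787)
)

# Select the tuning combination with the smallest cross-validated RMSE
best_row <- cv_results[which.min(cv_results$RMSE), ]
print(best_row)
```

The selected row matches the final values reported by caret (n.trees = 150, interaction.depth = 1). The RMSE values across the grid are close to one another, so the choice of tuning parameters makes only a small difference here.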

Comparison - Jordan

Finally, let’s compare our four models (two linear models, one random forest, and one boosted tree) by their performance on the test set.

# random forest predictions on the test set and their performance
predRF <- predict(ranfor, newdata = Testing)
RF <- postResample(predRF, Testing$shares)

# linear model 1 predictions on the test set and their performance
predlm1 <- predict(fit1, newdata = Testing)
LM <- postResample(predlm1, Testing$shares)

# linear model 2 predictions on the test set and their performance
predlm2 <- predict(lm_fit, newdata = Testing)
LM2 <- postResample(predlm2, Testing$shares)

# boosted tree predictions on the test set and their performance
predbt <- predict(bt_fit, newdata = Testing)
BT <- postResample(predbt, Testing$shares)

# combine each of the performance stats for the models and add a column with the model names
dat <- data.frame(rbind(t(data.frame(LM)), t(data.frame(RF)), t(data.frame(LM2)), t(data.frame(BT))))
df <- as_tibble(rownames_to_column(dat, "models"))

# find the model with the lowest RMSE
best <- df %>% filter(RMSE == min(RMSE)) %>% select(models)

# print "The Best fitting model according to RMSE is [insert model name for lowest RMSE here]"
paste("The Best fitting model according to RMSE is", best$models, sep = " ")
## [1] "The Best fitting model according to RMSE is RF"
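
For readers unfamiliar with `postResample()`, the three statistics it reports for a numeric outcome (RMSE, Rsquared, MAE) can be reproduced in base R. The sketch below is a minimal illustration. The helper `regression_metrics` and the toy vectors are hypothetical and are not taken from the news data. It assumes Rsquared is computed as the squared correlation between predictions and observations, which is how caret documents it.

```r
# Hypothetical helper mirroring the three statistics reported by caret::postResample()
regression_metrics <- function(pred, obs) {
  c(
    RMSE     = sqrt(mean((pred - obs)^2)),  # root mean squared error
    Rsquared = cor(pred, obs)^2,            # squared correlation of predictions and observations
    MAE      = mean(abs(pred - obs))        # mean absolute error
  )
}

# Toy values for illustration only; these are not from the news data
obs  <- c(100, 200, 300, 400)
pred <- c(110, 190, 320, 380)

toy_metrics <- regression_metrics(pred, obs)
print(toy_metrics)
```

Because RMSE squares each error before averaging, a few articles with very large share counts can dominate it. MAE is less sensitive to such outliers, so the two metrics can rank the models differently.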

Automation - Jonathan

#rmarkdown::render(
#  "Tanley-Wood-Project2.Rmd",
#  output_format="github_document",
#  output_dir="./Analysis",
#  output_options = list(
#    html_preview = FALSE
#  )
#)