Tanley-Wood-Project2

Jordan Tanley and Jonathan Wood 2022-07-05

Introduction - Jonathan

Data

The data in this analysis is the Online News Popularity dataset, which contains a set of features describing articles published by Mashable.com over a two-year period.

The goal of this project is to predict the number of shares an article receives (how many times it was shared over social media). We will use the number of shares as a measure of how popular an article is.

Notable Variables

While there are 61 variables in the data set, we will not use all of them for this project. The notable variables are the following:

Methods

Multiple methods will be used for this project to predict the number of shares a new article can generate, including

Data - Jordan

The data is read in with a relative path, so be sure that the OnlineNewsPopularity folder containing OnlineNewsPopularity.csv is saved in your working directory.

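# load the packages used throughout this report
library(tidyverse)  # read_csv(), dplyr verbs, and ggplot2
library(knitr)      # kable()
library(caret)      # train() and trainControl()

# read in the data
news <- read_csv("OnlineNewsPopularity/OnlineNewsPopularity.csv")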
## Rows: 39644 Columns: 61
## ── Column specification ─────────────────────────────────────────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (1): url
## dbl (60): timedelta, n_tokens_title, n_tokens_content, n_unique_tokens, n_non_stop_words, n_non_stop_unique_token...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# sneak peek at the dataset
head(news)
# Creating a weekday variable (basically undoing the 7 dummy variables that came with the data) for EDA
news$weekday <- ifelse(news$weekday_is_friday == 1, "Friday",
                       ifelse(news$weekday_is_monday == 1, "Monday",
                              ifelse(news$weekday_is_tuesday == 1, "Tuesday",
                                     ifelse(news$weekday_is_wednesday == 1, "Wednesday",
                                            ifelse(news$weekday_is_thursday == 1, "Thursday",
                                                   ifelse(news$weekday_is_saturday == 1, "Saturday", 
                                                          "Sunday"))))))
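The nested ifelse() above reverses the one-hot encoding of the seven weekday_is_* dummies. A more compact alternative is base R's max.col(), which returns, for each row, the column position of the largest value (here, the single 1). The sketch below runs on a tiny hypothetical data frame, not the news data. Like the nested ifelse(), it assumes exactly one dummy equals 1 in each row; with ties.method = "first", a row of all zeros would silently map to the first listed day, so that case would need checking on real data.

```r
# Tiny hypothetical example: three articles, one-hot encoded by weekday
day_labels <- c("Monday", "Tuesday", "Wednesday", "Thursday",
                "Friday", "Saturday", "Sunday")
dummy_cols <- paste0("weekday_is_", tolower(day_labels))

dummies <- data.frame(
  weekday_is_monday    = c(1, 0, 0),
  weekday_is_tuesday   = c(0, 0, 0),
  weekday_is_wednesday = c(0, 1, 0),
  weekday_is_thursday  = c(0, 0, 0),
  weekday_is_friday    = c(0, 0, 0),
  weekday_is_saturday  = c(0, 0, 0),
  weekday_is_sunday    = c(0, 0, 1)
)

# position of the 1 in each row, with columns taken in Monday-to-Sunday order
idx <- max.col(as.matrix(dummies[dummy_cols]), ties.method = "first")

# map each position back to its day name
weekday <- day_labels[idx]
print(weekday)   # expected: Monday, Wednesday, Sunday
```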

Next, let’s subset the data so that we only look at the data channel of interest. The channel is set by the channel parameter of this report (params$channel); in this version it is the entertainment channel, as the printed column name below shows.

# Subset the data to one of the parameterized data channels and drop unnecessary variables
chan <- paste0("data_channel_is_", params$channel)

print(chan)
## [1] "data_channel_is_entertainment"
filtered_channel <- news %>% 
                as_tibble() %>% 
                filter(.data[[chan]] == 1) %>%   # look up the channel column by its name
                select(-c(url, timedelta))

# take a peek at the channel column to confirm the subset (every value should be 1)
filtered_channel %>%
  select(ends_with(chan))
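Because the channel column is named at run time from params$channel, the filter has to look the column up by a string. The same subset can be written in base R with [[ ]], as in this sketch on a hypothetical four-row data frame (the channel value is hard-coded here in place of params$channel).

```r
# Hypothetical toy data standing in for the news data
news_toy <- data.frame(
  url = c("a", "b", "c", "d"),
  timedelta = c(10, 20, 30, 40),
  data_channel_is_entertainment = c(1, 0, 1, 0),
  shares = c(500, 1200, 800, 3000)
)

channel <- "entertainment"                   # stands in for params$channel
chan <- paste0("data_channel_is_", channel)

# look the column up by name, keep matching rows, and drop url and timedelta
keep_cols <- setdiff(names(news_toy), c("url", "timedelta"))
subset_toy <- news_toy[news_toy[[chan]] == 1, keep_cols]
print(subset_toy)
```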

Summarizations - Both (at least 3 plots each)

For the numerical summaries, we can look at several aspects. Contingency tables show the frequencies of categorical variables: the first table below, for example, gives the number of articles published on each weekday, and the fifth table gives the frequencies of the number of tokens (words) in the article content. Another useful set of statistics is the five-number summary: the minimum, first quartile (Q1), median, third quartile (Q3), and maximum of a variable. We also report the mean. Comparing the mean with the median indicates skewness (roughly symmetric if they are close, right-skewed if the mean is larger, left-skewed if it is smaller), and the quartiles help flag outliers: by the usual 1.5 × IQR rule, a value below Q1 - 1.5(Q3 - Q1) or above Q3 + 1.5(Q3 - Q1) is generally considered a potential outlier. Below, the five-number summaries (plus mean) are shown for shares, the number of words in the content, the number of words in the content for articles in the top quartile of shares, the number of images, the number of videos, the rate of positive words, and the rate of negative words.
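The statistics described above can be wrapped in a small helper. This sketch is a hypothetical function, not part of the original analysis; it computes the five-number summary plus the mean and the 1.5 × IQR outlier fences, and is demonstrated on a toy vector rather than the news data. quantile() uses R's default (type 7) interpolation, so quartiles may differ slightly from hand calculations that use another convention.

```r
# Hypothetical helper: five-number summary, mean, and 1.5 * IQR outlier fences
summary_with_fences <- function(x) {
  q <- quantile(x, probs = c(0.25, 0.75), names = FALSE)  # Q1 and Q3
  iqr <- q[2] - q[1]                                       # interquartile range
  c(Minimum = min(x), Q1 = q[1], Average = mean(x), Median = median(x),
    Q3 = q[2], Maximum = max(x),
    LowerFence = q[1] - 1.5 * iqr, UpperFence = q[2] + 1.5 * iqr)
}

# Toy data (not from the news dataset) with one large value
x <- c(2, 4, 4, 5, 6, 7, 8, 40)
s <- summary_with_fences(x)
print(s)

# Mean (9.5) above median (5.5) points to right skew;
# values outside the fences are flagged as potential outliers
outliers <- x[x < s[["LowerFence"]] | x > s[["UpperFence"]]]
print(outliers)   # 40
```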

# Contingency table of frequencies for days of the week, added caption for clarity
kable(table(filtered_channel$weekday), 
      col.names = c("Weekday", "Frequency"), 
      caption = "Contingency table of frequencies for days of the week")
Weekday Frequency
Friday 972
Monday 1358
Saturday 380
Sunday 536
Thursday 1231
Tuesday 1285
Wednesday 1295

Contingency table of frequencies for days of the week

# Numerical Summary of Shares, added caption for clarity
filtered_channel %>% summarise(Minimum = min(shares), 
                          Q1 = quantile(shares, prob = 0.25), 
                          Average = mean(shares), 
                          Median = median(shares), 
                          Q3 = quantile(shares, prob = 0.75), 
                          Maximum = max(shares)) %>% 
                kable(caption = "Numerical Summary of Shares")
Minimum Q1 Average Median Q3 Maximum
47 833 2970.487 1200 2100 210300

Numerical Summary of Shares

# Numerical Summary of Number of words in the content, added caption for clarity
filtered_channel %>% summarise(Minimum = min(n_tokens_content), 
                          Q1 = quantile(n_tokens_content, prob = 0.25), 
                          Average = mean(n_tokens_content), 
                          Median = median(n_tokens_content), 
                          Q3 = quantile(n_tokens_content, prob = 0.75), 
                          Maximum = max(n_tokens_content)) %>% 
                kable(caption = "Numerical Summary of Number of words in the content")
Minimum Q1 Average Median Q3 Maximum
0 255 607.4574 433 805 6505

Numerical Summary of Number of words in the content

# Numerical Summary of Number of words in the content for the upper quantile of Shares, added caption for clarity
filtered_channel %>% filter(shares > quantile(shares, prob = 0.75)) %>%
                summarise(Minimum = min(n_tokens_content), 
                          Q1 = quantile(n_tokens_content, prob = 0.25), 
                          Average = mean(n_tokens_content), 
                          Median = median(n_tokens_content), 
                          Q3 = quantile(n_tokens_content, prob = 0.75), 
                          Maximum = max(n_tokens_content)) %>% 
                kable(caption = "Numerical Summary of Number of words in the content for the upper quantile of Shares")
Minimum Q1 Average Median Q3 Maximum
0 238 601.5838 410 809 6159

Numerical Summary of Number of words in the content for the upper quantile of Shares

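# Contingency table of frequencies for number of tokens in the article content, added caption for clarity
kable(table(filtered_channel$n_tokens_content),
  col.names = c("Tokens", "Frequency"), 
  caption = "Contingency table of frequencies for number of tokens in the article content")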
Tokens Frequency
0 201
31 1
43 1
51 1
53 1
54 2
55 1
58 1
66 2
69 1
70 1
73 3
74 1
75 1
76 2
77 1
78 2
79 1
80 1
81 3
82 2
83 2
84 1
86 2
87 3
88 2
90 2
91 1
92 3
93 6
94 1
95 4
96 2
97 2
98 3
99 1
100 1
101 1
102 2
103 2
104 3
105 5
106 4
107 5
108 1
109 8
110 5
111 4
112 4
113 5
114 2
115 4
116 2
117 6
118 7
119 5
120 3
121 1
122 6
123 9
124 6
125 4
126 9
127 4
128 8
129 10
130 7
131 4
132 11
133 11
134 6
135 7
136 12
137 5
138 8
139 7
140 8
141 12
142 17
143 11
144 14
145 10
146 11
147 6
148 10
149 6
150 3
151 9
152 9
153 7
154 9
155 5
156 8
157 12
158 11
159 10
160 10
161 11
162 9
163 6
164 12
165 11
166 12
167 10
168 12
169 9
170 11
171 7
172 15
173 12
174 11
175 14
176 11
177 17
178 16
179 13
180 12
181 11
182 8
183 5
184 12
185 15
186 13
187 8
188 6
189 11
190 14
191 9
192 11
193 14
194 15
195 12
196 11
197 18
198 18
199 13
200 11
201 13
202 11
203 14
204 6
205 9
206 8
207 17
208 9
209 11
210 13
211 18
212 12
213 7
214 17
215 9
216 8
217 12
218 16
219 12
220 13
221 11
222 16
223 10
224 10
225 12
226 9
227 15
228 8
229 8
230 17
231 12
232 15
233 8
234 14
235 12
236 10
237 6
238 10
239 11
240 10
241 14
242 16
243 7
244 12
245 9
246 22
247 9
248 12
249 9
250 11
251 8
252 11
253 9
254 11
255 9
256 8
257 7
258 15
259 12
260 11
261 13
262 11
263 14
264 13
265 11
266 10
267 16
268 13
269 16
270 12
271 12
272 10
273 13
274 17
275 13
276 11
277 16
278 18
279 14
280 7
281 12
282 16
283 17
284 11
285 11
286 18
287 12
288 15
289 11
290 11
291 10
292 13
293 12
294 17
295 7
296 11
297 10
298 10
299 7
300 14
301 11
302 14
303 7
304 11
305 9
306 14
307 13
308 17
309 13
310 14
311 13
312 13
313 6
314 13
315 12
316 9
317 11
318 8
319 4
320 12
321 7
322 13
323 12
324 14
325 3
326 12
327 15
328 10
329 10
330 4
331 13
332 11
333 13
334 11
335 14
336 13
337 11
338 8
339 15
340 10
341 8
342 9
343 9
344 14
345 13
346 11
347 9
348 11
349 10
350 13
351 9
352 10
353 9
354 17
355 7
356 14
357 8
358 6
359 12
360 5
361 9
362 8
363 7
364 8
365 15
366 5
367 5
368 9
369 13
370 3
371 6
372 4
373 7
374 7
375 12
376 10
377 9
378 8
379 8
380 13
381 4
382 12
383 5
384 8
385 8
386 10
387 7
388 10
389 9
390 5
391 13
392 7
393 8
394 9
395 11
396 10
397 4
398 5
399 11
400 5
401 4
402 4
403 6
404 5
405 6
406 6
407 5
408 6
409 11
410 12
411 7
412 7
413 7
414 10
415 9
416 6
417 2
418 8
419 6
420 10
421 5
422 8
423 7
424 10
425 10
426 5
427 7
428 9
429 6
430 8
431 2
432 3
433 9
434 6
435 10
436 12
437 12
438 6
439 5
440 6
441 7
442 8
443 6
444 11
445 8
446 6
447 7
448 2
449 4
450 2
451 6
452 10
453 11
454 6
455 7
456 11
457 4
458 5
459 9
460 9
461 11
462 7
464 3
465 7
466 3
467 6
468 6
469 11
470 5
471 8
472 5
473 4
474 9
475 8
476 7
477 6
478 5
479 7
480 9
481 6
482 9
483 5
484 3
485 8
486 3
487 8
488 7
489 6
490 7
491 3
492 3
493 7
494 5
495 9
496 3
497 6
498 8
499 2
500 2
501 8
502 2
503 9
504 5
505 10
506 6
507 7
508 6
509 4
510 6
511 8
512 3
513 4
514 4
515 6
516 9
517 8
518 9
519 6
520 6
521 10
522 3
523 3
524 4
525 4
526 8
527 8
528 3
529 4
530 6
531 8
532 4
533 4
534 7
535 8
536 6
537 6
538 2
539 6
540 8
541 2
542 7
543 3
544 6
545 6
546 3
547 5
548 3
549 5
550 5
551 1
552 3
553 6
554 9
555 5
556 6
557 10
558 3
559 5
560 6
561 5
562 5
563 5
564 3
565 4
566 7
567 6
568 2
569 5
570 2
571 4
572 8
573 5
574 2
575 4
576 5
577 2
578 3
579 4
580 3
581 6
582 6
583 3
584 5
585 4
586 4
587 4
588 3
589 3
590 4
591 6
592 6
593 7
594 14
595 3
596 1
597 5
598 1
599 7
600 3
601 6
602 4
603 1
604 9
605 8
606 5
607 2
608 5
609 4
610 3
611 3
612 3
613 4
614 1
615 7
616 6
617 9
618 5
619 4
620 3
621 6
622 6
623 5
624 9
625 3
626 2
627 2
628 2
629 4
630 5
631 2
632 6
633 5
634 6
635 5
636 2
637 3
638 4
639 7
640 8
641 5
642 8
643 7
644 1
645 3
646 2
647 9
648 7
649 7
650 6
651 2
652 4
653 4
654 2
655 6
656 3
657 4
658 5
659 7
660 7
661 4
662 4
664 4
665 3
666 3
667 2
668 6
669 5
670 5
671 3
672 3
673 4
674 3
675 7
676 1
677 4
678 4
679 3
680 6
681 6
682 6
683 3
684 5
685 5
686 1
687 1
688 2
689 4
690 1
691 2
692 1
693 2
694 7
695 4
696 2
697 2
698 3
699 3
700 1
701 3
702 5
703 4
704 5
705 5
706 7
707 4
708 4
709 1
710 6
711 5
712 1
713 2
714 7
715 1
716 4
717 4
719 8
720 8
721 2
723 2
725 5
726 2
727 7
728 3
729 3
730 5
731 2
732 4
733 3
734 5
735 6
736 7
737 4
738 5
739 3
740 4
741 4
742 1
743 4
744 6
745 4
746 5
747 4
748 5
749 9
750 3
751 1
752 7
753 4
754 3
755 6
756 3
757 3
758 6
759 1
760 5
761 3
762 4
763 7
764 3
765 2
766 4
767 3
768 3
769 2
770 5
771 5
772 3
773 4
774 3
775 1
776 1
777 5
778 3
779 8
780 3
781 3
782 5
783 3
784 3
785 1
786 2
787 4
788 3
789 2
790 4
791 2
792 5
793 2
794 1
795 5
796 5
797 2
798 5
799 3
800 5
801 9
802 6
803 2
804 1
805 2
806 1
807 1
808 5
809 3
810 2
811 2
812 1
813 1
814 5
815 1
816 2
817 1
818 2
819 2
820 3
822 2
823 2
824 2
825 2
826 3
827 3
828 1
829 3
830 5
831 1
832 3
833 3
834 4
835 4
836 4
838 5
839 6
840 1
841 3
842 4
843 1
844 7
845 1
846 3
847 2
848 3
849 2
850 1
851 1
852 3
853 4
854 2
855 7
856 4
857 4
859 3
862 3
863 4
864 4
865 5
866 1
867 5
869 1
870 1
871 2
872 2
874 6
875 1
876 1
877 1
879 4
880 4
881 1
882 1
883 4
884 4
885 4
886 1
887 2
888 2
889 4
890 3
891 4
892 6
893 5
894 3
895 4
896 4
897 3
898 1
901 3
902 2
904 2
905 4
906 6
907 4
909 2
910 3
911 2
912 3
913 4
914 4
915 2
916 5
917 5
918 2
919 1
920 4
921 3
922 5
923 2
924 2
925 1
926 3
927 2
928 1
929 4
930 2
931 2
932 6
934 1
935 2
936 2
937 2
938 3
939 5
940 4
941 1
942 3
943 2
944 4
945 4
946 3
947 6
948 1
949 2
950 2
951 3
952 4
955 2
956 4
957 2
959 3
960 4
961 4
962 3
963 4
964 2
965 2
966 2
967 4
968 2
969 3
970 7
971 3
972 5
973 2
974 1
975 1
976 1
977 1
978 3
979 3
980 2
981 4
983 4
985 3
986 2
987 2
989 2
990 5
991 4
992 2
993 3
994 2
995 3
996 1
997 1
998 5
999 2
1000 3
1002 3
1003 5
1004 4
1005 2
1007 1
1008 3
1009 4
1012 3
1013 3
1014 1
1015 6
1016 1
1018 1
1019 4
1020 1
1021 1
1022 1
1023 3
1024 2
1025 2
1026 2
1027 4
1029 4
1030 2
1031 4
1033 2
1034 2
1035 3
1036 2
1037 4
1038 3
1039 2
1040 3
1041 1
1042 4
1043 1
1044 1
1045 3
1046 5
1047 3
1048 3
1049 5
1050 2
1051 1
1052 3
1053 6
1054 1
1055 1
1056 6
1057 4
1058 1
1059 3
1060 2
1061 2
1062 1
1063 4
1064 1
1065 2
1066 3
1067 1
1068 1
1069 2
1070 3
1071 2
1072 2
1074 3
1075 1
1076 2
1077 3
1078 3
1079 1
1080 3
1081 1
1082 3
1083 3
1084 1
1085 1
1086 2
1087 1
1088 4
1089 1
1090 1
1091 4
1092 4
1093 2
1095 1
1096 5
1097 3
1098 2
1099 1
1100 1
1101 1
1103 3
1104 2
1105 3
1106 2
1107 3
1108 5
1109 5
1110 2
1111 2
1113 4
1114 3
1115 2
1117 2
1120 2
1123 1
1124 4
1125 2
1126 1
1127 2
1128 2
1129 1
1130 4
1132 2
1133 2
1135 2
1137 1
1138 1
1139 3
1140 1
1141 1
1142 2
1144 2
1145 1
1146 4
1147 1
1148 2
1149 2
1150 1
1151 1
1152 1
1153 2
1154 1
1155 4
1157 2
1158 1
1159 2
1160 2
1163 4
1165 4
1166 2
1167 1
1168 2
1169 2
1170 3
1171 2
1172 1
1173 1
1174 2
1175 3
1176 3
1177 1
1179 3
1181 1
1182 1
1183 2
1184 1
1185 2
1186 2
1187 1
1188 1
1189 3
1190 1
1191 2
1192 2
1193 1
1194 1
1195 2
1196 1
1197 2
1198 1
1199 2
1201 2
1202 1
1203 3
1204 2
1205 2
1207 2
1208 3
1209 1
1210 1
1211 4
1212 1
1213 1
1214 1
1215 3
1216 2
1217 3
1218 2
1220 2
1222 1
1223 1
1224 3
1225 3
1226 1
1227 2
1229 3
1231 3
1232 1
1233 2
1234 1
1235 1
1237 1
1238 1
1240 3
1241 1
1243 2
1244 4
1245 2
1247 2
1248 2
1249 1
1250 5
1251 1
1252 2
1253 2
1255 2
1256 3
1257 2
1259 2
1261 1
1262 1
1263 2
1264 1
1265 2
1267 2
1268 1
1269 2
1270 1
1271 1
1272 2
1273 2
1274 1
1275 2
1276 1
1277 3
1278 2
1280 3
1281 2
1282 1
1283 1
1284 2
1285 4
1286 1
1287 1
1288 2
1289 1
1290 1
1292 1
1293 1
1294 1
1295 3
1296 4
1297 4
1298 3
1302 2
1304 1
1305 2
1306 2
1308 3
1309 2
1310 3
1311 1
1312 1
1313 2
1314 1
1316 2
1317 1
1318 1
1319 1
1320 2
1321 1
1322 1
1324 1
1325 2
1326 1
1327 1
1328 1
1330 3
1331 3
1332 4
1334 4
1335 1
1336 3
1337 1
1338 4
1339 1
1340 3
1341 1
1342 4
1343 1
1345 1
1346 1
1347 1
1348 1
1349 1
1350 1
1351 3
1352 1
1353 1
1355 2
1357 1
1358 1
1361 1
1363 1
1364 1
1365 2
1367 1
1368 1
1369 3
1371 1
1372 3
1374 2
1375 3
1376 2
1377 2
1379 2
1381 1
1382 1
1383 1
1384 2
1385 2
1386 1
1387 3
1388 1
1390 3
1393 2
1396 2
1398 1
1399 1
1401 2
1404 2
1405 2
1406 1
1407 1
1408 1
1409 2
1410 1
1411 3
1412 2
1414 2
1415 2
1416 1
1422 1
1423 1
1424 1
1425 1
1426 1
1427 2
1429 1
1430 2
1432 1
1433 4
1434 1
1436 2
1438 1
1439 2
1440 3
1442 1
1443 2
1444 1
1445 4
1446 1
1447 2
1448 3
1450 3
1452 1
1453 1
1454 1
1456 2
1457 2
1458 2
1459 2
1460 2
1461 1
1462 1
1463 1
1464 1
1465 1
1467 1
1468 1
1469 2
1470 1
1471 1
1473 1
1477 1
1478 1
1479 2
1480 1
1481 2
1482 2
1485 2
1486 2
1487 2
1489 4
1491 4
1492 1
1493 1
1494 1
1496 1
1497 1
1499 5
1500 1
1501 1
1503 1
1504 1
1506 1
1507 1
1508 1
1511 3
1513 1
1514 2
1515 1
1516 1
1517 1
1518 2
1519 1
1520 1
1524 2
1525 2
1527 2
1528 1
1530 1
1531 1
1534 3
1536 1
1538 1
1541 1
1543 1
1545 1
1546 1
1549 2
1551 1
1554 2
1555 3
1556 2
1557 1
1558 2
1560 1
1562 2
1563 2
1564 1
1566 3
1567 2
1570 2
1571 2
1572 1
1573 1
1575 2
1576 2
1580 2
1581 1
1585 1
1586 1
1593 1
1597 1
1599 2
1602 1
1603 3
1604 1
1605 1
1608 1
1610 2
1612 1
1613 2
1615 1
1616 1
1617 3
1618 2
1622 1
1623 2
1625 1
1626 1
1628 1
1629 2
1630 1
1631 1
1633 2
1637 1
1642 2
1643 1
1644 4
1646 1
1648 1
1651 1
1653 1
1654 1
1655 1
1656 1
1659 1
1662 1
1665 1
1666 1
1668 2
1672 2
1673 1
1674 2
1676 2
1677 1
1678 1
1679 3
1682 1
1683 2
1684 1
1686 2
1687 1
1691 1
1692 2
1693 1
1694 1
1698 1
1701 1
1702 1
1703 1
1707 1
1709 1
1711 1
1712 1
1715 2
1716 1
1718 2
1722 1
1723 1
1725 1
1726 1
1730 1
1731 1
1735 1
1736 2
1739 1
1740 1
1742 2
1746 1
1752 1
1754 1
1759 1
1760 2
1764 1
1769 1
1770 1
1771 1
1776 2
1777 1
1783 3
1784 1
1785 2
1790 1
1792 2
1793 1
1803 1
1804 2
1807 1
1811 1
1816 1
1817 1
1819 1
1820 1
1824 1
1825 1
1830 1
1831 2
1832 1
1834 1
1839 2
1842 1
1843 2
1845 2
1846 1
1848 1
1849 1
1850 1
1852 2
1857 1
1861 1
1868 1
1872 1
1873 1
1874 1
1876 1
1878 1
1879 1
1880 1
1881 1
1884 2
1885 1
1887 2
1889 1
1892 1
1895 1
1896 2
1898 2
1900 1
1901 1
1912 1
1916 1
1920 2
1923 1
1929 1
1934 2
1938 1
1939 1
1945 2
1947 1
1949 1
1951 1
1952 1
1955 2
1956 1
1957 1
1960 1
1961 1
1963 1
1964 1
1965 1
1966 1
1970 1
1974 2
1977 1
1978 1
1989 1
1993 1
1997 1
1998 1
2002 1
2003 1
2009 1
2010 1
2011 1
2013 1
2016 1
2018 1
2021 1
2028 1
2029 1
2030 1
2032 1
2033 1
2036 1
2037 1
2038 1
2039 1
2040 1
2045 1
2049 1
2061 2
2063 1
2064 1
2066 1
2068 1
2072 1
2075 1
2083 1
2084 3
2086 1
2088 1
2089 1
2090 1
2095 1
2100 1
2102 1
2104 1
2105 1
2113 2
2118 1
2119 1
2122 1
2123 1
2141 1
2148 1
2153 1
2161 1
2166 1
2167 1
2170 1
2173 1
2178 1
2181 1
2182 1
2184 2
2188 2
2191 1
2201 1
2204 1
2208 1
2210 1
2216 1
2223 1
2224 1
2239 1
2254 1
2257 1
2276 1
2278 1
2290 1
2299 1
2314 2
2325 2
2335 1
2338 1
2343 1
2349 1
2360 1
2364 1
2365 1
2369 1
2371 1
2375 1
2377 1
2384 1
2386 1
2391 1
2419 1
2420 1
2424 1
2428 1
2436 2
2439 1
2452 1
2461 1
2486 1
2490 1
2505 1
2517 1
2521 1
2523 1
2540 1
2541 1
2565 1
2572 1
2580 1
2593 1
2610 1
2632 1
2636 1
2654 1
2668 1
2674 1
2679 1
2716 1
2718 1
2731 1
2735 1
2778 1
2795 1
2799 1
2808 1
2814 1
2828 1
2845 1
2849 1
2893 1
2917 1
2936 1
2942 1
2946 1
2963 1
2973 1
3019 1
3174 1
3342 1
3349 1
3359 1
3395 1
3428 1
3550 1
3652 1
3702 1
3736 1
3751 1
3815 1
4046 1
4306 1
4514 1
4574 1
5194 1
5553 1
5562 1
6159 1
6505 1

Contingency table of frequencies for number of tokens in the article content

# Summarizing the number of images in the article
filtered_channel %>% 
  summarise(Minimum = min(num_imgs), 
      Q1 = quantile(num_imgs, prob = 0.25), 
      Average = mean(num_imgs), 
      Median = median(num_imgs), 
      Q3 = quantile(num_imgs, prob = 0.75), 
      Maximum = max(num_imgs)) %>% 
  kable(caption = "Numerical summary of number of images in an article")
Minimum Q1 Average Median Q3 Maximum
0 1 6.317699 1 8 128

Numerical summary of number of images in an article

# Summarizing the number of videos in the article
filtered_channel %>% 
  summarise(Minimum = min(num_videos), 
      Q1 = quantile(num_videos, prob = 0.25), 
      Average = mean(num_videos), 
      Median = median(num_videos), 
      Q3 = quantile(num_videos, prob = 0.75), 
      Maximum = max(num_videos)) %>% 
  kable(caption = "Numerical summary of number of videos in an article")
Minimum Q1 Average Median Q3 Maximum
0 0 2.545841 1 1 74

Numerical summary of number of videos in an article

# Summarizing the rate of positive words in the article
filtered_channel %>% 
  summarise(Minimum = min(rate_positive_words), 
      Q1 = quantile(rate_positive_words, prob = 0.25), 
      Average = mean(rate_positive_words), 
      Median = median(rate_positive_words), 
      Q3 = quantile(rate_positive_words, prob = 0.75), 
      Maximum = max(rate_positive_words)) %>% 
  kable(caption = "Numerical Summary of the rate of positive words in an article")
Minimum Q1 Average Median Q3 Maximum
0 0.5789474 0.6663317 0.6875 0.7843137 1

Numerical Summary of the rate of positive words in an article

# Summarizing the rate of negative words in the article
filtered_channel %>% 
  summarise(Minimum = min(rate_negative_words), 
      Q1 = quantile(rate_negative_words, prob = 0.25), 
      Average = mean(rate_negative_words), 
      Median = median(rate_negative_words), 
      Q3 = quantile(rate_negative_words, prob = 0.75), 
      Maximum = max(rate_negative_words)) %>% 
  kable(caption = "Numerical Summary of the rate of negative words in an article")
Minimum Q1 Average Median Q3 Maximum
0 0.2 0.3050442 0.3 0.4038462 1

Numerical Summary of the rate of negative words in an article

The graphical summaries show trends in the data, including skewness and outliers, more vividly than the tables. The boxplot below is a visual version of the five-number summary of shares for each weekday; boxplots make outliers easy to spot as the individual points plotted beyond the whiskers. Next, scatterplots plot one numerical variable against another to reveal any association between them: points that cluster along a diagonal line suggest a linear relationship. The five scatterplots below show shares against the number of words in the content, the number of words in the title, the rate of positive words, the rate of negative words, and the global sentiment polarity. In addition, a histogram shows the overall distribution of a numerical variable. The histogram below shows the distribution of shares; a long left or right tail signals skewness, and multiple peaks suggest a multimodal variable.
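Whether the points in a scatterplot lie along a diagonal can be summarized by a correlation coefficient. The sketch below uses simulated data, not the news data: Pearson's r from base R's cor() is close to 1 for a strong positive linear trend and close to 0 when there is no relationship. Because shares has a very long right tail (a maximum of 210300 against a median of 1200), a rank-based Spearman correlation may be a more robust summary; on the real data the analogous call would be something like cor(filtered_channel$n_tokens_content, filtered_channel$shares, method = "spearman"), which is not run here.

```r
set.seed(1)
n <- 200
x <- runif(n, min = 0, max = 10)
y_linear <- 3 * x + rnorm(n)   # points fall close to a diagonal line
y_none   <- rnorm(n)           # no relationship with x

r_linear   <- cor(x, y_linear)                       # Pearson correlation, near 1
r_none     <- cor(x, y_none)                         # Pearson correlation, near 0
rho_linear <- cor(x, y_linear, method = "spearman")  # rank-based alternative

print(round(c(r_linear = r_linear, r_none = r_none, rho_linear = rho_linear), 3))
```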

# Boxplot of Shares for Each Weekday, colored gray with classic theme, added labels and title
ggplot(filtered_channel, aes(x = weekday, y = shares)) + 
          geom_boxplot(fill = "grey") + 
          labs(x = "Weekday", title = "Boxplot of Shares for Each Weekday", y = "Shares") + 
          theme_classic()

# Scatterplot of Number of words in the content vs Shares, colored gray with classic theme, added labels and title
ggplot(filtered_channel, aes(x = n_tokens_content, y = shares)) + 
          geom_point(color = "grey") +
          labs(x = "Number of words in the content", y = "Shares", 
               title = "Scatterplot of Number of words in the content vs Shares") +
          theme_classic()

# Scatterplot of Number of words in the title vs Shares, colored gray with classic theme, added labels and title
ggplot(filtered_channel, aes(x = n_tokens_title, y = shares)) + 
          geom_point(color = "grey") +
          labs(x = "Number of words in the title", y = "Shares", 
               title = "Scatterplot of Number of words in the title vs Shares") +
          theme_classic()

# Histogram of Shares, outlined in gray with classic theme, added labels and title
ggplot(filtered_channel, aes(x = shares)) +
  geom_histogram(color = "grey", binwidth = 2000) +
  labs(x = "Shares", 
       title = "Histogram of number of shares") +
  theme_classic()

# Scatterplot of rate of positive words vs Shares, colored gray with classic theme, added labels and title
ggplot(filtered_channel, aes(x = rate_positive_words, y = shares)) +
  geom_point(color = "grey") +
  labs(x = "rate of positive words in an article", y = "Shares", 
       title = "Scatterplot of rate of positive words in an article vs shares") +
  theme_classic()

# Scatterplot of rate of negative words vs Shares, colored gray with classic theme, added labels and title
ggplot(filtered_channel, aes(x = rate_negative_words, y = shares)) +
  geom_point(color = "grey") +
  labs(x = "rate of negative words in an article", y = "Shares", 
       title = "Scatterplot of rate of negative words in an article vs shares") +
  theme_classic()

# Scatterplot of global sentiment polarity vs Shares, colored gray with classic theme, added labels and title
ggplot(filtered_channel, aes(x = global_sentiment_polarity, y = shares)) +
  geom_point(color = "grey") +
  labs(x = "global sentiment polarity in an article", y = "Shares", 
       title = "Scatterplot of global sentiment polarity in an article vs shares") +
  theme_classic()

# drop the weekday variable created for EDA (it would otherwise get in the way of our models)
filtered_channel <- subset(filtered_channel, select = -c(weekday))
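After subsetting to a single channel, every data_channel_is_* column is constant; these are the zero-variance variables named in the warnings that caret prints while fitting the random forest below. One way to spot such columns before modeling is sketched here in base R on a small hypothetical data frame. caret's nearZeroVar() is a packaged alternative, and dropping the flagged columns would likely silence those warnings.

```r
# Hypothetical toy data: after filtering to one channel, the channel dummies are constant
toy <- data.frame(
  shares = c(100, 250, 90),
  data_channel_is_entertainment = c(1, 1, 1),
  data_channel_is_bus = c(0, 0, 0),
  n_tokens_title = c(9, 12, 10)
)

# a column is constant (zero variance) if it has a single distinct value
is_constant <- vapply(toy, function(col) length(unique(col)) == 1, logical(1))
constant_cols <- names(toy)[is_constant]
print(constant_cols)   # "data_channel_is_entertainment" "data_channel_is_bus"

# drop the constant columns before modeling
toy_clean <- toy[, !is_constant, drop = FALSE]
```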

Modeling

Splitting the Data

First, let’s split up the data into a testing set and a training set using the proportions: 70% training and 30% testing.

set.seed(9876)
# Split the data into a training and test set (70/30 split)
# indices
train <- sample(1:nrow(filtered_channel), size = nrow(filtered_channel)*.70)
test <- setdiff(1:nrow(filtered_channel), train)

# training and testing subsets
Training <- filtered_channel[train, ]
Testing <- filtered_channel[test, ]
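The index-based split above can be checked on a small scale. This base R sketch, using a hypothetical n of 10 rows, draws 70% of the row indices for training, keeps the rest for testing, and confirms that the two sets are disjoint and together cover every row. seq_len(n) and floor() are choices of this sketch that make the index range and the rounding of the training size explicit.

```r
set.seed(9876)
n <- 10   # hypothetical number of rows

# 70% of the row indices for training, the remaining 30% for testing
train_idx <- sample(seq_len(n), size = floor(0.7 * n))
test_idx  <- setdiff(seq_len(n), train_idx)

# the two sets should not overlap and should cover every row exactly once
length(intersect(train_idx, test_idx))       # 0
identical(sort(c(train_idx, test_idx)), seq_len(n))
```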

Linear Models

Linear regression models describe the relationship between one response variable and several explanatory variables. A model can also include interaction terms and higher-order terms. The general form of a linear model is $Y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + E_i$, where $x_{ij}$ is the value of the $j$-th predictor for observation $i$, $E_i$ is the error term, and the “$\dots$” can include more predictors, interactions, and/or higher-order terms. Since our goal is to predict shares, we will fit these models on the training subset and later evaluate them on the held-out testing subset.
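To make the general form concrete, the sketch below simulates data from a known model with two predictors and their interaction, $Y_i = 2 + 1.5 x_{i1} - 0.5 x_{i2} + 0.8 x_{i1} x_{i2} + E_i$, and checks that lm() recovers the coefficients. The coefficients, noise level, and sample size are arbitrary choices for illustration.

```r
set.seed(42)
n <- 500
x1 <- rnorm(n)
x2 <- rnorm(n)
# simulate from a known model with an interaction term
y <- 2 + 1.5 * x1 - 0.5 * x2 + 0.8 * x1 * x2 + rnorm(n, sd = 0.5)
dat <- data.frame(y, x1, x2)

# x1 * x2 in an R formula expands to x1 + x2 + x1:x2
fit <- lm(y ~ x1 * x2, data = dat)
print(round(coef(fit), 2))   # estimates should be close to 2, 1.5, -0.5, 0.8
```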

Linear Model #1: - Jordan

# linear model on training dataset with 5-fold cv
fit1 <- train(shares ~ . , data = Training, method = "lm",
              preProcess = c("center", "scale"), 
              trControl = trainControl(method = "cv", number = 5))
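trainControl(method = "cv", number = 5) asks caret to estimate each model's prediction error by 5-fold cross-validation. The base R sketch below shows the idea on simulated data: rows are randomly assigned to 5 folds, each fold is held out once while lm() is fit on the other four, and the root mean squared error (RMSE) on the held-out folds is averaged. caret assigns rows to folds with its own procedure, so this illustrates the technique rather than reproducing caret's numbers.

```r
set.seed(123)
n <- 300
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- 1 + 2 * dat$x1 - dat$x2 + rnorm(n)   # true noise standard deviation is 1

k <- 5
folds <- sample(rep(seq_len(k), length.out = n))   # random fold label (1 to 5) for each row

rmse <- numeric(k)
for (f in seq_len(k)) {
  train_f <- dat[folds != f, ]                     # four folds for fitting
  valid_f <- dat[folds == f, ]                     # one held-out fold
  fit_f <- lm(y ~ x1 + x2, data = train_f)
  pred <- predict(fit_f, newdata = valid_f)
  rmse[f] <- sqrt(mean((valid_f$y - pred)^2))      # held-out RMSE for this fold
}

cv_rmse <- mean(rmse)   # cross-validated estimate of prediction error
print(round(cv_rmse, 3))
```

Because the simulated noise has standard deviation 1, the cross-validated RMSE should land near 1.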

Linear Model #2: - Jonathan

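# linear model with all main effects and all pairwise interactions (.^2) on training dataset with 5-fold cv
lm_fit <- train(
  shares ~ .^2,
  data = Training,
  method = "lm",
  preProcess = c("center", "scale"), 
  trControl = trainControl(method = "cv", number = 5)
)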
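In an R formula, .^2 expands to every main effect of the columns in data (other than the response) plus all their pairwise interactions, so p predictors give p + p(p - 1)/2 terms besides the intercept. The sketch below, on a hypothetical three-predictor data frame, lists the expanded terms with base R's terms(). Applying the same expansion to all predictors in Training produces a very large model, which is worth keeping in mind when comparing fit time and overfitting against Linear Model #1.

```r
# Hypothetical toy data with a response and three predictors
toy <- data.frame(shares = c(1, 2, 3, 4),
                  x1 = c(1, 0, 1, 0),
                  x2 = c(2, 3, 5, 7),
                  x3 = c(0, 1, 1, 0))

# expand the formula and list its terms
term_labels <- attr(terms(shares ~ .^2, data = toy), "term.labels")
print(term_labels)   # expected: "x1" "x2" "x3" "x1:x2" "x1:x3" "x2:x3"

# number of terms for p predictors: p main effects + choose(p, 2) interactions
n_terms <- function(p) p + choose(p, 2)
n_terms(3)   # 6
```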

Random Forest - Jordan

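Random forest is a tree-based ensemble method: it fits many regression trees, each on a bootstrap sample of the training data, and averages their predictions. Tree-based methods are attractive for their prediction accuracy, because predictors do not need to be scaled, because they make few statistical assumptions, and because they have a built-in variable selection process. Random forest, in particular, considers only a random subset of m predictors at each split (for regression, m = p/3 is a common choice, where p is the number of predictors). This fixes a weakness of bagging: when one predictor is strong, nearly every bootstrap tree uses it for the first split, so the trees are highly correlated and averaging them reduces variance less.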
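The two sources of randomness described above can be sketched in a few lines of base R: each tree is grown on a bootstrap sample of rows, and each split considers only m randomly chosen predictors. The predictor names and row count are hypothetical. The m shown is max(floor(p/3), 1), the regression default in the randomForest package that caret uses for method = "rf"; in the chunk below caret instead tunes m over a grid. Note that ncol(Training) there also counts the response, shares, so round(ncol(Training)/3) is slightly larger than p/3.

```r
# Hypothetical predictor names standing in for the Training columns other than shares
predictors <- paste0("x", 1:12)
p <- length(predictors)

# default number of candidate predictors per split for regression in randomForest
m <- max(floor(p / 3), 1)   # 4 when p = 12

set.seed(2022)
n <- 20   # hypothetical number of training rows
boot_rows <- sample(seq_len(n), size = n, replace = TRUE)   # one bootstrap sample, as in bagging

# at one split, only m randomly chosen predictors are considered
split_candidates <- sample(predictors, size = m)
print(m)
print(split_candidates)
```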

# random forest model on training dataset with 5-fold cv
ranfor <- train(shares ~ ., data = Training, method = "rf", preProcess = c("center", "scale"),
                trControl = trainControl(method = "cv", number = 5), 
                tuneGrid = expand.grid(mtry = c(1:round(ncol(Training)/3))))
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
## data_channel_is_tech, data_channel_is_world

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world
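The warning above appears once for every model caret fits during cross-validation. It flags the six `data_channel_is_*` dummies as constant, presumably because the training data for this report comes from a single data channel. Those columns carry no information and cannot be meaningfully scaled. Below is a minimal base-R sketch of the check on a small hypothetical data frame. `toy` and its values are invented for illustration and are not taken from the Mashable data.

```r
# Hypothetical toy data (NOT the real dataset): every article comes from one
# channel, so the channel dummies never change, mirroring the warning above
toy <- data.frame(
  n_tokens_title = c(9, 12, 10, 8),
  data_channel_is_lifestyle = c(0, 0, 0, 0),
  data_channel_is_tech = c(0, 0, 0, 0),
  shares = c(593, 711, 1500, 1200)
)

# A column has zero variance when it holds a single distinct value
zero_var <- vapply(toy, function(col) length(unique(col)) == 1, logical(1))
print(names(toy)[zero_var])

# Drop the constant columns before centering and scaling
toy_clean <- toy[, !zero_var, drop = FALSE]
print(names(toy_clean))
```

In caret itself, adding `"zv"` to the `preProcess` vector or calling `nearZeroVar()` on the predictors should do the same job without a hand-written helper.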
# print the cross-validated tuning results for the random forest
ranfor
## Random Forest 
## 
## 4939 samples
##   58 predictor
## 
## Pre-processing: centered (58), scaled (58) 
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 3952, 3952, 3950, 3952, 3950 
## Resampling results across tuning parameters:
## 
##   mtry  RMSE      Rsquared    MAE     
##    1    7322.801  0.03719910  2884.251
##    2    7261.771  0.04416517  2905.477
##    3    7259.764  0.04683093  2945.705
##    4    7257.441  0.04813149  2969.800
##    5    7274.750  0.04651551  2981.350
##    6    7270.060  0.04894000  2980.080
##    7    7286.204  0.04692995  2994.422
##    8    7291.769  0.04644447  3010.634
##    9    7315.709  0.04309422  3018.477
##   10    7323.113  0.04357281  3028.397
##   11    7329.393  0.04418740  3026.235
##   12    7333.955  0.04400734  3035.712
##   13    7323.320  0.04555137  3028.184
##   14    7339.371  0.04346928  3037.240
##   15    7344.910  0.04406313  3041.250
##   16    7342.588  0.04518231  3046.621
##   17    7327.670  0.04779090  3035.731
##   18    7348.218  0.04595008  3053.670
##   19    7378.930  0.04128517  3056.058
##   20    7354.649  0.04561489  3053.310
## 
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was mtry = 4.
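To make the selection rule concrete, the sketch below copies the cross-validated RMSE column from the output above and applies the same rule caret uses by default (`selectionFunction = "best"`, i.e. the smallest RMSE). Only the printed numbers are used. Nothing here refits the model.

```r
# Cross-validated RMSE copied from the random forest output above
cv_results <- data.frame(
  mtry = 1:20,
  RMSE = c(7322.801, 7261.771, 7259.764, 7257.441, 7274.750,
           7270.060, 7286.204, 7291.769, 7315.709, 7323.113,
           7329.393, 7333.955, 7323.320, 7339.371, 7344.910,
           7342.588, 7327.670, 7348.218, 7378.930, 7354.649)
)

# Keep the tuning value with the smallest resampled RMSE
best <- cv_results[which.min(cv_results$RMSE), ]
print(best$mtry)

# Gap between the worst and best candidate, in shares
rmse_spread <- diff(range(cv_results$RMSE))
print(rmse_spread)
```

The spread across all 20 candidates is only about 121 shares on an RMSE above 7,000, and every R-squared value sits near 0.05. The choice of `mtry` therefore matters far less than the fact that the predictors explain only about 5% of the variation in shares.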

Boosted Tree - Jonathan

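# boosted tree tuning grid: 4 tree counts x 4 depths = 16 candidate models
tune_grid <- expand.grid(
  n.trees = c(5, 10, 50, 100),
  interaction.depth = c(1, 2, 3, 4),
  shrinkage = 0.1,
  n.minobsinnode = 10
)

# fit the boosted tree with 5-fold cross-validation over the grid above
bt_fit <- train(
  shares ~ .,
  data = Training,
  method = "gbm",
  # "zv" drops the constant data_channel dummies before centering/scaling
  preProcess = c("zv", "center", "scale"),
  trControl = trainControl(method = "cv", number = 5),
  tuneGrid = tune_grid,  # without this, caret ignores tune_grid and uses its default gbm grid
  verbose = FALSE        # passed through to gbm() to silence the per-iteration log
)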
## Warning in preProcess.default(method = c("center", "scale"), x = structure(c(13, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut = 10, : These variables have zero
## variances: data_channel_is_lifestyle, data_channel_is_entertainment, data_channel_is_bus, data_channel_is_socmed,
## data_channel_is_tech, data_channel_is_world

## Warning in (function (x, y, offset = NULL, misc = NULL, distribution = "bernoulli", : variable 12:
## data_channel_is_lifestyle has no variation.

## Warning in (function (x, y, offset = NULL, misc = NULL, distribution = "bernoulli", : variable 13:
## data_channel_is_entertainment has no variation.

## Warning in (function (x, y, offset = NULL, misc = NULL, distribution = "bernoulli", : variable 14:
## data_channel_is_bus has no variation.

## Warning in (function (x, y, offset = NULL, misc = NULL, distribution = "bernoulli", : variable 15:
## data_channel_is_socmed has no variation.

## Warning in (function (x, y, offset = NULL, misc = NULL, distribution = "bernoulli", : variable 16:
## data_channel_is_tech has no variation.

## Warning in (function (x, y, offset = NULL, misc = NULL, distribution = "bernoulli", : variable 17:
## data_channel_is_world has no variation.

## Iter   TrainDeviance   ValidDeviance   StepSize   Improve
##      1 54251791.7616             nan     0.1000 237314.8847
##      2 53992161.4797             nan     0.1000 129904.0034
##      3 53738055.1779             nan     0.1000 48786.8624
##      4 53409366.9828             nan     0.1000 147660.7754
##      5 53224345.2766             nan     0.1000 171099.6320
##      6 52993564.4760             nan     0.1000 28181.0123
##      7 52828338.6556             nan     0.1000 -103380.8733
##      8 52686834.1232             nan     0.1000 138578.0817
##      9 52556745.9142             nan     0.1000 55864.4673
##     10 52412074.0512             nan     0.1000 -72220.9100
##     20 51365420.1527             nan     0.1000 -21778.2527
##     40 50424749.0667             nan     0.1000 -95290.2867
##     60 49884724.4365             nan     0.1000 -183778.3618
##     80 49469657.8269             nan     0.1000 -236233.3011
##    100 49138347.6260             nan     0.1000 -105562.3049
##    120 48885058.0393             nan     0.1000 -172495.4635
##    140 48642743.5955             nan     0.1000 -70834.1521
##    150 48566666.3101             nan     0.1000 -149299.2271

## Iter   TrainDeviance   ValidDeviance   StepSize   Improve
##      1 54170824.6348             nan     0.1000 199061.8610
##      2 53174417.9441             nan     0.1000 122421.4854
##      3 52901663.3075             nan     0.1000 171650.9865
##      4 52521228.2318             nan     0.1000 107373.9772
##      5 52206753.4887             nan     0.1000 48329.8708
##      6 52043755.8266             nan     0.1000 73736.7390
##      7 51935397.0900             nan     0.1000 50779.8031
##      8 51618515.1220             nan     0.1000 43357.5465
##      9 51380614.1682             nan     0.1000 -102181.4008
##     10 51176350.5529             nan     0.1000 -40285.6388
##     20 49454297.9489             nan     0.1000 -160856.2174
##     40 48038888.1653             nan     0.1000 -42887.5057
##     60 47547901.7347             nan     0.1000 -143257.4584
##     80 45936817.3564             nan     0.1000 -45658.5727
##    100 45182974.7463             nan     0.1000 -118846.1240
##    120 44092587.4313             nan     0.1000 -93068.8352
##    140 43446071.7913             nan     0.1000 -48529.2856
##    150 42476758.8676             nan     0.1000 -6859.9032

## Iter   TrainDeviance   ValidDeviance   StepSize   Improve
##      1 53754124.8033             nan     0.1000 307266.0764
##      2 53229233.2799             nan     0.1000 266960.5336
##      3 52749232.4040             nan     0.1000 -23587.2271
##      4 52214497.9988             nan     0.1000 214993.4762
##      5 52023743.1060             nan     0.1000 -74413.9534
##      6 51683839.8499             nan     0.1000 -28849.8338
##      7 51407116.8397             nan     0.1000 56780.0364
##      8 51100810.8438             nan     0.1000 24337.1057
##      9 50653782.0769             nan     0.1000 -199470.1282
##     10 50338906.5793             nan     0.1000 85867.0832
##     20 48423590.1394             nan     0.1000 -91546.2173
##     40 45884439.4624             nan     0.1000 -280512.0341
##     60 44204871.3287             nan     0.1000 -190295.8244
##     80 43041635.7395             nan     0.1000 -128361.7293
##    100 40789852.0312             nan     0.1000 -2458.7234
##    120 39819261.3826             nan     0.1000 -127332.4953
##    140 38637635.3767             nan     0.1000 -34486.1095
##    150 38173161.4008             nan     0.1000 -151681.7042

## Iter   TrainDeviance   ValidDeviance   StepSize   Improve
##      1 61615136.7379             nan     0.1000 132471.3528
##      2 61476416.6747             nan     0.1000 144183.3525
##      3 61113944.0358             nan     0.1000 303515.2975
##      4 60956398.9223             nan     0.1000 159478.3026
##      5 60519675.8070             nan     0.1000 -30469.9238
##      6 60252649.9272             nan     0.1000 203970.2792
##      7 59998843.0687             nan     0.1000 75372.1235
##      8 59847618.2947             nan     0.1000 122395.4506
##      9 59701783.8650             nan     0.1000 -4804.1592
##     10 59465868.1495             nan     0.1000 47907.6933
##     20 58025133.0042             nan     0.1000 -202012.7613
##     40 56791725.7496             nan     0.1000 29486.6901
##     60 55854365.9157             nan     0.1000 -104886.6501
##     80 55309201.6109             nan     0.1000 -33145.0696
##    100 55112349.7103             nan     0.1000 -215418.1093
##    120 54490683.3358             nan     0.1000 -58435.3181
##    140 53990365.3650             nan     0.1000 -38820.1269
##    150 53928195.1677             nan     0.1000 -153158.2262

## Iter   TrainDeviance   ValidDeviance   StepSize   Improve
##      1 61634837.2752             nan     0.1000 -11033.7087
##      2 61355313.5397             nan     0.1000 70957.8768
##      3 60867733.6209             nan     0.1000 454835.0973
##      4 60213927.4742             nan     0.1000 88639.6178
##      5 59933773.6125             nan     0.1000 44457.5191
##      6 59624583.4286             nan     0.1000 142493.1795
##      7 59097823.0697             nan     0.1000 -210147.9858
##      8 58691821.0981             nan     0.1000 -39655.8413
##      9 58353929.2484             nan     0.1000 151640.8943
##     10 58082387.0481             nan     0.1000 127042.4161
##     20 56311439.0676             nan     0.1000 -367305.1835
##     40 54044110.2151             nan     0.1000 -73964.6882
##     60 51594069.5111             nan     0.1000 -128687.7846
##     80 49895267.2927             nan     0.1000 -188754.6236
##    100 48668198.3339             nan     0.1000 -99501.5119
##    120 47459670.9373             nan     0.1000 -288776.2628
##    140 46110806.8622             nan     0.1000 -47712.4426
##    150 45714297.0763             nan     0.1000 -237119.2456

## Iter   TrainDeviance   ValidDeviance   StepSize   Improve
##      1 60948848.4954             nan     0.1000 124857.9055
##      2 60289096.0179             nan     0.1000 380768.3733
##      3 59806052.7044             nan     0.1000 90227.9455
##      4 59322739.8958             nan     0.1000 62198.6476
##      5 58989347.5539             nan     0.1000 -12445.0055
##      6 58235534.3599             nan     0.1000 -87720.9689
##      7 57996718.8810             nan     0.1000 41042.2524
##      8 57836302.6332             nan     0.1000 50972.0019
##      9 57496867.8390             nan     0.1000 -48575.2493
##     10 57293118.1286             nan     0.1000 -57488.6969
##     20 54825170.5647             nan     0.1000 -116004.5128
##     40 50779465.7364             nan     0.1000 -395132.1164
##     60 48144831.0324             nan     0.1000 -255253.2850
##     80 46325045.6942             nan     0.1000 -133814.1554
##    100 45069975.9825             nan     0.1000 -247123.6826
##    120 43769803.4811             nan     0.1000 -182455.7507
##    140 42141944.4139             nan     0.1000 -259239.7558
##    150 41441698.9913             nan     0.1000 -165272.1195

## Iter   TrainDeviance   ValidDeviance   StepSize   Improve
##      1 64209161.1876             nan     0.1000 521956.8568
##      2 63871776.2573             nan     0.1000 270892.3016
##      3 63429190.2589             nan     0.1000 433150.5340
##      4 63224592.8991             nan     0.1000 13142.3371
##      5 62635699.5720             nan     0.1000 -4724.5590
##      6 62478640.1681             nan     0.1000 91108.3045
##      7 62367822.1430             nan     0.1000 75191.6201
##      8 61956400.6269             nan     0.1000 -143432.9306
##      9 61652450.0062             nan     0.1000 -314718.1098
##     10 61551172.0285             nan     0.1000 111080.3270
##     20 60432565.9959             nan     0.1000 -32508.2786
##     40 59582751.9174             nan     0.1000 -201802.4320
##     60 58591677.3318             nan     0.1000 -300563.6443
##     80 57961052.5976             nan     0.1000 -272845.7421
##    100 57637102.2941             nan     0.1000 -134672.7259
##    120 57250500.5410             nan     0.1000 -143486.0030
##    140 56613535.9694             nan     0.1000 -123058.0542
##    150 56372198.9067             nan     0.1000 -390579.7086

## Iter   TrainDeviance   ValidDeviance   StepSize   Improve
##      1 64377941.3730             nan     0.1000 -48162.6115
##      2 64100601.1805             nan     0.1000 207713.8031
##      3 63611369.2530             nan     0.1000 238145.1221
##      4 63374883.4268             nan     0.1000 -32384.7373
##      5 63046045.7225             nan     0.1000 92162.4872
##      6 62700860.3800             nan     0.1000 -13021.1858
##      7 62135817.9955             nan     0.1000 192182.2202
##      8 61766734.0342             nan     0.1000 109671.0909
##      9 61236697.4416             nan     0.1000 403248.8510
##     10 60758761.2541             nan     0.1000 -112055.9786
##     20 58433716.6936             nan     0.1000 -129436.5127
##     40 55254253.8324             nan     0.1000 -72172.7643
##     60 53663069.9308             nan     0.1000 -229454.0264
##     80 52236090.4372             nan     0.1000 -103428.8113
##    100 50373888.4306             nan     0.1000 -132443.6718
##    120 48325416.5832             nan     0.1000 -2521.6225
##    140 47566721.7621             nan     0.1000 -172582.3839
##    150 47307627.3069             nan     0.1000 14941.6953

## Iter   TrainDeviance   ValidDeviance   StepSize   Improve
##      1 63369569.5020             nan     0.1000 287546.2003
##      2 62684602.1379             nan     0.1000 544191.5133
##      3 62209418.3793             nan     0.1000 -15245.4881
##      4 61342445.4873             nan     0.1000 -61413.0519
##      5 60708385.3285             nan     0.1000 136495.6383
##      6 60104374.4751             nan     0.1000 -68615.3401
##      7 59558436.7593             nan     0.1000 -149244.6786
##      8 59296293.2853             nan     0.1000 16488.5315
##      9 59009338.1459             nan     0.1000 -19509.8144
##     10 58722843.7726             nan     0.1000 -14606.8158
##     20 55825472.8258             nan     0.1000 -187258.9529
##     40 52200806.4432             nan     0.1000 -335456.5447
##     60 49945795.7220             nan     0.1000 -224667.9492
##     80 47740092.1342             nan     0.1000 -291999.0280
##    100 46128751.9716             nan     0.1000 -342165.1825
##    120 44477462.6093             nan     0.1000 -173573.5566
##    140 42982472.8301             nan     0.1000 -173228.9748
##    150 42266263.3228             nan     0.1000 -123374.6155

## Iter   TrainDeviance   ValidDeviance   StepSize   Improve
##      1 61722748.1842             nan     0.1000 -58311.4506
##      2 61444063.1655             nan     0.1000 67111.0253
##      3 61292321.3404             nan     0.1000 -9670.3428
##      4 60904845.2135             nan     0.1000 258979.5924
##      5 60463312.9537             nan     0.1000 -160330.8167
##      6 60334564.7980             nan     0.1000 -96513.2707
##      7 60114379.0941             nan     0.1000 209336.6560
##      8 59867098.6070             nan     0.1000 162800.6981
##      9 59560243.6981             nan     0.1000 -182353.5260
##     10 59397736.7839             nan     0.1000 23347.1638
##     20 58154133.3214             nan     0.1000 -182502.8065
##     40 57176220.8311             nan     0.1000 -375776.8573
##     60 56430031.7028             nan     0.1000 -20296.7819
##     80 55270973.4844             nan     0.1000 -205598.4379
##    100 54815172.8912             nan     0.1000 -158761.9683
##    120 54168696.2857             nan     0.1000 -70805.8870
##    140 53654648.5812             nan     0.1000 -182157.9761
##    150 53272664.4876             nan     0.1000 -346347.3236

## Iter   TrainDeviance   ValidDeviance   StepSize   Improve
##      1 61742572.4384             nan     0.1000 360731.1021
##      2 61319218.5495             nan     0.1000 321079.8199
##      3 60930069.3797             nan     0.1000 92362.4347
##      4 60691073.7808             nan     0.1000 -65747.7811
##      5 59863272.8315             nan     0.1000 -135684.6573
##      6 59658509.0360             nan     0.1000 3891.1120
##      7 59318685.7940             nan     0.1000 231928.1649
##      8 59059847.4691             nan     0.1000 -11922.1633
##      9 58542033.4743             nan     0.1000 -141607.4021
##     10 58172311.9263             nan     0.1000 -28971.3531
##     20 56241977.1385             nan     0.1000 -62158.5923
##     40 53621136.8237             nan     0.1000 -77319.2964
##     60 52365230.4222             nan     0.1000 -195830.8060
##     80 51136355.7956             nan     0.1000 -171189.8751
##    100 49267473.0459             nan     0.1000 -208381.8033
##    120 48002174.5657             nan     0.1000 -79359.5649
##    140 47042880.8101             nan     0.1000 -80807.3893
##    150 46604823.7837             nan     0.1000 -206773.8107

## Iter   TrainDeviance   ValidDeviance   StepSize   Improve
##      1 61361732.3182             nan     0.1000 22918.5390
##      2 60527180.2921             nan     0.1000 21008.1028
##      3 59829524.7755             nan     0.1000 -165263.0243
##      4 59350835.2167             nan     0.1000 -33096.5772
##      5 58975561.6758             nan     0.1000 91619.4672
##      6 58448858.3458             nan     0.1000 -214008.9410
##      7 58156173.1259             nan     0.1000 -68820.7895
##      8 57354028.8176             nan     0.1000 376110.2793
##      9 57015859.2136             nan     0.1000 -211659.9204
##     10 56802151.1642             nan     0.1000 -88442.6351
##     20 53967496.9270             nan     0.1000 -209140.2222
##     40 51044634.7436             nan     0.1000 -237544.2832
##     60 48952864.9747             nan     0.1000 -187874.9891
##     80 46736305.9114             nan     0.1000 -195424.6276
##    100 44827435.1396             nan     0.1000 42044.1723
##    120 43714208.8820             nan     0.1000 -151743.5425
##    140 42291031.6747             nan     0.1000 -260554.1759
##    150 41842287.7746             nan     0.1000 -115608.4363

## Iter   TrainDeviance   ValidDeviance   StepSize   Improve
##      1 45359626.6870             nan     0.1000 152375.5121
##      2 45131413.6063             nan     0.1000 40264.6931
##      3 44842867.0793             nan     0.1000 -170.7789
##      4 44658882.5694             nan     0.1000 188398.1732
##      5 44398523.0187             nan     0.1000 21026.6723
##      6 44211472.3245             nan     0.1000 -62307.1540
##      7 44061510.4699             nan     0.1000 42235.2246
##      8 43872108.9692             nan     0.1000 81727.1423
##      9 43699498.6605             nan     0.1000 126660.4728
##     10 43566532.2997             nan     0.1000 -46071.8861
##     20 42555610.5080             nan     0.1000 39958.4089
##     40 41600821.5580             nan     0.1000 -65711.2295
##     60 41116478.9935             nan     0.1000 -88685.5627
##     80 40770096.9544             nan     0.1000 -102054.1074
##    100 40509981.0456             nan     0.1000 -107386.4962
##    120 40269251.3421             nan     0.1000 -77379.0889
##    140 40148802.4980             nan     0.1000 -55677.7440
##    150 39991295.5460             nan     0.1000 -120621.3023

## Iter   TrainDeviance   ValidDeviance   StepSize   Improve
##      1 45269568.8170             nan     0.1000 141093.3922
##      2 44818138.5969             nan     0.1000 119129.0935
##      3 44495913.5877             nan     0.1000 120907.0369
##      4 44192413.1497             nan     0.1000 104588.6299
##      5 43881499.9987             nan     0.1000 104176.2211
##      6 43701537.2930             nan     0.1000 142322.9591
##      7 43512783.3041             nan     0.1000 -26614.5970
##      8 43313503.9783             nan     0.1000 -9222.1103
##      9 43097376.3834             nan     0.1000 1710.4018
##     10 42991158.6154             nan     0.1000 35189.6813
##     20 40831168.0092             nan     0.1000 31963.2064
##     40 38200116.5546             nan     0.1000 6301.9427
##     60 36761188.7320             nan     0.1000 -45024.5309
##     80 36050676.4381             nan     0.1000 -25575.7023
##    100 34850106.4707             nan     0.1000 -78042.0243
##    120 34240235.7902             nan     0.1000 -73781.0716
##    140 33632433.8315             nan     0.1000 -143496.7814
##    150 33123151.9142             nan     0.1000 -37703.4616

## Iter   TrainDeviance   ValidDeviance   StepSize   Improve
##      1 44860379.2244             nan     0.1000 335818.3685
##      2 44552761.9917             nan     0.1000 109045.7013
##      3 44297227.8064             nan     0.1000 147136.9047
##      4 43742997.3767             nan     0.1000 129320.2471
##      5 43305030.9107             nan     0.1000 34272.0629
##      6 43025077.6160             nan     0.1000 83304.7097
##      7 42569335.5351             nan     0.1000 122168.1302
##      8 42159800.5583             nan     0.1000 -47541.4092
##      9 41832408.4814             nan     0.1000 143720.8269
##     10 41550780.6300             nan     0.1000 120019.4003
##     20 39384501.6341             nan     0.1000 -129400.7025
##     40 36594556.7323             nan     0.1000 -29488.3701
##     60 35199305.1595             nan     0.1000 -131734.5064
##     80 33598775.0705             nan     0.1000 -31885.4157
##    100 32405299.5829             nan     0.1000 -106593.0136
##    120 31643660.2913             nan     0.1000 -92378.3149
##    140 30459562.0953             nan     0.1000 -132838.8905
##    150 30204680.8710             nan     0.1000 -79761.3002

## Iter   TrainDeviance   ValidDeviance   StepSize   Improve
##      1 57627640.1786             nan     0.1000 -11927.1630
##      2 57461052.5242             nan     0.1000 -103953.5982
##      3 57182495.7558             nan     0.1000 184411.1647
##      4 56867859.4972             nan     0.1000 43467.4084
##      5 56537115.3978             nan     0.1000 -77690.9639
##      6 56303884.8874             nan     0.1000 156197.1188
##      7 56110725.3429             nan     0.1000 138691.7657
##      8 55935822.6258             nan     0.1000 -104938.1087
##      9 55791581.3300             nan     0.1000 17400.1433
##     10 55631480.0037             nan     0.1000 70018.8459
##     20 54763737.4088             nan     0.1000 -19426.9318
##     40 53829618.3410             nan     0.1000 -64323.2236
##     50 53397196.7901             nan     0.1000 100716.1634
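The zero-variance warnings printed during training appear because, once the data are restricted to a single channel, every `data_channel_is_*` dummy is constant, so `preProcess` cannot scale those columns and `gbm` cannot split on them. They are harmless but noisy. Within caret, one option is to add `"zv"` to the `preProcess` vector passed to `train()` (e.g. `c("zv", "center", "scale")`), and passing `verbose = FALSE` through `train()` is a common way to silence the per-iteration gbm log. As a dependency-free alternative, here is a minimal base-R sketch that drops constant columns before modeling. The helper name `drop_constant_columns` and the toy data frame are our own illustrative assumptions; in the report you would apply the helper to the training data instead.

```r
# Keep only columns that take more than one distinct (non-missing) value.
drop_constant_columns <- function(df) {
  keep <- vapply(
    df,
    function(col) length(unique(col[!is.na(col)])) > 1,
    logical(1)
  )
  df[, keep, drop = FALSE]
}

# Illustrative toy data (made-up values): after filtering to one channel,
# the channel dummy is constant and carries no information.
toy <- data.frame(
  shares = c(100, 250, 900),
  n_tokens_title = c(8, 12, 10),
  data_channel_is_tech = c(1, 1, 1)
)

cleaned <- drop_constant_columns(toy)
print(names(cleaned))
```

Dropping the constant dummies up front keeps the predictor count honest in the model summary and avoids repeating the same warnings for every resample.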
bt_fit
## Stochastic Gradient Boosting 
## 
## 4939 samples
##   58 predictor
## 
## Pre-processing: centered (58), scaled (58) 
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 3952, 3951, 3951, 3952, 3950 
## Resampling results across tuning parameters:
## 
##   interaction.depth  n.trees  RMSE      Rsquared    MAE     
##   1                   50      7395.282  0.02085300  2865.590
##   1                  100      7404.537  0.02388135  2878.526
##   1                  150      7418.735  0.02181471  2868.734
##   2                   50      7410.129  0.02126213  2877.848
##   2                  100      7460.038  0.02148872  2904.557
##   2                  150      7479.911  0.02126598  2912.081
##   3                   50      7433.061  0.02215608  2862.701
##   3                  100      7494.895  0.02224647  2867.994
##   3                  150      7522.705  0.02286536  2896.889
## 
## Tuning parameter 'shrinkage' was held constant at a value of 0.1
## Tuning parameter 'n.minobsinnode' was held constant at a value of 10
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were n.trees = 50, interaction.depth = 1, shrinkage = 0.1 and n.minobsinnode = 10.
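caret chose the final boosted tree by taking the row of the tuning grid with the smallest cross-validated RMSE. The sketch below reproduces that rule in base R using the nine RMSE values copied from the resampling table above; with the fitted `train` object, `bt_fit$bestTune` holds the same selection. Note that all nine RMSEs fall within roughly 130 shares of each other and every Rsquared is near 0.02, so the choice among these settings appears to make little practical difference here.

```r
# Cross-validated RMSE from the caret resampling table above
# (rows ordered as in that table: n.trees varies fastest within each depth).
grid <- expand.grid(n.trees = c(50, 100, 150), interaction.depth = 1:3)
grid$RMSE <- c(
  7395.282, 7404.537, 7418.735,  # interaction.depth = 1
  7410.129, 7460.038, 7479.911,  # interaction.depth = 2
  7433.061, 7494.895, 7522.705   # interaction.depth = 3
)

# Selection rule reported by caret: the smallest RMSE wins
best <- grid[which.min(grid$RMSE), ]
print(best)
```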

Comparison - Jordan

Finally, let’s compare our four models on the testing set: two linear models, one random forest model, and one boosted tree model.

# random forest: predict on the testing set and measure performance
predRF <- predict(ranfor, newdata = Testing)
RF <- postResample(predRF, Testing$shares)

# linear model 1: predict on the testing set and measure performance
predlm1 <- predict(fit1, newdata = Testing)
LM <- postResample(predlm1, Testing$shares)

# linear model 2: predict on the testing set and measure performance
predlm2 <- predict(lm_fit, newdata = Testing)
LM2 <- postResample(predlm2, Testing$shares)

# boosted tree: predict on the testing set and measure performance
predbt <- predict(bt_fit, newdata = Testing)
BT <- postResample(predbt, Testing$shares)

# combine each of the performance stats for the models and add a column with the model names
dat <- data.frame(rbind(LM, RF, LM2, BT))
df <- as_tibble(rownames_to_column(dat, "models"))

# find the model with the lowest RMSE
best <- df %>% filter(RMSE == min(RMSE)) %>% select(models)

# print the name of the best-fitting model according to RMSE
paste("The Best fitting model according to RMSE is", best$models, sep = " ")
## [1] "The Best fitting model according to RMSE is RF"
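For numeric outcomes, `postResample()` returns RMSE, Rsquared, and MAE. The sketch below recomputes those three quantities in base R so it is clear what the comparison table contains. It assumes, per caret's documentation, that Rsquared here is the squared correlation between predictions and observations rather than the 1 - SSE/SST form. The function name `regression_metrics` and the toy vectors are illustrative only.

```r
# Base-R versions of the three metrics postResample() reports for regression.
regression_metrics <- function(pred, obs) {
  err <- pred - obs
  c(
    RMSE = sqrt(mean(err^2)),      # root mean squared error
    Rsquared = cor(pred, obs)^2,   # squared Pearson correlation
    MAE = mean(abs(err))           # mean absolute error
  )
}

# Illustrative toy predictions and observations (made-up values)
m <- regression_metrics(pred = c(1, 2, 3), obs = c(1, 2, 5))
print(m)
```

Lower RMSE and MAE are better and higher Rsquared is better; the comparison above selects on RMSE alone, which weights large misses on highly shared articles more heavily than MAE would.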

Automation - Jonathan

#rmarkdown::render(
#  "Tanley-Wood-Project2.Rmd",
#  output_format="github_document",
#  output_dir="./Analysis",
#  output_options = list(
#    html_preview = FALSE
#  )
#)
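The commented-out call above renders this single report. Because the zero-variance warnings suggest the analysis runs on one data channel at a time, one way to automate a report per channel is to loop over the channels with `rmarkdown::render()` and its `params` argument. This is a hypothetical sketch: it assumes the Rmd declares a `channel` parameter in its YAML header (`render()` stops with an error if a supplied parameter is not declared), and the channel labels, helper name `build_render_jobs`, and output file names are our own choices based on the `data_channel_is_*` variable names. Run it from the console or a separate script, not from inside the Rmd itself, to avoid a recursive render.

```r
# Hypothetical: channel labels taken from the data_channel_is_* variable names.
channels <- c("lifestyle", "entertainment", "bus", "socmed", "tech", "world")

# One render job per channel: an output file name plus the params list.
build_render_jobs <- function(channels) {
  lapply(channels, function(ch) {
    list(
      output_file = paste0(ch, ".md"),
      params = list(channel = ch)  # assumes a `channel` param in the Rmd YAML
    )
  })
}

jobs <- build_render_jobs(channels)

# Render only when the report source and rmarkdown are actually available.
if (file.exists("Tanley-Wood-Project2.Rmd") &&
    requireNamespace("rmarkdown", quietly = TRUE)) {
  for (job in jobs) {
    rmarkdown::render(
      "Tanley-Wood-Project2.Rmd",
      output_format = "github_document",
      output_file = job$output_file,
      output_dir = "./Analysis",
      output_options = list(html_preview = FALSE),
      params = job$params
    )
  }
}
```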