1 About
This is the official course website for the Statistics & Big data2 2022 - 2023 laboratories. It augments the lecture topics and provides exercises for home and class assignments. Theory that goes beyond the slides and textbook will not be part of the exam; it is there for your own growth and, hopefully, as a quick resource for reviving your R proficiency once it has gone dormant.
1.1 Logistics
- Lectures: Mon 15:00 - 17:00 CET, Fri 14:00 - 17:00 CET. Class: 75% lectures, 25% tutorials.
- Location: Campus Gemelli / optionally remote
- Office hours:
  - Prof. Giuseppe Arbia: Mon 12:00 - 13:00 CET
  - Prof. Daniel Zelterman: Tue *:** - *:**am CET
  - Dr. Niccolò Salvini: office times = lecture times (reach me by email)
- Contacts: Students should ask all course-related questions on our Piazza forum, where you will also find all the announcements.
1.3 Labs' contents
- Introduction to the R ecosystem.
- Install R
- Install RStudio and learn how it works
- Some R tricks that might be useful in your research and professional life
- Data wrangling with R
- Basic statistics (descriptive statistics, point and interval estimation, tests of statistical hypotheses on an average and on a percentage).
- Hypothesis testing on 2 averages and 2 percentages.
- Hypothesis testing on more than 2 averages (ANOVA) and on more than 2 percentages (chi-square).
- Multiple linear regression model.
- Nonlinear regression.
- Regression with dummy variables.
- Binomial and multinomial logistic regression. Factor analysis. Cluster analysis.
- Other supervised classification models: an outline of the regression trees approach (CART), CHAID, C5.0, Random Forest, and Gradient Boosting classification algorithms; bagging, boosting, and other ensembling techniques; evaluation criteria for a binary classification model.
1.4 Suggested reading list
I am going to split resources by the expected level of their audience:
1.4.1 Minimal or no knowledge of R
- Zelterman, D. (2021) Applied Multivariate Statistics with R, Springer-Verlag (ch. 1 & 2)
- Zelterman, D. (2022). Regression for health and social science: Applied linear models with R. Cambridge University Press.
- Everitt, B., Hothorn, T. (2011) An Introduction to Applied Multivariate Analysis with R, Springer-Verlag
- James, G., Witten, D., Hastie, T., Tibshirani, R. (2015) An Introduction to Statistical Learning, with Applications in R
- Timbers, T., Campbell, T., Lee, M. (2022) Data Science: A First Introduction, July 2022 online version
- Wickham, H., Grolemund, G. (2018) R for Data Science, O'Reilly. Freely available online at https://r4ds.had.co.nz/index.html
- Dauber, D. (2022) R for Non-Programmers, free book
1.5 Honor Code
Permissive but strict. If unsure, please ask the course staff!
- OK to search and ask in public about the systems we're studying. Cite all the resources you reference. E.g. if you read it in a paper, cite it. If you ask on Quora, include the link.
- NOT OKAY to ask someone to do assignments/projects for you. We monitor freelancing websites, and we have a plethora of bots doing this job daily.
- OK to discuss questions with classmates. Disclose your discussion partners.
- NOT OKAY to blindly copy solutions from classmates.
- OK to use existing solutions as part of your projects/assignments. Clarify your contributions.
- NOT OKAY to pretend that someone's solution is yours.
- OK to publish your final project after the course is over (we encourage that, and if you need it I would love to help you!)
- NOT OKAY to post your assignment solutions online.
1.7 Colophon
This book was authored using bookdown inside RStudio with the bs4 theme. The website is hosted with Netlify and automatically updated by Netlify CI. The complete source is available from GitHub.
This version of the book was built with:
library(devtools)
#> Loading required package: usethis
library(roxygen2)
library(testthat)
#>
#> Attaching package: 'testthat'
#> The following object is masked from 'package:devtools':
#>
#> test_file
#> The following object is masked from 'package:dplyr':
#>
#> matches
devtools::session_info()
#> ─ Session info ───────────────────────────────────────────
#> setting value
#> version R version 4.2.0 (2022-04-22)
#> os macOS 13.0.1
#> system aarch64, darwin20
#> ui X11
#> language (EN)
#> collate en_US.UTF-8
#> ctype en_US.UTF-8
#> tz Europe/Rome
#> date 2023-02-09
#> pandoc 2.17.1.1 @ /Applications/RStudio.app/Contents/MacOS/quarto/bin/ (via rmarkdown)
#>
#> ─ Packages ───────────────────────────────────────────────
#> package * version date (UTC) lib source
#> bookdown 0.29 2022-09-12 [1] CRAN (R 4.2.0)
#> brio 1.1.3 2021-11-30 [1] CRAN (R 4.2.0)
#> bslib 0.4.0 2022-07-16 [1] CRAN (R 4.2.0)
#> cachem 1.0.6 2021-08-19 [1] CRAN (R 4.2.0)
#> callr 3.7.1 2022-07-13 [1] CRAN (R 4.2.0)
#> cli 3.6.0 2023-01-09 [1] CRAN (R 4.2.0)
#> colorspace 2.0-3 2022-02-21 [1] CRAN (R 4.2.0)
#> crayon 1.5.2 2022-09-29 [1] CRAN (R 4.2.0)
#> desc 1.4.2 2022-09-08 [1] CRAN (R 4.2.0)
#> devtools * 2.4.3 2021-11-30 [1] CRAN (R 4.2.0)
#> digest 0.6.29 2021-12-01 [1] CRAN (R 4.2.0)
#> downlit 0.4.2 2022-07-05 [1] CRAN (R 4.2.0)
#> dplyr * 1.1.0 2023-01-29 [1] CRAN (R 4.2.0)
#> ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.2.0)
#> evaluate 0.16 2022-08-09 [1] CRAN (R 4.2.0)
#> fansi 1.0.4 2023-01-22 [1] CRAN (R 4.2.0)
#> fastmap 1.1.0 2021-01-25 [1] CRAN (R 4.2.0)
#> fs 1.5.2 2021-12-08 [1] CRAN (R 4.2.0)
#> generics 0.1.3 2022-07-05 [1] CRAN (R 4.2.0)
#> glue * 1.6.2 2022-02-24 [1] CRAN (R 4.2.0)
#> highr 0.9 2021-04-16 [1] CRAN (R 4.2.0)
#> htmltools 0.5.3 2022-07-18 [1] CRAN (R 4.2.0)
#> httr 1.4.4 2022-08-17 [1] CRAN (R 4.2.0)
#> jquerylib 0.1.4 2021-04-26 [1] CRAN (R 4.2.0)
#> jsonlite 1.8.4 2022-12-06 [1] CRAN (R 4.2.0)
#> kableExtra * 1.3.4 2021-02-20 [1] CRAN (R 4.2.0)
#> knitr * 1.40 2022-08-24 [1] CRAN (R 4.2.0)
#> lifecycle 1.0.3 2022-10-07 [1] CRAN (R 4.2.0)
#> lubridate * 1.9.1 2023-01-24 [1] CRAN (R 4.2.0)
#> magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.2.0)
#> memoise 2.0.1 2021-11-26 [1] CRAN (R 4.2.0)
#> munsell 0.5.0 2018-06-12 [1] CRAN (R 4.2.0)
#> pillar 1.8.1 2022-08-19 [1] CRAN (R 4.2.0)
#> pkgbuild 1.3.1 2021-12-20 [1] CRAN (R 4.2.0)
#> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.2.0)
#> pkgload 1.2.4 2021-11-30 [1] CRAN (R 4.2.0)
#> prettyunits 1.1.1 2020-01-24 [1] CRAN (R 4.2.0)
#> processx 3.7.0 2022-07-07 [1] CRAN (R 4.2.0)
#> ps 1.7.1 2022-06-18 [1] CRAN (R 4.2.0)
#> purrr 1.0.1 2023-01-10 [1] CRAN (R 4.2.0)
#> R6 2.5.1 2021-08-19 [1] CRAN (R 4.2.0)
#> remotes 2.4.2 2021-11-30 [1] CRAN (R 4.2.0)
#> rlang 1.0.6 2022-09-24 [1] CRAN (R 4.2.0)
#> rmarkdown 2.16 2022-08-24 [1] CRAN (R 4.2.0)
#> roxygen2 * 7.2.0 2022-05-13 [1] CRAN (R 4.2.0)
#> rprojroot 2.0.3 2022-04-02 [1] CRAN (R 4.2.0)
#> rstudioapi 0.14 2022-08-22 [1] CRAN (R 4.2.0)
#> rvest 1.0.2 2021-10-16 [1] CRAN (R 4.2.0)
#> sass 0.4.2 2022-07-16 [1] CRAN (R 4.2.0)
#> scales 1.2.1 2022-08-20 [1] CRAN (R 4.2.0)
#> sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.2.0)
#> stringi 1.7.12 2023-01-11 [1] CRAN (R 4.2.0)
#> stringr 1.5.0 2022-12-02 [1] CRAN (R 4.2.0)
#> svglite 2.1.0 2022-02-03 [1] CRAN (R 4.2.0)
#> systemfonts 1.0.4 2022-02-11 [1] CRAN (R 4.2.0)
#> testthat * 3.1.4 2022-04-26 [1] CRAN (R 4.2.0)
#> tibble 3.1.8 2022-07-22 [1] CRAN (R 4.2.0)
#> tidyselect 1.2.0 2022-10-10 [1] CRAN (R 4.2.0)
#> timechange 0.2.0 2023-01-11 [1] CRAN (R 4.2.0)
#> usethis * 2.1.6 2022-05-25 [1] CRAN (R 4.2.0)
#> utf8 1.2.3 2023-01-31 [1] CRAN (R 4.2.0)
#> vctrs 0.5.2 2023-01-23 [1] CRAN (R 4.2.0)
#> viridisLite 0.4.1 2022-08-22 [1] CRAN (R 4.2.0)
#> webexercises * 1.0.0 2021-09-15 [1] CRAN (R 4.2.0)
#> webshot 0.5.3 2022-04-14 [1] CRAN (R 4.2.0)
#> withr 2.5.0 2022-03-03 [1] CRAN (R 4.2.0)
#> xfun 0.33 2022-09-12 [1] CRAN (R 4.2.0)
#> xml2 1.3.3 2021-11-30 [1] CRAN (R 4.2.0)
#> yaml 2.3.5 2022-02-21 [1] CRAN (R 4.2.0)
#>
#> [1] /Users/niccolo/Library/R/arm64/4.2/library
#> [2] /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/library
#>
#> ──────────────────────────────────────────────────────────