Linear Regression Resource Guide
Linear regression is a popular, simple, and flexible technique for modeling phenomena in a variety of fields. It predicts a response variable from one or more explanatory variables and provides a means of interpreting their association. Linear regression is a special case of a broader class of statistical techniques called generalized linear models (GLMs). Ordinary least squares (OLS) regression is the most common form of linear regression, but there are others.
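As a sketch in conventional notation (the symbols are standard textbook choices, not taken from this guide), a linear regression model with p explanatory variables and n observations can be written as the first equation below. OLS picks the coefficients that minimize the sum of squared residuals, as in the second.

```latex
\[
y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip} + \varepsilon_i,
\qquad i = 1, \dots, n
\]

\[
\hat{\beta} = \arg\min_{\beta} \sum_{i=1}^{n}
\left( y_i - \beta_0 - \beta_1 x_{i1} - \cdots - \beta_p x_{ip} \right)^2
\]
```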
This guide focuses on resources for learning how to implement linear regression in R and Python, but linear regression is available in virtually every data analysis program (even Excel!). Linear regression is used both in statistical analysis, where you are likely interested in estimating the relationships between variables and testing hypotheses about coefficients in the model, and in machine learning, where the primary focus is on building a model that accurately predicts an outcome given new input data. The model is the same in both cases, but the approaches differ in how the model is used, evaluated, and optimized.
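The two uses described above can be illustrated with a minimal sketch. It fits a one-variable model by OLS in closed form using only the Python standard library. The data and the helper names (`fit_ols`, `predict`, `mean_squared_error`) are made up for illustration; in practice you would likely use one of the packages covered in the resources below.

```python
# Simple linear regression fit by ordinary least squares (closed form).
# Standard library only; the data are hypothetical.

def fit_ols(x, y):
    """Return (intercept, slope) minimizing the sum of squared residuals."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def predict(intercept, slope, x):
    """Predicted responses for each value in x."""
    return [intercept + slope * xi for xi in x]

def mean_squared_error(y_true, y_pred):
    """Average squared difference between observed and predicted values."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)

# Hypothetical data: hours studied (x) and exam score (y).
hours = [1, 2, 3, 4, 5, 6, 7, 8]
score = [52, 55, 61, 64, 70, 71, 78, 83]

# Statistical use: fit on all the data and interpret the coefficients.
intercept, slope = fit_ols(hours, score)
print(f"Each extra hour is associated with {slope:.2f} more points")

# Machine learning use: fit on a training set, evaluate on held-out data.
train_x, train_y = hours[:6], score[:6]
test_x, test_y = hours[6:], score[6:]
b0, b1 = fit_ols(train_x, train_y)
test_mse = mean_squared_error(test_y, predict(b0, b1, test_x))
print(f"Held-out mean squared error: {test_mse:.2f}")
```

The fitting function is identical in both cases. What changes is whether you read off the slope as an estimate of association or judge the model by its error on data it has not seen.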
Getting Started
- Data Analysis Examples from the UCLA Statistical Consulting Center: This site provides examples for implementing and interpreting multiple types of regression analysis in Stata, SPSS, Mplus, SAS, and R. It is always a good place to start when you have questions about implementing a statistical model in one of these programs.
R
- Linear Regression and ANOVA from Princeton Library Guides: Simple and brief overview of linear regression using base R (just standard libraries). 
- Linear Regression from UC Business Analytics R Programming Guide: Somewhat more detailed and comprehensive introduction to linear regression in R. This tutorial adopts a supervised machine learning approach, discussing, for example, how to split the data into training and test sets. The code uses the tidyverse packages, which some people may appreciate.
- Linear Regression and ANOVA in R Cookbook by James D. Long and Paul Teetor: Comprehensive introduction to linear modeling in R. This tutorial discusses linear regression alongside ANOVA, which can be helpful for people with a background using ANOVA. The tutorial also points to many other helpful resources. 
Python
- Linear Regression in Python using Statsmodels from GeeksforGeeks: Brief and simple introduction to linear regression using the statsmodels package, including how to install the package, load and visualize the data, and fit and understand models. More generally, GeeksforGeeks tutorials tend to be good.
- Linear Regression in Python from Real Python: Comprehensive, yet approachable, introduction to linear regression and how to implement it using Python. This tutorial shows the implementation with two popular packages: scikit-learn and statsmodels. Real Python tutorials are typically very good resources (and more comprehensive than GeeksforGeeks).
- Linear Regression in Machine Learning in Python by Ott Toomet: Explanation of linear regression from a machine learning perspective. Section 10.2 of the book discusses how to implement linear regression using both the statsmodels and scikit-learn packages. 
Getting Better
- An Introduction to Statistical Learning by Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani: This textbook is a well-known and approachable introduction to statistical/machine learning, which includes linear regression. The book originally provided code for implementing linear regression only in R, but it now includes both R and Python.
- Data Analysis Using Regression and Multilevel/Hierarchical Models by Andrew Gelman and Jennifer Hill: Although the book is focused on hierarchical models, chapters 3 and 4 discuss linear regression, providing a great overview from a statistical perspective. 
- Applied Linear Statistical Models by Michael H. Kutner, Christopher J. Nachtsheim, John Neter, and William Li: Very comprehensive and detailed book about linear models, including linear regression (parts one and two). In contrast to the previous book, this one adopts a more traditional statistical perspective. It is a great resource for people who want to learn regression in more depth.
