What Is Regression Analysis in Biostatistics?

Fundamentally, regression analysis examines the relationship between one or more independent variables (predictors) and a dependent variable (outcome). The dependent variable in biostatistics typically refers to a biological measurement or health outcome that we intend to understand or forecast, whereas the independent variables could be risk factors, treatments, or demographic traits.

Example Research Questions for Regression

How does BMI (Body Mass Index) affect blood glucose levels?
Does smoking status impact lung function?
Can we predict a patient’s risk of heart disease based on age and cholesterol?

Types of Regression in Biostatistics

Depending on the type of data and the research question, several regression methods can be applied:

Linear Regression

The simplest form of regression assumes a linear relationship between variables. It’s used when the dependent variable is continuous (e.g., blood pressure, weight, cholesterol levels).

Formula:

Y = β₀ + β₁X₁ + β₂X₂ + ... + ε

Where:

Y = outcome (e.g., blood pressure)
X₁, X₂ = predictors (e.g., age, weight)
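β₀ = intercept (the expected value of Y when all predictors are zero)
β₁, β₂, ... = regression coefficients (the expected change in Y per one-unit increase in the corresponding predictor, holding the others constant)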
ε = error term
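
As a rough illustration of the formula above, the following Python sketch estimates β₀, β₁, and β₂ by ordinary least squares with NumPy. The ages, weights, and "true" coefficients are made up for the example, and the outcome is generated without an error term so that the fit can be checked against known values; real data would include noise.

```python
import numpy as np

# Made-up illustrative data (not from any study): age in years, weight in kg.
age = np.array([30.0, 40.0, 50.0, 60.0, 70.0, 45.0])
weight = np.array([60.0, 85.0, 70.0, 90.0, 65.0, 75.0])

# Outcome built from known coefficients with no error term,
# so the fit should recover them: Y = 80 + 0.5*age + 0.2*weight.
blood_pressure = 80.0 + 0.5 * age + 0.2 * weight

# Design matrix: a column of ones for the intercept, then one column per predictor.
X = np.column_stack([np.ones_like(age), age, weight])

# Ordinary least squares: find the beta that minimises the sum of squared residuals.
beta, _, _, _ = np.linalg.lstsq(X, blood_pressure, rcond=None)
b0, b1, b2 = beta

print(f"intercept = {b0:.3f}, age coefficient = {b1:.3f}, weight coefficient = {b2:.3f}")
```

Because the outcome here contains no noise, the estimates match the coefficients used to generate it. In practice, a statistics package would also be used to obtain standard errors and confidence intervals.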

Logistic Regression

When dealing with binary outcomes (like disease presence/absence, Yes/No), logistic regression becomes essential. Rather than predicting the value directly, it models the probability of an outcome using the logit function:

log(p/(1-p)) = β₀ + β₁X₁ + ... + βₙXₙ

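Here, p is the probability of the outcome occurring. The model outputs this probability, which can then be converted to a yes/no prediction by applying a cutoff (for example, predicting "yes" when p is at least 0.5).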

Applications include:

Predicting the likelihood of disease based on risk factors
Modelling treatment response (success/failure)
Calculating odds ratios for epidemiological studies
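
To make the logit formula concrete, here is a minimal Python sketch that turns a linear predictor into a probability, applies a cutoff, and converts a coefficient into an odds ratio. The coefficients, the predictors (age and smoking status), and the 0.5 cutoff are hypothetical values chosen for illustration, not estimates from real data.

```python
import math

def predicted_probability(intercept, coefs, x):
    """Inverse logit: convert the linear predictor (log-odds) into a probability."""
    log_odds = intercept + sum(b * xi for b, xi in zip(coefs, x))
    return 1.0 / (1.0 + math.exp(-log_odds))

# Hypothetical coefficients, for illustration only.
intercept = -5.0
coefs = [0.05, 0.7]  # per year of age; smoker (1) versus non-smoker (0)

p_smoker = predicted_probability(intercept, coefs, [60, 1])
p_nonsmoker = predicted_probability(intercept, coefs, [60, 0])

# Exponentiating a coefficient gives the odds ratio for a one-unit change in that predictor.
odds_ratio_smoking = math.exp(coefs[1])

# A cutoff turns the probability into a yes/no prediction (0.5 is an assumed choice).
cutoff = 0.5
prediction = "yes" if p_smoker >= cutoff else "no"

print(f"P(disease | smoker, age 60)     = {p_smoker:.3f}")
print(f"P(disease | non-smoker, age 60) = {p_nonsmoker:.3f}")
print(f"Odds ratio for smoking          = {odds_ratio_smoking:.3f}")
print(f"Predicted class for the smoker  = {prediction}")
```

The odds ratio is the same at every age in this model, because age and smoking enter the log-odds additively.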

Multivariate Regression

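Applied when two or more dependent variables are modelled at once (e.g., systolic and diastolic blood pressure). It should not be confused with multiple (multivariable) regression, which has a single dependent variable and several predictors.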

Poisson and Negative Binomial Regression

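Used for count data, like the number of hospital visits or the number of mutations in a gene. Poisson regression assumes that the variance of the counts equals their mean; negative binomial regression relaxes this assumption and is preferred when the counts are overdispersed (variance greater than the mean).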
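
One quick, informal way to judge which of the two count models may be more suitable is to compare the variance of the counts with their mean, since a Poisson model assumes they are equal. The sketch below does this on two made-up sets of hospital-visit counts. It is only a crude screen on raw counts, assuming no predictors; a proper assessment would examine dispersion after fitting the model.

```python
from statistics import mean, variance

def dispersion_ratio(counts):
    """Sample variance divided by sample mean; roughly 1 is expected under a Poisson model."""
    return variance(counts) / mean(counts)

# Made-up visit counts, for illustration only.
visits_a = [1, 4, 2, 0, 3, 5, 2, 3]   # variance close to the mean
visits_b = [0, 0, 1, 0, 12, 0, 2, 9]  # a few patients with very many visits

ratio_a = dispersion_ratio(visits_a)
ratio_b = dispersion_ratio(visits_b)

print(f"Group A dispersion ratio: {ratio_a:.2f}")
print(f"Group B dispersion ratio: {ratio_b:.2f}")
```

A ratio well above 1, as in the second group, suggests overdispersion, in which case a negative binomial model is usually the safer choice.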

Cox Proportional Hazards Regression

For time-to-event data (survival analysis), Cox regression models the hazard function:

h(t) = h₀(t) × exp(β₁X₁ + ... + βₙXₙ)

This is crucial for:

Analyzing patient survival times
Assessing time to disease recurrence
Evaluating treatment efficacy while accounting for varying follow-up periods
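
The hazard formula above implies that the ratio of hazards between two patients does not depend on time, because the baseline hazard h₀(t) cancels out. The following Python sketch illustrates this with hypothetical coefficients and arbitrary baseline hazard values; none of the numbers come from a fitted model.

```python
import math

def hazard(baseline_hazard, coefs, x):
    """Cox model hazard: baseline hazard scaled by exp of the linear predictor."""
    return baseline_hazard * math.exp(sum(b * xi for b, xi in zip(coefs, x)))

# Hypothetical coefficients, for illustration only.
coefs = [0.03, -0.5]  # per year of age; treated (1) versus untreated (0)

treated = [65, 1]
untreated = [65, 0]

# Arbitrary baseline hazards standing in for h0(t) at three different times.
ratios = []
for h0 in (0.01, 0.02, 0.05):
    ratios.append(hazard(h0, coefs, treated) / hazard(h0, coefs, untreated))

# The hazard ratio for treatment is simply exp(coefficient).
hazard_ratio = math.exp(coefs[1])

print("Hazard ratios at three baseline values:", [round(r, 4) for r in ratios])
print(f"exp(beta_treatment) = {hazard_ratio:.4f}")
```

The ratio is identical at every baseline value, which is the proportional hazards assumption that gives the model its name.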

Applications of Regression

Regression isn't just about fitting lines to data. In biostatistics, it plays a crucial role in:

Hypothesis testing – Are the variables significantly related?
Adjusting for confounders – Controlling for other variables that may affect the outcome.
Risk prediction – Estimating the likelihood of an event or disease.
Public health policy – Guiding decisions based on data-driven evidence.

Non-Linear Regression

Non-linear regression is used when the relationship between the independent and dependent variables cannot be described by a straight line. Whereas linear regression fits a line to the data, non-linear regression fits a curve.

These curves could be:

Exponential
Logarithmic
Sigmoidal (S-shaped)
Polynomial
Michaelis-Menten (in enzyme kinetics)

In non-linear regression:

The relationship is defined by a non-linear equation.
The parameters are estimated iteratively, usually by numerical least-squares optimization (for example, the Gauss-Newton or Levenberg-Marquardt algorithm).
Software like BioStat Prime, GraphPad Prism, R, SAS, or Python (SciPy) helps fit these complex models easily.

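Unlike linear regression, non-linear regression generally has no closed-form solution, so the fitting algorithm must start from initial parameter estimates and refine them step by step. With poor starting values, it may fail to converge or may settle on a solution that is not the best fit.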
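
As a sketch of how initial estimates might be obtained in practice, the Python example below fits an exponential curve with SciPy's curve_fit. The starting values come from a straight-line fit to log(y), which is one common approach rather than the only one. The data, the small perturbation added to it, and the generating parameters (a = 2, b = 0.5) are all made up for illustration.

```python
import numpy as np
from scipy.optimize import curve_fit

def exponential(x, a, b):
    """Exponential model: y = a * exp(b * x)."""
    return a * np.exp(b * x)

# Made-up data: generated from a = 2, b = 0.5, then perturbed by up to 1 percent.
x = np.linspace(0.0, 4.0, 9)
wobble = np.array([0.01, -0.01, 0.005, -0.005, 0.01, -0.01, 0.005, -0.005, 0.0])
y = exponential(x, 2.0, 0.5) * (1.0 + wobble)

# Initial estimates from a straight-line fit to log(y):
# log(y) = log(a) + b * x, so the slope estimates b and exp(intercept) estimates a.
slope, intercept = np.polyfit(x, np.log(y), 1)
p0 = (np.exp(intercept), slope)

# Iterative least-squares fit, starting from p0.
popt, pcov = curve_fit(exponential, x, y, p0=p0)
a_hat, b_hat = popt

print(f"initial guess: a = {p0[0]:.3f}, b = {p0[1]:.3f}")
print(f"fitted values: a = {a_hat:.3f}, b = {b_hat:.3f}")
```

Starting from the linearized estimates usually places the optimizer close to the final answer, so only a few iterations are needed.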

Types of Non-Linear Regression Models

Exponential Regression
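Logistic Regression (non-linear in the probability, although linear on the logit scale)
Polynomial Regression (fits a curve, although the model is linear in its coefficients and can be solved by ordinary least squares)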
Power Regression
Sigmoidal (Logistic Growth) Regression
Michaelis-Menten
Gompertz Model
Hill Equation Regression

Examples of Non-Linear Regression

1. Dose-Response Relationship

In pharmacology, increasing a drug's dose does not always increase its effect proportionally. Beyond a certain dose, the effect plateaus. This relationship is often modelled using a sigmoidal (logistic) curve, such as the four-parameter logistic (4PL) or five-parameter logistic (5PL) model.
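
For illustration, one common parameterization of the 4PL curve can be written as a short Python function. The parameter names (bottom, top, ec50, hill) and the values used below are assumptions made for this sketch; software packages may label or arrange the four parameters differently.

```python
def four_pl(dose, bottom, top, ec50, hill):
    """Four-parameter logistic curve for a positive dose.

    bottom: response as the dose approaches zero
    top:    response at the plateau
    ec50:   dose that gives a response halfway between bottom and top
    hill:   slope factor controlling how steep the curve is
    """
    return bottom + (top - bottom) / (1.0 + (ec50 / dose) ** hill)

# Hypothetical parameter values, for illustration only.
bottom, top, ec50, hill = 0.0, 100.0, 10.0, 1.5

for dose in (0.1, 1.0, 10.0, 100.0, 1000.0):
    response = four_pl(dose, bottom, top, ec50, hill)
    print(f"dose = {dose:7.1f}  response = {response:6.2f}")

half_max_response = four_pl(ec50, bottom, top, ec50, hill)
```

The printed responses rise slowly at low doses, pass through the midpoint at the EC50, and level off near the top value, which is the plateau described above.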

2. Enzyme Kinetics

The Michaelis-Menten equation is a classic example of non-linear regression used to model enzyme-substrate interactions:

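V = (Vmax × [S]) / (Km + [S])

Where:

V = reaction velocity
Vmax = maximum reaction velocity, reached when the enzyme is saturated with substrate
[S] = substrate concentration
Km = Michaelis constant (the substrate concentration at which V is half of Vmax)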
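
A minimal sketch of fitting this equation in Python with SciPy's curve_fit is shown below. The substrate concentrations are made up, and the velocities are generated without noise from assumed values of Vmax = 10 and Km = 2, so that the fit can be checked against them. The rule used for the starting values (largest observed velocity for Vmax, median concentration for Km) is a simple heuristic, not a standard prescribed by any package.

```python
import numpy as np
from scipy.optimize import curve_fit

def michaelis_menten(s, vmax, km):
    """Michaelis-Menten equation: V = Vmax * [S] / (Km + [S])."""
    return vmax * s / (km + s)

# Made-up substrate concentrations and noise-free velocities (Vmax = 10, Km = 2).
substrate = np.array([0.5, 1.0, 2.0, 4.0, 8.0, 16.0, 32.0])
velocity = michaelis_menten(substrate, 10.0, 2.0)

# Heuristic starting values: largest observed velocity and median concentration.
p0 = (velocity.max(), np.median(substrate))

# Iterative least-squares fit of Vmax and Km.
popt, _ = curve_fit(michaelis_menten, substrate, velocity, p0=p0)
vmax_hat, km_hat = popt

print(f"Estimated Vmax = {vmax_hat:.3f}, estimated Km = {km_hat:.3f}")
```

With real, noisy measurements the estimates would not match the true values exactly, and the covariance matrix returned by curve_fit could be used to gauge their uncertainty.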