Statistical Workshops

The Center for Statistical Computing (CSC) welcomes all graduate students, staff, and faculty to participate in our statistics workshops. These sessions are held on Zoom and/or in Lab C on the upper level (UL) of Healey Library. Our workshops cover statistical software such as SPSS, SAS, Stata, Excel, R, RStudio, and Python, along with a variety of statistical procedures. We also offer applied statistics workshops on recently developed statistical methods, using tools such as SPSS, SAS, Stata, R, AMOS, Mplus, and WinBUGS. Descriptions of each workshop are provided below.

Spring 2026 Statistics Workshop Schedule

Applied Statistical Workshops Schedule and Description (Spring 2026)

Basic Statistical Workshops Schedule and Description (Spring 2026)

Find workshop materials.

Statistics Workshop Descriptions

Introduction to Stata

Introduction to Stata provides a comprehensive overview of the Stata software, covering both the graphical user interface and the command syntax. This hands-on workshop is designed to efficiently introduce participants to the fundamentals of using Stata for data analysis. Topics include data browsing and management, descriptive statistics, independent sample t-tests, Chi-square tests, and linear and logistic regression models. Practice datasets and Stata do-files are provided to support hands-on learning. No prior experience with Stata or statistical software is required.

Introduction to R for Statistical Analysis

Introduction to R provides a comprehensive introduction to the R statistical software, with an emphasis on conducting fundamental statistical analyses. This hands-on workshop is designed to efficiently introduce participants to the basics of using R for data analysis. Topics include descriptive statistics, frequency distributions, Chi-square tests, independent sample t-tests, one-way ANOVA, and linear and logistic regressions. Additional topics include downloading and installing R packages, reading and writing data files, and creating graphical displays in R. Practice datasets and R scripts are provided to support hands-on learning. R is free, open-source software supported by a large and active user community. No prior experience with R or programming is required.

SPSS 1

SPSS 1 provides a comprehensive introduction to SPSS for Windows, covering essential tools for data analysis. Topics include entering and importing data, documenting variable and value labels, examining frequency and crosstab tables for individual and group data, creating and interpreting graphical displays (e.g., bar charts, histograms, and boxplots), recoding variables, performing independent sample t-tests, and conducting simple linear regression with regression plots. Practice datasets are provided to support hands-on learning. No prior experience with SPSS is required.

Statistical Analysis Using Excel

Statistical Analysis Using Excel provides practical guidance on improving efficiency in data analysis using Microsoft Excel. This hands-on workshop covers data entry and organization, descriptive statistics, frequency distributions and crosstabulations, independent and paired sample t-tests, correlation analysis, and simple linear regression. The workshop also briefly discusses the strengths and limitations of Excel and highlights when statistical software such as SPSS or R is recommended for more complex analyses.

SPSS 2

SPSS 2 builds on the concepts and skills introduced in SPSS 1 and focuses on advanced data management and statistical procedures. Topics include selecting cases, combining cases from multiple files, and linking datasets containing different types of information. Statistical procedures covered include Chi-square tests, one-way ANOVA, repeated measures analysis, nonparametric tests, multiple regression, and logistic regression. Practice datasets are provided.

Introduction to SAS

Introduction to SAS provides a practical introduction to the SAS system, covering both the SAS DATA step and commonly used procedures (PROCs) for data management and statistical analysis. This hands-on workshop is designed for beginners and focuses on building a solid foundation for working with SAS. Topics include creating and managing SAS datasets, importing data, assigning variable and value labels, and performing basic statistical analyses using PROC FREQ, PROC MEANS, and PROC GLM, as well as an introduction to regression diagnostics. No prior experience with SAS or programming is required.

Introduction to Python in Statistics

Introduction to Python in Statistics is a beginner-level, hands-on workshop that uses Spyder to provide a ready-to-use Python computing environment. No prior programming or statistical software experience is required. The workshop begins with an introduction to basic Python concepts and essential data types. Participants will learn to use pandas and NumPy for simple data manipulation, and statsmodels for fitting and interpreting basic statistical models. Topics focus on descriptive statistics and simple linear regression using practical examples from the social sciences.

Introduction to RStudio

Introduction to RStudio provides a comprehensive introduction to RStudio, a user-friendly integrated development environment for the R programming language. This hands-on workshop is designed to efficiently introduce participants to the basics of working in R through RStudio. Topics include basic R concepts and data structures, importing data into R, and performing simple statistical procedures such as descriptive statistics, t-tests, and linear regression. Participants will also learn to use ggplot2 to create and interpret graphical displays, including simple regression plots. No prior experience with R or programming is required.

Statistics using ChatGPT

Statistics using ChatGPT provides a hands-on introduction to using ChatGPT as a support tool for conducting statistical analysis. This workshop introduces the fundamentals of ChatGPT and demonstrates how it can assist participants with statistical programming, model specification, and interpretation of results. Topics include independent sample t-tests, one-way ANOVA, Chi-square tests, and linear regression. The workshop aims to equip participants with practical strategies for effectively and responsibly leveraging ChatGPT in various statistical analysis scenarios.

Introduction to HLM (Mixed Models) using SPSS

This workshop provides an overview of the fundamental concepts of multilevel (hierarchical) linear models, also known as mixed effects models. It focuses on why specialized methods are needed to account for data dependencies, such as the clustering of students within schools or repeated observations within individuals. Participants will learn how to formulate and interpret two-level multilevel models, understand fixed and random effects, variance components, and intraclass correlation (ICC), and use SPSS Mixed Models to estimate and interpret model parameters. Basic knowledge of linear regression is recommended.
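As a small illustration of the intraclass correlation mentioned above, the sketch below computes the ICC for a two-level random-intercept model from its two variance components. It is written in Python rather than SPSS, and the function name and variance values are hypothetical, chosen only for illustration.

```python
def intraclass_correlation(between_var: float, within_var: float) -> float:
    """ICC for a two-level random-intercept model.

    between_var: variance of the random intercepts (e.g., between schools)
    within_var:  residual variance (e.g., between students within a school)
    """
    total = between_var + within_var
    if between_var < 0 or within_var < 0 or total <= 0:
        raise ValueError("variance components must be non-negative with a positive sum")
    # Share of the total variance that lies between clusters.
    return between_var / total


# Hypothetical variance components, not estimates from any real dataset.
icc = intraclass_correlation(between_var=0.25, within_var=0.75)
print(f"ICC = {icc:.2f}")
```

An ICC near zero suggests little clustering, while larger values indicate that observations within the same cluster are more alike and that a multilevel model is more likely to matter.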

Sample Size Estimation and Power Calculations (SAS and G*Power)

This workshop introduces the principles and practice of sample size determination and statistical power analysis for common research designs. Using SAS PROC POWER and G*Power, participants will learn how to calculate required sample sizes, estimate power, and examine the impact of key design assumptions such as effect size, significance level, and group allocation. Examples will focus on mean and proportion comparisons commonly encountered in thesis, dissertation, and grant proposal planning. No prior experience with power analysis is required. Basic statistical knowledge is recommended.
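To make the relationship between effect size, significance level, and power concrete, here is a minimal sketch of a sample size calculation for comparing two group means. It uses the standard normal approximation in plain Python, not SAS PROC POWER or G*Power, and the function name `n_per_group` is hypothetical. Exact calculations based on the t distribution typically give slightly larger values.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate per-group n for a two-sided, two-sample comparison of means.

    Normal approximation with equal group sizes:
        n = 2 * ((z_{1 - alpha/2} + z_{power}) / d) ** 2
    where d is the standardized effect size (Cohen's d).
    """
    if effect_size == 0:
        raise ValueError("effect_size must be non-zero")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_power = NormalDist().inv_cdf(power)          # quantile for the target power
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


# A medium effect (d = 0.5) at alpha = 0.05 and 80% power.
print(n_per_group(0.5))  # 63 per group under this approximation
```

Halving the effect size roughly quadruples the required sample size, which is why the assumed effect size is usually the most consequential input.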

Introduction to Statistical Learning Using R

This introductory workshop provides an overview of statistical learning methods that are central to modern data analysis, with a focus on regression, classification, and model evaluation. Topics include linear and logistic regression, linear discriminant analysis, and tree-based methods such as decision trees and random forests. Participants will also learn how to evaluate classification models using cross-validation. Dimension reduction and unsupervised learning methods, including principal component analysis (PCA) and clustering, are briefly introduced. All examples are demonstrated using R, with an emphasis on practical implementation and interpretation of results.

Missing Data Analysis using SAS & Stata

This workshop introduces the key concepts and practical methods for handling missing data in applied research. Topics include mechanisms of missingness (MCAR, MAR, and MNAR), assessment of potential non-random selection bias, and the use of single imputation and multiple imputation (MI) strategies. Participants will learn how missing data are typically handled by default in statistical software, often through complete case deletion, and why this approach can lead to reduced sample size and biased results. Hands-on examples using SAS and Stata will demonstrate how to implement and interpret imputation procedures and appropriately analyze imputed datasets. Basic familiarity with regression analysis is recommended.

Intro to Time Series Analysis using R

This workshop emphasizes the practical aspects of time series analysis. Methods are introduced step by step, starting with terminology and exploratory graphics, moving to descriptive statistics, and ending with practical modeling procedures: choosing an appropriate time series forecasting method, fitting a model, evaluating its performance, and using it for forecasting. The workshop focuses on the most popular business forecasting methods: regression models, smoothing methods including Moving Average (MA) and Exponential Smoothing, and Autoregressive (AR) models. Practical implementation in R is illustrated at each stage of the workshop.
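The two smoothing methods named above can be sketched in a few lines. The example below is plain Python rather than R, and the function names and the short `demand` series are hypothetical, used only to show how each smoother is computed.

```python
def moving_average(series, window):
    """Trailing moving average: one value per complete window."""
    if window < 1 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    return [sum(series[i - window:i]) / window
            for i in range(window, len(series) + 1)]


def simple_exponential_smoothing(series, alpha):
    """Smoothed level l_t = alpha * y_t + (1 - alpha) * l_{t-1}, with l_0 = y_0."""
    if not series:
        raise ValueError("series must not be empty")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]
    levels = [level]
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level
        levels.append(level)
    return levels


demand = [10, 12, 14, 16]  # hypothetical series
print(moving_average(demand, window=2))                 # [11.0, 13.0, 15.0]
print(simple_exponential_smoothing(demand, alpha=0.5))  # [10, 11.0, 12.5, 14.25]
```

In simple exponential smoothing, the last smoothed level serves as the one-step-ahead forecast. A larger `alpha` weights recent observations more heavily.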

Predictive Modeling in Data Science: Logistic Regression and Random Forest (R)

This workshop introduces supervised machine-learning classification methods commonly used in data science, with a focus on Logistic Regression and Random Forest implemented in R. Participants will learn how to prepare data, build predictive models, tune parameters, and evaluate performance using metrics such as confusion matrices, ROC curves, and AUC. The session aims to equip students with the analytical skills needed to critically evaluate and apply classification methods in research and applied data analysis projects across a wide range of academic disciplines.
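The evaluation metrics mentioned above can be illustrated without fitting a model. The sketch below, in plain Python rather than R, computes confusion matrix counts at a chosen threshold and the AUC from its pairwise-ranking definition. The function names, labels, and predicted scores are all hypothetical.

```python
def confusion_counts(y_true, y_score, threshold=0.5):
    """Return (tp, fp, tn, fn), classifying scores >= threshold as positive."""
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, y_score):
        predicted = 1 if s >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def auc(y_true, y_score):
    """AUC: probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels and predicted probabilities.
y_true = [1, 0, 1, 1, 0, 0]
y_score = [0.9, 0.4, 0.6, 0.3, 0.2, 0.7]
print(confusion_counts(y_true, y_score))  # (2, 1, 2, 1)
print(round(auc(y_true, y_score), 3))     # 0.667
```

The confusion matrix depends on the chosen threshold, whereas the AUC summarizes ranking quality across all thresholds.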

Structural Equation Modeling I (AMOS & R)

This workshop introduces techniques for structural equation modeling (SEM). SEM is employed to test complex relationships between observed (measured) and unobserved (latent) variables. Topics covered include fundamentals underlying SEM, SEM notation, path diagrams, data preparation, mediation analysis, path analysis, parameter estimation, and assessment of model fit. AMOS and R are used to demonstrate examples.

Structural Equation Modeling II (AMOS & R)

The second SEM workshop delves into advanced topics including measurement error, latent variable analysis, exploratory factor analysis (EFA), confirmatory factor analysis (CFA), development of structural equation models with estimation, and model testing. Additionally, this workshop introduces latent growth models for longitudinal data. R and AMOS are used to demonstrate model structures, parameter estimation, and model modification.

Event History Analysis (Survival Models) in SPSS

This workshop introduces statistical methods for event history (survival) analysis using SPSS, focusing on studies in which the outcome of interest is a time-to-event variable. Topics include estimating survival time using the life table and Kaplan-Meier methods, modeling survival risk, and assessing the relationship between risk factors and survival time using the Cox proportional hazards regression model. All data analysis and demonstrations are conducted in SPSS, with an emphasis on practical implementation and interpretation of results.

Spatial Regression using R

This workshop introduces spatial data analysis and spatial regression modeling using R, with a focus on applied methods for handling spatial dependence. Participants will learn how to work with spatial data, visualize geographic patterns, and assess spatial autocorrelation. The workshop covers the construction of spatial weights matrices and statistical methods for modeling spatial dependence, culminating in the estimation and interpretation of spatial regression models. All examples are demonstrated using widely used R spatial packages, including maptools, sp, sf, and spdep.

Event-Study Regression using R

Event-study regression is a causal inference research design for analyzing the impact of a specific event on an outcome of interest over a defined time period. The event can be considered as the treatment in a Difference-in-Differences (DiD) analysis, and the dynamics of the impact can be assessed by comparing the changes in outcomes over time between the treated and control groups. This workshop uses several R packages for event-study regression, specifically fixest, plm, and did. Topics covered include data preparation, DiD analysis, the dynamic DiD model, and the graphical display of the dynamic event effects.
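The comparison of changes between treated and control groups described above reduces, in the simplest two-group, two-period case, to a difference of differences in group means. The sketch below shows that arithmetic in plain Python rather than with the R packages listed. The function name and the outcome values are hypothetical, and the estimate has a causal reading only under the usual parallel-trends assumption.

```python
from statistics import mean


def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """2x2 DiD: (change in treated mean) minus (change in control mean)."""
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


# Hypothetical outcomes before and after the event.
estimate = did_estimate(
    treated_pre=[10, 12], treated_post=[15, 17],  # treated mean rises by 5
    control_pre=[9, 11], control_post=[11, 13],   # control mean rises by 2
)
print(estimate)  # DiD estimate of 3
```

A dynamic (event-study) specification extends this idea by estimating a separate treated-versus-control contrast for each period relative to the event, usually within a regression with unit and time fixed effects.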

COVID-19 Data Analysis Using R

This workshop will involve downloading COVID-19 data for states and Massachusetts from the Center for Systems Science and Engineering at Johns Hopkins University and the Massachusetts Department of Public Health (DPH). We will employ time series and spatial regression models to analyze the COVID-19 data, using R packages such as forecast, tseries, spdep, maptools, and ggplot2. Additionally, this workshop will demonstrate how to use R to generate reports for COVID data.
