The 2025 paper by Robinett et al., ‘Atlas-based Manifold Representations for Interpretable Riemannian Machine Learning’, presents a novel algorithm that fits a low-dimensional manifold to a point cloud by approximating an atlas of charts. This post illustrates the Atlas-Learn method by reconstructing a sphere from a 3D point cloud of naive random samples and works through some checks on accuracy.
An idiosyncratic, unabashedly biased, time-constrained attempt to capture the depth and breadth of the new packages submitted to CRAN in a single month entirely without the use of AI: clearly unsustainable, but maybe entertaining and somewhat useful.
This post presents a proof of concept for constructing Healthcare Economics Models from a patient’s perspective. Taking oral squamous cell carcinoma as a case study, it models the health states that a patient would visit after treatment as a continuous-time Markov chain. Estimates of jump-chain transition probabilities and sojourn times are then used to construct a synthetic data set. Calculated values for the total expected amount of time that patients will spend in each health state for two treatment arms are combined with utilities for each health state derived from the literature to compare the total quality-adjusted life years for each treatment arm. It is envisioned that models of this sort, constructed under the supervision of medical experts, would be useful to clinicians both in documenting the treatment decision process and in communicating with patients.
An attempt to capture the depth and breadth of what’s new on CRAN.
Posit Assistant does indeed support Skills.
The community has come together to create some great Claude Skills that you can try out today.
One hundred eighty-three new packages made it to CRAN in November 2025. Here are my picks organized into sixteen categories: AI, Astronomy, Causal Inference, Data, Ecology, Epidemiology, Genomics, Machine Learning, Marketing, Medical Statistics, Risk Analysis, Shiny, Statistics, Time Series, Utilities, and Visualization.
This post proposes smoothing ROC curves to make them into objects that can be studied with calculus. It shows how taking derivatives of the ROC curve enables conducting likelihood ratio tests and explores how basic concepts from differential geometry, such as curvature and arc length, may be helpful in examining the behavior of ROC curves. Code is provided to illustrate the ideas presented, and some trouble is taken to examine the effects of smoothing.
Here are my picks for the best of the two hundred and four new packages that landed on CRAN in October. Packages are organized into fifteen categories: Data, Decision Analysis, Ecology, Econometrics, Finance, Genomics, Logic, Machine Learning, Mathematics, Medical Statistics, Statistics, Time Series, Utilities, and Visualization.
This post explains the basics of ROC curves using simple code and intuitive explanations.
The ggextenders club provides inspiration and resources for those venturing into the exciting world of creating custom ggplot2 extensions.
Here are my picks of the best of the two hundred forty new packages that landed on CRAN in September, organized into thirteen categories: Computational Methods, Data, Decision Analysis, Ecology, Epidemiology, Finance, Machine Learning, Mathematics, Medical Applications, Programming, Statistics, Utilities, and Visualization.
In a previous post, A Simple Bayesian Multi-state Survival Model for a Clinical Trial, I described a textbook example comparing the effectiveness of asthma treatments by fitting a discrete-time Markov model to clinical trial data. In this post, I elaborate on that analysis by fitting a continuous-time Markov chain (CTMC) to the same, discretely observed state-table data. An EM algorithm reconstructs the likelihood function to estimate the generator matrix for the CTMC. I describe some of the differences between discrete-time and continuous-time Markov models and highlight the importance of understanding how the data implicitly sets time scales when modeling real world processes.
Here are my picks for the best from the ninety-seven new packages that landed on CRAN in August, organized in eighteen categories: Causal Inference, Data, Differential Privacy, Ecology, Environmental Studies, Epidemiology, Geology, Genetics, Genomics, Health Technology Assessment, Machine Learning, Medical Statistics, Statistics, Surveys, Time Series, Toxicology, Utilities, and Visualization.
In this post, I explore some properties of the Dirichlet distribution and illustrate the behavior of the symmetric Dirichlet distribution as alpha, the concentration parameter, varies. Understanding this behavior may be helpful in constructing an informative prior for the multinomial distribution.
The CRAN new package Top 40: here are my picks for the best new packages to land on CRAN in July, organized in thirteen categories: Causal Inference, Computational Methods, Data, Ecology, Epidemiology, Machine Learning, Mathematics, Medical Statistics, Pharma, Statistics, Time Series, Utilities, and Visualization.
Explore the newly released ANES 2024 survey data in R with {survey} and {srvyr}.
An attempt to capture the depth and breadth of what’s new on CRAN: here are my Top 40 picks in twenty-one categories: AI, Chess, Computational Methods, Data, Decision Analysis, Ecology, Epidemiology, Finance, Genomics, Linguistics, Machine Learning, Mathematics, Medical Statistics, Music Theory, Networks, Programming, Statistics, Time Series, Utilities, and Visualization.
In this post, we walk the reader through the fundamental driver of successfully using AI agents: prompting. Prompting is the method of instructing an LLM to perform a task by carefully designing an input text.
In this second installment, we analyze a complex dataset of diabetes patient encounters with 100,000+ rows and over 50 columns and dive deeper to evaluate the effectiveness of healthcare delivery and the impact of patient-level variables on downstream outcomes.
An attempt to capture the depth and breadth of what’s new on CRAN. One hundred seventy-six new packages made it to CRAN in May. Here are my Top 40 picks in eighteen categories: Climate Science, Computational Methods, Data, Decision Analysis, Ecology, Epidemiology, Finance, Genomics, Machine Learning, Medicine, Networks, Phylogenetics, Programming, Statistics, Time Series, Topological Data Analysis, Utilities, and Visualization.
This post shows how to use the elementary theory of discrete-time Markov chains to construct a multi-state model of patients progressing through various health states in a randomized clinical trial comparing different treatments for asthma management. The model assumes that all patients in each of the two arms of the trial will eventually experience treatment failure. The post shows how to calculate transition probabilities using a simple multinomial Bayesian model and how to exploit the theory of absorbing Markov chains to calculate fundamental health metrics.
In this post, the first in a series of two focused on healthcare applications, I will introduce a complex healthcare data set and outline the problems I propose to solve using tidyverse and GitHub Copilot to facilitate nuanced argument tuning of tidyverse functions and accelerate the data analysis process.
An attempt to capture the depth and breadth of what’s new on CRAN. Here are my Top 40 picks in twenty-two categories: Archaeology, Artificial Intelligence, Biology, Chemistry, Climate Science, Data, Ecology, Epidemiology, Genomics, Geoscience, Health Technology Assessment, Linguistics, Machine Learning, Mathematics, Medicine, Networks, Psychology, Statistics, Time Series, Utilities, and Visualization.
In a previous post, we presented an rjags version of a Bayesian model from the textbook: Evidence Synthesis for Decision Making in Healthcare by Welton et al. (2012). This post continues the work of porting key models from the text to JAGS.
VS Code is Microsoft’s open-source software development environment that can be customized for any language. It is arguably the general purpose GUI programming tool of choice when working across multiple computer languages. For data scientists, its R-Python integration deserves careful consideration.
An attempt to capture the depth and breadth of what’s new on CRAN: here are my Top 40 picks in sixteen categories: Agriculture, Archaeology, Biology, Climate Modeling, Computational Methods, Data, Ecology, Epidemiology, Genomics, Machine Learning, Medicine, Risk Forecasting, Statistics, Time Series, Utilities, and Visualization.
This post describes a chance encounter with a time series data set for which the forecast and fable packages found different ARIMA models that don’t look much alike, but produce surprisingly close forecasts. It is a reminder of the inherent identifiability problem of ARIMA models and a record of a couple of afternoons spent down this rabbit hole.
An attempt to capture the depth and breadth of what’s new on CRAN: Here are my Top 40 picks in fifteen categories: Artificial Intelligence, Computational Methods, Ecology, Genomics, Health Sciences, Mathematics, Machine Learning, Medicine, Music, Pharma, Statistics, Time Series, Utilities, Visualization, and Weather.
A simplified version of the four color theorem: suppose you have a 2x2 grid of squares, and you need to paint each square one of four colors: red, blue, green, or yellow. The restriction is that no two adjacent squares (sharing a side) can have the same color. How many valid ways can you color the grid?
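The grid is small enough that the count can be checked by brute force. The following is a minimal Python sketch, not the code from the post: it assumes the four squares are labelled a, b (top row) and c, d (bottom row), so the side-sharing pairs are a-b, a-c, b-d, and c-d. The function name `count_colorings` is made up for illustration.

```python
from itertools import product

COLORS = ("red", "blue", "green", "yellow")


def count_colorings(colors=COLORS):
    """Count colorings of a 2x2 grid in which side-sharing squares differ."""
    count = 0
    # Squares are laid out as:  a b
    #                           c d
    for a, b, c, d in product(colors, repeat=4):
        # Diagonal squares (a-d and b-c) do not share a side, so they may match.
        if a != b and a != c and b != d and c != d:
            count += 1
    return count


print(count_colorings())  # 84
```

The four squares form a cycle, and the result agrees with the chromatic polynomial of a 4-cycle, (k-1)^4 + (k-1), evaluated at k = 4.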
This post presents a JAGS version of a WinBUGS model presented in the classic textbook Evidence Synthesis for Decision Making in Healthcare by Nicky J. Welton, Alexander J. Sutton, Nicola J. Cooper, Keith R. Abrams, and A.E. Ades.
TimeGPT is a pre-trained, multi-layer, encoder/decoder transformer model with self-attention mechanisms designed specifically for time series forecasting. This post, a revision of the post first published on 2025-02-12, corrects an error that deleteriously affected the ARIMA and exponential smoothing forecasts which are contrasted with the TimeGPT forecast.
In this post, I show an example of Simpson’s paradox in a logistic regression model of synthetic clinical trial data.
In a previous post, I described The Twelve Coins Problem, a notoriously hard problem that comes in many flavors and was popular on both sides of the Atlantic during World War II. In this post, I show how to build on Freeman Dyson’s solution to solve a generalization of the problem.
We share a list of upcoming conferences that either focus on the R programming language or showcase its use in the field.
In our previous post, Examining Meta Analysis, we contrasted a frequentist version of a meta analysis conducted with R’s meta package with a Bayesian meta analysis done mostly in stan using the rstan package as a front end. In this post, we repeat the analysis using the brms package, which also depends on stan but allows the user to formulate complex Bayesian models without writing any stan code.
The Twelve Coins Problem, a notoriously hard problem that comes in many flavors, was popular on both sides of the Atlantic during World War II; it was even suggested that it should be dropped over Germany in an attempt to sabotage their war effort.
A man went into a bank with 1,000 silver dollars and 10 bags. He said, ‘Place this money, please, in the bags in such a way that if I call and ask for a certain number of dollars you can hand me over one or more bags, giving me the exact amount called for without opening any of the bags.’
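One well-known way to fill the bags uses powers of two for the first nine and puts the remainder in the tenth. The Python sketch below checks that this particular filling works. The bag contents are an assumption brought in for illustration, not necessarily the answer given in the post, and `reachable_amounts` is a hypothetical helper name.

```python
# Assumed filling: nine bags holding powers of two (total 511) plus the remainder.
bags = [1, 2, 4, 8, 16, 32, 64, 128, 256, 489]


def reachable_amounts(bags):
    """Return every total that can be formed by handing over some subset of bags."""
    sums = {0}
    for bag in bags:
        # Each bag is either left out (existing sums) or handed over (sums + bag).
        sums |= {s + bag for s in sums}
    return sums


amounts = reachable_amounts(bags)
missing = [n for n in range(1, 1001) if n not in amounts]

print(sum(bags))  # 1000
print(missing)    # []
```

The nine power-of-two bags cover every amount from 1 to 511, and adding the 489-dollar bag shifts that range to cover 489 through 1,000.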
Let’s run some modular arithmetic using R.
Learn how to solve Bachet’s Four Weights Problem using R, with code and explanations to measure weights from 1 to 40 efficiently.
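In the classical form of Bachet's problem, the four weights are 1, 3, 9, and 27, and weights may be placed on either pan of a balance. The Python sketch below enumerates every placement under that assumption; it is an illustration, not the R code from the post, and `balance_solutions` is a hypothetical name.

```python
from itertools import product

WEIGHTS = (1, 3, 9, 27)


def balance_solutions(weights=WEIGHTS):
    """Map each measurable positive load to one placement of the weights.

    A sign of +1 puts the weight on the pan opposite the load, -1 puts it
    on the same pan as the load, and 0 leaves it off the balance.
    """
    solutions = {}
    for signs in product((-1, 0, 1), repeat=len(weights)):
        total = sum(s * w for s, w in zip(signs, weights))
        if total > 0:
            solutions[total] = signs
    return solutions


solutions = balance_solutions()
print(sorted(solutions) == list(range(1, 41)))  # True
print(solutions[5])  # (-1, -1, 1, 0), i.e. 5 = 9 - 3 - 1
```

Each placement is a balanced ternary representation of the load, which is why every load from 1 to 40 appears exactly once.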
In this post we would like to review the idea of meta-analysis and compare a traditional, frequentist style, random effects meta-analysis to Bayesian methods.
One hundred eighty-one new packages made CRAN’s final cut in October.
We find more solutions to the 100 Bushels of Corn puzzle using the numbers R package.
100 bushels of corn are distributed to 100 people such that every man receives 3 bushels, every woman 2 bushels, and every child 1/2 a bushel. How many men, women, and children are there? (Solved with R).
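The puzzle above reduces to two equations in whole numbers: men + women + children = 100 and 3·men + 2·women + children/2 = 100. The Python sketch below enumerates the solutions by brute force as an illustration; it is not the R code from the post. The function name `corn_solutions` and the `require_all` flag (whether at least one man, woman, and child must be present) are assumptions made here.

```python
def corn_solutions(total=100, require_all=True):
    """List (men, women, children) triples that satisfy both constraints."""
    low = 1 if require_all else 0
    solutions = []
    for men in range(low, total + 1):
        for women in range(low, total - men + 1):
            children = total - men - women
            if children < low:
                continue
            # Bushel equation doubled to avoid fractions: 6m + 4w + c = 2 * total.
            if 6 * men + 4 * women + children == 2 * total:
                solutions.append((men, women, children))
    return solutions


for men, women, children in corn_solutions():
    print(men, women, children)
```

Subtracting the head-count equation from the doubled bushel equation gives 5·men + 3·women = 100, so the number of men increases in steps of 3 while the number of women decreases in steps of 5.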
Manifold Learning reduces data dimensions to discover patterns for analysis and visualization. This post provides an overview of Manifold Learning and its algorithms, the tsne package, and other R tools and resources.
Explore new job opportunities that highlight R skills.
Two hundred thirty new packages made it to CRAN in September. Here are my “Top 40” selections in 17 categories.
“Top 40” is back, broadcasting on the new R Works blog.
We hope that the R Works blog informs and inspires R users everywhere.