One hundred eighty-one new packages made CRAN’s final cut in October. Here are my Top 40 picks in thirteen categories: AI, Climate Analysis, Computational Methods, Data, Epidemiology, Genomics, Machine Learning, Medicine, Quality Management, Statistics, Time Series, Utilities, and Visualization.
AI
hollr v1.0.0: Enables chat completion and text annotation with local and OpenAI language models and supports batch processing, multiple annotators, and consistent output formats. See README.
spBPS v0.0-4: Provides functions for Bayesian Predictive Stacking within the Bayesian transfer learning framework for geospatial artificial systems, as described in Presicce & Banerjee (2024). Core functions leverage C++, making the framework well-suited for large-scale spatial data analysis in parallel and distributed computing environments.
tidyllm v0.1.0: Implements a tidy interface for integrating large language model (LLM) APIs such as Claude, ChatGPT, Groq, and local models via Ollama into R workflows and supports text and media-based interactions, interactive message history, stateful rate limit handling, and a tidy, pipeline-oriented interface for streamlined integration into data workflows. See the vignette.
Climate Analysis
carbonr v0.2.1: Provides a tool for calculating carbon-equivalent emissions based on the UK Government’s Greenhouse Gas Conversion Factors report; it facilitates transparent emissions calculations for various sectors, including travel, accommodation, and clinical activities. See the vignette.
fluxfinder v1.0.0: Parses static-chamber greenhouse gas measurement files generated by a variety of instruments, computes flux rates using multi-observation metadata, and generates diagnostic metrics and plots. It is designed to be easy to integrate into reproducible scientific workflows. There is an Introduction and a vignette on integrating with the gasfluxes package.
Computational Methods
gmresls v0.2.2: Implements a method to solve a least squares system Ax~=b (dim(A)=(m,n) with m >= n) with a preconditioner matrix B: BAx=Bb (dim(B)=(n,m)) based on the Generalized Minimal Residual (GMRES) algorithm of Saad & Schultz (1986). See README.
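To make the preconditioned formulation concrete, here is a minimal Python sketch. It is not the package's API. It assumes the simplest preconditioner, B = Aᵀ, which turns BAx = Bb into the normal equations, and it solves the resulting 2×2 system directly with Cramer's rule rather than with GMRES. All function names and data are made up for illustration.

```python
# Illustrative only: solve the least squares problem A x ~= b by forming the
# square preconditioned system (B A) x = B b with B = A^T. Both the choice of
# B and the direct solve are assumptions; gmresls itself relies on GMRES.

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def transpose(X):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*X)]

def solve_2x2(M, v):
    """Solve the 2x2 linear system M x = v with Cramer's rule."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    x0 = (v[0] * M[1][1] - M[0][1] * v[1]) / det
    x1 = (M[0][0] * v[1] - v[0] * M[1][0]) / det
    return [x0, x1]

# Overdetermined system: fit y = c0 + c1 * t through three points.
A = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]   # dim(A) = (3, 2), so m >= n
b = [[1.0], [3.0], [5.0]]                  # the points lie on y = 1 + 2 t

B = transpose(A)                           # dim(B) = (2, 3)
BA = matmul(B, A)                          # square 2x2 system matrix
Bb = [row[0] for row in matmul(B, b)]      # right-hand side of length 2

x = solve_2x2(BA, Bb)                      # least squares coefficients [c0, c1]
print(x)
```

Because B has dimension (n, m), the product BA is always square, which is what allows a square-system solver such as GMRES to be applied to a rectangular problem.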
mappeR v1.3.0: Implements an algorithm that generalizes the concept of a Reeb Graph as described in Singh, Mémoli, and Carlsson (2007) for a Topological Data Analysis of high dimensional data. Look here for examples.
TensorTools v1.0.0: Provides a set of tools for basic tensor operators based on the Discrete Fourier Transform, including the eigenvalue, QR, and LU decompositions of a tensor, the inverse of a tensor, and the transpose of a symmetric tensor. See Kernfeld et al. (2015) for the details and the vignette for examples. Note that, in this context, a tensor is a multidimensional array.
Data
openFDA v0.1.0: Facilitates access to U.S. Food and Drug Administration’s openFDA data on drugs, devices, foodstuffs, tobacco, and more. See Kass-Hout et al. (2016) for background and the vignette to get started.
ozbabynames v0.1.0: Provides data on the most popular baby names by sex and year for each state in Australia as provided by the state and territory governments. The quality and quantity of the data vary with the state. Look here for information.
Epidemiology
epichains v0.1.1: Provides methods to simulate and analyze the size and length of branching processes with an arbitrary offspring distribution. These can be used, for example, to analyze the distribution of chain sizes or length of infectious disease outbreaks, as discussed in Farrington et al. (2003). There are six vignettes, including a Getting Started guide and Theoretical Background.
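As a rough illustration of the idea, the following Python sketch simulates transmission chains as a branching process with a Poisson offspring distribution and records each chain's size (total cases) and length (number of generations). It is not the epichains API. The offspring mean of 0.8, the cap on chain size, and all function names are assumptions chosen for the example.

```python
import math
import random

def poisson(rng, lam):
    """Draw one Poisson(lam) variate with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def simulate_chain(rng, mean_offspring, max_size=1000):
    """Simulate one chain started by a single index case.

    Returns (size, length): the total number of cases and the number of
    generations, counting the index case as generation 1.
    """
    size, length, current = 1, 1, 1
    while current > 0 and size < max_size:
        # Each case in the current generation infects a random number of others.
        offspring = sum(poisson(rng, mean_offspring) for _ in range(current))
        if offspring > 0:
            length += 1
        size += offspring
        current = offspring
    return size, length

rng = random.Random(42)  # fixed seed so the run is reproducible
chains = [simulate_chain(rng, mean_offspring=0.8) for _ in range(500)]
sizes = [size for size, _ in chains]
print("mean chain size:", sum(sizes) / len(sizes))
```

With an offspring mean below one, every chain eventually dies out, and the empirical distribution of `sizes` can be compared against observed outbreak sizes.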
epizootic v1.0.0: Extends the pattern-oriented modeling framework of the poems package to provide functions for modeling disease transmission on a population scale in a spatiotemporally explicit manner and includes seasonal time steps, dispersal functions, objects that store disease states, and a population simulator that includes disease dynamics. See the vignette for an extended example.
serosv v1.0.1: Implements tools to estimate infectious disease parameters using serological data. Implemented models include SIR models, nonparametric models, semiparametric models, and hierarchical models, which are based on the book by Hens et al. (2013). There are eight vignettes, including Hierarchical Bayesian Models and Modeling directly from antibody levels.
Genomics
GSEMA v0.99.3: Provides functions to perform various steps of gene set enrichment meta-analysis, including meta-analysis of effect sizes from different pathways in different studies. See the vignette for examples.
phylotypr v0.1.0: Implements the Naive Bayesian Classifier from the Ribosomal Database Project, which has traditionally been used to classify 16S rRNA gene sequences to bacterial taxonomic outlines but applies to any type of gene sequence. See Wang et al. (2007) for background and the vignette for examples.
Machine Learning
textpress v1.0.0: Provides a Natural Language Processing (NLP) toolkit focused on search-centric workflows with minimal dependencies. The package offers key features for web scraping, text processing, corpus search, and text embedding generation via the HuggingFace API. See README for examples.
Medicine
eyetools v0.7.2: Provides functions to help researchers analyze eye data, enabling the automation of actions across a pipeline that includes transforming binocular data, gap repair, and event-based processing such as fixations, saccades, and entry and duration in areas of interest. Functions implement the fixation and saccade detection methods proposed by Salvucci and Goldberg (2000) and visualize eye movement. See the Introduction.
MedDataSets v0.1.0: Provides an extensive collection of datasets related to medicine, diseases, treatments, drugs, and public health, covering topics such as drug effectiveness, vaccine trials, survival rates, infectious disease outbreaks, and medical treatments. The datasets cover conditions including AIDS, cancer, bacterial infections, and COVID-19, and include information on pharmaceuticals and vaccines. These datasets are sourced from the R ecosystem and other R packages. See the vignette.
ODT v1.0.0: Implements a tree-based method specifically designed for personalized medicine applications that uses genomic and mutational data to identify optimal drug recommendations tailored to individual patient profiles. See Gimeno et al. (2023) for the details and the vignette for an example.
trtswitch v0.1.1: Implements rank-preserving structural failure time model (RPSFTM), iterative parameter estimation (IPE), inverse probability of censoring weights (IPCW), and two-stage estimation (TSE) methods for treatment switching in randomized clinical trials. See Latimer et al. (2017) for background. There are five vignettes including descriptions of RPSFTM and IPE.
UnplanSimon v0.1.0: Implements methods to manage under- and over-enrollment in Simon’s Two-Stage Design by providing adaptive threshold adjustments and sample size recalibration, and post-inference analysis tools to support clinical trial design and evaluation. See the vignette.
Quality Management
r6qualitytools v1.0.1: Implements a suite of statistical tools for Quality Management, designed around the Define, Measure, Analyze, Improve, and Control (DMAIC) cycle used in Six Sigma methodology. It refactors the qualitytools package, incorporating ‘R6’ object-oriented programming for increased flexibility and performance, and replaces traditional graphics with modern, interactive visualizations. Look here for examples.
Statistics
clusTMB v0.1.0: Fits a spatio-temporal finite mixture model using the TMB package. Covariate, spatial, and temporal random effects can be incorporated into the gating formula using multinomial logistic regression, the expert formula using a generalized linear mixed model framework, or both. See the vignettes Covariance Structure and Meuse Example.
extrememix v0.0.1: Fits extreme value mixture models, which are models for tails not requiring selection of a threshold, for continuous data. See Behrens et al. (2004) and Nascimento et al. (2011) for the theory and the vignette for examples.
MECfda v0.1.0: Implements functions to solve scalar-on-function linear models, including generalized linear mixed effect models and quantile linear regression models, and bias correction estimation methods due to measurement error. For details about the measurement error bias correction methods, see Luan et al. (2023), Tekwe et al. (2022), Zhang et al. (2023), and Tekwe et al. (2019). See the vignette for an introduction.
multipois v0.2.0: Implements a method to analyze polytomous responses with three or more unordered categories by transforming nominal response data into counts for each categorical alternative. These counts are then analyzed using (mixed) Poisson regression as per Baker (1994).
spStack v1.0.1: Fits Bayesian hierarchical spatial process models for point-referenced Gaussian, Poisson, binomial, and binary data using stacking of predictive densities by sampling from analytically available posterior distributions conditional upon some candidate values of the spatial process parameters. See Zhang et al. (2024) and Pan et al. (2024) for background and the vignette for examples.
svyROC v1.0.0: Provides functions to estimate the receiver operating characteristic (ROC) curve, area under the curve (AUC), and optimal cut-off points for individual classification, taking into account complex sampling designs when working with complex survey data. See Iparragirre et al. (2024), Iparragirre et al. (2022) and Iparragirre & Barrio (2024) for the theory and README for an example.
VDPO v0.1.0: Provides a comprehensive set of tools for analyzing and manipulating functional data with non-uniform lengths and addresses two common scenarios in functional data analysis: Variable Domain Data and Partially Observed Data. See Amaro et al. (2024) for the details. There are two vignettes: Introduction and Model fitting for variable domain data.
Time Series
FTSgof v1.0.0: Offers tools for the analysis of functional time series data, focusing on white noise hypothesis testing and goodness-of-fit evaluations, alongside functions for simulating data and advanced visualization techniques. See Kokoszka et al. (2017), Yeh et al. (2023), Kim et al. (2023), and Rice et al. (2020) for the theory and the vignette for examples.
utsf v1.0.0: Implements a uniform interface for univariate time series forecasting using different regression models in an autoregressive way and provides features such as preprocessing and estimation of forecast accuracy. See the vignette.
Utilities
ctypesio v0.1.1: Provides functions for reading and writing binary data (with files, connections, and raw vectors) using C type descriptions that convert data between C types and R types while checking for values outside the type limits, NA values, etc. See the vignettes Parsing JPEG Markers and Handcrafting a WAV file.
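The underlying idea, C type descriptions combined with limit checks, can be sketched with Python's standard `struct` module. This is an analogy only, not the ctypesio API. The helper names are hypothetical, and big-endian byte order is an assumption chosen because JPEG markers are stored that way.

```python
import io
import struct

def write_uint16(stream, value):
    """Write an unsigned 16-bit big-endian integer, rejecting out-of-range values."""
    if not 0 <= value <= 0xFFFF:
        raise ValueError(f"{value} does not fit in a C uint16")
    stream.write(struct.pack(">H", value))

def read_uint16(stream):
    """Read an unsigned 16-bit big-endian integer from a binary stream."""
    data = stream.read(2)
    if len(data) != 2:
        raise EOFError("not enough bytes for a uint16")
    return struct.unpack(">H", data)[0]

buf = io.BytesIO()
write_uint16(buf, 0xFFD8)   # value of the JPEG start-of-image marker
raw = buf.getvalue()        # the two bytes actually stored in the buffer
buf.seek(0)
value = read_uint16(buf)    # round-trips back to the original integer

try:
    write_uint16(buf, 70000)        # too large for 16 bits
except ValueError as err:
    print("rejected:", err)
```

Checking the range before packing keeps a silently truncated value from ever reaching the file, which is the kind of safeguard the package description refers to.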
GitStats v2.1.2: Provides functions to obtain standardized data from multiple Git services, including GitHub and GitLab. There are vignettes on Getting Repositories and Setting Hosts.
mall v0.1.0: Enables users to run multiple Large Language Model predictions against a table. The predictions run row-wise over a specified column. It works using a one-shot prompt, along with the current row’s content. Look here for examples.
MDPIexploreR v0.2.0: Provides tools to scrape and analyze data from the MDPI journals, including functions to extract metrics such as submission-to-acceptance times, article types, and plot data, and explore patterns of self-citations. See the vignette.
Visualization
affiner v0.1.1: Provides functions to dilate, permute, project, reflect, rotate, shear, and translate 2D and 3D points. Supports parallel projections including oblique projections such as the cabinet projection as well as axonometric projections such as the isometric projection. There is an Introduction and a vignette on geometry.
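For readers new to affine transformations, here is a minimal Python sketch of rotating and translating a 2D point with homogeneous 3×3 matrices. It does not use or mirror the affiner API. The function names and the column-vector convention are assumptions made for the example.

```python
import math

def rotation(theta):
    """Homogeneous matrix rotating 2D points counterclockwise by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def translation(dx, dy):
    """Homogeneous matrix shifting 2D points by (dx, dy)."""
    return [[1.0, 0.0, dx], [0.0, 1.0, dy], [0.0, 0.0, 1.0]]

def compose(*matrices):
    """Multiply matrices left to right; the rightmost one acts on a point first."""
    result = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    for m in matrices:
        result = [[sum(result[i][k] * m[k][j] for k in range(3)) for j in range(3)]
                  for i in range(3)]
    return result

def apply(matrix, point):
    """Apply a homogeneous transform to an (x, y) point treated as a column vector."""
    col = (point[0], point[1], 1.0)
    return tuple(sum(matrix[i][k] * col[k] for k in range(3)) for i in range(2))

# Rotate (1, 0) by 90 degrees about the origin, then shift it by (2, 3).
transform = compose(translation(2.0, 3.0), rotation(math.pi / 2))
moved = apply(transform, (1.0, 0.0))
print(moved)  # approximately (2.0, 4.0), up to floating-point rounding
```

Folding the translation into the matrix is what lets a whole sequence of operations (dilate, reflect, rotate, shear, translate) collapse into a single matrix product.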
doceplot v0.1.3: Provides functions to visualize data with more than two categorical variables and additional continuous variables that are particularly useful for exploring complex categorical data in the context of pathway analysis across multiple conditions. Look here for documentation.
ggcompare v0.0.2: Adds mean comparison annotations to a ggplot to indicate if two or more groups are significantly different. For comparisons between two groups, the p-value is calculated by t-test (parametric) or Wilcoxon rank sum test (nonparametric). For comparisons among more than two groups, the p-value is calculated by One-way ANOVA (parametric) or Kruskal-Wallis test (nonparametric). Look here for examples.
ggpca v0.1.2: Provides tools for creating publication-ready dimensionality reduction plots, including Principal Component Analysis (PCA), t-Distributed Stochastic Neighbor Embedding (t-SNE), and Uniform Manifold Approximation and Projection (UMAP) using the ggplot2 framework. See the vignette.
plotscaper v0.2.3: Implements a framework for creating interactive figures for data exploration. All plots are automatically linked and support several kinds of interactive features, including selection, zooming, panning, and parameter manipulation. There is an Introduction and four additional vignettes, including Available Interactions and Layout.
vchartr v0.1.3: Provides an htmlwidgets interface to VChart, a cross-platform charting library and expressive data storyteller. See the vignette.