R packages by juliasilge

tidytext - Text Mining using 'dplyr', 'ggplot2', and Other Tidy Tools

Using tidy data principles can make many text mining tasks easier, more effective, and consistent with tools already in wide use. Much of the infrastructure needed for text mining with tidy data frames already exists in packages like 'dplyr', 'broom', 'tidyr', and 'ggplot2'. In this package, we provide functions and supporting data sets to allow conversion of text to and from tidy formats, and to switch seamlessly between tidy tools and existing text mining packages.

Last updated 12 months ago

natural-language-processing text-mining tidy-data tidyverse

16.86 score 1.2k stars 61 dependents 17k scripts 45k downloads
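
The tidy text format this package works with is a table with one token per row. As a rough illustration only, the Python sketch below shows that reshaping. It is not tidytext's R API: the function name `tidy_tokens`, the simple regular-expression tokenizer, and the sample documents are all assumptions made for this example.

```python
import re
from collections import Counter


def tidy_tokens(docs):
    """Reshape {doc_id: text} into one row per (doc, word), in reading order.

    Hypothetical helper for illustration; the tokenizer is a plain regex
    that lowercases text and keeps runs of letters and apostrophes.
    """
    rows = []
    for doc_id, text in docs.items():
        for word in re.findall(r"[a-z']+", text.lower()):
            rows.append({"doc": doc_id, "word": word})
    return rows


# Made-up sample documents.
docs = {
    "doc1": "Tidy data makes text mining easier.",
    "doc2": "Text mining, tidy style!",
}

rows = tidy_tokens(docs)

# Once the text is one-token-per-row, ordinary table operations apply,
# for example counting words across all documents.
word_counts = Counter(row["word"] for row in rows)
```

Because each row holds a single token, the same grouping, counting, and joining operations used on any other table can be applied to the text.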

pins - Pin, Discover, and Share Resources

Publish data sets, models, and other R objects, making it easy to share them across projects and with your colleagues. You can pin objects to a variety of "boards", including local folders (to share on a networked drive or with 'DropBox'), 'Posit Connect', 'AWS S3', and more.

Last updated 2 months ago

azuregcloudrpinsrsconnects3storage

14.17 score 321 stars 17 dependents 1.9k scripts 4.2k downloads

butcher - Model Butcher

Provides a set of S3 generics to axe components of fitted model objects and help reduce the size of model objects saved to disk.

Last updated 13 days ago

11.66 score 132 stars 13 dependents 146 scripts 5.6k downloads

widyr - Widen, Process, then Re-Tidy Data

Encapsulates the pattern of untidying data into a wide matrix, performing some processing, then turning it back into a tidy form. This is useful for several operations such as co-occurrence counts, correlations, or clustering that are mathematically convenient on wide matrices.

Last updated 2 years ago

11.12 score 329 stars 2 dependents 1.7k scripts 1.9k downloads
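
The widen, process, re-tidy pattern described for this package can be sketched in a few lines. The Python example below is an illustration under stated assumptions, not widyr's R interface: the function name `cooccurrence_rows` and the sample rows are hypothetical, and the "wide" form is represented as a set of words per document rather than a numeric matrix.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_rows(rows):
    """Count how often two words appear in the same document.

    rows: iterable of (doc, word) pairs in tidy form.
    Returns tidy rows again: one dict per word pair with its count.
    """
    # 1. Widen: gather each document's words into one collection.
    wide = {}
    for doc, word in rows:
        wide.setdefault(doc, set()).add(word)

    # 2. Process: count unordered word pairs within each document.
    counts = Counter()
    for words in wide.values():
        for a, b in combinations(sorted(words), 2):
            counts[(a, b)] += 1

    # 3. Re-tidy: return one row per pair.
    return [
        {"item1": a, "item2": b, "n": n}
        for (a, b), n in sorted(counts.items())
    ]


# Made-up tidy input: one (doc, word) pair per row.
tidy_rows = [
    ("d1", "a"), ("d1", "b"),
    ("d2", "a"), ("d2", "b"), ("d2", "c"),
]

pairs = cooccurrence_rows(tidy_rows)
```

The caller only ever sees tidy input and tidy output; the wide intermediate stays inside the function, which is the point of the pattern.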

janeaustenr - Jane Austen's Complete Novels

Full texts for Jane Austen's 6 completed novels, ready for text analysis. These novels are "Sense and Sensibility", "Pride and Prejudice", "Mansfield Park", "Emma", "Northanger Abbey", and "Persuasion".

Last updated 3 years ago

jane-austen novels text-mining

11.03 score 95 stars 62 dependents 1.1k scripts 36k downloads

vetiver - Version, Share, Deploy, and Monitor Models

The goal of 'vetiver' is to provide fluent tooling to version, share, deploy, and monitor a trained model. Functions handle both recording and checking the model's input data prototype, and predicting from a remote API endpoint. The 'vetiver' package is extensible, with generics that can support many kinds of models.

Last updated 6 months ago

10.48 score 185 stars 1 dependent 466 scripts 1.4k downloads

qualtRics - Download 'Qualtrics' Survey Data

Provides functions to import survey results directly into R using the 'Qualtrics' API. 'Qualtrics' <https://www.qualtrics.com/about/> is an online survey and data collection software platform. See <https://api.qualtrics.com/> for more information about the 'Qualtrics' API. This package is community-maintained and is not officially supported by 'Qualtrics'.

Last updated 7 months ago

api qualtrics qualtrics-api survey survey-data

10.23 score 221 stars 272 scripts 2.2k downloads

bundle - Serialize Model Objects with a Consistent Interface

Typically, models in 'R' exist in memory and can be saved via regular 'R' serialization. However, some models store information in locations that cannot be saved using 'R' serialization alone. The goal of 'bundle' is to provide a common interface to capture this information, situate it within a portable object, and restore it for use in new settings.

Last updated 5 months ago

8.16 score 30 stars 4 dependents 153 scripts 2.2k downloads

tidylo - Weighted Tidy Log Odds Ratio

How can we measure how the usage or frequency of some feature, such as words, differs across some group or set, such as documents? One option is the log odds ratio, but the log odds ratio alone does not account for sampling variability: we haven't counted every feature the same number of times, so how do we know which differences are meaningful? Enter the weighted log odds, which 'tidylo' implements using tidy data principles. In particular, 'tidylo' uses the method outlined in Monroe, Colaresi, and Quinn (2008) <doi:10.1093/pan/mpn018> to weight the log odds ratio by a prior. By default, the prior is estimated from the data itself, an empirical Bayes approach, but an uninformative prior is also available.

Last updated 3 years ago

empirical-bayes log-odds-ratio tidy-data tidyverse weighted-log-odds

7.35 score 96 stars 157 scripts 276 downloads
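
To make the idea of weighting a log odds ratio by a prior more concrete, here is a small Python sketch for the two-group case. It follows the general form of the z-scored log odds with a Dirichlet prior from Monroe, Colaresi, and Quinn (2008), but it is only an illustration: it is not tidylo's R code, the function name `weighted_log_odds` is hypothetical, and using the pooled counts as the prior is an assumption standing in for "estimated from the data itself".

```python
import math


def weighted_log_odds(counts_a, counts_b):
    """Z-scored log odds of each word in group A relative to group B.

    counts_a, counts_b: dicts mapping word -> count in each group.
    Assumes at least two distinct words overall, so the denominators
    below stay positive.
    """
    vocab = set(counts_a) | set(counts_b)

    # Assumed prior: pooled count of each word across both groups.
    alpha = {w: counts_a.get(w, 0) + counts_b.get(w, 0) for w in vocab}
    alpha0 = sum(alpha.values())
    n_a = sum(counts_a.values())
    n_b = sum(counts_b.values())

    z = {}
    for w in vocab:
        y_a = counts_a.get(w, 0)
        y_b = counts_b.get(w, 0)
        # Prior-smoothed log odds of the word within each group.
        lo_a = math.log((y_a + alpha[w]) / (n_a + alpha0 - y_a - alpha[w]))
        lo_b = math.log((y_b + alpha[w]) / (n_b + alpha0 - y_b - alpha[w]))
        # Approximate variance of the difference; rare words get a
        # larger variance, which shrinks their score.
        variance = 1 / (y_a + alpha[w]) + 1 / (y_b + alpha[w])
        z[w] = (lo_a - lo_b) / math.sqrt(variance)
    return z


# Made-up counts for two groups.
group_a = {"x": 10, "y": 2, "the": 5}
group_b = {"x": 2, "y": 10, "the": 5}

scores = weighted_log_odds(group_a, group_b)
```

Dividing by the estimated standard deviation is what accounts for sampling variability: a large raw difference on a rarely counted word scores lower than the same difference on a frequently counted one.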

cereal - Serialize 'vctrs' Objects to 'JSON'

The 'vctrs' package provides a concept of vector prototype that can be especially useful when deploying models and code. Serialize these object prototypes to 'JSON' so they can be used to check and coerce data in production systems, and deserialize 'JSON' back to the correct object prototypes.

Last updated 2 years ago

5.11 score 24 stars 2 dependents 4 scripts 1.8k downloads