tidytext - Text Mining using 'dplyr', 'ggplot2', and Other Tidy Tools
Using tidy data principles can make many text mining tasks easier, more effective, and consistent with tools already in wide use. Much of the infrastructure needed for text mining with tidy data frames already exists in packages like 'dplyr', 'broom', 'tidyr', and 'ggplot2'. In this package, we provide functions and supporting data sets to allow conversion of text to and from tidy formats, and to switch seamlessly between tidy tools and existing text mining packages.
Last updated 7 months ago
natural-language-processing, text-mining, tidy-data, tidyverse
16.81 score 1.2k stars 60 packages 16k scripts 45k downloads

pins - Pin, Discover, and Share Resources
Publish data sets, models, and other R objects, making it easy to share them across projects and with your colleagues. You can pin objects to a variety of "boards", including local folders (to share on a networked drive or with 'DropBox'), 'Posit Connect', 'AWS S3', and more.
Last updated 1 month ago
azure, gcloud, rpins, rsconnect, s3, storage
14.30 score 315 stars 18 packages 1.7k scripts 5.0k downloads

butcher - Model Butcher
Provides a set of S3 generics to axe components of fitted model objects and help reduce the size of model objects saved to disk.
Last updated 3 months ago
11.29 score 131 stars 12 packages 144 scripts 5.3k downloads

widyr - Widen, Process, then Re-Tidy Data
Encapsulates the pattern of untidying data into a wide matrix, performing some processing, then turning it back into a tidy form. This is useful for several operations such as co-occurrence counts, correlations, or clustering that are mathematically convenient on wide matrices.
Last updated 2 years ago
11.23 score 327 stars 2 packages 1.7k scripts 2.5k downloads

janeaustenr - Jane Austen's Complete Novels
Full texts for Jane Austen's 6 completed novels, ready for text analysis. These novels are "Sense and Sensibility", "Pride and Prejudice", "Mansfield Park", "Emma", "Northanger Abbey", and "Persuasion".
Last updated 2 years ago
jane-austen, novels, text-mining
11.01 score 97 stars 62 packages 1.1k scripts 35k downloads

vetiver - Version, Share, Deploy, and Monitor Models
The goal of 'vetiver' is to provide fluent tooling to version, share, deploy, and monitor a trained model. Functions handle both recording and checking the model's input data prototype, and predicting from a remote API endpoint. The 'vetiver' package is extensible, with generics that can support many kinds of models.
Last updated 1 month ago
10.70 score 183 stars 1 package 472 scripts 1.3k downloads

qualtRics - Download 'Qualtrics' Survey Data
Provides functions to access survey results directly into R using the 'Qualtrics' API. 'Qualtrics' <https://www.qualtrics.com/about/> is an online survey and data collection software platform. See <https://api.qualtrics.com/> for more information about the 'Qualtrics' API. This package is community-maintained and is not officially supported by 'Qualtrics'.
Last updated 2 months ago
api, qualtrics, qualtrics-api, survey, survey-data
10.33 score 215 stars 218 scripts 3.0k downloads

bundle - Serialize Model Objects with a Consistent Interface
Typically, models in 'R' exist in memory and can be saved via regular 'R' serialization. However, some models store information in locations that cannot be saved using 'R' serialization alone. The goal of 'bundle' is to provide a common interface to capture this information, situate it within a portable object, and restore it for use in new settings.
Last updated 7 days ago
8.65 score 28 stars 2 packages 171 scripts 1.3k downloads

tidylo - Weighted Tidy Log Odds Ratio
How can we measure how the usage or frequency of some feature, such as words, differs across some group or set, such as documents? One option is to use the log odds ratio, but the log odds ratio alone does not account for sampling variability; we haven't counted every feature the same number of times, so how do we know which differences are meaningful? Enter the weighted log odds, which 'tidylo' implements using tidy data principles. In particular, here we use the method outlined in Monroe, Colaresi, and Quinn (2008) <doi:10.1093/pan/mpn018> to weight the log odds ratio by a prior. By default, the prior is estimated from the data itself, an empirical Bayes approach, but an uninformative prior is also available.
Last updated 3 years ago
empirical-bayes, log-odds-ratio, tidy-data, tidyverse, weighted-log-odds
7.32 score 95 stars 146 scripts 298 downloads

cereal - Serialize 'vctrs' Objects to 'JSON'
The 'vctrs' package provides a concept of vector prototype that can be especially useful when deploying models and code. Serialize these object prototypes to 'JSON' so they can be used to check and coerce data in production systems, and deserialize 'JSON' back to the correct object prototypes.
Last updated 1 year ago
5.02 score 25 stars 2 packages 4 scripts 1.4k downloads