
tidytext - Text Mining using 'dplyr', 'ggplot2', and Other Tidy Tools
Using tidy data principles can make many text mining tasks easier, more effective, and consistent with tools already in wide use. Much of the infrastructure needed for text mining with tidy data frames already exists in packages like 'dplyr', 'broom', 'tidyr', and 'ggplot2'. In this package, we provide functions and supporting data sets to allow conversion of text to and from tidy formats, and to switch seamlessly between tidy tools and existing text mining packages.
natural-language-processing, text-mining, tidy-data, tidyverse
17.43 score 1.2k stars 56 dependents 26k scripts 60k downloads
pins - Pin, Discover, and Share Resources
Publish data sets, models, and other R objects, making it easy to share them across projects and with your colleagues. You can pin objects to a variety of "boards", including local folders (to share on a networked drive or with 'DropBox'), 'Posit Connect', 'AWS S3', and more.
azure, gcloud, rpins, rsconnect, s3, storage
14.82 score 334 stars 31 dependents 2.3k scripts 7.6k downloads
widyr - Widen, Process, then Re-Tidy Data
Encapsulates the pattern of untidying data into a wide matrix, performing some processing, then turning it back into a tidy form. This is useful for several operations such as co-occurrence counts, correlations, or clustering that are mathematically convenient on wide matrices.
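The widen, process, re-tidy pattern described above can be sketched in a few lines. This is a language-neutral illustration in plain Python, not widyr's API: the function name `pairwise_cooccurrence` and the sample rows are hypothetical, and co-occurrence is taken here to mean "both words appear in the same document".

```python
from collections import Counter, defaultdict
from itertools import combinations

# Tidy input: one (document, word) pair per row (made-up sample data).
tidy_rows = [
    ("doc1", "tidy"), ("doc1", "text"), ("doc1", "data"),
    ("doc2", "tidy"), ("doc2", "data"),
    ("doc3", "text"), ("doc3", "data"),
]

def pairwise_cooccurrence(rows):
    # 1. Widen: collect the set of words seen in each document.
    wide = defaultdict(set)
    for doc, word in rows:
        wide[doc].add(word)
    # 2. Process: count the documents that contain each unordered word pair.
    counts = Counter()
    for words in wide.values():
        for a, b in combinations(sorted(words), 2):
            counts[(a, b)] += 1
    # 3. Re-tidy: one (item1, item2, n) row per pair, highest count first.
    return sorted(
        ((a, b, n) for (a, b), n in counts.items()),
        key=lambda row: (-row[2], row[0], row[1]),
    )

pairs = pairwise_cooccurrence(tidy_rows)
```

The intermediate `wide` structure only exists inside the function, which is the point of the pattern: callers pass tidy rows in and get tidy rows back.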
11.44 score 334 stars 3 dependents 1.7k scripts 2.6k downloads
janeaustenr - Jane Austen's Complete Novels
Full texts for Jane Austen's 6 completed novels, ready for text analysis. These novels are "Sense and Sensibility", "Pride and Prejudice", "Mansfield Park", "Emma", "Northanger Abbey", and "Persuasion".
jane-austen, novels, text-mining
11.21 score 97 stars 59 dependents 1.4k scripts 46k downloads
qualtRics - Download 'Qualtrics' Survey Data
Provides functions to access survey results directly into R using the 'Qualtrics' API. 'Qualtrics' <https://www.qualtrics.com/about/> is an online survey and data collection software platform. See <https://api.qualtrics.com/> for more information about the 'Qualtrics' API. This package is community-maintained and is not officially supported by 'Qualtrics'.
api, qualtrics, qualtrics-api, survey, survey-data
10.72 score 229 stars 1 dependent 364 scripts 2.8k downloads
vetiver - Version, Share, Deploy, and Monitor Models
The goal of 'vetiver' is to provide fluent tooling to version, share, deploy, and monitor a trained model. Functions handle both recording and checking the model's input data prototype, and predicting from a remote API endpoint. The 'vetiver' package is extensible, with generics that can support many kinds of models.
10.59 score 198 stars 1 dependent 610 scripts 2.0k downloads
bundle - Serialize Model Objects with a Consistent Interface
Typically, models in 'R' exist in memory and can be saved via regular 'R' serialization. However, some models store information in locations that cannot be saved using 'R' serialization alone. The goal of 'bundle' is to provide a common interface to capture this information, situate it within a portable object, and restore it for use in new settings.
8.72 score 31 stars 4 dependents 172 scripts 2.7k downloads
tidylo - Weighted Tidy Log Odds Ratio
How can we measure how the usage or frequency of some feature, such as words, differs across some group or set, such as documents? One option is to use the log odds ratio, but the log odds ratio alone does not account for sampling variability; we haven't counted every feature the same number of times so how do we know which differences are meaningful? Enter the weighted log odds, which 'tidylo' provides an implementation for, using tidy data principles. In particular, here we use the method outlined in Monroe, Colaresi, and Quinn (2008) <doi:10.1093/pan/mpn018> to weight the log odds ratio by a prior. By default, the prior is estimated from the data itself, an empirical Bayes approach, but an uninformative prior is also available.
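To make the idea of weighting the log odds by a prior concrete, here is a minimal sketch in plain Python. It is not tidylo's API or its exact computation: it assumes the two-group form of the Monroe, Colaresi, and Quinn estimate as commonly stated (prior-smoothed log odds difference divided by its approximate standard deviation), and it assumes the prior is simply the pooled counts from both groups. The function name `weighted_log_odds` and the sample counts are hypothetical.

```python
import math

def weighted_log_odds(counts_a, counts_b):
    """Return a z-scored log odds ratio per feature, group A versus group B."""
    vocab = set(counts_a) | set(counts_b)
    # Assumed prior: pooled counts across both groups (estimated from the data).
    prior = {w: counts_a.get(w, 0) + counts_b.get(w, 0) for w in vocab}
    prior_total = sum(prior.values())
    n_a = sum(counts_a.values())
    n_b = sum(counts_b.values())

    scores = {}
    for w in vocab:
        # Observed count plus prior pseudo-count in each group.
        y_a = counts_a.get(w, 0) + prior[w]
        y_b = counts_b.get(w, 0) + prior[w]
        # Difference of prior-smoothed log odds between the two groups.
        delta = (math.log(y_a / (n_a + prior_total - y_a))
                 - math.log(y_b / (n_b + prior_total - y_b)))
        # Approximate variance of that difference.
        variance = 1.0 / y_a + 1.0 / y_b
        scores[w] = delta / math.sqrt(variance)
    return scores

# Made-up counts: "x" favors group A, "y" favors group B, "z" is balanced.
scores = weighted_log_odds(
    {"x": 10, "y": 2, "z": 5},
    {"x": 2, "y": 10, "z": 5},
)
```

Dividing by the standard deviation is what addresses the sampling-variability concern: a rare feature with a large raw log odds ratio is pulled toward zero because its variance term is large. The sketch assumes at least two distinct features with positive counts; with a single feature the odds denominator would be zero.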
empirical-bayes, log-odds-ratio, tidy-data, tidyverse, weighted-log-odds
7.52 score 97 stars 225 scripts 439 downloads
cereal - Serialize 'vctrs' Objects to 'JSON'
The 'vctrs' package provides a concept of vector prototype that can be especially useful when deploying models and code. Serialize these object prototypes to 'JSON' so they can be used to check and coerce data in production systems, and deserialize 'JSON' back to the correct object prototypes.
5.08 score 26 stars 2 dependents 4 scripts 1.5k downloads