juliasilge

tidytext - Text Mining using 'dplyr', 'ggplot2', and Other Tidy Tools

Using tidy data principles can make many text mining tasks easier, more effective, and consistent with tools already in wide use. Much of the infrastructure needed for text mining with tidy data frames already exists in packages like 'dplyr', 'broom', 'tidyr', and 'ggplot2'. In this package, we provide functions and supporting data sets to allow conversion of text to and from tidy formats, and to switch seamlessly between tidy tools and existing text mining packages.

natural-language-processing text-mining tidy-data tidyverse

17.43 score 1.2k stars 56 dependents 26k scripts 60k downloads

pins - Pin, Discover, and Share Resources

Publish data sets, models, and other R objects, making it easy to share them across projects and with your colleagues. You can pin objects to a variety of "boards", including local folders (to share on a networked drive or with 'Dropbox'), 'Posit Connect', 'AWS S3', and more.

azure gcloud rpins rsconnect s3 storage

14.82 score 334 stars 31 dependents 2.3k scripts 7.6k downloads

widyr - Widen, Process, then Re-Tidy Data

Encapsulates the pattern of untidying data into a wide matrix, performing some processing, then turning it back into a tidy form. This is useful for several operations such as co-occurrence counts, correlations, or clustering that are mathematically convenient on wide matrices.

11.44 score 334 stars 3 dependents 1.7k scripts 2.6k downloads
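
The widen, process, re-tidy pattern described for widyr can be sketched in a few lines. This is a minimal, language-agnostic illustration in plain Python, not widyr's implementation or API: the function name `pairwise_count_sketch` and the sample rows are hypothetical, and the co-occurrence count stands in for any operation that is easier on a wide structure.

```python
from collections import defaultdict
from itertools import combinations


def pairwise_count_sketch(rows):
    """Count how often two items share a feature.

    rows: iterable of (item, feature) pairs in tidy (long) form.
    Returns a sorted list of (item1, item2, n) tuples, again in tidy form.
    """
    # Widen: regroup the long rows into feature -> set of items.
    items_by_feature = defaultdict(set)
    for item, feature in rows:
        items_by_feature[feature].add(item)

    # Process: count co-occurrences on the widened structure.
    counts = defaultdict(int)
    for items in items_by_feature.values():
        for a, b in combinations(sorted(items), 2):
            counts[(a, b)] += 1

    # Re-tidy: emit one row per item pair.
    return sorted((a, b, n) for (a, b), n in counts.items())


# Hypothetical input: one row per (word, document) occurrence.
tidy_rows = [
    ("cat", "doc1"), ("dog", "doc1"),
    ("cat", "doc2"), ("dog", "doc2"), ("fish", "doc2"),
]
pairs = pairwise_count_sketch(tidy_rows)
```

Here `pairs` holds one row per unordered pair of words that appear in at least one shared document, so the caller never has to handle the intermediate wide form.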

janeaustenr - Jane Austen's Complete Novels

Full texts for Jane Austen's 6 completed novels, ready for text analysis. These novels are "Sense and Sensibility", "Pride and Prejudice", "Mansfield Park", "Emma", "Northanger Abbey", and "Persuasion".

jane-austen novels text-mining

11.21 score 97 stars 59 dependents 1.4k scripts 46k downloads

qualtRics - Download 'Qualtrics' Survey Data

Provides functions to import survey results directly into R using the 'Qualtrics' API. 'Qualtrics' <https://www.qualtrics.com/about/> is an online survey and data collection software platform. See <https://api.qualtrics.com/> for more information about the 'Qualtrics' API. This package is community-maintained and is not officially supported by 'Qualtrics'.

api qualtrics qualtrics-api survey survey-data

10.72 score 229 stars 1 dependent 364 scripts 2.8k downloads

vetiver - Version, Share, Deploy, and Monitor Models

The goal of 'vetiver' is to provide fluent tooling to version, share, deploy, and monitor a trained model. Functions handle both recording and checking the model's input data prototype, and predicting from a remote API endpoint. The 'vetiver' package is extensible, with generics that can support many kinds of models.

10.59 score 198 stars 1 dependent 610 scripts 2.0k downloads

bundle - Serialize Model Objects with a Consistent Interface

Typically, models in 'R' exist in memory and can be saved via regular 'R' serialization. However, some models store information in locations that cannot be saved using 'R' serialization alone. The goal of 'bundle' is to provide a common interface to capture this information, situate it within a portable object, and restore it for use in new settings.

8.72 score 31 stars 4 dependents 172 scripts 2.7k downloads

tidylo - Weighted Tidy Log Odds Ratio

How can we measure how the usage or frequency of some feature, such as words, differs across some group or set, such as documents? One option is the log odds ratio, but the log odds ratio alone does not account for sampling variability: we haven't counted every feature the same number of times, so how do we know which differences are meaningful? Enter the weighted log odds, which 'tidylo' implements using tidy data principles. In particular, here we use the method outlined in Monroe, Colaresi, and Quinn (2008) <doi:10.1093/pan/mpn018> to weight the log odds ratio by a prior. By default, the prior is estimated from the data itself, an empirical Bayes approach, but an uninformative prior is also available.

empirical-bayes log-odds-ratio tidy-data tidyverse weighted-log-odds

7.52 score 97 stars 225 scripts 439 downloads
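
To make the idea of weighting the log odds ratio by a prior concrete, here is a minimal sketch of the two-group statistic from Monroe, Colaresi, and Quinn (2008) in plain Python. It is not tidylo's code or API. It assumes the simpler uninformative case, with the same pseudo-count `alpha` added to every feature, rather than the empirical Bayes prior the package uses by default. The function name `weighted_log_odds_sketch` and the word counts are hypothetical.

```python
import math


def weighted_log_odds_sketch(counts_a, counts_b, alpha=1.0):
    """Z-scored log odds ratio of each feature in group A versus group B.

    counts_a, counts_b: dicts mapping feature -> raw count in each group.
    alpha: pseudo-count per feature (assumed uninformative prior).
    """
    vocab = sorted(set(counts_a) | set(counts_b))
    n_a = sum(counts_a.values())
    n_b = sum(counts_b.values())
    alpha_0 = alpha * len(vocab)  # total prior mass across the vocabulary

    scores = {}
    for w in vocab:
        # Smoothed counts: observed count plus prior pseudo-count.
        y_a = counts_a.get(w, 0) + alpha
        y_b = counts_b.get(w, 0) + alpha
        # Difference of smoothed log odds between the two groups.
        delta = (math.log(y_a / (n_a + alpha_0 - y_a))
                 - math.log(y_b / (n_b + alpha_0 - y_b)))
        # Approximate variance; rare features get larger variance,
        # which shrinks their score toward zero.
        variance = 1.0 / y_a + 1.0 / y_b
        scores[w] = delta / math.sqrt(variance)
    return scores


# Hypothetical word counts for two documents.
counts_a = {"love": 30, "war": 5, "the": 100}
counts_b = {"love": 5, "war": 30, "the": 100}
scores = weighted_log_odds_sketch(counts_a, counts_b)
```

Dividing by the estimated standard deviation is what accounts for sampling variability: a feature seen only a handful of times needs a much larger raw difference to score as highly as a frequent one. In this toy input, "love" scores positive, "war" scores negative, and "the", which is equally common in both groups, scores zero.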

cereal - Serialize 'vctrs' Objects to 'JSON'

The 'vctrs' package provides a concept of vector prototype that can be especially useful when deploying models and code. Serialize these object prototypes to 'JSON' so they can be used to check and coerce data in production systems, and deserialize 'JSON' back to the correct object prototypes.

5.08 score 26 stars 2 dependents 4 scripts 1.5k downloads