Package: tidytext 0.4.2.9000

tidytext: Text Mining using 'dplyr', 'ggplot2', and Other Tidy Tools

Using tidy data principles can make many text mining tasks easier, more effective, and consistent with tools already in wide use. Much of the infrastructure needed for text mining with tidy data frames already exists in packages like 'dplyr', 'broom', 'tidyr', and 'ggplot2'. In this package, we provide functions and supporting data sets to allow conversion of text to and from tidy formats, and to switch seamlessly between tidy tools and existing text mining packages.

Authors: Gabriela De Queiroz [ctb], Colin Fay [ctb], Emil Hvitfeldt [ctb], Os Keyes [ctb], Kanishka Misra [ctb], Tim Mastny [ctb], Jeff Erickson [ctb], David Robinson [aut], Julia Silge [aut, cre]

tidytext_0.4.2.9000.tar.gz
tidytext_0.4.2.9000.zip (r-4.5), tidytext_0.4.2.9000.zip (r-4.4), tidytext_0.4.2.9000.zip (r-4.3)
tidytext_0.4.2.9000.tgz (r-4.5-any), tidytext_0.4.2.9000.tgz (r-4.4-any), tidytext_0.4.2.9000.tgz (r-4.3-any)
tidytext_0.4.2.9000.tar.gz (r-4.5-noble), tidytext_0.4.2.9000.tar.gz (r-4.4-noble)
tidytext_0.4.2.9000.tgz (r-4.4-emscripten), tidytext_0.4.2.9000.tgz (r-4.3-emscripten)
tidytext.pdf | tidytext.html
tidytext/json (API)
NEWS

# Install 'tidytext' in R:

install.packages('tidytext', repos = c('https://juliasilge.r-universe.dev', 'https://cloud.r-project.org'))
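
Once the package is installed, the conversion of text to a tidy format mentioned in the description can be sketched roughly as follows. This is a minimal illustration, not taken from the package documentation: the sample sentences and the names text_df and tidy_words are made up for the example, and it assumes 'dplyr' is installed (it is listed among the dependencies). It uses the exported unnest_tokens function and the stop_words dataset listed on this page.

```r
# Minimal sketch: tokenize a small, made-up data frame into one word per row.
library(dplyr)
library(tidytext)

# Hypothetical input: one row per line of text.
text_df <- tibble(
  line = 1:2,
  text = c("Tidy data makes text mining easier.",
           "Each row holds one token.")
)

# Split the 'text' column into a 'word' column, one token per row.
# By default unnest_tokens lowercases tokens and strips punctuation.
tidy_words <- text_df %>%
  unnest_tokens(word, text)

# Drop common English stop words, then count the remaining words.
word_counts <- tidy_words %>%
  anti_join(stop_words, by = "word") %>%
  count(word, sort = TRUE)

print(word_counts)
```

Because the result is an ordinary data frame, it can be passed on to 'dplyr' and 'ggplot2' verbs without further conversion.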

Bug tracker: https://github.com/juliasilge/tidytext/issues

Pkgdown site: https://juliasilge.github.io

Datasets:

nma_words - English negators, modals, and adverbs
parts_of_speech - Parts of speech for English words from the Moby Project
sentiments - Sentiment lexicon from Bing Liu and collaborators
stop_words - Various lexicons for English stop words

On CRAN:

natural-language-processing text-mining tidy-data tidyverse

16.86 score, 1.2k stars, 61 packages, 17k scripts, 45k downloads, 26 mentions, 25 exports, 25 dependencies

Last updated 12 months ago from: e1cb807450. Checks: 6 OK, 3 NOTE. Indexed: yes.

Target	Result	Latest binary
Doc / Vignettes	OK	Mar 06 2025
R-4.5-win	NOTE	Mar 06 2025
R-4.5-mac	NOTE	Mar 06 2025
R-4.5-linux	NOTE	Mar 06 2025
R-4.4-win	OK	Mar 06 2025
R-4.4-mac	OK	Mar 06 2025
R-4.4-linux	OK	Mar 06 2025
R-4.3-win	OK	Mar 06 2025
R-4.3-mac	OK	Mar 06 2025

Exports: augment bind_tf_idf cast_dfm cast_dtm cast_sparse cast_tdm get_sentiments get_stopwords glance reorder_func reorder_within scale_x_reordered scale_y_reordered tidy unnest_character_shingles unnest_characters unnest_lines unnest_ngrams unnest_paragraphs unnest_ptb unnest_regex unnest_sentences unnest_skip_ngrams unnest_tokens unnest_tweets

Dependencies: cli dplyr fansi generics glue janeaustenr lattice lifecycle magrittr Matrix pillar pkgconfig purrr R6 Rcpp rlang SnowballC stringi stringr tibble tidyselect tokenizers utf8 vctrs withr

Converting to and from Document-Term Matrix and Corpus objects

Julia Silge and David Robinson

Rendered from tidying_casting.Rmd using knitr::rmarkdown on Mar 06 2025.

Last update: 2024-04-10
Started: 2016-04-20

Introduction to tidytext

Julia Silge and David Robinson

Rendered from tidytext.Rmd using knitr::rmarkdown on Mar 06 2025.

Last update: 2023-09-05
Started: 2016-04-19

Term Frequency and Inverse Document Frequency (tf-idf) Using Tidy Data Principles

Julia Silge and David Robinson

Rendered from tf_idf.Rmd using knitr::rmarkdown on Mar 06 2025.

Last update: 2023-03-23
Started: 2016-05-24

Help page	Topics
Bind the term frequency and inverse document frequency of a tidy text dataset to the dataset	bind_tf_idf
Create a sparse matrix from row names, column names, and values in a table.	cast_sparse
Casting a data frame to a DocumentTermMatrix, TermDocumentMatrix, or dfm	cast_dfm cast_dtm cast_tdm
Tidiers for a corpus object from the quanteda package	corpus_tidiers glance.corpus tidy.corpus
Tidy dictionary objects from the quanteda package	dictionary_tidiers tidy.dictionary2
Get a tidy data frame of a single sentiment lexicon	get_sentiments
Get a tidy data frame of a single stopword lexicon	get_stopwords
Tidiers for LDA and CTM objects from the topicmodels package	augment.CTM augment.LDA glance.CTM glance.LDA lda_tidiers tidy.CTM tidy.LDA
Tidiers for Latent Dirichlet Allocation models from the mallet package	augment.jobjRef mallet_tidiers tidy.jobjRef
English negators, modals, and adverbs	nma_words
Parts of speech for English words from the Moby Project	parts_of_speech
Reorder an x or y axis within facets	reorder_func reorder_within scale_x_reordered scale_y_reordered
Sentiment lexicon from Bing Liu and collaborators	sentiments
Tidiers for Structural Topic Models from the stm package	augment.STM glance.estimateEffect glance.STM stm_tidiers tidy.estimateEffect tidy.STM
Various lexicons for English stop words	stop_words
Tidy DocumentTermMatrix, TermDocumentMatrix, and related objects from the tm package	tdm_tidiers tidy.dfm tidy.dfmSparse tidy.DocumentTermMatrix tidy.simple_triplet_matrix tidy.TermDocumentMatrix
Utility function to tidy a simple triplet matrix	tidy_triplet
Tidy a Corpus object from the tm package	tidy.Corpus
Wrapper around unnest_tokens for characters and character shingles	unnest_characters unnest_character_shingles
Wrapper around unnest_tokens for n-grams	unnest_ngrams unnest_skip_ngrams
Wrapper around unnest_tokens for Penn Treebank Tokenizer	unnest_ptb
Wrapper around unnest_tokens for regular expressions	unnest_regex
Wrapper around unnest_tokens for sentences, lines, and paragraphs	unnest_lines unnest_paragraphs unnest_sentences
Split a column into tokens	unnest_tokens
