Package: tidytext 0.4.2.9000

tidytext: Text Mining using 'dplyr', 'ggplot2', and Other Tidy Tools

Using tidy data principles can make many text mining tasks easier, more effective, and consistent with tools already in wide use. Much of the infrastructure needed for text mining with tidy data frames already exists in packages like 'dplyr', 'broom', 'tidyr', and 'ggplot2'. In this package, we provide functions and supporting data sets to allow conversion of text to and from tidy formats, and to switch seamlessly between tidy tools and existing text mining packages.
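
As a minimal sketch of the conversion to a tidy format described above, the snippet below tokenizes a small data frame with unnest_tokens(). The input data and the names text_df and tidy_words are hypothetical, chosen only for illustration; the example assumes 'dplyr' and 'tidytext' are installed.

```r
library(dplyr)
library(tidytext)

# Hypothetical input: one row per line of text (illustrative data only).
text_df <- tibble(
  line = 1:2,
  text = c("Tidy data makes text mining easier.",
           "Each row holds one token.")
)

# unnest_tokens(tbl, output, input) splits `text` into words by default,
# lowercasing them and stripping punctuation; other columns are kept.
tidy_words <- text_df %>%
  unnest_tokens(word, text)

tidy_words
```

The result has one row per word, with the line column carried along, so it can be passed straight to 'dplyr' verbs such as count() or to 'ggplot2'.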

Authors: Gabriela De Queiroz [ctb], Colin Fay [ctb], Emil Hvitfeldt [ctb], Os Keyes [ctb], Kanishka Misra [ctb], Tim Mastny [ctb], Jeff Erickson [ctb], David Robinson [aut], Julia Silge [aut, cre]

tidytext_0.4.2.9000.tar.gz
tidytext_0.4.2.9000.zip (r-4.5), tidytext_0.4.2.9000.zip (r-4.4), tidytext_0.4.2.9000.zip (r-4.3)
tidytext_0.4.2.9000.tgz (r-4.4-any), tidytext_0.4.2.9000.tgz (r-4.3-any)
tidytext_0.4.2.9000.tar.gz (r-4.5-noble), tidytext_0.4.2.9000.tar.gz (r-4.4-noble)
tidytext_0.4.2.9000.tgz (r-4.4-emscripten), tidytext_0.4.2.9000.tgz (r-4.3-emscripten)
tidytext.pdf | tidytext.html
tidytext/json (API)
NEWS

# Install 'tidytext' in R:
install.packages('tidytext', repos = c('https://juliasilge.r-universe.dev', 'https://cloud.r-project.org'))

Bug tracker: https://github.com/juliasilge/tidytext/issues

Datasets:
  • nma_words - English negators, modals, and adverbs
  • parts_of_speech - Parts of speech for English words from the Moby Project
  • sentiments - Sentiment lexicon from Bing Liu and collaborators
  • stop_words - Various lexicons for English stop words

natural-language-processing, text-mining, tidy-data, tidyverse

25 exports, 1.2k stars, 16.72 score, 25 dependencies, 60 dependents, 26 mentions, 16.1k scripts, 36.6k downloads

Last updated 6 months ago from: e1cb807450. Checks: OK: 5, NOTE: 2. Indexed: yes.

Target           Result  Date
Doc / Vignettes  OK      Oct 07 2024
R-4.5-win        NOTE    Oct 07 2024
R-4.5-linux      NOTE    Oct 07 2024
R-4.4-win        OK      Oct 07 2024
R-4.4-mac        OK      Oct 07 2024
R-4.3-win        OK      Oct 07 2024
R-4.3-mac        OK      Oct 07 2024

Exports: augment, bind_tf_idf, cast_dfm, cast_dtm, cast_sparse, cast_tdm, get_sentiments, get_stopwords, glance, reorder_func, reorder_within, scale_x_reordered, scale_y_reordered, tidy, unnest_character_shingles, unnest_characters, unnest_lines, unnest_ngrams, unnest_paragraphs, unnest_ptb, unnest_regex, unnest_sentences, unnest_skip_ngrams, unnest_tokens, unnest_tweets

Dependencies: cli, dplyr, fansi, generics, glue, janeaustenr, lattice, lifecycle, magrittr, Matrix, pillar, pkgconfig, purrr, R6, Rcpp, rlang, SnowballC, stringi, stringr, tibble, tidyselect, tokenizers, utf8, vctrs, withr

Converting to and from Document-Term Matrix and Corpus objects

Rendered from tidying_casting.Rmd using knitr::rmarkdown on Oct 07 2024.

Last update: 2024-04-10
Started: 2016-04-20

Introduction to tidytext

Rendered from tidytext.Rmd using knitr::rmarkdown on Oct 07 2024.

Last update: 2023-09-05
Started: 2016-04-19

Term Frequency and Inverse Document Frequency (tf-idf) Using Tidy Data Principles

Rendered from tf_idf.Rmd using knitr::rmarkdown on Oct 07 2024.

Last update: 2023-03-23
Started: 2016-05-24

Readme and manuals

Help Manual

Help page | Topics
Bind the term frequency and inverse document frequency of a tidy text dataset to the dataset | bind_tf_idf
Create a sparse matrix from row names, column names, and values in a table. | cast_sparse
Casting a data frame to a DocumentTermMatrix, TermDocumentMatrix, or dfm | cast_dfm, cast_dtm, cast_tdm
Tidiers for a corpus object from the quanteda package | corpus_tidiers, glance.corpus, tidy.corpus
Tidy dictionary objects from the quanteda package | dictionary_tidiers, tidy.dictionary2
Get a tidy data frame of a single sentiment lexicon | get_sentiments
Get a tidy data frame of a single stopword lexicon | get_stopwords
Tidiers for LDA and CTM objects from the topicmodels package | augment.CTM, augment.LDA, glance.CTM, glance.LDA, lda_tidiers, tidy.CTM, tidy.LDA
Tidiers for Latent Dirichlet Allocation models from the mallet package | augment.jobjRef, mallet_tidiers, tidy.jobjRef
English negators, modals, and adverbs | nma_words
Parts of speech for English words from the Moby Project | parts_of_speech
Reorder an x or y axis within facets | reorder_func, reorder_within, scale_x_reordered, scale_y_reordered
Sentiment lexicon from Bing Liu and collaborators | sentiments
Tidiers for Structural Topic Models from the stm package | augment.STM, glance.estimateEffect, glance.STM, stm_tidiers, tidy.estimateEffect, tidy.STM
Various lexicons for English stop words | stop_words
Tidy DocumentTermMatrix, TermDocumentMatrix, and related objects from the tm package | tdm_tidiers, tidy.dfm, tidy.dfmSparse, tidy.DocumentTermMatrix, tidy.simple_triplet_matrix, tidy.TermDocumentMatrix
Utility function to tidy a simple triplet matrix | tidy_triplet
Tidy a Corpus object from the tm package | tidy.Corpus
Wrapper around unnest_tokens for characters and character shingles | unnest_characters, unnest_character_shingles
Wrapper around unnest_tokens for n-grams | unnest_ngrams, unnest_skip_ngrams
Wrapper around unnest_tokens for Penn Treebank Tokenizer | unnest_ptb
Wrapper around unnest_tokens for regular expressions | unnest_regex
Wrapper around unnest_tokens for sentences, lines, and paragraphs | unnest_lines, unnest_paragraphs, unnest_sentences
Split a column into tokens | unnest_tokens
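
To illustrate how a few of the topics above fit together, here is a rough sketch that tokenizes a toy corpus, binds tf-idf with bind_tf_idf(), and casts the counts to a sparse matrix with cast_sparse(). The two documents and the names docs, word_counts, word_tf_idf, and doc_term are hypothetical, made up for this example; it assumes 'dplyr' and 'tidytext' are installed.

```r
library(dplyr)
library(tidytext)

# Hypothetical toy corpus: two short documents (illustrative data only).
docs <- tibble(
  doc  = c("a", "b"),
  text = c("The cat sat on the mat.", "The dog sat.")
)

# One row per (document, word) pair with its count n.
word_counts <- docs %>%
  unnest_tokens(word, text) %>%
  count(doc, word)

# bind_tf_idf(tbl, term, document, n) adds tf, idf, and tf_idf columns.
# A word that occurs in every document gets an idf (and tf_idf) of 0.
word_tf_idf <- word_counts %>%
  bind_tf_idf(word, doc, n)

# cast_sparse(data, row, column, value) builds a sparse matrix with
# one row per document and one column per distinct word.
doc_term <- word_counts %>%
  cast_sparse(doc, word, n)

dim(doc_term)
```

Here "the" and "sat" occur in both documents, so their tf_idf is 0, while words unique to one document receive positive weights.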