|
Gensim is an open-source toolkit for vector space modeling and topic modeling, implemented in Python and built on NumPy and SciPy, with optional Cython extensions for performance. It is designed for large text collections and relies on efficient online algorithms. Gensim includes implementations of tf–idf, random projections, deep learning with Google's word2vec and doc2vec algorithms〔Deep learning with word2vec and gensim〕 (reimplemented and optimized in Cython), hierarchical Dirichlet processes (HDP), latent semantic analysis (LSA) and latent Dirichlet allocation (LDA), including distributed parallel versions.〔Radim Řehůřek and Petr Sojka (2010). Software framework for topic modelling with large corpora. Proc. LREC Workshop on New Challenges for NLP Frameworks.〕

Gensim has been used in a number of commercial as well as academic applications.〔Interview with Radim Řehůřek, creator of gensim〕〔gensim academic citations〕 The code is hosted on GitHub〔gensim source code〕 and a support forum is maintained on Google Groups.〔gensim mailing list〕 Some of the online algorithms in gensim were also published in Radim Řehůřek's 2011 PhD dissertation, ''Scalability of Semantic Analysis in Natural Language Processing''.

==Gensim's tagline==
* Topic Modelling for Humans

Excerpt source: Wikipedia, the free encyclopedia.
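==Example==
The article names tf–idf among the implemented methods and stresses that gensim targets large text collections. The sketch below illustrates that general idea in plain Python using only the standard library: documents are streamed one at a time, so memory use depends on the vocabulary rather than on the number of documents. It is not gensim's code or API. The function names (`stream_documents`, `document_frequencies`, `tfidf_vectors`), the toy corpus, and the weighting scheme (term count times log2 of N/df, followed by L2 normalization) are assumptions chosen for illustration.

```python
import math
from collections import Counter


def stream_documents():
    """Yield one tokenized document at a time (hypothetical toy corpus)."""
    yield ["human", "computer", "interface"]
    yield ["graph", "trees", "computer"]
    yield ["graph", "minors", "survey"]


def document_frequencies(corpus):
    """First pass: count the number of documents in which each term occurs."""
    df = Counter()
    n_docs = 0
    for tokens in corpus:
        df.update(set(tokens))  # each term counts once per document
        n_docs += 1
    return df, n_docs


def tfidf_vectors(corpus, df, n_docs):
    """Second pass: yield one sparse, L2-normalized tf-idf vector per document."""
    for tokens in corpus:
        counts = Counter(tokens)
        # Assumed weighting: raw term count times log2(N / document frequency).
        weights = {t: c * math.log2(n_docs / df[t]) for t, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        if norm == 0.0:
            yield {}  # every term occurs in every document
        else:
            yield {t: w / norm for t, w in weights.items() if w > 0.0}


# The corpus is iterated twice, but never held in memory as a whole.
df, n_docs = document_frequencies(stream_documents())
vectors = list(tfidf_vectors(stream_documents(), df, n_docs))
```

In this toy corpus, terms that occur in only one document (such as "human") receive a larger weight than terms shared by two documents (such as "computer"). A real pipeline would read the documents from disk or a database inside `stream_documents` instead of hard-coding them.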
|