KeywordCondensationClustering

In theory, one can detect clusters of related documents from the words they have in common. To do this effectively:

  1. Remove stop words.
  2. Stem the remaining words.
  3. Band-pass filter: remove both very high-frequency and very low-frequency words.
  4. Condensation cluster.
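
The four steps above can be sketched in Python roughly as follows. This is only an illustration under several assumptions, not the method from the cited paper: the stop-word list is a tiny made-up sample, crude_stem is a hypothetical suffix stripper standing in for a real stemmer, the band-pass thresholds (low, high_fraction) are arbitrary, and step 4 is replaced by a plain stand-in that merges documents whose Jaccard similarity meets a threshold, since the page does not define condensation clustering.

```python
from collections import Counter

# Assumption: a tiny illustrative stop-word list; real lists are much longer.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "on", "to", "is", "are", "for", "with"}


def crude_stem(word):
    """Strip a few common English suffixes (hypothetical stand-in for a real stemmer)."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def keywords(text):
    """Steps 1 and 2: lowercase, drop stop words, stem; return a set of terms."""
    words = (w.strip(".,;:!?()\"'").lower() for w in text.split())
    return {crude_stem(w) for w in words if w and w not in STOP_WORDS}


def band_pass(term_sets, low=2, high_fraction=0.8):
    """Step 3: keep terms whose document frequency is neither too low nor too high."""
    doc_freq = Counter(term for terms in term_sets for term in terms)
    high = high_fraction * len(term_sets)
    keep = {t for t, n in doc_freq.items() if low <= n <= high}
    return [terms & keep for terms in term_sets]


def cluster(term_sets, threshold=0.2):
    """Step 4 stand-in: merge documents whose Jaccard similarity >= threshold."""
    parent = list(range(len(term_sets)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(term_sets)):
        for j in range(i + 1, len(term_sets)):
            a, b = term_sets[i], term_sets[j]
            union = a | b
            if union and len(a & b) / len(union) >= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(term_sets)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def keyword_clusters(docs, low=2, high_fraction=0.8, threshold=0.2):
    """Run all four steps and return clusters as lists of document indices."""
    filtered = band_pass([keywords(d) for d in docs], low, high_fraction)
    return cluster(filtered, threshold)


docs = [
    "The cat chased the mouse in the garden",
    "A mouse and a cat are playing",
    "Stock markets and bond prices",
    "Bond markets rallied on stock news",
]
print(keyword_clusters(docs))  # [[0, 1], [2, 3]]
```

With only four documents, the low cutoff of 2 removes every term that appears in a single document, so the clusters rest entirely on the shared terms (cat, mouse, stock, market, bond). On a real corpus both cutoffs would need tuning.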

Nouns tend to appear only in a limited number of topics, whereas modifiers such as adjectives are distributed more randomly. Thus, your clusters will probably be based on noun forms.

See Wise, J. A. (1999). The ecological approach to text visualization. Journal of the American Society for Information Science, 50(13), 1224-1233.

CategoryGraphTheory
