The tf–idf weight (term frequency–inverse document frequency) is a weight often used in information retrieval and text mining. It is a statistical measure of how important a word is to a document in a collection or corpus. The importance increases proportionally to the number of times a word appears in the document but is offset by the number of documents in the corpus that contain the word. Variations of the tf–idf weighting scheme are often used by search engines as a central tool in scoring and ranking a document's relevance given a user query. Tf–idf can also be used successfully for stop-word filtering in various subject fields, including text summarization and classification.
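The interplay between term frequency and corpus-wide rarity can be illustrated with a minimal sketch. The text above does not fix a particular formula, so this example assumes one common variant: tf is the raw count divided by the document length, and idf is log(N / df), where N is the number of documents and df is the number of documents containing the term. The function name `tf_idf` and the toy corpus are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter

def tf_idf(corpus):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant (one of several in use):
      tf  = count of term in document / number of tokens in document
      idf = log(N / df), N = number of documents, df = documents containing term
    """
    n_docs = len(corpus)

    # Document frequency: count each term at most once per document.
    df = Counter()
    for doc in corpus:
        df.update(set(doc))

    weights = []
    for doc in corpus:
        counts = Counter(doc)
        length = len(doc)
        # An empty document yields an empty dict, so no division by zero occurs.
        weights.append({
            term: (count / length) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

docs = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "the cat chased the dog".split(),
]
w = tf_idf(docs)
print(w[0]["the"])                 # 0.0, since "the" occurs in every document
print(w[0]["mat"] > w[0]["cat"])   # True, since "mat" is rarer in the corpus
```

Because "the" appears in every document, its idf is log(1) = 0 and its weight vanishes regardless of how often it occurs, which is why tf–idf can serve as a stop-word filter. Real systems often add smoothing or normalization on top of this basic form.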

