Significance can be thought of as a measure of how interesting or uncommon the occurrences of a term are in a data set. The more interesting or uncommon a specific term, the more significant it is. Significance can be an important source of insight and adds analytic value in many use cases, such as measuring geographic anomalies, detecting credit card fraud, and recommending products, to name only a few.

The Entity Significance widget describes significance using the following metrics:

Query Coverage: Percentage of matching documents in which the entity occurs.

Query Significance: TF-IDF score of the entity for the query. Entities with low document counts have their significance suppressed by 33% (well below a dynamically calculated "noise floor") or 66% (just below/at the "noise floor"). When only a subset of the matching documents is returned (e.g. more than 1000 documents), the significance is adjusted to estimate the TF-IDF across the entire matching data set, not just the returned subset.

MaxDoc Significance: Percentage of times the entity occurs in the documents, compared with the average across all other entities taken together.

MaxDoc Frequency: The maximum number of times the entity occurs within a single document.
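
As a rough illustration of the metrics above, the following Python sketch computes Query Coverage, MaxDoc Frequency, and a textbook TF-IDF score for one entity. It is not the widget's implementation. The function name entity_metrics, its parameters, and the choice of tf * log(N / df) as the TF-IDF formula are assumptions made for this example. The noise-floor suppression and the adjustment for partially returned result sets are not modeled, because their exact calculation is not described here.

```python
import math
from collections import Counter


def entity_metrics(matching_docs, entity, total_docs, corpus_docs_with_entity):
    """Illustrative metrics for one entity (hypothetical helper, not the widget's code).

    matching_docs: list of documents matching the query, each a list of entity names.
    total_docs: number of documents in the whole data set (assumed known).
    corpus_docs_with_entity: number of documents in the whole data set
        that contain the entity (assumed known).
    """
    # Occurrences of the entity in each matching document.
    counts = [Counter(doc)[entity] for doc in matching_docs]
    n_match = len(matching_docs)

    # Query Coverage: percentage of matching documents containing the entity.
    docs_with_entity = sum(1 for c in counts if c > 0)
    query_coverage = 100.0 * docs_with_entity / n_match if n_match else 0.0

    # MaxDoc Frequency: highest occurrence count within a single document.
    max_doc_frequency = max(counts, default=0)

    # Assumed textbook TF-IDF: term frequency in the matching set
    # multiplied by the log of the inverse document frequency.
    tf = sum(counts)
    idf = (
        math.log(total_docs / corpus_docs_with_entity)
        if corpus_docs_with_entity
        else 0.0
    )

    return {
        "query_coverage": query_coverage,
        "max_doc_frequency": max_doc_frequency,
        "tf_idf": tf * idf,
    }


if __name__ == "__main__":
    docs = [["acme", "acme", "bob"], ["bob"], ["acme"], ["carol"]]
    print(entity_metrics(docs, "acme", total_docs=100, corpus_docs_with_entity=10))
```

In this example "acme" appears in two of the four matching documents, so its Query Coverage is 50%, and it appears at most twice in any one document, so its MaxDoc Frequency is 2.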

Using the Widget

Workspace Filtering

...