...
Significance can be thought of as a way of measuring interesting or uncommon occurrences of terms in a data set. The more interesting or uncommon a specific term, the more significant it is. Significance can be an important source of insight and can add analytic value in many use cases: measuring geographic anomalies, credit card fraud detection, and product recommendation, to name only a few.
The Entity Significance widget describes significance using the following metrics:
Query Coverage: Percentage of matching documents in which the entity occurs.
Query Significance: TF-IDF score of the entity for the query. Entities with low document counts have their significance suppressed by 33% (well below a dynamically calculated "noise floor") or 66% (just below or at the "noise floor"). When only a subset of the matching documents is returned (e.g., more than 1000 documents), the significance is adjusted to estimate the TF-IDF across the entire matching data set, not just the returned subset.
MaxDoc Significance: Percentage of times the entity occurs in the documents, compared with the average across all other entities taken together.
MaxDoc Frequency: The maximum number of times the entity occurred within a single document.
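To make these definitions concrete, the following Python sketch computes Query Coverage, a plain TF-IDF score, and MaxDoc Frequency for one entity. It is only an illustration, not the widget's implementation: the function name entity_metrics, its parameters, and the specific TF-IDF formula (total term frequency multiplied by the natural log of corpus size over document count) are assumptions, and the noise-floor suppression and subset extrapolation described above are not modeled.

```python
import math
from collections import Counter


def entity_metrics(docs, entity, corpus_size, corpus_doc_count):
    """Illustrative metrics for one entity (hypothetical helper).

    docs: one Counter per matching document, mapping entity -> occurrences.
    corpus_size: number of documents in the whole data set (assumed known).
    corpus_doc_count: documents in the whole data set containing the entity.
    """
    counts = [d.get(entity, 0) for d in docs]
    matching = len(docs)
    containing = sum(1 for c in counts if c > 0)

    # Query Coverage: percentage of matching documents containing the entity.
    coverage = 100.0 * containing / matching if matching else 0.0

    # Assumed TF-IDF form: total frequency times log inverse document frequency.
    idf = math.log(corpus_size / max(corpus_doc_count, 1))
    tf_idf = sum(counts) * idf

    # MaxDoc Frequency: highest count of the entity in any single document.
    max_doc_frequency = max(counts, default=0)

    return {
        "query_coverage": coverage,
        "tf_idf": tf_idf,
        "max_doc_frequency": max_doc_frequency,
    }


# Sample data, invented for illustration only.
sample_docs = [
    Counter({"acme": 3, "bob": 1}),
    Counter({"acme": 1}),
    Counter({"bob": 2}),
    Counter({"carol": 1}),
]
metrics = entity_metrics(sample_docs, "acme", corpus_size=100, corpus_doc_count=10)
print(metrics)
```

For this sample, "acme" appears in two of the four matching documents, so its coverage is 50%, and its MaxDoc Frequency is 3. An entity that is rare across the whole data set but frequent in the matching documents receives a higher TF-IDF score, which is the intuition behind Query Significance.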
Using the Widget
Workspace Filtering
...