Entity Significance

Overview

Use the Entity Significance widget to view the entities found across all documents, ranked by score or frequency. This is useful for seeing which entities are common within a dataset.

About Significance

Significance measures how interesting or uncommon the occurrences of a term are in a dataset. The more interesting or uncommon a term, the more significant it is. Significance can be an important source of insight and adds analytic value in many use cases, such as measuring geographic anomalies, detecting credit card fraud, and recommending products.

The Entity Significance widget describes significance using the following metrics:

Query Coverage: Percentage of matching documents in which the entity occurs.

Query Significance: TF-IDF score of the entity for the query. Entities with low document counts have their significance suppressed by 33% (well below a dynamically calculated "noise floor") or 66% (just below or at the "noise floor"). When only a subset of the matching documents is returned (e.g. > 1000 documents), the significance is adjusted to estimate the TF-IDF across the entire matching dataset, not just the returned subset.

MaxDoc Significance: How often the entity occurs in the documents, expressed as a percentage relative to the average of all the other entities taken together.

MaxDoc Frequency: The maximum number of times the entity occurred within a single document.
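
To make the metrics above more concrete, here is a minimal Python sketch that computes Query Coverage, MaxDoc Frequency, and a textbook TF-IDF score over a small in-memory dataset. All names and data (matching_docs, query_coverage, max_doc_frequency, tf_idf, and the corpus counts) are hypothetical and used for illustration only. The widget's exact TF-IDF formula, its noise-floor suppression, and its adjustment for partially returned result sets are not specified here, so the sketch does not attempt to model them.

```python
import math

# Hypothetical data: one dict per matching document,
# mapping an entity name to its occurrence count in that document.
matching_docs = [
    {"acme corp": 3, "john smith": 1},
    {"acme corp": 1},
    {"john smith": 4, "paris": 1},
    {"acme corp": 2, "paris": 2},
]


def query_coverage(entity, docs):
    """Percentage of matching documents in which the entity occurs."""
    hits = sum(1 for doc in docs if entity in doc)
    return 100.0 * hits / len(docs)


def max_doc_frequency(entity, docs):
    """Largest number of occurrences of the entity in any single document."""
    return max((doc.get(entity, 0) for doc in docs), default=0)


def tf_idf(entity, docs, corpus_size, corpus_doc_count):
    """Textbook TF-IDF (an assumption, not the widget's documented formula).

    tf  = total occurrences of the entity in the matching documents
    idf = log(total documents in the corpus / documents containing the entity)
    corpus_doc_count must be greater than zero.
    """
    tf = sum(doc.get(entity, 0) for doc in docs)
    idf = math.log(corpus_size / corpus_doc_count)
    return tf * idf


print(query_coverage("acme corp", matching_docs))      # 75.0
print(max_doc_frequency("john smith", matching_docs))  # 4
print(tf_idf("acme corp", matching_docs, corpus_size=1000, corpus_doc_count=10))
```

In this sketch an entity that is frequent in the query results but rare in the overall corpus receives a high score, which matches the intuition behind significance described above.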

Using the Widget

Workspace Filtering

You can use the Entity Significance widget to filter the workspace of other widgets.  You can think of this as a temporary sub-query within your query results.

To filter the workspace

  1. Drag several widgets onto the workspace, including the Entity Significance widget (the Doc. Viewer widget is a good one to pair it with).
  2. Double-click the entity you want to filter on. The other widgets in your workspace are repopulated with only the documents that contain that entity.
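
Conceptually, the filter behaves like a sub-query that keeps only the documents containing the selected entity. The following Python sketch illustrates that idea. The document structure and the filter_by_entity function are hypothetical and do not represent the widget's actual implementation.

```python
# Hypothetical query results: each document has an id and a set of extracted entities.
documents = [
    {"id": "doc-1", "entities": {"acme corp", "john smith"}},
    {"id": "doc-2", "entities": {"paris"}},
    {"id": "doc-3", "entities": {"acme corp", "paris"}},
]


def filter_by_entity(docs, entity):
    """Return only the documents that contain the given entity."""
    return [doc for doc in docs if entity in doc["entities"]]


# Equivalent of double-clicking "acme corp" in the widget.
filtered = filter_by_entity(documents, "acme corp")
print([doc["id"] for doc in filtered])  # ['doc-1', 'doc-3']
```

The original query results are left untouched, which mirrors the temporary nature of the workspace filter.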

Dragging to the Case Visualizer

You can select one or more entities in the Entity Significance widget and drag them to the Case Visualizer for further analysis.

To drag and drop the entities

  1. Hold Ctrl (PC) or Command (Mac) and select the entities.

  2. Drag and drop the icon onto the Case Visualizer.

Entities dragged together will all be linked by coreference.

Entities dragged from the Entity Significance widget will contain a workspace link in their node properties.

Related Reference Documentation:

Entity Significance Interface

Related Links:

Thorough explanation of significance: http://www.elasticsearch.org/blog/significant-terms-aggregation/