Aggregation

Aggregation in Community Edition (CE) rolls up the objects returned by a query into summary statistics and metrics that are useful for analysis and visualization.

Overview

CE provides four kinds of aggregation: geo-spatial, temporal, entity, and association.

When a query is run against a data set in CE, the output includes document objects and their sub-objects, such as entities and associations. Some objects carry temporal and geo-spatial information, which can be aggregated to provide useful insight. Entities and associations exist in their own right as document sub-objects, and are also represented as aggregations.

Entities

When entities are represented as aggregates, they are indexed by disambiguated name and include metrics such as docCount, frequency, and dataSetSignificance. The last of these measures the significance of the entity across the entire matching document data set.
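The roll-up described above can be illustrated with a minimal Python sketch. It is not CE's implementation: the input layout and the `disambiguated_name` key are assumptions made for illustration, and dataSetSignificance is left out because its formula is not given here.

```python
from collections import defaultdict

def aggregate_entities(documents):
    """Roll up per-document entities into one aggregate per disambiguated name."""
    totals = defaultdict(lambda: {"docCount": 0, "frequency": 0})
    for doc in documents:
        seen_in_doc = set()
        for entity in doc.get("entities", []):
            name = entity["disambiguated_name"]  # hypothetical field name
            # frequency: total mentions across all matching documents
            totals[name]["frequency"] += entity.get("frequency", 1)
            # docCount: number of distinct documents mentioning the entity
            if name not in seen_in_doc:
                totals[name]["docCount"] += 1
                seen_in_doc.add(name)
    return dict(totals)

# Illustrative input only; real query output has more fields.
sample_docs = [
    {"entities": [{"disambiguated_name": "barack obama", "frequency": 3}]},
    {"entities": [{"disambiguated_name": "barack obama", "frequency": 1},
                  {"disambiguated_name": "paris", "frequency": 2}]},
]
entity_aggregates = aggregate_entities(sample_docs)
```

Keying the aggregate by disambiguated name means that different surface forms of the same entity contribute to a single set of counts.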

Associations

When associations are represented as aggregates, they include metrics such as assocSignificance (significance of the association across the data set), entity1_sig (significance of entity1 across the data set), entity2_sig (significance of entity2 across the data set), and docCount (number of documents that include the tracked association).

You can see entity and association aggregations at work in the Entity Significance and Event Timeline widgets.
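As a rough illustration of the docCount metric for associations, the sketch below counts how many documents contain each association. The `entity1`, `verb`, and `entity2` keys and the input layout are assumptions for this example, and the significance metrics are omitted because their formulas are not described here.

```python
from collections import Counter

def association_doc_counts(documents):
    """Count the documents in which each (entity1, verb, entity2) association occurs."""
    doc_counts = Counter()
    for doc in documents:
        # A set ensures each association is counted once per document.
        keys = {
            (a["entity1"], a["verb"], a["entity2"])  # hypothetical field names
            for a in doc.get("associations", [])
        }
        doc_counts.update(keys)
    return doc_counts

# Illustrative input only.
assoc_docs = [
    {"associations": [
        {"entity1": "acme corp", "verb": "acquires", "entity2": "widget inc"},
        {"entity1": "acme corp", "verb": "acquires", "entity2": "widget inc"},
    ]},
    {"associations": [
        {"entity1": "acme corp", "verb": "acquires", "entity2": "widget inc"},
    ]},
]
assoc_counts = association_doc_counts(assoc_docs)
```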

Geo-spatial and Temporal Aggregations

When a query returns entities, they can include a geotag with associated latitude/longitude data, which enables the entities to be located on a map. This data can also be rolled up as an aggregation over all occurrences of an entity within the matching data set, which is useful for visualizing data as a heat map. For more information, see section Map.
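One common way to prepare geotags for a heat map is to bin them into fixed-size grid cells and count the points per cell. The sketch below shows that idea under stated assumptions: the grid-binning approach, the `heatmap_bins` helper, and the `cell_degrees` parameter are illustrative and are not taken from CE.

```python
import math
from collections import Counter

def heatmap_bins(geotags, cell_degrees=1.0):
    """Count (lat, lon) points per grid cell of the given size in degrees."""
    bins = Counter()
    for lat, lon in geotags:
        # floor() keeps negative coordinates in the correct cell
        cell = (math.floor(lat / cell_degrees), math.floor(lon / cell_degrees))
        bins[cell] += 1
    return bins

# Illustrative coordinates: two points near Paris, one near New York.
points = [(48.85, 2.35), (48.10, 2.90), (40.71, -74.01)]
geo_bins = heatmap_bins(points)
```

A smaller `cell_degrees` gives a finer grid at the cost of sparser counts per cell.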

Documents themselves contain temporal information that captures when they were created and published. This is useful for viewing documents (and the entities and associations they contain) on a timeline. For more information, see section Event Timeline.
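A temporal aggregation can be sketched as counting documents per time bucket. The example below groups documents by publication day. The `publishedDate` key and the ISO 8601 timestamp format are assumptions for illustration, not a description of CE's document schema.

```python
from collections import Counter
from datetime import datetime

def timeline_counts(documents, bucket_format="%Y-%m-%d"):
    """Count documents per time bucket (per day by default), in chronological order."""
    counts = Counter()
    for doc in documents:
        published = datetime.fromisoformat(doc["publishedDate"])  # hypothetical field name
        counts[published.strftime(bucket_format)] += 1
    return dict(sorted(counts.items()))

# Illustrative input only.
dated_docs = [
    {"publishedDate": "2013-05-02T09:30:00"},
    {"publishedDate": "2013-05-01T10:00:00"},
    {"publishedDate": "2013-05-01T18:45:00"},
]
daily_counts = timeline_counts(dated_docs)
```

Passing `bucket_format="%Y-%m"` would produce monthly buckets instead of daily ones.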

Advanced Options

The advanced options of the GUI include some settings that pertain to aggregation.  For more information, see section Advanced Options.

 
