Knowledge - Document - Query - Manual Aliases

Aliases

Manual Aliases

The Infinit.e query engine provides support for the following activities:

  • Aliasing several different entities into a single entity
    • This applies both to queries (ie the aliases are added to any query involving the master entity) and to query results (ie "aliased entities" are replaced with the "master entity")
  • Discarding unwanted entities

Note that setting aliases does not affect the stored documents: the transformation is applied in the API before documents are returned.

This means that alias changes are quick to apply, easily reversible, and can differ from user to user. The downside is that the process is not perfect: the statistics are slightly less accurate, there will be occasional duplicate entities/associations (though entity aggregations are never duplicated), and so on.
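The following minimal Java sketch shows the result-side rewrite. It is not the engine's code. The class and method names are hypothetical, and the sketch assumes the alias configuration has already been flattened into an alias-to-master map plus a set of discarded indexes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of the result-side alias rewrite; not the engine's actual code. */
public class AliasRewriteSketch {

    /**
     * Replaces each aliased entity index with its master index and drops discarded
     * indexes. Duplicates are deliberately NOT merged, so two aliases of the same
     * master in one document both come back as the master.
     */
    public static List<String> rewrite(List<String> entityIndexes,
                                       Map<String, String> aliasToMaster,
                                       Set<String> discard) {
        List<String> result = new ArrayList<>();
        for (String index : entityIndexes) {
            if (discard.contains(index)) {
                continue; // unwanted entity: never returned
            }
            // aliased entity -> master entity; anything else passes through unchanged
            result.add(aliasToMaster.getOrDefault(index, index));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> aliasToMaster = Map.of(
                "nyc/keyword", "new york city, ny/location",
                "new york city/location", "new york city, ny/location");
        Set<String> discard = Set.of("amp/keyword");
        List<String> doc = List.of("nyc/keyword", "amp/keyword",
                "new york city/location", "brooklyn, ny/location");
        System.out.println(rewrite(doc, aliasToMaster, discard));
        // prints [new york city, ny/location, new york city, ny/location, brooklyn, ny/location]
    }
}
```

The sketch rewrites each document's entities independently. As a result, two different aliases of the same master in one document both come back as the master, which is one way the occasional duplicates described above can arise.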

Unlike other query engine controls, aliasing is not configured from either the URL/POST parameters or the static configuration. Instead, JSON shares are used. This makes it easy to maintain large numbers of aliases and to manage which users have which alias configuration enabled. The following sections describe how to write, maintain, and share these JSON configurations.


Manual Aliases and Automatic Aliases

As described above, manual aliasing applies at the level of the community and is configured using JSON shares. This makes it possible to build entire alias sets of master entities and associated aliases, which apply to all queries against the source data in those communities.

Aliasing can be manually disabled by setting the top-level query field "expandAlias" to false.

It is important to distinguish this type of "manual aliasing" from "automatic aliasing", which can be set up simply by sending the entity query term with the "entityOpt.expandAlias" parameter. This allows matching not just on the entity but also on its common, automatically extracted "aliases".

The diagrams below are intended to clarify these important distinctions.

Manual Aliasing

Automatic Aliasing

Share Format

The format of the alias configuration object is as follows:

{
	"<index-of-master-entity>": { // ie the master entity's index, in the format disambiguatedName/type (MUST BE LOWER CASE)
		"index": string, // OPTIONAL, just used for display, if present should be the same as the index "key" (MUST BE LOWER CASE)
		"disambiguated_name": string, // The disambiguated name corresponding to the index "key"  (CASE SENSITIVE)
		"type": string, // The type corresponding to the index "key" (CASE SENSITIVE)
		"dimension": string, // OPTIONAL: Should be one of "What"/"Who"/"Where" - just used for display/iconography
		"alias": [ string ], // A list of indexes of entities (MUST BE LOWER CASE) that should be aliased to the master entity
		"linkdata": [ string ] // A list of "etext" terms (CASE *IN*SENSITIVE) that are added to any query terms involving the master entity
	},
 
	// Other master entities, in the same format
 
	// Optionally:
	"DISCARD": {
		"disambiguated_name": "DISCARD", // (just for display)
		"index": "DISCARD", // (just for display)
		"type": "SPECIAL", // (just for display)
		"alias": [ string ] // A list of indexes of entities to be discarded
	}
}

Developers who use the Java API/Java driver will notice that this format is simply a map of EntityFeaturePojo objects, which makes its use in Java clients easy. 
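The sketch below holds a share in plain Java as a map from master index to entry. It uses a hypothetical AliasEntry class in place of EntityFeaturePojo, whose accessors are not described here. It also checks the lower-case and key rules stated in the format above. The validate method is an assumption made for illustration, not part of the Infinit.e API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/**
 * Sketch of an alias share held as a Java map. AliasEntry is a hypothetical stand-in
 * for EntityFeaturePojo carrying only the fields described in the share format.
 */
public class AliasShareSketch {

    public static class AliasEntry {
        public String index;              // optional, display only
        public String disambiguatedName;  // "disambiguated_name" in the JSON (case sensitive)
        public String type;               // case sensitive
        public String dimension;          // optional: "What", "Who" or "Where"
        public List<String> alias = new ArrayList<>();    // entity indexes to alias
        public List<String> linkdata = new ArrayList<>(); // extra etext terms
    }

    private static boolean isLower(String s) {
        return s.equals(s.toLowerCase(Locale.ROOT));
    }

    /** Lists violations of the format rules; an empty list means none were found. */
    public static List<String> validate(Map<String, AliasEntry> share) {
        List<String> problems = new ArrayList<>();
        for (Map.Entry<String, AliasEntry> e : share.entrySet()) {
            String key = e.getKey();
            AliasEntry entry = e.getValue();
            if (!key.equals("DISCARD")) { // DISCARD is the one upper-case special key
                if (!isLower(key)) {
                    problems.add("key is not lower case: " + key);
                }
                if (entry.index != null && !entry.index.equals(key)) {
                    problems.add("index differs from key: " + key);
                }
                if (entry.disambiguatedName != null && entry.type != null
                        && !key.equals((entry.disambiguatedName + "/" + entry.type)
                                .toLowerCase(Locale.ROOT))) {
                    problems.add("key does not match disambiguated_name/type: " + key);
                }
            }
            for (String a : entry.alias) {
                if (!isLower(a)) {
                    problems.add("alias is not lower case: " + a);
                }
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        AliasEntry brooklyn = new AliasEntry();
        brooklyn.disambiguatedName = "Brooklyn, NY";
        brooklyn.type = "Location";
        brooklyn.dimension = "Where";
        brooklyn.alias.add("brooklyn/keyword");

        Map<String, AliasEntry> share = new LinkedHashMap<>();
        share.put("brooklyn, ny/location", brooklyn);
        System.out.println(validate(share)); // prints []
    }
}
```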

Examples

Here is an example of an alias file:

{
	"brooklyn, ny/location": {
		"disambiguated_name": "Brooklyn, NY",
		"type": "Location",
		"dimension": "Where",
		"index": "brooklyn, ny/location",
		"alias": ["brooklyn/keyword"]
	},
	"DISCARD": {
		"disambiguated_name": "DISCARD",
		"index": "DISCARD",
		"type": "SPECIAL",
		"alias": [
			"sandy/keyword",
			"rt/keyword",
			"amp/keyword"
		]
	},
	"new york city, ny/location": {
		"disambiguated_name": "New York City, NY",
		"type": "Location",
		"dimension": "Where",
		"index": "new york city, ny/location",
		"alias": [
			"nyc/keyword",
			"downtown nyc/keyword",
			"new york city/keyword",
			"new york city/location"
		]
	}
}
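The sketch below holds the example file as plain Java collections. JSON parsing is omitted to keep it dependency-free, and linkdata is left out because the example does not use it. The sketch covers the two directions listed at the top of this page: inverting the file into an alias-to-master lookup for rewriting results, and widening a query on a master entity so that it also matches the master's aliases. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: the example alias file as plain Java collections. */
public class AliasTableSketch {

    /** Master index -> "alias" list, copied from the example file (JSON parsing omitted). */
    public static Map<String, List<String>> exampleShare() {
        Map<String, List<String>> share = new LinkedHashMap<>();
        share.put("brooklyn, ny/location", List.of("brooklyn/keyword"));
        share.put("DISCARD", List.of("sandy/keyword", "rt/keyword", "amp/keyword"));
        share.put("new york city, ny/location", List.of("nyc/keyword",
                "downtown nyc/keyword", "new york city/keyword", "new york city/location"));
        return share;
    }

    /** Inverts master -> aliases into alias -> master, skipping the DISCARD entry. */
    public static Map<String, String> aliasToMaster(Map<String, List<String>> share) {
        Map<String, String> lookup = new HashMap<>();
        for (Map.Entry<String, List<String>> e : share.entrySet()) {
            if (e.getKey().equals("DISCARD")) {
                continue;
            }
            for (String alias : e.getValue()) {
                lookup.put(alias, e.getKey());
            }
        }
        return lookup;
    }

    /** Indexes listed under DISCARD. */
    public static Set<String> discardSet(Map<String, List<String>> share) {
        return new HashSet<>(share.getOrDefault("DISCARD", List.of()));
    }

    /** Query side: a term on a master entity is widened to the master plus its aliases. */
    public static List<String> expandQueryEntity(String index, Map<String, List<String>> share) {
        List<String> terms = new ArrayList<>();
        terms.add(index);
        if (!index.equals("DISCARD")) {
            terms.addAll(share.getOrDefault(index, List.of()));
        }
        return terms;
    }

    public static void main(String[] args) {
        Map<String, List<String>> share = exampleShare();
        System.out.println(aliasToMaster(share).get("nyc/keyword")); // prints new york city, ny/location
        System.out.println(expandQueryEntity("brooklyn, ny/location", share));
        // prints [brooklyn, ny/location, brooklyn/keyword]
    }
}
```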

Alias Management

When a query is performed, all JSON share objects belonging to the queried communities, and with type "infinite-entity-alias", are read from the database (or from cache if unchanged).
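Here is a minimal sketch of that selection and caching step. It assumes a hypothetical Share view with an id, type, community list and last-modified timestamp, plus a parser supplied by the caller. The real storage layer and cache are not described here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

/** Hypothetical sketch: pick the alias shares for a query and re-parse only changed ones. */
public class AliasShareLoaderSketch<T> {

    /** Minimal, hypothetical view of a stored share. */
    public static class Share {
        public final String id;
        public final String type;
        public final Set<String> communityIds;
        public final long modified; // last-modified time, used to detect changes
        public final String json;

        public Share(String id, String type, Set<String> communityIds, long modified, String json) {
            this.id = id;
            this.type = type;
            this.communityIds = communityIds;
            this.modified = modified;
            this.json = json;
        }
    }

    private static class Cached<V> {
        final long modified;
        final V parsed;

        Cached(long modified, V parsed) {
            this.modified = modified;
            this.parsed = parsed;
        }
    }

    private final Map<String, Cached<T>> cache = new HashMap<>();
    private final Function<String, T> parser; // e.g. a JSON parser supplied by the caller
    private int parseCount = 0;

    public AliasShareLoaderSketch(Function<String, T> parser) {
        this.parser = parser;
    }

    /** Returns the parsed alias shares that belong to any of the queried communities. */
    public List<T> load(List<Share> allShares, Set<String> queriedCommunities) {
        List<T> configs = new ArrayList<>();
        for (Share s : allShares) {
            if (!"infinite-entity-alias".equals(s.type)
                    || Collections.disjoint(s.communityIds, queriedCommunities)) {
                continue; // wrong share type, or not in any queried community
            }
            Cached<T> c = cache.get(s.id);
            if (c == null || c.modified != s.modified) { // new or changed since last read
                c = new Cached<>(s.modified, parser.apply(s.json));
                cache.put(s.id, c);
                parseCount++;
            }
            configs.add(c.parsed);
        }
        return configs;
    }

    public int getParseCount() {
        return parseCount;
    }
}
```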

Where the same entity is aliased differently across multiple shares, it is undefined which setting will be used, EXCEPT THAT aliases from any shares held solely in the user's personal community will always override community-wide aliases.
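The following sketch shows the precedence rule. It works on per-share alias-to-master maps, which is a hypothetical representation. The engine leaves undefined which community-wide share wins a conflict; in the sketch, the last one applied wins. It also assumes the caller has already separated shares held only in the personal community from community-wide ones.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of alias precedence; not the engine's actual code. */
public class AliasPrecedenceSketch {

    /**
     * Each input map is one share's alias -> master lookup. Community-wide shares are
     * applied first: if two of them map the same alias differently, whichever is applied
     * last wins here, standing in for the documented "undefined" behaviour. Shares held
     * only in the user's personal community are applied afterwards, so they always override.
     */
    public static Map<String, String> resolve(List<Map<String, String>> communityShares,
                                              List<Map<String, String>> personalShares) {
        Map<String, String> merged = new HashMap<>();
        for (Map<String, String> share : communityShares) {
            merged.putAll(share);
        }
        for (Map<String, String> share : personalShares) {
            merged.putAll(share); // personal "under test" aliases win every conflict
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, String> deployed = Map.of("nyc/keyword", "new york city, ny/location");
        Map<String, String> underTest = Map.of("nyc/keyword", "new york city/location");
        System.out.println(resolve(List.of(deployed), List.of(underTest)).get("nyc/keyword"));
        // prints new york city/location
    }
}
```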

The reason for this priority is to allow users to utilize their personal communities as staging areas to test new aliases. That way there is no risk of damaging other users' queries (and conversely the aliases "under test" will always be applied, even if they conflict with operationally deployed aliases).

Once an alias file has been created, it should be uploaded either using the Share - Add - JSON API call or from the file uploader.

(If using the file uploader GUI then ensure that it is uploaded as JSON, ie select the "Upload New JSON" option, not "Upload New File").

Files uploaded via the API can then be shared with, or removed from, other communities using the Share - Add - Community and Share - Remove - Community calls, or again from the file uploader.

Aliasing is a complicated topic, and the following additional functional items are planned on the roadmap:

  • The ability to apply aliases at harvest time (unlike the current aliasing, this will not be dynamic, but it may be desirable for very well established aliases).
  • Currently aliases are not passed to Hadoop plugins. This will be added in a later version.
  • Currently aliases are not applied to documents returned from Knowledge - Document - Get. This will be added in a later version.
  • Currently there is no way for community moderators or owners to authorize shares: any member of a community can share their JSON alias set and it will be automatically applied to any appropriate queries.