Overview

To understand the scoring parameters exposed by the Infinit.e API, it helps to have a basic understanding of the query and scoring process.

Significance

A detailed treatment of significance is beyond the scope of this documentation, but this section provides a brief description (a one-line summary is also provided below).

Separate documentation will describe the scoring algorithms in more detail.

In summary: relevance measures how well a document matches the user's query; significance measures how well an entity matches the user's query (document significance is simply the sum of the entity significances).

Scoring parameters

All scoring parameters are maintained under a "score" object under the top level query. The remainder of this section describes the "score" object's fields.

{
	"score": {
		"numAnalyze": integer, // (default: 1000)
		// See following sections for other parameters
	}
}

The "numAnalyze" parameter dictates the maximum number of documents to be returned from the Lucene (/ElasticSearch) query and analyzed according to the significance algorithm described above. The larger the number, the more accurate the results but the slower the query.

Empirically, the default of 1000, which takes 0.5-1 second, has produced good results.

Note that this parameter is also currently used to determine how many documents are used to generate the "event timeline".

{
	"score": {
		// See preceding sections for other parameters
		"sigWeight": number, // (default: 0.67)
		"relWeight": number, // (default: 0.33)
		// See following sections for other parameters
	}
}

These two floating point numbers represent the relative weight of significance vs relevance (as described above). If they don't sum to 1, each is divided by their sum.

Increasing the "sigWeight" field tends to return longer documents that don't necessarily relate strongly to the user's query but that discuss concepts particular to it.

Increasing the "relWeight" field tends to return shorter documents that relate very strongly to the user's query.

If one of the two weights is set to 0 then its score is neither calculated nor used.

If both weights are set to 0 then documents are ranked in descending date order and no scoring is performed.
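The weight handling described above can be sketched in a few lines of Python. This is an illustration only, not the server's implementation: the function name normalize_weights is hypothetical, and rejecting negative weights is an assumption, since the documentation does not say how they are treated.

```python
def normalize_weights(sig_weight=0.67, rel_weight=0.33):
    """Scale the two weights so that they sum to 1.

    Returns (0.0, 0.0) when both weights are 0, which the documentation
    describes as "no scoring, rank by descending date".
    """
    # Assumption: negative weights are invalid (not stated in the docs).
    if sig_weight < 0 or rel_weight < 0:
        raise ValueError("weights must be non-negative")
    total = sig_weight + rel_weight
    if total == 0:
        return 0.0, 0.0
    # "If they don't sum to 1, each is divided by their sum."
    return sig_weight / total, rel_weight / total


if __name__ == "__main__":
    print(normalize_weights(2, 1))   # weights given as a 2:1 ratio
    print(normalize_weights(1, 0))   # relevance is neither calculated nor used
    print(normalize_weights(0, 0))   # date ordering, no scoring
```

A weight of 0 after normalization corresponds to the case where that score is neither calculated nor used.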

{
	"score": {
		// See preceding sections for other parameters
		"timeProx":{
			"time": string,
			"decay": string
		},
		// See following sections for other parameters
	}
}

"time" is the center point around which to decay. It has the same format as the "min" and "max" fields of the "time" query term, ie "now", Unix time in ms, or one of a standard set of date/time formats (ISO, SMTP, etc).

"decay" is the "half life" of the decay (ie the duration from "time" at which the score is halved). It is in the format "N[dmwy]" where N is an integer and d,m,w,y denote "day", "month", "week" or "year" (eg "1w", "1m"; note that currently, if "m" is used, the duration is always 1 month regardless of N).
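As a client-side illustration, the time "decay" string (an integer followed by one of d, m, w, y) could be validated and split as follows before a query is sent. The function name parse_time_decay is hypothetical, and the strictness of the pattern (no spaces inside the string, no sign) is an assumption rather than documented server behavior.

```python
import re

# Unit letters as listed in the documentation.
_UNIT_NAMES = {"d": "day", "w": "week", "m": "month", "y": "year"}


def parse_time_decay(decay):
    """Split a time decay string such as "1w" into (count, unit_name)."""
    match = re.fullmatch(r"(\d+)([dwmy])", decay.strip())
    if match is None:
        raise ValueError(f"not an integer followed by d, m, w or y: {decay!r}")
    count = int(match.group(1))
    unit = match.group(2)
    if unit == "m":
        # Documented quirk: with "m" the duration is currently always 1 month.
        count = 1
    return count, _UNIT_NAMES[unit]


if __name__ == "__main__":
    print(parse_time_decay("1w"))
    print(parse_time_decay("3m"))
```

The sketch deliberately stops at parsing: the documentation only states that the score is halved at the "decay" duration from "time", not the shape of the curve in between.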

{
	"score": {
		// See preceding sections for other parameters
		"geoProx":{
			"ll": string,
			"decay": string
		}
	}
}

"ll" is the lat/long of the center point around which to decay. It has the same format as the "centerll"/"minll"/"maxll" fields of the geospatial query term, ie "lat,long" or "(lat,long)".

"decay" is the "half life" of the decay (ie the distance from "ll" at which the score is halved). It is in the same format as the "dist" field of the geospatial query term, ie in the format "<distance><unit>" where <distance> is an integer or floating point number and <unit> is one of "m" (miles), "km" (kilometers), "nm" (nautical miles).
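The two "geoProx" string formats can be checked on the client with a short Python sketch like the one below. The helper names parse_ll and parse_geo_decay are hypothetical; tolerating surrounding whitespace and validating latitude/longitude ranges are assumptions, not documented behavior. The conversion factors are the standard definitions of the statute mile (1.609344 km) and nautical mile (1.852 km).

```python
import re

# Note that "m" means miles here, not meters, as stated in the documentation.
_KM_PER_UNIT = {"km": 1.0, "m": 1.609344, "nm": 1.852}


def parse_ll(ll):
    """Parse "lat,long" or "(lat,long)" into a (lat, long) tuple of floats."""
    text = ll.strip()
    if text.startswith("(") and text.endswith(")"):
        text = text[1:-1]
    parts = text.split(",")
    if len(parts) != 2:
        raise ValueError(f"expected 'lat,long': {ll!r}")
    lat, lon = float(parts[0]), float(parts[1])
    # Assumption: reject coordinates outside the usual ranges.
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        raise ValueError(f"coordinates out of range: {ll!r}")
    return lat, lon


def parse_geo_decay(decay):
    """Convert "<distance><unit>" (unit: m, km or nm) to kilometers."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)(km|nm|m)", decay.strip())
    if match is None:
        raise ValueError(f"expected '<distance><unit>': {decay!r}")
    return float(match.group(1)) * _KM_PER_UNIT[match.group(2)]


if __name__ == "__main__":
    print(parse_ll("(40.7,-74.0)"))
    print(parse_geo_decay("10km"))
```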

{
	"score":{
		"numAnalyze": 1000,
		"sigWeight": 0.67,
		"relWeight": 0.33,
		"timeProx": {
			"time": "now",
			"decay": "1m"
		}
	}
}
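For clients that assemble queries programmatically, a "score" object like the one above could be built along these lines. This is a minimal sketch: build_score is a hypothetical helper, the rest of the query (the query terms themselves) is omitted, and no HTTP call is shown because the endpoint is not covered in this section.

```python
import json


def build_score(num_analyze=1000, sig_weight=0.67, rel_weight=0.33,
                time_prox=None, geo_prox=None):
    """Return a dict holding the "score" object described in this section.

    time_prox: optional (time, decay) pair, e.g. ("now", "1m")
    geo_prox:  optional (ll, decay) pair, e.g. ("40.7,-74.0", "100km")
    """
    score = {
        "numAnalyze": num_analyze,
        "sigWeight": sig_weight,
        "relWeight": rel_weight,
    }
    if time_prox is not None:
        time, decay = time_prox
        score["timeProx"] = {"time": time, "decay": decay}
    if geo_prox is not None:
        ll, decay = geo_prox
        score["geoProx"] = {"ll": ll, "decay": decay}
    return {"score": score}


if __name__ == "__main__":
    # Default weights, with a time decay centered on "now" and a one month half life.
    print(json.dumps(build_score(time_prox=("now", "1m")), indent=2))
```

Optional objects are left out entirely when not requested, rather than being sent as empty or null values.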