...
```
{
    "docCount": integer, // total number of docs processed
    "fields": {
        string: { // the string is the field name, except with "." replaced with "%2e" as described above
            "fieldName": string, // the actual field name

            // Human readable output:
            "frequencyInfo": string, // Frequency information about the field, see below for format
                                     // e.g. "field_cnt=5 (parent_cnt=5); pct_parent=100.0%, pct_total=100.0%"
            "typeInfo": [
                string, // Statistics on the different types of the field, see below for format
            ],
            "sampleStringData": [
                string, // For string values, the "N" most common values ("N" taken from the configuration specified above, default 10)
            ],
            "numericDataStats": string, // For numeric fields, some simple statistics, see below for format
                                        // e.g. "min=25.0 max=411.0 avg=164.8"

            // Machine readable output:
            "count": integer, // (see human readable format, frequencyInfo.field_cnt)
            "objectCount": integer, // (see human readable format, frequencyInfo.obj_cnt)
            "pctInParent": number, // (see human readable format, frequencyInfo.pct_parent)
            "pctInTotal": number, // (see human readable format, frequencyInfo.pct_total)
            "exampleValues": [
                {
                    "value": string, // One of the top occurring values (string fields only)
                    "count": integer, // The number of times it occurs
                }
            ],
            "minValue": number, // For numeric values, the smallest value seen
            "maxValue": number, // For numeric values, the largest value seen
            "avgValue": number, // For numeric values, the mean value seen
            "numberCount": integer, // The total number of samples (in theory - all the times the field has a numeric value) used in the above statistics
            "typePcts": {
                string: number, // the string is the type name, one of (object, array, string, text, number_float, number_int, bool)
                                // the number is the % of the time the field is that type
            },
            "typeCounts": {
                string: integer, // as above, but count instead of %
            }
        }
    }
}
```
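As a rough illustration of consuming the machine readable part of this output, the Python sketch below walks the `fields` map and lists each field by frequency. The `sample` document and the helper names `encode_field_key` and `summarize` are hypothetical and only serve the example; the key names (`fieldName`, `count`, `pctInTotal`) are taken from the schema above.

```python
def encode_field_key(field_name: str) -> str:
    """Escape '.' as '%2e', mirroring how keys under "fields" are described."""
    return field_name.replace(".", "%2e")


def summarize(result: dict) -> list[tuple[str, int, float]]:
    """Return (field name, count, pct of total docs) per field, most frequent first."""
    rows = []
    for stats in result["fields"].values():
        # "fieldName" carries the unescaped name, so the key itself never needs decoding.
        rows.append((stats["fieldName"], stats["count"], stats["pctInTotal"]))
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Hypothetical sample document for illustration only, not real output.
sample = {
    "docCount": 5,
    "fields": {
        encode_field_key("user.name"): {
            "fieldName": "user.name",
            "count": 5,
            "pctInParent": 100.0,
            "pctInTotal": 100.0,
        },
        "age": {
            "fieldName": "age",
            "count": 3,
            "pctInParent": 60.0,
            "pctInTotal": 60.0,
        },
    },
}

for name, count, pct in summarize(sample):
    print(f"{name}: count={count} ({pct}% of {sample['docCount']} docs)")
```

Reading the name from `fieldName` rather than un-escaping the key avoids any ambiguity if a real field name happens to contain the literal text `%2e`.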
The human readable version has the following formats:
- frequencyInfo: "field_cnt=$1 (obj_cnt=$2 parent_cnt=$3); pct_parent=$4%, pct_total=$5%"
  - field_cnt is the total number of instances of the field
  - obj_cnt is the number of times that the field contains an object (or array of objects) instead of a primitive value; it is omitted if equal to field_cnt
  - parent_cnt is the number of parent instances in which this field occurs (can differ from field_cnt if the parent is an array)
  - pct_parent is the % of parent fields in which this field occurs
  - pct_total is the % of objects (i.e. docCount) in which this field occurs
- typeInfo: "<type> (<total_cnt>, <total_pct>%)"
  - type is one of: object, array, string, text, number_float, number_int, bool
  - total_cnt and total_pct are the count and percentage of that type for that field
  - Note that types can overlap, e.g. [ 1, 2, "3" ] would be 3x array, 3x numeric, 1x string
- sampleStringData: "<value> (<count>)"
- numericDataStats: "min=$1 max=$2 avg=$3"
  - min is the smallest value seen
  - max is the largest value seen
  - avg is the mean value seen
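
If only the human readable strings are available, they can be turned back into numbers with a few regular expressions. The Python below is a sketch under assumptions: the patterns are inferred from the format descriptions above, the exact spacing and the trailing `%` signs are guesses, and the function names are hypothetical. The `typeInfo` sample value in the usage lines is made up for illustration.

```python
import re

# Patterns inferred from the documented formats; exact spacing is an assumption.
FREQUENCY_RE = re.compile(
    r"field_cnt=(?P<field_cnt>\d+) "
    r"\((?:obj_cnt=(?P<obj_cnt>\d+) )?parent_cnt=(?P<parent_cnt>\d+)\); "
    r"pct_parent=(?P<pct_parent>[\d.]+)%?, pct_total=(?P<pct_total>[\d.]+)%?"
)
TYPE_RE = re.compile(r"(?P<type>\w+) \((?P<total_cnt>\d+), (?P<total_pct>[\d.]+)%\)")
NUMERIC_RE = re.compile(r"min=(?P<min>\S+) max=(?P<max>\S+) avg=(?P<avg>\S+)")


def parse_frequency_info(text: str) -> dict:
    """Parse a frequencyInfo string into counts and percentages."""
    match = FREQUENCY_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"unrecognized frequencyInfo: {text!r}")
    groups = match.groupdict()
    field_cnt = int(groups["field_cnt"])
    return {
        "field_cnt": field_cnt,
        # obj_cnt is left out of the string when it equals field_cnt.
        "obj_cnt": int(groups["obj_cnt"]) if groups["obj_cnt"] else field_cnt,
        "parent_cnt": int(groups["parent_cnt"]),
        "pct_parent": float(groups["pct_parent"]),
        "pct_total": float(groups["pct_total"]),
    }


def parse_type_info(text: str) -> tuple[str, int, float]:
    """Parse one typeInfo entry into (type, total_cnt, total_pct)."""
    match = TYPE_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"unrecognized typeInfo: {text!r}")
    return match["type"], int(match["total_cnt"]), float(match["total_pct"])


def parse_numeric_stats(text: str) -> dict:
    """Parse a numericDataStats string into min/max/avg floats."""
    match = NUMERIC_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"unrecognized numericDataStats: {text!r}")
    return {name: float(value) for name, value in match.groupdict().items()}


freq = parse_frequency_info("field_cnt=5 (parent_cnt=5); pct_parent=100.0%, pct_total=100.0%")
stats = parse_numeric_stats("min=25.0 max=411.0 avg=164.8")
type_entry = parse_type_info("string (3, 60.0%)")  # made-up sample value
print(freq, stats, type_entry)
```

Where the machine readable fields (`count`, `pctInParent`, `minValue` and so on) are present, reading those directly is simpler and less fragile than parsing the strings.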