Overview
This toolkit element lets you use regular expressions or JavaScript to set the document metadata fields (e.g. title, description, publishedDate).
TODO
Format
TODO convert to JSON
{
    "display": string,
    "docMetadata": {} // see DocumentSpecPojo below
}

//////////////////////////////////

public static class DocumentSpecPojo {
    public String title; // The string expression or $SCRIPT(...) specifying the document title
    public String description; // The string expression or $SCRIPT(...) specifying the document description
    public String publishedDate; // The string expression or $SCRIPT(...) specifying the document publishedDate
    public String fullText; // The string expression or $SCRIPT(...) specifying the document fullText
    public String displayUrl; // The string expression or $SCRIPT(...) specifying the document displayUrl
    public Boolean appendTagsToDocs; // if true (*NOT* default) source tags are appended to the document
    public StructuredAnalysisConfigPojo.GeoSpecPojo geotag; // Specify a document level geo-tag
}
Legacy documentation:
TODO
Description
Legacy documentation:
TODO
The following date formats are currently supported for the publishedDate field:
if (null == _allowedDatesArray_startsWithLetter) {
    _allowedDatesArray_startsWithLetter = new String[] {
        DateFormatUtils.SMTP_DATETIME_FORMAT.getPattern(),
        "MMM d, yyyy hh:mm a",
        "MMM d, yyyy HH:mm",
        "MMM d, yyyy hh:mm:ss a",
        "MMM d, yyyy HH:mm:ss",
        "MMM d, yyyy hh:mm:ss.SS a",
        "MMM d, yyyy HH:mm:ss.SS",
        "EEE MMM dd HH:mm:ss zzz yyyy",
        "EEE MMM dd yyyy HH:mm:ss zzz",
        "EEE MMM dd yyyy HH:mm:ss 'GMT'Z (zzz)",
    };
    _allowedDatesArray_numeric_1 = new String[] {
        "yyyy-MM-dd'T'HH:mm:ss'Z'",
        DateFormatUtils.ISO_DATE_FORMAT.getPattern(),
        DateFormatUtils.ISO_DATE_TIME_ZONE_FORMAT.getPattern(),
        DateFormatUtils.ISO_DATETIME_FORMAT.getPattern(),
        DateFormatUtils.ISO_DATETIME_TIME_ZONE_FORMAT.getPattern()
    };
    _allowedDatesArray_numeric_2 = new String[] {
        "yyyyMMdd",
        "yyyyMMdd hh:mm a",
        "yyyyMMdd HH:mm",
        "yyyyMMdd hh:mm:ss a",
        "yyyyMMdd HH:mm:ss",
        "yyyyMMdd hh:mm:ss.SS a",
        "yyyyMMdd HH:mm:ss.SS",
        // Julian, these are unlikely
        "yyyyDDD",
        "yyyyDDD hh:mm a",
        "yyyyDDD HH:mm",
        "yyyyDDD hh:mm:ss a",
        "yyyyDDD HH:mm:ss",
        "yyyyDDD hh:mm:ss.SS a",
        "yyyyDDD HH:mm:ss.SS",
    };
    _allowedDatesArray_stringMonth = new String[] {
        "dd MMM yy",
        "dd MMM yy hh:mm a",
        "dd MMM yy HH:mm",
        "dd MMM yy hh:mm:ss a",
        "dd MMM yy HH:mm:ss",
        "dd MMM yy hh:mm:ss.SS a",
        "dd MMM yy HH:mm:ss.SS",
    };
    _allowedDatesArray_numericMonth = new String[] {
        "MM dd yy",
        "MM dd yy hh:mm a",
        "MM dd yy HH:mm",
        "MM dd yy hh:mm:ss a",
        "MM dd yy HH:mm:ss",
        "MM dd yy hh:mm:ss.SS a",
        "MM dd yy HH:mm:ss.SS",
    };
}
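To check whether a given date string would be covered by patterns like those listed above, a small stand-alone probe can help. The following is only a sketch: the class name DateFormatProbe, the method firstMatchingPattern, and the four-pattern subset are illustrative assumptions, and it uses plain java.text.SimpleDateFormat rather than whatever matching logic the harvester actually applies.

```java
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper, not part of the toolkit.
final class DateFormatProbe {
    // A small subset of the supported patterns, chosen for illustration only.
    static final String[] SAMPLE_PATTERNS = {
        "yyyy-MM-dd'T'HH:mm:ss'Z'",
        "MMM d, yyyy HH:mm",
        "dd MMM yy",
        "yyyyMMdd",
    };

    // Returns the first pattern that parses the whole input, or null if none does.
    static String firstMatchingPattern(String text) {
        for (String pattern : SAMPLE_PATTERNS) {
            SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.US);
            format.setLenient(false); // reject out-of-range fields such as month 13
            format.setTimeZone(TimeZone.getTimeZone("UTC"));
            ParsePosition position = new ParsePosition(0);
            Date parsed = format.parse(text, position);
            // Require the pattern to consume the entire string, not just a prefix.
            if (parsed != null && position.getIndex() == text.length()) {
                return pattern;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(firstMatchingPattern("2014-01-15T15:30:00Z"));
        System.out.println(firstMatchingPattern("01/15/2014 03:30:00 PM (GMT)"));
    }
}
```

A null result suggests the date needs a custom parsing function in the globals script.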
If the date doesn't match one of these formats, add a function along the following lines in the globals script:
// substitute YOUR.DATE.FIELD, and the date format
function createPubDate(metadata) {
    var date = metadata.YOUR.DATE.FIELD;
    var parsedDate = new java.text.SimpleDateFormat('MM/dd/yyyy hh:mm:ss a (zzz)').parse(date);
    return '' + parsedDate.toString();
}
and then you can call it from the docMetadata.publishedDate field like:
{
    "docMetadata": {
        //...
        "publishedDate": "$SCRIPT( createPubDate(_doc.metadata) );"
        //...
    }
}
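Before putting a custom pattern into the globals script, it can be worth confirming in plain Java that the pattern really parses a sample of your data. The sketch below uses the same pattern as the createPubDate function; the class name CustomDatePatternCheck, the method parseCustom, and the sample date string are assumptions made for illustration.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

// Hypothetical helper, not part of the toolkit.
final class CustomDatePatternCheck {
    // Same pattern as in the createPubDate script function.
    static final String PATTERN = "MM/dd/yyyy hh:mm:ss a (zzz)";

    // Parses the text with PATTERN, reporting a failure as an unchecked exception.
    static Date parseCustom(String text) {
        SimpleDateFormat format = new SimpleDateFormat(PATTERN, Locale.US);
        try {
            return format.parse(text);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Date does not match " + PATTERN + ": " + text, e);
        }
    }

    public static void main(String[] args) {
        Date parsed = parseCustom("01/15/2014 03:30:00 PM (GMT)");
        // Print epoch milliseconds, which do not depend on the local time zone.
        System.out.println(parsed.getTime());
    }
}
```

If parseCustom throws for a representative sample, adjust the pattern before using it in the script.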
Examples
TODO