Using manual text transformation, you can specify the data source that your script works on. The script enriches the data from the data sources so that it can be output as metadata for creating advanced entities and associations.

The following table describes the parameters of the manual text transformation configuration.

Parameter / Description
fieldName

Specifies the data source that the script runs against.

One of "fullText", "description", or "title".

script

Specify your script.
flags

Standard Java regex field

Can have different values, depending on scriptlang, as listed below.

javascript:

A few flags provide additional variables in the JavaScript:

  • "m" to get "_doc.metadata", written into the variable "_metadata"
    • (for example, this flag can be used to copy a subset of the fields from one field name to another, before using the "metadataFields" field in the "structuredAnalysis" object to delete the larger field)
  • "d" to get "_doc", written into the variable "_doc"
  • "t" to return the full text of the document into "text"
    • If the "flags" field is not specified, "text" is populated by default. If the "flags" field is specified, then "t" must be included or the "text" variable is not populated.
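
The flag rules above can be modeled in a few lines. This is only an illustrative sketch, not the platform's implementation: the function name `scriptVariables` is hypothetical, the document is assumed to carry its full text in a `fullText` property, and an empty "flags" string is assumed to count as "specified".

```javascript
// Hypothetical model of which variables a JavaScript script would see
// for a given "flags" value. Not the platform's actual code.
function scriptVariables(doc, flags) {
  const vars = {};

  // "flags" not specified at all: only "text" is populated (the default).
  if (flags === undefined || flags === null) {
    vars.text = doc.fullText; // assumption: full text lives in doc.fullText
    return vars;
  }

  // "flags" specified: each variable appears only if its flag is present.
  if (flags.includes("m")) vars._metadata = doc.metadata;
  if (flags.includes("d")) vars._doc = doc;
  if (flags.includes("t")) vars.text = doc.fullText;
  return vars;
}

// Example document (made-up values, for illustration only).
const exampleDoc = {
  fullText: "Sample body text",
  metadata: { author: ["someone"] },
};

console.log(Object.keys(scriptVariables(exampleDoc)));       // default case
console.log(Object.keys(scriptVariables(exampleDoc, "m")));  // no "t", so no text
console.log(Object.keys(scriptVariables(exampleDoc, "mt"))); // metadata and text
```

The point to take away is the second call: once "flags" is set, leaving out "t" means the script gets no "text" variable.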

xpath (and regex, except for "o"):

  • 'H': HTML-decodes the resulting fields (e.g. "&amp;" -> "&")
  • 'o': if the XPath expression points to an HTML (/XML) object, then this object is converted to JSON and stored as an object in the corresponding metadata field array. (Can also be done via the deprecated "groupNum": -1)
  • 'x': if the XPath expression points to an HTML (/XML) object, then the XML of the object is displayed with no decoding (e.g. without stripping of fields)
  • 'D': described above 
  • 'c': if set then fields with the same name are chained together (otherwise they will all append their results to the field within metadata)
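
As a rough illustration of what the 'H' flag's HTML decoding means for an extracted value, the sketch below decodes a handful of common entities. It is an assumption-laden stand-in, not the platform's decoder: the function name `htmlDecode` is hypothetical and only five entities are handled.

```javascript
// Hypothetical helper showing the effect of HTML-decoding a field value.
// Only a few common entities are covered; a real decoder handles many more.
const ENTITIES = {
  amp: "&",
  lt: "<",
  gt: ">",
  quot: '"',
  "#39": "'",
};

function htmlDecode(value) {
  // A single pass avoids double-decoding: "&amp;lt;" becomes "&lt;", not "<".
  return value.replace(/&(amp|lt|gt|quot|#39);/g, (whole, name) => ENTITIES[name]);
}

console.log(htmlDecode("AT&amp;T"));
console.log(htmlDecode("&lt;b&gt;bold&lt;/b&gt;"));
```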

replacement

If scriptlang is "regex" or "xpath", replacement can be used to replace the value matched by the regex or XPath expression.

For example, you could match the pattern C/M or C/F in a document and use replacement to record that the Race is Caucasian. The same approach can be used to map M or F to a Sex of Male or Female.
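
The Race/Sex example can be sketched as follows. This is a minimal illustration under stated assumptions: the `rules` array, the `applyRules` function, and the "Race"/"Sex" field labels are hypothetical, and JavaScript's `RegExp` stands in for the Java regex engine that the "flags" description refers to.

```javascript
// Hypothetical rules: each pairs a regex ("script") with a "replacement"
// that becomes the extracted metadata value. Names are illustrative only.
const rules = [
  { field: "Race", script: "\\bC/[MF]\\b", replacement: "Caucasian" },
  { field: "Sex", script: "\\b[A-Z]/M\\b", replacement: "Male" },
  { field: "Sex", script: "\\b[A-Z]/F\\b", replacement: "Female" },
];

function applyRules(text, ruleList) {
  const metadata = {};
  for (const rule of ruleList) {
    // Find every match of the rule's pattern in the text.
    const matches = text.match(new RegExp(rule.script, "g")) || [];
    for (const matched of matches) {
      // Replace the matched value with the configured replacement.
      const value = matched.replace(new RegExp(rule.script), rule.replacement);
      if (!metadata[rule.field]) metadata[rule.field] = [];
      metadata[rule.field].push(value);
    }
  }
  return metadata;
}

console.log(applyRules("Subject: C/M, age 34", rules));
```

Here the single token C/M yields two metadata values, one per field, because separate rules read different parts of the same match.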

scriptlang

Specifies the language of the script that will be provided.

One of "javascript", "regex", or "xpath".
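
To tie the parameters together, the sketch below checks a configuration object against the allowed values listed in the table. It is not part of the product: `validateTransform` is a hypothetical helper, and treating "script" as required and "replacement" as invalid for "javascript" are assumptions drawn from the descriptions above.

```javascript
// Allowed values as listed in the parameter table.
const FIELD_NAMES = ["fullText", "description", "title"];
const SCRIPT_LANGS = ["javascript", "regex", "xpath"];

// Hypothetical validator; returns a list of problems (empty when valid).
function validateTransform(config) {
  const errors = [];
  if (!FIELD_NAMES.includes(config.fieldName)) {
    errors.push("fieldName must be one of: " + FIELD_NAMES.join(", "));
  }
  if (!SCRIPT_LANGS.includes(config.scriptlang)) {
    errors.push("scriptlang must be one of: " + SCRIPT_LANGS.join(", "));
  }
  // Assumption: a script is always required.
  if (typeof config.script !== "string" || config.script.length === 0) {
    errors.push("script must be a non-empty string");
  }
  // Assumption: replacement is meaningful only for regex or xpath.
  if (config.replacement !== undefined && config.scriptlang === "javascript") {
    errors.push("replacement applies only when scriptlang is regex or xpath");
  }
  return errors;
}

// Example configuration (illustrative values).
const exampleConfig = {
  fieldName: "fullText",
  scriptlang: "regex",
  script: "\\bC/[MF]\\b",
  replacement: "Caucasian",
};

console.log(validateTransform(exampleConfig));
```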
