...

Setting the URL to the above default is often undesirable because, unlike title/description/fullText/displayUrl, the document "url" field cannot be changed later (it is used for deduplication).

Therefore there is a more complex syntax that enables the URL to be derived from one or more fields:

  • Simpler version: set "script" field to "<splitting-field>,<url-field>"
    • <splitting-field> is as described above
    • <url-field> takes the specified field from the JSON/XML/metadata and uses it for the URL
      • eg: "script": "fullText.object,url" would parse "<objects><object><meta>1</meta><url>http://blah1</url></object><object><meta>2</meta><url>http://blah2</url></object></objects>" into 2 documents with the URLs "http://blah1" and "http://blah2"
  • More complex version: set "script" field to "<splitting-field>,<url-string>,<url-field1>,<url-field2>,etc"
    • <url-string> is a string (no commas allowed) with substitutions for {1}, {2}, etc mapping to <url-field1>, <url-field2> etc (full format specification)
      • eg with the same XML fullText as the previous example, "script": "fullText.object,my_url_is_{1},url" would return "my_url_is_http://blah1" and "my_url_is_http://blah2" as the 2 URLs.
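
The two "script" forms above can be illustrated with a minimal sketch. This is not the platform's implementation: the helper names parseSplitterScript and buildUrl are hypothetical, each split object is assumed to be already parsed into a flat key/value map, and placeholders are assumed to be numbered from 1 (so {1} maps to <url-field1>).

```javascript
// Hypothetical helper (not part of the product): break a splitter "script"
// value into its splitting field, URL template and URL fields.
function parseSplitterScript(script) {
  const parts = script.split(",").map((p) => p.trim());
  const splittingField = parts[0];
  if (parts.length === 1) {
    // Only a splitting field: the URL is left at its default.
    return { splittingField, urlTemplate: null, urlFields: [] };
  }
  if (parts.length === 2) {
    // Simpler version: the second element is the URL field itself.
    return { splittingField, urlTemplate: "{1}", urlFields: [parts[1]] };
  }
  // More complex version: URL string first, then the fields it refers to.
  return { splittingField, urlTemplate: parts[1], urlFields: parts.slice(2) };
}

// Hypothetical helper: substitute {1}, {2}, ... with the values of the
// corresponding URL fields taken from one split object.
// Unknown placeholders are left untouched (an assumption of this sketch).
function buildUrl(spec, obj) {
  return spec.urlTemplate.replace(/\{(\d+)\}/g, (match, n) => {
    const field = spec.urlFields[Number(n) - 1];
    return field !== undefined && obj[field] !== undefined
      ? String(obj[field])
      : match;
  });
}

// Usage with the objects from the XML example above.
const simple = parseSplitterScript("fullText.object,url");
const complex = parseSplitterScript("fullText.object,my_url_is_{1},url");
const objects = [
  { meta: "1", url: "http://blah1" },
  { meta: "2", url: "http://blah2" },
];
for (const obj of objects) {
  console.log(buildUrl(simple, obj), buildUrl(complex, obj));
}
```

Treating the simpler form as the template "{1}" keeps a single substitution path for both syntaxes; that is a design choice of the sketch, not a statement about how the splitter works internally.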

...

Full splitter example
////The source 
 
{
//...
	"processingPipeline": [
//...
		{
			"splitter": {
				"scriptlang": "automatic_json",
				"script": "fullText.object, http://test/{1}/{2}, url, meta"
			}
		}
//...
	]
//...
}
 
////Would map the extracted document
 
{
	"url": "blahurl",
	"title": "blah",
	"fullText": "<objects><object><meta>1</meta><url>blah1</url></object><object><meta>2</meta><url>blah2</url></object></objects>"
}
 
////to the 2 derived docs:
 
{
	"title": "blah (1)",
	"url": "http://test/blah1/1",
	"fullText": "<object><meta>1</meta><url>blah1</url></object>",
	"metadata": {
		"json": [ { "meta": "1", "url": "blah1" } ]
	}
},
{
	"title": "blah (2)",
	"url": "http://test/blah2/2",
	"fullText": "<object><meta>2</meta><url>blah2</url></object>",
	"metadata": {
		"json": [ { "meta": "2", "url": "blah2" } ]
	}
}
 
////Of course, subsequent pipeline elements can then manipulate/add fields other than "url" as per usual
 

...