
How to Use the Widget (User):

1A. Simple Mode

TODO

2A. Advanced Mode

TODO

Publish View/Summary View

TODO

 

How to Add Event Job Templates (Developer):

The event job creation widget works similarly to the templated source widget: a JSON representation of a set of sources is uploaded to the cluster via the file manager; that representation then appears in the widget, has some of its fields replaced, and is published to the source API.

 

Convert Existing/New Map Reduce Jobs to the source editor format

The first step in creating a template is to convert an existing map reduce job into the new source editor format.  More information on doing that can be found here: TODO find documentation page (in short, use the new source UI and grab only the custom processing sections).

 

Create a JSON file following the event job creation format

The basic JSON format for event job templates looks like this:

{
   "jobs":[], //1 or more map reduce jobs in their source editor format
   "source:{} //1 final source for converting map reduce output to IKANOW documents
}

The "jobs" section takes an array of "custom processing" sources.  They will be created in the order they appear in the array (e.g. jobs[0] will be submitted first, then jobs[1], and so on).

The "source" section takes a single source.  It will always be submitted last.

In addition to being plain sources, each item in the "jobs" array must have one additional field, "templateName" (e.g. "templateName":"someName"), as in the following example:

{
   "jobs":[{
      "templateName":"part1",
      "extractType":"Custom",
      "processingPipeline":[
         //bunch of source pieces in here
      ]
      //rest of the source fields
   }
   //jobs[1] onwards, each with a unique templateName
   ],
   "source":{
      //source json here
   }
}

The template name must not contain spaces (it must be a valid map reduce job name).  It will be used as a substitution variable, as shown in the next section.

Replace inputs/outputs of map reduce jobs with substitution variables

Next, since the input/output collections are not known ahead of time, you'll need to replace the input/output sections of your new sources with these substitution variables:

Possible substitution variables:
$$TEMPLATE_TITLE$$ - the name of the template, which is the title of the file share the template was uploaded with
$$INPUT_COMM$$ - the id of the selected data community from the widget
$$OUTPUT_COMM$$ - the id of the selected event community from the widget
$$XXX_ID$$ - where XXX is the name of a job specified in the template via the templateName field; returns the id of that map reduce job
$$XXX_NAME$$ - where XXX is the name of a job specified in the template via the templateName field; returns the name of that map reduce job
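For example, $$TEMPLATE_TITLE$$ can be dropped into any string field.  The snippet below is a sketch only: it assumes the standard source "title" field (not shown in the examples on this page) and that substitution applies to it like any other field:

//hypothetical - assumes substitution applies to the standard "title" field
"title" : "$$TEMPLATE_TITLE$$ (part1)",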

 

Common use cases for the replacements include:

jobs[1] (i.e. the 2nd job) wants the output of the first job (jobs[0]) as its input; you can use $$XXX_ID$$ to get the id of the first job:

//previous fields of jobs[1]
"processingPipeline" : [{
    "custom_datastoreQuery" : {
         "customTable" : "$$part1_ID$$" //if jobs[0].templateName is "part1" this will replace this field with it's map reduce job id
    },
    "display" : ""
},
//rest of processing pipeline

jobs[1] (i.e. the 2nd job) wants to set jobs[0] as a dependency (so it waits for jobs[0] to finish before starting); again you can use $$XXX_ID$$ to get the id of the first job:

//previous fields of jobs[1]
"processingPipeline" : [
	//previous pipeline elements
	{
        "display" : "",
        "scheduler" : {
             "dependencies" : [
                  "$$part2_ID$$" //if jobs[1].templateName is "part2" this will replace this field with it's map reduce job id
             ],
             "frequency" : "once_only",
             "runDate" : "2015-06-09 11:07:57"
         }
     },
  //rest of pipeline elements

The source wants the last map reduce job's output as its input:

//previous fields of source
"processingPipeline" : [{
     "file" : {
          "XmlPreserveCase" : false,
          "type" : "json",
          "url" : "inf://custom/$$part2_ID$$/" //if jobs[1].templateName is "part2" this will replace this field with it's map reduce job id
     }
},
//rest of pipeline elements

$$XXX_NAME$$ is not currently mapped correctly from the source editor to the map reduce engine; it is recommended you always use $$XXX_ID$$ instead.

$$INPUT_COMM$$ and $$OUTPUT_COMM$$ are used during source submission: all map reduce jobs are submitted against $$INPUT_COMM$$, and the source job is submitted against $$OUTPUT_COMM$$.
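If you also need to reference the selected communities inside a source body, a sketch might look like the following.  This assumes the standard "communityIds" source field (not shown elsewhere on this page) and that substitution applies to it:

//hypothetical - assumes substitution applies to the "communityIds" field
"communityIds" : [ "$$OUTPUT_COMM$$" ],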

Submit JSON file to file uploader

Once you have a JSON file with your map reduce jobs, your source job, and all the correct substitutions made, you can submit it to the file uploader as a JSON file with the type set to "templated_event_sources".
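Putting the pieces together, a minimal complete template might look like the sketch below.  It is assembled from the fragments above; the pipeline contents are whatever your converted jobs actually contain:

{
   "jobs":[{
      "templateName":"part1",
      "extractType":"Custom",
      "processingPipeline":[
         //jobs[0]'s custom processing pipeline elements
      ]
   },{
      "templateName":"part2",
      "extractType":"Custom",
      "processingPipeline":[{
         "custom_datastoreQuery" : {
            "customTable" : "$$part1_ID$$" //read jobs[0]'s output
         },
         "display" : ""
      }
      //rest of jobs[1]'s pipeline elements
      ]
   }],
   "source":{
      "processingPipeline":[{
         "file" : {
            "type" : "json",
            "url" : "inf://custom/$$part2_ID$$/" //read jobs[1]'s output
         }
      }
      //rest of the source's pipeline elements
      ]
   }
}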

 

 

Test in event job creation widget

 

If you close and reopen the event job creation widget, your newly submitted template should show up.  You can use the advanced settings to submit only that template in order to test whether it works.  Once the sources are submitting correctly, validate that the map reduce jobs actually run and that the source job creates documents correctly.  You can make changes to your JSON file at any time and upload over your previous version; all jobs submitted in the future will use the new version, but old jobs will have to be deleted manually.
