Plugin Manager Interface

The Plugin Manager is displayed when you click on the Source editor in the Manager interface.

Description:

Use the Plugin Manager to upload new and updated MapReduce plugins and saved queries to the system, and to share them across the different communities.

 

The following fields and controls are available:

Actions dropdown
    Upload, Copy, or Edit.

Title
    Title of the MapReduce plugin or query.

Next scheduled time
    Time at which the MapReduce plugin is scheduled to run.
    The time after which you want the job to run, in long form (milliseconds since the Java epoch, 01 Jan 1970). For example, to run the job as soon as possible, submit 0; to run it any time after January 1, 2015, submit 1420106400000.
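
For reference, a minimal sketch of computing the long-form time in Java (the class name is invented for this example; note that a midnight-UTC calculation gives 1420070400000, while the example value on this page corresponds to a different timezone offset for the same date):

    import java.time.Instant;

    public class ScheduleTime {
        public static void main(String[] args) {
            long asap = 0L; // submit 0 to run the job as soon as possible
            // Milliseconds since the epoch for midnight UTC, 1 Jan 2015:
            long jan2015 = Instant.parse("2015-01-01T00:00:00Z").toEpochMilli();
            System.out.println(jan2015); // prints 1420070400000
        }
    }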

 
Frequency
    Frequency at which the MapReduce job runs against the data sets in the communities.
    How often the job should be run: NONE, HOURLY, DAILY, WEEKLY, or MONTHLY. Any value other than NONE causes the job to be resubmitted after it runs; use NONE if you only want the job to run once.

 
Input collection
    The MongoDB collection you want to use as input. Submit DOC_METADATA to get the document metadata, DOC_CONTENT to get the document contents, or use a previous MapReduce job's results table in your communities by submitting its id or title (you must be a member of that community).
    From March 2014 this can also be "filesystem", which can read files directly from HDFS. This is discussed further under Advanced Topics in the Hadoop Plugin Guide.

true/false
    TODO: needs docs.

Query
    Field used to specify the query when you are using the Plugin Manager to save a query rather than to schedule a MapReduce plugin job.
    A query must use one of the following formats:
      • A MongoDB query (using the document data model or content format). Only indexed fields should be used in the query; this is discussed further in the Hadoop Plugin Guide.
      • (From March 2014) A Community Edition query JSON object. Note that this will usually be slower than an indexed MongoDB query, so it should be used with care.
    Note that MongoDB uses some JSON extensions that must be used in queries from the command line (see the example after this entry):
      • When querying the ObjectId type (e.g. "_id"), it should be queried as the object { "$oid": "<object id string>" }
      • When querying a Date type, it should be queried as the object { "$date": <date in Java time, i.e. milliseconds since 01 Jan 1970> }
      • The full list is here: http://docs.mongodb.org/manual/reference/mongodb-extended-json/
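
For example, a query combining both extensions might look like the following (a sketch only; the ObjectId value, field names, and date are illustrative, not taken from a real data set):

    {
        "_id": { "$oid": "4e1e429ae478f2c2f6d93acc" },
        "created": { "$gte": { "$date": 1420070400000 } }
    }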
 
Communities
    Communities to which the plugin or query will be applied.

Mapper Class
    The Java classpath to the job's mapper. It should be in the form package.file$class (see the example after "Reducer Class" below).
    Note: You can temporarily remove a combiner or reducer by putting "#" in front of it. Only the mapper is mandatory (the others can be set to "none"), though normally at least the mapper and reducer are set.

Combiner Class
    The Java classpath to the job's combiner. It should be in the form package.file$class (use the reducer if you have not written a combiner, or submit null). If none is present, then only the mapper (or combiner) is run, and records with duplicate keys will overwrite each other in an arbitrary order.

Reducer Class
    The Java classpath to the job's reducer. It should be in the form package.file$class.
    Note: You can temporarily remove a combiner or reducer by putting "#" in front of it. Only the mapper is mandatory (the others can be set to "none"), though normally at least the mapper and reducer are set.
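
To illustrate the package.file$class form, here is a hypothetical plugin layout (the package and class names are invented for this example; the Hadoop base classes are indicated in comments only):

    package com.example.plugins;                 // the "package" part

    public class WordCount {                     // the "file" (enclosing class) part
        public static class WCMapper { }         // would extend org.apache.hadoop.mapreduce.Mapper
        public static class WCCombiner { }       // optional
        public static class WCReducer { }        // would extend org.apache.hadoop.mapreduce.Reducer
    }

With this layout, the Mapper Class field would be com.example.plugins.WordCount$WCMapper, and the Reducer Class field would be com.example.plugins.WordCount$WCReducer.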

 
Output Key Class
    The classpath for the MapReduce output format key, usually org.apache.hadoop.io.Text.

Output Value Class
    The classpath for the MapReduce output format value, usually org.apache.hadoop.io.IntWritable.

Append Results
    If set to yes, appends the results of the job to the HDFS output file. If you set append results to false, there is no need to set an age out.

Job dependencies
    Lists the jobs that the scheduled job depends on. If you don't want your job to depend on another job's completion, do not select any job dependencies (you can CTRL-click to deselect options if necessary).

JAR file
    Upload a JAR file from the local machine/network file system.

User Arguments
    The "user arguments" field can be any string; it is interpreted by the code in the Hadoop JAR. For custom plugin developers: see the tutorial (under Advanced Topics) for a description of how to incorporate user arguments in the code. Since the user arguments will normally be JSON or JavaScript (see the example below), a "Check" button is provided that will validate either of those two formats.
    In particular, the built-in "HadoopJavascriptTemplate" template job uses the "user arguments" to hold the JavaScript code that is executed in Hadoop.
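
As a sketch only (the argument names here are invented; each plugin defines and parses its own arguments), a JSON user argument string might look like:

    {
        "fieldToCount": "entities.type",
        "maxRecords": 1000
    }

The plugin's mapper/reducer code is responsible for parsing this string and acting on it.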

 
Submit
    Submit will save the task (or update it if it already exists). If the frequency is not "Never" and the "Next Scheduled Time" is now or in the past, then the job is immediately scheduled. The page refreshes immediately and progress can be monitored as described under "Following a job's progress" below.

QuickRun
    "QuickRun" sets the frequency to "Once Only" and the time to "ASAP" (as soon as possible), and then does two things:
      • Submits as above.
      • Waits for the job to complete before refreshing the page (all of the "action" buttons are disabled in the meantime). You cannot follow the job's progress (see below) while it waits, so this is best used on smaller jobs.
 
Check
    Button that can be used to check the validity of JSON pasted into the query field.

Add Options
    Button that adds the most typical set of options to the query field.

Save and Debug
    When running in typical cluster mode, this button enables you to run JARs locally, for testing purposes only. Logs are collected and output.

Export to HDFS
    When set to yes, enables you to back up the platform data.
    If HDFS is not installed, this data dump goes to ~tomcat/completed/<communityid>_/<jobtitle>.

Related User Documentation:

Plugin Manager