The Plugin Manager is displayed when clicking on Source editor from the Manager interface.

 

 

Description:

Use the Plugin Manager as a simple web-based user interface for uploading new and updated MapReduce plugins and saved queries to the system, and for sharing them across the different communities.

 

The "query" field has a few noteworthy points:

  • Some additional control fields can be supplied, as described here.
    • The most common control fields can be added with their default values by pressing the "Add Options" button.
  • The query must be one of: a MongoDB query (using the document data model or content format), or an Infinit.e query JSON object.
    The fields are as follows:

    Upload New Plugin

    Actions dropdown with the choices Upload, Copy or Edit.

    Title

    Title of the MapReduce plugin or query.

    Next scheduled time

    Time when the MapReduce plugin will be scheduled to run.

    The time after which you want the job to run, as a Java long (milliseconds since the epoch). For example, submit 0 to run as soon as possible, or 1420106400000 to run after January 1, 2015.
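    As a sketch (Python is used purely for illustration; the field itself just takes the raw number), the scheduling value for a calendar date can be computed from the Unix epoch:

    ```python
    from datetime import datetime, timezone

    def java_time_millis(dt: datetime) -> int:
        """Convert a timezone-aware datetime to Java-style milliseconds since the epoch."""
        return int(dt.timestamp() * 1000)

    # Run as soon as possible:
    asap = 0

    # Run any time after midnight UTC on January 1, 2015. (The example value
    # above, 1420106400000, corresponds to 10:00 UTC on the same date.)
    after = java_time_millis(datetime(2015, 1, 1, tzinfo=timezone.utc))
    ```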

     
    Frequency

    Frequency at which the MapReduce job will run against the data sets in the communities.

    How often the job should be run: one of NONE, HOURLY, DAILY, WEEKLY, MONTHLY. Any value other than NONE causes the job to be resubmitted after it runs; use NONE if you only want the job to run once.
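    As a hedged sketch of the resubmission behaviour just described (the mapping and function below are illustrative, not the platform's actual scheduler code):

    ```python
    from datetime import timedelta

    # Hypothetical mapping from the frequency values listed above to a
    # resubmission interval; NONE means the job is not resubmitted.
    RESUBMIT_INTERVAL = {
        "NONE": None,
        "HOURLY": timedelta(hours=1),
        "DAILY": timedelta(days=1),
        "WEEKLY": timedelta(weeks=1),
        "MONTHLY": timedelta(days=30),  # calendar months vary; 30 days as a stand-in
    }

    def next_run_millis(last_run_millis: int, frequency: str):
        """Return the next scheduled time in Java millis, or None for one-shot jobs."""
        interval = RESUBMIT_INTERVAL[frequency]
        if interval is None:
            return None
        return last_run_millis + int(interval.total_seconds() * 1000)
    ```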

     
    Input collection

    The Mongo collection you want to use as input. You can submit DOC_METADATA to get the document metadata, DOC_CONTENT to get the document contents, or grab a previous MapReduce job's results table in your communities by submitting its id or title (you must be a member of that community).

    From March 2014 this can also be "filesystem", which can read files directly from HDFS. This is discussed further under Advanced Topics in the Hadoop Plugin Guide.

     
    true/false

    TODO: needs docs.
    Query

    Field used to specify the query when you are using the Plugin Manager to specify a query rather than scheduling a MapReduce plugin job.

    A query must use one of the following formats:

    • A MongoDB query (using the document data model, or content format). Only indexed fields should be used in the query; this is discussed further in the Hadoop Plugin Guide.
    • (From March 2014) An Infinit.e Community Edition query JSON object (note this will usually be slower than indexed MongoDB queries, so should be used with care).

    Press the "Check" button next to the "Query" field to validate the query JSON.

    (From March 2014) You can paste a saved workspace link into the query field (eg from the GUI) instead of typing out the JSON to generate an Infinit.e-style query. You can include JSON below that to add query qualifiers, as described here.
    Info

    Note that MongoDB uses some JSON extensions that must be used in queries from the command line:

    • When querying the ObjectId type (eg "_id"), it should be queried as the object { "$oid": "<object id string>" }
    • When querying a Date type, it should be queried as the object { "$date": <date in Java time, ie milliseconds since 01 Jan 1970> }
    • The full list is here: http://docs.mongodb.org/manual/reference/mongodb-extended-json/
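    A minimal sketch of building such an extended-JSON query (the field names "_id" and "created", and the ObjectId string, are hypothetical examples, not values from this guide):

    ```python
    import json

    # Extended-JSON query of the kind the "Query" field accepts:
    # an ObjectId match plus a date range in Java-time milliseconds.
    query = {
        "_id": {"$oid": "52f43a29e4b08ab36c31f3c4"},
        "created": {"$gte": {"$date": 1420070400000}},
    }

    # The query field takes the JSON text form:
    query_json = json.dumps(query)
    ```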
     
    Communities

    Communities to which the plugin or query will be applied.
    Mapper Class

    The Java classpath to the job's mapper; it should be in the form package.file$class.

    Note: You can temporarily remove a combiner or reducer by putting "#" in front of it. Only the mapper is mandatory (others can be set to "none"), though normally at least the mapper and reducer are set.
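    As an illustration of the package.file$class form (the helper and the class name below are made up for the example; the Plugin Manager does its own validation):

    ```python
    import re

    # Structural check for a "package.file$class" style classpath:
    # dot-separated Java identifiers, optionally ending in $InnerClass.
    CLASSPATH_RE = re.compile(
        r"^[A-Za-z_$][\w$]*(\.[A-Za-z_$][\w$]*)*(\$[A-Za-z_$][\w$]*)?$"
    )

    def looks_like_mapper_classpath(s: str) -> bool:
        """Cheap structural check that a string matches package.file$class."""
        return CLASSPATH_RE.match(s) is not None

    print(looks_like_mapper_classpath("com.example.MyPlugin$MyMapper"))  # True
    ```

    A leading "#" (the temporary-removal convention described above) would make the check fail, since the string no longer starts with a Java identifier.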

     
    Combiner Class

    The Java classpath to the job's combiner; it should be in the form package.file$class (use the reducer if you have not written a combiner, or submit null). If not present, then only the mapper (or combiner) is run, and records with duplicate keys will overwrite each other in an arbitrary order.
    Reducer Class

    The Java classpath to the job's reducer; it should be in the form package.file$class.

    Note: You can temporarily remove a combiner or reducer by putting "#" in front of it. Only the mapper is mandatory (others can be set to "none"), though normally at least the mapper and reducer are set.

     
    Output Key Class

    The classpath for the MapReduce output format key, usually org.apache.hadoop.io.Text.

    Output Value Class

    The classpath for the MapReduce output format value, usually org.apache.hadoop.io.IntWritable.
    Append Results

    If you set append results to false, there is no need to set an age out.

    Job dependencies

    If you don't want your job to depend on another job's completion, do not select any job dependencies (you can CTRL-click to remove selected options if necessary).
    JAR file

    Upload a JAR file from the local machine/network file system.
    User arguments

    The "user arguments" field can be any string; it is interpreted by the code in the Hadoop JAR. For custom plugin developers: see this tutorial for a description of how to incorporate user arguments in the code (under Advanced Topics). Since the user arguments will normally be JSON or JavaScript (see info box below), a "Check" button has been provided that will validate either of those 2 formats.
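    As a sketch of the consuming side (the helper below is hypothetical; how the string actually reaches the plugin depends on the Hadoop JAR's own code), a plugin might try the JSON interpretation first and fall back to the raw string:

    ```python
    import json

    def parse_user_arguments(raw: str):
        """Interpret the free-form "user arguments" string.

        Tries JSON first (the common case noted above) and falls back to
        returning the raw string unchanged.
        """
        try:
            return json.loads(raw)
        except ValueError:
            return raw

    print(parse_user_arguments('{"threshold": 5}'))  # {'threshold': 5}
    print(parse_user_arguments("plain text args"))   # plain text args
    ```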

    Info

    In particular, the built-in "HadoopJavascriptTemplate" template job uses the "user arguments" to hold the JavaScript code that gets executed in Hadoop.

     
    Submit

    Submit will save the task (or update it if it already exists). If the frequency is not "Never" and the "Next Scheduled Time" is now or in the past, then the job is immediately scheduled. The page refreshes immediately (unlike "QuickRun" below) and the progress can be monitored as described under "Following a job's progress" below.
    QuickRun

    "QuickRun" will set the frequency to "Once Only" and the time to "ASAP" (as soon as possible) and then will do 2 things:

    • Submit as above.
    • Wait for the job to complete before refreshing the page (all the "action" buttons are disabled in the meantime). You cannot monitor progress (see below) while it runs, so this is best used on smaller jobs.
     
    Check

    Button that can be used to check the validity of JSON pasted into the query field.

    Add Options

    Button that is used to add the most typical options to the query field.

    Save and Debug

    When running in typical cluster mode, this button enables you to run JARs locally, for testing purposes only. Logs are collected and output.
    Export to HDFS

    When set to yes, enables you to back up the platform data.

    If HDFS is not installed, this data dump goes to ~tomcat/completed/<communityid>_/<jobtitle>.
    Append Results

    If set to yes, appends the results of the job to the HDFS output file.

    Job dependencies

    Lists the job dependencies for the scheduled job.

     

     


     

    Related

    Procedural User Documentation: Plugin Manager
     


     

     
