Plugin Manager Interface
The Plugin Manager is displayed when you click Source editor in the Manager interface.
Description:
Use the Plugin Manager as a user interface for uploading new and updated MapReduce plugins and saved queries to the system, and for sharing them across the different communities.
Field | Description | Notes |
---|---|---|
Actions dropdown | Upload, Copy or Edit. | |
Title | Title of the MapReduce plugin or query. | |
Next scheduled time | The earliest time at which the MapReduce plugin should run, expressed as a long integer in milliseconds since the Unix epoch. For example, submit 0 to run the job as soon as possible, or 1420106400000 to run it after January 1, 2015 (see the first example following this table). | |
Frequency | How often the MapReduce job should run against the data sets in the selected communities: NONE, HOURLY, DAILY, WEEKLY, or MONTHLY. Any value other than NONE causes the job to be resubmitted after each run; use NONE if you only want the job to run once. | |
Input collection | The Mongo collection you want to use as input. Submit DOC_METADATA for document metadata, DOC_CONTENT for document contents, or the id or title of a previous MapReduce job's results table in your communities (you must be a member of that community). From March 2014 this can also be "filesystem", which reads files directly from HDFS; this is discussed further under Advanced Topics in the Hadoop Plugin Guide. | |
true/false | TODO needs docs. | |
Query | The query to apply when you are using the Plugin Manager to save a query rather than to schedule a MapReduce plugin job. Queries use MongoDB query syntax; note that MongoDB uses some JSON extensions, such as $oid and $date, that must be used in queries from the command line (see the query example following this table). | |
Communities | Communities to which the plugin or query will be applied. | |
Mapper Class | The Java classpath to the job's mapper, in the form package.file$class (see the mapper example following this table). Note: You can temporarily remove a combiner or reducer by putting "#" in front of it. Only the mapper is mandatory (the others can be set to "none"), though normally at least the mapper and reducer are set. | |
Combiner Class | The Java classpath to the job's combiner, in the form package.file$class (use the reducer if you have not written a combiner, or submit null). If no reducer is present, then only the mapper (or combiner) is run, and records with duplicate keys will overwrite each other in an arbitrary order. | |
Reducer Class | The Java classpath to the job's reducer, in the form package.file$class. Note: You can temporarily remove a combiner or reducer by putting "#" in front of it. Only the mapper is mandatory (the others can be set to "none"), though normally at least the mapper and reducer are set. | |
Output Key Class | The classpath for the MapReduce output key type, usually org.apache.hadoop.io.Text. | |
Output Value Class | The classpath for the MapReduce output value type, usually org.apache.hadoop.io.IntWritable. | |
Append Results | If Append Results is set to false, there is no need to set an age-out. | |
Job dependencies | If you do not want your job to depend on another job's completion, do not select any job dependencies (CTRL-click to deselect options if necessary). | |
JAR file | Upload a JAR file from the local machine or a network file system. | |
User Arguments | The "user arguments" field can be any string; it is interpreted by the code in the Hadoop JAR. For custom plugin developers, see the tutorial (under Advanced Topics) for a description of how to incorporate user arguments in the code. Since the user arguments will normally be JSON or JavaScript (see the info box below), a "Check" button is provided that validates either of those two formats. In particular, the built-in "HadoopJavascriptTemplate" template job uses the user arguments to hold the JavaScript code that is executed in Hadoop. | |
Submit | Saves the task (or updates it if it already exists). If the frequency is not "Never" and the "Next Scheduled Time" is now or in the past, the job is scheduled immediately. The page refreshes immediately, and progress can be monitored as described under "Following a job's progress" below. | |
QuickRun | Sets the frequency to "Once Only" and the time to "ASAP" (as soon as possible), then saves the task and runs it immediately. | |
Check | Button that can be used to check the validity of JSON pasted into the query field. | |
Add Options | Button that adds the most common set of options to the query field. | |
Save and Debug | When the platform is running in normal cluster mode, this button lets you run the JAR locally, for testing purposes only. Logs are collected and output. | |
Export to HDFS | When set to yes, enables you to back up the platform data. If HDFS is not installed, the data dump goes to ~tomcat/completed/<communityid>_/<jobtitle>. | |
Append Results | If set to yes, appends the results of the job to the HDFS output file. | |
Job dependencies | Lists JAR dependencies for the scheduled job. | |
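Examples:

The "Next scheduled time" field takes a long value in milliseconds since the Unix epoch. A minimal sketch of computing that value in Java (the class name here is illustrative, not part of the platform):

```java
import java.time.ZoneOffset;
import java.time.ZonedDateTime;

public class ScheduledTimeExample {
    public static void main(String[] args) {
        // Run as soon as possible: submit 0.
        long asap = 0L;

        // Run after midnight UTC on January 1, 2015.
        long afterNewYear2015 = ZonedDateTime
                .of(2015, 1, 1, 0, 0, 0, 0, ZoneOffset.UTC)
                .toInstant()
                .toEpochMilli();

        System.out.println(asap);             // 0
        System.out.println(afterNewYear2015); // 1420070400000
    }
}
```

(The value 1420106400000 in the table above corresponds to the same date in a non-UTC time zone.)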
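Queries in the "Query" field use MongoDB query syntax. Below is a minimal sketch of a query using MongoDB's JSON extensions ($oid for ObjectIds, $date for dates); the field names and id value are illustrative only, not part of the platform schema:

```json
{
    "communityId": { "$oid": "4c927585d591d31d7b37097a" },
    "created": { "$gte": { "$date": "2015-01-01T00:00:00Z" } }
}
```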
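The Mapper, Combiner, and Reducer Class fields use the package.file$class form because Hadoop job classes are typically nested static classes inside the plugin's main file. As a hypothetical sketch (package and class names are illustrative, not part of the platform): for the file WordCountPlugin.java below, the Mapper Class entry would be com.example.plugins.WordCountPlugin$TokenMapper, with output key/value classes org.apache.hadoop.io.Text and org.apache.hadoop.io.IntWritable.

```java
package com.example.plugins;

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordCountPlugin {

    // Classpath entry for the Plugin Manager: com.example.plugins.WordCountPlugin$TokenMapper
    public static class TokenMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            // Emit (token, 1) for each whitespace-separated token in the input value.
            for (String token : value.toString().split("\\s+")) {
                if (!token.isEmpty()) {
                    word.set(token);
                    context.write(word, ONE);
                }
            }
        }
    }
}
```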