The Plugin Manager is displayed when clicking on Source editor from the Manager interface.

 

 

Description:

Use the Plugin Manager as a simple web-based user interface for uploading new and updated MapReduce plugins and saved queries to the system, and for sharing them across the different communities.

 

The "query" field has a few noteworthy points:

  • Some additional control fields can be supplied, as described here.
    • The most common control fields can be added with their default values by pressing the "Add Options" button.
  • The query must be one of: a MongoDB query (using the document data model or content format), or an Infinit.e query JSON object.
    The fields are as follows:

    Upload New Plugin

    Actions dropdown with the choices Upload, Copy or Edit.

    Title

    Title of the MapReduce plugin or query.

    Next scheduled time

    Time when the MapReduce plugin will be scheduled to run.

    The time after which you want the job to run, as a Java long (milliseconds since the epoch). For example, submit 0 to run as soon as possible, or 1420106400000 to run after January 1, 2015.
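    As a sketch (Python is used purely for illustration; the field itself just takes the raw number), the scheduling value for a calendar date can be computed from the Unix epoch:

    ```python
    from datetime import datetime, timezone

    def java_time_millis(dt: datetime) -> int:
        """Convert a timezone-aware datetime to Java-style milliseconds since the epoch."""
        return int(dt.timestamp() * 1000)

    # Run as soon as possible:
    asap = 0

    # Run any time after midnight UTC on January 1, 2015. (The example value
    # above, 1420106400000, corresponds to 10:00 UTC on the same date.)
    after = java_time_millis(datetime(2015, 1, 1, tzinfo=timezone.utc))
    ```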

     
    Frequency

    Frequency at which the MapReduce job will run against the data sets in the communities.

    How often the job should be run: one of NONE, HOURLY, DAILY, WEEKLY, MONTHLY. Any value other than NONE causes the job to be resubmitted after it runs; use NONE if you only want the job to run once.
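    As a hedged sketch of the resubmission behaviour just described (the mapping and function below are illustrative, not the platform's actual scheduler code):

    ```python
    from datetime import timedelta

    # Hypothetical mapping from the frequency values listed above to a
    # resubmission interval; NONE means the job is not resubmitted.
    RESUBMIT_INTERVAL = {
        "NONE": None,
        "HOURLY": timedelta(hours=1),
        "DAILY": timedelta(days=1),
        "WEEKLY": timedelta(weeks=1),
        "MONTHLY": timedelta(days=30),  # calendar months vary; 30 days as a stand-in
    }

    def next_run_millis(last_run_millis: int, frequency: str):
        """Return the next scheduled time in Java millis, or None for one-shot jobs."""
        interval = RESUBMIT_INTERVAL[frequency]
        if interval is None:
            return None
        return last_run_millis + int(interval.total_seconds() * 1000)
    ```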

     
    Input collection

    The Mongo collection you want to use as input. You can submit DOC_METADATA to get the document metadata, DOC_CONTENT to get the document contents, or grab a previous MapReduce job's results table in your communities by submitting its id or title (you must be a member of that community).

    From March 2014 this can also be "filesystem", which can read files directly from HDFS. This is discussed further under Advanced Topics in the Hadoop Plugin Guide.

     
    true/false

    TODO: needs docs.
    Query

    Field used to specify the query when you are using the Plugin Manager to specify a query rather than scheduling a MapReduce plugin job.

    A query must use one of the following formats:

    • A MongoDB query (using the document data model, or content format). Only indexed fields should be used in the query; this is discussed further in the Hadoop Plugin Guide.
    • (From March 2014) An Infinit.e Community Edition query JSON object (note this will usually be slower than indexed MongoDB queries, so should be used with care).

    Press the "Check" button next to the "Query" field to validate the query JSON.

    (From March 2014) You can paste a saved workspace link into the query field (eg from the GUI) instead of typing out the JSON to generate an Infinit.e-style query. You can include JSON below that to add query qualifiers, as described here.
    Info

    Note that MongoDB uses some JSON extensions that must be used in queries from the command line:

    • When querying the ObjectId type (eg "_id"), it should be queried as the object { "$oid": "<object id string>" }
    • When querying a Date type, it should be queried as the object { "$date": <date in Java time, ie milliseconds since 01 Jan 1970> }
    • The full list is here: http://docs.mongodb.org/manual/reference/mongodb-extended-json/
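    A minimal sketch of building such an extended-JSON query (the field names "_id" and "created", and the ObjectId string, are hypothetical examples, not values from this guide):

    ```python
    import json

    # Extended-JSON query of the kind the "Query" field accepts:
    # an ObjectId match plus a date range in Java-time milliseconds.
    query = {
        "_id": {"$oid": "52f43a29e4b08ab36c31f3c4"},
        "created": {"$gte": {"$date": 1420070400000}},
    }

    # The query field takes the JSON text form:
    query_json = json.dumps(query)
    ```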
     
    Communities

    Communities to which the plugin or query will be applied.
    Mapper Class

    The Java classpath to the job's mapper; it should be in the form package.file$class.

    Note: You can temporarily remove a combiner or reducer by putting "#" in front of it. Only the mapper is mandatory (others can be set to "none"), though normally at least the mapper and reducer are set.
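    As an illustration of the package.file$class form (the helper and the class name below are made up for the example; the Plugin Manager does its own validation):

    ```python
    import re

    # Structural check for a "package.file$class" style classpath:
    # dot-separated Java identifiers, optionally ending in $InnerClass.
    CLASSPATH_RE = re.compile(
        r"^[A-Za-z_$][\w$]*(\.[A-Za-z_$][\w$]*)*(\$[A-Za-z_$][\w$]*)?$"
    )

    def looks_like_mapper_classpath(s: str) -> bool:
        """Cheap structural check that a string matches package.file$class."""
        return CLASSPATH_RE.match(s) is not None

    print(looks_like_mapper_classpath("com.example.MyPlugin$MyMapper"))  # True
    ```

    A leading "#" (the temporary-removal convention described above) would make the check fail, since the string no longer starts with a Java identifier.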

     
    Combiner Class

    The Java classpath to the job's combiner; it should be in the form package.file$class (use the reducer if you have not written a combiner, or submit null). If not present, then only the mapper (or combiner) is run, and records with duplicate keys will overwrite each other in an arbitrary order.
    Reducer Class

    The Java classpath to the job's reducer; it should be in the form package.file$class.

    Note: You can temporarily remove a combiner or reducer by putting "#" in front of it. Only the mapper is mandatory (others can be set to "none"), though normally at least the mapper and reducer are set.

     
    Output Key Class

    The classpath for the MapReduce output format key, usually org.apache.hadoop.io.Text.

    Output Value Class

    The classpath for the MapReduce output format value, usually org.apache.hadoop.io.IntWritable.
    Append Results

    If you set append results to false, there is no need to set an age out.

    Job dependencies

    If you don't want your job to depend on another job's completion, do not select any job dependencies (you can CTRL-click to remove selected options if necessary).
    JAR file

    Upload a JAR file from the local machine/network file system.
    User arguments

    The "user arguments" field can be any string; it is interpreted by the code in the Hadoop JAR. For custom plugin developers: see this tutorial for a description of how to incorporate user arguments in the code (under Advanced Topics). Since the user arguments will normally be JSON or JavaScript (see info box below), a "Check" button has been provided that will validate either of those 2 formats.
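    As a sketch of the consuming side (the helper below is hypothetical; how the string actually reaches the plugin depends on the Hadoop JAR's own code), a plugin might try the JSON interpretation first and fall back to the raw string:

    ```python
    import json

    def parse_user_arguments(raw: str):
        """Interpret the free-form "user arguments" string.

        Tries JSON first (the common case noted above) and falls back to
        returning the raw string unchanged.
        """
        try:
            return json.loads(raw)
        except ValueError:
            return raw

    print(parse_user_arguments('{"threshold": 5}'))  # {'threshold': 5}
    print(parse_user_arguments("plain text args"))   # plain text args
    ```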

    Info

    In particular, the built-in "HadoopJavascriptTemplate" template job uses the "user arguments" to hold the JavaScript code that gets executed in Hadoop.

     
    Submit

    Submit will save the task (or update it if it already exists). If the frequency is not "Never" and the "Next Scheduled Time" is now or in the past, then the job is immediately scheduled. The page refreshes immediately (unlike "QuickRun" below) and the progress can be monitored as described under "Following a job's progress" below.
    QuickRun

    "QuickRun" will set the frequency to "Once Only" and the time to "ASAP" (as soon as possible) and then will do 2 things:

    • Submit as above.
    • Wait for the job to complete before refreshing the page (all the "action" buttons are disabled in the meantime). You cannot monitor progress (see below) while it runs, so this is best used on smaller jobs.
     
    Check

    Button that can be used to check the validity of JSON pasted into the query field.

    Add Options

    Button that is used to add the most typical options to the query field.

    Save and Debug

    When running in typical cluster mode, this button enables you to run JARs locally, for testing purposes only. Logs are collected and output.
    Export to HDFS

    When set to yes, enables you to back up the platform data.

    If HDFS is not installed, this data dump goes to ~tomcat/completed/<communityid>_/<jobtitle>.
    Append Results

    If set to yes, appends the results of the job to the HDFS output file.

    Job dependencies

    Lists the job dependencies for the scheduled job.

     

     


     

    Related

    Procedural User Documentation: Plugin Manager
     


     

     
