...

This brings up username and password fields and a login button. (If you are already logged in, e.g. to the manager or main GUI, skip to the next section.)

Note that:

  • The plugin manager shares its cookie with the main GUI, the file uploader, the source builder, and the person manager - logging into any one of them logs you into all of them.

Scheduling a new Map Reduce Plugin

(Figure: the plugin manager tool shortly after log-in.)

...

The "query" field has a few noteworthy points:

  • It can be provided in 3 formats (see the example after this list):
    • A single JSON object containing a MongoDB query (see the final bullet), e.g. "{ 'docGeo': { '$exists': false } }"
    • An array of 2 JSON objects, the first of which is the above query object, the second of which controls the sorting and size of the output (see this tutorial for the format, under advanced topics, or the schedule/update API calls).
    • An array of 3 JSON objects (the second can be null), where the third object is a list of fields to be returned, in standard MongoDB "projection" format, e.g. "{ <fieldname>: [1|0] (, <fieldname>: [1|0])* }". Some additional control fields can be supplied, as described here.
      • The most common control fields can be added with their default values by pressing the "Add Options" button.
  • The aforementioned query must be a MongoDB query (use the /wiki/spaces/INF/pages/3899780, or content format); Infinit.e queries are not currently supported (this functionality is coming).
  • Press the "Check" button next to the "Query" field to validate the query JSON.
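For illustration, a query in the 3-object format might look like the following sketch (the projected field names are just examples, and the middle control object is left null here since its fields are described in the tutorial mentioned above):

    [
      { "docGeo": { "$exists": false } },
      null,
      { "title": 1, "sourceKey": 1 }
    ]

The first object selects documents that have no "docGeo" field; the third object returns only the "title" and "sourceKey" fields of each matching document.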

Note: If you set "append results" to false, there is no need to set an age-out.

...

Note: the "user arguments" field can be any string, it is interpreted by the code in the Hadoop JAR. For custom plugin developers: see this tutorial for a description of how to incorporate user arguments in the code (under advanced topics). Since the user arguments will normally be JSON or javascript (see info box below), a "Check" button has been provided that will validate either of those 2 formats.

Info

In particular, the built-in "HadoopJavascriptTemplate" template job uses the "user arguments" field to hold the javascript code that gets executed in Hadoop.
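As a flavour of what that looks like, the javascript placed in "user arguments" might resemble the sketch below. This is purely illustrative: the exact function names and signatures the template expects are described in the tutorial, and the document field used here ("sourceKey") is just an example:

    // Illustrative only - see the tutorial for the template's actual interface
    function map(key, val) {
        // e.g. count documents per source
        emit(val.sourceKey, 1);
    }
    function reduce(key, vals) {
        var total = 0;
        for (var i = 0; i < vals.length; i++) total += vals[i];
        emit(key, total);
    }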

Note: You can temporarily remove a combiner or reducer by putting "#" in front of it. Only the mapper is mandatory (the others can be set to "none"), though normally at least the mapper and reducer are set.
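For instance, with hypothetical class names, the three fields might be filled in as follows:

    Mapper:   com.example.MyMapper        (mandatory)
    Combiner: none                        (no combiner used)
    Reducer:  #com.example.MyReducer      ("#" temporarily disables the reducer)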

Submit vs Quickrun

There are 2 options for submitting a job:

  • Submit (button to the right of "Title")
  • QuickRun (button to the right of "Frequency")

Submit will save the task (or update it if it already exists). If the frequency is not "Never" and the "Next Scheduled Time" is now or in the past, then the job is immediately scheduled. The page refreshes immediately (unlike "QuickRun" below) and progress can be monitored as described under "Following a job's progress" below.

"QuickRun" will set the frequency to "Once Only" and the time to "ASAP" (as soon as possible) and then will do 2 things:

  • Submit the task as above
  • Wait for the job to complete before refreshing the page (all the "action" buttons are disabled in the meantime). Progress (see below) cannot be monitored while it runs, so this is best used on smaller jobs.

You can also debug tasks; this is described in the next section.

Debugging new and existing tasks

Just above "user arguments" there is a "Save and debug" button. This is very similar to pressing the "QuickRun" button described above, except:

  • It will only run on the number of records specified in the text box next to the button.
  • It will always run the Hadoop JAR in "local mode" (i.e. it won't be distributed to the Hadoop cluster, if one exists)
  • Any log messages output by the Hadoop JAR (or by the javascript if running the prototype engine) are collected and added to the status message
    • (Note that "QuickRun" will also log error messages when running in local mode; nothing is currently logged in the typical cluster mode though, so the debug mode is necessary in that case - the alternative is running and testing in Eclipse as described here, which is quite involved)

Scheduling a new Saved Query

...

Once a file has been chosen, it can be modified by changing the fields (and/or choosing a different file), and then selecting the "Submit" button.

Copying an existing custom task

Select the task to be copied from the top drop-down menu, then select "Copy Current Plugin". You must change the title; edit other fields as required.

Deleting files

Log in and choose a file as above, then select the "Delete" button.

...

If you want to start a job right away, you can set the "Next scheduled time" option to "Once Only"; it will then be scheduled as soon as possible and run only once. If you want to run a job at a certain frequency, adjust the frequency option to one of the other settings.

Following a job's progress

Once a job has been scheduled, you can track its progress by using the refresh control next to the run status. The current map and reduce completion status, as well as any errors that occurred while running, are displayed in an informational header.

...