Sample input document

Code Block
 Message-ID: <32220443.1075841552668.JavaMail.evans@thyme>
Date: Mon, 9 Jul 2001 11:33:32 -0700 (PDT)
Subject: RE: Testing Preschedule workspace
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: quoted-printable
X-To: Smith, Will </O=ENRON/OU=NA/CN=RECIPIENTS/CN=Wsmith>
X-Folder: \ExMerge - Semperger, Cara\Deleted Items
X-FileName: cara semperger 6-26-02.PST

I am trying to pull it up now, it's taking a long time

 -----Original Message-----
From: =09Semperger, Cara =20
Sent:=09Monday, July 09, 2001 10:40 AM
To:=09Smith, Will; Atta, Asem
Cc:=09Bentley, Corry; Poston, David
Subject:=09Testing Preschedule workspace

Good Morning,

My target testing date today is June 18th,  I am running in Test P in Local=
 Enpower using actual data from our scheduling sheets re-arranged to meet t=
he new guidelines.

The daily deals I coded X in columns J and N,  the Month long bookouts and =
BOM bookouts I coded R. =20

What worked:

I was able to retrieve my saved workspace with all data intact. I had previ=
ously sucessfully copied and pasted my entire sheet from EXCEL to the PSW.

I was able to run the build route report with the criteria of "Starting On-=
 June 18-PaloVerde-Day of week Mask Activated-Report Changes activated." A =
check of deals actually scheduled vs. build route results showed that all d=
eals were extracted correctly from Enpower. Because I am working on closed =
dates, a cumulative test of this app will not be fully testable until produ=
ction. We are expecting to see the same functionality as the current incarn=
ation of Build route. The data extracted should be read only, time stamped,=
 and when run mulitple times additional data should be shown below previous=
ly extracted data.  The improvement we are expecting to see is the app shou=
ld not duplicate deal strips on dates that have no physical power flow. (We=
st Light Load currently does this in Start view, but not Active view)

Navigating around the scheduling sheet itself I was able to accurately exec=
ute the sort function on a single criteria at a time. Multiple sorting will=
 contunue to be done in excel, or we can do a series of single sorts in the=
 PSW to acheive the same result.

Routing deals: Will had deleted all routes for June 18th, starting me with =
a clean slate.  I made every path be for DAY. I was unable to confirm total=
 unrouted MWH, as the real time position manager does not seem to be functi=
oning in TESTP. The routing appeared to take 19 minutes with the status bar=
 showing steady progress during that time. This time is 15-17 minutes longe=
r than current speed using the Excel Macro system we have now. The error li=
st gave me a row by row description of what did not route, a very useful to=
ol.  OK was visible on all rows that the PSW believed that it had routed. I=
 had difficulty checking the routing results, as I kept getting BDE errors =
in Scheduling after routing had occurred (Local Enpower). Scheduling kept s=
tarting up in 1899.  I was unable to login to TestP through Terminal server=
 2, but was able to in Terminal Server 5. The results there were very encou=
raging! Most routing was done, and a spot check of deals shows that they we=
re routed properly. The deals that were not routed appear to be due to a us=
er error of deal number duplication in the sheet. This is consistent with w=
hat I would expect. I will further evaluate routing ability with our more c=
omplicated paths later. This routing was very easy, a large point with on p=
eak non shaped deals only.

Things I did not expect that I liked:

When I highlight a group of cells in Build Route, it stays highlighted when=
 I move up to the scheduling sheet to highlight a comparison group of cells=
.  This is very handy for double checking Build route against the scheduler=
's sheet.

What does not appear to be working at this time:

The physical or not physical flag of path does not seem to be showing up pr=
operly in routing.

Path Confirmation:  The running time appeared to be over one hour for one s=
heet, only 70 rows of the sheet being flagged for insertion into confirmati=
on. This current speed will not be sufficient to work in production. Also, =
many rows that were flagged for confirmation were not imported, and I canno=
t find an error log to help understand why deals were not imported to path =
When the path confirmation task was finished,  the application simply froze=
.  The status bar was no longer visible, leading me to believe that it was =
done, however the app was not able to be saved or closed or examined furthe=

My conclusions:

The build route and routing functions work well enough to use in production=
, the copy-paste function works well for the West desk per our connectivity=

Path Confirmation is not functioning at this point, and appears to be blowi=
ng up the app. No data was visible for June 18th even after the PSW ran thr=
ough its import function.

Please let me know when the issues I have named have been addressed and are=
 ready for further testing.
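
The sample input declares `Content-Transfer-Encoding: quoted-printable`, so sequences such as `=09`, `=20`, and a trailing `=` at the end of a line are encoding markers, not message text. The sketch below shows one way to decode such a body with Python's standard `quopri` module. The helper name `decode_qp` and the shortened sample string are illustrative assumptions; this is not necessarily how the harvester itself decodes mail.

```python
import quopri


def decode_qp(raw: str, charset: str = "us-ascii") -> str:
    """Decode a quoted-printable body into plain text."""
    return quopri.decodestring(raw.encode(charset)).decode(charset)


# Shortened excerpt in the style of the sample input above (assumed for illustration).
sample = (
    "From: =09Semperger, Cara =20\n"
    "I am running in Test P in Local=\n"
    " Enpower using actual data"
)

decoded = decode_qp(sample)
print(decoded)
```

A trailing `=` is a soft line break, so `Local=` followed by ` Enpower` is rejoined as `Local Enpower`, and `=09` becomes a tab.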




Code Block
{
    "description": "All of the Enron emails corpus with TextRank keyword extraction enabled.",
    "isPublic": true,
    "mediaType": "Email",
    "searchCycle_secs": -1,
    "tags": [
        "enron",
        "email",
        "fraud"
    ],
    "title": "All Enron Emails (TextRank)",
    "processingPipeline": [
        {
            "file": {
                "domain": "DOMAIN",
                "password": "PASSWORD",
                "username": "USER",
                "url": "smb://modus:139/enron/enron_mail_20110402/maildir/"
            }
        },
        {
            "harvest": {
                "searchCycle_secs": 1
            }
        },
        {
            "docMetadata": {
                "title": "$SCRIPT( return _doc.metadata._FILE_METADATA_[0].metadata.subject[0];)"
            }
        },
        {
            "text": [
                {
                    "fieldName": "fullText",
                    "script": "(?:\\[.*?\\])",
                    "scriptlang": "regex",
                    "flags": "md",
                    "replacement": " "
                },
                {
                    "fieldName": "description",
                    "script": "(?:\\[.*?\\])",
                    "scriptlang": "regex",
                    "flags": "md",
                    "replacement": " "
                },
                {
                    "fieldName": "fullText",
                    "script": "<.*?>",
                    "scriptlang": "regex",
                    "flags": "md",
                    "replacement": ". "
                },
                {
                    "fieldName": "description",
                    "script": "<.*?>",
                    "scriptlang": "regex",
                    "flags": "md",
                    "replacement": ". "
                },
                {
                    "fieldName": "fullText",
                    "script": "(?:>|<)",
                    "scriptlang": "regex",
                    "flags": "md",
                    "replacement": ". "
                },
                {
                    "fieldName": "description",
                    "script": "(?:>|<)",
                    "scriptlang": "regex",
                    "flags": "md",
                    "replacement": ". "
                },
                {
                    "fieldName": "fullText",
                    "script": "(?:[-]{4,}(.*[-]{4,}|\\n))",
                    "scriptlang": "regex",
                    "replacement": " "
                },
                {
                    "fieldName": "description",
                    "script": "(?:[-]{4,}(.*[-]{4,}|\\n))",
                    "scriptlang": "regex",
                    "replacement": " "
                },
                {
                    "fieldName": "fullText",
                    "script": "(?:\\*{2,})",
                    "scriptlang": "regex",
                    "replacement": " "
                },
                {
                    "fieldName": "description",
                    "script": "(?:\\*{2,})",
                    "scriptlang": "regex",
                    "replacement": " "
                }
            ]
        },
        {
            "contentMetadata": [
                {
                    "fieldName": "email_meta",
                    "script": "var x=_metadata._FILE_METADATA_[0].metadata;x;",
                    "scriptlang": "javascript",
                    "flags": "m"
                }
            ]
        },
        {
            "featureEngine": {
                "engineName": "textrank"
            }
        },
        {
            "entities": [
                {
                    "dimension": "What",
                    "disambiguated_name": "$SCRIPT( return _doc.metadata._FILE_METADATA_[0].metadata.Author[0];)",
                    "type": "Account",
                    "useDocGeo": false
                },
                {
                    "dimension": "What",
                    "disambiguated_name": "",
                    "iterateOver": "email_meta.Message-To",
                    "type": "Account",
                    "useDocGeo": false
                }
            ]
        },
        {
            "associations": [
                {
                    "assoc_type": "Event",
                    "entity1": "$SCRIPT( return _doc.metadata._FILE_METADATA_[0].metadata.Author[0];)",
                    "entity2": "$SCRIPT(return _value;)",
                    "iterateOver": "email_meta.Message-To",
                    "time_start": "$SCRIPT( return _doc.publishedDate;)",
                    "verb": "emailed",
                    "verb_category": "emailed/communicated"
                }
            ]
        }
    ]
}

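The source above cleans `fullText` and `description` with a sequence of regex replacements before keyword extraction runs. The following sketch applies the first four `fullText` rules in order using Python's `re` module, to show what each rule removes. It rests on two assumptions: that the `md` flags mean multi-line plus dot-all matching, and that rules are applied sequentially. The names `RULES` and `cleanse` are hypothetical and are not part of the platform.

```python
import re

# Assumption: the "md" flags correspond to multi-line + dot-all matching.
MD = re.MULTILINE | re.DOTALL

# (pattern, replacement, flags) taken from the first four fullText rules.
RULES = [
    (r"(?:\[.*?\])", " ", MD),                   # bracketed annotations
    (r"<.*?>", ". ", MD),                        # tag-like <...> spans
    (r"(?:>|<)", ". ", MD),                      # stray angle brackets
    (r"(?:[-]{4,}(.*[-]{4,}|\n))", " ", 0),      # "-----Original Message-----" rulers
]


def cleanse(text: str) -> str:
    """Apply each replacement rule in order and return the cleaned text."""
    for pattern, replacement, flags in RULES:
        text = re.sub(pattern, replacement, text, flags=flags)
    return text


sample = "Hi [draft] <b>there</b>\n -----Original Message-----\nBye"
cleaned = cleanse(sample)
print(repr(cleaned))
```

Order matters here: complete `<...>` spans are removed before the rule that replaces any leftover single `<` or `>`.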
Sample output document

Code Block
{
    "_id": "5048efb0e4b01fd6455420ee",
    "title": "RE: Testing Preschedule workspace",
    "url": "smb://modus:139/enron/testing/semperger-c/deleted_items/37QTKE~3",
    "created": "Sep 6, 2012 06:42:01 PM UTC",
    "modified": "Jul 24, 2012 01:13:02 AM UTC",
    "publishedDate": "Jul 9, 2001 06:33:32 PM UTC",
    "source": [
        "Enron Emails (TextRank)"
    ],
    "sourceKey": [
        "modus.139.enron.testing.."
    ],
    "mediaType": [
        "Email"
    ],
    "description": "I am trying to pull it up now, it's taking a long time\r\n\r\n \r\nFrom: \tSmith, Will \r\nSent:\tMonday, July 09, 2001 11:28 AM\r\nTo:\tSemperger, Cara\r\nSubject:\tRE: Testing Preschedule workspace\r\n\r\nYes, but Vish made the changes in Table Edit. : - )\r\n\r\nWill\r\n\r\n \r\nFrom: \tSemperger, Cara \r\nSent:\tMonday, July 09, 2001 1:20 PM\r\nTo:\tSmith, Will\r\nSubject:\tRE: Testing Preschedule workspace\r\n\r\nSo, this table edit that Brett is asking me to test is really from ",
    "entities": [
        {
            "disambiguated_name": "on- june 18-paloverde-day",
            "index": "on- june 18-paloverde-day/keyword",
            "actual_name": "on- june 18-paloverde-day",
            "type": "Keyword",
            "relevance": 0.10585404743253149,
            "frequency": 1,
            "totalfrequency": 12,
            "doccount": 12,
            "dimension": "What"
        },
        {
            "disambiguated_name": "mulitple times additional data",
            "index": "mulitple times additional data/keyword",
            "actual_name": "mulitple times additional data",
            "type": "Keyword",
            "relevance": 0.18088061045762382,
            "frequency": 1,
            "totalfrequency": 12,
            "doccount": 12,
            "dimension": "What"
        },
        {
            "disambiguated_name": "scheduling sheets",
            "index": "scheduling sheets/keyword",
            "actual_name": "scheduling sheets",
            "type": "Keyword",
            "relevance": 0.15086086188384693,
            "frequency": 1,
            "totalfrequency": 20,
            "doccount": 20,
            "dimension": "What"
        },
        {
            "disambiguated_name": "app",
            "index": "app/keyword",
            "actual_name": "app",
            "type": "Keyword",
            "relevance": 0.20415634782171557,
            "frequency": 1,
            "totalfrequency": 58,
            "doccount": 58,
            "dimension": "What"
        },
        {
            "disambiguated_name": "data",
            "index": "data/keyword",
            "actual_name": "data",
            "type": "Keyword",
            "relevance": 0.1361375118885727,
            "frequency": 1,
            "totalfrequency": 3323,
            "doccount": 3323,
            "dimension": "What"
        },
        {
            "disambiguated_name": "paths",
            "index": "paths/keyword",
            "actual_name": "paths",
            "type": "Keyword",
            "relevance": 0.2041916488834702,
            "frequency": 1,
            "totalfrequency": 99,
            "doccount": 99,
            "dimension": "What"
        },
        {
            "disambiguated_name": "build route report",
            "index": "build route report/keyword",
            "actual_name": "build route report",
            "type": "Keyword",
            "relevance": 0.11476307758997932,
            "frequency": 1,
            "totalfrequency": 36,
            "doccount": 36,
            "dimension": "What"
        },
        {
            "disambiguated_name": "testing preschedule workspace cara",
            "index": "testing preschedule workspace cara/keyword",
            "actual_name": "testing preschedule workspace cara",
            "type": "Keyword",
            "relevance": 0.16803833041631702,
            "frequency": 1,
            "totalfrequency": 8,
            "doccount": 8,
            "dimension": "What"
        },
        {
            "disambiguated_name": "physical power flow",
            "index": "physical power flow/keyword",
            "actual_name": "physical power flow",
            "type": "Keyword",
            "relevance": 0.11805512187037151,
            "frequency": 1,
            "totalfrequency": 17,
            "doccount": 17,
            "dimension": "What"
        },
        {
            "disambiguated_name": "i",
            "index": "i/keyword",
            "actual_name": "i",
            "type": "Keyword",
            "relevance": 0.13651904141534263,
            "frequency": 1,
            "totalfrequency": 18162,
            "doccount": 18162,
            "dimension": "What"
        },
        {
            "disambiguated_name": "total running time",
            "index": "total running time/keyword",
            "actual_name": "total running time",
            "type": "Keyword",
            "relevance": 0.11233232851584997,
            "frequency": 1,
            "totalfrequency": 10,
            "doccount": 10,
            "dimension": "What"
        },
        {
            "disambiguated_name": "time",
            "index": "time/keyword",
            "actual_name": "time",
            "type": "Keyword",
            "relevance": 0.34020922533185516,
            "frequency": 1,
            "totalfrequency": 17102,
            "doccount": 17102,
            "dimension": "What"
        },
        {
            "disambiguated_name": "psw",
            "index": "psw/keyword",
            "actual_name": "psw",
            "type": "Keyword",
            "relevance": 0.13625985262266815,
            "frequency": 1,
            "totalfrequency": 46,
            "doccount": 46,
            "dimension": "What"
        },
        {
            "disambiguated_name": "semperger",
            "index": "semperger/keyword",
            "actual_name": "semperger",
            "type": "Keyword",
            "relevance": 0.2724417241053495,
            "frequency": 1,
            "totalfrequency": 226,
            "doccount": 226,
            "dimension": "What"
        },
        {
            "disambiguated_name": "peak non shaped deals",
            "index": "peak non shaped deals/keyword",
            "actual_name": "peak non shaped deals",
            "type": "Keyword",
            "relevance": 0.19127581970645322,
            "frequency": 1,
            "totalfrequency": 12,
            "doccount": 12,
            "dimension": "What"
        },
        {
            "disambiguated_name": "table edit",
            "index": "table edit/keyword",
            "actual_name": "table edit",
            "type": "Keyword",
            "relevance": 0.21207334129182112,
            "frequency": 1,
            "totalfrequency": 32,
            "doccount": 32,
            "dimension": "What"
        },
        {
            "disambiguated_name": "week mask activated-report changes",
            "index": "week mask activated-report changes/keyword",
            "actual_name": "week mask activated-report changes",
            "type": "Keyword",
            "relevance": 0.1484580867667756,
            "frequency": 1,
            "totalfrequency": 12,
            "doccount": 12,
            "dimension": "What"
        },
        {
            "disambiguated_name": "excel macro system",
            "index": "excel macro system/keyword",
            "actual_name": "excel macro system",
            "type": "Keyword",
            "relevance": 0.12208201691477336,
            "frequency": 1,
            "totalfrequency": 12,
            "doccount": 12,
            "dimension": "What"
        },
        {
            "disambiguated_name": "real time position manager",
            "index": "real time position manager/keyword",
            "actual_name": "real time position manager",
            "type": "Keyword",
            "relevance": 0.19213464212989614,
            "frequency": 1,
            "totalfrequency": 39,
            "doccount": 39,
            "dimension": "What"
        },
        {
            "disambiguated_name": "testing preschedule workspace",
            "index": "testing preschedule workspace/keyword",
            "actual_name": "testing preschedule workspace",
            "type": "Keyword",
            "relevance": 0.17652180791002264,
            "frequency": 1,
            "totalfrequency": 12,
            "doccount": 12,
            "dimension": "What"
        },
        {
            "disambiguated_name": "cara",
            "index": "cara/keyword",
            "actual_name": "cara",
            "type": "Keyword",
            "relevance": 0.20414801224595303,
            "frequency": 1,
            "totalfrequency": 736,
            "doccount": 736,
            "dimension": "What"
        },
        {
            "disambiguated_name": "smith",
            "index": "smith/keyword",
            "actual_name": "smith",
            "type": "Keyword",
            "relevance": 0.27217844252943296,
            "frequency": 1,
            "totalfrequency": 783,
            "doccount": 783,
            "dimension": "What"
        },
        {
            "disambiguated_name": "david subject",
            "index": "david subject/keyword",
            "actual_name": "david subject",
            "type": "Keyword",
            "relevance": 0.15139765579194864,
            "frequency": 1,
            "totalfrequency": 930,
            "doccount": 930,
            "dimension": "What"
        },
        {
            "disambiguated_name": "sheet",
            "index": "sheet/keyword",
            "actual_name": "sheet",
            "type": "Keyword",
            "relevance": 0.20416968108320477,
            "frequency": 1,
            "totalfrequency": 436,
            "doccount": 436,
            "dimension": "What"
        },
        {
            "disambiguated_name": "total unrouted mwh",
            "index": "total unrouted mwh/keyword",
            "actual_name": "total unrouted mwh",
            "type": "Keyword",
            "relevance": 0.1141385057566826,
            "frequency": 1,
            "totalfrequency": 16,
            "doccount": 16,
            "dimension": "What"
        },
        {
            "disambiguated_name": "target testing date today",
            "index": "target testing date today/keyword",
            "actual_name": "target testing date today",
            "type": "Keyword",
            "relevance": 0.18726422286448255,
            "frequency": 1,
            "totalfrequency": 12,
            "doccount": 12,
            "dimension": "What"
        },
        {
            "disambiguated_name": "deals",
            "index": "deals/keyword",
            "actual_name": "deals",
            "type": "Keyword",
            "relevance": 0.34025706056156424,
            "frequency": 1,
            "totalfrequency": 5740,
            "doccount": 5261,
            "dimension": "What"
        },
        {
            "disambiguated_name": "double checking build route",
            "index": "double checking build route/keyword",
            "actual_name": "double checking build route",
            "type": "Keyword",
            "relevance": 0.18886230001363824,
            "frequency": 1,
            "totalfrequency": 12,
            "doccount": 12,
            "dimension": "What"
        },
        {
            "disambiguated_name": "path confirmation task",
            "index": "path confirmation task/keyword",
            "actual_name": "path confirmation task",
            "type": "Keyword",
            "relevance": 0.12326679747563907,
            "frequency": 1,
            "totalfrequency": 16,
            "doccount": 16,
            "dimension": "What"
        },
        {
            "disambiguated_name": "routes",
            "index": "routes/keyword",
            "actual_name": "routes",
            "type": "Keyword",
            "relevance": 0.40825322818399834,
            "frequency": 1,
            "totalfrequency": 142,
            "doccount": 142,
            "dimension": "What"
        },
        {
            "disambiguated_name": "west light load",
            "index": "west light load/keyword",
            "actual_name": "west light load",
            "type": "Keyword",
            "relevance": 0.11288042191103252,
            "frequency": 1,
            "totalfrequency": 16,
            "doccount": 16,
            "dimension": "What"
        },
        {
            "disambiguated_name": "rows",
            "index": "rows/keyword",
            "actual_name": "rows",
            "type": "Keyword",
            "relevance": 0.2721919612854695,
            "frequency": 1,
            "totalfrequency": 72,
            "doccount": 72,
            "dimension": "What"
        },
        {
            "disambiguated_name": "path confirmation",
            "index": "path confirmation/keyword",
            "actual_name": "path confirmation",
            "type": "Keyword",
            "relevance": 0.2124247462661659,
            "frequency": 1,
            "totalfrequency": 169,
            "doccount": 169,
            "dimension": "What"
        },
        {
            "disambiguated_name": "month long bookouts",
            "index": "month long bookouts/keyword",
            "actual_name": "month long bookouts",
            "type": "Keyword",
            "relevance": 0.12514486683483175,
            "frequency": 1,
            "totalfrequency": 18,
            "doccount": 18,
            "dimension": "What"
        },
        {
            "disambiguated_name": "deal number duplication",
            "index": "deal number duplication/keyword",
            "actual_name": "deal number duplication",
            "type": "Keyword",
            "relevance": 0.12910499876653425,
            "frequency": 1,
            "totalfrequency": 16,
            "doccount": 16,
            "dimension": "What"
        },
        {
            "disambiguated_name": "minutes",
            "index": "minutes/keyword",
            "actual_name": "minutes",
            "type": "Keyword",
            "relevance": 0.13613482658399254,
            "frequency": 1,
            "totalfrequency": 1234,
            "doccount": 1172,
            "dimension": "What"
        },
        {
            "disambiguated_name": "",
            "index": "",
            "actual_name": "",
            "type": "Account",
            "frequency": 1,
            "totalfrequency": 3251,
            "doccount": 3251,
            "dimension": "What"
        },
        {
            "disambiguated_name": "",
            "index": "",
            "actual_name": "",
            "type": "Account",
            "frequency": 1,
            "totalfrequency": 408,
            "doccount": 408,
            "dimension": "What"
        }
    ],
    "tags": [
        "email"
    ],
    "communityId": [
        "500df237e4b00e332fe993aa"
    ],
    "associations": [
        {
            "entity1": "",
            "entity1_index": "",
            "verb": "emailed",
            "verb_category": "emailed/communicated",
            "entity2": "",
            "entity2_index": "",
            "time_start": "2001-07-09T14:33:32",
            "assoc_type": "Event"
        }
    ],
    "metadata": {
        "_FILE_METADATA_": [
            {
                "metadata": {
                    "Creation-Date": [
                        "2001-07-09T18:33:32Z"
                    ],
                    "subject": [
                        "RE: Testing Preschedule workspace"
                    ],
                    "Message-From": [
                        ""
                    ],
                    "Author": [
                        ""
                    ],
                    "Message-To": [
                        ""
                    ],
                    "date": [
                        "2001-07-09T18:33:32Z"
                    ],
                    "Content-Type": [
                        "message/rfc822"
                    ]
                }
            }
        ],
        "email_meta": [
            [
                {
                    "Creation-Date": [
                        "2001-07-09T18:33:32Z"
                    ],
                    "Message-To": [
                        ""
                    ],
                    "Content-Type": [
                        "message/rfc822"
                    ],
                    "subject": [
                        "RE: Testing Preschedule workspace"
                    ],
                    "date": [
                        "2001-07-09T18:33:32Z"
                    ],
                    "Author": [
                        ""
                    ],
                    "Message-From": [
                        ""
                    ]
                }
            ]
        ]
    }
}

Annex - Old Format Source

Code Block
    "disambiguated_namedescription": "monthAll longof bookouts",the Enron emails corpus with TextRank keyword extraction enabled.",
    "indexextractType": "monthFile",
long bookouts/keyword",   "file": {
        "actual_namedomain": "month long bookoutsDOMAIN",
            "typepassword": "KeywordPASSWORD",

           "relevanceusername": 0.12514486683483175,"USER"
      "frequencyisPublic": 1true,
       "mediaType": "Email",
    "totalfrequencysearchCycle_secs": 18-1,
    "structuredAnalysis": {
        "doccountassociations": 18,[
            "dimension": "What"{
         },       "associations": [
{             "disambiguated_name": "deal number duplication",    {
        "index": "deal number duplication/keyword",             "actualassoc_nametype": "deal number duplicationEvent",
          "type": "Keyword",             "relevanceentity1": "$SCRIPT( return _doc.metadata._FILE_METADATA_[0.12910499876653425].metadata.Author[0];)",
            "frequency": 1,
            "totalfrequencyentity2": 16"$SCRIPT(return _value;)",
            "doccount": 16,
            "dimensioniterateOver": "What"
                        "time_start": "$SCRIPT( return _doc.publishedDate;)",
                        "verb": "emailed",
                        "verb_category": "emailed/communicated"
                    }
                ],
                "iterateOver": "email_meta"
            }
        ],
        "entities": [
            {
                "indexdimension": "",
                "actualdisambiguated_name": "$SCRIPT( return _doc.metadata._FILE_METADATA_[0].metadata.Author[0];)",
                "type": "Account",
            "relevance": 0,
            "frequencyuseDocGeo": false
 1,           },
 "totalfrequency": 3251,          {
  "doccount": 3251,             "dimensionentities": [
"What"         },         {  {
          "disambiguated_name": "",             "indexdimension": "",
            "actual_name": "",
            "typedisambiguated_name": "Account",
          "relevance": 0,             "frequencyiterateOver": 1,
          "totalfrequency": 408,             "doccounttype": 408"Account",
            "dimension": "What"         }  "useDocGeo": false
 ],     "tags": [         "enron",      }
  "email",         "fraud"     ],
      "communityId": [         "500df237e4b00e332fe993aaiterateOver":  "email_meta"
  ],     "associations": [    }
    {    ],
        "entity1scriptEngine": "cara.semperger@enron.comJavaScript",
         "entity1_indextitle": ""$SCRIPT( return _doc.metadata._FILE_METADATA_[0].metadata.subject[0];)"
    "tags": [
       "verb": "emailedenron",
    "verb_category": "emailed/communicated",   "fraud"
    "entity2title": "will.smith@enron.comAll Enron Emails (TextRank)",
    "unstructuredAnalysis": {
        "entity2_indexmeta": "",[
  "time_start": "2001-07-09T14:33:32",             "assoc_typecontext": "EventAll",
        }     ],
    "metadatafieldName": {         "_FILE_METADATA_": ["email_meta",
              [  "flags": "m",
             {   "script": "var x=_metadata._FILE_METADATA_[0].metadata;x;",
                "metadatascriptlang": {"javascript"
        "Creation-DatesimpleTextCleanser": [
                "field"2001-07-09T18:33:32Z": "fullText",
 "flags": "md",
                      "subject"replacement": [" ",
                "script": "(?:\\[.*?\\])",
         "RE: Testing Preschedule workspace"    "scriptlang": "regex"
      ],      {
                 "field": "Message-Fromdescription":,
[                "flags": "md",
           ""     "replacement": " ",
                "script": "(?:\\[.*?\\])",
                "scriptlang": "regex"
      "Author": [     },
          ""      "field": "fullText",
                 ]"flags": "md",
                "replacement": ". ",
     "Message-To": [          "script": "<.*?>",
                "scriptlang": "will.smith@enron.comregex"
           ], {
                "field": "description",
     "date": [          "flags": "md",
                "replacement": "2001-07-09T18:33:32Z". ",
                "script": "<.*?>",
      ],          "scriptlang": "regex"
             "Content-Type": [},
                "field": "message/rfc822fullText",
                "flags": "md",
      ]          "replacement": ". ",
        }        "script": "(?:>|<)",
        }        "scriptlang": "regex"
        ]    },
    ],        {
"email_meta": [             [  "field": "description",
             {   "flags": "md",
                "Creation-Datereplacement": ". [",
                "script": "(?:>|<)",
      "2001-07-09T18:33:32Z"          "scriptlang": "regex"
        "Message-To": [       "field": "fullText",
                "will.smith@enron.comreplacement": " ",
                  ]"script": "(?:[-]{4,}(.*[-]{4,}|\\n))",
                    "Content-Type"scriptlang": ["regex"
         "message/rfc822"   {
 "field": "description",
                  "subjectreplacement": [" ",
                "script": "(?:[-]{4,}(.*[-]{4,}|\\n))",
      "RE: Testing Preschedule workspace"       "scriptlang": "regex"
       "date": [        "field": "fullText",
               "2001-07-09T18:33:32Z" "replacement": " ",
"script": "(?:\\*{2,})",
                   "Authorscriptlang": ["regex"
           "" {
                   ]"field": "description",
                    "Message-From"replacement": [" ",
                       """script": "(?:\\*{2,})",
                "scriptlang": "regex"
  ]          }
      }  ]
     ]"url": "smb://modus:139/enron/enron_mail_20110402/maildir/",
    "useExtractor": "textrank",
  ]     }"useTextExtractor": "none"