Enron source gallery

Sample input document

Message-ID: <32220443.1075841552668.JavaMail.evans@thyme>
Date: Mon, 9 Jul 2001 11:33:32 -0700 (PDT)
From: cara.semperger@enron.com
To: will.smith@enron.com
Subject: RE: Testing Preschedule workspace
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: quoted-printable
X-From: Semperger, Cara </O=ENRON/OU=NA/CN=RECIPIENTS/CN=CSEMPER>
X-To: Smith, Will </O=ENRON/OU=NA/CN=RECIPIENTS/CN=Wsmith>
X-cc: 
X-bcc: 
X-Folder: \ExMerge - Semperger, Cara\Deleted Items
X-Origin: SEMPERGER-C
X-FileName: cara semperger 6-26-02.PST

I am trying to pull it up now, it's taking a long time

 -----Original Message-----
From: =09Semperger, Cara =20
Sent:=09Monday, July 09, 2001 10:40 AM
To:=09Smith, Will; Atta, Asem
Cc:=09Bentley, Corry; Poston, David
Subject:=09Testing Preschedule workspace

Good Morning,

My target testing date today is June 18th,  I am running in Test P in Local=
 Enpower using actual data from our scheduling sheets re-arranged to meet t=
he new guidelines.

The daily deals I coded X in columns J and N,  the Month long bookouts and =
BOM bookouts I coded R. =20


What worked:

I was able to retrieve my saved workspace with all data intact. I had previ=
ously sucessfully copied and pasted my entire sheet from EXCEL to the PSW.

I was able to run the build route report with the criteria of "Starting On-=
 June 18-PaloVerde-Day of week Mask Activated-Report Changes activated." A =
check of deals actually scheduled vs. build route results showed that all d=
eals were extracted correctly from Enpower. Because I am working on closed =
dates, a cumulative test of this app will not be fully testable until produ=
ction. We are expecting to see the same functionality as the current incarn=
ation of Build route. The data extracted should be read only, time stamped,=
 and when run mulitple times additional data should be shown below previous=
ly extracted data.  The improvement we are expecting to see is the app shou=
ld not duplicate deal strips on dates that have no physical power flow. (We=
st Light Load currently does this in Start view, but not Active view)

Navigating around the scheduling sheet itself I was able to accurately exec=
ute the sort function on a single criteria at a time. Multiple sorting will=
 contunue to be done in excel, or we can do a series of single sorts in the=
 PSW to acheive the same result.

Routing deals: Will had deleted all routes for June 18th, starting me with =
a clean slate.  I made every path be for DAY. I was unable to confirm total=
 unrouted MWH, as the real time position manager does not seem to be functi=
oning in TESTP. The routing appeared to take 19 minutes with the status bar=
 showing steady progress during that time. This time is 15-17 minutes longe=
r than current speed using the Excel Macro system we have now. The error li=
st gave me a row by row description of what did not route, a very useful to=
ol.  OK was visible on all rows that the PSW believed that it had routed. I=
 had difficulty checking the routing results, as I kept getting BDE errors =
in Scheduling after routing had occurred (Local Enpower). Scheduling kept s=
tarting up in 1899.  I was unable to login to TestP through Terminal server=
 2, but was able to in Terminal Server 5. The results there were very encou=
raging! Most routing was done, and a spot check of deals shows that they we=
re routed properly. The deals that were not routed appear to be due to a us=
er error of deal number duplication in the sheet. This is consistent with w=
hat I would expect. I will further evaluate routing ability with our more c=
omplicated paths later. This routing was very easy, a large point with on p=
eak non shaped deals only.



Things I did not expect that I liked:

When I highlight a group of cells in Build Route, it stays highlighted when=
 I move up to the scheduling sheet to highlight a comparison group of cells=
.  This is very handy for double checking Build route against the scheduler=
's sheet.



What does not appear to be working at this time:

The physical or not physical flag of path does not seem to be showing up pr=
operly in routing.

Path Confirmation:  The running time appeared to be over one hour for one s=
heet, only 70 rows of the sheet being flagged for insertion into confirmati=
on. This current speed will not be sufficient to work in production. Also, =
many rows that were flagged for confirmation were not imported, and I canno=
t find an error log to help understand why deals were not imported to path =
confirmation.
When the path confirmation task was finished,  the application simply froze=
.  The status bar was no longer visible, leading me to believe that it was =
done, however the app was not able to be saved or closed or examined furthe=
r.


My conclusions:

The build route and routing functions work well enough to use in production=
, the copy-paste function works well for the West desk per our connectivity=
 issues.

Path Confirmation is not functioning at this point, and appears to be blowi=
ng up the app. No data was visible for June 18th even after the PSW ran thr=
ough its import function.


Please let me know when the issues I have named have been addressed and are=
 ready for further testing.

Thanks

Cara
503/464-3814
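
The sample above is shown in its raw form: its header declares Content-Transfer-Encoding: quoted-printable, which is why the body contains =09, =20 and lines ending in a bare "=". As a minimal sketch (not part of the harvester itself), Python's standard quopri module decodes a few of those lines. The variable names are illustrative only.

```python
import quopri

# A few lines copied from the sample email above, still quoted-printable encoded.
raw = (
    b"From: =09Semperger, Cara =20\n"
    b"My target testing date today is June 18th,  I am running in Test P in Local=\n"
    b" Enpower using actual data from our scheduling sheets re-arranged to meet t=\n"
    b"he new guidelines.\n"
)

# =09 becomes a tab, =20 a space, and a trailing "=" is a soft line break
# that joins the line with the one that follows it.
decoded = quopri.decodestring(raw).decode("us-ascii")
print(decoded)
```

After decoding, the tab-separated "From:" field and the rejoined sentences have the same shape as the text in the "description" field of the sample output document.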

Source

{
    "description": "All of the Enron emails corpus with TextRank keyword extraction enabled.",
    "isPublic": true,
    "mediaType": "Email",
    "searchCycle_secs": -1,
    "tags": [
        "enron",
        "email",
        "fraud"
    ],
    "title": "All Enron Emails (TextRank)",
    "processingPipeline": [
        {
            "file": {
                "domain": "DOMAIN",
                "password": "PASSWORD",
                "username": "USER",
                "url": "smb://modus:139/enron/enron_mail_20110402/maildir/"
            }
        },
        {
            "harvest": {
                "searchCycle_secs": 1
            }
        },
        {
            "docMetadata": {
                "title": "$SCRIPT( return _doc.metadata._FILE_METADATA_[0].metadata.subject[0];)"
            }
        },
        {
            "text": [
                {
                    "fieldName": "fullText",
                    "script": "(?:\\[.*?\\])",
                    "scriptlang": "regex",
                    "flags": "md",
                    "replacement": " "
                },
                {
                    "fieldName": "description",
                    "script": "(?:\\[.*?\\])",
                    "scriptlang": "regex",
                    "flags": "md",
                    "replacement": " "
                },
                {
                    "fieldName": "fullText",
                    "script": "<.*?>",
                    "scriptlang": "regex",
                    "flags": "md",
                    "replacement": ". "
                },
                {
                    "fieldName": "description",
                    "script": "<.*?>",
                    "scriptlang": "regex",
                    "flags": "md",
                    "replacement": ". "
                },
                {
                    "fieldName": "fullText",
                    "script": "(?:>|<)",
                    "scriptlang": "regex",
                    "flags": "md",
                    "replacement": ". "
                },
                {
                    "fieldName": "description",
                    "script": "(?:>|<)",
                    "scriptlang": "regex",
                    "flags": "md",
                    "replacement": ". "
                },
                {
                    "fieldName": "fullText",
                    "script": "(?:[-]{4,}(.*[-]{4,}|\\n))",
                    "scriptlang": "regex",
                    "replacement": " "
                },
                {
                    "fieldName": "description",
                    "script": "(?:[-]{4,}(.*[-]{4,}|\\n))",
                    "scriptlang": "regex",
                    "replacement": " "
                },
                {
                    "fieldName": "fullText",
                    "script": "(?:\\*{2,})",
                    "scriptlang": "regex",
                    "replacement": " "
                },
                {
                    "fieldName": "description",
                    "script": "(?:\\*{2,})",
                    "scriptlang": "regex",
                    "replacement": " "
                }
            ]
        },
        {
            "contentMetadata": [
                {
                    "fieldName": "email_meta",
                    "script": "var x=_metadata._FILE_METADATA_[0].metadata;x;",
                    "scriptlang": "javascript",
                    "flags": "m"
                }
            ]
        },
        {
            "featureEngine": {
                "engineName": "textrank"
            }
        },
        {
            "entities": [
                {
                    "dimension": "What",
                    "disambiguated_name": "$SCRIPT( return _doc.metadata._FILE_METADATA_[0].metadata.Author[0];)",
                    "type": "Account",
                    "useDocGeo": false
                },
                {
                    "dimension": "What",
                    "disambiguated_name": "",
                    "iterateOver": "email_meta.Message-To",
                    "type": "Account",
                    "useDocGeo": false
                }
            ]
        },
        {
            "associations": [
                {
                    "assoc_type": "Event",
                    "entity1": "$SCRIPT( return _doc.metadata._FILE_METADATA_[0].metadata.Author[0];)",
                    "entity2": "$SCRIPT(return _value;)",
                    "iterateOver": "email_meta.Message-To",
                    "time_start": "$SCRIPT( return _doc.publishedDate;)",
                    "verb": "emailed",
                    "verb_category": "emailed/communicated"
                }
            ]
        }
    ]
}
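
The "text" stage above lists five regex replacements, each applied to both fullText and description. The following is a rough Python sketch of what those replacements do to a single string, not the platform's own implementation. It assumes that "flags": "md" means multi-line plus dot-matches-newline, and that the patterns behave the same under Python's re module as under the engine the platform uses. The sample string is invented for illustration.

```python
import re

# Assumption: "m" = multi-line and "d" = dot-matches-newline in the source config.
MD = re.MULTILINE | re.DOTALL

# (pattern, flags, replacement), in the same order as the "text" stage.
CLEANSERS = [
    (r"(?:\[.*?\])", MD, " "),                 # bracketed tokens such as [IMAGE]
    (r"<.*?>", MD, ". "),                      # anything wrapped in angle brackets
    (r"(?:>|<)", MD, ". "),                    # stray angle brackets (e.g. quote markers)
    (r"(?:[-]{4,}(.*[-]{4,}|\n))", 0, " "),    # "-----Original Message-----" style rules
    (r"(?:\*{2,})", 0, " "),                   # runs of asterisks
]

def cleanse(text: str) -> str:
    """Apply every replacement in order and return the cleaned text."""
    for pattern, flags, replacement in CLEANSERS:
        text = re.sub(pattern, replacement, text, flags=flags)
    return text

# Hypothetical input, loosely modelled on the sample email.
sample = (
    "I am trying to pull it up now\n"
    " -----Original Message-----\n"
    "From: Semperger, Cara <cara.semperger@enron.com>\n"
    "[IMAGE] **** see attached"
)
cleaned = cleanse(sample)
print(cleaned)
```

Order matters here: the angle-bracket pair rule runs before the single-bracket rule, so a complete <...> span is replaced once rather than leaving its contents behind.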

Sample output document

{
    "_id": "5048efb0e4b01fd6455420ee",
    "title": "RE: Testing Preschedule workspace",
    "url": "smb://modus:139/enron/testing/semperger-c/deleted_items/37QTKE~3",
    "created": "Sep 6, 2012 06:42:01 PM UTC",
    "modified": "Jul 24, 2012 01:13:02 AM UTC",
    "publishedDate": "Jul 9, 2001 06:33:32 PM UTC",
    "source": [
        "Enron Emails (TextRank)"
    ],
    "sourceKey": [
        "modus.139.enron.testing.."
    ],
    "mediaType": [
        "Email"
    ],
    "description": "I am trying to pull it up now, it's taking a long time\r\n\r\n \r\nFrom: \tSmith, Will \r\nSent:\tMonday, July 09, 2001 11:28 AM\r\nTo:\tSemperger, Cara\r\nSubject:\tRE: Testing Preschedule workspace\r\n\r\nYes, but Vish made the changes in Table Edit. : - )\r\n\r\nWill\r\n\r\n \r\nFrom: \tSemperger, Cara \r\nSent:\tMonday, July 09, 2001 1:20 PM\r\nTo:\tSmith, Will\r\nSubject:\tRE: Testing Preschedule workspace\r\n\r\nSo, this table edit that Brett is asking me to test is really from ",
    "entities": [
        {
            "disambiguated_name": "on- june 18-paloverde-day",
            "index": "on- june 18-paloverde-day/keyword",
            "actual_name": "on- june 18-paloverde-day",
            "type": "Keyword",
            "relevance": 0.10585404743253149,
            "frequency": 1,
            "totalfrequency": 12,
            "doccount": 12,
            "dimension": "What"
        },
        {
            "disambiguated_name": "mulitple times additional data",
            "index": "mulitple times additional data/keyword",
            "actual_name": "mulitple times additional data",
            "type": "Keyword",
            "relevance": 0.18088061045762382,
            "frequency": 1,
            "totalfrequency": 12,
            "doccount": 12,
            "dimension": "What"
        },
        {
            "disambiguated_name": "scheduling sheets",
            "index": "scheduling sheets/keyword",
            "actual_name": "scheduling sheets",
            "type": "Keyword",
            "relevance": 0.15086086188384693,
            "frequency": 1,
            "totalfrequency": 20,
            "doccount": 20,
            "dimension": "What"
        },
        {
            "disambiguated_name": "app",
            "index": "app/keyword",
            "actual_name": "app",
            "type": "Keyword",
            "relevance": 0.20415634782171557,
            "frequency": 1,
            "totalfrequency": 58,
            "doccount": 58,
            "dimension": "What"
        },
        {
            "disambiguated_name": "data",
            "index": "data/keyword",
            "actual_name": "data",
            "type": "Keyword",
            "relevance": 0.1361375118885727,
            "frequency": 1,
            "totalfrequency": 3323,
            "doccount": 3323,
            "dimension": "What"
        },
        {
            "disambiguated_name": "paths",
            "index": "paths/keyword",
            "actual_name": "paths",
            "type": "Keyword",
            "relevance": 0.2041916488834702,
            "frequency": 1,
            "totalfrequency": 99,
            "doccount": 99,
            "dimension": "What"
        },
        {
            "disambiguated_name": "build route report",
            "index": "build route report/keyword",
            "actual_name": "build route report",
            "type": "Keyword",
            "relevance": 0.11476307758997932,
            "frequency": 1,
            "totalfrequency": 36,
            "doccount": 36,
            "dimension": "What"
        },
        {
            "disambiguated_name": "testing preschedule workspace cara",
            "index": "testing preschedule workspace cara/keyword",
            "actual_name": "testing preschedule workspace cara",
            "type": "Keyword",
            "relevance": 0.16803833041631702,
            "frequency": 1,
            "totalfrequency": 8,
            "doccount": 8,
            "dimension": "What"
        },
        {
            "disambiguated_name": "physical power flow",
            "index": "physical power flow/keyword",
            "actual_name": "physical power flow",
            "type": "Keyword",
            "relevance": 0.11805512187037151,
            "frequency": 1,
            "totalfrequency": 17,
            "doccount": 17,
            "dimension": "What"
        },
        {
            "disambiguated_name": "i",
            "index": "i/keyword",
            "actual_name": "i",
            "type": "Keyword",
            "relevance": 0.13651904141534263,
            "frequency": 1,
            "totalfrequency": 18162,
            "doccount": 18162,
            "dimension": "What"
        },
        {
            "disambiguated_name": "total running time",
            "index": "total running time/keyword",
            "actual_name": "total running time",
            "type": "Keyword",
            "relevance": 0.11233232851584997,
            "frequency": 1,
            "totalfrequency": 10,
            "doccount": 10,
            "dimension": "What"
        },
        {
            "disambiguated_name": "time",
            "index": "time/keyword",
            "actual_name": "time",
            "type": "Keyword",
            "relevance": 0.34020922533185516,
            "frequency": 1,
            "totalfrequency": 17102,
            "doccount": 17102,
            "dimension": "What"
        },
        {
            "disambiguated_name": "psw",
            "index": "psw/keyword",
            "actual_name": "psw",
            "type": "Keyword",
            "relevance": 0.13625985262266815,
            "frequency": 1,
            "totalfrequency": 46,
            "doccount": 46,
            "dimension": "What"
        },
        {
            "disambiguated_name": "semperger",
            "index": "semperger/keyword",
            "actual_name": "semperger",
            "type": "Keyword",
            "relevance": 0.2724417241053495,
            "frequency": 1,
            "totalfrequency": 226,
            "doccount": 226,
            "dimension": "What"
        },
        {
            "disambiguated_name": "peak non shaped deals",
            "index": "peak non shaped deals/keyword",
            "actual_name": "peak non shaped deals",
            "type": "Keyword",
            "relevance": 0.19127581970645322,
            "frequency": 1,
            "totalfrequency": 12,
            "doccount": 12,
            "dimension": "What"
        },
        {
            "disambiguated_name": "table edit",
            "index": "table edit/keyword",
            "actual_name": "table edit",
            "type": "Keyword",
            "relevance": 0.21207334129182112,
            "frequency": 1,
            "totalfrequency": 32,
            "doccount": 32,
            "dimension": "What"
        },
        {
            "disambiguated_name": "week mask activated-report changes",
            "index": "week mask activated-report changes/keyword",
            "actual_name": "week mask activated-report changes",
            "type": "Keyword",
            "relevance": 0.1484580867667756,
            "frequency": 1,
            "totalfrequency": 12,
            "doccount": 12,
            "dimension": "What"
        },
        {
            "disambiguated_name": "excel macro system",
            "index": "excel macro system/keyword",
            "actual_name": "excel macro system",
            "type": "Keyword",
            "relevance": 0.12208201691477336,
            "frequency": 1,
            "totalfrequency": 12,
            "doccount": 12,
            "dimension": "What"
        },
        {
            "disambiguated_name": "real time position manager",
            "index": "real time position manager/keyword",
            "actual_name": "real time position manager",
            "type": "Keyword",
            "relevance": 0.19213464212989614,
            "frequency": 1,
            "totalfrequency": 39,
            "doccount": 39,
            "dimension": "What"
        },
        {
            "disambiguated_name": "testing preschedule workspace",
            "index": "testing preschedule workspace/keyword",
            "actual_name": "testing preschedule workspace",
            "type": "Keyword",
            "relevance": 0.17652180791002264,
            "frequency": 1,
            "totalfrequency": 12,
            "doccount": 12,
            "dimension": "What"
        },
        {
            "disambiguated_name": "cara",
            "index": "cara/keyword",
            "actual_name": "cara",
            "type": "Keyword",
            "relevance": 0.20414801224595303,
            "frequency": 1,
            "totalfrequency": 736,
            "doccount": 736,
            "dimension": "What"
        },
        {
            "disambiguated_name": "smith",
            "index": "smith/keyword",
            "actual_name": "smith",
            "type": "Keyword",
            "relevance": 0.27217844252943296,
            "frequency": 1,
            "totalfrequency": 783,
            "doccount": 783,
            "dimension": "What"
        },
        {
            "disambiguated_name": "david subject",
            "index": "david subject/keyword",
            "actual_name": "david subject",
            "type": "Keyword",
            "relevance": 0.15139765579194864,
            "frequency": 1,
            "totalfrequency": 930,
            "doccount": 930,
            "dimension": "What"
        },
        {
            "disambiguated_name": "sheet",
            "index": "sheet/keyword",
            "actual_name": "sheet",
            "type": "Keyword",
            "relevance": 0.20416968108320477,
            "frequency": 1,
            "totalfrequency": 436,
            "doccount": 436,
            "dimension": "What"
        },
        {
            "disambiguated_name": "total unrouted mwh",
            "index": "total unrouted mwh/keyword",
            "actual_name": "total unrouted mwh",
            "type": "Keyword",
            "relevance": 0.1141385057566826,
            "frequency": 1,
            "totalfrequency": 16,
            "doccount": 16,
            "dimension": "What"
        },
        {
            "disambiguated_name": "target testing date today",
            "index": "target testing date today/keyword",
            "actual_name": "target testing date today",
            "type": "Keyword",
            "relevance": 0.18726422286448255,
            "frequency": 1,
            "totalfrequency": 12,
            "doccount": 12,
            "dimension": "What"
        },
        {
            "disambiguated_name": "deals",
            "index": "deals/keyword",
            "actual_name": "deals",
            "type": "Keyword",
            "relevance": 0.34025706056156424,
            "frequency": 1,
            "totalfrequency": 5740,
            "doccount": 5261,
            "dimension": "What"
        },
        {
            "disambiguated_name": "double checking build route",
            "index": "double checking build route/keyword",
            "actual_name": "double checking build route",
            "type": "Keyword",
            "relevance": 0.18886230001363824,
            "frequency": 1,
            "totalfrequency": 12,
            "doccount": 12,
            "dimension": "What"
        },
        {
            "disambiguated_name": "path confirmation task",
            "index": "path confirmation task/keyword",
            "actual_name": "path confirmation task",
            "type": "Keyword",
            "relevance": 0.12326679747563907,
            "frequency": 1,
            "totalfrequency": 16,
            "doccount": 16,
            "dimension": "What"
        },
        {
            "disambiguated_name": "routes",
            "index": "routes/keyword",
            "actual_name": "routes",
            "type": "Keyword",
            "relevance": 0.40825322818399834,
            "frequency": 1,
            "totalfrequency": 142,
            "doccount": 142,
            "dimension": "What"
        },
        {
            "disambiguated_name": "west light load",
            "index": "west light load/keyword",
            "actual_name": "west light load",
            "type": "Keyword",
            "relevance": 0.11288042191103252,
            "frequency": 1,
            "totalfrequency": 16,
            "doccount": 16,
            "dimension": "What"
        },
        {
            "disambiguated_name": "rows",
            "index": "rows/keyword",
            "actual_name": "rows",
            "type": "Keyword",
            "relevance": 0.2721919612854695,
            "frequency": 1,
            "totalfrequency": 72,
            "doccount": 72,
            "dimension": "What"
        },
        {
            "disambiguated_name": "path confirmation",
            "index": "path confirmation/keyword",
            "actual_name": "path confirmation",
            "type": "Keyword",
            "relevance": 0.2124247462661659,
            "frequency": 1,
            "totalfrequency": 169,
            "doccount": 169,
            "dimension": "What"
        },
        {
            "disambiguated_name": "month long bookouts",
            "index": "month long bookouts/keyword",
            "actual_name": "month long bookouts",
            "type": "Keyword",
            "relevance": 0.12514486683483175,
            "frequency": 1,
            "totalfrequency": 18,
            "doccount": 18,
            "dimension": "What"
        },
        {
            "disambiguated_name": "deal number duplication",
            "index": "deal number duplication/keyword",
            "actual_name": "deal number duplication",
            "type": "Keyword",
            "relevance": 0.12910499876653425,
            "frequency": 1,
            "totalfrequency": 16,
            "doccount": 16,
            "dimension": "What"
        },
        {
            "disambiguated_name": "minutes",
            "index": "minutes/keyword",
            "actual_name": "minutes",
            "type": "Keyword",
            "relevance": 0.13613482658399254,
            "frequency": 1,
            "totalfrequency": 1234,
            "doccount": 1172,
            "dimension": "What"
        },
        {
            "disambiguated_name": "cara.semperger@enron.com",
            "index": "cara.semperger@enron.com/account",
            "actual_name": "cara.semperger@enron.com",
            "type": "Account",
            "relevance": 0,
            "frequency": 1,
            "totalfrequency": 3251,
            "doccount": 3251,
            "dimension": "What"
        },
        {
            "disambiguated_name": "will.smith@enron.com",
            "index": "will.smith@enron.com/account",
            "actual_name": "will.smith@enron.com",
            "type": "Account",
            "relevance": 0,
            "frequency": 1,
            "totalfrequency": 408,
            "doccount": 408,
            "dimension": "What"
        }
    ],
    "tags": [
        "enron",
        "email",
        "fraud"
    ],
    "communityId": [
        "500df237e4b00e332fe993aa"
    ],
    "associations": [
        {
            "entity1": "cara.semperger@enron.com",
            "entity1_index": "cara.semperger@enron.com/account",
            "verb": "emailed",
            "verb_category": "emailed/communicated",
            "entity2": "will.smith@enron.com",
            "entity2_index": "will.smith@enron.com/account",
            "time_start": "2001-07-09T14:33:32",
            "assoc_type": "Event"
        }
    ],
    "metadata": {
        "_FILE_METADATA_": [
            [
                {
                    "metadata": {
                        "Creation-Date": [
                            "2001-07-09T18:33:32Z"
                        ],
                        "subject": [
                            "RE: Testing Preschedule workspace"
                        ],
                        "Message-From": [
                            "cara.semperger@enron.com"
                        ],
                        "Author": [
                            "cara.semperger@enron.com"
                        ],
                        "Message-To": [
                            "will.smith@enron.com"
                        ],
                        "date": [
                            "2001-07-09T18:33:32Z"
                        ],
                        "Content-Type": [
                            "message/rfc822"
                        ]
                    }
                }
            ]
        ],
        "email_meta": [
            [
                {
                    "Creation-Date": [
                        "2001-07-09T18:33:32Z"
                    ],
                    "Message-To": [
                        "will.smith@enron.com"
                    ],
                    "Content-Type": [
                        "message/rfc822"
                    ],
                    "subject": [
                        "RE: Testing Preschedule workspace"
                    ],
                    "date": [
                        "2001-07-09T18:33:32Z"
                    ],
                    "Author": [
                        "cara.semperger@enron.com"
                    ],
                    "Message-From": [
                        "cara.semperger@enron.com"
                    ]
                }
            ]
        ]
    }
}
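
The single entry in the "associations" array above follows from the source's "associations" stage: it iterates over email_meta.Message-To and links the author to each recipient with the verb "emailed". A minimal Python sketch of that pairing is given below. The function name is hypothetical, the "/account" index suffix is copied from the sample output rather than from any documented rule, and time_start is passed in as-is (the sketch does not model how the platform derives it from publishedDate).

```python
from typing import Dict, List

def build_associations(meta: Dict[str, List[str]], time_start: str) -> List[dict]:
    """Create one 'emailed' event per recipient listed in Message-To."""
    author = meta["Author"][0]
    associations = []
    for recipient in meta.get("Message-To", []):
        associations.append({
            "entity1": author,
            "entity1_index": f"{author}/account",     # suffix as seen in the sample output
            "verb": "emailed",
            "verb_category": "emailed/communicated",
            "entity2": recipient,
            "entity2_index": f"{recipient}/account",
            "time_start": time_start,
            "assoc_type": "Event",
        })
    return associations

# Values taken from the email_meta block and association of the sample output.
email_meta = {
    "Author": ["cara.semperger@enron.com"],
    "Message-To": ["will.smith@enron.com"],
}
associations = build_associations(email_meta, "2001-07-09T14:33:32")
print(associations)
```

An email with several recipients would yield one association per Message-To value, all sharing the same entity1.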

Annex - Old Format Source

{
    "description": "All of the Enron emails corpus with TextRank keyword extraction enabled.",
    "extractType": "File",
    "file": {
        "domain": "DOMAIN",
        "password": "PASSWORD",
        "username": "USER"
    },
    "isPublic": true,
    "mediaType": "Email",
    "searchCycle_secs": -1,
    "structuredAnalysis": {
        "associations": [
            {
                "associations": [
                    {
                        "assoc_type": "Event",
                        "entity1": "$SCRIPT( return _doc.metadata._FILE_METADATA_[0].metadata.Author[0];)",
                        "entity2": "$SCRIPT(return _value;)",
                        "iterateOver": "Message-To",
                        "time_start": "$SCRIPT( return _doc.publishedDate;)",
                        "verb": "emailed",
                        "verb_category": "emailed/communicated"
                    }
                ],
                "iterateOver": "email_meta"
            }
        ],
        "entities": [
            {
                "dimension": "What",
                "disambiguated_name": "$SCRIPT( return _doc.metadata._FILE_METADATA_[0].metadata.Author[0];)",
                "type": "Account",
                "useDocGeo": false
            },
            {
                "entities": [
                    {
                        "dimension": "What",
                        "disambiguated_name": "",
                        "iterateOver": "Message-To",
                        "type": "Account",
                        "useDocGeo": false
                    }
                ],
                "iterateOver": "email_meta"
            }
        ],
        "scriptEngine": "JavaScript",
        "title": "$SCRIPT( return _doc.metadata._FILE_METADATA_[0].metadata.subject[0];)"
    },
    "tags": [
        "enron",
        "email",
        "fraud"
    ],
    "title": "All Enron Emails (TextRank)",
    "unstructuredAnalysis": {
        "meta": [
            {
                "context": "All",
                "fieldName": "email_meta",
                "flags": "m",
                "script": "var x=_metadata._FILE_METADATA_[0].metadata;x;",
                "scriptlang": "javascript"
            }
        ],
        "simpleTextCleanser": [
            {
                "field": "fullText",
                "flags": "md",
                "replacement": " ",
                "script": "(?:\\[.*?\\])",
                "scriptlang": "regex"
            },
            {
                "field": "description",
                "flags": "md",
                "replacement": " ",
                "script": "(?:\\[.*?\\])",
                "scriptlang": "regex"
            },
            {
                "field": "fullText",
                "flags": "md",
                "replacement": ". ",
                "script": "<.*?>",
                "scriptlang": "regex"
            },
            {
                "field": "description",
                "flags": "md",
                "replacement": ". ",
                "script": "<.*?>",
                "scriptlang": "regex"
            },
            {
                "field": "fullText",
                "flags": "md",
                "replacement": ". ",
                "script": "(?:>|<)",
                "scriptlang": "regex"
            },
            {
                "field": "description",
                "flags": "md",
                "replacement": ". ",
                "script": "(?:>|<)",
                "scriptlang": "regex"
            },
            {
                "field": "fullText",
                "replacement": " ",
                "script": "(?:[-]{4,}(.*[-]{4,}|\\n))",
                "scriptlang": "regex"
            },
            {
                "field": "description",
                "replacement": " ",
                "script": "(?:[-]{4,}(.*[-]{4,}|\\n))",
                "scriptlang": "regex"
            },
            {
                "field": "fullText",
                "replacement": " ",
                "script": "(?:\\*{2,})",
                "scriptlang": "regex"
            },
            {
                "field": "description",
                "replacement": " ",
                "script": "(?:\\*{2,})",
                "scriptlang": "regex"
            }
        ]
    },
    "url": "smb://modus:139/enron/enron_mail_20110402/maildir/",
    "useExtractor": "textrank",
    "useTextExtractor": "none"
}