Enron source gallery
Sample input document
Message-ID: <32220443.1075841552668.JavaMail.evans@thyme> Date: Mon, 9 Jul 2001 11:33:32 -0700 (PDT) From: cara.semperger@enron.com To: will.smith@enron.com Subject: RE: Testing Preschedule workspace Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: quoted-printable X-From: Semperger, Cara </O=ENRON/OU=NA/CN=RECIPIENTS/CN=CSEMPER> X-To: Smith, Will </O=ENRON/OU=NA/CN=RECIPIENTS/CN=Wsmith> X-cc: X-bcc: X-Folder: \ExMerge - Semperger, Cara\Deleted Items X-Origin: SEMPERGER-C X-FileName: cara semperger 6-26-02.PST I am trying to pull it up now, it's taking a long time -----Original Message----- From: =09Semperger, Cara =20 Sent:=09Monday, July 09, 2001 10:40 AM To:=09Smith, Will; Atta, Asem Cc:=09Bentley, Corry; Poston, David Subject:=09Testing Preschedule workspace Good Morning, My target testing date today is June 18th, I am running in Test P in Local= Enpower using actual data from our scheduling sheets re-arranged to meet t= he new guidelines. The daily deals I coded X in columns J and N, the Month long bookouts and = BOM bookouts I coded R. =20 What worked: I was able to retrieve my saved workspace with all data intact. I had previ= ously sucessfully copied and pasted my entire sheet from EXCEL to the PSW. I was able to run the build route report with the criteria of "Starting On-= June 18-PaloVerde-Day of week Mask Activated-Report Changes activated." A = check of deals actually scheduled vs. build route results showed that all d= eals were extracted correctly from Enpower. Because I am working on closed = dates, a cumulative test of this app will not be fully testable until produ= ction. We are expecting to see the same functionality as the current incarn= ation of Build route. The data extracted should be read only, time stamped,= and when run mulitple times additional data should be shown below previous= ly extracted data. 
The improvement we are expecting to see is the app shou= ld not duplicate deal strips on dates that have no physical power flow. (We= st Light Load currently does this in Start view, but not Active view) Navigating around the scheduling sheet itself I was able to accurately exec= ute the sort function on a single criteria at a time. Multiple sorting will= contunue to be done in excel, or we can do a series of single sorts in the= PSW to acheive the same result. Routing deals: Will had deleted all routes for June 18th, starting me with = a clean slate. I made every path be for DAY. I was unable to confirm total= unrouted MWH, as the real time position manager does not seem to be functi= oning in TESTP. The routing appeared to take 19 minutes with the status bar= showing steady progress during that time. This time is 15-17 minutes longe= r than current speed using the Excel Macro system we have now. The error li= st gave me a row by row description of what did not route, a very useful to= ol. OK was visible on all rows that the PSW believed that it had routed. I= had difficulty checking the routing results, as I kept getting BDE errors = in Scheduling after routing had occurred (Local Enpower). Scheduling kept s= tarting up in 1899. I was unable to login to TestP through Terminal server= 2, but was able to in Terminal Server 5. The results there were very encou= raging! Most routing was done, and a spot check of deals shows that they we= re routed properly. The deals that were not routed appear to be due to a us= er error of deal number duplication in the sheet. This is consistent with w= hat I would expect. I will further evaluate routing ability with our more c= omplicated paths later. This routing was very easy, a large point with on p= eak non shaped deals only. Things I did not expect that I liked: When I highlight a group of cells in Build Route, it stays highlighted when= I move up to the scheduling sheet to highlight a comparison group of cells= . 
This is very handy for double checking Build route against the scheduler= 's sheet. What does not appear to be working at this time: The physical or not physical flag of path does not seem to be showing up pr= operly in routing. Path Confirmation: The running time appeared to be over one hour for one s= heet, only 70 rows of the sheet being flagged for insertion into confirmati= on. This current speed will not be sufficient to work in production. Also, = many rows that were flagged for confirmation were not imported, and I canno= t find an error log to help understand why deals were not imported to path = confirmation. When the path confirmation task was finished, the application simply froze= . The status bar was no longer visible, leading me to believe that it was = done, however the app was not able to be saved or closed or examined furthe= r. My conclusions: The build route and routing functions work well enough to use in production= , the copy-paste function works well for the West desk per our connectivity= issues. Path Confirmation is not functioning at this point, and appears to be blowi= ng up the app. No data was visible for June 18th even after the PSW ran thr= ough its import function. Please let me know when the issues I have named have been addressed and are= ready for further testing. Thanks Cara 503/464-3814
Source
{
  "description": "All of the Enron emails corpus with TextRank keyword extraction enabled.",
  "isPublic": true,
  "mediaType": "Email",
  "searchCycle_secs": -1,
  "tags": [ "enron", "email", "fraud" ],
  "title": "All Enron Emails (TextRank)",
  "processingPipeline": [
    {
      "file": {
        "domain": "DOMAIN",
        "password": "PASSWORD",
        "username": "USER",
        "url": "smb://modus:139/enron/enron_mail_20110402/maildir/"
      }
    },
    { "harvest": { "searchCycle_secs": 1 } },
    {
      "docMetadata": {
        "title": "$SCRIPT( return _doc.metadata._FILE_METADATA_[0].metadata.subject[0];)"
      }
    },
    {
      "text": [
        { "fieldName": "fullText", "script": "(?:\\[.*?\\])", "scriptlang": "regex", "flags": "md", "replacement": " " },
        { "fieldName": "description", "script": "(?:\\[.*?\\])", "scriptlang": "regex", "flags": "md", "replacement": " " },
        { "fieldName": "fullText", "script": "<.*?>", "scriptlang": "regex", "flags": "md", "replacement": ". " },
        { "fieldName": "description", "script": "<.*?>", "scriptlang": "regex", "flags": "md", "replacement": ". " },
        { "fieldName": "fullText", "script": "(?:>|<)", "scriptlang": "regex", "flags": "md", "replacement": ". " },
        { "fieldName": "description", "script": "(?:>|<)", "scriptlang": "regex", "flags": "md", "replacement": ". " },
        { "fieldName": "fullText", "script": "(?:[-]{4,}(.*[-]{4,}|\\n))", "scriptlang": "regex", "replacement": " " },
        { "fieldName": "description", "script": "(?:[-]{4,}(.*[-]{4,}|\\n))", "scriptlang": "regex", "replacement": " " },
        { "fieldName": "fullText", "script": "(?:\\*{2,})", "scriptlang": "regex", "replacement": " " },
        { "fieldName": "description", "script": "(?:\\*{2,})", "scriptlang": "regex", "replacement": " " }
      ]
    },
    {
      "contentMetadata": [
        {
          "fieldName": "email_meta",
          "script": "var x=_metadata._FILE_METADATA_[0].metadata;x;",
          "scriptlang": "javascript",
          "flags": "m"
        }
      ]
    },
    { "featureEngine": { "engineName": "textrank" } },
    {
      "entities": [
        {
          "dimension": "What",
          "disambiguated_name": "$SCRIPT( return _doc.metadata._FILE_METADATA_[0].metadata.Author[0];)",
          "type": "Account",
          "useDocGeo": false
        },
        {
          "dimension": "What",
          "disambiguated_name": "",
          "iterateOver": "email_meta.Message-To",
          "type": "Account",
          "useDocGeo": false
        }
      ]
    },
    {
      "associations": [
        {
          "assoc_type": "Event",
          "entity1": "$SCRIPT( return _doc.metadata._FILE_METADATA_[0].metadata.Author[0];)",
          "entity2": "$SCRIPT(return _value;)",
          "iterateOver": "email_meta.Message-To",
          "time_start": "$SCRIPT( return _doc.publishedDate;)",
          "verb": "emailed",
          "verb_category": "emailed/communicated"
        }
      ]
    }
  ]
}
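The "text" stage of the source above runs its regex rules in order against the fullText and description fields. The following is a minimal Python sketch of that behaviour for a single field, not the platform's own implementation. The names `RULES`, `FLAG_MAP` and `cleanse` are hypothetical, and reading the "m" flag as multiline and the "d" flag as dot-matches-newline is an assumption; Python's `re` module may also differ in detail from the regex engine the platform uses.

```python
import re

# Regex rules copied from the "text" stage of the source, in order.
# Each tuple is (pattern, flags string from the source, replacement).
RULES = [
    (r"(?:\[.*?\])", "md", " "),               # bracketed fragments
    (r"<.*?>", "md", ". "),                    # angle-bracketed fragments
    (r"(?:>|<)", "md", ". "),                  # stray angle brackets
    (r"(?:[-]{4,}(.*[-]{4,}|\n))", "", " "),   # "-----Original Message-----" rulers
    (r"(?:\*{2,})", "", " "),                  # runs of asterisks
]

# Assumption: "m" is multiline and "d" is dot-matches-newline.
FLAG_MAP = {"m": re.MULTILINE, "d": re.DOTALL}


def cleanse(text: str) -> str:
    """Apply every rule in order and return the cleaned text."""
    for pattern, flags, replacement in RULES:
        re_flags = 0
        for ch in flags:
            re_flags |= FLAG_MAP[ch]
        text = re.sub(pattern, replacement, text, flags=re_flags)
    return text


sample = (
    "I am trying to pull it up now\n"
    "-----Original Message-----\n"
    "From: Semperger, Cara"
)
print(cleanse(sample))
```

Under these assumptions the "Original Message" ruler collapses to a single space, which is consistent with the lone space that appears before each "From:" line in the description field of the sample output document.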
Sample output document
{
  "_id": "5048efb0e4b01fd6455420ee",
  "title": "RE: Testing Preschedule workspace",
  "url": "smb://modus:139/enron/testing/semperger-c/deleted_items/37QTKE~3",
  "created": "Sep 6, 2012 06:42:01 PM UTC",
  "modified": "Jul 24, 2012 01:13:02 AM UTC",
  "publishedDate": "Jul 9, 2001 06:33:32 PM UTC",
  "source": [ "Enron Emails (TextRank)" ],
  "sourceKey": [ "modus.139.enron.testing.." ],
  "mediaType": [ "Email" ],
  "description": "I am trying to pull it up now, it's taking a long time\r\n\r\n \r\nFrom: \tSmith, Will \r\nSent:\tMonday, July 09, 2001 11:28 AM\r\nTo:\tSemperger, Cara\r\nSubject:\tRE: Testing Preschedule workspace\r\n\r\nYes, but Vish made the changes in Table Edit. : - )\r\n\r\nWill\r\n\r\n \r\nFrom: \tSemperger, Cara \r\nSent:\tMonday, July 09, 2001 1:20 PM\r\nTo:\tSmith, Will\r\nSubject:\tRE: Testing Preschedule workspace\r\n\r\nSo, this table edit that Brett is asking me to test is really from ",
  "entities": [
    { "disambiguated_name": "on- june 18-paloverde-day", "index": "on- june 18-paloverde-day/keyword", "actual_name": "on- june 18-paloverde-day", "type": "Keyword", "relevance": 0.10585404743253149, "frequency": 1, "totalfrequency": 12, "doccount": 12, "dimension": "What" },
    { "disambiguated_name": "mulitple times additional data", "index": "mulitple times additional data/keyword", "actual_name": "mulitple times additional data", "type": "Keyword", "relevance": 0.18088061045762382, "frequency": 1, "totalfrequency": 12, "doccount": 12, "dimension": "What" },
    { "disambiguated_name": "scheduling sheets", "index": "scheduling sheets/keyword", "actual_name": "scheduling sheets", "type": "Keyword", "relevance": 0.15086086188384693, "frequency": 1, "totalfrequency": 20, "doccount": 20, "dimension": "What" },
    { "disambiguated_name": "app", "index": "app/keyword", "actual_name": "app", "type": "Keyword", "relevance": 0.20415634782171557, "frequency": 1, "totalfrequency": 58, "doccount": 58, "dimension": "What" },
    { "disambiguated_name": "data", "index": "data/keyword", "actual_name": "data", "type": "Keyword", "relevance": 0.1361375118885727, "frequency": 1, "totalfrequency": 3323, "doccount": 3323, "dimension": "What" },
    { "disambiguated_name": "paths", "index": "paths/keyword", "actual_name": "paths", "type": "Keyword", "relevance": 0.2041916488834702, "frequency": 1, "totalfrequency": 99, "doccount": 99, "dimension": "What" },
    { "disambiguated_name": "build route report", "index": "build route report/keyword", "actual_name": "build route report", "type": "Keyword", "relevance": 0.11476307758997932, "frequency": 1, "totalfrequency": 36, "doccount": 36, "dimension": "What" },
    { "disambiguated_name": "testing preschedule workspace cara", "index": "testing preschedule workspace cara/keyword", "actual_name": "testing preschedule workspace cara", "type": "Keyword", "relevance": 0.16803833041631702, "frequency": 1, "totalfrequency": 8, "doccount": 8, "dimension": "What" },
    { "disambiguated_name": "physical power flow", "index": "physical power flow/keyword", "actual_name": "physical power flow", "type": "Keyword", "relevance": 0.11805512187037151, "frequency": 1, "totalfrequency": 17, "doccount": 17, "dimension": "What" },
    { "disambiguated_name": "i", "index": "i/keyword", "actual_name": "i", "type": "Keyword", "relevance": 0.13651904141534263, "frequency": 1, "totalfrequency": 18162, "doccount": 18162, "dimension": "What" },
    { "disambiguated_name": "total running time", "index": "total running time/keyword", "actual_name": "total running time", "type": "Keyword", "relevance": 0.11233232851584997, "frequency": 1, "totalfrequency": 10, "doccount": 10, "dimension": "What" },
    { "disambiguated_name": "time", "index": "time/keyword", "actual_name": "time", "type": "Keyword", "relevance": 0.34020922533185516, "frequency": 1, "totalfrequency": 17102, "doccount": 17102, "dimension": "What" },
    { "disambiguated_name": "psw", "index": "psw/keyword", "actual_name": "psw", "type": "Keyword", "relevance": 0.13625985262266815, "frequency": 1, "totalfrequency": 46, "doccount": 46, "dimension": "What" },
    { "disambiguated_name": "semperger", "index": "semperger/keyword", "actual_name": "semperger", "type": "Keyword", "relevance": 0.2724417241053495, "frequency": 1, "totalfrequency": 226, "doccount": 226, "dimension": "What" },
    { "disambiguated_name": "peak non shaped deals", "index": "peak non shaped deals/keyword", "actual_name": "peak non shaped deals", "type": "Keyword", "relevance": 0.19127581970645322, "frequency": 1, "totalfrequency": 12, "doccount": 12, "dimension": "What" },
    { "disambiguated_name": "table edit", "index": "table edit/keyword", "actual_name": "table edit", "type": "Keyword", "relevance": 0.21207334129182112, "frequency": 1, "totalfrequency": 32, "doccount": 32, "dimension": "What" },
    { "disambiguated_name": "week mask activated-report changes", "index": "week mask activated-report changes/keyword", "actual_name": "week mask activated-report changes", "type": "Keyword", "relevance": 0.1484580867667756, "frequency": 1, "totalfrequency": 12, "doccount": 12, "dimension": "What" },
    { "disambiguated_name": "excel macro system", "index": "excel macro system/keyword", "actual_name": "excel macro system", "type": "Keyword", "relevance": 0.12208201691477336, "frequency": 1, "totalfrequency": 12, "doccount": 12, "dimension": "What" },
    { "disambiguated_name": "real time position manager", "index": "real time position manager/keyword", "actual_name": "real time position manager", "type": "Keyword", "relevance": 0.19213464212989614, "frequency": 1, "totalfrequency": 39, "doccount": 39, "dimension": "What" },
    { "disambiguated_name": "testing preschedule workspace", "index": "testing preschedule workspace/keyword", "actual_name": "testing preschedule workspace", "type": "Keyword", "relevance": 0.17652180791002264, "frequency": 1, "totalfrequency": 12, "doccount": 12, "dimension": "What" },
    { "disambiguated_name": "cara", "index": "cara/keyword", "actual_name": "cara", "type": "Keyword", "relevance": 0.20414801224595303, "frequency": 1, "totalfrequency": 736, "doccount": 736, "dimension": "What" },
    { "disambiguated_name": "smith", "index": "smith/keyword", "actual_name": "smith", "type": "Keyword", "relevance": 0.27217844252943296, "frequency": 1, "totalfrequency": 783, "doccount": 783, "dimension": "What" },
    { "disambiguated_name": "david subject", "index": "david subject/keyword", "actual_name": "david subject", "type": "Keyword", "relevance": 0.15139765579194864, "frequency": 1, "totalfrequency": 930, "doccount": 930, "dimension": "What" },
    { "disambiguated_name": "sheet", "index": "sheet/keyword", "actual_name": "sheet", "type": "Keyword", "relevance": 0.20416968108320477, "frequency": 1, "totalfrequency": 436, "doccount": 436, "dimension": "What" },
    { "disambiguated_name": "total unrouted mwh", "index": "total unrouted mwh/keyword", "actual_name": "total unrouted mwh", "type": "Keyword", "relevance": 0.1141385057566826, "frequency": 1, "totalfrequency": 16, "doccount": 16, "dimension": "What" },
    { "disambiguated_name": "target testing date today", "index": "target testing date today/keyword", "actual_name": "target testing date today", "type": "Keyword", "relevance": 0.18726422286448255, "frequency": 1, "totalfrequency": 12, "doccount": 12, "dimension": "What" },
    { "disambiguated_name": "deals", "index": "deals/keyword", "actual_name": "deals", "type": "Keyword", "relevance": 0.34025706056156424, "frequency": 1, "totalfrequency": 5740, "doccount": 5261, "dimension": "What" },
    { "disambiguated_name": "double checking build route", "index": "double checking build route/keyword", "actual_name": "double checking build route", "type": "Keyword", "relevance": 0.18886230001363824, "frequency": 1, "totalfrequency": 12, "doccount": 12, "dimension": "What" },
    { "disambiguated_name": "path confirmation task", "index": "path confirmation task/keyword", "actual_name": "path confirmation task", "type": "Keyword", "relevance": 0.12326679747563907, "frequency": 1, "totalfrequency": 16, "doccount": 16, "dimension": "What" },
    { "disambiguated_name": "routes", "index": "routes/keyword", "actual_name": "routes", "type": "Keyword", "relevance": 0.40825322818399834, "frequency": 1, "totalfrequency": 142, "doccount": 142, "dimension": "What" },
    { "disambiguated_name": "west light load", "index": "west light load/keyword", "actual_name": "west light load", "type": "Keyword", "relevance": 0.11288042191103252, "frequency": 1, "totalfrequency": 16, "doccount": 16, "dimension": "What" },
    { "disambiguated_name": "rows", "index": "rows/keyword", "actual_name": "rows", "type": "Keyword", "relevance": 0.2721919612854695, "frequency": 1, "totalfrequency": 72, "doccount": 72, "dimension": "What" },
    { "disambiguated_name": "path confirmation", "index": "path confirmation/keyword", "actual_name": "path confirmation", "type": "Keyword", "relevance": 0.2124247462661659, "frequency": 1, "totalfrequency": 169, "doccount": 169, "dimension": "What" },
    { "disambiguated_name": "month long bookouts", "index": "month long bookouts/keyword", "actual_name": "month long bookouts", "type": "Keyword", "relevance": 0.12514486683483175, "frequency": 1, "totalfrequency": 18, "doccount": 18, "dimension": "What" },
    { "disambiguated_name": "deal number duplication", "index": "deal number duplication/keyword", "actual_name": "deal number duplication", "type": "Keyword", "relevance": 0.12910499876653425, "frequency": 1, "totalfrequency": 16, "doccount": 16, "dimension": "What" },
    { "disambiguated_name": "minutes", "index": "minutes/keyword", "actual_name": "minutes", "type": "Keyword", "relevance": 0.13613482658399254, "frequency": 1, "totalfrequency": 1234, "doccount": 1172, "dimension": "What" },
    { "disambiguated_name": "cara.semperger@enron.com", "index": "cara.semperger@enron.com/account", "actual_name": "cara.semperger@enron.com", "type": "Account", "relevance": 0, "frequency": 1, "totalfrequency": 3251, "doccount": 3251, "dimension": "What" },
    { "disambiguated_name": "will.smith@enron.com", "index": "will.smith@enron.com/account", "actual_name": "will.smith@enron.com", "type": "Account", "relevance": 0, "frequency": 1, "totalfrequency": 408, "doccount": 408, "dimension": "What" }
  ],
  "tags": [ "enron", "email", "fraud" ],
  "communityId": [ "500df237e4b00e332fe993aa" ],
  "associations": [
    {
      "entity1": "cara.semperger@enron.com",
      "entity1_index": "cara.semperger@enron.com/account",
      "verb": "emailed",
      "verb_category": "emailed/communicated",
      "entity2": "will.smith@enron.com",
      "entity2_index": "will.smith@enron.com/account",
      "time_start": "2001-07-09T14:33:32",
      "assoc_type": "Event"
    }
  ],
  "metadata": {
    "_FILE_METADATA_": [
      [
        {
          "metadata": {
            "Creation-Date": [ "2001-07-09T18:33:32Z" ],
            "subject": [ "RE: Testing Preschedule workspace" ],
            "Message-From": [ "cara.semperger@enron.com" ],
            "Author": [ "cara.semperger@enron.com" ],
            "Message-To": [ "will.smith@enron.com" ],
            "date": [ "2001-07-09T18:33:32Z" ],
            "Content-Type": [ "message/rfc822" ]
          }
        }
      ]
    ],
    "email_meta": [
      [
        {
          "Creation-Date": [ "2001-07-09T18:33:32Z" ],
          "Message-To": [ "will.smith@enron.com" ],
          "Content-Type": [ "message/rfc822" ],
          "subject": [ "RE: Testing Preschedule workspace" ],
          "date": [ "2001-07-09T18:33:32Z" ],
          "Author": [ "cara.semperger@enron.com" ],
          "Message-From": [ "cara.semperger@enron.com" ]
        }
      ]
    ]
  }
}
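Each Keyword entity in the output document carries a relevance score, while the two Account entities created by the source have relevance 0. As a rough illustration of how a consumer might read these fields, the Python sketch below ranks keywords by relevance. It uses an abridged copy of five entities from the document above; the function name `top_keywords` is hypothetical and not part of any platform API.

```python
import json

# A few entities copied from the sample output document (abridged to the
# fields this sketch needs).
doc = json.loads("""
{"entities": [
  {"disambiguated_name": "routes", "type": "Keyword", "relevance": 0.40825322818399834},
  {"disambiguated_name": "deals", "type": "Keyword", "relevance": 0.34025706056156424},
  {"disambiguated_name": "time", "type": "Keyword", "relevance": 0.34020922533185516},
  {"disambiguated_name": "i", "type": "Keyword", "relevance": 0.13651904141534263},
  {"disambiguated_name": "cara.semperger@enron.com", "type": "Account", "relevance": 0}
]}
""")


def top_keywords(document, n=3):
    """Return the names of the n Keyword entities with the highest relevance."""
    keywords = [e for e in document["entities"] if e["type"] == "Keyword"]
    keywords.sort(key=lambda e: e["relevance"], reverse=True)
    return [e["disambiguated_name"] for e in keywords[:n]]


print(top_keywords(doc))  # ['routes', 'deals', 'time']
```

Filtering on `type` keeps the Account entities out of the ranking, since their zero relevance reflects how they were created rather than how important they are.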
Annex - Old Format Source
{
  "description": "All of the Enron emails corpus with TextRank keyword extraction enabled.",
  "extractType": "File",
  "file": {
    "domain": "DOMAIN",
    "password": "PASSWORD",
    "username": "USER"
  },
  "isPublic": true,
  "mediaType": "Email",
  "searchCycle_secs": -1,
  "structuredAnalysis": {
    "associations": [
      {
        "associations": [
          {
            "assoc_type": "Event",
            "entity1": "$SCRIPT( return _doc.metadata._FILE_METADATA_[0].metadata.Author[0];)",
            "entity2": "$SCRIPT(return _value;)",
            "iterateOver": "Message-To",
            "time_start": "$SCRIPT( return _doc.publishedDate;)",
            "verb": "emailed",
            "verb_category": "emailed/communicated"
          }
        ],
        "iterateOver": "email_meta"
      }
    ],
    "entities": [
      {
        "dimension": "What",
        "disambiguated_name": "$SCRIPT( return _doc.metadata._FILE_METADATA_[0].metadata.Author[0];)",
        "type": "Account",
        "useDocGeo": false
      },
      {
        "entities": [
          {
            "dimension": "What",
            "disambiguated_name": "",
            "iterateOver": "Message-To",
            "type": "Account",
            "useDocGeo": false
          }
        ],
        "iterateOver": "email_meta"
      }
    ],
    "scriptEngine": "JavaScript",
    "title": "$SCRIPT( return _doc.metadata._FILE_METADATA_[0].metadata.subject[0];)"
  },
  "tags": [ "enron", "email", "fraud" ],
  "title": "All Enron Emails (TextRank)",
  "unstructuredAnalysis": {
    "meta": [
      {
        "context": "All",
        "fieldName": "email_meta",
        "flags": "m",
        "script": "var x=_metadata._FILE_METADATA_[0].metadata;x;",
        "scriptlang": "javascript"
      }
    ],
    "simpleTextCleanser": [
      { "field": "fullText", "flags": "md", "replacement": " ", "script": "(?:\\[.*?\\])", "scriptlang": "regex" },
      { "field": "description", "flags": "md", "replacement": " ", "script": "(?:\\[.*?\\])", "scriptlang": "regex" },
      { "field": "fullText", "flags": "md", "replacement": ". ", "script": "<.*?>", "scriptlang": "regex" },
      { "field": "description", "flags": "md", "replacement": ". ", "script": "<.*?>", "scriptlang": "regex" },
      { "field": "fullText", "flags": "md", "replacement": ". ", "script": "(?:>|<)", "scriptlang": "regex" },
      { "field": "description", "flags": "md", "replacement": ". ", "script": "(?:>|<)", "scriptlang": "regex" },
      { "field": "fullText", "replacement": " ", "script": "(?:[-]{4,}(.*[-]{4,}|\\n))", "scriptlang": "regex" },
      { "field": "description", "replacement": " ", "script": "(?:[-]{4,}(.*[-]{4,}|\\n))", "scriptlang": "regex" },
      { "field": "fullText", "replacement": " ", "script": "(?:\\*{2,})", "scriptlang": "regex" },
      { "field": "description", "replacement": " ", "script": "(?:\\*{2,})", "scriptlang": "regex" }
    ]
  },
  "url": "smb://modus:139/enron/enron_mail_20110402/maildir/",
  "useExtractor": "textrank",
  "useTextExtractor": "none"
}
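Comparing the old-format source with the pipeline version earlier on this page suggests a fairly mechanical correspondence: the top-level `url` moves inside the `file` block, each `simpleTextCleanser` rule becomes a `text` rule with `field` renamed to `fieldName`, and `useExtractor` becomes a `featureEngine` stage. The Python sketch below covers only those three pieces. It is inferred solely from the two examples shown here, not from any official migration tool, and the function name `old_to_pipeline` and the trimmed `old_source` sample are hypothetical.

```python
def old_to_pipeline(old: dict) -> list:
    """Build a partial processingPipeline list from an old-format source.

    Covers only the file, text and featureEngine stages; docMetadata,
    contentMetadata, entities and associations are left out of this sketch.
    """
    pipeline = []

    # The old format keeps "url" at the top level; the pipeline format
    # shown on this page nests it inside the "file" block.
    file_block = dict(old.get("file", {}))
    if "url" in old:
        file_block["url"] = old["url"]
    pipeline.append({"file": file_block})

    # simpleTextCleanser rules become "text" rules: "field" -> "fieldName".
    rules = old.get("unstructuredAnalysis", {}).get("simpleTextCleanser", [])
    text_rules = []
    for rule in rules:
        new_rule = {k: v for k, v in rule.items() if k != "field"}
        new_rule["fieldName"] = rule["field"]
        text_rules.append(new_rule)
    if text_rules:
        pipeline.append({"text": text_rules})

    # useExtractor becomes a featureEngine stage.
    if old.get("useExtractor"):
        pipeline.append({"featureEngine": {"engineName": old["useExtractor"]}})

    return pipeline


# Trimmed-down old-format source, using values from the annex above.
old_source = {
    "file": {"domain": "DOMAIN", "password": "PASSWORD", "username": "USER"},
    "url": "smb://modus:139/enron/enron_mail_20110402/maildir/",
    "unstructuredAnalysis": {
        "simpleTextCleanser": [
            {"field": "fullText", "flags": "md", "replacement": ". ",
             "script": "<.*?>", "scriptlang": "regex"},
        ]
    },
    "useExtractor": "textrank",
}

for stage in old_to_pipeline(old_source):
    print(stage)
```

The nested `iterateOver` blocks under `structuredAnalysis` appear to flatten into dotted paths such as `email_meta.Message-To` in the pipeline format, but that step is not attempted here.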