Sample input document
Code Block | ||
---|---|---|
| ||
Message-ID: <32220443.1075841552668.JavaMail.evans@thyme> Date: Mon, 9 Jul 2001 11:33:32 -0700 (PDT) From: cara.semperger@enron.com To: will.smith@enron.com Subject: RE: Testing Preschedule workspace Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: quoted-printable X-From: Semperger, Cara </O=ENRON/OU=NA/CN=RECIPIENTS/CN=CSEMPER> X-To: Smith, Will </O=ENRON/OU=NA/CN=RECIPIENTS/CN=Wsmith> X-cc: X-bcc: X-Folder: \ExMerge - Semperger, Cara\Deleted Items X-Origin: SEMPERGER-C X-FileName: cara semperger 6-26-02.PST I am trying to pull it up now, it's taking a long time -----Original Message----- From: =09Semperger, Cara =20 Sent:=09Monday, July 09, 2001 10:40 AM To:=09Smith, Will; Atta, Asem Cc:=09Bentley, Corry; Poston, David Subject:=09Testing Preschedule workspace Good Morning, My target testing date today is June 18th, I am running in Test P in Local= Enpower using actual data from our scheduling sheets re-arranged to meet t= he new guidelines. The daily deals I coded X in columns J and N, the Month long bookouts and = BOM bookouts I coded R. =20 What worked: I was able to retrieve my saved workspace with all data intact. I had previ= ously sucessfully copied and pasted my entire sheet from EXCEL to the PSW. I was able to run the build route report with the criteria of "Starting On-= June 18-PaloVerde-Day of week Mask Activated-Report Changes activated." A = check of deals actually scheduled vs. build route results showed that all d= eals were extracted correctly from Enpower. Because I am working on closed = dates, a cumulative test of this app will not be fully testable until produ= ction. We are expecting to see the same functionality as the current incarn= ation of Build route. The data extracted should be read only, time stamped,= and when run mulitple times additional data should be shown below previous= ly extracted data. 
The improvement we are expecting to see is the app shou= ld not duplicate deal strips on dates that have no physical power flow. (We= st Light Load currently does this in Start view, but not Active view) Navigating around the scheduling sheet itself I was able to accurately exec= ute the sort function on a single criteria at a time. Multiple sorting will= contunue to be done in excel, or we can do a series of single sorts in the= PSW to acheive the same result. Routing deals: Will had deleted all routes for June 18th, starting me with = a clean slate. I made every path be for DAY. I was unable to confirm total= unrouted MWH, as the real time position manager does not seem to be functi= oning in TESTP. The routing appeared to take 19 minutes with the status bar= showing steady progress during that time. This time is 15-17 minutes longe= r than current speed using the Excel Macro system we have now. The error li= st gave me a row by row description of what did not route, a very useful to= ol. OK was visible on all rows that the PSW believed that it had routed. I= had difficulty checking the routing results, as I kept getting BDE errors = in Scheduling after routing had occurred (Local Enpower). Scheduling kept s= tarting up in 1899. I was unable to login to TestP through Terminal server= 2, but was able to in Terminal Server 5. The results there were very encou= raging! Most routing was done, and a spot check of deals shows that they we= re routed properly. The deals that were not routed appear to be due to a us= er error of deal number duplication in the sheet. This is consistent with w= hat I would expect. I will further evaluate routing ability with our more c= omplicated paths later. This routing was very easy, a large point with on p= eak non shaped deals only. Things I did not expect that I liked: When I highlight a group of cells in Build Route, it stays highlighted when= I move up to the scheduling sheet to highlight a comparison group of cells= . 
This is very handy for double checking Build route against the scheduler= 's sheet. What does not appear to be working at this time: The physical or not physical flag of path does not seem to be showing up pr= operly in routing. Path Confirmation: The running time appeared to be over one hour for one s= heet, only 70 rows of the sheet being flagged for insertion into confirmati= on. This current speed will not be sufficient to work in production. Also, = many rows that were flagged for confirmation were not imported, and I canno= t find an error log to help understand why deals were not imported to path = confirmation. When the path confirmation task was finished, the application simply froze= . The status bar was no longer visible, leading me to believe that it was = done, however the app was not able to be saved or closed or examined furthe= r. My conclusions: The build route and routing functions work well enough to use in production= , the copy-paste function works well for the West desk per our connectivity= issues. Path Confirmation is not functioning at this point, and appears to be blowi= ng up the app. No data was visible for June 18th even after the PSW ran thr= ough its import function. Please let me know when the issues I have named have been addressed and are= ready for further testing. Thanks Cara 503/464-3814 |
Source
Code Block | ||
---|---|---|
| ||
{
  "description": "All of the Enron emails corpus with TextRank keyword extraction enabled.",
  "isPublic": true,
  "mediaType": "Email",
  "searchCycle_secs": -1,
  "tags": [ "enron", "email", "fraud" ],
  "title": "All Enron Emails (TextRank)",
  "processingPipeline": [
    { "file": { "domain": "DOMAIN", "password": "PASSWORD", "username": "USER", "url": "smb://modus:139/enron/enron_mail_20110402/maildir/" } },
    { "harvest": { "searchCycle_secs": 1 } },
    { "docMetadata": { "title": "$SCRIPT( return _doc.metadata._FILE_METADATA_[0].metadata.subject[0];)" } },
    { "text": [
      { "fieldName": "fullText", "script": "(?:\\[.*?\\])", "scriptlang": "regex", "flags": "md", "replacement": " " },
      { "fieldName": "description", "script": "(?:\\[.*?\\])", "scriptlang": "regex", "flags": "md", "replacement": " " },
      { "fieldName": "fullText", "script": "<.*?>", "scriptlang": "regex", "flags": "md", "replacement": ". " },
      { "fieldName": "description", "script": "<.*?>", "scriptlang": "regex", "flags": "md", "replacement": ". " },
      { "fieldName": "fullText", "script": "(?:>|<)", "scriptlang": "regex", "flags": "md", "replacement": ". " },
      { "fieldName": "description", "script": "(?:>|<)", "scriptlang": "regex", "flags": "md", "replacement": ". " },
      { "fieldName": "fullText", "script": "(?:[-]{4,}(.*[-]{4,}|\\n))", "scriptlang": "regex", "replacement": " " },
      { "fieldName": "description", "script": "(?:[-]{4,}(.*[-]{4,}|\\n))", "scriptlang": "regex", "replacement": " " },
      { "fieldName": "fullText", "script": "(?:\\*{2,})", "scriptlang": "regex", "replacement": " " },
      { "fieldName": "description", "script": "(?:\\*{2,})", "scriptlang": "regex", "replacement": " " }
    ] },
    { "contentMetadata": [
      { "fieldName": "email_meta", "script": "var x=_metadata._FILE_METADATA_[0].metadata;x;", "scriptlang": "javascript", "flags": "m" }
    ] },
    { "featureEngine": { "engineName": "textrank" } },
    { "entities": [
      { "dimension": "What", "disambiguated_name": "$SCRIPT( return _doc.metadata._FILE_METADATA_[0].metadata.Author[0];)", "type": "Account", "useDocGeo": false },
      { "dimension": "What", "disambiguated_name": "", "iterateOver": "email_meta.Message-To", "type": "Account", "useDocGeo": false }
    ] },
    { "associations": [
      { "assoc_type": "Event", "entity1": "$SCRIPT( return _doc.metadata._FILE_METADATA_[0].metadata.Author[0];)", "entity2": "$SCRIPT(return _value;)", "iterateOver": "email_meta.Message-To", "time_start": "$SCRIPT( return _doc.publishedDate;)", "verb": "emailed", "verb_category": "emailed/communicated" }
    ] }
  ]
} |
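The text-cleansing rules in the source configuration are a chain of regex substitutions applied in order. The Python sketch below shows roughly what that chain does to one string. It is an illustration, not the harvester's implementation: it assumes the `m` and `d` flags correspond to multiline and dot-matches-newline, the helper names `compile_flags` and `cleanse` are hypothetical, and the sample string is made up from fragments of the input email. The configuration applies each rule to both `fullText` and `description`; the sketch applies them to a single string.

```python
import re

# Assumption: the config flags "m" and "d" map to multiline and dotall matching.
FLAG_MAP = {"m": re.MULTILINE, "d": re.DOTALL}

# (pattern, flags, replacement) transcribed from the cleansing rules in the source config,
# with the JSON string escaping removed.
RULES = [
    (r"(?:\[.*?\])", "md", " "),                 # drop [bracketed] spans
    (r"<.*?>", "md", ". "),                      # drop <angle-bracketed> spans
    (r"(?:>|<)", "md", ". "),                    # drop stray angle brackets
    (r"(?:[-]{4,}(.*[-]{4,}|\n))", "", " "),     # drop "-----Original Message-----" style rules
    (r"(?:\*{2,})", "", " "),                    # drop runs of asterisks
]


def compile_flags(flags: str) -> int:
    """Translate a flag string such as "md" into a combined re flag value."""
    value = 0
    for ch in flags:
        value |= FLAG_MAP[ch]
    return value


def cleanse(text: str) -> str:
    """Apply every rule in order, feeding each result into the next rule."""
    for pattern, flags, replacement in RULES:
        text = re.sub(pattern, replacement, text, flags=compile_flags(flags))
    return text


# Hypothetical sample built from fragments of the input email.
sample = (
    "it's taking a long time\n"
    "-----Original Message-----\n"
    "From: Semperger, Cara <cara.semperger@enron.com> **note**"
)
cleaned = cleanse(sample)
print(repr(cleaned))
```

Running the rules in sequence matters: the `<.*?>` rule removes whole bracketed spans first, so the later `(?:>|<)` rule only has to mop up unmatched brackets.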
Sample output document
Code Block | ||
---|---|---|
| ||
{
  "_id": "5048efb0e4b01fd6455420ee",
  "title": "RE: Testing Preschedule workspace",
  "url": "smb://modus:139/enron/testing/semperger-c/deleted_items/37QTKE~3",
  "created": "Sep 6, 2012 06:42:01 PM UTC",
  "modified": "Jul 24, 2012 01:13:02 AM UTC",
  "publishedDate": "Jul 9, 2001 06:33:32 PM UTC",
  "source": [ "Enron Emails (TextRank)" ],
  "sourceKey": [ "modus.139.enron.testing.." ],
  "mediaType": [ "Email" ],
  "description": "I am trying to pull it up now, it's taking a long time\r\n\r\n \r\nFrom: \tSmith, Will \r\nSent:\tMonday, July 09, 2001 11:28 AM\r\nTo:\tSemperger, Cara\r\nSubject:\tRE: Testing Preschedule workspace\r\n\r\nYes, but Vish made the changes in Table Edit. : - )\r\n\r\nWill\r\n\r\n \r\nFrom: \tSemperger, Cara \r\nSent:\tMonday, July 09, 2001 1:20 PM\r\nTo:\tSmith, Will\r\nSubject:\tRE: Testing Preschedule workspace\r\n\r\nSo, this table edit that Brett is asking me to test is really from ",
  "entities": [
    { "disambiguated_name": "on- june 18-paloverde-day", "index": "on- june 18-paloverde-day/keyword", "actual_name": "on- june 18-paloverde-day", "type": "Keyword", "relevance": 0.10585404743253149, "frequency": 1, "totalfrequency": 12, "doccount": 12, "dimension": "What" },
    { "disambiguated_name": "mulitple times additional data", "index": "mulitple times additional data/keyword", "actual_name": "mulitple times additional data", "type": "Keyword", "relevance": 0.18088061045762382, "frequency": 1, "totalfrequency": 12, "doccount": 12, "dimension": "What" },
    { "disambiguated_name": "scheduling sheets", "index": "scheduling sheets/keyword", "actual_name": "scheduling sheets", "type": "Keyword", "relevance": 0.15086086188384693, "frequency": 1, "totalfrequency": 20, "doccount": 20, "dimension": "What" },
    { "disambiguated_name": "app", "index": "app/keyword", "actual_name": "app", "type": "Keyword", "relevance": 0.20415634782171557, "frequency": 1, "totalfrequency": 58, "doccount": 58, "dimension": "What" },
    { "disambiguated_name": "data", "index": "data/keyword", "actual_name": "data", "type": "Keyword", "relevance": 0.1361375118885727, "frequency": 1, "totalfrequency": 3323, "doccount": 3323, "dimension": "What" },
    { "disambiguated_name": "paths", "index": "paths/keyword", "actual_name": "paths", "type": "Keyword", "relevance": 0.2041916488834702, "frequency": 1, "totalfrequency": 99, "doccount": 99, "dimension": "What" },
    { "disambiguated_name": "build route report", "index": "build route report/keyword", "actual_name": "build route report", "type": "Keyword", "relevance": 0.11476307758997932, "frequency": 1, "totalfrequency": 36, "doccount": 36, "dimension": "What" },
    { "disambiguated_name": "testing preschedule workspace cara", "index": "testing preschedule workspace cara/keyword", "actual_name": "testing preschedule workspace cara", "type": "Keyword", "relevance": 0.16803833041631702, "frequency": 1, "totalfrequency": 8, "doccount": 8, "dimension": "What" },
    { "disambiguated_name": "physical power flow", "index": "physical power flow/keyword", "actual_name": "physical power flow", "type": "Keyword", "relevance": 0.11805512187037151, "frequency": 1, "totalfrequency": 17, "doccount": 17, "dimension": "What" },
    { "disambiguated_name": "i", "index": "i/keyword", "actual_name": "i", "type": "Keyword", "relevance": 0.13651904141534263, "frequency": 1, "totalfrequency": 18162, "doccount": 18162, "dimension": "What" },
    { "disambiguated_name": "total running time", "index": "total running time/keyword", "actual_name": "total running time", "type": "Keyword", "relevance": 0.11233232851584997, "frequency": 1, "totalfrequency": 10, "doccount": 10, "dimension": "What" },
    { "disambiguated_name": "time", "index": "time/keyword", "actual_name": "time", "type": "Keyword", "relevance": 0.34020922533185516, "frequency": 1, "totalfrequency": 17102, "doccount": 17102, "dimension": "What" },
    { "disambiguated_name": "psw", "index": "psw/keyword", "actual_name": "psw", "type": "Keyword", "relevance": 0.13625985262266815, "frequency": 1, "totalfrequency": 46, "doccount": 46, "dimension": "What" },
    { "disambiguated_name": "semperger", "index": "semperger/keyword", "actual_name": "semperger", "type": "Keyword", "relevance": 0.2724417241053495, "frequency": 1, "totalfrequency": 226, "doccount": 226, "dimension": "What" },
    { "disambiguated_name": "peak non shaped deals", "index": "peak non shaped deals/keyword", "actual_name": "peak non shaped deals", "type": "Keyword", "relevance": 0.19127581970645322, "frequency": 1, "totalfrequency": 12, "doccount": 12, "dimension": "What" },
    { "disambiguated_name": "table edit", "index": "table edit/keyword", "actual_name": "table edit", "type": "Keyword", "relevance": 0.21207334129182112, "frequency": 1, "totalfrequency": 32, "doccount": 32, "dimension": "What" },
    { "disambiguated_name": "week mask activated-report changes", "index": "week mask activated-report changes/keyword", "actual_name": "week mask activated-report changes", "type": "Keyword", "relevance": 0.1484580867667756, "frequency": 1, "totalfrequency": 12, "doccount": 12, "dimension": "What" },
    { "disambiguated_name": "excel macro system", "index": "excel macro system/keyword", "actual_name": "excel macro system", "type": "Keyword", "relevance": 0.12208201691477336, "frequency": 1, "totalfrequency": 12, "doccount": 12, "dimension": "What" },
    { "disambiguated_name": "real time position manager", "index": "real time position manager/keyword", "actual_name": "real time position manager", "type": "Keyword", "relevance": 0.19213464212989614, "frequency": 1, "totalfrequency": 39, "doccount": 39, "dimension": "What" },
    { "disambiguated_name": "testing preschedule workspace", "index": "testing preschedule workspace/keyword", "actual_name": "testing preschedule workspace", "type": "Keyword", "relevance": 0.17652180791002264, "frequency": 1, "totalfrequency": 12, "doccount": 12, "dimension": "What" },
    { "disambiguated_name": "cara", "index": "cara/keyword", "actual_name": "cara", "type": "Keyword", "relevance": 0.20414801224595303, "frequency": 1, "totalfrequency": 736, "doccount": 736, "dimension": "What" },
    { "disambiguated_name": "smith", "index": "smith/keyword", "actual_name": "smith", "type": "Keyword", "relevance": 0.27217844252943296, "frequency": 1, "totalfrequency": 783, "doccount": 783, "dimension": "What" },
    { "disambiguated_name": "david subject", "index": "david subject/keyword", "actual_name": "david subject", "type": "Keyword", "relevance": 0.15139765579194864, "frequency": 1, "totalfrequency": 930, "doccount": 930, "dimension": "What" },
    { "disambiguated_name": "sheet", "index": "sheet/keyword", "actual_name": "sheet", "type": "Keyword", "relevance": 0.20416968108320477, "frequency": 1, "totalfrequency": 436, "doccount": 436, "dimension": "What" },
    { "disambiguated_name": "total unrouted mwh", "index": "total unrouted mwh/keyword", "actual_name": "total unrouted mwh", "type": "Keyword", "relevance": 0.1141385057566826, "frequency": 1, "totalfrequency": 16, "doccount": 16, "dimension": "What" },
    { "disambiguated_name": "target testing date today", "index": "target testing date today/keyword", "actual_name": "target testing date today", "type": "Keyword", "relevance": 0.18726422286448255, "frequency": 1, "totalfrequency": 12, "doccount": 12, "dimension": "What" },
    { "disambiguated_name": "deals", "index": "deals/keyword", "actual_name": "deals", "type": "Keyword", "relevance": 0.34025706056156424, "frequency": 1, "totalfrequency": 5740, "doccount": 5261, "dimension": "What" },
    { "disambiguated_name": "double checking build route", "index": "double checking build route/keyword", "actual_name": "double checking build route", "type": "Keyword", "relevance": 0.18886230001363824, "frequency": 1, "totalfrequency": 12, "doccount": 12, "dimension": "What" },
    { "disambiguated_name": "path confirmation task", "index": "path confirmation task/keyword", "actual_name": "path confirmation task", "type": "Keyword", "relevance": 0.12326679747563907, "frequency": 1, "totalfrequency": 16, "doccount": 16, "dimension": "What" },
    { "disambiguated_name": "routes", "index": "routes/keyword", "actual_name": "routes", "type": "Keyword", "relevance": 0.40825322818399834, "frequency": 1, "totalfrequency": 142, "doccount": 142, "dimension": "What" },
    { "disambiguated_name": "west light load", "index": "west light load/keyword", "actual_name": "west light load", "type": "Keyword", "relevance": 0.11288042191103252, "frequency": 1, "totalfrequency": 16, "doccount": 16, "dimension": "What" },
    { "disambiguated_name": "rows", "index": "rows/keyword", "actual_name": "rows", "type": "Keyword", "relevance": 0.2721919612854695, "frequency": 1, "totalfrequency": 72, "doccount": 72, "dimension": "What" },
    { "disambiguated_name": "path confirmation", "index": "path confirmation/keyword", "actual_name": "path confirmation", "type": "Keyword", "relevance": 0.2124247462661659, "frequency": 1, "totalfrequency": 169, "doccount": 169, "dimension": "What" },
    { "disambiguated_name": "month long bookouts", "index": "month long bookouts/keyword", "actual_name": "month long bookouts", "type": "Keyword", "relevance": 0.12514486683483175, "frequency": 1, "totalfrequency": 18, "doccount": 18, "dimension": "What" },
    { "disambiguated_name": "deal number duplication", "index": "deal number duplication/keyword", "actual_name": "deal number duplication", "type": "Keyword", "relevance": 0.12910499876653425, "frequency": 1, "totalfrequency": 16, "doccount": 16, "dimension": "What" },
    { "disambiguated_name": "minutes", "index": "minutes/keyword", "actual_name": "minutes", "type": "Keyword", "relevance": 0.13613482658399254, "frequency": 1, "totalfrequency": 1234, "doccount": 1172, "dimension": "What" },
    { "disambiguated_name": "cara.semperger@enron.com", "index": "cara.semperger@enron.com/account", "actual_name": "cara.semperger@enron.com", "type": "Account", "relevance": 0, "frequency": 1, "totalfrequency": 3251, "doccount": 3251, "dimension": "What" },
    { "disambiguated_name": "will.smith@enron.com", "index": "will.smith@enron.com/account", "actual_name": "will.smith@enron.com", "type": "Account", "relevance": 0, "frequency": 1, "totalfrequency": 408, "doccount": 408, "dimension": "What" }
  ],
  "tags": [ "enron", "email", "fraud" ],
  "communityId": [ "500df237e4b00e332fe993aa" ],
  "associations": [
    { "entity1": "cara.semperger@enron.com", "entity1_index": "cara.semperger@enron.com/account", "verb": "emailed", "verb_category": "emailed/communicated", "entity2": "will.smith@enron.com", "entity2_index": "will.smith@enron.com/account", "time_start": "2001-07-09T14:33:32", "assoc_type": "Event" }
  ],
  "metadata": {
    "_FILE_METADATA_": [ [ {
      "metadata": {
        "Creation-Date": [ "2001-07-09T18:33:32Z" ],
        "subject": [ "RE: Testing Preschedule workspace" ],
        "Message-From": [ "cara.semperger@enron.com" ],
        "Author": [ "cara.semperger@enron.com" ],
        "Message-To": [ "will.smith@enron.com" ],
        "date": [ "2001-07-09T18:33:32Z" ],
        "Content-Type": [ "message/rfc822" ]
      }
    } ] ],
    "email_meta": [ [ {
      "Creation-Date": [ "2001-07-09T18:33:32Z" ],
      "Message-To": [ "will.smith@enron.com" ],
      "Content-Type": [ "message/rfc822" ],
      "subject": [ "RE: Testing Preschedule workspace" ],
      "date": [ "2001-07-09T18:33:32Z" ],
      "Author": [ "cara.semperger@enron.com" ],
      "Message-From": [ "cara.semperger@enron.com" ]
    } ] ]
  }
} |
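The `associations` entry in the sample output pairs the message author with each address in `Message-To`, which is what the source's association rule describes (iterate over the recipients, with `entity1` as the author and `entity2` as the current value). A minimal Python sketch of that pairing follows. The function names `entity_index` and `build_email_associations` are hypothetical, the `<name>/<type>` index format is inferred from the sample output only, and the metadata is passed in as a simplified flat dictionary rather than the nested structure shown above.

```python
from typing import Dict, List


def entity_index(name: str, entity_type: str) -> str:
    """Build an index key in the "<name>/<type>" form seen in the sample output (lowercased)."""
    return f"{name.lower()}/{entity_type.lower()}"


def build_email_associations(
    email_meta: Dict[str, List[str]], time_start: str
) -> List[dict]:
    """Create one "emailed" event per recipient, from the author to that recipient."""
    sender = email_meta["Author"][0]
    associations = []
    for recipient in email_meta.get("Message-To", []):
        associations.append(
            {
                "entity1": sender,
                "entity1_index": entity_index(sender, "Account"),
                "verb": "emailed",
                "verb_category": "emailed/communicated",
                "entity2": recipient,
                "entity2_index": entity_index(recipient, "Account"),
                "time_start": time_start,
                "assoc_type": "Event",
            }
        )
    return associations


# Values copied from the sample output document; the flat layout is a simplification.
email_meta = {
    "Author": ["cara.semperger@enron.com"],
    "Message-To": ["will.smith@enron.com"],
}
associations = build_email_associations(email_meta, "2001-07-09T14:33:32")
print(associations)
```

An email with several recipients would yield one association per recipient, all sharing the same `entity1` and `time_start`.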
Annex - Old Format Source
Code Block | ||
---|---|---|
| ||
{
  "description": "All of the Enron emails corpus with TextRank keyword extraction enabled.",
  "extractType": "File",
  "file": { "domain": "DOMAIN", "password": "PASSWORD", "username": "USER" },
  "isPublic": true,
  "mediaType": "Email",
  "searchCycle_secs": -1,
  "structuredAnalysis": {
    "associations": [
      {
        "associations": [
          { "assoc_type": "Event", "entity1": "$SCRIPT( return _doc.metadata._FILE_METADATA_[0].metadata.Author[0];)", "entity2": "$SCRIPT(return _value;)", "iterateOver": "Message-To", "time_start": "$SCRIPT( return _doc.publishedDate;)", "verb": "emailed", "verb_category": "emailed/communicated" }
        ],
        "iterateOver": "email_meta"
      }
    ],
    "entities": [
      { "dimension": "What", "disambiguated_name": "$SCRIPT( return _doc.metadata._FILE_METADATA_[0].metadata.Author[0];)", "type": "Account", "useDocGeo": false },
      {
        "entities": [
          { "dimension": "What", "disambiguated_name": "", "iterateOver": "Message-To", "type": "Account", "useDocGeo": false }
        ],
        "iterateOver": "email_meta"
      }
    ],
    "scriptEngine": "JavaScript",
    "title": "$SCRIPT( return _doc.metadata._FILE_METADATA_[0].metadata.subject[0];)"
  },
  "tags": [ "enron", "email", "fraud" ],
  "title": "All Enron Emails (TextRank)",
  "unstructuredAnalysis": {
    "meta": [
      { "context": "All", "fieldName": "email_meta", "flags": "m", "script": "var x=_metadata._FILE_METADATA_[0].metadata;x;", "scriptlang": "javascript" }
    ],
    "simpleTextCleanser": [
      { "field": "fullText", "flags": "md", "replacement": " ", "script": "(?:\\[.*?\\])", "scriptlang": "regex" },
      { "field": "description", "flags": "md", "replacement": " ", "script": "(?:\\[.*?\\])", "scriptlang": "regex" },
      { "field": "fullText", "flags": "md", "replacement": ". ", "script": "<.*?>", "scriptlang": "regex" },
      { "field": "description", "flags": "md", "replacement": ". ", "script": "<.*?>", "scriptlang": "regex" },
      { "field": "fullText", "flags": "md", "replacement": ". ", "script": "(?:>|<)", "scriptlang": "regex" },
      { "field": "description", "flags": "md", "replacement": ". ", "script": "(?:>|<)", "scriptlang": "regex" },
      { "field": "fullText", "replacement": " ", "script": "(?:[-]{4,}(.*[-]{4,}|\\n))", "scriptlang": "regex" },
      { "field": "description", "replacement": " ", "script": "(?:[-]{4,}(.*[-]{4,}|\\n))", "scriptlang": "regex" },
      { "field": "fullText", "replacement": " ", "script": "(?:\\*{2,})", "scriptlang": "regex" },
      { "field": "description", "replacement": " ", "script": "(?:\\*{2,})", "scriptlang": "regex" }
    ]
  },
  "url": "smb://modus:139/enron/enron_mail_20110402/maildir/",
  "useExtractor": "textrank",
  "useTextExtractor": "none"
} |