Examples
This section describes the configuration details for the supported extractors, and provides examples where applicable.
Alchemy API
Parameter | Description |
---|---|
postproc | Possible values: "1", "2", "3". Default value is "3". "1" post-processes geographic entities (AlchemyAPI tends to prefer US results even when the context clearly indicates a non-US location). "2" post-processes person entities (AlchemyAPI tends to prefer famous people even when the context does not support that). "3" does both. |
sentiment | Possible values: True or False. Default value is True. If enabled, a sentiment metric is attached to each extracted entity. Note that this uses an extra AlchemyAPI credit per document. |
concepts | Possible values: True or False. Default value is False. If enabled, a metadata field called "concepts" is attached to the document, containing the Wiki titles that are related to the contents of the document. Note that this uses an extra AlchemyAPI credit per document. |
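As a rough illustration of how the parameters in this table might be assembled into an engineConfig object, here is a minimal Python sketch. The key prefix `app.alchemyapi.` is an assumption made by analogy with the `app.alchemyapi-metadata.` prefix used in the metadata example later in this section, and encoding booleans as the strings "true"/"false" is likewise assumed from that example. The helper names are hypothetical and not part of the platform.

```python
import json

# ASSUMPTION: key prefix inferred by analogy with "app.alchemyapi-metadata.";
# verify against your installation before relying on it.
ASSUMED_PREFIX = "app.alchemyapi."

VALID_POSTPROC = {"1", "2", "3"}


def alchemy_engine_config(postproc="3", sentiment=True, concepts=False):
    """Build an engineConfig dict using the documented defaults (hypothetical helper)."""
    if postproc not in VALID_POSTPROC:
        raise ValueError(f"postproc must be one of {sorted(VALID_POSTPROC)}")
    return {
        ASSUMED_PREFIX + "postproc": postproc,
        # ASSUMPTION: booleans are passed as lowercase strings.
        ASSUMED_PREFIX + "sentiment": "true" if sentiment else "false",
        ASSUMED_PREFIX + "concepts": "true" if concepts else "false",
    }


def extra_credits_per_document(sentiment, concepts):
    """Sentiment and concepts each cost one extra AlchemyAPI credit per document."""
    return int(bool(sentiment)) + int(bool(concepts))


engine = {
    "engineName": "AlchemyAPI",
    "engineConfig": alchemy_engine_config(postproc="1", sentiment=True, concepts=True),
}
print(json.dumps(engine, indent=2))
print("extra credits per document:", extra_credits_per_document(True, True))
```

Enabling both sentiment and concepts therefore costs two extra credits per document, which is worth weighing for high-volume sources.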
Examples
The example below shows a sample source configuration that uses the Alchemy API to parse data from an RSS feed. The extracted data can then be used to build entities and associations. In this example, AlchemyAPI is the textEngine and OpenCalais is used as the featureEngine.
    {
      "description": "Article on Medical Issues",
      "harvestBadSource": false,
      "isApproved": true,
      "isPublic": true,
      "key": "http.www.mayoclinic.com.rss.blog.xml",
      "mediaType": "News",
      "modified": "Oct 19, 2010 11:31:59 AM",
      "tags": [
        "topic:healthcare",
        "industry:healthcare",
        "mayo clinic",
        "health"
      ],
      "title": "MayoClinic: General Topics",
      "processingPipeline": [
        {
          "feed": {
            "extraUrls": [
              { "url": "http://www.mayoclinic.com/rss/blog.xml" }
            ]
          }
        },
        { "textEngine": { "engineName": "AlchemyAPI" } },
        { "featureEngine": { "engineName": "OpenCalais" } }
      ]
    }
The Alchemy API then returns an array of entities based on its default configuration, since engineConfig was not used to specify any custom configuration parameters. For example:
    {
      "_id" : "4e1c8afa7d56bb818ed10f76",
      "created" : "1310493434159",
      "description" : "Clarify the role of carbohydrates in the Dr. Bernstein diet and find a healthy eating plan that works for you.",
      "entities" : [
        {
          "actual_name" : "certified diabetes",
          "dimension" : "What",
          "disambiguous_name" : "certified diabetes",
          "doccount" : NumberLong(38),
          "frequency" : 3,
          "gazateer_index" : "certified diabetes/medicalcondition",
          "relevance" : "0.711",
          "totalfrequency" : NumberLong(114),
          "type" : "MedicalCondition"
        },
        {
          "actual_name" : "Diabetes Unit",
          "dimension" : "Who",
          "disambiguous_name" : "Diabetes Unit",
          "doccount" : NumberLong(38),
          "frequency" : 1,
          "gazateer_index" : "diabetes unit/organization",
          "relevance" : "0.235",
          "totalfrequency" : NumberLong(38),
          "type" : "Organization"
        },
        {
          "actual_name" : "Mayo Clinic",
          "dimension" : "What",
          "disambiguous_name" : "Mayo Clinic",
          "doccount" : NumberLong(514),
          "frequency" : 2,
          "gazateer_index" : "mayo clinic/facility",
          "relevance" : "0.305",
          "totalfrequency" : NumberLong(1033),
          "type" : "Facility"
        },
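To show how a client might consume the entities array in the sample output, here is a small Python sketch. It uses a hand-copied subset of the three entities shown above; the 0.3 threshold and the function name are arbitrary choices for illustration only. Note that relevance is stored as a string in the sample, so it is converted to a float before comparison.

```python
# Hand-copied subset of the sample document above (only the fields used here).
doc = {
    "entities": [
        {"disambiguous_name": "certified diabetes", "type": "MedicalCondition",
         "relevance": "0.711", "frequency": 3},
        {"disambiguous_name": "Diabetes Unit", "type": "Organization",
         "relevance": "0.235", "frequency": 1},
        {"disambiguous_name": "Mayo Clinic", "type": "Facility",
         "relevance": "0.305", "frequency": 2},
    ]
}


def entities_above(document, min_relevance):
    """Return (name, type, relevance) tuples, most relevant first (hypothetical helper)."""
    selected = [
        (e["disambiguous_name"], e["type"], float(e["relevance"]))
        for e in document.get("entities", [])
        if float(e["relevance"]) >= min_relevance
    ]
    return sorted(selected, key=lambda item: item[2], reverse=True)


# Illustrative threshold only; pick one that suits your data.
for name, entity_type, relevance in entities_above(doc, 0.3):
    print(f"{relevance:.3f}  {entity_type:<16}  {name}")
```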
Alchemy API metadata
The Alchemy API can also perform feature extraction through the AlchemyAPI-metadata engine, which is configured with the following parameters.
Parameter | Description | Data Type |
---|---|---|
sentiment | Possible values: True or False. Default value is False. If enabled, a sentiment metric is attached to each extracted entity. Note that this uses an extra AlchemyAPI credit per document. | |
concepts | Possible values: True or False. Default value is True. If enabled, a metadata field called "concepts" is attached to the document, containing the Wiki titles that are related to the contents of the document. Note that this uses an extra AlchemyAPI credit per document. | |
batchSize | A string containing an integer. Batching is turned off by default. If set, each AlchemyAPI call goes out on a batch of the specified number of documents. This makes processing of small documents such as tweets more economical, in return for a reduction in accuracy (e.g. the sentiment is calculated over the batch, not for each individual tweet). | string, integer |
numKeywords | A string containing an integer. If not specified, the AlchemyAPI default (currently 50) is used. If specified, it controls the number of keywords returned. If batching is enabled, the requested number is multiplied by the batch size. | string, integer |
strict | Possible values: True or False. Default value is False. If enabled, fewer but higher-quality keywords are extracted. | |
Example
You can use the engineConfig object to pass configuration parameters along to the feature engine.
In this example, the Alchemy API is configured to act on batches of 100 documents and to return a maximum of 5 keywords per document. With the strict setting enabled, fewer keywords are returned overall, but they are of higher quality.
    },
    {
      "featureEngine": {
        "engineName": "AlchemyAPI-metadata",
        "engineConfig": {
          "app.alchemyapi-metadata.batchSize": 100,
          "app.alchemyapi-metadata.numKeywords": 5,
          "app.alchemyapi-metadata.strict": "true"
        }
      }
    },
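The interaction between batchSize and numKeywords can be sketched in Python as follows. The key names are taken from the example above. The values are encoded as strings because the parameter table describes them as "a string containing an integer", whereas the example above passes bare numbers; which forms the engine accepts is not confirmed here. The helper names are hypothetical.

```python
import json

PREFIX = "app.alchemyapi-metadata."  # key prefix taken from the example above


def metadata_engine_config(batch_size=None, num_keywords=None, strict=None):
    """Build an engineConfig dict (hypothetical helper); unset options are omitted."""
    config = {}
    for name, value in (("batchSize", batch_size), ("numKeywords", num_keywords)):
        if value is None:
            continue
        if isinstance(value, bool) or not isinstance(value, int) or value < 1:
            raise ValueError(f"{name} must be a positive integer, got {value!r}")
        # The parameter table describes these values as strings containing integers.
        config[PREFIX + name] = str(value)
    if strict is not None:
        config[PREFIX + "strict"] = "true" if strict else "false"
    return config


def keywords_requested_per_call(num_keywords, batch_size=None):
    """With batching enabled, the requested number is multiplied by the batch size."""
    return num_keywords * (batch_size if batch_size else 1)


config = metadata_engine_config(batch_size=100, num_keywords=5, strict=True)
print(json.dumps(config, indent=2))
print(keywords_requested_per_call(num_keywords=5, batch_size=100))  # 5 * 100 = 500
```

With the settings from the example, each batched call requests 500 keywords in total, which still corresponds to 5 per document across the 100 documents in the batch.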