Description
This section describes the configuration details for the supported extractors, and provides examples where applicable.
About Alchemy API
There are two Alchemy services that can be called:
- Alchemy API
- Alchemy API metadata*
*Includes many of the same features as Alchemy API, but also allows more advanced batching of documents and keyword control.
Both services support text extraction and feature extraction. However, if you only need to perform text extraction, specify Alchemy API.
Alchemy API
You can use engineConfig to pass the parameters of the Alchemy API configuration, as described below.
Parameter | Description |
---|---|
postproc | Possible values: "1", "2", "3". Default value is "3". "1" does some post-processing of geographic entities (AlchemyAPI tends to prefer US results even when the context clearly indicates a non-US location). "2" does some post-processing of person entities (AlchemyAPI tends to prefer famous people even when the context does not support that). "3" does both. |
sentiment | Possible values: true or false. Default value is true. If enabled, a sentiment metric is attached to each extracted entity. Note that this uses an extra AlchemyAPI credit per document. |
concepts | Possible values: true or false. Default value is false. If enabled, a metadata field called "concepts" is attached to the document, containing Wiki titles that are related to the contents of the document. Note that this uses an extra AlchemyAPI credit per document. |
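As a rough illustration of how these parameters might be passed, the sketch below builds a featureEngine pipeline element in Python and serializes it to JSON. The `app.alchemyapi.` key prefix is an assumption, chosen by analogy with the `app.alchemyapi-metadata.` prefix used in the Alchemy API metadata example on this page, and the helper name `alchemy_feature_engine` is hypothetical. Check the key names against your installation before relying on them.

```python
import json

# ASSUMPTION: key prefix inferred from "app.alchemyapi-metadata." used elsewhere
# on this page; it is not confirmed for the plain Alchemy API extractor.
PREFIX = "app.alchemyapi."


def alchemy_feature_engine(postproc="3", sentiment=True, concepts=False):
    """Build a featureEngine pipeline element for AlchemyAPI (hypothetical helper).

    Defaults mirror the table: postproc "3", sentiment true, concepts false.
    """
    if postproc not in ("1", "2", "3"):
        raise ValueError('postproc must be "1", "2" or "3"')
    return {
        "featureEngine": {
            "engineName": "AlchemyAPI",
            "engineConfig": {
                PREFIX + "postproc": postproc,
                # Booleans are written as lowercase strings, following the
                # "strict": "true" convention in the metadata example.
                PREFIX + "sentiment": str(sentiment).lower(),
                PREFIX + "concepts": str(concepts).lower(),
            },
        }
    }


if __name__ == "__main__":
    # Only post-process geographic entities; keep the other defaults.
    print(json.dumps(alchemy_feature_engine(postproc="1"), indent=2))
```

Validating `postproc` up front keeps an unsupported value from reaching the harvester, where it would be harder to diagnose.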
Alchemy API metadata
You can use engineConfig to pass the parameters of the Alchemy API metadata configuration, as described below.
Parameter | Description | Data Type |
---|---|---|
sentiment | Possible values: true or false. Default value is false. If enabled, a sentiment metric is attached to each extracted entity. Note that this uses an extra AlchemyAPI credit per document. | |
concepts | Possible values: true or false. Default value is true. If enabled, a metadata field called "concepts" is attached to the document, containing Wiki titles that are related to the contents of the document. Note that this uses an extra AlchemyAPI credit per document. | |
batchSize | A string containing an integer. Batching is turned off by default. If set, each AlchemyAPI call is made on a batch of the specified number of documents. This makes processing of small documents such as tweets more economical, in return for a reduction in accuracy (e.g. the sentiment is calculated over the batch, not for each individual tweet). | string, integer |
numKeywords | A string containing an integer. If not specified, the AlchemyAPI default (currently 50) is used. If specified, it controls the number of keywords returned. If batching is enabled, the requested number is multiplied by the batch size. | string, integer |
strict | Possible values: true or false. Default value is false. If enabled, fewer keywords are extracted, but they are of higher quality. | |
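The interaction between batchSize and numKeywords can be sketched with a little arithmetic. The function below is a hypothetical helper, not part of the platform. It assumes one AlchemyAPI call per batch, and that a final partial batch still costs one call. The keyword multiplication follows the numKeywords description in the table.

```python
import math


def batch_plan(num_docs, batch_size, num_keywords):
    """Return (calls, keywords_per_call) for a batched run (hypothetical helper).

    ASSUMPTION: one AlchemyAPI call is made per batch, and a trailing partial
    batch still counts as one call.
    """
    if num_docs < 0 or batch_size < 1 or num_keywords < 1:
        raise ValueError("num_docs must be >= 0; batch_size and num_keywords >= 1")
    calls = math.ceil(num_docs / batch_size)
    # Per the table: with batching enabled, the requested number of keywords
    # is multiplied by the batch size.
    keywords_per_call = num_keywords * batch_size
    return calls, keywords_per_call


if __name__ == "__main__":
    # 1000 tweets, batches of 100, 5 keywords per document
    print(batch_plan(1000, 100, 5))  # (10, 500)
```

With these settings, 1000 small documents would need 10 batched calls instead of 1000 individual ones, at the cost of per-batch rather than per-document sentiment.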
Examples
Alchemy API
Using Alchemy API As A Text Extractor
In the example below, Alchemy API is used only as a text extractor. As such, most of the configuration parameters are not applicable and the default settings can be used. In this specific example, featureEngine uses OpenCalais.
Source Configuration:
{ "description": "Article on Medical Issues", "harvestBadSource": false, "isApproved": true, "isPublic": true, "key": "http.www.mayoclinic.com.rss.blog.xml", "mediaType": "News", "modified": "Oct 19, 2010 11:31:59 AM", "tags": [ "topic:healthcare", "industry:healthcare", "mayo clinic", "health" ], "title": "MayoClinic: General Topics", "processingPipeline": [ { "feed": { "extraUrls": [ { "url": "http://www.mayoclinic.com/rss/blog.xml" } ] } }, { "textEngine": { "engineName": "AlchemyAPI" } }, { "featureEngine": { "engineName": "OpenCalais" } } ] }
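To check which extractor handles each stage of a source like this one, a short script can walk the processingPipeline array. This is only a sketch: the `engines` helper is hypothetical, and the embedded source is an abbreviated copy of the configuration above, reduced to the fields the script reads.

```python
import json

# Abbreviated copy of the source configuration shown above.
SOURCE = json.loads("""
{
  "title": "MayoClinic: General Topics",
  "processingPipeline": [
    {"feed": {"extraUrls": [{"url": "http://www.mayoclinic.com/rss/blog.xml"}]}},
    {"textEngine": {"engineName": "AlchemyAPI"}},
    {"featureEngine": {"engineName": "OpenCalais"}}
  ]
}
""")


def engines(source):
    """Map each engine stage (textEngine, featureEngine) to its engineName."""
    found = {}
    for stage in source.get("processingPipeline", []):
        for kind, settings in stage.items():
            if kind in ("textEngine", "featureEngine"):
                found[kind] = settings.get("engineName")
    return found


if __name__ == "__main__":
    print(engines(SOURCE))
    # {'textEngine': 'AlchemyAPI', 'featureEngine': 'OpenCalais'}
```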
Output:
The output contains the "description" and entities resulting from the textEngine and featureEngine settings.
{ "_id" : "4e1c8afa7d56bb818ed10f76", "created" : "1310493434159", "description" : "Clarify the role of carbohydrates in the Dr. Bernstein diet and find a healthy eating plan that works for you.", "entities" : [ { "actual_name" : "certified diabetes", "dimension" : "What", "disambiguous_name" : "certified diabetes", "doccount" : NumberLong(38), "frequency" : 3, "gazateer_index" : "certified diabetes/medicalcondition", "relevance" : "0.711", "totalfrequency" : NumberLong(114), "type" : "MedicalCondition" }, { "actual_name" : "Diabetes Unit", "dimension" : "Who", "disambiguous_name" : "Diabetes Unit", "doccount" : NumberLong(38), "frequency" : 1, "gazateer_index" : "diabetes unit/organization", "relevance" : "0.235", "totalfrequency" : NumberLong(38), "type" : "Organization" }, { "actual_name" : "Mayo Clinic", "dimension" : "What", "disambiguous_name" : "Mayo Clinic", "doccount" : NumberLong(514), "frequency" : 2, "gazateer_index" : "mayo clinic/facility", "relevance" : "0.305", "totalfrequency" : NumberLong(1033), "type" : "Facility" },
Alchemy API Metadata
Feature Extraction
In this example, Alchemy API metadata is used for feature extraction. It is configured to act on batches of 100 documents and to return a maximum of 5 keywords per document. With the strict setting enabled, fewer keywords are returned overall, but they are of higher quality.
Source Configuration:
The source configuration shows how Alchemy API metadata parameters can be used to set the batch size and keyword settings. In addition, the beginning of the entities block is included to show how automatic feature extraction and manual entities can be combined to achieve highly customizable results.
}, { "featureEngine": { "engineName": "AlchemyAPI-metadata", "engineConfig": { "app.alchemyapi-metadata.batchSize": 100, "app.alchemyapi-metadata.numKeywords": 5, "app.alchemyapi-metadata.strict": "true" } } }, { "entities": [ { "actual_name": "$metadata.json.actor.displayName", "dimension": "Who", "disambiguated_name": "$metadata.json.actor.preferredUsername", "linkdata": "$metadata.json.actor.link", "type": "TwitterHandle" },
Output:
The output reveals the results of featureEngine and entities. The entities are returned indexed by keyword.
}, { "actual_name": "Amex Teams", "dimension": "What", "disambiguated_name": "Amex Teams", "doccount": -1, "frequency": 1, "index": "amex teams/keyword", "relevance": 0.758636, "sentiment": 0.160753, "totalfrequency": -1, "type": "Keyword" }, { "actual_name": "Halo", "dimension": "What", "disambiguated_name": "Halo", "doccount": -1, "frequency": 1, "index": "halo/keyword", "relevance": 0.461833, "sentiment": 0.168822, "totalfrequency": -1, "type": "Keyword" }, { "actual_name": "Master Chief Incentives", "dimension": "What", "disambiguated_name": "Master Chief Incentives", "doccount": -1, "frequency": 1, "index": "master chief incentives/keyword", "relevance": 0.981457, "sentiment": 0.168876, "totalfrequency": -1, "type": "Keyword" },
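Once keyword entities like these are available, a common next step is to rank them by relevance. The sketch below does so in Python using the three entities from the output above, trimmed to the fields it needs. The `top_keywords` helper is hypothetical and is not part of the platform.

```python
# The three keyword entities from the output above (fields trimmed).
ENTITIES = [
    {"actual_name": "Amex Teams", "relevance": 0.758636,
     "sentiment": 0.160753, "type": "Keyword"},
    {"actual_name": "Halo", "relevance": 0.461833,
     "sentiment": 0.168822, "type": "Keyword"},
    {"actual_name": "Master Chief Incentives", "relevance": 0.981457,
     "sentiment": 0.168876, "type": "Keyword"},
]


def top_keywords(entities, n=2):
    """Return the names of the n most relevant Keyword entities."""
    keywords = [e for e in entities if e.get("type") == "Keyword"]
    ranked = sorted(keywords, key=lambda e: e["relevance"], reverse=True)
    return [e["actual_name"] for e in ranked[:n]]


if __name__ == "__main__":
    print(top_keywords(ENTITIES))
    # ['Master Chief Incentives', 'Amex Teams']
```

Filtering on `type` first keeps manually specified entities, such as the TwitterHandle entries from the entities block, out of the keyword ranking.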