
Description

This section describes the configuration of the supported Alchemy extractors and provides examples where applicable.

About Alchemy API

There are two Alchemy services that can be called:

  • Alchemy API
  • Alchemy API metadata*

*Includes many of the same features as Alchemy API, but also supports more advanced batching of documents and finer control over keywords.

Both services support text extraction and feature extraction. However, if you only need text extraction, specify Alchemy API.

Alchemy API

Use engineConfig to pass the following Alchemy API configuration parameters:

Parameter     Description

postproc      Possible values: "1", "2", "3". Default value is "3".
              "1" post-processes geographic entities (AlchemyAPI tends to prefer US results even when the context clearly indicates a non-US location).
              "2" post-processes person entities (AlchemyAPI tends to prefer famous people even when the context does not support that).
              "3" does both.

sentiment     Possible values: true or false. Default value is true.
              If enabled, a sentiment metric is attached to each extracted entity.
              Note that enabling this uses one extra AlchemyAPI credit per document.

concepts      Possible values: true or false. Default value is false.
              If enabled, a metadata field called "concepts" is added to the document, containing the titles of Wiki pages related to the document's contents.
              Note that enabling this uses one extra AlchemyAPI credit per document.
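The Python sketch below assembles a pipeline element that runs Alchemy API with these parameters and counts the extra credits the optional features cost. It rests on assumptions: the key names (app.alchemyapi.postproc and so on) are inferred from the app.alchemyapi-metadata.* keys in the metadata example later in this section and may not match your deployment, and values are written as strings, following the quoted postproc values and the "true" strict value in that example. The helper names are hypothetical.

```python
import json

# ASSUMPTION: this key prefix is inferred from the "app.alchemyapi-metadata."
# keys in the metadata example; the real Alchemy API keys may differ.
KEY_PREFIX = "app.alchemyapi."

# Documented postproc values: "1" = geographic entities, "2" = person
# entities, "3" = both.
POSTPROC_TARGETS = {
    "1": {"geographic"},
    "2": {"person"},
    "3": {"geographic", "person"},
}


def postproc_targets(value):
    """Entity kinds that are post-processed for a given postproc value."""
    return sorted(POSTPROC_TARGETS[str(value)])


def build_alchemy_stage(stage="featureEngine", postproc="3", sentiment=True, concepts=False):
    """Return a pipeline element that runs the AlchemyAPI engine (hypothetical helper).

    Defaults mirror the documented defaults: postproc "3", sentiment true,
    concepts false.
    """
    if stage not in ("textEngine", "featureEngine"):
        raise ValueError("stage must be 'textEngine' or 'featureEngine'")
    postproc = str(postproc)
    if postproc not in POSTPROC_TARGETS:
        raise ValueError('postproc must be "1", "2" or "3"')
    engine_config = {
        KEY_PREFIX + "postproc": postproc,
        # ASSUMPTION: booleans are written as lowercase strings, like
        # "strict": "true" in the metadata example.
        KEY_PREFIX + "sentiment": str(bool(sentiment)).lower(),
        KEY_PREFIX + "concepts": str(bool(concepts)).lower(),
    }
    return {stage: {"engineName": "AlchemyAPI", "engineConfig": engine_config}}


def extra_credits_per_document(sentiment, concepts):
    """Each enabled option (sentiment, concepts) costs one extra credit per document."""
    return int(bool(sentiment)) + int(bool(concepts))


stage = build_alchemy_stage(postproc="1", concepts=True)
print(json.dumps(stage, indent=4))
print(postproc_targets("3"))
print(extra_credits_per_document(sentiment=True, concepts=True))
```

With the documented defaults (sentiment on, concepts off), each document costs one credit on top of the base call.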

Alchemy API metadata

Use engineConfig to pass the following Alchemy API metadata configuration parameters:

Parameter     Description

sentiment     Possible values: true or false. Default value is false.
              If enabled, a sentiment metric is attached to each extracted entity.
              Note that enabling this uses one extra AlchemyAPI credit per document.

concepts      Possible values: true or false. Default value is true.
              If enabled, a metadata field called "concepts" is added to the document, containing the titles of Wiki pages related to the document's contents.
              Note that enabling this uses one extra AlchemyAPI credit per document.

batchSize     Data type: string (containing an integer) or integer. Batching is off by default.
              If set, each AlchemyAPI call is made on a batch of the specified number of documents. This makes processing small documents such as tweets more economical, at the cost of some accuracy: for example, sentiment is calculated over the whole batch rather than for each individual tweet.

numKeywords   Data type: string (containing an integer) or integer.
              If not specified, the AlchemyAPI default (currently 50) is used. If specified, controls the number of keywords returned. If batching is enabled, the requested number is multiplied by the batch size.

strict        Possible values: true or false. Default value is false.
              If enabled, fewer but higher-quality keywords are extracted.
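To see how batchSize and numKeywords interact, the following Python sketch computes the number of keywords requested per AlchemyAPI call and the number of calls needed for a set of documents. It covers only the case where numKeywords is set; when it is omitted, AlchemyAPI's own default (currently 50) applies. The call count assumes one call per batch, with any leftover documents sent as a final smaller batch; that is an assumption, not documented behavior. The function names are hypothetical.

```python
import math


def keywords_requested_per_call(num_keywords, batch_size=None):
    """Keywords requested in one AlchemyAPI call.

    Values may be ints or strings containing integers, as in engineConfig.
    With batching on, the requested number is multiplied by the batch size.
    """
    per_document = int(num_keywords)
    if batch_size is None:  # batching is off by default
        return per_document
    return per_document * int(batch_size)


def calls_needed(num_documents, batch_size=None):
    """Number of AlchemyAPI calls needed for num_documents.

    ASSUMPTION: one call per batch, and a final partial batch still
    counts as one call. Without batching, one call per document.
    """
    if batch_size is None:
        return num_documents
    return math.ceil(num_documents / int(batch_size))


# Settings from the feature-extraction example: batchSize 100, numKeywords 5.
print(keywords_requested_per_call("5", "100"))
print(calls_needed(1050, "100"))
```

With the example settings, each call requests 5 × 100 = 500 keywords, which still works out to at most 5 per document.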

 

Examples

Alchemy API

Using Alchemy API As A Text Extractor

In the example below, Alchemy API is used only as a text extractor. As a result, most of the configuration parameters do not apply and the default settings can be used. Feature extraction is handled by OpenCalais, which is specified in featureEngine.

Source Configuration:

{
    "description": "Article on Medical Issues",
    "harvestBadSource": false,
    "isApproved": true,
    "isPublic": true,
    "key": "http.www.mayoclinic.com.rss.blog.xml",
    "mediaType": "News",
    "modified": "Oct 19, 2010 11:31:59 AM",
    "tags": [
        "topic:healthcare",
        "industry:healthcare",
        "mayo clinic",
        "health"
    ],
    "title": "MayoClinic: General Topics",
    "processingPipeline": [
        {
            "feed": {
                "extraUrls": [
                    {
                        "url": "http://www.mayoclinic.com/rss/blog.xml"
                    }
                ]
            }
        },
        {
            "textEngine": {
                "engineName": "AlchemyAPI"
            }
        },
        {
            "featureEngine": {
                "engineName": "OpenCalais"
            }
        }
    ]
}
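Because no engineConfig is given, the textEngine stage runs Alchemy API with its default settings. One way to confirm which engine handles each stage is to load the configuration and walk processingPipeline, as in this Python sketch. The helper engines_by_stage is illustrative only and is not part of the platform.

```python
import json

# The source configuration from the example above, trimmed to the fields used here.
source_json = """
{
    "key": "http.www.mayoclinic.com.rss.blog.xml",
    "processingPipeline": [
        {"feed": {"extraUrls": [{"url": "http://www.mayoclinic.com/rss/blog.xml"}]}},
        {"textEngine": {"engineName": "AlchemyAPI"}},
        {"featureEngine": {"engineName": "OpenCalais"}}
    ]
}
"""


def engines_by_stage(source):
    """Map each textEngine/featureEngine stage in the pipeline to its engineName.

    If a stage appears more than once, the last occurrence wins in this
    simplified sketch.
    """
    engines = {}
    for element in source.get("processingPipeline", []):
        for stage in ("textEngine", "featureEngine"):
            if stage in element:
                engines[stage] = element[stage]["engineName"]
    return engines


source = json.loads(source_json)
print(engines_by_stage(source))
```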

Output:

The output contains the document's "description" and the entities produced by the textEngine and featureEngine stages.

{
    "_id" : "4e1c8afa7d56bb818ed10f76",
    "created" : "1310493434159",
    "description" : "Clarify the role of carbohydrates in the Dr. Bernstein diet and find a healthy eating plan that works for you.",
    "entities" : [
    {
        "actual_name" : "certified diabetes",
        "dimension" : "What",
        "disambiguous_name" : "certified diabetes",
        "doccount" : NumberLong(38),
        "frequency" : 3,
        "gazateer_index" : "certified diabetes/medicalcondition",
        "relevance" : "0.711",
        "totalfrequency" : NumberLong(114),
        "type" : "MedicalCondition"
    },
    {
        "actual_name" : "Diabetes Unit",
        "dimension" : "Who",
        "disambiguous_name" : "Diabetes Unit",
        "doccount" : NumberLong(38),
        "frequency" : 1,
        "gazateer_index" : "diabetes unit/organization",
        "relevance" : "0.235",
        "totalfrequency" : NumberLong(38),
        "type" : "Organization"
    },
    {
        "actual_name" : "Mayo Clinic",
        "dimension" : "What",
        "disambiguous_name" : "Mayo Clinic",
        "doccount" : NumberLong(514),
        "frequency" : 2,
        "gazateer_index" : "mayo clinic/facility",
        "relevance" : "0.305",
        "totalfrequency" : NumberLong(1033),
        "type" : "Facility"
    },
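The output above is in MongoDB shell notation, where NumberLong(...) denotes a 64-bit integer. Note also that relevance is stored here as a string (for example "0.711"), so it must be converted before entities can be compared or filtered numerically. A minimal Python sketch, using entities copied by hand from the output; the function name is hypothetical:

```python
# Entities copied from the output above, trimmed to the fields used here.
entities = [
    {"actual_name": "certified diabetes", "type": "MedicalCondition", "relevance": "0.711"},
    {"actual_name": "Diabetes Unit", "type": "Organization", "relevance": "0.235"},
    {"actual_name": "Mayo Clinic", "type": "Facility", "relevance": "0.305"},
]


def rank_by_relevance(entities, min_relevance=0.0):
    """Return (name, type, relevance) tuples, most relevant first.

    relevance is a string in this output, so it is converted with float(),
    which also accepts values that are already numeric.
    """
    scored = [(e["actual_name"], e["type"], float(e["relevance"])) for e in entities]
    kept = [s for s in scored if s[2] >= min_relevance]
    return sorted(kept, key=lambda s: s[2], reverse=True)


for name, entity_type, relevance in rank_by_relevance(entities, min_relevance=0.3):
    print(f"{relevance:.3f}  {name} ({entity_type})")
```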


Alchemy API Metadata

Feature Extraction

In this example, Alchemy API metadata is used for feature extraction. It is configured to process documents in batches of 100 and to return a maximum of 5 keywords per document. The strict setting returns fewer, higher-quality keywords.

Source Configuration:

The source configuration shows how Alchemy API metadata parameters can be used to control batch size and keyword settings. The beginning of the entities block is also included, to show how automatically extracted features can be combined with manually specified entities.

 },
        {
            "featureEngine": {
                "engineName": "AlchemyAPI-metadata",
                "engineConfig": {
                    "app.alchemyapi-metadata.batchSize": 100,
                    "app.alchemyapi-metadata.numKeywords": 5,
                    "app.alchemyapi-metadata.strict": "true"
                }
            }
        },
        {
            "entities": [
                {
                    "actual_name": "$metadata.json.actor.displayName",
                    "dimension": "Who",
                    "disambiguated_name": "$metadata.json.actor.preferredUsername",
                    "linkdata": "$metadata.json.actor.link",
                    "type": "TwitterHandle"
                },
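The featureEngine element in this configuration can also be generated programmatically, which helps keep the options consistent across many sources. The Python sketch below reproduces it and leaves unset options out of engineConfig, so batching stays off and AlchemyAPI's default keyword count applies. The function name is hypothetical, and it covers only the three parameters used in this example.

```python
import json

PREFIX = "app.alchemyapi-metadata."  # key prefix used in the example above


def build_metadata_feature_engine(batch_size=None, num_keywords=None, strict=False):
    """Build a featureEngine element for AlchemyAPI-metadata (hypothetical helper).

    Options left unset are omitted from engineConfig.
    """
    config = {}
    if batch_size is not None:
        if int(batch_size) < 1:
            raise ValueError("batchSize must be a positive integer")
        config[PREFIX + "batchSize"] = int(batch_size)
    if num_keywords is not None:
        if int(num_keywords) < 1:
            raise ValueError("numKeywords must be a positive integer")
        config[PREFIX + "numKeywords"] = int(num_keywords)
    if strict:
        # The example passes strict as the string "true".
        config[PREFIX + "strict"] = "true"
    element = {"engineName": "AlchemyAPI-metadata"}
    if config:
        element["engineConfig"] = config
    return {"featureEngine": element}


stage = build_metadata_feature_engine(batch_size=100, num_keywords=5, strict=True)
print(json.dumps(stage, indent=4))
```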


Output:

The output shows the entities produced by the featureEngine and entities stages. Each keyword is returned as an entity of type "Keyword", indexed as "<keyword>/keyword".

   },
        {
            "actual_name": "Amex Teams",
            "dimension": "What",
            "disambiguated_name": "Amex Teams",
            "doccount": -1,
            "frequency": 1,
            "index": "amex teams/keyword",
            "relevance": 0.758636,
            "sentiment": 0.160753,
            "totalfrequency": -1,
            "type": "Keyword"
        },
        {
            "actual_name": "Halo",
            "dimension": "What",
            "disambiguated_name": "Halo",
            "doccount": -1,
            "frequency": 1,
            "index": "halo/keyword",
            "relevance": 0.461833,
            "sentiment": 0.168822,
            "totalfrequency": -1,
            "type": "Keyword"
        },
        {
            "actual_name": "Master Chief Incentives",
            "dimension": "What",
            "disambiguated_name": "Master Chief Incentives",
            "doccount": -1,
            "frequency": 1,
            "index": "master chief incentives/keyword",
            "relevance": 0.981457,
            "sentiment": 0.168876,
            "totalfrequency": -1,
            "type": "Keyword"
        },
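In this output, relevance and sentiment are numbers, so keyword entities can be ranked directly. The Python sketch below picks out entities of type "Keyword", sorts them by relevance, and recovers the keyword text from the "<keyword>/keyword" form of index. The helpers are illustrative only.

```python
# Keyword entities copied from the output above, trimmed to the fields used here.
entities = [
    {"actual_name": "Amex Teams", "index": "amex teams/keyword", "relevance": 0.758636, "sentiment": 0.160753, "type": "Keyword"},
    {"actual_name": "Halo", "index": "halo/keyword", "relevance": 0.461833, "sentiment": 0.168822, "type": "Keyword"},
    {"actual_name": "Master Chief Incentives", "index": "master chief incentives/keyword", "relevance": 0.981457, "sentiment": 0.168876, "type": "Keyword"},
]


def split_index(index):
    """Split an index such as "halo/keyword" into ("halo", "keyword")."""
    name, _, entity_type = index.rpartition("/")
    return name, entity_type


def top_keywords(entities, n=None):
    """Return (keyword, relevance, sentiment) tuples, most relevant first.

    n=None returns all keywords.
    """
    keywords = [e for e in entities if e["type"] == "Keyword"]
    keywords.sort(key=lambda e: e["relevance"], reverse=True)
    return [(split_index(e["index"])[0], e["relevance"], e["sentiment"]) for e in keywords[:n]]


for keyword, relevance, sentiment in top_keywords(entities):
    print(f"{keyword:<25} relevance={relevance:.3f} sentiment={sentiment:+.3f}")
```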