Description

This section describes the configuration details for the supported extractors, and provides examples where applicable.

About Alchemy API

There are two Alchemy services that can be called:

  • Alchemy API
  • Alchemy API metadata*

*Includes many of the same features as Alchemy API, but also allows more advanced batching of documents and keyword control.

Both services support text extraction and feature extraction. However, if you only need text extraction, specify Alchemy API.
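As a rough illustration of how a service is selected, the pipeline fragment below names each service through engineName. The two engineName values are the ones used in the examples on this page; pairing them in a single pipeline like this is only an assumption made for illustration, not a recommended setup.

```json
[
    {
        "textEngine": {
            "engineName": "AlchemyAPI"
        }
    },
    {
        "featureEngine": {
            "engineName": "AlchemyAPI-metadata"
        }
    }
]
```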

Alchemy API

You can use engineConfig to pass the parameters of the Alchemy API configuration, as described below.

| Parameter | Description |
|---|---|
| postproc | Possible values: "1", "2", "3". Default value is "3". "1" does some post-processing of geographic entities (AlchemyAPI tends to prefer US results even when the context clearly indicates a non-US location). "2" does some post-processing of person entities (AlchemyAPI tends to prefer famous people even when the context does not support that). "3" does both. |
| sentiment | Possible values: true or false. Default value is true. If enabled, a sentiment metric is attached to each extracted entity. Note: this uses an extra AlchemyAPI credit per document. |
| concepts | Possible values: true or false. Default value is false. If enabled, a metadata field called "concepts" is added to the document, containing Wiki titles that are related to the contents of the document. Note: this uses an extra AlchemyAPI credit per document. |
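A minimal sketch of how these parameters might be passed through engineConfig is shown below. It restricts post-processing to geographic entities, turns sentiment off to save a credit, and turns concepts on. The key prefix app.alchemyapi. is an assumption, chosen by analogy with the app.alchemyapi-metadata. keys used in the metadata example on this page, as is passing the values as strings; check both against your installation.

```json
{
    "featureEngine": {
        "engineName": "AlchemyAPI",
        "engineConfig": {
            "app.alchemyapi.postproc": "1",
            "app.alchemyapi.sentiment": "false",
            "app.alchemyapi.concepts": "true"
        }
    }
}
```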

Alchemy API metadata

You can use engineConfig to pass the parameters of the Alchemy API metadata configuration, as described below.

| Parameter | Description | Data Type |
|---|---|---|
| sentiment | Possible values: true or false. Default value is false. If enabled, a sentiment metric is attached to each extracted entity. Note: this uses an extra AlchemyAPI credit per document. | |
| concepts | Possible values: true or false. Default value is true. If enabled, a metadata field called "concepts" is added to the document, containing Wiki titles that are related to the contents of the document. Note: this uses an extra AlchemyAPI credit per document. | |
| batchSize | A string containing an integer. Batching is turned off by default. If turned on, the AlchemyAPI call goes out on a batch of documents (the specified number). This makes processing of small documents such as tweets more economical, in return for a reduction in accuracy (e.g. the sentiment is calculated over the batch, not each individual tweet). | string, integer |
| numKeywords | A string containing an integer. If not specified, the AlchemyAPI default (currently 50) is used. If specified, controls the number of keywords returned. If batching is enabled, the requested number is multiplied by the batch size. | string, integer |
| strict | Possible values: true or false. Default value is false. If enabled, fewer, higher-quality keywords are extracted. | |
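As a hedged sketch of how the credit-related settings could be combined with batching, the fragment below enables sentiment, disables concepts, and requests 10 keywords per document in batches of 20. Because the requested number is multiplied by the batch size, each batched call would ask for 200 keywords. The batchSize and numKeywords keys follow the example later on this page; the sentiment and concepts key names are assumed by analogy and are not confirmed here.

```json
{
    "featureEngine": {
        "engineName": "AlchemyAPI-metadata",
        "engineConfig": {
            "app.alchemyapi-metadata.sentiment": "true",
            "app.alchemyapi-metadata.concepts": "false",
            "app.alchemyapi-metadata.batchSize": "20",
            "app.alchemyapi-metadata.numKeywords": "10"
        }
    }
}
```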

Examples

...

The example below shows a sample source that uses Alchemy API to parse data from an RSS feed.  The data can then be used to form entities and associations.  In the example, OpenCalais is used as the featureEngine.

...

Alchemy API

Using Alchemy API As A Text Extractor

In the example below, Alchemy API is used only as a text extractor.  Most of its configuration parameters therefore do not apply, and the default settings are used.  In this specific example, the featureEngine is OpenCalais.

Source Configuration:

Code Block
{
    "description": "Article on Medical Issues",
    "harvestBadSource": false,
    "isApproved": true,
    "isPublic": true,
    "key": "http.www.mayoclinic.com.rss.blog.xml",
    "mediaType": "News",
    "modified": "Oct 19, 2010 11:31:59 AM",
    "tags": [
        "topic:healthcare",
        "industry:healthcare",
        "mayo clinic",
        "health"
    ],
    "title": "MayoClinic: General Topics",
    "processingPipeline": [
        {
            "feed": {
                "extraUrls": [
                    {
                        "url": "http://www.mayoclinic.com/rss/blog.xml"
                    }
                ]
            }
        },
        {
            "textEngine": {
                "engineName": "AlchemyAPI"
            }
        },
        {
            "featureEngine": {
                "engineName": "OpenCalais"
            }
        }
    ]
}

 

Because engineConfig is not used to specify any custom configuration parameters, Alchemy API runs with its default configuration.

Output:

The output contains the "description" and entities resulting from the textEngine and featureEngine settings.

Code Block
{
    "_id" : "4e1c8afa7d56bb818ed10f76",
    "created" : "1310493434159",
    "description" : "Clarify the role of carbohydrates in the Dr. Bernstein diet and find a healthy eating plan that works for you.",
    "entities" : [
    {
        "actual_name" : "certified diabetes",
        "dimension" : "What",
        "disambiguous_name" : "certified diabetes",
        "doccount" : NumberLong(38),
        "frequency" : 3,
        "gazateer_index" : "certified diabetes/medicalcondition",
        "relevance" : "0.711",
        "totalfrequency" : NumberLong(114),
        "type" : "MedicalCondition"
    },
    {
        "actual_name" : "Diabetes Unit",
        "dimension" : "Who",
        "disambiguous_name" : "Diabetes Unit",
        "doccount" : NumberLong(38),
        "frequency" : 1,
        "gazateer_index" : "diabetes unit/organization",
        "relevance" : "0.235",
        "totalfrequency" : NumberLong(38),
        "type" : "Organization"
    },
    {
        "actual_name" : "Mayo Clinic",
        "dimension" : "What",
        "disambiguous_name" : "Mayo Clinic",
        "doccount" : NumberLong(514),
        "frequency" : 2,
        "gazateer_index" : "mayo clinic/facility",
        "relevance" : "0.305",
        "totalfrequency" : NumberLong(1033),
        "type" : "Facility"
    },

 

...


Example

You can use the engineConfig object to pass configuration parameters along to the feature engine.

...

Alchemy API metadata

Feature Extraction

In this example, Alchemy API metadata is used for feature extraction.  It is configured to act on a batch of 100 documents and to return a maximum of 5 keywords per document.  The strict setting returns fewer keywords overall, but of higher quality.

Source Configuration:

The source configuration shows how Alchemy API metadata parameters can be used to set the batch size and keyword settings.  In addition, the beginning of the entities block is included to show how automatic feature extraction and manual entities can be combined to achieve highly customizable results.

Code Block

    },
        {
            "featureEngine": {
                "engineName": "AlchemyAPI-metadata",
                "engineConfig": {
                    "app.alchemyapi-metadata.batchSize": 100,
                    "app.alchemyapi-metadata.numKeywords": 5,
                    "app.alchemyapi-metadata.strict": "true"
                }
            }
        },
        {
            "entities": [
                {
                    "actual_name": "$metadata.json.actor.displayName",
                    "dimension": "Who",
                    "disambiguated_name": "$metadata.json.actor.preferredUsername",
                    "linkdata": "$metadata.json.actor.link",
                    "type": "TwitterHandle"
                },

 

Output:

The output shows the results of the featureEngine and entities settings.  Each extracted keyword is returned as an entity of type "Keyword", with an index of the form name/keyword.

Code Block
   },
        {
            "actual_name": "Amex Teams",
            "dimension": "What",
            "disambiguated_name": "Amex Teams",
            "doccount": -1,
            "frequency": 1,
            "index": "amex teams/keyword",
            "relevance": 0.758636,
            "sentiment": 0.160753,
            "totalfrequency": -1,
            "type": "Keyword"
        },
        {
            "actual_name": "Halo",
            "dimension": "What",
            "disambiguated_name": "Halo",
            "doccount": -1,
            "frequency": 1,
            "index": "halo/keyword",
            "relevance": 0.461833,
            "sentiment": 0.168822,
            "totalfrequency": -1,
            "type": "Keyword"
        },
        {
            "actual_name": "Master Chief Incentives",
            "dimension": "What",
            "disambiguated_name": "Master Chief Incentives",
            "doccount": -1,
            "frequency": 1,
            "index": "master chief incentives/keyword",
            "relevance": 0.981457,
            "sentiment": 0.168876,
            "totalfrequency": -1,
            "type": "Keyword"
        },