
Overview

todo

Entity Types

The following entity types are possible on the platform.

The set of values permitted in the "type" field depends on how the entity was extracted: commercial third-party extractors use a fixed set of types, while some other entity extractors let users define custom entity types.
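As a rough illustration of the distinction above, the sketch below checks a "type" value against an extractor's vocabulary. It is not platform code: the function name `is_type_allowed` and the `FIXED_TYPES` mapping are hypothetical, the type sets are only a subset of the types listed on this page, and treating Datasift and Salience as fixed-vocabulary extractors with exact, case-sensitive matching is an assumption.

```python
# Hypothetical sketch: fixed vocabularies for two extractors.
# The sets are an illustrative subset of the types documented on this page,
# not an authoritative list of what either extractor emits.
FIXED_TYPES = {
    "Datasift": {"Gender", "FacebookUser", "TwitterUser", "RedditUser", "Hashtag", "URL"},
    "Salience": {"Topic", "Job Title", "Quote", "Keyword"},
}


def is_type_allowed(extractor: str, entity_type: str) -> bool:
    """Return True if entity_type is acceptable for the given extractor.

    Extractors listed in FIXED_TYPES only accept their listed types.
    Any other extractor is assumed to allow user-defined types, so any
    non-empty type string is accepted.
    """
    fixed = FIXED_TYPES.get(extractor)
    if fixed is None:
        # Assumption: unknown extractors permit custom entity types.
        return bool(entity_type.strip())
    return entity_type in fixed
```

For example, `is_type_allowed("Salience", "Hashtag")` is rejected under these assumptions, while a custom extractor may use any non-empty type name.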

Some of the common entity types on the Infinit.e platform are defined below, together with the source of each.

| Entity Type | Example | Description | Source |
|---|---|---|---|
| Topic | Business, Sports, Politics, Health, War, Law, Crime, Automotive, Investing, Weather, "Software and Internet", Economics, Food, Science, Aviation, Education, "Video Games", Technology, Labor, Art, Travel, etc. | A high-level topic inferred from the content by Natural Language Processing. | Salience |
| Gender | Male, Mostly_Male, Female, Mostly_Female, Unisex | Obtained from the Datasift "gender" augmentation: an estimate of the gender of the document's author. | Datasift |
| FacebookUser | "Mark Zuckerberg", IKANOW, "Facebook Birdwatching Group" | For Facebook documents, any of the people/companies/groups with Facebook accounts mentioned in a post (including the author). | Datasift |
| TwitterUser | twitterHandle (i.e. without the leading '@') | For tweets, any of the people/companies/groups with Twitter accounts mentioned in a post (including the author). | Datasift |
| RedditUser | witty_handle_here | For Reddit posts, the author's account name. | Datasift |
| Person | "John Stewart" | For any other document type (blogs, news, forums), the author is categorized as a Person. Note that the Person type is also used for names extracted from the content using NLP. | Datasift (or Salience) |
| Hashtag | iwanttotrend (i.e. without the leading '#') | Hashtags in tweets. | Datasift |
| City | "New York, NY, United States" | Locations can be obtained in one of two ways: the registered location of the author (from Datasift), or places mentioned in the content (extracted using NLP). If the place can be geolocated by Infinit.e to a city, then this type is used. | Datasift/Salience |
| Region | "Maryland, United States" | Locations can be obtained in one of two ways: the registered location of the author (from Datasift), or places mentioned in the content (extracted using NLP). If the place can be geolocated by Infinit.e to a state or similar administrative partition, then this type is used. | Datasift/Salience |
| Place | "White House", "Arizona", "US" | Locations can be obtained in one of two ways: the registered location of the author (from Datasift), or places mentioned in the content (extracted using NLP). If the place cannot be geolocated by Infinit.e, then this "catch-all" type is used. | Datasift/Salience |
| URL | http://www.ikanow.com/downloads | Links in Facebook posts and tweets. | Datasift |
| Person | "Barack Obama" | A name extracted from the content by Salience and believed to be the name of a person. | Salience (or Datasift) |
| Job Title | President, CEO | A job title extracted from the content using Natural Language Processing. | Salience |
| Company/Organization | Microsoft, UN | A name extracted from the content by Salience and believed to be the name of a company or organization. | Salience |
| Quote | "Ask not for whom the bell tolls" | An unattributed quote extracted from the content. | Salience |
| Keyword | "american history", "domestic spying program" | A word or phrase from the content that is statistically significant to the meaning of the post. | Salience |
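Two of the rules described in the table can be sketched in code: choosing between City, Region and Place according to how precisely a location was geolocated, and dropping the leading '@' or '#' from TwitterUser and Hashtag values. This is a minimal sketch under stated assumptions rather than the platform's implementation: the function names and the `geo_level` values ("city", "region", or `None` when geolocation fails) are hypothetical.

```python
from typing import Optional


def location_entity_type(geo_level: Optional[str]) -> str:
    """Pick the location entity type from a (hypothetical) geolocation result.

    geo_level is assumed to be "city", "region" (a state or similar
    administrative partition), or None when the place cannot be geolocated.
    """
    if geo_level == "city":
        return "City"
    if geo_level == "region":
        return "Region"
    # Catch-all for anything that could not be geolocated.
    return "Place"


def normalize_name(entity_type: str, raw: str) -> str:
    """Strip the leading '@' or '#' for TwitterUser and Hashtag entities.

    Values of any other entity type are returned unchanged.
    """
    prefixes = {"TwitterUser": "@", "Hashtag": "#"}
    prefix = prefixes.get(entity_type)
    if prefix and raw.startswith(prefix):
        return raw[len(prefix):]
    return raw
```

Under these assumptions, a place with no geolocation result falls back to the Place type, and `normalize_name("Hashtag", "#iwanttotrend")` yields the bare tag shown in the table's example column.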

 

Aliasing

TODO

 

 

 
