Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

Overview

TODO

Format

Code Block
TODO

Legacy documentation:

  • TODO

TODO

Description

Legacy documentation:

  • TODO

TODODatabase extractor supports harvesting data from traditional RDBMS (Relational Database Management Systems) using JDBC (Java Database Connectivity) drivers.

You can use the Database extractor to connect to and extract data from a database.

Format

Code Block
{
	"databaseType" : "string", // String, type of DB to connect to
					// eg: mysql, db2, oracle, mssqlserver, sybase
	"hostname" : "string", // String, hostname of database server to connect to
	"port" : "string", // String, port database server is listening for connections on
	"databaseName" : "string", // String, name of database containing the data to be extracted
	"query" : "string", // String, SQL query used to retrieve full dataset from source (the first time the source is added)
	"deltaQuery" : "string", // String, SQL query used to retrieve updates from source after the first time through
	"deleteQuery" : "string", // String, not currently implemented
	"primaryKey" : "string", // String, primary key field in data set, used to help identify whether a record is new or previously harvested

	"title" : "string", // String, one of the columns from query/deltaQuery; populates the document's title field
	"snippet" : "string", // String, one of the columns from query/deltaQuery; populates the document's description field
	"publishedDate" : "string", // String, one of the columns from query/deltaQuery; populates the document's published date field
}

 

Description

FieldDescriptionNote
databaseTypeThe type of RDBMS to connect to. Valid values currently include: mysql, db2, oracle, mssqlserver, sybase. 
hostanmeThe hostname of the database server to connect to, i.e. "my.databaseserver.com" in the example above. 
 port The port that the database accepts incoming connections on. 
databaseNameThe name of the database to connect to. 
queryThe query field is used to specify the SQL used to perform a full extraction of data for the source. This is generally used the first time the harvester extracts data from a source with incremental extractions being specified using the deltaQuery below. 
deltaQueryThe deltaQuery field is used to specify the SQL that extracts data from the source RDBS based on one or more delta values, i.e. created or modified date for a record. 
deleteQueryNote: The deleteQuery functionality of the Database Harvester is not implemented in the Beta version of Infinit.e. 
primaryKeyPrimary key field in data set, used to help identify whether a record is new or previously harvested. 
titleRecord field used to populate the document's title field. 
snippetRecord field used to populate the document's description field. 
publishedDateRecord field used to populate the document's published date field. 

 

Examples

In the following source example, the database block is configured to pull data from an incident report provided the publishedDate matches the criteria of the deltaQuery mysql query.

The matching incident report can then be pulled from the data and metadata, entities and associations can be formed.

 

Code Block
{            "database": {
                "authentication": {
                    "password": "PASSWORD",
                    "username": "USERNAME"
                },
                "databaseName": "washingtondc",
                "databaseType": "mysql",
                "deleteQuery": "",
                "deltaQuery": "SELECT * FROM IncidentReport WHERE REPORTDATETIME >= (SELECT ADDDATE(CURDATE(),-7))",
                "hostname": "dbserver.ikanow.com",
                "port": "3306",
                "primaryKey": "NID",
                "publishedDate": "REPORTDATETIME",
                "query": "SELECT * FROM IncidentReport",
                "snippet": "OFFENSE",
                "title": "CCN",
                "url": "jdbc:mysql://ec2-184-72-206-97.compute-1.amazonaws.com/washingtondc/IncidentReport"
            }
        },

 

Examples

TODO

In the sample output, the record has been retrieved from the incident report database, based on the values of the mysql query.  This retrieved data can then be used to create metadata, entities and associations.

Code Block
{
    "_id": "4eb1a6532285a1eed2af759c",
    "title": "11012669",
    "url": "jdbc:mysql://dbserver.ikanow.com/washingtondc/IncidentReport/955778",
    "created": "Nov 2, 2011 08:21:39 PM UTC",
    "modified": "Nov 2, 2011 08:21:39 PM UTC",
    "publishedDate": "Jan 29, 2011 05:00:00 AM UTC",
    "source": [
        "data.dc.gov - Crime Incidents (ASP)"
    ],
    "sourceKey": [
        "Bergen.washingtondc.IncidentReport"
    ],
    "mediaType": [
        "Record"
    ],
    "description": "Jan 29, 2011 12:00:00 AM: THEFT,2 was reported at: 1300 B/O CONNECTICUT AVE NW",
Panel

Footnotes:

Legacy documentation:

Using the database harvester