Query Data

EdgeLake provides a unifies view of the distributed data. Using virtualization, users interact with the data as if the data is centralized.
Using the following commands, users retrieve the list of tables, select a table, retrieve the table’s column, and issue a query.

Retrieve the list of tables

The following command retrieves the list of tables:

 
get virtual tables [info] where company = [company name] and dbms = [dbms name] and table = [table name]

Examples:

 
get virtual tables
get virtual tables where table = ping_sensor

The following command details the edge nodes that host each table:

 
get data nodes where company = [company name] and dbms = [dbms name] and table = [table name] and sort = (columns IDs)

Examples:

 
get data nodes
get data nodes where table = ping_sensor
get data nodes where sort = (1,2)

Retrieve the list of columns

The following command lists the columns of a table:

 
get columns where dbms = [dbms name] and table = [table name] and format = [table/json]

Example:

 
get columns where table = ping_sensor and dbms = dmci

# Example output:

Schema for DBMS: 'my_dbms' and Table: 'printer_status'
Column Name      Column Type
----------------|---------------------------|
row_id          |integer                    |
insert_timestamp|timestamp without time zone|
tsd_name        |char(3)                    |
tsd_id          |int                        |
bed_temp        |decimal                    |
extruder_temp   |decimal                    |
current_file    |char(2)                    |
current_x       |decimal                    |
current_y       |decimal                    |
current_z       |decimal                    |
fan_speed       |float                      |

Note: the first 4 columns (row_id, insert_timestamp, tsd_name, tsd_id) are added by default when the table is created and are used for management and traceability of ingested data. Users can ignore these columns in their projection lists.

SQL supported functionality

In depth details can be found in the Query Nodes section.

Projection List:

increments function
Column name
Min
Max
Sum
Count
Avg
Count Distinct
Range
Time functions over Column values

Where Conditions:

period function
Greater than
Less than
Equal
Not Equal
Group By
Order By
Limit

Query Format

Sample Query:

# Query Format Breakdown 
run client () sql [db_name] format=[output_type] and stats=[true/false] [select statement] 

# Sample Query 
run client () sql litsanleandro format = table "select insert_timestamp, device_name, timestamp, value from ping_sensor limit 100"

run client () directs the query to the relevant nodes in the network. When executing via REST, the --headers "destination: network" replaces the run client () prefix.
format determines the structure of the returned data:
- format=json - The returned results are in JSON.
- format=table - The returned results are in a table format.
- format=json:output - The returned results are organized as rows whereas each row is a JSON structure - this format is identical to the data load structure.
- format=json;list- The returned results are organized as a list, every entry in the list represents a row (use this format with PowerBI).
The query result sets are extended by statistics that describe how the query was executed. Set stats=false to disable the statistics.

Built-in Query Functions

Period Function

The period function finds the first occurrence of data before or at a specified date; if a filter-criteria is specified, the occurrence needs also to satisfy the filter-criteria.
A period function, considers the readings in a period of time which is measured by the type of the time interval (Minutes, Hours, Days, Weeks, Months or Years) and the number of units of the time interval (i.e. 3 days - whereas time-interval is day and unit is 3).

Sample Call:

run client () sql edgex format=table "select min(timestamp), max(timestamp), count(value) from rand_data WHERE period(day, 1, now(), timestamp);"

Increments Function

The increments functions considers data in increments of time within a time range.
The function has 3 sections: time-interval, units and date-column.

time-interval is (Minutes, Hours, Days, Weeks, Months or Years)
units associated with the time interval (i.e. 2 weeks).
date-column is the name of the column that determines the date and time to consider.

An Increment function returns a single value for time intervals within a time range. For example, rather than returning all column values within a month, a query returns a smaller data set by returning the average, min and max values for every minute within the month.

Sample Call:

run client () sql edgex format=table "select increments(day, 1, timestamp), min(timestamp), max(timestamp), count(value) from rand_data  where timestamp >= now() - 1 month"