read Method

Description

Reads (consumes) data from a TSDB table into pandas DataFrames.

Query Options

You can construct the read query using either of the following alternative approaches (but not both in the same call):

  1. Non-SQL query — set one or more of the following read parameters:

    • table — the path to the TSDB table to query (required).
    • columns — a list of metric names (default = all metrics).
    • filter — a query filter that restricts which information to return.
    • aggregators — a list of aggregation functions to apply to the query metrics (for an aggregation query).
    • group_by — a list of metric labels that determine how to group the read results.
  2. SQL query — set the query parameter to an SQL query string; see the parameter description for details.

For either approach, you can also optionally set additional read parameters; see the Parameters section for details.
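As an illustration of the two approaches, the following sketch builds the same logical query both ways (it assumes a Frames client object named `client`; the table path, metric name, and label name are placeholders):

```python
# Two equivalent ways to express one TSDB read (paths/names are placeholders).
table = "mytable"

# 1. Non-SQL query: table/columns/filter parameters.
non_sql_kwargs = dict(
    backend="tsdb",
    table=table,
    columns=["cpu"],
    filter="os=='linux'",
)

# 2. SQL query: the same restrictions expressed in a single query string.
query_str = f"select cpu from '{table}' where os='linux'"
sql_kwargs = dict(backend="tsdb", query=query_str)

# df = client.read(**non_sql_kwargs)    # or: client.read(**sql_kwargs)
```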

Aggregation Queries

A TSDB query can include aggregation functions (“aggregators”) to apply to the sample metrics; for a list of the supported aggregation functions, see the description of the aggregators parameter. You can use one of the following ways to define aggregators for a read query —

  • For a non-SQL query — set the aggregators parameter to a list of aggregation functions. Each function is applied to all metrics configured for the query (see the columns parameter).
  • For an SQL query — set the query parameter to an SQL query whose SELECT statement includes aggregation functions; the function calls specify the metrics to which to apply the aggregation. See details in the description of the query parameter.

You can use one of two types of aggregations, but you cannot combine both types in the same query. For both types, the aggregation is done at each aggregation step (a.k.a., aggregation interval) — the time interval for executing the aggregation functions over the query’s time range; the step determines the aggregation data points, starting at the query’s start time. The default step is the query’s time range (which can be configured via the start and end parameters). You can override the default aggregation step by setting the step parameter.

  • Over-time aggregation (default) — aggregates the data for unique metric label sets over time, and returns a separate aggregation time series for each label set. The aggregation is applied to all sample data within the query’s aggregation window. By default, the aggregation window is identical to the aggregation step and the aggregation is calculated for an aggregation window that starts at the aggregation step; for example, for an aggregation step and default aggregation window of 1 hour, the aggregation at step 10:00 is done for the period of 10:00–11:00. However, you can override the default aggregation window by setting the aggregationWindow parameter. When this parameter is set, the aggregation is calculated for an aggregation window of the configured size that ends at the aggregation step; for example, for a step of 1 hour and an aggregation window of 2 hours, aggregation at step 10:00 is done for the period 08:00–10:00.

    Pre-Aggregation Note

    When creating a TSDB table, you can optionally configure pre-aggregates that will be calculated for all metric samples as part of their ingestion into the TSDB table. For each aggregation request in an over-time aggregation query, if the TSDB table has matching pre-aggregated data (same aggregation function and the query’s aggregation window is a sufficient multiplier of the table’s aggregation granularity), the pre-aggregated data is used instead of performing a new aggregation calculation, which speeds up the query processing. For more information about pre-aggregation and how to configure it, see the description of the create method’s attrs aggregates argument.

  • Cross-series aggregation (a.k.a. label aggregation) — aggregates all the sample data for a given metric at each aggregation step — across all labels — and returns a single aggregation time series for the metric. This aggregation type is currently supported only for SQL queries. To perform cross-series aggregation, use aggregators whose names end with _all (for example, avg_all). For more information and examples, see the description of the query parameter.
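The distinction between the two aggregation types can be sketched as follows (a minimal illustration; the table path and metric name are placeholders, and `client` is an assumed Frames client):

```python
# Over-time aggregation (non-SQL form): one result series per unique
# label set, aggregated within each step's window.
overtime_aggregators = "avg,max"    # applied to every metric in `columns`

# Cross-series aggregation (SQL only): `_all` functions collapse all label
# sets of a metric into a single series at each step.
cross_series_query = "select avg_all(cpu) from 'mytable'"

# df = client.read("tsdb", table="mytable", columns=["cpu"],
#                  aggregators=overtime_aggregators, step="1h")
# df = client.read("tsdb", query=cross_series_query, step="1h")
```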

Syntax

read(backend=''[, table='', query='', columns=None, filter='', group_by='',
    max_in_message=0, iterator=False, **kw])

The following syntax statement replaces the kw parameter with the additional keyword arguments that can be passed for the TSDB backend via this parameter:

read(backend=''[, table='', query='', columns=None, filter='', group_by='',
     max_in_message=0, iterator=False, start, end, aggregators,
     aggregationWindow, step, multi_index])
Note

The method has additional parameters that aren’t currently supported for the TSDB backend. Therefore, when calling the method, be sure to explicitly specify the names of all parameters after table.

Parameters

aggregators (kw argument) | aggregationWindow (kw argument) | backend | columns | end (kw argument) | filter | group_by | kw | multi_index (kw argument) | query | start (kw argument) | step (kw argument) | table

backend
The backend type — "tsdb" for the TSDB backend. See Backend Types.

  • Type: str
  • Requirement: Required
table
The relative path to the backend data — a directory in the target data container (as configured for the client object) that represents a TSDB table. For example, "mytable" or "examples/tsdb/my_metrics".

  • Type: str
  • Requirement: Required except when using the query parameter to perform an SQL query
iterator
Determines whether to return a pandas DataFrames iterator or a single DataFrame: True — return a DataFrames iterator; False (default) — return a single DataFrame.

  • Type: bool
  • Requirement: Optional
  • Valid Values: True | False
  • Default Value: False (return a single DataFrame)
columns

A list of metric names to which to apply the query. For example, ["cpu", "temperature", "disk"]. By default, the query is applied to all metrics in the TSDB table.

Note
  • You can restrict the metrics list for the query within the query filter, as explained for filter parameter.
  • You can alternatively specify the metric names for the query as part of the SELECT statement of an SQL query in the query parameter. The columns and query parameters cannot be set concurrently.

For more information, see Query Options.

  • Type: []str
  • Requirement: Optional
filter

A platform filter expression that restricts the information that will be returned. See Filter Expression for syntax details and examples.
The filter is typically applied to metric labels; for example, "os=='linux' AND arch=='amd64'".
You can also apply the filter to the _name attribute, which stores the metric name. This is less efficient than specifying the metric names in the columns parameter, but it might be useful in some cases. For example, if you have many “cpu<n>” metrics, you can use "starts(_name,'cpu')" in your filter expression to apply the query to all metrics (or all metrics specified in the columns parameter, if set) whose names begin with the string “cpu”.

Note
  • You can alternatively define a query filter as part of the WHERE clause of an SQL query in the query parameter. The filter and query parameters cannot be set concurrently. For more information, see Query Options.
  • Currently, only labels of type string are supported; see the Software Specifications and Restrictions. Therefore, ensure that you embed label attribute values in your filter expression within quotation marks even when the values represent a number (for example, node == '1' or “node = ‘1’” in an SQL query), and don’t apply arithmetic operators to such attributes (unless you want to perform a lexicographic string comparison).
  • Type: str
  • Requirement: Optional
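For example, the quoting rule from the note above looks like this in practice (a sketch; the label names are placeholders and `client` is an assumed Frames client):

```python
# Label values are strings, so numeric-looking values must still be quoted;
# node=='1' is a lexicographic string comparison, not an arithmetic one.
filter_expr = "os=='linux' AND node=='1'"
# df = client.read("tsdb", table="mytable", filter=filter_expr)
```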
group_by

A comma-separated list of metric label names, which determines how to group the returned read results. For example, "os,host".

Note

You can alternatively include a GROUP BY statement in an SQL query in the query parameter. The group_by and query parameters cannot be set concurrently. For more information, see Query Options.

  • Type: str
  • Requirement: Optional
query

A string that defines an SQL query.

  • Type: str
  • Requirement: Optional

The SQL query string must be of the following format:

  select <data> from <table> [where <filter>] [group by <labels>]

SELECT <data> | FROM <table> | WHERE <filter> | GROUP BY <labels>

  • SELECT <data> — The object of the SELECT statement is a comma-separated list of either metric names or aggregation functions (“aggregators”) that are applied to specific metrics; note that you cannot use both in the same query.
    You can use the ‘*’ wildcard character to represent all metrics.

    • For metric names, the query returns the raw sample data of the specified metrics, or downsampled data when the step parameter of the read method is set.
      Examples: “select mem”; “select cpu, disk”; “select *”

    • For “aggregators”, the query returns the requested aggregation data for the specified metrics; see Aggregation Queries.
      For a list of the supported aggregation functions, see the description of the aggregators parameter.
      You can perform one of two types of aggregation queries, but you cannot use both in the same query:

      • Use regular aggregation function names for performing over-time aggregation. You can use the ‘*’ wildcard character as the value of the metric-name parameter to apply the aggregation to all metrics.
        Examples: “select avg(cpu), sum(cpu), sum(disk)”; “select max(*), min(*)”

      • Use aggregation function names ending with _all for performing cross-series aggregation across all metric labels. In the current release, this option is supported only for single-metric queries.
        Examples: “select count_all(cpu), sum_all(cpu)”

    Note
    You can optionally use column aliases in your SELECT statement, provided you don't use a metric name as the alias.
    Examples: "select temp as temperature"; "select min(disk) as min_disk"; "select avg_all(mem) as cs_memory_average"
  • FROM <table> — the path to the TSDB table to query (required).
    Note that when the table path contains slashes (/), it must be embedded within quotation marks.
    Examples: “from mytsdb”; “from '/tsdb/my_metrics'”

  • WHERE <filter> — an optional platform filter expression that restricts the information that will be returned. For more information, see the description of the filter parameter, which is used for non-SQL queries.

    Note
    The filter expression in the query string uses an SQL syntax. Therefore, unlike when setting it in the filter parameter, the expression string isn’t embedded within quotation marks and the comparison operator is '=' and not '==' (as documented in the platform expression reference).
    Examples: “where os='linux' AND arch='amd64'”; “where node != '0'”

  • GROUP BY <labels> — an optional comma-separated list of label names, which determines how to group the returned results. For more information, see the description of the group_by parameter, which is used for non-SQL queries.
    Examples: “group by os,host”; “group by min_disk”
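Putting the clauses together, a complete query string might be assembled like this (a sketch; the table path, metric, and label names are placeholders, and `client` is an assumed Frames client):

```python
# A SELECT with a column alias, a quoted slash-containing table path, a
# WHERE filter (SQL syntax: single '='), and a GROUP BY clause.
table = "/tsdb/my_metrics"    # contains slashes, so it must be quoted
query_str = (
    f"select avg(cpu) as avg_cpu from '{table}' "
    "where os='linux' group by host"
)
# df = client.read("tsdb", query=query_str, start="now-1d", step="1h")
```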

kw

This parameter is used for passing a variable-length list of additional keyword (named) arguments. See the following kw Arguments section for a list of additional arguments that are supported for the TSDB backend via the kw parameter.

  • Type: ** — variable-length keyword arguments list
  • Requirement: Optional

kw Arguments

The TSDB backend supports the following read arguments via the kw parameter for passing a variable-length list of additional keyword arguments:

start
The query’s start time — the earliest sample time to query: read only items whose data sample time is at or after (>=) the specified start time.

  • Type: str
  • Requirement: Optional
  • Valid Values: A string containing an RFC 3339 time, a Unix timestamp in milliseconds, a relative time of the format "now" or "now-[0-9]+[mhd]" (where 'm' = minutes, 'h' = hours, and 'd' = days), or 0 for the earliest time. For example: "2016-01-02T15:34:26Z"; "1451748866"; "now-90m"; "0".
  • Default Value: <end time> - 1h
end
The query’s end time — the latest sample time to query: read only items whose data sample time is before or at (<=) the specified end time.

  • Type: str
  • Requirement: Optional
  • Valid Values: A string containing an RFC 3339 time, a Unix timestamp in milliseconds, a relative time of the format "now" or "now-[0-9]+[mhd]" (where 'm' = minutes, 'h' = hours, and 'd' = days), or 0 for the earliest time. For example: "2018-09-26T14:10:20Z"; "1537971006000"; "now-3h"; "now-7d".
  • Default Value: now
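The relative-time format accepted by start and end can be illustrated with a small parser (the helper below is purely illustrative and not part of Frames, which parses these values itself):

```python
import re
from datetime import timedelta

_UNITS = {"m": "minutes", "h": "hours", "d": "days"}

def relative_offset(value: str) -> timedelta:
    """Return the offset from 'now' encoded by "now" or "now-<n><m|h|d>"."""
    if value == "now":
        return timedelta(0)
    match = re.fullmatch(r"now-([0-9]+)([mhd])", value)
    if match is None:
        raise ValueError(f"not a relative time: {value!r}")
    return -timedelta(**{_UNITS[match.group(2)]: int(match.group(1))})

# relative_offset("now-90m") == -timedelta(minutes=90)
```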
aggregators

A list of aggregation functions (“aggregators”) to apply to the raw sample data of the configured query metrics (see the columns parameter) in order to perform an aggregation query. You can configure the aggregation step and/or window in the step and aggregationWindow parameters, respectively.

Note

You can alternatively define the aggregation functions as part of the SELECT statement of an SQL query in the query parameter, which also supports cross-series (label) aggregation and enables you to apply specific aggregation functions to different metrics. The aggregators and query parameters cannot be set concurrently. For more information, see Query Options.

  • Type: str
  • Requirement: Optional
  • Valid Values: A string containing a comma-separated list of supported aggregation functions (“aggregators”); for example, "count,avg,min,max"; "count_all,avg_all,min_all,max_all". Functions whose names end with _all perform cross-series aggregation instead of the default over-time aggregation; note that you cannot mix these two aggregation types in the same query. The following aggregation functions are supported:
    • avg | avg_all — the average of the sample values.
    • count | count_all — the number of ingested samples.
    • last | last_all — the value of the last sample (i.e., the sample with the latest time).
    • max | max_all — the maximal sample value.
    • min | min_all — the minimal sample value.
    • rate | rate_all — the change rate of the sample values, which is calculated as (<last sample value of the current interval> - <last sample value of the previous interval>) / <aggregation granularity>.
    • stddev | stddev_all — the standard deviation of the sample values.
    • stdvar | stdvar_all — the standard variance of the sample values.
    • sum | sum_all — the sum of the sample values.
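Because the two aggregation types cannot be mixed in one query, it can be convenient to validate an aggregators string before issuing the read; the helper below is a sketch of ours, not part of the Frames API:

```python
def check_aggregators(aggregators: str) -> str:
    """Reject aggregator lists that mix over-time and cross-series types."""
    funcs = [f.strip() for f in aggregators.split(",")]
    cross = [f for f in funcs if f.endswith("_all")]
    if cross and len(cross) != len(funcs):
        raise ValueError("cannot mix over-time and cross-series aggregators")
    return ",".join(funcs)

# df = client.read("tsdb", table="mytable", columns=["cpu"],
#                  aggregators=check_aggregators("count,avg,min,max"))
```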
step

The query step (interval), which determines the points over the query’s time range at which to perform aggregations (for an aggregation query) or downsample the data (for a query without aggregators). The default step is the query’s time range, which can be configured via the start and end parameters. For more information, see Aggregation Queries.

  • Type: str
  • Requirement: Optional
  • Valid Values: A string of the format "[0-9]+[mhd]" — where ‘m’ = minutes, ‘h’ = hours, and ‘d’ = days. For example, "30m" (30 minutes), "2h" (2 hours), or "1d" (1 day).
aggregationWindow

For an over-time aggregation query — the query’s aggregation window, i.e., the time frame to which to apply the configured aggregation functions at each aggregation step. The default aggregation window is the query’s aggregation step. When using the default aggregation window, the aggregation window starts at the aggregation step; when the aggregationWindow parameter is set, the aggregation window ends at the aggregation step. For more information, see Aggregation Queries.

  • Type: str
  • Requirement: Optional
  • Valid Values: A string of the format "[0-9]+[mhd]" — where ‘m’ = minutes, ‘h’ = hours, and ‘d’ = days. For example, "30m" (30 minutes), "2h" (2 hours), or "1d" (1 day).
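The step/window relationship described above reduces to simple time arithmetic; this sketch reproduces the 10:00 examples from Aggregation Queries (the timestamps are illustrative):

```python
from datetime import datetime, timedelta

step = timedelta(hours=1)        # the query's aggregation step
window = timedelta(hours=2)      # an explicit aggregationWindow
t = datetime(2019, 1, 1, 10, 0)  # one aggregation step point

# Default window (== step): the window STARTS at the step -> 10:00-11:00.
default_window = (t, t + step)

# Explicit aggregationWindow: the window ENDS at the step -> 08:00-10:00.
explicit_window = (t - window, t)
```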
multi_index
Determines the indexing of the returned DataFrames: True — return a multi-index DataFrame in which all metric-label attributes are defined as index columns in addition to the metric sample-time attribute (the primary-key attribute); False (default) — return a single-index DataFrame in which only the metric sample-time attribute is defined as an index column.

  • Type: bool
  • Requirement: Optional
  • Default Value: False (return a single-index DataFrame)

Return Value

  • When the value of the iterator parameter is True — returns a pandas DataFrames iterator.
  • When the value of the iterator parameter is False (default) — returns a single pandas DataFrame.

Examples

Following are some usage examples for the read method of the Frames TSDB backend. All of the examples set the read method’s multi_index parameter to True to display metric-label attributes as index columns (in addition to the sample-time attribute, which is always displayed as an index column). Except where otherwise specified, the examples return a single DataFrame (default iterator value = False).

  1. Read all items (rows) of a mytsdb table in the client’s data container (table) — start = "0" and default end ("now") and columns (all metrics):

    tsdb_table = "mytsdb"
    df = client.read(backend="tsdb", table=tsdb_table, start="0", multi_index=True)
    display(df.head())
    display(df.tail())
  2. Issue an SQL query (query) to read from a tsdb/my_metrics table in the client’s data container (set in the query’s FROM clause); return all items (default columns = all metrics, start = "0", and default end = "now") in a DataFrames iterator (iterator = True):

    tsdb_table = "/tsdb/my_metrics"
    query_str = f"select * from '{tsdb_table}'"
    dfi = client.read("tsdb", query=query_str, start="0", iterator=True,
                      multi_index=True)
    for df in dfi:
        display(df.head())
        display(df.tail())
      

  3. Issue a non-SQL over-time aggregation query (aggregators) to a mytsdb table in the client’s data container (table) for the “cpu” metric (columns); use the default aggregation step and window (step and aggregationWindow not set), which is the query’s time range — 09:00–17:00 on 1 Jan 2019 (see start and end):

    tsdb_table = "mytsdb"
    df = client.read("tsdb", table=tsdb_table, start="2019-01-01T09:00:00Z",
                     end="2019-01-01T17:00:00Z", columns=["cpu"],
                     aggregators="avg,min,max", multi_index=True)
    display(df)
      

    The following variation explicitly sets the aggregation window for the query to 1 hour (aggregationWindow = '1h'):

    tsdb_table = "mytsdb"
    df = client.read("tsdb", table=tsdb_table, start="2019-01-01T09:00:00Z",
                     end="2019-01-01T17:00:00Z", columns=["cpu"],
                     aggregators="avg,min,max", aggregationWindow="1h",
                     multi_index=True)
    display(df)
      

  4. Issue an SQL over-time aggregation query (query) to a tsdb/my_metrics table in the client’s data container (set in the query’s FROM clause) for the last day (see start = "now-1d" and default end = "now"). Apply the count aggregator to all metrics and the avg aggregator only to the disk metric (query with “select count(*), avg(disk)”) with an aggregation step of 4 hours (step) and a similar default aggregation window (aggregationWindow not set), and only apply the query to samples with a “NY” site label (query with “where site='NY'”). Return a DataFrames iterator (iterator=True) and group the results by the host label (query with “group by host”).

    tsdb_table = "/tsdb/my_metrics"
    query_str = f"select count(*), avg(disk) from '{tsdb_table}' where site='NY'" \
        " group by host"
    dfi = client.read("tsdb", query=query_str, start="now-1d", step="4h",
                      iterator=True, multi_index=True)
    for df in dfi:
        display(df)
      

  5. Issue a non-SQL over-time aggregation query to a tsdb/my_metrics table in the client’s data container (table) for the day that ended one day ago (start = "now-2d" and end = "now-1d"); apply the sum and avg aggregators (aggregators) to the “disk” and “memory” metrics (columns) with a 12-hour aggregation step (step) and a matching default aggregation window (aggregationWindow not set), group the results by the site and host labels (group_by = "site,host"), and only apply the query to samples with a “linux” os label (filter = "os=='linux'").

    tsdb_table = "/tsdb/my_metrics"
    df = client.read("tsdb", table=tsdb_table, columns=["disk", "memory"],
                     filter="os=='linux'", aggregators="sum,avg", step="12h",
                     group_by="site,host", start="now-2d", end="now-1d",
                     multi_index=True)
    display(df)
      

  6. Issue a 1-hour raw-data downsampling non-SQL query (step = "1h" and aggregators not set) to a mytsdb table in the client’s data container (table); apply the query to all metric samples (default columns) for January 2019 (start = "2019-01-01T00:00:00Z" and end = "2019-02-01T00:00:00Z"):

    tsdb_table = "mytsdb"
    df = client.read("tsdb", table=tsdb_table, start="2019-01-01T00:00:00Z",
                     end="2019-02-01T00:00:00Z", step="1h", multi_index=True)
    display(df)
        

  7. Issue an SQL cross-series aggregation query to a tsdb/my_metrics table in the client’s data container (set in the query’s FROM clause) for the last day (start = "now-1d" and default end = "now"). Apply the count and avg aggregators to the “disk” metric (query with “select count_all(disk), avg_all(disk)”) with an aggregation step of 4 hours (step = "4h"), and only apply the query to samples with a “DC” site label (query with “where site='DC'”). Return a DataFrames iterator (iterator=True) and group the results by the site label (query with “group by site”).

    tsdb_table = "/tsdb/my_metrics"
    query_str = f"select count_all(disk), avg_all(disk) from '{tsdb_table}' " \
        "where site='DC' group by site"
    dfi = client.read("tsdb", query=query_str, start="now-1d", step="4h",
                      iterator=True, multi_index=True)
    for df in dfi:
        display(df)
        

See Also