read Method

Description

Reads (consumes) data from a TSDB table into pandas DataFrames.

Aggregation Queries

A TSDB query can include aggregation functions (“aggregators”) to apply to the sample metrics; for a list of the supported aggregation functions, see the description of the aggregators parameter. To create an aggregation query, set the read method’s aggregators parameter to a list of aggregation functions. Each function is applied to all metrics configured for the query (see the columns parameter).

The Frames TSDB backend currently supports “over-time aggregation”, which aggregates the data for unique metric label sets over time, and returns a separate aggregation time series for each label set.

The aggregation is done at each aggregation step (a.k.a., aggregation interval) — the time interval for executing the aggregation functions over the query’s time range; the step determines the aggregation data points, starting at the query’s start time. The default step is the query’s time range (which can be configured via the start and end parameters). You can override the default aggregation step by setting the step parameter.

The aggregation is applied to all sample data within the query’s aggregation window, which currently always equals the query’s aggregation step. For example, for an aggregation step of 1 hour, the aggregation at step 10:00 is done for an aggregation window of 10:00–11:00.
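The step/window relationship described above can be sketched in plain Python. This is an illustrative helper (not part of the Frames API) that shows how a step partitions a query's time range into aggregation windows, under the current behavior where the window always equals the step:

```python
from datetime import datetime, timedelta

def aggregation_windows(start, end, step):
    """Illustrative only: list the aggregation windows that a step
    partitions a query time range into (window == step, as in the
    current Frames release)."""
    windows = []
    t = start
    while t < end:
        windows.append((t, min(t + step, end)))
        t += step
    return windows

# A 09:00-17:00 query with a 1-hour step yields 8 windows,
# the first covering 09:00-10:00.
wins = aggregation_windows(datetime(2019, 1, 1, 9),
                           datetime(2019, 1, 1, 17),
                           timedelta(hours=1))
```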

Pre-Aggregation Note

When creating a TSDB table, you can optionally configure pre-aggregates that will be calculated for all metric samples as part of their ingestion into the TSDB table. For each aggregation request in an over-time aggregation query, if the TSDB table has matching pre-aggregated data (same aggregation function and the query’s aggregation window is a sufficient multiplier of the table’s aggregation granularity), the pre-aggregated data is used instead of performing a new aggregation calculation, which speeds up the query processing. For more information about pre-aggregation and how to configure it, see the description of the create method’s aggregates argument.
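The eligibility condition for reusing pre-aggregated data can be sketched as follows. This is a simplified illustration of the rule stated above (same function, window a whole multiple of the table's granularity), not the backend's actual matching logic:

```python
def can_use_preaggregate(query_func, query_window_minutes,
                         table_funcs, table_granularity_minutes):
    """Illustrative sketch (not the actual Frames code): pre-aggregated
    data is usable when the table was created with the same aggregation
    function and the query's aggregation window is a whole multiple of
    the table's pre-aggregation granularity."""
    return (query_func in table_funcs
            and query_window_minutes % table_granularity_minutes == 0)

# A 60-minute "avg" query window can reuse "avg" pre-aggregates stored
# at a 10-minute granularity; a 45-minute window cannot.
```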

Syntax

read(backend=''[, table='', columns=None, filter='', max_rows_in_msg=0,
    iterator=False, **kw])

The following syntax statement replaces the kw parameter with the additional keyword arguments that can be passed for the TSDB backend via this parameter:

read(backend=''[, table='', columns=None, filter='', max_rows_in_msg=0,
    iterator=False, start, end, aggregators, step, multi_index])
Note

The method has additional parameters that aren’t currently supported for the TSDB backend. Therefore, when calling the method, be sure to explicitly specify the names of all parameters after table.

Parameters

aggregators (kw argument) | backend | columns | end (kw argument) | filter | iterator | kw | multi_index (kw argument) | start (kw argument) | step (kw argument) | table

backend

The backend type — "tsdb" for the TSDB backend. See Backend Types.

  • Type: str
  • Requirement: Required
table

The relative path to the backend data — a directory in the target data container (as configured for the client object) that represents a TSDB table. For example, "mytable" or "examples/tsdb/my_metrics".

  • Type: str
  • Requirement: Required
iterator

Determines whether to return a pandas DataFrames iterator or a single DataFrame: True — return a DataFrames iterator; False (default) — return a single DataFrame.

  • Type: bool
  • Requirement: Optional
  • Valid Values: True | False
  • Default Value: False (return a single DataFrame)
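When iterator=True, the return value is consumed chunk by chunk, where each chunk is a pandas DataFrame. The following sketch shows the consumption pattern; fake_read_iterator is a hypothetical stand-in for the iterator that read returns, so the example can run without a live Frames service:

```python
import pandas as pd

# Hypothetical stand-in for the iterator returned by
# read(..., iterator=True): each element is a DataFrame chunk.
def fake_read_iterator():
    yield pd.DataFrame({"cpu": [0.1, 0.2]})
    yield pd.DataFrame({"cpu": [0.3]})

# Typical pattern: process the chunks one at a time, or concatenate
# them into a single DataFrame.
frames = list(fake_read_iterator())
df = pd.concat(frames, ignore_index=True)
```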
columns

A list of metric names to which to apply the query. For example, ["cpu", "temperature", "disk"]. By default, the query is applied to all metrics in the TSDB table.

Note
  • Queries with multiple metric names are currently supported only as a Tech Preview.
  • You can restrict the metrics list for the query within the query filter, as explained for the filter parameter.
  • Type: []str
  • Requirement: Optional
filter

A platform filter expression that restricts the information that will be returned. See Filter Expression for syntax details and examples.
The filter is typically applied to metric labels; for example, "os=='linux' AND arch=='amd64'".
You can also apply the filter to the _name attribute, which stores the metric name. This is less efficient than specifying the metric names in the columns parameter, but it might be useful in some cases. For example, if you have many “cpu<n>” metrics, you can use "starts(_name,'cpu')" in your filter expression to apply the query to all metrics (or all metrics specified in the columns parameter, if set) whose names begin with the string “cpu”.

Note
Currently, only labels of type string are supported; see the Software Specifications and Restrictions. Therefore, ensure that you embed label attribute values in your filter expression within quotation marks even when the values represent a number (for example, "node == '1'"), and don’t apply arithmetic operators to such attributes (unless you want to perform a lexicographic string comparison).
  • Type: str
  • Requirement: Optional
kw

This parameter is used for passing a variable-length list of additional keyword (named) arguments. See the following kw Arguments section for a list of additional arguments that are supported for the TSDB backend via the kw parameter.

  • Type: ** — variable-length keyword arguments list
  • Requirement: Optional

kw Arguments

The TSDB backend supports the following read arguments via the kw parameter for passing a variable-length list of additional keyword arguments:

start

The query’s start time — the earliest sample time to query: read only items whose data sample time is at or after (>=) the specified start time.

  • Type: str
  • Requirement: Optional
  • Valid Values: A string containing an RFC 3339 time, a Unix timestamp in milliseconds, a relative time of the format "now" or "now-[0-9]+[mhd]" (where 'm' = minutes, 'h' = hours, and 'd' = days), or 0 for the earliest time. For example: "2016-01-02T15:34:26Z"; "1451748866000"; "now-90m"; "0".
  • Default Value: <end time> - 1h
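The relative-time strings that start and end accept can be illustrated with a small parser. This is a hypothetical helper showing how the "now" / "now-[0-9]+[mhd]" format maps to absolute timestamps; it is not the backend's own parsing code:

```python
import re
from datetime import datetime, timedelta, timezone

_UNITS = {"m": "minutes", "h": "hours", "d": "days"}

def resolve_relative_time(value, now=None):
    """Illustrative parser for the relative-time strings that the
    start/end arguments accept ("now" or "now-<n><m|h|d>")."""
    now = now or datetime.now(timezone.utc)
    if value == "now":
        return now
    match = re.fullmatch(r"now-(\d+)([mhd])", value)
    if match:
        amount, unit = int(match.group(1)), match.group(2)
        return now - timedelta(**{_UNITS[unit]: amount})
    raise ValueError(f"not a relative time: {value!r}")

# Resolve "now-90m" against a fixed reference time.
base = datetime(2019, 1, 1, 12, 0, tzinfo=timezone.utc)
```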
end

The query’s end time — the latest sample time to query: read only items whose data sample time is before or at (<=) the specified end time.

  • Type: str
  • Requirement: Optional
  • Valid Values: A string containing an RFC 3339 time, a Unix timestamp in milliseconds, a relative time of the format "now" or "now-[0-9]+[mhd]" (where m = minutes, h = hours, and 'd' = days), or 0 for the earliest time. For example: "2018-09-26T14:10:20Z"; "1537971006000"; "now-3h"; "now-7d".
  • Default Value: now
aggregators

A list of aggregation functions (“aggregators”) to apply to the raw sample data of the configured query metrics (see the columns parameter) in order to perform an aggregation query. You can configure the aggregation step, which also serves as the aggregation window, via the step parameter.

  • Type: str
  • Requirement: Optional
  • Valid Values: A string containing a comma-separated list of supported aggregation functions (“aggregators”); for example, "count,avg,min,max". The following aggregation functions are supported:

    • avg — the average of the sample values.
    • count — the number of ingested samples.
    • last — the value of the last sample (i.e., the sample with the latest time).
    • max — the maximal sample value.
    • min — the minimal sample value.
    • rate — the change rate of the sample values, which is calculated as (<last sample value of the current interval> - <last sample value of the previous interval>) / <aggregation granularity>.
    • stddev — the standard deviation of the sample values.
    • stdvar — the standard variance of the sample values.
    • sum — the sum of the sample values.
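The per-window semantics of most of these aggregators can be sketched with Python's statistics module. This is an illustrative reduction over one aggregation window's raw samples, not the Frames implementation; rate is omitted because it also requires the previous window's last sample:

```python
import statistics

def aggregate_window(samples, aggregators):
    """Illustrative sketch of how the over-time aggregators reduce the
    raw sample values within a single aggregation window."""
    funcs = {
        "avg": statistics.mean,
        "count": len,
        "last": lambda v: v[-1],   # sample with the latest time
        "max": max,
        "min": min,
        "stddev": statistics.stdev,
        "stdvar": statistics.variance,
        "sum": sum,
    }
    # The aggregators argument mirrors the comma-separated string that
    # the read method accepts, e.g. "count,avg,min,max".
    return {name: funcs[name](samples) for name in aggregators.split(",")}
```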
step

The query step (interval), which determines the points over the query’s time range at which to perform aggregations (for an aggregation query) or downsample the data (for a query without aggregators). The default step is the query’s time range, which can be configured via the start and end parameters. In the current release, the aggregation step is also the aggregation window to which the aggregators are applied. For more information, see Aggregation Queries.

  • Type: str
  • Requirement: Optional
  • Valid Values: A string of the format "[0-9]+[mhd]" — where 'm' = minutes, 'h' = hours, and 'd' = days. For example, "30m" (30 minutes), "2h" (2 hours), or "1d" (1 day).
multi_index

Determines the indexing of the returned DataFrames: True — return a multi-index DataFrame in which all metric-label attributes are defined as index columns in addition to the metric sample-time attribute (the primary-key attribute); False (default) — return a single-index DataFrame in which only the metric sample-time attribute is defined as an index column.

  • Type: bool
  • Requirement: Optional
  • Default Value: False (return a single-index DataFrame)
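The two indexing schemes that multi_index selects between can be illustrated with plain pandas. The data below is hypothetical; the point is the shape of the index, not the values:

```python
import pandas as pd

# Hypothetical query result: one sample-time attribute, one metric
# label ("os"), and one metric ("cpu").
raw = pd.DataFrame({
    "time": pd.to_datetime(["2019-01-01 09:00", "2019-01-01 10:00"]),
    "os": ["linux", "linux"],
    "cpu": [0.5, 0.7],
})

# multi_index=False: only the sample-time attribute is an index column.
single_index = raw.set_index("time")

# multi_index=True: the label attributes are index columns as well.
multi_index = raw.set_index(["time", "os"])
```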

Return Value

  • When the value of the iterator parameter is True — returns a pandas DataFrames iterator.
  • When the value of the iterator parameter is False (default) — returns a single pandas DataFrame.

Examples

Following are some usage examples for the read method of the Frames TSDB backend. All of the examples set the read method’s multi_index parameter to True to display metric-label attributes as index columns (in addition to the sample-time attribute, which is always displayed as an index column). Except where otherwise specified, the examples return a single DataFrame (default iterator value = False).

  1. Read all items (rows) of a mytsdb table in the client’s data container (table) — start = "0" and default end ("now") and columns (all metrics):

    tsdb_table = "mytsdb"
    df = client.read(backend="tsdb", table=tsdb_table, start="0", multi_index=True)
    display(df.head())
    display(df.tail())
    
  2. Issue an aggregation query (aggregators) to a mytsdb table in the client’s data container (table) for the “cpu” metric (columns); use the default aggregation step (step not set), which is the query’s time range — 09:00–17:00 on 1 Jan 2019 (see start and end):

    tsdb_table = "mytsdb"
    df = client.read("tsdb", table=tsdb_table, start="2019-01-01T09:00:00Z",
                     end="2019-01-01T17:00:00Z", columns=["cpu"],
                     aggregators="avg,min,max", multi_index=True)
    display(df)
    
  3. Issue an aggregation query to a tsdb/my_metrics table in the client’s data container (table) for the previous two days (start = "now-2d" and end = "now-1d"); apply the sum and avg aggregators (aggregators) to the “disk” and “memory” metrics (columns) with a 12-hour aggregation step (step), and only apply the query to samples with a “linux” os label (filter = "os=='linux'"):

    tsdb_table = "tsdb/my_metrics"
    df = client.read("tsdb", table=tsdb_table, columns=["disk", "memory"],
                     filter="os=='linux'", aggregators="sum,avg", step="12h",
                     start="now-2d", end="now-1d", multi_index=True)
    display(df)
    
  4. Issue a 1-hour raw-data downsampling query (step = "1h" and aggregators not set) to a mytsdb table in the client’s data container (table); apply the query to all metric samples (default columns) from 1 Jan 2019 (start = "2019-01-01T00:00:00Z" and end = "2019-02-01T00:00:00Z"):

    tsdb_table = "mytsdb"
    df = client.read("tsdb", table=tsdb_table, start="2019-01-01T00:00:00Z",
                     end="2019-02-01T00:00:00Z", step="1h", multi_index=True)
    display(df)
    

See Also