write Method

Description

Writes (ingests) data from pandas DataFrames to a TSDB table.

Syntax

write(backend, table, dfs, [labels=None, max_in_message=0, index_cols=None])
Note

The method has additional parameters that aren’t currently supported for the TSDB backend. Therefore, when calling the method, be sure to explicitly specify the names of all parameters after dfs.
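
As a minimal sketch of the recommendation in this note, the following wrapper names every argument after dfs when it calls the write method. The write_metrics function is hypothetical (it is not part of the Frames API), and client is assumed to be an already-created Frames client object.

```python
def write_metrics(client, table, df, labels=None, max_in_message=0,
                  index_cols=None):
    """Hypothetical wrapper (not part of Frames) around client.write.

    `client` is assumed to be an existing Frames client object.
    """
    # Name every argument after `dfs` explicitly so that none of them is
    # bound by position to a parameter the TSDB backend doesn't support
    client.write(backend="tsdb", table=table, dfs=df, labels=labels,
                 max_in_message=max_in_message, index_cols=index_cols)
```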

Parameters

backend | dfs | index_cols | labels | max_in_message | table

backend
The backend type — "tsdb" for the TSDB backend. See Backend Types.

  • Type: str
  • Requirement: Required
table
The relative path to the backend data — a directory in the target data container (as configured for the client object) that represents a TSDB table. For example, "mytable" or "examples/tsdb/my_metrics".

  • Type: str
  • Requirement: Required
dfs

One or more DataFrames containing the data to write.

Note
  • DataFrame index columns —
    • You must define one or more non-index DataFrame columns that represent the sample metrics; the name of each column is the metric name and its values are the sample data (i.e., the ingested metric). See also TSDB metric-samples software specifications and restrictions.
    • You must define a single index column whose value is the sample time of the data. Note that a TSDB DataFrame cannot have more than one index column of a time data type.
    • You can optionally define string index columns that represent metric labels for the current DataFrame row. See also TSDB-labels software specifications and restrictions. Note that you can also define labels for all DataFrame rows by using the labels parameter (in addition to or instead of using index columns to apply labels to a specific row).
    • You can define the index columns either as part of the DataFrame definition (as typically done with pandas DataFrames) or by using the index_cols parameter of the write method, which overrides any index-column definitions in the DataFrame.
  • See the maximum write DataFrame size restriction. If you need to write a larger amount of data, use multiple DataFrames.
  • Type: A single DataFrame, a list of DataFrames, or a DataFrames iterator
  • Requirement: Required
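
To illustrate the index-column requirements described in the note, the following sketch builds a DataFrame with one time index column, one string label index column, and two metric columns. The metric names, the "host" label, and the sample values are made up for the example.

```python
import pandas as pd

# One time index level (sample time) and one string index level (a label)
times = pd.to_datetime(["2019-01-01 00:00:00", "2019-01-01 00:00:05"])
index = pd.MultiIndex.from_arrays([times, ["1", "1"]], names=["time", "host"])

# Non-index columns are the metrics: column name = metric name
df = pd.DataFrame({"cpu": [12.5, 37.0], "temperature": [61.2, 60.8]},
                  index=index)

# Collect the index levels of a time data type; the TSDB backend expects one
time_levels = [name for name in df.index.names
               if pd.api.types.is_datetime64_any_dtype(
                   df.index.get_level_values(name))]
```

A DataFrame of this shape can be passed as the dfs argument of the write method.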
labels

A dictionary of metric labels of the format {<label>: <value>[, <label>: <value>, ...]}, which will be applied to all the DataFrame rows (i.e., to all the ingested metric samples). For example, {"os": "linux", "arch": "x86"}. See also TSDB-labels software specifications and restrictions. Note that you can also define labels for a specific DataFrame row by adding a string index column to the row (in addition to or instead of using the labels parameter to define labels for all rows), as explained in the description of the dfs parameter.

  • Type: dict with str keys
  • Requirement: Optional
  • Default Value: None
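
The two ways of applying labels can be contrasted with a short pandas sketch. The add_label_index helper is hypothetical and only illustrates how a labels dictionary maps to string index columns; it is not part of the Frames API. The client object in the comment is assumed to be an existing Frames client.

```python
import pandas as pd


def add_label_index(df, labels):
    """Hypothetical helper: append each label as a string index level."""
    # Add one constant column per label, then move those columns into the index
    labeled = df.assign(**{name: str(value) for name, value in labels.items()})
    return labeled.set_index(list(labels), append=True)


times = pd.to_datetime(["2019-01-01 00:00:00", "2019-01-01 00:05:00"])
df = pd.DataFrame({"cpu": [10.0, 20.0]}, index=pd.Index(times, name="time"))
labels = {"os": "linux", "arch": "x86"}

# Option 1: keep only the time index and pass the dictionary to write
#   client.write(backend="tsdb", table="mytable", dfs=df, labels=labels)
# Option 2: carry the labels in the DataFrame as string index columns
df_with_labels = add_label_index(df, labels)
```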
index_cols

A list of column (attribute) names to be used as index columns for the write operation, for all ingested DataFrames (as set in the dfs parameter) regardless of any index-column definitions in the DataFrames. By default, the DataFrames’ index columns are used. As explained for the dfs parameter, the TSDB backend supports a single mandatory time index column that represents the sample time of the data and multiple optional string index columns that represent metric labels.

  • Type: list of str
  • Requirement: Optional
  • Default Value: None
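
The following sketch shows the kind of "flat" DataFrame for which index_cols is useful. The set_index call is only a rough local analogy, under the assumption that the write operation treats the listed columns the way pandas treats index columns; the actual handling is done by the backend. The column names and values are made up for the example.

```python
import pandas as pd

# A "flat" DataFrame: the sample time and the label are regular columns
df = pd.DataFrame({
    "time": pd.to_datetime(["2019-01-01 00:00:00", "2019-01-01 00:05:00"]),
    "host": ["1", "1"],
    "cpu": [10.0, 20.0],
})

# One time column plus optional string label columns
index_cols = ["time", "host"]

# Assumption: for illustration, this is roughly the local equivalent of
# passing index_cols=index_cols to the write method
indexed = df.set_index(index_cols)
```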

max_in_message
The maximum number of DataFrame rows to write in each message (i.e., the size of the write chunks). When the value of this parameter is 0 (default), each DataFrame is written in a single message.

  • Type: int
  • Requirement: Optional
  • Default Value: 0
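
The effect of this parameter on the number of write messages can be estimated with a small calculation. The message_count function is hypothetical, and it assumes that rows are split into consecutive chunks with a smaller final chunk when the row count isn't an exact multiple of max_in_message.

```python
import math


def message_count(num_rows, max_in_message=0):
    """Estimate how many write messages a DataFrame of num_rows rows needs.

    Assumption: rows are split into consecutive chunks of at most
    max_in_message rows, with a smaller final chunk when needed.
    """
    if max_in_message == 0:
        # Default: the whole DataFrame is written in a single message
        return 1
    return math.ceil(num_rows / max_in_message)
```

Under this assumption, a DataFrame with 86,400 rows is written in 1 message by default and in 9 messages when max_in_message is 10000.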

Examples

Following are some usage examples for the write method of the Frames TSDB backend. See the DataFrame index-columns note in the description of the dfs parameter for information regarding the alternative methods for defining the metric sample-time and label attributes for the write operation, which are demonstrated in the examples.

Both examples use the following function to generate random metrics data:

import numpy as np
import pandas as pd


# Generate a matrix of random floating-point numbers between 0 and `max`
def gen_floats(d0=1, d1=1, max=100):
    return np.random.rand(d0, d1) * max
  1. Write a DataFrame with time-series sample metrics to a mytsdb TSDB table in the client’s data container (table); define the metric sample-time and label attributes as DataFrame index columns:

    from datetime import datetime
    
    metrics = ["cpu", "temperature"]
    times = pd.date_range(freq="1s", start=datetime(2019, 1, 1, 0, 0, 0, 0),
                          end=datetime(2019, 1, 1, 23, 59, 59, 0))
    # Generate one row of random metric values per sample time
    df = pd.DataFrame(data=gen_floats(len(times), len(metrics)),
                      index=[times, ["1"] * len(times)], columns=metrics)
    df.index.names = ["time", "node"]
    
    tsdb_table = "mytsdb"
    client.write(backend="tsdb", table=tsdb_table, dfs=df)
      

  2. Write time-series metric samples to a tsdb/my_metrics table in the client’s data container (table), using one of two alternative variations; both variations use the following code for generating the data set:

    from datetime import datetime, timedelta
    
    metrics = ["cpu", "memory", "disk"]
    num_metrics = len(metrics)
    label_sets = [
        {"site": "DC", "host": "1", "os": "linux"},
        {"site": "DC", "host": "3", "os": "windows"},
        {"site": "NY", "host": "2", "os": "windows"},
        {"site": "SC", "host": "4", "os": "linux"},
        {"site": "SC", "host": "5", "os": "windows"},
        {"site": "NY", "host": "6", "os": "linux"}
    ]
    end_t = datetime.now().replace(hour=0, minute=0, second=0, microsecond=0)
    start_t = end_t - timedelta(days=2)
    times = pd.date_range(freq="5min", start=start_t, end=end_t)
    num_items = len(times)
      

    Variation 1 — set the sample-time attribute as a DataFrame index column and use the write method’s labels parameter to define the metric-label attributes; use multiple single-DataFrame (dfs) write calls:

    tsdb_table = "/tsdb/my_metrics"
    for label_set in label_sets:
        df = pd.DataFrame(data=gen_floats(num_items, num_metrics), index=times,
                          columns=metrics)
        df.index.name = "time"
        client.write("tsdb", table=tsdb_table, dfs=df, labels=label_set)
      

    Variation 2 — define the metric sample-time and label attributes by using the write method’s index_cols parameter; use a single write call with multiple DataFrames (dfs):

    tsdb_table = "/tsdb/my_metrics"
    index_cols = ["time", "site", "host", "os"]
    dfs = []
    for label_set in label_sets:
        df = pd.DataFrame(data=gen_floats(num_items, num_metrics),
                          index=[times, [label_set["site"]] * num_items,
                                 [label_set["host"]] * num_items,
                                 [label_set["os"]] * num_items], columns=metrics)
        # Name the index levels, then turn them into regular columns;
        # index_cols tells the write method which columns to use as indexes
        df.index.names = index_cols
        df.reset_index(inplace=True)
        dfs.append(df)
    
    # Write all the DataFrames in a single call
    client.write("tsdb", table=tsdb_table, dfs=dfs, index_cols=index_cols)
      
