write Method

Description

Writes (ingests) data from pandas DataFrames to a NoSQL table.

Note
  • NoSQL tables in the platform don’t need to be created prior to ingestion. When writing (ingesting) data into a table that doesn’t exist, the table and all directories in the specified path are automatically created.
  • When updating a table item (i.e., when writing an item with the same primary-key attribute value as an existing table item), the existing item is overwritten and replaced with the written item.
  • All items that are written to a given table must conform to the same schema. For more information, see the Table Schema overview.
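Because every item in a table must conform to the same schema, it can help to check a batch of DataFrames locally before writing it. The following is a minimal sketch, not part of Frames. The have_same_schema helper is hypothetical, and it assumes that matching pandas column names and dtypes are a reasonable, conservative stand-in for the platform's table schema.

```python
import pandas as pd


def have_same_schema(dfs):
    """Return True if all DataFrames have the same column names and dtypes.

    Hypothetical pre-write check; Frames does not provide this helper.
    """
    dfs = list(dfs)
    if not dfs:
        return True
    ref_cols = list(dfs[0].columns)
    ref_types = list(dfs[0].dtypes)
    # Compare every other DataFrame's columns and dtypes with the first one
    return all(
        list(df.columns) == ref_cols and list(df.dtypes) == ref_types
        for df in dfs[1:]
    )


attrs = ["username", "age"]
df1 = pd.DataFrame([["lisaa", 11]], columns=attrs)
df2 = pd.DataFrame([["toms", 10]], columns=attrs)
df3 = pd.DataFrame([["nickw", "15"]], columns=attrs)  # age stored as a string

print(have_same_schema([df1, df2]))  # True
print(have_same_schema([df1, df3]))  # False
```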

Syntax

write(backend, table, dfs, [condition="", max_rows_in_msg=0, index_cols=None])
Note

The method has additional parameters that aren’t currently supported for the NoSQL backend. Therefore, when calling the method, be sure to explicitly specify the names of all parameters after dfs.
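One way to apply this rule consistently is to route calls through a small wrapper whose parameters after dfs are keyword-only and are always forwarded to write() by name. In the sketch below, write_nosql is a hypothetical wrapper, and RecordingClient is a stand-in for a real Frames client object so that the sketch runs without a Frames service; it only records the keyword arguments it receives.

```python
def write_nosql(client, table, dfs, *, condition="", max_rows_in_msg=0,
                index_cols=None):
    """Hypothetical wrapper that always names the parameters after dfs.

    The bare * makes condition, max_rows_in_msg, and index_cols
    keyword-only, and the wrapper passes every argument to write() by name.
    """
    return client.write(
        backend="nosql",
        table=table,
        dfs=dfs,
        condition=condition,
        max_rows_in_msg=max_rows_in_msg,
        index_cols=index_cols,
    )


class RecordingClient:
    """Stand-in for a Frames client; records the keyword arguments of write()."""

    def __init__(self):
        self.calls = []

    def write(self, **kwargs):
        self.calls.append(kwargs)


stub = RecordingClient()
write_nosql(stub, "mytable", dfs=[], index_cols=["username"])
print(stub.calls[0]["index_cols"])  # ['username']
```

Passing the condition positionally, as in write_nosql(stub, "mytable", [], "{age}>18"), raises a TypeError instead of silently binding the value to an unsupported parameter.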

Parameters

backend | condition | dfs | index_cols | max_rows_in_msg | table

backend
The backend type — "nosql" or "kv" for the NoSQL backend. See Backend Types.

  • Type: str
  • Requirement: Required
table
The relative path to the backend data — a directory in the target data container (as configured for the client object) that represents a table. For example, mytable.

  • Type: str
  • Requirement: Required
dfs

One or more DataFrames containing the data to write.

Note
  • The written DataFrames must include a single index column whose value is the value of the item’s primary-key attribute (name); see Item Name and Primary Key. You can define the index column either as part of the DataFrame definition (as is typical for pandas DataFrames) or by using the index_cols parameter of the write method, which overrides any index-column definitions in the DataFrame. If you don’t define an index column using either of these methods, Frames assigns the name idx to the auto-generated pandas range-index column (pandas.RangeIndex) and uses it as the item’s primary-key attribute. Therefore, avoid using the name idx for non-index columns (regular user attributes) in NoSQL items.
  • See the maximum write DataFrame size restriction. If you need to write a larger amount of data, use multiple DataFrames.
  • Type: A single DataFrame, a list of DataFrames, or a DataFrames iterator
  • Requirement: Required
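The index-column rules in the note above can be summarized as a small decision function. The primary_key_name helper below is hypothetical and written only to illustrate the documented behavior; it doesn't call Frames, and it simplifies edge cases such as pandas MultiIndex objects.

```python
import pandas as pd


def primary_key_name(df, index_cols=None):
    """Return the attribute that would serve as the item's primary key.

    Hypothetical helper that mirrors the documented rules; it does not call Frames.
    """
    if index_cols:
        # index_cols overrides any index defined in the DataFrame
        if len(index_cols) != 1:
            raise ValueError("the NoSQL backend supports a single index column")
        return index_cols[0]
    if df.index.name is not None:
        # Index column defined in the DataFrame, e.g. with set_index()
        return df.index.name
    # No index defined: the auto-generated RangeIndex is named "idx",
    # so a regular column with that name would clash with it.
    if "idx" in df.columns:
        raise ValueError("don't use 'idx' as a regular column name")
    return "idx"


attrs = ["username", "first_name", "age"]
rows = [["lisaa", "Lisa", 11], ["toms", "Tom", 10]]
df = pd.DataFrame(rows, columns=attrs)

print(primary_key_name(df.set_index("username")))     # username
print(primary_key_name(df, index_cols=["username"]))  # username
print(primary_key_name(df))                           # idx
```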
index_cols

A list of column (attribute) names to be used as index columns for the write operation, for all ingested DataFrames (as set in the dfs parameter) regardless of any index-column definitions in the DataFrames. By default, the DataFrames’ index columns are used. The NoSQL backend supports a single mandatory index column, which represents the item’s primary-key attribute. See details in the index-column note for the dfs parameter.

  • Type: []str
  • Requirement: Optional
  • Default Value: None
condition

A Boolean condition expression that defines conditional logic for executing the write operation. See Condition Expression for syntax details and examples. The condition is applied to each written DataFrame row (item).

  • To reference an item attribute in the target table from within the expression, use the attribute name as is (<attribute name>); for example, "is_init==true" references an is_init attribute in the table.
  • To reference an attribute (column) in the written DataFrame, embed the attribute name within curly braces ({<column name>}); for example, "{age}>18" references an age attribute in the DataFrame.

For example, a "{is_stable}==true AND {version}>version" condition indicates that each DataFrame item (row) should be written to the table only if it has an is_stable attribute whose value is true and the target table has a matching item (with the same primary-key attribute value) whose current version attribute value is lower than the value of the version attribute (column) in the DataFrame.

  • Type: str
  • Requirement: Optional
  • Default Value: "" (no condition)
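When a condition references only DataFrame columns, you can preview locally which rows satisfy it before calling write. The sketch below hand-translates the Frames condition "{age}>=11 AND {age}<15" into an equivalent pandas DataFrame.query() expression. This translation is an illustration, not a Frames feature, and it can't evaluate conditions that reference attributes of existing table items, because those values are available only on the platform.

```python
import pandas as pd

df = pd.DataFrame(
    [["lisaa", 11], ["toms", 10], ["nickw", 15], ["georgec", 13],
     ["christ", 10], ["julyj", 14]],
    columns=["username", "age"],
).set_index("username")

# Frames condition: "{age}>=11 AND {age}<15"
# Local pandas equivalent; valid here only because the condition
# references DataFrame columns alone, not attributes of table items.
preview = df.query("age >= 11 and age < 15")
print(list(preview.index))  # ['lisaa', 'georgec', 'julyj']
```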

max_rows_in_msg
The maximum number of DataFrame rows to write in each message (i.e., the size of the write chunks). When the value of this parameter is 0 (default), each DataFrame is written in a single message.

  • Type: int
  • Requirement: Optional
  • Default Value: 0
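Conceptually, a nonzero max_rows_in_msg value splits each DataFrame into consecutive chunks of at most that many rows. The sketch below reproduces that splitting locally with a hypothetical split_rows helper to show how many rows each message would carry; Frames performs its own chunking, so this only approximates the described behavior. The same slicing is also one way to produce the multiple smaller DataFrames that the dfs parameter recommends for data above the maximum write size.

```python
import pandas as pd


def split_rows(df, max_rows):
    """Split df into consecutive chunks of at most max_rows rows.

    Hypothetical helper mirroring the described max_rows_in_msg behavior:
    a value of 0 (or less) keeps the whole DataFrame in a single chunk.
    """
    if max_rows <= 0:
        return [df]
    return [df.iloc[start:start + max_rows]
            for start in range(0, len(df), max_rows)]


df = pd.DataFrame({
    "username": [f"user{i}" for i in range(10)],
    "age": list(range(10)),
}).set_index("username")

print([len(chunk) for chunk in split_rows(df, 4)])  # [4, 4, 2]
print(len(split_rows(df, 0)))                       # 1
```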

Examples

Following are some usage examples for the write method of the Frames NoSQL backend:

  1. Write a single DataFrame with several rows (items) to a mytable table in the client’s data container (table). Use the pandas set_index function to define username as the DataFrame’s index column, which identifies the table’s primary-key attribute.

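    import pandas as pd

    # Assumes `client` is a Frames client object (v3io_frames.Client)
    # that is configured for the target data container.
    table = "mytable"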
    data = [
            ["lisaa", "Lisa", "Andrews", 11, "US", "DC"],
            ["toms", "Tom", "Stein", 10, "Israel", "TLV"],
            ["nickw", "Nickolas", "Weber", 15, "Germany", "Berlin"],
            ["georgec", "George", "Costas", 13, "Greece", "Athens"],
            ["christ", "Chris", "Thompson", 10, "UK", "London"],
            ["julyj", "July", "Johnes", 14, "US", "NY"]
    ]
    attrs = ["username", "first_name", "last_name", "age", "country", "city"]
    df = pd.DataFrame(data, columns=attrs)
    df.set_index("username", inplace=True)
    client.write(backend="nosql", table=table, dfs=df)

  2. Write a list of DataFrames to a my_tables/students_11-15 table in the client’s data container (table). Use the write method’s index_cols parameter to set the index column (primary-key attribute) of all the DataFrames to the username attribute. Use a data set similar to the one in Example 1, but write only the DataFrame rows (items) for which the value of the age column (attribute) is greater than or equal to 11 and less than 15 (condition).

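    import pandas as pd

    # Assumes `client` is a Frames client object (v3io_frames.Client)
    # that is configured for the target data container.
    table = "/my_tables/students_11-15"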
    attrs = ["username", "first_name", "last_name", "age", "country", "city"]
    dfs = [
           pd.DataFrame([["lisaa", "Lisa", "Andrews", 11, "US", "DC"]],
                        columns=attrs),
           pd.DataFrame([["toms", "Tom", "Stein", 10, "Israel", "TLV"]],
                        columns=attrs),
           pd.DataFrame([["nickw", "Nickolas", "Weber", 15, "Germany", "Berlin"]],
                        columns=attrs),
           pd.DataFrame([["georgec", "George", "Costas", 13, "Greece", "Athens"]],
                        columns=attrs),
           pd.DataFrame([["christ", "Chris", "Thompson", 10, "UK", "London"]],
                        columns=attrs),
           pd.DataFrame([["julyj", "July", "Johnes", 14, "US", "NY"]],
                        columns=attrs)
    ]
    client.write("nosql", table=table, dfs=dfs,
                 condition="{age}>=11 AND {age}<15", index_cols=["username"])
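The dfs parameter also accepts a DataFrames iterator. As a sketch, the hypothetical student_dfs generator below yields the same single-row DataFrames as Example 2 one at a time instead of building the whole list up front. The client.write() call is shown only as a comment because it requires a configured Frames client. A generator can be consumed only once, so create a new one for each write.

```python
import pandas as pd


def student_dfs(rows, attrs):
    """Hypothetical generator that yields one single-row DataFrame per row."""
    for row in rows:
        yield pd.DataFrame([row], columns=attrs)


attrs = ["username", "first_name", "last_name", "age", "country", "city"]
rows = [
    ["lisaa", "Lisa", "Andrews", 11, "US", "DC"],
    ["toms", "Tom", "Stein", 10, "Israel", "TLV"],
    ["nickw", "Nickolas", "Weber", 15, "Germany", "Berlin"],
    ["georgec", "George", "Costas", 13, "Greece", "Athens"],
    ["christ", "Chris", "Thompson", 10, "UK", "London"],
    ["julyj", "July", "Johnes", 14, "US", "NY"],
]

# With a configured Frames client, the generator could be passed directly:
# client.write("nosql", table="/my_tables/students_11-15",
#              dfs=student_dfs(rows, attrs),
#              condition="{age}>=11 AND {age}<15", index_cols=["username"])

# Local check: consume a fresh generator and count the DataFrames it yields
frames = list(student_dfs(rows, attrs))
print(len(frames))  # 6
```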

See Also