write Method
Description
Writes (ingests) data from pandas DataFrames to a NoSQL table.
- NoSQL tables in the platform don't need to be created prior to ingestion. When writing (ingesting) data into a table that doesn't exist, the table and all directories in the specified path are automatically created.
- When updating a table item (i.e., when writing an item with the same primary-key attribute value as an existing table item), the existing item is overwritten and replaced with the written item.
- All items that are written to a given table must conform to the same schema. For more information, see the Table Schema overview.
Syntax
write(backend, table, dfs, [condition="", max_rows_in_msg=0, index_cols=None])
The method has additional parameters that aren't currently supported for the NoSQL backend. Therefore, when calling the method, be sure to explicitly specify the names of all parameters after
Parameters
- backend
The backend type: "nosql" or "kv" for the NoSQL backend. See Backend Types.
- Type: str
- Requirement: Required
- table
The relative path to the backend data: a directory in the target data container (as configured for the client object) that represents a NoSQL table. For example, "mytable" or "examples/nosql/my_table".
- Type: str
- Requirement: Required
- dfs
One or more DataFrames containing the data to write.
Note
- The written DataFrames must include a single index column whose value is the value of the item's primary-key attribute (name); see Item Name and Primary Key.
You can either include the index column as part of the DataFrame definition (as typically done with pandas DataFrames) or define it by using the index_cols parameter of the write method, which overrides any index-column definitions in the DataFrame. If you don't define an index column using either of these methods, Frames assigns the name idx to the auto-generated pandas range-index column (pandas.RangeIndex) and uses it as the item's primary-key attribute. You should therefore refrain from using the name idx for non-index columns (regular user attributes) in NoSQL items.
- See the maximum write DataFrame size restriction. If you need to write a larger amount of data, use multiple DataFrames.
- Type: A single DataFrame, a list of DataFrames, or a DataFrames iterator
- Requirement: Required
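Because the dfs parameter also accepts a DataFrames iterator, one way to stay within the maximum write DataFrame size is to slice a large DataFrame into smaller ones. The following is a minimal sketch: iter_dataframe_chunks and its rows_per_df argument are hypothetical (not part of Frames), and you need to choose a row count that keeps each piece within the documented size restriction, which isn't stated here.

```python
from typing import Iterator

import pandas as pd


def iter_dataframe_chunks(df: pd.DataFrame, rows_per_df: int) -> Iterator[pd.DataFrame]:
    """Return an iterator of consecutive row slices of df, each with at most rows_per_df rows.

    Hypothetical helper (not a Frames API). The index, which holds the
    primary-key values, is kept on every slice.
    """
    # Validate eagerly so a bad value fails at call time, not at first iteration
    if rows_per_df <= 0:
        raise ValueError("rows_per_df must be a positive integer")
    return (df.iloc[start:start + rows_per_df] for start in range(0, len(df), rows_per_df))


# Usage sketch: a small DataFrame keyed by "username", split into 2-row pieces
users = pd.DataFrame(
    {"username": ["u0", "u1", "u2", "u3", "u4"], "age": [10, 11, 12, 13, 14]}
).set_index("username")
chunks = iter_dataframe_chunks(users, rows_per_df=2)
# With a Frames client object (not created here), the iterator could be passed as dfs:
# client.write(backend="nosql", table="mytable", dfs=chunks)
```

Each slice keeps the username index, so every piece still satisfies the single-index-column requirement described above.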
- index_cols
A list of column (attribute) names to be used as index columns for the write operation, for all ingested DataFrames (as set in the dfs parameter), regardless of any index-column definitions in the DataFrames. By default, the DataFrames' index columns are used. The NoSQL backend supports a single mandatory index column, which represents the item's primary-key attribute. See details in the index-column note for the dfs parameter.
- Type: []str
- Requirement: Optional
- Default Value: None
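The index-column precedence described for the dfs and index_cols parameters can be modeled in plain pandas. resolve_primary_key is a hypothetical helper that only mirrors the documented rules; it isn't part of the Frames API, and rejecting a regular idx column is a stricter choice than the docs' advice to refrain from that name.

```python
from typing import List, Optional

import pandas as pd

# Name that, per the docs, Frames gives the auto-generated range-index column
DEFAULT_INDEX_NAME = "idx"


def resolve_primary_key(df: pd.DataFrame, index_cols: Optional[List[str]] = None) -> str:
    """Return the column that would serve as the item's primary-key attribute.

    Hypothetical helper that models the documented precedence; not a Frames API.
    """
    if index_cols is not None:
        # index_cols overrides any index defined in the DataFrame
        if len(index_cols) != 1:
            raise ValueError("the NoSQL backend supports a single index column")
        return index_cols[0]
    if df.index.nlevels != 1:
        raise ValueError("the NoSQL backend supports a single index column")
    if df.index.name is not None:
        # Index defined in the DataFrame itself, e.g., with set_index()
        return df.index.name
    # No named index: treat it as the auto-generated range index named "idx"
    if DEFAULT_INDEX_NAME in df.columns:
        # Stricter than the docs, which only advise against this column name
        raise ValueError('a regular column named "idx" clashes with the default index name')
    return DEFAULT_INDEX_NAME


df = pd.DataFrame({"username": ["lisaa", "toms"], "age": [11, 10]})
print(resolve_primary_key(df))                           # idx
print(resolve_primary_key(df.set_index("username")))     # username
print(resolve_primary_key(df, index_cols=["username"]))  # username
```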
- condition
A Boolean condition expression that defines conditional logic for executing the write operation. See Condition Expression for syntax details and examples. The condition is applied to each written DataFrame row (item).
To reference an item attribute in the target table from within the expression, use the attribute name: <attribute name>. For example, "is_init==true" references an is_init attribute in the table.
To reference an attribute (column) in the written DataFrame, embed the attribute name within curly braces: {<column name>}. For example, "{age}>18" references an age attribute in the DataFrame.
For example, the condition "{is_stable} == true AND {version} > version" indicates that each DataFrame item (row) should be written to the table only if it has an is_stable attribute whose value is true and the target table has a matching item (with the same primary-key attribute value) whose current version attribute value is lower than the value of the version attribute (column) in the DataFrame.
- Type: str
- Requirement: Optional
- Default Value: "" (empty string)
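To make the difference between {column} references (values from the written DataFrame row) and bare attribute names (values from the existing table item) concrete, the following sketch evaluates the example condition locally against a dict that stands in for the table. would_write and the dict-based table are hypothetical; the real evaluation happens in the platform, and treating a row with no matching item as not written is this sketch's reading of "the target table has a matching item".

```python
from typing import Any, Dict

import pandas as pd


def would_write(row: pd.Series, table_items: Dict[str, Dict[str, Any]]) -> bool:
    """Evaluate "{is_stable} == true AND {version} > version" for one row.

    {is_stable} and {version} are read from the DataFrame row; the bare
    `version` is read from the table item with the same primary-key value.
    Hypothetical local model, not how Frames evaluates conditions.
    """
    item = table_items.get(row.name)  # row.name holds the row's index (primary-key) value
    if item is None or "version" not in item:
        # No matching item with a version attribute: the condition can't hold
        return False
    return bool(row["is_stable"]) and bool(row["version"] > item["version"])


# Rows to write, keyed by the primary-key column "key"
df = pd.DataFrame(
    {"is_stable": [True, True, False], "version": [3, 1, 5]},
    index=pd.Index(["a", "b", "c"], name="key"),
)
# Stand-in for the existing table items, keyed by primary-key value
table_items = {"a": {"version": 2}, "b": {"version": 2}, "c": {"version": 1}}
mask = df.apply(would_write, axis=1, table_items=table_items)
print(df[mask].index.tolist())  # ['a']
```

Row "b" fails because its DataFrame version (1) isn't greater than the table's version (2), and row "c" fails because is_stable is false.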
- max_rows_in_msg
The maximum number of DataFrame rows to write in each message (i.e., the size of the write chunks). When the value of this parameter is 0 (default), each DataFrame is written in a single message.
- Type: int
- Requirement: Optional
- Default Value: 0
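The chunking that max_rows_in_msg describes can be expressed as simple arithmetic. message_row_counts is a hypothetical helper that models only the documented behavior; returning no messages for an empty DataFrame and treating negative values like 0 are assumptions of this sketch, since the docs define only 0 and positive limits.

```python
from typing import List


def message_row_counts(num_rows: int, max_rows_in_msg: int = 0) -> List[int]:
    """Return how many rows each write message would carry for one DataFrame.

    Hypothetical helper that models the documented max_rows_in_msg semantics.
    """
    if num_rows <= 0:
        return []  # Assumption for this sketch: nothing to send
    if max_rows_in_msg <= 0:
        # 0 (the default) means the whole DataFrame is written in one message;
        # negative values are treated the same way here (an assumption)
        return [num_rows]
    full, rest = divmod(num_rows, max_rows_in_msg)
    return [max_rows_in_msg] * full + ([rest] if rest else [])


print(message_row_counts(10))     # [10]
print(message_row_counts(10, 4))  # [4, 4, 2]
print(message_row_counts(8, 4))   # [4, 4]
```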
Examples
Following are some usage examples for the write method:
- Write a single DataFrame with several rows (items) to a mytable table in the client's data container (see the table variable). Use the pandas set_index function to define username as the DataFrame's index column, which identifies the table's primary-key attribute.
```python
import pandas as pd

# Assumes `client` is a Frames client object that was created earlier
table = "mytable"
data = [
    ["lisaa", "Lisa", "Andrews", 11, "US", "DC"],
    ["toms", "Tom", "Stein", 10, "Israel", "TLV"],
    ["nickw", "Nickolas", "Weber", 15, "Germany", "Berlin"],
    ["georgec", "George", "Costas", 13, "Greece", "Athens"],
    ["christ", "Chris", "Thompson", 10, "UK", "London"],
    ["julyj", "July", "Johnes", 14, "US", "NY"]
]
attrs = ["username", "first_name", "last_name", "age", "country", "city"]
df = pd.DataFrame(data, columns=attrs)
# Use "username" as the index column (the item's primary-key attribute)
df.set_index("username", inplace=True)
client.write(backend="nosql", table=table, dfs=df)
```
- Write a list of DataFrames to a my_tables/students_11-15 table in the client's data container (see the table variable). Use the write method's index_cols parameter to set the index column (primary-key attribute) for all of the DataFrames to the username attribute. Use a data set similar to the one used in Example 1, but write only the DataFrame rows (items) for which the value of the age column (attribute) is greater than or equal to 11 and less than 15 (see the condition parameter).
```python
import pandas as pd

# Assumes `client` is a Frames client object that was created earlier
table = "/my_tables/students_11-15"
attrs = ["username", "first_name", "last_name", "age", "country", "city"]
dfs = [
    pd.DataFrame([["lisaa", "Lisa", "Andrews", 11, "US", "DC"]], columns=attrs),
    pd.DataFrame([["toms", "Tom", "Stein", 10, "Israel", "TLV"]], columns=attrs),
    pd.DataFrame([["nickw", "Nickolas", "Weber", 15, "Germany", "Berlin"]], columns=attrs),
    pd.DataFrame([["georgec", "George", "Costas", 13, "Greece", "Athens"]], columns=attrs),
    pd.DataFrame([["christ", "Chris", "Thompson", 10, "UK", "London"]], columns=attrs),
    pd.DataFrame([["julyj", "July", "Johnes", 14, "US", "NY"]], columns=attrs)
]
# index_cols sets "username" as the primary key for every DataFrame;
# the condition filters rows by the DataFrame's own age column
client.write("nosql", table=table, dfs=dfs,
             condition="{age}>=11 AND {age}<15", index_cols=["username"])
```
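Because the condition in the second example references only DataFrame columns, you can preview locally which rows it should select before calling write. This sketch uses pandas DataFrame.query as a stand-in; its syntax (lowercase and, no curly braces) is pandas syntax rather than Frames condition syntax, and the check doesn't involve the platform.

```python
import pandas as pd

attrs = ["username", "first_name", "last_name", "age", "country", "city"]
rows = [
    ["lisaa", "Lisa", "Andrews", 11, "US", "DC"],
    ["toms", "Tom", "Stein", 10, "Israel", "TLV"],
    ["nickw", "Nickolas", "Weber", 15, "Germany", "Berlin"],
    ["georgec", "George", "Costas", 13, "Greece", "Athens"],
    ["christ", "Chris", "Thompson", 10, "UK", "London"],
    ["julyj", "July", "Johnes", 14, "US", "NY"],
]
# One DataFrame per row, as in the example, then combined for inspection
dfs = [pd.DataFrame([row], columns=attrs) for row in rows]
combined = pd.concat(dfs).set_index("username")

# pandas counterpart of the Frames condition "{age}>=11 AND {age}<15"
preview = combined.query("age >= 11 and age < 15")
print(preview.index.tolist())  # ['lisaa', 'georgec', 'julyj']
```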