NoSQL Table Schema Reference


Overview

To support reading and writing NoSQL data using structured-data interfaces, such as Spark DataFrames, Presto, and V3IO Frames (“Frames”), the platform uses a schema file that defines the schema of the data structure. When you write NoSQL data in the platform using a Spark or Frames DataFrame, the schema of the data table is automatically identified and saved, and it is then retrieved when you use a structured-data interface to read data from the same table (unless you explicitly define the schema for the read operation). However, to use a structured-data interface to read NoSQL data that was not written in this manner, you first need to define the table schema. The schema is stored as a JSON file (.#schema). You don’t typically need to manually create or edit this file. Instead, use any of the supported methods to define or update the schema of a NoSQL table:

  • Spark — do one of the following as part of a NoSQL Spark DataFrame read operation. For more information, see Defining the Table Schema in the Spark NoSQL DataFrame reference:

    • Use the custom inferSchema option to infer the schema (recommended).
    • Define the schema programmatically.
      Note
      Programmatically created table schemas don’t support range-scan or even-distribution table queries.
  • Presto — use the custom v3io.schema.infer Presto CLI command to generate a schema file. For more information, see Defining the NoSQL Table Schema in the Presto reference.

  • Frames — use the infer_schema or infer command of the NoSQL backend’s execute client method to generate a schema file.
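If you want to inspect an existing schema, the file can be parsed as ordinary JSON. The following minimal Python sketch assumes that you already have a local copy of the `.#schema` file (retrieving it from the platform is not shown) and reads only the `fields`, `key`, and `sortingKey` objects described on this page. `load_schema` and `summarize_schema` are hypothetical helper names, not platform APIs.

```python
import json
from pathlib import Path


def load_schema(path):
    """Parse a local copy of a NoSQL table-schema file (.#schema) as JSON.

    Hypothetical helper: assumes the file was already copied to a local path.
    """
    return json.loads(Path(path).read_text(encoding="utf-8"))


def summarize_schema(schema):
    """Return the attributes as (name, type, nullable) tuples, plus the
    sharding-key name and the sorting-key name (None if not defined)."""
    columns = [(f["name"], f["type"], f["nullable"]) for f in schema["fields"]]
    return columns, schema["key"], schema.get("sortingKey")
```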

The Item-Attributes Schema Object (‘fields’)

The NoSQL-table schema JSON file contains a fields array object with one or more objects that describe the table’s attributes (columns). Each attribute object has three fields:

name

The name of the attribute (column). For example, "id" or "age".

  • Type: String
type

The attribute’s data type (i.e., the type of the data that is stored in the column). The type can be one of the following string values — "boolean", "double", "long", "null", "string", or "timestamp". The platform implicitly converts integer and short values to long values ("long") and floating-point values to double-precision values ("double").

  • Type: String in the schema file; Spark SQL data type when defining the schema programmatically using a Spark DataFrame
Spark DataFrame Programmatic Schema Definition Note
When defining the table schema programmatically as part of a Spark DataFrame read operation, use the Spark SQL data types that match the supported schema-file attribute types (such as StringType for "string" or LongType for "long"). When writing the data to the NoSQL table, the platform translates the Spark data types into the relevant attribute data types and performs any necessary type conversions.
nullable

Indicates whether the value is “nullable”. If true, the attribute value can be null.

  • Type: Boolean
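To illustrate the three attribute-object fields together with the implicit type conversions listed for type, the following Python sketch builds a fields array from one sample record. The helpers attribute_type and build_fields are hypothetical, and the Python-to-type mapping is an assumption that mirrors the documented conversions (integers to "long", floating-point values to "double"). It is not the platform's schema-inference logic.

```python
from datetime import datetime


def attribute_type(value):
    """Map a sample Python value to a schema-file type string (hypothetical)."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):  # integer and short values are stored as "long"
        return "long"
    if isinstance(value, float):  # floating-point values are stored as "double"
        return "double"
    if isinstance(value, datetime):
        return "timestamp"
    if isinstance(value, str):
        return "string"
    raise TypeError(f"no schema type for {type(value).__name__}")


def build_fields(record, non_nullable=()):
    """Build a 'fields' array with one {name, type, nullable} object per attribute."""
    return [
        {"name": name, "type": attribute_type(value), "nullable": name not in non_nullable}
        for name, value in record.items()
    ]
```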

The Item-Key Schema Objects (‘key’ and ‘sortingKey’)

The NoSQL-table schema JSON file contains a key object that identifies the table’s sharding-key attribute, and optionally also a sortingKey object that identifies the table’s sorting-key attribute. These attributes are used to determine an item’s name and primary-key value, which uniquely identifies items in the table. See Object Names and Primary Keys.

key

The name of the table’s sharding-key attribute, which together with the sorting-key attribute (sortingKey), if defined, determines the primary-key values of the table items. For example, "id".

  • Type: String
sortingKey

The name of the table’s sorting-key attribute, if defined, which together with the sharding-key attribute (key) determines the primary-key values of the table items. For example, "date".

  • Type: String
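The following minimal Python sketch shows a complete schema object that defines both key and sortingKey, along with a hypothetical primary_key helper that derives an item's primary-key value from them. It assumes that a compound primary key is formatted as "<sharding-key value>.<sorting-key value>". Verify this against Object Names and Primary Keys before relying on it.

```python
def primary_key(item, schema):
    """Derive an item's primary-key value from the schema's key objects.

    ASSUMPTION: a compound key is formatted as
    "<sharding-key value>.<sorting-key value>".
    """
    sharding_value = str(item[schema["key"]])
    sorting_attr = schema.get("sortingKey")
    if sorting_attr is None:
        return sharding_value  # no sorting key: the sharding-key value alone
    return f"{sharding_value}.{item[sorting_attr]}"


# Example schema using the "id" and "date" attributes mentioned above.
schema = {
    "fields": [
        {"name": "id", "type": "string", "nullable": False},
        {"name": "date", "type": "string", "nullable": False},
        {"name": "temp", "type": "double", "nullable": True},
    ],
    "key": "id",
    "sortingKey": "date",
}
```

When sortingKey is omitted, the helper falls back to the sharding-key value, because the sharding key then determines the primary key on its own.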
Faster Primary-Key Queries
The schema’s key objects enable faster NoSQL table reads (queries). See the Presto read-optimization and the NoSQL Spark DataFrame range-scans reference documentation.

See Also