Object Names and Primary Keys

Introduction

When adding a new data object to the platform, you provide the object's name or the required components of the name. The platform stores the object name in the __name system attribute, which is automatically created for each object, and uses it as the value of the object's primary key, which uniquely identifies the object within a collection (such as a NoSQL table).

Sharding and Sorting Keys

Primary keys affect the way that objects are stored in the platform, which in turn affects performance. The platform supports two types of object primary keys:

Simple primary key
A simple primary key is composed of a single logical key whose value uniquely identifies the object. This key is known as the object's sharding key. For example, a collection with a simple username primary key might have an object with the primary-key value "johnd", which is also the value of the object's sharding key.
Compound primary key
A compound primary key is composed of two logical keys, a sharding key and a sorting key, whose combined values uniquely identify the object. The value of a compound primary key has the format <sharding-key value>.<sorting-key value>. All characters before the leftmost period in an object's primary-key value define the object's sharding-key value, and the characters to the right of this period (if any) define its sorting-key value. For example, a collection with a compound primary key that is made up of a username sharding key and a date sorting key might have an object with the sharding-key value "johnd", the sorting-key value "20180602", and the combined unique compound primary-key value "johnd.20180602".
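
The leftmost-period rule described above can be illustrated with a minimal Python sketch. The function names split_primary_key and compose_primary_key are hypothetical helpers written for this illustration; they are not part of any platform API.

```python
from typing import Optional, Tuple


def split_primary_key(name: str) -> Tuple[str, Optional[str]]:
    """Split an object name into (sharding-key value, sorting-key value).

    Hypothetical helper: everything before the leftmost period is the
    sharding key; everything after it, if present, is the sorting key.
    """
    sharding_key, separator, sorting_key = name.partition(".")
    if not separator:
        # No period: simple primary key, so there is no sorting key.
        return sharding_key, None
    return sharding_key, sorting_key


def compose_primary_key(sharding_key: str,
                        sorting_key: Optional[str] = None) -> str:
    """Build a primary-key value from its logical keys (hypothetical helper)."""
    if "." in sharding_key:
        raise ValueError("a sharding-key value cannot contain periods")
    if sorting_key is None:
        return sharding_key
    return f"{sharding_key}.{sorting_key}"
```

With the example values from the text, split_primary_key("johnd.20180602") yields the pair ("johnd", "20180602"), and split_primary_key("johnd") yields ("johnd", None). Only the leftmost period acts as the separator, so any further periods remain part of the sorting-key value.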

The platform divides the physical data storage into multiple units called data slices (also known as data shards or buckets). When a new data object is added, a hash function is applied to the value of its sharding key, and the result determines the slice on which the object is stored. All objects with the same sharding-key value are stored in a cluster on the same slice, sorted in ascending lexicographic order according to their sorting-key values (if any). This design enables faster NoSQL table queries that include a sharding-key filter and, optionally, a sorting-key filter (see NoSQL read optimization).
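
The following Python sketch models this placement scheme conceptually. The text does not specify the platform's hash function or slice count, so the SHA-256 digest and NUM_SLICES = 8 used here are stand-in assumptions, and slice_for and place_objects are hypothetical names. (Python's built-in hash() is randomized per process for strings, which is why a hashlib digest is used to keep the result stable.)

```python
import hashlib
from collections import defaultdict
from typing import Dict, Iterable, List

NUM_SLICES = 8  # assumption: the real number of slices is not given in the text


def slice_for(sharding_key: str, num_slices: int = NUM_SLICES) -> int:
    """Map a sharding-key value to a slice index (stand-in hash function)."""
    digest = hashlib.sha256(sharding_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_slices


def place_objects(names: Iterable[str]) -> Dict[int, Dict[str, List[str]]]:
    """Group object names by slice, then by sharding key.

    Within each sharding-key cluster, sorting-key values are kept in
    ascending lexicographic order, as the text describes.
    """
    slices: Dict[int, Dict[str, List[str]]] = defaultdict(lambda: defaultdict(list))
    for name in names:
        sharding_key, _, sorting_key = name.partition(".")
        slices[slice_for(sharding_key)][sharding_key].append(sorting_key)
    for clusters in slices.values():
        for sorting_keys in clusters.values():
            sorting_keys.sort()
    return slices
```

Because the slice depends only on the sharding-key value, "johnd.20180602" and "johnd.20180115" always land in the same cluster. A query that filters on the sharding key therefore needs to look at one slice only, and a sorting-key range maps to a contiguous run within that cluster.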

For best-practice guidelines for defining primary keys, optimizing data and workload distribution, and improving performance, see Best Practices for Defining Primary Keys and Distributing Data Workloads.

Note
  • The value of a sharding key cannot contain periods, because the leftmost period in an object's primary-key value (name) is assumed to be a separator between sharding and sorting keys.

  • To work with a NoSQL table using Spark DataFrames or Presto, the table items must have a sharding-key user attribute and, in the case of a compound primary key, also a sorting-key user attribute. For more efficient range scans, use a sorting-key attribute of type string (see Best Practices for Defining Primary Keys and Distributing Data Workloads for more information). To work with a NoSQL table using V3IO Frames, the table items must have a primary-key user attribute. The values of these key user attributes must match the value of the item's primary key (name) and shouldn't be modified after the initial item ingestion. (The NoSQL Web API doesn't require such attributes and doesn't attach any special meaning to them if they exist.) To change an item's primary key, delete the existing item and create a new item with the desired combination of sharding and sorting keys and, if required, matching key user attributes.
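
The requirement that key user attributes mirror the item name can be sketched in plain Python. This example doesn't call any platform, Spark, Presto, or V3IO Frames API. The attribute names username and date follow the example used earlier on this page, and build_item and key_attributes_match are hypothetical helpers.

```python
from typing import Any, Dict, Optional, Tuple

# Hypothetical attribute names, borrowed from the username/date example.
SHARDING_ATTR = "username"
SORTING_ATTR = "date"


def build_item(sharding_key: str,
               sorting_key: Optional[str] = None,
               extra: Optional[Dict[str, Any]] = None) -> Tuple[str, Dict[str, Any]]:
    """Return (item name, attributes) with key attributes that mirror the name."""
    if "." in sharding_key:
        raise ValueError("a sharding-key value cannot contain periods")
    attributes: Dict[str, Any] = dict(extra or {})
    attributes[SHARDING_ATTR] = sharding_key
    name = sharding_key
    if sorting_key is not None:
        # Kept as a string, which the note recommends for range scans.
        attributes[SORTING_ATTR] = sorting_key
        name = f"{sharding_key}.{sorting_key}"
    return name, attributes


def key_attributes_match(name: str, attributes: Dict[str, Any]) -> bool:
    """Check that the key user attributes still agree with the item name."""
    sharding_key, separator, sorting_key = name.partition(".")
    if attributes.get(SHARDING_ATTR) != sharding_key:
        return False
    if separator:
        return attributes.get(SORTING_ATTR) == sorting_key
    return True
```

A check such as key_attributes_match could run before ingestion to catch items whose key attributes have drifted from the name. Because the primary key itself cannot be edited in place, a mismatch is resolved by deleting the item and creating a new one, as the note describes.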

Object-Name Restrictions

The names of all data objects in the platform (such as items and files) are subject to the general file-system naming restrictions, including a maximum length of 255 characters. In addition:

  • A period in an object name indicates a compound name of the format <sharding key>.<sorting key>. See Sharding and Sorting Keys.
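
As a rough illustration, a client could pre-check names against the two restrictions stated here before writing an object. This sketch covers only the 255-character limit and the meaning of a period. It does not attempt the other general file-system restrictions, which are not enumerated on this page, and it assumes the limit counts characters as returned by Python's len(). The function names are hypothetical.

```python
from typing import List

MAX_NAME_LENGTH = 255  # maximum object-name length stated in the text


def check_object_name(name: str) -> List[str]:
    """Return a list of problems found in a candidate object name."""
    problems: List[str] = []
    if len(name) > MAX_NAME_LENGTH:
        problems.append(
            f"name is {len(name)} characters long; "
            f"the maximum is {MAX_NAME_LENGTH}"
        )
    return problems


def is_compound_name(name: str) -> bool:
    """A period marks the name as <sharding key>.<sorting key>."""
    return "." in name
```

For example, check_object_name("johnd.20180602") returns an empty list and is_compound_name reports the name as compound, whereas a 256-character name produces a single length problem.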
