API Data Paths

Overview

The data containers and their contents are referenced differently depending on the programming interface. You need to know how to set the data paths for each interface, as outlined in this guide.

Predefined Environment Variables

The platform's command-line services (Jupyter Notebook and the web shell) predefine the following environment variables to simplify access to the running-user directory of the predefined "users" container:

  • V3IO_USERNAME: set to the username of the running user of the service.
  • V3IO_HOME: set to the running-user directory in the "users" container, users/<running user>.
  • V3IO_HOME_URL: set to the fully qualified v3io path to the running-user directory, v3io://users/<running user>.
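
To make the relationship between the three variables concrete, here is a minimal Python sketch. The helper name `running_user_paths` and the "iguazio" fallback username are illustrative assumptions, not part of the platform; the derived values simply follow the formats listed above.

```python
import os

def running_user_paths(env=None):
    """Return the running-user name, home directory, and v3io home URL.

    Uses the predefined variables when they are set; otherwise derives
    V3IO_HOME and V3IO_HOME_URL from the username.
    """
    env = os.environ if env is None else env
    username = env.get("V3IO_USERNAME", "iguazio")  # hypothetical fallback
    home = env.get("V3IO_HOME", "users/" + username)
    home_url = env.get("V3IO_HOME_URL", "v3io://" + home)
    return {"username": username, "home": home, "home_url": home_url}

print(running_user_paths({"V3IO_USERNAME": "iguazio"}))
```

Passing a plain dictionary instead of the real environment makes the helper easy to try outside a platform service.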

RESTful Web and Management API Data Paths

To refer to a data container or to a specific directory or file in a container from a RESTful web or cluster-management API request, specify the path as part of the URL in the request header:

<API-endpoint URL>/<container name>[/<path to file or directory>]

For example, the following web-API request URL references the "projects" container:

https://default-tenant.app.mycluster.iguazio.com:8443/projects/

And this is a similar cluster-management API ("management API") request URL:

https://dashboard.default-tenant.app.mycluster.iguazio.com/projects/

The following web-API request URL references a mytable table directory in an iguazio running-user directory in the "users" container:

https://default-tenant.app.mycluster.iguazio.com:8443/users/iguazio/mytable

When using the platform's data-service web APIs, you can optionally set the relative file or directory path within the configured container in the request's JSON body instead of in the URL. For example, for a NoSQL Web API request, you can end the URL path in the previous example with the container name (users) and set the TableName data parameter in the request's JSON body to "iguazio/mytable".
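
The two ways of addressing the same table can be sketched in Python as follows. The function name `web_api_target` is a hypothetical helper for illustration only; it builds the URL and the path-related part of the JSON body and does not send a request. A real request body would carry additional operation-specific parameters.

```python
def web_api_target(endpoint, container, table_path, path_in_body=False):
    """Return (url, body) addressing table_path within the given container.

    With path_in_body=False the full path is placed in the URL.
    With path_in_body=True the URL ends with the container name and the
    relative table path is set in the TableName body parameter.
    """
    base = endpoint.rstrip("/") + "/" + container.strip("/")
    relative = table_path.strip("/")
    if path_in_body:
        return base + "/", {"TableName": relative}
    return base + "/" + relative, {}

endpoint = "https://default-tenant.app.mycluster.iguazio.com:8443"
print(web_api_target(endpoint, "users", "iguazio/mytable"))
print(web_api_target(endpoint, "users", "iguazio/mytable", path_in_body=True))
```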

For full details and examples, see the data-service web-API reference documentation.

Frames API Data Paths

When using the V3IO Frames (Frames) Python API, you create a client object for a specific data container; the container name is specified in the container parameter of the client constructor API. For example:

import v3io_frames as v3f
# Create a client object for the "users" container:
client = v3f.Client("framesd:8081", container="users", token="e8bd4ca2-537b-4175-bf01-8c74963e90bf")

To refer to a specific data collection — such as a NoSQL or TSDB table or a stream — you specify in the relevant Client method parameter the relative data path within the container of the parent client object. In most cases, the data path is set in the table parameter. For example:

import os

# Read from a "mytable" table in the root directory of the `client` object's
# container:
df = client.read(backend="kv", table="mytable")

# Read from a "mytable" table in the running-user directory (`V3IO_USERNAME`)
# of the `client` object's container (typically for the "users" container):
tsdb_table = os.path.join(os.getenv("V3IO_USERNAME"), "mytable")
df = client.read(backend="tsdb", table=tsdb_table)

# Read from a "drivers" stream in a "my_streams" directory in the `client`
# object's container:
stream = "/my_streams/drivers"
df = client.read(backend="stream", table=stream, seek="earliest")

For detailed information and examples, see the Frames API reference.

Spark API Data Paths

To refer to data in a data container from Spark API code, such as Spark DataFrames, specify the data path as a fully qualified v3io path of the following format — where <container name> is the name of the parent data container and <data path> is the relative path to the data within the specified container:

v3io://<container name>/<data path>

When using a NoSQL DataFrame, you set the data source to "io.iguaz.v3io.spark.sql.kv".

For example:

    val nosql_source = "io.iguaz.v3io.spark.sql.kv"
    
    // Read from a "mytable" NoSQL table in a "mydata" directory in the "projects" container:
    var table_path = "v3io://projects/mydata/mytable/"
    var readDF = spark.read.format(nosql_source).load(table_path)
    
    // Read from a "mytable" table in the running-user directory of the "users" container.
    // The table_path assignments demonstrate alternative methods for setting the same path
    // for running-user "iguazio" (specified explicitly only in the first example):
    table_path = "v3io://users/iguazio/mytable"
    table_path = "v3io://users/" + System.getenv("V3IO_USERNAME") + "/mytable"
    table_path = "v3io://" + System.getenv("V3IO_HOME") + "/mytable"
    table_path = System.getenv("V3IO_HOME_URL") + "/mytable"
    readDF = spark.read.format(nosql_source).load(table_path)

The same example in Python:

    import os
    nosql_source = "io.iguaz.v3io.spark.sql.kv"
    
    # Read from a NoSQL table "mytable" in a "mydata" directory in the "projects" container:
    table_path = "v3io://projects/mydata/mytable/"
    df = spark.read.format(nosql_source).load(table_path)
    
    # Read from a "mytable" table in the running-user directory of the "users" container.
    # The table_path assignments demonstrate alternative methods for setting the same path
    # for running-user "iguazio" (specified explicitly only in the first example):
    table_path = "v3io://users/iguazio/mytable"
    table_path = "v3io://users/" + os.getenv("V3IO_USERNAME") + "/mytable"
    table_path = "v3io://" + os.getenv("V3IO_HOME") + "/mytable"
    table_path = os.getenv("V3IO_HOME_URL") + "/mytable"
    df = spark.read.format(nosql_source).load(table_path)
    

For detailed information and examples, see the Spark datasets reference, especially the Data Paths overview and the NoSQL DataFrame Table Paths section, and the Spark examples in the platform's tutorial Jupyter notebooks.

Trino Data Paths

To refer to a table in a data container from a Trino query, specify the table path using the following format, where <catalog> is the name of the Trino connector catalog (v3io for the Iguazio Trino connector), <container name> is the name of the table's parent data container (the Trino schema), and <table path> is the relative path to the table within the specified container:

    [<catalog>.][<container name>.]<table path>

To specify a path to a nested table, use the following syntax:

    [<catalog>.][<container name>.]"/path/to/table"

The catalog and container (schema) names are marked as optional ([]) because you can configure default values for these parameters when starting the Trino CLI. For example, the trino wrapper that's available in the platform command-line service environments preconfigures v3io as the default catalog.
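
As a rough illustration of the two syntaxes, the following Python sketch assembles a table reference from its parts. The helper name `trino_table_ref` is hypothetical, and the sketch assumes that any table path containing a slash is a nested path that needs the quoted form.

```python
def trino_table_ref(table_path, container=None, catalog=None):
    """Build [<catalog>.][<container name>.]<table path> for a Trino query."""
    relative = table_path.strip("/")
    if "/" in relative:
        # Nested table: use the double-quoted form with a leading slash.
        table = '"/' + relative + '"'
    else:
        table = relative
    # Omitted (None or empty) catalog and container names are left out.
    prefix = [name for name in (catalog, container) if name]
    return ".".join(prefix + [table])

print(trino_table_ref("mytable", container="projects", catalog="v3io"))
print(trino_table_ref("iguazio/mytable", container="users", catalog="v3io"))
```

Omitting `catalog` produces the shorter form that relies on a preconfigured default catalog.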

For example, the following Trino CLI queries reference NoSQL tables in the platform's data containers:

    -- Query a "mytable" table in the "projects" container:
    SELECT * FROM v3io.projects.mytable;

    -- Query a "mytable" table in the "iguazio" running-user directory of the "users" container:
    SELECT * FROM v3io.users."/iguazio/mytable";
    
Note
  • When using the trino wrapper instead of the native Trino CLI, you can omit "v3io." from the path:

        SELECT * FROM projects.mytable;
        SELECT * FROM users."/iguazio/mytable";

  • You can use a bash table-path variable and the Trino CLI's execute option to replace the hardcoded running-user directory name in the second example ("iguazio") with the V3IO_USERNAME environment variable:

        trino_table_path="v3io.users.\"/$V3IO_USERNAME/mytable\""
        trino --execute "SELECT * FROM $trino_table_path"

The following example shows an SQL query in a Python Jupyter Notebook that uses Trino to query a "mytable" table in the running-user directory of the "users" container:

    import os
    # Build the quoted nested-table path for the running user
    trino_table_path = 'v3io.users."/' + os.getenv("V3IO_USERNAME") + '/mytable"'
    print("SELECT * FROM " + trino_table_path)
    %sql SELECT * FROM $trino_table_path
    

For detailed information and examples, see Using Trino, and especially the Table Paths overview and the similar Trino CLI guide that it references.

File-System Data Paths

Local File-System Data Paths

To refer to data in the platform from a local file-system command, use the predefined "v3io" data mount:

    /v3io[/<container name>][/<path to file or directory>]

To refer to the running-user directory in the "users" container, you can use the predefined "User" mount to this directory:

    /User/[<path to file or directory in the users/<username> directory>]

For example:

    # List all data-container directories
    ls /v3io
    # List the contents of the "projects" container
    ls /v3io/projects/
    # List the contents of the "mydata" directory in the "projects" container
    ls -lF /v3io/projects/mydata/
    
    # Copy a myfile.txt file from a "mydata" directory in the "projects" container
    # to the running-user directory of the "users" container for user "iguazio".
    # All of the following syntax variations evaluate to the same copy command:
    cp /v3io/projects/mydata/myfile.txt /v3io/users/iguazio/
    cp /v3io/projects/mydata/myfile.txt /v3io/users/$V3IO_USERNAME
    cp /v3io/projects/mydata/myfile.txt /v3io/$V3IO_HOME
    cp /v3io/projects/mydata/myfile.txt /User
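
The same mount paths can also be composed programmatically. The following Python sketch is only an illustration: the helper names `local_data_path` and `user_data_path` are hypothetical, and the code assumes the "/v3io" and "/User" mounts described above.

```python
import posixpath

V3IO_MOUNT = "/v3io"  # predefined data mount
USER_MOUNT = "/User"  # predefined running-user mount

def local_data_path(container=None, path=""):
    """Build /v3io[/<container name>][/<path to file or directory>]."""
    parts = [V3IO_MOUNT]
    if container:
        parts.append(container.strip("/"))
        if path.strip("/"):
            parts.append(path.strip("/"))
    return posixpath.join(*parts)

def user_data_path(path=""):
    """Build a path under /User, the mount of the users/<username> directory."""
    relative = path.strip("/")
    return posixpath.join(USER_MOUNT, relative) if relative else USER_MOUNT

print(local_data_path("projects", "mydata/myfile.txt"))
print(user_data_path("myfile.txt"))
```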
    

Hadoop FS File-System Data Paths

To refer to a data container or its contents from a Hadoop FS command, specify the data path as a fully qualified v3io path of the following format:

    v3io://<container name>/[<data path>]

For example:

    # List the contents of the "projects" container
    hadoop fs -ls v3io://projects/
    # List the contents of the "mydata" directory in the "projects" container
    hadoop fs -ls v3io://projects/mydata/
    
    # Copy a myfile.txt file from a "mydata" directory in the "projects" container
    # to the running-user directory of the "users" container for user "iguazio"
    # All of the following syntax variations evaluate to the same copy command:
    hadoop fs -cp v3io://projects/mydata/myfile.txt v3io://users/iguazio/
    hadoop fs -cp v3io://projects/mydata/myfile.txt v3io://users/$V3IO_USERNAME
    hadoop fs -cp v3io://projects/mydata/myfile.txt v3io://$V3IO_HOME
    hadoop fs -cp v3io://projects/mydata/myfile.txt $V3IO_HOME_URL
    
Note
The URI generic-syntax specification requires that fully qualified paths contain at least three forward slashes (/). Therefore, to list the contents of a container's root directory you must end the path with a slash, as demonstrated in the examples.
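
One way to avoid forgetting the trailing slash is to build the path with a small helper. This Python sketch uses a hypothetical function name, `v3io_url`, and always emits the slash after the container name, so a container root path has three slashes.

```python
def v3io_url(container, data_path=""):
    """Build v3io://<container name>/[<data path>].

    The slash after the container name is always included, so the
    container root is returned as v3io://<container name>/.
    """
    return "v3io://" + container.strip("/") + "/" + data_path.lstrip("/")

print(v3io_url("projects"))
print(v3io_url("projects", "mydata/"))
```

The same helper can produce the fully qualified paths used in the Spark examples, because both interfaces share the v3io path format.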
