Platform Fundamentals

Overview

This guide provides basic information for using the different platform interfaces, which will help you better understand the examples in the other tutorials and guides.

Using Application Services

The Services page of the platform dashboard displays information about running application services — which include both default services that are created as part of the platform’s deployment (such as the Web APIs or Presto services) and user-defined services (such as Spark or Jupyter Notebook). A service administrator can create, delete, enable, disable, restart, and configure application services and view service logs. A user-defined service is typically assigned to a specific running user and can also optionally be shared with all other users in the same tenant. Except for service administrators, who can view all services, users can see and use only the services that they’re running and shared services (including the default services). For more information about the platform’s application services, see Components, Services, and Development Ecosystem. For information about the required permissions for viewing, running, and managing application services in the platform, see Security.

Note
  • When running an application service from another service, the service is executed using the permissions of the running user of the parent service. For example, if user “c” logs into a shared web-shell service for running-user “a”, which is configured to work with a Spark service for running-user “b”, and runs Spark from the web shell, Spark is executed with the permissions of user “a” (the running user of the parent web-shell service), and not as user “b” (the running user of the Spark service) or user “c” (the logged-in user of the shell service).

  • Check the product software specifications and restrictions and release notes for additional restrictions and known issues related to application services.

Creating a New Service

Follow these steps to create a new application service from the dashboard; note that you must have the Service Admin management policy to create a service:

  1. In the side navigation menu, select Services.

  2. On the Services page, select the New Service option from the top action toolbar.

  3. Select the desired service type, configure the required parameters (including the service’s name and running user), and optionally configure additional parameters. You can also select to share the service with all users of the parent tenant.

    Note
    • Service resources — You can configure the memory and CPU resources for the service from the Resources section of the Common Parameters dashboard tab. In addition, for services such as Spark and Presto, you can configure the number of replicas (workers) for the service from the Custom Parameters tab. For empty parameter fields, the platform uses default system values. Note that the platform doesn’t perform logical validation of your configuration. It’s up to you to balance the resource needs of your application services, taking into account the available resources of your platform environment. When setting the resource limits, consider that an insufficient limit might result in the termination of the service. If you’re using the Iguazio trial, remember that your environment has limited resources (see the trial evaluation overview).

    • Dependent services — Some services allow you to configure related services to be used by the new service, from among the services that are accessible to the selected running user. For example, you can configure a Spark service to be used for running Spark jobs from a web-based shell, Jupyter Notebook, or Zeppelin service. (The Zeppelin service requires that you configure a related Spark service; the Jupyter Notebook and web-shell services use an internal Spark service by default.)

    • Configuration changes — You can change the service’s configuration at a later stage, after its initial deployment.

  4. Optionally repeat the previous steps to create as many services as you wish.

  5. When you’re done defining services, select Apply Changes from the top action toolbar, and wait for confirmation of a successful deployment. Note that the deployment might take a while to complete, depending on the number and type of services that you create.

Setting Data Paths

To view and access data containers, you need to know how to set the data paths for each platform interface:

Using Environment Variables
The platform’s web-based shell, Jupyter Notebook, and Zeppelin services have some useful predefined environment variables that you can use to set the data paths in your code or file-system commands — such as V3IO_USERNAME for the running user of the service, V3IO_HOME for the running-user directory in the “users” container, and V3IO_HOME_URL for a fully qualified v3io:// path to this running-user directory. The Spark, Presto, and file-system examples in the following sections demonstrate how to use these variables to define data paths.

RESTful Web and Management API Data Paths

To refer to a data container or to a specific directory or file in a container from a RESTful web or management API request, specify the path as part of the URL in the request header:

<API-endpoint URL>/<container name>[/<path to file or directory>]

For example, the following web-API request URL references the “bigdata” container:

https://default-tenant.app.mycluster.iguazio.com:8443/bigdata/

And this is a similar management-API request URL:

<management-API endpoint URL>/bigdata/

The following web-API request URL references a mytable table directory in an iguazio running-user directory in the “users” container:

https://default-tenant.app.mycluster.iguazio.com:8443/users/iguazio/mytable

When using the platform’s data-service web APIs, you can optionally set the relative file or directory path within the configured container in the request’s JSON body. For example, for a NoSQL Web API request, you can end the URL path in the previous example with the container name (users) and set the TableName data parameter in the request’s JSON body to "iguazio/mytable".

For full details and examples, see the data-service web-API and management-API reference documentation.
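
The URL format described above can be sketched as a small Python helper. This is only an illustration: the api_data_url function name is hypothetical and is not part of any platform SDK, and the endpoint value is the sample web-API URL used in this guide.

```python
from urllib.parse import quote


def api_data_url(endpoint: str, container: str, path: str = "") -> str:
    """Join an API endpoint, a container name, and an optional relative path."""
    # Strip slashes at the seams so the parts are joined by exactly one "/".
    url = endpoint.rstrip("/") + "/" + quote(container.strip("/"))
    if path:
        # quote() percent-encodes unsafe characters but keeps "/" separators.
        url += "/" + quote(path.lstrip("/"))
    else:
        # A reference to the container itself ends with a slash.
        url += "/"
    return url


# Sample endpoint taken from the examples in this guide.
web_api = "https://default-tenant.app.mycluster.iguazio.com:8443"
print(api_data_url(web_api, "bigdata"))
print(api_data_url(web_api, "users", "iguazio/mytable"))
```

The two calls reproduce the “bigdata” container URL and the “mytable” table URL from the examples in this section.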

Spark API Data Paths

To refer to data in a data container from Spark API code, such as Spark DataFrames, specify the data path as a fully qualified v3io path of the following format — where <container name> is the name of the parent data container and <data path> is the relative path to the data within the specified container:

v3io://<container name>/<data path>

When using a NoSQL DataFrame, you set the data source to "io.iguaz.v3io.spark.sql.kv".

For example:

    val nosql_source = "io.iguaz.v3io.spark.sql.kv"
    
    // Read from a "mytable" NoSQL table in a "mydata" directory in the "bigdata" container:
    var table_path = "v3io://bigdata/mydata/mytable/"
    var readDF = spark.read.format(nosql_source).load(table_path)
    
    // Read from a "mytable" table in the running-user directory of the "users" container.
    // The table_path assignments demonstrate alternative methods for setting the same path
    // for running-user "iguazio" (specified explicitly only in the first example):
    table_path = "v3io://users/iguazio/mytable"
    table_path = "v3io://users/" + System.getenv("V3IO_USERNAME") + "/mytable"
    table_path = "v3io://" + System.getenv("V3IO_HOME") + "/mytable"
    table_path = System.getenv("V3IO_HOME_URL") + "/mytable"
    readDF = spark.read.format(nosql_source).load(table_path)

    import os
    nosql_source = "io.iguaz.v3io.spark.sql.kv"
    
    # Read from a NoSQL table "mytable" in a "mydata" directory in the "bigdata" container:
    table_path = "v3io://bigdata/mydata/mytable/"
    readDF = spark.read.format(nosql_source).load(table_path)
    
    # Read from a "mytable" table in the running-user directory of the "users" container.
    # The table_path assignments demonstrate alternative methods for setting the same path
    # for running-user "iguazio" (specified explicitly only in the first example):
    table_path = "v3io://users/iguazio/mytable"
    table_path = "v3io://users/" + os.getenv("V3IO_USERNAME") + "/mytable"
    table_path = "v3io://" + os.getenv("V3IO_HOME") + "/mytable"
    table_path = os.getenv("V3IO_HOME_URL") + "/mytable"
    readDF = spark.read.format(nosql_source).load(table_path)

    For detailed information and examples, see the Spark datasets reference — and especially the Data Paths overview section and the Table Paths NoSQL DataFrame section — and the Getting Started with Data Ingestion Using Spark tutorial.
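
As a quick sanity check of the environment-variable alternatives shown above, the following Python sketch builds the same table path in three ways and compares the results. It assumes that V3IO_HOME holds a value of the form users/<username> and that V3IO_HOME_URL holds v3io://users/<username>, which is what the examples above imply; the setdefault calls only supply sample values for running outside the platform, and the running_user_table_paths name is hypothetical.

```python
import os

# Assumed sample values for running the sketch outside the platform; inside a
# platform service these variables are predefined and setdefault() is a no-op.
os.environ.setdefault("V3IO_USERNAME", "iguazio")
os.environ.setdefault("V3IO_HOME", "users/iguazio")
os.environ.setdefault("V3IO_HOME_URL", "v3io://users/iguazio")


def running_user_table_paths(table: str) -> list:
    """Return alternative spellings of a table path in the running-user directory."""
    return [
        "v3io://users/" + os.environ["V3IO_USERNAME"] + "/" + table,
        "v3io://" + os.environ["V3IO_HOME"] + "/" + table,
        os.environ["V3IO_HOME_URL"] + "/" + table,
    ]


paths = running_user_table_paths("mytable")
# If the variables are set as assumed, the set contains a single path.
print(sorted(set(paths)))
```

If the printed list contains more than one path, one of the variables doesn't have the value that this sketch assumes.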

    Presto Data Paths

    To refer to a table in a data container from a Presto query, specify the table path using the following format — where <catalog> is the name of the Presto connector catalog (v3io for the Iguazio Presto connector), <container name> is the name of the table’s parent data container (the Presto schema), and <table path> is the relative path to the table within the specified container:

    [<catalog>.][<container name>.]<table path>

    To specify a path to a nested table, use the following syntax:

    [<catalog>.][<container name>.]"/path/to/table"

    The catalog and container (schema) names are marked as optional ([]) because you can select to configure default values for these parameters when starting the Presto CLI. For example, the presto wrapper that’s available in the platform command-line service environments preconfigures v3io as the default catalog.

    For example, following are Presto CLI queries that reference NoSQL tables in the platform’s data containers:

    -- Query a "mytable" table in the "bigdata" container:
    SELECT * FROM v3io.bigdata.mytable;
    
    -- Query a "mytable" table in the "iguazio" running-user directory of the "users" container:
    SELECT * FROM v3io.users."/iguazio/mytable";

    Note
    • When using the presto wrapper instead of the native Presto CLI, you can omit “v3io.” from the path:

      SELECT * FROM bigdata.mytable;
      SELECT * FROM users."/iguazio/mytable";
    • You can use a bash table-path variable and the Presto CLI’s execute option to replace the hardcoded running-user directory name in the second example (“iguazio”) with the V3IO_USERNAME environment variable:

      presto_table_path="v3io.users.\"/$V3IO_USERNAME/mytable\""
      presto --execute "SELECT * FROM $presto_table_path"
      

    Following is an example of an SQL query in a Python Jupyter Notebook, which uses Presto to query a “mytable” table in the running-user directory of the “users” container:

    import os

    # Build the quoted Presto path of the table in the running-user directory
    presto_table_path = 'v3io.users."/' + os.getenv("V3IO_USERNAME") + '/mytable"'
    print("SELECT * FROM " + presto_table_path)
    %sql SELECT * FROM $presto_table_path

    For detailed information and examples, see the Presto reference, and especially the Table Paths overview section and the similar Presto CLI section that it references.
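
The table-path syntax described in this section can be sketched as a Python helper that adds the quoted "/path/to/table" form only for nested tables. The presto_table_path function name and its parameters are hypothetical; the v3io default catalog follows the connector catalog name given above.

```python
def presto_table_path(container, table_path, catalog="v3io"):
    """Build a [<catalog>.][<container name>.]<table path> reference."""
    table_path = table_path.strip("/")
    if "/" in table_path:
        # Nested tables use the quoted "/path/to/table" form.
        table = '"/' + table_path + '"'
    else:
        table = table_path
    # Skip the catalog or container when a default is configured (pass None).
    parts = [part for part in (catalog, container) if part] + [table]
    return ".".join(parts)


print("SELECT * FROM " + presto_table_path("bigdata", "mytable"))
print("SELECT * FROM " + presto_table_path("users", "iguazio/mytable"))
# With the presto wrapper, the catalog can be omitted:
print("SELECT * FROM " + presto_table_path("users", "iguazio/mytable", catalog=None))
```

The helper only formats the path string; it doesn't validate that the table exists or escape special characters in names.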

    File-System Data Paths

    Local File-System Data Paths

    To refer to data in the platform from a local file-system command, use the predefined “v3io” data mount:

    /v3io[/<container name>][/<path to file or directory>]

    To refer to the running-user directory in the “users” container, you can select to use the predefined “User” mount to this directory:

    /User/[<path to file or directory in the users/<username> directory>]

    For example:

    # List all data-container directories
    ls /v3io
    # List the contents of the "bigdata" container
    ls /v3io/bigdata/
    # List the contents of the "mydata" directory in the "bigdata" container
    ls -lF /v3io/bigdata/mydata/
    
    # Copy a myfile.txt file from a "mydata" directory in the "bigdata" container
    # to the running-user directory of the "users" container for user "iguazio".
    # All of the following syntax variations evaluate to the same copy command:
    cp /v3io/bigdata/mydata/myfile.txt /v3io/users/iguazio/
    cp /v3io/bigdata/mydata/myfile.txt /v3io/users/$V3IO_USERNAME
    cp /v3io/bigdata/mydata/myfile.txt /v3io/$V3IO_HOME
    cp /v3io/bigdata/mydata/myfile.txt /User

    Hadoop FS File-System Data Paths

    To refer to a data container or its contents from a Hadoop FS command, specify the data path as a fully qualified v3io path of the following format:

    v3io://<container name>/[<data path>]

    For example:

    # List the contents of the "bigdata" container
    hadoop fs -ls v3io://bigdata/
    # List the contents of the "mydata" directory in the "bigdata" container
    hadoop fs -ls v3io://bigdata/mydata/
    
    # Copy a myfile.txt file from a "mydata" directory in the "bigdata" container
    # to the running-user directory of the "users" container for user "iguazio".
    # All of the following syntax variations evaluate to the same copy command:
    hadoop fs -cp v3io://bigdata/mydata/myfile.txt v3io://users/iguazio/
    hadoop fs -cp v3io://bigdata/mydata/myfile.txt v3io://users/$V3IO_USERNAME
    hadoop fs -cp v3io://bigdata/mydata/myfile.txt v3io://$V3IO_HOME
    hadoop fs -cp v3io://bigdata/mydata/myfile.txt $V3IO_HOME_URL

    Note
    • The URI generic-syntax specification requires that fully qualified paths contain at least three forward slashes (/). Therefore, to list the contents of a container’s root directory you must end the path with a slash, as demonstrated in the examples.

    • A Hadoop FS ls command on v3io:// or v3io:///, without referencing a specific container, lists the contents of the default “bigdata” container. (It doesn’t list all of the tenant containers like a local file-system ls command on /v3io.)
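
To relate the local-mount and Hadoop FS path formats, the following Python sketch converts a path under the /v3io mount to the matching v3io:// path and always keeps the slash after the container name, as the first note requires. The mount_path_to_v3io_uri function name is hypothetical and the conversion is based only on the two path formats described in this guide.

```python
def mount_path_to_v3io_uri(local_path: str) -> str:
    """Convert /v3io/<container>[/<path>] to v3io://<container>/[<path>]."""
    mount = "/v3io"
    if local_path != mount and not local_path.startswith(mount + "/"):
        raise ValueError(f"not a path under {mount}: {local_path!r}")
    # Split the first component (the container) from the rest of the path.
    container, _, data_path = local_path[len(mount):].lstrip("/").partition("/")
    if not container:
        # A bare /v3io lists all containers and has no v3io:// equivalent here.
        raise ValueError("the path must name a container")
    # The slash after the container is always emitted, even for the root.
    return f"v3io://{container}/{data_path}"


print(mount_path_to_v3io_uri("/v3io/bigdata"))
print(mount_path_to_v3io_uri("/v3io/bigdata/mydata/myfile.txt"))
```

The sketch rejects a bare /v3io path because, as the second note explains, a v3io:// path without a container doesn't list all containers.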

    Sending HTTP Requests

    The getting-started tutorials demonstrate how to issue HTTP requests using the platform’s RESTful APIs. The examples include specific instructions for sending requests with Postman, but you can use any other standard method for sending HTTP requests, such as using curl or Python code (as demonstrated, for example, in the API reference documentation).

    Sending Web-API Requests

    You can send HTTP web-API requests to the URL of a platform tenant’s web-APIs (web-gateway) service. Get this URL by copying the API URL of the “Web APIs” service from the Services dashboard page. You can select between two types of URLs:

    • HTTPS_Direct (recommended) — a URL of the format https://<tenant IP>:<web-APIs port>; for example, https://default-tenant.app.mycluster.iguazio.com:8443. Requests of this format are assigned to web-APIs servers by the DNS server of the web-APIs service: the DNS cache contains a random list of web-APIs servers, and the server attempts to assign each request to the first server on the list; if the first server becomes unavailable, the request is assigned to the next server in the list, and so forth. This is the recommended method in most cases, as it’s typically more efficient; a possible exception is a single web-APIs client.
    • HTTPS — a URL of the format https://webapi.<tenant IP>; for example, https://webapi.default-tenant.app.mycluster.iguazio.com. Requests of this format are redirected to web-APIs servers by the Kubernetes ingress of the web-APIs service, which selects a server per request.

    For an explanation on how to set data paths in the request, see the RESTful Web and Management API Data Paths section in this guide. For detailed information on how to structure the web-API requests, see the web-APIs reference, and especially Data-Service Web-API General Structure.

    Web-APIs Authentication

    A platform web-API request must include authentication of the sender’s identity; see Securing Your Web-API Requests. As explained in the reference documentation, you can select between several alternative authentication methods. For the methods that require an access key, you can get the access key from the Access Keys window that’s available from the dashboard user-profile menu, or by copying the value of the V3IO_ACCESS_KEY environment variable in a web-shell or Jupyter Notebook service.

    • Add an X-v3io-session-key header and set its value to a valid platform access key:

      X-v3io-session-key: <access key>
        

    • Add an Authorization header that uses the Amazon Simple Storage Service (S3) AWS signature authentication syntax, replacing the AWS S3 signature with a platform access key.

    • Add an Authorization header with the Basic authentication-scheme token followed by a Base64 string that encodes the username and password login credentials (note that Postman handles the encoding for you, as demonstrated in the getting-started tutorials):

      Authorization: Basic <Base64-encoded credentials>
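
For clients other than Postman, the access-key and Basic header options can be sketched in Python using only the standard library. The function names are hypothetical, and the Basic encoding follows the standard HTTP scheme of Base64-encoding a "username:password" string; the S3-signature option isn't covered here.

```python
import base64


def access_key_header(access_key: str) -> dict:
    """Header for the X-v3io-session-key authentication method."""
    return {"X-v3io-session-key": access_key}


def basic_auth_header(username: str, password: str) -> dict:
    """Header for the Basic authentication scheme."""
    # Standard HTTP Basic credentials: Base64 of "username:password".
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return {"Authorization": "Basic " + token}


# Placeholder values; substitute a real access key or login credentials.
print(access_key_header("<access key>"))
print(basic_auth_header("<username>", "<password>"))
```

Either dictionary can be passed as the request headers to the HTTP client of your choice.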
        

    Sending Management-API Requests [Beta]

    You can send HTTP management-API requests [Beta] to port 8001 of your platform dashboard’s IP address or resolvable host domain name; for example, http://192.168.1.100:8001. In cloud environments, send the request, instead, to the HTTPS URL of the dashboard — for example, https://dashboard.default-tenant.app.mycluster.iguazio.com.

    For an explanation on how to set data paths in the request, see the RESTful Web and Management API Data Paths section in this guide. For detailed information on how to structure the management-API requests, see the management-APIs reference, and especially General Management-API Structure.

    Management-APIs Authentication

    All management-API operations other than Create Session require a session cookie to authenticate the identity of the sender. You acquire this cookie by issuing a Create Session request. You can do this from Postman by following these steps:

    1. Create a new request and set the request method to POST.

    2. In the request URL field, enter the following; replace <dashboard IP> with the IP address or host name of your dashboard:

      http://<dashboard IP>:8001/api/sessions/
          

      For example:

      http://192.168.1.100:8001/api/sessions/
          

    3. In the Headers tab, add a Content-Type header (Key) and set its value to application/json.

    4. In the Body tab, select the raw format and add the following JSON code; replace the <username> and <password> placeholders with your platform login credentials:

      {
          "data": {
              "attributes": {
                  "username": "<username>",
                  "password": "<password>"
              },
              "type": "session"
          }
      }

    5. Select Send to send the request, and then check the response. In the case of a successful request:

      • The Headers response tab contains a Set-Cookie header with a session element whose value is the session cookie (session=<cookie>). You can also see the cookie in the Cookies response tab (for example, j%3A%7B%22sid%22%3A%20%22a9ce242a-670f-47a8-9c8b-c6730f2794dc%22%7D). Copy and save this cookie. You’ll need to pass it as the value of the session parameter of the Cookie header in other management-API requests.

      • The Set-Cookie header also contains a max-age element, which contains the session’s time-to-live (TTL) period, in seconds; when this period elapses, the session expires and the cookie is no longer valid. The same value is also returned in the data.attributes.ttl response-body data element, which you can see in the Body tab.

      • In the Body tab, you can see the full JSON response data. In addition to the ttl attribute, the response-data attributes include an expires_at attribute that contains the session’s expiration time as a Unix timestamp in seconds. The expiration time is also shown in date format in the Expires column of the Cookies response tab.
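
Outside Postman, the same flow can be sketched in Python: build the Create Session request body, and then extract the session cookie and its max-age value from the Set-Cookie response header. The function names are hypothetical, the sketch doesn't send any request, and the sample header below uses the cookie value from the example above with an assumed max-age of 86400 seconds.

```python
import json
from http.cookies import SimpleCookie


def create_session_body(username: str, password: str) -> str:
    """JSON body for a POST request to <dashboard URL>/api/sessions/."""
    return json.dumps({
        "data": {
            "attributes": {"username": username, "password": password},
            "type": "session",
        }
    })


def parse_session_cookie(set_cookie_header: str):
    """Return (cookie value, TTL in seconds or None) from a Set-Cookie header."""
    cookie = SimpleCookie()
    cookie.load(set_cookie_header)
    morsel = cookie["session"]
    max_age = morsel["max-age"]
    return morsel.value, (int(max_age) if max_age else None)


def session_cookie_header(cookie_value: str) -> dict:
    """Cookie header to pass in subsequent management-API requests."""
    return {"Cookie": "session=" + cookie_value}


# Sample header; the max-age value is an assumption for illustration only.
sample = ("session=j%3A%7B%22sid%22%3A%20%22a9ce242a-670f-47a8-9c8b-"
          "c6730f2794dc%22%7D; Max-Age=86400; Path=/")
value, ttl = parse_session_cookie(sample)
print(ttl, session_cookie_header(value))
```

The cookie value is kept in its percent-encoded form so that it can be sent back unchanged.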

    What’s Next?