Working with Data Containers

Overview

Data in the platform is stored in data containers, which serve as the platform’s data store (see Data Containers, Collections, and Objects). The platform has two predefined containers, the default “bigdata” container and the “users” container, and you can also create additional custom containers. Within containers, you can create directories to organize your data. You’re encouraged to create custom data containers and container directories according to your development needs.

You can easily view (list) the data containers in a given tenant (“your tenant”), create and delete containers and container directories, and browse the containers’ contents — all from a variety of alternative interfaces, as outlined in this tutorial.

For more information about data containers and the methods for referencing data within containers, see the Platform Fundamentals tutorial.

Before You Begin

This tutorial demonstrates how to perform basic tasks using different platform interfaces — dashboard, file system, web APIs, and management APIs. Unless you plan to use only the dashboard at this stage, it’s recommended that you first read the Platform Fundamentals guide, and specifically the sections that pertain to the interfaces that you wish to try out. For example:

  • To use the file-system interface, you need to create a command-line service from which to run your code — such as a web-based shell, Jupyter Notebook, or Zeppelin. See Creating a New Service.
    To understand how data paths are set in file-system commands, see File-System Data Paths.
  • To send web-API requests, you need the URL of your tenant’s web-APIs service (webapi) and either a platform username and password or an access key for authentication. To learn more and to understand how to structure web-API requests, see Sending Web-API Requests. (The Postman examples in this tutorial use username-password authentication, but you can replace it with access-key authentication, as explained in the documentation.)
  • To send management-API requests, you need to have the URL of the management-API service and a session cookie for authentication. To learn more and understand how to structure management-API requests, see Sending Management-API Requests.

Listing Containers

You can view information about your tenant’s containers and related metadata from the dashboard, from the file-system interface, or by using the RESTful Container Management API or Simple-Object Web API. The dashboard also displays performance statistics for each container, which can also be retrieved with the management API [Beta].

Using the Dashboard

To view information about the data containers of your tenant, in the dashboard’s side navigation menu, select Data. The Data page displays a containers table that includes the name and performance-statistics summary of each container, as demonstrated in the following image:

Dashboard containers table

When you select a container from the table, the Browse tab, which is selected by default, allows you to browse the contents of the container; see Browsing a Container.

The Overview tab provides more detailed information about the selected container, as demonstrated for the “bigdata” container in the following image:

Dashboard container Overview tab

The Data-Access Policy tab allows you to define data-access policy rules that restrict access to data in the container based on different criteria. For more information, see Data-Access Policy Rules.

Using the File-System Interface

You can list the data containers of your tenant from a command-line shell by running a local file-system ls command on the tenant’s data root using the v3io data mount:

ls /v3io
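As a minimal sketch, assuming you run Python in the same command-line service and the v3io data mount is available at /v3io, you can get the same list programmatically by reading the top-level directories of the mount. `list_containers` and its `mount_root` parameter are hypothetical names used only for this illustration; they aren’t part of a platform API.

```python
from pathlib import Path
from typing import List


def list_containers(mount_root: str = "/v3io") -> List[str]:
    """Return the sorted names of the top-level directories under the v3io mount.

    Each data container of the tenant is assumed to appear as one such
    directory, which is what `ls /v3io` displays.
    """
    root = Path(mount_root)
    return sorted(entry.name for entry in root.iterdir() if entry.is_dir())


# Example usage inside a platform command-line service (such as Jupyter Notebook):
# print(list_containers())
```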

Using the Container Management API

You can list the data containers of your tenant and see related metadata by using the List Containers operation of the platform’s RESTful Containers Management API [Beta].

For example, to send the request from Postman, do the following:

  1. Create a new request and set the request method to GET.

  2. In the request URL field, enter the following; replace <management-APIs URL> with the HTTPS URL of the platform dashboard:

    <management-APIs URL>/api/containers/
        

    For example:

    https://dashboard.default-tenant.app.mycluster.iguazio.com/api/containers/
        

  3. In the Headers tab, add the following headers:

    Key            Value
    Content-Type   application/json
    Cookie         session=<cookie>

    Replace <cookie> with a session cookie returned in the Set-Cookie response header of a previous Create Session request.

  4. Select Send to send the request, and then check the response Body tab. Following is an example response body for a tenant that has only the two predefined containers (“bigdata” and “users”):

    {
        "included": [],
        "meta": {
            "ctx": "09348042905611315558"
        },
        "data": [
            {
                "attributes": {
                    "name": "bigdata",
                    "imports": [],
                    "created_at": "2019-03-17T12:41:10.258000+00:00",
                    "operational_status": "up",
                    "id": 1024,
                    "admin_status": "up",
                    "data_lifecycle_layers_order": [],
                    "data_policy_layers_order": [],
                    "properties": []
                },
                "type": "container",
                "id": 1024
            },
            {
                "attributes": {
                    "name": "users",
                    "imports": [],
                    "created_at": "2019-03-17T12:41:10.959000+00:00",
                    "operational_status": "up",
                    "id": 1025,
                    "admin_status": "up",
                    "data_lifecycle_layers_order": [],
                    "data_policy_layers_order": [],
                    "properties": []
                },
                "type": "container",
                "id": 1025
            }
        ]
    }
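If you prefer to script this request rather than use Postman, the following Python sketch builds the same List Containers request with the standard library and maps container names to IDs in a response body like the one above. It assumes the URL and headers from the preceding steps; the function names are hypothetical, and the session cookie is a placeholder that you must obtain from a Create Session request.

```python
import json
import urllib.request
from typing import Dict


def build_list_containers_request(mgmt_url: str, session_cookie: str) -> urllib.request.Request:
    """Build a List Containers request: GET <management-APIs URL>/api/containers/."""
    return urllib.request.Request(
        mgmt_url.rstrip("/") + "/api/containers/",
        method="GET",
        headers={
            "Content-Type": "application/json",
            # Cookie value returned in the Set-Cookie header of a Create Session response.
            "Cookie": f"session={session_cookie}",
        },
    )


def parse_container_list(body: str) -> Dict[str, int]:
    """Map each container name to its ID in a List Containers response body."""
    response = json.loads(body)
    return {item["attributes"]["name"]: item["id"] for item in response["data"]}


# Example usage (requires network access to your cluster and a valid cookie):
# req = build_list_containers_request(
#     "https://dashboard.default-tenant.app.mycluster.iguazio.com", cookie)
# with urllib.request.urlopen(req) as resp:
#     print(parse_container_list(resp.read().decode("utf-8")))
```

For the example response above, `parse_container_list` would return the names “bigdata” and “users” mapped to IDs 1024 and 1025.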

Using the Simple-Object Web API

You can list the data containers of your tenant by using the GET Service operation of the platform’s Simple-Object Web API, which resembles the Amazon Web Services S3 API.

For example, to send the request from Postman, do the following:

  1. Create a new request and set the request method to GET.

  2. In the request URL field, enter the URL of your tenant’s web-APIs service (webapi).

    For example:

    https://default-tenant.app.mycluster.iguazio.com:8443
        

  3. In the Authorization tab, set the authorization type to "Basic Auth" and enter your username and password in the respective credential fields.
  4. Select Send to send the request, and then check the response Body tab. Following is an example response body (in XML format) for a tenant that has only the two predefined containers (“bigdata” and “users”):

    <?xml version="1.0" encoding="UTF-8"?>
    <ListAllMyBucketsResult>
    <Owner>
        <ID>000000000000000000000000000000</ID>
        <DisplayName>iguazio</DisplayName>
    </Owner>
    <Buckets>
        <Bucket>
            <Name>bigdata</Name>
            <CreationDate>2019-03-17T12:41:10.258000+00:00</CreationDate>
            <Id>1024</Id>
        </Bucket>
        <Bucket>
            <Name>users</Name>
            <CreationDate>2019-03-17T12:41:10.959000+00:00</CreationDate>
            <Id>1025</Id>
        </Bucket>
    </Buckets>
    </ListAllMyBucketsResult>
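The same GET Service request can be sketched in Python with the standard library: Basic Auth is an `Authorization: Basic <base64(username:password)>` header, and the XML body can be parsed with `xml.etree.ElementTree`. This is an illustration under assumptions: the helper names are hypothetical, and the parser assumes the response has no XML namespace, as in the example response above (if your responses declare one, the tag names must be namespace-qualified).

```python
import base64
import urllib.request
import xml.etree.ElementTree as ET
from typing import List, Tuple, Union


def build_get_service_request(webapi_url: str, username: str, password: str) -> urllib.request.Request:
    """Build a GET Service request to the web-APIs service URL, with Basic Auth."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        webapi_url,
        method="GET",
        headers={"Authorization": f"Basic {token}"},
    )


def parse_bucket_list(body: Union[str, bytes]) -> List[Tuple[str, int]]:
    """Return (container name, container ID) pairs from a ListAllMyBucketsResult body."""
    if isinstance(body, str):
        # Parse bytes so that the XML encoding declaration is handled consistently.
        body = body.strip().encode("utf-8")
    root = ET.fromstring(body)
    return [
        (bucket.findtext("Name"), int(bucket.findtext("Id")))
        for bucket in root.iter("Bucket")
    ]


# Example usage (requires network access to your tenant's webapi service):
# req = build_get_service_request(
#     "https://default-tenant.app.mycluster.iguazio.com:8443", "<username>", "<password>")
# with urllib.request.urlopen(req) as resp:
#     print(parse_bucket_list(resp.read()))
```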

Creating and Deleting Containers

You can create a new data container or delete an existing container from the dashboard or by using the RESTful Container Management API.

Note that to use a local file-system command to reference a new data container, you need to update the v3io data mount by restarting the command-line service (such as a web shell or Jupyter Notebook) after creating the container.

Warning
Take extra care when deleting containers, to avoid data loss or other undesirable consequences. It's recommended that you close all open handles to the container before you delete it. For example, deleting a container without first deleting a Nuclio V3IO volume that references the container might result in consumption of extra Kubernetes resources.

Using the Dashboard

Follow these steps to create a new container from the dashboard:

  1. Navigate to the Data page and select the New Container button:

    Dashboard - create a new mycontainer container
  2. In the new-container window, enter a name and description for your new container. The following image demonstrates how to create a new container named “mycontainer”:

    Dashboard - create a new mycontainer container

After you create a container, you can see it in the containers table on the Data page. For example, the following image shows a table with the default “bigdata” container and a “mycontainer” container:

Dashboard - create a new mycontainer container

To delete containers from the dashboard, navigate to the Data page. In the containers table, check the check boxes next to the containers that you want to delete, select the delete icon from the action toolbar, and confirm the delete operation when prompted. The following image demonstrates how to delete a “mycontainer” container:

Dashboard - select container for deletion

Using the Container Management API

You can create and delete a container by using the Create Container and Delete Container operations of the platform’s RESTful Containers Management API [Beta], respectively.

For example, to send the requests from Postman, do the following:

Send a Create Container Request

  1. Create a new request and set the request method to POST.

  2. In the request URL field, enter the following; replace <management-APIs URL> with the HTTPS URL of the platform dashboard:

    <management-APIs URL>/api/containers/
        

    For example:

    https://dashboard.default-tenant.app.mycluster.iguazio.com/api/containers/
        

  3. In the Headers tab, add the following headers:

    Key            Value
    Content-Type   application/json
    Cookie         session=<cookie>

    Replace <cookie> with a session cookie returned in the Set-Cookie response header of a previous Create Session request.

  4. In the Body tab, select the raw format and add the following JSON code; you can change the name and description in the example, subject to the container naming restrictions:

    {
        "data": {
            "attributes": {
                "name":         "mycontainer",
                "description":  "Test container"
            },
            "type": "container"
        }
    }
  5. Select Send to send the request, and then check the response Body tab. For a successful request, the ID of the new container is returned in the response-data id element. Copy this ID, as it’s required by some of the other management operations, such as Delete Container. Following is an example Create Container response body for a new container with ID 1030:

    {
        "included": [],
        "meta": {
            "ctx": "13053711680930917876"
        },
        "data": {
            "relationships": {
                "mappings": {
                    "data": [
                        {
                            "type": "container_map",
                            "id": "2244cd09-5e7e-4957-a38b-c99d97c946a2"
                        }
                    ]
                },
                "created_by": {
                    "data": {
                        "type": "user",
                        "id": "6e040a9a-9403-44bd-8f90-a61e079c6c45"
                    }
                },
                "storage_class": {
                    "data": {
                        "type": "storage_class",
                        "id": "f8bec94d-9151-475d-98f6-962827ca49ad"
                    }
                },
                "owner": {
                    "data": {
                        "type": "user",
                        "id": "6e040a9a-9403-44bd-8f90-a61e079c6c45"
                    }
                },
                "tenant": {
                    "data": {
                        "type": "tenant",
                        "id": "b7c663b1-a8ee-49a9-ad62-ceae7e751ec8"
                    }
                },
                "active_mapping": {
                    "data": {
                        "type": "container_map",
                        "id": "2244cd09-5e7e-4957-a38b-c99d97c946a2"
                    }
                }
            },
            "attributes": {
                "description": "Test container",
                "imports": [],
                "created_at": "2019-03-18T10:05:54.400000+00:00",
                "operational_status": "up",
                "id": 1030,
                "admin_status": "up",
                "data_lifecycle_layers_order": [],
                "data_policy_layers_order": [],
                "properties": [],
                "name": "mycontainer"
            },
            "type": "container",
            "id": 1030
        }
    }
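A scripted equivalent of the Create Container request is sketched below in Python. It sends the same JSON body and headers as the Postman steps and extracts the new container’s ID from the response, because Delete Container needs that ID. The function names are hypothetical, and this is a sketch rather than a reference client.

```python
import json
import urllib.request


def build_create_container_request(
    mgmt_url: str, session_cookie: str, name: str, description: str = ""
) -> urllib.request.Request:
    """Build a Create Container request: POST <management-APIs URL>/api/containers/."""
    payload = {
        "data": {
            "attributes": {"name": name, "description": description},
            "type": "container",
        }
    }
    return urllib.request.Request(
        mgmt_url.rstrip("/") + "/api/containers/",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Cookie": f"session={session_cookie}",
        },
    )


def parse_created_container_id(body: str) -> int:
    """Return the new container's ID (the response-data id element)."""
    return json.loads(body)["data"]["id"]


# Example usage (requires network access to your cluster and a valid cookie):
# req = build_create_container_request(
#     "https://dashboard.default-tenant.app.mycluster.iguazio.com",
#     cookie, "mycontainer", "Test container")
# with urllib.request.urlopen(req) as resp:
#     container_id = parse_created_container_id(resp.read().decode("utf-8"))
```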

If you list your tenant’s containers, you can now see your container in the list, and you can browse and modify its contents in the same way as for the predefined containers. For example, you can see and browse your container from the dashboard’s Data page.

Send a Delete Container Request

  1. Create a new request and set the request method to DELETE.

  2. In the request URL field, enter the following; replace <management-APIs URL> with the HTTPS URL of the platform dashboard and <container ID> with the ID of the new container that you created in the previous step:

    <management-APIs URL>/api/containers/<container ID>
        

    For example, the following URL uses the container ID from the example in the previous step — 1030:

    https://dashboard.default-tenant.app.mycluster.iguazio.com/api/containers/1030
        

  3. In the Headers tab, add a Cookie header (Key) and set its value to session=<cookie>; replace <cookie> with a session cookie returned in the Set-Cookie response header of a previous Create Session request.

  4. Select Send to send the request, and then check the response. For a successful request, the response-data type is container_deletion and the container_id attribute shows the ID of the deleted container. Following is an example Delete Container response body for a container with ID 1030:

    {
        "included": [],
        "meta": {
            "ctx": "11505092891179116359"
        },
        "data": {
            "attributes": {
                "container_id": 1030,
                "job_id": "6e7f9bf5-4a7c-4efb-83b7-31785fa82183"
            },
            "type": "container_deletion",
            "id": 0
        }
    }
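The Delete Container request can be sketched the same way. The helper below builds the DELETE request for a given container ID, and the parser checks that the response-data type is container_deletion before returning the deleted container’s ID and the job ID. The function names are hypothetical illustrations.

```python
import json
import urllib.request
from typing import Tuple


def build_delete_container_request(
    mgmt_url: str, session_cookie: str, container_id: int
) -> urllib.request.Request:
    """Build a Delete Container request: DELETE <management-APIs URL>/api/containers/<container ID>."""
    return urllib.request.Request(
        f"{mgmt_url.rstrip('/')}/api/containers/{container_id}",
        method="DELETE",
        headers={"Cookie": f"session={session_cookie}"},
    )


def parse_deletion_response(body: str) -> Tuple[int, str]:
    """Return (deleted container ID, job ID) from a Delete Container response body."""
    data = json.loads(body)["data"]
    if data["type"] != "container_deletion":
        raise ValueError(f"unexpected response-data type: {data['type']!r}")
    return data["attributes"]["container_id"], data["attributes"]["job_id"]
```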

You can also confirm the container deletion from the dashboard: in the side navigation menu, select Data. The deleted container should no longer appear in the containers table on the Data page.

Creating and Deleting Container Directories

You can create and delete container directories from the dashboard or by using the file-system interface. You can use the same procedure to create a directory (folder) either as a direct child of the parent container or as a nested child of another directory.

Note that the dashboard allows you to delete only empty directories. Therefore, to delete a directory that has content, it’s typically easier to use the file-system interface to run a recursive delete command.

Table and Stream Directories

NoSQL tables and streams that you create in the platform are stored in container directories; for more information, see the NoSQL Databases and Streams concepts documentation. Note:

  • When you create a new table or stream, all directories within the target container in the specified path are created automatically if they don’t already exist.

  • To delete a table or stream, simply delete the respective directory, as outlined in this tutorial.

Warning
Take extra care when performing a recursive delete operation, to avoid losing valuable data.

Using the Dashboard

Follow these steps to create a new container directory from the dashboard:

  1. Navigate to the Data page and select a container from the containers table.

  2. In the Browse tab (selected by default), select the new-folder icon from the action toolbar and then enter the name of the new directory (folder) when prompted.

    Dashboard - create a new container directory

The dashboard can delete only empty directories, so to delete a directory that has content, you must first delete all of its files and subdirectories. To empty a directory, open it from the browse table in the container’s Browse tab, check the check boxes of all the items in the directory, select the delete icon from the action toolbar, and confirm the delete operation when prompted. Repeat this procedure for any other non-empty directory that you want to delete, including non-empty subdirectories. When the directories are empty, return to the parent browse table, check the check boxes next to the directories that you want to delete, select the delete icon from the action toolbar, and confirm the delete operation. The following image demonstrates how to delete an empty container directory:

Dashboard - delete an empty container directory

Using the File-System Interface

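You can create directories by running a mkdir file-system command from a command-line shell, and delete (remove) directories by running an rm -r command (which also deletes the directory’s contents) or an rmdir command (which deletes only empty directories).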

For example:

  • The following commands create a mydata directory in the default “bigdata” container and a directory by the same name in the running-user directory of the “users” container:

    Local file-system —

    mkdir -p /v3io/bigdata/mydata
    mkdir -p /User/mydata
        

    Hadoop FS —

    hadoop fs -mkdir v3io://bigdata/mydata
    hadoop fs -mkdir $V3IO_HOME_URL/mydata
        

  • The following commands remove (delete) the container directories created in the previous examples:

    Local file-system —

    rm -r /v3io/bigdata/mydata
    rm -r /User/mydata
        

    Hadoop FS —

    hadoop fs -rm -r v3io://bigdata/mydata
    hadoop fs -rm -r $V3IO_HOME_URL/mydata
        

Note
You can replace /User in the local file-system examples with /v3io/$V3IO_HOME, /v3io/users/$V3IO_USERNAME, or /v3io/users/<username> (for example, /v3io/users/iguazio for running-user “iguazio”).
You can replace $V3IO_HOME_URL in the Hadoop FS examples with v3io://$V3IO_HOME, v3io://users/$V3IO_USERNAME, or v3io://users/<username> (for example, v3io://users/iguazio for running-user “iguazio”).
See File-System Data Paths.
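From Python code running in a command-line service, the same directory operations map onto standard-library calls on the v3io mount. The sketch below is illustrative only: `user_home`, `create_dir`, and `remove_dir` are hypothetical helper names, and `user_home` assumes the /v3io/users/$V3IO_USERNAME path form from the note above.

```python
import os
import shutil
from pathlib import Path
from typing import Optional


def user_home(username: Optional[str] = None, mount_root: str = "/v3io") -> Path:
    """Return the running user's directory in the "users" container.

    Mirrors the /v3io/users/$V3IO_USERNAME path; when `username` is omitted,
    it's read from the V3IO_USERNAME environment variable.
    """
    name = username or os.environ["V3IO_USERNAME"]
    return Path(mount_root) / "users" / name


def create_dir(path: Path) -> None:
    # Like `mkdir -p`: create missing parent directories and don't fail if
    # the directory already exists.
    path.mkdir(parents=True, exist_ok=True)


def remove_dir(path: Path, recursive: bool = False) -> None:
    if recursive:
        # Like `rm -r`: delete the directory and everything in it.
        shutil.rmtree(path)
    else:
        # Like `rmdir`: raises OSError if the directory isn't empty.
        path.rmdir()


# Example usage, mirroring the shell commands above:
# create_dir(Path("/v3io/bigdata/mydata"))
# create_dir(user_home() / "mydata")
# remove_dir(Path("/v3io/bigdata/mydata"), recursive=True)
```

As with rm -r, a recursive delete through `shutil.rmtree` can’t be undone, so double-check the path first.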

Browsing a Container

You can browse the contents of a container and its directories from the dashboard, by using the file-system interface, or by using the Simple-Object Web API.

Using the Dashboard

To browse the contents of a container from the dashboard, navigate to the Data page and select a container from the containers table. The Browse tab is selected by default and displays the contents of the container. See, for example, the containers-table image in the new-container creation instructions.

To view a directory’s metadata, select the directory (folder) in the container’s browse table by single-clicking the directory name or checking the adjacent check box. You can then also perform directory actions (such as delete) by selecting the relevant icon from the action toolbar. The following image demonstrates a selected mydata directory in the default “bigdata” container:

Dashboard - select a directory in the containers table

To view a directory’s contents, double-click the directory in the browse table or select the directory from the container’s navigation tree, as demonstrated in the following image:

Dashboard - select a directory from the container navigation tree

Using the File-System Interface

You can run a file-system list-directory command (ls) to browse the contents of a container or specific container directory from a command-line shell.

For example:

  • List the contents of the root directory of the default “bigdata” container:

    Local file-system —

    ls -lF /v3io/bigdata/
        

    Hadoop FS —

    hadoop fs -ls v3io://bigdata/
        

  • List the contents of a users/iguazio/mydata container directory, where “iguazio” is the running user of the command-line shell service ($V3IO_USERNAME). All of the following syntax variations execute the same list command:

    Local file-system —

    ls -lFA /User/mydata
    ls -lFA /v3io/$V3IO_HOME/mydata
    ls -lFA /v3io/users/$V3IO_USERNAME/mydata
    ls -lFA /v3io/users/iguazio/mydata/
        

    Hadoop FS —

    hadoop fs -ls $V3IO_HOME_URL/mydata
    hadoop fs -ls v3io://$V3IO_HOME/mydata/
    hadoop fs -ls v3io://users/$V3IO_USERNAME/mydata/
    hadoop fs -ls v3io://users/iguazio/mydata/
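A rough Python counterpart of these listings, assuming the v3io mount paths shown above, is sketched below. The hypothetical `list_dir` helper mimics only part of `ls -F` (it appends "/" to subdirectory names and doesn’t mark executables or symbolic links), and its `show_hidden` flag plays the role of `ls -A`.

```python
from pathlib import Path
from typing import List


def list_dir(path: str, show_hidden: bool = False) -> List[str]:
    """Return the entry names in `path`, sorted by name, with "/" appended to directories.

    Hidden entries (names starting with ".") are skipped unless show_hidden is True.
    """
    entries = []
    for entry in sorted(Path(path).iterdir(), key=lambda p: p.name):
        if not show_hidden and entry.name.startswith("."):
            continue
        entries.append(entry.name + "/" if entry.is_dir() else entry.name)
    return entries


# Example usage:
# print(list_dir("/v3io/bigdata"))
# print(list_dir("/User/mydata", show_hidden=True))
```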
        

Using the Simple-Object Web API

You can list the objects in a container by using the GET Container operation of the platform’s Simple-Object Web API, which resembles the Amazon Web Services S3 API.

For example, to send the request from Postman, do the following:

  1. Create a new request and set the request method to GET.

  2. In the request URL field, enter the following; replace <web-APIs URL> with the URL of the parent tenant’s web-APIs service (webapi), and replace <container name> with the name of the container that you want to browse:

    <web-APIs URL>/<container name>
        

    For example, the following URL sends a request to web-API service URL https://default-tenant.app.mycluster.iguazio.com:8443 to browse the “bigdata” container:

    https://default-tenant.app.mycluster.iguazio.com:8443/bigdata
        

  3. In the Authorization tab, set the authorization type to "Basic Auth" and enter your username and password in the respective credential fields.
  4. Select Send to send the request, and then check the response Body tab. Following is an example response body (in XML format):

    <?xml version="1.0" encoding="UTF-8"?>
    <ListBucketResult>
    <Name>bigdata</Name>
    <Prefix/>
    <Marker/>
    <Delimiter>/</Delimiter>
    <NextMarker>tmp</NextMarker>
    <MaxKeys>1000</MaxKeys>
    <IsTruncated>false</IsTruncated>
    <CommonPrefixes>
        <Prefix>mydata/</Prefix>
    </CommonPrefixes>
    <CommonPrefixes>
        <Prefix>naipi_files/</Prefix>
    </CommonPrefixes>
    <CommonPrefixes>
        <Prefix>tmp/</Prefix>
    </CommonPrefixes>
    </ListBucketResult>
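To process a GET Container response in code, you can parse the XML with Python’s standard library, as in the sketch below. The CommonPrefixes elements hold the directory names shown in the example. The sketch also assumes, by analogy with the S3 API, that individual objects appear as Contents elements with a Key child; that element layout isn’t shown in the example above, so verify it against the GET Container reference. The helper names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple, Union


def container_url(webapi_url: str, container: str) -> str:
    """Build the GET Container URL: <web-APIs URL>/<container name>."""
    return f"{webapi_url.rstrip('/')}/{container}"


def parse_bucket_listing(body: Union[str, bytes]) -> Tuple[List[str], List[str], bool]:
    """Return (directory prefixes, object keys, is_truncated) from a ListBucketResult body."""
    if isinstance(body, str):
        # Parse bytes so that the XML encoding declaration is handled consistently.
        body = body.strip().encode("utf-8")
    root = ET.fromstring(body)
    prefixes = [p.findtext("Prefix") for p in root.findall("CommonPrefixes")]
    # Assumption: object entries use S3-style <Contents><Key>...</Key></Contents>.
    keys = [c.findtext("Key") for c in root.findall("Contents")]
    truncated = root.findtext("IsTruncated", default="false").strip().lower() == "true"
    return prefixes, keys, truncated
```

When the returned truncation flag is true, the listing is incomplete; check the GET Container reference for how to request the next batch (presumably starting from the NextMarker value).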

What’s Next?

See the Ingesting and Preparing Data tutorial to learn about the different ways of ingesting data into the platform and preparing the data for the next step in your pipeline.