Security


Overview

The platform implements multiple mechanisms to secure access to resources and keep your data safe. All user and security management is done under a single pane of glass: the configuration is done in one place, in the user-friendly graphical platform dashboard, and applied across all platform interfaces. The result is a significantly simplified, yet robust, data-security solution that helps organizations meet their compliance objectives.

The platform allows you to define local users and import users from an external identity provider (IdP), authenticate user identities, and control users’ access to resources, including the ability to define fine-grained data-access policies. To ensure proper security, the platform uses time-limited sessions and supports the HTTP Secure (HTTPS) protocol. You can view logs for security events and user actions (such as a failed login or deletion of a data container) on the Events | Audit dashboard tab.

HTTP Secure Data Transmission

For enhanced security, the platform’s RESTful web and management APIs support the HTTP Secure (HTTPS) protocol (also known as HTTP over TLS), as defined in the RFC 2818 specification. See HTTPS Requests in the Securing Your Web-API Requests reference.
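As a client-side illustration only, the following minimal sketch uses the Python standard library to prepare an HTTPS request with certificate and host-name verification enabled. The host name `platform.example.com`, the request path, and the helper names are hypothetical placeholders, not part of the platform's API; substitute your own web-API endpoint.

```python
import ssl
import urllib.request

def make_https_context(ca_file=None):
    """Return a TLS context that verifies the server certificate and host name."""
    # create_default_context() sets verify_mode=CERT_REQUIRED and check_hostname=True.
    return ssl.create_default_context(cafile=ca_file)

def build_request(host, path, headers=None):
    """Build a GET request over HTTPS; 'host' and 'path' are hypothetical."""
    url = f"https://{host}{path}"
    return urllib.request.Request(url, headers=headers or {}, method="GET")

# Usage (not executed here because it needs a reachable server):
# ctx = make_https_context()
# req = build_request("platform.example.com", "/users/mydir/file.txt")
# with urllib.request.urlopen(req, context=ctx) as resp:
#     print(resp.status)
```

Relying on `ssl.create_default_context()` rather than building a context by hand keeps certificate and host-name checks on by default.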

Authentication

Authentication is the process of validating a user’s identity before granting the user access to a specific resource. The platform first verifies (authenticates) the identity of the user and then ensures that the user has the required permissions to perform the requested operation (see Authorization). To support authentication, the platform uses time-limited sessions. The default session time-to-live (TTL) period is 24 hours, but you can configure a different duration. When this period elapses, the session expires and a new session must be created.

Authentication of data-access requests is done using data sessions, which are handled transparently by the platform.

Authentication of management requests is done using management sessions, which are created transparently when performing operations from the platform dashboard, or handled by the user when using the platform’s RESTful management APIs [Beta]. These APIs use a session-based HTTP scheme to support user authentication and authorization: access to management API resources requires a time-limited session cookie that is used to authenticate the sender of the request (based on a username and password) and to determine the sender’s authorization to perform the requested operation. See the Overview of the Sessions Management API [Beta].

In addition, the platform’s web APIs support user authentication by using either the username/password Basic HTTP authentication scheme or custom access-key (session-key) authentication. With either method, the user provides an authentication header with user credentials, which the platform verifies as a condition for processing the request. See HTTP User Authentication in the Securing Your Web-API Requests reference.
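The two header styles can be sketched as follows. The Basic scheme encoding is standard (RFC 7617: base64 of `username:password`). The access-key header name used here, `X-v3io-session-key`, is an assumption for illustration; confirm the exact header in the Securing Your Web-API Requests reference for your platform version.

```python
import base64

def basic_auth_header(username: str, password: str) -> dict:
    """Basic HTTP authentication (RFC 7617): base64 of 'username:password'."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}

# Assumption: header name for custom access-key (session-key) authentication.
SESSION_KEY_HEADER = "X-v3io-session-key"

def access_key_header(access_key: str) -> dict:
    """Access-key authentication: send the key in a dedicated request header."""
    return {SESSION_KEY_HEADER: access_key}
```

Either dictionary can be passed as the `headers` of an HTTPS request; the credentials travel encrypted only when the request uses HTTPS.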

The authentication of the user credentials can be done locally, using the platform’s built-in user management, or using an external Identity Provider (IdP) — currently Microsoft Active Directory (AD). When an IdP is configured, it is used to authenticate the identity of all its imported users in the platform. This doesn’t prevent you from also defining local users and using the platform to authenticate them. For more information about using an IdP, see Using an External Identity Provider (IdP).

Note
If the management policies of an authenticated user change, the user’s authentication token is revoked and the session expires.
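To make the session lifecycle concrete, here is a minimal sketch of a time-limited session with the 24-hour default TTL and revocation on a management-policy change. The `Session` and `SessionStore` classes and their methods are hypothetical models of the behavior described in this section, not the platform's implementation or API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

DEFAULT_SESSION_TTL = timedelta(hours=24)  # default TTL stated on this page

@dataclass
class Session:
    """Hypothetical model of a time-limited session."""
    username: str
    created_at: datetime
    ttl: timedelta = DEFAULT_SESSION_TTL
    revoked: bool = False

    def is_valid(self, now: datetime) -> bool:
        # Usable only while not revoked and before the TTL elapses.
        return not self.revoked and now < self.created_at + self.ttl

class SessionStore:
    """Tracks sessions per user so that they can be revoked together."""

    def __init__(self):
        self._sessions = {}  # username -> list of Session objects

    def create(self, username: str, now: datetime,
               ttl: timedelta = DEFAULT_SESSION_TTL) -> Session:
        session = Session(username, now, ttl)
        self._sessions.setdefault(username, []).append(session)
        return session

    def on_management_policies_changed(self, username: str) -> None:
        # A change to the user's management policies revokes existing sessions.
        for session in self._sessions.get(username, []):
            session.revoked = True
```

After revocation or expiry, the client must create a new session.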

Authorization

Authorization is the process of granting a user permission to perform a specific action or access a specific resource based on predefined authorization rules. To support authorization, the platform uses policies, which are sets of permissions that govern the ability to access resources. There are two types of policies:

  • Management policies, which are assigned to users and user groups to determine management-related permissions. For example, the permission to create a storage pool or restart a cluster is reserved for users who have the IT Admin management policy, and the permission to access the data is reserved for users who have the Data management policy.
  • Data-access policies, which are used to define fine-grained rules for determining data-access permissions. These policies are used as part of a multi-layered data-access authorization scheme, which also involves the Data management policy and POSIX ACLs.
User-Group Permissions Inheritance
A user inherits the management policies and POSIX permissions of all user groups to which the user belongs. However, user-group permissions in data-access policy rules are checked only against a user’s primary group.
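The inheritance rule above can be sketched in a few lines. The `Group` and `User` classes and both functions are hypothetical illustrations: management policies are the union across all of the user's groups, while group criteria in data-access policy rules are compared only with the primary group.

```python
from dataclasses import dataclass

@dataclass
class Group:
    name: str
    management_policies: set

@dataclass
class User:
    name: str
    primary_group: str
    groups: list               # all groups, including the primary group
    management_policies: set   # policies assigned directly to the user

def effective_management_policies(user, groups_by_name):
    """Union of the user's own policies and those of every group the user belongs to."""
    policies = set(user.management_policies)
    for group_name in user.groups:
        policies |= groups_by_name[group_name].management_policies
    return policies

def rule_group_matches(user, rule_groups):
    """Data-access policy rules check group criteria against the primary group only."""
    return user.primary_group in rule_groups
```

For example, a user whose primary group is "finance" and who also belongs to "analysts" inherits the management policies of both groups, but a data-access rule restricted to "analysts" doesn't match that user.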

Management Policies

Every user and user group (whether locally created or imported) must be assigned one or more of the predefined management policies. These policies define resource-access permissions with management aspects that are applicable globally throughout the platform. The management policies are assigned by a security administrator, that is, any user with the Security Admin management policy, including the predefined security_admin user. For more information about user management in the platform, see Platform Users.

Predefined Management Policies

These are the predefined management policies that a security administrator can assign to users and user groups:

  • Application Admin — responsible for all container operations, such as creating data containers, and for defining data-access policies; can view and use application services for the current user; can view the pipelines dashboard.
    All locally created and imported users in the platform (but not the predefined users) are automatically assigned this policy.

  • Application Read Only — can view all reports without editing; can view and use application services for the current user; can view the pipelines dashboard.

  • Data — can access data and run application services. The specific access level is derived from the data-access policies and POSIX ACLs.
    This policy enables the implicit creation of data sessions, which are used for securing access to data.
    All locally created and imported users in the platform (but not the predefined users) are automatically assigned this policy.

  • Function Admin — responsible for managing and developing Nuclio serverless functions.

  • IT Admin — responsible for all IT operations, such as defining storage pools or stopping and starting a cluster. This policy includes permissions for viewing event logs and for managing cluster support logs; for more information, see Logging, Monitoring, and Debugging.
    The predefined tenancy_admin user is assigned this policy together with the Tenant Admin policy.

  • Security Admin — responsible for managing users and user groups. This includes creating and deleting users and user groups, assigning management policies, and integrating the platform with a supported identity provider (see Using an External Identity Provider (IdP)). (Note that all users can view information for their own user profile and edit some of the properties, including the password. For more information, see Platform Users.) This policy also includes permissions for viewing audit event logs; for more information, see Logging, Monitoring, and Debugging.
    The predefined security_admin user is assigned this policy.

  • Service Admin — responsible for managing application services, including creating, configuring, restarting, and deleting user-defined services, configuring and restarting relevant default services, and managing service logs. This policy also includes permissions for viewing the pipelines dashboard and for viewing application-service logs from the log-forwarder service; for more information, see Logging, Monitoring, and Debugging.

  • Tenant Admin — responsible for managing tenants, including creating and deleting tenants.
    The predefined tenancy_admin user is assigned this policy together with the IT Admin policy.

Services Management Policies
  • To view the Services dashboard page, a user must have the Service Admin, Application Admin, or Application Read Only management policy.
    The application policies enable viewing services that are owned by or shared with the logged-in user — i.e., services for which the user is the running user, shared services, and tenant-wide services without a running user.
    The Service Admin policy enables viewing all services of the parent tenant.
  • To run services and be assigned as the running user of a service, a user must have the Data management policy.
  • To manage (administer) services, a user must have the Service Admin management policy. A service administrator can create, delete, disable, enable, or restart services, change service configurations, and view service logs for all users.
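The services rules above can be summarized as a few permission checks. This is a minimal sketch; the function names and the service record keys (`running_user`, `shared`) are hypothetical, and tenant-wide services without a running user are modeled as `running_user=None`.

```python
APP_POLICIES = {"Application Admin", "Application Read Only"}

def can_view_services_page(policies: set) -> bool:
    return bool(policies & (APP_POLICIES | {"Service Admin"}))

def visible_services(policies: set, username: str, services: list) -> list:
    """Return the services that a user with the given policies can view."""
    if "Service Admin" in policies:
        return list(services)  # all services of the parent tenant
    if not policies & APP_POLICIES:
        return []
    # Owned by the user, shared, or tenant-wide without a running user.
    return [s for s in services
            if s["running_user"] == username
            or s["shared"]
            or s["running_user"] is None]

def can_be_running_user(policies: set) -> bool:
    return "Data" in policies

def can_manage_services(policies: set) -> bool:
    return "Service Admin" in policies
```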

Data-Access Authorization

The platform allows you to define fine-grained policies for restricting access to the data. For example, you can restrict the right to read a payments table that contains sensitive data to members of your organization’s finance group, or limit the write privileges for updating the online transactions stream to members of the operational team.

Multi-Layered Authorization

The platform uses a multi-layered data-authorization scheme: each data-service operation — read, write, update, delete, etc. — is processed and examined in three layers, to ensure that the environment is protected and secured. Each layer can add to the restrictions of the previous layer:

The "Data" Management Policy
As a preliminary step to accessing data in the platform, a user must have the Data management policy. This policy enables the implicit creation of data sessions, which are used for securing access to data. (The tenancy_admin and security_admin predefined users don’t have this policy and therefore cannot access data or view the dashboard’s Data page.)
Data-Access Policies
Data-access policies allow defining a set of advanced rules that are used by the platform to determine whether to grant or restrict access to a specific data resource and to what extent. You can use data-access policies, for example, to create a subnetwork (subnet) whitelist, define interface data-access eligibility, restrict access to a table only to specific user groups, or give only some users read-only permissions for a specific file. See additional information in the Data-Access Policy Rules section.
POSIX ACLs
You can use portable operating-system interface access control lists (POSIX ACLs) to define file-system permissions that further restrict user or user-group access to specific files and directories.
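The three layers can be read as a chain in which each layer can only add restrictions. The following minimal sketch models that ordering; the `authorize` function and its inputs are hypothetical, and the data-access policy and POSIX ACL checks are passed in as precomputed results rather than evaluated here.

```python
from typing import Optional

def authorize(user_policies: set,
              policy_decision: Optional[bool],
              posix_allows: bool) -> bool:
    """
    policy_decision: outcome of the data-access policy rules
        (True = allow, False = deny, None = no rule matched; the default
        data-access policy is permissive).
    posix_allows: outcome of the POSIX ACL check for the file or directory.
    """
    # Layer 1: the Data management policy is a prerequisite for any data access.
    if "Data" not in user_policies:
        return False
    # Layer 2: data-access policy rules can restrict access further.
    if policy_decision is False:
        return False
    # Layer 3: POSIX ACLs can restrict access further still.
    return posix_allows
```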

The following diagram illustrates the platform’s multi-layered data-authorization scheme:

Multi-layered authorization diagram

In most solutions, inspecting every data operation against a large number of policy rules comes at a cost: the inspection takes time and can degrade performance and reduce throughput. In the Iguazio Data Science Platform, however, the data-access policy rules are compiled and stored in an optimized binary format on every policy change (rule addition, removal, or update). This allows the platform to process the rules quickly and efficiently, so that each data request is processed at line rate while the environment remains highly secured.

Platform security-rules processing diagram

Note
For symbolic links, the platform requires that the data-access permissions of both the source and the destination locations be met.
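Put differently, access through a symbolic link succeeds only if both paths pass the authorization checks. In this sketch, `can_access` is a hypothetical callable that stands in for the full multi-layered check of a single path.

```python
from typing import Callable

def authorize_symlink(can_access: Callable[[str], bool],
                      link_path: str, target_path: str) -> bool:
    """Both the link (source) and its target (destination) must be accessible."""
    return can_access(link_path) and can_access(target_path)
```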

Data-Access Policy Rules

Users with the Application Admin management policy (such as the predefined security_admin user) can define a set of fine-grained data-access policy rules. These rules are checked for each data operation and are used to determine whether to allow or deny the data request. Data-access policy rules are defined in the context of a specific data container and apply to all data objects in the container, regardless of their type.

Defining Rules

Data-access policy rules are managed from the Data | <container> | Data-Access Policy dashboard tab, which displays the predefined data-access policy rules and layers, along with options for adding and editing rules, groups, and layers, as demonstrated for the “users” data container in the following image:

Dashboard Data-Access Policy tab for the users container

A rule must belong to a data-access layer. You can either add rules to one of the predefined layers for the parent data container or create your own layer: select the New Layer option from the New Rule drop-down menu in the top action toolbar; in the new-layer dialog window, enter the layer name and select Create. The following image demonstrates creation of a new layer named “Default layer”:

Create new 'Default layer' data-access policy layer

You can add rules directly to a layer or group multiple rules into one or more rule groups within a layer. To add a new group, select the New Group option from the New Rule drop-down menu in the action toolbar; in the new-group dialog window, enter the group name, select a parent layer, and select Create.

The purpose of the layers and groups is to help you manage your rules and easily reorder rules to change the processing logic, as explained in the Rules Processing section. You can rename a layer or group by selecting and editing the name in the rules table, and you can delete it by selecting the delete icon for the relevant table entry.

To add a new data-access policy rule, select the New Rule option from the action toolbar; in the new-rule dialog window, enter the rule name, select a parent layer and optionally a parent group, and select Create. The following image demonstrates creation of a new rule named “Rule 1” in a layer named “Default layer”:

Create new data-access policy rule

Note
Remember to select Apply Changes from the pending-changes toolbar to save your changes.

After you create a rule, select it from the rules table to display the rule pane and define the permissions for accessing the data based on one or more of the following characteristics (match criteria).

Note
All match-criteria rule sections, whether defined in the same tab or in different tabs, are cumulative (“AND”), but the values in each section are alternative (“OR”), except where otherwise specified. For more information, see the Rules Processing explanation and the Examples.
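The AND/OR semantics of the match criteria can be sketched as follows. The class names, the interface labels, and the treatment of an empty section as "no restriction" are assumptions of this sketch. Users and groups are modeled as one section because they share a single Users/Groups input, and resource paths are treated as prefixes within the container.

```python
from dataclasses import dataclass, field

@dataclass
class Request:
    interface: str       # hypothetical labels: "web-apis", "v3io-daemon", "file-system"
    user: str
    primary_group: str
    path: str

@dataclass
class MatchCriteria:
    # Assumption: an empty section places no restriction on the request.
    interfaces: set = field(default_factory=set)
    users: set = field(default_factory=set)
    groups: set = field(default_factory=set)
    paths: set = field(default_factory=set)  # path prefixes within the container

def _under(path: str, prefix: str) -> bool:
    return path == prefix or path.startswith(prefix.rstrip("/") + "/")

def _users_match(c: MatchCriteria, req: Request) -> bool:
    if not c.users and not c.groups:
        return True
    # Alternatives ("OR"); groups are compared with the primary group only.
    return req.user in c.users or req.primary_group in c.groups

def matches(c: MatchCriteria, req: Request) -> bool:
    # Sections are cumulative ("AND"); values within a section are "OR".
    return ((not c.interfaces or req.interface in c.interfaces)
            and _users_match(c, req)
            and (not c.paths or any(_under(req.path, p) for p in c.paths)))
```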
Sources

A rule can be restricted to specific sources.

Dashboard data-access policy rule - Sources tab

Currently, the platform supports an Interfaces source type, which identifies the interface used to access the data:

  • Web APIs — the platform’s web APIs, which are available via the web-APIs service (webapi).
  • V3io Daemon — the platform’s core daemon service (v3io-daemon), which connects application services (such as Spark and Presto) to the platform’s data layer.
  • File system — Linux file-system operations.
Users

A rule can be restricted to a specific list of predefined users or user groups. Note that user-group match criteria in data-access policy rules are applicable only to the primary group of the user who attempts to access the data.

Dashboard data-access policy rule - Users tab

Resources

A rule can be restricted to specific data resources.

Dashboard data-access policy rule - Resources tab

A resource can be defined as a path within the container, such as the path to a table or stream or to a subdirectory or file.

A resource can also be defined as a logical category of data — such as audio, video, logs, or documents. For a list of all resource data categories and the file extensions that they represent, see Data Categories.

After defining the match criteria for the rule, you define the data-access permissions to be applied when there’s a full match. You can select whether to allow or deny access to the data and to what extent. For example, you can grant only read permissions, deny only the create and delete permissions, or allow or deny full access. The following image demonstrates full data-access permissions:

Dashboard data-access policy rule - Permissions tab, allow-all example

Note
You can disable or enable, duplicate, or delete a rule at any time from the rule action menu.

Rules Processing

The rules are processed for each data operation according to the order in which they appear in the dashboard. You can change the processing order, at any time, by changing the order of the data-access policy rules in the dashboard: you can change the order of the rules and rule groups within each container data-access layer; change the order of rules within each group; and change the order of the layers.

When a full match between the operation and a policy rule is found, the processing stops and the data accessibility is set according to the permissions of the first-matched rule. A match is identified by checking all components of the rule. All match-criteria rule sections are cumulative (“AND”), but the values in each section are alternative (“OR”), except where otherwise specified. See the examples for a better understanding.

Note
The platform’s default data-access policy is fully permissive: users with data-access permissions (that is, users with the Data management policy) aren’t restricted in their access, subject to the optional definition of POSIX ACLs. It’s therefore recommended that you safeguard your data by always defining a deny-all rule as the last data-access policy rule, as demonstrated in the examples.
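The processing order can be sketched as a walk over layers, then rules and groups, then the rules within each group, stopping at the first full match. This is a simplified sketch: the classes are hypothetical, permissions are reduced to a single allow/deny flag, and a request that matches no rule is allowed, mirroring the permissive default described in the note.

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Rule:
    name: str
    matches: Callable[[dict], bool]  # hypothetical predicate over a request
    allow: bool
    enabled: bool = True

@dataclass
class Group:
    name: str
    rules: List[Rule] = field(default_factory=list)

@dataclass
class Layer:
    name: str
    entries: list = field(default_factory=list)  # Rule or Group objects, in dashboard order

def ordered_rules(layers):
    """Flatten layers, groups, and rules while preserving the dashboard order."""
    for layer in layers:
        for entry in layer.entries:
            if isinstance(entry, Group):
                yield from entry.rules
            else:
                yield entry

def evaluate(layers, request) -> bool:
    for rule in ordered_rules(layers):
        if rule.enabled and rule.matches(request):
            return rule.allow  # first full match wins; processing stops
    return True  # no match: the default data-access policy is permissive
```

Because evaluation stops at the first match, any rule placed after a deny-all rule never takes effect.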

Predefined Rules and Layers

The platform predefines the following data-access policy layers and rules for each data container, except where otherwise specified:

  • System layer — A system-administration layer that has the following predefined rule:

    • Backup — This rule grants the predefined “sys” backup user full data access, to support data backups. It’s recommended that you keep this rule as the first rule in your processing order.

      Predefined system-layer backup data-access policy rule
  • Monitoring layer — A monitoring-service layer that has the following predefined rules:

    • Monitoring — This rule is defined only for the predefined “users” container and grants the predefined “monitoring” user full data access to the monitoring directory, which is automatically created in the root directory of this container for use by the monitoring service.

      Predefined monitoring-layer monitoring data-access policy rule

    • No access — This rule denies the predefined “monitoring” user all data access. Note that on the “users” container, this rule must not precede the “Monitoring” rule, as the first rule takes precedence (see the rules processing order).

      Predefined monitoring-layer no-access data-access policy rule

Examples

The predefined data-access policy rules provide examples of granting and restricting data access for a specific user and/or resource (data directory). Following is a step-by-step example of adding your own custom data-access policy rules from the dashboard Data-Access Policy tab.

  1. Create a new “Default layer” layer: from the top action toolbar, select the drop-down arrow on the New Rule button and select the New Layer option from the menu. In the Create new layer dialog window, enter your selected layer name — “Default layer” for this example:

    Create new 'Default layer' data-access policy layer

    Keep the new layer after the predefined layers in the rules table (default).

  2. Define a custom “IT Logs” rule that grants members of the “it-admins” user group full permissions to access any log or document file in either the system/logs or it directories in the parent container:

    Note
    To define and test this rule, you need to create an “it-admins” group from the Identity | Groups dashboard tab, assign users to this group, and create the directories that are specified in the match criteria. Alternatively, you can change the match criteria to accommodate your environment and needs.

    1. From the top action toolbar, select the New Rule option. In the Create new rule dialog window, enter your selected rule name — “IT Logs” for this example.

      Create new 'IT Logs' rule

    2. Select the Users/Groups cell of the “IT Logs” rule in the rules table to display the Users tab in the rule pane on the right. In the Users/Groups input box, start typing “it-admins” and select this group from the list.

      'IT Logs' users

    3. Select the Resources tab in the “IT Logs” rule pane. In the Paths section, select the plus sign, enter /system/logs in the input box, and select Apply. Repeat this step, but this time enter the path /it.

      'IT Logs' resources

    4. In the Permissions tab, keep the default allow-all permissions.

  3. Define a custom “Deny All” rule that denies all data access, as recommended in the rule-processing section:

    1. Create a new rule in the “Default layer” layer and name it “Deny All”.

    2. Select the Permissions cell of the “Deny All” rule in the rules table. In the Permissions rule tab, select the Deny option from the permissions drop-down box and keep all permission check boxes checked to deny all data-access permissions.

      'Deny All' permissions

    Note
    • The deny-all rule must be the last rule in the data-access policy rules table; any rules that appear after it will be ignored. You can move the rule to another layer, if you wish.
    • You might want to disable this rule during the initial stages of your development and testing, as it blocks all data access that isn’t explicitly permitted in other (preceding) data-access policy rules.
  4. Select Apply Changes from the pending-changes toolbar to save your changes:

    Apply changes

You can now see your new layer and rules in the data-access policy rules table:

'Default layer' and rules in the data-access policy rules table
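To check your understanding of the resulting rules table, the following sketch encodes it in the order shown (Backup, Monitoring, No access, IT Logs, Deny All) and reports which rule decides a given request. The predicates are simplified, hypothetical stand-ins for the dashboard match criteria. The `/monitoring` path for the monitoring directory is an assumption based on its location in the root of the “users” container.

```python
def under(path, prefix):
    return path == prefix or path.startswith(prefix.rstrip("/") + "/")

# (rule name, predicate, allow), in processing order.
RULES = [
    # System layer
    ("Backup", lambda r: r["user"] == "sys", True),
    # Monitoring layer ("Monitoring" exists only in the "users" container)
    ("Monitoring", lambda r: (r["container"] == "users"
                              and r["user"] == "monitoring"
                              and under(r["path"], "/monitoring")), True),
    ("No access", lambda r: r["user"] == "monitoring", False),
    # Default layer, created in the example
    ("IT Logs", lambda r: (r["primary_group"] == "it-admins"
                           and (under(r["path"], "/system/logs")
                                or under(r["path"], "/it"))), True),
    ("Deny All", lambda r: True, False),
]

def decide(request):
    """Return (rule name, allowed) for the first matching rule."""
    for name, predicate, allow in RULES:
        if predicate(request):
            return name, allow
    return None, True  # not reached while "Deny All" is the last rule
```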

Data Categories

The following table lists the supported data categories, which can be used to define a resource for a data-access policy rule, and the file extensions that each category represents:

Resource Category File Extensions
Archives 7Z, ACE, AR, ARC, ARJ, B1, BAGIT, BZIP2, CABINET, CFS, COMPRESS, CPIO, CPT, DGCA, DMG, EGG, GZIP, ISO, KGB, LBR, LHA, LZIP, LZMA, LZOP, LZX, MPQ, PEA, RAR, RZIP, SHAR, SIT, SQ, SQX, TAR, TAR.GZ, UDA, WAD, XAR, XZ, Z, ZIP, ZIPX, ZOO, ZPAQ
Audio AIFF, AIFCDA, M4A, M4B, MID, MIDI, MP3, MPA, OGG, WAV, WMA, WPL
Data AVRO, CSV, DAT, DATA, JSON, MDB, ORC, PARQUET, RC, SAV, TSV, XML
Documents DOC, DOCX, KEY, ODT, ODP, PDF, PPS, PPT, PPTX, RTF, TEX, TXT, WKS, WPS, WPD, XLS, XLSX
Logs LOG
Pictures ANI, ANIM, APNG, ART, BMP, BPG, BSAVE, CAL, CIN, CPC, CPT, CUR, DDS, DPX, ECW, EXR, FITS, FLIC, FLIF, FPX, GIF, HDRI, HEVC, ICER, ICNS, ICO, ICS, ILBM, J2K, JBIG, JBIG2, JLS, JNG, JP2, JPEG, JPF, JPG, JPM, JPX, JXR, KRA, LOGLUV, MJ2, MNG, MIFF, NRRD, ORA, PAM, PBM, PCX, PGF, PGM, PICTOR, PPM, PNM, PNG, PSB, PSD, PSP, QTVR, RAS, RBE, SGI, TGA, TIF, TIFF, UFO, UFP, WBMP, WEBP, XBM, XCF, XPM, XR, XWD
Programs/Binaries BIN, CER, CFM, CGI, CLASS, COM, CPP, CSS, DLL, EXE, H, HTM, HTML, JAVA, JS, JSP, PART, PHP, PL, PY, RSS, SH, SWIFT, VB, XHTML
Software Packaging APK, DEB, EAR, JAR, JAVA, MSI, RAR, RPM, VCD, WAR
System Files BAK, CAB, CFG, CPL, CUR, DMP, DRV, ICN, INI, LNK, SYS, TMP
Video 3G2, 3GP, AVI, FLV, H264, M4V, MKV, MOV, MP4, MPG, RM, SWF, VOB, WMV
Virtual-Machine (VM) Images NVRAM, VMDK, VMSD, VMSN, VMSS, VMTM, VMX, VMXF
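The following sketch shows how a file name could be mapped to resource categories by extension. Only an excerpt of the table is included. The matching logic (case-insensitive comparison, trying compound extensions such as TAR.GZ before the last suffix) is an assumption for illustration and is not documented platform behavior. The function returns a set because some extensions, such as JAVA and RAR, appear in more than one category.

```python
import os

# Excerpt of the data-categories table; not the complete list.
DATA_CATEGORIES = {
    "Archives": {"7Z", "GZIP", "TAR", "TAR.GZ", "ZIP"},
    "Data": {"AVRO", "CSV", "JSON", "ORC", "PARQUET", "TSV"},
    "Documents": {"DOC", "DOCX", "PDF", "TXT"},
    "Logs": {"LOG"},
}

def file_extensions(filename):
    """Yield candidate extensions, longest compound first (e.g. TAR.GZ, then GZ)."""
    parts = os.path.basename(filename).upper().split(".")[1:]
    for i in range(len(parts)):
        yield ".".join(parts[i:])

def categories_for(filename):
    """Return every category whose extension list contains one of the file's extensions."""
    exts = list(file_extensions(filename))
    return {cat for cat, known in DATA_CATEGORIES.items()
            if any(e in known for e in exts)}
```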
