Backing Up the Platform

Overview

Caution
To ensure the safety of your MLOps Platform data, you must back up your data and configuration periodically.

The backup tool is named Bakpak (invoked as bkp).
It collects V3IO data, platform configuration (for example users, projects, and services), the MLRun DB, the Pipelines DB, and Kubernetes entities.

Prerequisites

  • An available storage device. Managed cloud file services (such as EFS on AWS or Filestore on GCP) are preferred.
  • A running MLOps Platform.

Standard Setup Example

Recommendation
To optimize network performance and reduce data transfer costs, provision your storage instance in the same region as your MLOps Platform.
  1. Provision storage and grant your K8S application nodes access to it (network, security groups, access points, etc.).
    Storage must support the K8S ReadWriteMany access mode. AWS EBS, for example, binds to only one host at a time (ReadWriteOnce) and is therefore not recommended.
    Preferred services:
  2. In the platform dashboard, activate the backup user (sys by default) as described here.
  3. In your K8S Application Cluster:
    • Create the backup namespace (iguazio-backup by default).

    • In that namespace, create a secret (api-credentials-system by default):

      USERNAME: "sys"
      PASSWORD: "PASSWORD"
      
    • In that namespace, create a configmap user-settings with the minimum required settings:

      SYSTEM_USER_SECRET: "api-credentials-system"  # The secret name you created above
      SCHEDULE: "0 0 * * *"                         # Backup schedule. Default is daily at midnight
      RETENTION_PERIOD: "3 days"                    # How long to keep each backup instance. Default is 3 days
      STORAGE_PROVIDER: "some-provider"             # aws / gcp / azure ...
      CSI_DRIVER: "some-driver"                     # efs.csi.aws.com / filestore.csi.storage.gke.io ...
                                                    # Must be mapped in kubernetes-csi.github.io/docs/drivers.html 
                                                    # Must be supported by your k8s setup
      REGION: "some-region"                         # us-east-1 / eu-north-1 ...
                                                    # Region of your storage device if applicable
      VOLUME_HANDLE: "fs-1234567890"                # Your filesystem id
      
    • [Optional] Browse the Bakpak manifest and modify the configmap to customize the backup as needed.
      For example, if you created your own PVC (in the backup namespace) backed by NFS and want to use it instead, add:

      SKIP_PERSISTENT_VOLUME_CREATION: "True"
      PERSISTENT_VOLUME_CLAIM_NAME: "YOUR_PVC_NAME"
      
  4. Contact Iguazio support so they can perform the initial (one-time) setup. After that, ClusterBackup events are visible on the Events page of the UI (event icon).
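
The namespace, secret, and configmap from step 3 can also be written as Kubernetes manifests. The following is a minimal sketch, not an official template: it assumes the default names from the steps above, assumes the listed keys sit directly under the secret's and configmap's data, and uses AWS EFS values purely as an illustration. Replace every placeholder with your own values.

```yaml
# Sketch only: names are the defaults mentioned in step 3.
apiVersion: v1
kind: Namespace
metadata:
  name: iguazio-backup
---
apiVersion: v1
kind: Secret
metadata:
  name: api-credentials-system
  namespace: iguazio-backup
type: Opaque
stringData:                        # plain-text input; Kubernetes stores it base64-encoded
  USERNAME: "sys"
  PASSWORD: "PASSWORD"             # placeholder: the backup user's password
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: user-settings
  namespace: iguazio-backup
data:
  SYSTEM_USER_SECRET: "api-credentials-system"   # must match the Secret name above
  SCHEDULE: "0 0 * * *"                          # cron syntax: daily at midnight
  RETENTION_PERIOD: "3 days"
  STORAGE_PROVIDER: "aws"                        # illustrative value
  CSI_DRIVER: "efs.csi.aws.com"                  # illustrative value
  REGION: "us-east-1"                            # illustrative value
  VOLUME_HANDLE: "fs-1234567890"                 # placeholder filesystem id
```

Saved to a file (the name is up to you), the manifests can be applied with `kubectl apply -f <file>`. Keeping them in one file makes it easier to review the backup settings alongside the credentials reference they depend on.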

Bakpak Manifest

apiVersion: 0.1.0
kind: bakpakManifest
metadata:
  name: "template-bakpak-manifest"
spec:
  description: "CLI for High Level Operations on Iguazio tools like Manof, Gibby etc"
  apiCredentials:                                 # On Restore, credentials may differ between source and target
    system:
      username: "USERNAME"
      password: "PASSWORD"
#########################################################################################################
#  backupRootFolder: "/full/path/to/backup/root"  # MANDATORY! Please validate r+w permission
#########################################################################################################
#  kubeconfig: "~/.kube/config"
#  nuctlPath: "/home/iguazio/IGZ_VERSION/platform/"
#  staticServeHelmChartsPath: "/home/iguazio/IGZ_VERSION/platform/static_serve/helm/v3io-stable"
#  igzPlatformPathsRethinkdbDataMount: '/mnt/platform/rethinkdb'
#  gibctlPath: "/full/path/to/gibby/executable"   # override bundle with absolute path to run gibby as standalone option
#  gibbyImage: "gcr.io/iguazio/gibby:0.8.31"      # image to run gibby as a kubernetes job
#  gibbyJobMountPath: "/full/path/to/mount"       # auto-creation permissions vary. For example on EFS it's enabled by
                                                  # default, but on dedicated bare-metal nodes not. Please validate r+w
#  gibbyRestartPolicy: "OnFailure"                # kubernetes job restart policy
#  namespace: "gibby-backup"
#  gibbyTimeout: "23 hours"                       # Kill the data backup process on timeout
#  skipPersistentVolumeCreation: None             # Set to True when using a dynamic volume provisioner
#  persistentVolumeClaimDesiredState: "Bound"     # adjust according to your dynamic volume provisioner
#  persistentVolumeName: started-TIMESTAMP-gibby  # manual override for persistent volume
#  persistentVolumeClaimName: ^^^^^^^^^^^^^^^^^^  # manual override for persistent volume claim
#  accessModes:
#    - "ReadWriteMany"                            # For storage devices able to bind to many hosts, such as EFS
#    - "ReadWriteOnce"                            # For storage devices able to only bind to one at a time, such as EBS
#  reclaimPolicy: "Delete"                        # persistent volume reclaim policy for kubernetes Gibby job
#  storageClass: "gibby-backup"                   # doesn't exist by default, please roll your own
#  nodeNames:                                     # Specify app nodes for Gibby job, preferably worker nodes
#    - "k8s-node1"
#  dataNodeIp: "127.0.0.1"                        # specify if running from outside the data node
#  igzVersion: "/home/iguazio/igz/version.txt)"   # iguazio system version. Set to literal value to amend
#  nuctlDefaultServiceType: "NodePort"            # default service type for nuctl commands
#  linkLatestPrefix: "latest-"                    # latest backup instance symlink for a given preset
#  preset: "default"                              # which components to backup. Run `bkp backup presets` for more info
  backupSpec:
#    mode: "dry-run"                              # Run mode. "normal" will execute
#    sendEvents: True                             # Send events to the platform
#    components:                                  # Or specify the components and execution order yourself
#      - "rethinkdb"
#      - "nuclio"
#      - "mlrundb"
#      - ...
#    cacheUpperBound: "100GiB"                    # Estimated size of system cache
#    standalone: False                            # Run Gibby as a standalone executable, rather than as a k8s job
#    retentionPeriod: "3 days"                    # Retention policy for backup instances
#    rotateOnBackup: False                        # Rotate according to retentionPeriod above
#    archive: False                               # Archive backup to a tar.gz file
#    compressLevel: 5                             # 1 is fastest, 9 is best compression
#    deleteSource: True                           # Delete backup instance after archiving
#    instanceFolderPrefix: "started-"             # Prefix for backup instance folder, archive and rotation
#    diskRequiredMultiplier: 2                    # Multiplier for backup disk requirements (for compression overhead)
#    linkLatest: True                             # Create a symlink to the latest backup instance
#    gibbyCommands:
#      - "create"
#      - "snapshot"
#    gibbyOptions:
#      --logger-no-color: ""  # binary flags pass as dict keys with empty values
#      --backup-name: "gibby-backup"
#      --data-plane-url: "https://webapi.default-tenant.app.your-system.iguazeng.com"
#      --control-plane-url: "https://dashboard.default-tenant.app.your-system.iguazeng.com"
#      --data-access-key: "DATA_SESSION_ID"
#      --control-access-key: "CONTROL_SESSION_ID"
#      --path: "FROM_BACKUP_FOLDER"
#      --logger-file-path: "FROM_BACKUP_FOLDER/gibby.log"
#      --log-level: "FROM_CLI_FLAG"
#      --backup-config-spec: "INLINE-JSON-CONFIG"
#  checkAppServices: True  # If True, will check for overall app services state == ready
#  kubeconfig: "~/.kube/config"  # Hardcoded in Python k8s lib. Set the KUBECONFIG os environment variable to override
#  nuctlPath: "/home/iguazio/IGZ_VERSION/platform/"
#  staticServeHelmChartsPath: "/home/iguazio/IGZ_VERSION/platform/static_serve/helm/v3io-stable"
#  gibctlPath: "/full/path/to/gibby/executable"  # override bundle with absolute path to run gibby as standalone option
#  gibbyImage: "gcr.io/iguazio/gibby:0.8.31"  # image to run gibby as a kubernetes job
  preamble:
    banner: |-
      echo "Welcome to Bakpak"
      date
      pwd
#    customPreamble1: write your own preamble
#    customPreamble2: beware of YAML compatibility with your shell script syntax
#  postamble:
#    customPostamble1: echo "Thank you for using Bakpak"
#    customPostamble2: will run after the main components
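
When you supply your own volume (SKIP_PERSISTENT_VOLUME_CREATION in the configmap, or skipPersistentVolumeCreation and persistentVolumeClaimName in the manifest), the claim must exist in the backup namespace and offer ReadWriteMany. The following is a hedged sketch of a statically provisioned volume and claim. The volume name, the capacity, the Retain reclaim policy, and the choice of the EFS CSI driver are assumptions made for illustration, not values taken from Bakpak.

```yaml
# Sketch only: a pre-provisioned shared filesystem exposed through a CSI driver.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: bakpak-backup-pv           # hypothetical name
spec:
  capacity:
    storage: 100Gi                 # Kubernetes requires a value; size it for your backups
  volumeMode: Filesystem
  accessModes:
    - ReadWriteMany                # the volume must be mountable from several nodes
  persistentVolumeReclaimPolicy: Retain   # assumption: keep backup data if the claim is deleted
  storageClassName: ""             # empty class: bind statically, no dynamic provisioner
  csi:
    driver: efs.csi.aws.com        # illustrative; use the CSI driver for your storage
    volumeHandle: fs-1234567890    # placeholder filesystem id
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: YOUR_PVC_NAME              # the value given as PERSISTENT_VOLUME_CLAIM_NAME
  namespace: iguazio-backup
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""             # must match the volume's (empty) class
  volumeName: bakpak-backup-pv     # bind explicitly to the volume above
  resources:
    requests:
      storage: 100Gi
```

After applying these, `kubectl -n iguazio-backup get pvc` should report the claim as Bound, which corresponds to the persistentVolumeClaimDesiredState default shown in the manifest.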