Backup Using Gibby

Overview

The Gibby utility backs up all of the data stored on the Iguazio MLOps Platform. Gibby can run as a Kubernetes job on an application node, or as a standalone utility on a dedicated server.

Note
It is strongly recommended to attach a storage device with more than 2 TB of capacity at the location of the Gibby backup. This avoids running out of storage space on your OS partition.
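
Before starting a backup, you can confirm that the backup location has enough free space. A minimal check, assuming the backup device is mounted at /home/iguazio/pv (adjust the path to your mount point):

# Show the free space on the filesystem backing the backup location
df -h /home/iguazio/pv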

Gibby as a Kubernetes Job

The following section describes how to deploy and use Gibby as a Kubernetes job.

Backing Up

STEP 1 - Create a Persistent Volume and Claim

1. Create a directory named pv under /home/iguazio on k8s-node1 (the first application node).

2. Use the following YAML to create a persistent volume on application node 1 by running kubectl apply -f <yaml name>. Edit the size and path as needed:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: gibby-pv
spec:
  capacity:
    storage: 100Gi
  volumeMode: Filesystem
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Delete
  storageClassName: local-storage
  local:
    path: /home/iguazio/pv
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - k8s-node1

3. Use the following YAML to create a persistent volume claim by running kubectl apply -f <yaml name>. Edit the size as needed:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: gibby-pv-claim
spec:
  storageClassName: local-storage
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 99Gi
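
The directory from item 1 can be created over SSH, and you can then confirm that the volume and claim were created and bound. A minimal sketch, assuming SSH access to k8s-node1 as the iguazio user (an assumption) and the resource names used above:

# Create the backing directory on the first application node (item 1)
ssh iguazio@k8s-node1 'mkdir -p /home/iguazio/pv'

# Verify that the persistent volume and claim exist and are bound
kubectl get pv gibby-pv
kubectl get pvc gibby-pv-claim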

STEP 2 - Create a Control Access Key and a Data Access Key (run the commands in this step on the data node)

  1. Run http --session sys --auth sys:IgZBackup! post http://127.0.0.1:8001/api/sessions to open an authenticated session.
  2. Run echo '{"data": {"type": "access_key", "attributes": {"plane": "control"}}}' | http --session sys post http://127.0.0.1:8001/api/access_keys and, from the output, save the value of the id field (located just above relationships) for later use.
  3. Run echo '{"data": {"type": "access_key", "attributes": {"plane": "data"}}}' | http --session sys post http://127.0.0.1:8001/api/access_keys and, from the output, save the value of the id field (located just above relationships) for later use.
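
If jq is available on the data node (an assumption; it is not required by the procedure), you can extract the key ID directly instead of reading it from the response body:

# Create a control-plane access key and print only its ID
echo '{"data": {"type": "access_key", "attributes": {"plane": "control"}}}' \
  | http --session sys post http://127.0.0.1:8001/api/access_keys \
  | jq -r '.data.id'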

STEP 3 - Run the Backup

1. Fill in the YAML below with all the relevant details and start the backup by running kubectl apply -f <yaml name>. Change the path as needed, fill in --control-access-key and --data-access-key with the ID values saved in step 2 (items 2 and 3), and set --control-plane-url and --data-plane-url according to your system. Verify that each URL starts with https://.

apiVersion: batch/v1
kind: Job
metadata:
  name: gibbybackupjob
# Optional: 
# namespace: <ANY NAMESPACE>
spec:
  template:
    spec:
      containers:
        - name: gibby
          image: gcr.io/iguazio/gibby:0.8.4
          volumeMounts:
            - mountPath: /tmp/bak
              name: backups-volume
          args:
            - "create"
            - "snapshot"
            - "--control-plane-url=<DASHBOARD URL (example -  https://dashboard.default-tenant.app.dev84.lab.iguazeng.com)>"
            - "--data-plane-url=<WEBAPI URL (example - https://webapi.default-tenant.app.dev84.lab.iguazeng.com)>"
            - "--control-access-key=<CONTROL ACCESS KEY EXTRACTED IN STEP 2>"
            - "--data-access-key=<DATA ACCESS KEY EXTRACTED IN STEP 2>"
            - "--backup-name=<BACKUP NAME>"
            - "--path=/tmp/bak"
# Optional: 
#         Comma separated list of containers to back up
#         - "--containers=users,bigdata"
#         Split size threshold for backup files [MB]
#         - "--file-size-limit=512"
#         Max data size to include along with object attributes [Bytes] (<256KB) (v3.0.1+ only)
#         - "--object-scanner-max-included-data-size=131072"
#         Number of objects scanners replicas
#         - "--object-scanner-replicas=<N>"
#         Comma separated list of object types to skip during backup [objects,tiny_objects,streams]
#         - "--skip-object-types=tiny_objects,streams"
#         Enable/disable recovery after failure [enabled, disabled]
#         - "--recovery-mode=enabled"
#         Comma separated list of backuppers to run [v3io,platform_resources]
#         - "--backupppers=v3io,platform_resources"
#         List of platform resources to backup separated by semicolon and including URL query
#         - "--platform-resources=users?include=user_groups,primary_group,data_policy_layers,data_lifecycle_layers;user_groups;data_policy_groups;data_policy_layers;data_lifecycle_layers"
#         Outputs logger to a file
#         - "--logger-file-path=/backups/<LOG-FILENAME>"
#         Logger won't output with colors
#         - "--logger-no-color"
      restartPolicy: Never
      volumes:
        - name: backups-volume
          persistentVolumeClaim:
            claimName: <PVC NAME CREATED IN STEP 1>
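
After applying the manifest, you can follow the backup from the job's pod logs. A minimal sketch, assuming the manifest was saved as gibby-backup-job.yaml and the job name used above:

kubectl apply -f gibby-backup-job.yaml

# Stream the backup logs until the job finishes
kubectl logs -f job/gibbybackupjob

# Confirm that the job completed
kubectl get job gibbybackupjob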

Restore

STEP 1 - Run the Restore

1. Fill in the YAML below with all the relevant details and start the restore by running kubectl apply -f <yaml name>. The access keys are the same ones created in step 2 of the backup procedure.

apiVersion: batch/v1
kind: Job
metadata:
  name: gibbyrestorejob
# Optional: 
# namespace: <ANY NAMESPACE>
spec:
  template:
    spec:
      containers:
        - name: gibby
          image: quay.io/iguazio/gibby:0.7.3
          volumeMounts:
            - mountPath: /tmp/bak
              name: backups-volume
          args:
            - "restore"
            - "backup"
            - "--control-plane-url=<DASHBOARD URL (example - https://dashboard.default-tenant.app.dev84.lab.iguazeng.com>)"
			- "--data-plane-url=<WEBAPI URL (example - https://webapi.default-tenant.app.dev84.lab.iguazeng.com)>"           
            - "--control-access-key=<CONTROL ACCESS KEY EXTRACTED IN STEP 2>"
            - "--data-access-key=<DATA ACCESS KEY EXTRACTED IN STEP 2>"
            - "--backup-name=<BACKUP NAME>"
            - "--path=/tmp/bak"
# Optional: 
#         Comma separated list of containers to restore        
#         - "--containers=users,bigdata"
#         Comma separated list of containers to restore under different name
#         - "--target-containers=original_container_name1:target_container_name1,original_container_name2:target_container_name2"
#         Specific snapshot to restore
#         - "--snapshot-id=<snapshotID>"
#         Number of objects restorers replicas
#         - "--object-restorers-replicas=<N>"
#         Comma separated list of object types to skip during restore [objects,tiny_objects,streams]
#         - "--skip-object-types=tiny_objects,streams"
#         Enable/disable recovery after failure [enabled, disabled]
#         - "--recovery-mode=enabled"
#         Outputs logger to a file
#         - "--logger-file-path=/backups/<LOG-FILENAME>"
#         Logger won't output with colors
#         - "--logger-no-color"
      restartPolicy: Never
      volumes:
        - name: backups-volume
          persistentVolumeClaim:
            claimName: <PVC NAME CREATED IN STEP 1>
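
As with the backup, apply the manifest and follow the pod logs. If you rerun a restore, delete the previous job first, because job names must be unique. A sketch assuming the manifest file name gibby-restore-job.yaml:

kubectl apply -f gibby-restore-job.yaml
kubectl logs -f job/gibbyrestorejob

# A completed job must be deleted before one with the same name can be created
kubectl delete job gibbyrestorejob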

Gibby as a Stand-Alone Utility

Gibby can be run as a standalone backup/restore utility.

Backup

  1. Contact Iguazio customer support for the latest binary package of Gibby.
  2. Run the following command, filling in your values for the data-access-key and control-access-key fields.
./gibctl-<version>-linux-amd64 create snapshot \
--data-plane-url <WEBAPI URL for TENANT NAMESPACE> --control-plane-url <DASHBOARD URL for TENANT NAMESPACE> \
--data-access-key <DATA ACCESS KEY> --control-access-key <CONTROL ACCESS KEY> \
--backup-name <BACKUP NAME> --path <PATH TO GIBBY BACKUP LOCATION>

Example

./gibctl-<version>-linux-amd64 create snapshot \
--control-plane-url https://dashboard.default-tenant.app.satsdl.satsnet.com.sg --data-access-key 40b59ba2-0e59-4ab7-8843-1bcdb1fb79ef \
--control-access-key 1eec7eaa-9064-4699-8c9b-b2327943b0ae --backup-name efi-test --path /home/iguazio/backup
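
A full backup can take a long time on a large system. To keep it running if your terminal disconnects, you can run it detached; a minimal sketch (the log file name is arbitrary):

nohup ./gibctl-<version>-linux-amd64 create snapshot \
--data-plane-url <WEBAPI URL> --control-plane-url <DASHBOARD URL> \
--data-access-key <DATA ACCESS KEY> --control-access-key <CONTROL ACCESS KEY> \
--backup-name <BACKUP NAME> --path <PATH TO GIBBY BACKUP LOCATION> \
> gibby-backup.log 2>&1 &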

Optional Backup Run Arguments

Use the following optional arguments (options) when running your backup.

-c, --containers=users,bigdata
    Comma-separated list of containers to back up
--skip-object-types=tiny_objects,streams
    Comma-separated list of object types to skip during backup [objects,tiny_objects,streams]
--recovery-mode=enabled
    Enable/disable recovery after failure [enabled, disabled]
--backuppers=v3io,platform_resources
    Comma-separated list of backuppers to run [v3io,platform_resources]
--platform-resources=users?include=user_groups,primary_group,data_policy_layers,data_lifecycle_layers;user_groups;data_policy_groups;data_policy_layers;data_lifecycle_layers
    List of platform resources to back up, separated by semicolons and including the URL query
--attribute-scattering=enabled
    Enable/disable attribute scattering [enabled, disabled]
--available-storage-space-margin=25
    Additional storage space overhead required for backup [percentage]. Set to 0 to disable storage space validation
--backup-config-spec=
    Path to a YAML file with a backup config spec. YAML parameters override command-line arguments
--v3io-retries=5
    Number of retries in case of v3io request failures
--v3io-retry-interval=3
    Interval between v3io retries [sec]
--request-timeout=60
    REST request timeout (against v3io and internal REST servers) [sec]
--file-size-limit=512
    Split size threshold for backup (pak) files [MB]
--object-scanner-task-listen-addr=:48541
    Listening port for the object task server
--report-max-failed-items-limit=50
    Maximum number of failed item details to include in the report
--report-server-listen-addr=:58541
    Listening port for the report server
--logger-file-path=/backups/<LOG-FILENAME>
    Write logger output to a file
--logger-no-color
    Logger output without colors
--object-scanner-max-included-data-size=131072
    Max data size to include along with object attributes [Bytes] (<256KB) (v3.0.1+ only)
--object-scanner-replicas=
    Number of object scanner replicas. By default, equal to the number of VNs or derived from the available memory size
--object-scanner-max-concurrent-writes=
    Maximum number of concurrent object body write tasks per object scanner (object body throttling). Set to a negative number for unlimited
--object-scanner-max-pending-object-bodies=500
    Max number of pending object body write tasks per object scanner
--object-scanner-max-failed-object-bodies=
    Maximum number of failed object bodies before a backup abort is triggered
--object-scanner-max-pending-object-attributes=250
    Max number of object attributes pending packing per object scanner
--object-writer-ack-retries=5
    Number of retries in case of object writer ACK request failures against the object task server
--object-writer-ack-retry-interval=3
    Interval between object writer ACK retries against the object task server [sec]
--object-writer-replicas=24
    Number of replicas that perform object writes
--tree-scanner-container-content-max-nodes-limit=1000
    Max number of tree nodes in a get-container-contents response
--tree-scanner-max-pending-tree-nodes=500
    Max number of pending tree node indexing tasks
--tree-scanner-num-of-tree-node-directory-scanners=3
    Number of tree node directory fetchers per tree node scanner
--tree-scanner-replicas=3
    Number of replicas that perform tree node scans
--max-unused-buffers-pool-size=36
    Maximum number of unused buffers in the buffer pool
--packing-buffer-size=10
    Size of the buffers used for packing/unpacking objects [MB]
--profiling=disabled
    Enable/disable profiling
--profiling-port=6060
    Port for the profiling (pprof) web server
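
For example, a backup limited to the users and bigdata containers that skips streams and writes its log to a file could combine the options above as follows (all values are illustrative):

./gibctl-<version>-linux-amd64 create snapshot \
--data-plane-url <WEBAPI URL> --control-plane-url <DASHBOARD URL> \
--data-access-key <DATA ACCESS KEY> --control-access-key <CONTROL ACCESS KEY> \
--backup-name <BACKUP NAME> --path <PATH TO GIBBY BACKUP LOCATION> \
--containers=users,bigdata --skip-object-types=streams \
--logger-file-path=/backups/gibby-backup.log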

Restore

Use the following command to restore your backup.

./gibctl-<version>-linux-amd64 restore backup \
--data-plane-url <WEBAPI URL for TENANT NAMESPACE> --control-plane-url <DASHBOARD URL for TENANT NAMESPACE> \
--data-access-key <DATA ACCESS KEY> --control-access-key <CONTROL ACCESS KEY> \
--backup-name <BACKUP NAME> --path <PATH TO GIBBY BACKUP LOCATION>

Example

./gibctl-<version>-linux-amd64 restore backup \
--control-plane-url https://dashboard.default-tenant.app.satsdl.satsnet.com.sg --data-access-key 40b59ba2-0e59-4ab7-8843-1bcdb1fb79ef \
--control-access-key 1eec7eaa-9064-4699-8c9b-b2327943b0ae --backup-name backup-test --path /home/iguazio/backup

Optional Restore Run Arguments

Use the following optional arguments (options) when running your restore.

--containers=users,bigdata
    Comma-separated list of containers to restore
--target-containers=original_container_name1:target_container_name1,original_container_name2:target_container_name2
    Comma-separated list of containers to restore under a different name
--snapshot-id=
    Specific snapshot to restore
--skip-object-types=tiny_objects,streams
    Comma-separated list of object types to skip during restore [objects,tiny_objects,streams]
--recovery-mode=enabled
    Enable/disable recovery after failure [enabled, disabled]
--scattering-attributes-limit=1700000
    Attributes are split into chunks during restore if their total size is above this limit
--backup-config-spec=
    Path to a YAML file with a backup config spec. YAML parameters override command-line arguments
--v3io-retries=5
    Number of retries in case of v3io request failures
--v3io-retry-interval=3
    Interval between v3io retries [sec]
--request-timeout=60
    REST request timeout (against v3io and internal REST servers) [sec]
--report-max-failed-items-limit=50
    Maximum number of failed item details to include in the restore report
--report-server-listen-addr=:58541
    Listening port for the report server
--logger-file-path=/backups/<LOG-FILENAME>
    Write logger output to a file
--logger-no-color
    Logger output without colors
--object-restorers-replicas=
    Number of object restorer replicas
--object-restorer-num-of-object-writers=1
    Number of object writers per object restorer
--tree-restorer-replicas=3
    Number of replicas that perform tree node restores
--tree-restorer-max-pending-tree-node-writes=100
    Max number of pending tree node write tasks
--max-unused-buffers-pool-size=36
    Maximum number of unused buffers in the buffer pool
--packing-buffer-size=10
    Size of the buffers used for packing/unpacking objects [MB]
--profiling=disabled
    Enable/disable profiling
--profiling-port=6060
    Port for the profiling (pprof) web server
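
For example, to restore a specific snapshot of the users container into a container with a different name, you could combine the options above as follows (all values are illustrative):

./gibctl-<version>-linux-amd64 restore backup \
--data-plane-url <WEBAPI URL> --control-plane-url <DASHBOARD URL> \
--data-access-key <DATA ACCESS KEY> --control-access-key <CONTROL ACCESS KEY> \
--backup-name <BACKUP NAME> --path <PATH TO GIBBY BACKUP LOCATION> \
--containers=users --target-containers=users:users_restored \
--snapshot-id=<snapshotID>
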
Note
If the cluster on which the backup and restore run does not have a valid certificate, a --verify false flag must be added to the backup and restore YAML files.
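
In the job manifests, this means adding the flag to the container args. A sketch of the placement, assuming (as the note is worded) that the flag takes its value as a separate argument:

          args:
            - "create"
            - "snapshot"
            # ... existing arguments ...
            - "--verify"
            - "false"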