Backup Using Gibby

Overview

The Gibby utility allows you to back up all of your data that is stored on Iguazio's MLOps Platform. Gibby can run on an application node as a Kubernetes job, or as a standalone utility on a dedicated server.

Note
It is strongly recommended to attach a storage device larger than 2 TB at the location used by the Gibby backup utility. This prevents running out of storage space on your OS partition.
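Before starting a backup, it can be worth confirming how much space is actually available at the backup location. A minimal check, assuming the backup directory from STEP 1 below (/home/iguazio/pv); adjust the path to wherever your backups are written:

# Show size, used, and available space on the filesystem backing the backup directory
df -h /home/iguazio/pv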

Gibby as a Kubernetes Job (preferred method)

This section describes how to deploy and use Gibby as a Kubernetes job.

Note
If the cluster on which the backup and restore are run does not have a valid certificate, add a --verify false flag to both the backup and restore yaml files.
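For example, the flag can be appended to the container args list in the Job yaml files shown below. This is only a sketch: whether the Gibby CLI expects the single argument --verify=false or the two separate arguments --verify and false is an assumption here, so adjust the form if the job rejects it.

          args:
            - "create"
            - "snapshot"
            # Assumption: disable TLS certificate verification on clusters without a valid certificate
            - "--verify=false"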

Backing Up

STEP 1 - Create a Persistent Volume and Claim

  1. Create a directory named pv under /home/iguazio on k8s-node1 (the first application node). This directory backs the persistent volume, so set spec.local.path in the following yaml accordingly.

  2. Use the following yaml to create a persistent volume on app node 1 (edit the size and path as needed), and apply it by running
    kubectl apply -f <yaml name>:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: gibby-pv
spec:
  capacity:
    storage: 100Gi
  volumeMode: Filesystem
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Delete
  storageClassName: local-storage
  local:
    path: /tmp/pv
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - k8s-node1

  3. Use the following yaml to create a persistent volume claim (edit the size as needed), and apply it by running kubectl apply -f <yaml name>:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: gibby-pv-claim
spec:
  storageClassName: local-storage
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 99Gi
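
After applying both yaml files, it can be useful to confirm that the volume and claim exist before moving on. A minimal sketch, assuming you saved the manifests under the hypothetical file names gibby-pv.yaml and gibby-pvc.yaml:

# Apply the manifests and list the resulting objects
kubectl apply -f gibby-pv.yaml
kubectl apply -f gibby-pvc.yaml
kubectl get pv gibby-pv
kubectl get pvc gibby-pv-claim

Depending on the volumeBindingMode of the local-storage StorageClass, the claim may remain Pending until the backup job's pod is scheduled, which is normal for WaitForFirstConsumer binding.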

STEP 2 - Create Control Access Key and Data Access Key

Run these commands on the data node.

Note
By default, the sys user is disabled and has a random password (generated during tenant creation). Enable the user and update its password before starting this procedure.

  1. Create a session by running
    http --session sys --auth sys:<sys password> post http://127.0.0.1:8001/api/sessions
  2. Create a control-plane access key by running
    echo '{"data": {"type": "access_key", "attributes": {"plane": "control"}}}' | http --session sys post http://127.0.0.1:8001/api/access_keys
    and, from the output, save the value of the id key (it appears just above relationships) for later use.
  3. Create a data-plane access key by running
    echo '{"data": {"type": "access_key", "attributes": {"plane": "data"}}}' | http --session sys post http://127.0.0.1:8001/api/access_keys
    and, from the output, save the value of the id key (it appears just above relationships) for later use.
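
If jq is installed on the data node, the id can be captured directly instead of being copied from the raw output. A minimal sketch, assuming the response places the access-key id at data.id (consistent with the "just above relationships" hint above):

# Create a control-plane access key and print only its id
echo '{"data": {"type": "access_key", "attributes": {"plane": "control"}}}' | \
  http --session sys post http://127.0.0.1:8001/api/access_keys | jq -r '.data.id'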

STEP 3 - Run the Backup

  1. Fill in the following yaml with all the relevant details and start the backup by running
    kubectl apply -f <yaml name>:
    Change the path as needed, and fill in --control-access-key and --data-access-key with the id values saved in STEP 2 (items 2 and 3).
    Modify --control-plane-url and --data-plane-url to match your system. Verify that each URL starts with https://.
apiVersion: batch/v1
kind: Job
metadata:
  name: gibbybackupjob
# Optional: 
# namespace: <ANY NAMESPACE>
spec:
  template:
    spec:
      containers:
        - name: gibby
          image: gcr.io/iguazio/gibby:0.8.4
          volumeMounts:
            - mountPath: /tmp/bak
              name: backups-volume
          args:
            - "create"
            - "snapshot"
            - "--control-plane-url=<DASHBOARD URL (example -  https://dashboard.default-tenant.app.dev84.lab.iguazeng.com)>"
            - "--data-plane-url=<WEBAPI URL (example - https://webapi.default-tenant.app.dev84.lab.iguazeng.com)>"
            - "--control-access-key=<CONTROL ACCESS KEY EXTRACTED IN STEP 2>"
            - "--data-access-key=<DATA ACCESS KEY EXTRACTED IN STEP 2>"
            - "--backup-name=<BACKUP NAME>"
            - "--path=/tmp/bak"
# Optional: 
#         Comma separated list of containers to backup        
#         - "--containers=users,bigdata"
#         Split size threshold for backup files [MB]
#         - "--file-size-limit=512"
#         Max data size to include along with object attributes [Bytes] (<256KB) (v3.0.1+ only)
#         - "--object-scanner-max-included-data-size=131072"
#         Number of objects scanners replicas
#         - "--object-scanner-replicas=<N>"
#         Comma separated list of object types to skip during backup [objects,tiny_objects,streams]
#         - "--skip-object-types=tiny_objects,streams"
#         Enable/disable recovery after failure [enabled, disabled]
#         - "--recovery-mode=enabled"
#         Comma separated list of backuppers to run [v3io,platform_resources]
#         - "--backupppers=v3io,platform_resources"
#         List of platform resources to backup separated by semicolon and including URL query
#         - "--platform-resources=users?include=user_groups,primary_group,data_policy_layers,data_lifecycle_layers;user_groups;data_policy_groups;data_policy_layers;data_lifecycle_layers"
#         Outputs logger to a file
#         - "--logger-file-path=/backups/<LOG-FILENAME>"
#         Logger won't output with colors
#         - "--logger-no-color"
      restartPolicy: Never
      volumes:
        - name: backups-volume
          persistentVolumeClaim:
            claimName: <PVC NAME CREATED IN STEP 1>
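
Once the job has been applied, standard kubectl commands can be used to follow it. A minimal sketch, assuming the job name gibbybackupjob from the yaml above and the default namespace (add -n <namespace> if you set one):

# Check the job, stream the Gibby logs, and block until the backup completes
kubectl get job gibbybackupjob
kubectl logs -f job/gibbybackupjob
kubectl wait --for=condition=complete --timeout=24h job/gibbybackupjob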

Restore

STEP 1 - Run the Restore

  1. Fill in the following yaml with all the relevant details and start the restore by running
    kubectl apply -f <yaml name>:
apiVersion: batch/v1
kind: Job
metadata:
  name: gibbyrestorejob
# Optional: 
# namespace: <ANY NAMESPACE>
spec:
  template:
    spec:
      containers:
        - name: gibby
          image: quay.io/iguazio/gibby:0.7.3
          volumeMounts:
            - mountPath: /tmp/bak
              name: backups-volume
          args:
            - "restore"
            - "backup"
            - "--control-plane-url=<DASHBOARD URL (example - https://dashboard.default-tenant.app.dev84.lab.iguazeng.com>)"
			- "--data-plane-url=<WEBAPI URL (example - https://webapi.default-tenant.app.dev84.lab.iguazeng.com)>"           
            - "--control-access-key=<CONTROL ACCESS KEY EXTRACTED IN STEP 2>"
            - "--data-access-key=<DATA ACCESS KEY EXTRACTED IN STEP 2>"
            - "--backup-name=<BACKUP NAME>"
            - "--path=/tmp/bak"
# Optional: 
#         Comma separated list of containers to restore        
#         - "--containers=users,bigdata"
#         Comma separated list of containers to restore under different name
#         - "--target-containers=original_container_name1:target_container_name1,original_container_name2:target_container_name2"
#         Specific snapshot to restore
#         - "--snapshot-id=<snapshotID>"
#         Number of objects restorers replicas
#         - "--object-restorers-replicas=<N>"
#         Comma separated list of object types to skip during restore [objects,tiny_objects,streams]
#         - "--skip-object-types=tiny_objects,streams"
#         Enable/disable recovery after failure [enabled, disabled]
#         - "--recovery-mode=enabled"
#         Outputs logger to a file
#         - "--logger-file-path=/backups/<LOG-FILENAME>"
#         Logger won't output with colors
#         - "--logger-no-color"
      restartPolicy: Never
      volumes:
        - name: backups-volume
          persistentVolumeClaim:
            claimName: <PVC NAME CREATED IN STEP 1>
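
The restore job can be followed in the same way. Note that Kubernetes job names must be unique, so a finished gibbyrestorejob (or gibbybackupjob) has to be deleted before the same yaml can be applied again; a minimal sketch, assuming the default namespace:

# Stream the restore logs, wait for completion, then remove the finished job before any re-run
kubectl logs -f job/gibbyrestorejob
kubectl wait --for=condition=complete --timeout=24h job/gibbyrestorejob
kubectl delete job gibbyrestorejob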

Gibby as a Stand-Alone Utility

Gibby can be run as a standalone backup/restore utility.

Backup

  1. Contact Iguazio Support for the latest binary package of Gibby.
  2. Run the following command, substituting your values for the data-access-key and control-access-key fields.
./gibctl-<version>-linux-amd64 create snapshot \
--data-plane-url <WEBAPI URL for TENANT NAMESPACE> --control-plane-url <DASHBOARD URL for TENANT NAMESPACE> \
--data-access-key <DATA ACCESS KEY> --control-access-key <CONTROL ACCESS KEY> \
--backup-name <BACKUP NAME> --path <PATH TO GIBBY BACKUP LOCATION>

Example

--control-plane-url https://dashboard.default-tenant.app.satsdl.satsnet.com.sg --data-access-key 40b59ba2-0e59-4ab7-8843-1bcdb1fb79ef 
--control-access-key 1eec7eaa-9064-4699-8c9b-b2327943b0ae --backup-name efi-test --path /home/iguazio/backup

Restore

Run the following command, substituting your values for the data-access-key and control-access-key fields.

./gibctl-<version>-linux-amd64 restore backup \
--data-plane-url <WEBAPI URL for TENANT NAMESPACE> --control-plane-url <DASHBOARD URL for TENANT NAMESPACE> \
--data-access-key <DATA ACCESS KEY> --control-access-key <CONTROL ACCESS KEY> \
--backup-name <BACKUP NAME> --path <PATH TO GIBBY BACKUP LOCATION>

Example

--control-plane-url https://dashboard.default-tenant.app.satsdl.satsnet.com.sg --data-access-key 40b59ba2-0e59-4ab7-8843-1bcdb1fb79ef
--control-access-key 1eec7eaa-9064-4699-8c9b-b2327943b0ae --backup-name backup-test --path /home/iguazio/backup

Gibby as a Docker Image

Backup

Run the following command, substituting your values for the data-access-key and control-access-key fields.

docker run --rm -v <PATH TO GIBBY BACKUP LOCATION>:/gibby_backup gcr.io/iguazio/gibby:0.7.10 \
create snapshot \
 --data-plane-url <WEBAPI URL for TENANT NAMESPACE> --control-plane-url <DASHBOARD URL for TENANT NAMESPACE> \
 --data-access-key <DATA ACCESS KEY> --control-access-key <CONTROL ACCESS KEY> \
 --path /gibby_backup \
 --backup-name <BACKUP NAME>

Restore

Run the following command, substituting your values for the data-access-key and control-access-key fields.

docker run --rm -v <PATH TO GIBBY BACKUP LOCATION>:/gibby_backup gcr.io/iguazio/gibby:0.7.10 \
restore backup \
 --data-plane-url <WEBAPI URL for TENANT NAMESPACE> --control-plane-url <DASHBOARD URL for TENANT NAMESPACE> \
 --data-access-key <DATA ACCESS KEY> --control-access-key <CONTROL ACCESS KEY> \
 --path /gibby_backup \
 --backup-name <BACKUP NAME>

Optional Backup Run Arguments

Use the following optional arguments (options) when running your backup.

-c, --containers=users,bigdata
    Comma-separated list of containers to back up
--skip-object-types=tiny_objects,streams
    Comma-separated list of object types to skip during backup [objects,tiny_objects,streams]
--recovery-mode=enabled
    Enable/disable recovery after failure [enabled, disabled]
--backupppers=v3io,platform_resources
    Comma-separated list of backuppers to run [v3io,platform_resources]
--platform-resources=users?include=user_groups,primary_group,data_policy_layers,data_lifecycle_layers;user_groups;data_policy_groups;data_policy_layers;data_lifecycle_layers
    List of platform resources to back up, separated by semicolons and including the URL query
--attribute-scattering=enabled
    Enable/disable attribute scattering [enabled/disabled]
--available-storage-space-margin=25
    Additional storage space overhead required for backup [percentage]. Set to 0 to disable storage space validation
--v3io-retries=5
    Number of retries in case of v3io request failures
--v3io-retry-interval=3
    Interval between v3io retries [sec]
--request-timeout=60
    REST request timeout (against v3io and internal REST servers) [sec]
--file-size-limit=512
    Split size threshold for backup (pak) files [MB]
--object-scanner-task-listen-addr=:48541
    Listening port for the object task server
--report-max-failed-items-limit=50
    Maximum number of failed item details to include in the restore report
--report-server-listen-addr=:58541
    Listening port for the report server
--logger-file-path=/backups/<LOG-FILENAME>
    Outputs the logger to a file
--logger-no-color
    Logger won't output with colors
--object-scanner-max-included-data-size=131072
    Max data size to include along with object attributes [Bytes] (<256KB) (v3.0.1+ only)
--object-scanner-replicas=<N>
    Number of object-scanner replicas. By default, equal to the number of VNs or derived from the available memory size
--object-scanner-max-concurrent-writes=
    Maximum number of concurrent object body write tasks per object scanner (object body throttling). Set to a negative number for unlimited
--object-scanner-max-pending-object-bodies=500
    Max number of pending object body write tasks per object scanner
--object-scanner-max-pending-object-attributes=250
    Max number of object attributes pending for packing per object scanner
--object-scanner-choke-get-items=0
    Duration of object-scanner sleep (in milliseconds) before sending each get-items request
--object-writer-ack-retries=5
    Number of retries in case of object writer ACK request failures against the object task server
--object-writer-ack-retry-interval=3
    Interval between object writer ACK retries against the object task server [sec]
--object-writer-replicas=24
    Number of replicas to perform object writes
--tree-scanner-container-content-max-nodes-limit=1000
    Max number of tree nodes in a get-container-contents response
--tree-scanner-max-pending-tree-nodes=500
    Max number of pending tree node indexing tasks
--tree-scanner-num-of-tree-node-directory-scanners=3
    Number of tree node directory fetchers per tree node scanner
--tree-scanner-replicas=3
    Number of replicas to perform tree node scans
--max-unused-buffers-pool-size=36
    Maximum number of unused buffers in the buffer pool
--packing-buffer-size=10
    The size of buffers used for packing/unpacking objects [MB]
--profiling=disabled
    Enable/disable profiling
--profiling-port=6060
    Port for the profiling (pprof) web server
--backup-config-spec=
    Path to a .yaml file with the backup config spec. Yaml parameters override command-line arguments. This flag can be specified multiple times; Gibby merges the .yaml files. You could use, for example, one config file with the control/data access keys and another config file with the select configuration to exclude and/or include directories/subtrees.

Example: Configuration file using the select option

This example shows how to filter files during creation of the snapshot.

spec:
  select:
    - container: container_a                 # container name
      # An object is included in the snapshot only if it matches one or more of the include patterns
      # and does not match any of the exclude patterns. No include patterns means include all.
      # Back up the subtree /A/B (anything not under /A/B is not backed up).
      # Exclude the directory /A/B/C/D and the subtree /A/B/C/E from the backup.
      spec:
        - kind: include-subtree              # include-dir/exclude-dir/include-subtree/exclude-subtree
          path: "/A/B"
        - kind: exclude-dir
          path: "/A/B/C/D"                   # note that /A/B/C/D/F will be included ("exclude-dir" kind)
        - kind: exclude-subtree
          path: "/A/B/C/E"                   # note that /A/B/C/E/F will not be included ("exclude-subtree" kind)
    - container: container_b
      .
      .
      .
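
Since --backup-config-spec can be passed more than once and Gibby merges the files, a select configuration like the one above can be kept in its own file and combined with a second file that holds the access keys, as described in the option list. A minimal sketch for the standalone binary, using the hypothetical file names keys-spec.yaml and select-spec.yaml (the yaml schema for the access-key file is not shown in this document, so its contents are an assumption):

./gibctl-<version>-linux-amd64 create snapshot \
--backup-config-spec=keys-spec.yaml --backup-config-spec=select-spec.yaml \
--backup-name <BACKUP NAME> --path <PATH TO GIBBY BACKUP LOCATION>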

Optional Restore Run Arguments

Use the following optional arguments (options) when running your restore.

--containers=users,bigdata
    Comma-separated list of containers to restore
--target-containers=original_container_name1:target_container_name1,original_container_name2:target_container_name2
    Comma-separated list of containers to restore under a different name
--snapshot-id=<snapshotID>
    Specific snapshot to restore
--skip-object-types=tiny_objects,streams
    Comma-separated list of object types to skip during restore [objects,tiny_objects,streams]
--recovery-mode=enabled
    Enable/disable recovery after failure [enabled, disabled]
--scattering-attributes-limit=1700000
    Attributes are split into chunks during restore if their total size is above this limit
--backup-config-spec=
    Path to a .yaml file with the backup config spec. Yaml parameters override command-line arguments
--v3io-retries=5
    Number of retries in case of v3io request failures
--v3io-retry-interval=3
    Interval between v3io retries [sec]
--request-timeout=60
    REST request timeout (against v3io and internal REST servers) [sec]
--report-max-failed-items-limit=50
    Maximum number of failed item details to include in the restore report
--report-server-listen-addr=:58541
    Listening port for the report server
--logger-file-path=/backups/<LOG-FILENAME>
    Outputs the logger to a file
--logger-no-color
    Logger won't output with colors
--object-restorers-replicas=<N>
    Number of object-restorer replicas
--object-restorer-num-of-object-writers=1
    Number of object writers per object restorer
--tree-restorer-replicas=3
    Number of replicas to perform tree node restore
--tree-restorer-max-pending-tree-node-writes=100
    Max number of pending tree node write tasks
--max-unused-buffers-pool-size=36
    Maximum number of unused buffers in the buffer pool
--packing-buffer-size=10
    The size of buffers used for packing/unpacking objects [MB]
--profiling=disabled
    Enable/disable profiling
--profiling-port=6060
    Port for the profiling (pprof) web server
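
As an illustration, a restore that combines several of the optional arguments above with the standalone binary could look like the following. A minimal sketch, reusing the placeholders from the standalone restore command and a hypothetical target container name users_restored:

./gibctl-<version>-linux-amd64 restore backup \
--data-plane-url <WEBAPI URL for TENANT NAMESPACE> --control-plane-url <DASHBOARD URL for TENANT NAMESPACE> \
--data-access-key <DATA ACCESS KEY> --control-access-key <CONTROL ACCESS KEY> \
--backup-name <BACKUP NAME> --path <PATH TO GIBBY BACKUP LOCATION> \
--containers=users --target-containers=users:users_restored --snapshot-id=<snapshotID>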