Backup Using Gibby
Overview
The Gibby utility allows you to back up all of your data that is stored on Iguazio's MLOps Platform. Gibby can run on an application node as a Kubernetes job, or as a standalone utility on a dedicated server.
Gibby as a Kubernetes Job
The following section describes how to deploy and use Gibby as a Kubernetes job.
Backing Up
STEP 1 - Create a Persistent Volume and Claim
1. Create a directory named pv under /home/iguazio on k8s-node1 (the first app node).
2. Use the following YAML to create a persistent volume on app node 1 by running kubectl apply -f <yaml name>. Edit the size and path as needed; the path should point at the directory created in step 1:
```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: gibby-pv
spec:
  capacity:
    storage: 100Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Delete
  storageClassName: local-storage
  local:
    path: /tmp/pv
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - k8s-node1
```
3. Use the following YAML to create a volume claim by running kubectl apply -f <yaml name> (edit the size as needed):
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: gibby-pv-claim
spec:
  storageClassName: local-storage
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 99Gi
```
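Note that a claim binds only when its storage request fits within the volume's capacity; with the manifests above, the 99Gi request must fit in the 100Gi volume, otherwise the PVC stays Pending. A minimal arithmetic sketch of that constraint (the Gi figures are the ones from the two manifests):

```shell
# Sanity-check sketch: a PVC binds only if its storage request fits within
# the PV capacity. Values are taken from gibby-pv and gibby-pv-claim above.
PV_CAPACITY_GI=100
PVC_REQUEST_GI=99
if [ "$PVC_REQUEST_GI" -le "$PV_CAPACITY_GI" ]; then
  echo "claim fits"
else
  echo "claim exceeds capacity - PVC will stay Pending"
fi
```

If you enlarge the claim, enlarge the volume (and the disk space behind its path) to match.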
STEP 2 - Create a Control Access Key and a Data Access Key (run the commands in this step on the data node)
1. Create a session:

```shell
http --session sys --auth sys:IgZBackup! post http://127.0.0.1:8001/api/sessions
```

2. Create a control-plane access key, and from the output save the value of the id key (located above relationships) for later use:

```shell
echo '{"data": {"type": "access_key", "attributes": {"plane": "control"}}}' | http --session sys post http://127.0.0.1:8001/api/access_keys
```

3. Create a data-plane access key, and from the output save the value of the id key (located above relationships) for later use:

```shell
echo '{"data": {"type": "access_key", "attributes": {"plane": "data"}}}' | http --session sys post http://127.0.0.1:8001/api/access_keys
```
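If you prefer not to copy the id by hand, it can be extracted from the response with a small text filter. This is only a sketch: the JSON below is a stand-in for the real POST output (the id shown is the example key used elsewhere in this guide, not a live credential), and the sed expression simply captures the first "id" value in the document:

```shell
# Sketch: extract the access-key id from an API response without manual copying.
# "response" stands in for the output of the http POST command above.
response='{"data":{"type":"access_key","id":"40b59ba2-0e59-4ab7-8843-1bcdb1fb79ef","relationships":{}}}'
key_id=$(printf '%s' "$response" | sed -n 's/.*"id":"\([^"]*\)".*/\1/p')
echo "$key_id"
```

Pipe the real POST output through the same sed filter to capture the id into a variable for the YAML in the next step.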
STEP 3 - Run the Backup
1. Fill in the YAML below with all the relevant details and start the backup by running kubectl apply -f <yaml name>. Change the path as needed, and fill in --control-access-key and --data-access-key with the id values saved in the previous step (items 2 and 3). Modify --control-plane-url and --data-plane-url based on your system paths, and verify that each URL starts with https://.
```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: gibbybackupjob
  # Optional:
  # namespace: <ANY NAMESPACE>
spec:
  template:
    spec:
      containers:
        - name: gibby
          image: gcr.io/iguazio/gibby:0.8.4
          volumeMounts:
            - mountPath: /tmp/bak
              name: backups-volume
          args:
            - "create"
            - "snapshot"
            - "--control-plane-url=<DASHBOARD URL (example - https://dashboard.default-tenant.app.dev84.lab.iguazeng.com)>"
            - "--data-plane-url=<WEBAPI URL (example - https://webapi.default-tenant.app.dev84.lab.iguazeng.com)>"
            - "--control-access-key=<CONTROL ACCESS KEY EXTRACTED IN STEP 2>"
            - "--data-access-key=<DATA ACCESS KEY EXTRACTED IN STEP 2>"
            - "--backup-name=<BACKUP NAME>"
            - "--path=/tmp/bak"
            # Optional:
            # Comma-separated list of containers to back up
            # - "--containers=users,bigdata"
            # Split-size threshold for backup files [MB]
            # - "--file-size-limit=512"
            # Max data size to include along with object attributes [bytes] (<256KB) (v3.0.1+ only)
            # - "--object-scanner-max-included-data-size=131072"
            # Number of object scanner replicas
            # - "--object-scanner-replicas=<N>"
            # Comma-separated list of object types to skip during backup [objects,tiny_objects,streams]
            # - "--skip-object-types=tiny_objects,streams"
            # Enable/disable recovery after failure [enabled, disabled]
            # - "--recovery-mode=enabled"
            # Comma-separated list of backuppers to run [v3io,platform_resources]
            # - "--backuppers=v3io,platform_resources"
            # List of platform resources to back up, separated by semicolons and including the URL query
            # - "--platform-resources=users?include=user_groups,primary_group,data_policy_layers,data_lifecycle_layers;user_groups;data_policy_groups;data_policy_layers;data_lifecycle_layers"
            # Output the logger to a file
            # - "--logger-file-path=/backups/<LOG-FILENAME>"
            # Disable colored logger output
            # - "--logger-no-color"
      restartPolicy: Never
      volumes:
        - name: backups-volume
          persistentVolumeClaim:
            claimName: <PVC NAME CREATED IN STEP 1>
```
Restore
STEP 1 - Run the Restore
1. Fill in the YAML below with all the relevant details and start the restore by running kubectl apply -f <yaml name>:
```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: gibbyrestorejob
  # Optional:
  # namespace: <ANY NAMESPACE>
spec:
  template:
    spec:
      containers:
        - name: gibby
          image: quay.io/iguazio/gibby:0.7.3
          volumeMounts:
            - mountPath: /tmp/bak
              name: backups-volume
          args:
            - "restore"
            - "backup"
            - "--control-plane-url=<DASHBOARD URL (example - https://dashboard.default-tenant.app.dev84.lab.iguazeng.com)>"
            - "--data-plane-url=<WEBAPI URL (example - https://webapi.default-tenant.app.dev84.lab.iguazeng.com)>"
            - "--control-access-key=<CONTROL ACCESS KEY EXTRACTED IN STEP 2>"
            - "--data-access-key=<DATA ACCESS KEY EXTRACTED IN STEP 2>"
            - "--backup-name=<BACKUP NAME>"
            - "--path=/tmp/bak"
            # Optional:
            # Comma-separated list of containers to restore
            # - "--containers=users,bigdata"
            # Comma-separated list of containers to restore under a different name
            # - "--target-containers=original_container_name1:target_container_name1,original_container_name2:target_container_name2"
            # Specific snapshot to restore
            # - "--snapshot-id=<snapshotID>"
            # Number of object restorer replicas
            # - "--object-restorers-replicas=<N>"
            # Comma-separated list of object types to skip during restore [objects,tiny_objects,streams]
            # - "--skip-object-types=tiny_objects,streams"
            # Enable/disable recovery after failure [enabled, disabled]
            # - "--recovery-mode=enabled"
            # Output the logger to a file
            # - "--logger-file-path=/backups/<LOG-FILENAME>"
            # Disable colored logger output
            # - "--logger-no-color"
      restartPolicy: Never
      volumes:
        - name: backups-volume
          persistentVolumeClaim:
            claimName: <PVC NAME CREATED IN STEP 1>
```
Gibby as a Stand-Alone Utility
Gibby can be run as a standalone backup/restore utility.
Backup
- Contact Iguazio customer support for the latest binary package of Gibby.
- Run the following command, adding your values for the data-access-key and control-access-key fields:
```shell
./gibctl-<version>-linux-amd64 create snapshot \
  --data-plane-url <WEBAPI URL for TENANT NAMESPACE> --control-plane-url <DASHBOARD URL for TENANT NAMESPACE> \
  --data-access-key <DATA ACCESS KEY> --control-access-key <CONTROL ACCESS KEY> \
  --backup-name <BACKUP NAME> --path <PATH TO GIBBY BACKUP LOCATION>
```
Example

```shell
--control-plane-url https://dashboard.default-tenant.app.satsdl.satsnet.com.sg --data-access-key 40b59ba2-0e59-4ab7-8843-1bcdb1fb79ef \
  --control-access-key 1eec7eaa-9064-4699-8c9b-b2327943b0ae --backup-name efi-test --path /home/iguazio/backup
```
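Because the full command line is long, one option is a small wrapper that assembles the invocation from variables and prints it for review before you run it. Everything below (binary version, URLs, names, and paths) is a placeholder to replace with your own values; this sketch only prints the command, it does not execute it:

```shell
# Sketch: assemble a gibctl backup invocation from variables so repeated runs
# stay consistent. All values are placeholders - review the printed command
# before running it for real.
GIBCTL="./gibctl-<version>-linux-amd64"
DATA_PLANE_URL="<WEBAPI URL for TENANT NAMESPACE>"
CONTROL_PLANE_URL="<DASHBOARD URL for TENANT NAMESPACE>"
BACKUP_NAME="<BACKUP NAME>"
BACKUP_PATH="<PATH TO GIBBY BACKUP LOCATION>"

CMD="$GIBCTL create snapshot \
 --data-plane-url $DATA_PLANE_URL --control-plane-url $CONTROL_PLANE_URL \
 --data-access-key <DATA ACCESS KEY> --control-access-key <CONTROL ACCESS KEY> \
 --backup-name $BACKUP_NAME --path $BACKUP_PATH"
echo "$CMD"
```

Keeping the access keys out of the script (pasting them only at run time, or exporting them as environment variables) avoids leaving credentials on disk.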
Optional Backup Run Arguments
Use the following optional arguments (options) when running your backup.
| Option | Description |
| --- | --- |
| -c, --containers=users,bigdata | Comma-separated list of containers to back up |
| --skip-object-types=tiny_objects,streams | Comma-separated list of object types to skip during backup [objects,tiny_objects,streams] |
| --recovery-mode=enabled | Enable/disable recovery after failure [enabled, disabled] |
| --backuppers=v3io,platform_resources | Comma-separated list of backuppers to run [v3io,platform_resources] |
| --platform-resources=users?include=user_groups, | List of platform resources to back up, separated by semicolons and including the URL query |
| --attribute-scattering=enabled | Enable/disable attribute scattering [enabled/disabled] |
| --available-storage-space-margin=25 | Additional storage-space overhead required for backup [percentage]. Set to 0 to disable storage-space validation |
| --backup-config-spec= | Path to a YAML file with the backup config spec. YAML parameters override command-line arguments |
| --v3io-retries=5 | Number of retries in case of v3io request failures |
| --v3io-retry-interval=3 | Interval between v3io retries [sec] |
| --request-timeout=60 | REST request timeout (against v3io and internal REST servers) [sec] |
| --file-size-limit=512 | Split-size threshold for backup (pak) files [MB] |
| --object-scanner-task-listen-addr=:48541 | Listening address for the object task server |
| --report-max-failed-items-limit=50 | Maximum number of failed-item details to include in the report |
| --report-server-listen-addr=:58541 | Listening address for the report server |
| --logger-file-path=/backups/ | Outputs the logger to a file |
| --logger-no-color | Disable colored logger output |
| --object-scanner-max-included-data-size=131072 | Max data size to include along with object attributes [bytes] (<256KB) (v3.0.1+ only) |
| --object-scanner-replicas= | Number of object scanner replicas. By default, equal to the number of VNs, or set according to the available memory size |
| --object-scanner-max-concurrent-writes= | Maximum number of concurrent object-body write tasks per object scanner (object-body throttling). Set to a negative number for unlimited |
| --object-scanner-max-pending-object-bodies=500 | Max number of pending object-body write tasks per object scanner |
| --object-scanner-max-failed-object-bodies= | Maximum number of failed object bodies before a backup abort is triggered |
| --object-scanner-max-pending-object-attributes=250 | Max number of object attributes pending packing per object scanner |
| --object-writer-ack-retries=5 | Number of retries in case of object-writer ACK request failures against the object task server |
| --object-writer-ack-retry-interval=3 | Interval between object-writer ACK retries against the object task server [sec] |
| --object-writer-replicas=24 | Number of replicas to perform object writes |
| --tree-scanner-container-content-max-nodes-limit=1000 | Max number of tree nodes in a get-container-contents response |
| --tree-scanner-max-pending-tree-nodes=500 | Max number of pending tree-node indexing tasks |
| --tree-scanner-num-of-tree-node-directory-scanners=3 | Number of tree-node directory fetchers per tree-node scanner |
| --tree-scanner-replicas=3 | Number of tree-node scanner replicas |
| --max-unused-buffers-pool-size=36 | Maximum number of unused buffers in the buffer pool |
| --packing-buffer-size=10 | The size of the buffers used for packing/unpacking objects [MB] |
| --profiling=disabled | Enable/disable profiling |
| --profiling-port=6060 | Port for the profiling (pprof) web server |
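As a rough illustration of how --file-size-limit behaves: the backup is split into pak files of at most the configured size, so the resulting file count is approximately the total data size divided by the limit, rounded up. A hypothetical sizing sketch (the 2000 MB total is an invented figure for illustration only):

```shell
# Sizing sketch: estimate how many split (pak) files a backup produces
# for a given --file-size-limit. Illustrative arithmetic only.
TOTAL_MB=2000        # hypothetical total backup size
LIMIT_MB=512         # --file-size-limit value
PAK_FILES=$(( (TOTAL_MB + LIMIT_MB - 1) / LIMIT_MB ))   # ceiling division
echo "$PAK_FILES"
```

A smaller limit yields more, smaller files, which can be easier to copy to capacity-limited media; factor this into the size of the backup volume created in step 1.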
Restore
Use the following command to restore your backup.
```shell
./gibctl-<version>-linux-amd64 restore backup \
  --data-plane-url <WEBAPI URL for TENANT NAMESPACE> --control-plane-url <DASHBOARD URL for TENANT NAMESPACE> \
  --data-access-key <DATA ACCESS KEY> --control-access-key <CONTROL ACCESS KEY> \
  --backup-name <BACKUP NAME> --path <PATH TO GIBBY BACKUP LOCATION>
```
Example

```shell
--control-plane-url https://dashboard.default-tenant.app.satsdl.satsnet.com.sg --data-access-key 40b59ba2-0e59-4ab7-8843-1bcdb1fb79ef \
  --control-access-key 1eec7eaa-9064-4699-8c9b-b2327943b0ae --backup-name backup-test --path /home/iguazio/backup
```
Optional Restore Run Arguments
Use the following optional arguments (options) when running your restore.
| Option | Description |
| --- | --- |
| --containers=users,bigdata | Comma-separated list of containers to restore |
| --target-containers=original_container_name1:target_container_name1,original_container_name2:target_container_name2 | Comma-separated list of containers to restore under a different name |
| --snapshot-id= | Specific snapshot to restore |
| --skip-object-types=tiny_objects,streams | Comma-separated list of object types to skip during restore [objects,tiny_objects,streams] |
| --recovery-mode=enabled | Enable/disable recovery after failure [enabled, disabled] |
| --scattering-attributes-limit=1700000 | Attributes are split into chunks during restore if their total size is above this limit |
| --backup-config-spec= | Path to a YAML file with the backup config spec. YAML parameters override command-line arguments |
| --v3io-retries=5 | Number of retries in case of v3io request failures |
| --v3io-retry-interval=3 | Interval between v3io retries [sec] |
| --request-timeout=60 | REST request timeout (against v3io and internal REST servers) [sec] |
| --report-max-failed-items-limit=50 | Maximum number of failed-item details to include in the restore report |
| --report-server-listen-addr=:58541 | Listening address for the report server |
| --logger-file-path=/backups/ | Outputs the logger to a file |
| --logger-no-color | Disable colored logger output |
| --object-restorers-replicas= | Number of object restorer replicas |
| --object-restorer-num-of-object-writers=1 | Number of object writers per object restorer |
| --tree-restorer-replicas=3 | Number of tree-node restore replicas |
| --tree-restorer-max-pending-tree-node-writes=100 | Max number of pending tree-node write tasks |
| --max-unused-buffers-pool-size=36 | Maximum number of unused buffers in the buffer pool |
| --packing-buffer-size=10 | The size of the buffers used for packing/unpacking objects [MB] |
| --profiling=disabled | Enable/disable profiling |
| --profiling-port=6060 | Port for the profiling (pprof) web server |
Note: The --verify false flag must be added to the backup and restore YAML files.