Introducing the Platform
The Iguazio Data Science Platform (“the platform”) is a fully integrated and secure data science platform as a service (PaaS), which simplifies development, accelerates performance, facilitates collaboration, and addresses operational challenges. The platform incorporates the following components:
- A data science workbench that includes Jupyter Notebook, integrated analytics engines, and Python packages
- Model management with experiment tracking and automated pipeline capabilities
- Managed data and machine-learning (ML) services over a scalable Kubernetes cluster
- A real-time serverless functions framework — Nuclio
- An extremely fast and secure data layer that supports SQL, NoSQL, time-series databases, files (simple objects), and streaming
- Integration with third-party data sources such as Amazon S3, HDFS, SQL databases, and streaming or messaging protocols
- Real-time dashboards based on Grafana
Data Science Workflow
The Iguazio Data Science Platform provides a complete data science workflow in a single ready-to-use platform that includes all the required building blocks for creating data science applications from research to production:
- Collect, explore, and label data from various real-time or offline sources
- Run ML training and validation at scale over multiple CPUs and GPUs
- Deploy models and applications into production with serverless functions
- Log, monitor, and visualize all your data and services
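As a rough illustration of the deployment step in the list above, a Nuclio Python function is typically written as a handler that receives a context and an event. The following is a minimal sketch under stated assumptions, not the platform's demo code: the `classify` rule is a hypothetical placeholder for a trained model, and `make_local_context` and `make_local_event` are hypothetical stand-ins for the objects that Nuclio would pass in, so that the handler can be tried locally.

```python
import json
from types import SimpleNamespace


def classify(features):
    """Hypothetical placeholder for a trained model: thresholds the mean value."""
    mean = sum(features) / len(features)
    return "high" if mean > 0.5 else "low"


def handler(context, event):
    """Entry point written in the style of a Nuclio Python function."""
    body = event.body
    # The body may arrive as raw bytes, as text, or already parsed.
    if isinstance(body, (bytes, bytearray)):
        body = body.decode("utf-8")
    if isinstance(body, str):
        body = json.loads(body)
    label = classify(body["features"])
    context.logger.info("predicted label: " + label)
    return json.dumps({"label": label})


def make_local_context():
    """Assumed local stand-in for the context object (logger only)."""
    logger = SimpleNamespace(info=lambda message: None)
    return SimpleNamespace(logger=logger)


def make_local_event(payload):
    """Assumed local stand-in for an event carrying a JSON body."""
    return SimpleNamespace(body=json.dumps(payload).encode("utf-8"))
```

Calling `handler(make_local_context(), make_local_event({"features": [0.9, 0.8]}))` exercises the same code path locally that a deployed function would run per request.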
The Tutorial Notebooks
The home directory of the platform’s running-user directory (
- To view and run the tutorials from the platform, you first need to create a Jupyter Notebook service, if you don’t already have one (see instructions).
The welcome.ipynb notebook and main README.md file provide a similar introduction to that available on the current page, in different formats.
Start out by running the getting-started tutorial to familiarize yourself with the platform and experience firsthand some of its main capabilities.
End-to-End Use-Case Applications
Iguazio provides full end-to-end use-case applications (demos) that demonstrate how to use the platform and related tools to address data science requirements for different industries and implementations.
Pre-Deployed Platform Demos
The platform comes pre-deployed with the following end-to-end use-case demos, which are available in the
- Natural language processing (NLP) — processes natural-language textual data and generates a Nuclio serverless function that translates any given text string to another (configurable) language.
- Stream enrichment — implements a typical stream-based data-engineering pipeline, including real-time data enrichment using a NoSQL table.
- Smart stock trading — reads stock-exchange data from an internet service into a time-series database (TSDB) and performs real-time market-sentiment analysis on specific stocks; the data is saved to a platform NoSQL table for generating reports and analyzing and visualizing the data on a Grafana dashboard.
- Real-time user segmentation — builds a stream-event processor on a sliding time window for tagging and untagging users based on programmatic rules of user behavior.
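The sliding-window tagging idea behind the user-segmentation demo can be sketched in a few lines of plain Python. This is only an illustrative sketch and not the demo's implementation: the `SlidingWindowTagger` class, its single rule ("at least `threshold` events within the window"), and the tag name are all hypothetical, and events are assumed to arrive in timestamp order for each user.

```python
from collections import defaultdict, deque


class SlidingWindowTagger:
    """Tags a user while they have enough events inside a sliding time window."""

    def __init__(self, window_seconds, threshold, tag):
        self.window_seconds = window_seconds
        self.threshold = threshold
        self.tag = tag
        self.events = defaultdict(deque)  # user -> timestamps inside the window
        self.tags = defaultdict(set)      # user -> currently assigned tags

    def process(self, user, timestamp):
        """Record one event and return the user's tags after applying the rule."""
        window = self.events[user]
        window.append(timestamp)
        # Evict events that have slid out of the window.
        while window and window[0] <= timestamp - self.window_seconds:
            window.popleft()
        if len(window) >= self.threshold:
            self.tags[user].add(self.tag)
        else:
            self.tags[user].discard(self.tag)
        return set(self.tags[user])
```

Each call to `process` both tags and untags: a user who was tagged during a burst of activity loses the tag once the older events fall outside the window.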
You can download additional demos from GitHub — for example:
- scikit-learn AutoML pipeline — builds a full end-to-end automated-ML (AutoML) pipeline using scikit-learn and the UCI Iris data set.
- Horovod image classification with distributed training — implements an end-to-end image-classification solution using TensorFlow (versions 1 or 2), Keras, Horovod, and Nuclio.
# Get additional demos
!/User/get-additional-demos.sh
Additional Platform Resources
- Introduction video (available also in the Iguazio Trial Quick-Start tutorial)
- In-depth platform overview with a breakdown of the steps for developing a full data science workflow from development to production (available also as a tutorial notebook —
- Platform components, services, and development ecosystem introduction
- nuclio-jupyter SDK for creating and deploying Nuclio functions with Python and Jupyter Notebook
- Creating Virtual Environments in Jupyter Notebook
- Updating the Tutorial Notebooks to the Latest Version
Creating Virtual Environments in Jupyter Notebook
A virtual environment is a named, isolated, working copy of Python that maintains its own files, directories, and paths so that you can work with specific versions of libraries or Python itself without affecting other Python projects. Virtual environments make it easy to cleanly separate projects and avoid problems with different dependencies and version requirements across components. See the virtual-env tutorial notebook for step-by-step instructions for using Conda to create your own Python virtual environments, which will appear as custom kernels in Jupyter Notebook.
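The general shape of creating a Conda environment and exposing it as a Jupyter kernel can be sketched as follows. This is a hedged sketch and does not reproduce the virtual-env tutorial notebook: the helper names, the default Python version, and the kernel display name are assumptions made for illustration, and the helpers only build the command lines so that you can review them before running anything.

```python
import shlex


def conda_kernel_commands(env_name, python_version="3.7"):
    """Build the commands to create a Conda env and register it as a kernel.

    The default python_version is an assumption; pick the one you need.
    """
    create = [
        "conda", "create", "--yes", "--name", env_name,
        "python=" + python_version, "ipykernel",
    ]
    register = [
        "conda", "run", "--name", env_name,
        "python", "-m", "ipykernel", "install", "--user",
        "--name", env_name,
        "--display-name", "Python (" + env_name + ")",
    ]
    return [create, register]


def as_shell(commands):
    """Render each command as a safely quoted shell line."""
    return [" ".join(shlex.quote(part) for part in cmd) for cmd in commands]


if __name__ == "__main__":
    for line in as_shell(conda_kernel_commands("myenv")):
        print(line)
```

The printed lines can be pasted into a terminal or passed to `subprocess.run`; after the second command completes, the environment should be listed among the available kernels in Jupyter Notebook.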
Updating the Tutorial Notebooks to the Latest Version
The v3io Directory
v3io data mount for browsing the platform data containers.
For information about the predefined data containers and data mounts and how to reference data in these containers, see Platform Data Containers in the
- The documentation is versioned; you are currently viewing the documentation for version 2.8.0. You can see the current version and select a different version from the version-selection box in the main navigation menu at the top of each documentation-site page.
- The software-specifications and release-notes documentation is confidential and restricted to registered users only. For more information, contact firstname.lastname@example.org.
The Iguazio support team will be happy to assist with any questions.