Introducing the Platform's Application Services
In addition to its core data services, the platform comes pre-deployed with proprietary and third-party open-source tools and libraries that support a full data science workflow, from data collection to production (see Introducing the Platform). Both built-in and integrated tools are exposed to the user as application services that the platform manages with Kubernetes. Each application is packaged as a logical unit within a Docker container and is fully orchestrated by Kubernetes, which automates the deployment, scaling, and management of each containerized application. This gives users the flexibility to run any application anywhere as part of their operational pipeline.
The application services can be viewed and managed from the dashboard.
The platform's application development ecosystem includes the following:
- Distributed data frameworks and engines — such as Spark, Presto, Horovod, and Hadoop.
- The Nuclio serverless framework.
- Enhanced support for time-series databases (TSDBs) — including a CLI tool, serverless functions, and integration with Prometheus.
- Jupyter Notebook and Zeppelin interactive web notebooks for development and testing of data science and general data applications.
- A web-based shell service and Jupyter terminals, which provide bash command-line shells for running application services and performing basic file-system operations.
- Integration with popular Python machine-learning and scientific-computation packages for development of ML and artificial intelligence (AI) applications — such as TensorFlow, Keras, scikit-learn, pandas, PyTorch, Pyplot, and NumPy.
- Integration with common Python libraries that enable high-performance Python-based data processing — such as Dask and RAPIDS.
- Support for data science automation (MLOps) services using the MLRun library and Kubeflow Pipelines — including defining, running, and tracking managed, scalable, and portable ML tasks and full workflow pipelines.
- The V3IO Frames open-source unified high-performance DataFrame API library for working with NoSQL, stream, and time-series data in the platform.
- Support for executing code over GPUs.
- Integration with data analytics, monitoring, and visualization tools — including built-in integration with the open-source Grafana metric analytics and monitoring tool, and easy integration with commercial business-intelligence (BI) analytics and visualization tools such as Tableau, Looker, and QlikView.
- Logging and monitoring services for monitoring, indexing, and viewing application-service logs — including a log-forwarder service and integration with Elasticsearch.
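To make one item in the list above concrete, Nuclio functions are typically written as a handler that receives a context and an event for each trigger invocation. The sketch below follows that convention; the event payload and the echoed response are hypothetical examples, not platform-defined:

```python
# Minimal sketch of a Nuclio-style Python handler.
# Nuclio invokes handler(context, event) once per trigger event;
# the JSON payload and response shown here are hypothetical.
import json


def handler(context, event):
    # event.body carries the raw request payload (bytes or str).
    body = event.body
    if isinstance(body, (bytes, bytearray)):
        body = body.decode("utf-8")
    record = json.loads(body)

    # context.logger provides structured logging for the function.
    context.logger.info("processing record")

    # Return a serialized response to the caller.
    return json.dumps({"ok": True, "id": record.get("id")})
```

The same handler code can be deployed as a scalable, containerized service through the platform, with triggers (HTTP, streams, and so on) configured separately from the function logic.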
As a prerequisite to using the platform's application services, you need to configure conditional forwarding for your cluster's DNS server. For more information and step-by-step instructions, see Configuring the DNS Server.
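For orientation only, conditional forwarding routes DNS queries for the cluster's application domain to the cluster's own DNS server. A minimal sketch in BIND 9 syntax follows; both the domain name and the forwarder address are hypothetical placeholders, and the authoritative steps are in Configuring the DNS Server:

```
// Hypothetical BIND 9 conditional-forwarding snippet: queries for the
// cluster's application domain are forwarded to the cluster DNS server.
zone "default-tenant.app.example-cluster.com" {
    type forward;
    forwarders { 192.0.2.10; };  // placeholder cluster DNS address
};
```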
To help you locate the services and tools that interest you, the following is an alphabetical list with links to the relevant documentation:
- Docker Registry
- Kubeflow Pipelines
- Log Forwarder
- MPI Operator
- TSDB CLI
- TSDB Nuclio Functions
- Web Shell