Python Machine-Learning and Scientific-Computation Packages

The platform's Jupyter Notebook service pre-deploys the pandas open-source Python library for high-performance data processing using structured DataFrames ("pandas DataFrames"). The platform also pre-deploys other Python packages that use pandas DataFrames, such as the Dask parallel-computation library and Iguazio's V3IO Python SDK and V3IO Frames libraries.

You can install additional Python machine-learning (ML) and scientific-computation packages, such as TensorFlow, Keras, scikit-learn, PyTorch, Matplotlib's pyplot, and NumPy. The platform's architecture is designed to deploy computation to one or more CPUs or GPUs through a single Python API.
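Before installing anything, it can help to check which of these packages are already available in the notebook environment. The following is a minimal sketch that uses only the Python standard library; the `find_missing` helper and the `PACKAGES` list are hypothetical names introduced for illustration and are not part of the platform.

```python
import importlib.util

# Import names to probe. The import name can differ from the install name:
# scikit-learn is imported as "sklearn" and PyTorch as "torch".
PACKAGES = ["tensorflow", "keras", "sklearn", "torch", "matplotlib", "numpy"]


def find_missing(import_names):
    """Return the top-level import names that are not found in this environment."""
    return [name for name in import_names
            if importlib.util.find_spec(name) is None]


missing = find_missing(PACKAGES)
print("Missing packages:", missing or "none")
```

Any package reported as missing can then be installed with a package manager such as pip, from a notebook cell or a terminal.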

For example, you can install the TensorFlow open-source library for numerical computation using data-flow graphs. You can use TensorFlow to train a logistic-regression model for prediction, or a deep-learning model, and then deploy the same model in production on the same platform instance as part of your operational pipeline. You can develop the data-science and training portion using recent field data, while the automated development-to-production workflow shortens the time to insights. All the required functionality is available on a single platform with enterprise-grade security and a fine-grained access policy, which gives each team visibility into the data according to its organizational needs.

The following Python code excerpt shows how to train a TensorFlow model on the platform and evaluate the quality of the model's predictions. The names `model`, `input_fn`, `train_data`, `test_data`, `num_epochs`, and `batch_size` are assumed to be defined elsewhere in the full sample:

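# Train the model on the training data
model.train(
    input_fn=lambda: input_fn(train_data, num_epochs, True, batch_size))
# Evaluate the trained model on the test data
results = model.evaluate(input_fn=lambda: input_fn(
    test_data, 1, False, batch_size))
# Print each evaluation metric, sorted by name
for key in sorted(results):
    print('%s: %s' % (key, results[key]))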
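To make the train-then-evaluate flow concrete without requiring TensorFlow, here is a self-contained sketch of logistic regression in plain Python. It is not the platform's or TensorFlow's implementation: the `make_data`, `train`, and `evaluate` functions, the synthetic data set, and the learning rate are all assumptions chosen for illustration. It ends with the same metric-printing loop as the excerpt above.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def make_data(n, rng):
    """Synthetic samples (hypothetical): label is 1 when x0 + x1 > 1, else 0."""
    data = []
    for _ in range(n):
        x0, x1 = rng.random(), rng.random()
        data.append(((x0, x1), 1 if x0 + x1 > 1.0 else 0))
    return data


def predict(model, features):
    weights, bias = model
    return sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)


def train(data, num_epochs, learning_rate):
    """Fit weights and a bias with stochastic gradient descent on log loss."""
    weights, bias = [0.0] * len(data[0][0]), 0.0
    for _ in range(num_epochs):
        for features, label in data:
            # Gradient of the log loss with respect to the logit
            error = predict((weights, bias), features) - label
            weights = [w - learning_rate * error * x
                       for w, x in zip(weights, features)]
            bias -= learning_rate * error
    return weights, bias


def evaluate(model, data):
    """Return accuracy and mean log loss as a dict of named metrics."""
    eps = 1e-12  # guards against log(0)
    correct, loss = 0, 0.0
    for features, label in data:
        p = predict(model, features)
        correct += int((p >= 0.5) == bool(label))
        loss -= (label * math.log(max(p, eps))
                 + (1 - label) * math.log(max(1.0 - p, eps)))
    return {"accuracy": correct / len(data), "loss": loss / len(data)}


rng = random.Random(0)  # fixed seed so runs are repeatable
train_data, test_data = make_data(200, rng), make_data(100, rng)
model = train(train_data, num_epochs=50, learning_rate=0.5)
results = evaluate(model, test_data)
for key in sorted(results):
    print('%s: %s' % (key, results[key]))
```

Returning the metrics as a dictionary keeps the reporting loop independent of which metrics the evaluation step computes.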

The image-classification-with-distributed-training demo shows how to build an image recognition and classification ML model and perform distributed model training by using Horovod, Keras, TensorFlow, and Nuclio.
