Data Collection

How do I ingest data?

There are several ways to ingest data, including:

Batch ingestion via the API or Python SDK
Real-time ingestion via Kafka, Kinesis, and other streaming sources
Ingestion through a JDBC interface

In addition, you can use our Feature Store for batch or real-time ingestion combined with a series of transformations, aggregations, and joins.
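As a minimal sketch of the batch path, the following Python reads CSV rows and hands them to a sink in fixed-size batches. It uses only the standard library. `ingest_csv`, `batched`, and `write_batch` are hypothetical names chosen for illustration; `write_batch` stands in for whichever API or SDK call actually writes to the platform and is not an Iguazio function.

```python
from __future__ import annotations

import csv
import io
from itertools import islice
from typing import Callable, Iterable, Iterator


def batched(rows: Iterable[dict], size: int) -> Iterator[list[dict]]:
    """Yield lists of at most `size` rows until the input is exhausted."""
    it = iter(rows)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def ingest_csv(
    source,
    write_batch: Callable[[list[dict]], None],
    batch_size: int = 1000,
) -> int:
    """Read CSV rows from a file-like object and pass them to `write_batch`.

    `write_batch` is a hypothetical sink (an assumption, not a platform API).
    Returns the number of rows ingested.
    """
    total = 0
    for batch in batched(csv.DictReader(source), batch_size):
        write_batch(batch)
        total += len(batch)
    return total


# Example run against an in-memory CSV, collecting the batches in a list.
sample = io.StringIO("id,value\n1,a\n2,b\n3,c\n")
received: list[list[dict]] = []
row_count = ingest_csv(sample, received.append, batch_size=2)
```

Batching keeps memory use bounded for large files and lets the sink write many rows per call; the right batch size depends on the target and is left as a parameter here.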

What data sources can you consume data from?

In general, Iguazio can ingest data from any source that has an API or Python SDK. This includes, but is not limited to:

Databases (Oracle, Postgres, Snowflake, etc.)
Files (CSV, TSV, Parquet, etc.)
Streams (Kafka, Kinesis, MQTT, etc.)

What types of data are supported?

In general, the platform supports any type of data. Iguazio works well with both structured and unstructured data.

Do I have to upload data to Iguazio?

No, you do not have to upload data to Iguazio. You can leave your data outside of Iguazio (for example, on S3) and Iguazio can still access it. However, for some workloads, especially real-time workloads, we recommend using Iguazio's key-value database for better performance.

How do you explore the data in Iguazio?

Data scientists typically explore data in Jupyter notebooks. Iguazio is an open environment, so you can use any package or tool (for example, pandas or Plotly) to explore your data. You can also integrate Iguazio with visualization tools for advanced reporting and exploration.
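For instance, a first look at a table in a notebook might resemble the sketch below. It assumes pandas is installed and uses a small made-up DataFrame in place of data loaded from the platform; the column names are illustrative only.

```python
import pandas as pd

# Hypothetical sample data standing in for a table loaded from the platform.
df = pd.DataFrame(
    {
        "device": ["a", "a", "b", "b"],
        "temperature": [20.0, 22.0, 30.0, 34.0],
    }
)

# Summary statistics for the numeric columns (count, mean, std, quartiles).
summary = df.describe()

# Mean temperature per device, a typical first aggregation.
mean_by_device = df.groupby("device")["temperature"].mean()
```

In a notebook, evaluating `summary` or `mean_by_device` in a cell renders the result as a table, and the same DataFrame can be passed to a plotting library for charts.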

What kind of data can be used for training?

There is no limitation. Users can train models on images, videos, audio, text, time series, tabular, or any other data type.

We are required to keep track of data location and trace lineage. How can I manage the data in Iguazio?

Metadata about the data stored in Iguazio can be queried and exported for use by a data governance tool.
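One way to hand such metadata to an external tool is to export it as JSON Lines, as in the sketch below. This is an illustration only: the record fields (`name`, `location`, `derived_from`) and the `export_metadata` helper are assumptions, not the platform's actual metadata schema or API, and the format your governance tool expects may differ.

```python
import io
import json


def export_metadata(records, out) -> int:
    """Write one JSON object per line to `out`; return the number written."""
    count = 0
    for record in records:
        out.write(json.dumps(record, sort_keys=True) + "\n")
        count += 1
    return count


# Hypothetical records; field names and locations are illustrative assumptions.
records = [
    {
        "name": "transactions",
        "location": "s3://example-bucket/transactions.parquet",
        "derived_from": [],
    },
    {
        "name": "daily_totals",
        "location": "example/path/daily_totals",
        "derived_from": ["transactions"],
    },
]

buffer = io.StringIO()
written = export_metadata(records, buffer)
```

Each line is a standalone JSON document, so a downstream tool can process the export record by record and follow `derived_from` to reconstruct lineage.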

Can I use a 3rd party tool for visualization?

Yes, you can connect Iguazio via JDBC to third-party tools including Tableau, Looker, QlikView, and many others.