Scaling NLP Pipelines at S&P Global (IHS Markit)

Name: Scaling NLP Pipelines at S&P Global (IHS Markit)
Uploaded: 2021-12-15T13:28:27+00:00
Duration: 59 min 38 s
Description: The data science team at S&P Global (IHS Markit) share practical advice on building sophisticated NLP pipelines that work at scale.

The data science team at S&P Global (IHS Markit) share practical advice on building sophisticated NLP pipelines that work at scale. Using a robust and automated MLOps process, they run complex models that make massive amounts of unstructured data searchable and indexable.

In this session, they share their journey with MLOps and provide practical advice for other data science teams looking to:

Ingest, prepare, classify and index structured and unstructured data (in this case, PDFs and images)
Handle terabytes of data in hours, not months
Make deployment of models seamless by working in one unified research and production environment
Leverage CI/CD for ML
Allow for sharing and reuse of components across projects and teams
Utilize auto-scaling serverless functions to abstract away infrastructure complexities
Build rapidly, iterate faster, and focus on the business logic and not the underlying infrastructure

Nick and Yaron share their approach to automating the NLP pipeline end to end. They’ll also touch on how to use Iguazio and the open-source MLRun framework, which comes with capabilities such as Spot integration and Serving Graphs, to reduce costs and to accelerate and simplify the data science process.
