What are the privacy and security implications of using open source components in AI?

The topic of privacy and security is of utmost concern in AI, and specifically in generative AI. It has ethical, moral and legal implications, as some of the recent data privacy lawsuits show.

One school of thought holds that open source is safer and more private. OpenAI's models, for example, are trained on open sources like GitHub and Wikipedia.

Some of the privacy concerns arise from how these solutions use the data in users' prompts. There have been cases where users received other users' private data, such as phone numbers, in response to their prompts. However, commercial solutions are not necessarily safer.

When tuning your own model, you own the data lifecycle and can control which data goes in. It's important to ensure that data pipelines filter out private information and do not emit it. This call center demo example uses an open source PII (personally identifiable information) recognizer to filter out private data patterns, like names, Social Security numbers and credit card numbers. This ensures private data will not appear in the results when tuning or prompting the model.
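To illustrate the idea, here is a minimal sketch of pattern-based PII redaction in a data pipeline. This is a simplified, regex-only illustration: a real open source PII recognizer, like the one in the demo, covers many more entity types and combines NLP-based detection with patterns. The pattern names and placeholder format below are assumptions for the example.

```python
import re

# Illustrative patterns only -- a production PII recognizer detects many
# more entity types (names, addresses, emails) and does not rely on
# regexes alone.
PII_PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "CREDIT_CARD": re.compile(r"\b(?:\d{4}[- ]?){3}\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[- .]\d{3}[- .]\d{4}\b"),
}

def redact_pii(text: str) -> str:
    """Replace any matched PII pattern with a labeled placeholder."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"<{label}>", text)
    return text

print(redact_pii("Call me at 555-867-5309, SSN 123-45-6789."))
# -> Call me at <PHONE>, SSN <SSN>.
```

Running a step like this over every record before it reaches tuning or prompting means the model never sees the raw private values, so it cannot leak them in its responses.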
