When you use a hosted AI service, your conversations are processed on servers you do not own, by infrastructure you do not control, under terms of service that give the provider broad rights to use your data. For casual personal use, this is an abstract concern. For businesses handling client data, proprietary information, or anything that creates a competitive or regulatory obligation, it is a real and immediate risk that deserves a real and immediate answer.
What “Being Used for Training” Actually Means
Most major AI providers use conversations to train future model versions unless you explicitly opt out. This does not mean your specific conversations are reproduced verbatim — the training process involves showing the model patterns in data, not memorising individual exchanges. But your data participates in shaping how the model behaves for everyone. If you have NDA obligations to clients, or your conversations contain genuinely novel business intelligence, the question of whether that intelligence is entering a training pool you do not control is a real information security question.
The opt-out mechanisms exist, but they are not prominent. Opting out typically requires navigating to organisation-level account settings, finding the correct toggle, and confirming the change. The opt-outs often apply at the account level rather than the conversation level — you cannot opt out of training for one conversation and not others.
What Self-Hosting Actually Changes
A self-hosted AI on a VPS you control eliminates the training data concern entirely. Your prompts and conversations never leave your infrastructure and are never sent to, or accessible by, a third-party provider. There is no opt-out mechanism to find and maintain because there is no third party with access to your data. The data is yours — and no training happens at all unless you choose to fine-tune a model yourself, on data you select.
For businesses, self-hosting also changes the GDPR calculation. Using a hosted AI provider creates a controller–processor relationship that requires a Data Processing Agreement under GDPR. Self-hosting removes that relationship, because there is no third-party processor to contract with — though your own obligations as a data controller remain, as discussed below.
The Security Model Is Yours to Define
Self-hosting does not automatically mean more secure. A poorly configured self-hosted AI can easily be less secure than a well-configured hosted service. When you control the infrastructure, you can define the security posture precisely: TLS on all API connections, authentication required on the API endpoint, a firewall restricting traffic to necessary ports only, and no logging of prompt content to disk.
These are not exotic requirements — they are standard server security practices that any competent systems administrator can implement. The difference is that with self-hosting, you implement them yourself and you control them completely.
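One concrete example of such a measure: Ollama reads the OLLAMA_HOST environment variable at startup, so on a systemd-managed install a drop-in file can keep it bound to the loopback interface, unreachable from the public internet regardless of firewall state. This is a sketch, not an official configuration — the file path assumes the standard systemd service name:

```ini
# /etc/systemd/system/ollama.service.d/override.conf
# Hypothetical systemd drop-in (assumes the service is named "ollama").
# Binding to 127.0.0.1 means only local processes — such as a reverse
# proxy on the same machine — can reach the API.
[Service]
Environment="OLLAMA_HOST=127.0.0.1:11434"
```

After creating the file, run `systemctl daemon-reload` and restart the service for the binding to take effect.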
Want a guide to setting up a private, self-hosted AI on a VPS? I cover privacy, security hardening, and the full setup process — from VPS provisioning to running models — in the guide here.
The Security Audit: What to Check on a Self-Hosted AI
A self-hosted AI deployment should be audited regularly for security vulnerabilities. The most critical checks are: whether the API endpoint is accessible from the public internet without authentication (it should not be), whether TLS is enforced on all connections (it must be), whether the server OS and Ollama are receiving security updates (they should be, automatically), and whether there are any logged conversations or prompts stored on disk in plaintext (there should not be).
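The first and last of those checks can be scripted. The sketch below is a minimal Python version — the port number is Ollama's default, but the log path is purely a hypothetical placeholder; adapt both to your deployment:

```python
# Audit sketch: is the AI endpoint reachable, and are any plaintext
# prompt logs sitting on disk? Run the port check from an OUTSIDE
# machine against your public IP to test internet exposure.
import socket
from pathlib import Path


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def plaintext_logs(paths: list[str]) -> list[str]:
    """Return any of the given paths that exist as non-empty files."""
    return [p for p in paths
            if Path(p).is_file() and Path(p).stat().st_size > 0]


if __name__ == "__main__":
    # 11434 is Ollama's default port. Locally this checks the bind;
    # from a remote machine it checks public exposure.
    print("Port 11434 reachable:", port_open("127.0.0.1", 11434))
    # Hypothetical log location — substitute your own paths.
    print("Plaintext logs found:",
          plaintext_logs(["/var/log/ollama/prompts.log"]))
```

A passing audit from outside your network is the one that matters: the port check should fail, and the log check should come back empty.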
The authentication layer is the most commonly overlooked security element in self-hosted AI deployments. Ollama’s default configuration does not require authentication — it accepts requests from any source that can reach the server’s port. If your Ollama instance is on a VPS with a public IP and no firewall rule blocking port 11434, anyone on the internet can send prompts to your AI and receive responses. Ollama does not ship authentication of its own, so the fix is to bind it to the loopback interface and place a reverse proxy such as Nginx, configured with API key authentication and TLS, in front of it.
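As a concrete sketch, a minimal Nginx server block enforcing a static bearer token in front of a loopback-bound Ollama might look like the following. The domain, certificate paths, and key value are placeholders, not part of any official setup — generate a long random key and store it securely:

```nginx
server {
    listen 443 ssl;
    server_name ai.example.com;  # placeholder domain

    # Placeholder certificate paths (e.g. from Let's Encrypt).
    ssl_certificate     /etc/letsencrypt/live/ai.example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/ai.example.com/privkey.pem;

    location / {
        # Reject any request that does not carry the expected token.
        if ($http_authorization != "Bearer CHANGE-ME-LONG-RANDOM-KEY") {
            return 401;
        }
        # Forward authenticated requests to the loopback-bound Ollama.
        proxy_pass http://127.0.0.1:11434;
        proxy_set_header Host $host;
    }
}
```

Clients then call `https://ai.example.com` with an `Authorization: Bearer …` header; requests straight to port 11434 from outside never arrive, because Ollama is listening only on loopback.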
The GDPR Compliance Angle
For businesses in the EU or handling data of EU residents, GDPR applies to any system that processes personal data. A self-hosted AI that processes conversations containing personal data — names, email addresses, business information, anything that identifies an individual — is a data processing system that falls under GDPR. The obligations this creates are: a lawful basis for processing (usually contractual necessity or legitimate interest), data minimisation (not processing more than necessary), security (appropriate technical measures), and the right of data subjects to access and delete their data.
For most small businesses using self-hosted AI for their own internal workflows — drafting, research, internal communications — the risk profile is low. The data is not shared with third parties, is processed only on your own infrastructure, and is not retained beyond the current session unless you explicitly configure logging. The key compliance action is to conduct a basic Data Protection Impact Assessment (DPIA) for your AI workflows, document your lawful basis, and ensure that the technical security measures described above are in place. This is not a legal obligation for most small businesses, but it is good practice and demonstrates due diligence if a question ever arises.