Data Science Projects with DeepSeek: Key Benefits and Risks

In January 2025, Chinese AI development firm DeepSeek released its R1 LLM. The release exploded the notion that developing LLMs requires enormous technical and financial resources: R1 reportedly cost a fraction of what comparable LLMs cost to develop. And, in a major market upset, R1 was also released under an open-source license.

DeepSeek R1 is a direct challenger to o1, the reasoning model from OpenAI, which has led the AI market since the debut of ChatGPT in 2022.

The DeepSeek AI assistant was an immediate hit. The mobile app quickly topped the official app stores’ download lists and caused major turbulence in the stock markets. But it has also raised suspicions regarding its data privacy policy.

What Does the DeepSeek Coder Do?

DeepSeek Coder is an AI-powered coding assistant that provides code suggestions, automates debugging, and optimizes data query performance. In data science, its primary goal is to reduce the time spent on repetitive coding tasks. It streamlines workflows and recommends coding best practices. It also suggests fixes for syntax errors and logical flaws, and it can help with searching and querying large datasets.

DeepSeek supports decision-making with these specific features:

Feature Engineering. Automates the encoding, scaling, and transformation of variables. Recommends techniques for feature selection.
Data Pipeline Automation. Automates ETL (Extract, Transform, Load) processes. Suggests best practices for managing both real-time and batch data processing.
Exploratory Data Analysis (EDA). Offers efficient methods to summarize data. Helps identify key insights through automated statistical analysis.
AI-Assisted Statistical Analysis. Generates hypothesis tests and confidence intervals. Assists with time series forecasting and regression analysis.
AI-Powered Documentation. Generates comprehensive documentation and automated code reviews. Enhances collaboration and supports quality control.
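
To make the list above concrete, here is a minimal sketch of the kind of repetitive feature engineering and EDA code that a coding assistant is typically asked to write. It is an illustration only, not DeepSeek output: the function names and the sample data are hypothetical, and it uses only the Python standard library.

```python
from statistics import mean, stdev


def min_max_scale(values):
    """Rescale numeric values to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def one_hot(labels):
    """Encode categorical labels as 0/1 indicator rows, columns in sorted order."""
    categories = sorted(set(labels))
    rows = [[1 if label == c else 0 for c in categories] for label in labels]
    return categories, rows


def summarize(values):
    """Basic EDA summary: count, mean, sample standard deviation, min, max."""
    return {
        "n": len(values),
        "mean": mean(values),
        "stdev": stdev(values),
        "min": min(values),
        "max": max(values),
    }


# Hypothetical sample data.
ages = [20, 30, 40, 60]
plans = ["free", "pro", "free", "team"]

scaled = min_max_scale(ages)
categories, encoded = one_hot(plans)
summary = summarize(ages)
```

In a real project these steps would more likely come from a library such as pandas or scikit-learn; the point is that the logic is simple enough for a human to review line by line.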

The Differences Between DeepSeek R1 and OpenAI o1

DeepSeek R1’s systematic, iterative method excels in coding tasks and mathematical reasoning. It can solve very complex problems but is slower than OpenAI o1. It is released as open source, which makes it budget-friendly and a good option for users with limited computational power. R1 can be used through cost-effective API integrations or as a self-hosted, locally deployed solution. As a drawback, complex visual data may trip up DeepSeek.

In contrast, OpenAI o1 is faster to deliver accurate coding outputs. It outperforms DeepSeek in visual processing and excels at interpreting misleading graph information. OpenAI o1’s main drawback is cost. However, its speed and excellent performance may justify the expense.

Exposing Data Security and Privacy Concerns

The privacy policies and terms of use of commercial AI models have always been controversial. AI vendors have been scraping the internet for training data for years, not pausing to distinguish between copyrighted works and personal data or asking anyone’s permission.

AI research leader OpenAI’s policies currently state that paying users’ data is kept private. OpenAI also uses secure data transmission protocols and state-of-the-art data encryption practices. Even so, data science researchers are safer using a VPN to encrypt their internet connections. A VPN adds a layer of protection for data in transit and helps shield sensitive information during analysis. Most researchers also try to keep sensitive data out of conversations with AI interfaces.
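
Keeping sensitive data out of AI conversations can be partly automated. The following is a minimal sketch, assuming that email addresses and US Social Security numbers are the identifiers to strip; the patterns and the `redact` helper are hypothetical and deliberately simple, so they will not catch every format.

```python
import re

# Simplified patterns for illustration; real projects need broader coverage.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def redact(prompt: str) -> str:
    """Replace email addresses and SSN-like numbers with placeholders."""
    prompt = EMAIL.sub("[EMAIL]", prompt)
    prompt = SSN.sub("[SSN]", prompt)
    return prompt


raw = "Customer jane.doe@example.com (SSN 123-45-6789) churned in March."
safe = redact(raw)
print(safe)
```

Running the redaction before a prompt leaves the analyst's machine means the AI vendor never receives the original identifiers, whatever its privacy policy says.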

However, in February 2025, researchers revealed problems with the DeepSeek app’s security. They found several security and privacy issues in the mobile app for Apple iOS.

The DeepSeek app turns off App Transport Security (ATS), the iOS setting that requires apps to send data over encrypted connections. It also uses several poor data encryption practices. Any data transmitted or stored by the DeepSeek app is vulnerable to device tracking and data eavesdropping via several well-known attack paths.
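
For comparison, this is roughly what enforcing encrypted transport looks like on the client side. The sketch below uses Python's standard `ssl` module and is not related to the DeepSeek app's code; the helper names and the example URLs are hypothetical.

```python
import ssl
from urllib.parse import urlparse


def strict_tls_context() -> ssl.SSLContext:
    """Build a TLS context that verifies certificates and hostnames."""
    # create_default_context() enables certificate and hostname checks.
    ctx = ssl.create_default_context()
    # Refuse protocol versions older than TLS 1.2.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


def require_https(url: str) -> str:
    """Return the URL unchanged, or raise if it would send data in cleartext."""
    if urlparse(url).scheme != "https":
        raise ValueError(f"refusing non-HTTPS endpoint: {url}")
    return url


ctx = strict_tls_context()
endpoint = require_https("https://api.example.com/v1/chat")
```

An app that switches off the platform's equivalent safeguards gives up exactly these checks, which is why the finding matters.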

The Key Data Security and Privacy Risks

The key risks identified in the audit of DeepSeek on Apple iOS have since also been found in the Android app and the web application:

Unencrypted data transmission. DeepSeek transmits unencrypted sensitive data over the internet. Your data is vulnerable to interception, manipulation, or corruption.
Weak and incorrectly applied encryption. DeepSeek uses the outdated Triple DES (3DES) cipher. The encryption keys are also hardcoded, and initialization vectors are reused.
Insecure local data storage. Usernames, passwords, and encryption keys are stored insecurely on the device, which puts them at risk of credential theft.
Extensive data collection and fingerprinting. DeepSeek, like most apps, collects a lot of user data. User defaults, file timestamps, and system boot time are among the data that can be de-anonymized and used to identify individuals.
Data is subject to PRC laws. The user data is transmitted to China. It sometimes passes through US-based or ostensibly US-owned servers, some of them linked to the controversial Chinese company ByteDance, but it ends up in China. This practice raises concerns about the PRC’s access to the private information of US citizens.
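
Why hardcoded keys and reused initialization vectors matter can be shown with a toy example. The sketch below is not Triple DES and not DeepSeek's code: it is a deliberately simple stream cipher built from SHA-256, and the key, IV, and messages are all hypothetical. It shows that when the same key and IV encrypt two messages, anyone who knows one message can recover the other without ever learning the key. In block modes such as CBC the leak takes a different form (identical message prefixes produce identical ciphertext blocks), but the underlying problem is the same.

```python
import hashlib


def keystream(key: bytes, iv: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256 over key, IV, and a block counter."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + iv + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]


def xor(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings of equal length."""
    return bytes(x ^ y for x, y in zip(a, b))


def toy_encrypt(key: bytes, iv: bytes, plaintext: bytes) -> bytes:
    """Encrypt by XORing the plaintext with the keystream (toy cipher only)."""
    return xor(plaintext, keystream(key, iv, len(plaintext)))


# Hypothetical values: a key baked into the app and an IV that never changes.
HARDCODED_KEY = b"key-baked-into-the-app"
REUSED_IV = b"\x00" * 8

known = b"prompt: hello world!"   # a message the attacker already knows
secret = b"prompt: Q3 forecast!"  # a message the attacker wants to read

c1 = toy_encrypt(HARDCODED_KEY, REUSED_IV, known)
c2 = toy_encrypt(HARDCODED_KEY, REUSED_IV, secret)

# The attacker uses only the two ciphertexts and the known message.
recovered = xor(xor(c1, c2), known)
```

A hardcoded key makes matters worse, because anyone who extracts it from the app can decrypt every user's data directly.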

Implications for US Enterprises and Government Agencies

The fact that DeepSeek transmits unencrypted data over the internet is a massive red flag. Individual users can encrypt their sessions using a VPN, which protects the traffic between their device and the VPN server. However, they still have to deal with the poor encryption of their data at rest.

Poor data encryption could mean data breaches that include prompt information, the company’s intellectual property, and strategic plans.

There are added risks, such as user fingerprinting and tracking, as well as data analysis and storage in China. US- and EU-based companies risk losing control of their proprietary and user data.

Mitigating the Risks for DeepSeek Data Science Projects

For US and EU organizations, the risk of storing and processing their data in China is a significant concern. They can’t use the mobile or web app without risking their data security and privacy. Theft of intellectual property may be a valid concern.

Every company should review the data collection, privacy policy, and terms of service for potential risks. Some may conclude that the current DeepSeek model presents no risks to them. Other companies may find a satisfactory solution in a self-hosted deployment (e.g. using the open model weights published on Hugging Face) or a fully hosted solution.
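
A self-hosted deployment only helps if prompts really stay on the local machine. As a minimal sketch, a project could route every request through a guard like the one below, which accepts only loopback addresses. The function names and URLs are hypothetical, and teams that host the model on an internal server would need to extend the check to their own network ranges.

```python
import ipaddress
from urllib.parse import urlparse


def is_local_endpoint(url: str) -> bool:
    """Return True only if the URL points at this machine (loopback)."""
    host = urlparse(url).hostname
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal, so it is some other hostname.
        return False


def checked_endpoint(url: str) -> str:
    """Return the URL unchanged, or raise if it would leave the machine."""
    if not is_local_endpoint(url):
        raise ValueError(f"refusing to send prompts to remote endpoint: {url}")
    return url


local_url = checked_endpoint("http://127.0.0.1:8000/v1/chat")
```

The guard fails closed: an unparseable or unknown host counts as remote, so a configuration mistake blocks the request instead of leaking the prompt.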

The Future of AI Data Security and Privacy in Data Science

There is a growing expectation that we will soon achieve fully automated, AI-enhanced predictive modelling and forecasting. AI will manage entire workflows, from data ingestion to deployment.

Currently, AI features such as data cleaning and writing SQL queries can raise productivity and reduce errors. AI serves as an excellent assistant, but humans are still responsible for quality control. Better AI-assisted results free us to keep pursuing human innovation.
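
Quality control of AI-written SQL can be as simple as running the suggested query against a small fixture whose correct answer is already known. The sketch below uses Python's built-in `sqlite3` module; the table, the data, and the "suggested" query are hypothetical stand-ins for whatever an assistant produces.

```python
import sqlite3


def run_on_fixture(query: str):
    """Run a query against a tiny in-memory table with known contents."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("north", 100.0), ("north", 50.0), ("south", 70.0)],
    )
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows


# Hypothetical query suggested by an AI assistant.
suggested_query = (
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
)

# The answer a human worked out by hand for the fixture.
expected = [("north", 150.0), ("south", 70.0)]

result = run_on_fixture(suggested_query)
if result != expected:
    raise AssertionError(f"suggested query returned {result}, expected {expected}")
```

Only after the query passes this check would it be run against production data.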

However, even the best AI tool has limited value if we have to keep looking over our shoulders to see who is watching us. An analysis of DeepSeek’s code shows that our conversations with it aren’t private. But, given the US Big Tech industry’s track record, who can we trust with our data science secrets?