AI Code Security: How to Avoid the 48% Vulnerability Problem

Nearly half of AI-generated code contains security vulnerabilities. Here's what's going wrong, the most common issues, and a practical checklist to ship secure AI-assisted code.

A Stanford study found that developers using AI coding assistants produced significantly less secure code than those coding manually. GitHub's own research shows 48% of AI-generated code contains security vulnerabilities.

Nearly half.

This isn't a theoretical problem. AI-generated code is shipping to production every day. Some of that code has SQL injection vulnerabilities. Cross-site scripting holes. Authentication bypasses.

Here's what's going wrong—and how to fix it.

Why AI Writes Insecure Code

The Training Data Problem

AI models learn from existing code. GitHub has billions of lines of it. Much of that code is:

  • Old: Written before current security best practices
  • Educational: Tutorials that skip security for clarity
  • Insecure: Production code with vulnerabilities
  • Incomplete: Snippets without full context

The model learns patterns. If it sees 1000 examples of query = "SELECT * FROM users WHERE id = " + user_input, it learns that pattern. It doesn't know that pattern is SQL injection waiting to happen.

The Optimization Problem

AI optimizes for "code that works." Security is a constraint, not the objective.

When you ask: "Write a function to get user data from the database"

AI optimizes for: a function that retrieves user data ✓
AI doesn't optimize for: a function that does so securely

Security requires knowing what could go wrong. AI generates what could go right.

The Context Problem

Security is contextual. The same code can be secure in one context and vulnerable in another.

# Secure in admin dashboard (authenticated, trusted users)
# Vulnerable in public API (untrusted input)
def search_users(query):
    return db.execute(f"SELECT * FROM users WHERE name LIKE '%{query}%'")

AI doesn't know where the code will run. It can't assess the threat model.

The Confidence Problem

AI never says "I'm not sure this is secure." It generates confident-looking code regardless of security implications.

This is dangerous. Confident insecure code gets merged. Uncertain code gets reviewed.

The Most Common AI Security Vulnerabilities

Based on analysis of thousands of AI-generated code samples:

1. SQL Injection (23% of vulnerabilities)

What AI generates:

def get_user(username):
    query = f"SELECT * FROM users WHERE username = '{username}'"
    return db.execute(query)

The vulnerability: User input directly in SQL string. Attacker can inject: admin'-- to bypass authentication.

Secure alternative:

def get_user(username):
    query = "SELECT * FROM users WHERE username = %s"
    return db.execute(query, (username,))

Why AI gets it wrong: Training data is full of f-string queries. They're shorter and clearer in examples, so the model learned the pattern.

2. Cross-Site Scripting / XSS (18% of vulnerabilities)

What AI generates:

function displayComment(comment) {
    document.getElementById('comments').innerHTML += `<div>${comment}</div>`;
}

The vulnerability: User content rendered as HTML. Attacker can inject: <script>stealCookies()</script>

Secure alternative:

function displayComment(comment) {
    const div = document.createElement('div');
    div.textContent = comment;  // Automatically escaped
    document.getElementById('comments').appendChild(div);
}

Why AI gets it wrong: innerHTML is the "easy" way. Appears in countless tutorials.

3. Hardcoded Credentials (15% of vulnerabilities)

What AI generates:

def connect_to_api():
    api_key = "sk_live_abc123xyz789"
    return requests.get(url, headers={"Authorization": f"Bearer {api_key}"})

The vulnerability: Secrets in source code. Committed to git. Visible to anyone with repo access.

Secure alternative:

import os

def connect_to_api():
    api_key = os.environ.get("API_KEY")
    if not api_key:
        raise ValueError("API_KEY environment variable required")
    return requests.get(url, headers={"Authorization": f"Bearer {api_key}"})

Why AI gets it wrong: Training examples embed placeholder keys directly in the code, so the model reproduces the hardcoded-credential pattern with real-looking values.

4. Broken Authentication (12% of vulnerabilities)

What AI generates:

app.post('/login', (req, res) => {
    const user = db.find(u => u.username === req.body.username);
    if (user && user.password === req.body.password) {
        res.cookie('user', user.username);
        res.redirect('/dashboard');
    }
});

The vulnerabilities:

  • Plain text password comparison (should be hashed)
  • Unsigned cookie (can be forged)
  • No rate limiting (brute force possible)
  • No CSRF protection

Secure alternative:

app.post('/login', rateLimit({ max: 5 }), csrfProtection, async (req, res) => {
    const user = await db.findUser(req.body.username);
    if (user && await bcrypt.compare(req.body.password, user.passwordHash)) {
        req.session.userId = user.id;  // Signed session
        res.redirect('/dashboard');
    } else {
        // Same response for user not found / wrong password
        res.status(401).send('Invalid credentials');
    }
});

Why AI gets it wrong: Simple auth examples abound. Security is "advanced."

5. Insecure Deserialization (8% of vulnerabilities)

What AI generates:

import pickle

def load_user_preferences(data):
    return pickle.loads(data)

The vulnerability: Pickle can execute arbitrary code during deserialization. Attacker sends malicious pickle data → remote code execution.

Secure alternative:

import json

def load_user_preferences(data):
    return json.loads(data)  # Safe: only data, no code execution

Why AI gets it wrong: Pickle is Python's "default" serialization. Appears frequently.

6. Path Traversal (7% of vulnerabilities)

What AI generates:

def get_file(filename):
    with open(f"/uploads/{filename}") as f:
        return f.read()

The vulnerability: User can request ../../../etc/passwd to read system files.

Secure alternative:

import os

def get_file(filename):
    # Resolve to absolute path and verify it's in uploads
    base_dir = os.path.abspath("/uploads")
    file_path = os.path.abspath(os.path.join(base_dir, filename))

    if not file_path.startswith(base_dir + os.sep):  # os.sep prevents matching sibling dirs like /uploads_private
        raise ValueError("Invalid path")

    with open(file_path) as f:
        return f.read()

Why AI gets it wrong: Simple file operations are common examples. Security checks are "extra."

7. Missing Input Validation (10% of vulnerabilities)

What AI generates:

app.post('/transfer', (req, res) => {
    const { fromAccount, toAccount, amount } = req.body;
    db.transfer(fromAccount, toAccount, amount);
    res.send('Transfer complete');
});

The vulnerabilities:

  • No validation that user owns fromAccount
  • No validation amount is positive
  • No validation accounts exist
  • No transaction limits

Secure alternative:

app.post('/transfer', authenticate, async (req, res) => {
    const { toAccount, amount } = req.body;
    const fromAccount = req.user.accountId;  // From auth, not user input

    // Validate
    if (!Number.isFinite(amount) || amount <= 0 || amount > 10000) {
        return res.status(400).send('Invalid amount');
    }

    if (!await db.accountExists(toAccount)) {
        return res.status(400).send('Invalid destination');
    }

    await db.transfer(fromAccount, toAccount, amount);
    res.send('Transfer complete');
});

Why AI gets it wrong: Validation is "obvious" to humans, invisible to AI.

8. Exposure of Sensitive Data (7% of vulnerabilities)

What AI generates:

@app.get("/user/{user_id}")
def get_user(user_id: int):
    user = db.get_user(user_id)
    return user  # Returns everything including password_hash, ssn, etc.

Secure alternative:

from pydantic import BaseModel

class UserResponse(BaseModel):
    id: int
    username: str
    email: str
    # password_hash, ssn, etc. not included

@app.get("/user/{user_id}")
def get_user(user_id: int):
    user = db.get_user(user_id)
    return UserResponse(**user.dict())

Why AI gets it wrong: Returning "the object" is simple. Filtering is extra work.

The Security Review Checklist

Before merging any AI-generated code, run through this checklist:

Input Handling

  • All user input is validated
  • Input type, length, format verified
  • Dangerous characters escaped/sanitized
  • File uploads validated (type, size, content)
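
To make the first few items concrete, here's a minimal Python validation sketch. The field names and limits are made up for illustration; the point is to check type, length, and format before the data goes anywhere else:

import re

USERNAME_RE = re.compile(r"^[a-zA-Z0-9_]{3,32}$")  # type, length, and format in one check

def validate_signup(form: dict) -> dict:
    username = form.get("username", "")
    age = form.get("age", "")

    if not isinstance(username, str) or not USERNAME_RE.match(username):
        raise ValueError("Invalid username")

    # Reject anything that isn't a plausible integer before converting
    if not str(age).isdigit() or not (13 <= int(age) <= 120):
        raise ValueError("Invalid age")

    return {"username": username, "age": int(age)}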

SQL/Database

  • No string concatenation for queries
  • Parameterized queries or ORM used
  • Database user has minimal privileges
  • Sensitive data encrypted at rest

Authentication

  • Passwords hashed (bcrypt, argon2)
  • Sessions are signed/encrypted
  • Session expiration implemented
  • Rate limiting on login
  • Account lockout after failed attempts
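
For the hashing item, a minimal Python sketch using the bcrypt package (assuming it's installed; argon2 via argon2-cffi follows the same shape):

import bcrypt

def hash_password(password: str) -> bytes:
    # A fresh salt is generated per password and stored inside the hash
    return bcrypt.hashpw(password.encode("utf-8"), bcrypt.gensalt())

def verify_password(password: str, password_hash: bytes) -> bool:
    # Constant-time comparison is handled by bcrypt
    return bcrypt.checkpw(password.encode("utf-8"), password_hash)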

Authorization

  • Every endpoint checks permissions
  • User can only access their own data
  • Admin functions protected
  • No IDOR (Insecure Direct Object Reference)
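
To make the IDOR item concrete, here's a hedged FastAPI-style sketch; get_current_user and db are placeholders. The key move is checking ownership server-side instead of trusting the ID in the URL:

from fastapi import Depends, HTTPException

@app.get("/invoices/{invoice_id}")
def get_invoice(invoice_id: int, current_user=Depends(get_current_user)):
    invoice = db.get_invoice(invoice_id)
    if invoice is None or invoice.owner_id != current_user.id:
        # Same response for "not found" and "not yours" to avoid leaking existence
        raise HTTPException(status_code=404, detail="Not found")
    return invoice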

Output Handling

  • User content escaped before rendering
  • Content-Type headers set correctly
  • No sensitive data in responses
  • Error messages don't leak internals
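
On the server side, Python's standard library covers the basic escaping case; a minimal sketch (the comment-rendering helper is hypothetical):

import html

def render_comment(comment: str) -> str:
    # html.escape converts <, >, & and quotes so user content can't become markup
    return f"<div>{html.escape(comment)}</div>"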

Secrets

  • No hardcoded credentials
  • Secrets from environment/vault
  • API keys not in logs
  • .gitignore includes secrets files

Dependencies

  • No known vulnerable dependencies
  • Dependencies from trusted sources
  • Lock files for reproducibility
  • Regular dependency updates

Automated Security Scanning

Don't rely on manual review alone. Implement automated scanning:

Static Analysis (SAST)

Tools that scan code for vulnerabilities:

For JavaScript/TypeScript:

  • ESLint security plugins
  • Semgrep
  • SonarQube

For Python:

  • Bandit
  • Semgrep
  • PyLint security checkers

For general:

  • Snyk Code
  • GitHub Advanced Security
  • Checkmarx

Dependency Scanning

Check for known vulnerable dependencies:

# npm
npm audit

# pip
pip-audit

# General
snyk test

Secret Detection

Catch hardcoded secrets before commit:

# Pre-commit hook options
gitleaks
trufflehog
detect-secrets

Dynamic Analysis (DAST)

Test running application:

  • OWASP ZAP
  • Burp Suite
  • Nuclei

Infrastructure as Code Scanning

If AI generates terraform/cloudformation:

  • Checkov
  • tfsec
  • Terrascan

The CI/CD Security Gate

Implement security as a gate, not a suggestion:

# Example GitHub Actions workflow
name: security
on: [pull_request]

jobs:
  security-scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Run Semgrep
        uses: returntocorp/semgrep-action@v1
        with:
          config: p/security-audit

      - name: Check dependencies
        run: npm audit --audit-level=high

      - name: Secret detection
        uses: gitleaks/gitleaks-action@v2

      - name: Fail if issues found
        run: |
          if [ -f semgrep-results.json ] && [ $(jq '.results | length' semgrep-results.json) -gt 0 ]; then
            echo "Security issues found!"
            exit 1
          fi

If the security scan fails, the PR can't merge. No exceptions.

Training Your Team

Security scanning catches known patterns. Your team catches everything else.

Security Training Essentials

Every developer should understand:

  1. OWASP Top 10: The most common web vulnerabilities
  2. Threat Modeling: How to think about what could go wrong
  3. Secure Coding Practices: Language-specific security patterns
  4. AI-Specific Risks: Why AI code needs extra scrutiny

Code Review for Security

When reviewing AI-generated code, ask:

  1. "What user input touches this code?"
  2. "What could a malicious user send?"
  3. "What happens if this data is wrong/malicious?"
  4. "What sensitive data does this touch?"
  5. "What permissions should this require?"

Security Champions

Designate team members as security champions:

  • Extra security training
  • Review all AI-generated code
  • Triage security scanner results
  • Advocate for secure patterns

The Prompt Engineering Angle

Better prompts can produce more secure code:

Bad Prompt:

"Write a function to query users from the database"

Better Prompt:

"Write a secure function to query users from the database. Use parameterized queries to prevent SQL injection. Don't return password hashes or sensitive fields. Include input validation."

Even Better:

"Write a function to query users following these security requirements:

  • Parameterized queries only (no string concatenation)
  • Return only: id, username, email, created_at
  • Validate that id is a positive integer
  • Handle not-found case without leaking info
  • Add rate limiting consideration in comments"

Explicit security requirements in prompts produce more secure code. Still verify, but start from a better baseline.
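
For reference, here's roughly the shape of function that last prompt should produce. Table and column names are illustrative, and db follows the convention of the earlier snippets:

def get_user(user_id):
    # Validate: id must be a positive integer
    if not isinstance(user_id, int) or user_id <= 0:
        raise ValueError("user_id must be a positive integer")

    # Parameterized query; return only non-sensitive columns
    query = "SELECT id, username, email, created_at FROM users WHERE id = %s"
    row = db.execute(query, (user_id,)).fetchone()

    # Not-found case handled without leaking whether the id exists
    if row is None:
        return None

    # NOTE: callers should rate-limit lookups at the API layer
    return {"id": row[0], "username": row[1], "email": row[2], "created_at": row[3]}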

What We Do at NovaKit

Security is central to how we build NovaKit:

AI Chat: Trained to prefer secure patterns. Will flag potential security issues in code it generates.

AI Builder: Generated code uses secure defaults. Parameterized queries, proper escaping, environment variables for secrets.

Security Scanning: Built-in scanning for generated code before deployment.

Transparency: You see all generated code. Nothing hidden.

We can't guarantee perfect security (no one can), but we can guarantee security is a first-class concern, not an afterthought.

The Bottom Line

A 48% vulnerability rate is a crisis hiding in plain sight.

AI is shipping insecure code to production right now. Some of it is in your codebase. Some of it handles user data. Some of it processes payments.

The solution isn't to stop using AI. The solution is to:

  1. Know the risks: Understand what AI gets wrong
  2. Verify everything: Never trust AI code implicitly
  3. Automate scanning: Catch vulnerabilities before production
  4. Train your team: Security awareness for everyone
  5. Prompt better: Ask for secure code explicitly
  6. Gate deployment: Security scans must pass

AI is a powerful tool. Like all powerful tools, it requires respect and caution.

Use it wisely.


NovaKit builds security into AI-assisted development. See how we approach secure code generation and build with confidence.
