Data Management

Implementation and audit guidance for secure handling of data across its lifecycle.


AIJET Principles: A = Awareness, I = Integrity, J = Judgment, E = Ethics, T = Transparency

Integrity Judgment Transparency
AI Threats: Strictly enforce individual accounts to prevent AI tools from harvesting data through shared or generic accounts.

Guidance to Implement

Deploy centralized identity management (e.g., Azure AD, Okta) to enforce user uniqueness and integrate with SaaS SSO where possible.

Guidance to Audit

Search for duplicate login patterns, account naming anomalies (e.g., 'admin', 'user1'), or reused credentials across users in SIEM and IAM systems.
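
As an illustration, a minimal audit script along these lines could flag generic account names and credentials shared across users in an IAM export; the file name and column names (username, owner_email, password_hash) are assumptions, not a specific product's schema.

# Minimal sketch: flag generic account names and password hashes shared by
# multiple users in a hypothetical IAM export (iam_accounts.csv with columns
# username, owner_email, password_hash). Field names are illustrative only.
import csv
import re
from collections import defaultdict

GENERIC = re.compile(r"^(admin|root|test|user\d+|service|shared)", re.IGNORECASE)

hash_owners = defaultdict(set)
with open("iam_accounts.csv", newline="") as f:
    for row in csv.DictReader(f):
        if GENERIC.match(row["username"]):
            print(f"Generic/shared account name: {row['username']}")
        hash_owners[row["password_hash"]].add(row["owner_email"])

for h, owners in hash_owners.items():
    if len(owners) > 1:  # same credential material reused by several people
        print(f"Credential reused by {len(owners)} users: {sorted(owners)}")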

Key Performance Indicator

X% of users use individual accounts for system access.

Awareness Judgment Transparency
AI Threats: Ensure role-based access covers datasets that could be misused for AI model training, profiling, or generation.

Guidance to Implement

Implement role-based access controls and perform periodic reviews of all access rights.

Guidance to Audit

Access control audit reports and review meeting minutes.

Key Performance Indicator

X% of employees have access according to their roles, with regular access reviews.

Judgment Transparency
AI Threats: Annual access reviews must detect any unauthorized access by AI-based crawlers, bots, or automated data collectors.

Guidance to Implement

Schedule annual reviews to assess and document user access rights.

Guidance to Audit

Annual review reports approved by the data owner.

Key Performance Indicator

X% of user access rights reviewed annually for compliance.

Integrity Judgment Transparency
AI Threats: Addresses MITRE ATLAS T0011: Data Poisoning by enforcing data quality control before training LLMs.

Guidance to Implement

Use anomaly detection, data provenance, and diversity checks before including any dataset in model pipelines.
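
A minimal pre-ingestion gate might look like the sketch below: it records a provenance checksum, flags 3-sigma outliers, and checks label diversity. The file name, column names, and thresholds are illustrative assumptions.

# Minimal sketch of a pre-training gate: record provenance (checksum), flag
# numeric outliers, and check label diversity before a dataset enters the
# pipeline. The file name and the "label"/"value" fields are illustrative.
import csv, hashlib, statistics
from collections import Counter

PATH = "candidate_dataset.csv"

with open(PATH, "rb") as f:
    checksum = hashlib.sha256(f.read()).hexdigest()  # provenance record

values, labels = [], []
with open(PATH, newline="") as f:
    for row in csv.DictReader(f):
        values.append(float(row["value"]))
        labels.append(row["label"])

mean, stdev = statistics.mean(values), statistics.pstdev(values) or 1.0
outlier_rate = sum(abs(v - mean) > 3 * stdev for v in values) / len(values)
label_share = max(Counter(labels).values()) / len(labels)

print(f"sha256={checksum}")
assert outlier_rate < 0.01, "anomaly check failed: too many 3-sigma outliers"
assert label_share < 0.9, "diversity check failed: one label dominates"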

Guidance to Audit

Maintain dataset validation reports and review version control for any training data updates.

Key Performance Indicator

X% of datasets are validated for quality and sanitized before use in AI models.

Integrity Judgment Transparency
AI Threats: Limits "Unknown Model Goals / Shadow Retraining" (MITRE T0006).

Guidance to Implement

Maintain a signed "Model Lineage Card" in the data catalogue and update it on every retrain.
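
A minimal sketch of such a lineage card is shown below, using an HMAC as a stand-in for whatever signing mechanism the organisation actually uses; the field names, paths, and key are illustrative assumptions.

# Minimal sketch of a "Model Lineage Card" entry, signed with an HMAC as a
# stand-in for the organisation's signing mechanism. Paths, key, and field
# names are illustrative assumptions.
import hashlib, hmac, json, time

SIGNING_KEY = b"replace-with-managed-secret"

def sha256_of(path: str) -> str:
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

card = {
    "model_name": "customer-churn-classifier",
    "model_version": "2.3.0",
    "training_data_sha256": sha256_of("training_data.parquet"),
    "retrained_at": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
    "approved_by": "data-owner@example.com",
}
payload = json.dumps(card, sort_keys=True).encode()
card["signature"] = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()

# The signed card would then be published to the data catalogue on every retrain.
print(json.dumps(card, indent=2))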

Guidance to Audit

Verify card completeness and cross-check the hash of the training data against the stored checksum.

Key Performance Indicator

X% of models and datasets have documented provenance for accountability.

Integrity Judgment Transparency
AI Threats: Blocks data-/model-poisoning and backdoored model supply-chain attacks by ensuring tampered or rogue assets cannot enter the environment.

Guidance to Implement

Tier 1: Accept only models from registries that support signed artifacts (e.g., Hugging Face with TUF). Tier 2: For high-risk use cases, enforce reproducibility and SBOM traceability.

Guidance to Audit

Confirm registry enforcement and signed-artifact settings, and verify a representative sample. Review exception logs for unsigned artifacts.

Key Performance Indicator

X% of datasets and models are verified for integrity using cryptographic checksums.

Awareness Integrity Transparency
AI Threats: Mitigates risk of "shadow profiles" or unauthorized AI personalization.

Guidance to Implement

Implement granular consent flags and propagate them to the feature store and model registry.
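
The sketch below illustrates one way consent flags could gate a feature-store load: records without the relevant consent are dropped, and a lineage tag records the filtering. The record structure and purpose names are assumptions.

# Minimal sketch: drop records without the relevant consent flag before they
# reach the feature store, and keep a lineage tag linking the filtered set to
# the consent purpose. Record structure and purpose names are illustrative.
from dataclasses import dataclass

@dataclass
class Record:
    user_id: str
    features: dict
    consents: set  # e.g. {"analytics", "ai_training"}

def filter_for_purpose(records, purpose: str):
    kept = [r for r in records if purpose in r.consents]
    lineage_tag = {"purpose": purpose, "source_count": len(records), "kept": len(kept)}
    return kept, lineage_tag

records = [
    Record("u1", {"age": 34}, {"analytics", "ai_training"}),
    Record("u2", {"age": 51}, {"analytics"}),  # no AI-training consent: excluded
]
training_set, tag = filter_for_purpose(records, "ai_training")
print(tag)  # stored alongside the dataset in the feature store / model registry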

Guidance to Audit

Inspect the consent database and lineage tags for a random sample of 10 records.

Key Performance Indicator

X% of user consents are captured and linked to datasets used for AI training.

Integrity Ethics
AI Threats: Lowers breach impact and model inversion success.

Guidance to Implement

The data curation pipeline enforces schema and policy checks; a nightly scan flags violations.
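
A nightly scan of this kind could be as simple as the sketch below, which checks each record against an expected schema and flags values that look like raw e-mail addresses (assumed here to violate the pseudonymisation policy); the schema and field names are illustrative.

# Minimal sketch of a nightly curation scan: verify records match the expected
# schema and flag values that look like raw e-mail addresses, which the
# assumed pseudonymisation policy says must not appear in curated data.
import re

SCHEMA = {"user_ref": str, "country": str, "purchases": int}
EMAIL = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")

def scan(records):
    violations = []
    for i, rec in enumerate(records):
        if set(rec) != set(SCHEMA):
            violations.append((i, "schema mismatch"))
            continue
        for field, expected in SCHEMA.items():
            if not isinstance(rec[field], expected):
                violations.append((i, f"{field}: wrong type"))
            elif isinstance(rec[field], str) and EMAIL.search(rec[field]):
                violations.append((i, f"{field}: unpseudonymised e-mail"))
    return violations

print(scan([{"user_ref": "h_91ab", "country": "DE", "purchases": 3},
            {"user_ref": "bob@example.com", "country": "DE", "purchases": "3"}]))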

Guidance to Audit

Review the last scan report and confirm zero critical hits.

Key Performance Indicator

X% of curated datasets comply with schema and pseudonymization policies.

Integrity Transparency
AI Threats: Limits training on stale or revoked data; meets storage‑limitation laws.

Guidance to Implement

Tag datasets with a TTL, schedule deletion jobs, and log the hash of purged sets.
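
A minimal purge job along these lines is sketched below; the catalogue layout, file paths, and ledger format are illustrative assumptions.

# Minimal sketch of a TTL purge job: delete datasets whose TTL has expired and
# append the hash of each purged file to a deletion ledger. Catalogue layout
# and file locations are illustrative assumptions.
import hashlib, json, os, time

CATALOGUE = [  # normally read from the data catalogue
    {"path": "datasets/sessions_2022.csv", "created": 1660000000, "ttl_days": 365},
]
LEDGER = "deletion_ledger.jsonl"

def purge_expired(now=None):
    now = now or time.time()
    for entry in CATALOGUE:
        if now - entry["created"] < entry["ttl_days"] * 86400:
            continue  # still within its TTL
        with open(entry["path"], "rb") as f:
            digest = hashlib.sha256(f.read()).hexdigest()
        os.remove(entry["path"])
        with open(LEDGER, "a") as ledger:  # auditors cross-check purged hashes here
            ledger.write(json.dumps({"path": entry["path"], "sha256": digest,
                                     "purged_at": int(now)}) + "\n")

purge_expired()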

Guidance to Audit

Cross-check three purged hashes against the deletion ledger.

Key Performance Indicator

X% of datasets are tagged with TTL and automatically purged when expired.

Integrity Judgment
AI Threats: Reduces model hallucinations and bias from poor data; improves performance.

Guidance to Implement

Define data quality metrics, automate validation pipelines, and maintain a quality scorecard.
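
The sketch below shows one possible quality scorecard, covering completeness, duplicate rate, and label balance with an overall pass/fail gate; the metric choices, thresholds, and field names are assumptions.

# Minimal sketch of a quality scorecard for a candidate training set:
# completeness, duplicate rate, and label balance, with an overall pass/fail
# gate. Thresholds and field names are illustrative assumptions.
from collections import Counter

def quality_scorecard(rows, label_field="label"):
    total = len(rows)
    complete = sum(all(v not in (None, "") for v in r.values()) for r in rows)
    duplicates = total - len({tuple(sorted(r.items())) for r in rows})
    labels = Counter(r[label_field] for r in rows)
    card = {
        "completeness": complete / total,
        "duplicate_rate": duplicates / total,
        "majority_label_share": max(labels.values()) / total,
    }
    card["passes"] = (card["completeness"] >= 0.98 and
                      card["duplicate_rate"] <= 0.01 and
                      card["majority_label_share"] <= 0.8)
    return card

print(quality_scorecard([{"text": "ok", "label": "pos"},
                         {"text": "bad", "label": "neg"}]))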

Guidance to Audit

Sample training sets for quality issues, review rejection logs, and test data validation gates.

Key Performance Indicator

X% of datasets used for AI models meet quality standards to reduce bias and errors.

Guidance to Implement

Integrate data ownership training into onboarding and require annual re-certification from data owners.

Guidance to Audit

Signed acknowledgment forms and training attendance records.

High Risk: 0
AI Threats & Mitigation: Ensure data owners receive updated training on risks of contributing organizational data to external AI systems.
Principles: A | T
KPI: % of data owners complete annual re-certification with updated training on AI risks.

Guidance to Implement

Include data classification questions in regular assessments and address identified gaps with targeted training.

Guidance to Audit

Assessment results and remediation plans.

High Risk: 0
AI Threats & Mitigation: Ensure assessments include evaluation of employee understanding of risks related to AI misuse of classified data.
Principles: A
KPI: % of employees score ≥ Y% on data classification assessments, with follow-up training where necessary.

Guidance to Implement

Distribute detailed procedures for secure data handling and conduct regular training sessions.

Guidance to Audit

Procedure documentation and training logs.

High Risk: 0
AI Threats & Mitigation: Ensure procedures emphasize integrity during data handling to protect against AI-assisted data tampering.
Principles: A | T
KPI: % of employees handling sensitive data follow documented secure data handling procedures.

Guidance to Implement

Regularly review and update the data retention policy and include it in mandatory training sessions.

Guidance to Audit

Policy distribution records and compliance audit reports.

High Risk: 0
AI Threats & Mitigation: Comply with retention policies to reduce the risk of old sensitive data being exposed to AI scraping.
Principles: A | J | T
KPI: % compliance with the data retention policy, reducing AI scraping risks.

Guidance to Implement

Implement approved data destruction methods and schedule periodic audits to verify compliance.

Guidance to Audit

Destruction logs and audit reports.

High Risk: 0
AI Threats & Mitigation: Destruction methods should prevent AI-driven data reconstruction from disposed media.
Principles: I | T
KPI: % of sensitive data is destroyed using approved methods with audit logs.

Guidance to Implement

Maintain a list of approved SaaS platforms and perform regular vendor security reviews.

Guidance to Audit

Vendor approval records and audit logs.

High Risk: 0
AI Threats & Mitigation: Approved SaaS platforms must undergo AI risk reviews to prevent data leakage into third-party models.
Principles: J | T
KPI: % of SaaS platforms used for sensitive data handling are approved and assessed for AI-related risks.

Guidance to Implement

Deploy application detection in endpoint agents and web gateways to flag unsanctioned AI tool use. Maintain an allowlist of approved AI platforms with controlled access.
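
As one illustration, an allowlist check over outbound proxy logs might look like the sketch below; the log format, the known AI-domain list, and the approved platform are assumptions.

# Minimal sketch of an outbound-log check: flag connections to generative-AI
# domains that are not on the approved allowlist. The log format, domain list,
# and allowlist contents are illustrative assumptions.
import csv

APPROVED_AI = {"approved-ai.example.com"}
KNOWN_AI_DOMAINS = {"chat.openai.com", "gemini.google.com", "claude.ai",
                    "approved-ai.example.com"}

with open("proxy_log.csv", newline="") as f:  # columns: user, department, domain
    for row in csv.DictReader(f):
        domain = row["domain"].lower()
        if domain in KNOWN_AI_DOMAINS and domain not in APPROVED_AI:
            print(f"Unsanctioned AI platform: {domain} "
                  f"(user={row['user']}, dept={row['department']})")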

Guidance to Audit

Analyze outbound connections and application usage reports to detect unauthorized AI platforms and correlate with departments or user roles.

High Risk: 0
AI Threats & Mitigation: Restrict use of generative AI platforms unless evaluated for secure data handling and privacy protection.
Principles: E | J | T
KPI: % compliance with approved Generative AI platforms for data handling.

Guidance to Implement

Implement an export approval process integrated with DLP tools to monitor and document data exports.

Guidance to Audit

Export approval logs and DLP reports.

High Risk: 0
AI Threats & Mitigation: Monitor all data exports to block unauthorized flows that may be used for AI training or exfiltration.
Principles: A | J | T
KPI: % of data exports undergo approval with DLP monitoring to block unauthorized flows.

Guidance to Implement

Enforce secure transmission protocols via network controls and conduct periodic audits.

Guidance to Audit

Protocol configuration records and audit logs.

High Risk: 0
AI Threats & Mitigation: Use end-to-end encrypted and authenticated protocols to prevent AI-facilitated data interception.
Principles: T
KPI: % of classified data transmissions use end-to-end encrypted and secure protocols.

Guidance to Implement

Develop social media usage policies that include security best practices and distribute them.

Guidance to Audit

Policy documents and training session records.

High Risk: 0
AI Threats & Mitigation: Include risks of AI-assisted social engineering in social media security training.
Principles: A | T
KPI: % of employees comply with social media security policies, including AI-related risks.

Guidance to Implement

Clarify acceptable use policies for personal email and cloud storage; monitor usage for compliance.

Guidance to Audit

Policy documents and usage logs.

High Risk: 0
AI Threats & Mitigation: Define acceptable AI interactions with personal platforms to reduce risk of unmonitored data exfiltration.
Principles: T
KPI: % compliance with personal webmail and cloud storage policies, including AI interactions.

Guidance to Implement

Implement DLP solutions to monitor data transfers and deliver clear training on data handling responsibilities.

Guidance to Audit

DLP reports and training attendance records.

High Risk: 0
AI Threats & Mitigation: Detect and block attempts to use external drives for AI dataset injection or model theft.
Principles: A | T
KPI: % of external drive usage is restricted and monitored for AI-related risks.

Guidance to Implement

Review and document cross-border data transfer processes to ensure they meet all applicable regulatory requirements.

Guidance to Audit

Compliance audit reports and transfer logs.

High Risk: 0
AI Threats & Mitigation: Ensure legal safeguards to prevent global AI actors from training on sensitive exported datasets.
Principles: A | E | J | T
KPI: % of cross-border data transfers comply with regulatory and AI-related data handling laws.

Guidance to Implement

Deploy automated DLP tools to scan for unauthorized shadow data and schedule regular remediation reviews.

Guidance to Audit

DLP scan reports and remediation records.

High Risk: 0
AI Threats & Mitigation: DLP tools should detect AI attempts to duplicate sensitive data onto shadow locations.
Principles: J | T
KPI: % of shadow data copies are detected and remediated using DLP tools.

Guidance to Implement

Utilize external monitoring services to detect unsanctioned copies of sensitive data and document findings.

Guidance to Audit

External monitoring reports and remediation actions.

High Risk: 0
AI Threats & Mitigation: External monitoring must catch unsanctioned use of organizational data in public AI datasets.
Principles: T
KPI: % of unsanctioned use of organizational data in public AI datasets is detected.

Guidance to Implement

Implement comprehensive logging for all outbound data transfers and analyze logs for anomalies.

Guidance to Audit

Outbound transfer logs and review reports.

High Risk: 0
AI Threats & Mitigation: Monitor outbound flows to detect suspicious AI training dataset creation or behavior.
Principles: A | J | T
KPI: % of outbound data transfers are logged and analyzed for AI dataset creation or pre-training activity.

Guidance to Implement

Configure automated alerts based on predefined high-risk thresholds for data exports.
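
A minimal threshold alert could follow the sketch below, which raises an alert when a user's exported volume within a rolling window exceeds an assumed high-risk limit; the window, threshold, and event format are illustrative.

# Minimal sketch of threshold-based export alerting: raise an alert when a
# user's exported volume in a rolling window exceeds a limit typical of bulk
# dataset collection. Window, threshold, and event format are assumptions.
from collections import defaultdict

WINDOW_SECONDS = 3600
THRESHOLD_BYTES = 5 * 1024**3  # 5 GiB per hour, an assumed high-risk threshold

def check_exports(events):  # events: (timestamp, user, bytes_out), time-ordered
    totals, alerts = defaultdict(list), []
    for ts, user, size in events:
        history = totals[user] = [(t, s) for t, s in totals[user]
                                  if ts - t <= WINDOW_SECONDS] + [(ts, size)]
        if sum(s for _, s in history) > THRESHOLD_BYTES:
            alerts.append((user, ts))
    return alerts

print(check_exports([(0, "alice", 3 * 1024**3), (1200, "alice", 3 * 1024**3)]))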

Guidance to Audit

Alert logs and threshold configuration documentation.

High Risk: 0
AI Threats & Mitigation: Configure alerts for data volumes typical of AI model feeding or unauthorized pre-training dumps.
Principles: A | T
KPI: % of high-risk data exports trigger alerts based on predefined AI-specific thresholds.

Guidance to Implement

Use automated checksum and hash validation tools integrated into backup and monitoring processes.
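
One way to automate this is sketched below: recompute file hashes and compare them against a previously stored manifest; the manifest format and paths are illustrative assumptions.

# Minimal sketch of backup integrity verification: recompute file hashes and
# compare them with a previously stored manifest. Manifest format and paths
# are illustrative assumptions.
import hashlib, json

def sha256_of(path: str) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

with open("backup_manifest.json") as f:  # {"<path>": "<expected sha256>", ...}
    manifest = json.load(f)

for path, expected in manifest.items():
    actual = sha256_of(path)
    status = "OK" if actual == expected else "MISMATCH - possible tampering"
    print(f"{path}: {status}")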

Guidance to Audit

Checksum logs and backup verification reports.

High Risk: 0
AI Threats & Mitigation: Hash validation helps detect AI-led tampering or adversarial poisoning of backup datasets.
Principles: I | J | T
KPI: % of critical data is validated for integrity with automated tools to prevent AI tampering.

Guidance to Implement

Maintain detailed audit trails for changes to critical data assets and review them regularly.

Guidance to Audit

Audit logs and review meeting minutes.

High Risk: 0
AI Threats & Mitigation: Audit trail reviews should include checks for changes consistent with AI-generated activity or scripting.
Principles: J | T
KPI: % of changes to critical data assets are logged and reviewed for AI-generated activities.

Guidance to Implement

Use automated data discovery tools to continuously update an inventory of sensitive data assets, and review it quarterly.

Guidance to Audit

Data inventory reports and audit logs.

High Risk: 0
AI Threats & Mitigation: Maintain an up-to-date inventory to trace data usage in AI environments and assess exposure.
Principles: J | T
KPI: % of sensitive data assets are tracked and inventoried for AI-related exposure risks.

Guidance to Implement

Use anomaly detection, data provenance, and diversity checks before including any dataset in model pipelines.

Guidance to Audit

Maintain dataset validation reports and review version control for any training data updates.

High Risk: 1
AI Threats & Mitigation: Addresses MITRE ATLAS T0011: Data Poisoning by enforcing data quality control before training LLMs.
Principles: I | J | T
KPI: % of datasets are validated for quality and sanitized before use in AI models.

Guidance to Implement

Maintain a signed “Model Lineage Card” in the data catalogue and update it on every retrain.

Guidance to Audit

Verify card completeness and cross-check the hash of the training data against the stored checksum.

High Risk: 1
AI Threats & Mitigation: Limits “Unknown Model Goals / Shadow Retraining” (MITRE T0006).
Principles: I | J | T
KPI: % of models and datasets have documented provenance for accountability.

Guidance to Implement

Tier 1: Accept only models from registries that support signed artifacts (e.g., Hugging Face with TUF). Tier 2: For high-risk use cases, enforce reproducibility and SBOM traceability.

Guidance to Audit

Confirm registry enforcement and signed-artifact settings, and verify a representative sample. Review exception logs for unsigned artifacts.

High Risk: 1
AI Threats & Mitigation: Blocks data-/model-poisoning and backdoored model supply-chain attacks by ensuring tampered or rogue assets cannot enter the environment.
Principles: I | J | T
KPI: % of datasets and models are verified for integrity using cryptographic checksums.

Guidance to Implement

Automate provenance capture in the ML pipeline (e.g., DVC, MLflow). Store metadata in an immutable repository accessible to Risk & Compliance. Flag any asset whose lineage cannot be fully resolved.
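
A minimal sketch of provenance capture with MLflow (one of the tools named above) follows; the tag names, dataset path, licence value, and owner are illustrative assumptions rather than a prescribed schema.

# Minimal sketch of provenance capture at training time using MLflow. Tag
# names, the dataset path, and the licence value are illustrative assumptions.
import hashlib
import mlflow

DATASET = "training_data.parquet"

with open(DATASET, "rb") as f:
    dataset_sha256 = hashlib.sha256(f.read()).hexdigest()

with mlflow.start_run(run_name="churn-model-retrain"):
    mlflow.set_tags({
        "dataset.path": DATASET,
        "dataset.sha256": dataset_sha256,      # lets auditors resolve lineage
        "dataset.source_url": "s3://internal-bucket/churn/v7",
        "dataset.licence": "internal-use-only",
        "approving_owner": "data-owner@example.com",
    })
    # ... training happens here; the run record becomes the provenance entry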

Guidance to Audit

Review pipeline logs to verify that provenance records are generated for each new version. Spot-check metadata completeness (source URL, licence, checksums, approving owner).

High Risk: 1
AI Threats & Mitigation: Facilitates rapid incident triage if a model is later found vulnerable; supports audits against licence and regulatory obligations.
Principles: I | J | T
KPI: % of models and datasets have complete lineage documentation for audit purposes.

Guidance to Implement

Integrate software-composition analysis (SCA) and AV scanning in CI/CD. Enforce allow-listing of trusted model registries. Reject package names not present in curated repositories.
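
A simple CI gate of this kind is sketched below: every dependency name in requirements.txt must appear in a curated allowlist, otherwise the build fails; the allowlist contents and file name are assumptions.

# Minimal sketch of a CI gate against hallucinated or typosquatted packages:
# every name in requirements.txt must appear in a curated internal allowlist.
# File names and the allowlist contents are illustrative assumptions.
import re, sys

CURATED = {"numpy", "pandas", "scikit-learn", "torch", "mlflow"}

failures = []
with open("requirements.txt") as f:
    for line in f:
        line = line.split("#")[0].strip()
        if not line:
            continue
        name = re.split(r"[=<>!~\[; ]", line, maxsplit=1)[0].lower()
        if name not in CURATED:
            failures.append(name)

if failures:
    print(f"Blocked packages not in curated repository: {failures}")
    sys.exit(1)  # fail the build so the dependency cannot enter the pipeline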

Guidance to Audit

Review SCA reports for high-severity issues. Verify build logs show hash checks & signature validation. Spot-check blocked “hallucinated” packages.

High Risk: 1
AI Threats & Mitigation: Blocks hallucinated or typosquatted packages and dependency-chain attacks (ChatGPT Package Hallucination; compromised PyTorch chain).
Principles: I | J | T
KPI: % of AI-related software dependencies are scanned for integrity and malicious indicators.

Guidance to Implement

Enforce signature verification at ingestion. Maintain a list of trusted publishers & keys. Log or block unsigned or unverified models.
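
The sketch below shows signature verification at ingestion using an Ed25519 public key from the trusted-publisher list; the choice of the cryptography package, the file names, and the key handling are assumptions.

# Minimal sketch of signature verification at model ingestion using an Ed25519
# public key from the trusted-publisher list (the `cryptography` package is an
# assumed choice; file names and key storage are illustrative).
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PublicKey

def verify_model(model_path: str, sig_path: str, publisher_key_bytes: bytes) -> bool:
    public_key = Ed25519PublicKey.from_public_bytes(publisher_key_bytes)
    with open(model_path, "rb") as m, open(sig_path, "rb") as s:
        data, signature = m.read(), s.read()
    try:
        public_key.verify(signature, data)
        return True
    except InvalidSignature:
        return False  # log and block: unverified models must not be loaded

if not verify_model("model.safetensors", "model.safetensors.sig",
                    bytes.fromhex("aa" * 32)):
    raise SystemExit("Model rejected: signature not from a trusted publisher")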

Guidance to Audit

Inspect registry logs for unsigned download attempts. Check publisher-verification records & revocation handling.

High Risk: 1
AI Threats & Mitigation: Prevents malicious or backdoored model uploads such as PoisonGPT and compromised Hugging Face models.
Principles: I | J | T
KPI: % of external models are verified and signed before being used in production.

Guidance to Implement

Implement granular consent flags and propagate them to the feature store and model registry.

Guidance to Audit

Inspect the consent database and lineage tags for a random sample of 10 records.

High Risk: 1
AI Threats & Mitigation: Mitigates risk of “shadow profiles” or unauthorized AI personalization.
Principles: A | I | T
KPI: % of user consents are captured and linked to datasets used for AI training.

Guidance to Implement

The data curation pipeline enforces schema and policy checks; a nightly scan flags violations.

Guidance to Audit

Review the last scan report and confirm zero critical hits.

High Risk: 1
AI Threats & Mitigation: Lowers breach impact and model inversion success.
Principles: I | E
KPI: % of curated datasets comply with schema and pseudonymization policies.

Guidance to Implement

Tag datasets with a TTL, schedule deletion jobs, and log the hash of purged sets.

Guidance to Audit

Cross-check three purged hashes against the deletion ledger.

High Risk: 1
AI Threats & Mitigation: Limits training on stale or revoked data; meets storage‑limitation laws.
Principles: I | T
KPI: % of datasets are tagged with TTL and automatically purged when expired.

Guidance to Implement

Define data quality metrics, automate validation pipelines, and maintain a quality scorecard.

Guidance to Audit

Sample training sets for quality issues, review rejection logs, and test data validation gates.

High Risk: 1
AI Threats & Mitigation: Reduces model hallucinations and bias from poor data; improves performance.
Principles: I | J
KPI: % of datasets used for AI models meet quality standards to reduce bias and errors.