Implementation and audit guidance for secure handling of data across its lifecycle.
AIJET Principles: A = Awareness, I = Integrity, J = Judgment, E = Ethics, T = Transparency
ID | Requirement | Guidance to implement | Guidance to audit | AI Threats and Mitigation | Principles | KPI |
---|---|---|---|---|---|---|
DAT-01 | Ensure all users use individual accounts; shared or group credentials are strictly prohibited | Deploy centralized identity management (e.g., Azure AD, Okta) to enforce user uniqueness and integrate with SaaS SSO where possible. | Search for duplicate login patterns, account naming anomalies (e.g., ‘admin’, ‘user1’), or reused credentials across users in SIEM and IAM systems. | Strictly enforce individual accounts to prevent AI tools from harvesting data through shared or generic accounts. | I, J, T | X% of users use individual accounts for system access. |
DAT-02 | Assign access rights based on the principle of least privilege, aligned with each business function | Implement role-based access controls and perform periodic reviews of all access rights. | Access control audit reports and review meeting minutes. | Ensure role-based access accounts for datasets that could be misused for AI model training, profiling, or generation. | A, J, T | X% of employees have access according to their roles, with regular access reviews. |
DAT-03 | Perform a comprehensive review of all user access rights at least annually | Schedule annual reviews to assess and document user access rights. | Annual review reports approved by the data owner. | Annual access reviews must detect any unauthorized access by AI-based crawlers, bots, or automated data collectors. | J, T | X% of user access rights reviewed annually for compliance. |
DAT-04 | Review privileged and sensitive access rights at least semi-annually | Conduct semi-annual reviews for high-privilege accounts using automated alerts. | Review logs and checklist records. | Privileged access reviews should flag accounts interacting with AI services that could misuse sensitive datasets. | J, T | X% of high-privilege access is reviewed semi-annually, including AI-related access risks. |
DAT-05 | Implement real-time monitoring for access to critical systems and data | Deploy continuous monitoring solutions with real-time alert capabilities for critical systems. | Monitoring dashboards and alert logs. | Real-time monitoring must detect large-volume, AI-like scraping behaviors or suspicious model training attempts. | A, T | X% of critical systems are monitored in real-time with alerts for unusual activities. |
DAT-06 | Ensure temporary access rights are automatically time-bound by design | Utilize systems that enforce expiration dates on temporary access and perform regular audits. | Temporary access logs and system configuration records. | Temporary access rights must expire rapidly to prevent silent AI-driven data harvesting windows. | T | X% of temporary access rights are time-bound and automatically revoked after use. |
DAT-07 | Implement an approval workflow for all access requests to sensitive data | Adopt a formal workflow system integrated with your IAM solution to track and approve access requests. | Approval logs and IAM audit reports. | Approval workflows should explicitly include AI exposure risk evaluation before granting access to sensitive data. | J, T | X% of sensitive data access requests go through an approval workflow with AI risk evaluation. |
DAT-08 | Prohibit sharing of sensitive company data with external AI models without explicit legal clearance | Update Acceptable Use Policies to include LLM restrictions and monitor outbound data usage on corporate networks. | Review data access logs, endpoint monitoring reports, and staff acknowledgment forms. | Addresses OWASP LLM02:2025 (Sensitive Information Disclosure) by preventing leakage of sensitive data via generative AI platforms. | J, T | X% compliance with policies prohibiting unauthorized sharing of sensitive data with AI models. |
DAT-09 | Assign data owners for each critical dataset and define their responsibilities | Document data ownership roles in a formal matrix and integrate them into data governance. | Data ownership matrices and governance meeting minutes. | Assign data owners to oversee datasets critical to AI system training and maintain logs on dataset usage. | A, T | X% of critical datasets have assigned and accountable data owners. |
DAT-10 | Review and validate data ownership annually as part of governance | Conduct annual reviews of data ownership as part of overall governance processes and update records accordingly. | Annual review reports and updated ownership records. | Validate annually whether dataset owners control access to AI training data and sensitive personal information. | A, J, T | X% of data ownership records are reviewed and updated annually. |
DAT-11 | Ensure data owners are aware of their responsibilities | Integrate data ownership training into onboarding and require annual re-certification from data owners. | Signed acknowledgment forms and training attendance records. | Ensure data owners receive updated training on risks of contributing organizational data to external AI systems. | A, T | X% of data owners complete annual re-certification with updated training on AI risks. |
DAT-12 | Evaluate employee understanding of the organization’s data classification policy through periodic assessments | Include data classification questions in regular assessments and address identified gaps with targeted training. | Assessment results and remediation plans. | Ensure assessments include evaluation of employee understanding of risks related to AI misuse of classified data. | A | X% of employees score ≥ Y% on data classification assessments, with follow-up training where necessary. |
DAT-13 | Follow documented procedures for secure data manipulation | Distribute detailed procedures for secure data handling and conduct regular training sessions. | Procedure documentation and training logs. | Ensure procedures emphasize integrity during data handling to protect against AI-assisted data tampering. | A, T | X% of employees handling sensitive data follow documented secure data handling procedures. |
DAT-14 | Ensure employees understand and comply with the applicable data retention policy | Regularly review and update the data retention policy and include it in mandatory training sessions. | Policy distribution records and compliance audit reports. | Comply with retention policies to reduce the risk of old sensitive data being exposed to AI scraping. | A, J, T | X% compliance with the data retention policy, reducing AI scraping risks. |
DAT-15 | Securely destroy sensitive data using approved corporate methods | Implement approved data destruction methods and schedule periodic audits to verify compliance. | Destruction logs and audit reports. | Destruction methods should prevent AI-driven data reconstruction from disposed media. | I, T | X% of sensitive data is destroyed using approved methods with audit logs. |
DAT-16 | Ensure that all SaaS platforms used for sensitive data are approved by the company | Maintain a list of approved SaaS platforms and perform regular vendor security reviews. | Vendor approval records and audit logs. | Approved SaaS platforms must undergo AI risk reviews to prevent data leakage into third-party models. | J, T | X% of SaaS platforms used for sensitive data handling are approved and assessed for AI-related risks. |
DAT-17 | Limit use of Generative AI platforms to those explicitly approved by the company for secure data handling | Deploy application detection in endpoint agents and web gateways to flag unsanctioned AI tool use. Maintain an allowlist of approved AI platforms with controlled access. | Analyze outbound connections and application usage reports to detect unauthorized AI platforms and correlate with departments or user roles. | Restrict use of generative AI platforms unless they have been evaluated for secure data handling and privacy protection. | E, J, T | X% compliance with approved Generative AI platforms for data handling. |
DAT-18 | Mandate that all data exports outside the organization follow a company-approved process | Implement an export approval process integrated with DLP tools to monitor and document data exports. | Export approval logs and DLP reports. | Monitor all data exports to block unauthorized flows that may be used for AI training or exfiltration. | A, J, T | X% of data exports undergo approval with DLP monitoring to block unauthorized flows. |
DAT-19 | Transmit classified data only using approved secure protocols | Enforce secure transmission protocols via network controls and conduct periodic audits. | Protocol configuration records and audit logs. | Use end-to-end encrypted and authenticated protocols to prevent AI-facilitated data interception. | T | X% of classified data transmissions use end-to-end encrypted and secure protocols. |
DAT-20 | Provide users with guidance on secure social media use | Develop social media usage policies that include security best practices and distribute them. | Policy documents and training session records. | Include risks of AI-assisted social engineering in social media security training. | A, T | X% of employees comply with social media security policies, including AI-related risks. |
DAT-21 | Provide users with guidance on the use of personal webmail and web drives | Clarify acceptable use policies for personal email and cloud storage; monitor usage for compliance. | Policy documents and usage logs. | Define acceptable AI interactions with personal platforms to reduce risk of unmonitored data exfiltration. | T | X% compliance with personal webmail and cloud storage policies, including AI interactions. |
DAT-22 | Restrict uploads to and the use of external drives | Implement DLP solutions to monitor data transfers and deliver clear training on data handling responsibilities. | DLP reports and training attendance records. | Detect and block attempts to use external drives for AI dataset injection or model theft. | A, T | X% of external drive usage is restricted and monitored for AI-related risks. |
DAT-23 | Ensure cross-border data transfers comply with applicable regulations (e.g., HIPAA, GDPR) | Review and document cross-border data transfer processes to ensure they meet all applicable regulatory requirements. | Compliance audit reports and transfer logs. | Ensure legal safeguards to prevent global AI actors from training on sensitive exported datasets. | A, E, J, T | X% of cross-border data transfers comply with regulatory and AI-related data handling laws. |
DAT-24 | Implement regular monitoring to detect shadow copies of sensitive data on unauthorized devices | Deploy automated DLP tools to scan for unauthorized shadow data and schedule regular remediation reviews. | DLP scan reports and remediation records. | DLP tools should detect AI attempts to duplicate sensitive data onto shadow locations. | J, T | X% of shadow data copies are detected and remediated using DLP tools. |
DAT-25 | Regularly monitor the internet to detect unsanctioned shadow copies of sensitive data | Utilize external monitoring services to detect unsanctioned copies of sensitive data and document findings. | External monitoring reports and remediation actions. | External monitoring must catch unsanctioned use of organizational data in public AI datasets. | T | X% of unsanctioned use of organizational data in public AI datasets is detected. |
DAT-26 | Log and monitor all outbound data transfers | Implement comprehensive logging for all outbound data transfers and analyze logs for anomalies. | Outbound transfer logs and review reports. | Monitor outbound flows to detect suspicious AI training dataset creation or behavior. | A, J, T | X% of outbound data transfers are logged and analyzed for AI dataset creation or pre-training activity. |
DAT-27 | Trigger alerts for high-risk data exports | Configure automated alerts based on predefined high-risk thresholds for data exports. | Alert logs and threshold configuration documentation. | Configure alerts for data volumes typical of AI model feeding or unauthorized pre-training dumps. | A, T | X% of high-risk data exports trigger alerts based on predefined AI-specific thresholds. |
DAT-28 | Implement validation to detect unauthorized data modification | Use automated checksum and hash validation tools integrated into backup and monitoring processes (see the integrity-scan sketch after this table). | Checksum logs and backup verification reports. | Hash validation helps detect AI-led tampering or adversarial poisoning of backup datasets. | I, J, T | X% of critical data is validated for integrity with automated tools to prevent AI tampering. |
DAT-29 | Log and review changes to critical data assets with audit trails | Maintain detailed audit trails for changes to critical data assets and review them regularly. | Audit logs and review meeting minutes. | Audit trail reviews should include changes consistent with AI-generated activity or scripting. | J, T | X% of changes to critical data assets are logged and reviewed for AI-generated activities. |
DAT-30 | Maintain a regularly updated inventory of sensitive data assets and their locations | Use automated data discovery tools to continuously update an inventory of sensitive data assets; review quarterly. | Data inventory reports and audit logs. | Maintain an up-to-date inventory to trace data usage in AI environments and assess exposure. | J, T | X% of sensitive data assets are tracked and inventoried for AI-related exposure risks. |
DAT-31 | Validate and sanitize datasets used in internal or vendor-facing AI tools | Use anomaly detection, data provenance, and diversity checks before including any dataset in model pipelines. | Maintain dataset validation reports and review version control for any training data updates. | Addresses MITRE ATLAS AML.T0020 (Poison Training Data) by enforcing data quality control before training LLMs. | I, J, T | X% of datasets are validated for quality and sanitized before use in AI models. |
DAT-32 | Document lineage for internal or vendor AI models (dataset source, preprocessing steps, fine-tuning logs) | Maintain a signed “Model Lineage Card” in the data catalogue; update it on every retrain. | Verify card completeness; cross-check the hash of training data against the stored checksum. | Limits “Unknown Model Goals / Shadow Retraining” (MITRE T0006). | I, J, T | X% of models and datasets have documented provenance for accountability. |
DAT-33 | Verify the origin and integrity of all pre-trained AI models and datasets before initial use and each subsequent update | Tier 1: Accept only models from registries that support signed artifacts (e.g., Hugging Face with TUF). Tier 2: For high-risk use cases, enforce reproducibility and SBOM traceability (see the artifact-verification sketch after this table). | Confirm registry enforcement and signed-artifact settings, and verify a representative sample. Review exception logs for unsigned artifacts. | Blocks data- and model-poisoning and backdoored-model supply-chain attacks by ensuring tampered or rogue assets cannot enter the environment. | I, J, T | X% of datasets and models are verified for integrity using cryptographic checksums. |
DAT-34 | Track full provenance metadata (source, license, revision, hashes) for every model and dataset throughout the ML lifecycle | Automate provenance capture in the ML pipeline (e.g., DVC, MLflow). Store metadata in an immutable repository accessible to Risk & Compliance. Flag any asset whose lineage cannot be fully resolved. | Review pipeline logs to verify that provenance records are generated for each new version. Spot-check metadata completeness (source URL, license, checksums, approving owner). | Facilitates rapid incident triage if a model is later found vulnerable; supports audits against license and regulatory obligations. | I, J, T | X% of models and datasets have complete lineage documentation for audit purposes. |
DAT-35 | Scan all AI-related software dependencies (PyPI/NPM packages, model weight files) for integrity and malicious indicators; block hallucinated or typosquatted package names before installation | Integrate software-composition analysis (SCA) and AV scanning in CI/CD. Enforce allowlisting of trusted model registries. Reject package names not present in curated repositories (see the dependency-gate sketch after this table). | Review SCA reports for high-severity issues. Verify that build logs show hash checks and signature validation. Spot-check blocked “hallucinated” packages. | Blocks hallucinated or typosquatted packages and dependency-chain attacks (e.g., ChatGPT package hallucination, the compromised PyTorch dependency chain). | I, J, T | X% of AI-related software dependencies are scanned for integrity and malicious indicators. |
DAT-36 | Require cryptographic signing and verified publisher identity for every external pre-trained model (e.g., from Hugging Face) before production use | Enforce signature verification at ingestion. Maintain a list of trusted publishers and keys. Log or block unsigned or unverified models (see the artifact-verification sketch after this table). | Inspect registry logs for unsigned download attempts. Check publisher-verification records and revocation handling. | Prevents malicious or backdoored model uploads such as PoisonGPT and compromised Hugging Face models. | I, J, T | X% of external models are verified and signed before being used in production. |
DAT-37 | Capture and version user consents; link them to downstream datasets and models | Implement granular consent flags; propagate them to the feature store and model registry. | Inspect the consent database and lineage tags for a random sample of 10 records. | Mitigates the risk of “shadow profiles” or unauthorized AI personalization. | A, I, T | X% of user consents are captured and linked to datasets used for AI training. |
DAT-38 | Strip unnecessary fields before training; pseudonymize whenever possible | Enforce schema and policy in the data-curation pipeline; run a nightly scan that flags violations (see the pseudonymization sketch after this table). | Review the latest scan report; confirm zero critical hits. | Lowers breach impact and the success rate of model inversion attacks. | I, E | X% of curated datasets comply with schema and pseudonymization policies. |
DAT-39 | Enforce TTLs for raw, processed, and model-ready data; auto-purge on expiry | Tag datasets with a TTL; schedule deletion jobs; log the hash of each purged set (see the TTL-purge sketch after this table). | Cross-check 3 purged hashes against the deletion ledger. | Limits training on stale or revoked data; meets storage-limitation laws. | I, T | X% of datasets are tagged with TTL and automatically purged when expired. |
DAT-40 | Implement a data quality framework for training, validation, and testing datasets | Define data quality metrics; automate validation pipelines; maintain a quality scorecard (see the quality-gate sketch after this table). | Sample training sets for quality issues; review rejection logs; test data validation gates. | Reduces model hallucinations and bias from poor-quality data; improves performance. | I, J | X% of datasets used for AI models meet quality standards to reduce bias and errors. |
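The sketches below illustrate how a few of the controls above might be automated. They are minimal Python examples under stated assumptions; file names, thresholds, keys, and trust lists are placeholders, not references to real systems.

For DAT-28, a scheduled integrity scan can recompute SHA-256 hashes of critical files and compare them against a recorded baseline; `baseline_hashes.json` is a hypothetical manifest mapping each path to its expected digest.

```python
# Integrity scan (DAT-28): compare current SHA-256 hashes of critical files
# against a previously recorded baseline manifest (JSON of path -> hex digest).
import hashlib
import json
from pathlib import Path

MANIFEST = Path("baseline_hashes.json")  # hypothetical baseline location

def sha256_of(path: Path) -> str:
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def scan(baseline: dict) -> list:
    """Return findings for files that are missing or whose hash changed."""
    findings = []
    for name, expected in baseline.items():
        p = Path(name)
        if not p.exists():
            findings.append(f"MISSING: {name}")
        elif sha256_of(p) != expected:
            findings.append(f"MODIFIED: {name}")
    return findings

if __name__ == "__main__":
    for finding in scan(json.loads(MANIFEST.read_text())):
        print(finding)  # in practice, forward findings to the SIEM
```

Scan output would feed the checksum logs that DAT-28's audit guidance calls for.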
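For DAT-33 and DAT-36, an ingestion gate can refuse to load a model artifact unless its checksum and publisher match an internal trust list. `TRUSTED_ARTIFACTS` and its entries are illustrative, and the digest shown is a placeholder; a production gate would verify registry-signed artifacts cryptographically rather than consult a static dictionary.

```python
# Ingestion gate (DAT-33/DAT-36): verify a downloaded model artifact against
# the checksum and approved publisher recorded in an internal trust list.
import hashlib
from pathlib import Path

# Hypothetical trust list: artifact name -> (approved publisher, SHA-256 digest).
TRUSTED_ARTIFACTS = {
    "sentiment-model-v3.bin": (
        "approved-vendor",
        "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",  # placeholder
    ),
}

def verify_artifact(path: Path, publisher: str) -> None:
    """Raise if the artifact is unlisted, from the wrong publisher, or tampered with."""
    entry = TRUSTED_ARTIFACTS.get(path.name)
    if entry is None:
        raise PermissionError(f"{path.name} is not on the trusted-artifact list")
    approved_publisher, expected_digest = entry
    if publisher != approved_publisher:
        raise PermissionError(f"publisher {publisher!r} is not approved for {path.name}")
    if hashlib.sha256(path.read_bytes()).hexdigest() != expected_digest:
        raise ValueError(f"checksum mismatch for {path.name}: refusing to load")

# verify_artifact(Path("sentiment-model-v3.bin"), "approved-vendor")  # raises on any failure
```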
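For DAT-35, a pre-install gate can reject any requirement whose base package name is absent from the curated repository, catching typosquatted or hallucinated names before pip ever runs; the allowlist below is illustrative.

```python
# Dependency gate (DAT-35): block requirements whose package name is not on
# the curated allowlist. Names are illustrative; a real gate would query the
# organization's internal package registry.
CURATED_PACKAGES = {"numpy", "pandas", "torch", "transformers", "scikit-learn"}

def vet_requirements(lines):
    """Return requirement lines whose base package name is not curated."""
    rejected = []
    for line in lines:
        name = line.split("==")[0].split(">=")[0].strip().lower()
        if name and not name.startswith("#") and name not in CURATED_PACKAGES:
            rejected.append(line)
    return rejected

reqs = ["numpy==1.26.0", "pandass==2.0", "torch>=2.1"]  # note the typosquatted "pandass"
for bad in vet_requirements(reqs):
    print(f"BLOCKED (not in curated repository): {bad}")
```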
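For DAT-38, the curation step can drop every field not on the training schema and replace direct identifiers with a keyed HMAC, so the pseudonym cannot be reversed without the key; the key, schema, and field names are all placeholders.

```python
# Curation step (DAT-38): strip fields outside the allowlisted schema and
# pseudonymize direct identifiers with a keyed HMAC.
import hashlib
import hmac

PSEUDO_KEY = b"rotate-me-and-store-in-a-vault"  # placeholder; use a managed secret
TRAINING_SCHEMA = {"user_id", "country", "purchase_amount"}  # allowlisted fields
PSEUDONYMIZE = {"user_id"}  # direct identifiers to replace

def curate(record: dict) -> dict:
    out = {}
    for field, value in record.items():
        if field not in TRAINING_SCHEMA:
            continue  # strip anything not explicitly allowed (e.g., "email" below)
        if field in PSEUDONYMIZE:
            value = hmac.new(PSEUDO_KEY, str(value).encode(), hashlib.sha256).hexdigest()
        out[field] = value
    return out

print(curate({"user_id": "u-1842", "email": "a@b.c", "country": "FR", "purchase_amount": 42}))
```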
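For DAT-39, a purge job can delete datasets whose TTL has expired and append the hash of each purged file to a deletion ledger; the index and ledger paths are hypothetical.

```python
# TTL purge job (DAT-39): delete expired dataset files and record the hash of
# each purged file in an append-only deletion ledger.
import hashlib
import json
import time
from pathlib import Path

TTL_INDEX = Path("dataset_ttls.json")   # hypothetical: path -> expiry (epoch seconds)
LEDGER = Path("deletion_ledger.jsonl")  # append-only record of purged sets

def purge_expired(now=None):
    now = now or time.time()
    index = json.loads(TTL_INDEX.read_text())
    remaining = {}
    with LEDGER.open("a") as ledger:
        for name, expires_at in index.items():
            p = Path(name)
            if expires_at <= now and p.exists():
                digest = hashlib.sha256(p.read_bytes()).hexdigest()
                p.unlink()  # purge the expired dataset
                ledger.write(json.dumps(
                    {"path": name, "sha256": digest, "purged_at": now}) + "\n")
            else:
                remaining[name] = expires_at
    TTL_INDEX.write_text(json.dumps(remaining))

purge_expired()
```

DAT-39's audit step (cross-checking purged hashes) then reads directly from this ledger.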
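For DAT-40, a validation gate can compute simple quality metrics such as null rate and duplicate rate and reject any set that exceeds policy thresholds; the thresholds here are illustrative, and a real framework would add further checks (schema conformance, outliers, label balance).

```python
# Quality gate (DAT-40): reject a dataset whose null rate or duplicate rate
# exceeds policy thresholds (illustrative values).
MAX_NULL_RATE = 0.02
MAX_DUPLICATE_RATE = 0.01

def quality_gate(rows: list) -> dict:
    """Score a list of records and decide whether the set passes the gate."""
    total = len(rows)
    cells = sum(len(r) for r in rows)
    nulls = sum(1 for r in rows for v in r.values() if v is None)
    unique = {tuple(sorted(r.items())) for r in rows}
    report = {
        "null_rate": nulls / cells if cells else 0.0,
        "duplicate_rate": 1 - len(unique) / total if total else 0.0,
    }
    report["passed"] = (report["null_rate"] <= MAX_NULL_RATE
                        and report["duplicate_rate"] <= MAX_DUPLICATE_RATE)
    return report

# A duplicated row and a null value both trip the gate:
print(quality_gate([{"x": 1, "y": 2}, {"x": 1, "y": 2}, {"x": 3, "y": None}]))
```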