Implementation and audit guidance for secure handling of data across its lifecycle.
AIJET Principles: A = Awareness, I = Integrity, J = Judgment, E = Ethics, T = Transparency
ID | Requirement | Guidance to implement | Guidance to audit | AI Threats and Mitigation | Principles | KPI |
---|---|---|---|---|---|---|
DAT-01 | Ensure all users use individual accounts; shared or group credentials are strictly prohibited | Deploy centralized identity management (e.g., Azure AD, Okta) to enforce user uniqueness and integrate with SaaS SSO where possible. | Search for duplicate login patterns, account naming anomalies (e.g., ‘admin’, ‘user1’), or reused credentials across users in SIEM and IAM systems. | Strictly enforce individual accounts to prevent AI tools from harvesting data through shared or generic accounts. | I, J, T | X% of users use individual accounts for system access. |
DAT-02 | Assign access rights based on the principle of least privilege, aligned with each business function | Implement role-based access controls and perform periodic reviews of all access rights. | Access control audit reports and review meeting minutes. | Ensure role-based access accounts for datasets that could be misused for AI model training, profiling, or generation. | A, J, T | X% of employees have access according to their roles, with regular access reviews. |
DAT-03 | Perform a comprehensive review of all user access rights at least annually | Schedule annual reviews to assess and document user access rights. | Annual review reports approved by the data owner. | Annual access reviews must detect any unauthorized access by AI-based crawlers, bots, or automated data collectors. | J, T | X% of user access rights reviewed annually for compliance. |
DAT-04 | Review privileged and sensitive access rights at least semi-annually | Conduct semi-annual reviews for high-privilege accounts using automated alerts. | Review logs and checklist records. | Privileged access reviews should flag accounts interacting with AI services that could misuse sensitive datasets. | J, T | X% of high-privilege access is reviewed semi-annually, including AI-related access risks. |
DAT-05 | Implement real-time monitoring for access to critical systems and data | Deploy continuous monitoring solutions with real-time alert capabilities for critical systems. | Monitoring dashboards and alert logs. | Real-time monitoring must detect large-volume, AI-like scraping behaviors or suspicious model training attempts. | A, T | X% of critical systems are monitored in real-time with alerts for unusual activities. |
DAT-06 | Ensure temporary access rights are automatically time-bound by design | Utilize systems that enforce expiration dates on temporary access and perform regular audits. | Temporary access logs and system configuration records. | Temporary access rights must expire rapidly to prevent silent AI-driven data harvesting windows. | T | X% of temporary access rights are time-bound and automatically revoked after use. |
DAT-07 | Implement an approval workflow for all access requests to sensitive data | Adopt a formal workflow system integrated with your IAM solution to track and approve access requests. | Approval logs and IAM audit reports. | Approval workflows should explicitly include AI exposure risk evaluation before granting access to sensitive data. | J, T | X% of sensitive data access requests go through an approval workflow with AI risk evaluation. |
DAT-08 | Prohibit sharing of sensitive company data with external AI models without explicit legal clearance | Update Acceptable Use Policies to include LLM restrictions and monitor outbound data usage on corporate networks. | Review data access logs, endpoint monitoring reports, and staff acknowledgment forms. | Addresses OWASP LLM02:2025 (Sensitive Information Disclosure) by preventing leakage of sensitive data via generative AI platforms. | J, T | X% compliance with policies prohibiting unauthorized sharing of sensitive data with AI models. |
DAT-09 | Assign data owners for each critical dataset and define their responsibilities | Document data ownership roles in a formal matrix and integrate them into data governance. | Data ownership matrices and governance meeting minutes. | Assign data owners to oversee datasets critical to AI system training and maintain logs on dataset usage. | A, T | X% of critical datasets have assigned and accountable data owners. |
DAT-10 | Review and validate data ownership annually as part of governance | Conduct annual reviews of data ownership as part of overall governance processes and update records accordingly. | Annual review reports and updated ownership records. | Validate annually whether dataset owners control access to AI training data and sensitive personal information. | A, J, T | X% of data ownership records are reviewed and updated annually. |
DAT-11 | Ensure data owners are aware of their responsibilities | Integrate data ownership training into onboarding and require annual re-certification from data owners. | Signed acknowledgment forms and training attendance records. | Ensure data owners receive updated training on risks of contributing organizational data to external AI systems. | A, T | X% of data owners complete annual re-certification with updated training on AI risks. |
DAT-12 | Evaluate employee understanding of the organization’s data classification policy through periodic assessments | Include data classification questions in regular assessments and address identified gaps with targeted training. | Assessment results and remediation plans. | Ensure assessments include evaluation of employee understanding of risks related to AI misuse of classified data. | A | X% of employees score ≥ Y% on data classification assessments, with follow-up training where necessary. |
DAT-13 | Follow documented procedures for secure data manipulation | Distribute detailed procedures for secure data handling and conduct regular training sessions. | Procedure documentation and training logs. | Ensure procedures emphasize integrity during data handling to protect against AI-assisted data tampering. | A, T | X% of employees handling sensitive data follow documented secure data handling procedures. |
DAT-14 | Ensure employees understand and comply with the applicable data retention policy | Regularly review and update the data retention policy and include it in mandatory training sessions. | Policy distribution records and compliance audit reports. | Comply with retention policies to reduce the risk of old sensitive data being exposed to AI scraping. | A, J, T | X% compliance with the data retention policy, reducing AI scraping risks. |
DAT-15 | Securely destroy sensitive data using approved corporate methods | Implement approved data destruction methods and schedule periodic audits to verify compliance. | Destruction logs and audit reports. | Destruction methods should prevent AI-driven data reconstruction from disposed media. | I, T | X% of sensitive data is destroyed using approved methods with audit logs. |
DAT-16 | Ensure that all SaaS platforms used for sensitive data are approved by the company | Maintain a list of approved SaaS platforms and perform regular vendor security reviews. | Vendor approval records and audit logs. | Approved SaaS platforms must undergo AI risk reviews to prevent data leakage into third-party models. | J, T | X% of SaaS platforms used for sensitive data handling are approved and assessed for AI-related risks. |
DAT-17 | Limit use of Generative AI platforms to those explicitly approved by the company for secure data handling | Deploy application detection in endpoint agents and web gateways to flag unsanctioned AI tool use. Maintain an allowlist of approved AI platforms with controlled access. | Analyze outbound connections and application usage reports to detect unauthorized AI platforms and correlate with departments or user roles. | Restrict use of generative AI platforms unless they have been evaluated for secure data handling and privacy protection. | E, J, T | X% compliance with approved Generative AI platforms for data handling. |
DAT-18 | Mandate that all data exports outside the organization follow a company-approved process | Implement an export approval process integrated with DLP tools to monitor and document data exports. | Export approval logs and DLP reports. | Monitor all data exports to block unauthorized flows that may be used for AI training or exfiltration. | A, J, T | X% of data exports undergo approval with DLP monitoring to block unauthorized flows. |
DAT-19 | Transmit classified data only using approved secure protocols | Enforce secure transmission protocols via network controls and conduct periodic audits. | Protocol configuration records and audit logs. | Use end-to-end encrypted and authenticated protocols to prevent AI-facilitated data interception. | T | X% of classified data transmissions use end-to-end encrypted and secure protocols. |
DAT-20 | Provide users with guidance on secure social media use | Develop social media usage policies that include security best practices and distribute them. | Policy documents and training session records. | Include risks of AI-assisted social engineering in social media security training. | A, T | X% of employees comply with social media security policies, including AI-related risks. |
DAT-21 | Provide users with guidance on the use of personal webmail and web drives | Clarify acceptable use policies for personal email and cloud storage; monitor usage for compliance. | Policy documents and usage logs. | Define acceptable AI interactions with personal platforms to reduce risk of unmonitored data exfiltration. | T | X% compliance with personal webmail and cloud storage policies, including AI interactions. |
DAT-22 | Restrict uploads to and the use of external drives | Implement DLP solutions to monitor data transfers and deliver clear training on data handling responsibilities. | DLP reports and training attendance records. | Detect and block attempts to use external drives for AI dataset injection or model theft. | A, T | X% of external drive usage is restricted and monitored for AI-related risks. |
DAT-23 | Ensure cross-border data transfers comply with applicable regulations (e.g., HIPAA, GDPR) | Review and document cross-border data transfer processes to ensure they meet all applicable regulatory requirements. | Compliance audit reports and transfer logs. | Ensure legal safeguards to prevent global AI actors from training on sensitive exported datasets. | A, E, J, T | X% of cross-border data transfers comply with regulatory and AI-related data handling laws. |
DAT-24 | Implement regular monitoring to detect shadow copies of sensitive data on unauthorized devices | Deploy automated DLP tools to scan for unauthorized shadow data and schedule regular remediation reviews. | DLP scan reports and remediation records. | DLP tools should detect AI attempts to duplicate sensitive data onto shadow locations. | J, T | X% of shadow data copies are detected and remediated using DLP tools. |
DAT-25 | Regularly monitor the internet to detect unsanctioned shadow copies of sensitive data | Utilize external monitoring services to detect unsanctioned copies of sensitive data and document findings. | External monitoring reports and remediation actions. | External monitoring must catch unsanctioned use of organizational data in public AI datasets. | T | X% of unsanctioned use of organizational data in public AI datasets is detected. |
DAT-26 | Log and monitor all outbound data transfers | Implement comprehensive logging for all outbound data transfers and analyze logs for anomalies. | Outbound transfer logs and review reports. | Monitor outbound flows to detect suspicious AI training dataset creation or behavior. | A, J, T | X% of outbound data transfers are logged and analyzed for AI dataset creation or pre-training activity. |
DAT-27 | Trigger alerts for high-risk data exports | Configure automated alerts based on predefined high-risk thresholds for data exports. | Alert logs and threshold configuration documentation. | Configure alerts for data volumes typical of AI model feeding or unauthorized pre-training dumps. | A, T | X% of high-risk data exports trigger alerts based on predefined AI-specific thresholds. |
DAT-28 | Implement validation to detect unauthorized data modification | Use automated checksum and hash validation tools integrated into backup and monitoring processes (see the integrity-scan sketch after this table). | Checksum logs and backup verification reports. | Hash validation helps detect AI-led tampering or adversarial poisoning of backup datasets. | I, J, T | X% of critical data is validated for integrity with automated tools to prevent AI tampering. |
DAT-29 | Log and review changes to critical data assets with audit trails | Maintain detailed audit trails for changes to critical data assets and review them regularly. | Audit logs and review meeting minutes. | Audit trail reviews should include changes consistent with AI-generated activity or scripting. | J, T | X% of changes to critical data assets are logged and reviewed for AI-generated activities. |
DAT-30 | Maintain a regularly updated inventory of sensitive data assets and their locations | Use automated data discovery tools to continuously update an inventory of sensitive data assets; review quarterly. | Data inventory reports and audit logs. | Maintain an up-to-date inventory to trace data usage in AI environments and assess exposure. | J, T | X% of sensitive data assets are tracked and inventoried for AI-related exposure risks. |
DAT-31 | Validate and sanitize datasets used in internal or vendor-facing AI tools | Use anomaly detection, data provenance, and diversity checks before including any dataset in model pipelines. | Maintain dataset validation reports and review version control for any training data updates. | Addresses MITRE ATLAS AML.T0020 (Poison Training Data) by enforcing data quality control before training LLMs. | I, J, T | X% of datasets are validated for quality and sanitized before use in AI models. |
DAT-32 | Document lineage for internal or vendor AI models (dataset source, preprocessing steps, fine-tuning logs) | Maintain a signed “Model Lineage Card” in the data catalogue; update it on every retrain. | Verify card completeness; cross-check the hash of training data against the stored checksum. | Limits “Unknown Model Goals / Shadow Retraining” (MITRE T0006). | I, J, T | X% of models and datasets have documented provenance for accountability. |
DAT-33 | Verify the origin and integrity of all pre-trained AI models and datasets before initial use and each subsequent update | Tier 1: Accept only models from registries that support signed artifacts (e.g., Hugging Face with TUF). Tier 2: For high-risk use cases, enforce reproducibility and SBOM traceability (see the artifact-verification sketch after this table). | Confirm registry enforcement and signed-artifact settings, and verify a representative sample. Review exception logs for unsigned artifacts. | Blocks data- and model-poisoning and backdoored-model supply-chain attacks by ensuring tampered or rogue assets cannot enter the environment. | I, J, T | X% of datasets and models are verified for integrity using cryptographic checksums. |
DAT-34 | Track full provenance metadata (source, license, revision, hashes) for every model and dataset throughout the ML lifecycle | Automate provenance capture in the ML pipeline (e.g., DVC, MLflow). Store metadata in an immutable repository accessible to Risk & Compliance. Flag any asset whose lineage cannot be fully resolved. | Review pipeline logs to verify that provenance records are generated for each new version. Spot-check metadata completeness (source URL, license, checksums, approving owner). | Facilitates rapid incident triage if a model is later found vulnerable; supports audits against license and regulatory obligations. | I, J, T | X% of models and datasets have complete lineage documentation for audit purposes. |
DAT-35 | Scan all AI-related software dependencies (PyPI/NPM packages, model weight files) for integrity and malicious indicators; block hallucinated or typosquatted package names before installation | Integrate software-composition analysis (SCA) and AV scanning in CI/CD. Enforce allowlisting of trusted model registries. Reject package names not present in curated repositories (see the dependency-gate sketch after this table). | Review SCA reports for high-severity issues. Verify that build logs show hash checks and signature validation. Spot-check blocked “hallucinated” packages. | Blocks hallucinated or typosquatted packages and dependency-chain attacks (e.g., ChatGPT package hallucination, the compromised PyTorch dependency chain). | I, J, T | X% of AI-related software dependencies are scanned for integrity and malicious indicators. |
DAT-36 | Require cryptographic signing and verified publisher identity for every external pre-trained model (e.g., from Hugging Face) before production use | Enforce signature verification at ingestion. Maintain a list of trusted publishers and keys. Log or block unsigned or unverified models (see the artifact-verification sketch after this table). | Inspect registry logs for unsigned download attempts. Check publisher-verification records and revocation handling. | Prevents malicious or backdoored model uploads such as PoisonGPT and compromised Hugging Face models. | I, J, T | X% of external models are verified and signed before being used in production. |
DAT-37 | Capture and version user consents; link them to downstream datasets and models | Implement granular consent flags; propagate them to the feature store and model registry. | Inspect the consent database and lineage tags for a random sample of 10 records. | Mitigates the risk of “shadow profiles” or unauthorized AI personalization. | A, I, T | X% of user consents are captured and linked to datasets used for AI training. |
DAT-38 | Strip unnecessary fields before training; pseudonymize whenever possible | Enforce schema and policy in the data-curation pipeline; run a nightly scan that flags violations (see the pseudonymization sketch after this table). | Review the latest scan report; confirm zero critical hits. | Lowers breach impact and the success rate of model inversion attacks. | I, E | X% of curated datasets comply with schema and pseudonymization policies. |
DAT-39 | Enforce TTLs for raw, processed, and model-ready data; auto-purge on expiry | Tag datasets with a TTL; schedule deletion jobs; log the hash of each purged set (see the TTL-purge sketch after this table). | Cross-check 3 purged hashes against the deletion ledger. | Limits training on stale or revoked data; meets storage-limitation laws. | I, T | X% of datasets are tagged with TTL and automatically purged when expired. |
DAT-40 | Implement a data quality framework for training, validation, and testing datasets | Define data quality metrics; automate validation pipelines; maintain a quality scorecard (see the quality-gate sketch after this table). | Sample training sets for quality issues; review rejection logs; test data validation gates. | Reduces model hallucinations and bias from poor-quality data; improves performance. | I, J | X% of datasets used for AI models meet quality standards to reduce bias and errors. |
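The sketches below illustrate how a few of the controls above might be automated. They are minimal Python examples under stated assumptions; file names, thresholds, keys, and trust lists are placeholders, not references to real systems.

For DAT-28, a scheduled integrity scan can recompute SHA-256 hashes of critical files and compare them against a recorded baseline; `baseline_hashes.json` is a hypothetical manifest mapping each path to its expected digest.

```python
# Integrity scan (DAT-28): compare current SHA-256 hashes of critical files
# against a previously recorded baseline manifest (JSON of path -> hex digest).
import hashlib
import json
from pathlib import Path

MANIFEST = Path("baseline_hashes.json")  # hypothetical baseline location

def sha256_of(path: Path) -> str:
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def scan(baseline: dict) -> list:
    """Return findings for files that are missing or whose hash changed."""
    findings = []
    for name, expected in baseline.items():
        p = Path(name)
        if not p.exists():
            findings.append(f"MISSING: {name}")
        elif sha256_of(p) != expected:
            findings.append(f"MODIFIED: {name}")
    return findings

if __name__ == "__main__":
    for finding in scan(json.loads(MANIFEST.read_text())):
        print(finding)  # in practice, forward findings to the SIEM
```

Scan output would feed the checksum logs that DAT-28's audit guidance calls for.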
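For DAT-33 and DAT-36, an ingestion gate can refuse to load a model artifact unless its checksum and publisher match an internal trust list. `TRUSTED_ARTIFACTS` and its entries are illustrative, and the digest shown is a placeholder; a production gate would verify registry-signed artifacts cryptographically rather than consult a static dictionary.

```python
# Ingestion gate (DAT-33/DAT-36): verify a downloaded model artifact against
# the checksum and approved publisher recorded in an internal trust list.
import hashlib
from pathlib import Path

# Hypothetical trust list: artifact name -> (approved publisher, SHA-256 digest).
TRUSTED_ARTIFACTS = {
    "sentiment-model-v3.bin": (
        "approved-vendor",
        "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",  # placeholder
    ),
}

def verify_artifact(path: Path, publisher: str) -> None:
    """Raise if the artifact is unlisted, from the wrong publisher, or tampered with."""
    entry = TRUSTED_ARTIFACTS.get(path.name)
    if entry is None:
        raise PermissionError(f"{path.name} is not on the trusted-artifact list")
    approved_publisher, expected_digest = entry
    if publisher != approved_publisher:
        raise PermissionError(f"publisher {publisher!r} is not approved for {path.name}")
    if hashlib.sha256(path.read_bytes()).hexdigest() != expected_digest:
        raise ValueError(f"checksum mismatch for {path.name}: refusing to load")

# verify_artifact(Path("sentiment-model-v3.bin"), "approved-vendor")  # raises on any failure
```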
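For DAT-35, a pre-install gate can reject any requirement whose base package name is absent from the curated repository, catching typosquatted or hallucinated names before pip ever runs; the allowlist below is illustrative.

```python
# Dependency gate (DAT-35): block requirements whose package name is not on
# the curated allowlist. Names are illustrative; a real gate would query the
# organization's internal package registry.
CURATED_PACKAGES = {"numpy", "pandas", "torch", "transformers", "scikit-learn"}

def vet_requirements(lines):
    """Return requirement lines whose base package name is not curated."""
    rejected = []
    for line in lines:
        name = line.split("==")[0].split(">=")[0].strip().lower()
        if name and not name.startswith("#") and name not in CURATED_PACKAGES:
            rejected.append(line)
    return rejected

reqs = ["numpy==1.26.0", "pandass==2.0", "torch>=2.1"]  # note the typosquatted "pandass"
for bad in vet_requirements(reqs):
    print(f"BLOCKED (not in curated repository): {bad}")
```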
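For DAT-38, the curation step can drop every field not on the training schema and replace direct identifiers with a keyed HMAC, so the pseudonym cannot be reversed without the key; the key, schema, and field names are all placeholders.

```python
# Curation step (DAT-38): strip fields outside the allowlisted schema and
# pseudonymize direct identifiers with a keyed HMAC.
import hashlib
import hmac

PSEUDO_KEY = b"rotate-me-and-store-in-a-vault"  # placeholder; use a managed secret
TRAINING_SCHEMA = {"user_id", "country", "purchase_amount"}  # allowlisted fields
PSEUDONYMIZE = {"user_id"}  # direct identifiers to replace

def curate(record: dict) -> dict:
    out = {}
    for field, value in record.items():
        if field not in TRAINING_SCHEMA:
            continue  # strip anything not explicitly allowed (e.g., "email" below)
        if field in PSEUDONYMIZE:
            value = hmac.new(PSEUDO_KEY, str(value).encode(), hashlib.sha256).hexdigest()
        out[field] = value
    return out

print(curate({"user_id": "u-1842", "email": "a@b.c", "country": "FR", "purchase_amount": 42}))
```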
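For DAT-39, a purge job can delete datasets whose TTL has expired and append the hash of each purged file to a deletion ledger; the index and ledger paths are hypothetical.

```python
# TTL purge job (DAT-39): delete expired dataset files and record the hash of
# each purged file in an append-only deletion ledger.
import hashlib
import json
import time
from pathlib import Path

TTL_INDEX = Path("dataset_ttls.json")   # hypothetical: path -> expiry (epoch seconds)
LEDGER = Path("deletion_ledger.jsonl")  # append-only record of purged sets

def purge_expired(now=None):
    now = now or time.time()
    index = json.loads(TTL_INDEX.read_text())
    remaining = {}
    with LEDGER.open("a") as ledger:
        for name, expires_at in index.items():
            p = Path(name)
            if expires_at <= now and p.exists():
                digest = hashlib.sha256(p.read_bytes()).hexdigest()
                p.unlink()  # purge the expired dataset
                ledger.write(json.dumps(
                    {"path": name, "sha256": digest, "purged_at": now}) + "\n")
            else:
                remaining[name] = expires_at
    TTL_INDEX.write_text(json.dumps(remaining))

purge_expired()
```

DAT-39's audit step (cross-checking purged hashes) then reads directly from this ledger.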
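For DAT-40, a validation gate can compute simple quality metrics such as null rate and duplicate rate and reject any set that exceeds policy thresholds; the thresholds here are illustrative, and a real framework would add further checks (schema conformance, outliers, label balance).

```python
# Quality gate (DAT-40): reject a dataset whose null rate or duplicate rate
# exceeds policy thresholds (illustrative values).
MAX_NULL_RATE = 0.02
MAX_DUPLICATE_RATE = 0.01

def quality_gate(rows: list) -> dict:
    """Score a list of records and decide whether the set passes the gate."""
    total = len(rows)
    cells = sum(len(r) for r in rows)
    nulls = sum(1 for r in rows for v in r.values() if v is None)
    unique = {tuple(sorted(r.items())) for r in rows}
    report = {
        "null_rate": nulls / cells if cells else 0.0,
        "duplicate_rate": 1 - len(unique) / total if total else 0.0,
    }
    report["passed"] = (report["null_rate"] <= MAX_NULL_RATE
                        and report["duplicate_rate"] <= MAX_DUPLICATE_RATE)
    return report

# A duplicated row and a null value both trip the gate:
print(quality_gate([{"x": 1, "y": 2}, {"x": 1, "y": 2}, {"x": 3, "y": None}]))
```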