TL;DR

Gartner predicts that through 2026, organizations will abandon 60% of AI projects that aren’t supported by AI-ready data. The bottleneck isn’t models or GPUs. It’s the petabytes of unstructured data trapped across NFS shares, S3 buckets, and tape archives that can’t be moved safely. AI readiness is a data migration problem.

Every enterprise AI conversation starts with the same assumption: we have the data, we just need the model.

The numbers tell a different story.

The Data Readiness Gap

Metric	Source
80% of enterprise data is unstructured	Gartner
60% of AI projects unsupported by AI-ready data will be abandoned (through 2026)	Gartner
63% of organizations lack, or are unsure whether they have, the right data management practices for AI	Gartner (Q3 2024 survey)
Unstructured data will triple between 2023 and 2026	IDC Global DataSphere Forecast
Organizations with successful AI initiatives invest up to 4x more in data and analytics foundations	Gartner (April 2026)

The gap between “we have data” and “our data is AI-ready” is enormous. And it’s growing.

Does This Sound Familiar?

Before you read on, a quick check:

Your AI team asked for “all the research data” and nobody knew where it all was
You copied terabytes to S3 and discovered afterward that the permissions were gone
Your migration “finished without errors,” but nobody verified that the file counts matched
You have data on NAS shares that hasn’t been touched in years, but nobody knows if it’s safe to delete
Two departments store files on the same NAS, with different permission models, and both want to “move to the cloud”

If any of these hit close to home, you’re dealing with the same problem 63% of enterprises are facing. Keep reading.

Where the Data Actually Lives

Unstructured data doesn’t sit in one place. In a typical enterprise, it’s spread across:

NFS shares on on-prem NAS (NetApp, Isilon, QNAP, Synology)
SMB file servers (Windows, Samba, legacy departmental shares)
S3-compatible object storage (AWS S3, MinIO, Wasabi, Backblaze B2)
Cloud file services (Azure Files, Amazon EFS, Google Cloud Filestore)
Tape archives (LTO via LTFS, vendor-specific HSM systems)
SaaS platforms (SharePoint, Google Drive, Dropbox Business)

Each system stores metadata differently. Permissions work differently. Timestamps have different precision. File naming conventions and path length limits vary.

This is not a storage problem. This is a data mobility problem.
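Naming conventions and path limits are the easiest of these differences to check before anything moves. The Python sketch below flags relative paths that would break on S3 or on a Windows/SMB target. The limits it encodes are documented ones (1,024-byte S3 object keys, the legacy 260-character Win32 `MAX_PATH`, Windows-reserved characters and device names); the function name, the `smb_prefix` default, and the issue strings are hypothetical, and it does not address timestamp precision.

```python
from pathlib import PurePosixPath

# Well-known limits; the checker itself is a hypothetical sketch.
S3_MAX_KEY_BYTES = 1024  # S3 object keys: at most 1024 bytes of UTF-8
WIN_MAX_PATH = 260       # legacy Win32 MAX_PATH, including the terminating NUL
WIN_INVALID_CHARS = set('<>:"\\|?*') | {chr(c) for c in range(32)}
WIN_RESERVED = {"CON", "PRN", "AUX", "NUL"} | {
    f"{dev}{i}" for dev in ("COM", "LPT") for i in range(1, 10)
}


def portability_issues(relpath, smb_prefix="\\\\server\\share\\"):
    """Return problems a POSIX-style relative path may hit on S3 or SMB/Windows."""
    issues = []
    if len(relpath.encode("utf-8")) > S3_MAX_KEY_BYTES:
        issues.append("too long for an S3 key")
    # Approximate: Windows counts UTF-16 code units, len() counts code points.
    if len(smb_prefix) + len(relpath) >= WIN_MAX_PATH:
        issues.append("exceeds legacy Windows MAX_PATH")
    for part in PurePosixPath(relpath).parts:
        if set(part) & WIN_INVALID_CHARS:
            issues.append(f"invalid Windows character in {part!r}")
        # "CON", "con.txt", "LPT1.log" are all reserved device names on Windows.
        if part.split(".")[0].upper() in WIN_RESERVED:
            issues.append(f"reserved Windows name {part!r}")
        if part.endswith((" ", ".")):
            issues.append(f"trailing space or dot in {part!r}")
    return issues
```

For example, `portability_issues("reports/Q3: summary.txt")` returns `["invalid Windows character in 'Q3: summary.txt'"]`. Run over a full file listing, this produces a rename list before the migration instead of a pile of failed transfers after it.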

Why “Just Copy It” Doesn’t Work

When an AI team says “we need all the research data on that NFS share copied to our S3 training bucket,” the request sounds simple. It isn’t.

Permissions Break Across Protocols

An NFS share uses POSIX permissions: owner, group, mode bits, maybe NFSv4 ACLs. S3 uses IAM policies and bucket policies. There’s no 1:1 mapping. If you lose the permission metadata during the copy, you’ve created a compliance gap: files that were restricted on NFS are now accessible to anyone with bucket access.
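One way to avoid silently losing that information is to capture each file’s POSIX owner, group, and mode bits and carry them alongside the object. The sketch below assumes a hypothetical `posix-*` key convention; with boto3, the resulting dict could be passed as the `Metadata` argument of `put_object`, where S3 stores it as `x-amz-meta-*` user metadata. This preserves the record, not the enforcement, and it ignores NFSv4 ACLs entirely.

```python
import os
import stat


def posix_metadata(path):
    """Capture owner, group, and mode bits as string pairs (hypothetical key names).

    Stored as S3 user-defined metadata, these travel with the object, but S3
    does not enforce them: access is still decided by IAM and bucket policies.
    """
    st = os.stat(path, follow_symlinks=False)
    return {
        "posix-mode": format(stat.S_IMODE(st.st_mode), "04o"),  # e.g. "0640"
        "posix-uid": str(st.st_uid),
        "posix-gid": str(st.st_gid),
    }


def needs_access_review(meta):
    """True if 'other' could NOT read the file on the source.

    Such files were restricted on NFS; copying them into a bucket that a wider
    group can read would widen access unless a policy is set first.
    """
    return not (int(meta["posix-mode"], 8) & stat.S_IROTH)
```

The `needs_access_review` check is the part that matters for compliance: it singles out files that were not world-readable on NFS, which are exactly the ones that should not land in a broadly readable bucket without an explicit policy decision.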

Metadata Gets Lost

File timestamps, extended attributes, POSIX ACLs, hard-link relationships: standard copy tools drop some or all of these. For AI training data, this metadata is often critical. When was this medical image captured? Who approved this document? Which version is canonical when three copies exist with the same name?
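Hard links are a good example of metadata that a plain copy destroys without reporting any error. The sketch below assumes a Linux source reachable as a local mount: it groups paths that share an inode and lists extended attribute names where `os.listxattr` is available. The function names are hypothetical. For timestamps, `os.stat()` already exposes nanosecond values as `st_mtime_ns` and `st_atime_ns`, which avoids float rounding when comparing source and destination later.

```python
import os
from collections import defaultdict


def hardlink_groups(root):
    """Group relative paths under root that are hard links to the same inode."""
    by_inode = defaultdict(list)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            if st.st_nlink > 1:
                by_inode[(st.st_dev, st.st_ino)].append(os.path.relpath(path, root))
    # Links whose other names live outside root form single-entry groups; drop them.
    return [sorted(paths) for paths in by_inode.values() if len(paths) > 1]


def extended_attributes(path):
    """Return xattr names, or [] where the platform or filesystem lacks support."""
    if not hasattr(os, "listxattr"):  # os.listxattr is only available on Linux
        return []
    try:
        return os.listxattr(path, follow_symlinks=False)
    except OSError:
        return []
```

Each group returned by `hardlink_groups` is one file’s worth of data on the source; copied naively, it becomes several independent files on the destination.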

Verification Is an Afterthought

Most copy operations end with “it finished without errors.” That’s not verification. Verification means: every file that existed on the source exists on the destination, with matching content (checksums), matching metadata, and matching directory structure. At petabyte scale, with millions of files, this is a non-trivial operation.
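A minimal version of content verification is a manifest on each side (relative path mapped to size and SHA-256) followed by a set comparison. This sketch assumes both trees are readable as local paths and skips symlinks; the names are illustrative. For an S3 destination, the right-hand manifest would have to come from reading objects back or from checksums recorded at upload time, because the ETag of a multipart upload is not a plain MD5 of the content.

```python
import hashlib
import os


def sha256_file(path, chunk_size=1 << 20):
    """Stream the file in chunks so memory stays flat for large files."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root):
    """Map each relative file path to (size, sha256); symlinks are skipped."""
    manifest = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue
            rel = os.path.relpath(path, root)
            manifest[rel] = (os.path.getsize(path), sha256_file(path))
    return manifest


def compare_manifests(src, dst):
    """Classify every difference between a source and a destination manifest."""
    return {
        "missing": sorted(src.keys() - dst.keys()),
        "unexpected": sorted(dst.keys() - src.keys()),
        "content_mismatch": sorted(p for p in src.keys() & dst.keys() if src[p] != dst[p]),
    }
```

Three empty lists is a verification; “it finished without errors” is not.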

Scale Changes Everything

Moving 100 files is trivial. Moving 100 million files across protocols is an engineering challenge. Flat directories with millions of entries. Network timeouts. Partial failures. Files changing on the source during the migration. Resumability after interruption. These are the real-world conditions at enterprise scale.
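Resumability mostly comes down to recording progress as you go. The sketch below assumes local or mounted source and destination paths and a hypothetical journal file. It appends each relative path only after that file is fully written, copies through a temporary `.partial` name so an interrupted file is never mistaken for a complete one, and collects per-file errors instead of aborting the run. `shutil.copy2` carries over content, mode bits, and timestamps, but not ownership.

```python
import os
import shutil


def resumable_copy(src_root, dst_root, journal_path):
    """Copy a tree file by file, journaling each completed relative path.

    Re-running after an interruption skips everything already journaled.
    Returns {relative_path: error_message} for files that failed.
    Assumes file names contain no newlines (the journal is line-based).
    """
    done = set()
    if os.path.exists(journal_path):
        with open(journal_path, encoding="utf-8") as f:
            done = {line.rstrip("\n") for line in f}
    errors = {}
    with open(journal_path, "a", encoding="utf-8") as journal:
        for dirpath, _dirs, files in os.walk(src_root):
            for name in files:
                src = os.path.join(dirpath, name)
                rel = os.path.relpath(src, src_root)
                if rel in done:
                    continue
                dst = os.path.join(dst_root, rel)
                try:
                    os.makedirs(os.path.dirname(dst), exist_ok=True)
                    tmp = dst + ".partial"       # hypothetical temp-name convention
                    shutil.copy2(src, tmp)       # content, mode bits, timestamps
                    os.replace(tmp, dst)         # atomic rename on the same filesystem
                    journal.write(rel + "\n")
                    journal.flush()              # progress survives a crash
                except OSError as exc:
                    errors[rel] = str(exc)
    return errors
```

A rerun resumes rather than re-syncs: a file that changed after it was journaled is not picked up again, which is the job of an incremental pass.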

What Most Teams Do vs. What Actually Works

Stage	The Common Approach	What Actually Works
Discovery	`du -sh /data` and a spreadsheet	Structured scan: file types, size distribution, age profile, permissions audit, duplicate detection
Migration	rsync, robocopy, or a weekend script	Cross-protocol transfer with metadata preservation, incremental sync, parallel workers
Verification	“It finished without errors”	Checksum verification, file count match, metadata comparison, audit report
Permissions	“We’ll fix them after”	Permission mapping preserved during transfer, compliance maintained
Failure handling	Start over	Resumable transfers, per-file error tracking, partial completion support

The Three Steps Nobody Talks About

Before any AI model can train on enterprise unstructured data, three things need to happen:

1. Discovery: Know What You Have

You can’t prepare data you haven’t inventoried. A proper discovery scan answers:

How much data, across how many files and directories?
What types? (Documents, media, databases, archives, code)
How is it distributed by size? (Millions of tiny files vs. thousands of large ones require different transfer strategies)
What’s the age profile? (Active data vs. cold archive)
Are there duplicates consuming storage?
What permissions and ownership patterns exist?

This isn’t `du -sh /data`. It’s a structured analysis that informs every decision downstream.
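A rough version of such a scan fits in a page of Python. The sketch below assumes the share is mounted locally and reports file and directory counts, total bytes, and distributions by extension, size bucket, age, and owner UID, plus a duplicate finder that hashes only files whose sizes collide. Bucket boundaries and function names are arbitrary choices for illustration. Age is taken from `mtime`, since access times are often unreliable on mounts using `noatime` or `relatime`.

```python
import hashlib
import os
import time
from collections import Counter, defaultdict
from typing import Optional

SIZE_BUCKETS = [(4 * 1024, "<4 KiB"), (1024**2, "4 KiB-1 MiB"), (1024**3, "1 MiB-1 GiB")]
AGE_BUCKETS = [(30, "<30 days"), (365, "30-365 days")]


def bucket(value, buckets, overflow):
    """Return the label of the first bucket whose upper limit exceeds value."""
    for limit, label in buckets:
        if value < limit:
            return label
    return overflow


def discovery_scan(root, now: Optional[float] = None):
    """Summarize a tree by type, size, age (from mtime), and owner."""
    now = time.time() if now is None else now
    report = {"files": 0, "dirs": 0, "bytes": 0, "by_extension": Counter(),
              "by_size": Counter(), "by_age": Counter(), "by_owner_uid": Counter()}
    for dirpath, dirnames, filenames in os.walk(root):
        report["dirs"] += len(dirnames)
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            age_days = (now - st.st_mtime) / 86400
            report["files"] += 1
            report["bytes"] += st.st_size
            report["by_extension"][os.path.splitext(name)[1].lower() or "(none)"] += 1
            report["by_size"][bucket(st.st_size, SIZE_BUCKETS, ">=1 GiB")] += 1
            report["by_age"][bucket(age_days, AGE_BUCKETS, ">1 year")] += 1
            report["by_owner_uid"][st.st_uid] += 1
    return report


def duplicate_groups(root):
    """Find identical files: group by size first, hash only the size collisions.

    Hard links to one inode also appear here, although they use no extra space.
    """
    by_size = defaultdict(list)
    for dirpath, _dirs, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)
    by_hash = defaultdict(list)
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a unique size cannot have a duplicate
        for path in paths:
            h = hashlib.sha256()
            with open(path, "rb") as f:
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            by_hash[h.hexdigest()].append(os.path.relpath(path, root))
    return [sorted(g) for g in by_hash.values() if len(g) > 1]
```

At real scale the walk would be parallelized and the results persisted, but even this shape answers the discovery questions in distributions rather than a single `du` total.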

2. Migration: Move It Safely

“Safely” means:

Cross-protocol fidelity: NFS to S3, SMB to NFS, S3 to S3. Each combination has different metadata handling requirements.
Metadata preservation: Permissions, timestamps, extended attributes, ACLs. Not just file content.
Incremental capability: Move the bulk, then sync the delta. Don’t re-transfer petabytes because 0.1% changed.
Resumability: When (not if) the network drops, pick up where you stopped. Don’t restart.
Parallelism: Single-threaded copy at petabyte scale means waiting weeks. Work-stealing across multiple workers means days.
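Two of these properties can be sketched briefly. The incremental part below compares snapshots using the same signal rsync’s default quick check relies on (file size plus modification time) to decide what changed since the previous pass. The parallel part uses a thread pool whose workers all pull from one shared queue; that is dynamic load balancing rather than true work-stealing, but on a single machine it has the same practical effect: no worker sits idle while a few huge files hold up the rest. Function names are hypothetical, and symlinks are skipped.

```python
import os
import shutil
from concurrent.futures import ThreadPoolExecutor, as_completed


def snapshot(root):
    """Map relative path -> (size, mtime_ns) for every non-symlink file."""
    snap = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue
            st = os.lstat(path)
            snap[os.path.relpath(path, root)] = (st.st_size, st.st_mtime_ns)
    return snap


def delta(previous, current):
    """Return (paths to copy, paths deleted) between two snapshots."""
    changed = sorted(p for p, sig in current.items() if previous.get(p) != sig)
    deleted = sorted(previous.keys() - current.keys())
    return changed, deleted


def copy_parallel(src_root, dst_root, relpaths, workers=8):
    """Copy files with a thread pool; returns {relpath: error} for failures."""
    def copy_one(rel):
        dst = os.path.join(dst_root, rel)
        os.makedirs(os.path.dirname(dst), exist_ok=True)
        shutil.copy2(os.path.join(src_root, rel), dst)

    errors = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(copy_one, rel): rel for rel in relpaths}
        for fut in as_completed(futures):
            if fut.exception() is not None:
                errors[futures[fut]] = str(fut.exception())
    return errors
```

A typical pattern is one bulk pass, then repeated snapshot-and-delta passes that shrink until a short final cutover. What to do with deleted paths (remove them on the destination or keep them) is a policy decision the sketch leaves open. Threads are a reasonable fit here because copying is mostly I/O-bound.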

3. Verification: Prove It Worked

After the migration completes, you need proof:

File count match: Source and destination have the same number of files and directories.
Checksum verification: Content integrity confirmed, not assumed.
Metadata comparison: Permissions, timestamps, and attributes match.
Audit trail: A report showing exactly what was moved, what changed, and what (if anything) failed.

Without this, “the copy finished” is a hope, not a fact.
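The count and metadata checks can be combined into a machine-readable audit report. This sketch assumes both trees are mounted locally. It compares file and directory counts, lists missing and extra paths, and compares permission bits and modification times, with a tolerance for destinations that store coarser timestamps. The one-second default, the function names, and the report layout are assumptions; content checksums are left to a separate pass.

```python
import json
import os
import stat


def collect(root):
    """Return ({relpath: (mode_bits, mtime_ns)}, directory_count) for a tree."""
    files, dir_count = {}, 0
    for dirpath, dirnames, filenames in os.walk(root):
        dir_count += len(dirnames)
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            files[os.path.relpath(path, root)] = (stat.S_IMODE(st.st_mode), st.st_mtime_ns)
    return files, dir_count


def audit_report(src_root, dst_root, mtime_tolerance_ns=1_000_000_000):
    """Compare counts and metadata; the tolerance absorbs coarser destination timestamps."""
    src_files, src_dirs = collect(src_root)
    dst_files, dst_dirs = collect(dst_root)
    mismatches = []
    for rel in sorted(src_files.keys() & dst_files.keys()):
        (s_mode, s_mtime), (d_mode, d_mtime) = src_files[rel], dst_files[rel]
        if s_mode != d_mode:
            mismatches.append({"path": rel, "field": "mode",
                               "source": oct(s_mode), "destination": oct(d_mode)})
        if abs(s_mtime - d_mtime) > mtime_tolerance_ns:
            mismatches.append({"path": rel, "field": "mtime",
                               "source": s_mtime, "destination": d_mtime})
    report = {
        "files": {"source": len(src_files), "destination": len(dst_files)},
        "directories": {"source": src_dirs, "destination": dst_dirs},
        "missing_on_destination": sorted(src_files.keys() - dst_files.keys()),
        "extra_on_destination": sorted(dst_files.keys() - src_files.keys()),
        "metadata_mismatches": mismatches,
    }
    report["passed"] = (src_dirs == dst_dirs
                        and not report["missing_on_destination"]
                        and not report["extra_on_destination"]
                        and not mismatches)
    return json.dumps(report, indent=2)
```

Because the output is JSON, it can be archived next to the migration itself as the audit trail.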

What Changes When Data Is AI-Ready

Organizations that solve the data mobility problem unlock real capabilities:

Consolidation: Scattered departmental NFS shares and S3 buckets become a unified, accessible dataset. AI teams stop waiting for “can someone export that data and put it on the shared drive?”

Tiering: Hot data stays on fast NAS for active training. Completed datasets tier to S3 for cost efficiency. Archive copies go to tape for compliance retention. Each tier requires a migration path that preserves data integrity.
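A tiering rule like this one can be written down as a small policy function. Everything in the sketch below is hypothetical: the `DatasetInfo` fields, the 30-day hot window, and the tier labels. The point is that placement becomes an explicit, testable rule rather than a judgment made share by share.

```python
from dataclasses import dataclass

DAY = 86400  # seconds


@dataclass
class DatasetInfo:
    """Hypothetical inputs a tiering policy might use."""
    name: str
    last_modified: float      # seconds since the epoch
    in_active_training: bool
    retention_required: bool


def placement(ds: DatasetInfo, now: float, hot_days: int = 30) -> list:
    """Return target tiers: NAS while hot, S3 once completed, plus tape if retained."""
    targets = []
    if ds.in_active_training or now - ds.last_modified < hot_days * DAY:
        targets.append("nas")
    else:
        targets.append("s3")
    if ds.retention_required:
        targets.append("tape")  # archive copy for compliance, in addition to the primary tier
    return targets
```

For a dataset last modified 400 days ago, not in active training, and under retention, `placement` returns `["s3", "tape"]`.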

Governance: When you know where every file is, who owns it, and how it got there, compliance stops being a retroactive audit and starts being a continuous process.

Cost control: IDC reports that 85% of IT leaders are increasing storage spend in 2026. Moving the right data to the right tier, and decommissioning what’s no longer needed, is the most direct path to controlling that spend.

The Bottom Line

Gartner says organizations with successful AI initiatives invest up to four times more in data and analytics foundations than their peers. The emphasis is on “foundations,” not “models.”

The AI readiness conversation needs to shift from “which model should we use?” to “can we actually get our data to where the model can reach it, without breaking permissions, losing metadata, or creating compliance gaps?”

That’s a data migration problem. And it’s solvable.

See what you're working with

syncopio scans your NFS, SMB, and S3 storage and shows you exactly what you have: file types, size distribution, age profile, permissions, and potential issues. Before you move a single byte, you know what you’re dealing with. Then migrate with full metadata preservation, parallel transfers, and checksum verification.

Run a discovery scan on your data →

Sources: Gartner, “Lack of AI-Ready Data Puts AI Projects at Risk” (February 2025). IDC, “Global DataSphere Structured and Unstructured Data Forecast, 2024-2028.” Gartner, “Organizations with Successful AI Initiatives Invest Up to Four Times More in Data and Analytics Foundations” (April 2026).

Why 60% of Enterprise AI Projects Fail Before They Start