According to Jim McGann of Network World, “New data profiling technology makes it possible for organizations to reclaim storage capacity, archive data with business value, delete aged and abandoned data with no business value, tier content to other classes of storage and even manage storage charge backs with reliable statistics.” Unstructured data can be a headache for IT staff. Not only does it consume storage capacity, but it can also contain unprotected sensitive information.
Data profiling technology scans an organization's unstructured data and produces a searchable map of its metadata. It typically relies on high-speed crawling over NFS or CIFS to index files and repositories. According to Mr. McGann, unstructured data can typically be classified as follows:
- Abandoned: owned by ex-employees and not accessed in many years.
- Aged: not accessed in three or more years.
- Redundant: duplicate content, identified by matching MD5 hashes.
- Personal: multimedia files such as music libraries, photos and movies.
- Risk: sensitive content.
- Archive: data with long-term business value that must be preserved.
- Active: data managed in place until its future disposition is determined.
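To make the categories more concrete, here is a minimal Python sketch of a crawl-and-classify pass. It assumes the NFS or CIFS share is already mounted locally so that `os.walk` can reach it. The three-year threshold comes from the list above, but the extension list, the `FORMER_EMPLOYEE_UIDS` set and all function names are hypothetical placeholders, not part of any profiling product. Only the categories that can be inferred from file metadata or hashes are modeled; risk and archive classification would need content inspection and business rules.

```python
import hashlib
import os
import time
from dataclasses import dataclass

# Hypothetical policy values; a real profiling tool would make these configurable.
AGED_SECONDS = 3 * 365 * 24 * 3600  # "not accessed in three or more years"
PERSONAL_EXTENSIONS = {".mp3", ".m4a", ".jpg", ".jpeg", ".png", ".mov", ".mp4", ".avi"}
FORMER_EMPLOYEE_UIDS = {"1042"}  # hypothetical: owner IDs of departed staff


@dataclass
class FileRecord:
    path: str
    owner: str          # numeric owner ID as a string (from st_uid)
    last_access: float  # POSIX timestamp (from st_atime)
    md5: str


def md5_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large files need not fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def crawl(root):
    """Walk a locally mounted share and build one metadata record per file."""
    records = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            records.append(FileRecord(path, str(st.st_uid), st.st_atime, md5_of(path)))
    return records


def classify(records, now=None):
    """Map each path to the set of categories it falls into."""
    now = time.time() if now is None else now
    first_seen = {}  # md5 -> path of the first copy encountered
    result = {}
    for rec in records:
        tags = set()
        idle = now - rec.last_access
        if idle >= AGED_SECONDS:
            # Old files owned by departed staff are "abandoned"; other old files are "aged".
            tags.add("abandoned" if rec.owner in FORMER_EMPLOYEE_UIDS else "aged")
        if rec.md5 in first_seen:
            tags.add("redundant")  # same hash as a file seen earlier in the crawl
        else:
            first_seen[rec.md5] = rec.path
        if os.path.splitext(rec.path)[1].lower() in PERSONAL_EXTENSIONS:
            tags.add("personal")
        # Risk and archive need content inspection and business rules; not modeled here.
        result[rec.path] = tags or {"active"}
    return result
```

Calling `classify(crawl("/mnt/share"))` (the mount point is hypothetical) returns a dictionary from each path to its category tags. Because the aged and abandoned tests depend on access times, results also depend on how the file system records them; Linux mounts using `noatime` or `relatime` may not update the access time on every read.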
Through profiling, unstructured or “mystery” data can be managed, deleted and moved with greater clarity and accuracy. Mr. McGann also recommends that data profiling tools include a validation mechanism “to ensure that content is purged reliably and that it has not changed since it was profiled.”
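One way to implement that validation is a check-then-delete routine: before removing a file, re-read its size, modification time and MD5 hash, compare them with the values recorded at profiling time, and confirm afterwards that the path is really gone. The following is a minimal Python sketch under those assumptions; `ProfiledFile`, `profile` and `purge_if_unchanged` are hypothetical names, not part of any product's API.

```python
import hashlib
import os
from dataclasses import dataclass


@dataclass
class ProfiledFile:
    path: str
    size: int
    mtime: float
    md5: str


def md5_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def profile(path):
    """Record the attributes that will be re-checked before purging."""
    st = os.stat(path)
    return ProfiledFile(path, st.st_size, st.st_mtime, md5_of(path))


def purge_if_unchanged(record):
    """Delete a file only if it still matches its profile, then confirm it is gone.

    Returns "purged", "changed" or "missing".
    """
    try:
        st = os.stat(record.path)
    except FileNotFoundError:
        return "missing"
    # Cheap metadata checks first; hash only when size and mtime still match.
    if st.st_size != record.size or st.st_mtime != record.mtime:
        return "changed"
    if md5_of(record.path) != record.md5:
        return "changed"
    os.remove(record.path)
    # Verify the purge actually took effect.
    if os.path.exists(record.path):
        raise RuntimeError(f"purge failed: {record.path} still exists")
    return "purged"
```

Comparing the hash catches edits that preserve size and modification time. A short window still remains between the final check and the delete, during which another process could modify the file; a production tool would need locking or snapshot-based verification to close it.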