Implementing Thin Provisioning on IBM XIV Storage

By Douglas Babcock, Storage Architect, Evolving Solutions

It’s perfectly acceptable to run a thick provisioned XIV system at 100% of the hard & soft capacity limits. Because of the XIV grid architecture, performance is largely independent of capacity usage: a 1,000 IOPS, 200MB/sec workload puts the same load on XIV whether it runs on a 200GB or a 2TB logical volume. In other words, XIV was designed from the start to accommodate thin provisioning and oversubscription.

For a thin provisioned and oversubscribed XIV system, keep some amount of hard capacity in reserve to accommodate volume or pool growth that occurs faster than expected. The specific amount of reserved hard capacity varies by customer and their tolerance for oversubscription risk. Personally, I would say a 15-20% reserve at the array level is reasonable.
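The reserve arithmetic is simple enough to sketch. The 15% figure is the guidance above; the helper name and the 161 TB / 90 TB capacity figures are made-up examples, not values from any particular XIV frame:

```python
def reserve_headroom(system_hard_tb: float, used_hard_tb: float,
                     reserve_pct: float = 0.15) -> float:
    """Hard capacity still available for growth after holding back
    an array-level reserve (the 15-20% suggested above)."""
    reserve_tb = system_hard_tb * reserve_pct
    return system_hard_tb - reserve_tb - used_hard_tb

# Hypothetical frame: 161 TB hard capacity, 90 TB already consumed.
print(reserve_headroom(161.0, 90.0))
```

When the headroom approaches zero, you are eating into the reserve and it is time to slow growth or add capacity.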

So… If you’re running thick provisioned today and your system hard capacity utilization is only 30-35%, you’re a likely candidate for thin provisioning and oversubscription. If you’re running thick provisioned today and your system hard capacity utilization is 75-80%, then you likely won’t gain much by switching to or adding thin provisioned volumes.

As an example, one large XIV customer has policies in place where they will continue to add volumes to an XIV system until it reaches 60% utilization of hard capacity. They will begin to move applications off an array when the XIV system reaches 75-80% utilization (depends slightly on how rapidly the growth occurred and what steps are needed to migrate a workload). All this is tempered by performance, i.e. they may stop adding volumes or move volumes off the array whenever XIV average response times reach 5 ms, regardless of utilization.
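That customer policy can be sketched as a simple decision rule. The thresholds (60%, 75%, 5 ms) are the ones described above; the function name, inputs, and return strings are mine:

```python
def xiv_placement_action(hard_util_pct: float, avg_resp_ms: float) -> str:
    """Mirror the policy above: performance overrides utilization,
    then migrate off at 75%+, then stop adding at 60%+."""
    if avg_resp_ms >= 5.0:
        return "stop adding / consider migration (performance)"
    if hard_util_pct >= 75.0:
        return "begin migrating applications off"
    if hard_util_pct >= 60.0:
        return "stop adding new volumes"
    return "ok to add volumes"

print(xiv_placement_action(50.0, 3.2))   # ok to add volumes
print(xiv_placement_action(78.0, 3.2))   # begin migrating applications off
```

Note that the performance check comes first, matching the customer's "regardless of utilization" caveat.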

Other considerations:

  • Risk management: Some customers won’t allow oversubscription for production volumes period, but will use it heavily for dev/test/QA volumes.
  • Performance: If you’re already seeing higher than acceptable response times, it doesn’t make sense to increase the aggregate array workload by adding thin provisioned capacity or volumes.
  • Rebuild times: These depend only on the current XIV hard capacity in use, i.e. the rebuild time on a system that has 75% of its hard capacity in use is the same duration whether the volumes are defined as 100% thick, 100% thin, or any mix of thick and thin.
  • Server touch points: Presenting larger thin provisioned volumes to servers minimizes how often XIV volumes need to be expanded, along with the corresponding server-level rescans and other LVM actions (less management). In this scenario, XIV pool utilization needs to be monitored so that hard capacity can be added (from reserve) or moved (from other pools) into any pool growing faster than expected, without requiring any server-level intervention (this is one of the benefits of thin provisioning).
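That pool-level monitoring can be sketched in a few lines. The pool names, sizes, and 80% alert threshold are hypothetical; a real implementation would pull utilization figures from the XIV management interface rather than a hard-coded dictionary:

```python
# Flag pools whose hard utilization has crossed an alert threshold,
# so hard capacity can be added from reserve with no server touch.
pools = {  # pool name -> (hard_size_gb, hard_used_gb), made-up figures
    "prod_db":  (20480, 17600),
    "dev_test": (10240, 4100),
}

def pools_needing_capacity(pools, alert_pct=80.0):
    flagged = []
    for name, (size_gb, used_gb) in pools.items():
        util = 100.0 * used_gb / size_gb
        if util >= alert_pct:
            flagged.append((name, round(util, 1)))
    return flagged

print(pools_needing_capacity(pools))  # [('prod_db', 85.9)]
```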

Also, consider free capacity. Free capacity is all about where you measure it. If you create a volume on XIV:

  • Initially it takes no capacity (0% XIV utilization)
  • When you lay out a partition or file system on a volume, file system metadata will begin to consume XIV capacity. Some file systems initially lay out metadata at the beginning of the volume (1% XIV utilization) and others lay out metadata at fixed intervals across the volume’s capacity (up to 10% XIV utilization). Note that in any metadata scenario the file system will initially report 0% utilization.
  • As users add & delete files within a file system, some file systems prefer to use “new” blocks first (NTFS) while others tend to immediately reclaim “deleted” blocks first (VMware). Over time, an “unattended” NTFS file system will use 100% of an XIV volume with some combination of active files and logically deleted files that only it knows about (NTFS reports xx% and XIV reports 100%). This NTFS scenario is not thin provisioned “friendly” for any storage array. File systems that always use deleted space first tend to provide better correlation between the file system and XIV utilization metrics.
  • XIV will return capacity to a storage pool if large contiguous blocks of data are zeroed by a server. This can occur via the server writing zeroes (Microsoft SDELETE utility) or via the server sending XIV a list of blocks to zero (VMware VAAI primitives). The minimum amount of zeroed capacity that XIV will return for intra-pool reuse is 1MB (between volumes in the same pool). The minimum capacity that XIV will recognize as free space at the system level is 16GB, i.e. the minimum amount of XIV free capacity that can be reclaimed as system free capacity or moved between XIV storage pools is 16GB, which must be contiguous and align to XIV’s internally managed boundaries. If all the layers above XIV play nicely (databases, file systems and LVMs), XIV pool and volume utilizations will vary over time instead of always increasing. Note that this occurs on XIV for either thick or thin provisioned volumes, however no capacity savings are realized for thick provisioned volumes (but XIV rebuilds may become shorter if hard capacity utilization decreases).
  • Lastly, DBAs have influence over how XIV capacity is used. If a DBA decides to embed a higher amount of free space in tables or tablespaces for performance or growth reasons, XIV will typically report a higher capacity usage. For example, if a DBA constructs a table with 50% free space at the end of every row, that table will typically take twice as much file system and XIV capacity as a table with no embedded free space. The 50% table is also unlikely to return any single 1MB contiguous area to all zeroes, which prevents XIV from reclaiming any space over time. I’ve seen 50% tables consume 100% of file system and XIV capacities even when they’re empty, due to database metadata propagation or to non-zero default field values being applied to all rows in a table (database 0%, file system 100%, XIV 100%).
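The two reclaim granularities mentioned above (1MB for intra-pool reuse, 16GB contiguous and aligned for system-level free space) make a noticeable difference in practice. A back-of-the-envelope sketch, with the helper and the 40GB-run example being mine and XIV’s internal boundary handling simplified to plain alignment on the 16GB extent size:

```python
MB = 1
GB = 1024 * MB

INTRA_POOL_UNIT_MB = 1 * MB    # zeroed space reusable between volumes in a pool
SYSTEM_EXTENT_MB = 16 * GB     # contiguous, aligned space reclaimable system-wide

def reclaimable(zeroed_start_mb: int, zeroed_len_mb: int):
    """Split a zeroed region into pool-reusable capacity and
    system-reclaimable 16GB aligned extents (simplified model)."""
    pool_reuse_mb = (zeroed_len_mb // INTRA_POOL_UNIT_MB) * INTRA_POOL_UNIT_MB
    # First 16GB boundary at or after the start of the zeroed region:
    first_aligned = -(-zeroed_start_mb // SYSTEM_EXTENT_MB) * SYSTEM_EXTENT_MB
    usable = max(0, zeroed_start_mb + zeroed_len_mb - first_aligned)
    system_mb = (usable // SYSTEM_EXTENT_MB) * SYSTEM_EXTENT_MB
    return pool_reuse_mb, system_mb

# A 40GB zeroed run starting 3GB into the volume (hypothetical figures):
print(reclaimable(3 * GB, 40 * GB))  # (40960, 16384)
```

In this example all 40GB is reusable within the pool, but only a single aligned 16GB extent qualifies as system-level free capacity, which is why volume-level zeroing often helps the pool long before it helps the system.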

Concluding thoughts:

  • Deciding whether or not to implement thin provisioning is mainly about your current XIV hard to soft capacity ratio, while also factoring in any growth.
  • If you really want to push storage arrays to 80-90% utilizations, you usually have to adopt a thin provisioned and oversubscribed model.
  • LVM striping, file system behavior, and/or database layouts can affect any storage array’s ability to provide benefits from thin provisioning (everyone has to play nice!).
  • It all depends on where you measure and report capacity usage…
