In my article How to enable Nutanix De-duplication per VMDK/VHD/RAW, I explained how to enable or disable de-duplication and fingerprinting-on-write for individual VMDK/VHD or RAW vdisks via nCLI.
Once de-duplication and fingerprinting-on-write are both enabled for a given vdisk, NOS processes every new write with the US Secure Hash Algorithm 1 (SHA1), using the native SHA1 optimizations available on Intel processors. The resulting hashes are then used for post-process de-duplication, which is executed by the Curator background process. However, I omitted a somewhat important piece of information about this process.
Because only new writes are fingerprinted-on-write, earlier writes that now constitute data blocks in an existing vdisk are not fingerprinted and therefore will not be considered for post-process de-duplication.
In order to fingerprint existing vdisk data, it is necessary to use a new nCLI command called vdisk_manipulator. It lets you fingerprint entire disks with existing data after a Nutanix NOS 4.x upgrade, or after enabling de-duplication for the first time in a container.
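To make the gap concrete, here is a minimal Python sketch of the behaviour described above. It is a toy model and not how NOS is implemented: the `ToyVdisk` class, its method names, and the 4 KB `CHUNK_SIZE` are all assumptions made for illustration. Only the use of SHA1 and the idea that unfingerprinted data is invisible to post-process de-duplication come from the article.

```python
import hashlib

CHUNK_SIZE = 4096  # illustrative value only; not the granularity NOS uses


class ToyVdisk:
    """Toy model of a vdisk that fingerprints chunks only when enabled."""

    def __init__(self):
        self.fingerprint_on_write = False
        self.chunks = []  # list of (data, fingerprint or None)

    def write(self, data: bytes) -> None:
        # Split the write into chunks; hash each one only if the feature is on.
        for start in range(0, len(data), CHUNK_SIZE):
            chunk = data[start:start + CHUNK_SIZE]
            fp = hashlib.sha1(chunk).hexdigest() if self.fingerprint_on_write else None
            self.chunks.append((chunk, fp))

    def add_fingerprints(self) -> None:
        # Backfill fingerprints for chunks written before the feature was enabled.
        self.chunks = [(c, fp or hashlib.sha1(c).hexdigest()) for c, fp in self.chunks]

    def dedup_candidates(self) -> int:
        # Count chunks whose fingerprint matches an earlier fingerprinted chunk.
        seen = set()
        duplicates = 0
        for _, fp in self.chunks:
            if fp is None:
                continue  # never fingerprinted, so never considered
            if fp in seen:
                duplicates += 1
            else:
                seen.add(fp)
        return duplicates


disk = ToyVdisk()
block = b"A" * CHUNK_SIZE
disk.write(block * 3)              # written before fingerprinting was enabled
disk.fingerprint_on_write = True
disk.write(block)                  # only this write gets a fingerprint
before = disk.dedup_candidates()   # the three older copies are not seen
disk.add_fingerprints()
after = disk.dedup_candidates()    # all four copies now share one fingerprint
print(before, after)
```

In this toy model the three identical blocks written before the feature was enabled only become de-duplication candidates after the backfill step, which is the role vdisk_manipulator plays for real vdisks.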
To fingerprint an entire vdisk:
% vdisk_manipulator --vdisk_name=NFS:90967668 --operation=add_fingerprints
To fingerprint a portion of a vdisk, for example the first 10GB:
% vdisk_manipulator --vdisk_name=NFS:90967668 --operation=add_fingerprints --end_offset_mb=10240
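The end offset is given in megabytes, so the first 10GB corresponds to 10 x 1024 = 10240. If you script this for several vdisks, a small helper can do the conversion for you. The sketch below is only a convenience wrapper that assembles the argument list: the function name `add_fingerprints_args` and its `first_gb` parameter are hypothetical, and it uses only the flags shown in the commands above.

```python
from typing import List, Optional


def add_fingerprints_args(vdisk_name: str, first_gb: Optional[int] = None) -> List[str]:
    """Build the vdisk_manipulator argument list for add_fingerprints.

    If first_gb is given, only the first `first_gb` gigabytes are covered,
    expressed in megabytes for --end_offset_mb.
    """
    args = [
        "vdisk_manipulator",
        f"--vdisk_name={vdisk_name}",
        "--operation=add_fingerprints",
    ]
    if first_gb is not None:
        args.append(f"--end_offset_mb={first_gb * 1024}")
    return args


# Reproduce the partial-fingerprint command for the first 10GB.
print(" ".join(add_fingerprints_args("NFS:90967668", first_gb=10)))
```

The helper only builds the command line; running it on a CVM (for example through `subprocess.run`) is left out on purpose.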
Fingerprinting only a portion of the disk is useful when a vdisk contains the operating system (Windows, Linux, etc.) alongside a large amount of data that does not de-duplicate well, such as videos, data already de-duplicated or compressed at the application level (Exchange, Zip files), or transactional databases. Manually fingerprinting data generates a large amount of metadata that over time may demand extra RAM for the Cassandra distributed database, so be thoughtful about fingerprinting vdisks unnecessarily. Where de-duplication is not a good fit, I suggest enabling compression for the container instead.
The vdisk_manipulator also has additional options to delete fingerprints and to compress/decompress.
Please note that if you enabled post-process de-duplication at the container level when you first created the container, all data in every vdisk is automatically fingerprinted-on-write and de-duplicated.
As you can see, the functional implementation of fingerprinting-on-write and de-duplication is per vdisk, but in Nutanix PRISM it has been exposed as a container feature for ease of use and simplicity.
If you want to learn more about on-disk de-duplication, I suggest Nutanix 4.0 Hybrid On-Disk De-Duplication Explained or the Nutanix Bible.
This article was first published by Andre Leibovici (@andreleibovici) at myvirtualcloud.net