VMware: Repairing orphaned ESX snapshots

This post was last updated on September 15th, 2013 at 12:12 am

Update: Consolidate Snapshots

Snapshots created via API (NetApp SMVI, EqualLogic’s Auto-Snapshot Manager/VMware Edition, VMware VCB or VDR) occasionally get stuck. You may have an orphaned snapshot if it is not listed in the Snapshot Manager but you can’t change the size of a vmdk (the provisioned size is grayed out), if you notice a -delta file where there shouldn’t be one, or if your scheduled monitoring tasks report an issue (you do run scheduled monitoring tasks, don’t you?). Here are some potential fixes.

Background
VMware hard drives are stored as two files, a “name”.vmdk descriptor text file and a “name”-flat.vmdk binary (or “name”-delta.vmdk for snapshots) which holds the actual blocks.

When you take a snapshot of a hard drive, a new set of files is created. The new -delta file holds any new or changed blocks, and the VM configuration is updated to point at the new descriptor.

No Snapshot

Before a snapshot is created, the VM configuration file (“name”.vmx) contains (among many other things) a line referencing the hard drive descriptor file:

ide0:0.fileName = "KyleA.vmdk"

At this point the “name”.vmsd snapshot descriptor file is essentially blank.

The original “name”.vmdk hard drive descriptor file contains (among other things) two lines referencing the ID of the drive (note the ID is unique only for snapshots – all master disks are fffffffe) as well as a line referencing the binary “-flat” file for this drive.
CID=fffffffe
parentCID=ffffffff
# Extent description
RW 41943040 VMFS "KyleA-flat.vmdk"
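
If you only want to see those lines without reading the whole descriptor, a filter along these lines may help. This is a sketch, not a VMware tool: show_links is a made-up helper name, and the sample descriptor is a trimmed example written to a temporary directory so the snippet can run anywhere.

```shell
#!/bin/sh
# show_links is a hypothetical helper, not a VMware command.
# It prints only the lines that tie a descriptor to its parent and its data file.
show_links() {
    grep -E '^(CID|parentCID|parentFileNameHint)=|^RW ' "$1"
}

# Trimmed sample descriptor, for illustration only.
workdir=$(mktemp -d)
cat > "$workdir/KyleA.vmdk" <<'EOF'
# Disk DescriptorFile
version=1
CID=fffffffe
parentCID=ffffffff
createType="vmfs"

# Extent description
RW 41943040 VMFS "KyleA-flat.vmdk"
EOF

show_links "$workdir/KyleA.vmdk"
```

On a real host you would point show_links at the descriptor in the VM’s directory instead of the sample file.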

One snapshot
When a snapshot is taken, the configuration file (“name”.vmx) gets updated with the name of the current snapshot descriptor:

ide0:0.fileName = "KyleA-000001.vmdk"

The snapshot descriptor file (“name”.vmsd) gets updated with (among other things) the name of the associated .vmsn file, which holds the state of RAM, CPU and VMX.

snapshot0.filename = "KyleA-Snapshot2.vmsn"

The original “name”.vmdk is left unchanged (see above)

A snapshot binary is created as “name”-000001-delta.vmdk (note the name change from “-flat”)
The descriptor file “name”-000001.vmdk is created with a line referencing the “delta” binary file plus a new ID. The important item is the reference to the parentCID and parentFileNameHint, both referencing the master vmdk descriptor.
Snap descriptor .vmdk:
CID=fffffffe
parentCID=fffffffe
parentFileNameHint="KyleA.vmdk"
# Extent description
RW 41943040 VMFSSPARSE "KyleA-000001-delta.vmdk"
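
Since the snapshot is tied to its parent by parentCID and parentFileNameHint, it can be worth confirming that the two files actually agree before trying any repair. The following is a rough sketch: check_link is a made-up helper name, it assumes parentFileNameHint is a bare file name in the same directory as the snapshot descriptor, and the sample files only contain the lines shown in this post.

```shell
#!/bin/sh
# check_link is a hypothetical helper, not a VMware tool. It compares a
# snapshot descriptor's parentCID with the CID stored in the parent it names.
# Assumption: parentFileNameHint is a bare file name in the same directory.
check_link() {
    child=$1
    dir=$(dirname "$child")
    parent=$(sed -n 's/^parentFileNameHint="\(.*\)"$/\1/p' "$child")
    if [ -z "$parent" ]; then
        echo "$child: no parent (base disk)"
        return 0
    fi
    want=$(sed -n 's/^parentCID=//p' "$child")
    have=$(sed -n 's/^CID=//p' "$dir/$parent")
    if [ "$want" = "$have" ]; then
        echo "$child: parentCID $want matches $parent"
    else
        echo "$child: parentCID $want does not match $parent (CID $have)"
        return 1
    fi
}

# Sample files for illustration only, using the values shown in this post.
workdir=$(mktemp -d)
printf '%s\n' 'CID=fffffffe' 'parentCID=ffffffff' \
    'RW 41943040 VMFS "KyleA-flat.vmdk"' > "$workdir/KyleA.vmdk"
printf '%s\n' 'CID=fffffffe' 'parentCID=fffffffe' \
    'parentFileNameHint="KyleA.vmdk"' \
    'RW 41943040 VMFSSPARSE "KyleA-000001-delta.vmdk"' > "$workdir/KyleA-000001.vmdk"

check_link "$workdir/KyleA-000001.vmdk"
```

The helper only reads the files. What to do about a mismatch is a separate decision, and I would copy the descriptors somewhere safe before touching them.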

The end result is the .vmx points to the descriptor file of the VMDK to be written to. If that is a snapshot then it in turn references the next file “up” the snapshot tree.

When a snapshot is committed (i.e. deleted while it is currently being written to or is directly up the tree from the current running state), the blocks in its -delta are committed to the -delta or -flat it calls “parent”, and its descriptor and -delta are deleted. If there is a snap below it, that snap is updated to reference the deleted snapshot’s parent as its parent.

For example:
If you start with VMX->snap3->snap2->snap1->flat
then delete snap2 (assuming snap3 is the current binary)
you end up with VMX->snap3->snap1->flat
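
To see what the chain looks like on disk, you can start at the descriptor the .vmx points to and follow parentFileNameHint until you reach a disk with no parent. A minimal sketch, assuming every descriptor sits in the same directory and the hint is a bare file name (walk_chain is a made-up name and the CID values in the sample files are invented for illustration):

```shell
#!/bin/sh
# walk_chain is a hypothetical helper: starting from the descriptor the .vmx
# points at, it follows parentFileNameHint until it reaches a base disk.
# Assumption: all descriptors live in one directory.
walk_chain() {
    dir=$(dirname "$1")
    current=$(basename "$1")
    hops=0
    # The limit of 64 is an arbitrary guard against a loop in a damaged chain.
    while [ -n "$current" ] && [ "$hops" -lt 64 ]; do
        echo "$current"
        # Base disks have no parentFileNameHint line, so the loop ends there.
        current=$(sed -n 's/^parentFileNameHint="\(.*\)"$/\1/p' "$dir/$current")
        hops=$((hops + 1))
    done
}

# Sample chain for illustration only; the CID values are invented.
workdir=$(mktemp -d)
printf '%s\n' 'CID=fffffffe' 'parentCID=ffffffff' > "$workdir/KyleA.vmdk"
printf '%s\n' 'CID=11111111' 'parentCID=fffffffe' \
    'parentFileNameHint="KyleA.vmdk"' > "$workdir/KyleA-000001.vmdk"
printf '%s\n' 'CID=22222222' 'parentCID=11111111' \
    'parentFileNameHint="KyleA-000001.vmdk"' > "$workdir/KyleA-000002.vmdk"

walk_chain "$workdir/KyleA-000002.vmdk"
```

With the sample files this lists KyleA-000002.vmdk, KyleA-000001.vmdk and KyleA.vmdk in that order, which is the same VMX->snap->snap->flat direction used in the example above.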

Normally you don’t have to worry about the details, but you occasionally run into issues where snapshots can’t be removed.

Removing “hidden” snapshots
Method 1:

Use the GUI to make a snapshot, then use the Snapshot Manager to “Delete All”

Method 1a:
Power off the VM
Use the GUI to make a snapshot, then use the Snapshot Manager to “Delete All”

Method 2:
Connect to the ESX server with an SSH utility like PuTTY.
Open each “name”.vmdk descriptor file and look for the line
ddb.deletable = "false" (see a walk-through on this below).
Change this to “true”.
Create another snapshot, then delete them all. You can use the GUI, but now that you’re this far, the command line to create a snapshot is

vmware-cmd "name".vmx createsnapshot "test" "" 0 0

The command to remove all snapshots is

vmware-cmd "name".vmx removesnapshots
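
With several disks, checking each descriptor by hand for the ddb.deletable line gets tedious. A loop like the one below can list the descriptors that carry the flag. It is only a sketch: find_locked is a made-up helper name, and the sample directory is created in a temporary location purely so the snippet runs on its own.

```shell
#!/bin/sh
# find_locked is a hypothetical helper: it lists descriptor files in a VM
# directory that contain ddb.deletable = "false".
find_locked() {
    for f in "$1"/*.vmdk; do
        case "$f" in
            # Skip the large binary extents; only the small text descriptors matter.
            *-flat.vmdk|*-delta.vmdk) continue ;;
        esac
        [ -f "$f" ] || continue
        if grep -q '^ddb\.deletable *= *"false"' "$f"; then
            echo "$f"
        fi
    done
}

# Sample VM directory for illustration only.
workdir=$(mktemp -d)
printf '%s\n' 'CID=fffffffe' 'ddb.deletable = "false"' > "$workdir/DB.vmdk"
printf '%s\n' 'CID=fffffffe' 'ddb.deletable = "true"' > "$workdir/DB-000001.vmdk"
: > "$workdir/DB-flat.vmdk"

find_locked "$workdir"
```

On a host you would pass the VM’s directory under /vmfs/volumes instead of the sample directory.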

Method 2a:
Shut the VM down before doing Method 2

Walk through on finding and changing the ddb.deletable setting:
From the console of an ESX server, log in as root. On an ESXi server, enable local troubleshooting mode, then log in as root.

From the command prompt:
cd /vmfs/volumes/"datastore name"
“datastore name” is the case-sensitive name of the datastore the VM is stored on.
Use “ls /vmfs/volumes” to get a list of all datastores if needed.

cd "vm name"
“vm name” is the case-sensitive name of the VM. Use “ls” to get a list of all VMs.

Use “ls” to get a list of all files.

For each vmdk descriptor file to check, use “cat name.vmdk” to display the contents of the file.
If you find one with ddb.deletable = "false", open the file with vi to edit.

vi "name".vmdk
Arrow to the “f” in “false” and hit “x” five times to delete the word “false”.
Hit “i” to switch to insert mode and type “true”. The line should now read:
ddb.deletable = "true"

Hit the escape key, then type “:wq” and press Enter to save your changes and exit.
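
If you would rather not edit by hand in vi, the same change can be made with sed while keeping a copy of the original descriptor. This is a sketch under assumptions: set_deletable is a made-up helper name, the .bak suffix is my own choice, and the sample descriptor exists only so the snippet runs on its own.

```shell
#!/bin/sh
# set_deletable is a hypothetical helper: it keeps a backup copy of the
# descriptor, then rewrites ddb.deletable = "false" to "true".
set_deletable() {
    cp -p "$1" "$1.bak" || return 1
    sed 's/^\(ddb\.deletable *= *\)"false"/\1"true"/' "$1.bak" > "$1"
}

# Sample descriptor for illustration only.
workdir=$(mktemp -d)
printf '%s\n' 'CID=fffffffe' 'parentCID=ffffffff' \
    'ddb.deletable = "false"' > "$workdir/DB.vmdk"

set_deletable "$workdir/DB.vmdk"
cat "$workdir/DB.vmdk"
```

The untouched original stays next to the descriptor as DB.vmdk.bak, so you can copy it back if the change does not help.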

Note that while you can use the GUI to create a snapshot and then commit them all, I’ve had better luck using the command line.


27 Responses to VMware: Repairing orphaned ESX snapshots

  1. Olivier says:

    Do you have a recipe to recover data from a failed “delete all snapshots” that left me with a corrupted vmdk? (To boot the VM I had to modify the CID to match the parent ID.) Now it seems like I’m missing the data from one of the snapshot files and I’m stuck with data as old as 2007-2008 instead of having my data from 2012.

    Thank you,

    • JAndrews says:

      First you should never keep a VMware snapshot for more than a few days – I have never heard of one 5 years old!

      Second you should be actually backing up your virtual machine data (a snapshot by itself is not a valid backup mechanism)

      Third, VMware support has tools to recover from corrupted snapshots that are not available to the general public. A one-time support call costs (I believe) $250, which would be well worth it to recover your data.
      Assuming the snapshot files still exist, there is still a chance to recover the VM.

      If you have another VM with years of snapshots I suggest using Converter to migrate the guest to a new VM, bypassing the snapshot mechanism altogether.

  2. James Minard says:

    I think I’m in a situation where I need to follow these steps. We’ve already tried method 1 and 1a, and also somewhat tried method 2, except we did not change the ddb.deletable setting from false to true. I’m not by any stretch of the imagination a VMware technician, so without understanding the ramifications of changing that setting, I’m leery of doing it. Does changing that setting do anything destructive?

    • JAndrews says:

      How large are the vmdk files? Can you copy them to (preferably) another datastore or to a second directory on that datastore?

      The setting prevents that snapshot from being rolled up. If the snapshot is no longer being used then there is no issue. If you have a backup reading from that file then that backup will fail.

      If you are paranoid and can’t copy the vmdks then use VMware Converter to migrate the current state of the VM to a new set of vmdks.

  3. James Minard says:

    Unfortunately this issue arose when the datastore ran out of disk space when an Acronis virtual appliance was creating a snapshot of this VM during the nightly backup. We freed up enough space in the datastore to get the VM back up and running, but it is a large database server with 4 separate virtual disks (50GB, 600GB, 100GB, and 700 GB respectively), and we simply do not have the space to copy them off anywhere, or to clone them to a new VM (which I was told would also resolve the issue).

    The ddb.deletable=false setting is within the base DB.vmdk file. It looks like it’s set to true on the DB-000001.vmdk file that the VM is showing as the current hard drive file that it’s using to run. Looks like that’s the case for the other 3 virtual disk files as well.

    So if I’m understanding your post, I should be able to change that setting to true on all 4 base vmdk files and then try to create new snapshots and remove them using the command line?

    • JAndrews says:

      Yes, by setting the flag to true the snapshot will be rolled up when you consolidate the VM.

      You can always run to BestBuy and get an external 2TB eSATA drive – if the server doesn’t have eSATA plug a laptop in as close to it as possible and get the VM copied over a weekend.

  4. James Minard says:

    I really appreciate your responses. I sent your writeup on this to the people that installed the system to see what they think, since they are far more knowledgeable than I am about all of this and would understand it better than I do. To add just a little bit more to the equation, this system is running HP LeftHand software to share the internal storage of 2 physical servers and make 2 virtual datastores: 1 with SAS drives and the other with SSD drives. 3 out of the 4 vmdk files for this DB server live on the SAS store and the other usually lives on the SSD store. But because the main VM files are on the SAS store, when the backup crashed and left the virtual disks running on the delta files, the 4th drive was left with the delta on the SAS store and the base vmdk file on the SSD store. I’m not even sure how to navigate to the SSD store from the PuTTY session I’m in to run the “cat db.vmdk” command for that file, since I ended up in the DB folder on the SAS store by running the command vmware-cmd -l. Since I can’t navigate to that store to run the command and ultimately change the setting from false to true, there’s no sense in changing it on the other 3 files until I can get someone on the system who knows more about navigating the VMware command line than I do. Thanks again.

  5. Thank you for the tip!

    I will give this a try next time via command line and see how it works out.

    I tried this once where the snapshot had run for too long and there were around 40 delta files. It was impossible to consolidate them and I had to clone to a new VMDK file.

  6. Marcus Limosani says:

    Hi Joshua,
    Hoping you can provide some advice.
    I have a client with a VM that is running, but i can’t back it up, or clean it up.

    There are 12 -delta files in existence but the base VMDK file indicates that the flat file is the one in use.
    The 000001 through 000007 all indicate they are marked for deletion, 8-12 are not.

    I tried copying this whole VM to an iSCSI target as i think i might have some disk issues.
    The copy process failed on the 000004 delta file.

    Also, the GUI doesnt indicate there are any snapshots.
    The environment is running slow, and as mentioned, i can’t effect a backup.
    We have Acronis VMProtect in place, it fails, i have tried the built in Windows backup, and also Shadow Protect.

    I am at a bit of a loss on how to proceed from here, any advice would be greatly appreciated.

    • JAndrews says:

      Clone. That makes a complete/consolidated copy of the VM. Preferably shut the VM down first, then disconnect the vNIC from the old one so you don’t accidentally power it on.

      If you don’t have enough space on the host you can use Converter to migrate the VM to another host.

      Once that is done you can start playing with the -delta files and try to fix it. If you have vSphere 5.x use the consolidate function (this scenario is why it was implemented)

      Let me know how it goes!

    • Wim says:

      Before you consolidate the delta files into the normal flat file, check the CID and parentCID of each file:
      less snapshot.vmdk | egrep -i --color "cid|parentcid|vmdk"
      Do this for all the delta files until you reach the flat file.
      If a CID or parent is different, you can correct it using vi. Make sure that your chain is complete.

  7. Doug Rafuse says:

    I tried method 1 and it worked like a charm (vSphere 5.1)! The disk references returned back to the original file names, the snapshot files all gone. Just in time to enlarge the disks on our crashing server.

    Thanks very much for posting this!!

  8. fabio says:

    sorry my english is not perfect 🙂

    I inherited a VMware system and I have a problem with a virtual machine:

    I have these two files test.vmdk and test-000002.vmdk

    the snapshot file does not exist, as I delete files in test-000002.vmdk

  9. Fabio says:

    Actually, the machine was using test-000002.vmdk as its disk. I restarted the node’s management services and was then able to delete the snapshot.

    Thanks

  10. Ben Lamond says:

    If you have another VM with years of snapshots I suggest using Converter to migrate the guest to a new VM, bypassing the snapshot mechanism altogether.

  11. Wim says:

    hey all, i have a problem where the flat.vmdk will not copy; it fails with a timeout.
    i have tried to repair it but it states that there is nothing to repair, and when i try to migrate the disk vmware says it is corrupt.

  12. MladenPokric says:

    Hi,

    At the moment I’m backing up one VM. I powered it off and am exporting an OVF template (a folder of files) to my external HDD.

    The summary for this VM shows that “virtual machine disk consolidation is needed”.

    I’m using vSphere Client 5.1.0. Is it safe to consolidate from the vSphere Client? Must the virtual machine be on or off to consolidate?

    In the vSphere Client, the HDD shows a provisioned size of 250 GB and the file 00004.vmdk.

    Using WinSCP I can see the original flat file at 250 GB, four vmdk files, and of course the delta files.

    In the summary tab: provisioned storage 390 GB, not-shared storage 314 GB, used storage 314 GB.

    It’s a little confusing to me. What should I do? I want a good plan so I don’t run out of storage or hit similar problems…

    I found this situation when I joined the company and am figuring out what is best to do.

  13. W. Wandrey says:

    I heard it is possible to migrate the VM to another datastore (if you have enough space), delete the orphaned snapshot files from the old datastore and, if you want, migrate the VM back to the original datastore.
