Why is GCP Anthos on NetApp HCI a big deal?

Google Cloud & NetApp announced a new validated design with GKE running on NetApp HCI on-premises.

Read what you might have missed from the NetApp announcements of Aug-Nov 2019, compressed into a single article.

Kubernetes was originally designed by Google, which is also one of the main contributors to Docker, and GKE is arguably the most advanced, mature & stable Kubernetes offering on the market. If you have tried GKE in GCP alongside competing solutions, you know what I'm talking about.

Running containers on-premises is difficult when you want to build an enterprise-grade solution for new containerized applications, for a number of reasons: installation, configuration, management and updates of your core infrastructure components; persistent, performant & predictable storage; and the fact that DevOps teams do not want to deal with infrastructure, they just want to consume it. These are the key problems to solve, and NetApp aims to do exactly that.

NVA-1141: NetApp HCI with Anthos. NVA Design/Deployment

Bullet points why Google Anthos on NetApp HCI is an important announcement:

  • Hybrid cloud. In line with its Data Fabric vision, NetApp continues to bring the hybrid cloud experience to its users. Now with Anthos on HCI your on-prem data center becomes just another cloud zone. Software updates for GKE & Anthos are on Google's shoulders, you just consume the service. Not only can NetApp HCI maintenance, such as software & firmware updates, be bought as a service, but capacity as well: with NetApp Keystone you can pay as you go & consume infrastructure as a service, OPEX instead of CAPEX
  • NetApp Kubernetes Service (NKS). In addition to NKS, which allows deployment & management of Kubernetes clusters on-premises & in the cloud, Anthos provides the ability to deploy clusters on-prem fully integrated with Google Cloud, including management from the GKE console. NKS is bundled with Istio, Helm & many other components for your microservices, which takes DevOps to the next level. Cloud infrastructure on-premises has reached your data center
  • Storage automation. NetApp Trident is arguably the most advanced storage driver for containers on the market so far; it brings automation, an API and persistent storage to the container world, so Trident with NKS & Anthos makes total sense. Speaking of automation, NetApp's Ansible playbooks are also among the most advanced on the market at the moment, with 106 published & supported modules, and SolidFire itself is known as fully API-driven storage, so you can work with it solely through its RESTful API
  • Simple, predictable and performant enterprise storage with QoS, whether on-prem or in the cloud: use Trident and Ansible with NetApp HCI on-prem, or with CVO or CVS in AWS, Azure or GCP; moreover, you can replicate your data to the cloud for DR or Test/Dev
  • NetApp HCI vs other HCI solutions. One of the most notable HCI competitors is Nutanix, so I want to use it as an example. Nutanix's storage architecture with local disk drives is certainly interesting but not unique, and it has some architectural disadvantages, scalability being one of them. Local disk drives are a blessing for tiny solutions and not such a good idea when you need to scale up; the cheapness of a small solution with commodity hardware & local drives can turn into a curse at scale. That's why Nutanix eventually developed dedicated storage nodes connected over the network to overcome the issue, stepping into the very competitive land of network storage systems. Dedicated storage nodes connected over the network are nothing new or unique, and there are plenty of capable & scalable network storage systems out there. Therefore, the most exciting part of Nutanix is its ecosystem & simplicity, not its storage architecture. Now, thanks to Anthos, NetApp HCI gets into a unique position with scalability, ecosystem, simplicity, hybrid cloud & functionality for microservices that some other great competitors like Nutanix have not reached yet, and that gives NetApp momentum in the HCI market
  • Performance. Don't forget about NetApp's MAX Data software, which already works with VMware & SolidFire; it will take NetApp only one last step to bring DCPMM like Intel Optane to NetApp HCI. Note that at Insight 2019 NetApp announced a compute node with Intel Cascade Lake CPUs, which are required for Optane. MAX Data is not available on NetApp HCI yet, but we can clearly see that NetApp is putting everything together to make it happen. Persistent memory in the form of a file system on a Linux host, with tiering of cold blocks to "slow" SSD storage, can put NetApp on top of all the competitors in terms of performance

HCI Performance

Speaking of which, take a look at these two performance tests:

  1. IOmark-VM-HC: 5 storage & 18 compute nodes using data stores & VVols
  2. IOmark-VDI-HC: 5 storage nodes & 12 compute nodes with only data stores

In total: 1,440 VMs with 3,200 VDI desktops.

Notice how asymmetrical the number of storage nodes is compared to the number of compute nodes. In "real" HCI architectures with local drives you have to buy more equipment, while with NetApp HCI you can choose how much storage and how much compute you need and scale them separately. Dedup & compression were enabled in the tests.

Disclaimer

This article is for information purposes only and may contain errors and personal opinions. This text is neither authorized nor sponsored by NetApp. If you have spotted an error, please let me know.

Why use NetApp snapshots even when you do not have Premium bundle software?

If you are extremely lazy and do not want to read any farther, the answer is: use snapshots to improve RPO, use ndmpcopy to restore files and LUNs, and use SnapCreator for app-consistent snapshots.

The Premium bundle includes a good deal of software on top of the Base software present in each ONTAP system, such as:

  • SnapCenter
  • SnapRestore
  • FlexClone
  • And others.

So, without the Premium bundle, with only the Basic software, we have two issues:

  • You can create snapshots, but without SnapRestore or FlexClone you cannot restore from them quickly
  • And without SnapCenter you cannot easily make application-consistent snapshots.

And some people ask, "Do I need to use NetApp snapshots in such circumstances?"

And my answer is: Yes, you can, and you should use ONTAP snapshots.

Here is the explanation of why and how:

Snapshots without SnapRestore

Why use NetApp storage hardware snapshots? Because they have no performance penalty and no such thing as snapshot consolidation, which causes a performance impact in other implementations. NetApp snapshots work pretty well and have other advantages too. Even though restoring data captured in snapshots is not as fast without SnapRestore or FlexClone, you can still create snapshots very quickly. And most of the time you need to restore something very seldom, so fast creation of snapshots with slow restoration will still give you a better RPO compared to a full backup. Of course, I have to admit that this improves RPO only for cases where your data was logically corrupted and no physical damage was done to the storage; if your storage is physically damaged, snapshots will not help. With ONTAP you can have up to 1023 snapshots per volume, and you can create them as fast as you need with no performance degradation whatsoever, which is pretty awesome.
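For example, creating and listing snapshots from the ONTAP CLI is a one-liner; the SVM, volume and snapshot names below are placeholders:

volume snapshot create -vserver svm01 -volume vol1 -snapshot before_patching
volume snapshot show -vserver svm01 -volume vol1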

Snapshots with NAS 

If we are speaking about a NAS environment without a SnapRestore license, you can always go to the .snapshot folder and copy back any previous version of a file you need to restore. Also, you can use the ndmpcopy command to perform file, folder or even volume restoration inside the storage without involving a host.
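A minimal sketch of a file-level restore from an NFS client; the mount point, snapshot name and file name are just examples:

# previous file versions are visible in the hidden .snapshot directory of the share
ls /mnt/nfs_share/.snapshot/
cp /mnt/nfs_share/.snapshot/hourly.2019-11-20_1005/report.docx /mnt/nfs_share/report.docx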

Snapshots with SAN 

If we are speaking about a SAN environment without a SnapRestore license, you cannot simply copy a file from inside your LUN and restore it. There are two stages in case you need to restore something on a LUN:

  1. You copy the entire LUN from a snapshot
  2. And then you can either:
    • Restore the entire LUN in place of the last active version of your LUN
    • Or copy data from the copied LUN back to the active LUN.

To do that, you can use either the ndmpcopy or lun copy commands for the first stage. If you want to restore only some files from an old version of the LUN, map the copied LUN to a host and copy the required data back to the active LUN; a sketch follows below.
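A rough sketch of that two-stage SAN restore with lun copy. The paths, snapshot name and igroup are hypothetical, and referencing the LUN inside the .snapshot directory as the copy source is an assumption you should verify against the documentation for your ONTAP release:

# stage 1: copy the LUN as it existed in the snapshot to a new LUN in the same volume
lun copy start -vserver svm01 -source-path /vol/vol2/.snapshot/snap1/lun1 -destination-path /vol/vol2/lun1_restored
# stage 2: map the restored copy to a host, then copy the needed files back to the active LUN
lun map -vserver svm01 -path /vol/vol2/lun1_restored -igroup esx_hosts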

Application consistent storage snapshots 

Why do you need application consistency in the first place? Sometimes, in an environment like a NAS file share with documents, you do not need it at all. But if you are running applications like Oracle DB, MS SQL or VMware, you'd better have application consistency. Imagine you have a Windows machine and you pull the hard drive out while Windows is running. Let's forget for a moment that Windows will stop working; that is not the point here, let's focus on the data protection side. The same happens when you create a storage snapshot: the data captured in that snapshot will be similarly incomplete. Would the pulled-out hard drive be a proper copy of your data? Only sort of, because some data still sitting in host memory will be lost and your file system will probably not be consistent. Even though a journaled file system can usually be recovered, your application data may be damaged in a way that is hard to repair, and databases in particular will definitely not like such a backup, because the applications and OS never had a chance to destage data from memory to disk. So you need something that prepares your OS & applications before the snapshot is taken.

As you may know, application-consistent storage hardware snapshots can be created by backup software like Veeam, Commvault and many others, or you can even trigger storage snapshot creation yourself with a relatively simple Ansible or PowerShell script (a minimal sketch follows below). You can also create application-consistent snapshots with the free NetApp SnapCreator framework. Unlike SnapCenter, it does not have simple and straightforward GUI wizards that walk you through integration with your application; most of the time you have to write a small script for your application to benefit from online & application-consistent snapshots, and another downside is that SnapCreator is not officially supported software. But at the end of the day it is a relatively easy setup, and it will definitely pay off once you are done.
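As a minimal sketch of the scripted approach, the sequence is always the same: quiesce the application, freeze the file system, take the storage snapshot, thaw. The mount point, SVM, volume and cluster names below are placeholders, and the application quiesce step is application-specific:

# 1. put the application into backup mode (application-specific, e.g. database hot backup mode)
# 2. freeze the file system so no new writes hit the disk
fsfreeze -f /mnt/oradata
# 3. create the storage snapshot over SSH to the ONTAP cluster
ssh admin@cluster1 "volume snapshot create -vserver svm01 -volume oradata -snapshot app_consistent_$(date +%Y%m%d_%H%M)"
# 4. thaw the file system and take the application out of backup mode
fsfreeze -u /mnt/oradata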

List of other software features available in Basic software

This Basic ONTAP functionality might also be useful:

  • Horizontal scaling, non-disruptive operations such as online volume & LUN migration, non-disruptive upgrades and adding new nodes to the cluster
  • API automation
  • FPolicy file screening
  • Create snapshots to improve RPO
  • Storage efficiencies: Deduplication, Compression, Compaction
  • By default ONTAP deduplicates data across the active file system and all the snapshots on the volume. Savings from snapshot data sharing scale with the number of snapshots: the more snapshots you have, the more savings you get (a CLI sketch follows this list)
  • Storage Multi-Tenancy
  • QoS Maximum
  • External key manager for Encryption
  • Host-based MAX Data software which works with ONTAP & SAN protocols
  • You can buy FlexArray license to virtualize 3rd party storage systems
  • If you have an All Flash system, you can purchase the additional FabricPool license, which is especially useful with snapshots because it destages cold data to cheap object storage like AWS S3, Google Cloud, Azure Blob, IBM Cloud, Alibaba Cloud or an on-premises StorageGRID system, etc.
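A quick sketch of enabling and checking storage efficiency from the ONTAP CLI (the SVM and volume names are placeholders):

volume efficiency on -vserver svm01 -volume vol1
volume efficiency show -vserver svm01 -volume vol1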

Summary

Even the Basic software gives you rich functionality on your ONTAP system, so you definitely should use NetApp snapshots and set up application integration to make your snapshots application consistent. With hardware NetApp storage snapshots you can have up to 1023 snapshots per volume and create them as often as you need without sacrificing storage performance, so snapshots will improve your RPO. Application consistency with SnapCreator or any other 3rd-party backup software will give you confidence that the snapshots are restorable when needed.

ONTAP and ESXi 6.x tuning

This article will be useful to those who own an ONTAP system and ESXi environment.

ESXi tuning can be divided into the following parts:

  • SAN network configuration optimization
  • NFS network configuration optimization
  • Hypervisor optimization
  • Guest OS optimization
  • Compatibility for software, firmware, and hardware

There are a few documents which you should use when tuning ESXi for NetApp ONTAP:

TR-4597 VMware vSphere with ONTAP

SAN network

In this section, we will describe configurations for the iSCSI, FC, and FCoE SAN protocols.

ALUA

ONTAP 9 has ALUA always enabled for the FC, FCoE and iSCSI protocols. If the ESXi host correctly detects ALUA, the Storage Array Type plug-in will show VMW_SATP_ALUA. With ONTAP you can use either the Most Recently Used or the Round Robin load-balancing algorithm.

Round Robin will show better results if you have more than one path to a controller. In the case of a Microsoft Cluster with RDM drives, it is recommended to use the Most Recently Used algorithm (a CLI sketch for setting the default path selection policy follows the table below). Read more about Zoning for ONTAP clusters.

Storage    ALUA      Protocols        ESXi policy      Algorithm
ONTAP 9    Enabled   FC/FCoE/iSCSI    VMW_SATP_ALUA    Most Recently Used or Round Robin
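If you decide to standardize on Round Robin, one possible way (a sketch, assuming you want it as the host-wide default for all devices claimed by VMW_SATP_ALUA) is:

# make Round Robin the default path selection policy for VMW_SATP_ALUA-claimed devices
esxcli storage nmp satp set --default-psp VMW_PSP_RR --satp VMW_SATP_ALUA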

Let’s check policy and algorithm applied to a Datastore:

# esxcli storage nmp device list
naa.60a980004434766d452445797451376b
Device Display Name: NETAPP Fibre Channel Disk (naa.60a980004434766d452445797451376b)
Storage Array Type: VMW_SATP_ALUA
Storage Array Type Device Config: {implicit_support=on;explicit_support=off; explicit_allow=on;alua_followover=on;{TPG_id=1,TPG_state=ANO}{TPG_id=0,TPG_state=AO}}
Path Selection Policy: VMW_PSP_RR
Path Selection Policy Device Config: {policy=rr,iops=1000,bytes=10485760,useANO=0; lastPathIndex=0: NumIOsPending=0,numBytesPending=0}
Path Selection Policy Device Custom Config:
Working Paths: vmhba2:C0:T6:L119, vmhba1:C0:T7:L119
Is Local SAS Device: false
Is USB: false
Is Boot USB Device: false

Ethernet Network

An Ethernet network can be used for the NFS and iSCSI protocols.

Jumbo frames

Whether you are using iSCSI or NFS, it is recommended to use jumbo frames on networks running at 1 Gbps or faster. When you set up jumbo frames, make sure MTU 9000 is configured end to end: in the guest, on the virtual switch, on the physical switches and on the storage ports.

ESXi & MTU9000

When you are setting up a virtual machine, use the VMXNET3 virtual adapter to achieve the best network performance, since it supports both speeds greater than 1 Gbps and MTU 9000. The E1000e virtual adapter supports MTU 9000 but only speeds up to 1 Gbps, and it is the default adapter for most guest OS types except Linux. Flexible virtual adapters support only MTU 1500.

To achieve maximum network throughput, connect your VM to a virtual switch that also has MTU 9000 (see the sketch below).
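A minimal sketch of setting MTU 9000 on a standard vSwitch and a VMkernel interface and verifying it end to end; vSwitch1, vmk1 and the storage LIF address are placeholder names:

esxcli network vswitch standard set -v vSwitch1 -m 9000
esxcli network ip interface set -i vmk1 -m 9000
# verify with a non-fragmented jumbo ping to the storage LIF (9000 minus 28 bytes of headers = 8972)
vmkping -d -s 8972 192.168.1.10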

NAS & VAAI

ONTAP storage systems support VAAI (vSphere Storage APIs for Array Integration). VAAI, also known as the hardware acceleration or hardware offload APIs, is a set of APIs enabling communication between VMware ESXi hosts and storage devices. Instead of the ESXi host copying data from storage, modifying it in host memory and writing it back to storage over the network, with VAAI some of these operations can be done by the storage itself, triggered by API calls from the ESXi host. VAAI is enabled by default for SAN protocols but not for NAS. For NAS VAAI to work, you need to install a VIB kernel module called NetAppNFSVAAI on each ESXi host; NetApp VSC can help with the installation, and a quick check of the VIB is shown after the export-policy example below. Do not expect VAAI to solve all your problems, but some performance will definitely improve. For NFS VAAI to function, you have to configure your NFS export on the storage properly and meet a few criteria:

  1. On the ONTAP storage, set the NFS export policy so ESXi servers can access it
  2. The RO, RW and Superuser fields must have SYS or ANY values in the export policy for your volume
  3. You have to enable both the NFSv3 and NFSv4 protocols, even if NFSv4 will not be used
  4. Parent volumes in your junction path have to be readable. In most cases this means the root volume (vsroot) of your SVM needs at least the superuser field set to SYS. Moreover, it is recommended to prohibit write access to the SVM root volume.
  5. The vStorage feature has to be enabled for your SVM (vserver)

Example:

#cma320c-rtp::> export-policy rule show -vserver svm01 -policyname vmware_access -ruleindex 2
(vserver export-policy rule show)
Vserver: svm01
Policy Name: vmware_access <--- Applied to Exported Volume
Rule Index: 2
Access Protocol: nfs3 <---- needs to be 'nfs' or 'nfs3,nfs4'
Client Match Spec: 192.168.1.7
RO Access Rule: sys
RW Access Rule: sys
User ID To Which Anonymous Users Are Mapped: 65534
Superuser Security Flavors: sys
Honor SetUID Bits In SETATTR: true

#cma320c-rtp::> export-policy rule show -vserver svm01 -policyname root_policy -ruleindex 1
(vserver export-policy rule show)
Vserver: svm01
Policy Name: root_policy <--- Applied to SVM root volume
Rule Index: 1
Access Protocol: nfs <--- like requirement 1, set to nfs or nfs3,nfs4
Client Match Spec: 192.168.1.5
RO Access Rule: sys
RW Access Rule: never <--- this can be never for security reasons
User ID To Which Anonymous Users Are Mapped: 65534
Superuser Security Flavors: sys <--- this is required for VAAI to be set, even in the parent volumes like vsroot
Honor SetUID Bits In SETATTR: true
Allow Creation of Devices: true

#cma320c-rtp::> nfs modify -vserver svm01 -vstorage enabled
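And on the ESXi side, a quick way to confirm that the NFS VAAI plug-in VIB is actually installed on a host (the exact VIB name may differ between plug-in versions):

esxcli software vib list | grep -i netapp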

ESXi host

First of all, let's not forget it is a good idea to leave 4 GB of memory to the hypervisor itself. Also, we need to tune some network values on ESXi:

Parameter                          Protocols        Value for ESXi 6.x with ONTAP 9.x
Net.TcpipHeapSize                  iSCSI/NFS        32
Net.TcpipHeapMax                   iSCSI/NFS        1536
NFS.MaxVolumes                     NFS              256
NFS41.MaxVolumes                   NFS 4.1          256
NFS.HeartbeatMaxFailures           NFS              10
NFS.HeartbeatFrequency             NFS              12
NFS.HeartbeatTimeout               NFS              5
NFS.MaxQueueDepth                  NFS              64 (if you have only AFF, then 128 or even 256)
Disk.QFullSampleSize               iSCSI/FC/FCoE    32
Disk.QFullThreshold                iSCSI/FC/FCoE    8
VMFS3.HardwareAcceleratedLocking   iSCSI/FC/FCoE    1
VMFS3.EnableBlockDelete            iSCSI/FC/FCoE    0

We can do it in a few ways:

  • The easiest way, again, is to use VSC, which will configure these values for you
  • Command Line Interface (CLI) on ESXi hosts
  • With the GUI interface of vSphere Client/vCenter Server
  • Remote CLI tool from VMware.
  • VMware Management Appliance (VMA)
  • Applying Host Profile

Let’s set up these values manually in command line:

# For Ethernet-based protocols like iSCSI/NFS
esxcfg-advcfg -s 32 /Net/TcpipHeapSize
esxcfg-advcfg -s 1536 /Net/TcpipHeapMax

# For NFS protocol
esxcfg-advcfg -s 256 /NFS/MaxVolumes
esxcfg-advcfg -s 10 /NFS/HeartbeatMaxFailures
esxcfg-advcfg -s 12 /NFS/HeartbeatFrequency
esxcfg-advcfg -s 5 /NFS/HeartbeatTimeout
esxcfg-advcfg -s 64 /NFS/MaxQueueDepth

# For NFS v4.1 protocol
esxcfg-advcfg -s 256 /NFS41/MaxVolumes

# For iSCSI/FC/FCoE SAN protocols
esxcfg-advcfg -s 32 /Disk/QFullSampleSize
esxcfg-advcfg -s 8 /Disk/QFullThreshold
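The two VMFS3 options from the table are not covered by the commands above; assuming you want to set them from the same shell, a sketch:

# For VAAI-assisted locking (ATS) and block delete behavior on VMFS
esxcfg-advcfg -s 1 /VMFS3/HardwareAcceleratedLocking
esxcfg-advcfg -s 0 /VMFS3/EnableBlockDelete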

And now let’s check those settings:

# For Ethernet-based protocols like iSCSI/NFS
esxcfg-advcfg -g /Net/TcpipHeapSize
esxcfg-advcfg -g /Net/TcpipHeapMax

# For NFS protocol
esxcfg-advcfg -g /NFS/MaxVolumes
esxcfg-advcfg -g /NFS/HeartbeatMaxFailures
esxcfg-advcfg -g /NFS/HeartbeatFrequency
esxcfg-advcfg -g /NFS/HeartbeatTimeout
esxcfg-advcfg -g /NFS/MaxQueueDepth

# For NFS v4.1 protocol
esxcfg-advcfg -g /NFS41/MaxVolumes

# For iSCSI/FC/FCoE SAN protocols
esxcfg-advcfg -g /Disk/QFullSampleSize
esxcfg-advcfg -g /Disk/QFullThreshold

HBA

NetApp usually recommends using the default settings. However, in some cases VMware, NetApp or an application vendor may ask you to modify them. Read more in the VMware KB. Example:

# Set value for Qlogic on 6.0
esxcli system module parameters set -p qlfxmaxqdepth=64 -m qlnativefc
# View value for Qlogic on ESXi 6.0
esxcli system module list | grep qln

VSC

NetApp Virtual Storage Console (VSC) is free software which helps you set the recommended values on ESXi hosts and in the Guest OS. VSC also helps with basic storage management, like datastore creation from vCenter. VSC is a mandatory tool for VVols with ONTAP. VSC is available only as a vCenter web client plug-in and supports vCenter 6.0 and newer.

VASA Provider

VASA Provider is free software which lets your vCenter know about storage specifics and capabilities, like disk types (SAS/SATA/SSD), storage thin provisioning, whether storage caching is enabled or disabled, deduplication and compression. VASA Provider integrates with VSC and allows you to create storage profiles. VASA Provider is also a mandatory tool for VVols. NetApp VASA Provider, VSC and the Storage Replication Adapter for SRM are bundled in a single virtual appliance available to all NetApp customers.

Space Reservation — UNMAP

UNMAP functionality frees space on the datastore and on the storage system after data has been deleted from VMFS or inside the Guest OS; this process is known as space reclamation. There are two independent processes:

  1. First space reclamation form: ESXi sends UNMAP to the storage system when data has been deleted from a VMFS datastore. For this type of reclamation to work, the storage LUN has to be thin provisioned, and the space-allocation functionality needs to be enabled on the NetApp LUN. Reclamation of this type can happen in two cases:
    • A VM or VMDK has been deleted
    • Data has been deleted from the Guest OS file system and the space has already been reclaimed on VMFS, i.e. UNMAP from the Guest OS has already happened.
  2. Second space reclamation form: UNMAP from the Guest OS when data is deleted on the Guest OS file system, to free space on the VMware datastore (either NFS or SAN). This type of reclamation has nothing to do with the underlying storage system and does not require any storage tuning or setup, but it does need Guest OS tuning and some additional requirements to function.

The two space reclamation forms are not tied to one another, and you can set up only one of them, but for the best space efficiency you want both.

First space reclamation form: From ESXi host to storage system

Historically, VMware introduced only the first space reclamation form, from VMFS to the storage LUN, in ESXi 5.0, with space reclamation happening automatically and nearly online. It wasn't the best idea, because it immediately hit storage performance, so in 5.x/6.0 VMware disabled automatic space reclamation and you had to run it manually. With VVols in ESXi 6.x space reclamation works automatically, and with ESXi 6.5 and VMFS6 it also works automatically, but in both cases asynchronously (it is not an online process).

On ONTAP, space reclamation (space allocation) is always disabled by default; to enable it, the LUN has to be taken offline briefly:

lun modify -vserver vsm01 -volume vol2 -lun lun1_myDatastore -state offline
lun modify -vserver vsm01 -volume vol2 -lun lun1_myDatastore -space-allocation enabled
lun modify -vserver vsm01 -volume vol2 -lun lun1_myDatastore -state online
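To verify the setting afterwards, a sketch using the same placeholder names:

lun show -vserver vsm01 -volume vol2 -lun lun1_myDatastore -fields space-allocation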

If you are using an NFS datastore, this form of space reclamation is not needed, because with NAS the functionality is available by design. UNMAP is needed only in a SAN environment, where the lack of it was definitely one of the disadvantages compared to NAS.

This type of reclamation occurs automatically in ESXi 6.5 (it can take up to 12 hours) and can also be initiated manually.

esxcli storage vmfs reclaim config get -l DataStoreOnNetAppLUN
   Reclaim Granularity: 248670 Bytes
   Reclaim Priority: low
esxcli storage vmfs reclaim config set -l DataStoreOnNetAppLUN -p high

Second space reclamation form: UNMAP from Guest OS

Since a VMDK file is basically a block device for the VM, you can apply the UNMAP mechanism there too. VMware introduced this capability starting with ESXi 6.0. It started with Windows guests in a VVol environment, with automatic space reclamation from the Guest OS, and manual space reclamation for Windows machines on ordinary datastores. Later, ESXi 6.5 introduced automatic space reclamation from the Guest OS (Windows and Linux) on ordinary datastores.

Now, setting it up to function properly might be trickier than you think. The hardest part of making this UNMAP work is simply complying with the requirements; once you do, it is easy. So, you need to have:

  • Virtual Hardware Version 11
  • vSphere 6.0*/6.5
  • VMDK disks must be thin provisioned
  • The file system of the Guest OS must support UNMAP
    • Linux with SPC-4 support or Windows Server 2012 and later

* If you have ESXi 6.0, then CBT must be disabled, which means that in a real production environment you are not going to have Guest OS UNMAP, since no production can live without proper backups (backup software leverages CBT to function)

Moreover, if we add ESXi-to-storage UNMAP on top, a few more requirements need to be honored:

  • LUN on the storage system must be thinly provisioned (in ONTAP it can be enabled/disabled on the fly)
  • Enable UNMAP in ONTAP
  • Enable UNMAP on Hypervisor

"Never use thin virtual disks on a thin LUN"

For many years, all storage vendors said not to use thin virtual disks on thin LUNs, yet now this combination is a requirement for space reclamation from the Guest OS.

Windows

UNMAP is supported in Windows starting with Windows Server 2012. To make Windows reclaim space from a VMDK, NTFS must use a 64 KB allocation unit (a sketch follows after the check below). To check the UNMAP settings, issue the following command:

fsutil behavior query disabledeletenotify

DisableDeleteNotify = 0 means the "disable" flag is off, i.e. delete notifications (UNMAP) will be sent to the hypervisor so it can reclaim space.
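A short sketch of preparing a Windows data volume for UNMAP; the drive letter is an example, and formatting obviously destroys existing data:

rem format the data volume with a 64 KB NTFS allocation unit
format D: /FS:NTFS /A:64K /Q
rem make sure delete notifications (UNMAP) are enabled
fsutil behavior set disabledeletenotify 0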

Linux Guest OS SPC-4 support

Let's first check whether our virtual disk is thin or thick:

sg_vpd -p lbpv /dev/sda
Logical block provisioning VPD page (SBC):
Unmap command supported (LBPU): 1

A value of 1 means we have a thin virtual disk. If you got 0, then your virtual disk is thick (lazy or eagerly zeroed), which is not supported with UNMAP. Let's go further and check that we have SPC-4:

sg_inq -d /dev/sda
standard INQUIRY:
PQual=0 Device_type=0 RMB=0 version=0x06 [SPC-4]
Vendor identification: VMware
Product identification: Virtual disk
Product revision level: 2.0

We need SPC-4 support for UNMAP to work automatically. Let's check that the Guest OS can notify the SCSI layer about reclaimed blocks:

grep . /sys/block/sdb/queue/discard_max_bytes
1

A non-zero value means we are good. Now let's check that the device actually accepts discard requests (careful: these commands discard real blocks at the given offsets):

sg_unmap --lba=0 --num=2048 /dev/sda
# or
blkdiscard --offset 0 --length=2048 /dev/sda

If you are getting "blkdiscard: /dev/sda: BLKDISCARD ioctl failed: Operation not supported", then UNMAP does not work properly. If there is no error, we can remount our file system with the "-o discard" option to make UNMAP automatic.

mount /dev/sda /mnt/netapp_unmap -o discard
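If you prefer not to mount with the discard option (continuous online discard adds some overhead), a periodic trim from cron or a systemd timer achieves the same result:

fstrim -v /mnt/netapp_unmap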

Guest OS

You need to check your Guest OS configuration for at least two reasons:

  1. To gain max performance
  2. To make sure that if one controller goes down, your Guest OS survives the takeover timeout

Disk alignment: to make sure you get max performance

Disk misalignment is an infrequent situation, but you may still run into it. There are two levels where you can get this type of problem:

  1. When you create a LUN in ONTAP with one geometry, for example Windows 2003, and then use it with Linux. This type of problem can occur only in a SAN environment. It is very simple to avoid: when you create a LUN in ONTAP, make sure you choose the proper LUN geometry. This problem occurs between the storage and the hypervisor
  2. Inside a virtual machine. This can happen in both SAN and NAS environments.

To understand how it works, let's take a look at a properly aligned configuration.

Fully aligned configuration

In this figure, the upper block belongs to the Guest OS, the block in the middle belongs to ESXi, and the lower block represents the ONTAP storage system.

First case: Misalignment with VMFS

In this case your VMFS file system is misaligned with your storage system. This happens if you create a LUN in ONTAP with a geometry that does not match VMware. It is very easy to fix: just create a new LUN in ONTAP with the VMware geometry, create a new VMFS datastore on it, move your VMs to the new datastore, and destroy the old one.

Second case: Misalignment inside your guest OS

This is also a very rare problem; you can run into it with very old Linux distributions or with Windows 2003 and older. However, we are here to discuss all the possible problems to understand better how it works, right? This type of problem can occur on NFS datastores and on VMFS datastores over SAN protocols, as well as with RDM and VVols. It usually happens with virtual machines using a non-optimally aligned MBR in the Guest OS, or with Guest OSes that were previously converted from physical machines. How to identify and fix misalignment inside a Guest OS is described in the NetApp KB.

Misalignment on two levels simultaneously

Of course, if you are particularly unlucky, you can get both simultaneously: at the VMFS level and at the Guest OS level. Later in this article, we will discuss how to identify such a problem from the storage system side.

Takeover/Giveback

NetApp ONTAP storage systems consist of one or more building blocks called HA pairs. Each HA pair consists of two controllers, and in the event of one controller failing, the second takes over and continues to serve clients. Takeover is a relatively fast process in ONTAP; on new All-Flash FAS (AFF) configurations it takes 2 to 15 seconds. With hybrid FAS systems this time can be longer, up to 60 seconds. 60 seconds is the absolute maximum within which NetApp guarantees failover to complete on FAS systems; it usually finishes in 2 to 15 seconds. These numbers should not scare you, because your VMs will survive this window as long as your guest timeouts are set to 60 seconds or more, and the default VMware Tools value is 180 seconds anyway. Moreover, since your ONTAP cluster can contain different models, generations and disk types, it is a good idea to assume the worst-case scenario, which is 60 seconds.

Guest OS    Updated Guest OS tuning for SAN (ESXi 5 and newer, or ONTAP 8.1 and newer)
Windows     disk timeout = 60
Linux       disk timeout = 60

The default values for Guest OSes on NFS datastores are tolerable, and there is no need to change them. However, I would recommend testing a takeover anyway, to be sure how your environment behaves during such events.

You can configure these values manually or with the NetApp Virtual Storage Console (VSC) utility, which provides scripts that reduce the effort of updating the guest OS tunings by hand.

Windows

You change the Windows registry and reboot the Guest OS. The timeout in Windows is set in seconds, in hexadecimal format (0x3c = 60 seconds).

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\services\Disk]
"TimeOutValue"=dword:0000003c

Linux

The timeout in Linux is set in seconds, in decimal format. To change it you need to modify a udev rule in your Linux OS. The location of udev rules may vary from one Linux distribution to another.

# RedHat systems
ACTION=="add", BUS=="scsi", SYSFS{vendor}=="VMware, " , SYSFS{model}=="VMware Virtual S", RUN+="/bin/sh -c 'echo 60 >/sys$DEVPATH/device/timeout'"

# Debian systems
ACTION=="add", SUBSYSTEMS=="scsi", ATTRS{vendor}=="VMware " , ATTRS{model}=="Virtual disk ", RUN+="/bin/sh -c 'echo 60 >/sys$DEVPATH/device/timeout'"

# Ubuntu and SUSE systems
ACTION=="add", SUBSYSTEMS=="scsi", ATTRS{vendor}=="VMware, " , ATTRS{model}=="VMware Virtual S", RUN+="/bin/sh -c 'echo 60 >/sys$DEVPATH/device/timeout'"

VMware Tools automatically sets a udev rule with a 180-second timeout. You should double-check it with:

lsscsi
find /sys/class/scsi_device/*/device/timeout -exec grep -H . '{}' \;

Compatibility

NetApp has the Interoperability Matrix Tool (IMT), and VMware has the Hardware Compatibility List (HCL). You need to check both and stick to the versions both vendors support to reduce potential problems in your environment. Before any upgrade, make sure you stay within the compatibility lists.

Summary

Most of this article is theoretical knowledge which you will most probably not need, because nowadays nearly all parameters are either assigned automatically or set as defaults, and you are unlikely to see misalignment in new installations in real life. But if something goes wrong, the information in this article will shed light on some under-the-hood aspects of the ONTAP architecture and help you dig in and figure out the cause of your problem.

Correct settings for your VMware environment with NetApp ONTAP give not only better performance but also ensure you will not get into trouble in the event of a storage failover or network link outage. Make sure you follow NetApp, VMware and application vendor recommendations. When you set up your infrastructure from scratch, always test performance and availability: simulate a storage failover and a network link outage. Testing will help you understand your infrastructure's baseline performance and its behavior in critical situations, and it will help you find weak points. Stay within the compatibility lists; it will not guarantee you never get into trouble, but it reduces risks and keeps you supported by both vendors.
