How does the ONTAP cluster work? (part 3)

This article is part of the series "How does the ONTAP cluster work?" The previous series of articles, "How ONTAP Memory work," is also a good addition to this one.

Horizontal Scaling Clusterization

Horizontal scaling clusterization in ONTAP came from the Spinnaker acquisition and is often referred to by NetApp as "Single Namespace," "Horizontal Scaling Cluster," "ONTAP Storage System Cluster," "Scale-out Cluster," or just "ONTAP Cluster." This type of clusterization is often confused with an HA pair or even with MetroCluster functionality, so to distinguish it from the others in ONTAP I will call it the third type of clusterization. While MetroCluster and HA are data availability and even data protection technologies, single-namespace clusterization provides neither data protection nor data availability: if there is a hardware failure, the third type of clusterization is not involved in mitigating the problem. ONTAP forms the (third type of) cluster out of one or more HA pairs (multiple single nodes are not supported in a single cluster) and adds Non-Disruptive Operations (NDO) functionality to the system, such as non-disruptive online data migration across nodes in the cluster, non-disruptive hardware upgrades, and online IP address migration. Data migration for NDO operations in an ONTAP cluster requires dedicated Ethernet ports, called cluster interconnect interfaces; the HA interconnect interfaces are not used for this purpose. Cluster interconnect and HA interconnect interfaces could not share the same ports until the A320 system.

In a cluster with a single HA pair, the cluster interconnect ports can be directly connected (a switchless configuration), while systems with 4 or more nodes require two dedicated Ethernet cluster interconnect switches. An ONTAP cluster can consist only of an even number of nodes (they must be configured as HA pairs), except for a single-node cluster. A single-node cluster is also called a non-HA (or stand-alone) system; other cluster nodes cannot be added to it, but a single-node cluster can be converted to an HA system, and then other HA pairs can be added. An ONTAP cluster is managed as a single pane of glass through the built-in web-based GUI, CLI (SSH and PowerShell), and API. The ONTAP cluster provides a Single Namespace for NDO operations through Storage Virtual Machines (SVMs). Single Namespace is the name for a collection of techniques used by the (third type of) cluster to provide a level of abstraction that separates front-end network connectivity over data protocols like FC, FCoE, FC-NVMe, iSCSI, NFS, and SMB from the data in volumes, and therefore provides data virtualization. This virtualization enables online data mobility across cluster nodes while clients connected over data protocols keep accessing their data.
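
As a quick illustration of that single-pane-of-glass CLI, the two commands below list the cluster nodes and the cluster interconnect ports. This is only a sketch: "cluster1" is a hypothetical cluster shell prompt, and the output columns differ between ONTAP releases.

    cluster1::> cluster show
    cluster1::> network port show -ipspace Cluster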

The general idea behind the single namespace is to trick clients into thinking they are connected to a single device, while in reality they are connected to a cluster consisting of a number of nodes. This "trick" works differently with different protocols. For example, with the FC protocol each node's FC port gets a unique WWPN address while the SVM has a single WWNN, so a client connected to the cluster sees it as a single FC node with multiple active ports; some of them are reported as optimized for traffic and some as non-optimized. iSCSI and NVMe-oF work in a very similar way. With FC and iSCSI, the ALUA protocol is used to switch between ports and links if one becomes unavailable. With pNFS the cluster operates similarly to the SAN protocols: nodes in the cluster have interfaces with IP addresses, so clients perceive the SVM as a single node with multiple active interfaces. ONTAP reports as optimized the ports that have direct access to the node serving the volume with the data, and only if those ports are unavailable will clients use the other active, non-optimized ports. Protocols like NFS v3, NFS v4, SMB v2, and SMB v3 do not have capabilities like SAN ALUA or pNFS, but ONTAP has another trick up its sleeve: Single Namespace provides several techniques for non-disruptive IP address migration for data protocols in case a node or port dies.

The SMB Continuous Availability (CA) extension to SMB v3 allows clients to keep their connections and provides transparent failover if a port, link, or node goes down; therefore MS SQL and Hyper-V servers running on a file share can survive such an event. Without CA support in the SMB protocol, clients get a session disruption when an IP address migrates to another port, so without CA only user file shares are recommended. Since NFS v3 is a stateless protocol, clients do not experience interruption if an IP address migrates to another port or node. With SAN protocols, interfaces do not migrate; instead a new path is selected.
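
Continuous Availability is enabled per share when the share is created or modified. A minimal sketch, assuming a hypothetical SVM svm1, a junction path /sql_data, and a share name sqldata; the exact list of share properties may vary between ONTAP releases:

    cluster1::> vserver cifs share create -vserver svm1 -share-name sqldata -path /sql_data -share-properties oplocks,browsable,changenotify,continuously-available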

Network access: Indirect data path

In some cases, data resides on one controller but is accessed through another controller; this is indirect data access. For example, hosts access a network address through LIF 3A on node 3, while the volume with the data is located on data_aggregate01 on node 1. In this scenario node 3 accesses node 1 through the cluster interconnect interfaces and switches, gets the data from node 1, and returns it to the hosts that requested it through the LIF 3A interface located on node 3's ports; in some cases the controller that owns the data, node 1 in this example, can reply to the hosts directly. This functionality was introduced as part of the single-namespace strategy (the third type of clusterization) to always give hosts access to data, no matter where the network addresses and volumes are located. Indirect data access occurs rarely in the cluster, in situations such as a LIF being migrated by the admin or the cluster to another port, a volume being migrated to another node, or a node port going down. Although the indirect data path adds a small amount of latency to operations, in most cases it can be ignored. Some protocols like pNFS, SAN ALUA, and NVMe ANA can automatically detect such a situation and switch their primary path to ports on the node with direct access to the data. Again, this (third) type of clusterization is not a data protection mechanism, but rather online data migration functionality.
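
One way to spot and then remove an indirect path is to compare the node currently hosting the LIF with the aggregate hosting the volume, and then either revert the LIF or move the volume next to it. A rough sketch with hypothetical names (svm1, lif_3a, vol_data, aggr_node3):

    cluster1::> network interface show -vserver svm1 -lif lif_3a -fields home-node,curr-node
    cluster1::> volume show -vserver svm1 -volume vol_data -fields aggregate
    cluster1::> volume move start -vserver svm1 -volume vol_data -destination-aggregate aggr_node3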

Heterogeneous cluster

A cluster (the third type of clusterization) can consist of different HA pairs: AFF and FAS, different models and generations, different performance levels and disks, and can include up to 24 nodes with NAS protocols or 12 nodes with SAN protocols. SDS systems cannot be intermixed with physical AFF or FAS appliances. The main purpose of the third type of clusterization is not data protection but non-disruptive operations, like online volume migration or IP address migration between all the nodes in the cluster.

Storage Virtual Machine

Also known as a Vserver or simply SVM. A Storage Virtual Machine (SVM) is a layer of abstraction which, alongside other functions, virtualizes and separates the physical front-end data network from the data located on FlexVol volumes. It is used for Non-Disruptive Operations and Multi-Tenancy. SVMs live on the nodes; on the image below they are pictured on the disk shelves around the volumes only to demonstrate that each volume belongs to a single SVM.
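
For illustration, this is roughly how a data SVM is created from the cluster shell; svm1, svm1_root, and aggr1_node1 are made-up names, and newer ONTAP releases may not require all of these parameters:

    cluster1::> vserver create -vserver svm1 -rootvolume svm1_root -aggregate aggr1_node1 -rootvolume-security-style unix
    cluster1::> vserver show -vserver svm1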

Multi-Tenancy

ONTAP provides two techniques for multi-tenancy: SVMs and IPspaces. On the one hand, SVMs are like KVM virtual machines in that they provide a virtualization abstraction from the physical storage; on the other hand they are quite different, because unlike ordinary virtual machines they do not allow running third-party binary code (as, for example, Pure Storage systems do); they only provide a virtualized environment and storage resources. Also, unlike ordinary virtual machines, an SVM does not run on a single node: it runs as a single entity across the whole cluster, even though to the system admin it looks like a single machine. SVMs divide the storage system into slices, so a few divisions or even organizations can share the storage system without knowing about or interfering with each other, while using the same ports, data aggregates, and nodes in the cluster but separate FlexVol volumes and LUNs. Each SVM can run its own front-end data protocols, its own set of users, and its own network addresses and management IP. With the use of IPspaces, tenants can have the same IP addresses and networks on the same storage system without interference and network conflicts. Each ONTAP system must run at least one data SVM to function, but may run more.

There are a few levels of ONTAP management: the cluster admin level has all the privileges. Each data SVM provides its owner with a vsadmin user, which has nearly the same functionality as the cluster admin level but lacks management of the physical layer, such as RAID group configuration, aggregate configuration, and physical network port configuration. Vsadmin can manage the logical objects inside its SVM: create, delete, and configure LUNs, FlexVol volumes, and network interfaces and addresses, so two SVMs in a cluster cannot interfere with each other. One SVM cannot create, delete, change, or even see the objects of another SVM, so to SVM owners such an environment looks as if they are the only users on the entire storage cluster. Multi-tenancy is free functionality in ONTAP. On the image below, SVMs are pictured on top of the ONTAP cluster, but in reality SVMs are part of the ONTAP OS.
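
A sketch of the IPspace side of multi-tenancy: create an IPspace and bind a new SVM to it, so that tenant can reuse IP subnets already in use elsewhere on the cluster. The names tenant_a, svm_tenant_a, svm_ta_root, and aggr1_node1 are hypothetical:

    cluster1::> network ipspace create -ipspace tenant_a
    cluster1::> vserver create -vserver svm_tenant_a -ipspace tenant_a -rootvolume svm_ta_root -aggregate aggr1_node1 -rootvolume-security-style unix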

Non-Disruptive Operations

There are a few Non-Disruptive Operations (NDO) and Non-Disruptive Upgrade (NDU) capabilities in a (clustered) ONTAP system. NDO data operations include: data aggregate relocation between the nodes of an HA pair, FlexVol volume online migration (known as the Volume Move operation) across aggregates and nodes within the cluster, and LUN migration (known as the LUN Move operation) between FlexVol volumes within the cluster. LUN Move and Volume Move operations use the cluster interconnect interfaces for data transfer (the HA-CI is not used for such operations). SVMs behave differently with network NDO operations depending on the front-end data protocol. To bring latency back to its original level, FlexVol volumes and LUNs should be located on the same node as the network address through which clients access the data, so a network address can be created (for SAN) or moved (for NAS protocols). NDO operations are free functionality.
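
The three NDO data operations mentioned above map to three cluster shell commands. A sketch with hypothetical object names (svm1, vol_db, vol_logs, lun0, aggr1_node1, aggr1_node2, node1, node2):

    cluster1::> volume move start -vserver svm1 -volume vol_db -destination-aggregate aggr1_node2
    cluster1::> lun move start -vserver svm1 -source-path /vol/vol_db/lun0 -destination-path /vol/vol_logs/lun0
    cluster1::> storage aggregate relocation start -node node1 -destination node2 -aggregate-list aggr1_node1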

NAS LIF

The NAS front-end data protocols NFSv2, NFSv3, NFSv4, CIFSv1, SMBv2, and SMBv3 do not provide network redundancy within the protocol itself, so they rely on storage and switch functionality for that. ONTAP supports Ethernet Port Channel and LACP on its Ethernet network ports at the L2 layer (known in ONTAP as an interface group, or ifgrp), within a single node. ONTAP also provides non-disruptive network failover between nodes in the cluster at the L3 layer by migrating Logical Interfaces (LIFs) and their associated IP addresses, similarly to VRRP, to the surviving node and back home when the failed node is restored.
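
A minimal sketch of an LACP interface group on one node, assuming hypothetical ports e0c and e0d on node1; the switch side has to be configured for LACP as well:

    cluster1::> network port ifgrp create -node node1 -ifgrp a0a -distr-func ip -mode multimode_lacp
    cluster1::> network port ifgrp add-port -node node1 -ifgrp a0a -port e0c
    cluster1::> network port ifgrp add-port -node node1 -ifgrp a0a -port e0d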

Newer versions of the NAS protocols do have built-in multipathing functionality: extensions to NFS v4.1 like pNFS (supported starting with ONTAP 9.0) and NFS session trunking (NFS multipathing, not yet supported as of ONTAP 9.6), and the SMB v3 extension called SMB Multichannel (available starting with ONTAP 9.4), allow automatic switching between paths in case of a network link failure, while SMB Continuous Availability (CA) preserves sessions without interruption. Unfortunately, all of these capabilities have limited client support. Until they become more popular, ONTAP will rely on its built-in NAS LIF migration capabilities to move interfaces with their assigned network addresses to a surviving node and port.
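
If the clients do support them, pNFS and SMB Multichannel are switched on per SVM. A hedged sketch for a hypothetical SVM svm1; the option names can differ slightly between ONTAP releases:

    cluster1::> vserver nfs modify -vserver svm1 -v4.1 enabled -v4.1-pnfs enabled
    cluster1::> vserver cifs options modify -vserver svm1 -is-multichannel-enabled true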

FailoverGroup

A failover group is functionality available only to NAS protocols and applies only to Ethernet ports/VLANs. SAN interfaces cannot (and do not need to) migrate online across cluster ports like NAS LIFs do, and therefore have no failover group functionality. A failover group is a prescription for a LIF interface: where to migrate if the hosting port goes down, and whether it should return home automatically. By default, a failover group is equal to a broadcast domain. It is good practice to specify manually between which ports a LIF can migrate, especially where several VLANs are used, so the LIF does not migrate to another VLAN. A failover group can be assigned to multiple LIFs, but each LIF can be assigned to only a single failover group.
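
A sketch of a manually defined failover group restricted to one VLAN, with made-up names (failover group fg_vlan10, VLAN port a0a-10, LIF nas_lif1):

    cluster1::> network interface failover-groups create -vserver svm1 -failover-group fg_vlan10 -targets node1:a0a-10,node2:a0a-10
    cluster1::> network interface modify -vserver svm1 -lif nas_lif1 -failover-group fg_vlan10 -auto-revert true
    cluster1::> network interface show -vserver svm1 -lif nas_lif1 -fields failover-group,failover-policy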

Broadcast Domain

A broadcast domain is a list of all the Ethernet ports in the cluster that use the same MTU size, and is therefore used only for Ethernet ports. In many cases it is a good idea to separate ports of different speeds. A storage administrator might deliberately mix lower-speed and higher-speed ports, in conjunction with a failover group, to prescribe LIF migration from high-speed ports to slower ports when the faster ones are unavailable, but that is rarely the case and usually needed only on systems with a minimal number of ports. Each Ethernet port can be assigned to only one broadcast domain. If a port is part of an ifgrp, then the ifgrp port is assigned to a broadcast domain. If a port or an ifgrp has VLANs, then a broadcast domain is assigned to each VLAN on that port or ifgrp.
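
For example, a broadcast domain collecting the 10 GbE data ports with jumbo frames might look like this (the domain and port names are illustrative):

    cluster1::> network port broadcast-domain create -broadcast-domain bd_data_10g -mtu 9000 -ports node1:e0e,node1:e0f,node2:e0e,node2:e0f
    cluster1::> network port broadcast-domain show -broadcast-domain bd_data_10g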

SAN LIF

For the front-end SAN data protocols, the ALUA feature is used for network load balancing and redundancy with FCP and iSCSI: all the ports on the node where the data is located are reported to clients as active optimized (preferred), and if there is more than one such port, ALUA makes sure hosts load-balance between them. ANA works similarly for NVMe. All other network ports on all other nodes in the cluster are reported by ONTAP to hosts as active non-optimized, so if one port or an entire node goes down, the client still has access to its data over a non-optimized path. Starting with ONTAP 8.3, Selective LUN Mapping (SLM) was introduced to reduce the number of unnecessary paths to a LUN: it removes the non-optimized paths through all cluster nodes except the HA partner of the node owning the LUN, so the cluster reports to the host only the paths from the HA pair where the LUN is located. Because ONTAP provides ALUA/ANA functionality for SAN protocols, SAN network LIFs do not migrate as they do with NAS protocols. When a volume or LUN migration finishes, it is transparent to the storage system's clients because of the ONTAP architecture, though it can cause temporary or permanent indirect data access through the ONTAP cluster interconnect (the HA-CI is not used in such situations), which slightly increases latency for the clients. SAN LIFs are used for the FC, FCoE, iSCSI, and FC-NVMe protocols.
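
The reporting nodes for a LUN can be checked, and extended before a volume move to a node outside the owning HA pair, roughly like this. The paths and names are hypothetical, and the exact parameters of lun mapping add-reporting-nodes vary by release, so treat this as a sketch:

    cluster1::> lun mapping show -vserver svm1 -path /vol/vol_db/lun0 -fields reporting-nodes
    cluster1::> lun mapping add-reporting-nodes -vserver svm1 -path /vol/vol_db/lun0 -igroup ig_esx -destination-aggregate aggr1_node3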

iSCSI LIFs can live on the same ifgrp, port, or VLAN as NAS LIFs, since both use Ethernet ports.

The image below shows an ONTAP cluster with 4 nodes (2 HA pairs) and a host accessing data over a SAN protocol.

Read more about Zoning for ONTAP clusters here.

VIP LIF

VIP (Virtual IP) LIFs, also known as BGP LIFs, require a top-of-rack BGP router. VIP data LIFs are used over Ethernet for NAS environments. VIP LIFs automatically load-balance traffic based on routing metrics and avoid inactive, unused links and ports, unlike what usually happens with NAS protocols. VIP LIFs distribute traffic across all the LIFs in the cluster, rather than being limited to a single node as NAS LIFs are. VIP LIFs provide smarter load balancing than the hash algorithms of Ethernet Port Channel and LACP with interface groups. VIP LIF interfaces are tested and can be used with MCC and SVM-DR, and they provide a more reliable, predictable, and faster switchover to the surviving links and paths than NAS LIFs, but they require BGP routers.

Management interfaces

The node management LIF can migrate with its associated IP address across the Ethernet ports of a single node and is available only while ONTAP is running on that node. Usually, the node management interface is placed on the node's e0M port; the node management IP is sometimes used by the cluster admin to reach the cluster shell of a particular node in the rare cases where a command has to be issued from that node. The cluster management LIF with its associated IP address is available only while the entire cluster is up and running; by default it can migrate across Ethernet ports, is often located on an e0M port of one of the cluster nodes, and is used by the cluster administrator for storage management. Management interfaces are used for API communication, the HTML GUI, and SSH console management; by default, SSH connects the administrator to the cluster shell.

The Service Processor (SP) or BMC interface is available only on hardware appliances like FAS and AFF, and each system has either an SP or a BMC. The SP/BMC allows out-of-band SSH console communication with a small embedded computer installed on the controller mainboard and, similarly to IPMI or IP KVM, makes it possible to connect to, monitor, and manage a controller even if it has not booted the ONTAP OS. With the SP/BMC it is possible to forcibly reboot or halt a controller and to monitor fans, temperature, etc.; once connected to the SP/BMC console over SSH, the administrator can switch to the cluster shell by issuing the system console command. Each controller has one SP/BMC interface, which does not migrate like some other management interfaces. Usually e0M and the SP both live on the single management (wrench) physical Ethernet port, but each has its own dedicated MAC address. Node LIFs, the cluster management LIF, and the SP/BMC often use the same IP subnet.

The SVM management LIF, similarly to the cluster management LIF, can migrate across all the Ethernet ports on the nodes of the cluster and is dedicated to managing a single SVM. An SVM management LIF has no GUI capability and serves only API communication and SSH console management; it can live on an e0M port, but administrators often place it on a data port, usually in a dedicated management VLAN, and its IP subnet can differ from those of the node and cluster LIFs.
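
As an illustration, an SVM management LIF is essentially a data-role LIF with no data protocols, and the SP has its own network configuration visible from the cluster shell. The names, port, and address below are hypothetical, and ONTAP 9.6 and later prefer service policies over the -role/-data-protocol syntax:

    cluster1::> network interface create -vserver svm1 -lif svm1_mgmt -role data -data-protocol none -home-node node1 -home-port e0M -address 192.168.10.50 -netmask 255.255.255.0
    cluster1::> system service-processor show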

Cluster interfaces

Each cluster interconnect LIF usually lives on a dedicated Ethernet port and cannot share ports with management and data interfaces. Cluster interconnect interfaces are used for the horizontal scaling functionality, for example when a LUN or a volume migrates from one node of the cluster to another; like node management LIFs, cluster interconnect LIFs can migrate only between the ports of a single node. A few cluster interconnect interfaces can coexist on a single port, but usually this happens only temporarily, because of cluster port recabling. Intercluster LIFs, on the other hand, can live on and share the same Ethernet ports with data LIFs and are used for SnapMirror replication; like node management and cluster interconnect LIFs, intercluster LIFs can migrate only between the ports of a single node.
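
For completeness, a sketch of creating an intercluster LIF for SnapMirror on a data port, using made-up names and addresses; as with the previous example, newer releases express the role through a service policy instead:

    cluster1::> network interface create -vserver cluster1 -lif icl01 -role intercluster -home-node node1 -home-port e0e -address 10.10.20.11 -netmask 255.255.255.0
    cluster1::> network interface show -role cluster,intercluster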

Continue reading

How ONTAP Memory work

Zoning for ONTAP Cluster

Disclaimer

Please note that in this article I described my own understanding of the internal organization of ONTAP systems. Therefore, this information might be outdated, or I simply might be wrong in some aspects and details. I will greatly appreciate any contribution to make this article better; please leave your ideas and suggestions about this topic in the comments below.

All product names, logos, and brands are property of their respective owners. All company, product and service names used in this website are for identification purposes only.
