Why is GCP Anthos on NetApp HCI a big deal?

Google Cloud & NetApp announced a new validated design with GKE running on NetApp HCI on-premises.

Read what you might have missed from the NetApp announcements during Aug-Nov 2019, compressed into a single article.

Kubernetes was originally designed by Google, and Google is one of the main contributors to Docker, so its Kubernetes offering is arguably the most advanced, mature & stable on the market. If you have tried GKE in GCP alongside competing solutions, you know what I’m talking about.

Containers on-premises are difficult when you want to build an enterprise solution for new containerized applications, for a number of reasons: installation, configuration, management and updates of your core infrastructure components; persistent, performant & predictable storage; and DevOps teams who do not want to deal with infrastructure, they just want to consume it. These are the key problems to solve, and NetApp aims to solve them.

NVA-1141: NetApp HCI with Anthos. NVA Design/Deployment

Here is why Google Anthos on NetApp HCI is an important announcement:

  • Hybrid cloud. NetApp, in line with its Data Fabric vision, continues to bring a hybrid cloud experience to its users. Now, with Anthos on HCI, your on-prem data center becomes just another cloud zone. Software updates for GKE & Anthos are on Google’s shoulders, you just consume them. Not only can NetApp HCI maintenance, such as software & firmware updates, be bought as a service, but capacity as well. You can pay as you go & consume infrastructure as a service: OPEX instead of CAPEX, on request, with NetApp Keystone
  • NetApp Kubernetes Service (NKS). In addition to NKS, which allows the deployment & management of Kubernetes clusters on-premises & in the cloud, Anthos provides the ability to deploy clusters on-prem that are fully integrated with Google Cloud, including the ability to manage them from the GKE console. NKS comes bundled with Istio, Helm & many other components for your microservices, which takes DevOps to the next level. Cloud infrastructure has reached your on-premises data center
  • Storage automation. NetApp Trident is the most advanced storage driver for containers on the market so far, and it brings automation, an API and persistent storage to the container world. NetApp Trident with NKS & Anthos makes total sense. Speaking of automation, NetApp Ansible playbooks are also the most advanced on the market at the moment, with 106 published & supported modules, and SolidFire itself is known as fully API-driven storage, so you can work with it solely through its RESTful API
  • Simple, predictable and performant enterprise storage with QoS, whether on-prem or in the cloud: use Trident and Ansible with NetApp HCI on-prem, or with CVO or CVS in AWS, Azure or GCP, and replicate your data to the cloud for DR or Test/Dev
  • NetApp HCI vs other HCI solutions. One of the most notable HCI competitors is Nutanix, so I want to use it as an example. Nutanix’s storage architecture with local disk drives is certainly interesting but not unique, and it has some architectural disadvantages, scalability being one of them. Local disk drives are a blessing for tiny solutions but not such a good idea when you need to scale up: the low cost of a small solution built on commodity HW & local drives might turn into a curse at scale. That’s why Nutanix eventually developed dedicated storage nodes connected over the network to overcome the issue, stepping into the very competitive land of network storage systems. Dedicated storage nodes connected over the network are nothing new or unique to Nutanix, though: there are plenty of capable & scalable network storage systems out there. Therefore, the most exciting part of Nutanix is its ecosystem & simplicity, not its storage architecture. Now, thanks to Anthos, NetApp HCI gets into a unique position with scalability, ecosystem, simplicity, hybrid cloud & functionality for microservices that other great competitors like Nutanix have not reached yet, and that gives NetApp momentum in the HCI market
  • Performance. Don’t forget about NetApp’s MAX Data software, which already works with VMware & SolidFire; it will take NetApp only one last step to bring DCPMM such as Intel Optane to NetApp HCI. Note that at Insight 2019 NetApp announced a compute node with Intel Cascade Lake CPUs, which are required for Optane. MAX Data is not available on NetApp HCI yet, but we can clearly see that NetApp is putting everything together to make it happen. Persistent memory in the form of a file system on a Linux host server, with tiering of cold blocks to “slow” SSD storage, could put NetApp ahead of all its competitors in terms of performance

HCI Performance

Speaking of which, take a look at these two performance tests:

  1. IOmark-VM-HC: 5 storage & 18 compute nodes using data stores & VVols
  2. IOmark-VDI-HC: 5 storage nodes & 12 compute nodes with only data stores

In total, 1,440 VMs and 3,200 VDI desktops.

Notice how asymmetrical the number of storage nodes is compared to the number of compute nodes. In “real” HCI architectures with local drives you would need more equipment, while with NetApp HCI you can choose how much storage and how much compute you need and scale them separately. Dedup & compression were enabled in the tests.
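
To make that asymmetry concrete, here is a small arithmetic sketch in Python. It assumes that the 1,440 VMs belong to the IOmark-VM-HC run and the 3,200 desktops to the IOmark-VDI-HC run; the class and field names are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HciTest:
    """Node counts and workload size of one benchmark run (names are illustrative)."""
    name: str
    storage_nodes: int
    compute_nodes: int
    workloads: int  # VMs or VDI desktops

    def compute_per_storage(self) -> float:
        # How many compute nodes are served by each storage node
        return self.compute_nodes / self.storage_nodes

    def workloads_per_compute_node(self) -> float:
        # Average number of VMs or desktops hosted per compute node
        return self.workloads / self.compute_nodes


# Assumption: which workload figure belongs to which test
TESTS = [
    HciTest("IOmark-VM-HC", storage_nodes=5, compute_nodes=18, workloads=1440),
    HciTest("IOmark-VDI-HC", storage_nodes=5, compute_nodes=12, workloads=3200),
]

for t in TESTS:
    print(f"{t.name}: {t.compute_per_storage():.1f} compute nodes per storage node, "
          f"about {t.workloads_per_compute_node():.0f} workloads per compute node")
```

Under that assumption, the same five storage nodes back 3.6 compute nodes each in one test and 2.4 in the other, which is the kind of ratio you cannot choose freely when storage lives inside every compute node.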

Disclaimer

This article is for information purposes only and may contain errors and personal opinions. This text is neither authorized nor sponsored by NetApp. If you have spotted an error, please let me know.

Why use NetApp snapshots even when you do not have the Premium bundle software?

If you are extremely lazy and do not want to read any further, the answer is: “use snapshots to improve RPO, use ndmpcopy to restore files and LUNs, and use SnapCreator for app-consistent snapshots.”

The Premium bundle includes a good deal of software on top of the Base software in each ONTAP system, such as:

  • SnapCenter
  • SnapRestore
  • FlexClone
  • And others.

So, without the Premium bundle, with only Basic software, we have two issues:

  • You can create snapshots, but without SnapRestore or FlexClone you cannot restore them quickly
  • And without SnapCenter you cannot make application-consistent snapshots.

And some people ask, “Do I need to use NetApp snapshots in such circumstances?”

And my answer is: Yes, you can, and you should use ONTAP snapshots.

Here is the explanation of why and how:

Snapshots without SnapRestore

Why use NetApp storage hardware snapshots? Because they carry no performance penalty, and there is no such thing as snapshot consolidation, which would cause a performance impact. NetApp snapshots work pretty well, and they have other advantages too. Even though restoring data captured in snapshots is not as fast as with SnapRestore or FlexClone, you can create snapshots very quickly. Most of the time you need to restore something only rarely, so fast snapshot creation with slow restoration still gives you a better RPO compared to a full backup. Of course, I have to admit that RPO improves only for cases where your data is logically corrupted and no physical damage was done to the storage, because if your storage is physically damaged, snapshots will not help. With ONTAP you can have up to 1023 snapshots per volume, and you can create them as often as you need with no performance degradation whatsoever, which is pretty awesome.
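
The relation between snapshot frequency, RPO and the 1023-snapshot limit can be sketched in a few lines of Python. This is only back-of-the-envelope arithmetic: the function names and the example intervals are assumptions for illustration, not ONTAP settings.

```python
import math
from datetime import timedelta

MAX_SNAPSHOTS_PER_VOLUME = 1023  # limit quoted in the text above


def worst_case_rpo(interval: timedelta) -> timedelta:
    """With logical corruption, you lose at most the data written since the last snapshot."""
    return interval


def max_retention(interval: timedelta, limit: int = MAX_SNAPSHOTS_PER_VOLUME) -> timedelta:
    """Longest history you can keep at a fixed interval before hitting the limit."""
    return interval * limit


def snapshots_needed(interval: timedelta, retention: timedelta) -> int:
    """Number of snapshots required to cover `retention` at the given interval."""
    return math.ceil(retention / interval)


# Hypothetical policy: a snapshot every 15 minutes, kept for one week
interval = timedelta(minutes=15)
needed = snapshots_needed(interval, timedelta(days=7))
print("worst-case RPO:", worst_case_rpo(interval))
print("snapshots needed for 7 days:", needed)
print("fits into one volume:", needed <= MAX_SNAPSHOTS_PER_VOLUME)
print("max retention with hourly snapshots:", max_retention(timedelta(hours=1)))
```

Compare that worst-case RPO with a nightly full backup, where up to a day of changes can be lost.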

Snapshots with NAS 

If we are speaking about a NAS environment without a SnapRestore license, you can always go to the .snapshot folder and copy any previous version of a file you need to restore. You can also use the ndmpcopy command to restore a file, a folder or even a whole volume inside the storage system without involving a host.
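
The host-side copy from the .snapshot folder can be scripted. Below is a minimal Python sketch, assuming the share is mounted on the host and the .snapshot directory is visible there. The function name, the snapshot name and the ".restored" suffix are made up for illustration, and the demo builds a fake share in a temporary directory instead of touching real storage.

```python
import shutil
import tempfile
from pathlib import Path


def restore_from_snapshot(share_root: Path, snapshot_name: str,
                          relative_path: str, suffix: str = ".restored") -> Path:
    """Copy an older version of a file out of .snapshot, next to the live file.

    The live file is left untouched; the old version gets `suffix` appended.
    """
    source = share_root / ".snapshot" / snapshot_name / relative_path
    if not source.is_file():
        raise FileNotFoundError(f"{relative_path} not found in snapshot {snapshot_name}")
    live = share_root / relative_path
    target = live.with_name(live.name + suffix)
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(source, target)  # copy2 also preserves timestamps
    return target


if __name__ == "__main__":
    # Demo on a simulated share; "hourly.demo" is a hypothetical snapshot name
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        old = root / ".snapshot" / "hourly.demo" / "docs"
        old.mkdir(parents=True)
        (old / "report.txt").write_text("yesterday's version")
        (root / "docs").mkdir()
        (root / "docs" / "report.txt").write_text("corrupted version")
        restored = restore_from_snapshot(root, "hourly.demo", "docs/report.txt")
        print(restored.name, "->", restored.read_text())
```

Writing the old version next to the live file rather than over it is a deliberate choice here: you can compare both before replacing anything.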

Snapshots with SAN 

If we are speaking about a SAN environment without a SnapRestore license, you cannot simply copy a file out of a snapshot of your LUN and restore it. Restoring something on a LUN takes two stages:

  1. You copy the entire LUN from a snapshot
  2. And then you can either:
    • Restore the entire LUN in place of the last active version of your LUN
    • Or copy data from the copied LUN to the active LUN.

To perform the first stage, you can use either the ndmpcopy or the lun copy command. And if you want to restore only some files from an old version of the LUN in a snapshot, you need to map that copy to a host and copy the required data back to the active LUN.
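
The two-stage procedure above can be written down as a tiny decision helper. This Python sketch only produces a human-readable plan; it does not run any storage commands, and the enum and function names are hypothetical.

```python
from enum import Enum
from typing import List


class RestoreGoal(Enum):
    WHOLE_LUN = "whole_lun"    # roll the entire LUN back
    SOME_FILES = "some_files"  # recover only selected files


def lun_restore_plan(goal: RestoreGoal) -> List[str]:
    """Return the manual steps for restoring from a LUN snapshot without SnapRestore."""
    # Stage 1 is the same in both cases
    steps = ["Copy the entire LUN out of the snapshot (ndmpcopy or lun copy)"]
    # Stage 2 depends on what you want back
    if goal is RestoreGoal.WHOLE_LUN:
        steps.append("Put the copied LUN in place of the last active version")
    else:
        steps.append("Map the copied LUN to a host")
        steps.append("Copy the required files from the copied LUN to the active LUN")
    return steps


for goal in RestoreGoal:
    print(goal.name)
    for number, step in enumerate(lun_restore_plan(goal), start=1):
        print(f"  {number}. {step}")
```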

Application consistent storage snapshots 

Why do you need application consistency in the first place? Sometimes, in an environment like a NAS file share with doc files, you do not need it at all. But if you are using applications like Oracle DB, MS SQL or VMware, you’d better have application consistency. Imagine you have a Windows machine and you pull its hard drive out while Windows is running. Let’s forget for a moment that Windows will stop working, which is not the point here, and focus on the data protection side. The same thing happens when you create a storage snapshot: the data captured in that snapshot will be similarly incomplete. Will the pulled-out hard drive be a proper copy of your data? Kind of, right? Some of the data will be lost with the host memory and your FS will probably not be consistent. Even though you’ll be able to recover a journaled file system, your application data will be damaged in a way that is hard to repair, again because of the data lost from host memory. Similarly, a snapshot will probably contain a damaged FS. If you try to restore from such a copy, your Windows might not start, or it might start after an FS check, but your applications, especially databases, definitely will not like such a backup. Why? Because the applications and the OS running on your machine didn’t have a chance to destage data from memory to your hard drive. So, you need someone who will prepare your OS & applications before the backup is created. As you may know, application-consistent storage hardware snapshots can be created by backup software like Veeam, Commvault and many others, or you can even trigger storage snapshot creation yourself with a relatively simple Ansible or PowerShell script.

You can also take application-consistent snapshots with the free NetApp SnapCreator framework. Unlike SnapCenter, it does not have simple, straightforward GUI wizards that walk you through the process of integrating with your app. Most of the time you have to write a simple script for your application to benefit from online, application-consistent snapshots. Another downside is that SnapCreator is not officially supported software. But at the end of the day, it is relatively easy to set up, and it will definitely pay off once you finish.
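
Whatever tool you choose, the script you end up writing follows the same pattern: prepare the application, take the snapshot, release the application. Here is a minimal Python sketch of that ordering. The three callables are placeholders you would replace with your own application and storage calls; nothing here is SnapCreator's or ONTAP's actual API.

```python
from contextlib import contextmanager
from typing import Callable, Iterator, List


@contextmanager
def quiesced(quiesce: Callable[[], None], resume: Callable[[], None]) -> Iterator[None]:
    """Hold the application in a consistent state for the duration of the block."""
    quiesce()  # e.g. flush buffers, put the database into backup mode
    try:
        yield
    finally:
        resume()  # always release the application, even if the snapshot fails


def app_consistent_snapshot(quiesce: Callable[[], None],
                            take_snapshot: Callable[[], str],
                            resume: Callable[[], None]) -> str:
    """Take a snapshot while the application is quiesced and return its name."""
    with quiesced(quiesce, resume):
        return take_snapshot()


# Demo with stand-in functions that only record the order of events
events: List[str] = []


def fake_snapshot() -> str:
    events.append("snapshot")
    return "demo_snapshot"  # hypothetical snapshot name


name = app_consistent_snapshot(
    quiesce=lambda: events.append("quiesce"),
    take_snapshot=fake_snapshot,
    resume=lambda: events.append("resume"),
)
print(name, events)
```

Because a storage snapshot is created almost instantly, the application stays quiesced only for a moment, and the `finally` clause makes sure it is resumed even when the snapshot call raises an error.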

List of other software features available in Basic software

This Basic ONTAP functionality also might be useful: 

  • Horizontal scaling, non-disruptive operations such as online volume & LUN migration, non-disruptive upgrades by adding new nodes to the cluster
  • API automation
  • FPolicy file screening
  • Create snapshots to improve RPO
  • Storage efficiencies: Deduplication, Compression, Compaction
  • By default, ONTAP deduplicates data across the active file system and all the snapshots on the volume. Savings from snapshot data sharing grow with the number of snapshots: the more snapshots you have, the more savings you’ll get
  • Storage Multi-Tenancy
  • QoS Maximum
  • External key manager for Encryption
  • Host-based MAX Data software which works with ONTAP & SAN protocols
  • You can buy a FlexArray license to virtualize 3rd party storage systems
  • If you have an All Flash system, you can purchase an additional FabricPool license, which is especially useful with snapshots because it destages cold data to cheap storage like AWS S3, Google Cloud, Azure Blob, IBM Cloud, Alibaba Cloud or an on-premises StorageGRID system, etc.

Summary

Even with Basic software, your ONTAP system has rich functionality. You definitely should use NetApp snapshots and set up application integration to make your snapshots application-consistent. With NetApp hardware storage snapshots, you can have 1023 snapshots per volume and create them as often as you need without sacrificing storage performance, so snapshots will improve your RPO. Application consistency with SnapCreator or any 3rd party backup software will build confidence that all the snapshots can be restored when needed.