NetApp & Rubrik announced a collaboration. First, StorageGRID can be a target for Rubrik archives. Second, Rubrik now supports the NetApp SnapDiff API. SnapDiff is a technology in ONTAP which compares two snapshots and returns a list of the files that changed between them, so Rubrik can copy only the changed files. Rubrik is not the first to work with the NetApp SnapDiff API: others like Catalogic, Commvault, IBM (TSM) and Veritas (NetBackup) can work with it as well. However, Rubrik is the first one to use it for backing up data to a public cloud. Support will be available in Rubrik Cloud Data Management (CDM) v5.2 in 2020.
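To illustrate the idea of "compare two snapshots, copy only what changed", here is a minimal Python sketch. It is not the SnapDiff API and does not talk to ONTAP: the `FileMeta` type, the `diff_snapshots` function, the paths and the metadata values are all made up for the example, and I simply assume each snapshot can be represented as a mapping from file path to size and modification time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileMeta:
    """Minimal per-file metadata in a snapshot listing (hypothetical type)."""
    size: int
    mtime: float


def diff_snapshots(old: dict[str, FileMeta], new: dict[str, FileMeta]) -> dict[str, list[str]]:
    """Return the paths added, modified, or deleted between two listings."""
    added = sorted(p for p in new if p not in old)
    deleted = sorted(p for p in old if p not in new)
    modified = sorted(p for p in new if p in old and new[p] != old[p])
    return {"added": added, "modified": modified, "deleted": deleted}


# Two illustrative snapshot listings of the same volume (invented data).
snap_monday = {
    "/vol/docs/a.txt": FileMeta(100, 1.0),
    "/vol/docs/b.txt": FileMeta(200, 1.0),
    "/vol/docs/c.txt": FileMeta(300, 1.0),
}
snap_tuesday = {
    "/vol/docs/a.txt": FileMeta(100, 1.0),  # unchanged
    "/vol/docs/b.txt": FileMeta(250, 2.0),  # modified
    "/vol/docs/d.txt": FileMeta(50, 2.0),   # added; c.txt no longer exists
}

changes = diff_snapshots(snap_monday, snap_tuesday)

# An incremental backup only has to transfer new and modified files.
to_copy = changes["added"] + changes["modified"]
print(to_copy)
```

The point of the sketch is that the backup software never has to walk the whole file system: it receives the list of differences and transfers only those files, here two out of four.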
An automated installation and deployment of Grafana, NetApp E-Series Web Services, and supporting software for performance monitoring of NetApp E-Series storage systems. NetApp intends this project to let you quickly and simply deploy an instance of its performance analyzer for monitoring your E-Series storage systems. It incorporates various open source components and tools in order to do so. While NetApp primarily intends it to serve as a reference implementation for using Grafana to visualize the performance of your E-Series systems, it can also be customized and extended based on your individual needs.
Kubernetes was originally designed by Google, Google is one of the main contributors to Docker, and its Kubernetes offering is obviously the most advanced, mature & stable on the market. If you have tried GKE in GCP & other competing solutions, you know what I’m talking about.
Containers on-premises are difficult when you want to build an enterprise solution for new containerized applications, for a number of reasons: installation, configuration, management and updates of your core infrastructure components; the need for persistent storage with predictable performance; and the fact that DevOps teams do not want to deal with infrastructure, they just want to consume it. These are the key problems to solve, and NetApp aims to solve them.
Why Google Anthos on NetApp HCI is an important announcement:
Hybrid cloud. In line with its Data Fabric vision, NetApp continues to bring the hybrid cloud experience to its users in the flesh. Now with Anthos on HCI, your on-prem data center becomes just another cloud zone. Software updates for GKE & Anthos are on Google’s shoulders, you just consume them. Not only NetApp HCI maintenance like software & firmware updates can be bought as a service, but capacity as well. You can pay as you go & consume infrastructure as a service: OPEX instead of CAPEX, on request, with NetApp Keystone
NetApp Kubernetes Service (NKS). In addition to NKS, which allows deployment & management of Kubernetes clusters on-premises & in the cloud, Anthos provides the ability to deploy clusters on-prem that are fully integrated with Google Cloud, including the ability to manage them from the GKE console. NKS comes bundled with Istio, Helm & many other components for your microservices, which takes DevOps to the next level. Cloud infrastructure has reached your on-premises data center
Storage automation. NetApp Trident is the most advanced storage driver for containers on the market so far, bringing automation, an API and persistent storage to the containerization world. NetApp Trident with NKS & Anthos makes total sense. Speaking of automation, NetApp Ansible playbooks are also the most advanced on the market at the moment, with 106 published & supported modules, and SolidFire itself is known as fully API-driven storage, so you can work with it solely through its RESTful API
Simple, predictable and performant enterprise storage with QoS, whether on-prem or in the cloud: use Trident and Ansible with NetApp HCI on-prem or with CVO or CVS in AWS, Azure or GCP; moreover, replicate your data to the cloud for DR or Test/Dev
NetApp HCI vs other HCI solutions. One of the most notable HCI competitors is Nutanix, so I want to use it as an example. Nutanix’s storage architecture with local disk drives is certainly interesting but not unique, and it has some architectural disadvantages, scalability being one of them. Local disk drives are a blessing for tiny solutions and not such a good idea when you need to scale up: the low cost of a small solution with commodity HW & local drives might turn into a curse at scale. That’s why Nutanix eventually developed dedicated storage nodes connected over the network to overcome the issue, stepping into the very competitive land of network storage systems. Dedicated storage nodes connected over the network are nothing new or unique to Nutanix, though: there are plenty of capable & scalable network storage systems out there. Therefore, the most exciting part of Nutanix is its ecosystem & simplicity, not its storage architecture. Now, thanks to Anthos, NetApp HCI gets into a unique position with scalability, ecosystem, simplicity, hybrid cloud & functionality for microservices that other great competitors like Nutanix have not reached yet, and that gives NetApp momentum in the HCI market
Performance. Don’t forget about NetApp’s MAX Data software, which already works with VMware & SolidFire; it will take NetApp only one last step to bring DCPMM such as Intel Optane to NetApp HCI. Note that NetApp just announced at Insight 2019 a compute node with Intel Cascade Lake CPUs, which are required for Optane. MAX Data is not available on NetApp HCI yet, but we can clearly see that NetApp is putting everything together to make it happen. Persistent memory in the form of a file system for a Linux host server, with tiering of cold blocks to “slow” SSD storage, can put NetApp ahead of all the competitors in terms of performance
Speaking of which, take a look at these two performance tests:
Notice how asymmetrical the number of storage nodes is compared to the number of compute nodes. In “real” HCI architectures with local drives you have to have more equipment, while with NetApp HCI you can choose how much storage and how much compute you need and scale them separately. Dedup & compression were enabled in the tests.
This article is for information purposes only and may contain errors and personal opinions. This text is neither authorized nor sponsored by NetApp. If you have spotted an error, please let me know.
When MAX Data is used without the MAX Recovery functionality, NetApp recommends placing DB data files on MAX FS with snapshots (MAX Snap) enabled there, and placing DB logs on a separate LUN B on an ONTAP system. In this case, if the persistent memory or the server is damaged, it is possible to fully restore the data by recovering from a MAX Data snapshot on LUN A and then rolling forward the latest transactions from the logs on LUN B to the DB.
Pros & Cons: In this case, transactions are executed quickly but are confirmed to clients only at the speed of the logs stored on LUN B. The restoration process might also take some time, since a storage LUN is usually much slower than persistent memory. On the other hand, this configuration is cheaper, since only one server with persistent memory is required.
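The recovery path of this first configuration (restore the last snapshot, then roll the logged transactions forward) can be sketched with a toy key-value store. This is only a conceptual model under my own assumptions: `TinyDB` and its methods are invented for the example and have nothing to do with the actual MAX Data or database interfaces, and clearing the log at snapshot time is a simplification.

```python
import copy


class TinyDB:
    """Toy key-value store with a point-in-time snapshot and a redo log."""

    def __init__(self):
        self.data = {}      # stands in for DB data files on the fast memory tier
        self.snapshot = {}  # stands in for the last snapshot of the data files
        self.redo_log = []  # stands in for DB logs kept on a separate LUN

    def commit(self, key, value):
        # Log first, then apply: the log is what survives losing `data`.
        self.redo_log.append((key, value))
        self.data[key] = value

    def take_snapshot(self):
        self.snapshot = copy.deepcopy(self.data)
        # Simplification: entries older than the snapshot are dropped.
        self.redo_log.clear()

    def crash(self):
        # Simulates losing the contents of the memory tier.
        self.data = {}

    def recover(self):
        # Step 1: restore the snapshot.
        self.data = copy.deepcopy(self.snapshot)
        # Step 2: roll forward the transactions logged after the snapshot.
        for key, value in self.redo_log:
            self.data[key] = value


db = TinyDB()
db.commit("a", 1)
db.take_snapshot()
db.commit("b", 2)  # committed after the snapshot, exists only in the log
db.commit("a", 3)
db.crash()
db.recover()
print(db.data)
```

After recovery the store holds the same state as just before the crash, which is why keeping the logs on a separate, durable LUN is enough to protect against losing the server or its persistent memory.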
When logs need to be placed on the fast MAX Data FS together with the DB data files to improve the overall performance (decrease the latency) of a transaction (execution time + confirmation time to clients), NetApp recommends using the MAX Recovery functionality, which synchronously copies data from the primary server’s persistent memory to a recovery server’s persistent memory.
Pros & Cons: In this case, if the primary server loses its data due to a malfunction, the data can be quickly recovered to the primary server over the RDMA connection from the recovery server’s persistent memory tier, restoring normal operation of the primary server in less time than in the first configuration. If the data has to be restored completely from storage, it might take a few hours for 1 TB of data versus 5-10 minutes with MAX Recovery. Transaction execution latency is a few microseconds worse in this configuration because of the network latency added by synchronous replication, but overall transaction latency (execution + client confirmation) is much better than in the first configuration, because the entire DB, including data files and logs, is stored on the fast persistent memory tier. Those few additional microseconds are a relatively small price to pay. MAX Recovery requires a second server with the same or a greater amount of persistent memory and an RDMA connection, which adds cost to the solution, but it provides better protection and faster restoration if the primary server malfunctions.
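To make the trade-off between the two configurations more tangible, here is a small back-of-the-envelope calculation in Python. The latency figures are hypothetical placeholders that I picked only to show the shape of the argument, not measurements, and I assume that in the second configuration every write (data and log) pays the replication cost. The only numbers taken from the text are 1 TB and 5-10 minutes.

```python
def overall_latency_us(execution_us: float, log_confirm_us: float) -> float:
    """Overall transaction latency = execution time + confirmation (log write) time."""
    return execution_us + log_confirm_us


# Hypothetical placeholder figures, in microseconds (not measured values).
PMEM_WRITE_US = 5.0       # assumed persistent memory write latency
LUN_LOG_WRITE_US = 500.0  # assumed log write latency on a storage LUN
REPLICATION_US = 3.0      # assumed extra cost of synchronous RDMA replication

# Configuration 1: data files on persistent memory, logs on a storage LUN.
config1 = overall_latency_us(PMEM_WRITE_US, LUN_LOG_WRITE_US)

# Configuration 2: data files and logs on persistent memory,
# each write mirrored synchronously to the recovery server.
config2 = overall_latency_us(PMEM_WRITE_US + REPLICATION_US,
                             PMEM_WRITE_US + REPLICATION_US)

print(f"config 1: {config1:.0f} us, config 2: {config2:.0f} us")


def required_throughput_gb_s(size_gb: float, minutes: float) -> float:
    """Sustained throughput needed to move size_gb within the given time."""
    return size_gb / (minutes * 60)


# Restoring 1 TB (taken as 1000 GB) in 10 or 5 minutes.
slow = required_throughput_gb_s(1000, 10)
fast = required_throughput_gb_s(1000, 5)
print(f"{slow:.2f} to {fast:.2f} GB/s")
```

With these placeholder numbers, execution gets slightly slower in the second configuration (8 instead of 5 microseconds), yet the overall latency drops because the log write no longer waits for a storage LUN. The second calculation shows that a 5-10 minute restore of 1 TB corresponds to a sustained transfer rate of roughly 1.7 to 3.3 GB/s.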
Some thoughts about RAM
Consider the MAX Data configuration with MAX Snap enabled, where you put your DB logs on a dedicated LUN (the first configuration). It got me thinking: what if we use this configuration with ordinary memory instead of Optane?
Of course, there will be disadvantages, the same as in the first configuration, but there will be some pros as well:
1) In case of a disaster, all data in RAM will be lost, so we will need to restore from MAX Snap and then roll forward the DB logs from the LUN, which will take some time
2) Transaction confirmation speed will be equal to the speed of the LUN with the logs. However, transaction execution will run at the speed of RAM
3) RAM is more expensive. On the other hand, you do not need new “special” servers with special CPUs