Enterprise-grade High Availability with Low Complexity and Low Cost
SoftNAS™ SNAP HA™ High Availability delivers a low-cost, low-complexity solution for high-availability clustering that is easy to deploy and manage. A robust set of HA capabilities protects against data center, availability zone, server, network and storage subsystem failures to keep your business running without downtime. SNAP HA for Amazon Web Services includes patent-pending Elastic HA™ technology, which gives NAS clients in any availability zone uninterrupted HA access to the storage cluster across availability zones.
SNAP HA monitors all critical storage components to ensure they remain operational. When a system component suffers an unrecoverable failure, another storage controller detects the problem and automatically takes over, so that no downtime or business impact occurs.
SNAP HA works hand in hand with all of the other SoftNAS data protection features, including scheduled snapshots, RAID, and automatic error detection and recovery, reducing your operational costs and boosting storage efficiency.
High availability protects companies from the revenue they would lose if access to their data resources and critical business applications were disrupted. It does so with features that:
1) Protect against unplanned storage outages 24 x 7 x 365.
2) Provide disaster recovery capabilities to quickly resume mission-critical operations in the event of a disaster (e.g., across availability zones or data centers)
3) Ensure failed components are quickly and automatically identified and isolated, so they do not cause data loss, application errors or downtime
4) Replicate data so there is an up-to-the-minute copy of any changes that have taken place
5) Prevent outdated or incorrect data from being made available due to multiple failures across the storage environment
6) Assure business owners that applications and IT infrastructure continue operating uninterrupted by unexpected failures in the storage environment
7) Enable IT administrators to take storage systems offline for maintenance and repair, without disrupting production IT systems or applications.
Minimize Downtime from Host and Storage Failures
SoftNAS™ SNAP HA™ High Availability delivers the availability required by mission-critical applications running in virtual machines and cloud computing environments, independent of the operating system and application running on them. HA provides uniform, cost-effective failover protection against hardware and operating system outages within your virtualized IT and cloud computing environment.
Extend and Enhance Data Protection Across Your Enterprise Infrastructure
Most availability solutions are tied to specialized hardware or require complex setup and configuration. In contrast, an IT administrator configures SoftNAS HA with a few clicks from within the SoftNAS StorageCenter™ client interface. With simple configuration and minimal resource requirements, SNAP HA lets you:
- Provide uniform, automated data protection and availability for all applications without modifications to the application or guest operating system
- Establish a consistent first line of data protection defense for your entire IT infrastructure
- Protect data and applications that have no other failover options, which might otherwise be left unprotected and subject to extended outages and downtime
- In the Amazon Web Services cloud environment, provide storage HA across AWS availability zones
- Work with SoftNAS advanced NFS file servers, Windows CIFS file servers and iSCSI SAN servers.
Highly-available NAS Services
SoftNAS SNAP HA provides NFS, CIFS and iSCSI services via redundant storage controllers. One controller is active, while another is a standby controller. Block replication transmits only the changed data blocks from the source (primary) controller node to the target (secondary) controller. Data is maintained in a consistent state on both controllers using the ZFS copy-on-write filesystem, which ensures data integrity is maintained. In effect, this provides a near real-time backup of all production data (kept current within 1 to 2 minutes).
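The idea of transmitting only changed blocks can be illustrated with a small sketch. It compares two snapshots of a volume block by block and applies only the differences to a replica. The tiny block size, the function names and the in-memory byte strings are assumptions made for illustration; this is not how SNAP HA or ZFS implements replication, and it assumes the volume stays the same size or grows.

```python
from typing import Dict, List

BLOCK_SIZE = 4  # hypothetical, deliberately tiny so the example is easy to read


def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> List[bytes]:
    """Split a volume image into fixed-size blocks."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def changed_blocks(old: bytes, new: bytes) -> Dict[int, bytes]:
    """Return {block_index: new_block} for every block that differs."""
    old_blocks = split_blocks(old)
    delta = {}
    for index, block in enumerate(split_blocks(new)):
        # A block is "changed" if it is new or its contents differ.
        if index >= len(old_blocks) or old_blocks[index] != block:
            delta[index] = block
    return delta


def apply_delta(replica: bytes, delta: Dict[int, bytes]) -> bytes:
    """Apply changed blocks to the replica held by the target controller."""
    blocks = split_blocks(replica)
    for index, block in sorted(delta.items()):
        if index < len(blocks):
            blocks[index] = block
        else:
            blocks.append(block)
    return b"".join(blocks)


primary_before = b"AAAABBBBCCCC"
primary_after = b"AAAAXXXXCCCCDDDD"

delta = changed_blocks(primary_before, primary_after)
replica = apply_delta(primary_before, delta)

print(sorted(delta))             # [1, 3]: only two of the four blocks are sent
print(replica == primary_after)  # True
```

Only blocks 1 and 3 cross the wire in this example; the unchanged blocks are never re-sent.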
Storage Monitoring
A key component of SNAP HA is the HA Monitor. The HA Monitor runs on both nodes that participate in SNAP HA™. On the secondary node, the HA Monitor checks network connectivity, as well as the primary controller's health and its ability to continue serving storage. Faults in network connectivity or storage services are detected within 10 seconds, and an automatic failover occurs: the secondary controller picks up and continues serving NAS storage requests, preventing downtime.
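The detection logic described above can be sketched as a counter of consecutive failed health probes. The 2-second probe interval, the class name and the method names are assumptions chosen so that five missed probes fill the 10-second detection window mentioned in the text; the real HA Monitor's internals are not documented here.

```python
class HAMonitor:
    """Hypothetical monitor running on the secondary node."""

    def __init__(self, check_interval_s: float = 2.0, detection_window_s: float = 10.0):
        self.check_interval_s = check_interval_s
        # Number of consecutive failed probes that fills the detection window.
        self.max_failures = int(detection_window_s / check_interval_s)
        self.consecutive_failures = 0
        self.failed_over = False

    def record_probe(self, healthy: bool) -> bool:
        """Record one probe result; return True when failover should start."""
        if self.failed_over:
            return False  # failover already triggered, nothing more to do
        if healthy:
            self.consecutive_failures = 0  # a good probe resets the count
            return False
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.max_failures:
            self.failed_over = True
            return True
        return False


monitor = HAMonitor()
# One transient blip, then the primary stops responding for good.
probe_results = [True, False, True, False, False, False, False, False]
triggers = [monitor.record_probe(result) for result in probe_results]

print(triggers.index(True))  # 7: failover starts on the fifth consecutive miss
```

Resetting the counter on every healthy probe is a design choice that keeps a single dropped probe from causing an unnecessary failover.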
Storage Failover
Once the failover process is triggered, either by the HA Monitor (automatic failover) or by a manual takeover initiated by the admin user, NAS client requests for NFS, CIFS and iSCSI storage are quickly re-routed over the network to the secondary controller, which takes over as the new primary storage controller. Takeover on VMware typically occurs within 20 seconds. On AWS, it can take up to 30 seconds because of the time required for network routing configuration changes to take effect.
Operation in AWS Virtual Private Cloud
In AWS, SNAP HA is applied to SoftNAS storage controllers running in a Virtual Private Cloud (VPC). It is recommended to place each controller into a separate AWS Availability Zone (AZ), which provides the highest degree of underlying hardware infrastructure redundancy and availability.
Each AZ operates on a separate subnet, e.g., 10.0.1.0/24 and 10.1.0.0/24 (you choose how to organize your subnet addresses in the VPC). An elastic IP provides NAS clients across all AZs access to HA storage. Elastic IPs are the only IPs available today capable of re-routing network traffic across AZs. SoftNAS SNAP HA enhances the standard elastic IP provided by AWS, creating a patent-pending "Elastic HA™" (EIP). Elastic HA IPs are managed by the HA controller, ensuring NAS client traffic is properly routed to the active primary storage controller at all times.
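A subnet plan like the one above can be sanity-checked with Python's standard `ipaddress` module. The sketch below uses the two example ranges from the text; the AZ labels and the client addresses are hypothetical.

```python
import ipaddress

# Example ranges from the text; the AZ labels are hypothetical.
az_subnets = {
    "az-a": ipaddress.ip_network("10.0.1.0/24"),
    "az-b": ipaddress.ip_network("10.1.0.0/24"),
}


def subnets_overlap(subnets) -> bool:
    """Return True if any two subnets share addresses."""
    nets = list(subnets.values())
    return any(a.overlaps(b) for i, a in enumerate(nets) for b in nets[i + 1:])


def az_for_client(address: str):
    """Return the AZ label whose subnet contains the address, or None."""
    ip = ipaddress.ip_address(address)
    for az, net in az_subnets.items():
        if ip in net:
            return az
    return None


print(subnets_overlap(az_subnets))   # False
print(az_for_client("10.0.1.25"))    # az-a
print(az_for_client("10.1.0.7"))     # az-b
```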
HA storage traffic uses a dedicated network interface (interface 1), which further isolates storage traffic.
There's a common misconception that elastic IPs are only useful for Internet-based access to EC2 instances. While that is the most common use case by far, Elastic HA IP addresses are typically configured with a Security Group that restricts access to the AZ private network only. This prevents any possible Internet-based access to Elastic HA IPs.
VPCs can also be configured for use with VPNs, which enables secure access from an administrator's office location to the private network (no other inbound Internet access is typically available). It is possible to attach optional elastic IP addresses to interface 0 on each SoftNAS controller instance for remote administration (restricted IP range access recommended).
Operation in VMware and Hyper-V Private Clouds
On VMware and Hyper-V, it is common to dedicate a non-routable VLAN to storage traffic. The storage VLAN segregates primary storage traffic (e.g., VMDKs attached to VMs over NFS or iSCSI) from other traffic. Data replication traffic can also be placed on its own separate non-routed VLAN. SoftNAS StorageCenter is typically placed on a routable VLAN (the default network), where admins can readily reach it from a web browser anywhere within the organization (or via a VPN).
A Virtual IP (VIP) address is employed to route NAS client traffic to the primary storage controller. In the event of a failover or takeover, the VIP is reassigned to the other controller, which immediately re-routes NAS client traffic to the proper controller.
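The effect of reassigning the VIP can be modeled in a few lines. This is a toy in-memory model, not the actual mechanism: the class names, node names and the address 192.0.2.10 (a documentation-only range) are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Controller:
    name: str
    role: str  # "primary" or "secondary"


class HAPair:
    """Hypothetical two-node cluster fronted by a single virtual IP."""

    def __init__(self, vip: str, primary: Controller, secondary: Controller):
        self.vip = vip
        self.primary = primary
        self.secondary = secondary
        self.vip_owner = primary  # clients reach whichever node holds the VIP

    def takeover(self) -> None:
        """Swap roles and move the VIP to the newly promoted controller."""
        self.primary, self.secondary = self.secondary, self.primary
        self.primary.role = "primary"
        self.secondary.role = "secondary"
        self.vip_owner = self.primary

    def serve(self, request: str) -> str:
        """Clients only ever address the VIP, never a specific node."""
        return f"{self.vip_owner.name} handled {request}"


pair = HAPair(
    "192.0.2.10",
    Controller("node-a", "primary"),
    Controller("node-b", "secondary"),
)
before = pair.serve("read /export/data")
pair.takeover()
after = pair.serve("read /export/data")

print(before)  # node-a handled read /export/data
print(after)   # node-b handled read /export/data
```

Because clients address only the VIP, nothing on the client side changes when the roles swap.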
High-integrity Data Protection
A number of measures are taken to ensure the highest possible data integrity of the HA storage system. An independent "witness" HA controller function ensures there is never a condition that can result in what is known as "split-brain", where a controller with outdated data is accidentally brought online. SNAP HA prevents split-brain using a number of industry-standard best practices, including a third-party witness HA control function that tracks which node contains the latest data. On AWS, shared data stored in highly redundant S3 storage serves this purpose. On VMware and Hyper-V, a separate HA Controller VM is used.
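One way to picture the witness function is as an independent record of the newest data generation and the node that holds it. The sketch below is an assumption-laden illustration: the generation counter, the class and the method names are hypothetical and do not describe the format SNAP HA actually stores in S3 or on the HA Controller VM.

```python
class Witness:
    """Hypothetical third-party record of which node holds the latest data."""

    def __init__(self):
        self.latest_node = None
        self.latest_generation = 0

    def record_write(self, node: str, generation: int) -> None:
        """Remember the newest data generation and the node that holds it."""
        if generation > self.latest_generation:
            self.latest_node = node
            self.latest_generation = generation

    def may_promote(self, node: str, node_generation: int) -> bool:
        """Allow a node to come online as primary only if its data is current."""
        return node == self.latest_node or node_generation >= self.latest_generation


witness = Witness()
witness.record_write("node-a", 42)  # primary has committed generation 42

# node-b has only replicated through generation 41, so promoting it now
# would expose outdated data (the split-brain hazard described above).
stale_allowed = witness.may_promote("node-b", 41)
current_allowed = witness.may_promote("node-b", 42)

print(stale_allowed)    # False
print(current_allowed)  # True
```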
Another HA feature is "fencing". In the event of a node failure or takeover, the downed controller is shut down and fenced off, preventing it from participating in the cluster until any potential issues can be analyzed and corrected, at which point the controller can be admitted back into the cluster.
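Fencing can be thought of as a membership state that blocks a node from serving until an administrator clears it. The following sketch uses hypothetical state names and methods to show that rule; it does not model how the downed controller is actually shut down.

```python
class ClusterMembership:
    """Hypothetical membership table with a 'fenced' state."""

    def __init__(self, nodes):
        self.state = {node: "member" for node in nodes}

    def fence(self, node: str) -> None:
        """Mark a downed controller as fenced so it cannot rejoin on its own."""
        self.state[node] = "fenced"

    def can_serve(self, node: str) -> bool:
        return self.state[node] == "member"

    def readmit(self, node: str, issues_resolved: bool) -> bool:
        """Readmit a fenced node only after its issues are analyzed and corrected."""
        if self.state[node] == "fenced" and not issues_resolved:
            return False
        self.state[node] = "member"
        return True


cluster = ClusterMembership(["node-a", "node-b"])
cluster.fence("node-a")  # node-a failed; node-b has taken over

serving_while_fenced = cluster.can_serve("node-a")
premature_rejoin = cluster.readmit("node-a", issues_resolved=False)
rejoin_after_fix = cluster.readmit("node-a", issues_resolved=True)

print(serving_while_fenced, premature_rejoin, rejoin_after_fix)  # False False True
```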
Finally, data synchronization integrity checks prevent accidental failover to, or manual takeover by, a controller whose data is out of date.
The combination of high-integrity features built into SNAP HA ensures data is always protected and safe, even in the face of unexpected types of failures or user error.
Scales to Hundreds of Millions of Files
SNAP HA has been validated in real-world enterprise customer environments and is proven to handle hundreds of millions of files efficiently and effectively. Its use of block replication instead of file replication is what allows it to support hundreds of millions of files and directories.