In this post, my intention is simply to give a few quick points on QRadar High Availability (HA).
1. HA Overview
- Uses Primary and Secondary HA hosts
- Uses Virtual IPs
- Network connectivity is tested via heartbeats (pings) to all managed hosts
- HA can be configured for either the console or a managed host
- Both devices must run the same software version
- Both devices must support the same DSM, scanner, and protocol RPMs
- Uses data synchronization or shared external storage
- Consistency is maintained locally by using Distributed Replicated Block Device (DRBD)
- If external storage is used, data consistency is maintained through iSCSI or Fibre Channel
- Data is synchronized in real time
- Note: the Asset Profiler can impact DRBD speed
- The "/store" partition on the primary is automatically replicated to the secondary host
- Ensure a minimum of 1 Gbps between the primary and secondary HA hosts
- Initial synchronization can take more than 24 hours
This may be an understatement. I've seen initial synchronization take upwards of 72 hours.
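Some back-of-the-envelope arithmetic shows why multi-day initial syncs are plausible. This is a minimal sketch: the function name `estimate_sync_hours`, the 10 TB `/store` size, and the 50% effective-throughput factor are illustrative assumptions, not QRadar figures.

```python
def estimate_sync_hours(store_tb: float, link_gbps: float = 1.0,
                        efficiency: float = 0.5) -> float:
    """Estimate the hours needed to copy store_tb terabytes over a link.

    efficiency is an ASSUMED fraction of line rate actually achieved
    (protocol overhead, disk I/O, competing traffic); it is not a QRadar value.
    """
    if store_tb <= 0 or link_gbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("store size and link speed must be positive; "
                         "efficiency must be in (0, 1]")
    bits_to_copy = store_tb * 1e12 * 8            # decimal TB -> bits
    bits_per_second = link_gbps * 1e9 * efficiency
    return bits_to_copy / bits_per_second / 3600  # seconds -> hours


if __name__ == "__main__":
    # Hypothetical 10 TB /store over the 1 Gbps minimum link.
    print(f"line rate: {estimate_sync_hours(10, efficiency=1.0):.1f} h")  # 22.2 h
    print(f"50% efficiency: {estimate_sync_hours(10):.1f} h")            # 44.4 h
```

Even at full line rate, that hypothetical volume needs about 22 hours, so any real-world overhead pushes the initial sync past a day.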
- The secondary host goes into "standby" after synchronization
- The primary HA host's status becomes "offline" when it is restored after a failover
- The primary needs to be placed "online" before it becomes active
- Disk replication is enabled while the primary is "offline"
- Disk synchronization after a failover is faster
- It basically uses deltas
- When the primary host is restored, only the data collected by the secondary during the period the primary was unavailable is synchronized
- Replacing or reformatting the disk on the primary can result in a longer synchronization time in the event of a failback
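The toy model below illustrates the delta idea: while the peer is unreachable, the active node only records which blocks changed, and on reconnect it copies just those blocks. It is a conceptual sketch under simplifying assumptions (in-memory blocks, a single writer); the class and method names are hypothetical, and it is not DRBD's or QRadar's actual code. It also shows why replacing or reformatting the primary's disk forces a full copy instead of a delta.

```python
class Node:
    """A host's disk, modelled as a list of fixed-size blocks."""

    def __init__(self, num_blocks: int) -> None:
        self.blocks = [b""] * num_blocks


class ReplicatedVolume:
    """Toy block replication with out-of-sync tracking (illustrative only)."""

    def __init__(self, active: Node, standby: Node) -> None:
        self.active = active
        self.standby = standby
        self.peer_up = True
        self.out_of_sync: set[int] = set()  # indexes of blocks the peer lacks

    def write(self, index: int, data: bytes) -> None:
        self.active.blocks[index] = data
        if self.peer_up:
            self.standby.blocks[index] = data  # replicate immediately
        else:
            self.out_of_sync.add(index)        # remember for the delta sync

    def peer_down(self) -> None:
        self.peer_up = False

    def peer_restored(self) -> int:
        """Copy only the changed blocks; return how many were copied."""
        for index in self.out_of_sync:
            self.standby.blocks[index] = self.active.blocks[index]
        copied = len(self.out_of_sync)
        self.out_of_sync.clear()
        self.peer_up = True
        return copied

    def full_resync(self) -> int:
        """Needed when the peer's disk was replaced or reformatted."""
        self.standby.blocks = list(self.active.blocks)
        self.out_of_sync.clear()
        self.peer_up = True
        return len(self.standby.blocks)


if __name__ == "__main__":
    primary, secondary = Node(1000), Node(1000)
    # After a failover the secondary is active and the primary is the peer.
    volume = ReplicatedVolume(active=secondary, standby=primary)
    volume.peer_down()
    for i in (5, 6, 900, 5):       # block 5 is written twice
        volume.write(i, b"event data")
    print(volume.peer_restored())  # 3 blocks copied, not 1000
```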
IP Considerations
- Uses Virtual IPs
- Needs 3 IP addresses: VIP, primary, and secondary
- The IP address initially configured on the primary host automatically becomes the cluster VIP
- A new IP address must be assigned to the primary once HA configuration is started
- The primary host can act as a standby for the secondary
- The VIP is used by whichever host has a status of active
- All IPs must be in the same subnet
- Latency must be less than 2 ms for traffic crossing the WAN
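A quick pre-flight check of these addressing rules might look like the sketch below. It assumes IPv4, that you already know the subnet prefix length, and that you have measured the round-trip latency yourself (for example with ping); the function name and the example addresses are hypothetical.

```python
import ipaddress
from typing import Optional


def check_ha_addresses(vip: str, primary: str, secondary: str, prefix_len: int,
                       measured_latency_ms: Optional[float] = None) -> list[str]:
    """Return a list of problems with a proposed HA addressing plan."""
    problems = []
    addresses = [ipaddress.ip_address(a) for a in (vip, primary, secondary)]
    if len(set(addresses)) != 3:
        problems.append("VIP, primary and secondary need three distinct addresses")

    # The subnet is derived from the VIP; both host IPs must fall inside it.
    subnet = ipaddress.ip_interface(f"{vip}/{prefix_len}").network
    for role, addr in (("primary", addresses[1]), ("secondary", addresses[2])):
        if addr not in subnet:
            problems.append(f"{role} {addr} is outside the VIP subnet {subnet}")

    # The 2 ms ceiling comes from the list above.
    if measured_latency_ms is not None and measured_latency_ms >= 2.0:
        problems.append(f"latency {measured_latency_ms} ms is not below 2 ms")
    return problems


if __name__ == "__main__":
    print(check_ha_addresses("10.0.0.10", "10.0.0.11", "10.0.0.12", 24))  # []
    print(check_ha_addresses("10.0.0.10", "10.0.1.11", "10.0.0.12", 24, 3.5))
```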
HA Wizard
- Used to configure Primary, Secondary and cluster VIP
- Verifies the secondary has a valid HA activation key
- Verifies the secondary is not part of an existing HA cluster
- Verifies software version is the same on both devices
- Verifies external storage (if configured) on primary and then secondary
- Verifies both support the same DSM, scanner, and protocol RPMs
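The wizard's checks can be pictured as a list of preconditions evaluated before the cluster is built. The sketch below is a hypothetical model, not the wizard's implementation: `HostInfo`, its fields, and the version and RPM strings are illustrative assumptions about what gets compared.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class HostInfo:
    """Hypothetical summary of what the wizard inspects on each host."""
    name: str
    version: str
    rpms: frozenset[str]                        # DSM, scanner and protocol RPMs
    has_ha_key: bool = True
    in_cluster: bool = False
    external_storage_ok: Optional[bool] = None  # None: no external storage


def wizard_prechecks(primary: HostInfo, secondary: HostInfo) -> list[str]:
    """Return the reasons the pair cannot be clustered (empty list = OK)."""
    errors = []
    if not secondary.has_ha_key:
        errors.append(f"{secondary.name}: no valid HA activation key")
    if secondary.in_cluster:
        errors.append(f"{secondary.name}: already part of an HA cluster")
    if primary.version != secondary.version:
        errors.append(f"software version mismatch: "
                      f"{primary.version} vs {secondary.version}")
    for host in (primary, secondary):  # external storage: primary, then secondary
        if host.external_storage_ok is False:
            errors.append(f"{host.name}: external storage verification failed")
    differing = primary.rpms ^ secondary.rpms  # RPMs present on only one host
    if differing:
        errors.append(f"RPM sets differ: {sorted(differing)}")
    return errors


if __name__ == "__main__":
    p = HostInfo("qradar-p", "7.5.0", frozenset({"dsm-a", "protocol-b"}))
    s = HostInfo("qradar-s", "7.4.3", frozenset({"dsm-a"}), in_cluster=True)
    for problem in wizard_prechecks(p, s):
        print(problem)
```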
Failover scenarios
- Power supply failure
- Network failure (detected by connectivity tests)
- OS malfunction that delays or stops heartbeat tests
- RAID failure
- Manual failover
- Management interface failure on the primary host
- The primary does not automatically take back its role after a failover
- The secondary stays in the active role while the primary acts as standby
- The primary must be switched to "active" to take over its role again
- No failover occurs for software errors or disk capacity issues
- If neither the primary nor the secondary can ping a managed host, no failover occurs
- If the primary cannot ping a managed host but the secondary can, failover occurs
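The heartbeat and connectivity rules in this list reduce to a small decision function. This sketch deliberately ignores manual failover, RAID, and interface-level failures; the function name, the heartbeat flag, and the managed-host names are hypothetical, and QRadar's real logic may weigh other signals.

```python
def should_fail_over(primary_heartbeat_ok: bool,
                     primary_reach: dict[str, bool],
                     secondary_reach: dict[str, bool]) -> bool:
    """Simplified model of the failover rules (illustrative only).

    primary_reach / secondary_reach map a managed host's name to whether
    that HA host could ping it.
    """
    if not primary_heartbeat_ok:
        # e.g. power supply or OS failure: the primary stopped answering
        return True
    for host, primary_ok in primary_reach.items():
        if not primary_ok and secondary_reach.get(host, False):
            # Only the primary lost this host, so the fault is on its side.
            return True
    # Either everything is reachable, or both HA hosts lost the same hosts
    # (a problem outside the cluster), so failing over would not help.
    return False


if __name__ == "__main__":
    print(should_fail_over(True, {"mh1": False}, {"mh1": False}))  # False
    print(should_fail_over(True, {"mh1": False}, {"mh1": True}))   # True
```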
HA Failover event sequence
- File systems are mounted
- A management interface alias is created (for example, eth0 becomes eth0:0)
- VIP is assigned to the alias
- QRadar services are started
- Secondary connects to console and downloads configuration files
Tips for manual synchronization
- Ensure the primary and secondary hosts are synchronized
- The secondary must be in standby
- Secondary to offline and power off the primary
- DO NOT MANUALLY FORCE FAILOVER DURING PATCHES OR SOFTWARE UPGRADES
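If you script maintenance runbooks, the preconditions above can be encoded as a guard that refuses to proceed. This is a hypothetical helper: the status strings mirror the ones used in this post, and how you obtain them (UI, API, or otherwise) is left as an assumption.

```python
def manual_failover_blockers(hosts_in_sync: bool, secondary_status: str,
                             upgrade_in_progress: bool) -> list[str]:
    """Return reasons not to force a failover yet; an empty list means go."""
    blockers = []
    if upgrade_in_progress:
        blockers.append("patch or software upgrade in progress")
    if not hosts_in_sync:
        blockers.append("primary and secondary are not synchronized")
    if secondary_status != "standby":
        blockers.append(f"secondary is '{secondary_status}', expected 'standby'")
    return blockers


if __name__ == "__main__":
    print(manual_failover_blockers(True, "standby", False))  # []
```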
2. HA Planning
- File systems on both devices must match (ext3, etc.)
- The secondary's "/store" partition must be equal to or larger than the primary's
- Both devices should have the same number of interfaces
- Both must use the same management interface
- Only 1 VIP
- Port 7789 is needed for Distributed Replicated Block Device (DRBD)
- DRBD traffic is bidirectional
- Disk replication ensures software updates are applied to the secondary
- Ensure the host has a valid activation key
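The hardware and OS planning items can be checked side by side before running the wizard. The sketch below is hypothetical: `HostSpec` and its fields are assumptions about facts you would gather yourself, and the sizes and interface names are examples only.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class HostSpec:
    """Hypothetical hardware/OS facts gathered for each HA candidate."""
    filesystem: str               # e.g. "ext3"
    store_gb: int                 # size of the /store partition
    interfaces: tuple[str, ...]
    mgmt_interface: str
    has_activation_key: bool


def planning_issues(primary: HostSpec, secondary: HostSpec) -> list[str]:
    """Compare two candidate hosts against the planning rules above."""
    issues = []
    if primary.filesystem != secondary.filesystem:
        issues.append("file systems differ")
    if secondary.store_gb < primary.store_gb:
        issues.append("secondary /store is smaller than the primary's")
    if len(primary.interfaces) != len(secondary.interfaces):
        issues.append("different number of network interfaces")
    if primary.mgmt_interface != secondary.mgmt_interface:
        issues.append("different management interface")
    for role, host in (("primary", primary), ("secondary", secondary)):
        if not host.has_activation_key:
            issues.append(f"{role} has no valid activation key")
    return issues
```

Usage: `planning_issues(HostSpec("ext3", 2000, ("eth0", "eth1"), "eth0", True), HostSpec("ext3", 1000, ("eth0",), "eth0", True))` reports both the undersized `/store` and the interface-count mismatch.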
3. HA Management
- Use the System and License Management window to:
  - Monitor HA
  - Force a failover
  - Disconnect the cluster
  - Modify cluster settings
  - Modify the heartbeat interval
- Place the device in "offline" mode before maintenance