This post gives some quick points on QRadar High Availability (HA).
1. HA Overview
- Uses Primary and Secondary HA hosts
- Uses Virtual IPs
- Network connectivity is tested via heartbeat (pings) to all managed hosts
- HA can be configured for either a console or a managed host
- Both devices must have the same versions of the software
- Both devices must support the same DSM, scanner and protocol RPMs
- Uses data synchronization or shared external storage
- Consistency is maintained locally by using Distributed Replicated Block Device (DRBD)
- If using external storage, data consistency is maintained through iSCSI or Fibre Channel
- Data is synchronized in real time
- Note: Asset profiler can impact DRBD speed
- "/store" partition on the primary is automatically replicated to the secondary host
- Ensure a minimum of 1 Gbps between primary and secondary HA hosts
- Initial synchronization can take more than 24 hours
This may be an understatement. I've seen initial synchronization take upwards of 72 hours.
- Secondary host goes into "standby" after synchronization
- Primary HA host's status becomes "offline" when restored from a failover
- Primary needs to be placed "online" before it becomes active
- Disk replication is enabled while primary is "offline"
- Disk synchronization after a failover is faster
- Basically uses deltas
- When the primary host is restored, only the data collected by the secondary during the period the primary was unavailable is synchronized
- Replacing or reformatting the disk on the primary can result in longer synchronization time in the event of a failback
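To put the synchronization times above in perspective, here is a rough back-of-the-envelope sketch. It only models raw link throughput; the function name, the `efficiency` factor and the 10 TB example size are my own assumptions for illustration, not QRadar values. Real synchronization is also limited by disk speed and system load, so treat the result as a lower bound.

```python
def estimate_sync_hours(store_bytes: float, link_gbps: float = 1.0,
                        efficiency: float = 1.0) -> float:
    """Lower-bound estimate of hours needed to copy store_bytes over the link.

    efficiency is a hypothetical fudge factor (0 < efficiency <= 1) for
    protocol overhead and competing traffic; it is not a QRadar setting.
    """
    if store_bytes < 0 or link_gbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("invalid input")
    # 1 Gbps = 1e9 bits per second = 125e6 bytes per second
    bytes_per_second = link_gbps * 1e9 / 8 * efficiency
    return store_bytes / bytes_per_second / 3600


# Assumed example: a 10 TB /store over a fully used 1 Gbps link
hours_full_link = estimate_sync_hours(10e12)
# Same data if only half of the link is effectively available
hours_half_link = estimate_sync_hours(10e12, efficiency=0.5)
print(f"{hours_full_link:.1f} h at full rate, {hours_half_link:.1f} h at half rate")
```

Even at the theoretical line rate a large "/store" takes most of a day, which is consistent with the 24 to 72 hours mentioned above. The delta synchronization after a failover only moves the data collected during the outage, so the same arithmetic with a much smaller byte count explains why it is faster.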
IP Considerations
- Uses Virtual IPs
- Needs 3 IP addresses - VIP, Primary and Secondary
- The IP address initially configured on the primary host is automatically made the cluster VIP
- A new IP will need to be assigned to the primary once HA configuration is started
- Primary host can act as a standby for secondary
- VIP is used by a host that has a status of active
- All IPs must be in the same subnet
- Latency must be less than 2 ms for traffic crossing the WAN
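The "same subnet" and "3 IP addresses" requirements above are easy to sanity check before starting the HA wizard. This is a minimal sketch using Python's standard `ipaddress` module; the function name and the example addresses are made up for illustration.

```python
import ipaddress
from typing import List


def ha_address_problems(cidr: str, vip: str, primary: str, secondary: str) -> List[str]:
    """Return a list of problems with the planned HA addresses (empty if none)."""
    network = ipaddress.ip_network(cidr, strict=False)
    planned = {"VIP": vip, "primary": primary, "secondary": secondary}
    problems = []
    # Every address must fall inside the same subnet
    for role, address in planned.items():
        if ipaddress.ip_address(address) not in network:
            problems.append(f"{role} {address} is not in {network}")
    # The cluster needs three distinct addresses
    if len(set(planned.values())) != 3:
        problems.append("VIP, primary and secondary must be three different addresses")
    return problems


# Hypothetical plan: the old primary address becomes the VIP
print(ha_address_problems("10.10.1.0/24", "10.10.1.10", "10.10.1.11", "10.10.1.12"))
```

An empty list means the plan passes these two checks. Latency is not covered here and still has to be measured on the real network.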
HA Wizard
- Used to configure Primary, Secondary and cluster VIP
- Verifies the secondary has a valid HA activation key
- Verifies the secondary is not part of an existing HA cluster
- Verifies software version is the same on both devices
- Verifies external storage (if configured) on primary and then secondary
- Verifies both support the same DSM, scanner and protocol RPMs
Failover scenarios
- Power supply failure
- Network failure (detected by connectivity tests)
- OS malfunction that delays or stops heartbeat tests
- RAID failure
- Manual failover
- Management interface failure on the primary host
- Primary does not automatically take back its role as primary after a failover.
- Secondary stays as primary while primary acts as standby
- Primary must be switched to "active" to take over its role
- No failover for software errors or disk capacity issues
- If both primary and secondary are unable to ping a managed host, no failover occurs
- If primary cannot but secondary can ping a managed host, failover occurs
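The last two ping rules above can be written down as a small decision function. This is only a simplified model of those two bullets, not QRadar's actual implementation; the function name and the dictionary layout (managed host name mapped to whether the ping succeeded) are my own assumptions.

```python
from typing import Mapping


def ping_rule_triggers_failover(primary_pings: Mapping[str, bool],
                                secondary_pings: Mapping[str, bool]) -> bool:
    """Apply the two connectivity rules from the list above.

    Failover is indicated only when some managed host is unreachable from
    the primary but reachable from the secondary. If neither HA host can
    reach it, the problem is assumed to be elsewhere and nothing happens.
    """
    for managed_host, primary_ok in primary_pings.items():
        secondary_ok = secondary_pings.get(managed_host, False)
        if not primary_ok and secondary_ok:
            return True
    return False


# Hypothetical managed host "ep1": only the primary has lost it
print(ping_rule_triggers_failover({"ep1": False}, {"ep1": True}))
# Both HA hosts have lost it, so no failover
print(ping_rule_triggers_failover({"ep1": False}, {"ep1": False}))
```

The point of comparing both views is to tell a problem on the primary apart from a problem on the managed host or the network in between.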
HA Failover event sequence
- File systems are mounted
- Management interface alias is created (for example, eth0 gets the alias eth0:0)
- VIP is assigned to the alias
- QRadar services are started
- Secondary connects to console and downloads configuration files
Tips for manual synchronization
- Ensure primary and secondary hosts are sync'd
- Secondary must be in standby
- Set the secondary to offline and power off the primary
- DO NOT MANUALLY FORCE FAILOVER DURING PATCHES OR SOFTWARE UPGRADES
2. HA Planning
- File systems on both devices must match - ext3, etc
- Secondary's "/store" partition must be equal to or greater than the primary
- Both devices should have the same number of interfaces
- Both must use the same management interface
- Only 1 VIP
- Port 7789 is needed for Distributed Replicated Block Device (DRBD)
- DRBD traffic is bidirectional
- Disk replication ensures software updates are applied to the secondary
- Ensure the host has a valid activation key
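Several of the planning points above are simple comparisons between the two hosts, so they can be captured in a small pre-check. This is a sketch under my own assumptions: the `HostFacts` fields and both function names are hypothetical, and you would have to collect the values yourself (for example from `df` and `ip link` output). The port test is a plain TCP connect and only tells you that something is listening and reachable on that port.

```python
import socket
from dataclasses import dataclass
from typing import List

DRBD_PORT = 7789  # port listed in the planning notes above


@dataclass
class HostFacts:
    """Manually collected facts about one HA host (hypothetical structure)."""
    filesystem: str
    store_bytes: int
    interface_count: int
    management_interface: str


def planning_problems(primary: HostFacts, secondary: HostFacts) -> List[str]:
    """Compare primary and secondary against the planning checklist."""
    problems = []
    if primary.filesystem != secondary.filesystem:
        problems.append("file systems differ")
    if secondary.store_bytes < primary.store_bytes:
        problems.append("secondary /store is smaller than primary /store")
    if primary.interface_count != secondary.interface_count:
        problems.append("interface counts differ")
    if primary.management_interface != secondary.management_interface:
        problems.append("management interfaces differ")
    return problems


def port_open(host: str, port: int = DRBD_PORT, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Made-up example values
primary = HostFacts("ext3", 6_000_000_000_000, 2, "eth0")
secondary = HostFacts("ext3", 5_000_000_000_000, 2, "eth0")
print(planning_problems(primary, secondary))
```

Since DRBD traffic is bidirectional, a check like `port_open` would need to be run from each HA host toward the other.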
3. HA Management
- Uses System and License management window to:
- Monitor HA
- Force failover
- Disconnect cluster
- Modify cluster settings
- Modify heartbeat interval
- Place the device in "offline" mode before maintenance