7. Storage: You must use shared storage that is compatible with Windows Server 2008 R2.
8. Network adapters and cable (for network communication): The network hardware, like other components in the failover cluster solution, must be marked as "Certified for Windows Server 2008 R2." If you use iSCSI, your network adapters should be dedicated to either network communication or iSCSI, not both.
9. Account for administering the cluster: When you first create a cluster or add servers to it, you must be logged on to the domain with an account that has administrator rights and permissions on all servers in that cluster. The account does not need to be a Domain Admins account; it can be a Domain Users account that is in the Administrators group on each clustered server. In addition, if the account is not a Domain Admins account, the account (or the group that the account is a member of) must be delegated Create Computer Objects and Read All Properties permissions in the domain.
22. It’s the very first thing you do! http://technet.microsoft.com/en-us/library/cc732035(WS.10).aspx#BKMK_understanding_tests
23.
24. Windows Server 2008 introduces a new quorum model (Node and Disk Majority). The quorum disk is used somewhat differently this time: it counts as one vote alongside the votes of the nodes.
25. Majority Node Set (MNS) is a democratic system. With a disk-only quorum there is a single vote, and whichever node owns it owns the cluster; with MNS, the majority owns the cluster. For example, if a 5-node cluster experiences a split-brain scenario, each node checks how many nodes in total it can communicate with. If one node can communicate with two others, those 3 nodes form a majority of the 5 and take ownership of the cluster. The other two nodes recognize that they are in the minority and assume that the other 3 nodes can communicate with each other.
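The majority rule described on this slide can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not how the cluster service is implemented; the function name `partition_has_quorum` is hypothetical.

```python
def partition_has_quorum(reachable_nodes: int, total_nodes: int) -> bool:
    """Return True if a partition holds a strict majority of the node votes.

    reachable_nodes counts the local node plus every node it can still
    communicate with; total_nodes is the configured cluster size.
    """
    return reachable_nodes > total_nodes // 2


# The 5-node split-brain example from the slide: a 3-node side and a 2-node side.
majority_side = partition_has_quorum(3, 5)   # 3 of 5 votes
minority_side = partition_has_quorum(2, 5)   # 2 of 5 votes
```

Under this rule an even split (for example 2 and 2 in a 4-node cluster) leaves neither side with a majority, which is one reason a witness vote is added in the other quorum models.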
29. 4 Quorum Types: Node Majority; Node and File Share Majority; Disk Only (not recommended); Node and Disk Majority. (Diagram: each voting element is labeled "Vote".)
30.
31. No Majority: Disk Only is not recommended, because the disk subsystem is a single point of failure.
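As a rough illustration of how votes are counted under the four quorum types, the sketch below treats each node as one vote and the disk or file share witness as one extra vote. The model names and the function `cluster_has_quorum` are hypothetical labels chosen for this example, and the disk-only branch assumes the usual rule that the cluster stays up while at least one node can reach the quorum disk.

```python
WITNESS_MODELS = {"node_and_disk_majority", "node_and_file_share_majority"}


def cluster_has_quorum(model: str, node_count: int, nodes_up: int,
                       witness_up: bool = False) -> bool:
    """Decide whether the surviving votes are enough to keep the cluster running."""
    if model == "disk_only":
        # No majority: the quorum disk is the only vote that matters.
        return witness_up and nodes_up >= 1
    if model == "node_majority":
        total_votes, votes = node_count, nodes_up
    elif model in WITNESS_MODELS:
        # The witness (disk or file share) contributes one additional vote.
        total_votes = node_count + 1
        votes = nodes_up + (1 if witness_up else 0)
    else:
        raise ValueError(f"unknown quorum model: {model}")
    return votes > total_votes // 2


# A 4-node cluster that has lost two nodes:
with_witness = cluster_has_quorum("node_and_disk_majority", 4, 2, witness_up=True)
without_witness = cluster_has_quorum("node_majority", 4, 2)
```

The disk-only branch makes the single point of failure visible: if the quorum disk is unavailable, the result is False no matter how many nodes are healthy.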
32.
33. A copy of clusdb is also stored on the File Share Witness. When the computer is started, the Cluster Disk Driver (Clusdisk.sys) reads the following local registry key to obtain a list of the signatures of the shared disks under cluster management: HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\ClusDisk\Parameters\Signatures. Recommendation: the private network should carry heartbeat only; the public network should be mixed.
43. If we select "Turn on maintenance mode for this disk" on a disk, the IsAlive and LooksAlive checks are not performed. The cluster service does not check the disk's status or access the disk (for example, by listing a directory on it) and assumes that the disk is continuously online. The Resource Hosting Subsystem (RHS) conducts periodic health checks of all cluster resources to ensure they are functioning properly. This is accomplished by executing IsAlive and LooksAlive processes, which are specific to the type of resource.
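The interaction between maintenance mode and the two health checks can be sketched as follows. This is a simplified model, not the RHS implementation: the `Resource` class, the `health_check` function, and the returned strings are all hypothetical, and the sketch assumes the usual split in which LooksAlive is the cursory check and IsAlive the thorough one.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Resource:
    name: str
    looks_alive: Callable[[], bool]  # cursory, frequent check
    is_alive: Callable[[], bool]     # thorough, less frequent check
    maintenance: bool = False        # "maintenance mode" flag from the slide


def health_check(resource: Resource, thorough: bool = False) -> str:
    """Run one health check, skipping it entirely in maintenance mode."""
    if resource.maintenance:
        # Neither check runs; the resource is simply assumed to be online.
        return "online (assumed)"
    check = resource.is_alive if thorough else resource.looks_alive
    return "online" if check() else "failed"


def unreachable() -> bool:
    raise RuntimeError("disk must not be touched in maintenance mode")


disk_q = Resource("Disk Q:", looks_alive=unreachable, is_alive=unreachable,
                  maintenance=True)
state = health_check(disk_q, thorough=True)
```

Because the maintenance branch returns before either callable is invoked, the disk is never accessed, which mirrors the behavior described on the slide.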
44. Failover Process: When 2 nodes cannot reach each other, each tries to access the quorum disk; this is called the arbitration process. The Clusdisk.sys driver manages access so that both nodes cannot access the disks at the same time. With the MNS architecture, quorum information is maintained through registry replication. These files can be found under Windows\System32\config. When the cluster starts, the clusdb file is loaded from the registry and the cluster begins operating. This configuration file records which disks the node is allowed to access.
45. Cluster Components
· OBJECT MANAGER (OM) (clussvc.exe): Holds the current configuration.
· HOST MANAGER (HM): Handles adding and removing hosts and detecting failed nodes; works together with the other modules. When the cluster comes up, it talks over port 3343 to whichever node responds.
· MEMBERSHIP MANAGER (MM): Writes locally under HKLM (clussvc), then forwards the information to the Object Manager, which loads it into RAM. MM records join and evict events and handles information sharing.
· GLOBAL UPDATE MANAGER (GUM): Responsible for replicating all changes and for all updates. For example, during a backup it notifies the other nodes that VSS is running, which prevents changes from being made on those nodes.
· RESOURCE CONTROL MANAGER (RCM): Works with rhs.exe. Responsible for dependencies. The most important module :P
· TOPOLOGY MANAGER
· NETWORK MANAGER (NM) / INTERFACE MANAGER (IM): NIC up / fail.
· DATABASE MANAGER (DM): Responsible for replication, which it performs through the Global Update Manager. DM keeps the log. The registry hive (clusdb) is loaded.
· QUORUM MANAGER: Determines whether quorum has been formed and which quorum model is in use. Responsible for choosing the correct replica. It can talk to RCM: when quorum cannot be formed, it brings in RCM and asks, in effect, "we are about to form quorum; can you give us a vote, are we one short?"
· SECURITY MANAGER: Encryption, Kerberos relationships.
46. Microsoft Failover Cluster Virtual Adapter: In cluster environments, Microsoft creates an interface named "Microsoft Failover Cluster Virtual Adapter". It is a hidden interface that emulates the NetFT (Network Fault Tolerant) driver, carries communication between the cluster nodes, and provides redundancy for the heartbeat. This interface binds on top of an existing interface, and traffic from SMB to the SAN is carried over this adapter. NetFT is visible in ipconfig /all and assigns itself an APIPA address (169.254.1.2). No data is actually transferred over this IP; once it is bound to a physical adapter, its utilization can be seen in Task Manager.
47. Failover Cluster Installation Steps
Failover Cluster Prerequisites
· Establish a Network Naming Convention
· TCP/IP Network Configuration
  o Public Network
  o Storage Network
  o Heartbeat Network
Procedures
· Prepare the Failover Cluster
  o Create a Domain User Account
  o Add Nodes to an Active Directory Domain
  o Expose Storage to Cluster Nodes
  o Install the Failover Cluster Feature
  o Run Cluster Validation
· Create and Configure the Failover Cluster
  o Create a Cluster
  o Set Cluster Network Properties and Apply Naming Convention
· Create a Highly Available Services -> Create a Highly Available iSCSI Target
  o Configuring Windows Firewall for Microsoft iSCSI Software Target
  o Installing the Microsoft iSCSI Software Target
  o Create the Failover iSCSI Target Resource Group
  o Create an iSCSI Target in the Microsoft iSCSI Target MMC
  o Create and Configure Virtual Disks
  o Connect Initiators
· Testing Your Failover Cluster Configuration
Server Core Installation Option of Windows Server 2008 Step-by-Step Guide: http://technet2.microsoft.com/windowsserver2008/en/library/47a23a74-e13c-46de-8d30-ad0afb1eaffc1033.mspx?mfr=true
48. Troubleshooting
· Reviewing cluster events
· Reviewing hardware events
· Using the Validate a Configuration Wizard
· Reviewing storage/SAN events
Troubleshooting methodologies for cluster issues, whether in Windows 2003 or Windows 2008, are fairly similar. Most typical cluster support issues fall under the following categories:
· Cluster Service fails to start.
· Cluster resources are in a failed state or fail to come online.
· Determining the root cause of a cluster failure.
· Initial configuration of the cluster.
The Win 2003 legacy CLUSTER.LOG text file no longer exists. In Win 2008 the cluster log is handled by Event Tracing for Windows (ETW). This is the same logging infrastructure that handles events you are already familiar with, such as the System and Application event logs you view in Event Viewer.
Command line: C:\>cluster log /gen
PowerShell: C:\PS> Get-ClusterLog
ForceQuorum: net start clussvc /forcequorum (or /fq)
49.
50. Cluster Events: Recent Cluster Events shows the events from the last 24 hours. Monitoring Cluster Events: fully featured Failover Cluster Management Packs. Cluster logging level: Set-ClusterLog -Level 3
51. Configuring Debug Logging
· Logging is enabled by default.
· Log files are stored as .ETL in: %WinDir%\System32\winevt\Logs\Microsoft-Windows-FailoverClustering
· Default log size is 100 MB: Set-ClusterLog -Size 100
· Default log level is 3: Set-ClusterLog -Level 3
· Up to three log files are kept, which means log history can be kept for up to three reboots.
· The number of logs can be modified via the registry: HKLM\Software\Microsoft\Windows\CurrentVersion\WINEVT\Channels\Microsoft-Windows-FailoverClustering/Diagnostic\FileMax. Changing the default can have a performance impact.
53. Problems Connecting to Cluster Nodes
WMI is used by the ‘Create Cluster Wizard’, ‘Validate a Configuration Wizard’, and ‘Add Node Wizard’, so any of the following messages and warnings could be due to WMI issues:
· "RPC Server Unavailable" error.
· Access is Denied.
· The computer ‘Node1’ could not be reached.
· Failed to retrieve the maximum number of nodes for ‘{0}’.
· The computer ‘Node1.contoso.com’ does not have the Failover Clustering feature installed. Use Server Manager to install the feature on this computer.
  o Note: first confirm you have installed the Failover Clustering feature on this node.
Troubleshooting Steps
1) Ensure it is not a DNS issue
2) Check that WMI is running on the node (wbemtest)
3) Check your firewall settings
4) Reboot the node
5) Rebuild a corrupt WMI repository
· In the Services console, manually stop the WMI service to ensure that dependent services are stopped
· Start the WMI service again
· Launch an elevated CMD or PowerShell
· CMD/PS > winmgmt /salvagerepository
6) Patch WMI for performance improvements (974930)
56. The temp folder for the Cluster Service account. For example, exclude the \clusterserviceaccount\Local Settings\Temp folder from virus scanning. (W2K3) http://support.microsoft.com/kb/250355#appliesto
57. Cluster Log Error Meanings
status 170: "The requested resource is in use." This can be related to Persistent Reservation problems, or to MPIO, fibre/HBA drivers, or some lower-level file system driver or software such as anti-virus, quota management, or an open file agent for backup software:
00000c94.000008d4::<date and time>.585 INFO Physical Disk <Disk Q:>: [DiskArb] Issuing Reserve on signature 33af636f.
00000c94.000008d4::<date and time>.616 ERR Physical Disk <Disk Q:>: [DiskArb] Reserve completed, status 170.
00000c94.000008d4::<date and time>.616 INFO Physical Disk <Disk Q:>: [DiskArb] Arbitrate returned status 170.
status 5: Usually a permissions-related problem. In this case the Cluster Service Account (CSA) username/password was not synchronized between the nodes. This can also happen if the cluster loses its Secure Channel connection to the DC, which the CSA needs in order to be authenticated. It can also occur when one of the domain Group Policy Objects (GPO) or one of the Local Policy Objects is missing a User Rights Assignment that the CSA needs to function properly:
000014a0.00001460::::<date and time>.629 WARN [JOIN] JoinVersion data for sponsor <Cluster Name> is invalid, status 5.
000014a0.000017d0::::<date and time>.629 WARN [JOIN] Unable to get join version data from sponsor 10.7.47.100 using NTLM package, status 5.
status 1117: ERROR_IO_DEVICE (The request could not be performed because of an I/O device error), seen when Event ID 1123 occurs:
000015a0.000014a8::<date and time>.511 WARN IP Address <IP Address resource name>: IP Interface 4 (address 10.101.160.65) failed LooksAlive check, status 1117, address 0x10119e0, instance 0xf74d6fb8.
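When scanning a long cluster log, it can help to pull out the status codes automatically. The sketch below is a hypothetical helper, not a Microsoft tool: it looks for "status N" in a log line and maps the three codes discussed on this slide to their standard Win32 error names. The function name `explain_status` and the wording of the descriptions are choices made for this example.

```python
import re

# Win32 error codes covered on this slide.
STATUS_MEANINGS = {
    5: "ERROR_ACCESS_DENIED: usually a permissions-related problem",
    170: "ERROR_BUSY: the requested resource is in use",
    1117: "ERROR_IO_DEVICE: I/O device error",
}

STATUS_RE = re.compile(r"\bstatus (\d+)")


def explain_status(log_line: str):
    """Return (code, meaning) for the first 'status N' in a line, or None."""
    match = STATUS_RE.search(log_line)
    if match is None:
        return None
    code = int(match.group(1))
    return code, STATUS_MEANINGS.get(code, "not covered here")


sample = ("00000c94.000008d4::<date and time>.616 ERR Physical Disk <Disk Q:>: "
          "[DiskArb] Reserve completed, status 170.")
result = explain_status(sample)
```

Lines without a status code, such as the "Issuing Reserve on signature" entry, return None and can simply be skipped.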
What is a quorum? To put it simply, a quorum is the cluster’s configuration database. The database resides in a file named \MSCS\quolog.log. The quorum is sometimes also referred to as the quorum log. It tells the cluster which node should be active.