COMMENTS:
-- It is understandable that without my comments it may be challenging to follow the full content of the presentation.
-- For some of the most controversial slides (e.g. performance, dNFS setup) I put comments in the notes. Please check whether there is a comment from me.
-- I am deliberately using dNFS (small-cap 'd') in the slides to make it look closer to kNFS. In most available documentation "DNFS" is used.
-- For the AUSOUG conference I will have 45 minutes only.
-- The AUSOUG conference is a relatively small event (can't compare to OOW or most US events), so the participants have different experience levels. I should cover the basics and touch on some advanced topics at the same time; I can't go into detail as much as I would at OOW or Collaborate.
-- I would like to give more detail on the performance improvement results, but then I wouldn't have time to cover the rest.
-- Some slides have animation; I would advise you to run through them to get an educated guess at what I am going to talk about on those slides.
NOTES:
-- This is true only if there are no other bottlenecks.
-- We are talking about a 1.2 ms to 0.5 ms difference here.
-- Most often systems are limited by HW (network equipment, storage speed), and therefore we will not see a 300% improvement.
-- You may expect to see performance improvements somewhere between 0% and 300%.

AWR excerpts (Load Profile figures are per second / per transaction / per exec / per call; wait event lines are waits, time(s), avg ms, %DB time, wait class):

kNFS (awr_0w_22r.20121023_201639.txt, Tue Oct 23 20:16:40 EDT 2012):
Elapsed: 1.09 (mins)
DB Time(s): 21.4 116.9 0.36 4.94
DB CPU(s): 0.8 4.4 0.01 0.19
Logical reads: 12,140.1 66,458.8
Block changes: 41.8 228.8
Physical reads: 12,042.2 65,923.3
db file sequential read: 791,093 1,370 2 97.6 User I/O
DB CPU: 53 3.8
avg PIO: 1.731781219 ms; 6.6995916/10,000 CPU sec per PIO
timing: real 1m13.117s, user 0m0.576s, sys 0m1.281s

dNFS (awr_0w_22r.20121023_203540.txt, Tue Oct 23 20:35:40 EDT 2012):
Elapsed: 1.04 (mins)
DB Time(s): 21.3 110.7 0.13 4.68
DB CPU(s): 5.0 26.0 0.03 1.10
Logical reads: 37,408.2 194,450.9
Block changes: 33.3 173.0
Physical reads: 37,298.0 193,878.0
db file sequential read: 2,326,535 1,229 1 92.5 User I/O
DB CPU: 312 23.5
avg PIO: 0.52825339 ms; 13.410501/10,000 CPU sec per PIO

Direct storage (awr_0w_22r.20121023_183221.txt, Tue Oct 23 18:32:21 EDT 2012):
Elapsed: 1.04 (mins)
DB Time(s): 21.3 111.0 0.09 4.69
DB CPU(s): 3.7 19.1 0.02 0.81
Logical reads: 54,761.2 285,985.7
Block changes: 40.5 211.4
Physical reads: 53,897.5 281,475.3
db file sequential read: 3,377,685 1,224 0 91.9 User I/O
DB CPU: 229 17.2
avg PIO: 0.362378375 ms; 6.7797915/10,000 CPU sec per PIO
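The "avg PIO" figures are simply total wait time divided by wait count from the db file sequential read lines; a quick sanity check of the three AWR excerpts:

```shell
# Average single-block read latency = total wait time (s) / number of waits, in ms
awk 'BEGIN {
  printf "kNFS:           %.2f ms\n", 1370*1000/791093    # ~1.73 ms
  printf "dNFS:           %.2f ms\n", 1229*1000/2326535   # ~0.53 ms
  printf "Direct storage: %.2f ms\n", 1224*1000/3377685   # ~0.36 ms
}'
```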
Picture source URL: http://www.bannerblog.com.au/news/picts/thumbs_up.jpg

Grid DBA, my friend Leighton L. Nelson (@leight0nn), blogs about how he sped up Data Pump using dNFS:
"Direct NFS speeds up Data Pump"
http://blogs.griddba.com/2012/02/direct-nfs-speeds-up-data-pump.html
Any other references are welcome! Let me know about other good examples.

Why ASM on NAS?
https://twitter.com/yvelikanov/status/260674761380749312
Yury @yvelikanov: Personally I don't see the point of building ASM on (d)NFS. ASM is supposed to exclude unnecessary layers. In the NAS case it adds an additional layer.
Kevin Closson @kevinclosson: @yvelikanov @netofrombrazil @leight0nn Simple, a) Standard Edition 1 and b) ASM striping between filers. "a" is mandatory.
Leighton L. Nelson @leight0nn: @yvelikanov @netofrombrazil easy storage migration?
Yury @yvelikanov: @leight0nn @netofrombrazil u can use Incr refreshable data file copies to migrate onto a FS (short downtime). But I do agree. ASM no downtime
http://bit.ly/RTTkxn
Guenadi Jilevski @gjilevski: @netofrombrazil @yvelikanov @leight0nn Ex. ASM on additional NFS for quorum of the vote disk in extended RAC clusters.
https://twitter.com/simon_haslam/status/260892761102901248
Simon Haslam @simon_haslam: @netofrombrazil @leight0nn @yvelikanov So you're saying ASM on dNFS is good? Or just dNFS with anything (say, OMF instead of ASM)?
Yury @yvelikanov: @netofrombrazil @simon_haslam @leight0nn @NetApp The right question is: would you recommend naked NFS or ASM on top of NFS?
Yury @yvelikanov: @netofrombrazil @simon_haslam @leight0nn On top of @NetApp of course. Or does it not matter to you, as far as #NetApp is in use?
neto from Brazil @netofrombrazil: @yvelikanov @simon_haslam @leight0nn @netapp NFS. ASM on top of D or K NFS only in special cases IMHO

REF: https://plus.google.com/u/1/107075205411714880234/posts/G7EPReaJGvF
Direct NFS does not support Oracle Clusterware files.
REF: http://docs.oracle.com/cd/E11882_01/install.112/e22489/storage.htm#CDEBGJA
NFS file system on a certified NAS filer => OCR and Voting Disk Files => Yes
kNFS is supported for OCRs; dNFS isn't.
Good catch +Arup Nanda
Simplified - TCP still involves the kernel. THX to @fritshoogland from Twitter.

=B=== Martin Bach @MartinDBA
dNFS mostly in USER mode (good for VM solutions): does it make sense to say that one of the main benefits is that using dNFS you stay (mostly?) in user mode, whereas kNFS requires transitions into the kernel (want to trace using perf, but haven't had time).
According to James Morle, dNFS is great for VMware, where all user-mode code is quick, but kernel transitions (requiring interrupts on x86) take longer since the hypervisor has to "trap" the instruction and translate it to be safe for multiple guests.
=E=== Martin Bach @MartinDBA

Additional comment from @fritshoogland: Personally, I would draw an nfsd square which gets the line from the I/O client, partly inside/partly outside the kernel, and remove that with dNFS.

CAN kNFS work as well as dNFS?
https://twitter.com/yvelikanov/status/260893343150653440
Yury @yvelikanov: @netofrombrazil @kevinclosson Did I get it right? We can tune kNFS to work at dNFS speed if we invest a lot of NFS expert time?
Additional comment from @MartinDBA: The only really cool thing worth mentioning is that you have fewer transitions into kernel code, which is good for virtualized environments. I'm sure if you tested dNFS on a virtual machine it would clearly beat kNFS!
Additional comment from @kevinclosson: DNFS addresses circa-2004 NFS and bonding weaknesses specific to the Oracle I/O profile and OSs. Times change.
http://bit.ly/RTRLQ5
Kevin Closson @kevinclosson: @netofrombrazil @leight0nn @yvelikanov Even though Solaris needs to die really badly, you might find Sol 11 x64 dNFS and kNFS show parity
neto from Brazil @netofrombrazil: @kevinclosson @leight0nn @yvelikanov :-) knfs well tuned works good :-)
neto from Brazil @netofrombrazil: @kevinclosson @leight0nn @yvelikanov OL works pretty well too. I've got 2 GBytes per second with kNFS :-)
neto from Brazil @netofrombrazil: @kevinclosson @yvelikanov agreed you can tune kNFS well, but with dNFS you can have better optimization. But it depends on the I/O that you are generating.
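One hedged way to observe the user-vs-kernel split Martin describes, without a full perf trace (the PID 4242 is hypothetical; attach to a database foreground process driving single-block reads):

```shell
# %usr vs %system CPU of the process, sampled every second for 10 samples;
# under kNFS expect a noticeably higher %system share than under dNFS
pidstat -u -p 4242 1 10

# Context switches over a 10-second window, as a rough proxy for
# transitions out of user mode (requires perf and suitable privileges)
perf stat -e context-switches -p 4242 -- sleep 10
```

This is only a sketch; a proper comparison would run the same workload on identical mounts with dNFS toggled on and off.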
=B=== Martin Bach @MartinDBA
You could also use `ldd oracle` to see the libraries compiled in. Additionally, pmap or /proc/pid/smaps in RHEL 6.x.
=E=== Martin Bach @MartinDBA
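Following Martin's tip, a short sketch of checking and toggling the dNFS linkage (library name and make target are the 11gR2 ones; verify against your release's documentation):

```shell
# Is the Direct NFS ODM library linked into the oracle binary?
# libnfsodm11.so in the output means the dNFS client code is linked in
ldd $ORACLE_HOME/bin/oracle | grep -i odm

# Relink to enable or disable the Direct NFS client (11gR2)
cd $ORACLE_HOME/rdbms/lib
make -f ins_rdbms.mk dnfs_on      # dnfs_off to disable
```

At runtime, rows in v$dnfs_servers and v$dnfs_channels confirm that dNFS is actually serving I/O, not just linked in.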
This slide is intentionally made overcrowded. I have 2 goals here:
1. Provide references and state that the setup is documented with a reasonable level of detail.
2. Make a joke :)
dNFS = Speed = High Availability = Scalability (? reduced CPU as we are skipping a longer code path ?)

http://docs.oracle.com/cd/E11882_01/install.112/e22489/storage.htm#CWLIN274
3.2.3.4 Specifying Network Paths with the Oranfstab File
"Direct NFS can use up to four network paths defined in the oranfstab file for an NFS server. The Direct NFS client performs load balancing across all specified paths. If a specified path fails, then Direct NFS reissues I/O commands over any remaining paths."

Could bonding be a good alternative to dNFS?
https://twitter.com/gwenshap/statuses/260668780903026688
Gwen (Chen) Shapira @gwenshap: @yvelikanov you can bond 1Gb NICs, but for each client-server pair you will still only get a 1Gb line. Tricky things :)
neto from Brazil @netofrombrazil: @yvelikanov @gwenshap @jantup 2 x 1Gbit max you can get 240MB/s either k or d [nfs]
https://twitter.com/gwenshap/statuses/260958482059112449
Gwen (Chen) Shapira @gwenshap: @netofrombrazil @kevinclosson @yvelikanov so this is one of the things you need to get right when configuring knfs, but dnfs handles it for you
Gwen (Chen) Shapira @gwenshap: @netofrombrazil @kevinclosson @yvelikanov how do you get bonded 2Gb with knfs? Linux uses 802.3ad bonding. One link per conversation = 1Gb.
neto from Brazil @netofrombrazil: @gwenshap @kevinclosson @yvelikanov LACP mode 4
neto from Brazil @netofrombrazil: @gwenshap @kevinclosson @yvelikanov and other parameters like backlog, sun rpc etc...
https://twitter.com/kevinclosson/status/260959183900405760
Kevin Closson @kevinclosson: @gwenshap @netofrombrazil @yvelikanov exactly, also dnfs is a combined agg+failover. Simple wire. All my writings from 2006 are sexy now?
Kevin Closson @kevinclosson: @yvelikanov @gwenshap good lord, gus, read the paper .. I didn't waste ink: http://www.oracle.com/technetwork/articles/directnfsclient-11gr1-twp-129785.pdf ... who wrote that?
Yury @yvelikanov: @netofrombrazil @gwenshap @jantup My question to you is: can !!! 1 !!! session get 240MB/s out of 2 x 1Gbit?
Kevin Closson @kevinclosson: @netofrombrazil @yvelikanov @gwenshap @jantup depends on your database host (nfs client) kernel
https://www.google.com/search?num=100&hl=en&site=&source=hp&q=closson+%2Bdirect+NFS&oq=closson+%2Bdirect+NFS&gs_l=hp.3...2255.7134.0.7296.19.18.0.0.0.0.419.2714.4j5j2j3j1.15.0.les%3Bcesh..0.0...1.1.mDyX7Tu3XpI
Kevin Closson @kevinclosson: @gwenshap @jantup @yvelikanov depends on the form of bonding. dNFS is really better for aggregating NICs.
neto from Brazil @netofrombrazil: @yvelikanov @gwenshap @jantup one tcp session or 1 thread?
Yury @yvelikanov: @netofrombrazil @gwenshap @jantup 1 database session (unix process, foreground process, full table scan)
neto from Brazil @netofrombrazil: @yvelikanov @gwenshap @jantup of course you can :-)
neto from Brazil @netofrombrazil: @yvelikanov @gwenshap @jantup one full table scan works if you have the right conf for db multi block read
Kevin Closson @kevinclosson: @yvelikanov @netofrombrazil @gwenshap @jantup a single foreground on a modern CPU would have no trouble saturating 2x1GbE (240MB/s)
Kevin Closson @kevinclosson: @netofrombrazil @yvelikanov @gwenshap @jantup No surprise. But, hold it, Manly Man doesn't use NFS for Oracle
https://www.google.com/search?num=100&hl=en&biw=1507&bih=707&q=manly+man+NFS&oq=manly+man+NFS&gs_l=serp.3..0l10.12633532.12640635.0.12640847.13.13.0.0.0.0.453.2518.2j1j5j0j2.10.0.les%3Bcesh..0.0...1.1.6bgcFszDVok
neto from Brazil @netofrombrazil: @kevinclosson @yvelikanov @gwenshap @jantup for TCP the rule is 1 Hz to process 1 bit - 1GHz processes 1Gbit, got it?
https://twitter.com/Djelibeybi/status/260669539027648512
Djelibeybi @Djelibeybi: @yvelikanov you can get more bandwidth with a LACP (802.3ad) bond of two or more NICs. Not a linear scale, though, and it needs switch support.
Leighton L. Nelson @leight0nn: @Djelibeybi @yvelikanov Read somewhere multiple paths on diff subnets are recommended over an LACP vif. Could be wrong.
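To check which bonding mode a kNFS setup actually runs (the interface name bond0 is an assumption; adjust to your environment):

```shell
# 802.3ad / LACP shows up as "IEEE 802.3ad Dynamic link aggregation";
# mode 4 in the bonding driver's numbering
grep -i "bonding mode" /proc/net/bonding/bond0

# The same file lists per-slave state and aggregator details
cat /proc/net/bonding/bond0
```

Remember the point from the thread: even in 802.3ad mode, one client-server conversation hashes onto one physical link, so a single NFS mount still tops out at one NIC's bandwidth.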
local: the IP on the DB server you want connections to go through.
path: the IP on the filer you want the connections to end up at.
export, mount: the pair that finalizes each block of information.
You can have more than one (up to 4) local/path pairs specified.
Use dontroute if you are about to use several pairs of IPs from the same network.
mnt_timeout is in seconds and defaults to 10 minutes. That seems a bit too high to me; I would set it to 1 minute (disclaimer: I don't have much experience with failing NFS).
You can specify these parameters per mount block.
I suspect there is a limit on the number of IP connections each session can have. As of now I haven't hit this limit, but then I haven't used too many connections per session either (4 max as of now). If you have experience, please let me know (@yvelikanov).
Special thanks to @pioro for comments on UEK.
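A minimal oranfstab sketch putting those parameters together (server name, IP addresses, export and mount paths are all hypothetical; the file lives in /etc/oranfstab or $ORACLE_HOME/dbs/oranfstab):

```
server: filer01
local: 192.168.10.1
path: 192.168.10.101
local: 192.168.11.1
path: 192.168.11.101
dontroute
export: /vol/oradata mount: /u02/oradata
mnt_timeout: 60
```

Two local/path pairs give dNFS both load balancing and failover across the two networks; dontroute fits the case where both pairs sit on directly attached subnets.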
Remove limit on the number of diskmon slaves [bug 9842238]
ORACLE DATABASE WILL NOT OPEN [bug 14383403]
LRGIONFS RUN IS FAILING ON WIN2K8 R2 [bug 13689216]
DATABASE STARTUP AND QUERY TAKING HUGE TIME WHEN DNFS IS ENABLED [bug 13510654]
LGWR hangs for long periods using DNFS - CF waits likely [bug 9556189]