
Rapid increasing er_bad_os at 8 Gbit...
AndyAtEon 10 posts since Jul 12, 2005
Hi, I am seeing the er_bad_os error counter increase very quickly on storage ports. Changing the fill word mode does not fix the issue. The storage vendor recommends setting IDLE as the fill word. We are running FOS 6.2 and 6.3. I would like to understand what is going wrong and what has to be changed to fix the issue: the switch or the HBA firmware.
Thanks
Tags: 8gbit, dcx-4s, storage, dcx
hemant 422 posts since Mar 3, 2010 1. Re: Rapid increasing er_bad_os at 8 Gbit speed Apr 9, 2010 10:21 AM
This can also be a media-related issue; change the fibre cable, SFP or HBA...
andreas.bergelt 600 posts since Apr 12, 2010 2. Re: Rapid increasing er_bad_os at 8 Gbit speed Apr 12, 2010 1:17 AM in response to: hemant
Thanks hemant,
in our case it is not related to media, cable or SFP. It is related to 8 Gbit compatibility issues. But I don't understand what is going wrong between the switch and the device. I would like a more detailed description of why neither IDLE nor ARBff works correctly.
Generated by Jive SBS on 2012-03-12-06:00 1
Andreas .
hemant 422 posts since Mar 3, 2010 3. Re: Rapid increasing er_bad_os at 8 Gbit speed Apr 13, 2010 4:20 AM in response to: andreas.bergelt
Hi,
Please do a portstatsclear and then check with portstatsshow and porterrshow whether the counter is still increasing. You may have to upgrade the driver and firmware version of the HBAs.
andreas.bergelt 600 posts since Apr 12, 2010 4. Re: Rapid increasing er_bad_os at 8 Gbit speed Apr 13, 2010 12:56 PM in response to: hemant
Thanks for your help in finding a solution, but my question was aimed at getting an explanation for this behavior. By the way, an HBA firmware update on a storage array is not possible. The storage array runs the latest code. I would like to know whether other customers have had the same issue with 8 Gbit storage ports on 8 Gbit Brocade switches.
This parameter is platform/port specific. Did you do a portstatsclear and check whether it is increasing or not? Check the compatibility of the HBA driver and firmware version with the storage microcode. You have to look at other parameters as well, such as er_enc_out; if all these values are increasing after a portstatsclear, then you have to change the cable and SFP. BTW, which storage is this with 8 Gbps CHA ports? There is no such compatibility issue. The SAN switch works in AN mode.
All other error counters are at zero, so it is not a cabling issue. If I fix the storage port to 4 Gbit everything is fine; only at 8 Gbit speed do I see er_bad_os increasing. Hitachi storage is affected: USP-V and AMS 2500 with 8 Gbit ports. As mentioned above, FOS 6.3.1a is running on the switch and the latest code on the array. Only the switch port where the array is connected is affected. We have Emulex LPe12000 and Brocade 815 on the server side without any issues. Data traffic passes through the storage without problems. But I can see the counter increasing very quickly...
xwdees02_a1_ds:FID128:A15710> portcfgshow 1/9
Area Number:    9
Speed Level:    AUTO(HW)
Fill Word:      0(Idle-Idle)
....
xwdees02_a1_ds:FID128:A15710> statsclear
xwdees02_a1_ds:FID128:A15710> portstatsshow 1/9
stat_wtx                0           4-byte words transmitted
stat_wrx                0           4-byte words received
stat_ftx                0           Frames transmitted
stat_frx                0           Frames received
stat_c2_frx             0           Class 2 frames received
stat_c3_frx             0           Class 3 frames received
stat_lc_rx              0           Link control frames received
stat_mc_rx              0           Multicast frames received
stat_mc_to              0           Multicast timeouts
stat_mc_tx              0           Multicast frames transmitted
tim_rdy_pri             0           Time R_RDY high priority
tim_txcrd_z             0           Time TX Credit Zero (2.5Us ticks)
tim_txcrd_z_vc  0- 3:   0  0  0  0
tim_txcrd_z_vc  4- 7:   0  0  0  0
tim_txcrd_z_vc  8-11:   0  0  0  0
tim_txcrd_z_vc 12-15:   0  0  0  0
er_enc_in               0           Encoding errors inside of frames
er_crc                  0           Frames with CRC errors
er_trunc                0           Frames shorter than minimum
er_toolong              0           Frames longer than maximum
er_bad_eof              0           Frames with bad end-of-frame
er_enc_out              0           Encoding error outside of frames
er_bad_os               409610422   Invalid ordered set
er_rx_c3_timeout        0           Class 3 receive frames discarded due to timeout
er_tx_c3_timeout        0           Class 3 transmit frames discarded due to timeout
er_c3_dest_unreach      0           Class 3 frames discarded due to destination unreachable
er_other_discard        0           Other discards
er_type1_miss           0           frames with FTB type 1 miss
er_type2_miss           0           frames with FTB type 2 miss
er_type6_miss           0           frames with FTB type 6 miss
er_zone_miss            0           frames with hard zoning miss
er_lun_zone_miss        0           frames with LUN zoning miss
er_crc_good_eof         0           Crc error with good eof
er_inv_arb              0           Invalid ARB
open                    0           loop_open
transfer                0           loop_transfer
opened                  0           FL_Port opened
starve_stop             0           tenancies stopped due to starvation
fl_tenancy              0           number of times FL has the tenancy
nl_tenancy              0           number of times NL has the tenancy
zero_tenancy            0           zero tenancy
Wait some seconds...
xwdees02_a1_ds:FID128:A15710> portstatsshow 1/9
stat_wtx                0           4-byte words transmitted
stat_wrx                0           4-byte words received
stat_ftx                0           Frames transmitted
stat_frx                0           Frames received
stat_c2_frx             0           Class 2 frames received
stat_c3_frx             0           Class 3 frames received
stat_lc_rx              0           Link control frames received
stat_mc_rx              0           Multicast frames received
stat_mc_to              0           Multicast timeouts
stat_mc_tx              0           Multicast frames transmitted
tim_rdy_pri             0           Time R_RDY high priority
tim_txcrd_z             0           Time TX Credit Zero (2.5Us ticks)
tim_txcrd_z_vc  0- 3:   0  0  0  0
tim_txcrd_z_vc  4- 7:   0  0  0  0
tim_txcrd_z_vc  8-11:   0  0  0  0
tim_txcrd_z_vc 12-15:   0  0  0  0
er_enc_in               0           Encoding errors inside of frames
er_crc                  0           Frames with CRC errors
er_trunc                0           Frames shorter than minimum
er_toolong              0           Frames longer than maximum
er_bad_eof              0           Frames with bad end-of-frame
er_enc_out              0           Encoding error outside of frames
er_bad_os               716822618   Invalid ordered set
er_rx_c3_timeout        0           Class 3 receive frames discarded due to timeout
er_tx_c3_timeout        0           Class 3 transmit frames discarded due to timeout
er_c3_dest_unreach      0           Class 3 frames discarded due to destination unreachable
er_other_discard        0           Other discards
er_type1_miss           0           frames with FTB type 1 miss
er_type2_miss           0           frames with FTB type 2 miss
er_type6_miss           0           frames with FTB type 6 miss
er_zone_miss            0           frames with hard zoning miss
er_lun_zone_miss        0           frames with LUN zoning miss
er_crc_good_eof         0           Crc error with good eof
er_inv_arb              0           Invalid ARB
open                    0           loop_open
transfer                0           loop_transfer
opened                  0           FL_Port opened
starve_stop             0           tenancies stopped due to starvation
fl_tenancy              0           number of times FL has the tenancy
nl_tenancy              0           number of times NL has the tenancy
zero_tenancy            0           zero tenancy
xwdees02_a1_ds:FID128:A15710>
As you can see, everything else is fine. If I put some load on the port, there are no additional increasing error counters either. These er_bad_os events are visible neither to the server nor to the storage array. Hitachi advised us to configure the ports with the settings shown above. We currently have no transport errors. I am looking for an explanation of this behavior. Thanks, Andreas
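To put a number on "very quickly", the two portstatsshow samples above can be compared with a few lines of Python. This is only a sketch: the helper name `counter_value` is made up for illustration, and the input strings are short excerpts of the two outputs, not a full capture from the switch.

```python
import re

def counter_value(portstats_text: str, counter: str) -> int:
    """Return the integer that follows `counter` in portstatsshow-style text."""
    match = re.search(rf"\b{re.escape(counter)}\s+(\d+)", portstats_text)
    if match is None:
        raise ValueError(f"{counter} not found")
    return int(match.group(1))

# Excerpts of the two samples posted above (after statsclear, then "some seconds" later).
first = "er_enc_out 0 Encoding error outside of frames\ner_bad_os 409610422 Invalid ordered set"
second = "er_enc_out 0 Encoding error outside of frames\ner_bad_os 716822618 Invalid ordered set"

delta = counter_value(second, "er_bad_os") - counter_value(first, "er_bad_os")
print(delta)  # 307212196 additional invalid ordered sets between the two samples
```

The exact interval between the samples is not given in the post, so the snippet reports only the difference, not a rate.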
Yes, it is definitely strange. Have you tried moving the connection from this port to other ports? You are saying that putting load on the port does not increase the value. Is the server HBA idle? What else did the Hitachi people say? We can also ignore this value unless you face congestion. This parameter normally increases only due to a server reboot, a bad cable or a bad SFP. Also do a portstatsclear, not only a statsclear. Is there any error in porterrshow? What is showing in errdump? I do not think this is a compatibility issue at the CHA port or the switch port. You also say that when you set the port to 4 Gbps it is OK. Check portshow and portloginshow as well.
Have you tried changing the port from this to other ports --> Yes, the same issue.
You are saying that putting load on the port does not increase the value --> I am sorry, my English is not very good. The er_bad_os counter increases at the same speed on the affected storage. The data flow had no problems.
Is there any error on porterrshow --> No.
You also say that while you set the port to 4 Gbps, it is OK --> Yes, correct.
Check portshow also and portloginshow --> No problems, everything is fine.
What else the Hitachi people said --> Ignore the counter.
But this does not look like a well-tested and compatible product combination. It looks as if Brocade is not talking to the rest of the world to make sure that new technology works as planned... Can you ask Brocade engineering what is going wrong? I think you are working for Brocade, correct? I would like to understand what the problem is and who can fix it.
Thanks, Andreas
Hi,
No, I am not working for Brocade, but I am BCFP, BCSD, BCFD and BCSM (4 & 8 Gbps) certified and have been working on Brocade director-class products, with a large installation of 4000 SAN switch ports, for 4.6 years. I have seen these things. I can only say: just ignore this.
Let me describe:
In portstatsshow we see the port hardware statistics counters. Some counters are platform and port specific and are displayed only on those platforms and ports.
This counter indicates that some configuration parameter is not set correctly. That can have several reasons...
I have seen my friends facing the same thing: if we change the speed from 8 to 4 G the counter stops; the other solution is to change the fill word to ARBFF in link init, ARBFF as fill word.
Since ordered sets do not contain data, this has nothing to do with the data flow. So we can ignore these. Remember, ordered sets are purely within the SAN; the OS will never see them.
Have you checked the compatibility matrices between your server/HBA and the switch/FOS level, and between the server/HBA and the storage?
About ordered sets: The round trip delay is measured by transmitting a particular Primitive Signal. A Primitive Signal is an Ordered Set used to indicate an event. An Ordered Set is a 4-byte Transmission Word which has the Special Character as its first Transmission Character. An Ordered Set may be a Frame Delimiter, a Primitive Signal, or a Primitive Sequence. Ordered Sets are used to distinguish Fibre Channel control information from data. A Transmission Word is a string of four consecutive Transmission Characters; a Transmission Character is a (valid or invalid) 10-bit character transmitted serially over the fibre. Valid Transmission Characters are determined by the 8B/10B encoding specification. The Special Character is a special 10-bit Transmission Character which does not have a corresponding 8-bit value, but is still considered valid. The Special Character is used to indicate that a particular Transmission Word is an Ordered Set. The Special Character is the only Transmission Character to have five 1's or 0's in a row. The Special Character is also referred to as K28.5 when using K/D format. For additional explanation of these various terms, one may refer to the Fibre Channel standards, particularly FC-PH, which is ANSI publication X3.230, and is hereby incorporated by reference.
We have also seen this value increase when there is no data transmission between the HBA and the storage port.
If HITACHI people have said to ignore this then they must have queried Brocade.
TechHelp24 2,605 posts since Feb 23, 2004 10. Re: Rapid increasing er_bad_os at 8 Gbit speed Apr 16, 2010 12:32 AM in response to: andreas.bergelt
Andreas, they already gave themselves the answer...
--->>>in our case it is not related to media, cabel or SFP.
"It is related to 8 gbit compatiblity issues."
hemant 422 posts since Mar 3, 2010 12. Re: Rapid increasing er_bad_os at 8 Gbit speed Apr 16, 2010 8:17 AM in response to: TechHelp24
simply tell them to change the CHA board, or wait till the Microcode upgrade
pmescher 1 posts since May 7, 2009 13. Re: Rapid increasing er_bad_os at 8 Gbit speed Apr 16, 2010 3:33 PM
No need to replace hardware. Simply upgrade to a FOS level with the portcfgfillword command. (I think it was introduced somewhere in 6.2.) Brocade defaults to ARBff; some storage devices still expect six IDLEs between frames, and their state machines fail if those IDLEs aren't received. portcfgfillword <port number>, 0 and you will be all fixed.
This problem CAN cause data flow issues due to excess interrupts in HBAs. I've seen it on some QL models.
Note that the fill word was changed for good reasons (ARBff improves signal timing), so the latest FOS versions allow you to send six IDLEs to satisfy the state machine, and then send ARBffs. So you get the IDLEs for devices that require them, and you get the improved signal characteristics of ARBff.
hemant 422 posts since Mar 3, 2010 14. Re: Rapid increasing er_bad_os at 8 Gbit speed Apr 17, 2010 9:00 AM in response to: pmescher
I do not have any idea about portcfgfillword because I have not used it. But at HITACHI level microcode upgrade may resolve the issue.
ploufg 31 posts since Apr 16, 2007 15. Re: Rapid increasing er_bad_os at 8 Gbit speed Aug 4, 2010 9:01 AM
Hi
I had the same problem with my DCX and 48000 (FC8-48), and I suggest not ignoring this counter, even though it is not related to data transfer but rather to link signal and sync. I contacted my support provider (HP), and the only way they resolved it was to reconfigure the port fill word mode from 0 to 3.
portcfgfillword slot/port, 0    idle
portcfgfillword slot/port, 3    ARBff; if that fails, use idle (for devices that expect this signal)
Note that running this command will reset the port (disable-enable), so make sure that your servers have more than one path to the target.
This configuration allows devices to use ARBff as the primitive signal, and also serves devices that expect the IDLE primitive signal (like an auto-negotiation). As for the technical reason, it is related to electromagnetic interference (EMI) and the protocol (see the updated T11 documentation).
From T11.org: The following FC-FS-2 proposal is for the purpose of reducing EMI with 8G and higher serial links. It replaces IDLE with the use of ARB(FF), which has a lower transition density. This allows the reduction of EMI without more significant changes that would involve randomizing the data pattern.
I suggest that you contact your support provider. If I may add a comment: I have used er_bad_os to identify bad devices such as SFPs, cables, lost dB (FC analyzer), or any
Hope this helps.
andreas.bergelt 600 posts since Apr 12, 2010 16. Re: Rapid increasing er_bad_os at 8 Gbit speed Aug 5, 2010 1:57 AM in response to: ploufg
Hello,
thanks for your answer. But in our case the affected storage vendor doesn't support ARB(FF) on the switch side. I assume that the FC ASIC on the storage side is not well coded and tested.
Regards, Andreas
a.adamson 53 posts since Jun 24, 2009 17. Re: Rapid increasing er_bad_os at 8 Gbit speed Aug 9, 2010 1:37 AM in response to: andreas.bergelt
Hi,
I have had the same problem. After making sure I had the correct fill word (according to the manufacturer of the arrays in question), I found that the problem disappeared when I replaced the cabling that ran via a patch panel with a brand-new, directly attached cable.
I agree the errors do not necessarily suggest a cabling problem, but that is how we solved it.
Alastair
Mahendran 22 posts since Apr 6, 2010 18. Re: Rapid increasing er_bad_os at 8 Gbit speed Aug 12, 2010 4:57 AM in response to: a.adamson
I am facing similar issues on a 48000, whereby er_bad_os is increasing and the ports are flapping as well. Some other ports are facing the same issues as well. Any idea why this is happening?
Firmware : v6.2.2a
Speed:
N4 Online (4Gbps)
portstatsshow 4/1
stat_wtx                915040180   4-byte words transmitted
stat_wrx                2406319128  4-byte words received
stat_ftx                12732318    Frames transmitted
stat_frx                683736493   Frames received
stat_c2_frx             0           Class 2 frames received
stat_c3_frx             683736493   Class 3 frames received
stat_lc_rx              0           Link control frames received
stat_mc_rx              0           Multicast frames received
stat_mc_to              0           Multicast timeouts
stat_mc_tx              0           Multicast frames transmitted
tim_rdy_pri             0           Time R_RDY high priority
tim_txcrd_z             0           Time BB credit zero (2.5Us ticks)
er_enc_in               318         Encoding errors inside of frames
er_crc                  0           Frames with CRC errors
er_trunc                0           Frames shorter than minimum
er_toolong              0           Frames longer than maximum
er_bad_eof              0           Frames with bad end-of-frame
er_enc_out              84          Encoding error outside of frames
er_bad_os               11245       Invalid ordered set
er_c3_timeout           2366        Class 3 frames discarded due to timeout
er_c3_dest_unreach      0           Class 3 frames discarded due to destination unreachable
er_other_discard        0           Other discards
er_zone_discard         0           Class 3 frames discarded due to zone mismatch
er_crc_good_eof         0           Crc error with good eof
er_inv_arb              0           Invalid ARB
open                    0           loop_open
transfer                0           loop_transfer
opened                  0           FL_Port opened
starve_stop             0           tenancies stopped due to starvation
fl_tenancy              0           number of times FL has the tenancy
nl_tenancy              0           number of times NL has the tenancy

portcfgshow 4/1
Area Number:            49
Speed Level:            AUTO(HW)
Fill Word:              0(Idle-Idle)
AL_PA Offset 13:        OFF
Trunk Port              ON
Long Distance           OFF
VC Link Init            OFF
Locked L_Port           OFF
Locked G_Port           OFF
Disabled E_Port         OFF
ISL R_RDY Mode          OFF
RSCN Suppressed         OFF
Persistent Disable      OFF
NPIV capability         ON
QOS E_Port              OFF
Port Auto Disable:      OFF
Mirror Port             OFF
F_Port Buffers          OFF

sfpshow 4/1
Identifier:   3    SFP
Connector:    7    LC
Transceiver:  150c402001000000 100,200,400_MB/s M5,M6 sw Inter_dist
Encoding:     1    8B10B
Baud Rate:    42   (units 100 megabaud)
Length 9u:    0    (units km)
Length 9u:    0    (units 100 meters)
Length 50u:   15   (units 10 meters)
Length 62.5u: 7    (units 10 meters)
Length Cu:    0    (units 1 meter)
Vendor Name:  FINISAR CORP.
Vendor OUI:   00:90:65
Vendor PN:    FTLF8524P2BNV
Vendor Rev:   A
Wavelength:   850  (units nm)
Options:      0032 Loss_of_Sig,Tx_Disable
BR Max:       0
BR Min:       0
Serial No:    UA80ZN9
Date Code:    060821
Temperature:  34 Centigrade
Current:      6.228 mAmps
Voltage:      3291.9 mVolts
RX Power:     -5.5 dBm (281.5 uWatts)
TX Power:     -4.3 dBm (372.3 uWatts)
andreas.bergelt 600 posts since Apr 12, 2010 19. Re: Rapid increasing er_bad_os at 8 Gbit speed Aug 12, 2010 4:35 AM in response to: Mahendran
In your case I would say you have some issues with discards in your fabric. Check your ports for this counter: er_c3_timeout.
On affected ports you will see that servers have I/O errors and performance issues. I think it is a serious issue to have discards in the fabric.
As an update: with FOS 6.3.2, Brocade shows the "discards direction", i.e. whether it is on the TX or the RX side. This is cool :-)
Regards, Andreas
Mahendran 22 posts since Apr 6, 2010 20. Re: Rapid increasing er_bad_os at 8 Gbit speed Aug 12, 2010 5:03 AM in response to: andreas.bergelt
Yes. We are seeing discards on the ISL links, and we have upgraded from 6.1.2 --> 6.2.2, but it seems not to be helping. We have done everything from SFP replacements to replacing cables and moving the cables to other ports. No clue what to do next. Every day there are ports with enc out increasing and flapping. Sometimes TX/RX would show 0.0 watts and we replaced the SFPs, but at times the SFP would be just fine and the port would still be flapping.
I assume that you originally came from a 5.3 version before you updated to 6.x, correct? If so, you can reduce the discards on the ISLs and in the fabric by adding additional ISLs. In our case that fixed the issue. I have also disabled QoS on all ISLs, but this didn't fix the issue. It was only a recommendation from the OEM. I suggest that you fix your ISL issues first and then take the next step.
Regards, Andreas
Hi Andreas,
The previous version was 6.1.2. We haven't tried adding additional ISLs, but this is worth a try. We also have issues with replication ports flapping, and there are discards on the MPR 7500 SAN router connecting the primary and secondary sites. The firmware upgrade on the secondary site is still pending, and I think we will proceed further from there.
Regards, Mahen
Mahendran 22 posts since Apr 6, 2010 23. Re: Rapid increasing er_bad_os at 8 Gbit speed Aug 13, 2010 7:39 AM in response to: Mahendran
Hi Andreas,
Apart from that, we are also experiencing enc out on storage ports which belong to the same USP. Could it be because the ports are set to auto-negotiate?
Regards, Mahen
Hello Mahen,
auto-negotiation is not an issue on a Brocade SAN. USP with 4 Gbit and USP-V with 8 Gbit feature cards have no problem with auto.
I prefer to set the ports on both sides fixed to the maximum speed.
I assume that you may have a bigger issue in the SAN. Try first to fix your discard issue.
Since you mention a 7500 router, can you check whether the router links are perhaps overloaded?
That could be the reason why you see dropped frames in your SAN.
But finally I would say that this thread has now taken a completely different direction compared to the initial question which I raised some time ago...
Regards, Andreas
Yeah agree. Maybe I should open a new thread.
Let it here as it is. What are your next steps to solve your issues?
Andreas
Hi Andreas,
I have no clue what to do next. I am actually waiting for all the switches in the primary and secondary sites to be upgraded, and will then proceed to troubleshoot as advised by the vendor. I am open to any suggestions from you all.
Regards, Mahendran
hemant 422 posts since Mar 3, 2010 28. Re: Rapid increasing er_bad_os at 8 Gbit speed Aug 14, 2010 3:33 AM in response to: Mahendran
Why don't you check from the HBA side? Is the value increasing only on the ports connected to HBAs, or on the storage ports as well? Have you done a portstatsclear and then run porterrshow and portstatsshow again and again? Have you then checked whether the error is increasing or not?
Mahendran 22 posts since Apr 6, 2010 29. Re: Rapid increasing er_bad_os at 8 Gbit speed Aug 14, 2010 4:45 AM in response to: hemant
Hi Hemant,
Let me give an overview of the infra:
We have 3 sites :
Site 1
Fabric 1 = 20 switches Fabric 2 = 20 switches
Site 2
Fabric 1 = 18 switches Fabric 2= 18 switches
Sites 3
Fabric 1 = 2 switches Fabric 2 = 2 switches
------------------------------------------------------------------------
We are experiencing discards on the ISL between two 48000 director switches (Site 1; Fabric 1), and there are quite a number of ports generating enc out and disc c3 on most of the switches connected to those director switches. We have replaced a lot of SFPs and cables, but nothing is solving this issue. Every day there are ports flapping. The firmware was upgraded from 6.1.0c ---> 6.2.2a, but this does not seem to solve the issue. We have about 6 more switches to be upgraded.
On your suggestion to replace the HBA: we did replace the HBA, but the paths on the server still go offline even after the replacement. We have a few hundred servers facing intermittent paths going offline at the moment.
I have no clue where to start again.
Regards, Mahen
Hi,
If you are facing the issue on the ISL path between two switches, then connect two more cables to create another trunk; or, if you have adjacent ports available, add two more cables there. Once these two new cables have formed a trunk, observe the errors. If you do not get any errors on the new cables, remove the old cables and observe the traffic through the new trunk.
One question: you say that you observe errors on the ISL between two switches, so are the servers showing intermittent path offline connected only to these two switches? Try to localize the servers and storage as well, meaning that both HBA and controller should be on the same switch.
Also, if these are Hitachi controllers, check the HDLM version on the hosts; you may have to upgrade the HDLM version with autofailback on and extended I/O settings.
If they are not Hitachi arrays, then you should log a call with your vendor, who will log a call with Brocade in the back end.
One more thing: you have upgraded FOS from 6.1 to 6.2, but you then have to upgrade all the switches in that fabric. Do not leave it like this.
Mahendran 22 posts since Apr 6, 2010 31. Re: Rapid increasing er_bad_os at 8 Gbit speed Aug 14, 2010 7:06 AM in response to: hemant
Hi Hemant,
I think I will propose adding ISL links between these two director switches. The servers are connected as follows: servers ---> 32port_switch ---> dir1 ----> dir3 ----> USP. So all the servers connected via these switches are facing the same problem.
Regards,
Mahendran
andreas.bergelt 600 posts since Apr 12, 2010 32. Re: Rapid increasing er_bad_os at 8 Gbit speed Aug 14, 2010 2:01 PM in response to: Mahendran
Mahendran,
you are on the right track if you add ISLs. Try to create a bigger trunk rather than adding more trunks, to simplify the routing table and to avoid a mesh or ring topology. The FOS code will manage the balancing well by itself.
Do you have old servers on the 32-port switch running at 1 or 2 Gbit speed with very old PCI bus infrastructure, and are they zoned to 8 Gbit storage ports? If so, this can cause back pressure on the ISL, which will end in discards somewhere in the fabric. The storage port can overload the server.
Did your problems come up after your FOS code update? Do you have an error history from the time before the update, or did you only start error monitoring once you had the issue?
If the errors came up after your update, then the FOS code may be causing the discards, which result in I/O errors on the servers; you can also see these as LUN resets on the storage ports.
I have seen the same in my own environment: nothing changed, only the FOS code. After adding ISLs everything went fine as before.
Don't waste your time looking at HDLM or HBA firmware. Fix the DISCARDS on the ISL!
Andreas
Hi,
Otherwise, if you can, localize the servers and storage, i.e. connect the HBAs to the switch where the storage is connected. Eliminate the hop. That will solve the issue.
a.adamson 53 posts since Jun 24, 2009 34. Re: Rapid increasing er_bad_os at 8 Gbit speed Aug 16, 2010 8:38 AM in response to: Mahendran
Mahendran,
Mahendran wrote: I think I will proceed to propose to add ISLs links between these 2 dir switches. Those servers are connected : servers ---> 32poirt_switch ---> dir1 ----> dir3---->USP . So, all the servers connected via these switches are facing same problem.
I am confused about your setup. You said a few posts back that you have three dual-fabric SANs. In your second post you mention an MPR 7500. So are all three SANs in a meta-SAN? You later say the discards are between two 48000 directors. Are these in the same fabric in the same SAN, or across SANs in the meta-SAN, via the routers?
In your first post you show details of a specific port, presumably an e-port. Is this connected to the 7500? What is the distance between sites? Are the inter-site links all in the backbone fabrics on the 7500s?
Thanks, Alastair
pierre.cornet 1 posts since Jan 18, 2010 35. Re: Rapid increasing er_bad_os at 8 Gbit speed Aug 16, 2010 11:43 AM in response to: hemant
This error can't still be noticed in a non-isl environment regardless of GBIC speed.
hemant 422 posts since Mar 3, 2010 36. Re: Rapid increasing er_bad_os at 8 Gbit speed Aug 21, 2010 8:02 AM in response to: pierre.cornet
In this case you can just ignore the error, because it will not harm your data.
aart.aalberts1 1 posts since Apr 20, 2010 37. Re: Rapid increasing er_bad_os at 8 Gbit speed Sep 25, 2010 2:59 AM
We have the same ISL issue between a 48k and a DCX (v6.3.1a), with C3 discarded frames. We set portcfgfillword on the 8 Gbps ports: no luck. It seems to me to be some internal timing issue between 4 and 8 Gbps SAN ports. Beyond that I have no clue, and neither does EMC.
erwin.vanlonden 61 posts since Jul 27, 2009 38. Re: Rapid increasing er_bad_os at 8 Gbit speed Dec 27, 2010 6:13 PM
I saw this one popping by. The below might shed some light on fill words and "invalid ordered sets". I wrote this explanation in an HDS internal communiqué, but it seems most of you will benefit as well. Let me emphasize that seeing this phenomenon has nothing whatsoever to do with hardware trouble.
-----------------------------
This goes back to the change in Fibre Channel protocol requirements for 8G and higher line speeds. On 4G and lower a so-called IDLE fill word is used, which starts with a K28.5 and is followed by three data characters (D21.4 D21.5 D21.5). This fill word is used to maintain bit and word synchronisation between two N_Ports. Due to the higher baud rate at 8G speeds and the specific bit pattern of this IDLE fill word, it is known to increase electromagnetic emissions, which might result in interference with other equipment.
To circumvent that, another fill word was adopted which was already defined in the FC-AL protocol, called ARB(ff) (K28.5 D20.4 D31.7 D31.7). This is a similar fill word but has a better bit pattern to prevent this radiation emission. These fill words are called ordered sets (a K28.5 and three data characters are an ordered set). The standard defines that during word synchronisation (that is, after speed negotiation at 8G and bit/character synchronisation) the ports shall send 6 IDLEs upon entering the Active state to obtain word synchronisation and then switch to ARB(ff) as fill word. (I spare you the entire protocol definition of link state changes.)
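As a small illustration of the two fill words named above, the Kxx.y / Dxx.y notation can be turned into the 8-bit values that go into the 8b/10b encoder: xx is the low five bits and y the high three bits. The function names here are invented for this sketch; only the character names come from the post.

```python
def char_to_byte(name: str) -> int:
    """Convert 'D21.4' or 'K28.5' notation to its 8-bit value (before 8b/10b encoding)."""
    kind, rest = name[0], name[1:]
    if kind not in ("D", "K"):
        raise ValueError(f"unknown character class in {name!r}")
    x_str, y_str = rest.split(".")
    x, y = int(x_str), int(y_str)
    if not (0 <= x <= 31 and 0 <= y <= 7):
        raise ValueError(f"out of range: {name!r}")
    return (y << 5) | x  # y is the upper 3 bits, x the lower 5 bits

def ordered_set_bytes(names):
    """Return the four byte values of an ordered set given its character names."""
    return bytes(char_to_byte(n) for n in names)

IDLE = ordered_set_bytes(["K28.5", "D21.4", "D21.5", "D21.5"])
ARB_FF = ordered_set_bytes(["K28.5", "D20.4", "D31.7", "D31.7"])

print(IDLE.hex())    # bc95b5b5
print(ARB_FF.hex())  # bc94ffff
```

Note that the byte value alone does not distinguish K28.5 from D28.5; on the wire the difference lies in the 10-bit code, which this sketch does not model.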
If, however, the speed is negotiated at 8G during this init/transition sequence and only one port switches to ARB(ff) as fill word, you will see the er_bad_os counter increase very fast. Be aware that even though no actual frame is sent from an HBA, the HBA and switch port still send these fill words constantly at the negotiated line speed. Besides the word synchronisation, the ports also use actual frames to sync their clock rate by looking at SOF and EOF delimiters. If, however, an HBA is not sending any frames and the port is not able to determine a sync state within a certain period of time, it will do a link reset (LR) and go through the sync process again.
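To get a feeling for why the counter can climb by hundreds of millions within seconds, here is a rough back-of-the-envelope sketch. It assumes the nominal Fibre Channel line rates (1.0625, 2.125, 4.25 and 8.5 Gbaud) and 40 bits on the wire per transmission word (four 10-bit characters). How the switch actually counts er_bad_os is not described in this thread, so the result is only an upper bound for an otherwise idle link.

```python
# Nominal line rates in baud for 1/2/4/8 Gbit Fibre Channel (8b/10b encoded).
LINE_RATE_BAUD = {
    1: 1_062_500_000,
    2: 2_125_000_000,
    4: 4_250_000_000,
    8: 8_500_000_000,
}
BITS_PER_WORD = 40  # four 10-bit transmission characters

def words_per_second(speed_gbit: int) -> int:
    """Transmission words per second on a link running at the given nominal speed."""
    return LINE_RATE_BAUD[speed_gbit] // BITS_PER_WORD

def seconds_to_accumulate(count: int, speed_gbit: int) -> float:
    """Time needed to see `count` words if every word on the link were counted."""
    return count / words_per_second(speed_gbit)

print(words_per_second(8))  # 212500000 words per second at 8 Gbit
print(seconds_to_accumulate(307_212_196, 8))  # delta between the two samples posted earlier
```

Under these assumptions an idle 8 Gbit link carries 212.5 million fill words per second, which is the same order of magnitude as the increase seen in the portstatsshow samples earlier in the thread.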
Brocade FOS pre 6.3.1 had only mode 0 and 1 (either IDLE or ARBff). This meant that if one device was very strict in the standard but another was not it would sync up if the switch port was configured as mode 0 but it would never switch to ARBff as required by the standard. On the other hand if the switchport was configured as mode 1 and the HBA lived by the standard it could never get into a synch state because the switch would only transmit ARBff as fillwords and the HBA would only use IDLEs.
Fill words can be replaced by other ordered sets (primitive signals or sequences). One of those is very important for buffer credit organisation and is called R_RDY. If you lose R_RDY signals, the sending device has no knowledge of whether the buffers on the other side have been cleared. This may lead to performance problems etc. I'll spare you the details.
As you can see, these fill words are used between frames. FLOGI and PLOGI are frames, so to answer your question: no, changing fill words has nothing to do with failed FLOGIs or PLOGIs. PLOGIs from initiator to target devices might sometimes get dropped, like any other frame in class 3 service, for numerous reasons. Physical errors or congestion on ISLs are among the most likely causes. A FLOGI is one frame going from an N_Port to an F_Port controller on a switch, which registers it in the fabric controller. That is the only reason why a FLOGI is needed: to obtain a 24-bit fabric address. After the PLOGI and name server registration, an RSCN is sent out to all devices in its zone, and the other end-to-end queries and registrations begin.
The fun becomes even more apparent next year with 16 and 32 G speeds, where we switch from 8b/10b to 64b/66b encoding. This encoding mechanism is already used on 10G FC, hence the reason it is not interoperable with 1/2/4/8G speeds.
In short, if you have 8G ports on HBAs and storage and have Brocade 8G ports with FOS >= 6.3.1, use mode 2. All other line speeds (1/2/4) still use IDLE fill words and require mode 0. If you use FOS < 6.3.1, it depends a bit on the implementation of the HBA/storage vendors. The recommendation is to upgrade to the latest supported firmware levels to be able to adhere to the standard.
You may find that, especially when using long-distance connections over DWDM where either transponders or TDM multiplexers are used, on some occasions these devices have not adhered to the Fibre Channel standard yet. You should consult with the DWDM provider to upgrade the firmware in those devices to be able to get a reliable connection.
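The rule of thumb in the paragraph above can be written down as a small lookup. This is only a sketch of that one recommendation: the function names are hypothetical, the version parsing assumes strings like "6.3.1a" or "v6.2.2a", and a vendor's own guidance (such as the IDLE setting recommended earlier in this thread) takes precedence over it.

```python
import re
from typing import Optional, Tuple

def parse_fos_version(version: str) -> Tuple[int, int, int]:
    """Turn a string such as 'v6.3.1a' into (6, 3, 1), ignoring the patch letter."""
    match = re.match(r"v?(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised FOS version: {version!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return (major, minor, patch)

def suggested_fill_word_mode(fos_version: str, speed_gbit: int) -> Optional[int]:
    """Return the fill word mode suggested in the post, or None if it 'depends'."""
    if speed_gbit in (1, 2, 4):
        return 0  # lower line speeds still use IDLE fill words
    if speed_gbit == 8 and parse_fos_version(fos_version) >= (6, 3, 1):
        return 2  # IDLE during link init, then ARB(ff)
    return None   # FOS < 6.3.1 at 8G: depends on the HBA/storage implementation

print(suggested_fill_word_mode("6.3.1a", 8))   # 2
print(suggested_fill_word_mode("v6.2.2a", 8))  # None
print(suggested_fill_word_mode("v6.2.2a", 4))  # 0
```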
I hope this explains a bit these changes.
Cheers E
andreas.bergelt 600 posts since Apr 12, 2010 39. Re: Rapid increasing er_bad_os at 8 Gbit speed Dec 27, 2010 2:46 AM in response to: erwin.vanlonden
Hello Erwin,
many thanks for these details and the very good explanation of ARBff and IDLE.
What happens if you fix the switch port and the storage port of a Hitachi array to 8 Gbit? I assume that both devices have to start with ARBff right from the beginning. Is this right?
Why do the Hitachi arrays have difficulties getting in sync with the switch port? I have seen switch ports change to state faulty.
Is this related to misbehavior of the storage port firmware on the arrays?
From a user's point of view it is very confusing if some devices work with IDLE and some with ARBff at 8 Gbit speed.
Andreas
erwin.vanlonden 61 posts since Jul 27, 2009 40. Re: Rapid increasing er_bad_os at 8 Gbit speed Dec 27, 2010 6:10 PM in response to: andreas.bergelt
Andreas,
Hitachi arrays follow the standard. Very strictly. End of story. The problem the Brocade switches (or rather their administrators) have is that they have options. :-) If you choose the incorrect one it doesn't work with any vendor's array.
According to the standard, during the init state the ports still have to use IDLE primitives irrespective of speed. Upon entering the Active state both ports have to switch to ARBff primitives after sending at least 6 IDLEs and receiving at least 2. (This is also the normal process in the FC-FS part.) So only one mode (2) adheres to that. The reason Brocade came up with mode 3 is that some vendors had 8G implementations while some issues in the standard weren't totally fleshed out. There was one issue where there could be a deadlock during the init phase in which both ports could never come online. One port would send a NOS primitive, which started off the OLS/LR/LRR primitive sequence. This would then still end up in the deadlock situation. For this reason mode 3 was implemented, where the Brocade switch waits for a NOS and then switches to ARBff.
As you can see all these 4 modes serve their purpose because some vendors had deviations in their implementations during the creation of this part of the standard. Brocade just gave them the options. Only mode 2 adheres strictly to the standard.
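The standard behaviour described above (mode 2) boils down to a small decision rule. The sketch below only models what is stated in this post for 8G links; the function and parameter names are hypothetical, and other speeds are simply treated as IDLE-only.

```python
def fill_word_to_send(speed_gbit: int, link_active: bool,
                      idles_sent: int, idles_received: int) -> str:
    """Pick the fill word a port should transmit between frames.

    Models only the rule quoted in the post: IDLE during initialisation,
    then ARB(FF) at 8G once the port is Active, has sent at least
    6 IDLEs and has received at least 2.
    """
    if speed_gbit != 8:
        return "IDLE"      # 1/2/4G keep using IDLE as fill word
    if not link_active:
        return "IDLE"      # init state: IDLE irrespective of speed
    if idles_sent >= 6 and idles_received >= 2:
        return "ARB(FF)"   # thresholds met, switch fill word
    return "IDLE"          # Active, but still counting IDLEs


print(fill_word_to_send(8, False, 0, 0))
print(fill_word_to_send(8, True, 6, 1))
print(fill_word_to_send(8, True, 6, 2))
```

A port that is hard-set to ARBff from the start (mode 1) skips the IDLE phase entirely, which is why a strictly standard-compliant peer may refuse to sync with it.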
Hope this explains some of the reasons behind these options as well as some internals.
Cheers E
BryanO 1 posts since Sep 21, 2010 41. Re: Rapid increasing er_bad_os at 8 Gbit speed Jan 20, 2011 10:20 AM in response to: Mahendran
Mahendran,
Any update on if you were able to resolve your issues?
Thanks
sai.nikhil 4 posts since Dec 17, 2010 42. Re: Rapid increasing er_bad_os at 8 Gbit speed May 13, 2011 8:30 AM
I have the same issue, but it is on the 8G switch port where the VMware host is connected. Does this problem have a fix?
Can I try to peg down the speed of the port to 4G and see if it resolves the issue?
Thanks, SK
andreas.bergelt 600 posts since Apr 12, 2010 43. Re: Rapid increasing er_bad_os at 8 Gbit speed May 13, 2011 8:46 AM in response to: sai.nikhil
Depends on your FOS code. Check the fill word setting on the SAN switch port and have a play with it. Try mode 2 or 1. Changing this setting will disrupt I/O because it causes a link reset.
Andreas
sai.nikhil 4 posts since Dec 17, 2010 44. Re: Rapid increasing er_bad_os at 8 Gbit speed May 13, 2011 8:52 AM in response to: andreas.bergelt
Thank you for the prompt reply, Andreas.
The interesting point I note is that this is happening on the switch ports in both fabrics that the host is connected to.
One switch is at 6.4.1b and other side is on 6.3.0b.
Ok. Does this change prove to solve the issue?
Currently I would say this is not a real issue. Traffic will flow through the switch without a big performance issue.
I assume that on FOS 6.3.0b this mode is not present. So change it on FOS 6.4.1b and check if everything is OK on the server side.
Andreas
sai.nikhil 4 posts since Dec 17, 2010 46. Re: Rapid increasing er_bad_os at 8 Gbit speed May 13, 2011 9:05 AM in response to: andreas.bergelt
v6.3.0b has only 2 modes:
0 - idle-idle
1 - arbff-arbff
v6.4.1b has 4 modes:
0 - idle-idle
1 - arbff-arbff
2 - idle-arbff
3 - aa-then-ia
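For reference, the two lists can be kept as a single lookup table. The cut-off at 6.3.1 is an assumption pieced together from this thread (6.3.0b shows two modes, 6.4.1b shows four, and mode 2 is recommended from 6.3.1 on); the helper name is hypothetical.

```python
# Fill word modes as listed in this thread.
FILL_WORD_MODES = {
    0: "idle-idle",
    1: "arbff-arbff",
    2: "idle-arbff",
    3: "aa-then-ia",
}


def available_modes(fos: tuple) -> dict:
    """Return the modes a FOS release offers, given its version as a tuple like (6, 3, 0).

    Assumption: modes 2 and 3 exist from 6.3.1 onwards.
    """
    limit = 4 if fos >= (6, 3, 1) else 2
    return {mode: name for mode, name in FILL_WORD_MODES.items() if mode < limit}


print(available_modes((6, 3, 0)))
print(available_modes((6, 4, 1)))
```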
Technically I believe that this problem can be fixed, if the host can be moved to some existing 4G connections. Am I assuming it right?
sai.nikhil 4 posts since Dec 17, 2010 47. Re: Rapid increasing er_bad_os at 8 Gbit speed May 13, 2011 9:11 AM in response to: sai.nikhil
And..to add to that regarding performance. The VM team is seeing an impact in the form of backups running slow. I'm not completely sure, if this would be the reason, but at the moment, I cannot find any other errors.
Thanks, SK
In my case it didn't have an impact on performance.
The switch expects different fill words than the HBA is sending, and this causes the error counter to increase. Between data frames the switch expects ARBff fill words but the HBA is sending IDLE fill words. This is due to different implementations of the 8 Gbit FC standard. Brocade found out that ARBff is better than IDLE at 8 Gbit speed. If you slow the port down to 4 Gbit everything is fine because both devices are sending IDLEs. Neither IDLE nor ARBff fill words should cause a performance issue. I suggest looking somewhere else to find the performance issue.
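To judge whether er_bad_os is really still climbing after a fill word change, it helps to compare two readings taken a known time apart rather than looking at the absolute value. A minimal sketch, assuming you have copied the two counter values by hand from the switch; the function name is hypothetical and nothing here talks to a switch.

```python
from typing import Optional


def errors_per_second(before: int, after: int, seconds: float) -> Optional[float]:
    """Rate at which a counter grew between two readings.

    Returns None when the second reading is lower than the first, which
    suggests the counter was cleared in between and the samples cannot
    be compared.
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    if after < before:
        return None
    return (after - before) / seconds


# Two made-up readings of er_bad_os taken 60 seconds apart.
print(errors_per_second(1000, 4000, 60))
```

A rate that stays at zero after the change means the fill word mismatch is gone; a steady non-zero rate means the two ends still disagree.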
Andreas