Tuesday, 17 March 2015

Cache Error on SRX5K SPC II causes flowd process core

Product Affected:

SRX5400/5600/5800 using SPC II (SRX5K-SPC-4-15-320) and running Junos OS:
  • 12.1X44-D10/D15/D20/D25/D30/D35
  • 12.1X45-D10/D15/D20/D25/D30
  • 12.1X46-D10/D15/D20
  • 12.1X47-D10
Alert Description:
A cache error exception can occur randomly, under rare conditions, on the SRX5K SPC II (Services Processing Card, SRX5K-SPC-4-15-320) when the SPC references an invalid physical address in memory. The exception triggers a flowd process core, and all SPCs on the local node restart. If the chassis cluster feature is enabled, the data plane fails over to the other node.

For example, output similar to the following is shown when this issue occurs:

root@SRX5K> show system core-dumps

-rw-rw----  1 nobody wheel 941023387 May 12 23:45 /var/tmp/flowd_xlr-SPC7_PIC3.core.0.gz

root@SRX5K> show log messages
...
May 12 23:43:31   (FPC Slot 7, PIC Slot 3) SPC7_PIC3 kernel: %USER-3: cpuid = 26
May 12 23:43:31   (FPC Slot 7, PIC Slot 3) SPC7_PIC3 kernel: %USER-3-ADDRESS_ERR: pid 251 (flowd_xlr), uid 0: pc 0xffffffff802927e0 got a read fault at 0xffffffff802927e0
May 12 23:43:31   (FPC Slot 7, PIC Slot 3) SPC7_PIC3 kernel: %USER-3: Trapframe Register Dump:
May 12 23:43:31   (FPC Slot 7, PIC Slot 3) SPC7_PIC3 kernel: %USER-3: zero: 0000000000000000  at: fffffffffffffdff  v0: 0000000000000001  v1: 00000001c9322248
May 12 23:43:31   (FPC Slot 7, PIC Slot 3) SPC7_PIC3 kernel: %USER-3: a0: ffffffff80a00406  a1: 00000000200f09fc  a2: 0000000000000000  a3: 00000000243ab1b8
May 12 23:43:31   (FPC Slot 7, PIC Slot 3) SPC7_PIC3 kernel: %USER-3: t0: 0000000000009e63  t1: 00000001eb0e4a50  t2: 00000002dab903c0  t3: 00000002dab90390
May 12 23:43:31   (FPC Slot 7, PIC Slot 3) SPC7_PIC3 kernel: %USER-3: ta0: ffffffffd1ebbad8 ta1: 0000000000000000 ta2: 0000000000000000 ta3: 0000000000000000
May 12 23:43:31   (FPC Slot 7, PIC Slot 3) SPC7_PIC3 kernel: %USER-3: t8: 000000000000003a  t9: 0000000020105650  s0: 000000000000001a  s1: 00000000243ab288
May 12 23:43:31   (FPC Slot 7, PIC Slot 3) SPC7_PIC3 kernel: %USER-3: s2: 00000000243ab1b8  s3: 00000000243ab1b8  s4: 000000000000001a  s5: 00000000241d0000
May 12 23:43:31   (FPC Slot 7, PIC Slot 3) SPC7_PIC3 kernel: %USER-3: s6: 0000000021950000  s7: 00000001c9321e88  k0: 0000000000000000  k1: 0000000000000000
May 12 23:43:31   (FPC Slot 7, PIC Slot 3) SPC7_PIC3 kernel: %USER-3: gp: 0000000000000000  sp: 0000000fdd5eaea0  s8: 000000000000001a  ra: 00000000200f0bc8
May 12 23:43:31   (FPC Slot 7, PIC Slot 3) SPC7_PIC3 kernel: %USER-3: sr: 00000000508198f3 mullo: 0000000000000000    mulhi: 0000000000000000
May 12 23:43:31   (FPC Slot 7, PIC Slot 3) SPC7_PIC3 kernel: %USER-3: pc: ffffffff802927e0 cause: 0000000000000010 badvaddr: ffffffff802927e0
May 12 23:43:31   (FPC Slot 7, PIC Slot 3) SPC7_PIC3 kernel: %USER-3: pc address 0xffffffff802927e0 is inaccessible, pte = 0x0
May 12 23:43:31   (FPC Slot 7, PIC Slot 3) SPC7_PIC3 kernel: %USER-2: flowd core, 
May 12 23:43:31   fpc7 Cowra: %PFE-3: XLP3 flowd_xlr core dump, current state SPU_STATE_WORKING. 
May 12 23:43:31   fpc7 flowd_xlr coredump start, ecc regs: %PFE-3: 0,0,0,0 
May 12 23:43:31   (FPC Slot 7, PIC Slot 3) SPC7_PIC3 kernel: %USER-2: stop xaui rx and drain packets on lbt cpu 4
May 12 23:43:31   (FPC Slot 7, PIC Slot 3) SPC7_PIC3 kernel: %USER-2: msgring_drain_process: bind thread to hwtid (4) cpuid(4)
May 12 23:43:31   (FPC Slot 7, PIC Slot 3) SPC7_PIC3 kernel: %USER-2: [msgring
May 12 23:43:31   (FPC Slot 7, PIC Slot 3) SPC7_PIC3 kernel: %USER-2: _drain_process]476 msges drained
May 12 23:43:31   (FPC Slot 7, PIC Slot 3) SPC7_PIC3 kernel: %USER-2: Kernel thread "msgdrainthr4" (pid 41228) exited prematurely.
May 12 23:43:32   fpc7 Cowra: %PFE-3: XLP3 flowd_xlr down, current state SPU_STATE_CRASH. info: Flowd down, flowd_xlr_statusfound flowd in coredump.  
May 12 23:43:32   (FPC Slot 7, PIC Slot 3) SPC7_PIC3 kernel: %USER-3: spu_cobar_send_mail_unlocked: New mail (6) tried 2 times to be sent, finally sent
May 12 23:45:05   /kernel: %KERN-4: peer_inputs:4300 VKS0 closing connection peer type 10 indx 31 err 0
May 12 23:45:05   /kernel: %KERN-3: pfe_send_failed(index 31, type 10), err=32
May 12 23:45:05   (FPC Slot 7, PIC Slot 3) SPC7_PIC3 kernel: %USER-4: peer_inputs:4300 VKS0 closing connection peer type 10 indx 31 err 0
May 12 23:45:05   (FPC Slot 7, PIC Slot 3) SPC7_PIC3 kernel: %USER-3: pfe_send_failed(index 31, type 10), err=32
May 12 23:45:10   /kernel: %KERN-3: ###rdp_usr_detach tcb NULL socket 0xc6a824d4
May 12 23:45:15   fpc7 Cowra: %PFE-3: XLP3 flowd_xlr down complete. 
May 12 23:45:15   (FPC Slot 7, PIC Slot 3) SPC7_PIC3 init: %AUTH-6: flowd_xlr (PID 173) exited with status=0 Normal Exit
May 12 23:45:16   chassisd[1506]: %DAEMON-5-CHASSISD_IFDEV_DETACH_PIC: ifdev_detach_pic(7/3)
May 12 23:45:16   chassisd[1506]: %DAEMON-5-CHASSISD_SNMP_TRAP7: SNMP trap generated: Fru Failed (jnxFruContentsIndex 7, jnxFruL1Index 8, jnxFruL2Index 0, jnxFruL3Index 0, jnxFruName FPC: SRX5k SPC II @ 7/*/*, jnxFruType 3, jnxFruSlot 7)
May 12 23:45:16   alarmd[975]: %DAEMON-4: Alarm set: PIC color=RED, class=CHASSIS, reason=FPC 7 PIC 3 SPU flowd core dump complete
May 12 23:45:16   chassisd[1506]: %DAEMON-5-CHASSISD_PIC_OFFLINE_NOTICE: Taking PIC 3 in FPC 7 offline: SPU flowd core dump complete
May 12 23:45:16   craftd[976]: %DAEMON-4:  Major alarm set, FPC 7 PIC 3 SPU flowd core dump complete
May 12 23:45:16   chassisd[1506]: %DAEMON-5-CHASSISD_FRU_OFFLINE_NOTICE: Taking FPC 7 offline: Reset on SPC/SPU failure
May 12 23:45:16   chassisd[1506]: %DAEMON-5-CHASSISD_IFDEV_DETACH_FPC: ifdev_detach_fpc(7)
May 12 23:45:16   chassisd[1506]: %DAEMON-5-CHASSISD_FRU_OFFLINE_NOTICE: Taking FPC 0 offline: Reset on SPC/SPU failure
May 12 23:45:16   chassisd[1506]: %DAEMON-5-CHASSISD_IFDEV_DETACH_FPC: ifdev_detach_fpc(0)
....

The messages above show the following sequence:
  1. SPU kernel detected a user space address error - %USER-3-ADDRESS_ERR: pid 251 (flowd_xlr), uid 0: pc 0xffffffff802927e0 got a read fault at 0xffffffff802927e0
  2. SPU kernel started to generate the flowd core after collecting register values - %USER-2: flowd core
  3. Chassisd detached the affected SPC - %DAEMON-5-CHASSISD_IFDEV_DETACH_PIC: ifdev_detach_pic(7/3)
  4. Chassisd took the affected SPC offline due to the SPU flowd core dump - %DAEMON-5-CHASSISD_PIC_OFFLINE_NOTICE: Taking PIC 3 in FPC 7 offline: SPU flowd core dump complete
  5. Chassisd reset all SPCs - %DAEMON-5-CHASSISD_FRU_OFFLINE_NOTICE: Taking FPC 0 offline: Reset on SPC/SPU failure
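
The five-step sequence can be looked for mechanically in a saved copy of the messages log. The following Python sketch is only an illustration: the patterns are copied from the sample messages in this alert, and the names `STAGES`, `find_stages`, and `matches_signature` are hypothetical. A match shows that the log follows the same sequence; it does not by itself confirm that the flowd core was caused by the cache error described here.

```python
import re

# Patterns copied from the sample log messages in this alert.
# The stage names are hypothetical labels chosen for this sketch.
STAGES = [
    ("address_error", re.compile(r"ADDRESS_ERR: pid \d+ \(flowd_xlr\)")),
    ("flowd_core", re.compile(r"%USER-2: flowd core")),
    ("pic_detach", re.compile(r"CHASSISD_IFDEV_DETACH_PIC: ifdev_detach_pic\(\d+/\d+\)")),
    ("pic_offline", re.compile(r"CHASSISD_PIC_OFFLINE_NOTICE: .*SPU flowd core dump complete")),
    ("fpc_reset", re.compile(r"CHASSISD_FRU_OFFLINE_NOTICE: Taking FPC \d+ offline: "
                             r"Reset on SPC/SPU failure")),
]


def find_stages(lines):
    """Return the stage names seen, in order of first appearance."""
    seen = []
    for line in lines:
        for name, pattern in STAGES:
            if name not in seen and pattern.search(line):
                seen.append(name)
    return seen


def matches_signature(lines):
    """True when all five stages appear in the order listed in the alert."""
    return find_stages(lines) == [name for name, _ in STAGES]
```

To use it, read the log file into a list of lines and pass the list to `matches_signature`; `find_stages` reports how far a partial sequence got.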

This issue can be tracked via PR1005195.


Solution:
This issue is fixed in Junos OS 12.1X44-D40, 12.1X46-D25, 12.1X47-D15 and higher versions.

NOTE: There is no known way to detect this condition before the cache error causes the flowd core, and no known workaround is available. If the system reports a flowd core, please open a "Technical Service Request" in Case Manager.
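
As a rough illustration, the affected and fixed release lists for PR1005195 can be encoded as a small lookup. The Python sketch below uses hypothetical names (`parse_release`, `pr1005195_status`) that are not part of any Juniper tool. It assumes release strings of the form `12.1X44-D35` or `12.1X44-D45.2`, and it checks only the software release, not whether an SPC II is installed. Anything this alert does not list is reported as `unknown`, including 12.1X45 releases after D30, for which no fixed release is named.

```python
import re

# Affected D-releases per train, as listed in this alert.
AFFECTED = {
    "12.1X44": {10, 15, 20, 25, 30, 35},
    "12.1X45": {10, 15, 20, 25, 30},
    "12.1X46": {10, 15, 20},
    "12.1X47": {10},
}

# First fixed D-release per train, as listed under Solution.
# The alert names no fixed release for 12.1X45, so none is recorded.
FIRST_FIXED = {"12.1X44": 40, "12.1X46": 25, "12.1X47": 15}

_RELEASE_RE = re.compile(r"(\d+\.\d+X\d+)-D(\d+)(?:\.\d+)?")


def parse_release(version):
    """Split '12.1X44-D45.2' into its train and D-number: ('12.1X44', 45)."""
    match = _RELEASE_RE.fullmatch(version.strip())
    if match is None:
        raise ValueError(f"unrecognized release string: {version!r}")
    return match.group(1), int(match.group(2))


def pr1005195_status(version):
    """Return 'affected', 'fixed', or 'unknown' for a release string."""
    train, d_number = parse_release(version)
    if d_number in AFFECTED.get(train, set()):
        return "affected"
    if train in FIRST_FIXED and d_number >= FIRST_FIXED[train]:
        return "fixed"
    return "unknown"
```

For example, `pr1005195_status("12.1X44-D35")` returns `"affected"` and `pr1005195_status("12.1X46-D30.2")` returns `"fixed"`.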

Tuesday, 10 March 2015

Junos OS 12.1X46-D30 is not recommended with IDP feature due to significant detection rate drop

Product Affected:

SRX100, SRX110, SRX1400, SRX3400, SRX3600, SRX5400, SRX5600, and SRX5800 running with Junos OS 12.1X46-D30
 
Alert Description:

Junos OS 12.1X46-D30 includes a code change to the software DFA (Deterministic Finite Automaton), which is used for IDP signature pattern matching. As a result, the IDP detection rate drops significantly when all of the following conditions are met:
  • Upgraded to Junos OS 12.1X46-D30
  • IDP feature is configured on a security policy with IDP active-policy
  • Software DFA is used

What SRX platforms use the software DFA?
  • SRX100, SRX110, SRX1400, SRX3400, SRX3600, SRX5400, SRX5600, and SRX5800 use software DFA
  • SRX210, SRX220, SRX240, SRX550, and SRX650 use hardware DFA
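
The three conditions and the platform lists above can be combined into a simple check. The Python sketch below is illustrative only: `dfa_type` and `idp_detection_affected` are hypothetical names, the platform sets are copied from this alert, and model suffixes such as `H2` are assumed to be irrelevant to the DFA type.

```python
import re

# Platform lists as given in this alert.
SOFTWARE_DFA = {"SRX100", "SRX110", "SRX1400", "SRX3400",
                "SRX3600", "SRX5400", "SRX5600", "SRX5800"}
HARDWARE_DFA = {"SRX210", "SRX220", "SRX240", "SRX550", "SRX650"}


def dfa_type(platform):
    """Return 'software', 'hardware', or 'unknown' for a model such as 'SRX100H2'."""
    match = re.match(r"SRX\d+", platform.strip().upper())
    base = match.group(0) if match else ""
    if base in SOFTWARE_DFA:
        return "software"
    if base in HARDWARE_DFA:
        return "hardware"
    return "unknown"


def idp_detection_affected(platform, version, idp_active):
    """True when all three conditions listed in the alert hold."""
    # Accept '12.1X46-D30' with or without a build suffix such as '.2'.
    on_d30 = re.fullmatch(r"12\.1X46-D30(\.\d+)?", version.strip()) is not None
    return on_d30 and bool(idp_active) and dfa_type(platform) == "software"
```

For example, `idp_detection_affected("SRX5800", "12.1X46-D30.2", True)` is true, while the same call for an SRX240 (hardware DFA) or for release 12.1X46-D31 is false.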

Solution:
This issue is fixed in Junos OS 12.1X46-D31 (Service Release), 12.1X46-D35 (scheduled for release at the end of April 2015), and later releases.


12.1X46-D31 Download Links

SRX5400/5600/5800
SRX1400/3400/3600
SRX100/110

Tuesday, 3 March 2015

SRX: JTAC Recommended Junos Software Versions

JTAC recommended versions of Junos software are listed to assist with determining which version of software to download and install.

 SRX Series Services Gateways

Platform                              JTAC Recommended Junos Software  Release Type  Last updated
SRX100B/H                             Junos 12.1X44-D45.2              Standard      19 Feb 2015
SRX100H2 (*1)                         Junos 12.1X44-D45.2              Standard      19 Feb 2015
SRX110H                               Junos 12.1X44-D45.2              Standard      19 Feb 2015
SRX110H2                              Junos 12.1X44-D45.2              Standard      19 Feb 2015
SRX210B/H/BE/HE                       Junos 12.1X44-D45.2              Standard      19 Feb 2015
SRX210H2                              Junos 12.1X44-D45.2              Standard      19 Feb 2015
SRX220H                               Junos 12.1X44-D45.2              Standard      19 Feb 2015
SRX220H2                              Junos 12.1X44-D45.2              Standard      19 Feb 2015
SRX240B/H/B2/H2                       Junos 12.1X44-D45.2              Standard      19 Feb 2015
SRX550                                Junos 12.1X44-D45.2              Standard      19 Feb 2015
SRX650                                Junos 12.1X44-D45.2              Standard      19 Feb 2015
SRX1400 (*2)                          Junos 12.1X44-D45.2              Standard      19 Feb 2015
SRX1400 w/NP-IOC (See Note 2)         Junos 12.1X44-D45.2              Standard      19 Feb 2015
SRX3400 (*2)                          Junos 12.1X44-D45.2              Standard      19 Feb 2015
SRX3400 w/NP-IOC (See Note 2)         Junos 12.1X44-D45.2              Standard      19 Feb 2015
SRX3600 (*2)                          Junos 12.1X44-D45.2              Standard      19 Feb 2015
SRX3600 w/NP-IOC (See Note 2)         Junos 12.1X44-D45.2              Standard      19 Feb 2015
SRX5400 (See Note 6)                  Junos 12.1X46-D30.2              Standard      25 Feb 2015
SRX5600                               Junos 12.1X44-D45.2              Standard      19 Feb 2015
SRX5600 w/SPC II (See Notes 3, 4, 5)  Junos 12.1X44-D45.2              Standard      19 Feb 2015
SRX5600 w/IOC II (See Note 6)         Junos 12.1X46-D30.2              Standard      25 Feb 2015
SRX5800                               Junos 12.1X44-D45.2              Standard      19 Feb 2015
SRX5800 w/SPC II (See Notes 3, 4, 5)  Junos 12.1X44-D45.2              Standard      19 Feb 2015
SRX5800 w/IOC II (See Note 6)         Junos 12.1X46-D30.2              Standard      25 Feb 2015