
Chapter 7

Special Features:

The Hung Trunk detector

One of the most feared problems that long-distance providers face is the hung trunk: a call that has finished but, for some reason, is still holding one or both of the channels involved. The cause may be equipment failure or human error (the caller failed to hang up), but the result is equally disastrous in either case. A hung trunk not only ties up valuable resources but also produces huge bills that upset clients and breed distrust.

OmniBox includes a feature that tests every channel in use for inconsistencies, for example an outbound channel routed to an inbound channel that is disconnected, or to an inbound channel that is already routed to a different outbound channel. Inconsistencies should not happen, of course, but a switch is a very complex system whose states can follow a myriad of different paths, so one cannot be too careful. If an outbound channel passes the inconsistency test but has been connected for longer than a specified time (StartAnalysis in table dia_InCh), the real hung trunk analysis starts.
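To make the inconsistency test concrete, here is a minimal sketch of the two examples given above. The Channel record and its fields are hypothetical, not OmniBox's internal structures, and treating an unrouted outbound channel as consistent is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Channel:
    """Hypothetical view of one switch channel (not OmniBox's real structure)."""
    number: int
    connected: bool           # True while this side of the call is up
    routed_to: Optional[int]  # number of the channel this one is routed to, if any


def is_consistent(outbound: Channel, channels: Dict[int, Channel]) -> bool:
    """Cross-check an outbound channel against the inbound channel it is routed to."""
    if outbound.routed_to is None:
        return True  # assumption: an unrouted outbound has nothing to cross-check
    inbound = channels.get(outbound.routed_to)
    if inbound is None or not inbound.connected:
        return False  # routed to an inbound channel that is disconnected (or unknown)
    # The inbound must point back at this outbound, not at a different one.
    return inbound.routed_to == outbound.number
```

For example, outbound channel 101 routed to inbound channel 1 is consistent only while channel 1 is connected and routed back to 101.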

OmniBox spins a thread for each outbound channel to be analyzed for hung trunks. The process goes like this: a voice resource is requested to listen to the channel and determine what is going on. If voice or fax is heard, the call is OK: the resource is returned and the thread waits 15 seconds before testing again. But if:

In some cases a caller's disconnect does not generate the signal needed to terminate the call; the hung trunk detector can then work as a "hang-up" detector as well. The only difference is timing: for hung trunk detection the analysis may start after 5-10 minutes, whereas for hang-up detection it must be brought down to 1-2 minutes. Hang-up detection uses more voice resources and CPU time than hung trunk detection, but that is the only cost.

Because this feature is rather sophisticated, it frequently becomes the prime suspect in "calls are being cut" trouble tickets. The OmniBox suite provides a means of proving its innocence: an action in the Monitor menu disables the feature. If the problem is still there (and believe me, it usually is), then all "serial call killer" charges must be dropped.
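The per-channel loop described earlier in this section can be sketched as follows. This is an illustration under stated assumptions: listen() stands in for borrowing a voice resource, classifying what it hears and returning it; the Heard enum and function names are hypothetical; STRIKE_THRESHOLD is a made-up value, because the real threshold is hard-coded and undocumented; and the 15-second retest delay is assumed to apply after silence as well as after voice or fax.

```python
import time
from enum import Enum, auto
from typing import Callable


class Heard(Enum):
    VOICE = auto()
    FAX = auto()
    SILENCE = auto()
    SIT_TONE = auto()


RETEST_DELAY_S = 15.0   # from the text: wait 15 seconds before testing again
STRIKE_THRESHOLD = 3    # hypothetical: the real threshold is hard-coded and undocumented


def analyze_outbound(listen: Callable[[], Heard],
                     call_active: Callable[[], bool],
                     truncate_call: Callable[[], None],
                     sleep: Callable[[float], None] = time.sleep,
                     threshold: int = STRIKE_THRESHOLD) -> str:
    """Hung trunk loop for one outbound channel (one such thread per channel).

    Returns "truncated" if the call was cut, or "call ended" if it ended normally.
    """
    strikes = 0
    while call_active():
        heard = listen()
        if heard in (Heard.SILENCE, Heard.SIT_TONE):
            strikes += 1
            if strikes > threshold:   # "over" the threshold
                truncate_call()
                return "truncated"
        # Voice or fax means the call is alive; either way, test again later.
        sleep(RETEST_DELAY_S)
    return "call ended"
```

Passing sleep in as a parameter keeps the sketch testable without real 15-second waits.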

The hung trunk problem involving IVRs

Channels doing IVR functions always own a voice resource while they are active. This voice resource is used for prompting and for collecting digits. Even after prompting is over and a conversation is in progress, the voice resource keeps waiting for a digit (## terminates the call and returns the caller to the menu). If the hung trunk detector were to request a resource to do its analysis, the resource administrator would answer "You already have one, use it!", but using it would truncate whatever it was playing back or any get-digit operation in progress. So the hung trunk detector cannot be allowed to analyze channels doing IVR functions.

Hung trunk analysis hits a similar problem with channels that have attached resources, such as analog lines or E1 time slots using R2MF. Their attached resources cannot be borrowed: they are attached precisely because the channel needs them all the time.

The rule is simple: if the channel already owns a resource, skip the analysis. This means the hung trunk method is limited to resource-less channels.
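Putting the rules of this section together, a channel qualifies for hung trunk analysis only if it passed the inconsistency test, owns no resource, and has been connected longer than StartAnalysis. The sketch below is hypothetical: the OutboundSnapshot type and should_analyze function are invented for illustration, and StartAnalysis is assumed to be expressed in seconds.

```python
from dataclasses import dataclass


@dataclass
class OutboundSnapshot:
    """Hypothetical snapshot of an outbound channel, for illustration only."""
    connected_s: float         # how long the call has been connected, in seconds
    owns_resource: bool        # IVR voice resource, analog line, R2MF time slot...
    passed_consistency: bool   # result of the inconsistency test


def should_analyze(ch: OutboundSnapshot, start_analysis_s: float) -> bool:
    """Decide whether hung trunk analysis may start on this channel."""
    if not ch.passed_consistency:
        return False   # only channels that passed the inconsistency test qualify
    if ch.owns_resource:
        return False   # never borrow a resource the channel already owns
    return ch.connected_s > start_analysis_s   # connected longer than StartAnalysis
```

If StartAnalysis is indeed in seconds, the figures given earlier translate to roughly 300-600 for hung trunk detection and 60-120 for hang-up detection.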

However, OmniBox does have a defense against IVR hung trunks. Since prompt playback and digit collection operations are subject to specified termination conditions, OmniBox uses these conditions as an alternative hung trunk detector. The termination conditions specified by the OmniBox IVR are:

  Condition                              Result
  1. MAX_DTMF = 1                        Early return with a digit
  2. MAX_TIME = 7 or 3 s                 Return with no digit
  3. MAX_SILENCE = MAX_TIME - 0.1 s      Increment the Strike count by one, return
  4. FAX_TONE                            Early return with no digit
  5. BUSY_TONE1                          Hang up
  6. BUSY_TONE2                          Hang up

If a busy tone, as defined in the CDP file (see chapter 3, dia_V_Boards), is detected while waiting for a digit, the call is dropped. This is especially useful when calls originate in an FXO channel bank that generates no signaling upon disconnect but plays a busy cadence as a disconnection tone instead.

A fax tone produces an early return without a digit; this can trigger special functions in the IVR to deal with faxes.

A DTMF detection terminates the get-digit or playback operation and returns with a digit, which produces the corresponding state change in the IVR state machine.

If the channel has remained silent for MAX_TIME - 0.1 s, a "Strike" counter is incremented. The hung trunk detector increments the same counter if it finds silence (or SIT tones) on the outbound channel. If the hung trunk detector finds that this counter is over a hard-coded internal threshold, it truncates the call.

If none of the above is detected, not even silence, the function returns normally after MAX_TIME (or at the end of playback) with no digit and without incrementing the Strike counter.
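The termination handling above can be summarized as a small dispatcher. This is a sketch only: the Termination, Outcome and on_termination names are hypothetical and do not correspond to the Dialogic termination-condition API, and the action strings stand in for whatever the IVR state machine actually does.

```python
from enum import Enum
from typing import NamedTuple, Optional


class Termination(Enum):
    MAX_DTMF = 1      # one digit received
    MAX_TIME = 2      # 7 s or 3 s elapsed (or playback ended)
    MAX_SILENCE = 3   # silent for MAX_TIME - 0.1 s
    FAX_TONE = 4
    BUSY_TONE1 = 5
    BUSY_TONE2 = 6


class Outcome(NamedTuple):
    action: str            # what the IVR state machine should do next
    digit: Optional[str]   # the digit collected, if any
    strikes: int           # updated Strike counter


def max_silence(max_time_s: float) -> float:
    """MAX_SILENCE sits just below MAX_TIME, per the table of conditions."""
    return max_time_s - 0.1


def on_termination(reason: Termination, digit: Optional[str], strikes: int) -> Outcome:
    if reason is Termination.MAX_DTMF:
        return Outcome("digit", digit, strikes)        # early return with a digit
    if reason is Termination.MAX_SILENCE:
        return Outcome("no_digit", None, strikes + 1)  # silence: one more strike
    if reason is Termination.FAX_TONE:
        return Outcome("fax", None, strikes)           # early return, IVR fax handling
    if reason in (Termination.BUSY_TONE1, Termination.BUSY_TONE2):
        return Outcome("hang_up", None, strikes)       # disconnect tone: drop the call
    return Outcome("no_digit", None, strikes)          # plain MAX_TIME: no strike
```

Setting MAX_SILENCE just below MAX_TIME presumably ensures that a completely silent interval ends on the silence condition, and so counts a strike, before the plain timeout can fire.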

 

The flow diagram for the hung trunk detector:

 

The Watchdog for the OmniBox

There is no such thing as a failure-proof communication system. So much can go wrong in a complex system that the best you can do is "be the first to know", and know it fast. That is what the Watchdog feature is all about: it pages everyone on a list when something abnormal happens.

 

(The Watchdog feature in version 3 and later of the OmniMonitor makes the approach described here somewhat obsolete; see E-mail Paging in chapter 6.)

 

The pages can be sent through a modem whose COM port is specified in the environment variable WD_COMPORT. A positive value (1, 2, etc.) is interpreted as an actual modem connected to that COM port on the computer where the Watchdog is running; a negative value means the analog board in the OmniBox is used for paging. Saving a modem is the only advantage of the negative setting, however, since it is more reliable to have the Watchdog watch from a different computer in the network, in case the OmniBox computer goes down.
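A minimal sketch of how WD_COMPORT is interpreted. The function name and the returned labels are hypothetical; the manual does not say what a value of 0 means, so the sketch rejects it rather than guess.

```python
import os
from typing import Mapping, Optional, Tuple


def paging_device(env: Mapping[str, str] = os.environ) -> Tuple[str, Optional[int]]:
    """Interpret WD_COMPORT as described in the text (sketch; names are hypothetical)."""
    raw = env.get("WD_COMPORT")
    if raw is None:
        raise ValueError("WD_COMPORT is not set")
    port = int(raw)
    if port > 0:
        return ("modem", port)           # modem on COM<port> of the Watchdog computer
    if port < 0:
        return ("omnibox_analog", None)  # page through the OmniBox analog board
    raise ValueError("WD_COMPORT=0 is not covered by the manual")
```

Passing a plain dictionary instead of os.environ makes the rule easy to check without touching the real environment.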

The Watchdog is an independent application that runs on a computer, preferably a different one from the computer running OmniBox. OmniBox notifies the Watchdog of any abnormality through UDP over Windows Sockets.

The following page types are currently defined:

 

Num Message   Description (parameters)
0*B*C*E       T1 alarm (B = board number; C = alarm code; E = WD engine ID, for all messages)
1*Cr*E        Low completion rate (Cr = completion rate %)
2*E           Test page
3*Ev*E        NT event log had an entry (Ev = event ID)
4*E           OmniBox is dead
5*Ch*E        Too many bad calls in an outbound channel (Ch = channel number)
6*Rng*E       Too many bad calls in an outbound range (Rng = range ID)
7*Sh          Too many short calls (Sh = % of the good calls that are short)
8*Exp*E       Low traffic (Exp = expected seconds to the next call)
9*E           System unloaded
10*Asc        Exception in the Watchdog app (Asc = ASCII code of the alarm ID that caused the exception)
11*D*E        Too many non-routed calls (D = domain ID)
12*E          More than 20 replication failures
13*Ch*E       Inbound channel seized more than 3 times while excluded (Ch = inbound channel)

The first number identifies the message type; the rest are parameters whose meanings depend on the message type. Channel and board numbers are 1-based. The '*' may show as a dash or a space on some pagers. For example, a T1 loss-of-sync alarm (code 10) on board 3 in engine 0 shows on the pager as:

 0-03-10-0
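Because pagers may render the '*' separator as a dash or a space, any tool that reads pages back has to accept all three. A minimal sketch with hypothetical function names, assuming every field is a plain decimal number as in the example above:

```python
import re
from typing import Dict, List

# '*' is the separator OmniBox sends; some pagers show it as '-' or ' '.
_SEPARATORS = re.compile(r"[*\- ]+")


def parse_page(message: str) -> List[int]:
    """Split a numeric page such as '0*3*10*0' or '0-03-10-0' into its integer fields."""
    return [int(field) for field in _SEPARATORS.split(message.strip()) if field]


def describe_t1_alarm(message: str) -> Dict[str, int]:
    """Name the fields of a type-0 (T1 alarm) page: 0*B*C*E."""
    msg_type, board, code, engine = parse_page(message)
    if msg_type != 0:
        raise ValueError(f"not a T1 alarm page: {message!r}")
    return {"board": board, "alarm_code": code, "engine": engine}
```

Applied to the pager text above, describe_t1_alarm("0-03-10-0") yields board 3, alarm code 10 and engine 0.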

The Watchdog has its own engine ID, normally the ID of the engine it is watching plus 100. The criteria for low, high, maximum and minimum values are read by OmniBox from dia_InCh and dia_OutCh. Port numbers are read from the sys_Parameter table in a database pointed to by the WatchdogSrc ODBC source, normally a local Microsoft Jet database file named Watchdog.mdb. Here is an example for the Watchdog of engine 10:

 

ParamID   ParamNumValue   ParamStringValue   Description                    EngID
8         1028                               Socket port number             110
9         1026                               Engine socket number for WD    110

 

The people to be paged are stored in the WD_Pagees table:

PagerNumber   ExcludedMsgs   PageeName
99999999      11*5*          Controller1
88888888                     Administrator

Not every page type has to reach every pagee. Under ExcludedMsgs, type the numeric message prefixes (message types) to exclude, separated by "*" or any other suitable separator such as a space, comma or colon; pages of those types will not be sent to that pagee. In the example above, Controller1 will not get pages about too many non-routed calls (11*) or too many bad calls on an outbound channel (5*). Also, the Watchdog will not page the same numeric message twice within a ten-minute span, nor issue more than 3 pages within that span.
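A minimal sketch of the two filters just described, under stated assumptions: ExcludedMsgs entries are matched against the message type (the first number of the page), and the ten-minute limits are applied over a sliding window shared by all pages, since the manual does not say whether they are per pagee. PageThrottle and the other names are hypothetical.

```python
import re
from collections import deque
from typing import Deque, Set, Tuple

WINDOW_S = 600.0   # "a ten minute span"
MAX_PAGES = 3      # no more than 3 pages in the same time span


def excluded_types(excluded_msgs: str) -> Set[str]:
    """'11*5*' -> {'11', '5'}: any non-digit character acts as a separator."""
    return set(re.findall(r"\d+", excluded_msgs))


def is_excluded(message: str, excluded_msgs: str) -> bool:
    """True if the page's message type (its first number) is in the pagee's ExcludedMsgs."""
    match = re.match(r"\d+", message)
    return match is not None and match.group() in excluded_types(excluded_msgs)


class PageThrottle:
    """Sliding ten-minute window: no repeated message, at most MAX_PAGES pages."""

    def __init__(self) -> None:
        self.sent: Deque[Tuple[float, str]] = deque()

    def allow(self, message: str, now_s: float) -> bool:
        # Forget pages older than the window.
        while self.sent and now_s - self.sent[0][0] >= WINDOW_S:
            self.sent.popleft()
        if len(self.sent) >= MAX_PAGES:
            return False
        if any(sent_msg == message for _, sent_msg in self.sent):
            return False
        self.sent.append((now_s, message))
        return True
```

Matching whole message types rather than raw string prefixes keeps an exclusion of 1 (low completion rate) from also swallowing types 10 through 13.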

 The 0 Page

The alarm codes are:

Hex    Dec   Description
0x00   0     Out of frame error; count saturation.
0x01   1     Initial loss of signal detection.
0x02   2     Driver performance monitor.
0x03   3     Bipolar violation count saturation.
0x04   4     Error count saturation.
0x05   5     Receive yellow alarm.
0x06   6     Receive carrier loss.
0x07   7     Frame bit error.
0x08   8     Bipolar eight zero substitution dtct.
0x09   9     Receive blue alarm.
0x0A   10    Receive loss of sync.
0x0B   11    Got a red alarm condition.

When the condition is restored, 16 (0x10) is added to the code:

Hex    Dec   Description
0x10   16    Restored out of frame error; count saturation.
0x11   17    Restored initial loss of signal detection.
0x12   18    Restored driver performance monitor.
0x13   19    Restored bipolar violation count saturation.
0x14   20    Restored error count saturation.
0x15   21    Restored receive yellow alarm.
0x16   22    Restored receive carrier loss.
0x17   23    Restored frame bit error.
0x18   24    Restored bipolar eight zero substitution dtct.
0x19   25    Restored receive blue alarm.
0x1A   26    Restored receive loss of sync.
0x1B   27    Restored got a red alarm condition.
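Since a restored condition is simply the original code plus 16, a tool reading 0 pages can decode the C field with one table and one offset. A minimal sketch with a hypothetical function name; codes 12 to 15 (and 28 and above) are not listed in the manual, so the sketch rejects them.

```python
T1_ALARMS = {
    0x00: "Out of frame error; count saturation.",
    0x01: "Initial loss of signal detection.",
    0x02: "Driver performance monitor.",
    0x03: "Bipolar violation count saturation.",
    0x04: "Error count saturation.",
    0x05: "Receive yellow alarm.",
    0x06: "Receive carrier loss.",
    0x07: "Frame bit error.",
    0x08: "Bipolar eight zero substitution dtct.",
    0x09: "Receive blue alarm.",
    0x0A: "Receive loss of sync.",
    0x0B: "Got a red alarm condition.",
}
RESTORED = 0x10  # 16 is added to the code when the condition is restored


def decode_alarm(code: int) -> str:
    """Turn the C field of a 0 page into text, handling the +16 'restored' offset."""
    restored = code >= RESTORED
    base = code - RESTORED if restored else code
    text = T1_ALARMS.get(base)
    if text is None:
        raise ValueError(f"unknown T1 alarm code {code}")
    return "Restored " + text[0].lower() + text[1:] if restored else text
```

For example, decode_alarm(10) gives "Receive loss of sync." and decode_alarm(26) gives "Restored receive loss of sync.".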

 The 1 Page

To calculate the completion rate, the Watchdog counts good calls and bad calls (busy, no answer, no ringback or no dial tone). If the ratio of good calls to total calls falls below the value specified in the sys_Parameters table (parameter 9), the page procedure is fired. If the counters kept counting forever, the method would become insensitive to an abnormal situation, so when a maximum count is reached both counters are reset to half their value. This keeps the sample small enough to be sensitive but large enough to be statistically significant. The right compromise depends on traffic, so the maximum must be set for each range in the WD_SampleSize field of table dia_OutCh.
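The halving scheme keeps the completion rate responsive without discarding history abruptly. Below is a sketch for one range, assuming the threshold is a percentage, the check runs after every call, and the maximum is the total call count; the class name is hypothetical, while WD_SampleSize and parameter 9 are the settings named above.

```python
class CompletionRateMonitor:
    """Completion-rate check for one outbound range, with halving counters."""

    def __init__(self, sample_size: int, min_rate_pct: float) -> None:
        self.sample_size = sample_size    # WD_SampleSize for the range (dia_OutCh)
        self.min_rate_pct = min_rate_pct  # threshold (sys_Parameters, parameter 9)
        self.good = 0
        self.bad = 0                      # busy, no answer, no ringback, no dial tone

    def record(self, good_call: bool) -> bool:
        """Count one call; return True if the 1 page procedure should fire."""
        if good_call:
            self.good += 1
        else:
            self.bad += 1
        total = self.good + self.bad
        fire = 100.0 * self.good / total < self.min_rate_pct
        if total >= self.sample_size:
            # Halve both counters: the ratio is roughly kept, but older calls
            # weigh less, so the rate stays sensitive to a sudden change.
            self.good //= 2
            self.bad //= 2
        return fire
```

With a sample size of 10 and a 50 % threshold, nine good calls followed by one bad one leave the counters at 4 good and 0 bad after halving; it then takes five consecutive bad calls (4 of 9, about 44 %) before a page fires.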

 The 2 Page

OmniBox will issue a test page upon a command from the Monitor.

 The 3 Page

 Throughout the OmniBox code there are quite a few traps for abnormal situations, which are logged into the NT Event Log. When something is logged, the Watchdog is notified and pages are issued. Event ID numbers follow this convention:

App level events      0 - 9
Data interface        10 - 99
Pool level events     100 - 999
Ch events             X000 + Ch# (Ch# = 0 - MaxCh)
DTI events            30000 - 30999
VOX events            31000 - 35999
Specifics             36000 - 37999

 

 The actual table follows:

 

Event ID     Description
0            System unloaded
1            System loaded
3            Could not read sys_Parameters, defaults in effect
4            Exception at InitRecChannels AswSup not loaded
5            No database connection
10           No link to database could be established
11           Exception opening database
12           Database error (error message logged)
13           Access violation in database thread
14           Error in select query
15           Timeout in select query
16           Error opening cursor
17           Timeout opening cursor
18           Error in get record
19           Timeout in GetRecord
20           Error in action query
21           Timeout in action query
100          Could not register any tones!
101          Socket on Receive error
103          SetSockOpt socket failed
110          RAS socket created
200 + ev     Unknown alarm ev
400          Exception in Analog Thread.
430 + B      Exception doing whole T number B.
460 + B      Exception in alarm event handler on board B.
500          T1 alarm on board b
1000 + Ch    Channel Ch init failed!
2000 + Ch    Time slot for Ch open fail!
3000 + Ch    Error in dt_setsigmode
7000 + b     T1 alarm mask could not be set on board b
8000 + Ch    Signal event mask could not be set on channel Ch
9000 + b     Board b open failed
13000 + Ch   Channel Ch failed to set hook state
14000 + Ch   Event mask for WinkWait could not be set on Ch
15000 + Ch   Event mask for WinkWait could not be reset on Ch
16000 + Ch   Wink failed on Ch
30000 + Ch   Receiving thread exception
31000 + v    Voice resource v failed init.
31300 + Ch   Voice resource routing to Ch failed
31600 + v    Voice resource v open failed!
31900 + v    blddt failed on voice resource v
32200 + v    Add double tone to voice resource v failed
32500 + v    bldst failed on voice resource v
32800 + v    Add single tone to voice resource v failed
33100 + v    Init call Perfect failed on v
33380 + Th   Exception in playtone thread # Th
34000 + rs   Voice resources exhausted of type rs
34010 + rs   Exception while getting voice resource of type rs
34020 + rs   Timeout waiting for voice resource type rs at the mutex
34030 + rs   Timeout while getting voice resource of type rs
34299        Exception in PostDial delay thread
36000 + Ch   Channel Ch tested good and is back in service
36001 + Ch   Call ID for channel Ch returned < 0!
36300 + Ch   Channel Ch tested bad and has been set out of service
36800 + Ch   Call on channel Ch could not be logged
36900 + Ch   Unknown event received on channel Ch
37200 + Ch   Failed to set hook state to Ch at event handler
37500 + Ch   Exception in event handler
37800 + Ch   Exception in ReceiveProc for channel Ch
38000 + Ch   While preparing resume Call on Ch
38999        Commit Db Changes was hit %d (outB), %d (InB)
39000 + Ch   Exception in MarkAsDead

*There are two entries that are excluded from paging:

500: T1 alarms are reported directly to the Watchdog, with numeric information that the Event ID does not carry.

0: System unloaded is also sent directly, to avoid interfering with the unloading process.

The events in bold and italics are the most frequently encountered.
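Many event IDs pack a base number and an offset (channel, board, voice resource and so on) into a single value, as in 13000 + Ch. The sketch below decodes such IDs by finding the largest known base not above the ID. It assumes the ID belongs to a "+x" event; fixed IDs such as 34299 or 38999, and the 36000/36001 pair, need a direct lookup instead, and with only a partial base table an ID whose real base is missing will be mis-decoded. EVENT_BASES and decode_event are hypothetical names, and only a few rows of the table are copied in.

```python
import bisect
from typing import Dict, Tuple

# A few "+x" events copied from the event table; {n} is the offset.
EVENT_BASES: Dict[int, str] = {
    1000: "Channel {n} init failed!",
    2000: "Time slot for Ch {n} open fail!",
    9000: "Board {n} open failed",
    13000: "Channel {n} failed to set hook state",
    16000: "Wink failed on Ch {n}",
    31000: "Voice resource {n} failed init.",
    31300: "Voice resource routing to Ch {n} failed",
    36300: "Channel {n} tested bad and has been set out of service",
}
_BASES = sorted(EVENT_BASES)


def decode_event(event_id: int) -> Tuple[int, int, str]:
    """Split an event ID into (base, offset, text) using the largest base <= ID."""
    i = bisect.bisect_right(_BASES, event_id) - 1
    if i < 0:
        raise ValueError(f"no known base for event {event_id}")
    base = _BASES[i]
    offset = event_id - base
    return base, offset, EVENT_BASES[base].format(n=offset)
```

For example, event 13005 decodes as base 13000 with offset 5: channel 5 failed to set its hook state.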

The 4 Page

The Watchdog queries OmniBox every minute. If the application is hung it will not respond, and the Watchdog then fires the 4 page procedure.

 The 5 Page

Each outbound channel has a bad call counter (busy, no answer, no ringback or no dial tone) that is reset on every good call. If the count exceeds the WD_MaxChBadInARow field in dia_OutCh, the 5 page procedure is triggered.
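The 5 page counter is simple enough to state directly. A hypothetical sketch, reading "exceeds" as strictly greater than WD_MaxChBadInARow:

```python
class BadCallStreak:
    """Per-channel count of consecutive bad calls (names are hypothetical)."""

    def __init__(self, max_bad_in_a_row: int) -> None:
        self.max_bad_in_a_row = max_bad_in_a_row  # WD_MaxChBadInARow (dia_OutCh)
        self.count = 0

    def record(self, good_call: bool) -> bool:
        """Count one call; return True if the 5 page procedure should fire."""
        if good_call:
            self.count = 0   # any good call resets the streak
            return False
        self.count += 1
        return self.count > self.max_bad_in_a_row
```

The sketch keeps returning True on each further bad call; the Watchdog's rule against repeating the same numeric message within ten minutes is what would keep this from flooding the pagers.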

 The 8 Page

When no incoming call arrives for more than n times the ETBC (Expected Time Between Calls), the page is triggered. The multiplier n is read from WD_Tolerance in dia_InCh. The ETBC is the average time between calls over the last m calls, where m is WD_SampleSize in the dia_InCh table. The ETBC is also corrected by the trend, the trend being the change in this average over the last 2*m calls. The higher m, the more statistically stable the estimate and the less likely a false alarm on a fluctuation; but if m is too high, the check may become insensitive. Set m as high as the traffic volume allows.
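The manual does not give the exact trend formula, so the sketch below makes one explicit assumption: the ETBC is the average gap over the last m calls plus the change from the average gap of the m calls before them (a linear extrapolation), floored at zero. Function names are hypothetical; m and n correspond to WD_SampleSize and WD_Tolerance.

```python
from statistics import mean
from typing import Sequence


def expected_time_between_calls(arrivals: Sequence[float], m: int) -> float:
    """ETBC from call arrival times (seconds, oldest first), m = WD_SampleSize.

    Assumption: the trend is the difference between the average gap of the
    most recent m calls and that of the m calls before them, added on once.
    """
    if m < 1 or len(arrivals) < 2 * m + 1:
        raise ValueError("need m >= 1 and at least 2*m + 1 arrival times")
    gaps = [later - earlier for earlier, later in zip(arrivals, arrivals[1:])][-2 * m:]
    older, recent = mean(gaps[:m]), mean(gaps[m:])
    trend = recent - older
    return max(recent + trend, 0.0)


def low_traffic(now: float, arrivals: Sequence[float], m: int, n: float) -> bool:
    """True when no call has arrived for more than n * ETBC (n = WD_Tolerance)."""
    return now - arrivals[-1] > n * expected_time_between_calls(arrivals, m)
```

With calls every 10 s and n = 3, the page fires once more than 30 s pass without a call; if the gaps stretch from 10 s to 15 s, the trend raises the ETBC to 20 s, so a slowing but healthy trunk group is less likely to raise a false alarm.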

 The Log_Pages Table

 A log entry for each page issued is stored in the Log_Pages table. This is useful for finding the cause when something goes wrong with the pages, for checking whether pages of a certain type were issued for an event known to have happened, or for counting how many pages of some type were issued in a given time interval.

An example of the table follows:

 

PageID   PageTime              PageeID   PageMsg    PageResult
464      1/3/00 7:27:02 AM     3077208   8*220*0    VCOM
463      8/9/99 8:19:02 AM     3077208   8*111*0    VCOM
462      8/6/99 3:01:33 PM     3077208   3*1*0      VCOM
461      8/6/99 1:57:41 PM     3077208   8*1257*0   VCOM
460      8/6/99 1:37:01 PM     3077208   3*1*0      NO DIALTONE

 

 

Analog Service Inbound Ports

 

 If Dialogic analog boards (D/41SC or D/160SC) are included in the system, the following features can be enabled for one or more of the analog lines:

1. Listen in on a specified channel

2. Set up a call on a specified outbound channel

3. Set up a call into a route

4. Set up a call into an inbound trunk group (domain)

 

These features allow the following tasks to be performed.

 Tasks:

· To listen to calls to test voice quality or check what is going on, use feature 1.

· To check why calls are failing or short, use feature 1.

· To test a particular channel, use feature 2.

· To make a phone call and actually talk to somebody, use feature 3.

· To test digit processing settings, use feature 4.

 

Test Line Conversation Flow

 First prompt:

 Enter 1 to listen to a channel, 2 to place a call on a channel, 3 to place a call on a route, 4 to place a call as a domain; press star to go back to the previous menu.

 If you select 1:

                Enter channel number followed by the '#' sign

As soon as you do that, you start listening to whatever is going on in that channel. Every time you press '*', listening switches to the other side of the conversation (you cannot listen to both sides at once). If you enter a number below 100 that also falls within the span count of your system, it is interpreted as a span number, and you listen to the last call received or made on that span; a voice message tells you the channel number before connecting. If you dial zero, it is interpreted as the previously specified span; if there is no previous span, you listen to the last call made or received in the box. If you dial a number that cannot be interpreted as a span number, you get an "Invalid" message.
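The number rules for option 1 are easiest to follow as a decision function. This is a hypothetical reading, not the OmniBox code: it assumes spans are numbered from 1 (as boards and channels are), that "within the span count" means 1 to the number of spans inclusive, and that numbers of 100 and above are taken as channel numbers.

```python
from typing import Optional, Tuple


def interpret_listen_input(n: int, num_spans: int,
                           previous_span: Optional[int]) -> Tuple[str, Optional[int]]:
    """Hypothetical reading of the option 1 rules; returns (what to do, number)."""
    if n == 0:
        # Zero means "the previously specified span", or the last call in the box.
        if previous_span is not None:
            return ("last call on span", previous_span)
        return ("last call in box", None)
    if n < 100:
        # Assumption: spans are numbered from 1, like boards and channels.
        if 1 <= n <= num_spans:
            return ("last call on span", n)
        return ("invalid", None)
    return ("channel", n)   # assumption: 100 and up are taken as channel numbers
```

On a four-span box, for instance, entering 3 listens to the last call on span 3, while 57 is answered with "Invalid".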

 

 If you select 2:

Enter channel number followed by the '#' sign

And then

Enter the number to be dialed followed by the '#' sign + Dial Tone

                After you enter the number + #…

                If a number+# is entered

                    Connecting

                    Call progress + Conversation

                If just # is entered (Robbed bit or Analog)

                    The selected channel is seized and routed full duplex;

                    you must then do your own dialing or whatever else is needed.

 If you select 3

                 Enter route index followed by the '#' sign

                Given the name of the destination, the route index can be found in the IdxLookUp table; if you know the range (or outbound trunk group) ID, you may use the RouteTable instead. The range ID also shows in the Stats window of the Monitor as, for example, G 4.

And then

Enter the number to be dialed followed by the '#' sign + Dial Tone

                After you enter the number + #…

                Connecting

                Call progress + Conversation

 If you select 4

                Enter domain ID followed by the '#' sign

                The Domain ID can be found in the dia_InCh table.

And then

Enter the number to be dialed followed by the '#' sign + Dial Tone

                After you enter the number + #…

                Connecting

                Call progress + Conversation

If the call attempt fails, you will get a spoken message with the detailed result.

 
