You are here: Home Documents Tracing the Agent

Tracing the Agent

seeing how Agent works with EOS through Tracing class
(verified 2010 Feb. / EOS 4.3.0 / DCS-7120T-4S 03.01)

Embedded tracing feature

Agent program on Arista has Tracing feature to see the behavior of the program. You can see trace0( ) or some other trace function calls in a lot of Agent programs. These functions has on/off of status bits switch to activate them and it is able to control by system wide variables. It has also message logging feature to the file.

In this document, show how to change the status bit and see the behavior of Agent programs. For example, we use fan extraction/insertion cause it is the easy example to change the condition of the Agent.

 

Tracing function call in Thermostat Agent

Check Thermostat Agent is running now.
$ ps -ef --forest
UID        PID  PPID  C STIME TTY          TIME CMD
....
root         1     0  0 Mar11 ?        00:00:34 /sbin/init
....
root      1473     1  0 Mar11 ?        00:00:00 ProcMgr-mast
root      1478  1473  0 Mar11 ?        02:41:55  \_ ProcMgr-work
....
root      1522  1478  0 Mar11 ?        01:17:10      \_ Thermostat
Look at Thermostat Agent had been invoked by Process Manager. The program of Thermostat Agent is /usr/bin/Thermostat and it is Python script. But It just loads Thermostat class and make it run.
Thermostat class is located under /usr/lib/python2.5/site-packages/ directory.
On the top of the script, you can see the following expressions;

bash-3.2# cat /usr/lib/python2.5/site-packages/Thermostat.py
....
import Tracing
__defaultTraceHandle__ = Tracing.Handle( 'Thermostat' )
t0 = Tracing.trace0   # Traces the fan speed algorithm
t1 = Tracing.trace1   # Traces configuration changes from the user
t3 = Tracing.trace3   # Cooling alarm level changes
t7 = Tracing.trace7
t8 = Tracing.trace8   # fru shutdown messages
t9 = Tracing.trace9
(....snip)

First, the program imports Tracing class then make aliases t0 to t9 functions for short expression.
You can see a lot of usage cases to call tracing function like follows;

def _handleTimeout( self ):
t0( "FanSpeedAlgorithm: start" )
It means, if function _handleTimeout() will be called, t0( ) function writes "FanSpeedAlgorithm: start" message to log file.

Set the tracing level

Tracing class

The code of Tracing class is /usr/lib/python2.5/site-packages/Tracing.py . You can see the following comment on the top of it.

-----
To use this module::
import Tracing
Tracing.trace3( "some message at level 3" )
Each of the trace<n> functions takes a variable number of arguments, which are
converted to strings using str() and concatenated to form the final trace string.
-----

It means, to use the trace feature, just import Tracing class and call the tracing function of the appropriate level.
After this comment, you can see how to turn on and off of the tracing feature of the specified level.
There are two methods to handle it, one is TRACE environment variable base and the other is using TRACE ccommand.

-----
*  Using the TRACE, TRACEFILE and TRACEFORMAT environment variables, like this::

export TRACE=*/01,Ebra/*,Fru/3,Artest*/1-5
export TRACEFILE=/tmp/trace.out
export TRACEFORMAT=time,file/15,func,line-number
This example turns on levels 0 and 1 for all facilities, all levels for Ebra,
level 3 for Fru, and levels 1 to 5 for any facility that starts with "Artest".
......(snip)

*  Using the Tracing CLI to modify the configuration of a running agent.
-----
When you start Agent script from bash, environment variable is easier to use. If you turn on the tracing status while Agent is running, TRACE command from EOS CLI is better.

TRACE command

Change EOS CLI mode to configure mode from enable mode.
localhost> enable
localhost# configure
localhost(config)#
Do SHOW TRACE to check the current status. You need to specify the name of the Agent.
localhost(config)#show trace Thermostat
Global trace settings for agent Thermostat
-----------------------------------------------
Tracing sent to stderr
date:     enabled
time:     enabled
PID:           enabled
facility name: enabled  (width 20)
trace level:   enabled                             <=== #1
filename:      disabled (width 20)
line number:   disabled
function name: disabled (width 20)
object name:   disabled (width 20)
Trace facility settings for agent Thermostat
-----------------------------------------------
Activity             enabled  ............
Agent                enabled  ............
.... snip .....
Thermostat           enabled  ............     <=== here is!
....
#1. this parameter (trace level) sets only enable/disable to write a log or not. Not for the tracing level.

The status of Thermostat agent is now enabled but the line of dots are important.
Try to enable the tracing level 0 of Thermostat Agent as follows;
localhost(config)#trace Thermostat enable Thermostat levels 0
After that, SHOW TRACE THERMOSTAT result will be modified from;

Thermostat           enabled  ............

to

Thermostat           enabled  0...........

The 0 bit had been turned on. And to remove this bit 0, you do the same command with "NO" like this;

localhost(config)#no trace Thermostat enable Thermostat levels 0

See the line of dots and zero sign status. 0 (zero) means the level 0 is "on". And the dot ( . ) means is the other level is "off".

If you turn level 1 on, instead level 0, The result should be;

Thermostat           enabled  .1..........
Well, these status bits hold several slots (bits) and these slots are selectable on/off independently each other.

As you can guess by the number of dots, 12 levels are selectable and you can see the level name through the help message of TRACE command. 

localhost(config)#trace Thermostat enable Thermostat levels ?
0         Enable tracing at level 0
1         Enable tracing at level 1
2         Enable tracing at level 2
3         Enable tracing at level 3
4         Enable tracing at level 4
5         Enable tracing at level 5
6         Enable tracing at level 6
7         Enable tracing at level 7
8         Enable tracing at level 8
9         Enable tracing at level 9
coverage  Enable tracing at level coverage
function  Enable tracing at level function
Actually I am not sure about the level "coverage" and "function". When you set it by "trace Thermostat enable Thermostat levels coverage" command, the result is here;
Thermostat           enabled  ..........c.

I cannot guess what does it mean.

Log file


TRACE feature will write a message to log file under /var/log/agents/ as text format. The file name has Agent name and the pid. For example, if you do "trace Thermostat enable Thermostat levels 0", the following message will be written.

 

[admin@localhost tmp]$ cat /var/log/agents/Thermostat-32378
2009-02-25 18:26:17.062900 32378 Thermostat           0 FanSpeedAlgorithm: start
2009-02-25 18:26:17.068373 32378 Thermostat           0 FanSpeedAlgorithm: setting fan Fan2/1 to 38 based on max inlet temperature
2009-02-25 18:26:17.068891 32378 Thermostat           0 FanSpeedAlgorithm: setting fan Fan4/1 to 38 based on max inlet temperature
2009-02-25 18:26:17.069364 32378 Thermostat           0 FanSpeedAlgorithm: setting fan Fan3/1 to 38 based on max inlet temperature
2009-02-25 18:26:17.069822 32378 Thermostat           0 FanSpeedAlgorithm: setting fan Fan1/1 to 38 based on max inlet temperature
2009-02-25 18:26:17.070279 32378 Thermostat           0 FanSpeedAlgorithm: setting fan Fan5/1 to 38 based on max inlet temperature

Source code

It is interesting to see the code of Agent. Here is the fragment of Thermostat.py code.
Find "based on max inlet temperature" text from the source code. You can see it includes in _handleTimeout() function of ThermostatConfigReactor class as follows;

-----
class ThermostatConfigReactor( Fwk.Notifiee ):
.......
def _handleTimeout( self ):
t0( "FanSpeedAlgorithm: start" )
.........
for fanConfig in self.coolingConfig_.fan.values():
............
fanConfig.speed = fanSpeed
t0( "FanSpeedAlgorithm: setting fan", fanConfig.name, "to", fanSpeed,
"based on max inlet temperature" )
-----

You can see "FanSpeedAlgorithm: start" message on the top of the log file. And from the time stamp of the log message, you know the interval of messages are 5 seconds.
And you can check the value of fanConfig.name and fanSpeed variables.

Making an event

When you extracts any fan, you can identify where the program is running through. The following log lines are the result of the fan extraction.

2009-02-25 18:37:08.995441  1139 Thermostat           0 FanSpeedAlgorithm: start
2009-02-25 18:37:08.999087  1139 Thermostat           0 FanSpeedAlgorithm: cooling alarm level is fanTrayFailedOrMissing setting fans to full speed
Yes, after this event, program branches to the following part (inside _handleTimeout() function);
elif systemCoolingAlarmLevel != "coolingOk":
t0( "FanSpeedAlgorithm: cooling alarm level is", systemCoolingAlarmLevel,
"setting fans to full speed" )
_configureAllFans( 100 )
And this time, you can see the variable of systemCoolingAlarmLevel has "fanTrayFailedOrMissing" value. So well, you also can see _configureAllFans( 100 ) expression. Yes, the rest of fans start 100% power. (You can see the status of it to type "show environment cooling" command of EOS CLI)

Adding a trace point

It is usual to add more trace points to see more detail behavior of Agents. It is possible to do that just add trace functions to source code then restart Agent as follows;

Modify the code

Here is an example to modify the source code. I moved Thermostat.py to /tmp for safety operation.
bash-3.2# cp /usr/lib/python2.5/site-packages/Thermostat.py /tmp/
bash-3.2# vi /tmp/Thermostat.py
bash-3.2# diff /tmp/Thermostat.py /usr/lib/python2.5/site-packages/Thermostat.py
1479c1479
<       t0( "FanSpeedAlgorithm: start ***" )
---
>       t0( "FanSpeedAlgorithm: start" )
bash-3.2#
This time, to make a simple example, it just one line modification. But you can add tracing function to anywhere in the code.

remount partition as read write mode

After the modification, you need to overwrite it to the original Thermostat.py but target directory had been mounted read-only filesystem initially. So it needs to remount as read write mode.

bash-3.2# mount -o,rw,remount /
bash-3.2# cp /tmp/Thermostat.py /usr/lib/python2.5/site-packages/Thermostat.py
bash-3.2#
In Python way, there are .pyo and .pyc files (compiled objects) beside the .py source program. It is okay to leave them because the modification date is older than .py source. Therefore .pyo and .pyc files will be ignored from Python interpreter. Of course it is also okay to delete by manually.

Restart Agent


After you finish to update, restart Agent program. It is okay to use kill command to stop the Agent process as follows;

bash-3.2# ps -ef | grep Thermostat
root      1522  1478  0 Mar11 ?        01:17:49 Thermostat
root      6224  6103  0 19:39 pts/2    00:00:00 grep Thermostat
bash-3.2# kill 1522
bash-3.2#
But after you killed it, you will observe Thermostat Agent with different pid.

bash-3.2# ps -ef | grep Thermostat
root      6226  1478 54 19:39 ?        00:00:00 Thermostat
root      6229  6103  0 19:39 pts/2    00:00:00 grep Thermostat
bash-3.2#
It means Process Manager detects the absence of the Agent and restart it automatically. (Please remember, you see the Agent had been invoked Proc-Mgr. Try ps --forest again to see it.)
And this happening will be logged to messages file.

bash-3.2# tail /var/log/messages
......
Apr 13 19:39:53 localhost ProcMgr-worker: %PROCMGR-6-PROCESS_TERMINATED: 'Thermostat' (PID=1522) has terminated.
Apr 13 19:39:53 localhost ProcMgr-worker: %PROCMGR-6-PROCESS_RESTART: Restarting 'Thermostat' immediately (it had PID=1522)
Apr 13 19:39:53 localhost ProcMgr-worker: %PROCMGR-6-PROCESS_STARTED: 'Thermostat' starting with PID=6226 (PPID=1478) -- execing '/usr/bin/Thermostat'
bash-3.2#

Reset the Trace setting

Please mind that all tracing level settings that you have setted by TRACE command will be initialized, by restarting the Agent process.
There is no way to recover it, just set them again to type "trace Thermostat enable Thermostat levels 0" or some. After this resetting, a new log file will be created under /var/log/agents/ directory with new pid of the Agent.

bash-3.2#  ls /var/log/agents/Thermostat-*
/var/log/agents/Thermostat-1522  /var/log/agents/Thermostat-6226
bash-3.2#
To see this new log file, you can confirm the last modification will be reflected correctly. (see the first log line, "***" mark is included. )

bash-3.2# cat /var/log/agents/Thermostat-6226
==== Output from /usr/bin/Thermostat [] (PID=6226) started Apr 13 19:39:53 ===
2009-04-13 19:43:53.667903  6226 Thermostat           0 FanSpeedAlgorithm: start ***
2009-04-13 19:43:53.673337  6226 Thermostat           0 FanSpeedAlgorithm: setting fan Fan2/1 to 37 based on max inlet temperature
2009-04-13 19:43:53.673852  6226 Thermostat           0 FanSpeedAlgorithm: setting fan Fan4/1 to 37 based on max inlet temperature
......

remount to read only mode again

For safe operation, remount back the root directory to read only mode.

bash-3.2# mount -o,ro,remount /
bash-3.2#
If read only remounting had been done before the restarting the Agent, .pyc file (complied object) will not be replaced. (Following list shows normal case, .pyc recompiled and the time stamp of it is older than the .py source)
bash-3.2# ls -l /usr/lib/python2.5/site-packages/Thermostat.*
-rw-r--r-- 1 root root 75613 Apr 13 19:29 /usr/lib/python2.5/site-packages/Thermostat.py
-rw-r--r-- 1 root root 52723 Apr 13 19:58 /usr/lib/python2.5/site-packages/Thermostat.pyc
-rw-r--r-- 1 root root 52022 Nov 22  2009 /usr/lib/python2.5/site-packages/Thermostat.pyo
bash-3.2#

But even if .pyc file was note replaced, it is not a serious problem. Agent always runs from new source code.
But if you restart the system, all modificatons will be vanished and the system restarts as initial setup from their flash storage. It means if you want to enable your modification after rebooting, we need another works.


(see Extensions and Switching image and Aboot document. )

Filed under: