<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/css" href="/stylesheets/rss.css"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/">
  <channel>
    <title>ClearNet Security: Tag profiling</title>
    <link>http://blog.clearnetsec.com/articles/tag/profiling</link>
    <language>en-us</language>
    <ttl>40</ttl>
    <description></description>
    <item>
      <title>Intro to systemtap</title>
      <description>&lt;h2&gt;What is systemtap?&lt;/h2&gt;
&lt;p&gt;
&lt;a href=http://sources.redhat.com/systemtap/&gt;Systemtap&lt;/a&gt; is a tool built upon the kprobes framework for probing, debugging and analyzing the &lt;a href=http://www.kernel.org/&gt;linux kernel&lt;/a&gt; at runtime.   &lt;a href=http://www.ibm.com&gt;IBM&lt;/a&gt; contributed &lt;a href=http://sources.redhat.com/systemtap/kprobes/&gt;kprobes&lt;/a&gt; to linux and had it included during the 2.5 development tree in 2002.   It's a supported and standard portion of the Linux kernel.  Systemtap builds upon kprobes and creates a fairly easy to use tool that let's you probe just about anything and everything in the linux kernel with fairly simple commands, you can almost effortlessly modify any variable you wish on a running kernel.  All with minimal performance impact.  Essentially it's the linux version of dtrace the differences being that systemtap is probably more flexible in kernel space and dtrace currently does a lot more for probing userspace applications.
&lt;p&gt;
&lt;h2&gt;So how do I get this working?&lt;/h2&gt;
&lt;p&gt;
It's fairly easy and straight forward to install.  There are RPMs for Fedora, Redhat Enterprise and Suse based linux distributions.   There is a slight trick, at least with Fedora Core 5,  you need to have the kernel debug RPM also installed.   Basically, it's just a set of object code files with debug symbols.   I couldn't find it in the latest Fedora Core 5 and upon closer look the kernel RPM didn't have the debug package enabled by default.   I placed the following in my kernel-2.6.spec from Fedora Core 5 and was able to build the kernel debug symbol package.
&lt;p&gt;
&lt;pre&gt;
%define _enable_debug_packages 1
&lt;/pre&gt;
&lt;p&gt;
With that added to your kernel spec file you can use rpmbuild and rebuild it and it will emit the debug package.
&lt;p&gt;
&lt;h2&gt;
What can you do now?
&lt;/h2&gt;
&lt;p&gt;
Now you can start probing stuff.   The systemtap site has some fairly simple demos,  one is to print the top 20 most common syscalls every 5 seconds.
&lt;p&gt;
&lt;pre&gt;
#!/usr/bin/env stap
#
# This script continuously lists the top 20 systemcalls on the system
#

global syscalls

function print_top () {
        cnt=0
        log ("SYSCALL\t\t\t\tCOUNT")
        foreach ([name] in syscalls-) {
                printf("%-20s\t\t%5d\n",name, syscalls[name])
                if (cnt++ == 20)
                        break
        }
        printf("--------------------------------------\n")
        delete syscalls
}

probe kernel.function("sys_*") {
        syscalls[probefunc()]++
}

# print top syscalls every 5 seconds
probe timer.ms(5000) {
        print_top ()
}
&lt;/pre&gt;

&lt;p&gt;
This little probe will sleep for 5 seconds counting each system call as it happens and then dumps the list out and  starts counting again.  Here is a sample output:
&lt;p&gt;
&lt;pre&gt;
SYSCALL                         COUNT
sys_gettimeofday                  771
sys_futex                         637
sys_clock_gettime                 214
sys_rt_sigprocmask                141
sys_select                        136
sys_poll                          106
sys_newstat                        84
sys_ioctl                          62
sys_read                           48
sys_write                          47
sys_fcntl                          47
sys_rt_sigaction                   38
sys_time                           30
sys_getppid                        26
sys_setitimer                      16
sys_recvfrom                       13
sys_rt_sigreturn                   13
sys_nanosleep                      12
sys_close                          11
sys_open                           10
sys_newfstat                        6
&lt;/pre&gt;

&lt;p&gt;
Try altering the load on your system and seeing how that affects things.   That's kind of interesting but not terribly useful.  Try this version:
&lt;p&gt;
&lt;pre&gt;
#! /usr/bin/env stap
#
# This script continuously lists the top 20 systemcalls on the system
#

global syscalls

function print_top () {
        cnt=0
        log ("SYSCALL\t\t\t\tCOUNT")
        foreach ([name] in syscalls-) {
                printf("%-20s\t\t%5d\n",name, syscalls[name])
                if (cnt++ == 20)
                        break
        }
        printf("--------------------------------------\n")
        delete syscalls
}

probe kernel.function("sys_*") {
        if (target() == tid())
                syscalls[probefunc()]++
}

# print top syscalls every 5 seconds
probe timer.ms(5000) {
        print_top ()


}
&lt;/pre&gt;

&lt;p&gt;
I added a line to the probe which checks to see if the target is a value that system let's you pass in.    Now you can add a -x argument and the pid of a process and it will perform the same probes only against that particular process.  I have a postgresql database running and ps tells me that pid 16679 is the writer thread.
&lt;p&gt;
&lt;pre&gt;
...
postgres 16679  0.0  4.4 199916 91000 ?        S    Jul21   0:00 postgres: writer process
...
&lt;/pre&gt;
&lt;p&gt;

Let's look at it.
&lt;pre&gt;
SYSCALL                         COUNT
sys_getppid                        25
sys_time                           25
sys_select                         25
&lt;/pre&gt;
&lt;p&gt;
over and over and over, since the database isn't doing anything this makes sense, it's just waiting for something to tell it what to do.  You can attach strace to that process (strace -p 16679) and see that it is correct.  Let's distburb the database and make it do some I/O and see how it behaves.  I'll insert a record into a table.
&lt;p&gt;
&lt;pre&gt;
SYSCALL                         COUNT
sys_getppid                        25
sys_time                           25
sys_select                         25
sys_lseek                           4
sys_write                           4
&lt;/pre&gt;

&lt;p&gt;
That makes some sense,  one record insereted resulted in 4 seeks and 4 writes.  Maybe that's a write to an index, a couple writes to b+tree nodes and then an actual record being written, or maybe it is writing to a journal file.  I don't know, but it sounds kind of reasonable and I haven't looked at the postgresql code yet.  In fact we haven't looked at any source code yet or really compiled anything.   This is kind of a cheap example since strace and the ptrace API can provide all of this information (although there is considerable impact on system performance with them)  so here is something a bit harder to do.  
&lt;p&gt;
&lt;pre&gt;
#! /usr/bin/env stap
global comps
global compsizes
probe kernel.function("memcmp")
{
                comps++
                compsizes = compsizes + $count
}

probe timer.ms(5000)
{
        printf("memcmp call count      bytes compared\n")
                    printf("%d                    %d\n", comps, compsizes)
        comps = 0
        compsizes = 0
}
&lt;/pre&gt;
&lt;p&gt;
This probe monitors the kernel's memcmp function which compares two buffers of memory.  $count is an argument to that function which contains the number of bytes to compare.  

&lt;pre&gt;
memcmp call count      bytes compared
17151                    114214
memcmp call count      bytes compared
5455                    34990
memcmp call count      bytes compared
17165                    114269
memcmp call count      bytes compared
5398                    34817
&lt;/pre&gt;

&lt;p&gt;
As you can see memcmp is called a lot and on average to compare about 6 bytes worth of data.    I wonder what it sort of oscillates?  I'll have to look in to that.
&lt;p&gt;
Here is another simple one, __kmalloc is used to allocate memory in the kernel.
&lt;pre&gt;
#! /usr/bin/env stap
global allocs
global allocsizes


probe kernel.function("__kmalloc")
{
        allocs ++
        allocsizes = allocsizes + $size
}


probe timer.ms(5000)
{
        printf("malloc count      bytes malloced   \n")
        printf("%d                     %d              \n", allocs, allocsizes)
        allocs = 0
        allocsizes = 0
}

&lt;/pre&gt;

which produces something looking like this:
&lt;pre&gt;
malloc count      bytes malloced 
16                     30128              
malloc count      bytes malloced
36                     69312              
malloc count      bytes malloced    
76                     64469             
malloc count      bytes malloced    
28                     56651              
&lt;/pre&gt;
So every 5 seconds this particular system is doing about 40 mallocs and it's averaging to about 5K per malloc if my napkin math is correct. 
&lt;p&gt;
That's just a taste, next week I'll do a more interesting example.


</description>
      <pubDate>Wed, 30 Aug 2006 17:13:00 -0600</pubDate>
      <guid isPermaLink="false">urn:uuid:faa4a008-f42a-40e1-88d9-6827e25aceca</guid>
      <author>ian@ClearNetSec.com (Ian S. Nelson)</author>
      <link>http://blog.clearnetsec.com/articles/2006/08/30/intro-to-systemtap</link>
      <category>linux</category>
      <category>systemtap</category>
      <category>kernel</category>
      <category>profiling</category>
    </item>
  </channel>
</rss>
