Intro to systemtap
What is systemtap?
Systemtap is a tool built on the kprobes framework for probing, debugging, and analyzing the Linux kernel at runtime. IBM contributed kprobes to Linux and had it included during the 2.5 development tree in 2002. It's a supported and standard portion of the Linux kernel. Systemtap builds on kprobes to create a fairly easy-to-use tool that lets you probe just about anything in the Linux kernel with fairly simple commands; you can almost effortlessly modify any variable you wish on a running kernel, all with minimal performance impact. Essentially it's the Linux version of dtrace. The differences are that systemtap is probably more flexible in kernel space, while dtrace currently does a lot more for probing userspace applications.
So how do I get this working?
It's fairly easy and straightforward to install. There are RPMs for Fedora, Red Hat Enterprise, and SUSE-based Linux distributions. There is a slight trick, at least with Fedora Core 5: you also need to have the kernel debug RPM installed. Basically, it's just a set of object code files with debug symbols. I couldn't find it in the latest Fedora Core 5, and on closer look the kernel RPM didn't have the debug package enabled by default. I placed the following in my kernel-2.6.spec from Fedora Core 5 and was able to build the kernel debug symbol package.
%define _enable_debug_packages 1
With that added to your kernel spec file you can use rpmbuild and rebuild it and it will emit the debug package.
What can you do now?
Now you can start probing stuff. The systemtap site has some fairly simple demos, one is to print the top 20 most common syscalls every 5 seconds.
#!/usr/bin/env stap
#
# This script continuously lists the top 20 systemcalls on the system
#
global syscalls

function print_top () {
    cnt=0
    log ("SYSCALL\t\t\t\tCOUNT")
    foreach ([name] in syscalls-) {
        printf("%-20s\t\t%5d\n",name, syscalls[name])
        if (cnt++ == 20)
            break
    }
    printf("--------------------------------------\n")
    delete syscalls
}

probe kernel.function("sys_*") {
    syscalls[probefunc()]++
}

# print top syscalls every 5 seconds
probe timer.ms(5000) {
    print_top ()
}
This little probe counts each system call as it happens, dumps the list every 5 seconds, and then starts counting again. Here is a sample output:
SYSCALL COUNT
sys_gettimeofday 771
sys_futex 637
sys_clock_gettime 214
sys_rt_sigprocmask 141
sys_select 136
sys_poll 106
sys_newstat 84
sys_ioctl 62
sys_read 48
sys_write 47
sys_fcntl 47
sys_rt_sigaction 38
sys_time 30
sys_getppid 26
sys_setitimer 16
sys_recvfrom 13
sys_rt_sigreturn 13
sys_nanosleep 12
sys_close 11
sys_open 10
sys_newfstat 6
Try altering the load on your system and seeing how that affects things. That's kind of interesting but not terribly useful. Try this version:
#! /usr/bin/env stap
#
# This script continuously lists the top 20 systemcalls on the system
#
global syscalls

function print_top () {
    cnt=0
    log ("SYSCALL\t\t\t\tCOUNT")
    foreach ([name] in syscalls-) {
        printf("%-20s\t\t%5d\n",name, syscalls[name])
        if (cnt++ == 20)
            break
    }
    printf("--------------------------------------\n")
    delete syscalls
}

probe kernel.function("sys_*") {
    if (target() == tid())
        syscalls[probefunc()]++
}

# print top syscalls every 5 seconds
probe timer.ms(5000) {
    print_top ()
}
I added a line to the probe which checks whether the current thread matches target(), a value that systemtap lets you pass in. Now you can add a -x argument with the pid of a process and it will perform the same probes only against that particular process. I have a postgresql database running and ps tells me that pid 16679 is the writer thread.
... postgres 16679 0.0 4.4 199916 91000 ? S Jul21 0:00 postgres: writer process ...
Let's look at it.
SYSCALL COUNT
sys_getppid 25
sys_time 25
sys_select 25
The same output repeats over and over. Since the database isn't doing anything this makes sense; it's just waiting for something to tell it what to do. You can attach strace to that process (strace -p 16679) and see that it is correct. Let's disturb the database and make it do some I/O and see how it behaves. I'll insert a record into a table.
SYSCALL COUNT
sys_getppid 25
sys_time 25
sys_select 25
sys_lseek 4
sys_write 4
That makes some sense: one record inserted resulted in 4 seeks and 4 writes. Maybe that's a write to an index, a couple of writes to b+tree nodes and then an actual record being written, or maybe it is writing to a journal file. I don't know, but it sounds kind of reasonable and I haven't looked at the postgresql code yet. In fact we haven't looked at any source code yet or really compiled anything. This is kind of a cheap example since strace and the ptrace API can provide all of this information (although there is considerable impact on system performance with them), so here is something a bit harder to do.
#! /usr/bin/env stap
global comps
global compsizes
probe kernel.function("memcmp")
{
comps++
compsizes = compsizes + $count
}
probe timer.ms(5000)
{
printf("memcmp call count bytes compared\n")
printf("%d %d\n", comps, compsizes)
comps = 0
compsizes = 0
}
This probe monitors the kernel's memcmp function which compares two buffers of memory. $count is an argument to that function which contains the number of bytes to compare.
memcmp call count bytes compared
17151 114214
memcmp call count bytes compared
5455 34990
memcmp call count bytes compared
17165 114269
memcmp call count bytes compared
5398 34817
As you can see, memcmp is called a lot, and on average it compares about 6 bytes worth of data per call. I wonder why the call count sort of oscillates? I'll have to look into that.
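To double-check the "about 6 bytes" figure, here is a quick sketch in Python (not part of the probe itself) that takes the four samples from the output above and works out the average per interval. The function name is just something I made up for illustration.

```python
# (calls, bytes compared) pairs copied from the sample output above,
# one pair per 5-second interval.
samples = [(17151, 114214), (5455, 34990), (17165, 114269), (5398, 34817)]


def average_bytes(calls, total_bytes):
    """Average number of bytes compared per memcmp call (hypothetical helper)."""
    return total_bytes / calls


averages = [average_bytes(calls, total) for calls, total in samples]

for (calls, _), avg in zip(samples, averages):
    print(f"{calls:6d} calls, {avg:.2f} bytes per call")
```

Every interval lands between 6 and 7 bytes per call, so the call count swings by a factor of three while the average size per comparison barely moves.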
Here is another simple one, __kmalloc is used to allocate memory in the kernel.
#! /usr/bin/env stap
global allocs
global allocsizes
probe kernel.function("__kmalloc")
{
allocs ++
allocsizes = allocsizes + $size
}
probe timer.ms(5000)
{
printf("malloc count bytes malloced \n")
printf("%d %d \n", allocs, allocsizes)
allocs = 0
allocsizes = 0
}
which produces something looking like this:
malloc count bytes malloced
16 30128
malloc count bytes malloced
36 69312
malloc count bytes malloced
76 64469
malloc count bytes malloced
28 56651
So every 5 seconds this particular system is doing about 40 mallocs, and it's averaging about 1.4K per malloc if my napkin math is correct (220,560 bytes over 156 calls across the four samples).
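Here is the napkin math spelled out as a small Python sketch, using only the four samples printed above. It is just arithmetic on the probe output, nothing systemtap-specific.

```python
# (calls, bytes requested) pairs copied from the sample output above,
# one pair per 5-second interval.
samples = [(16, 30128), (36, 69312), (76, 64469), (28, 56651)]

total_calls = sum(calls for calls, _ in samples)
total_bytes = sum(size for _, size in samples)

# Average number of __kmalloc calls per 5-second interval.
calls_per_interval = total_calls / len(samples)

# Average request size in bytes across all four intervals.
avg_bytes_per_call = total_bytes / total_calls

print(f"{calls_per_interval:.0f} calls per interval")
print(f"{avg_bytes_per_call:.0f} bytes per call on average")
```

With these samples that works out to 39 calls per interval and a little over 1,400 bytes per call.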
That's just a taste, next week I'll do a more interesting example.
Déjà vu with big time port scanning
We have a potential pen. test project coming up with crazy numbers: 260,000+ range of publicly addressable IPs (spread over several non-contiguous blocks). Our previous experience with scanning large blocks aggressively via the internet has made us, well, super sensitive. It’s freakin’ hard.
260,000 x 65535 TCP ports x 2 (number of port query attempts) = 34,078,200,000 TCP SYN packets (nmap, default).
That is ~4x larger compared to our last "large" effort (~70,000 IPs).
No info yet on the bandwidth available per block, latency conditions, or other factors, but let’s run some numbers to see how this would work.
From recall, the Sun v40z’s were averaging ~3500pps (packets per second) outbound per server. I didn’t use packets per second to estimate the previous effort. Instead, I created a quick Excel chart similar to the one below to get a ballpark number (updated using 260,000 for the total number of IP addresses to scan):
| Item | Value | Notes |
|---|---|---|
| approx. number of IP addresses to scan | 260,000 | |
| number of TCP ports to query per IP address | 65,535 | |
| total # of TCP ports to query | 17,039,100,000 | |
| time-out setting per port query | 1.25 | seconds (allow 1.25 seconds for query response) |
| number of query attempts per port | 2 | which implies we need 34,078,200,000 SYN requests |
| total seconds per port query | 2.50 | seconds |
| number of parallel port queries | 2 | (we set it to 100, but empirical evidence shows this to be ~2) |
| total # of hours to complete single IP port scan | 22.76 | |
| total number of seconds to perform scan (if sequential) | 21,298,875,000 | |
| total number of hours to perform scan (if sequential) | 5,916,354 | |
| number of scans in parallel | 1,600 | (2 servers, each running 8 unique nmap processes with min_hostgroup set to 100) |
| number of hours to complete scan | 3,698 | |
| number of days to complete scan | 154 | |
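The chart boils down to a few multiplications, so here is a rough Python sketch of the same model. It assumes, as the chart does, that every port query waits out the full timeout on each attempt, which is a worst case; the function and parameter names are mine, not anything nmap provides.

```python
def scan_days(hosts, ports_per_host, timeout_s, attempts,
              parallel_port_queries, parallel_host_scans):
    """Estimate scan duration in days with the timeout-based model above.

    Worst-case assumption: every query takes the full timeout per attempt.
    """
    seconds_per_port = timeout_s * attempts
    # Time if hosts were scanned one after another.
    sequential_seconds = (hosts * ports_per_host * seconds_per_port
                          / parallel_port_queries)
    hours = sequential_seconds / 3600 / parallel_host_scans
    return hours / 24


days = scan_days(
    hosts=260_000,
    ports_per_host=65_535,
    timeout_s=1.25,
    attempts=2,
    parallel_port_queries=2,    # observed, not the configured 100
    parallel_host_scans=1_600,  # 2 servers x 8 nmap processes x 100 hosts
)
print(f"about {days:.0f} days")
```

Plugging in the configured 100 parallel port queries instead of the observed 2 shrinks the estimate by a factor of 50, which shows how sensitive this model is to that one number.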
Now I realize pps is an important factor. Remember these are still pretty fast servers: quad dual core opterons with 16 GB RAM. Doing a sanity check with our previous results of obtaining ~3500 pps per server goes as follows:
| Item | Value |
|---|---|
| total # of SYN requests | 34,078,200,000 |
| average outbound packets per second / per server | 3,500 |
| number of dedicated servers | 2 |
| total number of seconds to complete scan | 4,868,314 |
| number of hours to complete scan | 1,352 |
| number of days to complete scan | 56 |
So the numbers don't match. The first chart says it'll take ~154 days, the second says ~56 days. In the first chart I set "number of scans in parallel" to 1600. I got that by having 8 unique nmap processes each running with min_hostgroup of 100 (800 hosts being scanned simultaneously per server; we have two dedicated scanners, so the total is 1600 hosts). The number that is probably way off is the "number of parallel port queries" in the first chart. Although we had set the value to 100 (meaning scan 100 ports simultaneously on each host), it often seemed like we were only getting 2 or so in parallel (observed from watching tcpdump output). That was probably after running into the memory issues I reported with nmap (which Fyodor subsequently reported as fixed).
I just ran a few quick checks on a Sun Fire x2100 we have (dual core opteron 175 / 4 GB of RAM) using a newer nmap version (4.11). Firing off a bunch of nmap scans on one server with parameters similar to below resulted in a sustained 9000+ pps (~4.5 Mbits/s).
/usr/local/bin/nmap -vv -sS -P0 -p 1-65535 -n --min_hostgroup 100 --max_rtt_timeout 1250 --min_parallelism 100 <a_/24_block>
I didn't let the test run for very long to see if issues arise like before. If we could sustain 9000 pps per server and be allowed to push ~9Mbits/s, then the overall time is greatly reduced. It drops to ~22 days. I really doubt we'll be able to or allowed to push 9Mbits/s, but at least we now have ballpark figures to play with.
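For anyone who wants to play with the ballpark figures, here is a small Python sketch of the packets-per-second estimate. It assumes the sustained rate is the only bottleneck and that every port gets both query attempts; the function names are made up for illustration.

```python
SECONDS_PER_DAY = 86_400


def syn_packets(hosts, ports_per_host=65_535, attempts=2):
    """Total SYN packets needed for a full TCP port scan of all hosts."""
    return hosts * ports_per_host * attempts


def days_at_rate(packets, pps_per_server, servers=2):
    """Days needed to send `packets` at a sustained per-server packet rate."""
    return packets / (pps_per_server * servers) / SECONDS_PER_DAY


total = syn_packets(260_000)

for pps in (3_500, 9_000):
    print(f"{pps} pps per server: about {days_at_rate(total, pps):.0f} days")
```

Changing `servers` or the rate makes it easy to see what a bandwidth cap would do to the schedule.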
Who else has had this kind of crazy port scanning fun?
Related entry: maximizing nmap scans for accuracy
The VA and Bureaucracy Part 3
In part 2 of my VA auditing experience I told you all about our "training" for the VA assessment. I am going to finish this out with my thoughts on the first site experience. If you missed it here is part 1. With all the things that had gone on with this project I was very interested in how the actual audit was going to go for each site. Before I could think long on it I was off to the wonderful state of Maine in February.
Now I live in Colorado and most people's preconception of Colorado in the winter is exactly what Maine was... Cold, snowy, and dark. For those of you that don't know, Denver Colorado has a very mild winter and snow barely stays a week on the ground. In the mountains is a different story but Denver is on the plains not the mountains.
So back in Virginia we were told that we needed to car pool with the other auditors and that each auditor was responsible for ensuring the whole team got to the site. This was interesting to say the least, as the audit teams were thrown together maybe 2 days before we actually flew out. Each trip I went on had a team with different people. This was great for meeting new people but horrible for car pooling, as the one person who had the car was expected to ferry us around! The first issue that greeted me was that I got to Portland, Maine at about 11:00 PM EST and had to get to Augusta, which is about 1 1/2 hours away. I could not get hold of the guy with the car; my call went to voicemail, surprisingly enough. Suffice it to say I had to take a taxi to Augusta, which cost about 170 dollars, footed by the taxpayers of course. For people that don't know Maine, Portland is in the south and Augusta, the capital, is in the lower center of the state, so a taxi ride was costly.
The second issue was that none of the audit staff could get hold of each other. In fact I didn't even get to the facility till later on Monday because we all were staying at different hotels. Hotels, flights, and rental cars were chosen by the coordinators, not the auditors, so this was not negotiable. Anyhow, we were scheduled to be at the facility for 4 days and leave on the 5th day, so I was already thinking of how much fun I was going to have.
Onto Monday we go! After I get to the facility with my chauffeur, I finally find out how many computers we are testing. Let's see, the audit team had 3 "windows testers" including me, so that means we can get pretty good coverage in 4 days, right? Well, we had to test a grand total of 26 computers and all the mobile nursing stations for a grand total of 30. Remember the checklist, the one that takes about 20 minutes per computer max? 30 / 3 = 10 computers each over 4 days. So doing some more math we can estimate about a 4 hour work day including lunch. Now this facility was pretty big. So big that I would have easily gotten lost without my VA companion. Off I went to verify the VA is secure with my clipboard! Suffice it to say that my VA companion was pleased to only waste 4 hours running MBSA and Dumpsec.
At this point I am sure a few of you are thinking that it was easier for me to test this minuscule amount of computers and then just chill till it is time to leave but it wasn't. We were not allowed to have cell phones on in the building because of possible interference with medical equipment, we were not allowed to go onto the VA network with our laptops, which makes sense, and we were in the middle of nowhere. Luckily we got to go home on the 3rd day meaning that we had only spent 4 days total in snowy Maine.
A few thoughts on my whole VA auditing experience. First, I did actually like meeting the other auditors and the technical VA personnel. They were great and made the whole project actually move forward. I also got to go to places I would never have gone to if not on business. That said, what a waste of money the whole endeavor was. As Bruce Schneier likes to always say, this definitely had the perception of being a proactive security measure but that is all it was, a perception. I think that there were some serious loopholes somewhere that allow this sort of thing to go on. Like I said earlier, if this kind of project happened elsewhere everyone would be fired, unless of course they are interested in the perception. We ended up doing 10 facilities before we just could not take it anymore. We were not alone in that feeling, as I think every team I was on had new people who had replaced someone that went to the "training".
Books on Reversing
I hope all of you moved over to our new blog server without issue. The other blog software was causing us some issues so we decided to move to another setup. This one runs on Typo so it is more suited for us. Both Tate and I know Ruby somewhat so we should be able to keep this up and running.
I have been on a study path to try and reinforce the class I took at Blackhat on Reverse Engineering. Here is the list of books I have been using:
| Book Title | Author |
|---|---|
| "Reversing: Secrets of Reverse Engineering" | Eldad Eilam |
| "Exploiting Software" | Greg Hoglund and Gary McGraw |
| "Hacker Disassembling Uncovered" | Kris Kaspersky |
| "Microsoft Windows Internals 4th Edition" | Mark Russinovich and David Solomon |
| "The Art of Assembly Language" | Randall Hyde |
| "Write Great Code Volume 1" | Randall Hyde |
| "Write Great Code Volume 2" | Randall Hyde |
If you are interested in disassembly or reversing then I highly recommend these books. The main book I am using is "Reversing: Secrets of Reverse Engineering" and then I am following up with the other books as needed. The one book that might be disheartening is "The Art of Assembly Language". This book first teaches you a special language called High Level Assembly (HLA) and then slowly drops you down to low level assembly for the x86, thereby making you learn two languages. This is why it is so big. I believe the reason is that it is hard to actually do something in assembly without knowing most of assembly, so the author uses HLA to bridge the gap. I thought it worked out fine but I wish I had known that I had to learn HLA first and then assembly. By the time I realized this I was too far in to stop.
Does security need to be designed in from the start?
The logical follow-up is how do you iterate security into a product after the fact, if that's a valid way to do it? Any thoughts or experiences?
Sorry for duplicates / Built new server stack
Sorry for the duplicate entries in your RSS reader. We migrated Sunday to a new server and built a new stack: Apache -> mod_proxy_balancer, mod_rewrite -> Mongrel, Ruby, Rails, Typo -> PostgreSQL. We used mod_rewrite to hopefully keep the RSS links and others working (old permalinks end up dropping you on the latest entry). We haven't added all the comments back yet.
If you want to update your settings, our new RSS link is:
http://blog.clearnetsec.com/xml/rss20/feed.xml
Cory's Blackhat Training Day 1

At Black Hat I took the "Reverse Engineering on Windows: Application in Malicious Code Analysis" course. The class was about reverse engineering malicious executable programs on the Windows platform, just like the antivirus guys do for big companies like Symantec and iDefense. This was a very fun class for me as my background is not in reverse engineering or its associated technologies like:
I have taken many courses before on security and network related subjects but never from Black Hat. This was even my first Black Hat Briefings so I was very excited to see how it would turn out. Many times before I was disappointed with the classes I took, as they tended to have a very interesting syllabus but then ended up short on the technical depth that I thought I would get. That had me skeptical of the Black Hat course as well, but man was I wrong! I think it took about 7 minutes into the class before we were launching IDA Pro and digging into the configuration of the tools we would use for the next 2 days. We didn't even introduce ourselves, leaving us to guess who was even teaching us.
My teachers, Pedram Amini and Ero Carrera, were some very bright and intelligent guys that had been in the malware/virus reverse engineering circles for years. They were so adept at reversing things that they have written many tools to help with reversing such as:
As you can see they were both Python fans. Too bad Ruby rocks Python so much that Python needs to go bash the Perl guys to feel better.
So by the first half day I was already disassembling the Mydoom.A virus looking for what it was made of, what it did, how it worked etc... Now this was not a "live" analysis but rather it was loading the binary executable up in a disassembler/debugger called IDA Pro and dissecting the binary code looking for things of interest. Basically we learned two main methods of doing analysis, "top down" and "bottom-up".
Top down was where you start at the program's main function and start labeling all the functions that are called to see what they do. For example you eyeball a function and decide it is gathering the time from the system so you label the function "Gets the time" and so on. Once you do this for all defined functions you can then concentrate on the ones that perform actions of interest like opening sockets or creating processes.
The bottom up approach was where we look for interesting code snippets, items in the Import Address Table (IAT) and strings table. For example if you find a call to "htons" and above it you see the number 80 (in hex of course) being placed into a register, you can deduce that it is making a call out to port 80 on the network.
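To make the htons example a bit more concrete, here is a tiny Python sketch (nothing from the class materials) showing what port 80 looks like in hex and in network byte order, which is what you would be squinting at in the disassembly. The helper name is made up.

```python
import struct


def port_in_network_order(port):
    """Pack a 16-bit port big-endian (network byte order), the layout htons produces."""
    return struct.pack("!H", port)


port = 80
print(hex(port))                          # the constant as it appears in hex: 0x50
print(port_in_network_order(port).hex())  # bytes on the wire: 0050
```

So a 0x50 loaded into a register right before a call to htons is a strong hint that the code is about to talk to port 80.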
Yes, I know this sounds hard, and it was. But anybody could learn this skill with practice. I will try to write up some good snippets from my class if anybody is interested. Here are some interesting sites to peruse if you are interested in reversing:
- OpenRCE - A site about reverse engineering software
- Rootkit.com - A site about malware things
- The Reversers Vault - A site with tutorials on reversing things
security tools to check out (new to us; from blackhat/defcon)
I'm going through my notes from the conferences and here is my beginning list of tools to check out (some new):
- MatriXay: web app and db pen-test tool (in beta, not free, not sure of cost, but it looked interesting)
- PDB: protocol debugger (free)
- tattoo: traffic analysis toolkit (free)
- Suru: mitm proxy and web app fuzzer (not free, but cheap - $200)
- Crowbar: web brute force tool (free)
- SP_LR (can't find a link yet): proxy framework targeted for malware analysis and fuzzing
The Exploit Laboratory class at BlackHat Training was great.
If you want to bump up your exploit writing skills – Saumil Udayan Shah is an excellent teacher. His style of teaching brought out memories of my time as an ECE student at CU, Boulder. He presented very clearly, kept the pace moving, and quipped often. Great class.
The majority of time is spent on using GDB and WinDBG to inspect Intel 32-bit x86 CPU registers for opportunities. The end game was always accompanied by netcat and metasploit (along with a decent amount of scripting to facilitate quick retries when trying to line up all the exploit code to ensure success).
Here is the full class description: http://www.blackhat.com/html/bh-usa-06/train-bh-us-06-ss-el.html
BlackHat/Defcon quickies
I don’t want to repeat what everyone else is writing about regarding attending BlackHat and Defcon, but several talks were freakin’ cool:
- Joanna Rutkowska’s Blue Pill stuff. Totally own x64 Vista on AMD (Pacifica) using the new AMD processors’ virtual machine technology. Undetectable. “Writing signatures to detect things is rookie” -- an awesome quote by Joanna.
- johnny cache and David Maynor’s layer 2 exploit. Get remote shell root access to a Mac, Windows, or whatever if the wireless card is simply ON (no need to associate or anything). Damn I would love to have this exploit on hand.
- HD Moore’s talks:
  - Thermoptic Camouflage: IDS and IPSes suck for lots of reasons. Signature based IDS and IPS systems really suck. Joanna’s quote from above kind of says it all, “rookie”. With the new metasploit, you’ll be able to evade anything and everything on the market.
  - Six Degrees of XSSploitation: Cross site scripting is freakin’ dangerous. Douse with lots of browser vulns, and well, it’s getting ridiculous to have fun on the Internet. Nothing is safe, so unplug.
  - Metasploit Reloaded: The metasploit story is just getting better – it is the best framework to build exploits. The 3.0 version is being completely rewritten in Ruby so that is good for us.
- Jeremiah Grossman’s Hacking Intranet Websites from the Outside. I haven’t seen this before – using JavaScript to surreptitiously enumerate internal IP addresses, perform port scans, retrieve portions of the user’s browser history via checking CSS values, and even log in and modify the DMZ rules in home DSL routers to allow external connections to a particular ‘live’ internal device. All done without exploiting anything – just using plain valid JavaScript.







