It’s hard to build a smart SIEM
If it is good at doing A, it sucks at doing B. This is the banal trade-off of security point solutions.
The WSJ recently featured SIEM in an article titled “Looking for Trouble.”
What is a good SIEM? I would say it is an anomaly and misuse detection system, a sink for the output of other such systems, and a sink for other observable facts (e.g., logs).
What does a good SIEM do? For most people, I believe the best answer is that it taps you on the shoulder only when there is a real problem.
What does a SIEM need to do to be good? That’s a tricky question. I would say it needs to understand which streams of incoming data are good for doing A (or identifying A), understand why certain streams are bad for making inferences (e.g., it’s not good to automatically infer that something is really important just because an IDS is sending 1,000 alerts per minute), and run an algorithm mix that has been forged to work.
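As a minimal sketch of not inferring importance from volume (the IDS case above), the function below scores a batch of alerts by counting distinct signatures per source, so 1,000 copies of the same alert weigh exactly as much as one, then dampens the count with `log1p` and multiplies by a per-source weight. The `RELIABILITY` table, the default weight, and the `(source, signature)` alert shape are hypothetical placeholders, not values from any real SIEM.

```python
import math
from collections import Counter

# Hypothetical reliability weights per source type; real values would come from tuning.
RELIABILITY = {"ids": 0.3, "hids": 0.8, "auth_log": 0.6}
DEFAULT_RELIABILITY = 0.1  # assumed weight for sources not in the table


def stream_evidence(alerts: list[tuple[str, str]]) -> float:
    """Score a batch of (source, signature) alerts without rewarding raw volume."""
    # set() drops exact repeats, so 1,000 copies of one alert count as one.
    distinct_per_source = Counter(source for source, _sig in set(alerts))
    # log1p dampens sources that emit many *different* signatures.
    return sum(
        RELIABILITY.get(source, DEFAULT_RELIABILITY) * math.log1p(n)
        for source, n in distinct_per_source.items()
    )


flood = [("ids", "SID-1")] * 1000
print(stream_evidence(flood) == stream_evidence([("ids", "SID-1")]))  # True
```

The weights encode the idea that some streams are better evidence than others; how to set them is exactly the “understand which streams are good for doing A” problem.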
This post is really about the answer to the last question above.
SIEMs rely on both sides of the detection coin:
- Misuse: good at detecting known attacks using signatures
- Anomaly: good at detecting unknown attacks by modeling behavior
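To make the two sides concrete, here is a toy sketch. The misuse side matches payloads against known-attack patterns; the anomaly side learns which `(host, dst_port)` pairs are normal and reports anything new. The signatures, function names, and data shapes are illustrative assumptions, not real Snort rules or any product’s API.

```python
import re

# Misuse side: hypothetical signatures for known attacks (illustrative, not real Snort rules).
SIGNATURES = {
    "path-traversal": re.compile(r"\.\./\.\./"),
    "sql-injection": re.compile(r"(?i)union\s+select"),
}


def misuse_match(payload: str) -> list[str]:
    """Return the names of known-attack signatures the payload matches."""
    return [name for name, rx in SIGNATURES.items() if rx.search(payload)]


# Anomaly side: a toy behavior profile, the set of (host, dst_port) pairs seen in training.
def learn_profile(history: list[tuple[str, int]]) -> set[tuple[str, int]]:
    return set(history)


def novel_behavior(profile: set[tuple[str, int]],
                   observed: list[tuple[str, int]]) -> list[tuple[str, int]]:
    """Return observed connections never seen in the baseline; no signature required."""
    return [conn for conn in observed if conn not in profile]


print(misuse_match("GET /../../etc/passwd HTTP/1.1"))  # ['path-traversal']
profile = learn_profile([("web01", 443), ("web01", 80)])
print(novel_behavior(profile, [("web01", 443), ("web01", 4444)]))  # [('web01', 4444)]
```

Note the trade-off: the misuse side says nothing about a connection to port 4444 unless someone has written a rule for it, while the anomaly side flags it with no rule at all, but it will also flag every harmless new behavior.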
The misuse side of the coin is clean and shiny. Snort is an example of a solid misuse detection system.
The anomaly side of the detection coin is dirty - it’s hard to see anything clearly. Why?
It’s because no single anomaly detection technique is perfect. Put another way, if you fall into the hole of anomaly detection techniques, you’ll never hit bottom; the hole has no bottom. The techniques draw on:
- Statistics
- Probability
- Machine Learning: there are literally hundreds (maybe thousands?) of papers applying machine learning techniques to computer and network security
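As one small example from the probability family, the sketch below estimates how likely each `(user, host)` login pair is from history, using Laplace smoothing so unseen pairs still get a small non-zero probability, and scores new events by surprisal (-log2 p): the rarer the pair, the higher the score. The class name, the smoothing constant, and the choice to lump all unseen pairs into one bucket are assumptions made for illustration.

```python
import math
from collections import Counter


class PairFrequencyModel:
    """Hypothetical model: estimate P(user, host) from history; rare pairs score high."""

    def __init__(self, history: list[tuple[str, str]], alpha: float = 1.0):
        self.counts = Counter(history)
        self.total = len(history)
        self.alpha = alpha  # Laplace smoothing constant (assumed value)
        # Seen pairs plus one shared bucket for every unseen pair.
        self.vocab = len(self.counts) + 1

    def surprisal(self, user: str, host: str) -> float:
        """-log2 of the smoothed probability; higher means more unusual."""
        count = self.counts[(user, host)]  # Counter returns 0 for unseen keys
        p = (count + self.alpha) / (self.total + self.alpha * self.vocab)
        return -math.log2(p)


history = [("alice", "web01")] * 50 + [("bob", "db01")] * 30
model = PairFrequencyModel(history)
print(model.surprisal("alice", "web01") < model.surprisal("mallory", "db01"))  # True
```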
Anomaly detection is further complicated by the fact that algorithms are often combined in different ways to detect different types of anomalies. The gigantic streams of commingled and fragmented data (e.g., logs, xflows, IDS alerts, HIDS alerts) mean a huge number of possible permutations.
Circling back: to build a smart SIEM that excels at its job, you must employ and combine algorithms in a way that focuses on the information that is good for doing A. This means experimenting with mixing algorithms and chaining them together so the output taps you on the shoulder only when there is a problem.
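A minimal sketch of chaining, assuming each detector emits events shaped like `{"entity", "detector", "confidence"}` (a hypothetical format): each stage consumes the previous stage’s output, and the last stage passes an entity through only when at least two different detectors agree. The stage names and the 0.5 cut-off are illustrative, not tuned.

```python
from collections import defaultdict
from typing import Callable, Iterable

# Hypothetical event shape: {"entity": str, "detector": str, "confidence": float}
Event = dict
Stage = Callable[[list[Event]], list[Event]]


def chain(stages: Iterable[Stage], events: list[Event]) -> list[Event]:
    """Run each stage on the previous stage's output."""
    for stage in stages:
        events = stage(events)
    return events


def drop_low_confidence(events: list[Event]) -> list[Event]:
    # 0.5 is an assumed cut-off; detectors are assumed to emit a 0..1 confidence.
    return [e for e in events if e["confidence"] >= 0.5]


def require_corroboration(events: list[Event]) -> list[Event]:
    """Keep an entity only if at least two different detectors flagged it."""
    detectors_by_entity = defaultdict(set)
    for e in events:
        detectors_by_entity[e["entity"]].add(e["detector"])
    return [
        {"entity": entity, "detectors": sorted(dets)}
        for entity, dets in detectors_by_entity.items()
        if len(dets) >= 2
    ]


events = [
    {"entity": "10.0.0.5", "detector": "snort", "confidence": 0.9},
    {"entity": "10.0.0.5", "detector": "xflow-stats", "confidence": 0.7},
    {"entity": "10.0.0.9", "detector": "snort", "confidence": 0.9},
    {"entity": "10.0.0.9", "detector": "auth-log", "confidence": 0.2},
]
print(chain([drop_low_confidence, require_corroboration], events))
# [{'entity': '10.0.0.5', 'detectors': ['snort', 'xflow-stats']}]
```

Requiring corroboration is one way to trade a little sensitivity for far fewer shoulder taps; other stages (weighted voting, sequence rules) would slot into the same chain.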
1. A simple example is taking the deluge of alerts a Snort instance emits and wrapping them in a statistical model. More value may come from recognizing numerical changes (min, max, median, mean, standard deviation, etc.) than from the individual alerts themselves.
2. A more complicated example may be applying NLP (natural language processing) techniques to analyze logs and extract user information, coalescing misuse detection alerts, associating statistical values derived from modeling xflows, and then layering additional algorithms on top to correlate them and present compelling evidence of strange behavior (i.e., a problem).
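The two examples above can be sketched together. Example 1 summarizes per-minute Snort alert counts and flags a window whose mean moves more than `k` baseline standard deviations. Example 2 uses a regular expression as a deliberately simple stand-in for NLP to pull users out of OpenSSH-style “Accepted … for USER from IP” lines, coalesces repeated alerts per source IP, reuses the same statistics to score xflow byte counts, and reports users only where all three agree. The function names, log format, alert tuples, byte-count dictionaries, and the `k=3` threshold are assumptions for illustration.

```python
import math
import re
import statistics
from collections import defaultdict


# --- Example 1: summary statistics over Snort alert volume ---
def summarize(counts: list[float]) -> dict[str, float]:
    """Min, max, median, mean and stdev of per-minute alert counts."""
    return {
        "min": min(counts),
        "max": max(counts),
        "median": statistics.median(counts),
        "mean": statistics.mean(counts),
        "stdev": statistics.stdev(counts) if len(counts) > 1 else 0.0,
    }


def zscore(baseline: list[float], value: float) -> float:
    """How many baseline standard deviations `value` sits from the baseline mean."""
    stats = summarize(baseline)
    diff = value - stats["mean"]
    if stats["stdev"] == 0:
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / stats["stdev"]


def volume_changed(baseline: list[int], window: list[int], k: float = 3.0) -> bool:
    """Flag a window whose mean alert rate moves more than k baseline stdevs (k assumed)."""
    return abs(zscore(baseline, statistics.mean(window))) > k


# --- Example 2: correlate logins, misuse alerts and xflow statistics ---
# Regex stand-in for the NLP step; assumes OpenSSH-style "Accepted ... for USER from IP" lines.
LOGIN_RX = re.compile(r"Accepted \w+ for (?P<user>\S+) from (?P<ip>\d{1,3}(?:\.\d{1,3}){3})")


def correlate(log_lines, alerts, bytes_baseline, bytes_now, k=3.0):
    """Users on IPs that tripped a misuse signature AND show anomalous flow volume.

    alerts: (src_ip, signature) pairs; bytes_baseline: {ip: [past byte counts]};
    bytes_now: {ip: current byte count}. All shapes are hypothetical.
    """
    users_by_ip = defaultdict(set)
    for line in log_lines:
        if m := LOGIN_RX.search(line):
            users_by_ip[m["ip"]].add(m["user"])

    sigs_by_ip = defaultdict(set)  # coalesce repeated alerts per source IP
    for src_ip, signature in alerts:
        sigs_by_ip[src_ip].add(signature)

    findings = []
    for ip, sigs in sigs_by_ip.items():
        if ip not in users_by_ip or ip not in bytes_now or not bytes_baseline.get(ip):
            continue
        if zscore(bytes_baseline[ip], bytes_now[ip]) > k:
            findings.append({"ip": ip, "users": sorted(users_by_ip[ip]),
                             "signatures": sorted(sigs)})
    return findings
```

With this sketch, a Snort instance that steadily emits about 1,000 alerts a minute produces no tap on the shoulder; a sharp jump, or a sudden drop to zero, does.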
We’ve heard of, and recently noticed, companies scaling back their investment in the research needed to solve the hard problems SIEM developers face. This may indicate that the bigger players plan to coast for a while on the past decade’s techniques. Fall behind, though, and you’re out.
Given the advances in computing power (a big reason why AI and machine learning are so hot), it is also becoming increasingly difficult to keep up with, understand, and evaluate the techniques that benefit both the builders and the consumers of SIEM systems.
If you find yourself evaluating SIEM products, dig in and investigate how each works - you don’t want yesterday’s product.
