Center for Computer Security Research, Mississippi State University
Attacks on High Performance Linux Clusters

Integrating Intelligent Anomaly Detection Agents into Distributed Monitoring Systems

Anomaly Detection With Function Call Logs

The following applications were selected to test the anomaly detectors with Ganglia:

  • FFT - An implementation of the Fast Fourier Transform
  • IS - An NPB benchmark based on a bucket sort
  • LU - An implementation of the LU Factorization method for solving systems of linear equations
  • LL - MPBench, from the LLCbench benchmark suite
  • NetPIPE - The Network Protocol Independent Performance Evaluator

The data sets on this page were collected by running these applications on microcosm, an 8-node Linux cluster at the Center for Computer Security Research. The names microcosm0 through microcosm7 refer to the individual nodes of the cluster.

The data on this page records the sequence of library functions called by each program during each execution. For the sake of the model, each function is represented by an integer. The correspondence between integers and function names (along with other information about each function) can be found in the following file:

This file is pipe-delimited, with each line describing one library function. The fields are, in order: the library's ID, the return type, and, most importantly, the function name followed by the function number used in the data files on this page. The remainder of the function prototype follows: the argument list, the names of the arguments, and a "key" argument whose value is recorded for each call in some of the more complex data sets. For more detail, see the comment block in the Functions.gfl file.

The data files on this page consist of one integer per line referring to the functions above. For example, if you see:

33

You can look up this number in Functions.gfl to find more information about the function:

LIBMPI|int|MPI_Comm_size|33|MPI_Comm comm,int *size|comm,size|-1|
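
The lookup described above can be sketched in Python. This is only an illustration based on the single sample line shown here: the names FunctionEntry, parse_gfl_line and build_lookup are hypothetical, and the layout of the comment block in Functions.gfl is not known, so the sketch simply skips any line that does not parse as a function entry.

```python
from typing import Dict, Iterable, List, NamedTuple


class FunctionEntry(NamedTuple):
    """One line of Functions.gfl (field names are assumptions)."""
    library: str
    return_type: str
    name: str
    number: int
    prototype: str
    arg_names: List[str]
    key_field: str  # kept as text; its exact meaning is defined in Functions.gfl


def parse_gfl_line(line: str) -> FunctionEntry:
    """Split one pipe-delimited line into its seven leading fields."""
    fields = line.strip().split("|")
    # The trailing "|" produces an empty eighth field, which is ignored.
    library, return_type, name, number, prototype, arg_names, key_field = fields[:7]
    return FunctionEntry(
        library,
        return_type,
        name,
        int(number),
        prototype,
        arg_names.split(",") if arg_names else [],
        key_field,
    )


def build_lookup(lines: Iterable[str]) -> Dict[int, FunctionEntry]:
    """Map function numbers to entries, skipping lines that do not parse."""
    table: Dict[int, FunctionEntry] = {}
    for line in lines:
        try:
            entry = parse_gfl_line(line)
        except ValueError:
            continue  # comment, blank, or otherwise malformed line
        table[entry.number] = entry
    return table


sample = "LIBMPI|int|MPI_Comm_size|33|MPI_Comm comm,int *size|comm,size|-1|"
lookup = build_lookup(["(a comment line)", sample])
print(lookup[33].name)  # MPI_Comm_size
```

With the real file, the same table could be built by passing an open file handle for Functions.gfl to build_lookup.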

Training Data

Library function number per line format:

For the training data, a more complete description of each function call is also available. An example:

30562,11,3,1078791711659440,129

In this example, 30562 is the ID of the process that called the library function, 11 is the function number, 3 is the "key" argument to the function (these arguments are defined in the Functions.gfl file for each function), 1078791711659440 is the time at which the function was called, in Unix time (time since the epoch; the 16-digit magnitude indicates microseconds rather than seconds, which places this call on 9 March 2004), and 129 is the time elapsed since the previous call.
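
A record in this format can be decoded as in the following Python sketch. It assumes the timestamp counts microseconds since the Unix epoch, which its 16-digit magnitude suggests; the names CallRecord, parse_call_record and call_time_utc are hypothetical, and the unit of the elapsed-time field is not stated on this page, so it is left as a plain integer.

```python
from datetime import datetime, timedelta, timezone
from typing import NamedTuple


class CallRecord(NamedTuple):
    """One detailed training-data record (field names are assumptions)."""
    pid: int
    function_number: int
    key_value: str      # kept as text, since the key argument may vary by function
    timestamp_us: int   # assumed: microseconds since the Unix epoch
    elapsed: int        # time since the previous call; unit not stated


def parse_call_record(line: str) -> CallRecord:
    """Split one comma-separated record into its five fields."""
    pid, func, key, ts, elapsed = line.strip().split(",")
    return CallRecord(int(pid), int(func), key, int(ts), int(elapsed))


def call_time_utc(record: CallRecord) -> datetime:
    """Convert the timestamp to a UTC datetime without float rounding."""
    seconds, micros = divmod(record.timestamp_us, 1_000_000)
    base = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return base + timedelta(microseconds=micros)


record = parse_call_record("30562,11,3,1078791711659440,129")
print(record.function_number)             # 11
print(call_time_utc(record).isoformat())  # 2004-03-09T00:21:51.659440+00:00
```

Splitting the timestamp with divmod keeps the microsecond part exact, which a single floating-point division would not guarantee.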

The files above that list only the function number were extracted from this detailed data using the StripCalls.sh script:
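
The StripCalls.sh script itself is not reproduced on this page. As a rough illustration of the extraction it performs, and not the script's actual contents, the following Python sketch assumes that the step amounts to keeping the second comma-separated field of each detailed record; the function name strip_calls is hypothetical.

```python
from typing import Iterable, Iterator


def strip_calls(lines: Iterable[str]) -> Iterator[int]:
    """Yield only the function number from each detailed record."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        # Fields: pid, function number, key argument, timestamp, elapsed time
        yield int(line.split(",")[1])


detailed = [
    "30562,11,3,1078791711659440,129\n",
    "\n",
    "30562,33,0,1078791711659569,129\n",  # made-up second record for illustration
]
for number in strip_calls(detailed):
    print(number)
```

Because strip_calls accepts any iterable of lines, an open file handle can be passed in place of the list to process a whole detailed data file.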

Training data in the detailed, comma-separated format is available here:

Test Data

Normal

Library function number per line format:

Abnormal

Library function number per line format:


Questions and comments about this web site may be directed to the webmaster at rwm8@cse.msstate.edu