I'm working on a project on host-based intrusion detection using the ADFA-LD dataset, and I'm currently on the feature extraction module. I have constructed a phrase dictionary consisting of system call phrases of length 4. For feature extraction, I now need to compare these phrases with new system call traces (some samples follow):
sys_clock_gettime sys_poll sys_poll sys_clock_gettime sys_poll sys_poll
sys_poll sys_clock_gettime sys_poll sys_clock_gettime sys_poll sys_poll sys_poll sys_poll sys_poll sys_poll sys_poll sys_poll sys_socketcall.......
What I need to know is how to compare these phrases with the new traces. I'm working in Java.
My phrase dictionary:
sys_socketcall-sys_poll-sys_clock_gettime-sys_poll
sys_clock_gettime-sys_poll-sys_poll-sys_socketcall
sys_poll-sys_socketcall-sys_poll-sys_clock_gettime
sys_poll-sys_clock_gettime-sys_clock_gettime-sys_clock_gettime
sys_clock_gettime-sys_clock_gettime-sys_socketcall-sys_clock_gettime
sys_socketcall-sys_clock_gettime-sys_poll-sys_poll
sys_poll-sys_poll
I'm using '-' as the separator when comparing these phrases with the new traces, so I joined the unique system calls with '-'.
Solution
It seems your system calls are separated by spaces. In that case, read your file line by line and get the individual calls using String.split(" ").
Here is one approach I can think of:
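import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;

public class FileSplitter {
    public static void main(String[] args) throws IOException {
        File file = new File("input_file.txt");
        // Typed list so the for-each loop below can iterate over Strings.
        List<String> words = new LinkedList<>();
        int i = 0;
        // Collect every space-separated token, keeping the order of the lines.
        Files.lines(file.toPath())
             .forEachOrdered(line -> words.addAll(Arrays.asList(line.split(" "))));
        // Print the non-empty tokens four per row.
        for (String word : words) {
            if (word.trim().length() > 0) {
                System.out.print(word.trim() + " ");
                if (i++ >= 3) {
                    System.out.println();
                    i = 0;
                }
            }
        }
    }
}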
For your example it returns this:
sys_clock_gettime sys_poll sys_poll sys_clock_gettime
sys_poll sys_poll sys_poll sys_clock_gettime
sys_poll sys_clock_gettime sys_poll sys_poll
sys_poll sys_poll sys_poll sys_poll
sys_poll sys_poll sys_socketcall
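Note that the output above groups the trace into non-overlapping blocks of four. If the goal is to compare every length-4 phrase in a trace against the dictionary, a sliding window may be closer to what is needed. The following is only a minimal sketch under some assumptions: the dictionary is held in a Set of strings joined with '-', as in the question, and the class name PhraseMatcher and the method countMatches are hypothetical names chosen for illustration, not part of any library.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class PhraseMatcher {
    // Hypothetical helper: slides a window of n calls over the trace and
    // counts how often each dictionary phrase occurs.
    public static Map<String, Integer> countMatches(String trace, Set<String> dictionary, int n) {
        String[] calls = trace.trim().split("\\s+");
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (int start = 0; start + n <= calls.length; start++) {
            // Join the window with '-' so it has the same form as a dictionary entry.
            String phrase = String.join("-", Arrays.copyOfRange(calls, start, start + n));
            if (dictionary.contains(phrase)) {
                counts.merge(phrase, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Two entries taken from the phrase dictionary in the question.
        Set<String> dictionary = new HashSet<>(Arrays.asList(
            "sys_clock_gettime-sys_poll-sys_poll-sys_socketcall",
            "sys_socketcall-sys_poll-sys_clock_gettime-sys_poll"));
        // Assumed sample trace, in the same space-separated format as the question.
        String trace = "sys_poll sys_clock_gettime sys_poll sys_poll sys_socketcall";
        System.out.println(countMatches(trace, dictionary, 4));
    }
}
```

The resulting counts (or simply the presence or absence of each phrase) could then serve as the feature vector for a trace. Using a HashSet keeps each lookup at roughly constant time, so the cost grows linearly with the trace length.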