Jorge Guerra Torres, Carlos Catania, Veas Eduardo Enrique
2019
Modern network intrusion detection systems depend on models trained with up-to-date labeled data. Yet labeling a network traffic dataset is especially expensive, since expert knowledge is required to perform the annotations. Visual analytics applications exist that claim to considerably reduce the labeling effort, but the expert still needs to ponder several factors before issuing a label. Moreover, the effect of bad labels (noise) on the final model is most often not evaluated. The present article introduces a novel active learning strategy that learns to predict labels in (pseudo) real time as the user performs the annotation. The system, called RiskID, presents several innovations: i) a set of statistical methods summarizes the information, which is illustrated in a visual analytics application; ii) the application interfaces with the active learning strategy for building a random forest model as the user issues annotations; iii) the (pseudo) real-time predictions of the model are fed back visually to scaffold the traffic annotation task; and iv) an evaluation framework is introduced that represents a complete methodology for evaluating active learning solutions, including resilience against noise.
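Two ideas from this abstract, querying the annotator during active learning and testing resilience against label noise, can be illustrated with a minimal sketch. The abstract specifies neither the query criterion nor the noise model, so everything here is an assumption made for illustration: uncertainty sampling stands in for the query strategy, random flipping of binary labels (1 = malicious, 0 = normal) stands in for annotation noise, and the function names `most_uncertain` and `flip_labels` are hypothetical.

```python
import random


def most_uncertain(probabilities, k=1):
    """Return the indices of the k samples whose predicted probability
    of being malicious is closest to 0.5 (uncertainty sampling).

    Assumption: this is a common active learning criterion, not
    necessarily the one used by RiskID.
    """
    ranked = sorted(
        range(len(probabilities)),
        key=lambda i: abs(probabilities[i] - 0.5),
    )
    return ranked[:k]


def flip_labels(labels, noise_rate, seed=0):
    """Return a copy of binary labels with a fixed fraction flipped.

    Assumption: simulates annotation noise for a robustness test;
    the paper's own noise model may differ.
    """
    if not 0.0 <= noise_rate <= 1.0:
        raise ValueError("noise_rate must be in [0, 1]")
    rng = random.Random(seed)
    n_flip = round(noise_rate * len(labels))
    # Pick distinct positions so exactly n_flip labels change.
    flip_idx = set(rng.sample(range(len(labels)), n_flip))
    return [1 - y if i in flip_idx else y for i, y in enumerate(labels)]


if __name__ == "__main__":
    # Hypothetical model outputs for four unlabeled connections.
    probs = [0.9, 0.55, 0.1, 0.48]
    print(most_uncertain(probs, k=2))

    clean = [0, 1] * 5
    print(flip_labels(clean, noise_rate=0.2, seed=1))
```

In an evaluation of this kind, a model would be retrained on labels produced by `flip_labels` at increasing noise rates and its accuracy compared against the model trained on clean labels.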
Jorge Guerra Torres, Veas Eduardo Enrique, Carlos Catania
2019
Labeling a real network dataset is especially expensive in computer security, as an expert has to ponder several factors before assigning each label. This paper describes an interactive intelligent system to support the task of identifying hostile behavior in network logs. The RiskID application uses visualizations to graphically encode features of network connections and promote visual comparison. In the background, two algorithms are used to actively organize connections and predict potential labels: a recommendation algorithm and a semi-supervised learning strategy. These algorithms, together with interactive adaptations to the user interface, constitute a behavior recommendation. A study with 16 participants is carried out to analyze how the recommendation and prediction algorithms influence the workflow of labeling a dataset. The results indicate that the behavior recommendation significantly improves the quality of labels. Analyzing interaction patterns, we identify a more intuitive workflow used when behavior recommendation is available.
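The prediction of potential labels by a semi-supervised strategy can be sketched as a confidence-thresholded pseudo-labeling step. This is only an illustration under stated assumptions: the abstract does not say which semi-supervised method RiskID uses, and the function name `pseudo_label`, the 0.9 threshold, and the label encoding (1 = hostile, 0 = normal) are all hypothetical.

```python
def pseudo_label(probabilities, threshold=0.9):
    """Map sample index -> predicted label for confident predictions only.

    probabilities: predicted probability that each unlabeled connection
    is hostile. Samples the model is unsure about are left unlabeled,
    so they can still be shown to the expert.

    Assumption: a generic self-training step, not the paper's algorithm.
    """
    if not 0.5 < threshold <= 1.0:
        raise ValueError("threshold must be in (0.5, 1]")
    accepted = {}
    for i, p in enumerate(probabilities):
        if p >= threshold:
            accepted[i] = 1  # confidently hostile
        elif p <= 1.0 - threshold:
            accepted[i] = 0  # confidently normal
    return accepted


if __name__ == "__main__":
    # Hypothetical model outputs for four unlabeled connections.
    print(pseudo_label([0.95, 0.5, 0.03, 0.8]))
```

Predictions that pass the threshold could be offered to the user as suggested labels, while the remaining connections stay in the queue for manual review.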