Abstract:
In this thesis, we address the problem of analysing IoT data with a focus on
anomaly detection in data streams and behaviour analysis. Unsupervised learning
is preferred for real-life applications, especially anomaly detection, because
most of the available data is unlabelled. We propose an Enhanced
Locally Selective Combination in Parallel outlier ensembles (ELSCP) technique.
We define an unsupervised, data-driven methodology and apply it in three case
studies: detection of crop damage in a crop dataset, analysis of GPS logs of
combine harvesters, and analysis of Cooperative Intelligent Transport System
(C-ITS) messages. The focus is the identification of anomalies that can be linked to
crop state/health during harvest, those that have an impact on harvest efficiency
and those impacting road safety and efficiency. Our results show that the
extracted anomalies can be linked to a damaged crop state at the end of harvest. We
were also able to detect deviant behaviour of combine harvesters and to identify
anomalies on the roads. Therefore, anomaly detection could be integrated into the
decision process of farm and road operators to improve harvesting efficiency, crop
health, road safety and traffic flow.
Secondly, we considered the analysis of speed signatures generated from C-ITS
messages with the aim of understanding driving behaviour evolution under
a naturalistic driving environment. We have shown that with the application of
segmentation and aggregate statistics, one is able to get a better understanding of
general driving behaviour and infer information that relates to the road condition
and traffic situation. Finally, we considered the trajectory-linking problem and
applied it to C-ITS messages. Based on our analysis, it is possible to link
trajectories to the users who generated them if other distinguishing attributes and
background knowledge of how the messages are generated are considered during
similarity analysis.