Research Work

My current research interests centre on NLP and how it can be applied to tasks such as information extraction and retrieval. I have also explored Active Learning, and I have previously worked on classification and information extraction from audio data. I am open to collaborating on interesting research projects.

In this paper, we perform an extensive and systematic comparison of log parsing techniques and systems based on machine learning approaches. These include baseline learning solutions such as Perceptron, Stochastic Gradient Descent, and Multinomial Naive Bayes; a graphical model (Conditional Random Fields); a pre-trained sequence-to-sequence model (NERLogParser); and a pre-trained language model (BERT). Moreover, we experiment with the Transformer Neural Network, modelling the Named Entity Recognition task as a sequence-to-sequence generation task, an approach not previously tested in this domain. An extensive set of experiments is carried out on in-scope and out-of-scope datasets, aiming to estimate performance on log files from known and unknown log sources. We use multiple evaluation schemes in order to: (i) compare the different systems; and (ii) understand the quality of the information extracted, providing deeper insights into the advantages and disadvantages of the different systems.
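To illustrate why more than one evaluation scheme can be informative when log parsing is framed as Named Entity Recognition, here is a minimal sketch that scores the same prediction in two ways: per-token accuracy and exact-match entity F1. The BIO tagging convention, the field names (TIME, HOST, MSG) and both metrics are assumptions chosen for illustration; they are not necessarily the schemes used in the paper.

```python
def extract_spans(tags):
    """Return the set of (start, end, label) spans in a BIO tag sequence."""
    spans = set()
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" closes the last span
        inside = tag.startswith("I-") and label == tag[2:]
        if start is not None and not inside:
            spans.add((start, i, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans


def token_accuracy(gold, pred):
    """Fraction of tokens whose predicted tag equals the gold tag."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def entity_f1(gold, pred):
    """F1 over entities, counting only exact span-and-label matches."""
    gold_spans, pred_spans = extract_spans(gold), extract_spans(pred)
    true_pos = len(gold_spans & pred_spans)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(pred_spans)
    recall = true_pos / len(gold_spans)
    return 2 * precision * recall / (precision + recall)


# Hypothetical tags for one log line with three fields.
gold = ["B-TIME", "I-TIME", "O", "B-HOST", "O", "B-MSG", "I-MSG"]
# The prediction truncates the MSG field by one token.
pred = ["B-TIME", "I-TIME", "O", "B-HOST", "O", "B-MSG", "O"]

tok_acc = token_accuracy(gold, pred)
ent_f1 = entity_f1(gold, pred)
```

In this toy case a single wrong token costs one of three entities under exact matching, so the entity-level score drops further than the token-level score. This is the kind of difference in extraction quality that a single metric can hide.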

In this paper, we propose the Heterogeneous Ensemble and Active Learning (HEAL) system, a novel tool that implements a dynamic heterogeneous ensemble model with active learning capabilities. This ensures a solution that: (i) adapts to changes in data over time; (ii) remains robust, providing good performance; (iii) handles a continuous flow of data; and (iv) requires less human intervention than pure active learning solutions. The HEAL system uses multiple individual base models to build a heterogeneous ensemble learner that adapts to the specific data characteristics. Active learning is then applied to the ensemble so that it is retrained and re-evaluated over time as new instances arrive. Instances on which the model has low confidence are labelled by a domain expert, a new model is retrained with these instances, and its performance is evaluated. The deployed model is replaced when the new model exhibits performance advantages.
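The loop described above can be sketched roughly as follows. This is a simplified illustration, not the HEAL implementation: the base models are hypothetical toy functions that return a positive-class probability, the ensemble simply averages them, confidence is measured as distance from 0.5, and the retraining step is represented by a hand-written candidate ensemble.

```python
def ensemble_proba(models, x):
    """Average the positive-class probabilities of all base models."""
    return sum(model(x) for model in models) / len(models)


def confidence(p):
    """Confidence of a binary prediction: 0 at p = 0.5, 1 at p = 0 or 1."""
    return abs(p - 0.5) * 2


def select_for_labelling(models, pool, threshold):
    """Pick the instances the ensemble is unsure about (for the expert)."""
    return [x for x in pool if confidence(ensemble_proba(models, x)) < threshold]


def accuracy(models, labelled):
    """Accuracy of the ensemble on (instance, label) pairs."""
    correct = sum((ensemble_proba(models, x) >= 0.5) == y for x, y in labelled)
    return correct / len(labelled)


def maybe_replace(deployed, candidate, validation):
    """Keep the deployed ensemble unless the candidate scores strictly higher."""
    if accuracy(candidate, validation) > accuracy(deployed, validation):
        return candidate
    return deployed


# Hypothetical base models over a single numeric feature.
deployed = [lambda x: 0.9 if x > 2 else 0.1, lambda x: 0.8 if x > 6 else 0.2]
pool = [1, 4, 8]

# Step 1: find low-confidence instances to send to the domain expert.
to_label = select_for_labelling(deployed, pool, threshold=0.3)

# Step 2: stand-in for retraining with the expert's labels (assumed here:
# the expert marks instance 4 as negative).
candidate = [lambda x: 0.9 if x > 4 else 0.1, lambda x: 0.8 if x > 6 else 0.2]

# Step 3: evaluate both ensembles and swap only if the candidate is better.
validation = [(1, False), (4, False), (8, True)]
active = maybe_replace(deployed, candidate, validation)
```

Here only instance 4 falls below the confidence threshold, so it is the only one that needs a human label. The candidate replaces the deployed ensemble because it classifies all three validation instances correctly, while the deployed one gets instance 4 wrong.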

Music is a way to express our creativity. As an art form, music can go beyond the limits of human imagination. When one hears a piece of music or other sounds, the human brain releases the chemical dopamine. Hearing sounds repeatedly allows us to remember their characteristics and nature very efficiently. It allows, for example, immediate recognition of sounds or voices that have become familiar through experience. The same principle can be implemented using Machine Learning. In this paper, a Drum Instrument Classification Model is implemented using Machine Learning. The data is self-prepared by recording samples and by using a Drum Simulator. The initial dataset contains only audio files in .wav format. The pivotal task is to extract features from the audio files and use those features to train the Machine Learning model. The result is a model capable of classifying various drum instruments when provided with an audio input.
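As a rough illustration of the feature extraction step, the sketch below computes two simple time-domain features and classifies a new sound by its nearest labelled example. Everything here is an assumption made for the example: the paper does not state which features or model it uses, the signals are synthetic decaying tones rather than recorded .wav files, and the labels "kick" and "hi-hat" are placeholders.

```python
import math


def zero_crossing_rate(samples):
    """Fraction of adjacent sample pairs where the signal changes sign."""
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0))
    return crossings / (len(samples) - 1)


def rms_energy(samples):
    """Root-mean-square amplitude of the signal."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def extract_features(samples):
    """Map raw audio samples to a small fixed-length feature vector."""
    return (zero_crossing_rate(samples), rms_energy(samples))


def synth_tone(freq_hz, sample_rate=8000, duration_s=0.1):
    """Synthetic stand-in for a drum hit: an exponentially decaying sine."""
    n = int(sample_rate * duration_s)
    return [
        math.exp(-5.0 * t / n) * math.sin(2 * math.pi * freq_hz * t / sample_rate)
        for t in range(n)
    ]


def nearest_label(features, training):
    """Return the label of the training example closest in feature space."""
    return min(training, key=lambda item: math.dist(features, item[0]))[1]


# Hypothetical training set: a low-pitched and a high-pitched sound.
training = [
    (extract_features(synth_tone(60)), "kick"),
    (extract_features(synth_tone(3000)), "hi-hat"),
]

# An unseen low-pitched sound should land closer to the "kick" example.
prediction = nearest_label(extract_features(synth_tone(80)), training)
```

With real recordings, the samples would be read from the .wav files first (for example with Python's standard `wave` module), and richer spectral features would likely separate similar instruments better than these two.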
