My primary research interests are Data Analytics, with a focus on building large-scale analytics platforms and frameworks, and Software Analytics. Over the last several years, I have been conducting research and supervising students in analytics involving the capture, storage, sharing, and analysis of large datasets (e.g., agricultural, genomic, hydrological, and environmental data, and software repositories) using computational tools and machine/deep learning technologies. Together with my HQP (highly qualified personnel) and collaborators, I have been developing novel analytic techniques, such as computational architectures for analyzing data, high-throughput workflows for the analysis and computation of large datasets, computational and statistical methods for analysis, data provenance, and distributed processing of heterogeneous data. In software analytics, my research studies cost/benefit and risk issues in interactive and collaborative software development through controlled experiments, user studies, mining software repositories, and empirical studies. The goal is to propose new theories, programming abstractions and concepts, techniques, and tools that help developers design, implement, reengineer, maintain, and evolve scalable, sustainable, and trustworthy software systems in a cost-effective and predictable way.
My research is multifaceted. One part focuses on data analytics to help multidisciplinary scientists manage complex workflows for data-intensive discovery. I aim to provide scientists with a user-friendly, reliable, collaborative, and scalable computational environment for modeling, executing, tracking, debugging, and analyzing scientific experiments. This research is driven by my active involvement over the last several years with two CFREF projects (Food Security and Water Security) and by my PhD research on collaborative software engineering. The other part of my research focuses on software analytics: developing predictive machine learning models, APIs, and tools that help developers with tasks such as bug-inducing commit detection, explainable machine learning, automated release note generation, and human-centric source code comprehension. This part is driven by my active involvement over the last several years with the NSERC CREATE grant on Software Analytics Research (SOAR) and with the migration and reverse engineering project of the Water Security CFREF.
Current projects
- Consistency Handling in Collaborative Scientific Workflow: One of the main challenges of collaborative scientific workflow systems is consistency management in the face of conflicting concurrent operations by collaborators. Existing approaches use locking, where a collaborator gets exclusive write access to a part of the workflow. We work on efficient locking algorithms that reduce the average waiting time of collaborators and thus improve the usability of collaborative scientific workflow management systems.
- ProvMod-Viz: Workflow provenance is important for workflow behavior analysis, data quality measurement, usage pattern mining, fault detection, monitoring, user recommendations, and resource management. Data-intensive workflow systems are incomplete without provenance support. We have been developing a Python-based workflow programming model that is extensible to a broad range of use cases, adaptable to third-party tools, and offers automated provenance, easy configuration, and provenance querying through data visualizations.
- Cross-Language Workflow Tool Recommendation: Because workflow management systems include software tools written in various programming languages, we are developing a tool that detects similar software applications across programming languages.
- Metadata Handling: We have been creating a dictionary-based website for describing P2IRC metadata.
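To make the locking approach described under Consistency Handling more concrete, here is a minimal Python sketch, assuming a workflow is a set of named nodes and a collaborator locks the sub-workflow they want to edit. The class `WorkflowLockManager` and its methods are hypothetical illustrations, not the algorithms developed in this project; they show only the conventional exclusive-write locking with a waiting queue that more efficient schemes aim to improve on.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class WorkflowLockManager:
    """Hypothetical baseline: exclusive write locks on workflow nodes.

    A collaborator may edit a sub-workflow (a set of node ids) only after
    holding write locks on every node in it. Conflicting requests wait.
    """
    owners: dict = field(default_factory=dict)      # node id -> collaborator
    waiting: deque = field(default_factory=deque)   # queued (user, nodes)

    def _free(self, user, nodes):
        # A node is available if unlocked or already held by the same user.
        return all(self.owners.get(n) in (None, user) for n in nodes)

    def request(self, user, nodes):
        """Grant exclusive write access to `nodes`, or queue the request."""
        nodes = frozenset(nodes)
        if self._free(user, nodes):
            for n in nodes:
                self.owners[n] = user
            return True
        self.waiting.append((user, nodes))
        return False

    def release(self, user):
        """Release all of `user`'s locks, then re-check queued requests
        in arrival order. Returns the users whose requests were granted."""
        for n in [n for n, u in self.owners.items() if u == user]:
            del self.owners[n]
        granted, still_waiting = [], deque()
        while self.waiting:
            u, nodes = self.waiting.popleft()
            if self._free(u, nodes):
                for n in nodes:
                    self.owners[n] = u
                granted.append(u)
            else:
                still_waiting.append((u, nodes))
        self.waiting = still_waiting
        return granted


if __name__ == "__main__":
    mgr = WorkflowLockManager()
    print(mgr.request("alice", {"load", "clean"}))  # True
    print(mgr.request("bob", {"clean", "train"}))   # False: "clean" is held by alice
    print(mgr.request("carol", {"plot"}))           # True: no conflict
    print(mgr.release("alice"))                     # ['bob']
```

In this sketch, bob's time in the queue between his request and alice's release is the collaborator waiting time that the project's locking algorithms target. Because queued requests are re-checked in arrival order but granted independently, a later non-conflicting request can be served ahead of an earlier blocked one; stricter or finer-grained policies would change that trade-off.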