Research – iSE

My primary research interests are Data Analytics, with a focus on building large-scale analytics platforms and frameworks, and Software Analytics. Over the last several years I have been conducting research and supervising students in analytics that involves the capture, storage, sharing, and analysis of large datasets (e.g., agricultural data, genome data, hydrological and environmental data, and software repositories) using computational tools and machine/deep learning technologies. Together with my HQP and collaborators, I have been working on novel analytic techniques, such as computational architectures for analyzing data, high-throughput workflows for the analysis and computation of large datasets, computational and statistical methods for analysis, data provenance, and distributed processing of heterogeneous data. In software analytics, my research studies cost/benefit and risk issues in interactive and collaborative software development using controlled experiments, user studies, mining of software repositories, and empirical studies. The goal is to propose new theories, programming abstractions and concepts, techniques, and tools that assist developers in the cost-effective and predictable design, implementation, reengineering, maintenance, and evolution of scalable, sustainable, and trustworthy software systems.

My research is multifaceted. One part focuses on data analytics to better support multi-disciplinary scientists in handling complex workflows for data-intensive discovery. I aim to provide scientists with a user-friendly, reliable, collaborative, and scalable computational environment for modeling, executing, tracking, debugging, and analyzing scientific experiments. This research is driven by my active involvement with two CFREF projects (Food Security and Water Security) over the last several years and by my PhD research on collaborative software engineering. The other part focuses on software analytics: developing predictive machine learning models, APIs, and tools that help developers with tasks such as bug-inducing commit detection, explainable machine learning, automated release note generation, and human-centric source code comprehension. This part is driven by my active involvement with the NSERC CREATE grant on Software Analytics Research (SOAR) and with the migration and reverse engineering project of the Water Security CFREF over the last several years.

Current projects

VizSciFlow: A visually guided, script-based framework that supports concise and precise composition of complex scientific workflows with minimal cognitive load. Project details.

Hossain MM, Roy B, Roy C, Schneider K. (2020). VizSciFlow: A Visually Guided Scripting Framework for Supporting Complex Scientific Data Analysis. Journal Proceedings of the ACM on Human-Computer Interaction (EICS 2020). 34 pages. Accepted (Journal).

SciWorCS: A cloud-based framework for supporting real-time collaboration in scientific workflow management systems. Project details.

Mostaeen G, Roy B, Roy C, Schneider K. (2019). Designing for Real-Time Groupware Systems to Support Complex Scientific Data Analysis. Journal Proceedings of the ACM on Human-Computer Interaction. 3(EICS): 9:1–9:28. (Journal).

RISP: Recommending Intermediate States for Pipelines/Workflows. We are developing a data management scheme that will allow us to handle intermediate states intelligently or optimally. The scheme determines whether an intermediate state should be reused by a workflow or regenerated at execution time. Project details.

Debasish Chakroborti, Manishankar Mondal, Banani Roy, Chanchal K. Roy, Kevin A. Schneider:
Optimized Storing of Workflow Outputs through Mining Association Rules. BigData 2018: 508-515.
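
The reuse-or-regenerate decision described for RISP can be illustrated with a small sketch. It is not the method from the paper above; it only assumes that a log of past pipelines is available as sequences of module names, and it recommends storing the intermediate state after any module prefix that appears in at least a chosen fraction of those pipelines. The names `prefix_support`, `recommend_states_to_store`, and `min_support` are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Sequence, Tuple


def prefix_support(pipelines: Iterable[Sequence[str]]) -> Counter:
    """Count how many past pipelines start with each module prefix."""
    counts: Counter = Counter()
    for pipeline in pipelines:
        for end in range(1, len(pipeline) + 1):
            counts[tuple(pipeline[:end])] += 1
    return counts


def recommend_states_to_store(
    pipelines: Iterable[Sequence[str]], min_support: float = 0.5
) -> List[Tuple[str, ...]]:
    """Return prefixes whose output is worth storing (hypothetical heuristic).

    A prefix is recommended when the fraction of past pipelines that begin
    with it is at least min_support. Results are sorted shortest first.
    """
    history = list(pipelines)
    if not history:
        return []
    counts = prefix_support(history)
    frequent = [p for p, n in counts.items() if n / len(history) >= min_support]
    return sorted(frequent, key=lambda p: (len(p), p))


# Illustrative history of four pipelines (made-up module names).
history = [
    ("load", "clean", "train"),
    ("load", "clean", "plot"),
    ("load", "normalize", "train"),
    ("load", "clean", "train"),
]
recommended = recommend_states_to_store(history, min_support=0.6)
```

With this history, the output after `load` and after `load, clean` would be recommended for storage, since those prefixes recur in most pipelines; rarer prefixes would be regenerated on demand. A realistic scheme would also weigh storage size and recomputation time.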

Software Reference Architecture and Platform for Hydrological Modelling and software co-evolution: The core computing team of the GWF project has been working on migrating the Borland-based CRHM to a modern platform. CRHM is a state-of-the-art legacy software tool in North America for generating hydrological cycles, which differ both naturally and operationally depending on geographical location and on environmental variability or parameters throughout the world. Canada’s hydrology produces one of the most complex cycles known to us, and CRHM is designed with this variability in mind. Migration became essential because the Borland C++ compiler is becoming outdated. We have been working on producing the next-generation CRHM. We have adopted several migration strategies: separating the code of the Core and GUI components, developing a console version of CRHM 2018 that runs in a standard C++ environment, creating APIs to access CRHM core data structures, removing Borland dependencies, and minimizing the MFC dependency. We have partially migrated CRHM, and several functionalities are working (such as opening and viewing projects, constructing a new project, and the macro functionality for combining and changing existing projects). We are also developing an automated testing framework to make sure CRHM works as expected; it includes unit testing, system testing, and user acceptance testing. Project details.

Consistency Handling in Collaborative Scientific Workflow: One of the main challenges of a collaborative scientific system is consistency management in the face of conflicting concurrent operations by the collaborators. Existing research uses locking techniques in which a collaborator gets exclusive write access to a part of the workflow to facilitate consistency management. We work on efficient locking algorithms that can reduce the average waiting time of the collaborators and thus improve the usability of a collaborative scientific workflow management system.
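The baseline locking idea mentioned above, where one collaborator holds exclusive write access to a part of the workflow, can be sketched as follows. This is only an illustration of that baseline, not the project's own algorithm; the class `WorkflowLockManager` and the node and user names are hypothetical.

```python
import threading
from typing import Dict


class WorkflowLockManager:
    """Grants exclusive write access to individual workflow nodes."""

    def __init__(self) -> None:
        self._owners: Dict[str, str] = {}  # node id -> user holding the lock
        self._guard = threading.Lock()     # protects the owners table

    def acquire(self, node_id: str, user: str) -> bool:
        """Try to lock a node for a user; return False if someone else holds it."""
        with self._guard:
            owner = self._owners.get(node_id)
            if owner is None or owner == user:
                self._owners[node_id] = user
                return True
            return False

    def release(self, node_id: str, user: str) -> bool:
        """Release a node; only the current owner may release it."""
        with self._guard:
            if self._owners.get(node_id) == user:
                del self._owners[node_id]
                return True
            return False


locks = WorkflowLockManager()
alice_has_align = locks.acquire("align", "alice")  # alice edits the "align" node
bob_blocked = not locks.acquire("align", "bob")    # bob must wait for that node
bob_has_plot = locks.acquire("plot", "bob")        # but can edit another node
```

Locking at the level of individual nodes rather than the whole workflow is one simple way to shorten waiting: collaborators only block each other when they target the same part.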
ProvMod-Viz: Workflow provenance is important for workflow behavior analysis, data quality measurement, usage pattern mining, fault detection, monitoring, user recommendations, resource management, and so on. Data-intensive workflow systems are never complete without provenance support. We have been developing a workflow programming model that is based on the Python programming language, is extendable to a broad range of use cases, is adaptable to third-party tools, and offers automated provenance capture, easy configuration, and provenance querying via data visualizations.
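As a rough illustration of automated provenance capture in Python, the sketch below wraps workflow modules in a decorator that records each call. It does not reflect the ProvMod-Viz API; `track_provenance`, `PROVENANCE_LOG`, and the two example modules are hypothetical names chosen for this example.

```python
import functools
import time
from typing import Any, Callable, Dict, List

# In-memory provenance store; a real system would persist these records.
PROVENANCE_LOG: List[Dict[str, Any]] = []


def track_provenance(func: Callable) -> Callable:
    """Record module name, inputs, output, and duration for every call."""

    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        started = time.time()
        result = func(*args, **kwargs)
        PROVENANCE_LOG.append(
            {
                "module": func.__name__,
                "inputs": {"args": args, "kwargs": kwargs},
                "output": result,
                "duration_s": time.time() - started,
            }
        )
        return result

    return wrapper


@track_provenance
def normalize(values):
    """Scale values so that the largest becomes 1.0."""
    peak = max(values)
    return [v / peak for v in values]


@track_provenance
def total(values):
    """Sum the values."""
    return sum(values)


result = total(normalize([2, 4, 8]))
lineage = [record["module"] for record in PROVENANCE_LOG]
```

After the run, `lineage` lists the modules in execution order, and each record can be queried or visualized without the module authors writing any logging code themselves.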
Cross-Language Workflow Tool Recommendation: As workflow management systems include software tools written in various programming languages, we are developing a tool that can detect similar software applications across those languages.
Metadata handling: We have been working on creating a dictionary-based website for describing P2IRC metadata.