Research – Banani Roy, Director, iSE lab

My primary research interests are Data Analytics, with a focus on building large-scale analytics platforms and frameworks, and Software Analytics. Over the last several years I have been conducting research and supervising students in analytics involving the capture, storage, sharing, and analysis of large datasets (e.g., agricultural data, genome data, hydrological and environmental data, and software repositories) using computational tools and machine/deep learning technologies. Together with my HQP and collaborators, I have been working on developing novel analytic techniques, such as computational architectures for analyzing data, high-throughput workflows for analysis and computation of large datasets, computational and statistical methods for analysis, data provenance, and distributed processing of heterogeneous data. As part of software analytics, my research studies cost/benefit and risk issues using controlled experiments, user studies, mining of software repositories, and empirical studies in interactive and collaborative software development, in order to propose new theories, programming abstractions and concepts, techniques, and tools that assist developers in the cost-effective and predictable design, implementation, reengineering, maintenance, and evolution of scalable, sustainable, and trustworthy software systems.

My research is multifaceted. One part focuses on data analytics to better support multi-disciplinary scientists in handling complex workflows for data-intensive discovery. I aim to provide scientists with a user-friendly, reliable, collaborative, and scalable computational environment for modeling, executing, tracking, debugging, and analyzing scientific experiments. This research is driven by my active involvement with two CFREF projects (Food Security and Water Security) over the last several years and by my PhD research on collaborative software engineering. The other part of my research focuses on software analytics: developing predictive machine learning models, APIs, and tools that help developers with tasks such as bug-inducing commit detection, explainable machine learning, automated release note generation, and human-centric source code comprehension. This part is driven by my active involvement with the NSERC CREATE grant on Software Analytics Research (SOAR) and with the migration and reverse engineering project of the Water Security CFREF over the last several years.

VizSciFlow: Scientific workflow management systems, such as Galaxy, Taverna, and Workspace, have been developed to automate scientific workflow management and are increasingly being used to accelerate the specification, execution, visualization, and monitoring of data-intensive tasks. For example, the popular bioinformatics platform Galaxy is installed on over 168 servers around the world, and the social networking space myExperiment shares almost 4,000 Galaxy scientific workflows among its 10,665 members. Most of these systems offer graphical interfaces for composing workflows. However, while graphical languages are considered easier to use, graphical workflow models become more difficult to comprehend and maintain as they grow larger and more complex. Text-based languages are considered harder to use but have the potential to provide a clean and concise expression of a workflow, even a large and complex one. A recent study showed that some scientists prefer script/text-based environments for performing complex scientific analysis with workflows. Unfortunately, such environments are unable to meet the needs of scientists who prefer graphical workflows. To address the needs of both types of scientists while retaining script-based workflow models for their underlying benefits, we propose a visually guided workflow modeling framework that combines interactive graphical user interface elements in an integrated development environment with the power of a domain-specific language to compose independently developed and loosely coupled services into workflows. Our domain-specific language provides scientists with a clean, concise, and abstract view of a workflow to better support workflow modeling. As a proof of concept, we developed VizSciFlow, a generalized scientific workflow management system that can be customized for use in a variety of scientific domains.
As a first use case, we configured and customized VizSciFlow for the bioinformatics domain. We conducted three user studies to assess its usability, expressiveness, efficiency, and flexibility. The results are promising: in particular, our user studies show that users find VizSciFlow more desirable than either Python or Galaxy for solving complex scientific problems.
SciWorCS: Scientific workflow management systems (such as Galaxy, iPlant, Taverna, Kepler, etc.) differ from conventional software systems in that workflows are executed in a very structured way and the processes that form a workflow depend on one another along a particular dataflow direction. In such systems, we have been investigating whether real-time collaboration can increase usability and efficiency.
Consistency Handling in Collaborative Scientific Workflow: One of the main challenges of a collaborative scientific system is consistency management in the face of conflicting concurrent operations by the collaborators [23], [24]. Existing research uses locking techniques, in which a collaborator gets exclusive write access to a part of the workflow, to facilitate consistency management [19], [14]. I want to work on efficient locking algorithms that reduce the average waiting time of collaborators and thus improve the usability of a collaborative scientific workflow management system.
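The locking idea described above can be illustrated with a minimal sketch. It is not the algorithm from the cited works or from this project: the class name `WorkflowLockManager`, the node names, and the first-come-first-served queue are all assumptions chosen only to show what exclusive write access to one part of a workflow looks like, and where waiting time arises.

```python
from collections import deque


class WorkflowLockManager:
    """Hypothetical sketch: one collaborator at a time may write to a workflow node."""

    def __init__(self):
        self._owner = {}    # node id -> collaborator currently holding the lock
        self._waiting = {}  # node id -> FIFO queue of collaborators waiting for it

    def acquire(self, node, user):
        """Return True if `user` holds the lock on `node`; otherwise queue them."""
        holder = self._owner.get(node)
        if holder is None:
            self._owner[node] = user
            return True
        if holder == user:
            return True
        queue = self._waiting.setdefault(node, deque())
        if user not in queue:
            queue.append(user)
        return False

    def release(self, node, user):
        """Release the lock and hand it to the next waiting collaborator, if any."""
        if self._owner.get(node) != user:
            raise PermissionError(f"{user} does not hold the lock on {node}")
        queue = self._waiting.get(node)
        if queue:
            next_user = queue.popleft()
            self._owner[node] = next_user
            return next_user
        del self._owner[node]
        return None

    def owner(self, node):
        """Return the collaborator holding the lock on `node`, or None."""
        return self._owner.get(node)
```

In this sketch, collaborators editing different nodes never block each other, so the granularity of a "part of the workflow" directly affects how long anyone waits. A more efficient algorithm would replace the simple FIFO hand-off in `release`.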
ProvMod-Viz: Workflow provenance is important for workflow behavior analysis, data quality measurement, usage pattern mining, fault detection, monitoring, user recommendations, resource management, and so on. Data-intensive workflow systems are never complete without provenance support. We have been developing a workflow programming model that is based on the Python programming language, is extensible to a broad range of use cases, is adaptable to third-party tools, and offers automated provenance, easy configuration, and provenance querying via data visualizations.
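As a rough illustration of what automated provenance can mean in a Python-based model, the sketch below records every task call through a decorator. It does not show the ProvMod-Viz API: `track_provenance`, `PROVENANCE_LOG`, `tasks_that_produced`, and the two example tasks are hypothetical names introduced only for this example.

```python
import functools
import time

# Hypothetical in-memory provenance store; a real system would persist records.
PROVENANCE_LOG = []


def track_provenance(func):
    """Record task name, inputs, output, and timing for each successful call."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        started = time.time()
        result = func(*args, **kwargs)
        PROVENANCE_LOG.append({
            "task": func.__name__,
            "inputs": {"args": args, "kwargs": kwargs},
            "output": result,
            "started": started,
            "duration": time.time() - started,
        })
        return result
    return wrapper


@track_provenance
def normalize(values):
    """Example task: scale values so that they sum to 1."""
    total = sum(values)
    return [v / total for v in values]


@track_provenance
def top(values, n=1):
    """Example task: keep the n largest values."""
    return sorted(values, reverse=True)[:n]


def tasks_that_produced(output):
    """Simple provenance query: which recorded tasks returned this output?"""
    return [record["task"] for record in PROVENANCE_LOG if record["output"] == output]
```

Calling `top(normalize([1, 1, 2]), n=1)` leaves two records in the log without any provenance code in the task bodies, which is the property "automated" refers to here. Calls that raise an exception are not recorded in this sketch.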
Cross Language Software Similarity Detection (CRopSIM): As workflow management systems include software tools written in various programming languages, we are working on developing a tool that can detect similar software applications across those languages.
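One simple, language-agnostic baseline for this kind of comparison is to split identifiers into words and measure their overlap. The sketch below does that with Jaccard similarity. It is an assumption for illustration only and is not the technique used in CRopSIM; the function names are hypothetical.

```python
import re


def identifier_terms(source):
    """Split identifiers into lowercase words, handling camelCase and snake_case."""
    terms = set()
    for ident in re.findall(r"[A-Za-z_][A-Za-z0-9_]*", source):
        for part in re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])", ident):
            terms.add(part.lower())
    return terms


def jaccard(a, b):
    """Jaccard similarity of two sets; defined as 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similarity(source_a, source_b):
    """Term-overlap similarity between two source texts in any languages."""
    return jaccard(identifier_terms(source_a), identifier_terms(source_b))
```

Because naming conventions differ between languages (`readFile` in Java, `read_file` in Python), normalizing identifiers to shared words lets the two be compared directly. Language keywords are not filtered out in this sketch, so they add some noise to the score.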
Intermediate Big Data Management in Distributed Programming Platforms: We are developing a data management scheme that will allow us to handle intermediate states intelligently or optimally. This scheme will determine whether an intermediate state should be reused by a workflow or regenerated at execution time.
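The reuse-or-regenerate decision can be sketched as follows. This is a minimal example under stated assumptions, not the scheme under development: the cost rule in `worth_storing`, the `IntermediateStore` class, and the parameter names are hypothetical, and the store is in-memory rather than distributed.

```python
import hashlib
import json


def fingerprint(step_name, params):
    """Stable key for a workflow step and its JSON-serializable parameters."""
    payload = json.dumps({"step": step_name, "params": params}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def worth_storing(compute_seconds, size_bytes, read_bytes_per_second):
    """Assumed policy: keep a result only if reloading it beats recomputing it."""
    reload_seconds = size_bytes / read_bytes_per_second
    return reload_seconds < compute_seconds


class IntermediateStore:
    """Hypothetical store that reuses kept results and regenerates the rest."""

    def __init__(self):
        self._results = {}
        self.recomputed = 0  # number of times a step was actually executed

    def run(self, step_name, params, compute, keep=True):
        """Return the step result, reusing a stored copy when one exists."""
        key = fingerprint(step_name, params)
        if key in self._results:
            return self._results[key]
        result = compute(**params)
        self.recomputed += 1
        if keep:
            self._results[key] = result
        return result
```

A caller would pass `keep=worth_storing(...)` using measured or estimated costs, so cheap-to-compute but large intermediate states are regenerated while expensive ones are reused.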
Metadata handling: We have been working on creating a dictionary-based website for describing P2IRC metadata.
CRHM Migration: We have been working on migrating and re-engineering a legacy hydrological software system named CRHM.

Past projects

Academic
Department of Computer Science and Global Institute of Food Security, U of S.
Cloud-based frameworks and tools for P2IRC

Department of Computer Science, U of S and School of Computing, Queen’s University.
DiscoTech Toolkit (in C#)

School of Computing, Queen’s University, Kingston, Canada.
LIAV: Life is a village, an exercise-based game, Groupware Architecture, and Movie Recommendation System (Java and JSP).

Khulna University of Engineering & Technology (KUET), Khulna, Bangladesh.
Component-based software development (in Visual Basic), Natural Language Processing System (in C++ and ASP), and Fingerprint Recognition Algorithm (in C++).

Industry
Institute of Building Materials Research, RWTH Aachen University
Developing Software for 3D Image Processing

Dohatec Software Developers, Dhaka, Bangladesh.
Store Management System in Visual Studio and MS SQL Server