Kaiaulu

Project Summary

Kaiaulu is an open-source R toolkit developed at the University of Hawaiʻi at Mānoa for mining and analyzing software repositories. It supports a wide range of data sources, including Git logs, Jira issue trackers, mbox mailing list archives, and third-party static analysis tools like DV8, Understand, and Depends, enabling empirical software engineering research at scale.

Overview

As part of a research team led by Carlos Paradis (Data Scientist at KBR, Inc., Ph.D. in Computer Science from UH Mānoa), I helped build a pipeline to determine the causal effect of negative sentiment on code contribution in open-source communities. My work focused on extending Kaiaulu’s graph infrastructure and tool integrations to support the data collection and network construction stages of that pipeline.

My Contributions

S3 Graph Model System

The original codebase built and passed networks inconsistently across analysis functions, so callers had to specify manually what kind of network they were working with. I overhauled this by introducing two standardized S3 graph constructors.

Both constructors encode graph properties (direction, weight, and node structure) directly into the S3 class vector, so downstream functions can dispatch on them without any manual specification from the caller.
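The idea of encoding graph properties in the class vector can be sketched as follows. This is a minimal illustration, not Kaiaulu's actual code: the constructor name new_graph, its arguments, and the class tags are all hypothetical.

```r
# Hypothetical constructor for illustration; not Kaiaulu's actual API.
new_graph <- function(nodes, edgelist, directed = FALSE, weighted = FALSE,
                      bipartite = FALSE) {
  # Build the class vector from property tags, ending in a common base class.
  # An `if` without `else` yields NULL when false, which c() silently drops.
  graph_class <- c(
    if (bipartite) "bipartite_graph",
    if (directed) "directed_graph" else "undirected_graph",
    if (weighted) "weighted_graph",
    "graph"
  )
  structure(list(nodes = nodes, edgelist = edgelist), class = graph_class)
}

edges <- data.frame(from = c("alice", "bob"),
                    to = c("parser.R", "parser.R"),
                    stringsAsFactors = FALSE)
nodes <- data.frame(name = c("alice", "bob", "parser.R"),
                    stringsAsFactors = FALSE)

g <- new_graph(nodes, edges, directed = TRUE, bipartite = TRUE)

class(g)                       # the properties travel with the object
inherits(g, "directed_graph")  # downstream code can query them directly
```

Because the properties live in the class attribute, any function that receives g can inspect or dispatch on them without extra arguments.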

I then refactored core analysis and projection functions throughout the package to use S3 generic dispatch. Each function now detects the type of network it receives and handles it accordingly, so callers no longer pass that information on every call.
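A small sketch of what dispatching on graph type looks like in base R. The generic has_edge and the class names below are hypothetical and chosen only to show the mechanism; they are not functions from the package.

```r
# Hypothetical generic: the caller never states whether the graph is directed.
has_edge <- function(graph, from, to) UseMethod("has_edge")

# Directed graphs: the edge must match in the given orientation.
has_edge.directed_graph <- function(graph, from, to) {
  el <- graph$edgelist
  any(el$from == from & el$to == to)
}

# Undirected graphs: either orientation counts as the same edge.
has_edge.undirected_graph <- function(graph, from, to) {
  el <- graph$edgelist
  any((el$from == from & el$to == to) | (el$from == to & el$to == from))
}

edgelist <- data.frame(from = "a.R", to = "b.R", stringsAsFactors = FALSE)
directed <- structure(list(edgelist = edgelist),
                      class = c("directed_graph", "graph"))
undirected <- structure(list(edgelist = edgelist),
                        class = c("undirected_graph", "graph"))

has_edge(directed, "b.R", "a.R")    # FALSE: orientation matters
has_edge(undirected, "b.R", "a.R")  # TRUE: orientation is ignored
```

The same call produces the right behavior for each graph type because UseMethod selects the method from the object's class vector.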

Custom Edge Weight Parameters

I added weight and weight_agg parameters to Kaiaulu’s network transform functions, so graphs can be weighted by any column in the underlying edgelist rather than only by co-occurrence counts. In practice, this allows networks to be weighted by sentiment and emotion scores derived from developer communication (e.g., mailing list or issue comment data), which enables affective network analysis alongside structural analysis.
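To make the two parameters concrete, here is a minimal base R sketch. It assumes that weight names an edgelist column and that weight_agg is the function used to combine repeated edges; the helper weight_edgelist, its signature, and the sample data are hypothetical and may differ from the package's behavior.

```r
# Hypothetical helper; `weight` and `weight_agg` reuse the parameter names from
# the text, but the signature and semantics here are assumptions.
weight_edgelist <- function(edgelist, weight = NULL, weight_agg = sum) {
  if (is.null(weight)) {
    # Fallback: co-occurrence count, i.e. one unit per row, summed per edge.
    values <- rep(1, nrow(edgelist))
    weight_agg <- sum
  } else {
    values <- edgelist[[weight]]
  }
  # Collapse repeated (from, to) pairs into one weighted edge each.
  aggregate(data.frame(weight = values),
            by = list(from = edgelist$from, to = edgelist$to),
            FUN = weight_agg)
}

# Made-up reply edges with a made-up sentiment score per message.
replies <- data.frame(from = c("alice", "alice", "bob"),
                      to = c("bob", "bob", "alice"),
                      sentiment = c(-0.5, 0.25, 0.75),
                      stringsAsFactors = FALSE)

by_count <- weight_edgelist(replies)
by_sentiment <- weight_edgelist(replies, weight = "sentiment",
                                weight_agg = mean)
```

Here by_count weights each pair by how often it occurs, while by_sentiment weights the same pairs by their mean sentiment score.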

Extended DV8 Integration

I extended Kaiaulu’s integration with DV8, a third-party architecture analysis tool, to handle more complex nested data structures. Specifically, DV8 cluster hierarchies are now parsed recursively rather than assumed to be flat. I also parameterized DV8 filenames throughout the pipeline so that different DV8 outputs can be targeted without hardcoding paths.
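The recursive approach can be sketched on a toy hierarchy. The field names name and nested, the function flatten_clusters, and the sample data are illustrative assumptions; they are not taken from DV8's file format or from Kaiaulu's parser.

```r
# Hypothetical nested cluster structure: a node with a `nested` element is a
# group, and a node without one is a file. These field names are assumptions.
flatten_clusters <- function(node, path = character()) {
  if (is.null(node$nested)) {
    # Leaf: record the file with the cluster path accumulated so far.
    return(data.frame(file = node$name,
                      cluster = paste(path, collapse = "/"),
                      stringsAsFactors = FALSE))
  }
  # Group: recurse into each child, extending the path with this group's name.
  rows <- lapply(node$nested, flatten_clusters, path = c(path, node$name))
  do.call(rbind, rows)
}

hierarchy <- list(
  name = "L0",
  nested = list(
    list(name = "src/parser.R"),
    list(name = "L0.1", nested = list(
      list(name = "src/graph.R"),
      list(name = "src/network.R")
    ))
  )
)

flat <- flatten_clusters(hierarchy)
flat  # one row per file, with its full cluster path
```

A flat parser would only see the direct children of the top-level group; the recursion reaches files at any depth and keeps their full cluster path.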

New Vignettes

I added three new R Markdown vignettes to the package.

What I Learned

Working on Kaiaulu gave me hands-on experience with R’s S3 object system, including how to design generic functions that dispatch on class to achieve clean polymorphic behavior without formal class hierarchies. It also deepened my understanding of empirical software engineering research and how version control history, issue trackers, mailing lists, and static analysis tools can be combined into unified network representations to study how software teams work. Contributing 51 commits to an active research tool reinforced the importance of designing consistent, well-documented APIs that other researchers can build on.

Source: sailuh/kaiaulu