UMass Boston CS

PhD Defense: Jenna Kim on Wednesday, 4/8 at 11AM

2026-04-08T00:00:00+00:00

Streamlined Biomedical Image Processing Pipelines

PhD Thesis Defense by Jenna Kim

Committee Members: Prof. Daniel Haehn (Chair), Prof. Jae W. Song, Prof. Tales Imbiriba, Prof. Nurit Haspel

GPD: Prof. Dan Simovici

When: 11:00 AM, April 08, 2026 (Wednesday)

Where: Pomplun Lab

Zoom Link: https://umassboston.zoom.us/my/jennajkim

This dissertation focuses on advancing carotid artery analysis through a series of visualizations and deep learning tools for calcified plaque assessment and related biomedical imaging tasks. Accurate plaque evaluation is essential, but current workflows depend on slow, clinician-dependent manual review. To address these limitations, this work introduces the CACTAS framework, a set of tools and methods that enable fast and reliable plaque segmentation for clinicians.

The first study, the CACTAS-Tool, provides a web-based labeling tool that enables clinicians to label plaque directly in three dimensions through a streamlined one-click interface. This tool significantly reduces the effort required to generate high-quality annotations by replacing slice-by-slice labeling with a more intuitive 3D interaction model, offering a faster alternative to existing manual workflows.

The second project, CACTAS-AI, automates plaque segmentation through a two-step deep learning approach. The system first segments the carotid artery to focus the search space and then segments calcified plaque within this anatomically relevant region.

The third study, CACTAS-UQ, goes beyond segmentation by showing how confident the AI is in its predictions. To make this information accessible, the study introduces an integrated visualization that combines plaque composition, prediction probability, and uncertainty into a single unified view. Uncertain regions are visually flagged, giving clinicians a clearer picture of prediction reliability and supporting more informed decision-making.

This dissertation extends these visualization concepts to a microscopy-based biomedical setting through an ongoing collaboration with MYOTWIN, which serves as an extension of a DAAD-funded research exchange in Germany, where I was a visiting researcher in Summer 2025. This project focuses on the interactive visualization and analysis of calcium transients in engineered heart tissue, leveraging fluorescence imaging to study cardiac behavior.

Together, these studies advance the role of visualization and machine learning in medical image analysis, presenting practical, interpretable, and clinician-centered tools that enable more transparent and efficient assessment of vascular and cardiac imaging data. These contributions lay the foundation for future applications in clinical decision support.

MS Thesis Defense: Avanith Kanamarlapudi

2026-03-31T00:00:00+00:00

OMAMA-DB: The Oregon–Massachusetts Mammography Database

When: 11:00 AM, March 31, 2026 (Tuesday)

Where: Pomplun Lab

Speaker: Avanith Kanamarlapudi

Committee Members: Prof. Daniel Haehn (Chair), Prof. Dan Simovici, Prof. Nurit Haspel

GPD: Prof. Dan Simovici

Abstract

Purpose: Public datasets for training AI models in breast cancer screening are limited in size and quality, making it difficult to develop reliable systems. We introduce OMAMA-DB, an extensive publicly available collection of 2D mammograms and 3D tomosynthesis volumes.

Approach: Starting from 967,991 images, we created a curated set of 231,080 images using a multi-stage filtering process that removes missing labels, uncommon dimensions, rare scanner types, duplicate studies, and invalid DICOM files. All 2D images then undergo additional outlier detection using histogram filtering and a variational autoencoder to remove low-quality outliers. OMAMA-DB includes pathology-based cancer labels and automated lesion annotations generated using DeepSight. We also provide a web-based annotation tool for expert validation. To demonstrate usability, we fine-tuned MedGemma on a balanced subset of OMAMA-DB. We conducted a preliminary user study comparing human and automated classification of real and synthetic mammograms.

Results: OMAMA-DB contains 231,080 images, including 7,351 2D and 374 3D cancer cases. Fine-tuned MedGemma achieved 0.989 accuracy, 0.997 sensitivity, and an F1 score of 0.989 on a balanced validation set of 2,942 images. In real-versus-synthetic classification, humans achieved 0.485 accuracy, while Logistic Regression and CNN achieved 0.972 and 0.997.

Conclusions: OMAMA-DB provides a large mammography dataset with pathology-based labels and automated lesion annotations to support medical imaging research. Fine-tuned foundation models demonstrate strong cancer classification performance, while the gap between human and automated detection of synthetic images highlights the importance of real clinical data. All data, models, and parameters are openly available for research use.

Description

Zoom link: https://umassboston.zoom.us/j/94251400964

Meeting ID: 942 5140 0964

Passcode: 432711

CS Department is Organizing NEDB Day 2026!

2026-01-06T00:00:00+00:00

The Department of Computer Science is organizing the 16th North East Database (NEDB) Day on our beautiful campus of UMass Boston on January 16, 2026! Our Assistant Professor, Tarikul Islam Papon, is the lead organizer. NEDB Day is an all-day conference-style event where participants from the research community and industry in the northeast region of North America can come together to present ideas and discuss their research and experiences. This includes talks throughout the day, keynotes, as well as a poster session in the afternoon. NEDB Day is the largest regional database-focused event of its kind and typically attracts 150–170 participants, including faculty members, graduate students, and industry researchers. While many attendees come from Massachusetts institutions such as MIT, Harvard, Northeastern, and BU, the event also draws participants from nearly all major universities in the region, including Yale, Brown, Columbia, NYU, Rochester, Stony Brook, Penn State, and UPenn, as well as top industry organizations such as Google, Microsoft Research, Intel, Meta, Oracle, InfluxDB, and CockroachDB. If you plan to attend, you can register anytime using the registration link below. The registration fee is $25 for students and $100 for others.

Time: January 16, 2026 at 8:00 AM Eastern Time

Location: UMass Boston – Campus Center Ballroom

Website: https://nedbday.github.io/2026/

Keynote Speakers: Ippokratis Pandis (Databricks), Nesime Tatbul (Intel, MIT) and Ryan Marcus (University of Pennsylvania)

Registration: https://commerce.cashnet.com/COMPSCIpay

Happy Fall Semester!

2025-09-02T00:00:00+00:00

Welcome back to campus everyone!! We are excited for this semester and hope you are too :D!

Remembering Marc Pomplun

2025-03-12T00:00:00+00:00

We are deeply saddened to announce that Marc Pomplun, the Computer Science department chair and a beloved colleague, passed away on January 18, 2025. He was 55. Marc was admired by his colleagues and students alike. He enjoyed teaching and mentoring students and passionately incorporated creative ways to bring to life the concepts he taught in the classroom. He was also widely appreciated for his major contributions to UMass Boston, having served as department chair since Summer 2019.

Marc was born in Ratzeburg, Germany and grew up in Lübeck, Germany. He was the only child of his mother Petra (née Liesener), who is 82 and lives in Lübeck, and his late father Hasko Pomplun. He completed his undergraduate and graduate studies at Bielefeld University in Bielefeld, Germany. He conducted his doctoral research under Professor Helge Ritter and later completed postdoctoral work at the University of Toronto before coming to UMass Boston in 2002.

At the start of his career at UMass Boston, Marc’s research focused on developing computational models of visual attention. His more recent work centered on facial recognition, object detection, tracking and recognition, and LLM-aided image, video, and data analysis. He received numerous awards and grants for his research, including the UMass Boston Outstanding Achievement Award for Research, and grants from the U.S. Department of Education, NIH/National Eye Institute, and NSF. Marc taught a variety of courses in computer science, including, but not limited to, Introduction to Software Engineering, Introduction to Artificial Intelligence, Computer Vision, and Applied Discrete Mathematics.

Marc married Michelle Umali in 2007 at the New York Botanical Garden. She describes Marc as a very thoughtful and loving husband. He was the proud papa of their cats Magnus and Hans, who would occasionally appear in lecture slides. He was also a caring son who spoke with his parents regularly in Germany. Marc remained in close contact with his childhood friends from Germany and would often meet with them over Zoom. His favorite football team was Bayern Munich, and in his free time he enjoyed playing online chess. His most recent hobby was collecting acorns during walks in Central Park with Michelle and planting them. So far, he has had one tiny success, which they hope will grow into a small oak tree. Marc famously had what Michelle and his family call “German humor,” with the humor lying in the fact that the jokes were “genuinely terrible.”

Everyone who knew Marc describes him as extremely kind, bright, and genuine, and as a wonderful colleague and mentor. He will be profoundly missed by his family, his colleagues here at the University and elsewhere, and by his former students.

The Art of Designing Distributed Algorithms for Large-scale Machine Learning

2025-02-25T00:00:00+00:00

CS Faculty Candidate Talk: Dr. Xinwei Zhang

Title: The Art of Designing Distributed Algorithms for Large-scale Machine Learning

Time: February 25, 2025 at 10:00 AM Eastern Time (US and Canada)

Location: Healy Library, 10th floor, seminar room 0025E. Refreshments will be served.

Click Here to Join Zoom Meeting

Abstract: Distributed algorithms are fundamental to modern applications in machine learning, signal processing, and control systems. However, designing and analyzing these algorithms remains challenging due to the intricate interplay of communication, computation, and scalability.

Bio: Xinwei Zhang is a Postdoctoral Fellow in Prof. Meisam Razaviyayn’s group in the Department of Industrial and Systems Engineering at the University of Southern California. He received his Ph.D. and M.S. degrees in Electrical Engineering from the University of Minnesota in 2023 and 2022, respectively, advised by Prof. Mingyi Hong and Prof. Sairaj Dhople. He received his B.S. degree in Automation from the University of Science and Technology of China in 2018. His research focuses on contemporary issues in differential privacy and distributed optimization, including differential privacy for training large-scale machine learning models, theoretical aspects of federated learning, decentralized optimization, and distributed machine learning system design. His broad research interest lies at the intersection of machine learning, signal processing, and control theory. He has published over 20 papers in conferences and journals, including ICML, ICLR, NeurIPS, IEEE TSP, IEEE SPM, and SIOPT.

Building a Collaborative and Interactive Data System to Broaden Access to Data Science, AI, and ML

2025-02-21T00:00:00+00:00

CS Faculty Candidate Talk: Dr. Yicong Huang

Title: Building a Collaborative and Interactive Data System to Broaden Access to Data Science, AI, and ML

Time: February 21, 2025 at 10:00 AM Eastern Time (US and Canada)

Location: M03-721. Refreshments will be served.

Click Here to Join Zoom Meeting

Abstract: In an era where data-driven decision-making shapes industries, governments, and everyday life, the ability to leverage data science has become an essential skill. Modern data science techniques, including artificial intelligence (AI), machine learning (ML), and large language models (LLMs), offer advanced capabilities but often require programming expertise, limiting accessibility for a broader audience.

Bio: Yicong Huang is a final-year Ph.D. candidate from the Information Systems Group (ISG) in the Computer Science Department, University of California, Irvine. Under the guidance of Dr. Chen Li, his research focuses on big data management, data-processing systems, and systems for data science, AI and ML. Yicong has made significant contributions to the Texera project. He has published in top-tier database venues such as SIGMOD and VLDB. His interdisciplinary research spans venues such as TOCHI, PNAS Nexus, JAMIA, AMIA, and PLOS ONE. Yicong completed research internships at ByteDance, VISA, and Observe, where he contributed to patents and papers. At SIGMOD, his research has received a Best Demo Runner-Up Award. He received honors such as the 2024 Graduate Dean’s Dissertation Fellowship and the 2023 Public Impact Fellowship from UCI. For more information about his work, please visit yicong-huang.github.io.

Advancing Approximate Queries with Innovative Data Summaries and Generative Models

2025-02-20T00:00:00+00:00

CS Faculty Candidate Talk: Dr. Fuheng Zhao

Title: Advancing Approximate Queries with Innovative Data Summaries and Generative Models

Time: February 20, 2025 at 10:00 AM Eastern Time (US and Canada)

Location: Campus Center meeting room 4201 on the fourth floor. Refreshments will be served.

Click Here to Join Zoom Meeting

Abstract: The exponential growth of data has introduced significant challenges for traditional query processing systems, creating a pressing need for faster and more resource-efficient approaches. Approximation techniques have emerged as a promising solution, striking an optimal balance between accuracy and performance. Data summaries, such as samples, sketches, and histograms, play a crucial role in this paradigm by condensing large datasets into compact representations and maintaining critical insights. Additionally, recent advances in generative models (including large language models) open new possibilities for handling incomplete information, accommodating diverse data types, and approximating complex computations at scale. In this talk, I will discuss my research on theoretically grounded data summarization methods, as well as my latest efforts to integrate generative models into data systems. Together, these contributions advance approximate query processing toward realistic, high-impact applications in modern data analytics.

Bio: Fuheng Zhao is a Ph.D. candidate at the University of California Santa Barbara, advised by Professor Divyakant Agrawal and Professor Amr El Abbadi. His research has been recognized at top database and machine learning conferences such as VLDB, NeurIPS, and CIDR. He is a recipient of the Microsoft Ph.D. Fellowship and the Charles Dana Fellowship, and was honored with the Outstanding Paper Award from UCSB’s Computer Science Department.

Optimizing Data Systems for Modern Storage and Memory Technology

2025-02-19T00:00:00+00:00

CS Faculty Candidate Talk: Dr. Tarikul Islam Papon

Title: Optimizing Data Systems for Modern Storage and Memory Technology

Time: February 19, 2025 at 10:00 AM Eastern Time (US and Canada)

Location: M03-732/Web lab. Refreshments will be served.

Click Here to Join Zoom Meeting

Abstract: Data-intensive applications stress the memory hierarchy with unnecessary data movement and the need to integrate new storage technologies. My research addresses these challenges through two main approaches: unlocking the potential of modern storage devices via faithful modeling and minimizing data movement through hardware specialization.

Solid-state drives (SSDs), now dominant in secondary storage, exhibit read/write asymmetry and access concurrency. Most storage-intensive applications overlook these characteristics, leading to suboptimal performance. I propose a new storage modeling approach capturing these properties. Using this model, I have developed (i) an asymmetry & concurrency-aware DBMS bufferpool management (that uses the device's write concurrency to amortize the asymmetric write cost), (ii) a concurrency-aware graph manager, and (iii) a reinforcement learning based data placement policy for tiered storage architecture. This research paves the way for SSD-aware designs, allowing more systems and components to benefit from this approach.

Moving up the memory hierarchy, data movement is a key bottleneck exacerbated by static layout decisions. To address this, we leverage hardware specialization by developing custom FPGA-based hardware through software/hardware co-design. Our proposed hardware performs fast on-the-fly data transformation closer to data in memory based on the query access pattern to minimize cache pollution. This design opens up many opportunities for simplicity and innovation across the entire data system software stack.

In this talk, I will present some of my research on (i) SSD-aware data system design and (ii) hardware/software co-design for database operations.

Bio: Tarikul Islam Papon is a final-year PhD candidate in Computer Science at Boston University (BU), advised by Manos Athanassoulis. His research focuses on hardware-aware data management challenges stemming from the evolution of storage and memory devices. Papon's work has been published in top-tier database conferences (SIGMOD, VLDB, ICDE, EDBT) and journals (ACM TODS, IEEE TKDE). He has received several awards, including the Best Demo Award (VLDB '23), the Best Vision Paper Award (ICDE '23), and BU’s CS Research Excellence Award. He has also served as a graduate-level course instructor at BU and interned at Microsoft Research and Intel Labs during his PhD. Before joining BU, Papon served as a Lecturer for four years at the CSE Department at Bangladesh University of Engineering and Technology (BUET), working on machine learning and embedded systems.

Optimizing Irregular Data Movement at Scale

2025-02-18T00:00:00+00:00

CS Faculty Candidate Talk: Dr. Ke Fan

Title: Optimizing Irregular Data Movement at Scale

Time: February 18, 2025 at 10:00 AM Eastern Time (US and Canada)

Location: M03-721. Refreshments will be served.

Click Here to Join Zoom Meeting

Abstract: Rapid advancements in computing technologies, especially the broad adoption of heterogeneous computing and the imminent arrival of Exascale machines, are pushing the frontiers of computational sciences in terms of both the scale and the complexity of problems that can be studied. However, these growing possibilities also pose the critical challenge of optimizing computational resources to efficiently manage data movement, particularly sparse and irregular patterns associated with unbalanced workloads. Many modern high-performance computing (HPC) applications, such as parallel machine learning (ML) and graph mining, exhibit some degree of sparsity and irregularity characterized by unpredictable memory access patterns, complex network communication behaviors, and imbalanced workloads or I/O per process. These characteristics present significant scalability challenges for large-scale systems. To mitigate these challenges and maximize resource utilization and performance, I focus on two primary areas: (1) collective communication, focusing on optimizing data exchange among processes with non-uniform, sparse data distributions over networks, and (2) parallel file I/O, addressing unbalanced data exchange effectively between compute nodes and parallel file systems. In addition, while dealing with high degrees of parallelism, performance analysis of irregular HPC applications becomes increasingly crucial. The scalable performance analysis frameworks for characterizing the behavior of unstructured large-scale applications can offer valuable insights into data movement patterns that inform further optimizations.

Bio: Ke Fan is currently a Ph.D. candidate in Computer Science at the University of Illinois Chicago under the mentorship of Dr. Sidharth Kumar. Her research lies in the area of high-performance computing (HPC), with a particular emphasis on three key areas: optimizing the performance of MPI collectives, enhancing the performance of irregular parallel I/O operations, and improving the scalability of performance introspection frameworks. Throughout her doctoral journey, she has made significant contributions to the field of HPC, which is reflected in her publications at top-tier HPC conferences such as HPDC, HiPC, and ISC. Her poster presentation was recognized as the Best Poster Finalist at Supercomputing (SC) 2023. Further cementing her impact in the field, she was awarded the esteemed 2024 ACM/IEEE-CS George Michael Memorial High-Performance Computing Fellowship, one of the most prestigious honors in H