My primary research interest lies in building robust, trustworthy, and interpretable systems by understanding the implicit inductive biases of machine learning models and algorithms.
National Yang Ming Chiao Tung University, Doctor of Medicine
Aug. 15 - Jun. 22
News
[2024/04] One paper on Mamba for DNA accepted at ICML'24.
[2023/11] One paper on biology LLM accepted at AAAI'24 workshop.
[2023/08] Began PhD journey at Cornell.
[2023/05] One paper on Federated Learning submitted to arXiv.
[2022/06] Passed the Taiwan Medical Licensing Examination.
[2021/10] One paper accepted at ICLR'22 as poster.
[2021/10] One paper accepted at NeurIPS'21 workshop as oral paper.
[2021/06] One paper accepted at MICCAI'21 as oral paper.
Selected Publications
Caduceus: Bi-Directional Equivariant Long-Range DNA Sequence Modeling
Yair Schiff, Chia-Hsiang Kao, Aaron Gokaslan, Tri Dao, Albert Gu, Volodymyr Kuleshov
| abstract |
arxiv |
github |
Accepted by ICML'24 as poster.
Large-scale sequence modeling has sparked rapid advances that now extend into biology and genomics. However, modeling genomic sequences introduces challenges such as the need to model long-range token interactions, the effects of upstream and downstream regions of the genome, and the reverse complementarity (RC) of DNA. Here, we propose an architecture motivated by these challenges that builds off the long-range Mamba block, and extends it to a BiMamba component that supports bi-directionality, and to a MambaDNA block that additionally supports RC equivariance. We use MambaDNA as the basis of Caduceus, the first family of RC equivariant bi-directional long-range DNA language models, and we introduce pre-training and fine-tuning strategies that yield Caduceus DNA foundation models. Caduceus outperforms previous long-range models on downstream benchmarks; on a challenging long-range variant effect prediction task, Caduceus exceeds the performance of 10x larger models that do not leverage bi-directionality or equivariance.
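To make the notion of RC equivariance concrete, here is a minimal sketch in plain Python. It is not the Caduceus or MambaDNA implementation: `toy_layer` and `rc_equivariant` are hypothetical names, and the symmetrization used here (apply a layer to the input and to its reverse complement, then add) is only one assumed way to obtain an equivariant map from an arbitrary one.

```python
# Channel order is chosen so that complementing a base reverses its channel index.
ALPHABET = "ACGT"
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Reverse the sequence and complement every base."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def one_hot(seq):
    """Encode a DNA string as a list of 4-channel one-hot rows."""
    return [[1.0 if base == a else 0.0 for a in ALPHABET] for base in seq]


def rc_features(x):
    """RC in feature space: reverse the positions and the channels."""
    return [list(reversed(row)) for row in reversed(x)]


def toy_layer(x):
    """Hypothetical, deliberately non-equivariant layer: a weighted causal running sum."""
    weights = [1.0, 2.0, 3.0, 4.0]
    running = [0.0] * 4
    out = []
    for row in x:
        running = [r + w * v for r, w, v in zip(running, weights, row)]
        out.append(list(running))
    return out


def rc_equivariant(layer, x):
    """Assumed symmetrization: layer(x) + RC(layer(RC(x))) commutes with RC."""
    forward = layer(x)
    backward = rc_features(layer(rc_features(x)))
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(forward, backward)]


seq = "AACGTG"
x = one_hot(seq)
x_rc = one_hot(reverse_complement(seq))

# Equivariance: transforming the input transforms the output in the same way.
print(rc_equivariant(toy_layer, x_rc) == rc_features(rc_equivariant(toy_layer, x)))
# The raw toy layer does not have this property.
print(toy_layer(x_rc) == rc_features(toy_layer(x)))
```

The first check holds for any choice of `layer`, because reversing positions and channels is a linear operation that undoes itself when applied twice.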
FedBug: A Bottom-Up Gradual Unfreezing Framework for Federated Learning
Chia-Hsiang Kao, Yu-Chiang Frank Wang
| abstract |
arxiv |
github |
Submitted to arXiv.
Federated Learning (FL) offers a collaborative training framework, allowing multiple clients to contribute to a shared model without compromising data privacy. Due to the heterogeneous nature of local datasets, updated client models may overfit and diverge from one another, commonly known as the problem of client drift. In this paper, we propose FedBug (Federated Learning with Bottom-Up Gradual Unfreezing), a novel FL framework designed to effectively mitigate client drift. FedBug adaptively leverages the client model parameters, distributed by the server at each global round, as the reference points for cross-client alignment. Specifically, on the client side, FedBug begins by freezing the entire model, then gradually unfreezes the layers, from the input layer to the output layer. This bottom-up approach allows models to train the newly thawed layers to project data into a latent space, wherein the separating hyperplanes remain consistent across all clients. We theoretically analyze FedBug in a novel over-parameterization FL setup, revealing its superior convergence rate compared to FedAvg. Through comprehensive experiments, spanning various datasets, training conditions, and network architectures, we validate the efficacy of FedBug. Our contributions encompass a novel FL framework, theoretical analysis, and empirical validation, demonstrating the wide potential and applicability of FedBug.
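The bottom-up unfreezing described above can be sketched as a per-step mask over layers. This is an illustrative sketch, not the FedBug code: the function names, the `steps_per_layer` parameter, and the choice to thaw the first layer at step 0 and one more layer at fixed intervals are all assumptions.

```python
def num_unfrozen_layers(step, num_layers, steps_per_layer):
    """Number of input-side layers trainable at a given local step.

    Assumption: the first layer is thawed at step 0 and one further layer
    is thawed every `steps_per_layer` local steps.
    """
    return min(num_layers, 1 + step // steps_per_layer)


def unfreezing_schedule(num_layers, steps_per_layer, total_steps):
    """Return one boolean mask per local step; True marks a trainable layer.

    Index 0 is the input layer and index num_layers - 1 is the output layer.
    """
    schedule = []
    for step in range(total_steps):
        k = num_unfrozen_layers(step, num_layers, steps_per_layer)
        schedule.append([layer < k for layer in range(num_layers)])
    return schedule


# Example: a 3-layer client model trained for 6 local steps in one global round.
for step, mask in enumerate(unfreezing_schedule(3, 2, 6)):
    print(step, mask)
```

In a real client update, each mask would decide which layers receive gradient updates at that step, while the still-frozen layers keep the parameters distributed by the server.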
MAML Is a Noisy Contrastive Learner in Classification
Chia-Hsiang Kao, Wei-Chen Chiu, Pin-Yu Chen
| abstract |
arxiv |
poster |
github |
paper explained |
Accepted by ICLR'22 as poster.
Accepted by NeurIPS'21 workshop as oral presentation.
Model-agnostic meta-learning (MAML) is one of the most popular and widely-adopted meta-learning algorithms nowadays, which achieves remarkable success in various learning problems. Yet, with the unique design of nested inner-loop and outer-loop updates which respectively govern the task-specific and meta-model-centric learning, the underlying learning objective of MAML still remains implicit and thus impedes a more straightforward understanding of it. In this paper, we provide a new perspective to the working mechanism of MAML and discover that: MAML is analogous to a meta-learner using a supervised contrastive objective function, where the query features are pulled towards the support features of the same class and against those of different classes, in which such contrastiveness is experimentally verified via an analysis based on the cosine similarity. Moreover, our analysis reveals that the vanilla MAML algorithm has an undesirable interference term originating from the random initialization and the cross-task interaction. We therefore propose a simple but effective technique, zeroing trick, to alleviate such interference, where the extensive experiments are then conducted on both miniImagenet and Omniglot datasets to demonstrate the consistent improvement brought by our proposed technique thus well validating its effectiveness.
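As a rough illustration of a cosine-similarity check for contrastiveness, the sketch below compares how close a query feature is to support features of the same class versus other classes. It is not the analysis code from the paper: `contrast_gap` is a hypothetical helper, and the feature vectors are made-up toy values.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def contrast_gap(query, query_label, support, support_labels):
    """Mean same-class similarity minus mean different-class similarity.

    Hypothetical summary statistic: a larger gap means the query feature sits
    closer to same-class support features than to the others.
    """
    same = [cosine_similarity(query, s)
            for s, y in zip(support, support_labels) if y == query_label]
    diff = [cosine_similarity(query, s)
            for s, y in zip(support, support_labels) if y != query_label]
    return sum(same) / len(same) - sum(diff) / len(diff)


# Toy features (illustrative values only): two classes in a 2-D feature space.
support = [[1.0, 0.0], [0.0, 1.0]]
support_labels = [0, 1]
print(contrast_gap([2.0, 0.0], 0, support, support_labels))
```

Tracking such a gap over training is one simple way to see whether same-class features are being pulled together and different-class features pushed apart.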
Demystifying T1-MRI to FDG18-PET Image Translation via Representational Similarity
Chia-Hsiang Kao, Yong-Sheng Chen, Li-Fen Chen, Wei-Chen Chiu
| abstract |
paper |
Accepted by MICCAI'21 as oral presentation.
Earned the Student Travel Award at MICCAI'21.
Recent development of image-to-image translation techniques has enabled the generation of rare medical images (e.g., PET) from common ones (e.g., MRI). Beyond the potential benefits of the reduction in scanning time, acquisition cost, and radiation exposure risks, the translation models in themselves are inscrutable black boxes. In this work, we propose two approaches to demystify the image translation process, where we particularly focus on the T1-MRI to PET translation. First, we adopt the representational similarity analysis and discover that the process of T1-MR to PET image translation includes the stages of brain tissue segmentation and brain region recognition, which unravels the relationship between the structural and functional neuroimaging data. Second, based on our findings, an Explainable and Simplified Image Translation (ESIT) model is proposed to demonstrate the capability of deep learning models for extracting gray matter volume information and identifying brain regions related to normal aging and Alzheimer's disease, which untangles the biological plausibility hidden in deep learning models.
Awards and Scholarships
[2021/06] Student Travel Award, MICCAI'21. (Awarded to the best, i.e., highest-scoring, first-author students.)
[2020/08] Undergraduate Research Fellowship, National Science and Technology Council, Taiwan.
[2018/08] Undergraduate Research Fellowship, National Science and Technology Council, Taiwan.
[2018/06] Summer Research Fellowship, National Health Research Institutes and the Foundation of Health Sciences, Taiwan.
Services
[2022/08] Reviewer, Computer Vision and Image Understanding.
[2022/04] Reviewer, AutoML'22 Conference.
[2021/09] Junior Reviewer, Workshop on Meta-Learning, NeurIPS'21.
Writings
[2022/03] [EN] Paper Explained: MAML Is a Noisy Contrastive Learner in Classification
[2021/10] [EN] When a Man in the White Coat Codes. (II)
[2021/09] [EN] On Two Perspectives of Contrastive Divergence Algorithm.