Xifeng Yan
Areas of Expertise
Research
The primary objective of my research is to explore foundation models in artificial intelligence, customize them for data mining, and examine their applications across domains such as finance, healthcare, and science. Historically, we have made significant advances in graph data mining and, somewhat unexpectedly, pioneered the use of Transformer-based methods in time series forecasting.
My research focus in AI for Science encompasses three key areas: (1) Developing foundation models: leveraging open-source materials structures, existing benchmarks, and millions of publicly available research articles to fine-tune Vision Language Models (VLMs) for knowledge retrieval, property prediction, and inverse design. These fine-tuned models can be further enhanced with in-house datasets and electronic lab notebooks (ELNs) to address domain-specific inquiries. (2) Using Transformer models as a tool for data analysis, with a particular emphasis on multimodal data mining and transfer learning. We have successfully applied these models to classification and forecasting in finance, healthcare, and scientific domains, where multimodal data are prevalent. (3) Exploring the use of multimodal large language models (LLMs) to assist users in completing physical tasks when they need support.
Recent research demonstrates that the Transformer model underlying ChatGPT can perform tasks that go beyond information retrieval and simple property calculations. Its distinctive ability to digest and integrate scientific knowledge from research texts surpasses what traditional computational methods can offer. This creates a significant opportunity for innovation: combining the strengths of Transformer models and conventional computational approaches could lead to unexpected and valuable outcomes.
While many existing scientific methods focus on a single data type, we aim to adopt a multimodal approach that leverages numerical, textual, and structural data at scale. The Transformer architecture is particularly well suited to processing diverse data types, enabling both prediction and synthesis within a unified framework. Current proprietary vision language models fall short in analyzing figures effectively and establishing meaningful connections. To address this gap, we will develop graph neural network representations that convert figures into vector form and integrate them into large language models.
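As a rough illustration of this direction, the sketch below encodes a figure that has already been parsed into a graph of visual elements, runs a small amount of message passing, and projects the result into an LLM's embedding space as soft prompt tokens. The module name, dimensions, and graph construction are illustrative placeholders, not our released implementation.

# Minimal sketch: encode a parsed figure as a graph and project it into an
# LLM's embedding space as "soft" prompt tokens. Assumes the figure has
# already been converted into nodes (visual elements) with feature vectors
# and a row-normalized adjacency matrix; all names are illustrative.
import torch
import torch.nn as nn


class FigureGraphEncoder(nn.Module):
    """Two rounds of mean-aggregation message passing over figure elements."""

    def __init__(self, node_dim: int, hidden_dim: int, llm_dim: int, n_prompt_tokens: int = 4):
        super().__init__()
        self.embed = nn.Linear(node_dim, hidden_dim)
        self.msg1 = nn.Linear(hidden_dim, hidden_dim)
        self.msg2 = nn.Linear(hidden_dim, hidden_dim)
        # Project the pooled graph embedding into a few pseudo-token vectors
        # that can be prepended to the LLM's input embeddings.
        self.to_llm = nn.Linear(hidden_dim, llm_dim * n_prompt_tokens)
        self.n_prompt_tokens = n_prompt_tokens
        self.llm_dim = llm_dim

    def forward(self, node_feats: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # node_feats: (num_nodes, node_dim); adj: (num_nodes, num_nodes)
        h = torch.relu(self.embed(node_feats))
        h = torch.relu(self.msg1(adj @ h)) + h   # message passing, round 1
        h = torch.relu(self.msg2(adj @ h)) + h   # message passing, round 2
        graph_emb = h.mean(dim=0)                # pool nodes -> one figure vector
        tokens = self.to_llm(graph_emb).view(self.n_prompt_tokens, self.llm_dim)
        return tokens                            # prepend to LLM input embeddings


if __name__ == "__main__":
    enc = FigureGraphEncoder(node_dim=16, hidden_dim=64, llm_dim=768)
    nodes = torch.randn(5, 16)                   # 5 visual elements in a figure
    adj = torch.softmax(torch.randn(5, 5), dim=-1)
    print(enc(nodes, adj).shape)                 # torch.Size([4, 768])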
Large language models are inherently predictive, which can lead to inaccuracies in their responses. Our methodology incorporates a chain-of-thought approach that exposes intermediate reasoning steps, allowing scientists to follow the logical process and draw their own conclusions. This not only assists researchers but can also inspire new hypotheses.
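A minimal illustration of how such reasoning steps can be elicited and kept separate from the final answer is sketched below; the prompt wording and the query_llm callable are placeholders rather than our actual pipeline.

# Minimal illustration of the chain-of-thought idea: the model is asked to lay
# out its reasoning as numbered steps before committing to an answer, so a
# scientist can audit each step. `query_llm` stands in for any chat-style LLM API.

def build_cot_prompt(question: str) -> str:
    return (
        "You are assisting a scientist.\n"
        f"Question: {question}\n"
        "First list your reasoning as numbered steps, citing the evidence each\n"
        "step relies on. Then state a tentative conclusion on a final line\n"
        "beginning with 'Conclusion:'. If the evidence is insufficient, say so."
    )


def answer_with_reasoning(question: str, query_llm) -> dict:
    """Return the reasoning steps and the conclusion separately."""
    reply = query_llm(build_cot_prompt(question))
    steps = [line for line in reply.splitlines() if line and line[0].isdigit()]
    conclusion = next(
        (line for line in reply.splitlines() if line.startswith("Conclusion:")), ""
    )
    return {"steps": steps, "conclusion": conclusion}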
We have been developing the Transformer model as a unified approach for data mining, with a specific focus on large-scale and multimodal data mining and transfer learning. Take time series data as a pertinent example. In 2019, we introduced a first-of-its-kind Transformer-based forecasting method that achieved state-of-the-art performance and has since inspired further research in this area. Our time series forecasting model was successfully applied to COVID-19 forecasting, where reliable predictions are crucial for resource allocation and administrative planning. Results from compartmental models such as SIR and SEIR are widely cited by the CDC and news media. We posed the questions: Is there a data-driven method that can forecast without explicitly modeling disease transmission dynamics? Can such an approach outperform the well-established compartmental models and their variants? Our findings demonstrated that this is indeed possible. This example underscores the significant potential of leveraging the Transformer model for non-text data analysis.
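For readers curious what a purely data-driven forecaster looks like, here is a compact, generic sketch of one-step-ahead forecasting with a Transformer encoder. It is not the architecture from our NeurIPS'19 paper, and the hyperparameters are arbitrary; it simply shows that past observations go in and a prediction comes out, with no explicit transmission model.

# A compact sketch of data-driven forecasting with a Transformer encoder:
# a window of past case counts is encoded with causal self-attention and
# the last hidden state predicts the next value.
import torch
import torch.nn as nn


class TinyTransformerForecaster(nn.Module):
    def __init__(self, d_model: int = 64, nhead: int = 4, num_layers: int = 2):
        super().__init__()
        self.input_proj = nn.Linear(1, d_model)          # scalar series -> d_model
        layer = nn.TransformerEncoderLayer(d_model, nhead, dim_feedforward=128,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.head = nn.Linear(d_model, 1)                # predict the next value

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, 1) of past observations (e.g., daily case counts)
        seq_len = x.size(1)
        causal = torch.triu(torch.full((seq_len, seq_len), float("-inf")), diagonal=1)
        h = self.encoder(self.input_proj(x), mask=causal)
        return self.head(h[:, -1])                       # one-step-ahead forecast


if __name__ == "__main__":
    model = TinyTransformerForecaster()
    history = torch.randn(8, 30, 1)                      # 8 series, 30 past days
    print(model(history).shape)                          # torch.Size([8, 1])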
We are also developing systems that use multimodal LLMs as task assistants. These systems can extract knowledge from training video clips, augment it with manuals and other knowledge sources, perceive task execution, reason about task status, and provide conversational guidance. The assistants empower users by delivering just-in-time feedback, such as identifying and correcting errors during task execution, instructing users on the next steps, and answering their questions, ensuring that users have access to the knowledge needed to complete their tasks successfully. In addition, large language models can facilitate lab work by automatically recording anomalies, thereby aiding the discovery of unknown phenomena.
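A stripped-down sketch of the perceive-reason-guide loop behind such an assistant is shown below; describe_frame and query_llm stand in for a vision model and an LLM API, and the task steps are illustrative placeholders, not components of our deployed system.

# Stripped-down sketch of the assistant's perceive-reason-guide loop:
# compare what the camera sees against the expected step, then ask the LLM
# to either flag and correct an error or confirm the step and give the next one.

TASK_STEPS = [
    "Put on gloves",
    "Pipette 5 ml of the sample into the vial",
    "Seal the vial and label it",
]


def guidance_for_frame(frame, current_step: int, describe_frame, query_llm) -> str:
    """Return conversational guidance for one observed video frame."""
    observation = describe_frame(frame)   # e.g., "user is pipetting without gloves"
    prompt = (
        f"Task steps: {TASK_STEPS}\n"
        f"The user should be on step {current_step + 1}: '{TASK_STEPS[current_step]}'.\n"
        f"Observation from the camera: {observation}\n"
        "If the user made an error, point it out and explain how to correct it.\n"
        "Otherwise, confirm the step and tell them what to do next."
    )
    return query_llm(prompt)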
Selected Publications
MMSci: A Multimodal Multi-Discipline Dataset for PhD-Level Scientific Comprehension
by Z. Li, X. Yang, K. Choi, W. Zhu, R. Hsieh, H. J. Kim, J. H. Lim, S. Ji, B. Lee, X. Yan, L. Petzold, S. Wilson, W. Lim, W. Wang
AI4MAT'24 (AI for Accelerated Materials Design), 2024 (Spotlight)
Language Models Augmented with Decoupled Memory,
by W. Wang, L. Dong, H. Cheng, X. Liu, X. Yan, J. Gao, F. Wei
NeurIPS'23 (The Thirty-seventh Annual Conference on Neural Information Processing Systems), 2023
Guiding Large Language Models via Directional Stimulus Prompting,
by Z. Li, B. Peng, P. He, M. Galley, J. Gao, X. Yan
NeurIPS'23 (The Thirty-seventh Annual Conference on Neural Information Processing Systems), 2023
Time Series as Images: Vision Transformer for Irregularly Sampled Time Series,
by Z. Li, S. Li, X. Yan,
NeurIPS'23 (The Thirty-seventh Annual Conference on Neural Information Processing Systems), 2023
Improving Medical Predictions by Irregular Multimodal Electronic Health Records Modeling
by X. Zhang, S. Li, Z. Chen, X. Yan, L. Petzold,
ICML'23 (The Fortieth International Conference on Machine Learning), 2023
PathSim: Meta Path-Based Top-K Similarity Search in Heterogeneous Information Networks
by Y. Sun, J. Han, X. Yan, P. S. Yu, T. Wu,
VLDB'11 (Proc. of 2011 Int. Conf. on Very Large Data Bases), 2011 (VLDB 2022 Test of Time Award)
Enhancing the Locality and Breaking the Memory Bottleneck of Transformer on Time Series Forecasting,
by S. Li, X. Jin, Y. Xuan, X. Zhou, W. Chen, Y.-X. Wang, X. Yan
NeurIPS'19 (The Thirty-third Annual Conference on Neural Information Processing Systems), 2019
The Genome of the Jellyfish Aurelia and the Evolution of Animal Complexity,
by D. Gold, T. Katsuki, Y. Li, X. Yan, M. Regulski, D. Ibberson, T. Holstein, R. Steele, D. Jacobs, and R. Greenspan,
Nature Ecology and Evolution, 2018.
Substructure Similarity Search in Graph Databases,
by X. Yan, P. S. Yu, and J. Han,
SIGMOD'05 (Proc. of 2005 Int. Conf. on Management of Data), 2005.
gSpan: Graph-Based Substructure Pattern Mining,
by X. Yan and J. Han,
ICDM'02 (Proc. of 2002 Int. Conf. on Data Mining), 2002 (IEEE ICDM 10-year Highest Impact Paper Award)