
Material/Chemical informatics



Weilin Yuan, Yusuke Hibi, Ryo Tamura, Masato Sumita, Yasuyuki Nakamura, Masanobu Naito, and Koji Tsuda "Revealing Factors Influencing Polymer Degradation with Rank-based Machine Learning",
Patterns, 4, 100846 (2023).
(Open access)

The efficient treatment of polymer waste is a major challenge for marine sustainability, and revealing the factors that dominate the degradability of polymer materials is useful for developing future polymer materials. However, the small number of available degradability datasets and the diversity of their experimental methods and conditions hinder large-scale analysis. In this study, we developed a platform suited to such data for evaluating the degradability of polymers, using a rank-based machine learning technique based on RankSVM. We then built a ranking model of polymer degradability by integrating three degradability datasets measured by different methods and under different conditions. Analysis of this ranking model with a decision tree revealed the factors that dominate the degradability of polymers.
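The core of RankSVM is a pairwise transform: only the order of two samples matters, so data measured by different means never need a common scale. The following is a minimal sketch, assuming pairs are formed only within each dataset and a linear model is trained by plain subgradient descent on the hinge loss. The function names, hyperparameters, and toy data are illustrative and are not the code used in the paper.

```python
import numpy as np


def within_dataset_pairs(X, y, group):
    """Pairwise difference vectors, comparing only samples from the same dataset."""
    diffs, signs = [], []
    n = len(y)
    for i in range(n):
        for j in range(i + 1, n):
            # Values measured by different means are never compared; ties carry no order.
            if group[i] != group[j] or y[i] == y[j]:
                continue
            diffs.append(X[i] - X[j])
            signs.append(1.0 if y[i] > y[j] else -1.0)
    return np.array(diffs), np.array(signs)


def fit_rank_svm(D, s, C=1.0, lr=0.01, epochs=500):
    """Linear RankSVM: minimize 0.5*||w||^2 + C*sum(max(0, 1 - s*(D @ w))) by subgradient descent."""
    w = np.zeros(D.shape[1])
    for _ in range(epochs):
        active = s * (D @ w) < 1.0  # pairs that violate the margin
        grad = w - C * (s[active][:, None] * D[active]).sum(axis=0)
        w -= lr * grad
    return w


# Toy data: two datasets whose raw values live on very different scales.
X = np.array([[0, 0], [1, 1], [2, 0], [0, 1], [1, 0], [2, 1]], dtype=float)
y = np.array([0.1, 0.2, 0.3, 10.0, 20.0, 30.0])
group = [0, 0, 0, 1, 1, 1]
D, s = within_dataset_pairs(X, y, group)
w = fit_rank_svm(D, s)
scores = X @ w  # higher score = ranked higher
```

The resulting scores are comparable across datasets even though the raw values (0.1 versus 10 in the toy data) are not; fitting a decision tree to such a model, as done in the paper, is not shown.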



Ryo Tamura, Kei Terayama, Masato Sumita, Koji Tsuda "Ranking Pareto optimal solutions based on projection free energy",
Phys. Rev. Mater., 7, 093804 (2023).
(Open access)

Based on available datasets prepared by numerical simulations and machine learning, property maps can be developed for materials that have not yet been synthesized. These maps can be used to select promising materials for synthesis experiments. With a single objective function, the optimal solutions can simply be ranked by the values of the target property. However, applications with multiple target properties require the calculation of Pareto optimal solutions to visualize trade-offs, and these solutions are generally ranked manually by choosing weights for the multiple objectives based on prior knowledge. In this study, to rank Pareto solutions automatically, we introduced the most-isolated Pareto solution (MIPS) score, which is defined by a projection free energy. Using the MIPS ranking, the most isolated materials predicted in the property space can be appropriately selected. To verify the effectiveness of the proposed method, we used a database of semiconductors created by density functional theory calculations. Our method correctly selected and ranked the most isolated solutions in both convex and concave two-dimensional Pareto frontiers, outperforming the most relevant outlier detection methods. We also demonstrated that our approach can easily be extended to three-dimensional property spaces.
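MIPS ranking starts from the Pareto optimal solutions themselves. The projection free energy that defines the MIPS score is not reproduced here; the sketch below only shows how the non-dominated (Pareto optimal) materials can be extracted, assuming every property is to be maximized. The function name and toy points are hypothetical.

```python
from typing import List, Sequence


def pareto_optimal(points: Sequence[Sequence[float]]) -> List[int]:
    """Indices of non-dominated points, assuming every property is to be maximized."""

    def dominates(q, p):
        # q dominates p if it is at least as good everywhere and strictly better somewhere.
        return all(a >= b for a, b in zip(q, p)) and any(a > b for a, b in zip(q, p))

    return [i for i, p in enumerate(points)
            if not any(dominates(q, p) for j, q in enumerate(points) if j != i)]


# Toy two-property space; the same function works for three properties.
points = [(1.0, 5.0), (2.0, 4.0), (3.0, 3.0), (2.0, 2.0), (0.0, 1.0)]
print(pareto_optimal(points))  # [0, 1, 2]
```

A ranking score such as MIPS would then be computed only for the returned indices.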



Li Jiawen, Masato Sumita, Ryo Tamura, Koji Tsuda "Interpretable Fragment-based Molecule Design with Self-learning Entropic Population Annealing",
Adv. Intell. Syst., 5, 2300189 (2023).
(Open access)

Self-learning entropic population annealing (SLEPA) is a recently developed method for achieving interpretable black-box optimization via density-of-states estimation. Applying SLEPA to a chemical space is not straightforward, however, because of its dependence on Markov chain Monte Carlo sampling in the space of generated entities. Herein, SLEPA is applied to optimal molecule generation by combining an irreducible Markov chain in the space of fragment multisets with a probabilistic fragment assembler such as MoLeR. The weighted samples from SLEPA are used to identify salient fragments for maximizing and minimizing the highest occupied molecular orbital–lowest unoccupied molecular orbital (HOMO–LUMO) gap, and the relationship between the identified fragments and the electronic structures is elucidated. This approach offers a viable platform for reconciling the incompatible goals of optimization and interpretation in molecular design.



Shoichi Ishida, Tanuj Aasawat, Masato Sumita, Michio Katouda, Tatsuya Yoshizawa, Kazuki Yoshizoe, Koji Tsuda, Kei Terayama "ChemTSv2: Functional Molecular Design Using a de novo Molecule Generator",
WIREs Comput Mol Sci., 13, e1680 (2023).
(Open access)

Designing functional molecules is the prerogative of experts who have advanced knowledge and experience in their fields. To democratize automatic molecular design for both experts and nonexperts, we introduce a generic open-source framework, ChemTSv2, to design molecules based on a de novo molecule generator equipped with an easy-to-use interface. In addition, ChemTSv2 can easily be integrated with various simulation packages, such as the Gaussian 16 package, and supports massively parallel exploration that accelerates molecular design. We demonstrate the potential of molecular design with ChemTSv2 through examples including previous work, such as chromophores, fluorophores, and drugs. ChemTSv2 contributes to democratizing inverse molecule design in various disciplines relevant to chemistry.
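ChemTS-family generators are usually described as Monte Carlo tree search (MCTS) guided by a recurrent neural network. Assuming the standard UCT rule for choosing which child node (the next symbol of a SMILES string) to explore, the selection step can be sketched as follows. The `Node` class, the exploration constant `c`, and the reward convention are illustrative assumptions; ChemTSv2's actual policy options and defaults may differ.

```python
import math
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    token: str                  # e.g. the SMILES symbol appended at this node
    visits: int = 0
    total_reward: float = 0.0   # sum of rewards back-propagated through this node
    children: List["Node"] = field(default_factory=list)


def uct_score(child: Node, parent_visits: int, c: float = 1.0) -> float:
    if child.visits == 0:
        return math.inf  # try unvisited children first
    exploit = child.total_reward / child.visits
    explore = c * math.sqrt(math.log(parent_visits) / child.visits)
    return exploit + explore


def select_child(parent: Node, c: float = 1.0) -> Node:
    return max(parent.children, key=lambda ch: uct_score(ch, parent.visits, c))


root = Node("^", visits=10, children=[Node("C", 5, 4.0), Node("O", 5, 2.0), Node("N")])
print(select_child(root).token)  # N (never visited)
```

In a parallel setting, many workers run this selection concurrently, which is one reason massively parallel exploration needs extra bookkeeping not shown here.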



Ryo Tamura, Masato Sumita, Kei Terayama, Koji Tsuda, Fujio Izumi, Yoshitaka Matsushita "Automatic Rietveld refinement by robotic process automation with RIETAN-FP",
Science and Technology of Advanced Materials: Methods, 2, 435-444 (2022).
(Open access)

Rietveld analysis necessitates the manual trial-and-error refinement of various parameters. To reduce human costs and resources, we developed a robotic process automation (RPA) system for the Rietveld analysis program RIETAN-FP. By executing the proposed RPA programs, the background intensity parameters can be determined, and black-box optimization can be used to automatically search for peak profile parameters that give a small Rwp value. We evaluated the programs by analyzing X-ray powder diffraction patterns of anatase TiO2, Ca5(PO4)3F, and BaSO4, and verified that RPA can be used to automate Rietveld analysis in simple cases. This indicates that RPA can be used to conduct black-box optimization with graphical user interface (GUI) applications in materials science.
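The quantity the search tries to lower, Rwp, is the weighted-profile R factor. Assuming the usual counting-statistics weights w_i = 1/y_i(obs), it can be computed as below. This is the textbook definition for illustration only, not code taken from RIETAN-FP or from the RPA programs.

```python
import numpy as np


def r_wp(y_obs, y_calc, weights=None):
    """Weighted-profile R factor: sqrt(sum w (y_obs - y_calc)^2 / sum w y_obs^2)."""
    y_obs = np.asarray(y_obs, dtype=float)
    y_calc = np.asarray(y_calc, dtype=float)
    if weights is None:
        weights = 1.0 / y_obs  # counting statistics; assumes all observed intensities > 0
    num = np.sum(weights * (y_obs - y_calc) ** 2)
    den = np.sum(weights * y_obs ** 2)
    return float(np.sqrt(num / den))


print(r_wp([100.0, 100.0], [90.0, 110.0]))  # about 0.1
```

In an automated loop, a black-box optimizer would propose peak profile parameters, the refinement program would return the calculated pattern, and the optimizer would try to reduce this value.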



Masato Sumita, Kei Terayama, Ryo Tamura, and Koji Tsuda "QCforever: A Quantum Chemistry Wrapper for Everyone to Use in Black-Box Optimization",
Journal of Chemical Information and Modeling, 62, 4427-4434 (2022).
(Open access)

To obtain observable physical or molecular properties, such as ionization potential and fluorescence wavelength, from quantum chemical (QC) computation, multi-step computations operated by a human are required. Hence, automating the multi-step computational process and turning it into a black box that anybody can handle is important for effective database construction and for fast, realistic material design through black-box optimization, in which machine learning algorithms serve as predictors. Here, we propose a Python library, QCforever, to automate the computation of several molecular properties and of chemical phenomena induced by molecules. The tool requires only a molecule file to provide the molecule's observable properties: it automates both the computation of molecular properties (ionization potential, fluorescence, etc.) and the analysis of the output, returning multiple values with which to evaluate the molecule. By incorporating the tool into black-box optimization, we can explore molecules with desired properties within the limitations of QC computation.



Takehiro Fujita, Kei Terayama, Masato Sumita, Ryo Tamura, Yasuyuki Nakamura, Masanobu Naito, and Koji Tsuda "Understanding the evolution of a de novo molecule generator via characteristic functional group monitoring",
Science and Technology of Advanced Materials, 23, 352-360 (2022).
(Open access)

Recently, artificial intelligence (AI)-enabled de novo molecular generators (DNMGs) have automated molecular design based on data-driven or simulation-based property estimates. In some domains, such as the game of Go, where AI has surpassed human intelligence, humans are trying to learn the best strategies from AI. To understand the molecule-optimization strategy of a DNMG, we propose an algorithm called characteristic functional group monitoring (CFGM). Given a time series of generated molecules, CFGM monitors functional groups that are statistically enriched in comparison with the training data. In the task of maximizing the absorption wavelength of pure organic molecules (consisting of H, C, N, and O), we successfully identified a strategic change from diketone and aniline derivatives to quinone derivatives. In addition, CFGM led us to the hypothesis that 1,2-quinone is an unconventional chromophore, which was verified by chemical synthesis. This study shows the possibility that human experts can learn from DNMGs to expand their ability to discover functional molecules.
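CFGM compares how often each functional group appears in generated molecules with how often it appears in the training data. The abstract does not name the statistic, so the sketch below assumes a one-sided Fisher exact test (hypergeometric upper tail) on the counts for one functional group in one time window. Detecting the groups themselves (for example, by substructure matching) is left out, and the function name is hypothetical.

```python
from math import comb


def enrichment_p_value(k_gen: int, n_gen: int, k_train: int, n_train: int) -> float:
    """One-sided Fisher exact test (hypergeometric upper tail).

    k_gen of n_gen generated molecules and k_train of n_train training molecules
    contain the functional group. Small p-values mean the group is enriched
    among the generated molecules.
    """
    total = n_gen + n_train
    with_group = k_gen + k_train
    tail = sum(comb(with_group, k) * comb(total - with_group, n_gen - k)
               for k in range(k_gen, min(with_group, n_gen) + 1))
    return tail / comb(total, n_gen)


# Hypothetical counts: a group in 30 of 100 generated vs 50 of 1000 training molecules.
print(enrichment_p_value(30, 100, 50, 1000))
```

Repeating the test for every group and every window of the time series gives a trajectory of enriched groups; with many groups, a multiple-testing correction would also be needed.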



Masato Sumita, Kei Terayama, Naoya Suzuki, Shinsuke Ishihara, Ryo Tamura, Mandeep K. Chahal, Daniel T. Payne, Kazuki Yoshizoe, and Koji Tsuda "De novo creation of a naked eye–detectable fluorescent molecule based on quantum chemical computation and machine learning",
Science Advances, 8, abj3906 (2022).
(Open access)

Designing fluorescent molecules requires considering multiple interrelated molecular properties, as opposed to properties that correlate straightforwardly with molecular structure, such as light absorption. In this study, we used a de novo molecule generator (DNMG) coupled with quantum chemical (QC) computation to develop fluorescent molecules, which are garnering significant attention in various disciplines. Using massively parallel computation (1024 cores, 5 days), the DNMG produced 3643 candidate molecules. We selected one unreported molecule and seven reported molecules and synthesized them. Photoluminescence spectrum measurements demonstrated that the DNMG can successfully design fluorescent molecules with 75% accuracy (n = 6/8) and can create an unreported molecule that emits fluorescence detectable by the naked eye.



Xiaolin Sun, Ryo Tamura, Masato Sumita, Kenichi Mori, Kei Terayama, and Koji Tsuda "Integrating Incompatible Assay Data Sets with Deep Preference Learning",
ACS Medicinal Chemistry Letters, 13, 70-75 (2022).
(Open access)

A large amount of bioactivity assay data has already accumulated in public databases, but integrating these data sets for quantitative structure–activity relationship (QSAR) studies is not straightforward because of differences in experimental methods and settings. We present an efficient deep-learning-based approach called Deep Preference Data Integration (DPDI). To integrate outcome variables of different assay types, a surrogate variable is introduced, and a neural network is trained such that the total order induced by the surrogate variable is maximally consistent with the given data sets. In a task of predicting the efficacy of factor Xa inhibitors, DPDI successfully integrated 2959 molecules distributed across 129 assay data sets. In most of our experiments, data integration strongly improved prediction accuracy in interpolation and extrapolation tasks, indicating that DPDI is an effective tool for QSAR studies.
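DPDI trains a network so that the surrogate variable reproduces the orderings found within each assay. A common way to express that objective is a pairwise logistic (RankNet-style) loss; the sketch below assumes that loss and pairs it with a concordance check on the surrogate scores. The exact loss and network used in the paper may differ, and the function names and numbers are illustrative.

```python
import numpy as np


def pairwise_logistic_loss(scores, pairs):
    """Mean of log(1 + exp(-(s_i - s_j))) over pairs (i, j) meaning 'i should rank above j'."""
    i, j = np.asarray(pairs).T
    return float(np.mean(np.logaddexp(0.0, -(scores[i] - scores[j]))))


def concordance(scores, pairs):
    """Fraction of within-assay orderings reproduced by the surrogate scores."""
    i, j = np.asarray(pairs).T
    return float(np.mean(scores[i] > scores[j]))


# Pairs come only from comparisons inside the same assay (two hypothetical assays).
surrogate = np.array([2.5, 0.3, 1.1, 1.9])    # network outputs for four molecules
pairs = [(0, 2), (0, 1), (2, 1), (3, 1)]      # assay A: 0 > 2 > 1; assay B: 3 > 1
print(concordance(surrogate, pairs))          # 1.0
print(pairwise_logistic_loss(surrogate, pairs))
```

Because only within-assay pairs enter the loss, IC50 values from one assay and Ki values from another never have to be converted to a common unit.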



Kei Terayama, Masato Sumita, Michio Katouda, Koji Tsuda, and Yasushi Okuno "Efficient Search for Energetically Favorable Molecular Conformations against Metastable States via Gray-Box Optimization",
J. Chem. Theory Comput., 17, 5419 (2021).
(Open access)

In order to accurately understand and estimate molecular properties, finding energetically favorable molecular conformations is the most fundamental task for atomistic computational research on molecules and materials. Geometry optimization based on quantum chemical calculations has enabled the conformation prediction of arbitrary molecules, including de novo ones. However, performing geometry optimizations for an enormous number of conformers is computationally expensive. In this study, we introduce the gray-box optimization (GBO) framework, which enables optimal control over the entire geometry optimization process across multiple conformers. Algorithms designed for GBO roughly estimate which conformers are energetically preferable during their geometry optimization iterations and then preferentially compute the promising ones. To evaluate the performance of the GBO framework, we applied it to a test set consisting of seven dipeptides and mycophenolic acid to determine their stable conformations at the density functional theory level. We thus preferentially obtained energetically favorable conformations. Furthermore, the computational costs required to find the most stable conformation were significantly reduced (approximately 1% on average, compared to the naive approach for the dipeptides).
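The abstract does not spell out the GBO algorithms, so the following is a hypothetical greedy scheduler that only illustrates the idea: many geometry optimizations are interleaved, and each step is spent on the conformer whose latest energy is lowest. Each trajectory is modeled as an iterator of energies; the toy numbers stand in for DFT energies and are not results from the paper.

```python
import heapq
from typing import Dict, Iterator


def greedy_gray_box(trajectories: Dict[str, Iterator[float]], budget: int) -> Dict[str, float]:
    """Hypothetical scheduler: spend `budget` optimization steps in total, always advancing
    the conformer whose latest energy is lowest. Returns the last energy seen per conformer."""
    latest: Dict[str, float] = {}
    heap = []
    for name, traj in trajectories.items():
        energy = next(traj)  # every conformer gets one initial step
        latest[name] = energy
        heapq.heappush(heap, (energy, name))
    steps = len(trajectories)
    while heap and steps < budget:
        _, name = heapq.heappop(heap)
        try:
            energy = next(trajectories[name])
        except StopIteration:
            continue  # this optimization has converged; drop it from the queue
        steps += 1
        latest[name] = energy
        heapq.heappush(heap, (energy, name))
    return latest


# Toy energy sequences (arbitrary units) standing in for geometry-optimization trajectories.
toy = {
    "conf_a": iter([0.0, -1.0, -2.0, -3.0]),
    "conf_b": iter([1.0, 0.5, 0.4]),
    "conf_c": iter([5.0, 4.0]),
}
print(greedy_gray_box(toy, budget=5))  # {'conf_a': -2.0, 'conf_b': 1.0, 'conf_c': 5.0}
```

A purely greedy rule can starve a conformer that starts high but would end low, so a practical scheduler needs some exploration; this sketch does not address that.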



Yucheng Zhang, Jinzhe Zhang, Kuniko Suzuki, Masato Sumita, Kei Terayama, Jiawen Li, Zetian Mao, Koji Tsuda, Yuji Suzuki "Discovery of polymer electret material via de novo molecule generation and functional group enrichment analysis",
Applied Physics Letters, 118, 224904 (2021).

We designed a high-performance polymer electret material using a deep-learning-based de novo molecule generator. By statistically analyzing the enrichment of functional groups in the generated molecules, the hydroxyl group was determined to be crucial for enhancing the electron gain energy. Incorporating this knowledge, we designed a molecule using cyclic transparent optical polymer (CYTOP; perfluoro-3-butenyl-vinyl ether). The molecule was synthesized, and the surface potential of a 15-µm-thick film remained at −3 kV for more than 800 h. Its performance was significantly better than that of all commercialized CYTOP polymer electrets, indicating great potential for application in vibration-based energy harvesting. Our results demonstrate the application of machine learning to polymer electret design and confirm that combining molecule generation with functional group enrichment analysis is a promising chemical discovery method achieved via human-AI collaboration.



Kei Terayama, Masato Sumita, Ryo Tamura, Koji Tsuda "Black-Box Optimization for Automated Discovery",
Accounts of Chemical Research, 54, 1334-1346 (2021).
(Open access)

In chemistry and materials science, researchers and engineers discover, design, and optimize chemical compounds or materials with their professional knowledge and techniques. At the highest level of abstraction, this process is formulated as black-box optimization. For instance, the trial-and-error process of synthesizing various molecules for better material properties can be regarded as optimizing a black-box function describing the relation between a chemical formula and its properties. Various black-box optimization algorithms have been developed in the machine learning and statistics communities. Recently, a number of researchers have reported successful applications of such algorithms to chemistry. They include the design of photofunctional molecules and medical drugs, optimization of thermal emission materials and high Li-ion conductive solid electrolytes, and discovery of a new phase in inorganic thin films for solar cells. There is a wide variety of algorithms available for black-box optimization, such as Bayesian optimization, reinforcement learning, and active learning. Practitioners need to select an appropriate algorithm or, in some cases, develop novel algorithms to meet their demands. It is also necessary to determine how best to combine machine learning techniques with quantum mechanics- and molecular mechanics-based simulations, and experiments. In this Account, we give an overview of recent studies regarding automated discovery, design, and optimization based on black-box optimization. The Account covers the following algorithms: Bayesian optimization of chemical or physical properties, an optimization method using a quantum annealer, best-arm identification, gray-box optimization, and reinforcement learning. In addition, we introduce active learning and boundless objective-free exploration, which may not fall into the category of black-box optimization.
Data quality and quantity are key to the success of these automated discovery techniques. As laboratory automation and robotics advance, automated discovery algorithms may be able to match human performance, at least in some domains, in the near future.



Jinzhe Zhang, Kei Terayama, Masato Sumita, Kazuki Yoshizoe, Kengo Ito, Jun Kikuchi, Koji Tsuda "NMR-TS: de novo molecule identification from NMR spectra",
Science and Technology of Advanced Materials, 21, 552-561 (2020).
(Open access)

Nuclear magnetic resonance (NMR) spectroscopy is an effective tool for identifying molecules in a sample. Although many previously observed NMR spectra are accumulated in public databases, they cover only a tiny fraction of the chemical space, and molecule identification is typically accomplished manually based on expert knowledge. Herein, we propose NMR-TS, a machine-learning-based Python library, to automatically identify a molecule from its NMR spectrum. NMR-TS discovers candidate molecules whose NMR spectra match the target spectrum by using deep learning and density functional theory (DFT)-computed spectra. As a proof of concept, we identify prototypical metabolites from their computed spectra. After an average of 5451 DFT runs for each spectrum, six of the nine molecules are identified correctly, and proximal molecules are obtained in the other cases. This encouraging result implies that de novo molecule generation can contribute to the fully automated identification of chemical structures. NMR-TS is available at https://github.com/tsudalab/NMR-TS.



Kei Terayama, Masato Sumita, Ryo Tamura, Daniel T. Payne, Mandeep K. Chahal, Shinsuke Ishihara, Koji Tsuda "Pushing property limits in materials discovery via boundless objective-free exploration",
Chemical Science, 11, 5959-5968 (2020).
(Open access)

Materials chemists develop chemical compounds to meet the often conflicting demands of industrial applications. This process may not be properly modeled by black-box optimization because the target property is not well defined in some cases. Herein, we propose a new algorithm for automated materials discovery called BoundLess Objective-free eXploration (BLOX) that uses a novel criterion based on kernel-based Stein discrepancy in the property space. Unlike other objective-free exploration methods, BLOX does not need a boundary for the materials properties; hence, it is suitable for open-ended scientific endeavors. We demonstrate the effectiveness of BLOX by finding light-absorbing molecules from a drug database. Our goal is to minimize the number of density functional theory calculations required to discover out-of-trend compounds in the intensity-wavelength property space. Using absorption spectroscopy, we experimentally verified that eight compounds identified as outstanding exhibit the expected optical properties. Our results show that BLOX is useful for chemical repurposing, and we expect this search method to have numerous applications in various scientific disciplines.



Kenji Homma, Yu Liu, Masato Sumita, Ryo Tamura, Naoki Fushimi, Junichi Iwata, Koji Tsuda, and Chioko Kaneta "Optimization of Heterogeneous Ternary Li3PO4-Li3BO3-Li2SO4 Mixture for Li-ion Conductivity by Machine Learning",
The Journal of Physical Chemistry C, 124, 12865-12870 (2020).

Mixing heterogeneous Li-ion conductive materials is one potential way to enhance Li-ion conductivity beyond that of the parent materials. However, the huge number of possible compositions of parent materials impedes the development of an optimal mixture by conventional methods. In this study, we employed machine learning to optimize the composition ratio of ternary Li3PO4-Li3BO3-Li2SO4 for Li-ion conductivity. We found the optimum composition of the ternary mixture system to be 25:14:61 (Li3PO4:Li3BO3:Li2SO4 in mol%), whose Li-ion conductivity was measured as 4.9 × 10⁻⁴ S/cm at 300 ℃. Our X-ray structure analysis suggested that the Li-ion conductivity of the mixed systems tends to be enhanced by the coexistence of two or more phases. Although the mechanism enhancing Li-ion conductivity is not simple, our results demonstrate the effectiveness of machine learning for the development of materials.



Xiaolin Sun, Zhufeng Hou, Masato Sumita, Shinsuke Ishihara, Ryo Tamura, Koji Tsuda "Data Integration for Accelerated Materials Design via Preference Learning",
New Journal of Physics, 22, 055001(1-7) (2020).
(Open access)

Machine learning applications in materials science are often hampered by a shortage of experimental data. Integration with external datasets from past experiments is a viable way to solve this problem, but complex calibration is often necessary to use data obtained under different conditions. In this paper, we present a novel calibration-free strategy to enhance the performance of Bayesian optimization with preference learning. The entire learning process is based solely on pairwise comparisons of quantities (i.e., higher or lower) within the same dataset, so experimental design can be done without comparing quantities across different datasets. We demonstrate that Bayesian optimization is significantly enhanced via data integration for organic molecules and inorganic solid-state materials. Our method increases the chance that public datasets are reused and may encourage data sharing in various fields of physics.



Masato SUMITA, Ryo TAMURA, Kenji HOMMA, Chioko KANETA, Koji TSUDA, "Li-Ion Conductive Li3PO4-Li3BO3-Li2SO4 Mixture: Prevision through Density Functional Molecular Dynamics and Machine Learning",
Bull. Chem. Soc. Jpn., 92, 1100-1106 (2019).

The development of high Li-ion conductive solid electrolytes is crucial for the practical use of all-solid-state Li-ion batteries. Mixing heterogeneous Li-ion conductive substances is a known method for enhancing the Li-ion conductivity beyond that of the original substances. In this study, using computer simulations, we showed that a ternary Li3PO4-Li3BO3-Li2SO4 system has the potential to exhibit improved Li-ion conductivity through the introduction of pseudo-Li-ion/oxygen vacancies. The Li-ion conductivities of this ternary system were calculated for several model systems using density functional molecular dynamics in the isothermal-isobaric ensemble. However, exploration using density functional molecular dynamics cannot cover the entire combinatorial space owing to limited computational capability. To search this vast combinatorial space, we conducted analyses using a machine learning technique. The results clarify the relationship between Li-ion conductivity and phonon free energy and allow the optimum composition ratio with the highest Li-ion conductivity to be predicted.



Naruki YOSHIKAWA, Kei TERAYAMA, Masato SUMITA, Teruki HOMMA, Kenta OONO, Koji TSUDA, "Population-based De Novo Molecule Generation, Using Grammatical Evolution",
Chem. Lett., 47, 1431-1434 (2018).
(Open access)

Automatic molecule design with machine learning and simulations has shown a remarkable ability to generate new and promising drug candidates. We propose a new population-based approach using grammatical evolution, named ChemGE, that can update a large population of molecules concurrently and evaluate them with multiple simulators in parallel. In computational experiments, ChemGE succeeded in finding hundreds of candidate molecules whose affinity for thymidine kinase is better than that of known binding molecules in a database (DUD-E).
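ChemGE maps an integer genome to a molecule through a context-free grammar, as in grammatical evolution. The sketch below shows that genotype-to-phenotype mapping with a hypothetical toy grammar that builds only linear chains of C, N, and O, far simpler than a SMILES grammar. It assumes one codon is consumed per nonterminal expansion; mutation, selection, and the docking simulations are not shown.

```python
from typing import Dict, List, Optional, Sequence

# Hypothetical toy grammar: a chain is one atom, or an atom followed by a chain.
GRAMMAR: Dict[str, List[List[str]]] = {
    "<chain>": [["<atom>"], ["<atom>", "<chain>"]],
    "<atom>": [["C"], ["N"], ["O"]],
}


def decode(genome: Sequence[int], grammar=GRAMMAR, start: str = "<chain>",
           max_wraps: int = 2) -> Optional[str]:
    """Map an integer genome to a string by leftmost derivation (grammatical evolution).

    Each nonterminal expansion consumes one codon; the codon modulo the number of
    productions picks the rule. The genome may be reread (wrapped) up to max_wraps
    times; None is returned if the derivation still has not finished.
    """
    symbols = [start]
    output: List[str] = []
    used = 0
    limit = len(genome) * (max_wraps + 1)
    while symbols:
        sym = symbols.pop(0)
        if sym not in grammar:          # terminal: emit it
            output.append(sym)
            continue
        if used >= limit:               # ran out of codons
            return None
        rules = grammar[sym]
        rule = rules[genome[used % len(genome)] % len(rules)]
        used += 1
        symbols = list(rule) + symbols  # replace the leftmost nonterminal
    return "".join(output)


print(decode([1, 0, 0, 2]))  # CO
print(decode([1, 1]))        # None: the chain never terminates within the wrap limit
```

Because any integer list decodes to a valid string (or to None), a population can be mutated freely at the genome level, which is what makes large concurrent populations cheap to update.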


Masato SUMITA, Xiufeng YANG, Shinsuke ISHIHARA, Ryo TAMURA, Koji TSUDA, "Hunting for Organic Molecules with Artificial Intelligence: Molecules Optimized for Desired Excitation Energies",
ACS Cent. Sci., 4, 1126-1133 (2018).
(Open access)

This work presents a proof-of-concept study in artificial-intelligence-assisted (AI-assisted) chemistry, where a machine-learning-based molecule generator is coupled with density functional theory (DFT) calculations, synthesis, and measurement. Although deep-learning-based molecule generators have shown promise, it is unclear to what extent they can be useful in real-world materials development. To assess the reliability of AI-assisted chemistry, we prepared a platform using a molecule generator and a DFT simulator and attempted to generate novel photofunctional molecules whose lowest excited states lie at desired energy levels. A 10-day run on a 12-core server discovered 86 potential photofunctional molecules around target lowest excitation levels of 200, 300, 400, 500, and 600 nm. Among the molecules discovered, six were synthesized, and five were confirmed to reproduce the DFT predictions in ultraviolet-visible absorption measurements. This result shows the potential of AI-assisted chemistry to discover ready-to-synthesize novel molecules with modest computational resources.


Masato Sumita, Ph.D.