Curation of datasets, assessment of their quality and completeness, and nanoSAR classification model development for metallic nanoparticles

Tung X. Trinh; My Kieu Ha; Jang Sik Choi; Hyung Gi Byun; Tae Hyun Yoon

doi:10.1039/C8EN00061A

Tung X. Trinh,^a My Kieu Ha,^a Jang Sik Choi,^b Hyung Gi Byun^b and Tae Hyun Yoon*^a

Author affiliations

* Corresponding authors

^a Department of Chemistry, College of Natural Sciences, Hanyang University, Seoul 04763, Republic of Korea
E-mail: taeyoon@hanyang.ac.kr

^b Division of Electronics, Information and Communication Engineering, Kangwon National University (Samcheok), Kangwon-do 25913, Republic of Korea

Abstract

Applications of machine learning techniques for the prediction of nanotoxicity are expected to reduce the time and cost of nanosafety assessments. However, due to the rapid increases in the quantity and heterogeneity of literature data on nanomaterials, efficient screening of data based on their quality and completeness is becoming more important for the development of reliable nanostructure–activity relationship (nanoSAR) models. Herein, we have curated a nanosafety dataset of metallic nanoparticles (NPs), with 2005 rows and 31 columns, extracted by literature data mining of 63 published articles and gap-filled by adapting data from manufacturer specifications or references on the same nanomaterials. By using PChem scores based on physicochemical data quality and completeness, five datasets with different qualities and degrees of completeness were generated and used for the development of toxicity classification models of metallic NPs. Comparisons of these models, built with support vector machine and random forest algorithms, confirmed that the datasets with higher quality and completeness (i.e., higher PChem scores) produced better-performing nanoSAR models than those with lower PChem scores. Further analysis of relative attribute importance showed that two physicochemical properties (core size and surface charge) and two experimental conditions of the toxicity assays (dose and cell line) are the four most important attributes for the toxicity of metallic NPs.
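The idea of generating several datasets from one curated table by screening on a quality score can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration only: the field names (`pchem_score`, `core_size_nm`, `toxic`), the integer score scale, the use of minimum-score thresholds to define the subsets, and the toy rows themselves, none of which come from the article or its dataset.

```python
from typing import Dict, List

# Hypothetical toy rows: field names and values are invented for illustration
# and are NOT taken from the curated nanosafety dataset.
ROWS: List[dict] = [
    {"material": "ZnO", "core_size_nm": 20.0, "pchem_score": 1, "toxic": True},
    {"material": "Ag", "core_size_nm": 10.0, "pchem_score": 2, "toxic": True},
    {"material": "TiO2", "core_size_nm": 25.0, "pchem_score": 3, "toxic": False},
    {"material": "Au", "core_size_nm": 15.0, "pchem_score": 3, "toxic": False},
    {"material": "CuO", "core_size_nm": 40.0, "pchem_score": 5, "toxic": True},
    {"material": "Fe3O4", "core_size_nm": 12.0, "pchem_score": 4, "toxic": False},
]


def subsets_by_min_score(rows: List[dict], thresholds: List[int]) -> Dict[int, List[dict]]:
    """Map each threshold to the rows whose pchem_score is at least that threshold."""
    return {t: [r for r in rows if r["pchem_score"] >= t] for t in thresholds}


# Assumed screening rule: one subset per minimum score from 1 to 5.
subsets = subsets_by_min_score(ROWS, thresholds=[1, 2, 3, 4, 5])

for threshold, subset in subsets.items():
    print(f"pchem_score >= {threshold}: {len(subset)} rows")
```

Under this assumed rule, a stricter threshold keeps fewer but better-characterized rows. Each subset could then be passed to a classifier of choice to compare model performance across quality levels.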

Article information

DOI: https://doi.org/10.1039/C8EN00061A
Article type: Paper
Submitted: 14 Jan 2018
Accepted: 10 Jun 2018
First published: 11 Jun 2018

Environ. Sci.: Nano, 2018, 5, 1902-1910

T. X. Trinh, M. K. Ha, J. S. Choi, H. G. Byun and T. H. Yoon, Environ. Sci.: Nano, 2018, 5, 1902 DOI: 10.1039/C8EN00061A

Environmental Science: Nano
