News and Views from a Mass Spectrometry Lab in Irapuato, Mexico

Use artificial intelligence!

Search engines, shopping platforms and social media networks make great use of algorithms for identifying patterns in massive data sets, and help us to find relevant information, products and friends. In stark contrast, many scientists 'do not trust' artificial intelligence. Of course, hypothesis-driven research, according to Popper's Scientific Method (Popper, Karl R. 1959. The Logic of Scientific Discovery. Oxford, UK: Basic Books.), is still the gold standard. But how should we deal with data from exploratory projects such as genomics, proteomics, metabolomics, etc.?
Data mining methods combine general statistics with machine learning and help us detect important variables and associations. We can also build predictive models for classification or quantification. For mass spectrometry data sets, we found the Random Forest algorithm particularly useful, since it performs well with noisy data and relatively few samples. So the next time you analyse a few thousand variables from a dozen samples, try Rattle, a free data mining package available from https://rattle.togaware.com/. This R package provides various algorithms such as Decision Tree, Random Forest, AdaBoost, Support Vector Machine and Neural Networks. The graphical user interface of Rattle is user-friendly and suitable for beginners (BTW: Graham Williams, the author of Rattle, works at the Australian Taxation Office).
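To give a feeling for why a Random Forest copes with "few samples, many variables", here is a minimal Python sketch using only the standard library. It is not Rattle's implementation and not our analysis code: the data are simulated, the trees are reduced to one-split "stumps", and the function names (make_toy_data, fit_stump, fit_forest, selection_counts, predict) are made up for this illustration. Counting how often a variable is selected is only a crude stand-in for a real variable-importance measure.

```python
import random
from collections import Counter


def make_toy_data(rng, n_per_class=6, n_features=200, informative=(0, 1, 2)):
    """Simulate a 'wide' data set: a dozen samples, many noisy variables."""
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
            for j in informative:
                row[j] += 6.0 * label  # only these variables differ between classes
            X.append(row)
            y.append(label)
    return X, y


def fit_stump(X, y, features):
    """Return (feature, threshold, left_label) with the fewest training errors."""
    majority = Counter(y).most_common(1)[0][0]
    # Fallback stump: predict the majority class for every sample.
    best = (sum(label != majority for label in y), features[0], float("inf"), majority)
    for j in features:
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0  # candidate split between neighbouring values
            for left_label in (0, 1):
                errors = sum(
                    (left_label if row[j] <= t else 1 - left_label) != label
                    for row, label in zip(X, y)
                )
                if errors < best[0]:
                    best = (errors, j, t, left_label)
    return best[1:]


def fit_forest(X, y, n_trees=1000, rng=None):
    """Fit stumps on bootstrap samples, each seeing a random subset of variables."""
    rng = rng or random.Random(0)
    n, p = len(X), len(X[0])
    m = max(1, int(p ** 0.5))  # number of variables tried per stump
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        feats = rng.sample(range(p), m)
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest


def selection_counts(forest):
    """Count how often each variable was chosen as the split variable."""
    return Counter(j for j, _, _ in forest)


def predict(forest, row):
    """Majority vote over all stumps."""
    votes = Counter((left if row[j] <= t else 1 - left) for j, t, left in forest)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    rng = random.Random(42)
    X, y = make_toy_data(rng)
    forest = fit_forest(X, y, rng=rng)
    print("Most frequently selected variables:", selection_counts(forest).most_common(5))
    new_sample = [0.0] * len(X[0])
    new_sample[0] = new_sample[1] = new_sample[2] = 6.0
    print("Predicted class for a new sample:", predict(forest, new_sample))
```

Each single stump is a weak and noisy classifier, but the vote over many of them is stable, and the variables that really differ between the groups are selected far more often than the noise. For real projects you would of course use a tested implementation such as the one accessible through Rattle.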
Soon (~March 2020) we will publish our RSC book "Processing Metabolomics and Proteomics Data with Open Software: A Practical Guide" (http://pubs.rsc.org/bookshop/collections/series?issn=2045-7545). The co-authors Miguel Reboiro-Jato, Daniel Glez-Peña and Hugo López-Fernández contributed a chapter about "Statistics, Data Mining and Modeling", demonstrating with code examples various advanced strategies for data processing, such as self-organising maps (artificial neural networks), biomarker discovery and predictive machine learning models.
Thus, enter the next level of Omics data analysis and use artificial intelligence!

1 Comment

  • Jens Riedel  
    So true, I couldn't agree more! I especially like Robert's emphasis on "openness". Always keep in mind that science was the first sharing economy. Its mere existence is the logical consequence of the desire to share insights with others and to let them build on the foundation of others. Imagine where AI could get us without proprietary file formats and library access costs...
