PyTorch Scikit-learn .cursorrules Prompt File
About .cursorrules prompt file
What you can build
Chemistry-focused ML Toolkit: Develop a comprehensive software toolkit that provides easy-to-use APIs for building, training, and evaluating machine learning models for chemistry applications. It should integrate with scikit-learn and PyTorch and support chemical representations such as SMILES strings and molecular fingerprints.
Automated Hyperparameter Optimization Platform: Create a service that automates hyperparameter tuning for chemistry-related machine learning models with Bayesian optimization or grid search, so that researchers can achieve better performance with less manual effort.
Drug Discovery Platform: Design an end-to-end solution for drug discovery that leverages deep learning models like graph neural networks. Include modules for data preprocessing, scaffold splits for cross-validation, and model interpretability using tools like SHAP.
Chemical Data Augmentation Service: Offer a tool that applies pre-defined and custom augmentation strategies specifically for chemical structures to help models generalize better, especially with limited datasets.
Interactive Chemistry Model Interpretation Tool: Develop a web application that allows users to visualize and interpret machine learning predictions on chemical datasets, using SHAP values and integrated gradients.
Performance Optimization Suite for Chemistry ML: Provide a tool for optimizing machine learning pipelines that process chemical data, with features such as profiling, efficient use of data structures, and GPU acceleration.
Machine Learning Reproducibility Platform: Deploy a cloud-based system integrated with tools like MLflow or Weights & Biases for tracking experiments and ensuring reproducibility in chemistry-related machine learning research.
Unit Testing Framework for Chemistry ML Pipelines: Build a dedicated testing framework that includes pre-defined test cases for chemical data processing functions and custom model components, ensuring robustness and reliability.
Visual Chemistry Property Prediction Tool: Create an application that uses trained models to predict molecular properties and visualize the results, including drawing chemical structures with RDKit's drawing utilities.
AI-based Chemical Structure Converter: Develop a service that uses machine learning to convert between chemical representations, for example from SMILES strings to molecular graphs, with high accuracy.
QSAR Model Validation Suite: Offer a dedicated platform for validating QSAR models with protocols such as time-split validation, along with automated statistical hypothesis tests for model evaluation.
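Time-split validation, named above for the QSAR validation suite, trains on older compounds and tests on newer ones, which approximates how a model is used prospectively. The sketch below assumes each compound carries a date (for example, its registration or assay date). The function name `time_split` and the rule that all compounds sharing the cutoff date go to the test set are illustrative choices, not part of any particular library.

```python
from datetime import date
from typing import List, Sequence, Tuple


def time_split(
    dates: Sequence[date], test_fraction: float = 0.2
) -> Tuple[List[int], List[int]]:
    """Hypothetical helper: every test compound is dated on or after every training one."""
    if len(dates) < 2:
        raise ValueError("need at least two compounds to split")
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    # Indices ordered from oldest to newest compound.
    order = sorted(range(len(dates)), key=lambda i: dates[i])
    # Aim for roughly test_fraction of the data in the test set, keeping at least one on each side.
    n_test = min(len(dates) - 1, max(1, round(len(dates) * test_fraction)))
    # Everything dated on or after the cutoff counts as "future" data; tied dates stay together.
    cutoff = dates[order[len(dates) - n_test]]
    train_idx = [i for i in order if dates[i] < cutoff]
    test_idx = [i for i in order if dates[i] >= cutoff]
    if not train_idx:
        raise ValueError("all compounds fall on or after the cutoff date")
    return train_idx, test_idx


dates = [date(2020, 1, 1), date(2021, 1, 1), date(2019, 1, 1), date(2022, 1, 1), date(2023, 1, 1)]
train_idx, test_idx = time_split(dates, test_fraction=0.4)
# train_idx == [2, 0, 1], test_idx == [3, 4]
```

Because compounds tied on the cutoff date are kept together, the test set can come out slightly larger than `test_fraction` requests; this trades exact proportions for a guarantee that no date appears on both sides.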
Benefits
- Streamlined use of scikit-learn for traditional ML and PyTorch for GPU-accelerated deep learning on chemical data.
- Emphasis on chemistry-specific model evaluation, using scaffold splits and metrics such as the enrichment factor.
- Built-in reproducibility practices: version control with Git, experiment tracking with MLflow, and fixed random seeds.
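The enrichment factor mentioned above measures how much more often actives appear near the top of a model's ranking than they would under random selection: EF = (actives in the top n / n) / (total actives A / total compounds N). A minimal sketch, assuming binary activity labels (1 = active) and scores where higher means "more likely active". The function name is hypothetical, and sizing the top slice with `round` is one of several conventions in use (some tools use `ceil` or truncation instead).

```python
from typing import Sequence


def enrichment_factor(
    scores: Sequence[float], labels: Sequence[int], fraction: float = 0.01
) -> float:
    """Hypothetical helper: EF at `fraction` = (hits in top n / n) / (A / N)."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    n_total = len(labels)
    n_actives = sum(labels)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need at least one compound and one active")
    # Size of the top-ranked slice; `round` keeps 0.3 * 10 (= 3.0000000000000004) at 3.
    n_top = max(1, round(fraction * n_total))
    # Rank compounds by predicted score, most likely active first (ties keep input order).
    ranked = sorted(range(n_total), key=lambda i: scores[i], reverse=True)
    hits = sum(labels[i] for i in ranked[:n_top])
    return (hits / n_top) / (n_actives / n_total)


scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0]
labels = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
ef = enrichment_factor(scores, labels, fraction=0.2)  # 1 active in the top 2: (1/2) / (2/10) = 2.5
```

An EF of 1.0 corresponds to random ranking; when the top slice is no larger than the number of actives, the ceiling is N / A.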
Synopsis
Chemistry-focused data scientists can build robust, scalable machine learning models for chemical analysis, using scikit-learn and PyTorch together with efficient pipelines for data handling, processing, and evaluation.
Overview of .cursorrules prompt
The .cursorrules file provides detailed guidelines for developing machine learning models for chemistry applications in Python. It outlines key principles, including writing clear, technical responses with examples, keeping code readable, and implementing efficient data processing pipelines. It specifies scikit-learn for traditional ML algorithms and PyTorch for deep learning, with libraries such as RDKit and OpenBabel for handling chemical data. The file explains model development strategies tailored to chemical data, such as hyperparameter tuning, ensemble methods, and cross-validation. For deep learning with PyTorch, it emphasizes neural network design and performance optimization. It also covers model evaluation, interpretability, and reproducibility, along with guidelines for project structure, testing, and documentation, and it lists dependencies and coding conventions for style, variable naming, and comments. Finally, it includes notes on integrating ML models with a Flask backend that serves a frontend, and on the potential use of asynchronous processing for long-running tasks.
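Cross-validation tailored to chemical data often means scaffold splitting: molecules that share a core scaffold are kept in the same fold, so test scores reflect generalization to new chemotypes rather than to close analogs. The sketch below assumes scaffold keys are already computed (for example, Bemis-Murcko scaffold SMILES from RDKit's `MurckoScaffold.MurckoScaffoldSmiles`). `scaffold_kfold` is a hypothetical helper, and its greedy "largest group into the smallest fold" balancing is one reasonable assignment strategy among several.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Sequence, Tuple


def scaffold_kfold(
    scaffolds: Sequence[str], n_folds: int = 5
) -> Iterator[Tuple[List[int], List[int]]]:
    """Hypothetical helper: yield (train_idx, test_idx) with no scaffold on both sides."""
    # Group molecule indices by scaffold key (assumed precomputed, one key per molecule).
    groups: Dict[str, List[int]] = defaultdict(list)
    for idx, scaffold in enumerate(scaffolds):
        groups[scaffold].append(idx)
    if n_folds < 2 or n_folds > len(groups):
        raise ValueError("n_folds must be between 2 and the number of distinct scaffolds")
    # Greedy balancing: largest scaffold groups first, each into the currently smallest fold.
    folds: List[List[int]] = [[] for _ in range(n_folds)]
    for members in sorted(groups.values(), key=len, reverse=True):
        smallest = min(range(n_folds), key=lambda f: len(folds[f]))
        folds[smallest].extend(members)
    # Each fold serves once as the test set; the remaining folds form the training set.
    for k in range(n_folds):
        test_idx = sorted(folds[k])
        train_idx = sorted(i for f in range(n_folds) if f != k for i in folds[f])
        yield train_idx, test_idx


# Illustrative keys; the empty string stands in for acyclic molecules (assumed convention).
scaffolds = ["c1ccccc1", "c1ccccc1", "c1ccccc1", "C1CCNCC1", "C1CCNCC1", "c1ccncc1", ""]
splits = list(scaffold_kfold(scaffolds, n_folds=2))
# splits[0] == ([3, 4, 5], [0, 1, 2, 6])
```

In scikit-learn, passing the same scaffold keys as `groups` to `GroupKFold` gives the same guarantee that no scaffold appears in both the training and test folds.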