pandas numpy Jupyter .cursorrules prompt file
About .cursorrules prompt file
What you can build
Jupyter Data Analysis Assistant: A tool that recommends data analysis workflows inside Jupyter Notebooks. Drawing on pandas and numpy best practices, it could suggest suitable data transformation and visualization strategies for the dataset at hand.
Medical Imaging Visualization Platform: A web application for visualizing and analyzing medical imaging data using matplotlib, nibabel, and SimpleITK. Users could upload medical images, view them, and run advanced image processing tasks through a straightforward user interface.
Automated Data Cleaning and Validation: This app would automate the process of data quality checks and cleaning at the early stages of data analysis, using pandas and numpy to handle missing data and validate data types. It would implement best practices for error handling and logging, making the initial stages of data assessment more efficient.
Image Processing Workflow Optimizer: A service that profiles and optimizes image processing pipelines, particularly for large medical imaging datasets. Utilizing libraries like SimpleITK, totalsegmentator, and numpy, it would enhance performance by identifying bottlenecks and recommending vectorized operations and efficient data structures.
Interactive Data Exploration Tool: An interactive GUI-based application for performing exploratory data analysis with pandas and seaborn. This tool would prioritize readability and reproducibility, allowing users to interactively group, aggregate, and visualize data while following recommended practices.
Reproducible Data Science Notebook Templates: A repository containing well-structured Jupyter Notebook templates aimed at ensuring reproducibility and clarity in data analysis workflows. These templates would be pre-configured with markdown sections, inline plotting with matplotlib, and common data processing tasks using pandas and numpy.
Color-Blind Accessible Plot Generator: An application that aids users in creating color-blind friendly plots and visualizations using matplotlib and seaborn. It would provide options to select appropriate color schemes and adjust existing plots to meet accessibility standards, ensuring inclusive data communication.
Progress Tracking for Data Processing: A lightweight library integrated with tqdm.notebook, designed to provide real-time progress tracking for data manipulation and image processing tasks in Jupyter Notebooks. This would enhance the user experience by giving visibility into long-running operations.
Machine Learning and Data Analysis Integration Hub: A platform that combines machine learning capabilities with data analysis workflows, integrating with scikit-learn for model training and validation on pre-processed data, guided by pandas and numpy best practices.
Advanced Data Imputation and Handling Service: A specialized tool for handling missing data. It would guide users toward a suitable imputation strategy, using insights from exploratory analysis and recommending approaches based on each column's data type and distribution.
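As a rough illustration of the data cleaning and imputation ideas in the list above, the following is a minimal sketch, not part of the .cursorrules file itself. The helper names `summarize_missing` and `impute_simple`, the toy data, and the median/mode strategy are all assumptions chosen for brevity; a real tool would likely pick a strategy per column after exploratory analysis.

```python
import numpy as np
import pandas as pd


def summarize_missing(df: pd.DataFrame) -> pd.DataFrame:
    """Report dtype plus missing-value count and fraction for each column."""
    return pd.DataFrame(
        {
            "dtype": df.dtypes.astype(str),
            "n_missing": df.isna().sum(),
            "frac_missing": df.isna().mean(),
        }
    )


def impute_simple(df: pd.DataFrame) -> pd.DataFrame:
    """Fill numeric columns with the median and other columns with the mode.

    Hypothetical helper: this is one simple strategy, not a recommendation
    for every dataset.
    """
    out = df.copy()  # leave the caller's DataFrame untouched
    for col in out.columns:
        if not out[col].isna().any():
            continue
        if pd.api.types.is_numeric_dtype(out[col]):
            out[col] = out[col].fillna(out[col].median())
        else:
            modes = out[col].mode(dropna=True)
            if not modes.empty:
                out[col] = out[col].fillna(modes.iloc[0])
    return out


# Toy data invented for illustration only.
raw = pd.DataFrame(
    {
        "age": [34.0, np.nan, 51.0, 29.0],
        "site": ["A", "B", None, "B"],
    }
)
report = summarize_missing(raw)
clean = impute_simple(raw)
```

Running `summarize_missing` before imputing keeps a record of how much data was filled in, which supports the reproducibility goals described here.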
Benefits
- Emphasis on reproducibility and readability using concise Python examples and clear documentation in Jupyter notebooks.
- Strong focus on medical imaging data processing with specialized libraries like nibabel, SimpleITK, and totalsegmentator.
- Integration of visualization best practices to create accessible and informative plots that accommodate statistical and medical imaging data.
Synopsis
Data scientists and researchers can leverage this prompt to build comprehensive, efficient, and reproducible data analysis and visualization workflows in Python-focused Jupyter Notebooks.
Overview of .cursorrules prompt
The .cursorrules file provides guidelines for conducting data analysis, visualization, and Jupyter Notebook development using a variety of Python libraries. It emphasizes concise, technical responses with accurate Python examples and promotes best practices such as readability, reproducibility, and adherence to PEP 8 code style. The file outlines the usage of libraries like pandas, numpy, and SimpleITK for data manipulation and medical imaging data handling, and recommends leveraging matplotlib and seaborn for visualizations. It also advises on Jupyter Notebook structuring for clarity and reproducibility, error handling, data validation, and performance optimization strategies. Additionally, it lists the necessary dependencies and encourages documenting data sources and methodologies while using version control for managing scripts and notebooks.
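The practices named in this overview (data validation, error handling, reproducibility, and vectorized operations) might look like the sketch below in a notebook cell. The function names `validate_numeric`, `zscore_vectorized`, and `zscore_loop`, the column name `signal`, and the seed value are hypothetical and chosen only for illustration; they do not come from the .cursorrules file.

```python
import numpy as np
import pandas as pd


def validate_numeric(df: pd.DataFrame, columns: list[str]) -> None:
    """Raise ValueError if a required column is absent or not numeric."""
    for name in columns:
        if name not in df.columns:
            raise ValueError(f"missing required column: {name!r}")
        if not pd.api.types.is_numeric_dtype(df[name]):
            raise ValueError(f"column {name!r} is not numeric: {df[name].dtype}")


def zscore_vectorized(s: pd.Series) -> pd.Series:
    """Standardize a Series with whole-array arithmetic (no Python loop)."""
    return (s - s.mean()) / s.std(ddof=0)


def zscore_loop(s: pd.Series) -> pd.Series:
    """Same computation element by element, shown only for comparison."""
    mean = s.mean()
    std = s.std(ddof=0)
    return pd.Series([(v - mean) / std for v in s], index=s.index)


# A fixed seed makes the synthetic data identical on every run.
rng = np.random.default_rng(seed=0)
df = pd.DataFrame({"signal": rng.normal(size=1_000)})

validate_numeric(df, ["signal"])  # fail early, before any analysis
z_fast = zscore_vectorized(df["signal"])
z_slow = zscore_loop(df["signal"])
```

Both functions return the same values; the vectorized form is typically much faster on large arrays because the arithmetic runs in compiled numpy code rather than in the Python interpreter.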