CSS Custom Cursor .cursorrules prompt file
About .cursorrules prompt file
What you can build
Web Scraper Template Generator: Develop a tool that generates Python web scraping templates based on user input. It should allow users to specify the target website, data to extract, and the desired format, then generate a modular script using requests, BeautifulSoup, and aiohttp with best practices in error handling and performance optimization.
Async Scraping Scheduler: Create an application that schedules asynchronous web scraping tasks using aiohttp to handle data extraction from multiple websites concurrently. It should allow users to configure scraping frequency, retries with exponential backoff, and respect for rate limits (a retry-with-backoff sketch appears after this list).
Error-Resilient Scraping Framework: Build a Python framework that incorporates robust error handling and logging for web scraping. The framework should automatically manage invalid URLs, timeouts, and missing data elements, using a logging system to track issues and suggest fixes (see the early-return error-handling sketch after this list).
Scraping Dashboard with Monitoring: Develop a web application that provides a live dashboard for monitoring the status and performance of web scraping operations. It should visualize metrics like request rates, errors, and data volumes, and allow users to manage and configure individual scraping tasks.
Personalized Web Scraping Service: Offer a subscription-based service where users define specific scraping tasks; the service performs these tasks asynchronously, caches results to minimize redundant data requests, and delivers the data in a user-friendly format.
Modular Data Extraction Tool: Create a Python package providing a suite of utility functions for modular data extraction, such as functions for extracting links, text, and images. It should integrate seamlessly with BeautifulSoup and lxml for efficient data parsing (see the extraction-helper sketch after this list).
Web Scraping Learning Platform: Construct an educational platform that teaches users how to write modular, efficient web scraping scripts using Python. The platform should provide interactive tutorials covering libraries like requests, BeautifulSoup, aiohttp, and lxml, with real-world examples and exercises.
Dynamic Content Scraper: Develop a tool that scrapes dynamically loaded content with asynchronous requests, using aiohttp to call the JSON/XHR endpoints from which JavaScript-heavy websites load their data. The tool should include a feature for reproducing the user interactions needed to trigger that content.
Automated Rate Limiter for Web Scraping: Create a utility that automatically respects websites' robots.txt files and implements rate limits and request-throttling strategies to avoid being banned while scraping (see the robots.txt-aware throttling sketch after this list).
Cached Web Scraping API: Offer an API service that performs web scraping tasks with built-in caching to provide fast response times for repeated requests. The API should allow users to specify the URL and the data they wish to extract, returning the cleaned and parsed data (a simple caching sketch appears after this list).
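For the async scheduler idea, a minimal sketch of the retry-with-backoff and concurrency-capping pattern using aiohttp might look like the following; the function names, timeout, retry count, and concurrency limit are illustrative assumptions rather than anything prescribed by the prompt file.

```python
import asyncio
import aiohttp

MAX_CONCURRENT_REQUESTS = 5  # illustrative cap; tune per target site
MAX_RETRIES = 3

async def fetch_with_backoff(session, url, semaphore, max_retries=MAX_RETRIES):
    """Fetch a URL, retrying with exponential backoff on failure."""
    for attempt in range(max_retries):
        try:
            async with semaphore:  # limit how many requests run at once
                async with session.get(url, timeout=aiohttp.ClientTimeout(total=10)) as response:
                    response.raise_for_status()
                    return await response.text()
        except (aiohttp.ClientError, asyncio.TimeoutError):
            if attempt == max_retries - 1:
                return None  # give up after the final attempt
            await asyncio.sleep(2 ** attempt)  # exponential backoff: 1s, 2s, ...

async def scrape_all(urls):
    """Scrape many URLs concurrently while respecting the concurrency cap."""
    semaphore = asyncio.Semaphore(MAX_CONCURRENT_REQUESTS)
    async with aiohttp.ClientSession() as session:
        tasks = [fetch_with_backoff(session, url, semaphore) for url in urls]
        return await asyncio.gather(*tasks)

if __name__ == "__main__":
    pages = asyncio.run(scrape_all(["https://example.com"]))
```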
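For the error-resilient framework, the early-return-plus-logging style could be sketched as follows, assuming requests and the standard logging module; the names and thresholds are hypothetical.

```python
import logging
import requests
from urllib.parse import urlparse

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("scraper")

def fetch_page(url, timeout_seconds=10):
    """Return the page HTML, or None on any failure, using early returns."""
    parsed_url = urlparse(url)
    if parsed_url.scheme not in ("http", "https"):
        logger.error("Invalid URL skipped: %s", url)
        return None  # early return: malformed URL
    try:
        response = requests.get(url, timeout=timeout_seconds)
    except requests.Timeout:
        logger.warning("Timeout fetching %s", url)
        return None  # early return: request timed out
    except requests.RequestException as exc:
        logger.warning("Request failed for %s: %s", url, exc)
        return None  # early return: connection or protocol error
    if response.status_code != 200:
        logger.warning("Unexpected status %s for %s", response.status_code, url)
        return None  # early return: non-200 response
    return response.text
```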
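For the modular extraction tool, a few pure helper functions built on BeautifulSoup (with the lxml parser installed for speed) might look like this sketch; the function names are illustrative.

```python
from bs4 import BeautifulSoup

def parse_html(html):
    """Parse an HTML string with the lxml backend (requires the lxml package)."""
    return BeautifulSoup(html, "lxml")

def extract_links(soup):
    """Return every href found in anchor tags."""
    return [anchor["href"] for anchor in soup.find_all("a", href=True)]

def extract_text(soup):
    """Return the visible text, collapsed to single spaces."""
    return " ".join(soup.get_text(separator=" ").split())

def extract_image_sources(soup):
    """Return the src attribute of every <img> tag."""
    return [image["src"] for image in soup.find_all("img", src=True)]
```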
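For the rate-limiter utility, one possible sketch combines the standard library's urllib.robotparser with a simple time-based throttle; the user-agent string and minimum delay below are placeholder values.

```python
import time
from urllib import robotparser
from urllib.parse import urlparse, urljoin

import requests

USER_AGENT = "MyScraperBot"          # placeholder bot name
MIN_SECONDS_BETWEEN_REQUESTS = 1.0   # placeholder politeness delay

_robot_parsers = {}    # cache one parser per site origin
_last_request_time = 0.0

def is_allowed(url, user_agent=USER_AGENT):
    """Check the site's robots.txt before fetching a URL."""
    parts = urlparse(url)
    origin = f"{parts.scheme}://{parts.netloc}"
    parser = _robot_parsers.get(origin)
    if parser is None:
        parser = robotparser.RobotFileParser(urljoin(origin, "/robots.txt"))
        parser.read()
        _robot_parsers[origin] = parser
    return parser.can_fetch(user_agent, url)

def polite_get(url):
    """Fetch a URL only if robots.txt allows it, throttling request frequency."""
    global _last_request_time
    if not is_allowed(url):
        return None  # disallowed by robots.txt
    elapsed = time.monotonic() - _last_request_time
    if elapsed < MIN_SECONDS_BETWEEN_REQUESTS:
        time.sleep(MIN_SECONDS_BETWEEN_REQUESTS - elapsed)
    _last_request_time = time.monotonic()
    return requests.get(url, headers={"User-Agent": USER_AGENT}, timeout=10)
```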
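For the cached scraping API, an in-memory cache keyed by URL is the simplest starting point, sketched here with functools.lru_cache; a production service would typically add expiry and persistent storage.

```python
import functools
import requests

@functools.lru_cache(maxsize=256)
def fetch_cached(url):
    """Fetch a URL once and serve repeat requests for the same URL from memory."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.text
```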
Benefits
- Emphasizes modular, functional programming with pure functions rather than classes for scraping logic, which enhances code reusability and readability.
- Encourages error handling through early returns and logging, improving robustness by managing common errors like invalid URLs and timeouts.
- Advocates for performance optimizations using asynchronous libraries and caching strategies, ensuring high efficiency and scalability in web scraping tasks.
Synopsis
Developers building Python web scrapers can use this prompt to create efficient, scalable, and maintainable scraping scripts that adhere to best practices and modular design principles.
Overview of .cursorrules prompt
The .cursorrules file provides guidelines and best practices for Python web scraping, emphasizing modular, concise, and efficient code design. It covers key principles such as using descriptive variable names, organizing scripts, and favoring functional programming. It suggests using specific libraries like requests for HTTP requests and BeautifulSoup for HTML parsing. The file also highlights error handling strategies, dependencies, and performance optimization techniques, including the use of asynchronous libraries like aiohttp for improved performance. Additionally, it sets out key conventions for creating scalable and maintainable web scraping scripts.