The most important step of any data-driven project is obtaining quality data. Without careful preprocessing, the results of a project can easily be biased or misinterpreted. Here, we will focus on cleaning data that consists of scraped web pages. There are many tools for scraping the web.
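To make "cleaning" concrete, here is a minimal sketch of one common first step: reducing a scraped page to its visible text. It uses only Python's standard-library `html.parser`. The names `TextExtractor` and `clean_html` are hypothetical helpers invented for this illustration, not taken from the linked article, and a real pipeline would likely need more (boilerplate removal, encoding fixes, deduplication).

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: collect visible text, skipping <script>/<style>."""

    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into characters.
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped element.
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Join chunks with a space, then collapse runs of whitespace.
        # Assumption: a space between chunks is acceptable, even though it
        # can split a word that is broken up by inline tags.
        return re.sub(r"\s+", " ", " ".join(self._chunks)).strip()


def clean_html(raw_html):
    """Return the visible text of an HTML string as a single line."""
    parser = TextExtractor()
    parser.feed(raw_html)
    parser.close()
    return parser.text()


sample = (
    "<html><head><style>p{color:red}</style></head>"
    "<body><p>Hello &amp; <b>world</b></p>"
    "<script>var x=1;</script></body></html>"
)
print(clean_html(sample))
```

Joining chunks with a space keeps text from adjacent block elements apart, at the cost of occasionally splitting a word wrapped in inline markup. Which trade-off is right depends on the pages being scraped.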
Link: HTML Data Cleaning in Python for NLP