An Automated Job scrapes layered web pages in two steps, using a List Recipe and a Detail Recipe. Watch the tutorial video or read the step-by-step written instructions below.
The List Recipe extracts detail page URLs from a search results page or list page. You then upload these URLs to Data Miner. Once the Job is created, Data Miner automatically opens the URLs one by one in your browser, applies the Detail Recipe, and scrapes the detail page data.
The scraped data is accumulated into a single CSV file and saved to your Data Collections folder.
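To make the two-step idea concrete, here is a minimal Python sketch of the same pattern. It is not how Data Miner works internally; it only illustrates the flow. Assumptions: the pages are hypothetical in-memory HTML strings on `example.com` (a real Job loads them in your browser), the "list recipe" collects links marked `class="item"`, and the "detail recipe" takes the first `<h1>`. The names `ListRecipe`, `DetailRecipe`, `run_job`, and `PAGES` are made up for illustration.

```python
import csv
import io
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical stand-in pages; a real Job opens these URLs in the browser.
PAGES = {
    "https://example.com/list": '<a class="item" href="/p/1">A</a><a class="item" href="/p/2">B</a>',
    "https://example.com/p/1": "<h1>Widget</h1>",
    "https://example.com/p/2": "<h1>Gadget</h1>",
}


class ListRecipe(HTMLParser):
    """Step 1 (hypothetical list recipe): collect detail URLs from <a class="item"> links."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "a" and "item" in classes and attrs.get("href"):
            # Resolve relative links such as "/p/1" against the list page URL.
            self.urls.append(urljoin(self.base_url, attrs["href"]))


class DetailRecipe(HTMLParser):
    """Step 2 (hypothetical detail recipe): capture the text of the first <h1>."""

    def __init__(self):
        super().__init__()
        self.in_h1 = False
        self.title = None

    def handle_starttag(self, tag, attrs):
        if tag == "h1" and self.title is None:
            self.in_h1 = True

    def handle_endtag(self, tag):
        if tag == "h1":
            self.in_h1 = False

    def handle_data(self, data):
        if self.in_h1:
            self.title = (self.title or "") + data


def run_job(list_url, fetch):
    """Apply the list recipe, visit each detail URL in turn, and return one CSV string."""
    lister = ListRecipe(list_url)
    lister.feed(fetch(list_url))

    rows = []
    for url in lister.urls:  # one by one, like a Job working through its URL list
        detail = DetailRecipe()
        detail.feed(fetch(url))
        rows.append({"url": url, "title": detail.title})

    # Accumulate every detail page's row into a single CSV.
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["url", "title"])
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


if __name__ == "__main__":
    print(run_job("https://example.com/list", PAGES.__getitem__))
```

Passing `fetch` as a parameter keeps the sketch offline and testable; swapping in a real HTTP or browser fetch would not change the two-step structure.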
1) Starting from a list page, find the data you want to scrape and extract it using a public recipe, as you did in Lesson 1. This recipe must extract the list information as well as the URLs of the individual detail pages.
2) Download the results by clicking “Download” in the top right corner. Choose CSV as the format, because a Job requires a CSV file.
3) Navigate to your Data Collections folder by clicking the collections icon in the nav bar on the left-hand side of the Data Miner window. The same icon also takes you to the Jobs page.
4) Import the CSV containing the URLs: click “Import a csv” and select the CSV file from the first scrape.
5) Once the CSV is uploaded, create the Job: click the Jobs tab in the left-hand panel and fill out the required fields, including the name of the output file the results will be saved to.
6) After you save the Job, it appears at the top of the Jobs page. Press Run.
7) Data Miner opens the first URL in a new tab, scrapes its data with the Detail Recipe, and then moves on to the next URL.
8) While the Job runs, you can check its progress by opening your Data Collections and clicking the output file you named earlier. To stop early once you have all the data you need, click Stop/Close in the pop-up window; otherwise, the Job stops automatically when it reaches the end of the URL list.
9) When the Job is finished, click Data Collections, select the output file, and download it in your preferred format: Excel (XLS) or CSV.
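A Job can only visit the URLs that are in the CSV you import in step 4, so it may be worth checking that file first. The Python sketch below is an optional, offline helper and not a Data Miner feature. It assumes the detail page links sit in a column named `url`; that column name and the function name `extract_job_urls` are hypothetical, so adjust them to match your own export. It keeps only well-formed `http`/`https` links and drops duplicates while preserving their order.

```python
import csv
import io
from urllib.parse import urlparse


def extract_job_urls(csv_text, column="url"):
    """Return unique http(s) URLs from the given CSV column, in first-seen order.

    The default column name "url" is an assumption; rename it to match your export.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise ValueError(f"CSV has no {column!r} column; found {reader.fieldnames}")

    seen = set()
    urls = []
    for row in reader:
        value = (row[column] or "").strip()  # short rows yield None for missing cells
        parsed = urlparse(value)
        # Skip blanks, relative paths, and non-web schemes; skip repeats.
        if parsed.scheme in ("http", "https") and parsed.netloc and value not in seen:
            seen.add(value)
            urls.append(value)
    return urls


if __name__ == "__main__":
    sample = (
        "name,url\n"
        "A,https://example.com/p/1\n"
        "B,https://example.com/p/1\n"
        "C,not-a-url\n"
        "D,https://example.com/p/2\n"
    )
    print(extract_job_urls(sample))
```

On the sample, the duplicate row and the malformed value are dropped, leaving two URLs for the Job to visit.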