Multi Layer Scraping and Automation with Data Miner

Lesson 4: Finding the Right Recipe

Updated: 11/20/2016 by Ben
Key Points

  • Multi-layered sites have a search page and a detail page.
  • Recipes are made for either a search page or a detail page.
  • Recipes only work on the web page they were designed for.
  • Recipe names that include a “$” have been tested by Data Miner.

The first step to a successful scrape, and to picking the right recipe, is understanding the two page styles that Data Miner looks for: a search page and a detail page.

Search Page:

Scrape Search Page

A search page typically has a search bar, and as you search, the page shows you a list of the items found.

Each item will have a URL to a detail page.

Search results usually span multiple pages and require pagination, which is how you move to the next page.
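
Data Miner handles all of this inside the browser, but conceptually a search page is just a list of result items, each carrying a link to its own detail page. A minimal Python sketch of that idea, with a made-up page URL and a hypothetical “result-link” selector standing in for whatever the real site uses:

    # Conceptual sketch only -- Data Miner does this for you inside the browser.
    # The URL and the "result-link" selector are hypothetical.
    import requests
    from bs4 import BeautifulSoup

    search_page_url = "https://example.com/search?q=widgets"
    html = requests.get(search_page_url, timeout=30).text
    soup = BeautifulSoup(html, "html.parser")

    # Each result item on the search page links to its own detail page.
    detail_urls = [a["href"] for a in soup.select("a.result-link")]
    print(detail_urls)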

Detail Page:

Scrape Detail Page

A detail page displays a single item from the Search page.

The detail page typically has a larger image, more detailed information and contact info.

Detail pages typically do not have more than one page and won't require pagination.

Data Miner Pop Up:

Data Miner Pop Up

Some recipes have been tested by Data Miner first and are indicated by a “$” at the beginning of the recipe name.

Recipe type matters most when running a job. A job requires URLs to access the deeper levels of data, and these URLs can only be extracted from search pages with search page recipes. To learn about jobs and multi-level scraping, continue to Lesson 5.

Lesson 5: Multi Layered Scrapes with Jobs

Updated: 11/20/2016 by David
Key Points

  • A job is a two-step process for scraping multi-layered web pages.
  • A job combines a search page and a detail page.
  • Jobs scrape detail pages using URLs from search page results.
  • A job will open each URL, scrape it, and then move on to the next URL automatically.

In an automated job, Data Miner uses the results of a search page recipe to automatically go to each detail page and scrape that page using the detail page recipe.
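
Put differently, a job is just a loop over the URLs that the search page scrape produced: open a URL, apply the detail page recipe, save the row, move on. A rough Python sketch of that loop, with hypothetical URLs and selectors (this is not Data Miner's actual code, only the idea behind a job):

    # Conceptual sketch of what a job automates -- not Data Miner's internal code.
    # URLs and selectors are hypothetical.
    import csv
    import requests
    from bs4 import BeautifulSoup

    # URLs produced by the search page recipe (step 1).
    detail_urls = [
        "https://example.com/item/1",
        "https://example.com/item/2",
    ]

    # Apply the detail page recipe to each URL in turn (step 2).
    rows = []
    for url in detail_urls:
        soup = BeautifulSoup(requests.get(url, timeout=30).text, "html.parser")
        rows.append({
            "url": url,
            "title": soup.select_one("h1").get_text(strip=True),  # hypothetical field
        })

    # The combined results become the job's output collection.
    with open("job_output.csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["url", "title"])
        writer.writeheader()
        writer.writerows(rows)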

Scrape Search Page

1) Starting from a search page, find the data you want to scrape and extract the information using a public recipe, as you did in Lesson 1. This will extract the list information as well as the URLs of the individual detail pages.

Download and Upload

2) Download the results by clicking “download” in the bottom right corner. The results will download as a CSV.

3) Navigate to your data collections folder. To get to the data collections folder and jobs page, just click “collections” in the bottom right corner of the Data Miner window.

4) Upload the CSV containing the URLs. Click “upload a csv” and select the CSV file from the first scrape.

5) Once the CSV is uploaded, it's time to create the job. To create a job, click on the Jobs tab in the left-hand panel and begin filling out the necessary fields:

  1. Job Name - Name the job after what the scrape will accomplish.
  2. Recipe Name - The detail page recipe, i.e. the recipe used to scrape detailed information from every individual URL.
  3. Source Collection - Select the CSV containing the URLs that you uploaded to Data Collections.
  4. Column # for URL - The column number where your URL is located in the CSV file. In this example, and in most cases, it is 1 (see the sketch after this list).
  5. Destination Collection - The final output file; name it accordingly.
  6. Once all the fields are filled out, press Save.
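
To make the “Column # for URL” setting concrete, here is a small, hypothetical Python sketch of how a URL column is picked out of the uploaded CSV by its column number (the file name is made up, and Data Miner of course does this step itself):

    # Hypothetical illustration of the "Column # for URL" setting.
    import csv

    column_number_for_url = 1        # as entered on the Jobs form (1-based)

    with open("search_results.csv", newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        next(reader, None)           # skip the header row, if the CSV has one
        urls = [row[column_number_for_url - 1] for row in reader]

    # These are the detail page URLs the job will visit, in order.
    print(urls[:5])
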
Build the Job
Download and Upload

6) Once the Job is saved, it will appear at the top of the Jobs page. Press Run.

7) The first URL will be opened in a new window. Data Miner will scrape the information, then close the window and move on to the next URL.

8) As the job runs, you can check the progress by clicking the Data Collections tab in Data Miner and then clicking on the output file that you named earlier. If you have scraped all the data you need, click stop/close on the pop-up window, or wait until the job reaches the end of the URLs and stops automatically.

9) Once finished, click on Data Collections, select the output file, and then download it by selecting your preferred file format: Excel (XLS) or CSV.

Lesson 6: Advanced Job Settings

Updated: 12/2/2016 by Ben
Key Points

  • Jobs can paginate within detail pages.
  • You can increase wait time between scrapes to prevent loading issues.

Jobs have many advanced tools to assist with your scraping needs. We will cover the two most useful ones in this section, and we recommend exploring the rest on your own to take full advantage of jobs.

Data Miner Advanced Options

To find these tools, go to the Jobs page and simply click on Advanced Options at the bottom of the page.

Wait Time

Wait Time Between Scrapes is a setting to use when troubleshooting. Data Miner runs entirely on your computer, which means each scrape depends on your own Internet speed.

For that reason, if the Internet connection is slow and the page has not finished loading by the time Data Miner tries to scrape it, the scrape will return a failure. To prevent this, simply increase the wait time to 60 or 90 seconds and press “Save”. By default, it is set to 15 seconds.

Please note: never decrease the wait time below 15 seconds, as doing so could cause the job to fail.
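
The same idea in a short, hypothetical Python sketch: pausing between page loads gives a slow connection time to deliver each page before the next scrape starts. The numbers mirror the settings above; nothing here is Data Miner's internal code:

    # Hypothetical sketch of a wait time between scrapes.
    import time
    import requests

    wait_time_seconds = 15                # Data Miner's default; raise to 60-90 on slow connections
    detail_urls = [
        "https://example.com/item/1",     # hypothetical detail page URLs
        "https://example.com/item/2",
    ]

    for url in detail_urls:
        html = requests.get(url, timeout=60).text
        # ... scrape the page here ...
        time.sleep(wait_time_seconds)     # pause between scrapes before opening the next URL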

Job Pagination

Job Pagination is a tool that extends what a job can scrape. There will be times when a detail page has multiple pages that you would like to scrape; that is when this tool is useful.

All you have to do is check “Follow Paginated Page Results” and then enter the maximum number of pages you expect each detail page to have.
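
Conceptually, pagination inside a detail page just means following that page's “next page” link up to the maximum you entered. A hypothetical Python sketch of that loop (the “next page” selector is made up; in Data Miner the detail page recipe itself defines how to paginate):

    # Hypothetical sketch of following pagination inside one detail page.
    import requests
    from bs4 import BeautifulSoup

    max_pages = 5                                   # the maximum entered in the job settings
    url = "https://example.com/item/1"              # hypothetical detail page URL

    for page in range(max_pages):
        soup = BeautifulSoup(requests.get(url, timeout=30).text, "html.parser")
        # ... scrape this page of the detail record here ...
        next_link = soup.select_one("a.next-page")  # hypothetical "next page" link
        if next_link is None:                       # stop early if there is no further page
            break
        url = next_link["href"]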

Please note: The detail page recipe must have pagination capability. If it does not and you're not sure how to add it, continue on to Lessons 7-9!