
Writing recipes with Data Miner

Lesson 7: Rows and Columns

Updated: 12/15/2017 by David

A) Finding Rows:

screenshot - containers

Data Miner works by first identifying a container that surrounds your data and then extracting elements from within these containers. These containers can be in table form or list form.

These containers are called Row Selectors and are indicated by a red outline in the example on the left. You must first select the rows that surround your data before you can begin selecting your data. These rows will become the rows in your CSV.

B) Finding Data:

screenshot - item

Once a container is identified, we can begin selecting the data we want to extract. These selections are called Columns (Col.) and are indicated by the blue outline on the left. These columns will become the columns in your final CSV.
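
As a rough illustration, here is the kind of markup those outlines correspond to. The class names below are hypothetical and only meant to show the relationship between rows and columns; the jQuery calls are just a quick way to sanity-check selectors in the browser console.

// Hypothetical listing markup (class names are placeholders):
//
//   <div class="result">                        <- each .result is one Row (red outline)
//       <h2 class="name">Acme Co</h2>           <- a Column (blue outline)
//       <span class="phone">555-0100</span>     <- another Column
//   </div>
//   <div class="result"> ... </div>             <- the next Row
//
// The row selector matches every container, and each column selector is
// evaluated inside a single row:
$(".result").length;                         // how many rows the scrape would produce
$(".result").first().find(".name").text();   // the "name" column of the first row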

Lesson 8: Creating your own Recipes

Updated: 12/16/2017 by David

In 5 minutes you'll understand Recipe Creator basics: hover-and-shift selection, selecting rows, selecting data, and simple selector writing.

Watch The Row Selection Tutorial

Watch The Column Selection Tutorial

Still have questions?

Contact us: beta@data-miner.io

Or Read Through These Quick Steps

  1. Navigate to the site you want, launch Data Miner, click New Recipe and pick your page type.
  2. List pages require rows and have multiple pages, while detail pages have only one page and require only columns.
  3. Starting with a list page, use your mouse to hover over the data until a highlighted box encloses all the information you are looking to scrape.
  4. Once the Row is highlighted press shift, then over on the tool select one of the suggested classes to lock in the selection (a sketch of what the finished selections amount to follows this list).
  5. At this point you can begin selecting your individual data. Click on the Column tab and select “column 1”.
  6. Give the column a name, hover over the data you wish to extract in this column and press shift.
  7. Pick the class that highlights the data the best. (Helpful tip - use the "Select Parent" button for more options if the data isn’t selecting correctly.)
  8. Once the data is highlighted correctly, lock in the selector by clicking confirm at the bottom of Recipe Creator and click the Eyeball to check your work.
  9. Continue creating by clicking "Add new Column".
  10. Once you have all your columns done, finish by clicking the Save tab at the top. Give the recipe a name, click save and then run to scrape.
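
The end result of those steps is essentially one row selector plus one selector per column. Here is a minimal sketch of what that might amount to for a hypothetical search-results page; the class names and column names are placeholders, and this is only an illustration, not Data Miner's internal recipe format.

// Hypothetical result of steps 4-10 (all selectors are placeholders):
var recipeSketch = {
    rowSelector: "div.result",                         // step 4: the container around each listing
    columns: [
        { name: "Title",   selector: "h2 a" },         // steps 6-7: elements inside each row
        { name: "Address", selector: "p.address" },
        { name: "Phone",   selector: "span.phone" }
    ]
};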

How to use Recipe Creator Advanced Selectors and Actions:

In this section you'll learn how to write more advanced selectors and use Recipe Creator Actions such as button click and auto scroll.

Watch The Nav and Actions Tutorial

Watch The Find Tool and Selectors Tutorial

Still have questions?

Contact us: support@data-miner.io

How to click a button:

*Button click can be applied at any time during recipe creation.

  1. With Recipe Creator open to the Actions tab, click Find at the top, hover and shift over the button.
  2. Select the button's most appropriate class and click confirm.
  3. Copy the selector to the Button Action input box.
  4. Click "Test Click".
  5. Click "Add" if it is working properly.
  6. Once test reveals the content, hover and shift the content from the Columns tab as normal.
  7. Save, close and scrape :)

How to Auto Scroll:

*Auto scroll can be applied at any time during recipe creation.

  1. With Recipe Creator open, go to the Actions tab.
  2. Scroll down to the Scroll to Page End action.
  3. Click "Add".
  4. Save, close and scrape :)

How to Infinite Scroll With Click:

*Infinite Scroll With Click can be applied at any time during recipe creation.

  1. With Recipe Creator open to the Actions tab, click Find at the top, hover and shift over the show more button.
  2. Select the button's most appropriate class.
  3. Copy the selector to the Infinite Scroll input box.
  4. Give it a click amount - this will be how many pages it loads.
  5. Click "Test Click".
  6. Save, close and scrape :)

Recipe Creator Selector Tricks, Tips and Scenarios:

Basic Selectors

Selector Type | Example | Meaning
tag | h1 | h tags are typically headers and display important information. h tags range from h1 to h6.
tag | p | p tags are used for basic text and paragraphs.
tag | a | a tags are used for links. Links are important for pagination and getting URLs from search pages.
tag | img | img tags are used for images. Recipe Creator must have the "Extract Data" drop-down menu set to "Image URL" for the data to scrape successfully.
class | .address | Classes are suggested by Recipe Creator when available and are used to select an element. They are indicated by a dot in the jQuery selector.
id | #email | IDs are similar to classes, but are more specific. They are indicated by a number sign.
tag | div | div tags are typically containers. They can be used for styling, displaying generic information or organizing info.
tag | span | span tags are typically used for simple text or icons.
tag | strong | strong tags are used for styling and contain text.
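
If you want to check what one of these basic selectors will grab before locking it in, you can try it in the page's developer console. The calls below assume jQuery is available on the page; Recipe Creator suggests the selectors for you, so this is only a quick way to experiment.

$("h1").text();                 // text of the page's h1 header
$("a").first().attr("href");    // URL of the first link on the page
$("img").first().attr("src");   // image URL (pair with "Extract Data" set to "Image URL")
$(".address").first().text();   // text of the first element with the address class
$("#email").text();             // text of the element with id "email"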

Advanced Selectors and Selector Combos

Selector | Example | Meaning
(space) | .industry strong | Separate selectors by a space to travel down into child elements.
, | h2, p | Commas combine elements into one column, so the h2 tag data and the p tag data will end up in the same column.
~ | span ~ | Selects all of the following sibling elements, as long as they share the same parent or container.
+ | span + | Selects only the next sibling element, as long as it shares the same parent or container.
:contains(" ") | div:contains("Email") | Recipe Creator finds any div tag on the page whose text contains the word inside the quotation marks and selects it.
:first | h2:first | Recipe Creator always finds the first h2 tag on the page and selects it.
:last | h2:last | Recipe Creator always finds the last h2 tag on the page and selects it.
:eq() | h2:eq(2) | Recipe Creator finds the h2 tag at index 2, i.e. the third h2 on the page, since :eq() counts from 0. Any index can be used as long as there are that many h2 tags on the page. (Use :eq() only when the numbering will be consistent between pages.)
:has(" ") | div:has("p.address") | Recipe Creator finds all the divs that contain a p tag with the address class. Combine with :eq() to specify which div.
[attribute="value"] | [itemdrop="address"] | When classes or tags aren't available, sites often have alternative attributes in the HTML. Use them inside square brackets as your selectors.
class tag | .address span | When the parent container has the class but you need data inside a child element, select the suggested class of the parent and then type the tag containing the data with a space in between.
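
These pieces can be combined. Below is a short, hypothetical set of combined selectors of the kind you might type into a column's selector box; every class and attribute name here is made up for illustration.

// Hypothetical combined selectors (all names are placeholders):
var comboExamples = [
    ".industry strong",                     // space: a strong tag anywhere inside .industry
    "h2, p.summary",                        // comma: h2 and p.summary data land in the same column
    "div:contains('Email') + span",         // the span immediately following a div containing "Email"
    "div.listing:has(p.address) h2:first",  // the first h2 inside listing divs that contain p.address
    "*[data-role='phone']"                  // attribute selector when no usable class or id exists
];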

Lesson 9: JavaScript Snippets

Updated: 9/4/2017 by Ben
Please Note: JavaScript editing is limited to Data Miner paid plans.

A) Clean up data after you scrape with JavaScript:

Example of a data clean-up script:

var cleanup = function(results) {
  // loop through each row of results and change each column

  //debugger;

  $.each(results, function(){
    this.values[0] = "xxxx -" + this.values[0];
    this.values[1] = this.values[1] + "- yyyyy";
  });

  return results; // return modified results
};
                
Using JavaScript, you can clean up your scraped results and do more sophisticated data extraction than is possible with XPath alone. Data Miner will pass the scraped data to a JavaScript function that you provide. You can then modify the data and pass it back to Data Miner for saving into your data collection.

With custom JavaScript you can:

  • Extract email addresses from text (see the sketch after this list)
  • Remove unwanted text from scraped data
  • Change currency type, change units.
  • Separate or join column data
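
For example, the first item above could be handled with a cleanup script like the sketch below. It assumes column 0 holds free text that contains a single email address; adjust the column index and pattern to your own recipe.

var cleanup = function(results) {
  // assumes column 0 contains text such as "Contact: jane@example.com (sales)"
  var emailPattern = /[\w.+-]+@[\w-]+\.[\w.-]+/;

  $.each(results, function(){
    var match = this.values[0].match(emailPattern);
    this.values[0] = match ? match[0] : ""; // keep only the email address, blank if none found
  });

  return results; // return modified results
};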

B) Click on elements before scraping

You can provide your own JavaScript function that Data Miner will run before it scrapes the data. Pre- and post-scraping hooks give you a great deal of power to do work before or after scraping is performed.


Examples of how Pre and Post hooks can help you:

  • With Pre-hook, you can wait for an element to be present on the page before starting the scrape process.
  • With Pre-hook, you can fill a form and submit it before scraping the page.
  • With Pre-hook, you can click on an item on the page or do AJAX calls.
  • With Post-hook, you can clean up your data or click on a button.

C) Filling forms with JavaScript:

Scrape Search Page
var workflow = {
    paginationType: "ajax",

    fillForm: function(context, resolve) {
        console.log("starting POST hook");

        if (!context.inputData)
            context.inputData = {
                name: "pizza",
            };

        return [{
            type: "text",
            selector: "input[name$='find_desc']",
            value: context.inputData.name,
            waitAfter: 1
        }, {
            type: "button",
            selector: ".main-search_submit",
            done: function() {
                resolve();
            }
        }];
    }
};             

With Data Miner you can automatically fill forms by uploading a CSV into your Collections and using a form filling recipe. To create a form filling recipe you must include the JavaScript snippet and update the selectors to the right attributes for your site. In addition, make sure the CSV column titles match the key titles exactly (for example, the first name Allen has a key of "first"). Once the recipe is complete, run a job with the CSV as the source collection and your new form filling recipe as the recipe.
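
As a concrete sketch of that mapping (the values here are made up): if the uploaded CSV has a column titled name, each row is handed to the recipe through context.inputData, keyed by the column title, which is what the snippet above reads.

// Hypothetical CSV uploaded to Collections -- the column title must match the key:
//
//   name
//   pizza
//   sushi
//
// For the first row the fillForm snippet above effectively receives:
var exampleInputData = { name: "pizza" };   // i.e. context.inputData.name === "pizza"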




See even more examples of JavaScript hooks below:


/* --------------------------------------------------------------------------------------------------------------------

Here is an example of a pre-scrape hook. In this example an element is found using the jQuery selector ".tsd_name > a".
Then the element is clicked. Then we wait for 2 seconds for the page to change and then we tell Data Miner to continue
to scrape the page.

*/

var workflow = {

    "preScrape": function(request, callBack) {
        console.log("starting Pre-scrape hook");

        var $el = $(".tsd_name > a"); // Element to click on.
        var waitTime = 2; // Wait for n seconds and then continue to scrape the page

        if ($el.length > 0) {
            $el[0].click()
        }

        setTimeout(function() {
            callBack();
        }, waitTime * 1000);
    }
}


/* --------------------------------------------------------------------------------------------------------------------

Here is another example of a pre-scrape hook. In this example the pre-scrape hook will wait for up to 5 seconds until
the element specified by the jQuery selector #footer appears on the page. In a loop we test for the presence of #footer and,
if it is not present, we wait for 1 second and repeat the loop. Once the element is found we call callBack, which transfers
control back to Data Miner to continue to scrape.

*/

var workflow = {

    /* --------------------------------------------------------------
    preScrape function:
        Will be executed before any scraping is done. Must call callBack to give execution control back to Data Miner

    Input:
        request: Context for the request. URL, scraping, parameter etc.
        callBack: callback function to be called when all the pre-scraping work is done.
    Return:
        nothing

    ------------------------------------------------------------- */
    "preScrape": function(request, callBack) {
        console.log("starting Pre-scrape hook");
        //debugger;

        var condition = "#footer"; // Wait for presence of this element before scraping
        var loopCounterMax = 5; // Maximum number of seconds to wait before giving up

        var wait = function() {
            var $test = $(condition);

            if ($test.length > 0 || loopCounter > loopCounterMax) {
                if (callBack)
                    callBack();    // Must be called at the end when all the PreScrape work is done

            } else {
                loopCounter++;
                setTimeout(wait, 1000);
            }
        };

        var loopCounter = 0;
        wait();
    },
}


/* --------------------------------------------------------------------------------------------------------------------

Here is an example of a post-scrape hook. In this example you are given the data that was scraped from the page in
the form of an array. You can then modify the result and return the array back to Data Miner.

*/

var workflow = {

    /* --------------------------------------------------------------
    postScrape function:
        Will be executed after the scraping is finished. You will get the scraped results and can
        clean up or modify them

    Input:
        results: Scraped data array
    Return:
        results: Modified data array

    ------------------------------------------------------------- */
    "postScrape": function(results) {
        console.log("starting Post-scrape hook");

      // loop through each row of results and change each column

      //debugger;

      $.each(results, function(){
        this.values[0] = "xxxx -" + this.values[0];
        this.values[1] = this.values[1] + "- yyyyy";
      });

      return results; // return modified results

    }
};

/* --------------------------------------------------------------------------------------------------------------------

Here is an example of a scrape hook. You can replace the scrape functionality of Data Miner by providing your own
scrape function, which will be called instead of the scrape function of Data Miner.

*/

var workflow = {

    /* --------------------------------------------------------------
    scrape function:
        Will be executed instead of the default [originalScrape] scrape function of Data Miner.
        The XPaths in the Data Miner UI will be ignored. However, the number of columns of data returned
        must match the number of columns specified in the UI.

    Input:
        request: Context for the request. URL, scraping, parameter etc.
        originalScrape: the default scrape function of Data Miner.
        callBack: callback function to return the results.
    Return:
        results: Modified data array

    ------------------------------------------------------------- */
    "scrape": function(request, originalScrape, callBack) {
        console.log("starting scrape hook");

        var results = [];
        results.push({
            "values": [
                "1234", "1234"
            ]
        });

        callBack(results);
    }
};

 /* --------------------------------------------------------------
For splitting names (splits at the first space).
Use cleanup. This script expects the same full name in the columns at
index 2 and 3; column 2 keeps the first part and column 3 keeps the rest.
------------------------------------------------------------- */
var cleanup = function(results) {
	//debugger;
	$.each(results, function() {
		var x = this.values[2].indexOf(" ");
		this.values[2] = this.values[2].substring(0, x);
		this.values[3] = this.values[3].substring(x, this.values[3].length);
	});
	return results; // return modified results
};

 /* --------------------------------------------------------------
For splitting names that are in “Last, First” format; also removes the comma.
This script expects the same full name in the columns at index 1 and 2.
--------------------------------------------------------------*/
	var cleanup = function(results) {
		//debugger;
		$.each(results, function() {
			var x = this.values[1].indexOf(" ");
			var y = this.values[1].indexOf(",");
			this.values[1] = this.values[1].substring(x, this.values[2].length);
			this.values[2] = this.values[2].substring(0, y);
			console.log(x);
		});
		return results; // return modified results
	};

 /* --------------------------------------------------------------
Replace any character other than letters, numbers, parentheses or underscores with a "-"
--------------------------------------------------------------*/
	var cleanup = function(results) {
		//debugger;
		$.each(results, function() {
			this.values[1] = this.values[1].replace(/[^a-z0-9()_]/gi, '-');
		});
		return results; // return modified results
	};
 /* --------------------------------------------------------------
Click a Button
--------------------------------------------------------------*/
var workflow = {
	"preScrape": function(request, callBack) {
		console.log("starting Pre-scrape hook");
		var condition = "a[class~='xxxx']";
		var $test = $(condition);
		if ($test.length > 0) {
			$test[0].click();
			var wait = function() {
				callBack();
			};
			setTimeout(wait, 3000);
		} else callBack();
	}
};

/* --------------------------------------------------------------
Button click and close
--------------------------------------------------------------*/
var workflow = {
	"preScrape": function(request, callBack) {
		console.log("starting Pre-scrape hook");
		//debugger;
		var condition = "button[data-lira-action~='edit-contact-info'"; // Wait for presence of this element before scraping
		//debugger
		var $test = $(condition);
		if ($test.length > 0) {
			$test[0].click();
			var wait = function() {
				callBack();
			};
			setTimeout(wait, 3000);
		} else callBack();
	},
	"postScrape": function(results) {
		console.log("starting Post-scrape hook");
		var $close = $(".dialog-close");
		if ($close.length > 0) {
			$close[0].click();
		}
		return results;
	}
};

/* --------------------------------------------------------------
Filter Data Miner results
 --------------------------------------------------------------*/
var workflow = {
	"postScrape": function(results) {
		console.log("starting Post-scrape hook");
		// loop through each row of results and change each column
		//debugger;

		var results2 = [];
		$.each(results, function() {
			if (this.values[2] !== "banana")   // filter column 2 values and exclude "banana"
                results2.push(this);
		});
		return results2; // return modified results
	}
};


/* --------------------------------------------------------------
Auto scrolling with an interval and a max (Twitter)
 --------------------------------------------------------------*/
var workflow = {
	"preScrape": function(request, callBack) {
		console.log("starting Pre-scrape hook");
		//debugger;
		var waitTime = 3000; // milliseconds
		var maxLoopCount = 50;
		var count = 0;
		var loopCount = 0;

		function loop() {
			loopCount++;
			if ($("li[class~='stream-item']").length !== count && loopCount < maxLoopCount) {
				window.scrollTo(0, document.body.scrollHeight);
				count = $("li[class~='stream-item']").length;
			} else {
				clearInterval(tid); // stop the scroll loop before handing control back to Data Miner
				if (callBack) callBack();
			}
		}
		var tid = setInterval(loop, waitTime);
	}
};


/* --------------------------------------------------------------
Isolate data by Index
 --------------------------------------------------------------*/
var cleanup = function(results) {
	//debugger;
	$.each(results, function() {
		this.values[3] = this.values[3].substring(0, 13);
		this.values[4] = this.values[4].substring(14, 30);
	});
	return results; // return modified results
};


/* --------------------------------------------------------------
Using form filling with drop-down menus. The following JavaScript will click to open a form,
click a drop-down menu, select an item from within the list and then click submit.

A CSV with a column titled "location", containing values from 0 up to the number of elements in the drop-down menu,
can be injected into a selector. This lets you select different items in the drop-down and search each of them when
injecting basic text doesn't work.
 --------------------------------------------------------------*/
var workflow = {
    paginationType: "ajax",

    fillForm: function(context, resolve) {
        console.log("starting POST hook");

        if (!context.inputData)
            context.inputData = {
                location: "0", //starting from 0, the location is where the item lives within the list in the drop down.

            };

        return [{
            type: "button",
            selector: "a[class~='XXXX']", //open button selector
            waitAfter: 2
        },{
            type: "button",
            selector: "*[class~='XXXX']", //form button selector
            waitAfter: 2
        },{
            type: "button",
            selector: "*[id~='XXXX" + context.inputData.location +"']", //inputData.location is the number defined above or
                    injected from the CSV and then added to the drop down item selector.
            waitAfter: 2
        },{
            type: "button",
            selector: "button[name~='skipandexplore']", //submit button selector
            done: function() {
                resolve();
            }
        }];
    }
};

Get Beta version of Data Miner

Preview New Features

  • Bug fixes
  • Run your own custom JavaScript code

Note: We recommend that you run the Production and Beta versions of Data Miner side by side, so you can fall back to the Production version if you find a blocking issue in the Beta version. Each version runs independently and they don't interfere with each other.

Download latest Beta version.