How to use the Chrome Inspector for Data Mining
The Chrome Inspector (also known as Chrome DevTools) is built into every Chrome browser and contains many useful features. We will be looking at the “Elements” panel, which lets us look at the HTML of any page. From this HTML we can see how data is organized and labeled, which will make data extraction easy once we've covered a little more about data mining.
For now, let's just familiarize ourselves with the inspector and the type of information it shows. To begin, open Chrome and navigate to a profile or product page. We will be using a Senator's profile page as an example. We encourage you to follow along, but keep in mind that the HTML you see may differ from ours, since not all pages are built the same way. To open the inspector, right-click anywhere within the page and select “Inspect” from the menu (or press Ctrl+Shift+I on Windows and Linux, Cmd+Option+I on Mac). A new panel will open at the bottom or on the right side of your browser.
The part of the new panel we will focus on is called “Elements”. It may look confusing at first, but it contains the same information you see on the website. The only difference is that here the information is represented as HTML instead of rendered web content. To better understand how the two are connected, select the button that looks like a small arrow inside a box, found at the top left of the panel, as seen below. With it selected, you can hover over items on the web page and see the matching HTML highlighted in the Elements view.
As you can see, I’ve highlighted a portion of text in the center of the web page, and the inspector is highlighting where that text is found within the HTML. If you click the web element, the highlighting will stay in place, giving you a chance to investigate the HTML more closely.
It might not be obvious right away, but this single line contains the text from the web page; the text is simply wrapped in an HTML container. Almost all web page content is placed in some sort of container, and these containers help organize the content and attach additional information to it. Here, the container is a “div” element. There are many types of HTML elements (also called tags), and you will encounter quite a few as you continue into the world of data mining. For now, let's just focus on the content; you will learn more about HTML in later posts.
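To make the idea of a container concrete, here is a minimal sketch in Python using only the standard library's html.parser module. The HTML snippet and the name “Jane Doe” are hypothetical stand-ins for the highlighted line, not taken from the real Senate page. The parser simply reports which tag directly encloses each piece of visible text.

```python
from html.parser import HTMLParser

# Hypothetical snippet standing in for the highlighted line: the visible
# text "Jane Doe" is wrapped in a div container.
SAMPLE_HTML = "<div>Jane Doe</div>"

# Elements such as <img> and <br> never get a closing tag, so they are not
# tracked as open containers.
VOID_TAGS = {"img", "br", "hr", "input", "meta", "link"}


class ContainerTextParser(HTMLParser):
    """Record which tag directly encloses each piece of visible text."""

    def __init__(self):
        super().__init__()
        self.open_tags = []  # stack of tags that are currently open
        self.found = []      # (enclosing tag, text) pairs

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()

    def handle_data(self, data):
        text = data.strip()
        if text and self.open_tags:
            self.found.append((self.open_tags[-1], text))


parser = ContainerTextParser()
parser.feed(SAMPLE_HTML)
print(parser.found)  # [('div', 'Jane Doe')]
```

The text on its own is just “Jane Doe”; it is the surrounding div that a data-mining script latches onto.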
To see the actual content, you can click on the div and expand it to reveal the text inside. The process is the same regardless of content type, whether it is text, image URLs or links. In addition to the content, the inspector also displays descriptive data known as attributes, such as an id, a class name and many others. Attributes describe the content and often hint at what the information is.
In the example above, the text we selected is actually the Senator's name, and as you can see, the div has a “class” attribute with the value “memberName”. There are many examples like this throughout the page. The picture, for instance, is an “img” (image) element with the class name “memberImage”. However, not all attributes will be this descriptive, and that is when the HTML tags themselves can help. In the next post we will learn about different HTML tags and how they can also be great indicators of content when the attributes are not.
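Once you know the class names, a script can use them to pick out the data. The sketch below is one possible approach in Python with the standard library's html.parser, not the method this series will necessarily use. The markup is hypothetical and only modeled on the “memberName” and “memberImage” classes seen in the inspector; the name and the image path are made up.

```python
from html.parser import HTMLParser

# Hypothetical markup modeled on the profile page; the name and image
# path are invented for illustration.
SAMPLE_HTML = """
<div class="profile">
  <img class="memberImage" src="/images/jane-doe.jpg" alt="Jane Doe">
  <div class="memberName">Jane Doe</div>
</div>
"""


class MemberParser(HTMLParser):
    """Pull out the name and photo URL using their class attributes."""

    def __init__(self):
        super().__init__()
        self.in_name = False   # True while inside the memberName div
        self.name = None
        self.image_src = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)  # attrs arrives as (name, value) pairs
        # A class attribute can hold several space-separated names.
        classes = (attributes.get("class") or "").split()
        if tag == "img" and "memberImage" in classes:
            self.image_src = attributes.get("src")
        elif tag == "div" and "memberName" in classes:
            self.in_name = True

    def handle_endtag(self, tag):
        # Simplification: assumes the memberName div has no nested divs.
        if tag == "div":
            self.in_name = False

    def handle_data(self, data):
        if self.in_name and data.strip():
            self.name = data.strip()


parser = MemberParser()
parser.feed(SAMPLE_HTML)
print(parser.name)       # Jane Doe
print(parser.image_src)  # /images/jane-doe.jpg
```

Matching on the class name rather than on an element's position means the script keeps working if other content is added around it. Keep in mind that the Elements panel shows the page after the browser has run its scripts, so HTML downloaded directly by a script may not contain everything the inspector shows.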