Web scraping: The unsung hero of content marketing

Looking for content ideas is like going on a blind date. You don’t know who the person is or how the date will go, but if you are lucky, it may turn out well.

Similarly, when you have been staring at a blank document for hours and feel that everything has already been covered by every authority out there, what do you do?

There’s nothing more infuriating than being in the zone and not being able to jot down topics. The importance of a good blog post cannot be stressed enough, so what is the very foundation of producing one?

Conceiving an idea that is unique, engaging, and already successful.

No one wants to create posts that just add weight to the internet’s indexes. Doing so can be catastrophic for a brand and leaves it vulnerable to losing its reputation.

How do we quickly generate good blog post ideas without wasting time scouring millions of sources?

Scraping can quench your thirst for excellent ideas.

Let’s see how.

When you begin your research on what to write about, you perform a Google search. Let’s say your keyword is “content marketing tools”.

The results on Google for a generic keyword are rarely specific to what you are looking for. You get tools and software, paid ads, a few blog posts that would actually benefit you, some related searches, and other potential results spread across several search result pages.

That’s how Google works. That, in fact, is how online search works. Your keyword brings back results that span many criteria, so devising the most fitting idea to fuel your content vehicle becomes a tedious process.

Why though?

This is because you would ideally click on every link and move between search result pages to filter the ideas by touchstones such as popularity, influencer engagement, social media performance, backlink strength, and more. Next, you would copy and paste these into a spreadsheet to get an overview of what is being discussed and what is worth picking for your post, and then brainstorm how to give a twist to something that has already been well received.

Just think of all the time and effort that goes into picking the right idea to create your masterpieces. By the time you sit down in front of the document, you are drained and can only stare at the blinking cursor.

Great ideas are the foundation of all your content marketing efforts. In fact, every effort ties back to a document full of text. So you have to be accurate as well as quick in generating them, right?

So, the foolproof method for this is web scraping.

Why web scraping over manual research?

We all head to the Google search bar as a reflex whenever a question pops into our minds. Of course, Google gives us the information we need, instantly. But how many of us really dig into the analytics or go past the first results page? Do we even consider how the content performs beyond ranking well on Google?

There are many aspects to the success of a blog post beyond what a scan of the search results shows. This is where manual research should take a pause, at least for content creation, and scraping should get the focus instead. Scraping does not confine you to Google search results; it can also give you insights into social media engagement, backlinks, shares, influencers’ interest, and so on.

Scraping, combined with intelligent engines, is how popular products such as BuzzSumo, Ahrefs’ Content Gap tool, AnswerThePublic, and various others work. It has been the backbone of these and many other data aggregation and curation tools. But, more often than not, people spend a large share of their marketing budget on tools that charge extravagant prices largely for a layer of branding.

Most of them are not aware of the science of scraping. But more on that later in this post. For now, let’s get back to the draining process of manual research.

Even if you do not need these insights, you still have to get past the first results page and scour all the results to weed out paid ads and the other irrelevant, generic results that a raw search throws up.

Some pointers on web scraping vs manual research:

  • Researching blog ideas across sources such as search results, community forums, and Twitter updates takes a lot of time and effort.

Scrape instead: Scraping aggregates this information from around the web in one go.

  • Your research can help you run high-level analytics, but you will not be able to drill down to the details, and that is where the real insights are hidden.

Scrape instead: Scraping puts all your data into a spreadsheet across multiple data points, which makes comparison easy and even fun.

  • A simple search often misses the big picture. Some potential articles go unnoticed only because they sit on the second or a later search result page.

Scrape instead: The first page of Google is somewhat over-hyped. There is a lot of noise, including paid ads and results that are irrelevant to the raw search terms, and the majority of users never get past the first page. Scraping will surface potential articles for you that are hidden deeper in the results.

I am sure you are convinced that scraping has an edge and are probably happy that we uncovered that for you.

Thank us later. For now, read on to understand exactly how this works.

Bonus: There’s a free template giveaway to help you curate potential articles from the web. We will discuss how to use it right away…

FREE TEMPLATE GIVEAWAY – HOW TO GO ABOUT IT?

As you read ahead, you will find code that helps you curate potential blog ideas in a snap, without spending much time or a lot of bucks.

There’s a simple setup procedure before you can go ahead. The code below is written in Python, so you will need a working Python installation to run it. Head over here to complete your installation procedure. You can also refer to this article, which is a step-by-step guide to installing Python version 2.7. When it is all up and running, use the code below:

<<Template code>>

import re
import sys
import urllib
import HTMLParser

import requests

# Python 2.7 only: make utf8 the default encoding for implicit conversions
reload(sys)
sys.setdefaultencoding('utf8')

def replaces(data):
    # Strip HTML tags, collapse whitespace, and decode HTML entities.
    # Flags are passed by keyword because the fourth positional
    # argument of re.sub is count, not flags.
    data = re.sub(r'<[^>]*?>', r'', data, flags=re.M|re.I|re.S)
    data = re.sub(r'\s+', r' ', data, flags=re.M|re.I|re.S)
    data = HTMLParser.HTMLParser().unescape(data)
    return data

#Session Handling
sess = requests.Session()
sess.headers['User-Agent'] = 'Mozilla/5.0 (Windows NT 6.3; WOW64; rv:38.0) Gecko/20100101 Firefox/38.0'

#Parameters to get results (Input Keyword)
searchKeyword = "big data"

#Process begins here
# quote_plus encodes spaces and special characters in the keyword
url = "https://www.google.com/search?num=100&hl=en&cr=&q=" + urllib.quote_plus(searchKeyword) + "&start=0"
res = sess.get(url)
data = res.content

#Storing the cached content to a file
with open("Cont.html", "w") as ch:
    ch.write(data)

#Data Extraction using Regular Expression
# Note: this pattern relies on the class names "rc", "r" and "s" in the
# result markup, so it needs updating whenever that markup changes.
gtrResultRegex = re.compile(r'<div\s*class\=\"rc\"><div\s*class\=\"r\">\s*<a\s*href\=\"([^\"]*?)\"[^>]*?>\s*[\w\W]*?<div\s*class\=\"s\">\s*([\w\W]*?)\s*<\/div><\/div><\/div>', re.M|re.I|re.S)
gtrResultList = gtrResultRegex.findall(str(data))

#Writing the output to a file (tab-separated text, despite the .xls extension)
with open("GoogleResults.xls", "a") as fh:
    for results in gtrResultList:
        searchURL = results[0]
        searchDesc = replaces(results[1])
        fh.write(searchURL + "\t" + searchDesc + "\n")

#Process ends here
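
The template above targets Python 2.7. For readers on Python 3, here is a minimal sketch of two of its steps: building the search URL and cleaning a result snippet. The function names build_search_url and clean_snippet are ours, made up for illustration, and are not part of the template. The query parameters (num, hl, q, start) are the ones the template's URL already uses, with the empty cr parameter left out. Treat it as a starting point under those assumptions rather than a drop-in replacement.

```python
# Python 3 sketch (an illustration, not part of the original template).
import html
import re
from urllib.parse import urlencode

TAG_RE = re.compile(r"<[^>]*?>", re.S)   # any HTML tag
SPACE_RE = re.compile(r"\s+")            # runs of whitespace


def build_search_url(keyword, start=0, num=100):
    """Build the search URL with the parameters the template uses.

    urlencode escapes spaces and special characters in the keyword.
    """
    params = {"num": num, "hl": "en", "q": keyword, "start": start}
    return "https://www.google.com/search?" + urlencode(params)


def clean_snippet(fragment):
    """Strip tags, collapse whitespace, and decode HTML entities.

    Unlike the template's replaces(), this also trims the ends.
    """
    text = TAG_RE.sub("", fragment)
    text = SPACE_RE.sub(" ", text)
    return html.unescape(text).strip()


# Usage: URLs for the first two batches of 100 results.
urls = [build_search_url("big data", start=s) for s in (0, 100)]
```

Changing the start value is how you would move past the first batch of results, which is the part a manual search usually skips.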

Quick Guide:

  • Enter your keywords – In the code, the value assigned to searchKeyword (big data in our example) can be replaced with a keyword of your choice.
  • Run the search – Once you enter the keyword, run the code with Python.
  • Output in xls format – You will be able to view the output (up to the top 100 URLs) in a file that opens in Excel.
  • Run spreadsheet comparison – You will have data points such as the URL of each result and its description snippet, so you can learn what each post covers.
  • Freeze your title – Run an analysis based on the data and filter your topics. Choose the most interesting topic and give it a slight twist!
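
The comparison and filtering steps in the quick guide can also be done in code. Below is a rough Python 3 sketch that assumes the template's output layout (one result per line, URL and description separated by a tab). The helper names load_results and shortlist, the per-domain cap, and the sample rows are all ours, made up for illustration.

```python
# Python 3 sketch for filtering the template's tab-separated output.
import csv
import io
from urllib.parse import urlparse


def load_results(handle):
    """Read (url, description) pairs from tab-separated lines."""
    rows = []
    for row in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) >= 2 and row[0].strip():
            rows.append((row[0].strip(), row[1].strip()))
    return rows


def shortlist(rows, must_contain, max_per_domain=1):
    """Keep rows whose description mentions a term, capped per domain."""
    term = must_contain.lower()
    seen = {}      # domain -> number of rows already kept
    picked = []
    for url, desc in rows:
        if term not in desc.lower():
            continue
        domain = urlparse(url).netloc.lower()
        if seen.get(domain, 0) >= max_per_domain:
            continue
        seen[domain] = seen.get(domain, 0) + 1
        picked.append((url, desc))
    return picked


# Made-up sample rows in the same URL<TAB>description layout.
sample = io.StringIO(
    "https://example.com/a\tBig data trends for marketers\n"
    "https://example.com/b\tMore big data trends\n"
    "https://example.org/c\tA guide to cloud storage\n"
)
rows = load_results(sample)
for url, desc in shortlist(rows, "big data"):
    print(url, "|", desc)
```

To run it on real output, open the file the template wrote, for example with open("GoogleResults.xls", newline="") as fh, and pass fh to load_results. Capping results per domain is just one possible filter; it keeps a single site from dominating your shortlist.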

Recommended read: Once you know what you want to write, the next step is to promote your piece. It turns out that scraping solves content marketing melancholy as well. Learn how.

While there are a ton of tools for idea generation, why should I scrape, anyway?

Here’s why:

The primary reason to switch to scraping, or at least consider it, is cost. If you think about it, the free versions of tools such as BuzzSumo and Moz limit you to a couple of searches per day, and Ahrefs charges a few dollars for its trial version. As a busy content creator, you are working on multiple articles each day, so the limits on the free versions certainly interrupt your flow. These tools are awesome in their own ways, bearing in mind the insights you gain. But they are insanely expensive.

Why pay for a layer of branding when the fundamentals cost far less? Scraping can get you this information at any volume, with no limit on searches per day. You can run specific requests to gather topics, monitor social media performance, understand influencer engagement, and a lot more at a reasonable cost – as low as $10 per scrape.

To start, you can use the free template, and get in touch with us when you need more specific requests.

Get in touch to get the ideas flowing.

  Nandhini

  Blogger and Community Manager
