How to Generate Content Ideas Using Screaming Frog in 20(ish) Minutes

by Todd McDonald. This article first appeared on Moz.com.

A steady rise in content-related marketing disciplines and an increasing connection between effective SEO and content have made the benefits of harnessing strategic content clearer than ever. However, success isn’t always easy. It’s often quite difficult, as I’m sure many of you know.

A number of challenges must be overcome for success to be realized from end to end, and finding quick ways to keep your content ideas fresh and relevant is invaluable. To help with this facet of developing strategic content, I’ve laid out a process below that shows how a few SEO tools and a little creativity can help you identify content ideas based on actual conversations your audience is having online.

What you’ll need

Screaming Frog: The first thing you’ll need is a copy of Screaming Frog (SF) and a license. Fortunately, it isn’t expensive (around $150 USD per year), and there are a number of tutorials available if you aren’t familiar with the program. After you’ve downloaded and set it up, you’re ready to get to work.

Google AdWords Account: Most of you will have access to an AdWords account because you actually run ads through it. If you aren’t active in the AdWords system, you can still create an account and use the tools for free, although the process has gotten more annoying over the years.

Excel/Google Drive (Sheets): Either one will do. You’ll need something to work with the data outside of SF.

Browser: The examples below use Chrome.

The concept

One way to gather ideas for content is to aggregate data on what your target audience is talking about. There are a number of ways to do this, including utilizing search data, but search data lags behind real-time social discussion, and the various tools we have at our disposal as SEOs rarely show the full picture without A LOT of monkey business. In some situations, determining intent can be tricky and require further digging and research. On the flip side, gathering information on social conversations isn’t necessarily that quick either (Twitter threads, Facebook discussions, etc.), and many of the tools that have been built to enhance this process are cost-prohibitive.

But what if you could efficiently uncover hundreds of specific topics, long-tail queries, questions, and more that your audience is talking about, and you could do it in around 20 minutes of focused work? That would be sweet, right? Well, it can be done by using SF to crawl discussions that your audience is having online in forums, on blogs, Q&A sites, and more.

Still here? Good, let’s do this.

The process

Step 1 – Identifying targets

The first thing you’ll need to do is identify locations where your ideal audience is discussing topics related to your industry. While you may already have a good sense of where these places are, expanding your list or identifying sites that match well with specific segments of your audience can be very valuable. In order to complete this task, I’ll utilize Google’s Display Planner. For the purposes of this article, I’ll walk through this process for a pretend content-driven site in the Home and Garden vertical.

Please note, searches within Google or other search engines can also be a helpful part of this process, especially if you’re familiar with advanced operators and can identify platforms with obvious signatures that sites in your vertical often use for community areas; WordPress and vBulletin are two common examples. A few sample queries follow.
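To illustrate, operator-based queries along these lines can surface discussion areas (these example queries are my own, not drawn from the original piece):

    "powered by vBulletin" gardening
    gardening inurl:/forum/
    gardening intitle:"message board"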

Google’s Display Planner

Before getting started, I want to note that I won’t be going deep on how to use the Display Planner, both for the sake of time and because a number of resources already cover the topic. I highly suggest some background reading if you’re not familiar with it, or at least some brief hands-on experimenting.

I’ll start by looking for options in Google’s Display Planner by entering keywords related to my website and the topics of interest to my audience. I’ll use the single word “gardening.” In the screenshot below, I’ve selected “individual targeting ideas” from the menu mid-page, and then “sites.” This allows me to see specific sites the system believes match well with my targeting parameters.

[Screenshot: Google Display Planner results with “individual targeting ideas” and “sites” selected]

I’ll then select a top result to see a variety of information tied to the site, including demographics and main topics. Notice that I could refine my search results further by utilizing the filters on the left side of the screen under “Campaign Targeting.” For now, I’m happy with my results and won’t bother adjusting these.

Step 2 – Setting up Screaming Frog

Next, I’ll take the website URL and open it in Chrome.

Once on the site, I need to first confirm that there’s a portion of the site where discussion is taking place. Typically, you’ll be looking for forums, message boards, comment sections on articles or blog posts, etc. Essentially, any place where users are interacting can work, depending on your goals.

In this case, I’m in luck. My first target has a “Gardening Questions” section that’s essentially a message board.

[Screenshot: the target site’s “Gardening Questions” section, a message board of user questions]

A quick look at a few of the thread names shows a variety of questions being asked and a good number of threads to work with. The specific parameters around this are up to you — just a simple judgment call.

Now for the fun part — time to fire up Screaming Frog!

I’ll utilize the “Custom Extraction” feature found here:

Configuration → Custom → Extraction

…within SF. (The Screaming Frog site documents this feature and its broader use cases in more detail.) Utilizing Custom Extraction will allow me to grab specific text (or other elements) off of a set of pages.

Configuring extraction parameters

I’ll start by configuring the extraction parameters.

[Screenshot: Screaming Frog custom extraction settings]

In this shot, I’ve opened the custom extraction settings and set the first extractor to XPath. I need multiple extractors set up because multiple thread titles on the same URL need to be grabbed. You can simply cut and paste the code into the next extractors, but be sure to update the number sequence (outlined in orange) at the end to avoid grabbing the same information over and over; the sample extractors below illustrate this.

Notice as well, I’ve set the extraction type to “extract text.” This is typically the cleanest way to grab the information needed, although experimentation with the other options may be required if you’re having trouble getting the data you need.
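To make the setup concrete, here’s a purely hypothetical illustration. If each thread title were a link inside a list, the first few extractors might look like this (the class name and structure are invented; your target site’s markup will differ):

    Extractor 1 | XPath | //ul[@class='thread-list']/li[1]/a | Extract Text
    Extractor 2 | XPath | //ul[@class='thread-list']/li[2]/a | Extract Text
    Extractor 3 | XPath | //ul[@class='thread-list']/li[3]/a | Extract Text

Only the trailing li[...] index changes from one extractor to the next; that’s the number sequence to keep updating.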

Tip: As you work on this, you might find you need to grab different parts of the HTML than you first thought. Getting things dialed in can take some trial and error (more on this below).

Grabbing XPath code

To grab the actual extraction code we need (visible in the middle box above):

  1. Use Chrome
  2. Navigate to a URL with the content you want to capture
  3. Right-click on the text you’d like to grab and select “inspect” or “inspect element”

[Screenshot: right-clicking the desired text in Chrome and selecting “Inspect”]

Make sure you see the text you want highlighted in the code view, then right-click it and select Copy → Copy XPath (you can use other options, but I recommend reviewing the SF documentation mentioned above first).

[Screenshot: copying the XPath from Chrome’s developer tools]

It’s worth noting that many times, when you’re trying to grab the XPath for the text you want, you’ll actually need to select the HTML element one level above the text selected in the front-end view of the website (step three above).
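As a hypothetical illustration of that point, Chrome’s Copy XPath might hand you a path ending in an inner element such as a span, when what you actually want is the link above it (these paths are invented for the example):

    Copied from the text:    /html/body/div[2]/ul/li[1]/a/span
    One level up (the link): /html/body/div[2]/ul/li[1]/a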

At this point, it’s not a bad idea to run a very brief test crawl to make sure the desired information is being pulled. To do this:

  1. Start the crawler on the URL of the page where the XPath information was copied from
  2. Stop the crawler after about 10–15 seconds, navigate to the “Custom” tab of SF, and set the filter to “Extraction” (or something different if you adjusted the naming in some way). Then look for data in the extractor fields (scroll right). If this is done right, you’ll see the text you wanted to grab next to one of the first URLs crawled. Bingo.

[Screenshot: extraction data appearing under Screaming Frog’s “Custom” tab]

Resolving extraction issues & controlling the crawl

Everything looks good in my example, on the surface. What you’ll likely notice, however, is that there are other URLs listed without extraction text. This can happen when the code is slightly different on certain pages, or when SF moves on to other sections of the site. I have a few options to resolve this issue:

  1. Crawl other batches of pages separately, walking through this same process but with adjusted XPath code taken from one of the other URLs.
  2. Switch to using regex or another option besides XPath to help broaden parameters and potentially capture the information I’m after on other pages (see the sketch after this list).
  3. Ignore the pages altogether and exclude them from the crawl.

In this situation, I’m going to exclude the pages I can’t pull information from based on my current settings and lock SF onto the content I want. This may be another point of experimentation, but it doesn’t take much experience to get a feel for the direction you’ll want to go if the problem arises.
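For completeness, here’s a sketch of what option two might look like: a regex extractor with a single capturing group, built on the hypothetical assumption that the thread links share a class name:

    <a class="thread-title"[^>]*>(.*?)</a>

Screaming Frog returns the contents of the first capturing group, so a pattern like this can pull title text even when the surrounding page structure varies.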

In order to lock SF to URLs I would like data from, I’ll use the “include” and “exclude” options under the “configuration” menu item. I’ll start with include options.

[Screenshot: Screaming Frog’s include configuration]

Here, I can configure SF to only crawl specific URLs on the site using regex. In this case, what’s needed is fairly simple: I just want to include anything in the /questions/ subfolder, which is where I originally found the content I want to scrape. One parameter is all that’s required, and it happens to match the form of the example given within SF ☺.
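As an illustrative example of the form (the real pattern would be built on the site’s actual URL structure):

    .*/questions/.*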

The “excludes” are where things get slightly (but only slightly) trickier.

During the initial crawl, I took note of a number of URLs that SF was not extracting information from. In this instance, these pages are neatly tucked into various subfolders. This makes exclusion easy as long as I can find and appropriately define them.

[Screenshot: Screaming Frog’s exclude configuration]

In order to cut these folders out, I’ll add a line to the exclude filter for each one. Upon further testing, I discovered a few additional folders that needed to be excluded as well.
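Each exclude is a one-line regex of the same shape as the include above. With invented subfolder names standing in for the real ones, they’d look like this:

    .*/blogs/.*
    .*/members/.*
    .*/photos/.*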

It’s worth noting that you don’t HAVE to work through this part of configuring SF to get the data you want. If SF is let loose, it will crawl everything within the start folder, which would also include the data I want. The refinements above are far more efficient from a crawl perspective and also lessen the chance I’ll be a pest to the site. It’s good to play nice.

Completed crawl & extraction example

Here’s how things look now that I’ve got the crawl dialed in:

[Screenshot: the dialed-in crawl, with extraction data populated for the included URLs]

Now I’m 99.9% good to go! The last crawl configuration step is to reduce speed to avoid negatively impacting the website (or getting throttled). This can easily be done by going to Configuration → Speed and reducing the maximum number of threads and the URI/s (URLs requested per second). I usually stick with something at or under 5 threads and 2 URI/s.

Step 3 – Ideas for analyzing data

After the end goal is reached (run time, URIs crawled, etc.), it’s time to stop the crawl and move on to data analysis. There are a number of ways to start breaking apart the information grabbed; for now, I’ll walk through one approach with a couple of variations.

Identifying popular words and phrases

My objective is to help generate content ideas and identify words and phrases that my target audience is using in a social setting. To do that, I’ll use a couple of simple tools to help me break apart my information:

  • Tagcrowd.com
  • Online-Utility.org (Text Analyzer)
  • Excel or Google Sheets

The top two perform text analysis, and some of you may already be familiar with the basic word-cloud generating abilities of Tagcrowd.com. Online-Utility won’t pump out pretty visuals, but it provides a helpful breakout of common 2- to 8-word phrases, as well as occurrence counts on individual words. There are many tools that perform these functions; find the ones you like best if these don’t work!

I’ll start with Tagcrowd.com.

Utilizing Tagcrowd for analysis

The first thing I need to do is export a .csv of the data scraped from SF and combine all the extractor data columns into one. I can then remove blank rows, and after that scrub my data a little. Typically, I remove things like:

  • Punctuation
  • Extra spaces (the Excel “trim” function often works well)
  • Odd characters

Now that I’ve got a clean data set free of extra characters and odd spaces, I’ll copy the column and paste it into a plain text editor to remove formatting. I often use the one online at editpad.org.
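If you’d rather script this cleanup than do it by hand, here’s a minimal Python sketch. The file name, and the assumption that every custom extraction column starts with “Extract,” are mine; adjust both to match your own Screaming Frog export:

    import re
    import pandas as pd

    # Load the Screaming Frog CSV export (the file name here is an assumption).
    df = pd.read_csv("internal_all.csv")

    # Gather every custom extraction column into a single series of titles.
    extractor_cols = [c for c in df.columns if c.lower().startswith("extract")]
    titles = pd.concat([df[c] for c in extractor_cols], ignore_index=True).dropna()

    # Scrub punctuation, odd characters, and extra spaces from each title.
    def scrub(text):
        text = re.sub(r"[^\w\s]", " ", str(text))  # strip punctuation and odd characters
        return re.sub(r"\s+", " ", text).strip()   # collapse extra whitespace

    titles = titles.map(scrub)
    titles = titles[titles != ""]

    # Write one clean title per line, ready to paste into Tagcrowd or similar.
    titles.to_csv("clean_titles.txt", index=False, header=False)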

Working through it by hand, that leaves me with this:

[Screenshot: the cleaned list of thread titles in a plain text editor]

In Editpad, you can easily copy your clean data and paste it into the entry box on Tagcrowd. Once you’ve done that, hit visualize and you’re there.

Tagcrowd.com

[Screenshot: Tagcrowd word cloud generated from the scraped thread titles]

Tagcrowd has a few settings below the entry box that can be edited, such as minimum word occurrence and similar-word grouping. I typically use a minimum word occurrence of 2, as I have in this example, so that there’s some level of frequency and clutter gets cut out. You may set a higher threshold depending on how many words you want to look at.

For my example, I’ve highlighted a few items in the cloud that are somewhat informational.

Clearly, there’s a fair amount of discussion around “flowers,” “seeds,” and the words “identify” and “ID.” While I have no doubt my gardening sample site is already discussing most of these major topics, such as flowers, seeds, and trees, perhaps they haven’t realized how common questions around identification are. This one item could lead to a world of new content ideas.

In my example, I didn’t crawl my sample site very deeply, so my data was fairly limited. Deeper crawling will yield more interesting results, and you’ve likely realized already that crawling during different seasons could highlight topics and issues that are currently important to gardeners.

It’s also interesting that the word “please” shows up. Many would probably ignore this, but to me, it’s likely a subtle signal about the communication style of the target market I’m dealing with. This is polite and friendly language that I’m willing to bet would not show up on message boards and forums in many other verticals ☺. Often, the greatest insights from this type of study, beyond the popular topics themselves, come from a better understanding of the communication style and phrasing your audience uses. All of this information can help you craft your strategy for connection, content, and outreach.

Utilizing Online-Utility.org for analysis

Since I’ve already scrubbed and prepared my data for Tagcrowd, I can paste it into the Online-Utility entry box and hit “process text.”

After doing this, I ended up with the following output:

[Screenshot: Online-Utility output showing common phrases and their frequencies]

[Screenshot: Online-Utility output showing occurrence counts for individual words]

There’s more information available, but for the sake of space, I’ve grabbed only a couple of shots to give you an idea of what you’ll see.

Notice in the first image that the phrases “identify this plant” and “what is this” both show up multiple times in the content I grabbed, further supporting the likelihood that content developed around plant identification is a good idea and something that seems to be in demand.
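If you prefer working locally, the same kind of phrase counting is straightforward to replicate. Here’s a rough Python sketch that reads the cleaned one-title-per-line file produced during the scrubbing step (the clean_titles.txt name carries over from the earlier sketch):

    from collections import Counter

    # Read the cleaned, one-title-per-line file from the scrubbing step.
    with open("clean_titles.txt") as f:
        lines = [line.strip().lower() for line in f if line.strip()]

    # Count 2- and 3-word phrases across all thread titles.
    phrases = Counter()
    for line in lines:
        words = line.split()
        for n in (2, 3):
            for i in range(len(words) - n + 1):
                phrases[" ".join(words[i:i + n])] += 1

    # Print phrases that occur at least twice, most common first.
    for phrase, count in phrases.most_common():
        if count >= 2:
            print(f"{count}  {phrase}")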

Utilizing Excel for analysis

Let’s take a quick look at one other method for analyzing my data.

One of the simplest ways to digest the information is in Excel. After scrubbing the data and combining it into one column, a simple A→Z sort puts the information in a format that helps bring patterns to light.

[Screenshot: the sorted list of user questions in Microsoft Excel]

Here, I can see a list of specific questions ripe for content development! This type of information, combined with data from tools such as keywordtool.io, can help identify and capture long-tail search traffic and topics of interest that would otherwise be hidden.

Tip: Extracting information this way sets you up for very simple promotion opportunities. If you build great content that answers one of these questions, go share it back at the site you crawled! There’s nothing spammy about providing a good answer with a link to more information if the content you’ve developed is truly an asset.

It’s also worth noting that since this site was discovered through the Display Planner, I already have demographic information on the folks who are likely posting these questions. I could also do more research on who is interested in this brand (and likely posting this type of content) utilizing the powerful ad tools at Facebook.

This information allows me to quickly connect demographics with content ideas and keywords.

While intent has proven to be very powerful and will sometimes outweigh misaligned messaging, it’s always great to know as much as possible about who you’re talking to and to be able to cater your messaging to them.

Wrapping it up

This is just the beginning and it’s important to understand that.

The real power of this process lies in its use of simple, affordable tools to gain information efficiently, making it accessible to many on your team and an easy sell to those who hold the purse strings, no matter your organization’s size. For small and mid-size businesses the process is affordable, and for those at the enterprise level it’s far less likely to mean waiting on approval for larger purchases.

What information is gathered and how it is analyzed can vary wildly, even within my stated objective of generating content ideas. All of it can be right. The variations on this method are numerous and allow for creative problem solvers and thinkers to easily gather data that can bring them great insight into their audiences’ wants, needs, psychographics, demographics, and more.

Be creative and happy crawling!

The Once-A-Week SEO Checklist

This article first appeared in the 1-8-13 issue of Website Magazine.

by 

A new year always brings about new possibilities, which are often predicated on the many resolutions we all make to improve our lives and work during the course of the year.

It’s possible that many hardworking webmasters and website owners have resolved to improve or amp up their search engine optimization (SEO) efforts this year to help them find more relevant consumers and increase conversions. However, many of these same Web workers will quickly find themselves faced with the same problems that plagued them in years past, most notably a lack of time in an already busy schedule.

No need to worry, though, because here’s some good news: it’s possible to maintain a healthy SEO campaign by (mostly) conducting a once-a-week check-up that examines the most important elements of your website for moving up the search engine rankings, allowing you to identify and correct any issues you may be having. And the best part is that correcting these larger problems will help improve many other aspects of your overall SEO performance.

Just make sure that you regularly follow a version of this SEO checklist once a week, and get ready to watch the inevitable upward progress of your search marketing efforts.

– Use Google Webmaster Tools to check sitemaps

To start, simply sign into your Google Webmaster Tools account (actually, if you don’t have one, the first step is to register one), which can help you quickly identify any issues with your domain. Primarily, you should use this service to make sure your sitemaps don’t have any errors and to review how many of your pages have been indexed. If you find that you have some missing pages, that’s a pretty good indicator that you need to submit a brand new sitemap.xml to the search engines.

– Don’t forget to look for crawl errors, too

Google Webmaster Tools can also help you spot any crawl errors (pages “not found” or broken links) on your site; if these issues are uncovered, they should be considered top-priority fixes. In addition, this tool can help you check up on your site speed, HTML problems (such as short or duplicate metadata), and links to your site.

– Look for (and fix) broken links

Having a bunch of dead links on your website is going to hurt your standing with the search engines, so you should make it a point to regularly look for them by using a tool like Dead-Links.com to crawl your website and point out any hazardous hyperlinks that you are unaware of. And once you know which links are bad, you can easily fix or get rid of them.

– Tune up title tags

If you’ve put any effort into your SEO until now, every page on your site should have its own unique, descriptive title (as indicated in the HTML <title> tag). But as we all know, the more pages you add to a site, the harder it is to ensure that every page is given an appropriately SEO-friendly title. If you have a somewhat small site, you should be able to check all of your pages manually pretty easily; for larger sites, Google Webmaster Tools will gather and present this information to you in a “Content Analysis” section found under the “Diagnostics” tab.

– Revise meta descriptions (as needed)

Although meta page descriptions don’t have a huge impact on search rankings, they can play a major role in convincing users to click through to your site, so it’s worth giving them a once-over on a regular basis, especially if you add a lot of new pages from week to week. In particular, you should make sure you don’t have any duplicate descriptions on your site. Good descriptions should be between 150 and 160 characters and made up of compelling copy that smartly uses crucial keywords, without quotation marks or other non-alphabet characters.
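If reviewing descriptions page by page gets tedious, a short script can flag outliers in bulk. Here’s a rough Python sketch; the pages.csv file and its column names are assumptions, so match them to whatever crawl export you have on hand:

    import csv

    # Flag pages whose titles or meta descriptions fall outside rough guidelines.
    # The file name and column names are assumptions about your crawl export.
    with open("pages.csv", newline="") as f:
        for row in csv.DictReader(f):
            url = row.get("Address", "")
            title = (row.get("Title") or "").strip()
            desc = (row.get("Meta Description") or "").strip()
            if not title:
                print(f"Missing title: {url}")
            if not 150 <= len(desc) <= 160:
                print(f"Description is {len(desc)} chars (aim for 150-160): {url}")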

– Follow the trends

Using an analytics platform like Google Analytics, check the daily, weekly and long-term search traffic trends to see what users are responding to and what isn’t working. Find out which of your pages have increased search engine traffic and which ones have seen the opposite, and then figure out why that’s the case. Ultimately, you should have a solid starting point for addressing the problems on your site, as well as the opportunities you have to increase search traffic based on user data.

– Add internal links when possible

Search engines use internal links to determine which pages the website owners think are the most important on the site, so to help your rankings and show off your best stuff, look around your site for ways to include links to these power pages. This is especially easy (and important) if you are consistently adding new content.

– Seek out your best search phrases and use them a lot

Thanks to – you guessed it – Google Webmaster Tools, webmasters can now find out what search phrases are leading users to their virtual door. By going to the “Statistics” tab and looking at “search queries,” you’ll see the top 20 search queries that your site appears in, which can help you assess the performance of your current keyword campaigns and maybe even discover a few new ones you hadn’t thought of. With this information in tow, you can use TrafficZap’s keyword density tool to receive a report on the words and phrases that appear most densely on the page at the URL you enter; this will help you figure out just how well you’re using your keywords and phrases on your site, and make adjustments accordingly.

TOP POSTS OF 2012 @ Website Magazine

Following are the top articles posted on Website Magazine in 2012.

:: 15 Fresh (and Free) Fonts for 2012 

:: Best WordPress Comment Plugins

:: The BIGGEST Mistakes in Web Design

:: New Study Reveals Top Google Ranking Factors

:: Here’s How jQuery is Creating the Interest in Pinterest

:: Google Flip-flops on Page Layout

:: Get Started with Parallax Scrolling

:: 10 Minimalist WP Themes for Maximum Inspiration

:: Google SEO: Algorithm Changes – February 2012

:: 8 Ways to Improve Your Site Over the Weekend

:: 10 Mind-Blowing jQuery Plugins for Developers

:: Crafting an SEO-Friendly Facebook Page

:: SEO Meta Data Mechanics: Titles & Descriptions

:: Google SEO: 52 New Changes to Know

:: Here’s What’s Hot – 13 Super Startups to Watch

:: Give Up the SEO Dream?

:: Getting Wild with Wireframes

:: CSS Frameworks for Responsive Web Design

:: Awe-inspiring Twitter Brand Page Designs and Tips 

:: Getting Started Selling on Amazon

:: The Facebook Timeline Countdown is On

:: 3 Pinterest Plugins for WordPress

:: Pinterest Optimization for Internet Retailers

:: Turn Your Pics Into Profits

:: Loyalty, Reward & Gamification Plugins for WordPress

26 Ways to Use Social Media for Lead Generation

By Debbie Hemley

This article first appeared in Michael Stelzner’s Social Media Examiner.

Is your business looking for leads?

As enticing as the saying is, “If you build it, they will come,” we all know that just because we build a social media presence, people don’t magically start knocking down our door.

Instead, we need to encourage people to come to our social pages and once they’re there, we have to create enough value for them to hang around. And through these repeated exchanges, casual users can become regular visitors as well as valuable leads.

In previous posts, I’ve written A-Z guides to help create the absolute best presence on Twitter, Facebook, LinkedIn and blogs. Now let’s turn our attention to harnessing the power of those efforts for lead generation.

#1: Assets

As part of your social media marketing plan, Michelle deHaaff suggests that companies examine social media and online assets to see what they can leverage for full social media engagement. She identifies seven key assets: location, people, stories, images, video, audio and words to help us think about engaging more fully.

Read More . . .


4 Winning Strategies for Social Media Optimization

by Jim Tobin

This article first appeared on MASHABLE.

Jim Tobin is president of Ignite Social Media, a leading social media agency, where he works with clients including Microsoft, Intel, Nike, Nature Made, The Body Shop, Disney and more, implementing social media marketing strategies. He is also the author of the book Social Media is a Cocktail Party. Follow him on Twitter @jtobin.

Social media optimization (SMO) is the process by which you make your content easily shareable across the social web. Because so many options exist for where people can view your content, the content model for the web has shifted from, “We have to drive as much traffic to our website as possible,” to the more pragmatic, “We have to ensure as many people see our content as possible.”

You’ll still want most people to see your content on your site — and if you’re doing it right they will — but helping people view content through widgets, apps and other social media entry points will accrue positive benefits for your brand. The more transportable you can make your content, the better.

If you’re ready to get started with a social media optimization plan for your organization, read on for an overview.


Why Social Media Optimization Matters


Before we get to the practical, let’s start with the “why,” as in “Why should you care about SMO?” As you can see from the chart below, social networks are driving an increasing amount of traffic to an increasing number of websites. Sites like Comedy Central, Forever 21 and Etsy are seeing more traffic from social networks than they see from Google. How social referral traffic is performing for you most likely depends on two factors:

1. How interesting your content is; and

2. How easily shareable you have made that content across a variety of networks.

 

[Chart image. Image credit: Gigya]

In other words, SMO can lead to increased traffic to your site, as friends encourage their friends to digest specific content. If you can appeal to a given person, their friends are statistically more likely to be interested in the same thing, so you’re likely reaching a well-targeted audience.  Further, it also leads to improved search engine optimization, as major search engines count links as if they were votes for your site.

SMO isn’t just about building a bigger social media presence for your brand. Whether or not your organization has a strong social network presence, the social networks of others can be leveraged to great effect.

Read more . . .


Forget Community. Forget Conversation. Business Blogging Is About SEO.

By Rick Burnes

This article originally appeared on HubSpot.

If you don’t blog, you’re probably tired of people telling you why you should.

The blog-pushers who insist it’s a great way to create a community around your product.

The evangelists who argue blogging is a great way to create conversation.

The practical folks who tell you blogging is a better way to publish your press releases.

You don’t dispute any of this. You just find it wishy-washy.

Your business is a data-driven machine. You live and die by leads and sales. You don’t have time for unmeasurable, time-consuming concepts like community and conversation.

Fine.

Forget community. Forget conversation. There’s a far simpler, far more measurable reason to blog: search engine rankings.

If you publish a regularly updated, well-written blog on your company’s site, it will show up more often in search engine results.

Most marketers miss this. They focus on the sexier social, networking and thought-leadership aspects of blogging. These are all very important reasons to blog (you can’t really forget community and conversation), but they’re complicated to measure.

Great search engine ranking is easier to measure. Just consider how much you’d have to pay to get equivalent ranking on a pay-per-click basis.

If you write a post about your fantastic windmill consulting firm and it shows up in the search results for “new windmills,” your blog will get lots of new traffic and leads that you’d otherwise have to pay for.

This blog is another great example. It drives three times as much traffic from Google to HubSpot as HubSpot’s traditional company site. To purchase the same kind of traffic (and the leads that come with it) we’d have to pay Google millions.

Think about that — our blog is giving us millions of dollars worth of free advertising and generating leads we can count.

There’s nothing wishy-washy about that.