Core Web Vitals only shows a sampling of URLs?

The Coverage section of Google Search Console shows 206K valid URLs, of which 174K are submitted and indexed, nearly all of which are Q&A pages. The remaining 32K are indexed, but not submitted in sitemap.

However, the Core Web Vitals section only shows data on 28K URLs. In the Enhancements section, it says there are only 27K valid Q&A items.

What happened to the other 150K URLs?

Googlebot uses cookies

Google has always said that none of its bots use cookies.

I caught a screenshot in Google PageSpeed Insights that displayed a modal that I only show to users who already have a specific cookie set.

So there you have it, Googlebot uses cookies.

Get paid to post in the forums

Many years ago, DaniWeb offered a pay-to-post system where each post was awarded a monetary value (typically between 5 and 50 cents), depending upon how in-depth it was, upvotes, etc.

Members could cash out once a month if they earned $10 or more from their posts.

Very few members took us up on this opportunity, and we ultimately ended the program.

I'm debating whether we should start it up again. Does anyone still think it's a good idea, 10 years later?

Cost for DaniWeb Premium

DaniWeb Premium is currently $5 per month. You can find details about everything it gives you here.

Do you think this is too high a price point? At what price point would you be willing to pay? What if it were $1.99? What if it were $3?

Blocked by robots.txt in GSC going down

The number of known pages in Google Search Console blocked by robots.txt in the Coverage report just recently started going down. Over the course of this month, it went down from about 400K pages to 200K pages.

There have been no changes to the robots.txt file in 6+ months, nor any big structural changes, 404'd pages, etc. in that time.

What would cause this number to go down on its own?

favicon.ico is showing up as a soft 404

favicon.ico is showing up as a Soft 404 in my GSC Coverage report. I can't imagine blocking it with robots.txt, because bots might want to access it from time to time. Suggestions? Or should I just ignore it?

AMP version of noindexed page

If a page is noindexed, but includes a meta reference to an AMP version, is this a mixed signal?

Is the AMP version checked to see if it's noindexed as well? Are there situations where you would want to index the AMP version but not the desktop version? Is Google even willing to index an alternate version of a URL when the primary version is noindexed?

Correlation between new content and traffic

The new Crawl Stats report in Google Search Console shows a breakdown of how much Googlebot recrawls existing content to refresh its index, and how much of its crawling is discovery of new content.

Has anyone been working on increasing the rate of new content and seeing that correlate to a linear increase in traffic?

Markdown strict mode

Just a little notification that our markdown parser is now in strict mode. That means that, when posting headings, there has to be a space after the initial hash symbol. In other words:

#This won't work

# This will work

Hopefully this will stop #include<iostream> from showing up as a heading for everyone who doesn't properly indent their code.
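For anyone curious how the strict rule plays out, here is a minimal sketch in Python. It assumes the rule is similar to CommonMark-style ATX headings (one to six hash symbols followed by whitespace or the end of the line). The `is_heading` helper and the regex are illustrative only and are not the forum's actual parser.

```python
import re

# Assumption: strict mode resembles CommonMark ATX headings, i.e. 1-6 hash
# symbols that must be followed by a space/tab or by the end of the line.
STRICT_HEADING = re.compile(r"^#{1,6}([ \t]+|$)")


def is_heading(line: str) -> bool:
    """Return True if the line would be treated as a heading in strict mode."""
    return STRICT_HEADING.match(line) is not None


if __name__ == "__main__":
    for sample in ["#This won't work", "# This will work", "#include<iostream>"]:
        print(repr(sample), "->", is_heading(sample))
```

Under this rule, only the line with a space after the hash is recognized as a heading, so an unindented C++ include line stays plain text.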


Googlebot crawling AMP pages

Googlebot is crawling my AMP pages more than it is crawling my desktop pages. I have the appropriate canonical from AMP to desktop and amphtml from desktop to AMP. The desktop version also has a self-referencing canonical. Only canonical pages are in the sitemap.
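To make that setup concrete, here is a rough Python sketch that checks the pairing described above: the desktop page has a self-referencing canonical plus an amphtml link, and the AMP page has a canonical pointing back to the desktop URL. The helper names (`link_rels`, `is_paired`) and the example.com URLs are hypothetical, and this only inspects `<link>` tags in the HTML, not HTTP headers.

```python
from html.parser import HTMLParser


class LinkRelParser(HTMLParser):
    """Collects the href of <link rel="canonical"> and <link rel="amphtml">."""

    def __init__(self):
        super().__init__()
        self.rels = {}

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        rel = (attr.get("rel") or "").lower()
        if rel in ("canonical", "amphtml") and attr.get("href"):
            self.rels[rel] = attr["href"]


def link_rels(html: str) -> dict:
    """Return a dict such as {"canonical": url, "amphtml": url} for one page."""
    parser = LinkRelParser()
    parser.feed(html)
    return parser.rels


def is_paired(desktop_url: str, desktop_html: str, amp_url: str, amp_html: str) -> bool:
    """Check the desktop <-> AMP link relationship in both directions."""
    desktop = link_rels(desktop_html)
    amp = link_rels(amp_html)
    return (
        desktop.get("canonical") == desktop_url  # self-referencing canonical
        and desktop.get("amphtml") == amp_url    # desktop points to AMP
        and amp.get("canonical") == desktop_url  # AMP points back to desktop
    )
```

Running `is_paired` over a sample of URL pairs is one way to rule out a markup mistake before looking at crawl behavior itself.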

This is a concern because less than 10% of our traffic is from mobile devices (unique, I know), yet AMP pages account for more than 50% of our crawl budget.

The one thing we do, and I'm not sure whether it's appropriate, is that whenever desktop page 1 links to internal page 2, the AMP version of page 1 links to the AMP version of page 2. As a result, there are internal links pointing to AMP pages, but only from other AMP pages.

The other thing I was wondering is whether anyone has heard of Google serving AMP pages to desktop users behind low bandwidth connections, where they could benefit from AMP. Supposedly AMP doesn't have to be for only mobile anymore, but Google hasn't really demonstrated this. We don't get a lot of mobile traffic, but we do get a lot of third-world / low-bandwidth traffic.

How to remove content from Google?

I am trying to remove an entire folder of thin content from Google to help me recover from a Panda/EAT-related penalty. I want to keep the content on the site for the benefit of users, but not waste crawl budget or have Google think that we have so many pages of thin content.

I added the folder to robots.txt quite a few months ago. While some pages are showing up as "Blocked by robots.txt", the majority of pages now show up in my coverage report as "Indexed, though blocked by robots.txt". About 2 months ago, I submitted a removal request for all URLs that begin with the prefix, but there's been no change. Google Search Console's report updates every few days, but the number of URLs that say, "Indexed, though blocked by robots.txt" is increasing, even months after the removal request for those same pages.

Googlebot ignores robots.txt

I'm noticing that Googlebot is not respecting my robots.txt. I'm seeing Googlebot's user agent crawl pages that have been disallowed in my robots.txt file for many months. Some of them are showing up in GSC as "Indexed, though blocked by robots.txt" with Last crawled dates as recent as yesterday.

Additionally, I'm seeing Googlebot crawl my robots.txt file a few times a day, and the URLs are definitely blocked per the Google robots.txt tester.

My robots.txt is in the following format:

Sitemap: ...

User-agent: *

# ...

Disallow: ...
Disallow: ...
etc. ~ 40 lines

# ...

Disallow: ...
Disallow: ...
etc. ~ 60 lines

# ...

Disallow: ...
Disallow: ...
etc. ~ 20 lines
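As a way to double-check the rules outside of Google's tester, here is a small sketch using Python's standard `urllib.robotparser`. The sample file, the paths in it, and the `is_blocked` helper are all made up for illustration, since the real rules are elided above. Note that Python's parser is not Google's parser and may interpret some files differently. In particular, it treats a blank line as the end of a user-agent group, so the sample keeps the `User-agent` line and its `Disallow` lines contiguous.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration; the real Disallow paths are not
# shown in the post. The group is kept contiguous because Python's parser
# treats a blank line as the end of a user-agent group.
SAMPLE_ROBOTS = """\
Sitemap: https://example.com/sitemap.xml

User-agent: *
# hypothetical section
Disallow: /thin-content/
Disallow: /search
"""


def is_blocked(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the given user agent may not fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "https://example.com/thin-content/page-1",
        "https://example.com/forum/topic",
    ):
        print(url, "blocked:", is_blocked(SAMPLE_ROBOTS, url))
```

If this check and Google's tester disagree for the same file, a difference in how blank lines or rule grouping are interpreted is one thing worth ruling out.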

Archived DaniWeb

Ever wonder what DaniWeb was like way back in the day?

Well now you don't have to guess.

Forums and tags have a new filter in the dropdown to list all Archived topics.