A SCRAPBOOK OF SOLUTIONS FOR THE PHOTOGRAPHER
Enhancing the enjoyment of taking pictures with news that matters, features that entertain and images that delight. Published frequently.
22 July 2025
It didn't occur to us that the same issue that killed our bandwidth check on the Sonic server would also affect the cron that updated our site map. That issue was Sonic's cessation of shell services.
We had figured out a way to report our bandwidth usage by reading the Apache log remotely, a task we could cron on our own system. But we neglected to do the same thing for generating our site map.
That's used for more efficient indexing of the site by services like Google.
So, in addition to refining our .htaccess and robots.txt files (and making sure they are compatible with the older version of Apache that Sonic runs), we decided to update our site map generation.
Our old cron on the Sonic server had been running a Perl program Google had distributed long ago. It never failed us but, with nearly 10,000 stories on the site, it was generating a file that was much too large for Google.
Our first thought was to get modern. So we hunted for a free script written in PHP that would generate the site map. We could store it on the site, bookmark it and run it when necessary or, like our bandwidth check, from a cron on our system.
We tried a few of them. They all had problems. Some couldn't correctly expand a relative URL to an absolute one. Others aborted after about a thousand files. Some couldn't divine the modification date. And they all insisted on adding two nonsense fields to the XML site map entries (priority, which they all set to 1, and change frequency, which they all set to daily).
After a while, we felt like we were correcting student assignments, debugging these things.
A site map is simply a list of the URLs that compose a site. Reporting the last modification date for each of them helps whatever is crawling the site find the new stuff and skip the old stuff. So that should allow smarter indexing requiring less bandwidth.
A win-win, in short.
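For reference, an entry in such a site map follows the sitemaps.org protocol and needs only a location and a last-modified date. The URL below is a placeholder, not an actual Photo Corners path, and the optional priority and change-frequency fields are omitted, since (as noted above) they carry no useful information:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/archive/story.html</loc>
    <lastmod>2025-07-22</lastmod>
  </url>
</urlset>
```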
Well, we know how to list files recursively in Perl, so we wrote a very short program to do that for the Photo Corners site. And we know how to find the modification date of a file in Perl and how to format that date, too. And once we had that data, it was simple enough (in Perl) to format it as XML.
Before we knew it, we had a Perl program of about 40 lines (with comments) to create a site map. The Google Perl program we had used was over 3,000 lines, most of which was never executed for our site.
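The actual program is in Perl and isn't reproduced here, but the approach it describes (walk the document tree, read each file's modification date, emit XML) can be sketched in a few lines of Python. The directory layout, file filter and base URL below are assumptions for illustration:

```python
import os
import time

def generate_sitemap(doc_root, base_url):
    """Walk doc_root recursively and emit one <url> entry per HTML file."""
    entries = []
    for dirpath, _, filenames in os.walk(doc_root):
        for name in sorted(filenames):
            if not name.endswith(".html"):  # assumed filter; a real site may need more
                continue
            path = os.path.join(dirpath, name)
            # The path relative to the document root becomes the absolute URL.
            rel = os.path.relpath(path, doc_root).replace(os.sep, "/")
            # Last modification date, formatted as a W3C date (YYYY-MM-DD).
            lastmod = time.strftime("%Y-%m-%d", time.gmtime(os.path.getmtime(path)))
            entries.append(
                "  <url>\n"
                f"    <loc>{base_url}/{rel}</loc>\n"
                f"    <lastmod>{lastmod}</lastmod>\n"
                "  </url>"
            )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        + "\n".join(entries)
        + "\n</urlset>\n"
    )
```

Because the file system already tracks modification dates, no database or configuration is needed; the whole job reduces to a directory walk and some string formatting, which is why the program stays so short.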
We'll have to see how Google and other legitimate indexers accept the new site map (the new software actually generates three of them to stay within size limits). But we're confident we can address any issues in our efficient little solution.
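The sitemaps.org protocol provides a standard way to tie multiple site map files together: a site map index file that lists them. Whether Photo Corners submits the three files individually or via an index isn't stated; the filenames below are hypothetical:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://www.example.com/sitemap1.xml</loc>
  </sitemap>
  <sitemap>
    <loc>https://www.example.com/sitemap2.xml</loc>
  </sitemap>
  <sitemap>
    <loc>https://www.example.com/sitemap3.xml</loc>
  </sitemap>
</sitemapindex>
```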