Archived Forum Post
Question:
and it doesn't help to issue spider.put_AvoidHttps(False) before crawling
I did not find a problem.
Here's my simple C++ test program:
#include <stdio.h>
#include <CkSpider.h>

void spiderTest(void)
{
    CkSpider spider;

    const char *url = "http://www.chilkatsoft.com/crawlStart.html";
    const char *domain = "www.chilkatsoft.com";

    // Set the domain to crawl and seed the queue with the start URL.
    spider.Initialize(domain);
    spider.AddUnspidered(url);
    spider.put_CacheDir("c:/aaworkarea/spiderCache");

    // Begin crawling the site by calling CrawlNext repeatedly.
    int total = 0;
    for (int i = 0; i < 10; i++)
    {
        bool success = spider.CrawlNext();
        if (success)
        {
            total++;
            if (spider.get_LastFromCache())
            {
                printf("Downloaded from cache: %s\n", spider.lastUrl());
            }
            else
            {
                printf("Downloaded from Internet: %s\n", spider.lastUrl());
                // Pause for one second between fetches from the Internet.
                spider.SleepMs(1000);
            }
        }
        else
        {
            // CrawlNext failed: either the queue is empty or an error occurred.
            if (spider.get_NumUnspidered() == 0)
            {
                printf("No more URLs to spider\n");
            }
            else
            {
                printf("%s\n", spider.lastErrorText());
            }
            break;
        }
    }
}
I have the same problem: I can't crawl the following webpage: https://naxom.se/
What could be the problem? HTTPS, or the link structure?
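To narrow down whether HTTPS or the domain setup is at fault, it may help to inspect the start URL before handing it to the spider. The sketch below is plain C++ and does not call Chilkat at all. The names splitUrl, hostMatchesDomain and checkExamples are hypothetical helpers made up for illustration. It also assumes, which this thread does not confirm, that the host in the start URL should match the domain string passed to Initialize, as it does in the test program above.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper types and functions; none of these are Chilkat APIs.
struct UrlParts {
    std::string scheme; // e.g. "https"; empty if the URL has no "://"
    std::string host;   // e.g. "naxom.se"
};

// Split a URL into scheme and host. Deliberately simple: it does not handle
// user-info ("user@host") or bracketed IPv6 hosts.
UrlParts splitUrl(const std::string &url)
{
    UrlParts parts;
    std::string::size_type hostStart = 0;
    const std::string::size_type schemeEnd = url.find("://");
    if (schemeEnd != std::string::npos) {
        parts.scheme = url.substr(0, schemeEnd);
        hostStart = schemeEnd + 3;
    }
    // The host ends at the first port, path, query, or fragment delimiter.
    const std::string::size_type hostEnd = url.find_first_of(":/?#", hostStart);
    parts.host = (hostEnd == std::string::npos)
        ? url.substr(hostStart)
        : url.substr(hostStart, hostEnd - hostStart);
    return parts;
}

// Lower-case a copy of the string so host names compare case-insensitively.
static std::string toLower(std::string s)
{
    for (char &c : s) {
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    }
    return s;
}

// Assumption: the start URL's host should equal the domain given to Initialize.
bool hostMatchesDomain(const std::string &url, const std::string &domain)
{
    return toLower(splitUrl(url).host) == toLower(domain);
}

// Checks against the two URLs mentioned in this thread.
void checkExamples()
{
    assert(splitUrl("https://naxom.se/").scheme == "https");
    assert(hostMatchesDomain("https://naxom.se/", "naxom.se"));
    assert(!hostMatchesDomain("https://naxom.se/", "www.naxom.se"));
    assert(hostMatchesDomain("http://www.chilkatsoft.com/crawlStart.html",
                             "www.chilkatsoft.com"));
}
```

If the scheme comes back as "https" and the host matches the domain string, the next thing to look at would be the text returned by spider.lastErrorText() after a failed CrawlNext, as the test program above already prints it.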