Question:
and it doesn't help to call spider.put_AvoidHttps(False) before crawling.
I did not find a problem.
Here's my simple C++ test program:
#include <stdio.h>
#include <CkSpider.h>

void spiderTest(void)
{
    CkSpider spider;

    const char *url = "http://www.chilkatsoft.com/crawlStart.html";
    const char *domain = "www.chilkatsoft.com";

    spider.Initialize(domain);
    spider.AddUnspidered(url);

    // Cache downloaded pages on disk so repeat runs can read them locally.
    spider.put_CacheDir("c:/aaworkarea/spiderCache");

    // Begin crawling the site by calling CrawlNext repeatedly.
    int i, total;
    total = 0;
    for (i = 0; i < 10; i++) {
        bool success;
        success = spider.CrawlNext();
        if (success == true) {
            total++;
            if (spider.get_LastFromCache()) {
                printf("Downloaded from cache: %s\n", spider.lastUrl());
            }
            else {
                printf("Downloaded from Internet: %s\n", spider.lastUrl());
                // Pause between live requests to avoid hammering the server.
                spider.SleepMs(1000);
            }
        }
        else {
            // CrawlNext failed: either the queue is empty or an error occurred.
            if (spider.get_NumUnspidered() == 0) {
                printf("No more URLs to spider\n");
            }
            else {
                printf("%s\n", spider.lastErrorText());
            }
            break;
        }
    }
}
I have the same problem. I can't crawl the following webpage: https://naxom.se/
What could be the problem: HTTPS or the link structure?