THEwikiStics - Wikimedia statistics FAQ


THEwikiStics FAQ

...where you can find Frequently Asked Questions about THEwikiStics and midom's logs!

Question: What do these stats show?
- Answer: The number of accesses [hits] on wiki pages served by the Wikimedia Squid servers!
Question: Where does the data come from?
- Answer: As raw data I use midom's logs (a filtered copy of the Wikimedia Squid access-log stream)!
Question: How accurate/reliable are your figures at THEwikiStics?
- Answer: The Wikimedia developers let the Wikimedia Squid servers handle as much traffic as possible to spare their scarce database servers. All page views [HTML accesses] are counted (in particular, visitors who are not logged in (anonymous; IPs) always get their HTML pages from the Squids; see also de:Squid#Beispiel).
- Answer: Presumably, the numbers are somewhat too high, as all Squid accesses are counted, including those of wiki bots/scripts, before/after views of editors/patrollers and probably also web crawlers (spiders), spammers and defective scripts! Furthermore, page reloads/refreshes and hard redirects [see #5] are counted additionally [surplus]!
  Note that the background noise is about 1 hit per page (see vowiki)!
- This means that the figures do not show the exact number of impressions ("human page views"), let alone the number of unique visitors, but just the raw number of direct page accesses! Also be aware that the Squid access-log stream of heavily loaded servers is not the most accurate data source out there [see #10]!
Question: Why is the secondary sort order Z to A?
- Answer: This is a side effect of the descending numerical sort: pages with the same number of hits end up in reverse alphabetical order as well. To get them from A to Z, everything would have to be sorted once more.
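The actual THEwikiStics scripts are not shown here; this is only a minimal Python sketch of the effect, assuming each row is a (title, hits) pair. Sorting descending on the whole (hits, title) key also reverses the titles within a tie (Z to A), while negating only the hit count keeps ties A to Z in a single pass. Both function names are hypothetical.

```python
def sort_z_to_a(rows):
    # Descending on (hits, title): rows with equal hits come out in reverse title order (Z to A).
    return sorted(rows, key=lambda row: (row[1], row[0]), reverse=True)

def sort_a_to_z(rows):
    # Negate only the numeric key: hits descending, titles ascending (A to Z) within ties.
    return sorted(rows, key=lambda row: (-row[1], row[0]))

rows = [("Apple", 5), ("Zebra", 5), ("Mango", 9)]
print(sort_z_to_a(rows))  # [('Mango', 9), ('Zebra', 5), ('Apple', 5)]
print(sort_a_to_z(rows))  # [('Mango', 9), ('Apple', 5), ('Zebra', 5)]
```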
Question: Why are soft redirects (#REDIRECT) listed independently?
- Answer: All wiki redirects (#REDIRECT) are cached on the Squid servers as separate pages; wiki redirects are individual pages! Web browsers do not follow these "soft" redirects, as they are just copies of the target page made by the MediaWiki software! When you access the URL of a wiki redirect, the address bar of your browser shows the URL of the soft redirect itself; only one hit is made/counted then! So, should I merge the split results? No, this would be hard work, and it is good to know under which title a page has been accessed! If this bothers you, resolve those annoying redirects ;-)
Question: Are hard redirects (e.g. forced capitalization) counted individually, too?
- Answer: Yes, but unlike soft redirects, these are followed by web browsers! "Hard" redirects are used, for example, to enforce case sensitivity; here the redirect is issued by the Squid/web server (e.g. with a 301 HTTP status code). This means your browser is told to access another, correct page, and your address bar history will show two different URLs. Every hit on "/wiki/foo bar" will result in another hit on "/wiki/Foo_bar" (though not necessarily with a capital "F" if your wiki allows lowercase first characters, as the Wiktionaries do)! If your wiki runs in uppercase mode, the figures for "Foo_bar" are correct, while the numbers for "foo bar" are good to know but redundant. By the way: the figures for the extra "foo%20bar" log entry ("%20" is the URL encoding of " " and, like others, is turned into "_" by the server) are ignored; only the resulting "Foo_bar" (or "foo_bar") log entry is correct (assuming normal web browser traffic). Moreover, special pages like Special:Random and Special:Search that redirect to localized aliases are also counted surplus in the source logs this way.
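To see which log entries are just the "before" half of a hard redirect, one can compute the canonical title a request would be redirected to. This is only a rough Python approximation of MediaWiki's title normalization, assuming three rules (decode "%XX" escapes, spaces to underscores, optional first-letter capitalization); it ignores further rules such as whitespace collapsing and namespaces, and the function names are hypothetical.

```python
from urllib.parse import unquote

def normalize_title(raw: str, capital_first_letter: bool = True) -> str:
    """Approximate the canonical title a hard redirect would send the browser to."""
    # Decode percent-escapes such as "%20" (a space).
    title = unquote(raw)
    # Titles in /wiki/ URLs use underscores instead of spaces.
    title = title.replace(" ", "_")
    # Wikis in uppercase mode force the first character to upper case;
    # wikis such as the Wiktionaries keep it as typed.
    if capital_first_letter and title:
        title = title[0].upper() + title[1:]
    return title

def causes_hard_redirect(raw: str, capital_first_letter: bool = True) -> bool:
    # A non-canonical request is answered with a redirect, so a second hit follows.
    return normalize_title(raw, capital_first_letter) != raw

print(normalize_title("foo%20bar"))                            # Foo_bar
print(normalize_title("foo bar", capital_first_letter=False))  # foo_bar
```

With the first flag set, "foo bar", "foo%20bar" and "Foo_bar" all map to "Foo_bar", which is the entry whose count is the meaningful one.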
Question: Are hits on moved pages split across different titles?
- Answer: Yes; it's hard to keep track of which pages have been moved over time. The Squid live-stream logs only show the hits for the title a page/redirect had at the time of access. A page move causes a different main title in all later logs, leading to split results in summaries!
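Merging split results would need a list of page moves, which the Squid logs do not contain. Assuming such a list were available as a hypothetical old title → new title mapping (e.g. collected from the wiki's move log), a Python sketch could fold the hits onto each page's latest title like this; chains such as A → B → C are followed to their end.

```python
from collections import Counter

def final_title(title: str, moves: dict) -> str:
    """Follow old -> new title moves until the latest title is reached."""
    seen = {title}
    while title in moves:
        title = moves[title]
        if title in seen:
            # A cyclic mapping (moved back and forth) is ambiguous; stop instead of looping forever.
            break
        seen.add(title)
    return title

def merge_moved(hits: dict, moves: dict) -> Counter:
    """Add up hits logged under old titles onto each page's latest title."""
    merged = Counter()
    for title, count in hits.items():
        merged[final_title(title, moves)] += count
    return merged

# Hypothetical data: "Old_name" was moved to "Interim_name", then to "New_name".
moves = {"Old_name": "Interim_name", "Interim_name": "New_name"}
hits = {"Old_name": 10, "Interim_name": 4, "New_name": 6}
print(dict(merge_moved(hits, moves)))  # {'New_name': 20}
```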
Question: Is it possible to automatically filter out log spam done by scripts/bots?
- Answer: Not really, as the logs do not contain browser (user-agent) information. But for certain pages it should be possible to show how many requests were caused by bots/scripts... If you are a bot/script operator or programmer, here is a tip: pages requested via the /w/index.php path are not counted by the Squid cache servers (proof); unfortunately, this also applies to &oldid permalinks!
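For bot/script authors, the tip boils down to building /w/index.php?title=... URLs instead of /wiki/... URLs. A minimal Python sketch, assuming an HTTPS host such as "en.wikipedia.org"; the function names are hypothetical.

```python
from urllib.parse import quote, urlencode

def counted_url(host: str, title: str) -> str:
    """Pretty /wiki/ URL: requests of this form end up in the hit counts."""
    return f"https://{host}/wiki/{quote(title.replace(' ', '_'))}"

def uncounted_url(host: str, title: str) -> str:
    """Same page via /w/index.php, which (per the tip above) is not counted."""
    # Spaces are replaced by underscores first, so urlencode() only escapes other characters.
    return f"https://{host}/w/index.php?" + urlencode({"title": title.replace(" ", "_")})

print(counted_url("en.wikipedia.org", "Foo bar"))    # https://en.wikipedia.org/wiki/Foo_bar
print(uncounted_url("en.wikipedia.org", "Foo bar"))  # https://en.wikipedia.org/w/index.php?title=Foo_bar
```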
Question: Why are there so many Special:Export/... requests?
- Answer: Either this is caused by (maybe defective) bots (e.g. scripts for exporting Wikipedia content) or it's spam ;-)
Question: Why are there sometimes odd pages listed (containing special characters etc.)?
- Answer: Because of spammers, browser encoding errors, Squid cache problems and Squid server errors, there can be weird entries like "/sf-forum", "�", "Ã¤Ø×©", "function.fsockopen" or "/skins-1.5/". Of course, I try my best to filter out all that cruft... hoping not to filter out some good stuff along the way ;-)
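Such a filter is necessarily heuristic. The following Python sketch is not THEwikiStics' actual filter: it uses a hypothetical blocklist built only from the examples above, plus two assumed encoding checks (the U+FFFD replacement character, and "Ã" followed by another Latin-1 character, a typical trace of UTF-8 text decoded as Latin-1). As noted above, any such rule risks dropping some good titles, too.

```python
import re

# Hypothetical blocklist taken from the examples above.
JUNK_SUBSTRINGS = ("/sf-forum", "/skins-1.5/", "function.fsockopen")
# "Ã" (U+00C3) followed by a Latin-1 character: typical mojibake from mis-decoded UTF-8.
MOJIBAKE = re.compile(r"\u00c3[\u0080-\u00ff]")

def looks_like_cruft(title: str) -> bool:
    if "\ufffd" in title:  # U+FFFD replacement character: bytes that failed to decode
        return True
    if any(junk in title for junk in JUNK_SUBSTRINGS):
        return True
    return bool(MOJIBAKE.search(title))

print(looks_like_cruft("Foo_bar"))     # False
print(looks_like_cruft("São_Paulo"))   # False ("ã" is U+00E3, not U+00C3)
print(looks_like_cruft("/skins-1.5/")) # True
```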
Question: Why are there two different background colours, by the way?
- Answer: 'Light yellow' means 'daily / evolving / in development', and 'light green' means 'completed / finished long-term analysis'!
Question: How many unique visitors do you estimate?
- Answer: See unique visit(or)s per day and unique visit(or)s per month!
Question: Do you round the number of hits per day?
- Answer: Yes, so that no ugly decimals are shown ;-)
  To still have 100 % accurate numbers in the HTML output, the exact monthly/accumulated figures are shown when you hover over the bold numbers!
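A minimal Python sketch of this display, assuming round-half-up rounding and an HTML title attribute for the hover text (the page's actual rounding mode and markup are not stated here; both function names are hypothetical). The decimal module is used because Python's built-in round() rounds halves to even, e.g. round(2.5) == 2.

```python
from decimal import Decimal, ROUND_HALF_UP

def per_day(total_hits: int, days: int) -> int:
    """Average hits per day, rounded half up to a whole number."""
    return int((Decimal(total_hits) / days).quantize(Decimal(1), rounding=ROUND_HALF_UP))

def bold_cell(total_hits: int, days: int) -> str:
    # Rounded daily figure in bold; the exact total appears as hover text.
    return f'<b title="{total_hits}">{per_day(total_hits, days)}</b>'

print(per_day(45, 30))      # 2 (1.5 rounded half up)
print(bold_cell(1000, 30))  # <b title="1000">33</b>
```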
Question: What is planned for the future?
- Answer: See MediaZilla requests!
Question: Are there any daily/monthly/yearly logs available?
- Answer: Yes, there are; just use those dumps.
Question: Why were only the smaller wikis listed at the beginning?
- Answer: Initially, I wrote all my scripts with smaller wikis in mind! Including the bigger wikis as well would have made the page analyses too time- and resource-consuming for me! But there was already a project approaching from the other direction: see stats.grok.se!
Question: Why are there very small differences between your figures and the ones at stats.grok.se for some pages?
- Answer: At stats.grok.se the analyses are done on a UTC+1 basis; running my scripts with the parameter "+1" at the end would produce the same results!
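The differences come from where the day boundary falls: a hit at 23:30 UTC belongs to the next day in UTC+1. A minimal Python sketch of day bucketing with a fixed offset, assuming timezone-aware timestamps (offset_hours merely plays the role of the "+1" parameter and is hypothetical):

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

def hits_per_day(timestamps, offset_hours=0):
    """Count hits per calendar day in a fixed UTC offset (UTC+offset_hours)."""
    tz = timezone(timedelta(hours=offset_hours))
    return Counter(ts.astimezone(tz).date().isoformat() for ts in timestamps)

hits = [
    datetime(2008, 3, 1, 22, 30, tzinfo=timezone.utc),
    datetime(2008, 3, 1, 23, 30, tzinfo=timezone.utc),
]
print(dict(hits_per_day(hits)))     # {'2008-03-01': 2}
print(dict(hits_per_day(hits, 1)))  # {'2008-03-01': 1, '2008-03-02': 1}
```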
Question: Why all these banners?
- Answer: It is only for a good cause. Ah, and for colouring purposes, too ;-)


Have another question or comment? Feel free to contact User:Melancholie!