GFDL corpus namespace

From dKosopedia

The GFDL corpus namespace is the most important achievement of the GFDL corpus access providers - it breaks down what can roughly be described as "all human knowledge and history" into NEARLY HALF A MILLION carefully scoped articles in English, OVER ONE MILLION total when all of the hundreds of languages are included.

debates names thoroughly

As the words in the languages themselves do, the scope of articles varies a bit across languages. However, this variation has been minimized by the particular discipline of encyclopedia assumptions applied at Wikipedia, the most commonly used GFDL corpus editing site and the GFDL corpus precedent setter. For a sample of the level of attention paid to the maintenance of this namespace, see the current votes for deletion arguments on the English language Wikipedia. If you are not willing to argue things out to this degree, you must be grateful to those hundreds of people who are. There is no reasonable way to reproduce this effort in any smaller project, and certainly not in dKosopedia itself, where we're busy setting standards for other things, especially multiple point of view and the position:namespace.

central to winning debates

Democrats and Greens, typical users of dKosopedia itself, may wish to have influence on what things are called. If so, the way to do that is to join those ongoing arguments, not to create an incompatible standalone namespace that will not be instantly understood and accepted by journalists and the public:

Build your arguments on well-known and accepted concepts that can be neutrally defined, and you will win debates. Use language no one understands, and you will not: you will be mired in suspicious re-definitions while your opponents use known terms that their own operatives help define.

(Another issue is candidate bio control in Wikipedia: if a candidate has nicknames, this is also a namespace issue.)

constrains ALL naming for media purposes

The GFDL corpus namespace has a rather remarkable and perhaps unfortunate influence:

This namespace is ONE OF ABOUT ONE THOUSAND KNOWN WAYS of breaking down all human knowledge into categories. The University of Toronto Library is a major world centre of research into these and has over NINE HUNDRED such schemes (each technically known as a taxonomy, though that term does not include proper names, slang phrases, idioms, dictionary entries and so on, which the GFDL corpus namespace does include).

There are, however, four ways in which the GFDL corpus namespace is a special and irreplaceable case:

1. It dominates the Google rank - it is deliberately designed NOT to define concepts into existence but merely to reflect them by combining uncontroversial references - and as a result, over time, statistically its articles MUST RISE IN THE GOOGLE RANK even though there is a POLICY OF QUESTIONING ARTICLES THAT RANK HIGH and thus might be overly leading the definition.

2. It is licensed under the GFDL open content license - a Share Alike license that permits very wide re-use of the material in many contexts, e.g. to build net search systems on as Wikia and others are doing.

3. It is built in Unicode which means that any character in any language can be used in names, according to large public wiki naming conventions in each supported language. This is simply not true of any prior naming scheme in existence, since Unicode itself is a relatively recent invention.

4. It happened to exist in mid-2004, when net based articles and lookup first assumed a central role, Google became the dominant search method and best-financed company, and design precedents in a vast number of other large public wikis were being set. (It also set precedents in command verbs that affect other wiki software, by teaching those verbs to more editors than any other site.)

Because of 4, the probability of another competing effort doing the same is extremely low. DMOZ for instance has its own naming taxonomy, but it has not spread due to the restrictive (not open content) licensing on the actual web site descriptions used. Yahoo's naming category scheme is strictly proprietary and can be expected to lose prominence as Wikipedia gains credibility via scholars and editors attracted to its wiki critical mass. Given the need for 3, there is no chance of any standard emerging from a non-software or non-Internet source, as it would not be adequately tested by browsers, editors, and the exchange of a too-vast variety of information in all languages.

Accordingly, this namespace is an ABSOLUTE CONSTRAINT in all large scale "content management" and MUST be slavishly propagated without a drop of "creativity". IF IT IS DEFINED IN THE GFDL CORPUS IT ALREADY HAS A NAME, and that name is the best name for that concept on Google.

sets policy terms absolutely

Partisan political efforts have a special need to at least appear to be neutral, and not to reproduce work already done by nonprofits and other friendly groups, for fear of manipulating memes to create things that don't exist. For the above reasons, the list of policy terms used in the Green Party of Canada Living Platform MIRRORED EXACTLY THE NAMES AS USED IN THE GFDL CORPUS NAMESPACE. Any deviation whatsoever EVEN IN PUNCTUATION is sabotage, since it imposes a "cleanup" or "name munging" stage to "find" or rather "guess" the corresponding name in the GFDL corpus. The major challenge, and the worst of the fatal tikiwiki flaws, is that tikiwiki makes this simply impossible:

There will be substantial manual effort required to track policy term names from the GFDL corpus namespace and to reflect them exactly in the Living Platform.
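The exact-match requirement described above can at least be checked mechanically, even if the repair itself stays manual. The following is a minimal sketch, not a converter: the function name find_deviations and the sample titles are hypothetical, and it assumes you already hold a plain list of corpus titles. It only reports policy terms that do not match a corpus title character for character, and offers near matches as hints for a human editor to judge.

```python
import difflib


def find_deviations(policy_terms, corpus_titles):
    """Report policy terms that are not character-for-character corpus titles.

    Returns a dict mapping each deviating term to a list of similar corpus
    titles. The suggestions are hints only; a person still has to decide
    which corpus name, if any, is the right one.
    """
    corpus = set(corpus_titles)
    report = {}
    for term in policy_terms:
        if term in corpus:
            continue  # exact match, punctuation included: nothing to do
        # Hypothetical cutoff of 0.8; tune it to taste.
        report[term] = difflib.get_close_matches(
            term, corpus_titles, n=3, cutoff=0.8
        )
    return report


if __name__ == "__main__":
    corpus_titles = ["U.S. House election, 2006", "Nature's services"]
    policy_terms = ["U.S. House election, 2006", "US House election 2006"]
    for term, hints in find_deviations(policy_terms, corpus_titles).items():
        print(f"deviates: {term!r}; possible corpus names: {hints}")
```

A term that differs only in punctuation is still flagged, which matches the rule that any deviation counts. The guessing step is deliberately left to the editor.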

is in standard wikitext

The GFDL corpus is overwhelmingly maintained in mediawiki and thus the namespace reflects its design decisions. Whether one "agrees" or "likes" these or not is irrelevant. Whether there are "better" decisions that could be made in other software (like tikiwiki) is even more irrelevant. The de facto standard is set and this is it:

The mediawiki software and thus the "wikistyle" or wikitext format:

  • relies on [[double square brackets]] to make internal links
  • supports "#REDIRECT [[link]]" syntax for identical concepts - VERY IMPORTANT - FOLLOW THIS CONVENTION SLAVISHLY!
  • does not treat spaces and underscores as different characters in titles - an underscore is treated as a single space.
  • permits commas in titles, e.g. U.S. House election, 2006
  • permits apostrophes in titles, e.g. nature's services
  • treats colons (":") in names as a way to create subspaces (only)
  • treats slashes ("/") in names as a way to create subpages (only)
  • is case-sensitive on all characters after the first
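The conventions listed above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not mediawiki's own code: the function names normalize_title, split_title and redirect_target are hypothetical, any text before the first colon is treated as a subspace prefix, and the sketch covers only the rules in the list. It compares titles that are already in mediawiki form and does not attempt to convert names from other wiki software.

```python
import re

# Matches '#REDIRECT [[target]]' at the start of a page (assumed form).
REDIRECT_RE = re.compile(r"^\s*#REDIRECT\s*\[\[([^\]|#]+)", re.IGNORECASE)


def normalize_title(title: str) -> str:
    """Canonicalise a title using only the rules in the list above."""
    # An underscore is the same character as a space; collapse runs to one.
    text = re.sub(r"[ _]+", " ", title).strip()
    # Only the first character ignores case; the rest is left untouched.
    # Commas and apostrophes are ordinary characters and pass through.
    return text[:1].upper() + text[1:]


def split_title(title: str) -> dict:
    """Split a title into subspace prefix, base page and subpages."""
    text = normalize_title(title)
    prefix, sep, rest = text.partition(":")
    if not sep:  # no colon, so no subspace prefix
        prefix, rest = "", text
    base, *subpages = rest.split("/")
    return {"prefix": prefix, "base": base, "subpages": subpages}


def redirect_target(wikitext: str):
    """Return the normalised target of a redirect page, or None."""
    match = REDIRECT_RE.match(wikitext)
    return normalize_title(match.group(1)) if match else None


if __name__ == "__main__":
    print(normalize_title("nature's_services"))
    print(split_title("position:health care/drafts"))
    print(redirect_target("#REDIRECT [[Ecosystem_services]]"))
```

Because case matters after the first character, "green party" and "green Party" normalise to different titles here, which is why a redirect is needed whenever two spellings name the identical concept.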

must be manually converted to

A namespace built with any other assumptions will simply NOT REFLECT the GFDL corpus namespace. NO CONCEIVABLE "TOOL" OR "CONVERTER" COULD SOLVE THIS PROBLEM. There are simply too many judgement calls to make in translating into mediawiki namespace.

When moving onto mediawiki from tikiwiki or something, much pain will be felt, and many users may not be able to make the transition:

"The software isn't done until every single user is dead." - one of the extended Murphy's Laws.

This article is adapted from a CC-by article by Craig Hubley first published at Green Party of Canada Living Platform. Retain this notice of acknowledgement as an Invariant Section under GFDL to retain the permission to distribute this article under GFDL anywhere within the GFDL corpus. The original is visible at [1]
