WAAAY too many Aliases [#571]
: WAAAY too many Aliases
(2006-08-03 00:05) [#1999]
There are waaaaay too many aliases for this page. Can't we get rid of the ones not used elsewhere on SL, please?
: Why there are all these aliases
(2006-08-03 09:55) [#2001]
The whole point of the aliases is as a "safety net" of sorts. Someone might create a wikilink to his or her favorite spelling, then say "Hey, there's no article! Let's create one right now!" and write a duplicate. Maybe it doesn't happen so often on SL, as SL isn't used that much. But it potentially can happen.
Also, getting rid of all of the ones not used on SL is misleading, because I have set up some links so that even though they are labelled with one romanization, they actually redirect to another. It's very easy to do.
For instance: [Toya Meijin|Kouyou Touya] becomes Kouyou Touya
Frankly, I have to create fewer aliases for these articles than I have to for Wikipedia - SL does not differentiate spacing, dashes, and apostrophes - And it does not allow article names with diacritical marks (macrons and circumflexes, which are often used in romanization of Japanese), while Wikipedia does.
As for why Toya Meijin has all of the aliases:
- The character is known by a title as well as his full name - He is called "Koyo Toya" AND "Toya Meijin" - And his name has to have the Japanese-order versions (e.g. "Toya Koyo") redirect too.
- In addition, his given name, "Koyo" ("Kōyō"), has two long "ou"s, and his family name, "Toya" ("Tōya"), has one. That, folks, creates endless romanization possibilities. Romanizations like "Kouyo", where one long "o" is shown but the other isn't, are not unusual - The German-language Hikaru no Go (published by Carlsen Comics) uses "Kouyo Tohya" (remember this is from "Tōya Kōyō") - if you type "Kouyo Tohya" into Google, you will get many German-language hits - In addition, the Kuroki Go board company's founder, Soujiro Kuroki, would have his name conventionally romanized as "Kuroki Sōjirō".
- As a second note, the reason why I differentiate between hiragana "ou" and "oo" (even though they both romanize to ō) is simple - If someone has an "ō" that resolves to hiragana "ou", "ou" is an acceptable romanization (e.g. I have seen "Touya Akira" a lot since the "ou" is from the hiragana of Tōya) - BUT if the name is hiragana oo (think "Tohru Honda" (本田 透 Honda Tōru) of Fruits Basket), "ou" is NOT an acceptable way of writing the name.
: Re: Why there are all these aliases
(2006-08-03 03:24) [#2002]
We don't need aliases where the "h" Romanizations substitute for "u" Romanizations because almost nobody who reads Hikaru no Go will encounter them. Even if we lived in the fantasy world in which "h" Romanizations were valid, the "u" or "(nothing)" Romanizations are so common and popular that such a "safety net" is useless. Even nets have holes in them (to let the little fish escape capture). Aliases are not threading for a "safety bag".
: Re: Why there are all these aliases
(2006-08-03 03:36) [#2003]
It's fairly common to see "h" romanizations (like "Oh", or "Tohya"). They are called "Passport Hepburn", and the Japanese government allows them on passports - http://www.seikatubunka.metro.tokyo.jp/hebon/
And German people who read their versions of Hikaru no Go DO encounter them. True, these romanizations don't occur in the official English version, nor do they appear in most of the scanlations that were produced before HnG got licensed. Even so, one romanization used in Roman script can be used in many Indo-European languages. So a German person could use "Akira Tohya".
: unnecessary aliases
(2006-08-03 06:23) [#2004]
I also believe that this page has too many aliases; we could do without a lot of them.
ViciousMan, if someone were to create a page in one of the alternative romanizations, it's easy to merge the contents of that page into the main page, and turn the new page into an alias at that time. Besides, that would also indicate that the alias is needed. Just my 2 cents.
: What's the problem?
(2006-08-03 08:35) [#2009]
What is the problem with having many aliases?
: Re: What's the problem?
(2006-08-03 22:13) [#2012]
Not a big problem, I just find it a bit confusing, and unnecessary. I think using fewer aliases, and creating new ones when they are shown to be necessary, would be a better system than trying to cover every romanization possible.
: Re: unnecessary aliases
(2006-08-05 04:50) [#2028]
I completely agree with you, Phelan.
: ((no subject))
(2006-08-03 16:12) [#2010]
I asked this on one of the more technical pages as well, but wouldn't it be possible to have a more intelligent search function? More Google-like? This might do away with most of the aliases.
On top of that it would make searching a lot easier. For me this is one of the biggest weaknesses of Wikipedia; the search function there is quite crappy.
: Re: ((no subject))
(2006-08-03 22:53) [#2013]
I sometimes miss a 'search with google' in the search page. The one we have works fine, but sometimes it's hard to find what you really want.
: I want something Googlelike in Sensei's Library
(2006-08-03 23:28) [#2014]
That works fine, but it is not 'internal'. Also Google tends to exaggerate sometimes.
Is it that hard to get this set up internally? So that if you misspell something, you still come up with the correct page, for example (good for aliases)?
: how google SL
(2006-08-04 02:53) [#2016]
- Go to http://www.google.com/advanced_search?hl=en (you can get here by clicking on the advanced search link next to the text box on the main Google search page).
- Enter "http://senseis.xmp.net" into the "only return results from the site or domain" field.
- Enter your search terms near the top of the page. Voila! A Google search that only returns hits within the SL domain.
I believe it is possible for SL to include a google search text box on SL pages that will do the steps above for you.
P.S. this is what erislover did to make his example link above.
: Re: how google SL
(2006-08-04 09:21) [#2019]
Yes, that is what I said I missed: some box that does all those steps for us (like Wikipedia's when their search is offline). I think it is quite hard to handle misspelling; a few times I've thought about it and couldn't find a good way to do it easily in a program. But I'm not an expert on this.
And about no results... usually you have to write one correct word, like only 'kobayashi'. The way Google handles misspelling is quite statistical... It won't say you misspelled... let me think, 'eignevector', because it isn't usually misspelled (and BTW there are pages with it :D), and it's not a common word.
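For what it's worth, here is a minimal sketch of one simple, non-statistical way a program could suggest pages for a misspelled query. It uses Python's standard difflib module, which is not what Google or SL actually does; the function name suggest_pages and the page names in the usage note are made up for illustration.

```python
import difflib

def suggest_pages(query, page_names, limit=3):
    """Return up to `limit` page names that look similar to `query`.

    Comparison is case-insensitive; the 0.6 cutoff is difflib's default
    similarity threshold, chosen here only as a starting point.
    """
    by_lower = {name.lower(): name for name in page_names}
    hits = difflib.get_close_matches(
        query.lower(), list(by_lower), n=limit, cutoff=0.6
    )
    # Map the lowercased matches back to the original page names.
    return [by_lower[hit] for hit in hits]
```

With a hypothetical page list such as ["Toya Koyo", "Kobayashi Koichi", "Eigenvector"], calling suggest_pages("eignevector", pages) should offer "Eigenvector" even though the query has two letters swapped. It compares the query against every page name, so it would need something smarter for a large wiki.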
: Re: how google SL
(2006-08-04 14:41) [#2022]
In the normal Google search box just include the text site:senseis.xmp.net in order to restrict the search to this domain. That is what I did to make my link. No need to go through the advanced pages. :)
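A search box on SL pages could build that kind of link automatically. A minimal sketch, assuming the usual Google search URL with its q parameter; the function name google_site_search_url is made up for illustration:

```python
from urllib.parse import urlencode

def google_site_search_url(terms, site="senseis.xmp.net"):
    """Build a Google search URL restricted to one site via the site: operator."""
    query = f"{terms} site:{site}"
    # urlencode escapes spaces as "+" and the colon as "%3A".
    return "https://www.google.com/search?" + urlencode({"q": query})
```

For example, google_site_search_url("kobayashi") gives a link that searches only senseis.xmp.net for "kobayashi".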
: Using the soundex algorithm in mysql
(2006-08-04 17:05) [#2025]
I think that this problem could be solved by having the search function also use the MySQL soundex of words. This would be implemented somewhat like this:
SELECT pagename FROM pages WHERE pagename SOUNDS LIKE 'Toya Koyo'
For a description of the soundex algorithm, see http://en.wikipedia.org/wiki/Soundex
For MySQL docs on this, see:
Since the algorithm will also drop 'h', all the following words will have the same soundex: koyo, kouyou, kooyoh, kouyoo, kohyo, etc.
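To check the claim about those spellings, here is a small sketch of the classic four-character Soundex in Python. This is the textbook version, not MySQL's own implementation (as far as I know, MySQL's SOUNDEX() can return longer strings and may differ in details), so treat it as an illustration only.

```python
# Digit codes for consonants; vowels and H, W, Y have no code.
CODES = {}
for letters, digit in (
    ("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
    ("L", "4"), ("MN", "5"), ("R", "6"),
):
    for ch in letters:
        CODES[ch] = digit

def soundex(word):
    """Classic Soundex: first letter plus up to three digits, zero-padded."""
    letters = [c for c in word.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    out = [letters[0]]
    prev = CODES.get(letters[0], "")
    for c in letters[1:]:
        code = CODES.get(c, "")
        if code and code != prev:
            out.append(code)
        # H and W do not separate two consonants with the same code.
        if c not in "HW":
            prev = code
    return ("".join(out) + "000")[:4]
```

All of koyo, kouyou, kooyoh, kouyoo and kohyo come out as K000 with this version. Note that the first letter is always kept, so "Toya Koyo" and "Koyo Toya" do not get the same code.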
: Re: Using the soundex algorithm in mysql
(2006-08-06 03:06) [#2029]
Here are a few functions to program in the algorithm:
Hepburn is defined as the modified Hepburn romanization system used by the United States Library of Congress. There are many variants out there.
Standard and wāpuro romanizations
- Long a (ああ, アア, a-a) can be "a", "aa", or "ah" (Hepburn standard is "ā")
- Long o (おお, オオ, o-o) can be "o", "oo", or "oh" (Hepburn standard is "ō")
- Long o (おう, オウ, o-u) can be "o", "oo", "ou", or "oh" (Hepburn standard is "ō")
- Long u (うう, ウウ, u-u) can be "u", "uu", "uh" (Hepburn standard is "ū")
- Long e (ええ, エエ, e-e) can be "e", "ei", "ee", "eh" (Hepburn standard is "ei" for Japanese and Chinese origin, "ē" for foreign)
- Long i (いい, イイ, i-i) can be "i", "ii", "ih" (Hepburn standard is "ii" for Japanese and Chinese origin, "ī" for foreign)
- Sha/Sya (しゃ, シャ) can be "sha" or "sya" (Hepburn standard is "sha")
- Shi/Si (し , シ) can be "shi" or "si" (Hepburn standard is "shi")
- Sho/Syo (しょ, ショ) can be "sho" or "syo" (Hepburn standard is "sho")
- Shu/Syu (しゅ, シュ) can be "shu" or "syu" (Hepburn standard is "shu")
- Ji/Zi (じ , ジ) can be "ji" or "zi" (Hepburn standard is "ji")
- Ji/Zi/Di (ぢ , ヂ) can be "ji", "zi", or "di" (Hepburn standard is "ji")
- Zu/Du (づ, ヅ) can be "zu" or "du" (Hepburn standard is "zu") (This is not the same as "Zu" (ず, ズ).)
- Ja/Zya (じゃ , ジャ) can be "ja" or "zya" (Hepburn standard is "ja")
- Jo/Zyo (じょ , ジョ) can be "jo" or "zyo" (Hepburn standard is "jo")
- Ju/Zyu (じゅ , ジュ) can be "ju" or "zyu" (Hepburn standard is "ju")
- Ja/Zya/Dya (ぢゃ, ヂャ) can be "ja", "zya", or "dya" (Hepburn standard is "ja")
- Jo/Zyo/Dyo (ぢょ, ヂョ) can be "jo", "zyo", or "dyo" (Hepburn standard is "jo")
- Ju/Zyu/Dyu (ぢゅ, ヂュ) can be "ju", "zyu", or "dyu" (Hepburn standard is "ju")
- Cha/Tya (ちゃ, チャ) can be "cha" or "tya" (Hepburn standard is "cha")
- Chi/Ti (ち, チ) can be "chi" or "ti" (Hepburn standard is "chi")
- Cho/Tyo (ちょ, チョ) can be "cho" or "tyo" (Hepburn standard is "cho")
- Chu/Tyu (ちゅ, チュ) can be "chu" or "tyu" (Hepburn standard is "chu")
- っちゃ is standard as "tcha" but is often written as "ccha" for stylistic reasons (and likewise for the other "tch" syllables)
- Jo/Zyo and Jo/Zyo/Dyo can be "Jyo"
- Ja/Zya and Ja/Zya/Dya can be "Jya"
- Ju/Zyu and Ju/Zyu/Dyu can be "Jyu"
- Zu/Du can be "Dzu" (e.g. "Adzuki" and "Kudzu")
- Ra, Re, Ri, Ro, Ru, Rya, Ryo, Ryu can be "La", "Le", "Li", "Lo", "Lu", "Lya", "Lyo", "Lyu"
- ん is sometimes rendered as "nn" instead of "n"
- i and o had historically been used to represent the Yōon sound, e.g. Tokyo was historically written "Tokio", Kyoto "Kioto", etc.
- We (ゑ) was rendered "Ye" - The pronunciation used to be "We" but changed to "Ye" shortly before the kana fell out of use.
Also, n followed by some consonants can be rendered "m" in original Hepburn but is "n" in modified Hepburn - e.g. tenpura = tempura
Also, "E" had been changed to "ye" historically - names like Ieyasu and Inoue have been seen as "Iyeyasu" and "Inouye"
In addition, sometimes Ka is rendered "Ca" and Ko as "Co", etc.
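As a rough illustration of how some of these rules could be programmed, here is a minimal Python sketch that folds a romanized name down to one search key. It only covers a handful of the variants listed above, the function name fold_romaji is made up, and the folding is deliberately lossy (for example it treats every "h" after a vowel and before a consonant or "y" as a length marker, which is wrong for words like "nihyaku"). It is meant for matching page names, not for producing correct Hepburn.

```python
import re
import unicodedata

# Ordered (pattern, replacement) rules; each folds a variant to one spelling.
RULES = [
    (r"l", "r"),                        # La, Li, Lu... -> Ra, Ri, Ru...
    (r"cch", "tch"),                    # ccha -> tcha
    (r"sy([auo])", r"sh\1"),            # sya -> sha
    (r"ty([auo])", r"ch\1"),            # tya -> cha
    (r"(?:zy|jy|dy)([auo])", r"j\1"),   # zya, jya, dya -> ja
    (r"si", "shi"),
    (r"ti", "chi"),
    (r"zi|di", "ji"),
    (r"dzu|du", "zu"),
    (r"nn", "n"),
    (r"m(?=[bp])", "n"),                # tempura -> tenpura
    (r"([aiueo])h(?![aiueo])", r"\1"),  # "oh", "uh" used as length markers
    (r"ou", "o"),
    (r"([aiueo])\1+", r"\1"),           # aa, ii, uu, ee, oo -> single vowel
]

def fold_romaji(name):
    """Fold a romanized Japanese name to a crude, spacing-free search key."""
    # Strip macrons and circumflexes by decomposing and dropping the marks.
    decomposed = unicodedata.normalize("NFD", name.lower())
    plain = "".join(c for c in decomposed if unicodedata.category(c) != "Mn")
    folded = []
    # Fold word by word so rules never match across a word boundary.
    for word in re.findall(r"[a-z]+", plain):
        for pattern, replacement in RULES:
            word = re.sub(pattern, replacement, word)
        folded.append(word)
    return "".join(folded)
```

With this, "Tōya Kōyō", "Touya Kouyou", "Tohya Kohyo" and "Toya Koyo" all fold to the same key, so a search could compare keys instead of relying on one alias per spelling. Name order is not handled: "Kouyo Tohya" folds to a different key than "Toya Koyo".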
: Using unnecessary and erroneous aliases
(2006-08-06 15:13) [#2035]
Only alias the Romanizations used on Sensei's Library!!!!!!!!!!!!!!! That is their purpose!!!!!!! Aliases are names that pages are "also known by" within the SL world. They are not the complete set of possibilities for the name of a page!!!
: Re: Using unnecessary and erroneous aliases
(2006-08-07 02:42) [#2037]
"Only alias the Romanizations used on Sensei's Library!!!!!!!!!!!!!!! That is their purpose!!!!!!! Aliases are names that pages are "also known by" within the SL world. They are not the complete set of possibilities for the name of a page!!!"
As long as SL remains an open wiki, visitors will use romanizations other than what SL uses (due to lack of knowledge of SL preferences, or any other reason).
In fact, the whole purpose of these redirects is NOT to reinforce other romanizations; on the contrary, they serve to reinforce what SL uses. When alternate usages are found, unless the link is a part of a personal comment, the usage is changed to fit SL's convention.
After all, even though Wikipedia has name order conventions, the article for the naming convention specifically says to redirect from the OTHER order.
However, I do not use every single romanization possible for every article - Instead I use romanizations that I encounter on the internet. I Google test them to see if someone else uses them. If not, I do not place a redirect.
In fact, in that post back there, I wasn't telling everyone that all of the conventions must be used in redirects at all times. I was telling a person how to program an SL search engine MySQL index - He ought to throw all of these variants into the search engine programming.