I have database with words and phrases from for exp. English to 15 other languages, and also for every language in that list to other 15. For one pair they are sort for now in one table like this (en -> de):
id_pair
word_en
word_de
What is the best way to create database for that huge list of words and phrases?
I know that I must separate every primary language from others, and was thinking maybe like this:
ENGLISH
ID | WORD
1 | 'dictionary'
GERMAN
ID | WORD
1 | 'lexikon'
2 | 'wörterbuch'
TRANSLATION_EN_DE
ID_EN | ID_DE
1 | 1
1 | 2
Is this the best way to normalize DB? But what is with phrases, I need also if someone enter word "dictionay" that this returns also "This dictionary is good" and translation for that. (I know this can find in first table with sql query, is that best way?)
Also need it alphabetically all time, I will have lot of new entry daily, so I can print couple words before and after the word/phases someone looking for translate.
I'm stuck and cant decide what is the best way to optimize it. These db have all together more than 15gb just text based translation, and around 100k daily req, so every ms worth. :)
Any help will be appreciate, thx!
解决方案
With separate table for each language, you'd need a large number of junction tables to cover all the possible translation combinations. On top of that, adding a new language would require adding more tables, rewriting the queries, client code etc.
It's better to do it in a more generalized way, similar to this:
Regarding the TRANSLATION table, I propose to also create a CHECK (WORD_ID1 < WORD_ID2) and create an index {WORD_ID2, WORD_ID1} (the opposite "direction" from the PK), and represent the both directions of the translation with only one row.
Consider clustering the TRANSLATION table if your DBMS supports that.
Also need it alphabetically all time
The query...
SELECT * FROM WORD WHERE LANGUAGE_ID = :lid ORDER BY WORD_TEXT
...can use the index underneath the UNIQUE constraint {LANGUAGE_ID, WORD_TEXT}.