I am working for a travel site, where i need to store the tourist spots which tourists traveled to. I need the spots to be unique in the locations table so that i can know the popularity of a particular spot etc.
I will also need also need all countries, states, cities stored with me because i cannot depend on user input.
The database is MySQL.
Seeing the data sets available for such locations i see there is a problem of nesting of cities across countries which may use provinces, states, counties etc.
So, my question is how to design the schema so that i can store all the locations.
I was thinking about having tables for countries, states, cities, and spots.
the spots table will contain spot_name, cityId, stateId, countryId, and some fields to have longitude and latitude bounds.
This way i can identify same spots by their geopositions.
But again, this solution won't work because of the states/provinces/counties etc. problem.
Can you please suggest how to build the schema and go about seeding it with correct data so that dependency on user input is minimum.
解决方案
you should use a geospatial database as then you can store your locations like countries and states as spatial entities and so can determine the nesting correctly.
If you can't use one you can simulate geospatial positions using strings in a normal table by dividing the world up into a grid, then subdividing each square of the grid recursively.
For example divide the world into 9 squares, numbered 1-9 from top left to bottom right. Anything which is in these large squares has only a single digit reference. Then divide each square into 9 and anything which is at this level has a 2 digit reference. so 11 is the top left square and 99 is the bottom right square.
Repeat this process until you have the precision that you need. a single feature might have a reference 10 digits long 5624357899 but you would know that this would be inside any larger feature which is fewer digits which starts with the same string like 5624357. So your countries would have fewer digits because they are larger, but your individual locations would have more because they are smaller and more accurately located.
This will only give you a course approximation of location (and will be bad for long thin features) but might be suitable enough
The first grid will look like this:
______________________________
| | | |
| 1 | 2 | 3 |
| | | |
|_________|_________|_________|
| | | |
| 4 | 5 | 6 |
| | | |
|_________|_________|_________|
| | | |
| 7 | 8 | 9 |
| | | |
|_________|_________|_________|
The second round looks like this (only first square completed for simplicity):
______________________________
|11|12 |13| | |
|---------| 2 | 3 |
|14|15 |16| | |
|---------| | |
|17|18 |19| | |
|_________|_________|_________|
| | | |
| 4 | 5 | 6 |
| | | |
|_________|_________|_________|
| | | |
| 7 | 8 | 9 |
| | | |
|_________|_________|_________|
you repeat this process until you have fine enough approximation for your purposes.