Table of Contents
- Summary of Tips and Methods from Google Smartphone Decimeter Challenge Contestants
- 🔥🔥[Winning Solution of Previous Google Smartphone Decimeter Kaggle Challenge](https://www.kaggle.com/competitions/smartphone-decimeter-2022/discussion/322510)
- [How to Approach this Competition](https://www.kaggle.com/competitions/smartphone-decimeter-2022/discussion/323548)
- [Time Synchronization Across CSVs](https://www.kaggle.com/competitions/smartphone-decimeter-2022/discussion/323135)
Summary of Tips and Methods from Google Smartphone Decimeter Challenge Contestants
🔥🔥Winning Solution of Previous Google Smartphone Decimeter Kaggle Challenge
Posted in smartphone-decimeter-2022, 7 days ago
- First Place
- Second Place
- Third Place
- Fifth Place - Part 1
- Fifth Place - Part 2
- Fifth Place - Part 3
- Sixth Place - Part 1
- Sixth Place - Part 2
- Seventh Place
- Ninth Place
- Tenth Place
- Eighteenth Place
- Nineteenth Place
How to Approach this Competition
Posted in smartphone-decimeter-2022, 2 days ago
1 - Understand the format of this dataset, objective, and evaluation
I recommend getting familiar with the structure of the data and going through the competition data tab.
The goal of this competition is to predict the exact latitude and longitude coordinates of a car from GNSS data collected by cell phones. Submissions are then scored using the mean of the 50th and 95th percentile distance errors (see more details).
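For local evaluation, the scoring rule above can be sketched in a few lines. This is a simplified version (the official metric aggregates per phone before averaging over phones), and it assumes the per-epoch distance errors in meters are already computed:

```python
import numpy as np

def competition_score(dist_errors):
    """Mean of the 50th and 95th percentile of per-epoch distance errors (m).

    Simplified sketch: the official metric computes this per phone and then
    averages across phones.
    """
    return np.mean([np.percentile(dist_errors, 50),
                    np.percentile(dist_errors, 95)])
```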
2 - Make a baseline
Since this year's baseline is given in the ECEF (Earth-Centered, Earth-Fixed) coordinate system, it must be converted to BLH (geodetic latitude, longitude, and height) to produce the latitude/longitude submission baseline.
@saitodevel01 has published an excellent notebook demonstrating how to get to this baseline.
The baseline this year appears to score 4.87 on the public leaderboard.
After you have this baseline, you basically have the path the car took, but the signal is noisy. Now you must correct the baseline and find the true path.
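The ECEF-to-BLH conversion itself can be sketched without a geodesy library, using a short fixed-point iteration on the WGS84 ellipsoid. This is a rough sketch, not the linked notebook's code; libraries such as pymap3d wrap this for you:

```python
import math

# WGS84 ellipsoid constants
A = 6378137.0           # semi-major axis (m)
F = 1 / 298.257223563   # flattening
E2 = F * (2 - F)        # first eccentricity squared

def ecef_to_blh(x, y, z):
    """Convert ECEF coordinates (m) to geodetic lat/lon (deg) and height (m)."""
    lon = math.atan2(y, x)
    p = math.hypot(x, y)                  # distance from the Earth's axis
    lat = math.atan2(z, p * (1 - E2))     # initial latitude guess
    for _ in range(10):                   # fixed-point iteration converges fast
        n = A / math.sqrt(1 - E2 * math.sin(lat) ** 2)
        h = p / math.cos(lat) - n
        lat = math.atan2(z, p * (1 - E2 * n / (n + h)))
    return math.degrees(lat), math.degrees(lon), h
```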
3 - Correct Outliers
Since we are working with data in time series paths, correcting outliers can be especially useful and should probably be one of your first steps. In my notebook, I demonstrate a popular method of outlier correction I saw used in last year’s competition.
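One such method, roughly in the spirit of last year's public notebooks (not their exact code), flags a point as an outlier when it is far from both of its neighbors and replaces it with the neighbors' midpoint. A minimal sketch; the 50 m threshold is an illustrative choice, not a tuned value:

```python
import numpy as np

def haversine(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between coordinate arrays (degrees)."""
    r = 6371000.0
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * r * np.arcsin(np.sqrt(a))

def correct_outliers(lat, lon, thresh=50.0):
    """Replace points far from BOTH neighbors with the neighbors' midpoint."""
    lat, lon = lat.copy(), lon.copy()
    d_prev = haversine(lat[1:-1], lon[1:-1], lat[:-2], lon[:-2])
    d_next = haversine(lat[1:-1], lon[1:-1], lat[2:], lon[2:])
    bad = np.where((d_prev > thresh) & (d_next > thresh))[0] + 1
    lat[bad] = (lat[bad - 1] + lat[bad + 1]) / 2
    lon[bad] = (lon[bad - 1] + lon[bad + 1]) / 2
    return lat, lon
```

Note this simple version interpolates each flagged point from its raw neighbors, so consecutive outliers are only partially handled.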
4 - Smooth Baseline
Data smoothing uses an algorithm to remove noise from a data set, which can be very useful when working with GNSS data. In my notebook, I also demonstrate how to use the Savitzky-Golay (savgol) filter to smooth the baseline.
There are also many other forms of data smoothing you can try such as the Kalman filter which was popular in public notebooks such as this one by @emaerthin in last year’s competition.
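A minimal smoothing sketch using SciPy's savgol_filter; the window length and polynomial order here are illustrative values worth tuning against the train set:

```python
import numpy as np
from scipy.signal import savgol_filter

def smooth_track(lat, lon, window=15, poly=3):
    """Savitzky-Golay-smooth latitude and longitude series independently.

    window must be odd and greater than poly; both values are illustrative.
    """
    return savgol_filter(lat, window, poly), savgol_filter(lon, window, poly)
```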
5 - Post Processing
There are several other types of post processing that you can use besides data smoothing. A lot of ideas such as snap to grid were used last year. Reading up on post processing for geospatial data may help give you an edge.
To learn more about GNSS data, check out the resources in @chris62's discussion.
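The snap-to-grid idea can be sketched with a KD-tree: each prediction is replaced by its nearest point on a set of known road points. This assumes you have already built such a grid, e.g. from ground-truth tracks of other runs on the same route:

```python
import numpy as np
from scipy.spatial import cKDTree

def snap_to_grid(points, grid):
    """Snap each predicted point (N x 2) to its nearest grid point (M x 2).

    For geographic coordinates over short distances, Euclidean nearest
    neighbors in lat/lon is a rough but common approximation.
    """
    tree = cKDTree(grid)
    _, idx = tree.query(points)   # index of nearest grid point per prediction
    return grid[idx]
```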
6 - Machine Learning
Machine learning may be a more challenging approach in this competition and post processing may be a simpler and more rewarding approach to start with. Nonetheless, using machine learning effectively may give you an edge in this competition.
See how last year's 10th place, @sai11fkaneko, used convolutional networks and LGBM models.
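As a hedged sketch of the machine learning idea (not the linked solution's actual approach), one simple formulation is to learn the baseline's systematic error from per-epoch features and subtract the model's prediction. Feature and column choices here are purely illustrative, and a gradient-boosted model from scikit-learn stands in for LGBM:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def fit_error_model(features, baseline_lat, truth_lat):
    """Fit a boosted-tree model on the baseline-minus-truth residual."""
    model = GradientBoostingRegressor(n_estimators=200, max_depth=3,
                                      random_state=0)
    model.fit(features, baseline_lat - truth_lat)
    return model

def apply_correction(model, features, baseline_lat):
    """Subtract the predicted systematic error from the baseline."""
    return baseline_lat - model.predict(features)
```

In practice you would fit one model per coordinate (or per local ENU axis) and validate with the train ground truths.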
7 - Hyperparameter Tune
Tuning your post processing methods and models can help you squeeze the best results out of the data. You can use the train baseline and ground truths to help you tune these models. In the notebook linked under the outlier correction and post-processing tabs, you can see how I used Bayesian Optimization to improve the results of the previous methods.
Documentation for skopt
Documentation for Optuna
8 - Validation with Train Files
Everything you apply to your test set, you should also apply to your train set. Rather than choosing your post processing and parameters based on the leaderboard, I recommend evaluating with the train set. You can use the groundtruth.csv files to help you.
Last year my private leaderboard rank took a very large drop because I overfit to the public leaderboard. This year I'm going to try to avoid that.
If you have any more tips or questions feel free to comment them.
Time Synchronization Across CSVs
Posted in smartphone-decimeter-2022, 4 days ago
The device_{gnss,imu}.csv files were generated such that their utcTimeMillis values are synchronized as closely as possible. Specifically:
- The device_gnss.csv uses the GNSS signal arrival time as calculated by the GNSS hardware clock.
- The device_imu.csv uses the Android system clock and, when available, corrects for biases using a GNSS chipset real-time clock.
Since the GNSS chipset real-time clock is not always available in raw GNSS measurements, there will be inconsistencies in the synchronization of IMU measurements to GNSS measurements. These inconsistencies tend to be larger for Qualcomm devices (Google Pixel {4, 5}), as their 1 Hz GNSS measurements are out of phase with their position reporting. Based on our experience, shifting each IMU measurement ~0.6 seconds forward before matching it to GNSS measurements may result in better synchronization. For Broadcom devices (Xiaomi Mi 8, Google Pixel 6 Pro, and Samsung Galaxy {S20, S21, S21 Plus}), no such compensation is needed.
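The suggested compensation can be sketched with pandas: shift the IMU timestamps forward (+600 ms here, assuming utcTimeMillis is in milliseconds and a Qualcomm device), then pair each IMU row with the nearest GNSS epoch:

```python
import pandas as pd

def align_imu_to_gnss(imu, gnss, shift_ms=600):
    """Shift IMU timestamps, then match each IMU row to the nearest GNSS epoch.

    shift_ms=600 reflects the ~0.6 s forward shift suggested above for
    Qualcomm devices; use shift_ms=0 for Broadcom devices.
    """
    imu = imu.copy()
    imu["utcTimeMillis"] += shift_ms
    return pd.merge_asof(imu.sort_values("utcTimeMillis"),
                         gnss.sort_values("utcTimeMillis"),
                         on="utcTimeMillis", direction="nearest",
                         suffixes=("_imu", "_gnss"))
```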
Note that the ground_truth.csv files were generated such that their UnixTimeMillis values are the GNSS signal arrival time as calculated by the reference INS system. They are expected to be accurately synchronized with respect to the utcTimeMillis values in their corresponding device_gnss.csv files.