XGBoost Experiment Manual
Motivation of the Experiment
- Master the principles and characteristics of the Boosting method.
- Learn to call the XGBoost interface to solve a multi-class classification problem.
- Understand the parameters of XGBoost and tune them during training.
Dataset
Dataset Download Link:
* Training set: TrainingData.csv.zip
* Validation set: ValidationData.csv.zip
Dataset Context:
The UJIIndoorLoc database covers three buildings of Universitat Jaume I with 4 or more floors and almost 110,000 m². It was created in 2013 by more than 20 different users with 25 Android devices. The database consists of 19,937 training/reference records and 1,111 validation/test records.
- The 529 attributes contain the WiFi fingerprint, the coordinates where it was taken, and other useful information. Each WiFi fingerprint is characterized by the detected Wireless Access Points (WAPs) and the corresponding Received Signal Strength Intensity (RSSI). Intensity values are negative integers ranging from -104 dBm (extremely poor signal) to 0 dBm. The positive value 100 denotes that a WAP was not detected.
- During the database creation, 520 different WAPs were detected. Thus, each WiFi fingerprint is composed of 520 intensity values.
- The coordinates (latitude, longitude, floor) and the Building ID are provided as the attributes to be predicted. The particular space (office, lab, etc.) and the relative position (inside/outside the space) where each capture was taken have also been recorded. Outside means that the capture was taken in front of the door of the space.
- The dataset also records who (user), how (Android device and version) and when (timestamp) each WiFi capture was taken.
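Because detected WAPs range from -104 to 0 dBm while "not detected" is coded as +100, feeding the raw values directly would make a missing WAP look like the strongest possible signal. The sketch below shows one common remedy, an assumption rather than a requirement of this manual: replace 100 with a value weaker than any real reading (here -105 dBm, an arbitrary choice) and rescale linearly to [0, 1]. The names `normalize_rssi` and `WEAKEST_DBM` are hypothetical.

```python
import numpy as np

NOT_DETECTED = 100   # UJIIndoorLoc code for "WAP not detected"
WEAKEST_DBM = -105   # assumed stand-in, one step below the weakest real reading (-104 dBm)


def normalize_rssi(raw):
    """Map raw RSSI values to [0, 1]; undetected WAPs (+100) map to 0.0."""
    rssi = np.asarray(raw, dtype=float)
    rssi = np.where(rssi == NOT_DETECTED, WEAKEST_DBM, rssi)
    # Linear rescale: -105 dBm -> 0.0, 0 dBm (strongest) -> 1.0
    return (rssi - WEAKEST_DBM) / (0 - WEAKEST_DBM)


print(normalize_rssi([100, -104, -52.5, 0]))
```

Since tree splits depend mainly on the ordering of feature values, the rescaling itself matters little to XGBoost; the important part is moving 100 below every real reading. Marking undetected WAPs as NaN is another option, because XGBoost treats NaN as missing by default.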
Dataset Content:
- Attributes 001 to 520 (WAP001-WAP520): Intensity values for WAP001 to WAP520. Negative integer values from -104 to 0, plus +100. The positive value 100 is used when the corresponding WAP was not detected.
- Attribute 521 (Longitude): Longitude. Negative real values from -7695.9387549299299000 to -7299.786516730871000.
- Attribute 522 (Latitude): Latitude. Positive real values from 4864745.7450159714 to 4865017.3646842018.
- Attribute 523 (Floor): Altitude in floors inside the building. Integer values from 0 to 4.
- Attribute 524 (BuildingID): ID to identify the building. Measures were taken in three different buildings. Categorical integer values from 0 to 2.
- Attribute 525 (SpaceID): Internal ID number to identify the Space (office, corridor, classroom) where the capture was taken. Categorical integer values.
- Attribute 526 (RelativePosition): Relative position with respect to the Space (1 - Inside, 2 - Outside in Front of the door). Categorical integer values.
- Attribute 527 (UserID): User identifier. Categorical integer values.
- Attribute 528 (PhoneID): Android device identifier. Categorical integer values.
- Attribute 529 (Timestamp): UNIX Time when the capture was taken. Integer value.
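A minimal loading sketch with pandas, assuming the unzipped CSV files keep the column headers of the UCI release (WAP001 to WAP520, then LONGITUDE, LATITUDE, FLOOR, BUILDINGID, SPACEID, RELATIVEPOSITION, USERID, PHONEID, TIMESTAMP); adjust the names if your copy differs. `split_columns` is a hypothetical helper, demonstrated on a tiny synthetic frame instead of the real files.

```python
import pandas as pd

# Assumed column names, following the UCI release of UJIIndoorLoc.
WAP_COLUMNS = [f"WAP{i:03d}" for i in range(1, 521)]
TARGET_COLUMNS = ["BUILDINGID", "FLOOR"]


def split_columns(df):
    """Return the 520 WAP intensity columns and the building/floor targets."""
    missing = [c for c in WAP_COLUMNS + TARGET_COLUMNS if c not in df.columns]
    if missing:
        raise KeyError(f"unexpected CSV layout, missing columns: {missing[:5]}")
    return df[WAP_COLUMNS], df[TARGET_COLUMNS]


# Usage with the real files (paths assume the zips were extracted here):
# train = pd.read_csv("TrainingData.csv")
# valid = pd.read_csv("ValidationData.csv")
# X_train, y_train = split_columns(train)

# Tiny synthetic frame with the same layout, for illustration only.
demo = pd.DataFrame(100, index=range(3), columns=WAP_COLUMNS)
demo["BUILDINGID"] = [0, 1, 2]
demo["FLOOR"] = [3, 0, 4]
X_demo, y_demo = split_columns(demo)
print(X_demo.shape, list(y_demo.columns))
```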
Experiment Environment
The experiment is completed individually.
Experimental procedure
Install and invoke XGBoost
- pip installation: pip install xgboost
- conda installation (Linux): conda install xgboost
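To confirm that the installation worked, a small standard-library check can report whether the `xgboost` module is importable and which version is installed. This is a sketch; `xgboost_version` is a hypothetical helper, and it assumes Python 3.8+ for `importlib.metadata`.

```python
import importlib.util
from importlib import metadata


def xgboost_version():
    """Return the installed XGBoost version string, or None if it is missing."""
    if importlib.util.find_spec("xgboost") is None:
        return None
    try:
        return metadata.version("xgboost")
    except metadata.PackageNotFoundError:
        # Importable but installed without package metadata (e.g. some source builds).
        return "unknown"


version = xgboost_version()
print("XGBoost not installed" if version is None else f"XGBoost {version}")
```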
Algorithm steps
- Use pandas (or another package) to load the experiment dataset.
- Combine 'BuildingID' and 'Floor' into a single attribute to serve as the prediction target.
- Process attributes 001 to 520 (WAP001-WAP520) appropriately and convert them into input data that XGBoost accepts.
- Set appropriate parameters for XGBoost.
- Call the XGBoost Python API to train the model.
- Predict on the validation set; the accuracy must be above 90%.
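The steps above can be sketched end to end as follows. This is a minimal illustration under several assumptions: the UCI column names, NaN for undetected WAPs (XGBoost treats NaN as missing by default), a combined label `10 * BUILDINGID + FLOOR` re-indexed to contiguous class ids, and untuned starting values for `eta`, `max_depth` and the number of rounds. All helper names are hypothetical, and reaching the required 90% accuracy is not guaranteed without tuning.

```python
import numpy as np
import pandas as pd

# Assumed column names, following the UCI release of UJIIndoorLoc.
WAP_COLUMNS = [f"WAP{i:03d}" for i in range(1, 521)]


def prepare_features(wap_df):
    """Convert WAP intensities to floats; 'not detected' (+100) becomes NaN (missing)."""
    return wap_df.astype(float).where(wap_df != 100)


def encode_targets(train_df, valid_df):
    """Combine BUILDINGID and FLOOR into one label and map it to 0..K-1."""
    # Floors are 0..4, so 10 * building + floor is unique (e.g. building 2, floor 4 -> 24).
    train_key = 10 * train_df["BUILDINGID"] + train_df["FLOOR"]
    valid_key = 10 * valid_df["BUILDINGID"] + valid_df["FLOOR"]
    classes = sorted(train_key.unique())
    # Categories come from the training set only, so both sets share one encoding;
    # a validation label never seen in training would get code -1.
    y_train = pd.Categorical(train_key, categories=classes).codes
    y_valid = pd.Categorical(valid_key, categories=classes).codes
    return y_train, y_valid, classes


def train_and_evaluate(X_train, y_train, X_valid, y_valid, num_class, rounds=200):
    """Train a multi-class booster and return it with its validation accuracy."""
    import xgboost as xgb  # imported lazily so the helpers above work without it

    params = {
        "objective": "multi:softmax",  # predict class indices directly
        "num_class": num_class,
        "eta": 0.1,                    # assumed starting values, tune as needed
        "max_depth": 6,
        "eval_metric": "merror",
    }
    dtrain = xgb.DMatrix(X_train, label=y_train, missing=np.nan)
    dvalid = xgb.DMatrix(X_valid, label=y_valid, missing=np.nan)
    booster = xgb.train(params, dtrain, num_boost_round=rounds,
                        evals=[(dtrain, "train"), (dvalid, "valid")], verbose_eval=50)
    predictions = booster.predict(dvalid)
    accuracy = float(np.mean(predictions == y_valid))
    return booster, accuracy


# Usage with the real files (paths assumed):
# train = pd.read_csv("TrainingData.csv")
# valid = pd.read_csv("ValidationData.csv")
# y_tr, y_va, classes = encode_targets(train, valid)
# X_tr = prepare_features(train[WAP_COLUMNS])
# X_va = prepare_features(valid[WAP_COLUMNS])
# model, acc = train_and_evaluate(X_tr, y_tr, X_va, y_va, num_class=len(classes))
# print(f"validation accuracy: {acc:.4f}")  # the manual requires > 0.90
```

Fixing the class list from the training set keeps the label encoding identical for both files, which `multi:softmax` needs because labels must lie in `[0, num_class)`.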
Grading standard
| Rating item | Proportion | Description |
|---|---|---|
| Attendance | 40% | You may ask the college for leave if special circumstances arise. |
| Code validity | 20% | Valid code means the code contains no syntax errors. |
| Experiment report | 30% | Mainly checks whether the experiment report template has been filled in carefully. |
| Code specification | 10% | Mainly checks whether variables are named according to standard conventions. |
Submission process
- Visit the website 222.201.187.50:7001.
- Click the corresponding submission entry.
- Fill in your name and student number, then upload the report as a PDF file and the code as a zip archive.
Precautions
- The experiment report and code can be uploaded multiple times; only the last file you submit is kept.
- After uploading, refresh the page and check the file list below to confirm that the upload succeeded.
- The teaching assistants will save all uploaded files at the experiment deadline; files uploaded after the deadline are invalid.
- If you write the report in Word, export it to PDF.
- The code must be packaged as a zip file. Do not submit rar archives.
- The submission URL can only be accessed from the campus network.
- The code must be written in Python. For the report, English is preferred over Chinese, and LaTeX over Word.
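Since the code must be submitted as a zip archive (not rar), one way to build it with Python's standard library is `shutil.make_archive`. This is a sketch; `pack_code` and the folder and archive names in the usage comment are hypothetical.

```python
import shutil
from pathlib import Path


def pack_code(src_dir, archive_stem):
    """Zip the contents of src_dir into <archive_stem>.zip and return its path."""
    src = Path(src_dir)
    if not src.is_dir():
        raise NotADirectoryError(src)
    # format="zip" produces a standard .zip file, as the submission rules require.
    return shutil.make_archive(str(archive_stem), "zip", root_dir=src)


# Hypothetical usage: pack folder "xgboost_lab" as "xgboost_lab_code.zip"
# pack_code("xgboost_lab", "xgboost_lab_code")
```

Keep the archive path outside the source folder so the zip does not try to include itself.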
Reference documentation
https://xgboost.readthedocs.io/en/latest/parameter.html (see the parameter descriptions on this page)
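As a hedged sketch of tuning with the parameters documented on that page, the snippet below enumerates a few commonly tuned tree-booster parameters (`max_depth`, `eta`, `subsample`) and scores each combination with `xgb.cv` on the training DMatrix. The grid values are arbitrary starting points, not recommendations from this manual, and `param_grid` / `cross_validate` are hypothetical helper names.

```python
from itertools import product


def param_grid(num_class, max_depths=(4, 6, 8), etas=(0.05, 0.1, 0.3), subsamples=(0.8, 1.0)):
    """Yield one XGBoost parameter dict per combination of the listed values."""
    for depth, eta, subsample in product(max_depths, etas, subsamples):
        yield {
            "objective": "multi:softmax",
            "num_class": num_class,
            "max_depth": depth,
            "eta": eta,
            "subsample": subsample,
            "eval_metric": "merror",  # multi-class error rate = 1 - accuracy
        }


def cross_validate(dtrain, grid, nfold=5, rounds=300):
    """Return (best_error, best_params, best_rounds) found by k-fold CV."""
    import xgboost as xgb  # needed only when actually tuning

    best = (float("inf"), None, 0)
    for params in grid:
        # With early stopping, the returned history ends at the best iteration.
        history = xgb.cv(params, dtrain, num_boost_round=rounds, nfold=nfold,
                         early_stopping_rounds=20, seed=0)
        error = float(history["test-merror-mean"].min())
        if error < best[0]:
            best = (error, params, len(history))
    return best
```

Cross-validating on the training set keeps the validation set untouched for the final accuracy check.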
Comments or suggestions can be sent directly to the teaching assistants in the QQ group.