File: http://www.isip.piconepress.com/courses/temple/ece_8527/resources/data/set_10

This directory contains two versions of the data:

 /data: the original floating-point data
 /data_binary: a version of the data in which the floating-point numbers
  have been converted to binary using the process described below

=====

Below is a strategy to convert set #10 to binary data. Call this
approach #1 (BCD). The method used to convert to binary can influence
performance. For example, we could treat the floating-point data directly
as a binary representation, but that adds a nasty nonlinear transformation
and makes the classification problem much harder.

So let's also try approach #2 (LIN): linear quantization. Take the range
[-1,1] and divide it into N segments of width 2/N, where N = 2^M and M is
the number of bits. Number the segments sequentially. For example,
consider the range [-1,1] quantized with two bits (M = 2, N = 4):

 [-1.0, -0.5] => 00
 [-0.5,  0.0] => 01
 [ 0.0,  0.5] => 10
 [ 0.5,  1.0] => 11

So the data point [-0.75, 0.75] would be represented as:

 [00 11] => 0011

I would like you to convert set #10 to these versions:

 #1 (BCD) - 8 bits
 #1 (BCD) - 16 bits
 #2 (LIN) - 8 bits
 #2 (LIN) - 16 bits

and run a few standard algorithms (KNN, RNF). Let's see how performance
on the binary-valued data compares to the continuous-valued data.
(Sketches of both conversions and of a simple train/dev/eval experiment
appear at the end of this file.)

============

Please refer to the table in this paper:

 https://www.isip.piconepress.com/publications/unpublished/conferences/2021/ieee_spmb/auto_tuning/paper_v11.pdf

Table #3 summarizes performance. Let's focus on set #10 first. The raw
data is here:

 https://www.isip.piconepress.com/courses/temple/ece_8527/resources/data/set_10/data/

There are three csv files. Train on train_03.csv, tune on dev_03.csv, and
evaluate only on eval_03.csv. (Do not adjust parameters based on the error
rates for eval_03.csv.) There is a simple scoring script attached as well.

Each entry is a triple: class assignment, x-coordinate, y-coordinate.

To make this data binary, multiply the x and y coordinates by 32767 and
clip them to the range [-32767, +32767]. That makes each data point two
16-bit values. If you need them as a single binary-valued vector, convert
them to a 32-bit bit sequence by taking the 16-bit two's complement
representation of each number and concatenating them. For example,
consider the data point:

 [-0.5, +0.5] => [-16384, +16384]
              => [11000000 00000000] [01000000 00000000]
              => [11000000000000000100000000000000]

The last representation is a unique binary sequence.
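
The fragment below is a minimal sketch of the conversion just described
(approach #1, BCD): scale each coordinate, clip, and concatenate the two's
complement bit patterns into one bit string. The function name, the use of
Python, and the generalization to other bit widths (e.g., a scale of 127
for 8 bits) are assumptions for illustration, not part of the data set.

def to_bcd_bits(x, y, nbits=16):
    """Approach #1 (BCD): scale each coordinate to a signed integer,
    clip, and concatenate the two's complement bit patterns."""

    scale = (1 << (nbits - 1)) - 1          # 32767 for 16 bits
    mask = (1 << nbits) - 1                 # 0xFFFF for 16 bits
    fields = []
    for v in (x, y):
        vi = int(round(v * scale))
        vi = max(-scale, min(scale, vi))    # clip to [-32767, +32767]
        fields.append(vi & mask)            # two's complement bit pattern
    return "".join(format(v, "0{}b".format(nbits)) for v in fields)

# example from the text: [-0.5, +0.5] => 11000000000000000100000000000000
print(to_bcd_bits(-0.5, +0.5))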
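
For approach #2 (LIN), here is a similar sketch of the linear quantization
described above: divide [-1,1] into N = 2^M equal segments, number them
sequentially, and concatenate the M-bit segment indices. The function name
and the choice of left-closed segment boundaries are assumptions.

def to_lin_bits(x, y, nbits=2):
    """Approach #2 (LIN): divide [-1, 1] into N = 2^M equal segments,
    number them 0 .. N-1, and concatenate the M-bit segment indices."""

    n = 1 << nbits                          # number of segments
    codes = []
    for v in (x, y):
        # map [-1, 1] onto [0, N), then clamp so +1.0 lands in the top segment
        idx = int((v + 1.0) / 2.0 * n)
        idx = max(0, min(n - 1, idx))
        codes.append(format(idx, "0{}b".format(nbits)))
    return "".join(codes)

# example from the text: [-0.75, 0.75] => 0011 with two bits
print(to_lin_bits(-0.75, 0.75, nbits=2))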
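
Finally, a sketch of the train/dev/eval protocol described above, using
scikit-learn. It assumes the csv files have no header row and contain one
(class, x, y) triple per line, that RNF refers to a random forest, and
that the starting parameter values (n_neighbors, n_estimators) are just
placeholders to be tuned on dev_03.csv; the attached scoring script, not
this snippet, defines the official error rate. For the binary-valued runs,
the coordinate columns would be replaced by the 0/1 bits produced by the
conversion functions above.

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier

def load_csv(fname):
    """Each row is a triple: class assignment, x-coordinate, y-coordinate."""
    data = np.loadtxt(fname, delimiter=",")
    return data[:, 0].astype(int), data[:, 1:]

# train on train_03.csv, tune on dev_03.csv, evaluate once on eval_03.csv
y_train, x_train = load_csv("train_03.csv")
y_dev, x_dev = load_csv("dev_03.csv")
y_eval, x_eval = load_csv("eval_03.csv")

models = {
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "RNF": RandomForestClassifier(n_estimators=100, random_state=0),
}

for name, model in models.items():
    model.fit(x_train, y_train)
    dev_err = 1.0 - model.score(x_dev, y_dev)     # tune parameters on this
    eval_err = 1.0 - model.score(x_eval, y_eval)  # report, but never tune on this
    print("%s: dev error = %.2f%%, eval error = %.2f%%" %
          (name, 100 * dev_err, 100 * eval_err))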