File: http://www.isip.piconepress.com/courses/temple/ece_8527/resources/data/set_10

This directory contains two versions of the data:

 /data: the original floating-point data
 /data_binary: a version of the data in which the floating-point numbers
  have been converted to binary using the process described below

=====

Below is a strategy to convert set #10 to binary data. Call this
approach #1 (BCD). The method used to convert to binary can influence
performance. For example, we could treat the floating-point data directly
as a binary representation, but that adds a nasty nonlinear transformation
and makes the classification problem much harder.

So let's also try approach #2 (LIN): linear quantization. Take the range
[-1,1] and divide it into N segments of width 2/N, where N = 2^M and M is
the number of bits. Number the segments sequentially. For example,
consider the range [-1,1] quantized with two bits (M = 2, N = 4):

 [-1.0, -0.5] => 00
 [-0.5,  0.0] => 01
 [ 0.0,  0.5] => 10
 [ 0.5,  1.0] => 11

So the data point [-0.75, 0.75] would be represented as:

 [00 11] => 0011

I would like you to convert set #10 to these versions:

 #1 (BCD) - 8 bits
 #1 (BCD) - 16 bits
 #2 (LIN) - 8 bits
 #2 (LIN) - 16 bits

and run a few standard algorithms (KNN, RNF). Let's see how performance
on the binary-valued data compares to the continuous-valued data.
(Sketches of both conversions and of a simple train/dev/eval experiment
appear at the end of this file.)

============

Please refer to the table in this paper:

 https://www.isip.piconepress.com/publications/unpublished/conferences/2021/ieee_spmb/auto_tuning/paper_v11.pdf

Table #3 summarizes performance. Let's focus on set #10 first. The raw
data is here:

 https://www.isip.piconepress.com/courses/temple/ece_8527/resources/data/set_10/data/

There are three csv files. Train on train_03.csv, tune on dev_03.csv, and
evaluate only on eval_03.csv. (Do not adjust parameters based on the error
rates for eval_03.csv.) There is a simple scoring script attached as well.

Each entry is a triple: class assignment, x-coordinate, y-coordinate.

To make this data binary, multiply the x and y coordinates by 32767 and
clip them to the range [-32767, +32767]. That makes each data point two
16-bit values. If you need them as a single binary-valued vector, convert
them to a 32-bit bit sequence by taking the 16-bit two's complement
representation of each number and concatenating them. For example,
consider the data point:

 [-0.5, +0.5] => [-16384, +16384]
              => [11000000 00000000] [01000000 00000000]
              => [11000000000000000100000000000000]

The last representation is a unique binary sequence.
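
The fragment below is a minimal sketch of the conversion just described
(approach #1, BCD): scale each coordinate, clip, and concatenate the two's
complement bit patterns into one bit string. The function name, the use of
Python, and the generalization to other bit widths (e.g., a scale of 127
for 8 bits) are assumptions for illustration, not part of the data set.

def to_bcd_bits(x, y, nbits=16):
    """Approach #1 (BCD): scale each coordinate to a signed integer,
    clip, and concatenate the two's complement bit patterns."""

    scale = (1 << (nbits - 1)) - 1          # 32767 for 16 bits
    mask = (1 << nbits) - 1                 # 0xFFFF for 16 bits
    fields = []
    for v in (x, y):
        vi = int(round(v * scale))
        vi = max(-scale, min(scale, vi))    # clip to [-32767, +32767]
        fields.append(vi & mask)            # two's complement bit pattern
    return "".join(format(v, "0{}b".format(nbits)) for v in fields)

# example from the text: [-0.5, +0.5] => 11000000000000000100000000000000
print(to_bcd_bits(-0.5, +0.5))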
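
For approach #2 (LIN), here is a similar sketch of the linear quantization
described above: divide [-1,1] into N = 2^M equal segments, number them
sequentially, and concatenate the M-bit segment indices. The function name
and the choice of left-closed segment boundaries are assumptions.

def to_lin_bits(x, y, nbits=2):
    """Approach #2 (LIN): divide [-1, 1] into N = 2^M equal segments,
    number them 0 .. N-1, and concatenate the M-bit segment indices."""

    n = 1 << nbits                          # number of segments
    codes = []
    for v in (x, y):
        # map [-1, 1] onto [0, N), then clamp so +1.0 lands in the top segment
        idx = int((v + 1.0) / 2.0 * n)
        idx = max(0, min(n - 1, idx))
        codes.append(format(idx, "0{}b".format(nbits)))
    return "".join(codes)

# example from the text: [-0.75, 0.75] => 0011 with two bits
print(to_lin_bits(-0.75, 0.75, nbits=2))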
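
Finally, a sketch of the train/dev/eval protocol described above, using
scikit-learn. It assumes the csv files have no header row and contain one
(class, x, y) triple per line, that RNF refers to a random forest, and
that the starting parameter values (n_neighbors, n_estimators) are just
placeholders to be tuned on dev_03.csv; the attached scoring script, not
this snippet, defines the official error rate. For the binary-valued runs,
the coordinate columns would be replaced by the 0/1 bits produced by the
conversion functions above.

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier

def load_csv(fname):
    """Each row is a triple: class assignment, x-coordinate, y-coordinate."""
    data = np.loadtxt(fname, delimiter=",")
    return data[:, 0].astype(int), data[:, 1:]

# train on train_03.csv, tune on dev_03.csv, evaluate once on eval_03.csv
y_train, x_train = load_csv("train_03.csv")
y_dev, x_dev = load_csv("dev_03.csv")
y_eval, x_eval = load_csv("eval_03.csv")

models = {
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "RNF": RandomForestClassifier(n_estimators=100, random_state=0),
}

for name, model in models.items():
    model.fit(x_train, y_train)
    dev_err = 1.0 - model.score(x_dev, y_dev)     # tune parameters on this
    eval_err = 1.0 - model.score(x_eval, y_eval)  # report, but never tune on this
    print("%s: dev error = %.2f%%, eval error = %.2f%%" %
          (name, 100 * dev_err, 100 * eval_err))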