Big Data Programming in Matlab

o For each programming question, submit both the Matlab script and
the output.

Answer the following questions.

(1) Implement MDS in Matlab (Matlab already implements MDS in the
function cmdscale.m, but I would like you to code it yourself in
order to fully understand the steps of MDS). It is preferable to
write a function for MDS whose input is the matrix of pairwise
distances and the dimension of the target Euclidean space, and
whose output is the low-dimensional coordinates and the Kruskal
stress score:

function [Y, stress] = mds(D, k)
Afterwards, do the following:

o Apply your function to a data set (called ChineseCityData.mat)
that contains the mutual distances of 12 Chinese cities to produce
a two-dimensional map (for clarity, place the city of Urumqi in
the top-left corner). How good is your map?

o Suppose you had airline distances for 50 cities around the
world. Could you use these distances to construct a world map?
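The classical MDS steps (square the distances, double-center, eigendecompose, scale the top eigenvectors) together with Kruskal's stress-1 score can be sketched as follows. This is an illustrative NumPy version, not the Matlab function you are asked to submit:

```python
import numpy as np

def mds(D, k):
    """Classical MDS: embed an n-by-n distance matrix D into R^k.

    Returns the n-by-k coordinates Y and the Kruskal stress-1 score.
    """
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n           # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                   # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(B)                # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:k]              # top-k eigenpairs
    L = np.maximum(vals[idx], 0)                  # clip tiny negative eigenvalues
    Y = vecs[:, idx] * np.sqrt(L)                 # low-dimensional coordinates
    Dhat = np.sqrt(((Y[:, None, :] - Y[None, :, :]) ** 2).sum(-1))
    stress = np.sqrt(((D - Dhat) ** 2).sum() / (D ** 2).sum())  # Kruskal stress-1
    return Y, stress
```

For the city data, calling the function with k = 2 and plotting the two columns of Y (flipping axis signs as needed to put Urumqi in the top-left corner) produces the map.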

(2) Download the ISOmap code from the course website (note that the
code provided on the ISOmap website has an error; it has been
fixed in the version I provide). Also, find a real data set from the
Internet (anything but those already listed on the ISOmap website)
to which it makes sense to apply ISOmap (be sure to describe your
data clearly but briefly). Perform ISOmap on your data set and
interpret the low-dimensional representation you obtain.

(3) Use Dijkstra's algorithm to calculate, by hand, the shortest distance
from node 0 to every other node in the following graph:
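After the hand computation, you can check your answer programmatically. Below is a minimal sketch of Dijkstra's algorithm with a binary heap (illustrative Python; the graph used here is a made-up example, since the assignment's figure is not reproduced in this text):

```python
import heapq

def dijkstra(adj, source):
    """Shortest distances from source in a graph given as
    {node: [(neighbor, weight), ...]} with nonnegative weights."""
    dist = {source: 0}
    pq = [(0, source)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float("inf")):
            continue                       # stale heap entry, skip
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return dist

# Hypothetical example graph (NOT the assignment's graph):
adj = {0: [(1, 4), (2, 1)], 1: [(3, 1)], 2: [(1, 2), (3, 5)], 3: []}
```

Running `dijkstra(adj, 0)` on this example gives distances 0, 3, 1, 4 for nodes 0 through 3, which you can verify by hand the same way the exercise asks.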


(4) This question concerns Kernel PCA with the Gaussian kernel, also
called the Radial Basis Function (RBF) kernel,

κ(x_i, x_j) = exp( −‖x_i − x_j‖² / (2σ²) ),

and aims to help you understand what the combined algorithm
does:

o The value κ(x_i, x_j) represents the dot product between the images
of x_i, x_j in some infinite-dimensional feature space F;

o Each data point x_i is mapped to a unit vector in F (as κ(x_i, x_i)
= 1);

o If two points x_i, x_j ∈ R^d are spatially “close” under the Eu-
clidean distance, then their feature vectors φ_i, φ_j ∈ F will have
a small angle;

o If two points x_i, x_j ∈ R^d are spatially “far” from each other,
then their feature vectors φ_i, φ_j ∈ F will have an angle close
to 90 degrees;

o The scale parameter σ > 0 defines how far is “far” and how
close is “close”. It is normally chosen to be the average dis-
tance between each point and its kth nearest neighbor in the
data set (say k = 8).

Overall, Kernel PCA maps nearby points to unit feature vectors
with small angles and faraway points to nearly orthogonal directions,
and then applies PCA in the feature space F, thus preserving all (and
only) the local geometry. The RBF kernel does not depend on the
shape of the data set and is thus a general-purpose kernel. For more
background, read the first two papers listed under Kernel PCA on the
course website.
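The combined algorithm described above can be sketched concretely as follows: build the RBF kernel matrix with σ set by the kth-nearest-neighbor rule, center it (which centers the feature vectors in F), and take the top eigenvectors. This is illustrative NumPy code; your submission should be a Matlab function:

```python
import numpy as np

def kernel_pca(X, k_dim=2, k_nn=8):
    """Kernel PCA with the Gaussian (RBF) kernel.

    sigma is the average distance from each point to its k_nn-th
    nearest neighbor, as suggested in the text.
    """
    n = X.shape[0]
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    # column 0 of the sorted rows is each point itself (distance 0),
    # so column k_nn is the distance to the k_nn-th nearest neighbor
    sigma = np.mean(np.sort(D, axis=1)[:, k_nn])
    K = np.exp(-D ** 2 / (2 * sigma ** 2))          # kappa(x_i, x_j)
    J = np.eye(n) - np.ones((n, n)) / n
    Kc = J @ K @ J                                  # center features in F
    vals, vecs = np.linalg.eigh(Kc)
    idx = np.argsort(vals)[::-1][:k_dim]            # top principal directions
    Y = vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0))
    return Y
```

Plotting the two columns of Y gives the two-dimensional representation the question asks for.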

Now, implement Kernel PCA in Matlab as a function and apply
it to the data in kernelpcadata.mat. Display the two-dimensional
representation of the data obtained by Kernel PCA. What do you
find?


(5) The Iris data set in the University of California, Irvine (UCI) Ma-
chine Learning Repository (http://archive.ics.uci.edu/ml/datasets/Iris)
contains 3 classes of 50 instances each, where each class refers
to a type of iris plant. First, download this data set to your com-
puter and use the file script_read_irisdata.m to read it into Matlab.
Afterwards, perform the following tasks:

o Apply kmeans with 10 restarts to the iris data set to divide
it into three groups. What is the error percentage of your
clustering? Is it good, and why?

o Now, suppose we do not know how many classes there are and
would like to estimate this number using kmeans with k = 1, ..., 6,
each with 10 restarts. Plot the within-cluster scatter versus k.
How many clusters does the plot indicate?
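Both tasks can be sketched with an illustrative NumPy k-means with random restarts (in Matlab you would instead call the built-in kmeans, e.g. with the 'Replicates' option). The "scatter" plotted versus k is the total within-cluster sum of squared distances:

```python
import numpy as np

def kmeans(X, k, n_restarts=10, n_iter=100, seed=0):
    """Plain k-means with random restarts; returns the best labels and
    the total within-cluster scatter (sum of squared distances)."""
    rng = np.random.default_rng(seed)
    best_labels, best_scatter = None, np.inf
    for _ in range(n_restarts):
        C = X[rng.choice(len(X), k, replace=False)]  # random initial centroids
        for _ in range(n_iter):
            d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1)
            labels = d2.argmin(1)                    # assign to nearest centroid
            newC = np.array([X[labels == j].mean(0) if np.any(labels == j)
                             else C[j] for j in range(k)])
            if np.allclose(newC, C):                 # converged
                break
            C = newC
        scatter = ((X - C[labels]) ** 2).sum()
        if scatter < best_scatter:
            best_labels, best_scatter = labels, scatter
    return best_labels, best_scatter

# Scatter-versus-k curve: look for the "elbow" where the drop levels off.
# scatters = [kmeans(X, k)[1] for k in range(1, 7)]
```

For the second task, the elbow of the scatter-versus-k curve is the usual estimate of the number of clusters.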