Q1

(a)

The new customer will likely NOT use ChatGPT.

(b)

(i)

No. of GoPro Cameras   No. of Laptops   Use_ChatGPT
4                      2                1
2                      4                1
2                      0                0
0                      2                0

(ii)

Instance 1

Instance 2

Instance 3

Instance 4

Instance 5

The final values are

Q2

(a)

Compute the difference between each data point and the mean vector

Compute the covariance matrix

Compute the eigenvalues and eigenvectors of the covariance matrix

When

When

Arrange the eigenvectors in descending order of the eigenvalues

Transform the data points by the eigenvector matrix

For each transformed data point, keep only the values corresponding to the larger eigenvalues
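
The steps above can be sketched in Python with NumPy. This is a minimal sketch, not the worked answer: the function name pca_reduce and its parameters are illustrative, and dividing by N (rather than N - 1) in the covariance matrix is an assumption, since the convention used in the question is not shown here.

```python
import numpy as np

def pca_reduce(points, n_components=1):
    """Reduce `points` (one row per data point) to `n_components` dimensions."""
    X = np.asarray(points, dtype=float)

    # Step 1: difference of each data point from the mean vector.
    centered = X - X.mean(axis=0)

    # Step 2: covariance matrix (assumption: population form, divide by N).
    cov = centered.T @ centered / len(X)

    # Step 3: eigenvalues and eigenvectors of the covariance matrix.
    # eigh is for symmetric matrices and returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(cov)

    # Step 4: arrange eigenvectors in descending order of eigenvalue.
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # Step 5: transform the centered data points by the eigenvector matrix.
    transformed = centered @ eigvecs

    # Step 6: keep only the components for the largest eigenvalues.
    return transformed[:, :n_components], eigvals, eigvecs
```

The sign of each eigenvector is arbitrary, so the reduced values may come out negated compared with a hand calculation.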

(b)

Q3

(a)

Note that

He is unlikely to have lung cancer, since the probability of him having lung cancer is less than the probability of him not having lung cancer.

(b)

  • The Bayesian Belief Network can be very difficult to construct
  • The Bayesian Belief Network can be computationally expensive to run for larger and more complex networks
  • The Bayesian Belief Network cannot contain cyclic relationships

Q4

(a)

Advantages

  • Materialized views allow faster querying, since results are precomputed instead of being derived from the base data at query time
  • We can query data directly at the level of aggregation that has been materialized

Disadvantages

  • Storage space is required for materializing views, which means there can be additional costs
  • It might take up too much storage space if we materialize too many views

(b)

Run the greedy algorithm for k = 1, 2, ..., stopping once the memory needed to materialize the selected views would exceed the available memory size X. The last value of k that still fits, n, is the largest number of views that can be materialized within the memory limit.

Then choose k, the number of views to materialize, as the value between 1 and n that gives the largest benefit.
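
The procedure can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the function greedy_select_views, the view names, the sizes, and the benefit function are all hypothetical, and the real benefit of a view would come from the lattice of views in the question rather than from a fixed lookup.

```python
def greedy_select_views(candidates, size, benefit, budget):
    """Greedily pick views until the next pick would exceed the memory budget.

    candidates: iterable of view names (hypothetical)
    size:       dict mapping view name -> storage needed to materialize it
    benefit:    callable (view, already_selected) -> benefit of adding the view
    budget:     available memory size X
    """
    selected = []
    used = 0
    total_benefit = 0
    remaining = sorted(candidates)

    while remaining:
        # Benefit of each remaining view given what is already materialized.
        gains = {view: benefit(view, selected) for view in remaining}
        best = max(remaining, key=lambda view: gains[view])

        # Stop when nothing helps or the best view no longer fits in memory.
        if gains[best] <= 0 or used + size[best] > budget:
            break

        selected.append(best)
        used += size[best]
        total_benefit += gains[best]
        remaining.remove(best)

    return selected, used, total_benefit


# Toy data, invented for illustration only.
toy_size = {"A": 50, "B": 30, "C": 40}
toy_gain = {"A": 100, "B": 60, "C": 80}
result = greedy_select_views(
    toy_size, toy_size, lambda view, chosen: toy_gain[view], budget=100
)
```

With this toy data the loop picks A, then C, and stops because B would push the memory used from 90 to 120, above the budget of 100.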

Q5

(a)

(b)

(c)

Working to 3 decimal places, we reach the stopping condition after 14 iterations, i.e. the resulting matrix repeats itself.

From the PageRank matrix, we can rank the websites in the order S, Q, R, P.
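
The iteration and its stopping condition can be sketched in Python with NumPy. This is a minimal sketch, not the graph from the question: the pages A to D and their links are hypothetical, no damping factor is applied, and every page is assumed to have at least one outgoing link.

```python
import numpy as np

def transition_matrix(pages, links):
    """Build a column-stochastic matrix: column = source page, row = target page."""
    index = {page: i for i, page in enumerate(pages)}
    M = np.zeros((len(pages), len(pages)))
    for source, targets in links.items():
        for target in targets:
            # Each page splits its rank equally among its outgoing links.
            M[index[target], index[source]] = 1.0 / len(targets)
    return M

def pagerank(M, decimals=3, max_iter=1000):
    """Iterate rank = M @ rank until the values repeat at `decimals` places."""
    n = M.shape[0]
    rank = np.full(n, 1.0 / n)  # start from a uniform rank vector
    for iteration in range(1, max_iter + 1):
        new_rank = M @ rank
        # Stopping condition: rounded values are the same as the previous step.
        if np.array_equal(np.round(new_rank, decimals), np.round(rank, decimals)):
            return np.round(new_rank, decimals), iteration
        rank = new_rank
    return np.round(rank, decimals), max_iter

# Hypothetical link structure, for illustration only.
pages = ["A", "B", "C", "D"]
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks, iterations = pagerank(transition_matrix(pages, links))
ordering = [pages[i] for i in np.argsort(-ranks, kind="stable")]
```

Sorting the pages by their final rank values gives the ordering; here page D has no incoming links, so its rank falls to zero.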