  1. Understand "standardize" for data preprocessing

  2. Why is the mini-batch gradient descent's cost function graph noisy?
  3. Are these methods suitable for predicting a numeric value?

  4. Are the terms Hypothesis Space and Parameter Space interchangeable?
  5. Statistical approach to compare the calibration between models

  6. How to decide whether a new feature is effective in improving the model?
  7. Conveying uncertainty in accuracy measurements for machine learning models
  8. What is the 'no information rate' algorithm?
  9. Find marginal distribution of $K$-variate Dirichlet
  10. KNN posterior probabilities

  11. How to design a complex machine learning system where individual classifiers can be retrained without modifying the rest of the system?
  12. Implementing Balanced Random Forest (BRF) in R using randomForest

  13. Unscented Kalman filter: negative covariance matrix

  14. Is it possible to compare classifiers using Dietterich's 5x2cv paired t-test with the Matthews correlation coefficient as the "error" metric?
  15. Criteria to select a sample for the prediction of a strange feature

  16. How to partition leave-one-subject-out (not leave-one-example-out) cross-validation in MATLAB?

  17. Are graphical models and Boltzmann machines related mathematically?

  18. What is the difference between off-policy and on-policy learning?
  19. RNN model for extracting real-valued sequences
  20. Bayesian nonparametric answer to deep learning?
  21. When does deep learning fail?
  22. How can I train a CNN on raw numbers?

  23. Paradox in Data Snooping with lucky researchers

  24. On which datasets does AdaBoost algorithm overfit?

  25. What are inferred transformations?

  26. SVM: Why do we demand the hyperplanes to equal $\pm 1$?
  27. Combining the forward and backward algorithms in HMMs

  28. Estimating probabilities conditional on actions with Random Forest

  29. Why is it so important to have principled and mathematical theories for Machine Learning?

  30. What are some useful data augmentation techniques for deep convolutional neural networks?

  31. Training an Artificial Neural Network with limited-memory Quasi-Newton

  32. Difference between supervised machine learning and design of experiments?

  33. Self-organizing maps: fuzzy input?

  34. What is the difference between artificial intelligence and machine intelligence?
  35. What can be the reason to do feature selection based on variance before doing PCA?

  36. Classification probability threshold

  37. Feature correlation and its effect on logistic regression

  38. Hypothesis testing performance of replicated model

  39. Heuristic Feature Selection for Gradient Boosting
  40. Understanding the quality of the KMeans algorithm

  41. How can I interpret the SVM summary from the e1071 package in R?
  42. How to compare the likelihood of models which produce very small or zero probabilities?
  43. In VAEs, why don't we just use a fixed variance for the z distribution?
  44. Why is the validation accuracy fluctuating?

  45. 10-Fold CV F1 score not affected by the hidden layer size? (Neural Network)
  46. When does the accuracy on the test set surpass that of the training set?
  47. Why will the validation set error underestimate the generalisation error?

  48. Relation between precision, recall and sample size
  49. Feature scaling and mean normalization

  50. How do you Interpret RMSLE (Root Mean Squared Logarithmic Error)?

  51. What does the average of word2vec vectors mean?
  52. What does word embedding weighted by tf-idf mean?

  53. Best machine learning approach for instance prediction in a clinical multidimensional time series?

  54. Difference between random forest and bagging in sklearn
  55. How to improve classification based on 2D distance between classes

  56. Why does gradient descent not work on non-IID data?

  57. Hashing trick visualization

  58. Event Correlation

  59. Behavior of AdaGrad without the square root in the denominator
  60. Why is the step function not used as an activation function in machine learning?
  61. Is it possible to reduce Bayes error by adding new features?
  62. Using Machine learning for specific "simple" string parsing

  63. Regression Model Predictions with Pseudo-Random Results

  64. What is Recurrent Reinforcement Learning?

  65. Discriminant function for diagonal LDA
  66. How to identify support vectors in SGD svm?
  67. Lifetime of fraud detection models

  68. State of the art results on CIFAR-10

  69. PLS vs. linear regression
  70. What's the hinge loss in SVM?

  71. Are autocorrelated errors an issue for ML models like neural networks?
  72. Mathematical Machine Learning Theory "from scratch" textbook?

  73. Meaning of Splitrule in random forest model
  74. Is it possible to design a global image classifier?

  75. "Stability" of error estimates with cross validated SVMs

  76. Text classification based on keywords

  77. Can SARSA(λ) Be Generalized To Include Off-Policy Learning Off-Line?

  78. Automatic feature building/extraction
  79. Cross entropy applied to backpropagation in neural network

  80. What is the connection between estimation and information theory?

  81. Looking for RNN regression/classification examples using mxnet in R
  82. Machine Learning model for dealing with Curse of Dimensionality
  83. Model for extracting triple of keywords from sentence
  84. How does one design regularizers such that they match the parameters that generated the data when one has strong priors about the parameters?

  85. Calculation of RMSE for random forest model
  86. What is the best way to use Latitude and Longitude features in building a Machine Learning model?

  87. Bernoulli NB vs. Multinomial NB: how to choose among different NB algorithms?

  88. Why are the weights of RNN/LSTM networks shared across time?

  89. How to tune hyperparameters of microsoft LightGBM trees?
  90. Training in one step vs multiple steps

  91. What is the difference between an independent variable and a feature?
  92. Conditional Independence and Marginalization
  93. Naive Bayes model diagnostics -- testing independence between features

  94. How to perform CAP curve analysis in R

  95. In deep learning, what is the difference between "disentangled representation" and "distributed representation"?
  96. Why use a train/test split with linear regression?

  97. Parallel minibatch gradient descent algorithms
  98. Stacking GBT using Logistic Regression
  99. What does variation mean in text corpora?
  100. Anomaly Detection with Dummy Features (and other Discrete/Categorical Features)