  1. What is the reason that the Adam Optimizer is considered robust to the values of its hyperparameters?

  2. Best way to average F-score with unbalanced classes

  3. A2C Loss Function Explosion
  4. Application of Machine Learning to Automated Theorem Proving

  5. Cross-validation for time series data with regression

  6. Optimizing the ridge regression loss function with unpenalized intercept
  7. Calculating F-score: which is the "positive" class, the majority or the minority class?

  8. Decision trees, Gradient boosting and normality of predictors
  9. What happens if A is not invertible in equation Ax=b?

  10. In LDA, after collapsed Gibbs sampling, how to estimate values of other latent variables?

  11. What is the relationship between graphical models and hierarchical Bayesian models?

  12. Number of observations in a node in XGBoost
  13. Natural Language to SQL query

  14. Are these methods suitable for predicting a numeric value?

  15. What is the difference between Conv1D and Conv2D?

  16. Comparability of the negative log marginal likelihood in Gaussian processes

  17. How do the residual blocks prevent exploding gradients?
  18. Feature engineering for fraud detection
  19. Confused about the realizability assumption and equations of upper bound
  20. number of nodes in an unpruned decision tree

  21. Feature scaling/normalization and prediction

  22. How to encode timestamp features toward better meaningful features
  23. Learning Curve - Interpreting Bias Variance with Accuracy

  24. Appropriate machine learning algorithm for few variables (features)

  25. Including evolutionary methods in machine learning course
  26. How do I find multiple change points in an online dataset?
  27. Is graduate level probability theory (Durett) used often in ML, DL research?

  28. GP: How to select a model for a classification task, based on overall accuracy and log-marginal likelihood?

  29. How do these matrices form an order-$4$ tensor?
  30. What is the problem with overdifferencing a long memory time series?
  31. How exactly does machine learning theory work/help in practical problems?

  32. predicting x,y position using machine learning
  33. Observation symbols for training a set of HMMs
  34. Bayesian model selection: picking the MAP model by integrating

  35. Neural network with non-binary output?

  36. How to know that your machine learning problem is hopeless?

  37. Why do we use gradients instead of residuals in Gradient Boosting?
  38. Model that optimizes mean absolute error always gives same prediction
  39. Which machine learning approach should I use to generate an HTML file from an XML description file?

  40. What does the matrix $M = [\operatorname{diag}(m_{:,1}),\ldots,\operatorname{diag}(m_{:,m})]$ look like?

  41. CNN: Range of filters and activation functions
  42. Is this clear overfitting?

  43. How to set the mini-batch size for SGD in Keras

  44. What is monotonic classification?

  45. Pearson correlation coefficient as a CNN loss function

  46. Finding the closest matching curve
  47. Gradient descent and latent factor in matrix factorization
  48. Variational Autoencoder: dimension of the latent space

  49. What is the best form (Gaussian, Multinomial) of Naive Bayes to use with (one-hot encoded) features?

  50. Neural Network Trains Fine and Test Predictions are Horrible Bordering on Ridiculous

  51. How to insert feature vectors as additional channels in conditional DCGANs
  52. Lower or higher PCA should be considered as the best PCA
  53. Understanding a Laplace smoothing implementation
  54. Help to fully understand Convolutional Neural Networks
  55. Difference between feature, feature set and feature vector

  56. Large dataset for making loan-granting decisions
  57. How to automatically select the nugget parameter in Gaussian process regression (GPR)?
  58. How to choose the number of features to select (or to drop)?

  59. More data, to counteract overfitting, results in worse validation accuracy
  60. Classification: how important is the sample-to-feature ratio?

  61. How to handle machine learning inputs that are related but conceptually isolated

  62. Why does my SVM take so long to run?

  63. Difference between "Hill Climbing" and "Gradient Descent"?

  64. Sample Weight in Edward

  65. What are the downsides of bayesian neural networks?

  66. How to choose the correct class encoding approach in classification
  67. Multi-task learning with missing data for one task

  68. Logistic regression with censored labels

  69. Frequency Matching Between Predictor and Response Variable

  70. Decision tree with imbalanced data not affected by pruning

  71. How to transform a categorical variable into a numerical variable when using an SVM or a neural network
  72. Notation to represent the batch-normalised value of x

  73. Bagging, boosting and stacking in machine learning
  74. What are the three motivations for ensemble learning?

  75. Using boosted trees to generate features in sklearn

  76. Artificial neurons based on modelling observed correlations and predicting from them?

  77. Evaluating neural network for certain task
  78. Machine Learning: Classification algorithm for very high-dimensional data that is uniquely definable in a very small subspace

  79. Combine a deep neural network with a convolution neural network

  80. Neural network only converges when data cloud is close to 0

  81. Computation of the marginal likelihood from MCMC samples

  82. TensorFlow sampled_softmax_loss: correct usage

  83. Difference between Bayes network, neural network, decision tree and Petri nets

  84. Precision and Recall on equal level

  85. Cross Entropy vs. Sparse Cross Entropy: When to use one over the other

  86. Why do we care about Quasi-norm in Statistics and Machine Learning?

  87. Parameter learning in augmented Bayesian Networks

  88. hyperparameter tuning in neural networks
  89. Difference between Bag of words and Vector space model
  90. Boosted trees and Variable Interactions
  91. Why does SGD not work for approximating the circle area formula?

  92. Feature scaling and mean normalization
  93. How are the concepts of correlation matrix and covariance matrix related, intuitively?

  94. What happens when we feed a 2D matrix to an LSTM layer
  95. Difference between Random Forest and Extremely Randomized Trees
  96. Statistics in the context of Search Engine Optimization (SEO)?

  97. How to use weights with ElasticNet regression in Python?

  98. Help interpreting short / truncated calibration curve

  99. Can recurrent neural networks be used to classify the language of a word?
  100. Interpret learning curves: Training error and validation error are low