46 thoughts on “How do I select features for Machine Learning?”

  1. Do you have any tips on how to handle datasets where there is a strong class imbalance (i.e. 95% class A, 5% class B)? Thanks, these videos are extremely helpful!
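For readers with the same question: one common first step is to re-weight the classes rather than resample. A minimal sketch with scikit-learn (the synthetic data and the choice of logistic regression are illustrative assumptions, not from the video):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic 95/5 imbalance, mirroring the question.
X, y = make_classification(n_samples=1000, weights=[0.95, 0.05], random_state=0)

# class_weight="balanced" re-weights the loss inversely to class frequency,
# so errors on the 5% minority class are not simply drowned out.
clf = LogisticRegression(class_weight="balanced", max_iter=1000).fit(X, y)
print(round((y == 1).mean(), 2))  # minority-class fraction
```

Resampling approaches (over-sampling the minority class, under-sampling the majority) are the other common family of fixes; which works better depends on the dataset.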

  2. I did Recursive Feature Elimination with Cross-Validation and Variance Inflation Factor for dimensionality reduction 🙂
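For anyone curious what the first of those looks like in code, here is a minimal RFECV sketch with scikit-learn (synthetic data; the logistic-regression estimator is an illustrative choice):

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression

# Synthetic data: 10 features, only 4 of them informative.
X, y = make_classification(n_samples=200, n_features=10, n_informative=4,
                           n_redundant=2, random_state=0)

# RFECV repeatedly drops the least important feature and keeps the
# subset with the best cross-validated score.
selector = RFECV(LogisticRegression(max_iter=1000), step=1, cv=5)
selector.fit(X, y)

print("Features kept:", selector.n_features_)
print("Support mask:", selector.support_)
```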

  3. Thank you so much Kevin! Your response was very succinct and clear! I actually showed your video to my colleagues during our machine learning Friday sessions at work and we all loved it. It was a timely topic for us since we’re all fairly new to building ML models.

  4. Hey Kevin, thanks for your videos. They are extremely helpful. I have some knowledge of Python and Tableau and would like to switch my career to machine learning. I have been watching many videos on machine learning but am confused about where to start. Please guide me on how to learn it step by step. Thanks.

  5. Hello Kevin, can you make a video on finding multicollinearity with VIF using the sklearn library, or maybe with some other library?
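scikit-learn has no built-in VIF, but it is easy to compute by hand: the VIF of feature j is 1 / (1 − R²), where R² comes from regressing feature j on the remaining features. A sketch with synthetic data (the variable names and the collinearity setup are made up for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = 2 * x1 + rng.normal(scale=0.1, size=200)  # nearly collinear with x1
x3 = rng.normal(size=200)                      # independent of the others
X = np.column_stack([x1, x2, x3])

def vif(X, j):
    """VIF_j = 1 / (1 - R^2) from regressing column j on the other columns."""
    others = np.delete(X, j, axis=1)
    r2 = LinearRegression().fit(others, X[:, j]).score(others, X[:, j])
    return 1.0 / (1.0 - r2)

vifs = [vif(X, j) for j in range(X.shape[1])]
print([round(v, 1) for v in vifs])  # x1 and x2 come out large, x3 near 1
```

A common rule of thumb is to flag features with VIF above 5 or 10 as candidates for removal.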

  6. I am trying to learn machine learning on my own, so I don't quite understand the steps you take. Based on what you said about choosing features: if one wants to eliminate features using forward selection, should they know beforehand which algorithm they are going to use and do forward selection with that specific algorithm? Or should one do forward selection using logistic/linear regression and then, having found the significant variables, choose an algorithm (e.g. decision trees, kNN, …)? Thanks in advance.

  7. I'm working with 2000-dimensional data. Is it OK to use PCA to reduce it to 50 dimensions and then use forward feature selection to further reduce it to 20, or is it OK to go from 2000 straight down to 20 using PCA alone?
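Both orderings can be tried and compared. For reference, a sketch of the two-stage approach with scikit-learn, with the dimensions scaled down (100 → 20 → 5) so it runs quickly; all sizes and the logistic-regression estimator are illustrative assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=300, n_features=100, n_informative=10,
                           random_state=0)

# Stage 1: PCA compresses 100 correlated inputs into 20 components.
pca = PCA(n_components=20, random_state=0)
X_reduced = pca.fit_transform(X)

# Stage 2: forward selection keeps the 5 components that most help the model.
sfs = SequentialFeatureSelector(LogisticRegression(max_iter=1000),
                                n_features_to_select=5,
                                direction="forward", cv=3)
X_final = sfs.fit_transform(X_reduced, y)
print(X_final.shape)
```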

  8. Awesome lesson! This topic is quite important in text classification, where the number of words and phrases extracted from the text can be overwhelming.

  9. Hey, thanks for the video. Can you make a video on how to identify multicollinearity, correlation, etc. in a dataset?

  10. I am a PhD student from Algeria and I'd like to thank you for your helpful videos and the effort you put into making them. Could you please show us an example of how to build, train, and test an AdaBoost classifier in scikit-learn, like you did with kNN? Also, can we use an SVM as a weak learner for AdaBoost? And how does the weak-learner loop inside the classifier compute those parameters (the weak learner's error, alpha, and the weight update)? Thanks in advance, sir.
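For reference, building, training, and testing an AdaBoost classifier in scikit-learn follows the same fit/score pattern as kNN. A minimal sketch on synthetic data (all sizes are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=300, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The per-round weak-learner error, alpha, and sample-weight updates are
# handled internally; the default weak learner is a depth-1 decision tree
# (a "stump").
clf = AdaBoostClassifier(n_estimators=50, random_state=0)
clf.fit(X_train, y_train)
print(round(clf.score(X_test, y_test), 2))
```

A different weak learner (such as an SVM) can in principle be passed in as the base estimator, as long as it supports sample weights in `fit`.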

  11. So does that mean we may do this on every dataset, or is it imperative that we do all of this on every dataset?

  12. Hi, thanks for your nice video. I am from India and need some help.
    If I want to filter a data frame on one column with a specific value (like "football"), keeping the rows where another column's value is at its maximum, how do I write that? Please help.
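One reading of that question can be sketched in pandas as follows (the column names `sport` and `count` and the sample data are assumptions made up for illustration):

```python
import pandas as pd

# Hypothetical data for the sketch.
df = pd.DataFrame({"sport": ["football", "cricket", "football", "tennis"],
                   "count": [10, 7, 25, 3]})

football = df[df["sport"] == "football"]                  # keep the chosen value
top = football[football["count"] == football["count"].max()]  # rows at the max
print(top)
```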

  13. Thank you for your nice video and good presentation. I have a question: I have a dataset, but the data is not labeled, and I want to do feature selection for classification. How can I select features for unlabeled data?

  14. Can you please explain one-hot encoding of various features in detail? It would be helpful for many of us. Thank you.
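In the meantime, a minimal sketch of one-hot encoding with pandas (the `color` column and its values are made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({"color": ["red", "green", "red", "blue"]})

# One binary column per category; column names are prefixed with the original.
encoded = pd.get_dummies(df, columns=["color"])
print(encoded.columns.tolist())  # ['color_blue', 'color_green', 'color_red']
```

scikit-learn's `OneHotEncoder` does the same job and is the better fit inside a modeling pipeline, since it remembers the categories seen at fit time.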

  15. Hi, you are a great teacher, very clear! I'm starting with DS, and I want to ask if you have the slides from the presentation to share, to go deeper into the topic of dimensionality reduction. Thanks in advance, Kika

  16. Hey, I don't quite get this part:
    "Tree based feature selection is only useful if that is your model that you're using or you could theoretically use a tree based model to look at feature importance, and then not actually use a tree based model for your model that you're building."
    Why is that? I think that since those features are important (according to the tree-based method), we could build a great model using a tree-based algorithm. Or maybe I'm missing something here?

  17. Best school to learn at. I am learning it by myself, as I don't have enough money to pay the fees. I have learned all of pandas from you. Thanks a lot, fantastic work, and bless you.

  18. Hey, Kevin, your content is great. I did a whole project by taking help solely from your content 😊

  19. Hi Kevin, it was nice going through your videos. They are amazing. My question is: which software do you use for making your videos?

  20. Even Google can't provide an answer to feature selection as exact as the one you have given in 10 minutes!!!!

    Thank you so much!!!

  21. Hi Kevin! Thanks for a very clear explanation. This video is very useful as I'm very new to machine learning.

    I have one question related to feature selection. I started learning ML by implementing decision trees. Most online tutorials just put all the features into the decision tree and let the DT select features by itself. However, what if you have tons of features (let's say 100,000 variables)? Is it better to perform some feature selection before building the DT model, or does it not matter, since the DT can use Gini to automatically select promising attributes for the model?

  22. Great video. I learned more in one short video than I would from a huge number of articles. One question: can you use ensemble models like decision trees and random forests to look at feature importance and then use it to train another machine learning model (say, logistic regression)? Isn't the feature_importance given by an ensemble technique specific to that technique?
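The importances are indeed computed by the ensemble itself, but they are often reused as a generic filter for other models. A sketch of that pattern with scikit-learn's `SelectFromModel` (synthetic data; the `"median"` threshold is an illustrative choice):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=300, n_features=20, n_informative=5,
                           random_state=0)

# Rank features with a random forest and keep those at or above the
# median importance...
selector = SelectFromModel(
    RandomForestClassifier(n_estimators=100, random_state=0),
    threshold="median")
X_selected = selector.fit_transform(X, y)

# ...then train a different model on the reduced feature set.
clf = LogisticRegression(max_iter=1000).fit(X_selected, y)
print(X_selected.shape)
```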

  23. Hi,

    Can you please let me know how to start a data science project on bike sharing, with detailed step-by-step guidance?
