Machine learning project: split training/test sets before or after exploratory data analysis?

Is it best to split your data into training and test sets before doing any exploratory data analysis, so that all exploration is based solely on the training data, or to explore the full data set first and split afterwards?

I'm working on my first full machine learning project (a recommendation system for a course capstone project) and am looking for clarification on the order of operations. My rough outline is to import and clean the data, do exploratory analysis, train my model, and then evaluate it on a test set.

I am doing exploratory data analysis now - nothing special initially, just starting with variable distributions and whatnot. But I am not sure: should I split my data into training and test sets before or after exploratory analysis?
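To make the "split first" option concrete, here is a minimal sketch of what I have in mind. It assumes a plain random split and standard-library Python only; the toy `ratings` data, the `split_train_test` helper, and the 80/20 fraction are all made up for illustration and are not my actual project code.

```python
import random
from statistics import mean


def split_train_test(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of rows and hold out the last test_fraction as the test set."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)                   # copy, so the input is left untouched
    random.Random(seed).shuffle(shuffled)   # seeded, so the split is reproducible
    n_test = int(round(len(shuffled) * test_fraction))
    n_train = len(shuffled) - n_test
    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical toy data: (user_id, item_id, rating) tuples.
ratings = [(u, i, (u * 7 + i * 3) % 5 + 1) for u in range(10) for i in range(5)]

# Split first, then explore.
train, test = split_train_test(ratings, test_fraction=0.2, seed=42)

# Exploration (here just a mean rating) touches only the training rows;
# the test rows are set aside until final evaluation.
train_mean_rating = mean(r for _, _, r in train)
print(len(train), len(test), train_mean_rating)
```

The alternative would be to compute the same summaries on `ratings` as a whole and only call the split afterwards.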

I don't want to contaminate algorithm training by inspecting the test set. However, I also don't want to miss visual trends that reflect real signal, which my poor human eye might not pick up once part of the data has been held out, and so fail to investigate an important and relevant direction while designing my algorithm.

I checked other threads, like this, but the ones I found seem to ask more about things like regularization or actual manipulation of the original data. The answers I found were mixed but prioritized splitting first. However, I don't plan to do any actual manipulation of the data before splitting it (beyond inspecting distributions and potentially doing some factor conversions).

What do you do in your own work and why?

Thanks for helping a new programmer!

Amy
