Mostly conceptual: fintech taxonomy, ML categories, NLP, and the big data analytics pipeline.
Note: your sheet lists "Simple Linear Regression", but the official CFA Institute curriculum topic for M11 is Big Data Techniques.
4 V's of Big Data: Volume, Velocity, Variety, Veracity.
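Overfitting: Model memorizes the training data, including its noise, and so performs poorly on new data. Typical symptom: a large gap between training and out-of-sample accuracy. Mitigated by cross-validation and regularization.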
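To make the overfitting note concrete, here is a minimal sketch in plain Python. Everything in it is an illustrative assumption rather than curriculum material: the toy data set, the 20% label noise, the helper names (`make_data`, `knn_predict`, `cross_val_accuracy`), and the use of k-nearest neighbors. A 1-nearest-neighbor model memorizes the training set, so its training accuracy is perfect, while 5-fold cross-validation gives a more honest estimate. The larger k = 15 is only a stand-in for a simpler, smoother model; it is not regularization in the strict sense.

```python
import random

def make_data(n, noise, rng):
    """Hypothetical toy data: label is 1 when x0 + x1 > 1, then flipped with probability `noise`."""
    data = []
    for _ in range(n):
        x = (rng.random(), rng.random())
        y = int(x[0] + x[1] > 1.0)
        if rng.random() < noise:
            y = 1 - y  # inject label noise that a flexible model can memorize
        data.append((x, y))
    return data

def knn_predict(train, x, k):
    """Majority vote among the k nearest training points (squared Euclidean distance)."""
    nearest = sorted(
        train,
        key=lambda p: (p[0][0] - x[0]) ** 2 + (p[0][1] - x[1]) ** 2,
    )[:k]
    votes = sum(label for _, label in nearest)
    return int(2 * votes > k)  # k is odd here, so there are no ties

def accuracy(train, test, k):
    """Share of points in `test` that a k-NN model built on `train` labels correctly."""
    hits = sum(knn_predict(train, x, k) == y for x, y in test)
    return hits / len(test)

def cross_val_accuracy(data, k, folds=5):
    """Average held-out accuracy over `folds` disjoint validation folds."""
    scores = []
    for i in range(folds):
        val = data[i::folds]  # every folds-th point, starting at i
        train = [p for j, p in enumerate(data) if j % folds != i]
        scores.append(accuracy(train, val, k))
    return sum(scores) / folds

rng = random.Random(0)  # fixed seed so the run is repeatable
data = make_data(200, noise=0.2, rng=rng)

train_acc_1nn = accuracy(data, data, k=1)      # scored on the data it memorized
cv_acc_1nn = cross_val_accuracy(data, k=1)     # scored on held-out folds
cv_acc_15nn = cross_val_accuracy(data, k=15)   # smoother model, same folds

print(f"1-NN  training accuracy:        {train_acc_1nn:.2f}")
print(f"1-NN  cross-validated accuracy: {cv_acc_1nn:.2f}")
print(f"15-NN cross-validated accuracy: {cv_acc_15nn:.2f}")
```

The gap between the first two printed numbers is the overfitting signal: the training score says nothing about new data, while the cross-validated score does. On noisy data like this, the smoother 15-NN model will typically hold up better out of sample, though the exact figures depend on the seed and the noise level.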
An asset manager builds a credit-scoring model that uses 200 features. The model achieves 99% accuracy on training data but only 65% on out-of-sample data. Which problem is most likely?