Published on June 19, 2018
Feature Selection Using Feature Importance Score - Creating a PySpark Estimator
python, spark, big-data
Extending PySpark MLlib's native feature selection by using a feature importance score generated from a machine learning model to extract the variables that are plausibly the most important.

Published on April 8, 2018
Creating a Custom Cross-Validation Function in PySpark
python, spark, big-data
A custom cross-validation class written in PySpark, with support for user-defined categories such as time, geography, or consumer segment.