Published on June 19, 2018
Feature Selection Using Feature Importance Score - Creating a PySpark Estimator
Tags: python, spark, big-data
Extending PySpark's MLlib native feature selection function by using a feature importance score generated from a machine learning model and extracting the variables that are plausibly the most important.

Published on April 8, 2018
Creating a Custom Cross-Validation Function in PySpark
Tags: python, spark, big-data
Custom cross-validation class written in PySpark with support for user-defined categories, such as time, geographical, or consumer segments.

Published on March 25, 2018
Uploading Jupyter Notebook Files to Blogdown
Tags: python, blogdown
Simple guide to converting Jupyter notebooks to Markdown posts that can be published in your favourite static site generator.

Published on February 26, 2018
Notes on Regression - Approximation of the Conditional Expectation Function
Tags: regression, ols, notes
Deriving the OLS formula as a means of approximating the conditional expectation function.