David Skinner
Principal Analyst
B Economics (Hons)
M DataSc (Prof)
M FinPlan
Over 20 years of experience
David is the Principal Analyst of Daskin Data Science & Analytics and has decades of experience working within regional North Queensland, undertaking data analytics and financial analysis.
He has over 20 years of experience across areas including evidence-based business planning and decision making, strategy development and project management, data analytics, financial modelling and data governance, gained across various levels of government and the commercial business sector.
David was born and raised in regional Queensland. He has over two decades of experience in stakeholder consultation and engagement, international relations, strategic planning and managing multi-disciplinary project teams, and has delivered workshops, focus groups and business development plans for business clients.
He is tertiary qualified, holding a Bachelor of Economics with Honours and a Master of Data Science (Professional) from James Cook University (JCU), a Joint Academic Certificate of Advanced Data Science from Statistical Analysis Systems (SAS) and JCU, a Graduate Diploma of Financial Planning from the University of the Sunshine Coast, and a Master of Financial Planning from Deakin University.
our mission
To set businesses on a focused direction towards performance growth, through evidence-based business planning and decision making.
our vision
To communicate the value data science and data analytics bring to organisational decision making; how methodologies can be applied within and across sectors or industries to better understand and optimise opportunities; and to continuously improve how businesses and organisations gather, curate, interpret, and utilise data.
our values
Learning, educating, and effective communication to share experiences and generate solutions
Commitment to support the motivation to derive value from data
Ensuring preparations and requirements are in place for data production
Strive for simple solutions
Value the power of computation to generate solutions and make them accessible to a growing number of users
Projects
Algae Bloom Prediction Model for Lake Systems
The project brief was to develop an Algae Bloom Prediction model for a number of lake systems on behalf of a local government organisation. The data provided was a single dataset containing 21 individual chemical observations of water quality obtained through their water monitoring process, spanning 16 years as a weekly time series from June 2006 to May 2022.

The model was developed in RStudio. Treatment of missing values was necessary: the data was assumed Missing At Random (MAR) and handled with multiple imputation, with Multiple Imputation by Chained Equations (MICE) the preferred method. Feature selection was performed using variable-importance methods from the Random Forest and XGBoost algorithms, together with a model comparison between Lasso, Ridge and ElasticNet regression. The Lasso and XGBoost feature-importance sets were each applied to the XGBoost algorithm; the Lasso-selected features produced lower accuracy than the XGBoost-selected features. The feature selection process reduced the final model to 10 variables.

A classification prediction model was then built. Four machine learning algorithms were used as classification models: Logistic Regression, Adaptive Boosting (AdaBoost), Gradient Boosting Machine (GBM) and Extreme Gradient Boosting (XGBoost). Model evaluation was undertaken using cross-validation and comparison of performance metrics. Because the class variable was imbalanced, a SMOTE function was applied to the data, with the ROC value used to determine the best oversampling method. Performance metrics demonstrated that the XGBoost algorithm performed best, and it was applied as the preferred prediction model.

The Algae Bloom Prediction model is delivered as an interactive application built with Shiny. The Shiny web framework collects input values from a web page, makes those inputs available to the application in R, and writes the results of the R code back to the web page as outputs. The application can be deployed and shared online via the shinyapps.io platform.
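For illustration, a minimal Python sketch of the equivalent pipeline steps is shown below; the project itself was built in R with MICE and Shiny, and the file and column names used here (water_quality.csv, bloom) are assumptions rather than the project's actual schema.

```python
# Illustrative Python analogue of the R workflow: chained-equation style
# imputation, SMOTE oversampling, an XGBoost classifier and cross-validated
# ROC AUC. File and column names are hypothetical.
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.model_selection import StratifiedKFold, cross_val_score
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline
from xgboost import XGBClassifier

df = pd.read_csv("water_quality.csv")           # weekly water-chemistry observations
X, y = df.drop(columns="bloom"), df["bloom"]    # "bloom" = binary class label

pipeline = Pipeline(steps=[
    # Multiple-imputation analogue of MICE: iterative, chained-equation imputer
    ("impute", IterativeImputer(max_iter=10, random_state=42)),
    # Oversample the minority (bloom) class to address the class imbalance
    ("smote", SMOTE(random_state=42)),
    # Gradient-boosted classifier, the preferred model in the project
    ("model", XGBClassifier(n_estimators=300, eval_metric="logloss")),
])

# Cross-validated ROC AUC, the measure used to compare oversampling options
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(pipeline, X, y, scoring="roc_auc", cv=cv)
print(f"Mean ROC AUC: {scores.mean():.3f}")
```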
Machine learning triage classification, a complement to the modern emergency department
The project, using NSW Health Emergency Department data, proposed a model for an interactive triage data tool that predicts a patient's triage category on presentation to an Emergency Department (ED) and can be used to update that category after each collection of vital observations. An automated triage tool would give hospital administrators seeking to allocate hospital resources more effectively an opportunity to validate nurse-led decision making and, if the tool can be improved to predict triage categories with greater sensitivity and specificity, to triage patients in order of treatment acuity. The project team applied methods including Natural Language Processing, feature engineering, feature selection, multi-model comparison, and product and solution design. Using a multi-stage ensemble modelling approach, the team showed that machine learning triage classification is feasible; however, the features and models require further engineering and improvement before predictions approach the accuracy of nurse-led decision making.
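For illustration, the sketch below shows a minimal multi-stage (stacked) ensemble of the kind described, run on synthetic stand-in data; the project's actual NLP features, engineered variables and model choices are not reproduced here.

```python
# Minimal stacked-ensemble sketch for triage-category classification,
# using synthetic data in place of the engineered ED presentation features.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Stand-in for engineered features (vitals, text-derived flags, demographics)
# and the five triage categories.
X, y = make_classification(n_samples=2000, n_features=20, n_informative=10,
                           n_classes=5, n_clusters_per_class=1, random_state=0)

# Stage one: diverse base learners; stage two: a meta-learner that combines
# their cross-validated predictions.
stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
        ("lr", LogisticRegression(max_iter=1000)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
scores = cross_val_score(stack, X, y, cv=5, scoring="balanced_accuracy")
print(f"Mean balanced accuracy: {scores.mean():.3f}")
```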
Pedestrian and Traffic Data Analysis
Parking Management at the City of Sacramento wanted information that would assist them in planning, implementing and managing the use of curb-side zones along their CBD streets. Using data gathered via a live video camera analysis system, they held sample data describing users of the street zones such as cars, pedestrians, bicycles and other vehicles, with each data chunk exceeding 3 million observations.

The data was split into five zones: (1) pedestrian sidewalk, (2) pick-up/drop-off zone, (3) bicycle lane, (4) car/bus/truck lane and (5) a second car/bus/truck lane. The video analysis highlighted a chain of consequential issues: cars were double parking in the vehicle lane because the pick-up/drop-off lane was full, forcing bicycles onto the pedestrian sidewalk and creating a risk of injury to pedestrians.

Using the Python and R (RStudio) programming languages, the regression methods applied were linear regression, ridge regression, logistic regression and random forest regression, supported by correlation matrices. Evidence from these methods showed that the volume of vehicles and their dwell time in the pick-up/drop-off zone drive instances of double parking. Solutions were recommended to reduce, and provide incentives to limit, the time and number of cars spending time in the pick-up/drop-off zone.
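For illustration, the sketch below shows one of the simpler checks described above: a logistic regression of double-parking events against vehicle volume and dwell time in the pick-up/drop-off zone. The file and column names (curbside_events.csv, vehicle_count, dwell_seconds, double_parked) are assumptions, not the project's actual schema.

```python
# Illustrative logistic-regression check: do vehicle volume and dwell time in
# the pick-up/drop-off zone predict double-parking events? Column names are
# hypothetical.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

events = pd.read_csv("curbside_events.csv")
pickup = events[events["zone"] == 2]              # zone 2: pick-up / drop-off

X = pickup[["vehicle_count", "dwell_seconds"]]
y = pickup["double_parked"]                       # 1 if double parking followed

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(classification_report(y_test, clf.predict(X_test)))

# Positive coefficients on volume and dwell time would support the finding
# that full pick-up/drop-off zones drive double parking in the vehicle lane.
print(dict(zip(X.columns, clf.coef_[0])))
```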
Energy Output Prediction
Predicting the full-load electrical power output of a base-load power plant is important in order to maximise profit from the available megawatt hours. The task of this hyper-parameter optimisation (HPO) modelling project was to produce a neural network that predicts the net hourly electrical energy output of the plant, using the Python programming language and Amazon's SageMaker cloud machine learning service.

The dataset contains 9,568 data points collected from a Combined Cycle Power Plant over a six-year period. Features consist of hourly average ambient variables, temperature (T), ambient pressure (AP), relative humidity (RH) and exhaust vacuum (V), used to predict the net hourly electrical energy output (EP) of the plant. Preprocessing techniques such as type conversion, rounding and normalisation were applied to the data before distributed training and hyper-parameter tuning were performed.

AWS SageMaker and the TensorFlow API were used to run the HPO processes and obtain the best training model. The HPO solution used a loss metric as the cross-validated accuracy measure; the HPO_tensorflow_power_plant model applied a regression approach and used Root Mean Squared Error (RMSE) as the model evaluation metric. Comparisons of the different HPO model configurations highlight the differences in processing time and accuracy.
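For illustration, the sketch below shows the kind of regression network tuned in this project, written with the Keras API; the actual SageMaker training script, hyper-parameter ranges and tuning job are not reproduced, and the file name (power_plant.csv) is an assumption, with columns labelled to match the feature names above.

```python
# Illustrative Keras regression network for the power-plant data. In the
# project, values such as the hidden-layer sizes, learning rate and batch
# size would be searched by the SageMaker HPO job with RMSE as the objective.
import pandas as pd
import tensorflow as tf

df = pd.read_csv("power_plant.csv")                 # hypothetical file name
X = df[["T", "AP", "RH", "V"]].values               # hourly ambient features
y = df["EP"].values                                 # net hourly energy output

# Normalisation layer learns feature means and variances from the data
norm = tf.keras.layers.Normalization()
norm.adapt(X)

model = tf.keras.Sequential([
    norm,
    tf.keras.layers.Dense(64, activation="relu"),   # hidden sizes are among the
    tf.keras.layers.Dense(64, activation="relu"),   # hyper-parameters tuned by HPO
    tf.keras.layers.Dense(1),                       # single regression output (EP)
])
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
              loss="mse",
              metrics=[tf.keras.metrics.RootMeanSquaredError()])
model.fit(X, y, validation_split=0.2, epochs=50, batch_size=32)
```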