To explain our approach, we will take you through the Data Science Lifecycle step by step. This lifecycle, designed by Microsoft, provides a structure for developing data science projects.
It outlines the major stages that projects typically go through from start to finish, often iteratively:
- Business Understanding; in this phase we explore the organization's environment and determine the variables to include in the first model. We often do this during a kick-off workshop. Once we have decided which variables belong in the model, we identify the relevant data sources that the business has access to or needs to obtain.
- Data Acquisition and Understanding; in this phase we want to produce a clean, high-quality data set whose relationship to the target variables is understood. We place the data set in the appropriate analytics environment so that it is ready for modeling. This phase also includes data preprocessing: a data mining technique that transforms raw data into an understandable format.
- Modeling; model training consists of the following steps, which are repeated until a set of accurate models is found:
  - Split the input data randomly into a training data set and a test data set.
  - Build the models by using the training data set.
  - Evaluate the models on the training and the test data sets. Use a series of competing machine-learning algorithms, along with their associated tuning parameters (known as a parameter sweep), that are geared toward answering the question of interest with the current data.
  - Determine the “best” solution to answer the question by comparing the success metrics of the alternative methods.
- Deployment; once we have a set of models that perform well, we can operationalize them for other applications to consume. Depending on the business requirements, predictions are made either in real time or on a batch basis. To deploy models, we expose them through an open API, which allows dashboards and back-end applications to consume the model easily.
- Customer Acceptance; the customer validates that the system meets their business needs and that it answers the questions with acceptable accuracy, so that the system can be deployed to production for use by their client’s application. All documentation is finalized and reviewed. The project is then handed off to the entity responsible for operations. This entity might be, for example, an IT or customer data-science team, or an agent of the customer that is responsible for running the system in production.
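The modeling steps listed above (random split, training, parameter sweep, comparison of success metrics) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the synthetic data, the two toy algorithms (k-nearest neighbours and nearest centroid), the accuracy metric, and all function names are hypothetical and do not describe the models used in a real project.

```python
import random
from collections import Counter


def make_toy_data(n=200, seed=0):
    """Hypothetical synthetic data: two numeric features and a binary label."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        label = rng.randint(0, 1)
        # Shift class 1 so the two classes overlap only partly.
        features = (rng.gauss(2.0 * label, 1.0), rng.gauss(2.0 * label, 1.0))
        rows.append((features, label))
    return rows


def split(rows, test_fraction=0.25, seed=0):
    """Step 1: random split into a training set and a test set."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fit_knn(train, k):
    """Step 2, candidate A: k-nearest neighbours; k is the tuning parameter."""
    def predict(point):
        nearest = sorted(train, key=lambda row: squared_distance(row[0], point))[:k]
        return Counter(label for _, label in nearest).most_common(1)[0][0]
    return predict


def fit_centroid(train):
    """Step 2, candidate B: assign the class whose mean point is closest."""
    centroids = {}
    for label in {lab for _, lab in train}:
        points = [f for f, lab in train if lab == label]
        centroids[label] = tuple(sum(col) / len(points) for col in zip(*points))

    def predict(point):
        return min(centroids, key=lambda lab: squared_distance(centroids[lab], point))
    return predict


def accuracy(predict, rows):
    """Success metric: fraction of rows classified correctly."""
    return sum(predict(f) == label for f, label in rows) / len(rows)


train, test = split(make_toy_data())

# Parameter sweep: every candidate algorithm with each of its settings.
candidates = {f"knn(k={k})": fit_knn(train, k) for k in (1, 3, 5, 7)}
candidates["centroid"] = fit_centroid(train)

# Step 3: evaluate every candidate on both the training and the test set.
results = {
    name: {"train": accuracy(model, train), "test": accuracy(model, test)}
    for name, model in candidates.items()
}

# Step 4: pick the "best" candidate by comparing the success metric.
best = max(results, key=lambda name: results[name]["test"])
for name, scores in results.items():
    print(f"{name}: train={scores['train']:.2f} test={scores['test']:.2f}")
print("best:", best)
```

Comparing training and test accuracy side by side helps to spot overfitting: a candidate that scores much higher on the training set than on the test set has probably memorized the training data. In a real project, a separate validation set or cross-validation would usually be used for the selection step, so that the test set remains an unbiased final check.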
The lifecycle stages described above make up our set-up phase. Once the customer has accepted the tool, we bring the software into production, where we host and maintain it and update it with new features.
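To make the deployment stage more concrete, here is a minimal sketch of how a model could be exposed through an HTTP API using only the Python standard library. It is an illustration under stated assumptions, not a description of our production set-up: the `/predict` path, the `instances` field in the JSON body, the port, and the `score` function (a stand-in for a trained model) are all hypothetical.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def score(features):
    """Hypothetical stand-in for a trained model: 1 if the feature sum is positive."""
    return 1 if sum(features) > 0 else 0


def handle_payload(raw_body):
    """Turn a JSON request body into a (status code, response dict) pair."""
    try:
        instances = json.loads(raw_body)["instances"]
        predictions = [score(row) for row in instances]
    except (ValueError, KeyError, TypeError):
        return 400, {"error": "expected JSON of the form {'instances': [[...], ...]}"}
    return 200, {"predictions": predictions}


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        status, body = handle_payload(self.rfile.read(length))
        data = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def serve(port=8000):
    """Start the API on localhost; blocks until interrupted."""
    HTTPServer(("127.0.0.1", port), PredictHandler).serve_forever()
```

Calling `serve()` starts the endpoint. A dashboard could then post a single instance for a real-time prediction, while a back-end job could post many instances in one request for batch scoring; both go through the same `handle_payload` function.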
Request a demo to find out how Genius can automate your decision making at scale.