Articles related to "data"

Machine Learning Basics: Polynomial Regression

  • In this article, we will go through the program for building a Polynomial Regression model on non-linear data.
  • In the previous Linear Regression examples, plotting the data on a graph showed a linear relationship between the dependent and independent variables.
  • So, in this problem we have to train a Polynomial Regression model on this data to understand the correlation between the Level and Salary of the company's employees, and to predict the salary of a new employee.
  • The class “LinearRegression” is also imported, and an instance of it is assigned to the variable “lin_reg”, which is fitted with X_poly and y to build the model.
  • In this step, we are going to predict the Salary values with the Polynomial Regression model we built.
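
The fitting step summarized above can be sketched roughly as follows. This is a minimal illustration, not the article's own program: it uses NumPy's least-squares solver in place of scikit-learn's LinearRegression, and the Level and Salary values, the degree, and the function names are all assumptions made up for the example.

```python
import numpy as np


def fit_polynomial(levels, salaries, degree=2):
    """Least-squares fit of salary as a polynomial in level."""
    # Columns of X_poly are level**0, level**1, ..., level**degree.
    X_poly = np.vander(np.asarray(levels, dtype=float), degree + 1, increasing=True)
    y = np.asarray(salaries, dtype=float)
    coeffs, *_ = np.linalg.lstsq(X_poly, y, rcond=None)
    return coeffs


def predict(coeffs, level):
    """Evaluate the fitted polynomial at one or more levels."""
    x = np.atleast_1d(np.asarray(level, dtype=float))
    return np.vander(x, len(coeffs), increasing=True) @ coeffs


# Synthetic, made-up data: salary grows quadratically with level.
levels = [1, 2, 3, 4, 5, 6]
salaries = [1000 + 500 * lv**2 for lv in levels]

coeffs = fit_polynomial(levels, salaries, degree=2)
new_salary = predict(coeffs, 7)[0]  # prediction for a hypothetical new level
```

Expanding the single Level column into powers is what lets an ordinary linear fit capture a curved relationship; the model stays linear in its coefficients.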

5 Python Code Smells You Should Be Wary Of

  • Code smells are a sign of a weakness or design flaw that might cause readability, maintainability, and scalability issues.
  • In the next few sections, we’ll look at a few quintessential Python code smell cases and how to avoid them.
  • Using default arguments is a fairly common practice in Python, wherein you can set predefined values in the function and opt to change them at call time.
  • This is useful when setting literals, numbers, or booleans as it helps you prevent a long list of parameters with redundant values.
  • Hence, using None as the default value and assigning the mutable variable inside the function is a safer bet, as you won’t end up with maintainability issues.
  • Not only is it less Pythonic, since it forces you to access elements through an explicit index variable, but it also hurts readability.
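
Two of the smells summarized above can be shown in a few lines. This is a small sketch with made-up function and variable names, not code from the article: it contrasts a mutable default argument with the None-sentinel alternative, and it uses enumerate instead of an explicit index variable.

```python
def append_buggy(item, bucket=[]):
    # Smell: the default list is created once, when the function is defined,
    # and is then shared by every call that omits `bucket`.
    bucket.append(item)
    return bucket


def append_safe(item, bucket=None):
    # Safer: use None as the sentinel and build a fresh list on each call.
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket


first = append_buggy("a")
second = append_buggy("b")  # reuses the list from the first call
fresh = append_safe("b")    # starts from a new list

# enumerate yields the index and the element together,
# so no manual indexing into the list is needed.
names = ["ada", "grace"]
labelled = [f"{i}: {name}" for i, name in enumerate(names)]
```

After these calls, `first` and `second` refer to the same list object, which is exactly the surprise the None default avoids.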

Tableau: Unleashing the Power of Visual Analytics

  • Confused, I consulted multiple online resources on how to use Tableau and came across Ben Jones’ book Communicating Data with Tableau: Designing, Developing, and Delivering Data Visualizations.
  • Moreover, I learnt how to use features like filters, pages, marks and others to build high-quality worksheets.
  • Finally, I got the hang of utilizing the dashboard feature to build interactive visual dashboards that can be used to tell stories to people.
  • Below is a short summary of how Tableau is used to build beautiful interactive visual interfaces.
  • Another aspect that I love about Tableau is the fact that I can easily build my charts by dragging fields from the table section and integrating them with the features used to create the worksheets.
  • I used my knowledge of Tableau to build a scatter plot of Life Expectancy vs Income and make an interactive dashboard.

What is a Full Stack Data Scientist?

  • The scope of a full stack data scientist covers every component of a data science business initiative, from identifying to training to deploying machine learning models that provide benefit to stakeholders.
  • A full stack data scientist must be able to identify and understand business problems (or opportunities) that can be solved using the data science toolkit.
  • This skill is essential because useful machine learning models cannot be built without data understanding.
  • Lastly, a full stack data scientist must have the skill to deploy model pipelines to production.
  • Two key underlying skills of the full stack data scientist are the ability to design a system or process and the ability to quickly pick up new technologies.
  • These two elements are the keys to any organization extracting value from data science — solving the right problems and making them accessible to the end-user.

Model with TensorFlow and Serve on Google Cloud Platform

  • In this guide, we learn how to develop a TensorFlow model and serve it on the Google Cloud Platform (GCP).
  • We consider a regression problem of predicting the earnings of products using a three-layer neural network implemented with the TensorFlow and Keras APIs. In this guide, we will use TensorFlow 2.1.0 and the Google Colab runtime environment.
  • Google Colab offers training of machine learning models on free GPUs and TPUs.
  • To define the deep learning model, we use the Keras API module shipped with TensorFlow 2.1.0. We start with a basic model with two intermediate relu layers.
  • Let us read the sample input data, test the model's predictions, and save the model locally for future use.
  • In this guide, we have learned about training deep learning models with TensorFlow 2.1.0 and deploying the trained model on the Google Cloud Platform.

Deep Learning in Healthcare — X-Ray Imaging (Part 4-The Class Imbalance problem)

  • It should also be noted that, when dealing with medical images, the final accuracy of the model (whether train or validation accuracy) is not the right metric on which to base the model’s performance.
  • Important note: oversampling should be done on the train data and not on the test data; if the test data contains artificially generated images, the classifier results would not properly reflect how much the network actually learned.
  • This function creates artificially augmented images for the normal and viral pneumonia classes until they make up the difference from the total number of bacterial pneumonia images.
  • Now that the class imbalance problem is dealt with, in the next part we will look into image normalization and data augmentation using Keras and TensorFlow.
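
The balancing idea summarized above can be sketched as follows. This is only an illustration under stated assumptions: the class counts are hypothetical, the function names are made up, and random duplication stands in for the article's real image augmentation. Only the training split is touched, in line with the note about test data.

```python
import random


def augmentation_targets(class_counts):
    """Return how many extra samples each class needs to match the largest class."""
    largest = max(class_counts.values())
    return {label: largest - count for label, count in class_counts.items()}


def oversample(train_samples, n_extra, seed=0):
    """Randomly duplicate training samples (a stand-in for image augmentation)."""
    rng = random.Random(seed)
    return train_samples + [rng.choice(train_samples) for _ in range(n_extra)]


# Hypothetical training-set counts, not taken from the article.
train_counts = {"bacterial": 2500, "viral": 1300, "normal": 1300}
targets = augmentation_targets(train_counts)

# Example: grow a tiny, made-up "viral" training list by three samples.
viral_train = ["viral_001.png", "viral_002.png"]
viral_balanced = oversample(viral_train, n_extra=3)
```

The majority class gets a target of zero, so it is left unchanged while the smaller classes are topped up to the same size.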

Tamgucalc: The terminal spreadsheet now with mouse control

  • It allows you to enter numbers, define labels, and most importantly, to enter numerical formulas in Lisp.
  • By default, the spreadsheet uses the dimensions of your terminal to define the number of rows and columns.
  • To enter a formula, simply position the cursor on a cell and type '('.
  • When you type a formula, you can use the arrows to select the cells that go into your formula.
  • When you have selected a cell, press "enter" to save it in your formula.
  • Note that when you define a range that includes several rows and columns, tamgucalc introduces an "&&&" operator to merge all the selected rows into a single data vector.
  • You can define lambda functions, functions (defun) or simply use the basic operators.
  • Note the use of mat[:1][10:] to extract the full column from cell 10,1.

Why alternative data is here to stay

  • Alternative data sources, typically the preserve of equity and commodity analysts, have flown the coop through COVID-19 and are now a critical input to global macro.
  • Global macro, which can be adequately defined as the analysis and prediction of economic, political and financial market developments, has also seen some early movers adopt Alt-Data.
  • Those who embraced Alt-Data early on during COVID-19 were able to manage through the volatility of March in the context of quantifiable risk.
  • The moves were extreme, but Alt-Data, both in terms of COVID-19 and measures of social mobility, helped to fill the void and empower decision making.
  • Alternative data sources, typically the preserve of equity and commodity analysts, have flown the coop through COVID-19 and are now a critical input to global macro investing, writes Grant Wilson.

Artificial Intelligence: Events Around The World (Jul 4)

  • The article explores a fundamental fact of data-based roles: every industry needs individuals who understand data.
  • One trend that the article includes is the normalization of data literacy as a required skill for every professional.
  • The article is focused on data science jobs, but this trend applies to every modern job role; everyone simply needs to understand the data that are produced by systems.
  • The new hires will get involved in projects that are associated with machine learning fields such as computer vision, natural language processing, speech recognition and data science.
  • Current events have brought to light the bias that AI systems have as a result of their training data, and I believe in the future we might see a more monitored and rigorous approach to the data gathering processes within several machine learning projects and research.

Geographic Clustering with HDBSCAN

  • In this article, I will illustrate the process of geographic clustering using the HDBSCAN [1] algorithm and the Vehicle Energy Dataset.
  • The trip endpoint information we seek lives in the move table of our SQLite database, and we use the code below to read both trip start and end locations with a single query.
  • The outlier score ranges from zero to one, where zero means that the algorithm is pretty sure that the location belongs to the cluster, while a value close to one says the opposite.
  • For each cluster point, we store the corresponding cluster id (might be -1 for outliers), the geographic location, and the level 12 H3 index.
  • To test the inclusion of a random point in the cluster, we would have to convert its coordinates into an H3 index and compare it to the cluster’s list.
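
The inclusion test described above amounts to a set lookup once every clustered location has been mapped to a cell index. The sketch below is an assumption-laden illustration, not the article's code: the function names, the 0.5 outlier-score cutoff, and the sample coordinates are invented, and `toy_cell` is a crude grid stand-in for a real level 12 H3 indexer so that the example runs without extra libraries.

```python
def build_cluster_index(points, to_cell, max_outlier_score=0.5):
    """Map each cluster id to the set of cell indices of its member points.

    points: iterable of (cluster_id, lat, lng, outlier_score) tuples.
    Noise points (id -1) and likely outliers are skipped.
    """
    cells = {}
    for cluster_id, lat, lng, score in points:
        if cluster_id == -1 or score > max_outlier_score:
            continue
        cells.setdefault(cluster_id, set()).add(to_cell(lat, lng))
    return cells


def find_cluster(lat, lng, cells, to_cell):
    """Return the id of the cluster whose cell set contains the point, or None."""
    cell = to_cell(lat, lng)
    for cluster_id, members in cells.items():
        if cell in members:
            return cluster_id
    return None


def toy_cell(lat, lng, size=0.001):
    # Stand-in for an H3 indexer: snaps coordinates to a square grid.
    # With the h3 package, a function returning the level 12 index
    # would be passed in here instead.
    return (round(lat / size), round(lng / size))


# Made-up sample: two points in cluster 0 (one a likely outlier), one noise point.
points = [
    (0, 42.2800, -83.7400, 0.1),
    (0, 42.3100, -83.7000, 0.9),
    (-1, 42.5000, -83.0000, 1.0),
]
cells = build_cluster_index(points, toy_cell)
hit = find_cluster(42.2800, -83.7400, cells, toy_cell)
miss = find_cluster(42.5000, -83.0000, cells, toy_cell)
```

Passing the indexer in as a parameter keeps the lookup logic independent of which indexing library, version, or resolution is used.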
