Articles related to "data"


Trump praises party-switching Rep. Van Drew, decries impeachment in blue-state campaign rally

  • President Donald Trump praised Rep. Jeff Van Drew, the newest Republican in Congress, as he denounced Democrats' impeachment efforts at a raucous campaign rally in New Jersey on Tuesday.
  • During his visit to the coastal resort town of Wildwood in Van Drew's Republican-leaning district Tuesday night, Trump depicted Van Drew as a symbol for Democratic dissatisfaction with party leadership.
  • New Jersey is one of the few solid-blue states in which Trump has held a 2020 campaign rally.
  • House Democrats launched the impeachment inquiry following a whistleblower's complaint about Trump's July 25 call with Ukraine President Volodymyr Zelenskiy.
  • Trump's defense team completed their opening statements Tuesday afternoon, wrapping up a series of arguments to convince the Republican-led Senate to acquit the president of the two impeachment charges against him.



Exactly how Facebook stalks you - and it's creepier than you thought

  • Facebook is giving us a new way to glimpse just how much it knows about us: On Tuesday (Wednesday AEDT), the social network made a long-delayed "Off-Facebook Activity" tracker available to its 2 billion members.
  • You don't necessarily have to be logged in to the Facebook app or website on your phone - companies can report other identifying information to Facebook, which will marry up the activity to your account after the fact.
  • The social network also doesn't pass your personal information back to businesses - they just get the chance to target ads to people with Facebook accounts who triggered the trackers.
  • The Washington Post says it stopped using the Facebook tracking pixel, along with some other social-networking trackers, on content pages as of Oct. 24.
  • Facebook says companies are required to provide us "robust notice" that they're sending data about our activity to the social network.



Introduction to Linear Regression

  • We do see some other correlations, for example between verbal and status; however, since we are trying to solve a specific problem, we can focus on gamble as the response variable and income as the independent variable (predictor).
  • Before we create a regression model or look at predicted values, errors, the intercept, or the slope, we should check whether the relationship between income as the predictor and gamble as the response variable meets the assumptions and conditions required for linear regression.
  • To create a linear regression model, we need to make sure that the relationship is linear; that the errors are independent, meaning the residuals do not influence each other or follow a pattern; that there is homoscedasticity between income and gamble, so that the data does not look like a funnel; and that the errors are normally distributed, with observations mostly around the predicted value and evenly distributed.
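
The checks listed above can be sketched in a few lines of Python. This is a minimal illustration, not the article's code: the income and gamble values are made-up numbers, and the helper names (fit_line, residuals, spread_ratio) are hypothetical. It fits a one-predictor least-squares line, computes the residuals, and compares the residual spread in the lower and upper halves of the predictor as a rough funnel check.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares with one predictor; returns (intercept, slope)."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def residuals(x, y, intercept, slope):
    """Observed value minus fitted value for every observation."""
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def spread_ratio(x, res):
    """Rough funnel check: mean |residual| for the upper half of x
    divided by mean |residual| for the lower half of x."""
    ordered = [r for _, r in sorted(zip(x, res))]
    half = len(ordered) // 2
    low = mean(abs(r) for r in ordered[:half])
    high = mean(abs(r) for r in ordered[half:])
    return high / low


# Hypothetical data, invented for illustration only.
income = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
gamble = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]

intercept, slope = fit_line(income, gamble)
res = residuals(income, gamble, intercept, slope)
print("intercept:", round(intercept, 3), "slope:", round(slope, 3))
print("spread ratio (upper half / lower half):", round(spread_ratio(income, res), 3))
```

A spread ratio far from 1 would hint at a funnel shape. In practice a residual plot and a normal Q-Q plot are the usual way to judge these assumptions; the ratio here is only a crude stand-in.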



A Brief Tour of Scikit-learn (Sklearn)

  • Scikit-learn is a Python library that provides methods for data reading, data preparation, regression, classification, unsupervised clustering, and much more.
  • We can see that random forest performance is much better than linear regression.
  • We can further improve performance by optimizing parameters in random forests.
  • Feel free to train and test on the full data set for a more suitable comparison of performance between models.
  • We see that support vector regression performs better than linear regression but worse than random forests.
  • Similar to the random forest example, the support vector machine parameters can be optimized such that error is minimized.
  • We see that the k-nearest neighbors algorithm outperforms linear regression when trained on the full data set.
  • In another post, I will outline some of the classification methods that are most common in the Python machine learning library.
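
A comparison of this kind might look like the sketch below. It is not the article's code: the data is synthetic (generated with make_friedman1), the compare_models helper is hypothetical, and the hyperparameters are arbitrary defaults, so the ranking it prints need not match the one described above.

```python
from sklearn.datasets import make_friedman1
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR


def compare_models(n_samples=300, seed=0):
    """Fit four regressors on the same split and return their test MSE."""
    # Synthetic nonlinear regression data (an assumption, not the article's data set).
    X, y = make_friedman1(n_samples=n_samples, noise=1.0, random_state=seed)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed
    )
    models = {
        "linear_regression": LinearRegression(),
        "random_forest": RandomForestRegressor(n_estimators=50, random_state=seed),
        "support_vector_regression": SVR(),
        "k_nearest_neighbors": KNeighborsRegressor(n_neighbors=5),
    }
    errors = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        errors[name] = mean_squared_error(y_test, model.predict(X_test))
    return errors


if __name__ == "__main__":
    # Print the models from lowest to highest test error.
    for name, mse in sorted(compare_models().items(), key=lambda kv: kv[1]):
        print(f"{name}: test MSE = {mse:.3f}")
```

Tuning the random forest or support vector parameters, for example with GridSearchCV, would be the natural next step for the optimization mentioned above.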



Geolocations and geocodes instrument set for data analysis

  • You need to extract key data, get the necessary details, visualize data points on the map and prepare them for the analysis or some learning algorithm.
  • I want to share the set of tools I use for such tasks, using my native city as an example.
  • It has very convenient endpoints for searching venues (tourist places, cafes, bus stations, etc.) by specific coordinates or within a neighborhood, and endpoints for retrieving venue details, ratings, and much more.
  • Our dataset contains a lot of venues with their categories, locations and regions attached.
  • The API response contains many details; nevertheless, we will use only a few: extended categories, number of likes from users, rating, and whether the venue is currently open.
  • By this time, our dataset with the venue details attached is almost complete.
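
Searching "within some neighborhood" comes down to a distance filter around a point. The sketch below shows one common way to do that locally with the haversine formula. The venue records, field names, and the venues_within helper are hypothetical and do not reflect the API's actual response format.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, a common approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    d_phi = radians(lat2 - lat1)
    d_lambda = radians(lon2 - lon1)
    a = sin(d_phi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def venues_within(venues, center_lat, center_lon, radius_km):
    """Keep only the venues whose coordinates fall within radius_km of the centre."""
    return [
        v for v in venues
        if haversine_km(center_lat, center_lon, v["lat"], v["lon"]) <= radius_km
    ]


# Hypothetical venue records, invented for illustration only.
venues = [
    {"name": "Cafe A", "category": "cafe", "lat": 50.000, "lon": 30.000},
    {"name": "Bus Station B", "category": "bus station", "lat": 50.005, "lon": 30.005},
    {"name": "Museum C", "category": "tourist place", "lat": 50.200, "lon": 30.300},
]

nearby = venues_within(venues, 50.0, 30.0, radius_km=2.0)
print([v["name"] for v in nearby])
```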



Understand Neural Networks & Model Generalization

  • Training a deep neural network that can generalize well to new data is a challenging problem.
  • When it comes to neural networks, regularization is a technique that makes slight modifications to the learning algorithm such that the model generalizes better.
  • Proper regularization is a critical reason for better generalization performance because deep neural networks are often over-parametrized and likely to suffer from overfitting problems.
  • In other words, this approach attempts to stop an estimator’s training phase early, at the point where it has learned to extract all meaningful relationships from the data, before beginning to model its noise.
  • Batch normalization, besides having a regularization effect, helps your model in other ways (for example, it allows the use of higher learning rates).
  • Indeed, most of the time, we cannot be sure that for each learning problem, there exists a learnable neural network model that can produce a generalization error as low as desired.
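
The early-stopping idea described above can be sketched as a patience rule on the validation loss. This is a generic illustration, not the article's implementation: the function name, the patience and min_delta parameters, and the loss values are all assumptions.

```python
def early_stopping_epoch(val_losses, patience=2, min_delta=0.0):
    """Return (best_epoch, stop_epoch) for a sequence of validation losses.

    Training stops once the loss has failed to improve by more than
    min_delta for `patience` consecutive epochs; the weights from
    best_epoch are the ones that would be kept.
    """
    best_loss = float("inf")
    best_epoch = -1
    waited = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            # Validation loss improved: remember this epoch, reset the counter.
            best_loss, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                return best_epoch, epoch
    # Never ran out of patience: training ran to the last epoch.
    return best_epoch, len(val_losses) - 1


# Hypothetical validation losses: improving at first, then rising as the
# model starts to fit noise.
losses = [1.0, 0.8, 0.7, 0.72, 0.75, 0.9]
best, stopped = early_stopping_epoch(losses, patience=2)
print("best epoch:", best, "stopped at epoch:", stopped)
```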



Food for Thought — Paper Tuesday

  • Deep neural networks have unparalleled performance in many computer vision tasks like image classification, object segmentation, and image generation.
  • The authors demonstrated that GridMask is less likely to remove (excessive deletion) or keep 99+% (excessive reservation) of the target objects than current methods by a large margin.
  • The authors also tested GridMask on standard datasets like ImageNet-1K and CIFAR10 and demonstrated the potential of this simple algorithm.
  • Although the 1% improvement is not significant in real-life applications, it is quite useful in data science competitions like Kaggle, in which people create gigantic ensemble models just to gain a small margin.
  • This paper is a reminder to all that we should spend more time exploring the simple tricks rather than engage in an “arms race” in which people simply pile up more parameters.
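
To give a feel for grid-structured masking, here is a simplified sketch that zeroes a regular grid of square blocks in an image. It is an assumption-laden illustration rather than the paper's exact algorithm: the parameter names (unit, drop, offset_y, offset_x) are hypothetical, and details such as random offsets, rotation, and the paper's keep-ratio definition are left out.

```python
import numpy as np


def grid_mask(height, width, unit=4, drop=2, offset_y=0, offset_x=0):
    """Binary mask: 1 keeps a pixel, 0 drops it.

    The image plane is tiled with unit x unit cells, and a drop x drop
    square is zeroed in each cell, producing a regular grid of holes.
    """
    ys = (np.arange(height) + offset_y) % unit
    xs = (np.arange(width) + offset_x) % unit
    dropped = (ys[:, None] < drop) & (xs[None, :] < drop)
    return (~dropped).astype(np.float32)


def apply_mask(image, mask):
    """Multiply an (H, W) or (H, W, C) image by the mask."""
    return image * (mask[..., None] if image.ndim == 3 else mask)


mask = grid_mask(8, 8, unit=4, drop=2)
image = np.ones((8, 8, 3), dtype=np.float32)  # placeholder image
masked = apply_mask(image, mask)
print(mask)
print("fraction of pixels kept:", float(mask.mean()))
```

Because the holes are spread evenly, no single region of the image is removed or kept entirely, which is the property the summary above attributes to GridMask.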
