Articles related to "towardsdatascience"

Connecting MySQL Server to Your Python Environment

  • It turns out there is a wonderful package for connecting your Python environment to a MySQL server.
  • This package makes connecting to an ODBC database easy.
  • In this article, I will walk through how to connect Python to a database on the MySQL server.
  • Along with this package, we need pandas, because pandas has a function that makes reading SQL queries really easy in Python.
  • By combining these two packages and their functions, we can streamline our data call and make the analysis or model pipeline more efficient.
  • Now that we are connected and have a query written, we need to tell Python to execute the query in SQL.
  • We can use the pandas function read_sql_query to perform this task.
  • Now our SQL query data has been pulled into Python and converted into a pandas DataFrame for easy manipulation.
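
The connect, query, and read steps summarized above can be sketched roughly as follows. This is a minimal illustration, not the article's own code: it uses an in-memory SQLite database from the standard library as a stand-in for the MySQL connection, and the orders table and its rows are hypothetical. With a real server, the connection object would presumably come from an ODBC driver package such as pyodbc instead.

```python
import sqlite3

import pandas as pd

# Stand-in connection (assumption): an in-memory SQLite database.
# Against a real server, this object would be created by a database
# driver package instead; that part is not shown here.
conn = sqlite3.connect(":memory:")

# Hypothetical table and rows, only so the query has something to return.
conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "alice", 20.0), (2, "bob", 35.5), (3, "alice", 12.25)],
)
conn.commit()

query = """
    SELECT customer, SUM(amount) AS total
    FROM orders
    GROUP BY customer
    ORDER BY customer
"""

# read_sql_query runs the query over the open connection and returns
# the result set as a pandas DataFrame.
df = pd.read_sql_query(query, conn)
conn.close()

print(df)
```

Keeping the SQL in a single string and handing it to read_sql_query means the aggregation runs in the database, and only the result set is loaded into pandas.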

Feature Extraction using Principal Component Analysis — A Simplified Visual Demo

  • When I taught Data Science at General Assembly in San Francisco, I found that helping students visualize the transformation between features and principal components greatly enhanced their understanding.
  • The following demo presents the linear transformation between features and principal components using eigenvectors for a single data point from the Iris dataset.
  • Well, you can certainly transform the principal components back into the original features by performing the calculation shown in Figure 5 above.
  • In contrast, when we reduce dimensionality through feature extraction methods such as PCA, we keep the most important information by selecting the principal components that explain most of the relationships among the features.
  • In our case, the first and second principal components (i.e., pc1 and pc2) explained more than 95% of the variation from the features based on the normalized eigenvalue associated with each eigenvector, as shown in Figure 8 below.
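
The transformation described above can be sketched with NumPy. This is a rough illustration under stated assumptions: the data is synthetic (not the Iris measurements), and the variable names are made up for the example. It projects one standardized data point onto the eigenvectors of the covariance matrix, transforms it back, and computes the share of variance explained from the normalized eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data (assumption): 100 samples, 4 correlated features.
latent = rng.normal(size=(100, 2))
mixing = rng.normal(size=(2, 4))
X = latent @ mixing + 0.05 * rng.normal(size=(100, 4))

# Standardize each feature to zero mean and unit variance.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# Eigen-decomposition of the covariance matrix of the standardized features.
cov = np.cov(Z, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# eigh returns eigenvalues in ascending order; sort them descending.
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# Normalized eigenvalues give the share of variance each component explains.
explained = eigenvalues / eigenvalues.sum()

# Features -> principal components for a single data point.
point = Z[0]
pcs = point @ eigenvectors

# Principal components -> features (the eigenvector matrix is orthogonal).
restored = pcs @ eigenvectors.T

print("explained variance ratio:", np.round(explained, 3))
print("round trip matches:", np.allclose(restored, point))
```

Keeping only the first k columns of eigenvectors gives the reduced representation; the round trip is exact only when all components are kept.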

Cleaning and Transforming Data with SQL

  • For each row, the program starts at the top of the CASE WHEN statement and evaluates the first Boolean condition.
  • For the first condition from the start of the statement that evaluates as true, the statement will return the value associated with that condition.
  • COALESCE allows you to list any number of columns and scalar values and returns the first value in the list that is not NULL: if the first value is NULL, it falls back to the second, and so on.
  • NULLIF is a two-value function and will return NULL if the first value equals the second value.
  • Another useful data transformation is to change the data type of a column within a query.
  • This is usually done to use a function only available to one data type, such as text, while working with a column that is in a different data type, such as a numeric.
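
The four constructs above (CASE WHEN, COALESCE, NULLIF, and a type cast) can be tried together in one query. The sketch below runs the SQL through Python's built-in sqlite3 module, so it follows SQLite's dialect, which may differ in details from the engine the article uses. The customers table, its columns, and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table and rows for illustration only.
conn.execute(
    "CREATE TABLE customers "
    "(name TEXT, phone TEXT, email TEXT, visits INTEGER, zip INTEGER)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?, ?)",
    [
        ("Ann", None, "ann@example.com", 12, 2139),
        ("Bo", "555-0100", None, 3, 94105),
        ("Cy", None, None, 0, 10001),
    ],
)

query = """
    SELECT
        name,
        -- Conditions are checked top to bottom; the first true one wins.
        CASE
            WHEN visits >= 10 THEN 'frequent'
            WHEN visits >= 1 THEN 'occasional'
            ELSE 'new'
        END AS segment,
        -- First non-NULL value in the list.
        COALESCE(phone, email, 'no contact') AS contact,
        -- NULL when visits equals 0, otherwise visits unchanged.
        NULLIF(visits, 0) AS visits_or_null,
        -- Change the column's type within the query.
        CAST(zip AS TEXT) AS zip_text
    FROM customers
    ORDER BY name
"""

result = conn.execute(query).fetchall()
conn.close()

for row in result:
    print(row)
```

Because the first matching WHEN wins, the order of the conditions matters: swapping the two WHEN lines would label every customer with at least one visit as 'occasional'.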

Build or Buy Data Science Solutions

  • In detail, these solutions provide a framework for: (1) collaboration as a way for non-technical folks to contribute to data projects along with data scientists and data engineers, (2) data governance as a way for team leaders to monitor the machine learning workflows, (3) efficiency as a way to save time throughout the data-to-insights process, (4) automation as a potential way to automate certain parts of the data pipeline to alleviate inefficiencies, and (5) operationalization as a way to deploy data projects into production quickly and safely.
  • In their most basic form, data science solutions enable people within an enterprise organization to (1) use the data to produce machine learning solutions, (2) scale their products by providing transparency and reproducibility throughout the team and within a project, and (3) access all the data and collaborate on data projects in a central hub.

The Data Science Boom in Esports

  • In the past few years alone, Esports has become one of the most popular forms of entertainment in the world, rivaling traditional sporting events from the NFL Super Bowl to the MLB World Series.
  • This came from a report by Newzoo, an analytics company, which projected that the Esports industry would bring in roughly $1.1 billion in revenue, a 26.7% increase from the year prior.
  • Along with data provided by game developers, one of the most popular League of Legends websites for data analytics is a website called Mobalytics.
  • These positions range in title from Data Engineer to Data Scientist and even Game Analyst; Esports is becoming a hot spot for data science.
  • Gaming companies like Riot Games and other Esports organizations need, and are actively searching for, talented data scientists to help optimize their games.

Introduction to Papermill

  • It transforms your Jupyter notebook into a data workflow tool by executing each cell sequentially, without having to open JupyterLab (or Notebook).
  • To illustrate it we are going to develop a Python Notebook to run a simple analysis using a weather forecast API (PyOWM), perform the data wrangling, generate a few visualizations and create a final report.
  • The idea is to create a straightforward workflow that fetches data for a specific city using a Python API called PyOWM, performs the data wrangling, creates some plots, and organizes the information in a PDF report.
  • In the next sections, we are going to configure our Jupyter Notebook to accept any city as the parameter for the workflow and automatically execute it using Papermill.
  • It carries basically all the information needed to document the process, meaning we could use it as log-like data to document our workflow execution.
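
Running one notebook for an arbitrary city might look roughly like the sketch below. The template file name, the output directory, and both helper functions are hypothetical; only papermill.execute_notebook and its parameters argument come from the library. It assumes the template notebook has a cell tagged "parameters" that defines a default city, which Papermill overrides at run time.

```python
from pathlib import Path


def output_path_for(city: str, out_dir: str = "runs") -> Path:
    """Build a per-city output notebook path (hypothetical naming scheme)."""
    slug = city.strip().lower().replace(" ", "_")
    return Path(out_dir) / f"weather_{slug}.ipynb"


def run_for_city(city: str, template: str = "weather_template.ipynb") -> Path:
    """Execute the template notebook once with `city` injected as a parameter."""
    # Imported here so the path helper above works without papermill installed.
    import papermill as pm

    out = output_path_for(city)
    out.parent.mkdir(parents=True, exist_ok=True)

    # Papermill writes an executed copy of the notebook to `out`,
    # with the given parameters injected after the tagged cell.
    pm.execute_notebook(template, str(out), parameters={"city": city})
    return out


# Example call (needs papermill installed and the template notebook on disk):
# run_for_city("Sao Paulo")
```

Writing each run to its own output file keeps the executed notebooks around as a record of what ran for which city.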


  • When it comes to Kafka topic viewers and web UIs, the go-to open-source tool is Kafdrop.
  • And there’s a reason behind that: Kafdrop does an amazing job of filling the apparent gaps in the observability tooling of Kafka, solving problems that the community has been pointing out for too long.
  • Conveniently, Kafdrop displays the computed lag for each partition, which is aggregated at the footer of each topic table.
  • In Kafka, this period is usually on the order of tens or hundreds of milliseconds, depending on the producer and consumer client options, network configuration, broker I/O capabilities, the size of the pagecache, and a myriad of other factors.
  • It’s exactly what you’d expect — a chronologically-ordered list of messages (or records, in Kafka parlance) for a chosen partition.
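
The lag figure mentioned above is conventionally the gap between a partition's end offset and the consumer group's committed offset. The sketch below shows that arithmetic with hypothetical offsets; it is not Kafdrop's implementation and does not talk to a broker. Treating a partition with no committed offset as starting from 0 is a simplifying assumption.

```python
def partition_lag(end_offsets, committed_offsets):
    """Return {partition: lag}, where lag = end offset - committed offset."""
    lags = {}
    for partition, end in end_offsets.items():
        # Assumption: a partition with no committed offset counts from 0.
        committed = committed_offsets.get(partition, 0)
        # Floor at zero in case the two readings were taken at different times.
        lags[partition] = max(end - committed, 0)
    return lags


# Hypothetical offsets for a three-partition topic.
end_offsets = {0: 120, 1: 95, 2: 40}
committed_offsets = {0: 118, 1: 95, 2: 25}

lags = partition_lag(end_offsets, committed_offsets)
total_lag = sum(lags.values())  # the per-topic aggregate

print(lags)
print(total_lag)
```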

The toaster who went surfing

  • No, this is not a children’s fairytale, but rather something which happens every day around the world to smart toasters.
  • Such toasters connect to the internet, surf the web for updates and also allow their owners to control them remotely.
  • Welcome to the world of the Internet of Things (IoT).
  • The idea behind IoT is to create devices (such as toasters, fridges, ovens, lightbulbs, doors, cars, etc.) capable of connecting to the internet and transferring or receiving data over a network without requiring human interaction.
  • The four case studies, which use IoT and AI, have been implemented successfully somewhere near you.
  • However, the possibilities of IoT are endless.
  • As can be seen, the possibilities offered by IoT are endless.
  • In the end, it’s not just the toaster that’s going to surf the internet, but most of our household and industrial devices.

Which countries put the highest value on human life and health?

  • To learn how countries value the lives of their citizens we should compare actual health expenditures vs ability to pay.
  • You need a bucket that covers a 6-fold range in healthcare expenditures (2% to 12% of per capita GDP) to capture 90% of the countries.
  • It’s not like poor countries (or rich ones) are all forced into a narrow range of expenditures.
  • Second, there is a significant (P < 0.0001) upward trend to this relationship — richer countries are willing to spend more of their resources on healthcare.
  • Despite its ranking at the top of the list, the Marshall Islands spends only $680 per capita on health care.
  • But it doesn’t pour resources into healthcare because it values the lives of its citizens.
  • On this basis, I offer the countries of Serbia, Bosnia-Herzegovina, Paraguay, Ecuador and (especially) Nicaragua as the places where life and health are valued the most.

On Variety Of Encoding Text

  • So if somebody has to actually put a system into production, their first choice will be USE and then maybe ELMo. Then they try out the embeddings for semantic relatedness and textual similarity tasks.
  • The model makes use of a deep network to amplify the small differences in embeddings that might come from just one word like good/bad.
  • This makes USE(DAN) a great model for classifying news articles into categories, but it might cause problems in sentiment classification tasks, where words like ‘not’ can change the meaning.
  • The full ELMo model holds up better, with performance dropping only 7 F1 points between d = 0 tokens and d = 8, suggesting the pretrained encoder does encode useful long-distance dependencies.
  • Second, the performance of ELMo cannot be fully explained by a model with access to local context, suggesting that the contextualized representations do encode distant linguistic information, which can help disambiguate longer-range dependency relations and higher-level syntactic structures.
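
One way to see why an averaging-based encoder can struggle with words like ‘not’ is that a plain average of word vectors ignores word order. The sketch below illustrates only that averaging step, using made-up 3-dimensional vectors; it is not the USE model, which also uses additional inputs and a deep network on top of the average.

```python
import numpy as np

# Hypothetical word vectors, not taken from any real model.
embeddings = {
    "good": np.array([1.0, 1.0, 0.0]),
    "bad": np.array([-1.0, 1.0, 0.0]),
    "not": np.array([-1.0, 0.0, 0.5]),
    "but": np.array([0.0, 0.0, 1.0]),
}


def average_embedding(sentence):
    """Encode a sentence as the mean of its word vectors."""
    vectors = [embeddings[word] for word in sentence.lower().split()]
    return np.mean(vectors, axis=0)


a = average_embedding("good but not bad")
b = average_embedding("bad but not good")

# Both sentences contain the same words, so their averages coincide
# even though the sentiment is reversed.
same = bool(np.allclose(a, b))
print(same)
```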
