Using Part-of-Speech to Analyse Movie Reviews
- Questions like “Is this review positive or negative?” or “Is this tweet ironic or not?” are very common, so our attention usually focuses on practical aspects, like which algorithm to use or how to transform words into tokens.
- In simple terms, the nlp object is the core of spaCy's pipeline, as it does the heavy lifting: transforming words into tokens, lemmas, and so on.
- It is an interesting finding that could be investigated further, for example by using bigrams to discover the common words used before and after “good” in each sentiment class, and it opens the door to other analyses that can yield valuable information for reports and dashboards.
- The goal of this article was to show you how to use POS tagging in data analysis, demonstrating a practical application of the concepts we learn when studying NLP.
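The bigram idea above can be sketched in a few lines of plain Python. The `words_around` helper and the pre-tagged input format are hypothetical stand-ins; the article itself works with spaCy tokens rather than bare strings:

```python
from collections import Counter

def words_around(reviews, target="good"):
    """Count the words immediately before and after `target`,
    split by review sentiment (a toy version of the bigram idea)."""
    before, after = Counter(), Counter()
    for sentiment, tokens in reviews:
        for i, tok in enumerate(tokens):
            if tok == target:
                if i > 0:
                    before[(sentiment, tokens[i - 1])] += 1
                if i + 1 < len(tokens):
                    after[(sentiment, tokens[i + 1])] += 1
    return before, after
```

Running this over tokenized reviews yields counters like `before[("neg", "not")]`, which feed directly into the kind of tables used in reports and dashboards.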
How to extract facial expressions, head pose, and gaze from any YouTube video
- In this post, I share a free, easy-to-use, and robust alternative to paid services for facial feature extraction: OpenFace, a state-of-the-art tool for facial action unit recognition, gaze estimation, facial landmark detection, and head pose estimation.
- Here I share instructions for a Google Colab Jupyter notebook that lets you set up OpenFace and extract facial features from any YouTube video without having to install a single package on your laptop.
- If your video contains only one face at a time, you could use FeatureExtraction instead, or FaceLandmarkImg if you’d like to extract features from a still image.
- Hopefully, this was an interesting exercise in how you can use Google Colab and OpenFace to extract facial features from any YouTube video in a few minutes (after installation).
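Once OpenFace is built (which the Colab notebook automates), extraction is a single command-line call. The sketch below assembles that call from Python; `-f` (input video) and `-out_dir` (output folder) are OpenFace's own flags, while the binary location is an assumption that depends on where the notebook installs it:

```python
import shutil
import subprocess

def build_openface_cmd(video_path, out_dir, binary="FeatureExtraction"):
    """Assemble an OpenFace FeatureExtraction command line:
    -f selects the input video, -out_dir the output folder."""
    return [binary, "-f", str(video_path), "-out_dir", str(out_dir)]

cmd = build_openface_cmd("video.mp4", "processed")

# Only run when the binary is actually available (e.g. inside the Colab VM).
if shutil.which(cmd[0]):
    subprocess.run(cmd, check=True)
```

FaceLandmarkImg for still images follows the same pattern with a different binary name.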
Political Data Science: A tale of tweets
- The political situation in Scotland, following the Brexit vote and, most recently, Boris Johnson’s win in the winter 2019 General Election, is very heated.
- I then performed a systematic comparison with a Deep Learning Recurrent Neural Network (RNN) known as a Long Short-Term Memory (LSTM) network.
- With grid search, you set up a grid of hyperparameter values and, for each combination, train a model and score it on the validation data.
- I passed the combined hyperparameters to the GridSearchCV object for each classifier, with 10 folds for the cross-validation, which means that for every parameter combination the grid ran 10 different iterations with a different test set each time (this took a while…).
- And if we want a neural network to understand our tweets, we need one that can learn from what it reads and build on it.
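The grid-search-with-cross-validation loop described above can be sketched without scikit-learn. Here `train_and_score` is a hypothetical callback standing in for fitting and scoring a real classifier on one fold:

```python
from itertools import product
from statistics import mean

def grid_search(param_grid, train_and_score, folds=10):
    """Try every hyperparameter combination; for each one, average the
    validation score over `folds` splits and keep the best combination."""
    best_params, best_score = None, float("-inf")
    keys = sorted(param_grid)
    for values in product(*(param_grid[k] for k in keys)):
        params = dict(zip(keys, values))
        # One fit-and-score per fold, each with a different validation set
        score = mean(train_and_score(params, fold) for fold in range(folds))
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

With 10 folds, every combination triggers 10 separate fits, which is why a full run over a large grid takes a while.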
The Non-Treachery of Dataset
- There have been various efforts to let AI create art (e.g., a model trained on a 24k-painting dataset from Kaggle).
- On one hand, we would like to train an end-to-end deep convolutional model to investigate the capability of deep models in the fine-art painting classification problem.
- On the other hand, we argue that classifying fine-art collections is a more challenging problem than object or face recognition.
- So does AI have the imagination to recognize art, a non-representational production of the human brain, heart, and hand?
- AI is perplexed as well: the most it can do is distinguish between artist styles, depicted objects, and art-movement features.
- Exactly: ArtGAN cannot reconstruct famous paintings (in this case because it is trained on a huge dataset spanning various art epochs).
- It’s a disruptive force that lets us reconsider the human factor in the concept of art.
On Implementing Deep Learning Library from Scratch in Python
- They provide the necessary recipe to update model parameters using their gradients with respect to the optimization objective.
- On the back-end side, these libraries provide support for automatically calculating gradients of the loss function with respect to various parameters in the model.
- The backward(…) method receives the partial derivatives of the loss function with respect to the operator’s output and computes the partial derivatives of the loss with respect to the operator’s input and parameters (if there are any).
- The backward(…) function receives the partial derivatives dY of the loss with respect to the output Y and computes the partial derivatives with respect to the input X and the parameters W and b.
- This method updates the model parameters using their partial derivatives with respect to the loss we are optimizing.
- Inspired by a blog post by Andrej Karpathy, I am going to train a neural network model with one hidden layer on spiral data.
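The forward/backward contract for the linear operator described above can be sketched with NumPy. The class name and caching scheme are illustrative, not the article's actual implementation:

```python
import numpy as np

class Linear:
    """Fully connected operator: forward computes Y = X W + b; backward
    receives dY (dloss/dY) and returns dloss/dX while storing dW and db."""
    def __init__(self, W, b):
        self.W, self.b = W, b

    def forward(self, X):
        self.X = X                  # cache the input for the backward pass
        return X @ self.W + self.b

    def backward(self, dY):
        self.dW = self.X.T @ dY     # partial derivatives w.r.t. the weights
        self.db = dY.sum(axis=0)    # partial derivatives w.r.t. the bias
        return dY @ self.W.T        # partial derivatives w.r.t. the input
```

An optimizer step then updates W and b from dW and db, which is exactly the parameter-update recipe the library's optimizers provide.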
How analytics maturity models are stunting data science teams
- A strong reason why teams get bogged down at the lower end of the maturity model is that the management paradigms that make descriptive and diagnostic analytics effective may be a death knell for predictive and prescriptive work.
- As an example, if I am building a machine learning model for predictive maintenance and find that the available data carries no useful signal, failing after two weeks of experimentation on a laptop is much better than failing with a six-month budgeted project and a team of ten.
- To recap: a primary way maturity models damage teams is when companies take the methods of management that worked for delivering descriptive analytics solutions, and impose them on advanced analytics work without modifying the approach to account for data uncertainty.
- It requires mature processes that acknowledge data uncertainty, safe spaces to experiment to de-risk advanced analytics work, proper model operations post go-live and financial models that are tailored for products instead of projects.
Humans are becoming Life Engineers
- The team involved in the research into the development of these biological machines (Xenobots) has drawn motivation from the fact that the current building blocks of human technology are based on materials that degrade and are harmful to both the environment and humans themselves.
- The designing of biological machines also motivates the further exploration and discovery of the potential usefulness of cell-based organisms and technology in medicine, robotics, and other research areas.
- According to Sam Kriegman, a Ph.D. student at the University of Vermont and a member of the research group, nature’s exploration of the possible combinations of organisms that can be formed from biological cells is not extensive, and we need to explore areas that nature hasn’t.
- And for us humans, the introduction of computer-designed organisms paves the way for a more steady pace of advancement and innovation in the field of cellular biology and AI algorithms.
Is PyTorch Catching TensorFlow?
- PyTorch can now be run more easily on Google Cloud’s Tensor Processing Units (TPUs) — the fastest way to train complex deep learning models.
- TensorFlow still has more bells and whistles for deep learning in production and on the edge than PyTorch does, but PyTorch is getting closer to feature parity.
- PyTorch and TensorFlow are the only two games in town if you want to learn a popular deep learning framework.
- In this article, I’m going to focus on the four metrics that I think matter most: job listings, research use, online search results, and self-reported use.
- I used Google Trends to find the relative number of searches for PyTorch (Software) and TensorFlow (Computer application) in the USA from January 26, 2017 to January 26, 2020.
- PyTorch has taken the lead in usage in research papers at top conferences and almost closed the gap in Google search results.
Underrated Machine Learning Algorithms — APRIORI
- ARM (Association Rule Mining) is one of the important techniques in data science.
- CANDIDATE_SET C(k): formed by measuring the support_count of each item in the dataset.
- ITEM_SET L(k): formed by comparing each candidate’s support_count to the minimum_support_count and filtering out infrequent itemsets.
- In stage 1, the candidate set C1 is generated by measuring the support_count of each item in the dataset.
- In this case, no items in the candidate_set fall below the minimum_support_count.
- In this case, 31 items in the itemset need to be eliminated because they are low-frequency, i.e., below the minimum_support_count.
- The item set L4 is generated by comparing the candidate_set C4 against the minimum_support_count.
- Today, we’ve learned how to build the Apriori algorithm and implement it for Association Rule Mining on a general grocery dataset from a supermarket.
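The C(k)/L(k) alternation described above can be sketched in plain Python; the function and variable names are illustrative:

```python
def apriori(transactions, min_support_count):
    """Alternate candidate sets C(k) and frequent itemsets L(k):
    count support for each candidate, keep those meeting the
    minimum_support_count, then join survivors into larger candidates."""
    transactions = [frozenset(t) for t in transactions]

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t)

    frequent = {}
    k = 1
    current = [frozenset([i]) for i in {i for t in transactions for i in t}]  # C1
    while current:
        counts = {c: support(c) for c in current}                      # C(k) counts
        level = {c: s for c, s in counts.items()
                 if s >= min_support_count}                            # L(k)
        if not level:
            break
        frequent.update(level)
        # C(k+1): join frequent k-itemsets whose union has k+1 items
        current = list({a | b for a in level for b in level if len(a | b) == k + 1})
        k += 1
    return frequent
```

On a toy basket dataset this reproduces the stages in the article: C1 counts single items, L1 filters them, and the loop climbs until no candidate survives the minimum_support_count.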
My NLP learning journey
- But a computer needs specialized processing techniques to understand raw text data.
- That’s why NLP attempts to use a variety of techniques to create structure out of text data.
- I will briefly introduce NLTK and spaCy, both state-of-the-art libraries in NLP, and the differences between them.
- spaCy is an open-source Python library that parses and “understands” large volumes of text.
- The first step in processing text is to split up all the parts (words & punctuation) into “tokens”.
- And that’s exactly what spaCy is designed to do: you put in raw text and get back a Doc object that comes with a variety of annotations.
- Given enough data, usage, and contexts, Word2vec can make highly accurate guesses about a word’s meaning based on past appearances.
- LDA was introduced back in 2003 to tackle the problem of modeling text corpora and collections of discrete data.
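The tokenization step described above (splitting words and punctuation into tokens) can be approximated with one regular expression. This toy version ignores the exception rules, contractions, and annotations that spaCy's real tokenizer handles:

```python
import re

def tokenize(text):
    """Split raw text into word and punctuation tokens,
    a toy stand-in for the first step of spaCy's nlp pipeline."""
    return re.findall(r"\w+|[^\w\s]", text)

tokenize("Hello, world!")  # → ["Hello", ",", "world", "!"]
```

spaCy's Doc object layers lemmas, POS tags, and other annotations on top of this basic split, which is why the library is worth the extra setup.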