AI/Machine Learning - RISHABH LALA

Machine Learning | Artificial Intelligence : SOME CONCEPTS (this page is under construction...)

Data is the future gold mine -> it gives a scientific basis/validation to your intuitions
Machine Learning is a part (subset) of Artificial Intelligence
Since the mid-2000s, big data started coming up and data storage costs came down, and people wanted to draw useful insights from it. The last few years have been an age of data explosion. Now, we need algorithms to make sense of the data.
A few data feeder systems: CRM (customer relationship management), Big Data (variety; velocity, e.g. 3000 transactions per second; and volume, including unstructured data, i.e. data that does not fit a relational database), and Business Intelligence (building predictive models based on what happened before)
Data mining and Machine learning are forward looking- futuristic
6 V's of big data: Volume, Velocity, Variety, Value, Veracity, Variability
By 2025, about 175 Zettabytes of data are expected to be generated globally (1 Zettabyte = 1000 Exabytes) -> meaning that data, printed as a stack of books, could take us to Pluto and back 3500 times
A lot of companies, like LinkedIn and Facebook, were valued in the billions because of the data that they were collecting.
1 BIG SUGGESTION: STORE YOUR DATA
JOBS Hoax: With time, many programming jobs can become redundant; what will be important is business strategizing: taking output from machine learning models and making business sense of it.
Data Mining can give you insights into the data. Example: don't go shopping on Friday nights because it's more expensive, or some correlation that may not have been discovered before, like the relation between diapers and beer.
Credit card fraud detection is a prime example of data mining. In another well-known case, Target reportedly knew that a girl was pregnant before her parents did, based on her purchase patterns, such as prenatal vitamins.
Capital One, starting in the late 1980s and early 1990s, pioneered the use of information technology and data analysis in credit cards. They launched balance transfers to win business from competitors and also launched variable interest rates based on the varying creditworthiness of individuals while covering the risk.
Classification algorithms with supervised learning can detect credit card fraud. Unsupervised learning, used in combination, keeps learning continually from real-time data. Combined with analysis of the confusion matrix (false positives, true positives, etc.), this makes credit card fraud easier to detect.
A neural network is hard to explain (it works like a black box), whereas a decision tree can be explained based on the input variables and logic.
THE BIGGEST QUESTION: HOW DO WE TRANSLATE BUSINESS PROBLEM INTO A DATA MINING PROBLEM? EXAMPLE: MY CUSTOMERS DON'T STICK WITH ME; WHAT TO DO?
SECOND BIGGEST QUESTION: HOW DO WE OBTAIN THE RIGHT DATA? RANDOM SAMPLES?
What do I do:
- Model building: Variables to collect: Name, age, height, ... all relevant parameters
- Fix Outliers - evaluate to remove them or reason to keep them
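
One common way to flag outliers for review is the interquartile-range (IQR) rule. This is only a sketch under that assumption (the notes do not say which rule is used); the function name `iqr_outliers`, the multiplier `k=1.5`, and the `ages` list are made up for illustration.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical ages; 120 looks like a data-entry error worth reviewing.
ages = [23, 25, 27, 29, 31, 33, 35, 37, 120]
flagged = iqr_outliers(ages)
print(flagged)
```

Flagged values still need a human decision: remove them if they are errors, keep them if there is a business reason.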
The algorithms I have used:
- SVM
- Random Forest

Data Drift: Patterns change, shopping behaviors change, economies change, so there is a continuous need to update and upgrade the model and its variables.
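
One possible way to put a number on drift is the Population Stability Index (PSI), which compares how a variable was distributed at training time with how it is distributed now. This is a swapped-in technique for illustration, not something these notes prescribe; the bin shares below are hypothetical.

```python
import math

def psi(expected_shares, actual_shares, eps=1e-6):
    """Population Stability Index between two binned distributions.

    Each argument is a list of bin shares that sum to 1.
    0 means identical distributions; larger means more drift.
    """
    total = 0.0
    for e, a in zip(expected_shares, actual_shares):
        e, a = max(e, eps), max(a, eps)  # avoid log(0) for empty bins
        total += (a - e) * math.log(a / e)
    return total

# Hypothetical share of customers in three spending bands.
train_shares = [0.25, 0.50, 0.25]  # when the model was built
live_shares = [0.10, 0.40, 0.50]   # what the business sees today
score = psi(train_shares, live_shares)
print(round(score, 3))
```

A rising score over time is a hint that the model and its variables are due for an update.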
Two types of Data: Primary Data - collected through regular business transactions
Secondary Data:
Binary Classifier: Test Data, Train Data, Validation Data
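
A minimal sketch of splitting one dataset into train, validation, and test data. The 60/20/20 ratio, the fixed seed, and the name `split_dataset` are assumptions for illustration only.

```python
import random

def split_dataset(rows, train=0.6, validation=0.2, seed=42):
    """Shuffle rows and split them into train / validation / test lists."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)  # fixed seed keeps the split repeatable
    n_train = round(len(rows) * train)
    n_val = round(len(rows) * validation)
    return (rows[:n_train],
            rows[n_train:n_train + n_val],
            rows[n_train + n_val:])  # whatever remains is test data

train_rows, val_rows, test_rows = split_dataset(range(100))
print(len(train_rows), len(val_rows), len(test_rows))
```

Train data fits the model, validation data is used to tune it, and test data is held back for the final check.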
Positive Prediction and It is a positive -> True Positive
Negative Prediction and it is negative -> True Negative
But the model can make mistakes and predict a false positive or a false negative.
Accuracy= (True Positive + True Negatives)/Total Predictions
Misclassification Rate = 100% - accuracy%
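
The four outcomes and the two formulas above can be sketched as follows. The labels are hypothetical (1 = positive, 0 = negative), and `confusion_counts` is just an illustrative helper name.

```python
def confusion_counts(actual, predicted):
    """Count true/false positives and negatives for 0/1 labels."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    return tp, tn, fp, fn

# Hypothetical ground truth and model predictions for 10 cases.
actual    = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0]
predicted = [1, 0, 0, 0, 0, 1, 1, 0, 0, 0]

tp, tn, fp, fn = confusion_counts(actual, predicted)
accuracy = (tp + tn) / (tp + tn + fp + fn)
misclassification_rate = 1 - accuracy
print(tp, tn, fp, fn, accuracy)
```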
What is the problem? The problem is that all these algorithms give a probabilistic score and leave it up to us to decide whether to accept the risk or not. Example: 0.85 certainty that this transaction is a fraud.
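
A small sketch of why the decision is ours and not the model's: the same 0.85 score leads to different actions depending on the threshold we pick. The thresholds and the name `decide` are hypothetical.

```python
def decide(score, threshold):
    """Turn a model's fraud score into an action using a chosen threshold."""
    return "flag" if score >= threshold else "allow"

score = 0.85  # model output: 0.85 certainty of fraud
cautious = decide(score, threshold=0.80)  # business accepts less risk
lenient = decide(score, threshold=0.90)   # business accepts more risk
print(cautious, lenient)
```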
Sensitivity (the term used in healthcare) = True Positive Rate = Recall = TP/(TP + FN) -> how many of the actual positives were correctly identified.
Specificity = True Negative Rate = TN/(TN + FP) -> how many of the actual negatives were correctly identified.
There is always a trade off between Sensitivity and Specificity. The more sensitive you get the less specific results may get.
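
The trade-off can be seen by moving the decision threshold over the same set of scores. This is a sketch with made-up labels and scores; it assumes the data contains at least one actual positive and one actual negative.

```python
def sensitivity_specificity(actual, scores, threshold):
    """Return (sensitivity, specificity) when scores >= threshold are positive."""
    tp = fn = tn = fp = 0
    for a, s in zip(actual, scores):
        predicted = 1 if s >= threshold else 0
        if a == 1 and predicted == 1:
            tp += 1
        elif a == 1:
            fn += 1
        elif predicted == 0:
            tn += 1
        else:
            fp += 1
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical labels (1 = positive) and model scores.
actual = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1]

results = {t: sensitivity_specificity(actual, scores, t) for t in (0.3, 0.5, 0.8)}
for t, (sens, spec) in results.items():
    print(t, round(sens, 2), round(spec, 2))
```

With this data, a low threshold catches every positive but raises more false alarms, and a high threshold does the opposite.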
F1 Score = 2/[(1/Recall) + (1/Precision)], where Precision = TP/(TP + FP)
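
A worked example of the F1 formula, using Precision = TP/(TP + FP) and Recall = TP/(TP + FN). The counts are hypothetical, and the sketch assumes TP is greater than zero so nothing divides by zero.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1 (their harmonic mean) from counts."""
    precision = tp / (tp + fp)  # of everything predicted positive, how much was right
    recall = tp / (tp + fn)     # of everything actually positive, how much was found
    f1 = 2 / ((1 / recall) + (1 / precision))
    return precision, recall, f1

# Hypothetical counts: 8 true positives, 2 false positives, 8 false negatives.
precision, recall, f1 = precision_recall_f1(tp=8, fp=2, fn=8)
print(round(precision, 3), round(recall, 3), round(f1, 3))
```

Because it is a harmonic mean, F1 sits closer to the weaker of the two numbers, so a model cannot score well by being good at only one of them.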
So, a good model has high accuracy, high specificity, and high sensitivity.
Important questions to ask: what is the cost of a false negative in the scenario, and what is the cost of a false positive?
Dependent variable and independent variable.
Expected value calculations and cost benefit analysis are critical to making a real business decision.
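
A minimal expected-value sketch for the fraud example: compare the expected cost of blocking a transaction with the expected cost of allowing it. The dollar costs and the name `expected_cost` are assumptions for illustration, not real figures.

```python
def expected_cost(p_fraud, cost_false_negative, cost_false_positive, block):
    """Expected cost of one decision given the model's fraud probability."""
    if block:
        # Blocking only costs us when the transaction was actually genuine.
        return (1 - p_fraud) * cost_false_positive
    # Allowing only costs us when the transaction was actually fraud.
    return p_fraud * cost_false_negative

p_fraud = 0.85             # model score for this transaction
cost_missed_fraud = 500.0  # hypothetical loss if fraud goes through
cost_false_alarm = 20.0    # hypothetical cost of annoying a good customer

cost_block = expected_cost(p_fraud, cost_missed_fraud, cost_false_alarm, block=True)
cost_allow = expected_cost(p_fraud, cost_missed_fraud, cost_false_alarm, block=False)
decision = "block" if cost_block < cost_allow else "allow"
print(round(cost_block, 2), round(cost_allow, 2), decision)
```

Changing the two costs changes the decision, which is why the cost of a false negative versus a false positive has to come from the business.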

MARKET BASKET ANALYSIS
Bundling items together and suggesting to buyers which items they should/can buy is the result of market basket analysis.
Support and Confidence:
Support for an item A is 75% when it's in 3 out of 4 shopping carts.
Confidence in a combination of items, for example A and C together: Support(A&C)/Support(A) or Support(A&C)/Support(C). Meaning, respectively: every time a customer picks A, 66% of the time they also pick C, and every time a customer picks C, they always pick A = 100% (based on given data).
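
The numbers above can be reproduced with a small sketch. The four carts are a hypothetical reconstruction chosen to match the stated percentages (the original data is not shown here), and `support` and `confidence` are illustrative helper names.

```python
def support(carts, items):
    """Fraction of carts that contain every item in `items`."""
    items = set(items)
    return sum(1 for cart in carts if items <= cart) / len(carts)

def confidence(carts, antecedent, consequent):
    """Support(antecedent & consequent) / Support(antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(carts, both) / support(carts, antecedent)

# Hypothetical carts: A is in 3 of 4, C is in 2 of 4, A&C together in 2 of 4.
carts = [{"A", "B"}, {"A", "C"}, {"A", "C", "D"}, {"B"}]

support_a = support(carts, {"A"})
conf_a_to_c = confidence(carts, {"A"}, {"C"})  # picked A, how often also C
conf_c_to_a = confidence(carts, {"C"}, {"A"})  # picked C, how often also A
print(support_a, round(conf_a_to_c, 2), conf_c_to_a)
```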
Rules:
1) Association Rule:

Attachment: intelligent_cloud_computing.pdf (File Type: pdf, File Size: 344 kb)