In this course, we study the basics of text mining. The course teaches the basic operations of structuring unstructured data into vectors and of reading different types of data from public archives. Building on this, we use Natural Language Processing to pre-process our dataset. Machine learning techniques are used for document classification and clustering, and for evaluating the resulting models. Information extraction is covered through topic modeling, and sentiment analysis through both a classifier-based and a dictionary-based approach. Almost all modules come with assignments for practice. Two projects are given that combine most of the topics covered separately in the modules. Finally, a list of possible project suggestions is given for students to choose from and build their own project.
Who this course is for:
Beginners in Python who are curious about data science
People who know Python programming and the basic concepts of data science but cannot yet relate the two in practice
Facilities
Location: Online
Start date: Different dates available
Enrolment now open
About this course
In this course, students learn the basics of text mining and build on them to perform document categorization, document grouping and sentiment analysis.
The practicals are carried out in Python, and Natural Language Processing (NLP) is used for pre-processing.
Starting from a very small dummy dataset, we move on to existing databases and then to building your own database on which to perform text mining tasks.
Sentiment analysis of user hotel reviews
This centre's achievements
2021
All courses are up to date
The average rating is higher than 3.7
More than 50 reviews in the last 12 months
This centre has featured on Emagister for 6 years
Subjects
Programming
Word
Project
Ms Word
Course programme
Introduction
5 lectures, 11:11
Course Introduction
Instructor's Introduction: Introduction of the instructor.
Course Outline: Outline of the course, briefly describing the content of its 9 modules.
Course Overview: An overview of the project to be completed at the end of this course. The project covers almost all the topics studied during the course.
Session 1: Resources
Text Representation
8 lectures, 20:16
2.1.1 Theoretical Concepts of Text Representation: In this lecture, students learn why textual data must be transformed into a structured format before machine learning techniques can analyse it. Although text is organised into paragraphs and sentences, it is raw and far from structured. Machine learning expects data in the form of a matrix of rows and columns, with each cell holding a numerical value. This lecture looks at how to structure textual data and apply different representation schemes to it.
2.2.1 Structuring a One-Document Corpus: In this lecture, we implement the text representation technique on a very basic corpus consisting of a single document. The document is converted into a matrix with one row and a few columns (features), one for each unique word in the corpus.
2.2.2 Structuring a Multiple-Document Corpus: In this lecture, a multi-document corpus is structured into a matrix with one row per document. Each cell holds the frequency (or the binary presence value) of the column's feature in the row's document.
2.2.3 Setting Parameters: In this lecture, we experiment with different parameter settings for the CountVectorizer constructor. Setting binary=True converts every count greater than 1 to 1. The parameters max_df and min_df set the upper and lower bounds on a word's document frequency; words outside this range are ignored. max_features restricts the vocabulary to a given number of features. ngram_range sets the lower and upper limits on the n-gram sizes to be considered as features.
2.2.4 Using TF-IDF Representation: This lecture covers representing textual documents in a structured format with a tf-idf value for each word in a document. These values are computed with the following formulas. For any word w in document d:

$$\mathrm{TF\text{-}IDF}(w) = \mathrm{TF}(w) \times \mathrm{IDF}(w)$$
$$\mathrm{TF}(w) = \frac{\text{count of } w \text{ in } d}{\text{size of } d}$$
$$\mathrm{IDF}(w) = \log\frac{M + 1}{k}$$

where M is the total number of documents and k is the number of documents containing w.
2.2.5 Reading Data from a Labeled Dataset
2.2.6 Using a Textual Dataset from the UCI Repository: Applying the text representation techniques to a dataset from the UCI repository.
Session 2: Resources
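To make the lecture 2.2.4 formulas concrete, here is a plain-Python sketch that computes them directly. It assumes lowercase whitespace tokenisation, that "size of d" means the number of tokens in d, that log is the natural logarithm, and that the IDF term reads log((M + 1) / k); the function name tf_idf is hypothetical.

```python
import math
from collections import Counter


def tf_idf(corpus):
    """Hypothetical helper: one {word: tf-idf} dict per document, using
    TF(w) = count of w in d / size of d and IDF(w) = log((M + 1) / k)."""
    # Assumption: lowercase + whitespace split as a naive tokenizer
    docs = [text.lower().split() for text in corpus]
    M = len(docs)  # total number of documents
    # k for each word: the number of documents that contain it
    k = Counter(word for doc in docs for word in set(doc))

    scores = []
    for doc in docs:
        counts = Counter(doc)
        size = len(doc)  # "size of d" taken as the number of tokens in d
        scores.append({
            word: (count / size) * math.log((M + 1) / k[word])
            for word, count in counts.items()
        })
    return scores


corpus = [
    "the hotel room was clean",
    "the staff was friendly",
    "the room was noisy and the staff was rude",
]
scores = tf_idf(corpus)
print(round(scores[0]["hotel"], 3))  # TF = 1/5, IDF = ln(4/1) -> 0.277
```

scikit-learn's TfidfVectorizer uses different defaults (raw counts as TF, a smoothed IDF of ln((1 + M) / (1 + k)) + 1, and L2 normalisation of each row), so its numbers will not match this sketch.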
Additional information
Basics of programming (any language; Python is a bonus)
Basic understanding of Machine Learning
Able to code with lists, loops and conditions, and a basic understanding of how models learn patterns from data