This lecture is about getting help.
It applies both to this course that you're taking right now, The Data Scientist's Toolbox,
and to all the other courses you're going to be taking in the course track.
Keep in mind that in a standard class of 30 or 100 people,
you would raise your hand and ask a question,
and then immediately get feedback from your instructor.
But in a massive open online course like this one,
there could be up to 100,000 people taking the class.
So instead, you're going to post your questions to the message board.
Then hopefully, your fellow students
will upvote them if they're good questions.
Your instructors will try to respond to as many as possible, but
more often your peers or community TAs will be the ones responding.
There are three of us teaching these nine courses, and we
are going to put in as much as we can to answer your questions.
But obviously, that's a limited resource.
So we've found that relying on your fellow students and your community
TAs is a great way to get involved.
We've also learned that the community built around the
message boards in massive open online courses is amazing,
and it's probably the best learning part of the entire experience.
So hopefully, you'll get involved and
be an active participant in those message boards.
The fastest answer is often the one you find for yourself.
So before posting, try to answer your question yourself by
searching Google or Stack Overflow.
If you ask a question that's easy to Google, you'll
often get a response that just says "Google it" or "read the documentation,"
which is not the easiest way to get the answer you're looking for.
An important part of being an active participant
in this community is that,
if you figure out the answer to a question, you post it to the message board.
If you're struggling with a particular part,
structure, idea, or R programming exercise,
it's almost a sure bet that a lot
of other people are struggling with the same thing.
So they'll really appreciate it if you take the time to post
to the message board how you figured out the solution to that problem.
I thought I'd mention a few important R functions that will
help you find answers to some of the questions you might have.
When you have an R function (we'll talk more about R later in the class),
there are several different commands you can type
to get the help file for that function.
For example, typing ?rnorm
brings up the help file for the function rnorm.
You can also search with help.search.
If you use help.search, you don't
necessarily have to get the function name exactly right;
it will still search through the help files and try to find matches for you.
And if you want to see the arguments of a function,
you can use the args command, like this:
args(rnorm) will list the arguments of rnorm.
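The three help commands above can be tried together in a short R script. This is a sketch: the interactive() guard and the search phrase "normal distribution" are choices made for this example, so that the script also runs non-interactively (for instance under Rscript), where opening help pages is not useful.

```r
# Open the help page for rnorm; ?rnorm is shorthand for help("rnorm").
# Guarded so the script still runs when no interactive session exists.
if (interactive()) {
  print(help("rnorm"))
  # Search all installed help files; the query need not be an exact name.
  # ??"normal distribution" is a shortcut for the same search.
  print(help.search("normal distribution"))
}

# Show only the argument list of rnorm: n, mean = 0, sd = 1.
args(rnorm)
```

args() is handy when you only need to recall argument names and defaults without reading the whole help page.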
These functions are very useful if your goal is
to figure out how R handles a particular function.
But they might not be so useful if you want
to understand the underlying concepts involved in those functions.
Another thing you might want to do is
look a little bit deeper into the code.
To do that, you can just type the function
name without any parentheses, and R
will print out the function's code for you.
So if I type rnorm like this,
what I get out on the R console is
the code that corresponds to that function.
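A short sketch of inspecting function code from the console. For rnorm, the R-level code is a thin wrapper that passes the work to compiled code in the stats package, so the printout is short. The helper center() is hypothetical, defined only to show that a function written in R prints its full source.

```r
# Typing a function's name without parentheses prints its definition.
rnorm

# body() extracts just the code part; for rnorm it is a single .Call()
# into compiled C code.
body(rnorm)

# Hypothetical helper for illustration: an R-level function
# prints its complete R source.
center <- function(x) x - mean(x)
center
```

When the printed body is only a .Call() or .Internal(), the real work happens in compiled code, and the help file is usually the better place to learn what the function does.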
The slide also links to a
reference card with a lot of helpful R functions.
Something you'll run into a lot
in this class is how to ask an R question.
There are a few different components
to keep in mind.
First, outline the steps
that you executed that produced the problem.
If you ran three functions in order,
show exactly what those three functions were.
Then say what you expected the output to be,
and what you saw instead.
For example: "I expected it to give me the answer
to this question, and instead it gave me an error."
A really important thing to keep in mind is that R, R packages, and all
of the other tools that we're going to be
telling you about evolve over time.
So it's really important that you report
the versions of the software you're using:
the version of the package, the version of R,
and the operating system you're working on,
whether that's Mac, Linux, or Windows.
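A minimal sketch of an R question that follows these components, using a made-up problem (the mean of a vector containing NA) rather than a real course exercise. The version commands are standard base R: R.version.string, packageVersion(), and sessionInfo(), which also reports the operating system.

```r
# Steps executed, in order (hypothetical example).
values <- c(1, 2, NA)
result <- mean(values)

# What I expected: 1.5
# What I saw instead:
print(result)                    # NA

# Versions and operating system to paste into the post.
cat(R.version.string, "\n")      # version of R
print(packageVersion("stats"))   # version of the package involved
print(sessionInfo())             # R version, platform, OS, loaded packages
```

With a question laid out this way, the answer comes quickly: mean() returns NA whenever the input contains NA, unless you set na.rm = TRUE.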
When you're asking a data analysis question, there's a similar
set of things you need to report.
First, what is the question you're trying to answer?
For example: "I'm trying to relate variable y to variable x."
Next, what steps or tools did you use to answer it?
This may be a combination of R tools, outside tools, and maybe some intuition.
Then, again, report what you expected to see
("I expected to be able to tell the relationship between them")
and what you saw instead
("I see a crazy scatter plot and I don't know what it means").
Another important thing to include
is what other solutions you've thought about.
Sometimes you'll run through three or four
different things trying to get the right answer.
If you report what you've tried,
then people answering your question
can go directly to something you haven't tried.
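The same structure carries over to a data analysis question. A minimal sketch, with simulated data standing in for real variables x and y (set.seed() makes the random data identical for anyone who reruns the code); the choice of cor() and lm() as the "tools used" is only an example:

```r
# Question: how is y related to x? (simulated data, for illustration only)
set.seed(1)                  # same random numbers for anyone rerunning this
x <- rnorm(100)
y <- 2 * x + rnorm(100)

# Steps and tools used: a correlation and a linear model.
r <- cor(x, y)
fit <- lm(y ~ x)
print(r)
print(coef(fit))

# What else was tried, so answerers can skip it: a rank-based correlation.
print(cor(x, y, method = "spearman"))
```

Because the data are generated in the snippet itself, someone answering can run it as-is and see exactly what you saw.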
When asking questions in a
massive class like this, make sure
that the titles of your posts
on the message forum are very specific.
An example of a bad title is:
"Help! I can't fit a linear model."
It doesn't give much detail about
what exactly your problem is or how it could be addressed.
A better title names the function you're using and the version of R (here, R 2.15),
states the error being produced (a seg fault),
notes that it only happens with a large data set,
and gives the operating system (Mac OS X 10.6.3).
An even better title says the same thing more succinctly:
lead off with the function you're asking about and the R version (2.15),
then the operating system,
and then a very short description: "seg fault on large data frame."
Focusing on specific details means people
can jump very quickly to the answer you need.
You'd want to give similar specific details
when asking about data analysis problems.
In general, the more specific you are, the faster your answer will come.
There's some etiquette that we'd like
to encourage when using these forums,
or help sites in general.
Again, describe your goal:
what's the question you're trying to answer?
Be very explicit.
Try to provide the minimum amount of information needed.
If you provide way too much information, it's very hard for
people to filter through it and figure out what your real problem is.
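One common way to keep the information minimal in R (not mentioned in the lecture, offered here as a sketch) is to share only the few rows and columns needed, using dput(), which prints R code that recreates the object. The built-in mtcars data set is used purely for illustration.

```r
# Keep only the rows and columns needed to show the problem.
small <- head(mtcars[, c("mpg", "cyl")], 3)

# dput() prints R code that rebuilds `small`; paste that output into the post
# so others can recreate the data with a single copy and paste.
dput(small)
```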
Being polite never hurt anybody and will often get your answer more quickly.
Then follow up and post solutions.
If you post a question and somebody ends up giving
you the answer on Stack Overflow instead of on the course website,
the polite thing to do is to post what you found on the course
website, so that other people can search for it and find that answer as well.
Please, please use the forums rather than personal email.
We're very excited about helping you learn data science,
but it's very easy to overwhelm the inboxes of your
instructors or community TAs if everyone starts sending emails at once.
When there's a typo in the assignment, please report it on the
forums and we will address it as fast as we possibly can.
Some things you shouldn't do: don't immediately
assume you've found a bug in a major program,
for example, claiming you've found a bug in R and that's why things aren't working.
Groveling as a substitute for doing your own work,
that is, begging other people to do your work for you, is obviously not a great idea.
Please don't post homework questions on mailing lists or on the course forums.
Posting the homework questions or their answers on the forums
takes away from the experience of everybody else.
Also, you don't want to ask general data analysis questions on R forums;
those are often redirected back to courses.
So try to keep those questions on our course forums, where hopefully there'll
be a big group of interested people all trying to answer the same questions.
Credit for these slides goes to
Roger Peng, another instructor in the course track.
He has a set of getting-help videos;
there's a link to his video on YouTube, and it was
inspired by Eric Raymond's essay "How To Ask Questions The Smart Way."
There are multiple issues with the code, but the direct problem is that you are being bitten by the factor bug. Compare these values:
So when you run this command, you will not get a match even though one exists. Try this instead:
The vector was wrapped in the function to give this output.
You will still get a warning because you did not eliminate factors from the beginning. It's difficult to tell where to start fixing the approach, but it may get you through the assignment.
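The kind of mismatch described here can be reproduced with a minimal sketch on made-up values (not the data from the original question). A factor stores each value as an integer code plus a set of text levels, so converting it straight to a number returns the codes, and a comparison against the real numbers finds no match:

```r
# Made-up values read in as a factor.
f <- factor(c("10", "5", "20"))
levels(f)                                   # "10" "20" "5"  (sorted as text)

# Direct conversion returns the level codes, not the original numbers.
as.numeric(f)                               # 1 3 2

# Converting through character first recovers the actual values.
as.numeric(as.character(f))                 # 10 5 20

# So a lookup on the codes misses a value that is really there.
which(as.numeric(f) == 20)                  # integer(0): no match
which(as.numeric(as.character(f)) == 20)    # 3
```

Going through as.character() is the usual workaround; reading the data without factors in the first place avoids the issue entirely.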
You can start on the right foot by adding two arguments to the call that reads the file: set stringsAsFactors to FALSE so strings remain as characters, and set na.strings to the marker the file uses for missing data. This tells R what to look for in the file to determine missing values.
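A minimal sketch of such a corrected read, assuming the file is read with read.csv and uses the literal string "Not Available" for missing values (both are assumptions for illustration; check what your own file contains). The CSV text is made up and passed through the text argument so the example needs no file on disk. R 4.0.0 and later already default stringsAsFactors to FALSE; setting it explicitly keeps the behavior the same on older versions.

```r
# Hypothetical CSV content; "Not Available" stands in for whatever
# marker the real file uses for missing data.
csv <- "hospital,state,rate
A,TX,14.1
B,TX,Not Available
C,NY,12.3"

df <- read.csv(text = csv,
               stringsAsFactors = FALSE,       # keep text as character
               na.strings = "Not Available")   # read the marker as NA

str(df)

# The rate column is already numeric, so no conversion step is needed.
min(df$rate, na.rm = TRUE)
```

Because the missing-value marker is turned into NA at read time, the rate column parses as numeric and no factor or character conversions are needed later.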
With this corrective step added, you can now take out all of the and parts. Look what I did with the heart attack section:
Now can just take the value of directly. And doesn't need any special treatment either.
Towards the bottom, you can now take out the last lines that drop the factor levels. I changed the ending to:
Now when the code is run it works without warnings:
Here is a shortened code that will work: