Assessment 2: Visualization and data processing – Total marks 40

Assessed by:
Outline
The following exercises are designed to assess your understanding of concepts, implementation, and interpretation of topics in Visualization and Data Processing. Some questions may require you to search for and use R functions that we have not used so far. For all of the following questions, submit your code and output.
The questions in this assessment may have multiple correct solutions. Almost no statistical background is assumed for this assessment. All methods required for the solutions are available on the content pages of Weeks 2-5 of this subject. Some of them have been covered in detail during Collaborate sessions.
Submissions
This assessment consists of 11 questions with several sub-questions. Insert code, plots and explanations/justifications in the provided text boxes where indicated. Do not remove the headings in the text boxes. Answers outside the box won’t be marked. Note that you should not need more space than is provided (the text boxes).
Change the file name to your first and last name when submitting to Learn JCU.
Submit as a Word file or a pdf file.

Visualization:
Import the data oneworld.csv (available at https://drive.google.com/file/d/1dJnK9froCCxCn1PFEbv6svLdKnhRiFCL/view?usp=sharing) into R. The objective of this section is to explore the relationship between GDP categories, infant mortality and regions.
Q1. Insert your R code to:
Create a new ordinal variable called GDPcat with three categories, “Low” “Medium” and “High”, derived from the variable GDP with:
• The proportion of countries in each GDPcat category is approximately “Low” 40%, “Medium” 40% and “High” 20%.
• The “Low” category has countries with the lowest GDP values and the “High” category has countries with the highest GDP values.
• Remove any missing observations.
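As an illustration of the split described above, here is a minimal sketch on made-up values (not the oneworld.csv data). It assumes that percentile cut points at 40% and 80% are an acceptable way to reach the approximate 40/40/20 proportions; the data frame `toy` and its values are hypothetical, and other approaches are equally valid.

```r
# Toy data frame; the column name GDP mirrors the brief, the values are made up.
toy <- data.frame(
  Country = paste0("C", 1:10),
  GDP = c(500, 1200, NA, 3000, 800, 15000, 40000, 2500, 700, 9000)
)

# Drop rows with a missing GDP value.
toy <- toy[!is.na(toy$GDP), ]

# Cut points at the 40th and 80th percentiles give roughly 40/40/20 groups.
breaks <- quantile(toy$GDP, probs = c(0, 0.4, 0.8, 1))

# ordered_result = TRUE makes the factor ordinal: Low < Medium < High.
toy$GDPcat <- cut(
  toy$GDP,
  breaks = breaks,
  labels = c("Low", "Medium", "High"),
  include.lowest = TRUE,
  ordered_result = TRUE
)

# Check the resulting proportions in each category.
prop.table(table(toy$GDPcat))
```

With so few rows the proportions can only approximate 40/40/20; on a larger dataset the percentile cut points should land closer to the target.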
Q2. Insert your R code, plot, and interpretation of the plot:
Using the ggplot2 library, visualise the relationship between GDPcat and Infant.mortality, stratified by Regions, on a single plot. Comment on your plot.
Data Processing: Section Marks 15
Q3. Insert your R code to: Marks (4)
Write an R function to identify the proportion of missing observations in a variable or column of tabular data.
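To make "proportion of missing observations" concrete, the following is a minimal sketch on a made-up vector. The function name `prop_missing` is hypothetical, and it assumes that missing values are coded as `NA`.

```r
# Hypothetical helper: returns the share of NA entries in a vector or column.
prop_missing <- function(x) {
  # is.na() gives TRUE/FALSE per element; the mean of a logical vector
  # is the proportion of TRUE values.
  mean(is.na(x))
}

# Two of the four values are missing.
prop_missing(c(1, NA, 3, NA))  # 0.5
```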
Q4. Insert the code to: Marks (2)
Apply the function from Q3 to all variables of the dataset airquality. This dataset is available in R. Print a list of the variable names with the proportion of missing observations in each variable.
Q5. Insert your justification: Marks (2)
Use the airquality dataset available in R. Specify a variable from the airquality dataset for univariate missing value imputation. Justify your variable choice based on the count or proportion of missing observations, noting that univariate imputation reduces the natural variation of a variable.
Q6. Insert the code and justification to: Marks (3)
Using base R or dplyr functions (no additional libraries) replace all missing observations in the chosen variable from Q5 with an imputation value. Justify the choice of replacement value. Hint: Read the appropriate section on your Weekly content page to perform this task.
Q7. Insert the code, output and explanation: Marks (4)
Compare the mean and standard deviation of the chosen variable from Q5 before and after imputation. Provide an explanation of the comparison.
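The kind of comparison asked for here can be sketched on a made-up vector. This example assumes mean imputation purely for illustration; the median or another value may be a better choice depending on the variable. The names `x` and `x_imputed` are hypothetical.

```r
# Made-up vector with two missing values.
x <- c(10, 12, NA, 15, 11, NA, 14)

# Replace each NA with the mean of the observed values (mean imputation).
x_imputed <- ifelse(is.na(x), mean(x, na.rm = TRUE), x)

# Compare the mean before and after imputation.
c(mean_before = mean(x, na.rm = TRUE), mean_after = mean(x_imputed))

# Compare the standard deviation before and after imputation.
c(sd_before = sd(x, na.rm = TRUE), sd_after = sd(x_imputed))
```

With mean imputation the mean stays the same, while the standard deviation shrinks because the imputed points sit exactly at the centre of the data.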

Text Analytics: Section Marks 15
Mysterydocs.RData is a collection of unstructured text documents (available at https://drive.google.com/file/d/1FU2bTUMtqrFizpEQwoz1MQ5Yw2AHRgwe/view?usp=sharing).
Responses to the questions below must include comments where indicated.
Q8. Insert the code and output to: Mark (1)
Import the Mysterydocs.RData file into R and identify the number of documents in the docs dataset.
Q9. Insert the code and output to: Marks (4)
Using methods of Week 5 Topic 2, clean the collection of texts and convert it into tabular data. Use at least 5 cleaning steps, including stemming. Display the last six rows and first five columns (only) of the cleaned tabular data that you created.
Q10. Insert your R code and plot: Marks (3)
Create a subset of the cleaned tabular data from Q9, retaining only those words that occur at least 200 times within the entire corpus. Use a visualization tool to show the frequency distribution of the 50 most frequent words in the subset. Hint: Select an appropriate visualization tool from what you learned in Week 3.
Q11. Insert your R code, plot, and interpretation of the plot: Marks (7)
Visualise a similarity matrix between documents derived from the cleaned data in Q9. Comment on the visualisation, noting any obvious structure in the similarity matrix as depicted in the plot. To visualise the similarity matrix, you may use R functions such as levelplot() or image(), or any other suitable plotting function. You will need to research the implementation of these functions.

