Be able to use external data sources within functions

Resources | Subject Notes | Information Communication Technology ICT

ICT 0417 - Spreadsheets: Using External Data Sources

ICT 0417 - Spreadsheets: Using External Data Sources

Objective

Be able to use external data sources within functions in a spreadsheet program.

Introduction

Spreadsheet software like Microsoft Excel, Google Sheets, and LibreOffice Calc offer powerful features for working with data. One crucial aspect is the ability to import and utilize data from external sources. This allows you to expand your datasets, automate data updates, and perform more comprehensive analyses.

Methods for Importing External Data

There are several ways to bring data into a spreadsheet:

  • Text Files (.txt, .csv): Comma-Separated Values (CSV) and Tab-Separated Values (TSV) are common formats.
  • Excel Files (.xlsx, .xls): You can open and import entire Excel workbooks or select specific sheets.
  • Databases: Connect directly to databases like MySQL, Access, or SQL Server.
  • Web Data (using Get & Transform Data): Extract data from websites, APIs, and online services.

Using External Data in Functions

Once data is imported, you can use spreadsheet functions to manipulate and analyze it. The specific functions available depend on the data type and the spreadsheet software being used. Here are some common examples:

1. VLOOKUP

Description: The VLOOKUP function searches for a value in the first column of a range and returns a value from a specified column in the same row. It's useful for retrieving related information from a lookup table.

Syntax: =VLOOKUP(lookup_value, table_array, col_index_num, [range_lookup])

  • lookup_value: The value you want to search for.
  • table_array: The range of cells containing the lookup table. The first column of this range must be sorted.
  • col_index_num: The column number within the table_array from which to return a value.
  • [range_lookup]: Optional. TRUE (or omitted) for approximate match (the first column of the table_array must be sorted ascending). FALSE for exact match.

Example: Suppose you have a table of product codes and prices in cells A1:B10. You want to find the price of a product with code 'P123' in cell D1.

=VLOOKUP(D1, A1:B10, 2, FALSE) 

This formula searches for the value in D1 (the product code) in the first column (column A) of the range A1:B10. If an exact match is found, it returns the value from the second column (column B) of the same row.

2. INDEX and MATCH

Description: INDEX returns the value of a cell at a given row and column number. MATCH returns the relative position of an item in a range that matches a specified value. Together, they provide a more flexible alternative to VLOOKUP.

Syntax:

  • INDEX(array, row_num, [column_num])
  • MATCH(lookup_value, lookup_array, [match_type])

Example: Using the same product code and price table (A1:B10), find the price of a product with code 'P123' in cell D1.

=INDEX(B1:B10, MATCH(D1, A1:A10, 0))

This formula first uses MATCH to find the row number where the product code in D1 is located in column A (A1:A10). Then, it uses INDEX to return the value from the corresponding row in column B (B1:B10).

3. IMPORTDATA

Description: The IMPORTDATA function imports data from a URL. The data is expected to be in a structured format like CSV or TSV.

Syntax: =IMPORTDATA(url)

Example: To import data from a CSV file located at https://example.com/data.csv into cell A1:

=IMPORTDATA("https://example.com/data.csv")

The data from the CSV file will be imported into the spreadsheet, starting from cell A1.

4. GETDATA

Description: The GETDATA function retrieves data from a web page based on a specified XPath query. This allows you to extract specific pieces of information from complex HTML documents.

Syntax: =GETDATA(url, xpath_query)

Example: To retrieve the title of a web page at https://www.example.com using the XPath query //title:

=GETDATA("https://www.example.com", "//title")

The title of the web page will be displayed in the cell containing the formula.

5. IMPORTXML

Description: The IMPORTXML function retrieves data from an XML document or an HTML document using an XPath query. It's similar to GETDATA but specifically designed for XML and HTML parsing.

Syntax: =IMPORTXML(string, xpath_query)

Example: To retrieve all the links from an HTML document at https://www.example.com using the XPath query //a/@href:

=IMPORTXML("https://www.example.com", "//a/@href")

The links found in the HTML document will be displayed in the cell containing the formula.

Important Considerations

  • Data Format: Ensure the external data is in a compatible format (CSV, TXT, XML, HTML).
  • Data Cleaning: External data may require cleaning and formatting before it can be used effectively.
  • Error Handling: Implement error handling (e.g., using IFERROR) to gracefully handle situations where data is missing or invalid.
  • Security: Be cautious when importing data from untrusted sources.

Practice Exercises

  1. Import a CSV file containing student names and grades and use VLOOKUP to find the grade for a specific student.
  2. Use INDEX and MATCH to retrieve the price of a product from a product catalog.
  3. Import data from a CSV file containing weather information and create a chart to visualize the data.
  4. Use IMPORTDATA to retrieve a list of articles from a news website.