
How Is Web Scraping Used To Extract Healthcare Data?

3i Data Scraping

Health information comes in many types and formats. Web scraping tools can be used to collect data from sources that are otherwise difficult to reach.


Scraping the Bottom of the Data

Web scraping is an important technique for gathering data at scale. It is the process of automatically extracting specific information from a website: we create a program, or "bot," that crawls the website's pages and extracts data in a usable format.
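To make this concrete, the sketch below uses only Python's standard library (`html.parser`) to pull specific items out of a page's HTML. The HTML snippet and the `drug-name` class are invented for illustration; a real bot would first download the page (for example with `urllib.request`) and would need selectors that match the real site.

```python
from html.parser import HTMLParser


class DrugNameParser(HTMLParser):
    """Collect the text of every element whose class attribute is 'drug-name'.

    'drug-name' is a hypothetical class used only for this example, and the
    parser assumes matching elements contain plain text (no nested tags).
    """

    def __init__(self):
        super().__init__()
        self.names = []
        self._capturing = False  # True while inside a matching element

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if ("class", "drug-name") in attrs:
            self._capturing = True
            self.names.append("")

    def handle_endtag(self, tag):
        self._capturing = False

    def handle_data(self, data):
        if self._capturing:
            self.names[-1] += data


def extract_drug_names(html):
    """Return the stripped text of all 'drug-name' elements in an HTML string."""
    parser = DrugNameParser()
    parser.feed(html)
    parser.close()
    return [name.strip() for name in parser.names]


sample_html = """
<ul>
  <li class="drug-name">Metformin</li>
  <li class="drug-name">Atorvastatin</li>
  <li class="other">Not a drug</li>
</ul>
"""

print(extract_drug_names(sample_html))  # ['Metformin', 'Atorvastatin']
```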

Scraped datasets have been used in several sectors. For example, a bot can be built to retrieve stock prices for a specific date, monitor daily temperatures, or count commuters on the London Underground. Data scraping makes it possible to add more attributes to existing data and build better datasets quickly.

The medical industry has found potential in data scraping services because it has huge amounts of data that need to be processed, for example, data on genetic variants collected from the internet, or datasets of medication side effects.

The Non-Technical Overview

As an example, this section performs web scraping with the Python Selenium module. Here is how it works, using the following scenario:

Write code that takes patients' prescription data and automatically retrieves details of each drug from the NHS website.

The Code: Prescribed Medication

Begin by importing the libraries:

#import libraries
import pandas as pd
import numpy as np
#!pip install selenium
from selenium import webdriver

Next, import the data in the required format.

The data is the list of medications prescribed to each patient in the National Health and Nutrition Examination Survey (NHANES). NHANES is a public, cross-sectional dataset that studies the health and nutritional status of samples of the US population.

#import drug chart
df = pd.read_csv('drug_chart.csv')

Note that each patient's medications are imported as a single comma-separated string, so it is convenient to convert them into a list of prescribed drugs for each patient.


#prescription list formation
def prescription_list(row):
    """This function returns a list of all the prescription medications an individual is prescribed."""
    # Patients without prescriptions have a missing value (NaN) instead of a string
    if not isinstance(row['Prescriptions'], str):
        return np.nan
    # Medications are stored as one comma-separated string, e.g. "drug A, drug B"
    return row['Prescriptions'].split(", ")


df['prescription_list'] = df.apply(prescription_list, axis=1)
#display
df.head()

Selenium is one of Python's web scraping libraries. It works by driving an automated Google Chrome browser, which can then be controlled programmatically. Let us create a driver to use it:

from selenium.webdriver.chrome.service import Service
driver = webdriver.Chrome(service=Service("/usr/local/bin/chromedriver"))

You can now instruct the driver to open a specific page. Here, it opens the NHS website's page for metformin, an anti-diabetic medicine.

driver.get("https://www.nhs.uk/medicines/metformin/")

Selenium has now opened the metformin page in Google Chrome. To target the information you want to extract, right-click it on the page, choose Inspect, then right-click the highlighted element in the developer tools and copy its XPath. The XPath points to the corresponding element in the page's HTML, which lets your code go straight to that section of the website. Retrieving content this way is what we call scraping a website.
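Once you have copied an XPath, the element's text can be read through the driver. This is a minimal sketch assuming Selenium 4, where `driver.find_element(By.XPATH, xpath)` replaces the older `find_element_by_xpath` style methods. `By.XPATH` is simply the string `"xpath"`, so the hypothetical helper `text_at_xpath` below needs no extra import and works with any object that has a `find_element` method.

```python
def text_at_xpath(driver, xpath: str) -> str:
    """Return the visible text of the first element matching an XPath.

    `driver` is assumed to be a Selenium 4 WebDriver. "xpath" is the value of
    selenium.webdriver.common.by.By.XPATH. If nothing matches, Selenium raises
    NoSuchElementException.
    """
    element = driver.find_element("xpath", xpath)
    return element.text


# Usage with a live browser (copied_xpath is whatever you copied in DevTools):
# driver.get("https://www.nhs.uk/medicines/metformin/")
# print(text_at_xpath(driver, copied_xpath))
```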

For example, suppose you would like to retrieve all the information about metformin from the NHS website. That information can be retrieved by following the given path:

The following syntax is used to return it:

The returned text can then be cleaned up with regular expressions (regex), and other modifications can be made with string replace functions.
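A minimal sketch of that clean-up step, assuming the problems are literal escape sequences such as `\n`, non-breaking spaces, and runs of extra whitespace. `clean_scraped_text` is a hypothetical name, and the exact rules will depend on what the page actually returns.

```python
import re


def clean_scraped_text(raw: str) -> str:
    """Tidy text returned by a scraper (hypothetical helper).

    - turns literal backslash escape sequences ("\\n", "\\r", "\\t") into spaces
    - replaces non-breaking spaces with ordinary spaces
    - collapses runs of whitespace into a single space
    """
    text = re.sub(r"\\[nrt]", " ", raw)   # literal "\n", "\r", "\t" sequences
    text = text.replace("\xa0", " ")      # non-breaking spaces
    text = re.sub(r"\s+", " ", text)      # real newlines, tabs, repeated spaces
    return text.strip()


print(clean_scraped_text("About metformin\\nMetformin is a medicine.\n\n  "))
# About metformin Metformin is a medicine.
```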


You can now use this information to extract data from different sections of the NHS website. The nhs_details function returns the drug details for all the medications prescribed to patients in the dataset.

It is important to know that this approach assumes every page on the website has the same structure. If a page's HTML structure differs or changes, the find-element-by-XPath call will fail. There are several ways to deal with this when locating drug information, such as making multiple attempts with different locators inside try/except blocks. In practice, these locators are found by trial and error, which also helps in understanding the basic HTML structure of the NHS website.
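One way to soften that fragility, sketched below, is to try several candidate XPaths in order and use the first one that matches. It relies on Selenium's `find_elements`, which returns an empty list instead of raising an exception when nothing matches. The helper name and the candidate XPaths are placeholders you would replace with ones found by trial and error on the real pages.

```python
from typing import Iterable, Optional


def first_matching_text(driver, xpaths: Iterable[str]) -> Optional[str]:
    """Try each XPath in turn and return the text of the first match.

    `driver` is assumed to be a Selenium 4 WebDriver. "xpath" is the value of
    By.XPATH, and find_elements() returns [] when nothing matches, so no
    try/except is needed. Returns None if no candidate matches.
    """
    for xpath in xpaths:
        elements = driver.find_elements("xpath", xpath)
        if elements:
            return elements[0].text
    return None


# Placeholder candidates, most specific first; not real NHS page XPaths.
CANDIDATE_XPATHS = [
    "//main//section[1]",
    "//main",
]

# Usage with a live browser:
# details = first_matching_text(driver, CANDIDATE_XPATHS)
```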


The scraping code has the following components:


  • lower() standardizes the input, and the web driver opens the page for that medication.
  • f-strings allow putting any medication into the URL.
  • The find-element-by-XPath method returns the HTML element that contains the data of interest.
  • The object is converted into text and cleaned up to eliminate escape characters.
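Putting those components together, a hypothetical per-drug lookup might look like the following. It is a sketch, not the author's `nhs_details` code (which is not shown here): it assumes a Selenium 4 driver and the `https://www.nhs.uk/medicines/<name>/` URL pattern from the metformin example, takes the XPath as a parameter, and uses whitespace collapsing as its only clean-up step. Drug names containing spaces may need further conversion before they fit that URL pattern.

```python
import re


def lookup_drug(driver, drug_name: str, xpath: str) -> str:
    """Open a drug's NHS page and return cleaned text from one element.

    Hypothetical sketch: `driver` is assumed to be a Selenium 4 WebDriver and
    `xpath` one you copied from the page; the URL pattern is assumed to match
    https://www.nhs.uk/medicines/metformin/.
    """
    drug = drug_name.strip().lower()                     # standardize the input
    driver.get(f"https://www.nhs.uk/medicines/{drug}/")  # f-string builds the URL
    element = driver.find_element("xpath", xpath)        # "xpath" == By.XPATH
    text = element.text                                  # element -> plain text
    return re.sub(r"\s+", " ", text).strip()             # drop newlines/extra spaces


# Usage with a live browser:
# print(lookup_drug(driver, "Metformin", copied_xpath))
```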


Finally, build a function that returns the NHS website information for the prescribed drugs in the NHANES extract.
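A sketch of that final step, assuming `lookup` is any function that maps one drug name to its scraped details (for example, a Selenium lookup with the driver and XPath fixed). Each distinct drug is fetched only once and reused across patients, which keeps the number of page loads down; patients with no prescriptions (NaN in the DataFrame) get `None`. `details_for_patients` is a hypothetical name.

```python
from typing import Callable, Dict, List, Optional


def details_for_patients(
    prescription_lists: List[object],
    lookup: Callable[[str], str],
) -> List[Optional[Dict[str, str]]]:
    """Map each patient's list of drugs to a {drug: details} dictionary.

    Non-list entries (e.g. NaN for patients with no prescriptions) become None.
    A cache ensures each distinct drug is looked up only once.
    """
    cache: Dict[str, str] = {}
    results: List[Optional[Dict[str, str]]] = []
    for drugs in prescription_lists:
        if not isinstance(drugs, list):
            results.append(None)
            continue
        patient: Dict[str, str] = {}
        for drug in drugs:
            if drug not in cache:
                cache[drug] = lookup(drug)  # scrape each drug only once
            patient[drug] = cache[drug]
        results.append(patient)
    return results
```

With the DataFrame built earlier, the results could be stored as a new column, for example `df['nhs_details'] = details_for_patients(df['prescription_list'].tolist(), lookup)`, where `lookup` wraps whatever scraping function you use.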


In conclusion, web scraping tools are very useful for converting the large amount of data available in the healthcare industry into a readable and usable format.

Are you looking for a healthcare data scraping service? Contact 3i Data Scraping now!



Zupyak is the world’s largest content marketing community, with over 400 000 members and 3 million articles. Explore and get your content discovered.