Python BeautifulSoup: Extract all the URLs nested within <li> tags from the webpage python.org
BeautifulSoup: Exercise-8 with Solution
Write a Python program to extract all the URLs nested within <li> tags from the webpage python.org.
Sample Solution:
Python Code:
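import requests
from bs4 import BeautifulSoup

url = 'https://www.python.org/'
reqs = requests.get(url)
# The 'lxml' parser needs the lxml package; 'html.parser' is the built-in alternative.
soup = BeautifulSoup(reqs.text, 'lxml')

urls = []
# Visit every <li> element and take the first link inside it, if any.
for h in soup.find_all('li'):
    a = h.find('a')
    # Some <li> elements contain no <a>, or an <a> without an href attribute.
    if a is not None and a.has_attr('href'):
        urls.append(a['href'])
print(urls)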
Sample Output:
['/', '/psf-landing/', 'https://docs.python.org', 'https://pypi.python.org/', '/jobs/', '/community/', '#', 'javascript:;', 'javascript:;', 'javascript:;', '#', 'https://www.facebook.com/pythonlang?fref=ts', 'https://twitter.com/ThePSF', '/community/irc/', '/about/', '/about/apps/', '/about/quotes/', '/about/gettingstarted/', '/about/help/', 'http://brochure.getpython.info/', '/downloads/', '/downloads/', '/downloads/source/', '/downloads/windows/', '/downloads/mac-osx/', '/download/other/', 'https://docs.python.org/3/license.html', '/download/alternatives', '/doc/', '/doc/', '/doc/av', 'https://wiki.python.org/moin/BeginnersGuide', 'https://devguide.python.org/', 'https://docs.python.org/faq/', 'http://wiki.python.org/moin/Languages', 'http://python.org/dev/peps/', 'https://wiki.python.org/moin/PythonBooks', '/doc/essays/', '/community/', '/community/survey', '/community/diversity/', '/community/lists/', '/community/irc/', '/community/forums/', '/community/workshops/', '/community/sigs/', '/community/logos/', 'https://wiki.python.org/moin/', '/community/merchandise/', '/community/awards', 'https://www.python.org/psf/codeofconduct/', '/success-stories/', '/success-stories/category/arts/', '/success-stories/category/business/', '/success-stories/category/education/', '/success-stories/category/engineering/', '/success-stories/category/government/', '/success-stories/category/scientific/', '/success-stories/category/software-development/', '/blogs/', '/blogs/', 'http://planetpython.org/', 'http://pyfound.blogspot.com/', 'http://pycon.blogspot.com/', '/events/', '/events/python-events', '/events/python-user-group/', '/events/python-events/past/', '/events/python-user-group/past/', 'https://wiki.python.org/moin/PythonEventsCalendar#Submitting_an_Event', '/shell/', '//docs.python.org/3/tutorial/controlflow.html#defining-functions', '//docs.python.org/3/tutorial/introduction.html#lists', 'http://docs.python.org/3/tutorial/introduction.html#using-python-as-a-calculator', 
'//docs.python.org/3/tutorial/', '//docs.python.org/3/tutorial/controlflow.html', 'http://feedproxy.google.com/~r/PythonSoftwareFoundationNews/~3/NXMcoIchkxY/2018-in-review.html', 'http://feedproxy.google.com/~r/PythonSoftwareFoundationNews/~3/t_DSEH1vASY/python-core-developer-mentorship.html', 'http://feedproxy.google.com/~r/PythonSoftwareFoundationNews/~3/v7pD576k9iA/mariatta-wijaya-lets-use-github-issues.html', 'http://feedproxy.google.com/~r/PythonSoftwareFoundationNews/~3/mnSfdQZDRUM/petr-viktorin-extension-modules-and.html', 'http://feedproxy.google.com/~r/PythonSoftwareFoundationNews/~3/-JcoXQeMgsQ/scott-shawcroft-history-of-circuitpython.html', '/events/python-events/809/', '/events/python-user-group/848/', '/events/python-user-group/838/', '/events/python-events/827/', '/events/python-events/826/', 'http://www.djangoproject.com/', 'http://wiki.python.org/moin/TkInter', 'http://www.scipy.org', 'http://buildbot.net/', 'http://www.ansible.com', '/about/', '/about/apps/', '/about/quotes/', '/about/gettingstarted/', '/about/help/', 'http://brochure.getpython.info/', '/downloads/', '/downloads/', '/downloads/source/', '/downloads/windows/', '/downloads/mac-osx/', '/download/other/', 'https://docs.python.org/3/license.html', '/download/alternatives', '/doc/', '/doc/', '/doc/av', 'https://wiki.python.org/moin/BeginnersGuide', 'https://devguide.python.org/', 'https://docs.python.org/faq/', 'http://wiki.python.org/moin/Languages', 'http://python.org/dev/peps/', 'https://wiki.python.org/moin/PythonBooks', '/doc/essays/', '/community/', '/community/survey', '/community/diversity/', '/community/lists/', '/community/irc/', '/community/forums/', '/community/workshops/', '/community/sigs/', '/community/logos/', 'https://wiki.python.org/moin/', '/community/merchandise/', '/community/awards', 'https://www.python.org/psf/codeofconduct/', '/success-stories/', '/success-stories/category/arts/', '/success-stories/category/business/', '/success-stories/category/education/', 
'/success-stories/category/engineering/', '/success-stories/category/government/', '/success-stories/category/scientific/', '/success-stories/category/software-development/', '/blogs/', '/blogs/', 'http://planetpython.org/', 'http://pyfound.blogspot.com/', 'http://pycon.blogspot.com/', '/events/', '/events/python-events', '/events/python-user-group/', '/events/python-events/past/', '/events/python-user-group/past/', 'https://wiki.python.org/moin/PythonEventsCalendar#Submitting_an_Event', '/dev/', 'https://devguide.python.org/', 'https://bugs.python.org/', 'https://mail.python.org/mailman/listinfo/python-dev', '/dev/core-mentorship/', '/news/security/', '/about/help/', '/community/diversity/', 'https://github.com/python/pythondotorg/issues', 'https://status.python.org/']
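The sample output mixes absolute URLs, site-relative paths such as '/downloads/', protocol-relative links such as '//docs.python.org/3/tutorial/', and placeholders such as '#' and 'javascript:;'. As an optional follow-up step, the list could be cleaned with the standard library's urllib.parse. The sketch below is one possible approach, not part of the original solution; the function name normalize_links and the short sample list are hypothetical and chosen only for illustration.

```python
from urllib.parse import urljoin, urlparse


def normalize_links(hrefs, base_url='https://www.python.org/'):
    """Resolve hrefs against base_url and drop entries that are not web links."""
    cleaned = []
    seen = set()
    for href in hrefs:
        # Skip empty values and pure in-page anchors such as '#'.
        if not href or href.startswith('#'):
            continue
        # Turn relative and protocol-relative links into absolute URLs.
        absolute = urljoin(base_url, href)
        # Keep only http(s) links; this drops 'javascript:;' placeholders.
        if urlparse(absolute).scheme not in ('http', 'https'):
            continue
        # Preserve the original order while removing repeats.
        if absolute not in seen:
            seen.add(absolute)
            cleaned.append(absolute)
    return cleaned


# A few entries taken from the sample output above (hypothetical selection).
sample = ['/', '/psf-landing/', 'https://docs.python.org', '#', 'javascript:;',
          '//docs.python.org/3/tutorial/', '/downloads/', '/downloads/']
print(normalize_links(sample))
```

In the program above, the same function could be applied to the urls list before printing. Whether duplicates and placeholder links should be removed depends on what the extracted list is used for.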