Norconex Web Crawler
Norconex Web Crawler is a free and open-source web crawling and web scraping software written in Java and released under the Apache License. It can export data to many repositories, such as Apache Solr, Elasticsearch,[1] Microsoft Azure Cognitive Search, and Amazon CloudSearch.[2][3][4]
Other names | Norconex HTTP Collector |
---|---|
Developer(s) | Norconex Inc. |
Initial release | 2013 |
Stable release | 3.0.2 / 2022-01-05 |
Repository | GitHub Repository |
Written in | Java |
Operating system | Cross-platform |
License | Apache License |
Website | Norconex Web Crawler |
The crawler can be run as a standalone application or embedded in another Java application.[5][6]
Key features include:
- Multi-threaded crawling
- Text extraction from a variety of file formats (HTML, PDF, Word, etc.)
- Extraction of metadata associated with documents
- Support for pages rendered with JavaScript
- Incremental crawls
- Support for external commands to parse or manipulate documents
- Export of extracted data to a variety of repositories
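As an illustration of the incremental crawl feature listed above, the following is a minimal sketch of one common way a crawler can detect unchanged documents between runs: by comparing a checksum of each document's content with the checksum stored during the previous crawl. It is not taken from the Norconex code base. The class name `IncrementalCrawlSketch`, the `Status` values, and the choice of SHA-256 are all assumptions made for this example, and the in-memory map stands in for whatever persistent store a real crawler would use.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration of incremental crawling; not Norconex code. */
public class IncrementalCrawlSketch {

    /** Outcome of comparing a fetched document with the previous crawl. */
    public enum Status { NEW, MODIFIED, UNMODIFIED }

    // URL -> checksum recorded the last time the URL was processed.
    // Kept in memory here; a real crawler would persist this between runs.
    private final Map<String, String> checksums = new HashMap<>();

    /** Returns the hex-encoded SHA-256 digest of the given content. */
    static String checksum(String content) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] hash = md.digest(content.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 must be supported by every Java platform implementation.
            throw new IllegalStateException(e);
        }
    }

    /** Classifies a document and records its checksum for the next crawl. */
    public Status process(String url, String content) {
        String current = checksum(content);
        String previous = checksums.put(url, current);
        if (previous == null) {
            return Status.NEW;
        }
        return previous.equals(current) ? Status.UNMODIFIED : Status.MODIFIED;
    }

    public static void main(String[] args) {
        IncrementalCrawlSketch crawl = new IncrementalCrawlSketch();
        System.out.println(crawl.process("https://example.com/", "v1")); // NEW
        System.out.println(crawl.process("https://example.com/", "v1")); // UNMODIFIED
        System.out.println(crawl.process("https://example.com/", "v2")); // MODIFIED
    }
}
```

In a sketch like this, only documents classified as NEW or MODIFIED would need to be parsed again and sent to the target repository, which is what makes repeat crawls cheaper than a full crawl.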
Organizations and projects listed as using Norconex Web Crawler include the Apache Solr Ecosystem, the Department of National Defence, Universities Canada, and the U.S. Department of Education.[7][8]
History
Norconex Web Crawler was released as free and open-source software in 2013.[9]
References
- "Enhance Your Search Capabilities with Norconex Web Crawler: Indexing Data to Elasticsearch". Medium. 12 April 2024.
- "Committers". opensource.norconex.com.
- Hoppa, Jocelyn (10 February 2020). "Importing Data from the Web with Norconex & Neo4j". Graph Database & Analytics.
- "Deploy a Norconex HTTP Collector Indexer Plugin | Cloud Search". Google for Developers.
- Valcheva, Silvia (11 February 2018). "10 Best Open Source Web Crawlers: Web Data Extraction Software". Blog For Data-Driven Business.
- "Norconex HTTP Collector". Softpedia. 9 July 2023. Retrieved 25 September 2023.
- "SolrEcosystem - Solr - Apache Software Foundation". cwiki.apache.org.
- "Norconex Crawler Users". opensource.norconex.com.
- "Norconex Gives Back to Open-Source – Norconex Inc". Retrieved 25 September 2023.
Mentions in Academic Research
- Kancherla, Vinay (1 December 2014). "A Smart Web Crawler for a Concept Based Semantic Search Engine (pg. 18)". Master's Projects. doi:10.31979/etd.ubfy-s3es. Retrieved 28 September 2023.
- Horváth, Balázs (28 August 2017). "Recommendation Techniques for smart cities (pg. 12)". Aalto University. Retrieved 28 September 2023.
- Wani, Mudasir Ahmad; Agarwal, Nancy; Jabin, Suraiya; Hussain, Syed Zeeshan (2018). "Design of iMacros-based Data Crawler and the Behavioral Analysis of Facebook Users". arXiv:1802.09566 [cs.SI].
- Abbasi, Vahid. "Phonetic Analysis and Searching with Google Glass API". uub.primo.exlibrisgroup.com. Retrieved 28 September 2023.
See also
- Mitchell, Pete (8 April 2022). "25 Best Free Web Crawler Tools". TechCult. Retrieved 5 September 2023.
- "19 Best Web Crawling Tools for Efficient Data Extraction". Crawlbase. Retrieved 10 May 2024.